1. Introduction
In the modern digital era, an immense volume of data is stored on the network, and most of it exists in the form of text. However, natural-language text is inherently ambiguous: one word can have multiple meanings, and multiple words can share one meaning. For example, “Li Na” may refer to a singer, a tennis player, or an artist. The task of entity linking aims to accurately link entity mentions in text with the existing entity information in a knowledge base. Entity linking is therefore a fundamental task across many domains, such as knowledge question-answering [1,2,3], information extraction [4,5,6], and knowledge base expansion [7,8,9].
The task of entity linking consists of three stages. The first is entity mention detection, which detects entity mentions in a given text. The second is candidate entity generation, which matches each entity mention to a set of candidate entities from the knowledge base. Finally, entity disambiguation is conducted by calculating a correlation score between each entity mention and its candidate entities and selecting the entity with the highest score as the final disambiguation result. In entity-linking research, researchers improve the relevance between entity mentions and candidate entities using information about potential relationships between entities and entity types.
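The three stages above can be sketched as a minimal pipeline. Everything here is an illustrative placeholder (the surface-form lookup, the toy knowledge-base index, and the stand-in scorer are our own assumptions), not the model proposed in this paper:

```python
# Minimal sketch of the three-stage entity linking pipeline.
# All components are illustrative placeholders, not the proposed model.

def detect_mentions(text, known_surface_forms):
    """Stage 1: naive mention detection by surface-form lookup."""
    return [s for s in known_surface_forms if s in text]

def generate_candidates(mention, kb_index):
    """Stage 2: fetch candidate entities for a mention from a KB index."""
    return kb_index.get(mention, [])

def disambiguate(mention, candidates, score_fn):
    """Stage 3: pick the candidate with the highest correlation score."""
    return max(candidates, key=lambda e: score_fn(mention, e)) if candidates else None

# Toy knowledge-base index: surface form -> candidate entity ids
kb_index = {"Li Na": ["Li_Na_(singer)", "Li_Na_(tennis)"]}
text = "Li Na won the French Open."
score_fn = lambda m, e: 1.0 if "tennis" in e else 0.0  # stand-in scorer

links = {}
for m in detect_mentions(text, kb_index):
    links[m] = disambiguate(m, generate_candidates(m, kb_index), score_fn)
```

In a real system each stage is a learned model; the point of the sketch is only the data flow from mentions, through candidate sets, to a single linked entity.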
Ganea et al. [10] proposed an approach that uses a soft-attention mechanism to capture contextual information related to candidate entities in the local association of entities. Yang et al. [11], inspired by Ganea et al. [10], used the same computational approach but combined hard- and soft-attention mechanisms to extract information about entities that have already been linked successfully; their dynamic accumulation technique accelerates the disambiguation of subsequent entity mentions using dynamically accumulated entity information, yielding large savings in time and space. Le et al. [12] proposed treating potential relationships between entities as hidden variables and optimizing entity links holistically under unsupervised conditions. Cao et al. [13] generated entity representations autoregressively by training on a large amount of data with the entity information provided by Wikipedia, effectively extracting associations between contexts and entities. Hou et al. [14] injected fine-grained semantic information into entity embeddings to enrich the entity information and bring entities of similar types closer together in the embedding space. Existing entity linking methods primarily focus on the text requiring disambiguation, the entity type, and the entity's designation, while paying little attention to the candidate entities themselves, which leads to insufficient structural information between entities. More semantic features can be injected into the model by combining the description information of entities with the association information between different entities.
Meanwhile, global entity linking performs joint disambiguation of all entity occurrences in a text. Unlike local entity disambiguation, it not only focuses on the similarity between individual entity mentions and candidate entities but also considers the interrelationships and semantic consistency among all entity mentions in the whole text. In traditional global entity linking approaches, researchers have embedded the relationships between entities in association graphs to improve the accuracy of entity linking through knowledge aggregation, strong-association information transfer, and stable knowledge representation. Cao et al. [15] applied a CNN [16] to integrate local context information and coherence information from entity-neighborhood subgraphs for global entity linking and selected the most relevant information for propagation. Wu et al. [17] proposed entity mapping between disambiguated documents and knowledge graphs through knowledge aggregation of entity nodes, addressing the problem of adaptive changes in each aggregation layer when handling entity relations in GCNs [18]. Shikhar et al. [19] jointly embedded entity nodes and entity relations into a relation graph to learn richer knowledge representations. Guo et al. [20] used a stable distribution representation of entity semantic information after random walks with restart to capture the semantic features of infrequent entities and further improve learning. The above graph-based global entity linking approaches emphasize the relationships between candidate entities and how their decisions influence one another within a shrinking scope. However, the lack of entity description information in these approaches means that the implicit relationships between entities are insufficiently extracted. At the same time, erroneous target entities propagate incorrect correlation information during the random walk process [21], strongly affecting highly correlated entities and further reducing the discrimination within candidate entity sets.
To address the above limitations, this paper proposes a novel entity linking model based on a cascade attention mechanism and a dynamic graph. Specifically, we first design a cascade-attention-based model that captures richer semantic information about entity mentions and candidate entities by modeling their contexts, increasing the structural information between entity mentions and candidate entities and enhancing the relevance of the embedded semantic features between entities. Second, an entity association graph is constructed to represent the semantic associations between candidate entities in the same document, so that the enhanced entity dependency information can be propagated through the graph to aid entity decisions and the implicit relationships between entities can be fully extracted. After obtaining the global values of the candidate entities, for candidate entity sets in which the incorrect dissemination of entity information results in low discrimination of candidate entity values, we introduce entity entropy to select the currently most reasonable target entity.
The contributions of this paper are summarized as follows:
We introduce external candidate entity description information to improve the semantic representation of candidate entities and increase the semantic relevance between candidate entities and entity mentions.
We establish a local entity linking model based on a cascade attention mechanism, so that the model internally focuses on the entity context and the interaction information of candidate entities, which improves the characterization of entity embeddings and thus further improves the inference accuracy of the model.
We construct a global entity association graph, use a random walk strategy on the graph to learn global evaluation indexes for the candidate entities, introduce entity entropy values to select the best target entities, and dynamically realize the linking of all entity mentions.
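As a rough illustration of the entropy idea in the third contribution (our own sketch under simplified assumptions, not the paper's exact formulation): a mention whose candidate score distribution has low entropy is well discriminated and can be linked first, while high-entropy mentions are deferred until more decision information has propagated.

```python
import math

def entity_entropy(scores):
    """Shannon entropy of the normalized candidate score distribution.
    Low entropy = one candidate clearly dominates = safe to link first."""
    total = sum(scores)
    probs = [s / total for s in scores]
    return -sum(p * math.log(p) for p in probs if p > 0)

# Toy candidate scores per mention
mention_scores = {
    "m1": [0.9, 0.05, 0.05],   # clearly discriminated
    "m2": [0.34, 0.33, 0.33],  # ambiguous
}

# Process the most confident (lowest-entropy) mention first
order = sorted(mention_scores, key=lambda m: entity_entropy(mention_scores[m]))
```

Under this ordering, `m1` is linked before `m2`, so the ambiguous mention can benefit from the decision already made for the confident one.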
2. Related Work
The entity linking technique consists of two main phases: local entity linking and global entity linking. The setting is as follows: given a document D containing a set of entity mentions, each with its surrounding context, a set of candidate entities is retrieved for each entity mention. The initial score of each candidate entity is obtained by local entity linking, after which global entity linking is performed using the association characteristics between the entity mentions, and the highest-ranked entity is taken as the final target entity.
Local entity linking disambiguates a single entity based on the context of the entity mention and the semantic features of the candidate entities. It uses several features, such as (1) the prior probability of an entity mention, taken from the empirical distribution of entity mentions, (2) the semantic similarity between the candidate entity and the entity mention context, and (3) the relevance of the entity mention to the candidate entity. In early local entity linking, Francis-Landau et al. [22] used convolutional neural networks at different granularities to obtain semantic correspondences between contexts and target entities and combined them with topic information, improving on Durrett et al. [23], who relied only on documents, entity mentions, and target entities and therefore could not capture complete contextual information. Ganea et al. [10], in order to address the limitations of mention-entity graphs for entity linking in long documents, introduced a neural attention mechanism over entity embeddings and context windows to extract the most relevant context vectors for each entity mention, which are then combined with features such as the prior probability to obtain a local score. Hou et al. [14] observed that most existing entity embeddings are learned from encyclopedia articles and entity contexts, which lack contextual generality; to solve this problem, they introduced embeddings of semantically typed words. Chen et al. [24] argued that the potential entity types near the mentioned entities are ignored, and injected potential entity type information into a BERT-based entity embedding model to enhance the entity linking effect. Because local entity linking does not consider the correlations between entities, researchers proposed global entity linking methods.
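A common shape for the local scores discussed above is a weighted mix of the mention prior and a context-entity similarity. The sketch below is illustrative only (the mixing weight `alpha` and the plain cosine similarity are our assumptions; the actual models use learned attention weights):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def local_score(prior, context_vec, entity_vec, alpha=0.5):
    """Mix the prior probability p(e|m) with context-entity similarity.
    alpha is an illustrative mixing weight, not a value from the paper."""
    return alpha * prior + (1 - alpha) * cosine(context_vec, entity_vec)
```

In practice the two signals are combined by a small learned network rather than a fixed `alpha`, but the intuition is the same: frequent entities with contexts close to the mention's context score highest.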
Global entity linking exploits the global consistency of all entities in a document to capture the correlation between entity mentions and target entities, using the candidate entity information of other mentions to help link the current entity. A currently popular and efficient approach to global entity linking is to construct an entity connectivity graph [25], which enables entity linking by calculating the similarity between pairs of nodes. Ganea et al. [10] used a CRF [26] to train predictions of semantic relationships between entity embeddings and employed rounds of loopy belief propagation (LBP) [27] message passing, ultimately scoring candidate entities with the prior probability and the marginal distribution to obtain each candidate's final score. Le et al. [12] built on Ganea et al. by treating the relationships between different entity mentions in the same document as latent variables and inducing these relationships in an unsupervised manner to obtain similarity scores between entities. Xue et al. [28] noted that most existing entity linking decisions rely on neural networks for automated modeling; they therefore first used a recurrent random-walk network to obtain candidate entity correlations for global entity mentions, while introducing an external knowledge base to strengthen the correlation dependencies among decisions between entity mentions. To model the semantic links between candidate entities in the same document, Hu et al. [29] proposed constructing a graph structure based on the relationships between entities and candidate entities, using graph convolutional networks (GCNs) to generate encodings that represent the global semantics of a document; this approach aims to overcome the lack of effective semantic encoding when global entity linking is performed with Conditional Random Fields (CRFs). Rama-Maneiro et al. [30] filtered the candidate entity set of each mention based on context compatibility, used a two-way graph expansion strategy to reduce the time needed to construct the entity linking graph, and finally disambiguated using the importance of candidate entity nodes in the linking graph and their semantic relevance to the entity mentions. Hou et al. [31] proposed linking anonymous entity mentions to corresponding entities in the knowledge base to resolve mention ambiguity. Zhang et al. [32] proposed two adaptive features, training the weights of entity context words and valid entity types through feedforward neural networks to capture latent semantic information and valid entity type information and improve the linking effect.
In current entity-linking research, most scholars focus mainly on improving the semantic embedding of entity types or entity contexts, resulting in insufficient information about candidate entities. In addition, after constructing the entity association graph, a random walk strategy is usually used to query the nodes to be linked from a single vertex; however, this approach fails to deal effectively with the low discrimination of scores within candidate entity sets. Logeswaran et al. [33] used the Transformer architecture to encode text so that entity mentions and candidate entities attend to each other. Broscheit [34] used a Transformer to enhance entity representations, and the architecture showed strong performance on entity linking tasks. Therefore, this study employs the Transformer encoder to process the tokens in the embedding module and extract more relevant semantic features. After constructing the entity association graph, we introduce an improved random-walk strategy that focuses on key nodes in the graph for dynamic entity link prediction, aiming to improve both the accuracy and the efficiency of entity linking.
4. Experiments and Results
4.1. Datasets
In this paper, the English Wikipedia knowledge base dump is used as a reference knowledge base. To validate our proposed model, six public English datasets (AIDA-CONLL, MSNBC, AQUAINT, ACE2004, WNED-WIKI (WIKI), WNED-CWEB (CWEB)) are used.
Table 1 reflects the details of the dataset, where Docs denotes the number of documents, Mentions denotes the number of entity mentions in this dataset, and MPD denotes the average number of entity mentions per document.
AIDA-CONLL is a relatively early entity linking dataset manually annotated by Hoffart et al. [41]; AIDA-train is used for training, while AIDA-A and AIDA-B are used for validation and testing, respectively.
AQUAINT, ACE2004, and MSNBC are datasets collected and organized by Guo and Barbosa [20], containing a total of 106 documents.
CWEB and WIKI are larger datasets automatically extracted from ClueWeb and Wikipedia; each contains 320 documents.
4.2. Evaluation Indicators
Our model aims to match real entities from the knowledge base to entity mentions in the text. First, by calculating the similarity of the local features of candidate entities and entity mentions, the top K entities are selected to enter the global feature similarity matching phase. Finally, the linking of all entity mentions is achieved gradually through information propagation. In this entity mention matching scenario, we evaluate the results in terms of precision P, recall R, and F1 value:

P = TP / (TP + FP),  R = TP / (TP + FN),  F1 = 2PR / (P + R)

where TP represents the total number of correctly linked entities, FP represents the total number of incorrectly linked positive samples, and FN represents the total number of incorrectly linked negative samples.
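The three indicators can be computed directly from the link counts; the counts below are toy values for illustration:

```python
def link_metrics(tp, fp, fn):
    """Precision, recall, and F1 from correct/incorrect link counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy counts: 80 correct links, 20 false positives, 20 missed links
p, r, f1 = link_metrics(tp=80, fp=20, fn=20)
```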
4.3. Parameter Setting
In this paper, we use Word2vec [42] to pre-train word embeddings, and for the common parameters we follow the same settings as Ganea and Hofmann [10]: the dimension of word embeddings is d = 300; the entity mention context covers 10 words on each side, i.e., s = 20; and the top seven ranked entities from the knowledge base are added to each entity mention's candidate set. For the two encoders, the number of attention heads is set to six, the number of encoder layers to four, and the attention vector size to 50; the sizes of the hidden layer and the feed-forward layer are set to 300 and [300, 600], respectively.
For the candidate entity description length D, path length Q, and number of random walking layers K, we optimize the hyperparameters by performing a grid search on both the AIDA-A and AIDA-B datasets. More specifically, the grid search ranges are as follows: candidate entity description length D = {90, 100, 110, 120, 130, 140, 150, 160, 170, 180}, path length Q = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, and random walking layers K = {1, 2, 3, 4, 5, 6, 7, 8}. The hyperparameters are established by the F1 scores on both the AIDA-A and AIDA-B datasets. After analyzing the experimental results, we set D = 160, Q = 6, and K = 5.
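The grid search above can be sketched as follows. `evaluate_f1` is a hypothetical stand-in for training the model and averaging its F1 over AIDA-A and AIDA-B; the mock evaluator in the example simply peaks at the paper's chosen setting so the loop can be demonstrated:

```python
import itertools

D_grid = range(90, 190, 10)  # candidate entity description length
Q_grid = range(1, 11)        # path length
K_grid = range(1, 9)         # number of random-walk layers

def grid_search(evaluate_f1):
    """Return the (D, Q, K) combination with the best score."""
    best, best_f1 = None, float("-inf")
    for D, Q, K in itertools.product(D_grid, Q_grid, K_grid):
        f1 = evaluate_f1(D, Q, K)  # e.g., mean F1 over AIDA-A and AIDA-B
        if f1 > best_f1:
            best, best_f1 = (D, Q, K), f1
    return best

# Mock evaluator with a unique peak at D=160, Q=6, K=5 (assumption for demo)
mock = lambda D, Q, K: -abs(D - 160) - abs(Q - 6) - abs(K - 5)
```

At 10 × 10 × 8 = 800 combinations, each requiring a full evaluation, this exhaustive search is feasible only because the grid is small.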
4.4. Experimental Results and Analyses
4.4.1. Comparative Experiments
In order to better compare the performance of the model proposed in this paper, the following existing state-of-the-art methods are used as baselines for observing and comparing the experimental results.
- Deep-ED (Ganea and Hofmann, 2017) [10]: Proposes a novel local ED model that exploits a neural attention mechanism over the local context window to obtain the optimal entity context representation, and uses LBP to convey information about the linking decisions of different entity mentions.
- WNED (Guo and Barbosa, 2018) [20]: Uses a trained stable distribution to represent the features of entities and documents, thus better capturing information about uncommon entities.
- Ment-norm (Titov and Le, 2018) [12]: Builds on Deep-ED by learning potential relationships between entity mentions without supervision, assisting the decision-making of other entity mentions.
- DGCN (Wu and Zhang, 2020) [17]: Uses a dynamic GCN architecture to aggregate knowledge at dynamically linked nodes, capturing structural information between entities.
- DOC-AET (Hou and Wang, 2022) [14]: Computes the correlation between candidate entities and anonymous entity mentions using anonymous entity embeddings (AET) and the entity context.
- Adaptive-Feature (Zhang and Chen, 2022) [32]: Utilizes two adaptive features to capture latent semantic information and valid entity type information, to identify uncertain entity type information, and to enhance semantic and type associations with identified entity types.
Table 2 analyzes the performance of the SOTA methods and our method on the AIDA-B dataset for both the local and global models. From this table, we can see that our local model outperforms the state-of-the-art methods, improving by 2.72% over the lowest score. It should be noted that both our local model and the other six compared models use statistical features as local scores; the difference is that we use a bilayer encoder to extract richer semantic features, with respectable results. When comparing the F1 scores of the global models, our model achieves the best result apart from Adaptive-Feature (2022). To further validate the effectiveness of the model, we also report our analysis on five additional datasets.
Table 3 shows the micro-F1 scores of the SOTA approaches and our approach on the five public datasets. Compared to approaches such as anonymous entity embedding and the introduction of entity types, our entity linking approach using cascading attention and dynamic graphs performs better on MSNBC, AQUAINT, and ACE2004, achieving the best F1 score of 87.25%. Our results are slightly lower than WNED (2018) and DGCN (2020) on the CWEB and WIKI datasets; these datasets are automatically extracted and noisier, so all SOTA models generally score lower on them than on the other three datasets. In addition, WNED (2018) used a subset of the dataset for training and tuned the overall performance. In terms of average scores, we are also 1.35% higher than the previous best, validating the effectiveness of our method. Compared with the recent Adaptive-Feature (2022) approach, we achieve better performance on all datasets except for a slightly lower WIKI score. Their method enhances the entity representation with adaptive entity types on top of extracting implicit entity relationships, whereas we reduce training complexity through dynamic entity linking and incorporate entity entropy values to effectively improve linking accuracy; our F1 score is even 2.84% higher than theirs on the CWEB dataset. Adding entity type information could further improve the prediction of the model, which is a direction of our subsequent work.
4.4.2. Hyperparametric Experimental Analysis
In order to investigate the effects of the candidate entity description length D, the path length Q, and the number of random walking layers K on the entity linking results, we analyze the experimental results by observing the changes in F1 values on the representative AIDA-A and AIDA-B datasets, as shown in Figure 3.
It can be seen from Figure 3a that the performance of the model first improves significantly as D increases, since more useful information can be encoded as more text is added. However, when the text becomes too long, information irrelevant to the entity may be added, degrading model performance. Therefore, we set the number of description words D for candidate entities to 160.
The performance curves in Figure 3b show clear fluctuations, possibly because different path lengths affect the distribution of entities in the graph differently. Certain path lengths have better discriminative power for the associative semantics of certain classes of entities, while having no significant effect on other classes. We therefore set the path length to 6.
As shown in Figure 3c, DG-GEL performs best on the AIDA-A and AIDA-B datasets when the number of random walking layers is K = 5. When K increases to eight, the performance of DG-GEL worsens on both datasets: with too many random walking layers, the random walk reaches more uncorrelated or weakly correlated entities, which produces inaccurate representations of the correlated decision information and thus limits model performance.
4.4.3. Ablation Experiments
We divide the entity linking process into local and global entity linking, and conduct the following experiments to better validate the effectiveness of our proposed model. In the local entity linking model, the semantic features of entity mention contexts and candidate entities are enriched by a two-layer encoder. To analyze the performance of the different components of the local entity linking model, CAM-LEL, we validate with different local score computations: MA denotes that the semantic features acquired by the multi-attention encoding layer are directly used for the similarity computation between entity mentions and candidate entities; CA denotes that the similarities between entity mentions and candidate entities acquired by cross-attention encoding are directly superimposed on the MA results, where the semantics of the entity context and the candidate entity description documents are computed twice; CAM-LEL denotes the similarity computation between the entity mention text features obtained by the multi-attention encoding layer and the entity context and candidate entity description features obtained by cross-attention encoding.
Looking at the Micro-F1 scores of the three computational methods on the different datasets in Figure 4, we find that both CA and CAM-LEL improve on the entity linking results of MA alone. On the five public test sets MSNBC, AQUAINT, ACE2004, CWEB, and WIKI, CAM-LEL improves performance by 0.98% on average and CA by 0.76%, compared to MA. This shows that our proposed CAM-LEL model can better capture the interaction characteristics of entity mention contexts and candidate entity descriptions through the interaction between the two texts and the sharing of network weights. In addition, comparing the semantic representations of the two additive interaction network models shows that the fused semantic vector representation model CAM-LEL outperforms the directly spliced semantic vector representation model CA, suggesting that cross-attention captures textual dependencies better than learning semantic representations alone.
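The cross-attention interaction between a mention's context tokens and a candidate entity's description tokens can be sketched with a single scaled-dot-product attention step. The shapes and random inputs below are illustrative assumptions; the real model uses learned projections and shared weights:

```python
import numpy as np

def cross_attention(context, description):
    """context: (n, d) mention-context token vectors (queries);
    description: (m, d) entity-description token vectors (keys/values).
    Returns (n, d) context vectors enriched with description information."""
    d = context.shape[1]
    scores = context @ description.T / np.sqrt(d)         # (n, m)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)         # row-wise softmax
    return weights @ description                          # (n, d)

rng = np.random.default_rng(0)
ctx = rng.normal(size=(4, 8))   # 4 context tokens, dim 8 (toy sizes)
desc = rng.normal(size=(6, 8))  # 6 description tokens
out = cross_attention(ctx, desc)
```

Because each output row is a convex combination of description vectors, the mention representation is pulled toward the parts of the candidate description most relevant to its context, which is the interaction effect the ablation measures.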
In the global entity linking model, four sets of experiments were set up to verify the effect of the semantic enhancement of candidate entities and the introduction of entropy on entity linking, using the following conventions: Org denotes the original random-walk entity linking model, Org_SE denotes the semantic enhancement of candidate entities, Org_EV denotes the introduction of entropy computation to adjust the entity linking results, and DG-GEL denotes the full global entity linking model.
Analyzing the data in Figure 5, both the semantic enhancement of candidate entities and the introduction of the entity entropy value improve the performance of the model to different degrees. The semantic enhancement of candidate entities has the most obvious effect, with a performance improvement of 1.9%. Compared to simply linking the highest-scoring candidate at the end of one round of random walking, introducing the entropy value also effectively improves model performance, by 0.71% on average across the five datasets. These results show that semantic enhancement of candidate entities can improve the correlation between entities and the effectiveness of entity linking. Relative to previous work that did not eliminate the internal influence of candidate entities, our inclusion of the entropy factor to adjust the linking order also significantly improves the experimental results.
To conduct an overall analysis of the CAM-LEL model and the global dynamic graph model, we also compared the local and global scores. CAM-LEL denotes the local entity link model scores and DG-GEL denotes the global entity link model scores. The experimental results are as follows.
Table 4 indicates that better precision and recall are achieved in the CAM-LEL local entity linking model through the mutual attention of the entity context and candidate entities; the effectiveness of our proposed local model is also demonstrated by the model comparison in Figure 4. In the comparison between the local and global entity linking models, DG-GEL clearly has a higher F1 score than CAM-LEL, with an average improvement of about 1.87%, and even 3.37% on the MSNBC dataset. This indicates that modeling the linkage between candidate entities allows the DG-GEL model to capture more implicit features between entities, while the entity entropy value helps the linking decision of the current entity mention select a more reasonable target entity, effectively improving entity linking.
5. Summary and Future Work
This paper presents a novel entity-linking technique that combines a cascade attention mechanism and a dynamic graph. The local entity linking component is designed from the perspective of both semantic and statistical features. First, through the mutual attention of the entity context and candidate entities, we effectively reduce the interference of irrelevant information and extract richer entity semantic features; we then fuse two statistical features, prior probability and edit distance, to compute the association between entity mentions and candidate entities. In the global entity linking component, we gradually eliminate ambiguities by constructing entity connectivity graphs and applying a random walk strategy. Compared with traditional graph models, we not only enhance the correlation between candidate entities but also explore the effect of low discrimination within candidate entity sets on linking results, improving the precision of entity linking and demonstrating the effectiveness of the proposed model.
This method utilizes textual information for entity similarity calculation, which effectively improves the accuracy of entity linking. Current studies show that the multimodal information of entities and entity types is of great value for linking. In the future, we plan to design and implement a type-aware multimodal entity linking model from the perspectives of both multimodal information and entity types, learn the implicit semantic relationships between entities, and further explore and optimize entity linking techniques.