Article

A Multi-Hop Graph Neural Network for Event Detection via a Stacked Module and a Feedback Network

1 School of Information Engineering, Suqian University, Suqian 223800, China
2 The Sixty-Third Research Institute, National University of Defense Technology, Nanjing 210001, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(6), 1386; https://doi.org/10.3390/electronics12061386
Submission received: 11 February 2023 / Revised: 8 March 2023 / Accepted: 13 March 2023 / Published: 14 March 2023

Abstract
Event detection is an important subtask of information extraction, aiming to identify triggers and recognize event types in text. Previous state-of-the-art studies mainly apply graph neural networks (GNNs) to capture long-distance features of text and have achieved impressive performance. However, when these methods contain multiple GNN layers, they face the problems of over-smoothing and semantic feature destruction. For these reasons, this paper proposes an improved GNN model for event detection. The model first introduces a stacked module that enriches node representations to alleviate over-smoothing. The module aggregates multi-hop neighbors with different weights by stacking different GNNs in each hidden layer, so that node representations no longer tend to become similar. Then, a feedback network with a gating mechanism is designed to retain effective semantic information during the propagation process of the model. Finally, experimental results demonstrate that our model achieves competitive results on many metrics compared with state-of-the-art methods.

1. Introduction

With the explosive growth of information, automatically obtaining effective information from unstructured text is an important real-world challenge. Event detection (ED) is an important part of information extraction, which can provide effective support for information retrieval, text classification, intelligent question answering, and other tasks.
The tasks of ED are to identify event triggers and to recognize specific event types in plain text. Triggers mainly appear in the form of verbs, which are usually the core words of a sentence and can directly reflect the type and state of events. The Automatic Content Extraction (ACE) evaluation program describes the trigger as the word that triggers the event [1]. For example, consider the following sentence:
S: The attack killed seven and injured twenty.
In S, there are two events, whose triggers are ‘killed’ and ‘injured’. The event type of ‘killed’ is LIFE.DIE, and the event type of ‘injured’ is LIFE.INJURE.
Compared with traditional event detection methods based on hand-crafted features, deep learning methods can automatically mine the potential features hidden in the data [2,3]. However, conventional deep learning is constrained to Euclidean data, whereas many real-world application scenarios involve non-Euclidean structures, such as social networks, knowledge graphs, and drug discovery [4,5,6]. In recent years, pre-trained language models have demonstrated outstanding performance in natural language processing (NLP) tasks and have become an important technical means of achieving the ED task. BERT [7] has stronger semantic understanding and semantic feature extraction abilities than other pre-trained models, such as ELMo [8] and GPT [9]. However, compared with GNNs, BERT is less effective at capturing long-distance inter-word relations. Dependency trees help GNNs capture long-range relations between words [10].
For the above reasons, some scholars have proposed using graph neural networks for event detection [11,12]. These methods usually utilize the syntactic dependencies of words in the text to build the graph structure and update node representations by aggregating neighbor nodes in the graph [13,14]. The graph structure is therefore more conducive to capturing the potential relationships in non-Euclidean data, and it more effectively captures the long-distance interrelations between each candidate trigger word and its related entities or other triggers. However, as the number of hidden layers increases, node representations in graph neural networks tend to become similar, which is known as the over-smoothing problem [15]. Meanwhile, the semantic information of the input will also be destroyed.
Therefore, this paper proposes a novel architecture named MHG-SMFB. It constructs a multi-hop graph based on the dependence path of words to mine the diverse relationships between nodes. To enrich the representation of nodes, an aggregation method based on an improved GNN is proposed. The GNN can aggregate the difference information of the multi-hop neighbor by stacking a graph convolution network and a graph attention network in each hidden layer. A feedback network is used to alleviate semantic feature attenuation caused by the increase of GNN hidden layers. Meanwhile, the network utilizes a gating mechanism to fuse the structural and semantic information. Finally, the experimental results illustrate that our model outperforms other state-of-the-art methods in the ACE 2005 dataset.
In summary, the main contributions of this paper are as follows:
(1) This paper proposes a multi-hop GNN with a stacked module, which changes the information aggregation mode of nodes. The stacked module uses multiple calculation methods instead of a single method to aggregate neighbor information, thus effectively preventing node representations from becoming similar.
(2) This paper proposes an improved feedback network, which uses a gating mechanism to improve the fusion of semantic information and structural information.
(3) We achieve state-of-the-art performance on the ACE 2005 dataset for event detection using the proposed method.

2. Related Studies

Compared with traditional statistical models, deep learning models with nonlinear representation ability can automatically mine potential features in samples, so many scholars apply them to event detection tasks. Chen et al. [16] and Nguyen et al. [17] were among the first to apply convolutional neural networks (CNNs) to event detection tasks and performed well. Nguyen et al. [18] and Jagannatha et al. [19] use recurrent neural networks (RNNs) to detect events. The former uses a Bi-RNN for textual representation, followed by a joint learning model to identify event triggers and arguments. The latter uses a Bi-RNN to obtain word embeddings and a conditional random field (CRF) to detect medical events contained in electronic medical records. Hong et al. [20] proposed a self-regulated learning method, which uses a generative adversarial network to generate spurious features in order to mitigate the errors that may be caused by false relations during event detection. Liu et al. [21] proposed an event detection method without triggers to reduce the time cost of trigger tagging. This method captures context features with an LSTM and combines them with the local trigger information obtained by an attention mechanism; finally, a multi-task classifier is used to judge the event type. Although deep learning models have made remarkable achievements in the field of event detection, they still have the following limitations. First, deep learning models rely on large amounts of labeled data. Second, they cannot adapt to feature mining in non-Euclidean space. Finally, their generalization ability is weak.
The BERT model has become one of the most successful pre-trained language models (PLMs), with strong semantic representation ability. At present, many event detection approaches are based on this model and have achieved remarkable results. Yang et al. [22] use the BERT model as an encoder to obtain word vectors with semantic information, and then apply multiple classifiers to the encodings to extract triggers. Wadden et al. [23] utilize BERT to build a framework called DYGIE++, conduct cross-text modeling, and capture intra-sentence and cross-sentence context information. Liu et al. [24] and Du et al. [25] formulate the event extraction task as machine reading comprehension based on BERT in order to alleviate the model's dependence on data and the problem of error propagation. The difference between the two studies is that Liu et al. [24] use unsupervised methods to generate questions, while Du et al. [25] design questions manually. Compared with the deep learning models described above, BERT not only has semantic understanding ability but is also easy to migrate to different fields. However, it is difficult for BERT to capture long-distance relationships between words.
GNNs have the ability to process graph-structured data, so they are widely used in NLP tasks, including event detection [26,27]. Cui et al. [28] observed that classical GNNs ignore the information of dependency labels, so they designed a graph convolutional neural network with enhanced edge information for event detection. Yan et al. [29] propose a multi-order graph attention network-based method for event detection. The method uses a graph attention network [30] to calculate the weights of neighbors based on different-order graphs. Additionally, it applies an attention mechanism to aggregate multi-order representations of nodes. Lv et al. [31] propose a novel framework called hierarchical graph enhanced event detection (HGEED) to address two problems that existing event detection methods ignore: the dependency between sentences and the insufficient characterization of the dependency between words. The framework builds a document graph and a sentence graph to mine the implicit sentence-level dependencies and enrich the local information of words. As the depth of the graph neural network increases, these methods face the problems of over-smoothing and semantic feature destruction. Compared with these methods, the event detection architecture called MHG-SMFB proposed in this paper alleviates both problems effectively. The architecture proposes a stacked GNN based on multi-hop relationships between nodes, which superimposes different graph neural networks to prevent node representations from becoming similar. Additionally, it applies a feedback network to retain effective semantic information.

3. Event Detection Model

As shown in Figure 1, MHG-SMFB is composed of semantic feature extraction, feedback network, structural feature extraction, and event detection modules. The semantic module uses the PLM to mine sentence-level context semantic information. The structural module applies a stacked GNN over the multi-hop adjacency matrix transformed from the syntax tree of the text to extract structural information. The feedback network consists of multiple feedback layers, each of which consists of a normalization function and a gating mechanism. Finally, a multi-task classifier is used to recognize and classify the event triggers.
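To make the data flow between these modules concrete, the following PyTorch sketch outlines one possible composition. The class name, constructor arguments, and forward signature are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class MHGSMFB(nn.Module):
    """Illustrative skeleton of the MHG-SMFB data flow (not the authors' code)."""
    def __init__(self, semantic_encoder, gnn_layers, feedback_layers, classifier):
        super().__init__()
        self.semantic_encoder = semantic_encoder                # PLM, Section 3.1
        self.gnn_layers = nn.ModuleList(gnn_layers)             # stacked GCN + GAT, Section 3.2
        self.feedback_layers = nn.ModuleList(feedback_layers)   # Section 3.3
        self.classifier = classifier                            # multi-task classifier, Section 3.4

    def forward(self, token_ids, attention_mask, graph):
        # Sentence-level semantic features from the pre-trained language model.
        x = self.semantic_encoder(token_ids, attention_mask)    # (n_words, dim)
        h = x
        # Each hidden layer: structural aggregation over the multi-hop graph,
        # then feedback from the original semantic features.
        for gnn, feedback in zip(self.gnn_layers, self.feedback_layers):
            h = gnn(graph, h)
            h = feedback(h, x)
        return self.classifier(h)   # per-word event-type scores
```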

3.1. Semantic Features

The BERT model is used to extract semantic features, and its input mainly includes Token Embeddings, Position Embeddings, and Segment Embeddings. Token Embeddings use WordPiece embeddings [32], which contain 30,000 word pieces commonly used in English. Position Embeddings are learned, with the training parameters determined by the maximum length of the text and the length of the word vector. This paper focuses on sentence-level event detection, so the Segment Embeddings use a single vector. BERT uses a stack of Transformer encoders as the core module of semantic analysis. The BERT model used in this paper is BERT_Base (L = 12, H = 768, A = 12, total parameters = 110 M) from article [7], where the number of layers (transformer blocks) is L, the hidden size is H, and the number of self-attention heads is A.
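As an illustration of this step, the sketch below obtains the 768-dimensional contextual vectors with the HuggingFace transformers library; the use of that specific library is an assumption (the paper only specifies the bert-base-uncased checkpoint), and alignment of WordPiece tokens back to whole words is omitted.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")   # L = 12, H = 768, A = 12

sentence = "The attack killed seven and injured twenty."
enc = tokenizer(sentence, return_tensors="pt", max_length=512, truncation=True)

with torch.no_grad():
    out = bert(**enc)

# One 768-dimensional contextual vector per WordPiece token; these vectors
# later initialize the graph nodes in the structural module.
semantic_features = out.last_hidden_state   # shape: (1, num_tokens, 768)
```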

3.2. Structural Features

In order to obtain the structural features of a sentence, this paper uses an improved GNN model to capture the dependencies of words in a sentence. First, we use a syntactic analysis tool to extract the dependency paths between words and then convert them into an adjacency matrix. We use the adjacency matrix to initialize the multi-hop relationships of nodes in the graph neural network. Finally, a stacked GNN model is proposed to obtain the structural features of nodes.

3.2.1. Syntactic Analysis

In this paper, we use the NLP tool spaCy to parse sentences and obtain the dependency tree among words, as shown in Figure 2. Usually, a dependency relationship is binary and asymmetric, in which verbs or gerunds are the core words (root). There may be a dependency path between words in the same sentence. The arcs in the figure indicate that there is a direct dependency between words. For example, the relationship of the word pair (killed, seven) is “dobj”, indicating that the latter is the direct object of the former.
In S, the dependencies between words are diverse, and there are seven relationship types including root. Because the dependency relation is asymmetric, the dependency path is directed. To express such dependency paths as a computable adjacency matrix $A_{n \times n}$, its elements are defined as
$$A_{i,j} = \begin{cases} \alpha, & (v_i, v_j) \in E \\ 0, & (v_i, v_j) \notin E \\ \beta, & (v_i, \dots, v_j) \in E \end{cases}$$
where the values of $\alpha$ and $\beta$ are hyper-parameters. In $A_{n \times n}$, a word pair $(i, j)$ with a dependency path is assigned $\alpha$, and one without a dependency path is assigned 0. If a word pair $(j, i)$ has a dependency path, $A_{j,i}$ is likewise equal to $\alpha$. In addition, dependency paths are divided into direct paths and indirect paths. In Figure 2, (killed, seven) is a direct path, which corresponds to first-hop neighbor nodes. The path of the word pair (killed, The), derived from (killed, attack) and (attack, The), is an indirect path, which corresponds to multi-hop neighbor nodes and is assigned $\beta$. At the same time, a self-loop is added to each node to retain its original information.
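A minimal sketch of this construction is shown below, assuming α = 1.0 and β = 0.5 as placeholder hyper-parameter values and treating indirect paths as word pairs reachable within two hops; the handling of arc direction and the exact multi-hop range are interpretations of the description above rather than the authors' code.

```python
import numpy as np
import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("The attack killed seven and injured twenty.")

n = len(doc)
alpha, beta = 1.0, 0.5                 # placeholder hyper-parameter values
A = np.zeros((n, n), dtype=np.float32)

# Self-loops keep each node's original information.
np.fill_diagonal(A, alpha)

# Direct dependency paths (first-hop neighbors), following the arc direction,
# e.g. (killed, seven) with relation "dobj".
for tok in doc:
    if tok.head.i != tok.i:            # skip the root's self reference
        A[tok.head.i, tok.i] = alpha

# Indirect paths (multi-hop neighbors): word pairs reachable within two hops
# that are not connected directly, e.g. (killed, The) via (killed, attack).
reach2 = ((A > 0).astype(np.float32) @ (A > 0).astype(np.float32)) > 0
A[reach2 & (A == 0)] = beta
```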

3.2.2. Stacked Graph Neural Network

The paper constructs a stacked graph neural network to enrich the representation of nodes. The stacked model applies a multi-hop path to exploit long-distance relations of nodes in the syntactic parsing result. Meanwhile, in order to effectively distinguish the degree of dependence between different neighbor nodes, this paper superimposes the graph attention network on the graph convolution neural network (GCN). The graph attention network employs the attention mechanism to re-aggregate the node information of the GCN to prevent node representation similarity.
For the convenience of description, this paper follows article [12] and expresses the graph as G = (V, E), where V represents the set of nodes (words) and E represents the set of edges (relationships between words). As shown in Figure 3, the representation of a node in a graph convolutional network is updated by aggregating its neighbor nodes. The number of layers in the network represents the number of updates performed. The formula for updating the nodes in each layer is as follows:
$$h_v^{(k+1)} = \mathrm{ReLU}\left( \sum_{u \in \tilde{N}(v)} W_{L(u,v)}^{(k)} h_u^{(k)} + b_{L(u,v)}^{(k)} \right)$$
where $\tilde{N}(v)$ denotes the neighbors of node $v$, i.e., the nodes with a direct relationship to $v$; $h_v^{(k)}$ is the feature of node $v$ from the k-th hidden layer; $W_{L(u,v)}^{(k)}$ is a learnable parameter; and $b_{L(u,v)}^{(k)}$ is an offset parameter.
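The node update above can be sketched in PyTorch as a dense matrix operation. Note that this simplification shares one weight matrix per layer, whereas $W_{L(u,v)}^{(k)}$ in the formula is indexed by the edge label, so the edge-type-specific weights are an omitted detail.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One GCN hidden layer following the node update formula above, with a
    single shared weight matrix per layer (the edge-label-specific weights
    W_{L(u,v)} of the paper are omitted in this simplification)."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, A, h):
        # A: (n, n) weighted adjacency matrix with self-loops (alpha/beta entries)
        # h: (n, dim) node features from the previous hidden layer
        agg = A @ self.linear(h)        # sum over neighbors u in N~(v)
        return torch.relu(agg)
```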
The GCN simply aggregates the node information contained along multi-hop paths, so some non-essential information is treated as equally important. Meanwhile, in order to reduce the impact of potential dependency analysis errors, this paper constructs a stacked GNN. Each hidden layer of the GNN contains a graph convolutional network and a graph attention network, where the output of the graph convolutional network is the input of the graph attention network. The stacked GNN designs a simple graph attention network, which uses the attention mechanism to re-aggregate the multi-hop information of all nodes.
As shown in Figure 4, we assign different weights to neighbors of each node during secondary aggregation to distinguish their importance [30]. The formula of weight score is as follows:
$$h_{v_i} = \Big\Vert_{m=1}^{M} \, \sigma\left( \sum_{v_j \in \tilde{N}(v_i)} \alpha_{ij}^{(m)} W^{(m)} h_{v_j}^{(k)} \right)$$
$$\alpha_{ij} = \frac{\exp\left( \mathrm{LeakyReLU}\left( a^{T} \left[ W h_i \,\Vert\, W h_j \right] \right) \right)}{\sum_{v_j \in \tilde{N}(v_i)} \exp\left( \mathrm{LeakyReLU}\left( a^{T} \left[ W h_i \,\Vert\, W h_j \right] \right) \right)}$$
where $\alpha_{ij}^{(m)}$ is the weight coefficient calculated by the m-th attention head, $\sigma$ is the activation function, and $h_{v_j}^{(k)}$ represents the feature of node $v_j$ output by the k-th hidden layer of the GCN.
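One hidden layer of the stacked GNN can be sketched with DGL's built-in GraphConv and GATConv modules, as below; the number of attention heads and the way head outputs are concatenated back to the hidden size are illustrative choices, not values reported in the paper.

```python
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import GraphConv, GATConv

class StackedLayer(nn.Module):
    """One hidden layer of the stacked GNN: a GCN whose output feeds a simple
    multi-head GAT for secondary, weighted aggregation."""
    def __init__(self, dim=768, num_heads=4):
        super().__init__()
        self.gcn = GraphConv(dim, dim, activation=F.relu)
        # GATConv returns (n, num_heads, dim // num_heads); heads are concatenated.
        self.gat = GATConv(dim, dim // num_heads, num_heads=num_heads)

    def forward(self, g, h):
        h = self.gcn(g, h)      # first aggregation over multi-hop neighbors
        h = self.gat(g, h)      # re-aggregation with learned attention weights
        return h.flatten(1)     # concatenate the attention heads back to dim
```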

3.3. Feedback Network

Inspired by the residual network [33], this paper designs a feedback network to alleviate the degradation of the GCN model [34] and the destruction of the semantic information of words. The semantic information is contained in the input of the GCN, which is obtained by the BERT model. As shown in Figure 5, unlike the node update formula of the hidden layer in the previous section, the update formula with the feedback network is
$$\tilde{h}_v^{(k+1)} = \lambda_k \, \mathrm{LN}\left( h_v^{(k+1)} \right) \oplus F_k(x_v)$$
$$F_k(x_v) = 1 - \mathrm{sigmoid}\left( W^{(k+1)} \mathrm{LN}(x_v) \right)$$
where $\mathrm{LN}(x) = \frac{x - E[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}}$, $\lambda_k$ is a hyper-parameter, $W \in \mathbb{R}^{1 \times x_{dim}}$ is a trainable parameter, and $x_{dim}$ is the dimension of $x_v$.
In the process of fusing semantic features and structural features, normalization is adopted to avoid computational imbalance and to accelerate model convergence. Meanwhile, the weight coefficient is used to balance the semantic features and structural features so as to achieve the optimal state.
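A minimal PyTorch sketch of one feedback layer is given below. It assumes the gate $F_k(x_v)$ is a per-node scalar (consistent with $W \in \mathbb{R}^{1 \times x_{dim}}$) and that the gated semantic features are added to the normalized structural features; the exact fusion operator is an interpretation, since the paper describes it only as a gating mechanism.

```python
import torch
import torch.nn as nn

class FeedbackLayer(nn.Module):
    """Sketch of one feedback layer: Layer Normalization plus a scalar gate
    F_k(x_v) = 1 - sigmoid(W LN(x_v)) that re-injects the semantic features x_v
    into the structural features h_v output by the stacked GNN layer."""
    def __init__(self, dim, lambda_k=1.0):
        super().__init__()
        self.norm_h = nn.LayerNorm(dim)
        self.norm_x = nn.LayerNorm(dim)
        self.gate = nn.Linear(dim, 1)       # W^{(k+1)} in R^{1 x dim}
        self.lambda_k = lambda_k            # hyper-parameter lambda_k

    def forward(self, h, x):
        # h: structural features from the GNN layer; x: BERT semantic features
        f = 1.0 - torch.sigmoid(self.gate(self.norm_x(x)))   # F_k(x_v), shape (n, 1)
        return self.lambda_k * self.norm_h(h) + f * x        # gated semantic feedback
```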

3.4. Event Detection

Because the types of events are diverse, this paper uses a multi-class classifier to predict the category of each word in the text. The nodes in the GCN are quantitative representations of the words in a sample sentence, so the following formula is used:
$$\hat{y} = \mathrm{softmax}\left( \tilde{h} W + b \right)_{T \times n}$$
where $\tilde{h}$ is the node representation vector in the GCN, containing semantic and structural information; $W$ is a trainable parameter; and $b$ is the bias parameter.
In this paper, cross entropy is used as the loss function, and its formula is as follows:
$$L = -\frac{1}{N} \sum_{i=0}^{N-1} \log\left( \frac{e^{y_i}}{\sum_j e^{\hat{y}_{ij}}} \right)$$
where $N$ represents the number of samples in the training batch, $y_i$ represents the real-label probability of the i-th sample, and $\hat{y}_{ij}$ represents the probability of the i-th sample being judged as category $j$.
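In practice, the softmax and the cross-entropy loss above can be computed jointly; the sketch below uses PyTorch's CrossEntropyLoss for numerical stability, with the 34-way output dimension taken from Section 4.2 (an interpretation of d = 34 as 33 event subtypes plus NULL).

```python
import torch
import torch.nn as nn

num_types = 34                       # 33 event subtypes + NULL (see Section 4.2, d = 34)
classifier = nn.Linear(768, num_types)
loss_fn = nn.CrossEntropyLoss()      # combines the softmax and the log term

def detection_loss(node_repr, labels):
    # node_repr: (n_words, 768) fused semantic + structural representations
    # labels:    (n_words,) gold event-type index per word (NULL for non-triggers)
    logits = classifier(node_repr)   # scores before the softmax
    return loss_fn(logits, labels)
```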

4. Experiments

4.1. Experimental Data and Evaluation Metrics

This experiment uses the ACE 2005 dataset, which contains news, micro-blogging, broadcasting, and other texts from different fields [35]. As shown in Table 1, the ACE 2005 dataset includes 8 event types and 33 subtypes. If there are no events in a text, the event type of the text is marked NULL. In order to ensure the fairness of the experiment, 15,715 sentences from 599 texts were selected as experimental data using the method in the article [36], including 14,180 sentences as training samples, 863 sentences as verification samples, and 672 sentences as test samples.
All experimental results in this paper adopt the same evaluation criteria as Li et al. [37]. A trigger is correctly identified if its offsets in the text match those of a gold-standard trigger, and it is correctly classified if its event type also matches that of the gold-standard trigger [25]. This paper uses three indicators, Precision, Recall, and F1, to evaluate the experimental results [38,39]. Precision is the proportion of predicted positive samples that are true positives, and Recall is the proportion of actual positive samples that are correctly predicted. F1 is the harmonic mean of precision and recall [40]. Their formulas are shown in Formulas (6)–(8), where TP is the number of true positive samples, and FP and FN denote the numbers of false positive and false negative samples, respectively. Since event detection consists of trigger identification and classification, the same group of indicators is reported for each subtask.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (6)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (7)$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (8)$$
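For reference, a small helper that evaluates Formulas (6)–(8) from raw counts could look as follows; it is a generic sketch rather than the authors' evaluation script.

```python
def precision_recall_f1(tp, fp, fn):
    """Formulas (6)-(8): compute P/R/F1 from true positive, false positive,
    and false negative counts; applied separately to trigger identification
    and trigger classification."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```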

4.2. Experimental Environment and Model Training Parameters

As shown in Table 2, the hyper-parameters are manually tuned on the ACE 2005 dataset. The selected values are as follows: batch size = 8; epochs = 10; dropout rate = 0.2; dimensionality of the position embeddings, segment embeddings, and word embeddings = 768; maximum text length = 512; and number of hidden units for classification d = 34. The warmup proportion [41], a method for scheduling the learning rate, is 0.1.
We use the DGL graph algorithm library to construct the graph neural network in this paper, where the input and output dimensions of the graph are 768 and the network contains 5 hidden layers. Layer Normalization [42] is applied to standardize the input of the graph neural network. Each hidden layer is composed of a graph convolutional network and a graph attention network, where the output of the former is the input of the latter.
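The sketch below assembles this structural module with DGL, reusing the StackedLayer sketch from Section 3.2.2; converting the weighted adjacency matrix to an unweighted DGL graph (dropping the α/β edge weights) is a simplification made here for brevity.

```python
import dgl
import torch
import torch.nn as nn

def build_graph(adj):
    """Build a DGL graph from the multi-hop adjacency matrix of Section 3.2.1
    (the alpha/beta edge weights are dropped here for brevity)."""
    src, dst = torch.nonzero(torch.as_tensor(adj), as_tuple=True)
    return dgl.graph((src, dst), num_nodes=adj.shape[0])

class StructuralModule(nn.Module):
    """Five hidden layers, each a GCN followed by a GAT (the StackedLayer
    sketch from Section 3.2.2), with Layer Normalization on the input."""
    def __init__(self, dim=768, num_layers=5):
        super().__init__()
        self.input_norm = nn.LayerNorm(dim)
        self.layers = nn.ModuleList([StackedLayer(dim) for _ in range(num_layers)])

    def forward(self, g, x):
        h = self.input_norm(x)
        for layer in self.layers:
            h = layer(g, h)
        return h
```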
The programming language used in this experiment is Python 3.8, the deep learning framework is PyTorch 1.8.0 with CUDA 11.1, the PLM is the bert-base-uncased checkpoint, the graph computing framework is DGL 0.8.1 with CUDA 11.0, and the syntactic analysis tool is spaCy v3.0 (the loaded model is en_core_web_lg-3.2.0). We ran all experiments on an Nvidia RTX 3090 GPU with an Intel Xeon(R) Gold 6330 CPU.

4.3. Experimental Results and Analysis

The experiments in this paper include comparison experiments, ablation experiments, and graph depth experiments. The comparison experiments compare the performance of our model with mainstream event detection models, the ablation experiments verify the impact of different model components on performance, and the graph depth experiments examine the effect of the number of hidden layers.

4.3.1. Comparison Experiments

In order to effectively evaluate the performance of our model, this paper chooses to compare it with seven different event detection models. These models are as follows:
DMCNN [16]: It introduces a dynamic multi-pooling convolutional neural network, which is used to capture valuable sentence-level information when a sentence contains multiple events.
TBNNAM [21]: In order to avoid labeling triggers, this method uses the attention mechanism to calculate the sentence representation based on event types, thus realizing an event detection task without trigger words.
BERT_QA [25]: The model transforms multi-category tasks into machine reading comprehension by designing corresponding Question–Answer pairs for different tasks of event extraction.
GCN-ED [12]: One of the earliest studies to use a graph neural network to improve the event detection task. It not only uses syntactic dependencies to represent the context information of words, but also proposes an aggregation method that integrates entity references.
JMEE [43]: This method applies the attention mechanism to the information aggregation of the GCN model, so as to achieve the joint extraction of multiple event triggers and arguments.
Joint3EE [44]: In order to prevent error propagation, this method proposes joint learning, which shares the sentence representation of a Bi-RNN model among three tasks: entity recognition, trigger detection, and argument classification.
HGEED [31]: The method uses a sentence-level graph and document-level graph to obtain the local information and global information of words, respectively, to enhance event detection.
Table 3 shows the test results of our model and the seven benchmark models on the ACE 2005 dataset. The experimental results show that, compared with the other models, the precision, recall, and F1 value of our model in trigger classification are improved by 1.13%, 8.41%, and 5.04%, respectively, and those of trigger identification are improved by 5.23%, 9.45%, and 7.54%, respectively. However, compared with TBNNAM, GCN-ED, JMEE, and other models, the precision of trigger classification decreases. This is because the TBNNAM model uses a trigger-free tagging method to avoid the impact of trigger tagging errors on model training; however, because it cannot capture the structured information of the text, it is clearly weaker than our model in the comprehensive indicator F1. The JMEE model improves the aggregation of the GCN model, so it has certain advantages in capturing structured information. The GCN-ED model is also a graph neural network model; because it associates entity information with trigger words, it improves the precision of event classification. However, since these models are weaker than ours in capturing semantic information, they show certain disadvantages in the comprehensive F1 value.

4.3.2. Ablation Study

The ablation experiments verify the effect of each module on the event detection model. Table 4 shows the impact of the four modules designed in this paper on the event detection performance of our model. (1) The -attention model removes the attention mechanism from our model, which means that the classical GCN model is used as the graph neural network. (2) The -fusion model deletes the fusion mechanism of semantic features and structural features from the feedback module, and the original fusion mechanism is replaced by weighted addition. (3) The -feedback model cancels the effect of the feedback mechanism on the GNN, which means that semantic features are only used to initialize the graph node information. (4) The -gcn model does not use the structural features obtained by the GNN and only uses the pre-trained model to obtain the semantic features of samples as the information for event detection.
The experiments show that our model improves the F1 values of trigger classification and identification by 3.68% on average compared with the ablated models. This means that the semantic information and structural information of the text play an important role in the ED task, and that the feedback network proposed in this paper improves the GCN model. The performance of the -attention model is reduced by more than 1.1% compared to the complete model, owing to the lack of filtering of neighbor nodes when the model aggregates information. The -gcn model is approximately equivalent to a BERT+FineTune model, which lacks the ability to capture structural information, and its results are poor compared with those of the other variants. Compared with our full model, the F1 value of the GNN without feedback decreases by 5.21%, which means that the feedback module alleviates the attenuation and destruction of semantic information.

4.3.3. Influence of the Model Depth

In order to verify the attenuation and destruction of semantic features in the GNN model, this section uses the F1 values of GCN models with different numbers of hidden layers in trigger classification and identification as the verification index. In Figure 6, the sub-graphs on the left and right represent the results of our model and the GCN on trigger classification and identification, respectively. Figure 6b,d shows that the performance of the GCN decreases almost linearly with the number of hidden layers. This reveals that as the model depth increases, the GCN has a significant attenuation and destruction effect on semantic features. With no more than two hidden layers, its performance remains at a high level. This shows that a shallow model can retain some semantic features while obtaining structural features, whereas a deep GCN is more prone to over-smoothing. Figure 6a,c shows that there is no such linear decline in our model: increasing the number of hidden layers does not degrade its performance, which reveals that the feedback module and attention mechanism can, to a certain extent, eliminate the performance degradation caused by over-smoothing.

5. Conclusions

The main task of event detection is to identify trigger words and event types in unstructured text. In order to improve the performance of the event detection task, this paper proposes a new architecture called MHG-SMFB. To alleviate the over-smoothing and semantic information destruction caused by GNNs, MHG-SMFB uses an improved GNN model and a feedback network. To differentiate node aggregation, the GNN adopts a stacked structure based on a multi-hop graph to effectively aggregate information from more distant associated nodes. Then, a feedback network is employed to prevent excessive destruction of semantic information. Finally, the experimental results on a public dataset show that our work achieves competitive performance compared with state-of-the-art methods. However, there are still some limitations in our work that need to be addressed. Firstly, our work relies on syntactic analysis tools, which leads to error propagation in the model; we plan to design a joint learning model based on reinforcement learning to alleviate this error propagation. Secondly, the results of event detection are not ideal in low-resource scenarios; we will introduce contrastive learning to improve the generalization ability of our work. Finally, since the computational cost of the existing models is relatively high, optimizing the model parameters and structure will be an important part of our future work.

Author Contributions

Conceptualization, L.L., K.D., M.L. and S.L.; methodology, L.L. and K.D.; software, L.L. and M.L.; validation, L.L., K.D., M.L. and S.L.; formal analysis, L.L. and M.L.; investigation, L.L.; resources, M.L.; data curation, S.L.; writing—original draft preparation, L.L., K.D., M.L. and S.L.; writing—review and editing, L.L., M.L. and S.L.; visualization, L.L.; supervision, M.L.; project administration, K.D.; funding acquisition, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the NSFC (No. 71901215, 62071240), the National University of Defense Technology Research Project (No. ZK20-46), the Natural Science Foundation of Higher Education Institutions of Jiangsu Province, China (No. 20KJB413003), the China Postdoctoral Science Foundation (No. 2021MD703983), and the Young Elite Scientists Sponsorship Program (No. 2021-JCJQQT-050).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Linguistic Data Consortium. ACE (Automatic Content Extraction) English Annotation Guidelines for Events Version 5.4. 3. Available online: https://www.ldc.upenn.edu/ (accessed on 18 June 2020).
  2. Li, Q.; Peng, H.; Li, J.; Xia, C.; Yang, R.; Sun, L.; Yu, P.S.; He, L. A survey on text classification: From traditional to deep learning. ACM Trans. Intell. Syst. Technol. (TIST) 2022, 13, 1–41. [Google Scholar] [CrossRef]
  3. Otter, D.W.; Medina, J.R.; Kalita, J.K. A survey of the usages of deep learning for natural language processing. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 604–624. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Kumar, S.; Mallik, A.; Khetarpal, A.; Panda, B.S. Influence maximization in social networks using graph embedding and graph neural network. Inf. Sci. 2022, 607, 1617–1636. [Google Scholar] [CrossRef]
  5. Chen, X.; Jia, S.; Xiang, Y. A review: Knowledge reasoning over knowledge graph. Expert Syst. Appl. 2020, 141, 112948. [Google Scholar] [CrossRef]
  6. Liu, X.; Song, C.; Huang, F.; Fu, H.; Xiao, W.; Zhang, W. GraphCDR: A graph neural network method with contrastive learning for cancer drug response prediction. Brief. Bioinform. 2022, 23, bbab457. [Google Scholar] [CrossRef]
  7. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  8. Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; pp. 2227–2237. [Google Scholar]
  9. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving language understanding by generative pre-training. arXiv 2020, arXiv:2005.07157. [Google Scholar]
  10. Zhang, Y.; Qi, P.; Manning, C.D. Graph convolution over pruned dependency trees improves relation extraction. arXiv 2018, arXiv:1809.10185. [Google Scholar]
  11. Lu, S.; Li, S.; Xu, Y.; Wang, K.; Lan, H.; Guo, J. Event detection from text using path-aware graph convolutional network. Appl. Intell. 2022, 52, 4987–4998. [Google Scholar] [CrossRef]
  12. Nguyen, T.; Grishman, R. Graph convolutional networks with argument-aware pooling for event detection. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 5900–5907. [Google Scholar]
  13. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80. [Google Scholar] [CrossRef] [Green Version]
  14. Liu, A.; Xu, N.; Liu, H. Self-Attention Graph Residual Convolutional Networks for Event Detection with dependency relations. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, 16–20 November 2021; pp. 302–311. [Google Scholar]
  15. Chen, D.; Lin, Y.; Li, W.; Li, P.; Zhou, J.; Sun, X. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 3438–3445. [Google Scholar]
  16. Chen, Y.; Xu, L.; Liu, K.; Zeng, D.; Zhao, J. Event extraction via dynamic multi-pooling convolutional neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; pp. 167–176. [Google Scholar]
  17. Nguyen, T.H.; Grishman, R. Modeling skip-grams for event detection with convolutional neural networks. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–4 November 2016; pp. 886–891. [Google Scholar]
  18. Nguyen, T.H.; Cho, K.; Grishman, R. Joint event extraction via recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 300–309. [Google Scholar]
  19. Jagannatha, A.N.; Yu, H. Bidirectional RNN for medical event detection in electronic health records. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; p. 473. [Google Scholar]
  20. Hong, Y.; Zhou, W.; Zhang, J.; Zhou, G.; Zhu, Q. Self-regulation: Employing a generative adversarial network to improve event detection. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; pp. 515–526. [Google Scholar]
  21. Liu, S.; Li, Y.; Zhang, F.; Zhou, X.; Yang, T. Event detection without triggers. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 735–744. [Google Scholar]
  22. Yang, S.; Feng, D.; Qiao, L.; Kan, Z.; Li, D. Exploring pre-trained language models for event extraction and generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 29–31 July 2019; pp. 5284–5294. [Google Scholar]
  23. Wadden, D.; Wennberg, U.; Luan, Y.; Hajishirzi, H. Entity, relation, and event extraction with contextualized span representations. arXiv 2019, arXiv:1909.03546. [Google Scholar]
  24. Liu, J.; Chen, Y.; Liu, K.; Bi, W.; Liu, X. Event extraction as machine reading comprehension. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 1641–1651. [Google Scholar]
  25. Du, X.; Cardie, C. Event extraction by answering (almost) natural questions. arXiv 2020, arXiv:2004.13625. [Google Scholar]
  26. Wu, L.; Chen, Y.; Shen, K.; Guo, X.; Gao, H.; Li, S.; Pei, J.; Long, B. Graph neural networks for natural language processing: A survey. arXiv 2021, arXiv:2106.06090. [Google Scholar]
  27. Zaratiana, U.; Tomeh, N.; Holat, P.; Charnois, T. GNNer: Reducing Overlapping in Span-based NER Using Graph Neural Networks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, Dublin, Ireland, 22–27 May 2022; pp. 97–103. [Google Scholar]
  28. Cui, S.; Yu, B.; Liu, T.; Zhang, Z.; Wang, X.; Shi, J. Edge-enhanced graph convolution networks for event detection with syntactic relation. arXiv 2020, arXiv:2002.10757. [Google Scholar]
  29. Yan, H.; Jin, X.; Meng, X.; Guo, J.; Cheng, X. Event detection with multi-order graph convolution and aggregated attention. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5766–5770. [Google Scholar]
  30. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
  31. Lv, J.; Zhang, Z.; Jin, L.; Li, S.; Li, X.; Xu, G.; Sun, X. HGEED: Hierarchical graph enhanced event detection. Neurocomputing 2021, 453, 141–150. [Google Scholar] [CrossRef]
  32. Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv 2016, arXiv:1609.08144. [Google Scholar]
  33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June 2016; pp. 770–778. [Google Scholar]
  34. Zhou, K.; Dong, Y.; Wang, K.; Lee, W.; Hooi, B.; Xu, H. Understanding and resolving performance degradation in deep graph convolutional networks. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Gold Coast, QLD, Australia, 1–5 November 2021; pp. 2728–2737. [Google Scholar]
  35. Doddington, G.; Mitchell, A.; Przybocki, M.; Ramshaw, L.; Strassel, S.; Weischedel, R. The automatic content extraction (ace) program-tasks, data, and evaluation. Lrec 2004, 2, 837–840. [Google Scholar]
  36. Zhang, T.; Ji, H.; Sil, A. Joint entity and event extraction with generative adversarial imitation learning. Data Intell. 2019, 1, 99–120. [Google Scholar] [CrossRef]
  37. Li, Q.; Ji, H.; Huang, L. Joint event extraction via structured prediction with global features. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Sofia, Bulgaria, 4–9 August 2013; pp. 73–82. [Google Scholar]
  38. Yacouby, R.; Axman, D. Probabilistic extension of precision, recall, and f1 score for more thorough evaluation of classification models. In Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, Stroudsburg, PA, USA, 20 November 2020; pp. 79–91. [Google Scholar]
  39. Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061. [Google Scholar]
  40. Opitz, J.; Burst, S. Macro f1 and macro f1. arXiv 2019, arXiv:1911.03347. [Google Scholar]
  41. Izsak, P.; Berchansky, M.; Levy, O. How to train BERT with an academic budget. arXiv 2021, arXiv:2104.07705. [Google Scholar]
  42. Ren, M.; Liao, R.; Urtasun, R.; Sinz, F.H.; Zemel, R.S. Normalizing the normalizers: Comparing and extending network normalization schemes. arXiv 2016, arXiv:1611.04520. [Google Scholar]
  43. Liu, X.; Luo, Z.; Huang, H. Jointly multiple events extraction via attention-based graph information aggregation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018. [Google Scholar]
  44. Nguyen, T.M.; Nguyen, T.H. One for all: Neural joint modeling of entities and events. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January 2019; pp. 6851–6858. [Google Scholar]
Figure 1. Illustration of the MHG-SMFB architecture.
Figure 2. An example of syntactic dependency parsing.
Figure 3. Illustration of the GCN architecture.
Figure 4. Aggregation of nodes based on the attention mechanism.
Figure 5. Feedback network architecture.
Figure 6. Influence of the model depth on F1 value of trigger classification (blue dotted lines) and identification (green dotted lines). (a,b) represents F1 curves of our model and the GCN model on trigger classification, respectively. (c,d) are F1 curves on trigger identification.
Table 1. Event type list.
Event Type | Event Subtype
Life | Be-Born, Divorce, Marry, Injure, Die
Movement | Transport
Transaction | Transfer-Ownership, Transfer-Money
Business | Start-Org, Merge-Org, Declare-Bankruptcy, End-Org
Conflict | Attack, Demonstrate
Contact | Meeting, Phone-Write
Personnel | Start-Position, End-Position, Nominate, Elect
Justice | Arrest-Jail, Release-Parole, Trial-Hearing, Charge-Indict, Sue, Convict, Sentence, Fine, Execute, Extradite, Acquit, Appeal, Pardon
Table 2. Parameter configurations of our experiment.
Parameters | Values
Epoch | 10
Batch size | 8
Learning rate | 4 × 10−5
Dropout | 0.2
Warmup proportion | 0.1
GCN layers | 5
GCN input | 768
Hidden size | 768
Table 3. Event detection results on ACE2005.
Methods | Trigger Classification (P / R / F1) | Trigger Identification (P / R / F1)
DMCNN [16] | 75.6 / 63.6 / 69.1 | 80.4 / 67.7 / 73.5
TBNNAM [21] | 76.2 / 64.5 / 69.9 | - / - / -
BERT_QA [25] | 71.12 / 73.70 / 72.39 | 74.29 / 77.42 / 75.82
GCN-ED [12] | 77.9 / 68.8 / 73.1 | - / - / -
JMEE [43] | 76.3 / 64.5 / 69.9 | 80.2 / 72.1 / 75.9
Joint3EE [44] | 68.00 / 71.80 / 69.80 | 70.50 / 74.50 / 72.50
HGEED [31] | 80.1 / 72.7 / 76.2 | - / - / -
Our model | 76.16 / 76.92 / 76.54 | 81.57 / 82.38 / 81.97
Table 4. Ablation results of our models.
Methods | Trigger Classification (P / R / F1) | Trigger Identification (P / R / F1)
Our model | 76.16 / 76.92 / 76.54 | 81.57 / 82.38 / 81.97
-attention | 73.98 / 76.92 / 75.42 | 79.23 / 82.38 / 80.77
-fusion | 73.42 / 75.43 / 74.41 | 79.71 / 81.88 / 80.78
-feedback | 72.79 / 71.71 / 72.2 | 78.58 / 77.41 / 78
-gcn | 69.77 / 76.18 / 72.84 | 67.15 / 73.2 / 70.04