Article

JEEMRC: Joint Event Detection and Extraction via an End-to-End Machine Reading Comprehension Model

Shanshan Liu, Sheng Zhang, Kun Ding and Liu Liu

1 The Sixty-Third Research Institute, National University of Defense Technology, Nanjing 210007, China
2 College of Computer, National University of Defense Technology, Changsha 410073, China
3 College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
4 School of Information Engineering, Suqian University, Suqian 223805, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(10), 1807; https://doi.org/10.3390/electronics13101807
Submission received: 8 April 2024 / Revised: 30 April 2024 / Accepted: 3 May 2024 / Published: 7 May 2024

Abstract:
Event extraction (EE) generally contains two subtasks: viz., event detection and argument extraction. Owing to the success of machine reading comprehension (MRC), some researchers formulate EE into MRC frameworks. However, existing MRC-based EE techniques are pipeline methods that suffer from error propagation. Moreover, the correlation between event types and argument roles is pre-defined by experts, which is time-consuming and inflexible. To avoid these issues, event detection and argument extraction are formalized as joint MRC. Different from previous methods, which only generate questions for the argument roles of identified event types, our approach generates questions for all argument roles in the dataset. Moreover, an end-to-end MRC model, JEEMRC, is proposed, which consists of an event classifier and a machine reader with a coarse-to-fine attention mechanism. Our proposed model can train two subtasks jointly to alleviate error propagation and utilizes interaction information between event types and argument roles to improve the performance of both tasks. Experiments on ACE 2005 verify that our JEEMRC achieves competitive results compared with previous work. In addition, it performs well when detecting events and extracting arguments in data-scarce scenarios.

1. Introduction

Event extraction (EE), a significant branch of information extraction (IE), contains two subtasks: viz., event detection and argument extraction. Different from previous classification paradigms, some researchers have formulated EE as a machine reading comprehension (MRC) task [1,2,3]. MRC-based event extraction approaches can take advantage of existing progress of MRC models and are promising for tackling zero-shot or few-shot EE [4,5].
Although MRC-based EE methods perform better than previous approaches, they still have some shortcomings. Existing MRC-based methods regard event detection and argument extraction as independent tasks, e.g., two-turn QA [1,2] or textual entailment for event detection and MRC for argument extraction [4,5]. These pipeline methods result in error propagation. Moreover, pre-defined correlation between event types and arguments is required. As shown on the left side of Figure 1, when given a sentence, the pipeline models first identify its event type and then generate question–answer pairs of specific arguments that appear in this event type based on the pre-defined event ontology. These methods rely on external knowledge from experts, which is time-consuming and inflexible when transferring to new domains.
To address the challenges of the pipeline MRC framework, we tackle event detection and extraction jointly with an end-to-end MRC model. Different from previous pipeline methods based on pre-defined event ontology, we propose a novel question generation mechanism based on reverse thinking. As shown on the right side of Figure 1, questions are generated for all argument roles in the dataset, and event types are then inferred backward during the MRC process by identifying whether the questions for specific arguments have answers. For instance, if the questions for Attacker (Q1) or Target (Q29) have answers, the event type is probably Attack; in turn, for an Attack event, the questions for Giver (Q3) and Recipient (Q31) have no answers. This backward reasoning mechanism learns the constraint relationships between event types and argument roles automatically, without external knowledge from experts.
Moreover, we introduce the JEEMRC (Joint Event Extraction via end-to-end Machine Reading Comprehension) model, which contains two main modules: an event classifier and a machine reader. (The code is publicly available at: https://github.com/lisa633/JEEMRC, accessed on 30 April 2024.) Specifically, a coarse-to-fine attention mechanism is designed in the machine reader module of JEEMRC. Coarse attention is utilized for the output of the event classifier to make JEEMRC focus on the specific event types. Fine attention computes the similarity between event types and each word embedding in the given sentence. With this coarse-to-fine attention, our JEEMRC is able to extract correlations between event types and argument roles effectively. To realize joint training, state-aware weights are set for both the event classifier and the machine reader. At last, a heuristic approach is introduced to refine the results.
Overall, this article consists of three main contributions:
  • We propose a new paradigm to handle the task of event detection and argument extraction jointly. Different from previous pipeline methods [1,2,4,5], we formalize the task as joint machine reading comprehension, which can alleviate error propagation and improve the performance for both tasks.
  • An end-to-end MRC model, JEEMRC, is introduced, which is able to tackle EE without labeling event triggers. With a coarse-to-fine attention mechanism, JEEMRC can learn the correlations between event types and arguments automatically and generate reasonable results that satisfy these constraint relationships, reducing the model’s reliance on expert knowledge.
  • Various experiments are conducted on the ACE 2005 benchmark, and the results illustrate that our method achieves state-of-the-art performance in both fully supervised and few-shot scenarios.
The rest of this article is organized as follows: Section 2 introduces some related work. Our proposed JEEMRC model is described in Section 3. Section 4 details the dataset and experimental results, and Section 5 analyzes the results in different dimensions. We conclude this article in Section 6.

2. Related Work

In this section, research closely related to our work—viz., joint event extraction, few/zero-shot event extraction, and machine reading comprehension for IE—is introduced in detail.

2.1. Joint Event Extraction

Event extraction generally consists of two subtasks: event detection and argument extraction. Some traditional methods tackle these two tasks in a pipeline manner [6,7], which suffers from error propagation and degrades the performance of argument extraction. To deal with the above challenges, joint EE approaches with deep neural models have been introduced, such as techniques using recurrent neural networks [8,9], convolutional neural networks [10], graph neural networks [11,12], and attention mechanisms [13,14]. Joint EE models are able to mitigate the effect of error propagation and learn the correlations between event types and argument roles automatically without pre-defined event ontology.
Despite many advances, most previous joint models formulate EE as a classification task and suffer from data scarcity problems. MRC-based EE handles few-shot scenarios and new event types well; however, existing MRC-based EE models all operate in a pipeline manner and thus cannot avoid the effect of error propagation. Different from both of these approaches, our JEEMRC is able to tackle the data scarcity problem and the error propagation problem simultaneously.

2.2. Few/Zero-Shot Event Extraction

Event extraction, the goal of which is to extract arguments from sentences that describe events, has previously been modeled as a classification task and tackled by supervised approaches [6,8,10,15,16]. However, these methods are data-hungry and cannot identify new types without manual annotations. EE in low-resource scenarios has given rise to unsupervised models. Peng et al. [17] proposed a method to detect events by measuring the similarities between event structures, which are generated by semantic role labeling and require minimal supervision. Likewise, structural information of event ontology was applied by Huang et al. [18], who projected event mentions and types into a low-dimensional space with abstract meaning representation and transferred knowledge of annotated events to unseen types. For few-shot event detection, Lai et al. [19] employed matching information from given seen types by introducing two extra factors into the loss function, while Deng et al. [20] encoded contextual information of event mentions with a dynamic memory network to enhance robustness in data-scarce scenarios. Owing to the abundance of labeled data for other NLP tasks, e.g., MNLI [21] for textual entailment and SQuAD 2.0 [22] for MRC, Lyu et al. [4] and Feng et al. [5] conducted zero-shot or few-shot EE by modeling event detection as textual entailment or yes/no QA and argument identification as extractive MRC.
Inspired by previous work, our proposed model makes full use of annotated MRC samples for pre-training and transfers knowledge of QA to EE by transforming event detection and argument extraction into joint MRC.

2.3. Machine Reading Comprehension for IE

Owing to the flourishing development of deep learning, a number of neural MRC models, such as Bi-DAF [23], QANet [24], R-Trans [25], and R-Net [26], have been proposed and even outperform human beings on specific MRC tasks. MRC, an important branch of question answering (QA) and a common way for humans to understand things, can be analogized to information extraction (IE): both obtain relevant information by asking questions. Recasting IE in an MRC framework allows us to apply the existing achievements of MRC and tackle the challenges of few-shot scenarios, e.g., identifying new types.
In 2017, Levy et al. [27] first transformed relation extraction (RE) into an MRC task, in which questions were generated by a template from given head entities and relations, while the corresponding tail entities were extracted from the context as answers. Afterwards, Li et al. [28] applied MRC to named entity recognition (NER) to introduce prior knowledge of entity types and deal with overlapping entities. Xiong et al. [29] and Sun et al. [30] used an MRC framework to identify entities in biomedicine and medicine; both chose a BERT-based MRC model but generated questions according to their specific domains and corpora. MRC-based methods have also been applied to joint entity and relation extraction. Li et al. [31] regarded entity and relation extraction as multi-turn QA: first, they generated questions for head entities; then, relations were questioned based on the head entities to find the tail entities. Reinforcement learning was employed to optimize the multi-turn QA process.
Event extraction, as a significant subtask of IE, can also be tackled by an MRC framework. For sentence-level EE, Du and Cardie [1] questioned triggers and arguments through two turns of QA, adjusting question words to different argument types. To break through the limitation of question templates, Liu et al. [2] treated question generation as unsupervised translation, which yields more natural questions for EE tasks.
Existing MRC-based EE methods regard event detection and event argument extraction as two independent tasks and transform them into multi-turn QA. However, these two tasks are sequentially related, and the results of event detection influence the extraction of event arguments. Pipeline methods like multi-turn QA not only result in error propagation but also rely on external knowledge of event ontology. In contrast, our joint model trains event classification and argument extraction jointly, which alleviates the effects of error propagation and mutually improves the accuracy of the two tasks.

3. Methodology

In this section, we first introduce the entire process of transforming event detection and extraction into joint machine reading comprehension. Then, each step of our approach is decomposed and illustrated in detail.

3.1. Overview

In this article, event detection and extraction tasks are defined as joint machine reading comprehension. As shown in Figure 2, the inputs for our method contain the role set (which consists of all pre-defined event arguments), documents (which include several sentences, some of which describe events, while others do not), and the pre-designed question template (which is designed for each event argument and has appropriate question words). The first step is question generation, in which sentences in the given document are changed to question–answer pairs based on the role set and question template. Then, the question–answer pairs are fed to our JEEMRC model, where event classification and machine reading comprehension are trained jointly with state-aware weights. Also, the correlation between event types and argument roles guides the output of the model. Finally, each sentence is identified based on whether it describes events or not. If it does, our model outputs the event type and corresponding argument roles. Otherwise, None is output as the result, and the answer list is null.

3.2. Question Generation

The generation of question–answer pairs is an indispensable step when formulating EE as an MRC task. Following previous work [1], we generate questions based on well-designed templates. As some examples in Table 1 show, question words change according to different types of arguments, e.g., When for Time and Who for Victim. Moreover, prior knowledge of argument roles can be introduced to the MRC models as clues to improve the performance of EE.
Different from previous approaches, which regard event detection and argument extraction as two independent subtasks and just generate questions for arguments that appear in the given sentence, questions are generated for all argument roles in the dataset in our joint model. As presented in Algorithm 1, when given a sentence, we first identify whether it describes events or not. If it does, we go through all argument roles in the dataset and generate corresponding questions based on templates. Then, if the provided sentence contains the argument, the mention of that argument is labeled as the answer; otherwise, the answer list is null. If the sentence does not describe events, we randomly select a pre-defined number of argument roles to generate questions, and answer lists to those questions are null.
Algorithm 1 Question Generation Method
Require:  QuesTemplates; RolesList; Sentence S; k
Ensure:  a list of list QA = [ ]
 1: if S is event, then
 2:  for argument_role in RolesList, do
 3:   if argument_role in S, then
 4:    q = GenQues(QuesTemplates,argument_role)
 5:    a = Mention(argument_role, S)
 6:    QA = QA + {q,a}
 7:   else
 8:    q = GenQues(QuesTemplates,argument_role)
 9:    a = [ ]
10:    QA = QA + {q,a}
11:   end if
12:  end for
13: else
14:  count = 0
15:  while count < k do
16:   q = GenQues(QuesTemplates, RandomSelect(RolesList))
17:   a = [ ]
18:   QA = QA + {q,a}
19:   count += 1
20:  end while
21: end if
22: return QA
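For concreteness, a minimal Python sketch of Algorithm 1 is given below. The template dictionary, the helper names (gen_ques, gold_args), and the sampling size k are illustrative assumptions rather than the authors' released implementation.

```python
import random

# Hypothetical question templates keyed by argument role (cf. Table 1).
QUES_TEMPLATES = {
    "Victim": "Who is the victim?",
    "Place": "Where did the event take place?",
    "Time": "When did the event take place?",
    # ... one entry per argument role in the dataset
}

def gen_ques(templates, role):
    """Instantiate the template question for one argument role."""
    return templates[role]

def generate_qa_pairs(sentence, is_event, gold_args, roles_list, k=3):
    """Build (question, answer) pairs for one sentence, following Algorithm 1.

    gold_args maps each argument role present in the sentence to its mention;
    roles_list enumerates ALL argument roles in the dataset (reverse thinking:
    questions are asked even for roles absent from the sentence).
    """
    qa = []
    if is_event:
        for role in roles_list:
            q = gen_ques(QUES_TEMPLATES, role)
            a = [gold_args[role]] if role in gold_args else []  # [] = no answer
            qa.append((q, a))
    else:
        # Non-event sentence: sample k roles and attach unanswerable questions.
        for role in random.sample(roles_list, k):
            qa.append((gen_ques(QUES_TEMPLATES, role), []))
    return qa
```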

3.3. JEEMRC

As shown in Figure 3, our proposed JEEMRC (Joint Event Extraction via end-to-end Machine Reading Comprehension) model consists of four modules: viz., encoding, event classifier, machine reader, and joint training.

3.3.1. Encoding

The inputs of the JEEMRC model are the given sentence S and the question Q generated in the question generation step. As pre-trained language models have been shown to be superior in most natural language processing (NLP) tasks, we choose BERT [32] as the basic encoder. The architecture of BERT contains several stacked transformer layers, and its pre-training consists of two tasks. The first, masked language modeling (MLM), conceals selected token positions with the [MASK] token; the model then recovers the original tokens at these masked positions from the contextual cues of the rest of the sequence. The second, next sentence prediction (NSP), presents the model with pairs of sentences separated by the [SEP] token, and the model discerns whether they follow each other in the original text. This dual-task pre-training strategy equips BERT with a deep understanding of linguistic context, enabling its application across diverse natural language processing tasks.
The vocabulary of BERT encompasses three special tokens: [CLS], [SEP], and [MASK]. The [CLS] token marks the beginning of a sequence, [SEP] marks the end of a sequence or acts as a separator, and [MASK] plays a pivotal role in the pre-training phase. When BERT is utilized as an encoder for MRC tasks, the question Q and sentence S are concatenated with the special tokens as [CLS] Q [SEP] S [SEP] and fed to the BERT model as input. The transformer blocks in the BERT model receive the element-wise addition of token embeddings, segment embeddings, and position embeddings as input and compute the semantic representations as follows:
$h^t = \mathrm{TransformerBlock}(h^{t-1}), \quad t \in [1, T],$
where T is the number of transformer blocks.
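The following sketch shows how this encoding step can be realized with the HuggingFace Transformers library; the checkpoint matches the BERT-base-uncased model used in Section 4.3, while the example question and sentence are invented for illustration.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

question = "Who is the victim?"                        # from question generation
sentence = "The attack in Baghdad killed a cameraman."

# The tokenizer builds [CLS] Q [SEP] S [SEP] plus segment and position ids.
inputs = tokenizer(question, sentence, return_tensors="pt",
                   max_length=384, truncation=True, padding="max_length")

with torch.no_grad():
    outputs = encoder(**inputs)

h = outputs.last_hidden_state   # (1, 384, 768): one vector per input token
```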

3.3.2. Event Classifier

The aim of the event classifier module is to predict the event type for the input sentence (None for the sentence, which does not describe events). To achieve this goal, the output of the BERT encoder is fed to a max pooling layer and a dropout layer in sequence, followed by which, the output is passed through a linear classifier and softmax layer to obtain the logits of event type y e i :
$y_e^i = \mathrm{Softmax}(W_e^i \cdot x_e^i + b_e^i),$
where $W_e^i$ and $b_e^i$ are trainable parameters, $x_e^i$ is the input to the linear classifier (the pooled output after dropout), and $e$ is the total number of event types.
Let θ be the set of parameters employed in the event classifier module; we use the negative log probabilities as the loss function:
$\mathcal{L}(\theta)_e = -\frac{1}{N}\sum_{i=1}^{N} y_i \log(\hat{y}_i),$
where $N$ is the number of training samples, and $\hat{y}_i$ is the golden event type.
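A plausible PyTorch rendering of this module is sketched below (max pooling over token representations, then dropout, a linear layer, and softmax); the layer names and sizes are assumptions, with num_event_types = 34 taken from Table 3.

```python
import torch
import torch.nn as nn

class EventClassifier(nn.Module):
    """Max pooling over token representations, dropout, linear layer, softmax."""

    def __init__(self, hidden_size=768, num_event_types=34, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(hidden_size, num_event_types)

    def forward(self, h):                 # h: (batch, seq_len, hidden)
        x, _ = torch.max(h, dim=1)        # max pooling over the token axis
        x = self.dropout(x)
        return torch.softmax(self.linear(x), dim=-1)  # event-type probabilities

# Training minimizes the negative log probability of the golden event type, e.g.:
# loss = nn.functional.nll_loss(torch.log(probs), gold_types)
```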

3.3.3. Machine Reader

In the machine reader module, the correlations between event types and argument roles are first computed by a coarse-to-fine attention mechanism. Based on its result, the module predicts the answer spans, which can be changed into argument mentions.
Coarse-to-fine attention: The correlations between event types and argument roles play a significant role in the EE task. Event types give cues for identifying corresponding arguments, and in turn, arguments make contributions to classifying events. Previous pipeline methods usually utilized these correlations explicitly. They identified the event type first and just generated questions for argument roles for this event type. In this way, the paradigms between event types and argument roles have to be defined in advance by professional experts. Different from that, our proposed JEEMRC is able to learn this mutual influence automatically by the coarse-to-fine attention mechanism.
As questions are generated for all arguments in the dataset, the model will be misled by questions which have no answers: for example, questions about Giver (Q3) and Recipient (Q31) in the Attack event shown in Figure 1. Coarse attention is utilized to guide JEEMRC to focus on the event type. To be specific, the output of the event classifier is fed to a linear layer to realize this goal. Let $a_i$ be the output of the event classifier module; the coarse attention can be denoted as:
$a_k = W_c a_i + b_c,$
where $W_c$ and $b_c$ are trainable parameters.
Following coarse attention, fine attention is applied to extract the mutual information between event types and arguments. With this information, JEEMRC can identify event types by whether questions about specific arguments have answers. For instance, our proposed model can identify that the event type is Attack by detecting that questions about Attacker and Target have answers in the example in Figure 1. Specifically, the output of coarse attention $a_k$ and the semantic representations encoded by BERT are fed to the fine attention module to compute the similarity between event types and each word embedding in the given sentence:
$a_r = \sigma\Big(\sum_{j=1}^{M} \alpha_j W_f h_t^j\Big),$
$\alpha_j = \frac{\exp(\mathrm{ReLU}[a_k^{T} h_t^j])}{\sum_{j'=1}^{M} \exp(\mathrm{ReLU}[a_k^{T} h_t^{j'}])},$
where $W_f$ is a trainable parameter, $\alpha_j$ is the weight coefficient calculated by the attention mechanism, $M$ is the number of tokens in the given sentence, $\sigma$ is the activation function, and $h_t^j$ is the $j$-th word representation encoded by the BERT encoder.
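The sketch below is one reading of the coarse-to-fine attention equations above; the choice of sigmoid for $\sigma$ and the attention size of 454 (the best value in Section 5.2) are assumptions.

```python
import torch
import torch.nn as nn

class CoarseToFineAttention(nn.Module):
    """Coarse attention projects the classifier output; fine attention scores
    each token representation against it, as in the equations above."""

    def __init__(self, num_event_types=34, hidden_size=768, attn_size=454):
        super().__init__()
        self.coarse = nn.Linear(num_event_types, hidden_size)      # W_c, b_c
        self.fine = nn.Linear(hidden_size, attn_size, bias=False)  # W_f

    def forward(self, event_probs, h):
        # event_probs: (batch, num_event_types); h: (batch, seq_len, hidden)
        a_k = self.coarse(event_probs)                    # coarse attention a_k
        scores = torch.relu(torch.bmm(h, a_k.unsqueeze(-1))).squeeze(-1)
        alpha = torch.softmax(scores, dim=1)              # weights over tokens
        a_r = torch.sigmoid((alpha.unsqueeze(-1) * self.fine(h)).sum(dim=1))
        return a_r, alpha
```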
Answer predictor: When the EE task is formulated as machine reading comprehension, event arguments are extracted from sentences by answering corresponding questions. As Liu et al. [33] summarized, the prevalent method to predict answers in MRC is to predict the probabilities of start and end positions. Therefore, our answer predictor applies the softmax function to compute the final output. The loss function of the machine reader module is cross entropy loss. The loss of MRC can be computed as follows:
$\mathcal{L}(\theta)_{mrc} = -\frac{1}{N}\sum_{i}^{N}\big[\log(p^{s}_{y_i^1}) + \log(p^{e}_{y_i^2})\big],$
where $N$ is the number of training samples, and $y_i^1$ and $y_i^2$ refer to the golden start and end positions, respectively, of example $i$.
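A common realization of such a span predictor and its loss, consistent with the cross entropy formulation above though not necessarily identical to the authors' code:

```python
import torch
import torch.nn as nn

class AnswerPredictor(nn.Module):
    """Predicts start/end position distributions over the input tokens."""

    def __init__(self, hidden_size=768):
        super().__init__()
        self.span_head = nn.Linear(hidden_size, 2)  # one logit each: start, end

    def forward(self, h):                           # h: (batch, seq_len, hidden)
        start_logits, end_logits = self.span_head(h).split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)

# Cross entropy over positions; unanswerable questions are conventionally
# pointed at the [CLS] position (index 0), as in BERT-style MRC.
def mrc_loss(start_logits, end_logits, start_gold, end_gold):
    ce = nn.CrossEntropyLoss()
    return ce(start_logits, start_gold) + ce(end_logits, end_gold)
```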

3.3.4. Joint Training with State-Aware Weights

To avoid error propagation and also make full use of the correlations between event types and argument roles, the event classifier module and the machine reader module are trained jointly.
Different from previous MRC-based EE methods, we generate questions by reverse thinking: questions are generated for all argument roles in the dataset. However, only questions about arguments present in the sentence have answers, which leaves us with far more negative samples than positive ones during training. To keep the training data distribution balanced, state-aware weights are set for the event classifier and the machine reader.
When computing event classification loss, we set different weights for these three different conditions in the dataset: 1 for sentences which do not describe events, 2 for sentences that describe events but for which the question about the argument has no answer, and 3 for sentences that describe events and for which the question about the argument has an answer. These state-aware weights can guide JEEMRC to pay more attention to questions that have answers.
In terms of the machine reading comprehension loss, the weight parameter of the cross entropy loss function is set to $1{:}\beta_{mrc}$, upweighting answerable samples to reduce the probability of JEEMRC predicting that there is no answer to the question. We compare the performance for different values of $\beta_{mrc}$ in Section 5.
Overall, let $\mathcal{L}_e$ be the loss of the event classifier and $\mathcal{L}_{mrc}$ represent the loss of the machine reader; the joint loss is calculated as:
$\mathcal{L}_{total} = \mathcal{L}_e + \gamma \mathcal{L}_{mrc},$
where $\gamma$ is a parameter that controls the weight of the MRC loss. We conduct experiments using different values of $\gamma$ in Section 5.
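Combining the pieces, here is a sketch of the joint objective under stated assumptions: states weighted 1:2:3 as in Section 5.3.1, answerable samples upweighted by $\beta_{mrc}$ (our reading of the $1{:}\beta_{mrc}$ weight), and $\gamma = 4$ as in Section 5.4.

```python
import torch
import torch.nn as nn

def joint_loss(event_logits, gold_types, sample_states,
               start_logits, end_logits, start_gold, end_gold,
               beta_mrc=60.0, gamma=4.0):
    # sample_states in {0, 1, 2}: non-event sentence / event with unanswerable
    # question / event with answerable question, weighted 1:2:3.
    ce = nn.CrossEntropyLoss(reduction="none")
    state_w = torch.tensor([1.0, 2.0, 3.0])[sample_states]
    l_e = (state_w * ce(event_logits, gold_types)).mean()

    # Upweight answerable samples by beta_mrc so the reader is not biased
    # toward predicting "no answer" (assumed reading of the 1:beta_mrc weight).
    span_w = 1.0 + (beta_mrc - 1.0) * (sample_states == 2).float()
    l_span = ce(start_logits, start_gold) + ce(end_logits, end_gold)
    l_mrc = (span_w * l_span).mean()

    return l_e + gamma * l_mrc  # L_total = L_e + gamma * L_mrc
```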

3.4. Post-Processing

By analyzing EE datasets, it can be found that event arguments fall into two categories: event-type-irrelevant and event-type-related. Event-type-irrelevant arguments generally play roles such as place and time, while event-type-related ones are constrained by event types: in other words, some arguments only appear in certain types of event sentences. As the examples in Table 2 show, event arguments such as Attacker, Victim, and Instrument are more likely to appear in a Conflict event, whereas a Movement event contains arguments like Origin, Destination, and Vehicle.
In order to make arguments extracted by our proposed JEEMRC model satisfy the constraint relationship with the event types, a heuristic approach is introduced to refine the arguments. Firstly, we summarize all event subtypes and their corresponding arguments. Then, when transforming the answer spans extracted by the MRC model into the arguments required by the EE task, we discard spans that do not meet the event type constraint.
In addition, when given a sentence, the question generation module generates questions for all possible event argument types, which are then fed to our joint model. Each input context–question pair is classified into one specific type by the event type classifier, so one sentence may be assigned multiple event types across the questions about different arguments. In such cases, the most frequently predicted event type is chosen as the final type by majority voting.
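A minimal sketch of this post-processing step follows; the constraint table is abridged from Table 2, and the prediction format is an assumption.

```python
from collections import Counter

# Event-type -> allowed-argument mapping, abridged from Table 2; in practice it
# is summarized from the training annotations for every event subtype.
TYPE_CONSTRAINTS = {
    "Conflict": {"Attacker", "Victim", "Instrument", "Place", "Time"},
    "Movement": {"Origin", "Destination", "Vehicle", "Place", "Time"},
}

def refine(predictions):
    """predictions: one (event_type, role, answer_spans) triple per question.

    Majority-vote the sentence-level event type, then discard answer spans
    whose role violates the event-type constraint."""
    final_type = Counter(p[0] for p in predictions).most_common(1)[0][0]
    allowed = TYPE_CONSTRAINTS.get(final_type, set())
    args = [(role, spans) for _, role, spans in predictions
            if spans and role in allowed]
    return final_type, args
```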

4. Experiments

In this section, the dataset and metrics are introduced in brief at first. Following that, we describe baseline models and experiment settings. Then, experimental results for full supervision and few-shot scenarios are illustrated.

4.1. Dataset and Metrics

We conduct experiments on a widely used dataset for event extraction, ACE 2005 [34], which contains articles crawled from various sources, e.g., broadcast conversations, broadcast news, newswire, and weblogs, and has been carefully annotated by humans with event mentions, triggers, arguments, and co-references. A total of 33 event subtypes, grouped into 8 event types (Life, Conflict, Movement, Justice, Personnel, Transaction, Business, and Contact), and 35 argument roles are defined in ACE 2005. In particular, roles that represent time, such as time after, time before, time within, time at beginning, and time at end, are merged into a single time role in our experiments to avoid perplexing the MRC model, which leaves 31 argument roles in practice. The dataset is split into three parts—viz., training, validation, and test sets—according to prior work [35]. For few-shot training, we sample from the whole training set in proportion to event subtypes so that the data distribution remains unchanged.
Following previous research, precision (P), recall (R), and F1 scores are chosen as metrics to evaluate the performance of event detection, argument identification, and classification.

4.2. Baselines

Our proposed model is compared with some baselines to illustrate its effectiveness:
  • dbRNN [9]: an RNN-based model for EE proposed by Sha et al. in 2018. Besides RNN, dependency bridges are utilized to enhance the model by extracting syntactic information.
  • GAIL [36]: a framework for joint entity and event extraction introduced by Zhang et al. based on generative adversarial imitation learning. A novel inverse reinforcement learning approach that utilizes generative adversarial networks is applied in this framework.
  • DYGIE++ [35]: based on contextualized span representations such as BERT, Wadden et al. introduce a unified framework that can learn named entity recognition, relation extraction, and event extraction jointly.
  • BART-GEN [37]: Li et al. formulate EE as conditional text generation based on event templates and choose BART [38] as the base encoder–decoder language model.
  • TEXT2EVENT [39]: Lu et al. introduce a sequence-to-sequence generation approach to detect events and extract arguments in an end-to-end manner.
  • TANL [40]: Paolini et al. transform event extraction to a translation task and propose the TANL model to extract task-relevant information.
  • GTEE-DYNPREF [41]: Liu et al. propose a generative event extraction model that can generate type-specific prefixes for each context. GTEE-DYNPREF is able to reduce the influence of suboptimal prompts.
  • EEQA [1]: Du et al. formulate EE as two-turn question answering: the first turn extracts event triggers and classifies event types, and the second turn extracts argument roles.
  • MQAEE [3]: MQAEE is another QA-based EE method proposed by Li et al. that extracts triggers and arguments by multi-turn QA with answer history embeddings.
  • BERTEE [2]: the baseline model, which is applied by Liu et al., only uses BERT as a word representation encoder. For event extraction tasks, classification strategies are adopted.
  • DMCNN [10]: an event extraction method proposed by Chen et al. in 2015 that utilizes a dynamic multi-pooling CNN.

4.3. Experimental Settings

We choose BERT-base-uncased (https://huggingface.co/bert-base-uncased, accessed on 26 December 2022.) as our basic pre-trained language model. When training the models, the parameters are set as presented in Table 3.

4.4. Results

Experiments are conducted under two conditions: viz., with full supervision and in few-shot scenarios. The results are presented in the following sections.

4.4.1. Results with Full Supervision

The experimental results on ACE 2005 with full supervision are shown in Table 4. The first column gives the names of models. The precisions (P), recalls (R), and F1 scores (F1) for event classification (EC), argument identification (AI), and argument classification (AC) are presented in the following columns. The last column is the average of the F1 scores for the three tasks.
As shown in Table 4, Rows 1 to 4 are the results of EE models in the classification framework, while Rows 5 to 8 are EE methods in the generation manner. EEQA in Row 9, MQAEE in Row 10, and our proposed JEEMRC formulate EE as QA tasks. Different from EEQA and MQAEE, our JEEMRC utilizes an end-to-end MRC model to train event detection and argument extraction jointly. Moreover, our question generation approach is equivalent to data augmentation, generating more training QA samples for the MRC model without increasing its time complexity. In terms of recall for the three tasks, our JEEMRC outperforms all baselines except GTEE-DYNPREF. This is because our model, which generates questions for all argument roles in the dataset, tends to extract arguments from the given sentences as fully as possible. Although the high recall comes at the cost of lower precision than some baselines, our JEEMRC gains the highest F1 scores for the event classification and argument identification tasks, and its F1 score for argument classification is also competitive. Overall, the average score in the last column shows that our JEEMRC model achieves a more balanced performance across all three tasks by training event classification and machine reading comprehension jointly.

4.4.2. Results for Few-Shot Scenarios

Experimental results on ACE 2005 for few-shot scenarios are shown in Table 5. To simulate data-scarce scenarios, 1%, 5%, 10%, and 20% of the examples from that dataset are selected randomly as the training set. Columns 3 to 6 are the F1 scores for argument extraction for different training sets. Rows 1 to 3 show the results for baseline models, while Rows 4 and 5 are the results for our JEEMRC. The difference between Rows 4 and 5 is that JEEMRC in Row 4 directly applies BERT as the encoder without other training data, while in Row 5, the BERT encoder is first trained on SQuAD 2.0 [22], which is a dataset for MRC with unanswerable questions.
The results in Table 5 illustrate that our proposed JEEMRC model outperforms CNN-based, RNN-based, and BERT-based EE methods in few-shot scenarios: for instance, it obtains a 41.0% F1 score with 20% of the training examples, which is higher than the F1 score of BERTEE by 12.4%. This improvement comes from the MRC framework: by transforming argument extraction into question answering, our model can learn from successful cases in MRC and reduce its dependency on labeled data. The results in Rows 4 and 5 illustrate the advantages of transfer learning. In extremely data-scarce scenarios, just utilizing BERT as the encoder without other training data does not perform well, e.g., a 0.6% F1 score with 1% of the training examples and a 1.7% F1 score with 5%. In comparison, when the BERT encoder is first trained on SQuAD 2.0, the F1 scores increase to 6.2% and 16.3% with 1% and 5% of the training examples, respectively. By training on SQuAD 2.0 first, our model learns knowledge from MRC and transfers it to few-shot EE successfully, illustrating that the performance of EE in few-shot scenarios can be improved by transferring MRC knowledge effectively.

5. Analysis

To further analyze the performance of our JEEMRC, various experiments are conducted in this section: viz., an ablation study and comparisons using different attention sizes, state-aware weights, and joint training parameters.

5.1. Ablation Study

In order to verify the contributions of different components to our JEEMRC, ablation experiments are conducted in this section, and the results are presented in Table 6.
As shown in Table 6, the second column lists the ablated components, and Columns 3 to 5 present the F1 scores for event classification, argument identification, and argument classification, respectively. The results for the full JEEMRC are in the first row, while the results for the ablations follow. We mainly study the contributions of the heuristic post processing mechanism, the coarse-to-fine attention mechanism, and the weight parameter for the cross entropy loss of MRC. Without the heuristic post processing mechanism, the F1 scores for the three tasks decline by 1.9%, 0.3%, and 5.7%, respectively. This illustrates that our post processing mechanism, which is mainly based on the constraint relationship between event types and argument roles, improves the performance of event classification and argument classification but has little effect on argument identification. For coarse-to-fine attention, we remove coarse attention and fine attention separately to analyze their contributions. As the results in Rows 3 and 4 show, both coarse and fine attention contribute to the three tasks, but fine attention is more important: without it, the F1 scores for the three tasks decrease by 3.6%, 8.7%, and 6.6%, respectively. This implies that it is not enough to just pay attention to event types; the interactions between event types and arguments play a significant role in the argument extraction task. When the loss weight is removed, namely, when the weight parameter for the cross entropy loss of MRC is set to 1, the F1 scores for all three tasks drop dramatically: by 15.4%, 18.9%, and 13.3%, respectively. The reason might be that when given a sentence, questions are generated for all argument roles in the dataset, and only questions for arguments in the sentence have answers, so the number of negative samples is far larger than the number of positive ones in the training data. Therefore, without the loss weight, the machine reader module tends to predict that there is no answer to the question, and in turn, the performance for event detection and argument extraction suffers. When the coarse-to-fine attention mechanism and the loss weight are removed together, the performance of JEEMRC is the worst, which demonstrates that both components contribute and complement each other.

5.2. Comparison of Attention Sizes

Table 7 presents the results for different attention sizes. Here, attention size refers to the hidden dimension of the coarse-to-fine attention layer.
From the results shown in Table 7, when the attention size is set to 454, our model reaches the highest F1 scores for all three tasks, with the highest average score of 61.6%; 454 is thus the most appropriate value for JEEMRC. Overall, the attention size should be neither too small nor too large. If it is too small, the model cannot sufficiently learn the interactive information between event types and argument roles; if it is too large, the model is too complicated to train well.

5.3. Comparison of State-Aware Weights

To avoid the effect of negative samples and keep balance in training data distribution, state-aware weights are set for the event classifier and the machine reader. In this section, we conduct experiments using different state-aware weights.

5.3.1. State-Aware Weight for Event Classifier

When given a sentence, there are three situations: ① the sentence does not describe events; ② the sentence describes events, but the question about the argument has no answer; ③ the sentence describes events, and the question about the argument has an answer. Different weights are set for these three situations in the event classifier module, and the results are shown in Table 8.
We can see from Table 8 that our JEEMRC gains the highest F1 scores for the three tasks and the highest average score of 61.6% when the weights for the three situations are set as 1:2:3. With these weights, the model can pay more attention to questions that have answers to improve the performance of joint training, and it is also able to identify event types as None. A weight of 1:1:2 means that we only distinguish whether the question has an answer or not; in this case, JEEMRC does not perform well at the event classification task. In contrast, with a weight of 1:2:2, we only pay attention to whether the sentence describes events or not, and the performance of JEEMRC for argument classification declines significantly. Moreover, when we assign the same weight to the three cases—namely, 1:1:1—the average score is the lowest among all settings. The experimental results show that to perform well at both event detection and argument extraction, JEEMRC should weight the three situations above differently and focus on the questions that have answers.

5.3.2. Loss Weight for Machine Reader

In our ablation study, the effectiveness of the loss weight is verified, and in this section, experiments are conducted using different values for the loss weight.
It can be seen from the line chart in Figure 4 that the F1 score for event classification first increases from 71.9% to 72.5% as the loss weight increases from 10 to 20 and then declines after peaking at 73.3%. For argument extraction, a loss weight of 60 gains the highest F1 scores for both argument classification and argument identification. Before the loss weight reaches 80, the F1 scores for all three tasks follow roughly the same trend. When the loss weight is 90, our model performs well at event detection, but the F1 scores for argument extraction are unsatisfactory; in contrast, when the loss weight increases to 100, the F1 scores for argument identification and classification increase greatly while the performance for event classification declines. On the whole, 60 is the most appropriate value for the loss weight, at which our model can keep the balance between event detection and argument extraction.

5.4. Comparison of Joint Training Parameter

In the joint training process, the parameter $\gamma$ is set to control the weight of the MRC loss. Table 9 presents comparison results using different values of $\gamma$.
From the results shown in Table 9, our JEEMRC gains the highest F1 scores for all three tasks when the joint training parameter $\gamma$ is set to 4. As $\gamma$ increases from 1 to 4, the average score rises; after peaking at 61.6%, it begins to decrease as $\gamma$ continues to increase. The experimental results illustrate that 4 is the most suitable value for the joint training parameter to obtain the best joint training performance.

6. Conclusions

In this article, event detection and argument extraction are formulated as joint machine reading comprehension. To learn these two tasks jointly, the JEEMRC model is introduced. Consisting of two main modules, an event classifier and a machine reader with a coarse-to-fine attention mechanism, JEEMRC can identify event types and extract arguments by answering questions simultaneously, without labeling triggers. Our method not only alleviates error propagation but also utilizes the success of previous MRC models, showing competitive performance in data-scarce scenarios. In the future, we will adapt our approach to other transfer learning tasks, e.g., cross-lingual event detection and extraction.
However, there are still some limitations. We acknowledge that our approach relies on pre-designed question templates, so expert knowledge is required, and when transferring to new datasets, question templates need to be reconstructed. Moreover, our method results in an imbalance between positive and negative samples in the training set. More effective question generation methods should be proposed to tackle these problems in future studies.

Author Contributions

S.Z. and K.D. proposed the main idea of this article; S.Z. and S.L. designed the architecture of the neural networks; S.L. and L.L. conducted experiments and analyzed the results; S.L. wrote the main manuscript text. All authors reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the China Postdoctoral Science Foundation (No. 2021MD703983).

Data Availability Statement

The data underlying this article were provided by [34] under license. Data will be shared on request from the corresponding author with permission.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Du, X.; Cardie, C. Event Extraction by Answering (Almost) Natural Questions. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 671–683. [Google Scholar]
  2. Liu, J.; Chen, Y.; Liu, K.; Bi, W.; Liu, X. Event extraction as machine reading comprehension. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 1641–1651. [Google Scholar]
  3. Li, F.; Peng, W.; Chen, Y.; Wang, Q.; Pan, L.; Lyu, Y.; Zhu, Y. Event extraction as multi-turn question answering. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; pp. 829–838. [Google Scholar]
  4. Lyu, Q.; Zhang, H.; Sulem, E.; Roth, D. Zero-shot event extraction via transfer learning: Challenges and insights. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Online, 2–5 August 2021; pp. 322–332. [Google Scholar]
  5. Feng, R.; Yuan, J.; Zhang, C. Probing and fine-tuning reading comprehension models for few-shot event extraction. arXiv 2020, arXiv:2010.11325. [Google Scholar]
  6. Liao, S.; Grishman, R. Using document level cross-event inference to improve event extraction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 11–16 July 2010; pp. 789–797. [Google Scholar]
  7. Huang, R.; Riloff, E. Modeling textual cohesion for event extraction. In Proceedings of the AAAI Conference on Artificial Intelligence, Toronto, Canada, 22–26 July 2012; Volume 26, pp. 1664–1670. [Google Scholar]
  8. Nguyen, T.H.; Cho, K.; Grishman, R. Joint event extraction via recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 300–309. [Google Scholar]
  9. Sha, L.; Qian, F.; Chang, B.; Sui, Z. Jointly extracting event triggers and arguments by dependency-bridge RNN and tensor-based argument interaction. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  10. Chen, Y.; Xu, L.; Liu, K.; Zeng, D.; Zhao, J. Event extraction via dynamic multi-pooling convolutional neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26–31 July 2015; pp. 167–176. [Google Scholar]
  11. Rao, S.; Marcu, D.; Knight, K.; Daumé, H., III. Biomedical event extraction using abstract meaning representation. In Proceedings of the BioNLP 2017, Vancouver, BC, Canada, 17–23 August 2017; pp. 126–135. [Google Scholar]
  12. Liu, X.; Luo, Z.; Huang, H.Y. Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 1247–1256. [Google Scholar]
  13. Ding, R.; Li, Z. Event extraction with deep contextualized word representation and multi-attention layer. In Proceedings of the Advanced Data Mining and Applications: 14th International Conference, ADMA 2018, Nanjing, China, 16–18 November 2018; Proceedings 14; Springer: Berlin/Heidelberg, Germany, 2018; pp. 189–201. [Google Scholar]
  14. Wu, Y.; Zhang, J. Chinese event extraction based on attention and semantic features: A bidirectional circular neural network. Future Internet 2018, 10, 95. [Google Scholar] [CrossRef]
  15. Hong, Y.; Zhang, J.; Ma, B.; Yao, J.; Zhou, G.; Zhu, Q. Using cross-entity inference to improve event extraction. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 1127–1136. [Google Scholar]
  16. Chen, C.; Ng, V. Joint modeling for chinese event extraction with rich linguistic features. In Proceedings of the COLING 2012, Mumbai, India, 8–15 December 2012; pp. 529–544. [Google Scholar]
  17. Peng, H.; Song, Y.; Roth, D. Event detection and co-reference with minimal supervision. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–4 November 2016; pp. 392–402. [Google Scholar]
  18. Huang, L.; Ji, H.; Cho, K.; Dagan, I.; Riedel, S.; Voss, C. Zero-Shot Transfer Learning for Event Extraction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; pp. 2160–2170. [Google Scholar]
  19. Lai, V.D.; Nguyen, T.H.; Dernoncourt, F. Extensively Matching for Few-shot Learning Event Detection. In Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events, Online, 5–10 July 2020; pp. 38–45. [Google Scholar]
  20. Deng, S.; Zhang, N.; Kang, J.; Zhang, Y.; Zhang, W.; Chen, H. Meta-learning with dynamic-memory-based prototypical network for few-shot event detection. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 151–159. [Google Scholar]
  21. Williams, A.; Nangia, N.; Bowman, S. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA, 1–6 June 2018; pp. 1112–1122. [Google Scholar]
  22. Rajpurkar, P.; Jia, R.; Liang, P. Know What You Don’t Know: Unanswerable Questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia, 15–20 July 2018; pp. 784–789. [Google Scholar]
  23. Seo, M.; Kembhavi, A.; Farhadi, A.; Hajishirzi, H. Bidirectional attention flow for machine comprehension. arXiv 2016, arXiv:1611.01603. [Google Scholar]
  24. Yu, A.W.; Dohan, D.; Luong, M.T.; Zhao, R.; Chen, K.; Norouzi, M.; Le, Q.V. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. arXiv 2018, arXiv:1804.09541. [Google Scholar]
  25. Liu, S.; Zhang, S.; Zhang, X.; Wang, H. R-trans: RNN transformer network for Chinese machine reading comprehension. IEEE Access 2019, 7, 27736–27745. [Google Scholar] [CrossRef]
  26. Wang, W.; Yang, N.; Wei, F.; Chang, B.; Zhou, M. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1, pp. 189–198. [Google Scholar]
  27. Levy, O.; Seo, M.; Choi, E.; Zettlemoyer, L. Zero-Shot Relation Extraction via Reading Comprehension. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, BC, Canada, 3 August 2017; pp. 333–342. [Google Scholar]
  28. Li, X.; Feng, J.; Meng, Y.; Han, Q.; Wu, F.; Li, J. A Unified MRC Framework for Named Entity Recognition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 5849–5859. [Google Scholar]
  29. Xiong, Y.; Huang, Y.; Chen, Q.; Wang, X.; Nic, Y.; Tang, B. A joint model for medical named entity recognition and normalization. CEUR Workshop Proc. ISSN 2020, 1613, 17. [Google Scholar]
  30. Sun, C.; Yang, Z.; Wang, L.; Zhang, Y.; Lin, H.; Wang, J. Biomedical named entity recognition using BERT in the machine reading comprehension framework. J. Biomed. Inform. 2021, 118, 103799. [Google Scholar] [CrossRef] [PubMed]
  31. Li, X.; Yin, F.; Sun, Z.; Li, X.; Yuan, A.; Chai, D.; Zhou, M.; Li, J. Entity-Relation Extraction as Multi-Turn Question Answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 27 July –2 August 2019; pp. 1340–1350. [Google Scholar]
  32. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the NAACL-HLT, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  33. Liu, S.; Zhang, X.; Zhang, S.; Wang, H.; Zhang, W. Neural machine reading comprehension: Methods and trends. Appl. Sci. 2019, 9, 3698. [Google Scholar] [CrossRef]
  34. Doddington, G.R.; Mitchell, A.; Przybocki, M.A.; Ramshaw, L.A.; Strassel, S.M.; Weischedel, R.M. The automatic content extraction (ace) program-tasks, data, and evaluation. In Proceedings of the LREC, Lisbon, Portugal, 26–28 May 2004; Volume 2, pp. 837–840. [Google Scholar]
  35. Wadden, D.; Wennberg, U.; Luan, Y.; Hajishirzi, H. Entity, Relation, and Event Extraction with Contextualized Span Representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5784–5789. [Google Scholar]
  36. Zhang, T.; Ji, H.; Sil, A. Joint entity and event extraction with generative adversarial imitation learning. Data Intell. 2019, 1, 99–120. [Google Scholar] [CrossRef]
  37. Li, S.; Ji, H.; Han, J. Document-Level Event Argument Extraction by Conditional Generation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Mexico City, Mexico, 6–11 June 2021; Association for Computational Linguistics (ACL): Stroudsburg, PA, USA, 2021; pp. 894–908. [Google Scholar]
  38. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 7871–7880. [Google Scholar]
  39. Lu, Y.; Lin, H.; Xu, J.; Han, X.; Tang, J.; Li, A.; Sun, L.; Liao, M.; Chen, S. Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 2–5 August 2021; pp. 2795–2806. [Google Scholar]
  40. Paolini, G.; Athiwaratkun, B.; Krone, J.; Ma, J.; Achille, A.; Anubhai, R.; Santos, C.N.d.; Xiang, B.; Soatto, S. Structured prediction as translation between augmented natural languages. arXiv 2021, arXiv:2101.05779. [Google Scholar]
  41. Liu, X.; Huang, H.Y.; Shi, G.; Wang, B. Dynamic Prefix-Tuning for Generative Template-based Event Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 5216–5228. [Google Scholar]
Figure 1. Comparison of different question generation techniques in pipeline methods (left) and our joint approach (right).
Figure 2. The entire process of transforming event detection and extraction into joint MRC.
Figure 3. Framework of our JEEMRC model.
Figure 4. Comparison of different loss weights. Loss weight refers to $\beta_{mrc}$ in Section 3.3.4.
Table 1. Examples of question templates for different kinds of arguments.

Argument Role | Question Template
Org | What is the organization?
Victim | Who is the victim?
Place | Where did the event take place?
Time | When did the event take place?
Origin | Where is the origin?
Table 2. Examples of event-type-related arguments in ACE 2005.

Event Type | Event Arguments
Conflict | Attacker, Victim, Instrument
Movement | Origin, Destination, Vehicle
Justice | Defendant, Crime, Adjudicator, Plaintiff
Business | Organization, Agent
Personnel | Position
Transaction | Money, Beneficiary, Recipient, Giver
Table 3. Parameter settings utilized in the JEEMRC model.

Parameter Name | Illustration | Value
train_batch_size | batch size per GPU for training | 16
learning_rate | initial learning rate for Adam | 3 × 10^−5
num_train_epochs | total number of training epochs | 5
max_seq_length | maximum total input sequence length | 384
doc_stride | stride to take between chunks | 128
logging_steps | log every X update steps | 3000
num_event | number of event types | 34
max_query_length | maximum number of question tokens | 64
max_answer_length | maximum length of an answer | 64
Table 4. Experimental results on ACE 2005 with full supervision. Subtasks include event classification (EC), argument identification (AI), and argument classification (AC). Evaluation metrics include precision (P), recall (R), F1 score (F1), and the average F1 score over the three subtasks (Avg). A slash (/) marks scores not reported.

Model | EC P | EC R | EC F1 | AI P | AI R | AI F1 | AC P | AC R | AC F1 | Avg
dbRNN [9] | 74.1 | 69.8 | 71.9 | / | / | 57.2 | / | / | 50.1 | 59.7
GAIL, ELMo [36] | 74.8 | 69.4 | 72.0 | 63.3 | 48.7 | 55.1 | 61.6 | 45.7 | 52.4 | 59.8
DYGIE++, BERT+LSTM [35] | / | / | 68.9 | / | / | 54.1 | / | / | 51.4 | 58.1
DYGIE++, BERT FineTune [35] | / | / | 69.7 | / | / | 55.4 | / | / | 52.5 | 59.2
BART-GEN [37] | 69.5 | 72.8 | 71.1 | / | / | / | 56.0 | 51.6 | 53.7 | /
TEXT2EVENT [39] | 67.5 | 71.2 | 69.2 | / | / | / | 46.7 | 53.4 | 49.8 | /
TANL [40] | / | / | 68.5 | / | / | 48.5 | / | / | 48.5 | 55.2
GTEE-DYNPREF [41] | 63.7 | 84.4 | 72.6 | / | / | / | 49.0 | 64.8 | 55.8 | /
EEQA [1] | 71.1 | 73.7 | 72.3 | 58.9 | 52.0 | 55.2 | 56.7 | 50.2 | 53.3 | 60.2
MQAEE [3] | / | / | 71.7 | / | / | 55.2 | / | / | 53.4 | 60.1
JEEMRC | 66.0 | 82.5 | 73.3 | 50.8 | 71.6 | 59.4 | 44.4 | 62.5 | 51.9 | 61.6
Table 5. The F1 scores on ACE 2005 for few-shot scenarios: 1%, 5%, 10%, and 20% indicate how many examples are selected randomly from the whole training set to simulate data-scarce scenarios. A slash (/) marks scores not reported.

ID | Model | 1% | 5% | 10% | 20%
1 | DMCNN [10] | / | 8.7 | 16.6 | 23.7
2 | dbRNN [9] | / | 8.1 | 17.2 | 24.1
3 | BERTEE [2] | 2.2 | 10.5 | 19.3 | 28.6
4 | JEEMRC (BERT) | 0.6 | 1.7 | 25.7 | 34.8
5 | JEEMRC (SQuAD 2.0) | 6.2 | 16.3 | 32.6 | 41.0
Table 6. Ablation analysis results. Components include the heuristic post processing mechanism (post processing), the coarse-to-fine attention mechanism (attention), and the weight parameter for the cross entropy loss of MRC (loss). Their contributions are measured by F1 scores for three subtasks: event classification (EC), argument identification (AI), and argument classification (AC).

ID | Component | EC F1 | AI F1 | AC F1
1 | JEEMRC | 73.3 | 59.4 | 51.9
2 | −post processing | −1.9 | −0.3 | −5.7
3 | −coarse attention | −1.7 | −4.8 | −2.8
4 | −fine attention | −3.6 | −8.7 | −6.6
5 | −loss | −15.4 | −18.9 | −13.3
6 | −loss−attention | −16.5 | −19.5 | −14.4
Table 7. Comparison of different attention sizes (the hidden dimension of the coarse-to-fine attention mechanism). Metrics are the F1 scores of the three subtasks—event classification (EC), argument identification (AI), and argument classification (AC)—and their average.

ID | Attention Size | EC F1 | AI F1 | AC F1 | Average
1 | 125 | 74.1 | 50.8 | 46.5 | 57.1
2 | 256 | 70.1 | 52.8 | 46.2 | 56.4
3 | 384 | 69.8 | 52.7 | 47.3 | 56.6
4 | 454 | 73.3 | 59.4 | 51.9 | 61.6
5 | 526 | 72.4 | 53.8 | 48.2 | 58.1
6 | 633 | 70.9 | 54.2 | 47.1 | 57.4
7 | 768 | 71.2 | 53.2 | 48.7 | 57.7
Table 8. Comparison of different weights for the three situations: ① the sentence does not describe events; ② the sentence describes events, but the question about the argument has no answer; ③ the sentence describes events, and the question about the argument has an answer.

①:②:③ | EC F1 | AI F1 | AC F1 | Average
1:1:1 | 70.6 | 53.1 | 50.1 | 57.9
1:1:2 | 69.7 | 55.3 | 50.7 | 58.6
1:2:2 | 71.8 | 57.8 | 48.6 | 59.4
1:2:3 | 73.3 | 59.4 | 51.9 | 61.6
Table 9. Comparison of different values of the joint training parameter $\gamma$, which controls the weight of the MRC loss in the joint objective.

$\gamma$ | EC F1 | AI F1 | AC F1 | Average
1 | 70.1 | 55.1 | 42.2 | 55.8
2 | 72.6 | 53.7 | 45.5 | 57.3
3 | 72.4 | 55.4 | 47.2 | 58.3
4 | 73.3 | 59.4 | 51.9 | 61.6
5 | 69.2 | 57.8 | 49.1 | 58.7
6 | 73.0 | 53.6 | 47.2 | 57.9
7 | 68.8 | 54.2 | 49.1 | 57.3