1. Introduction
Event extraction (EE) is a challenging task in natural language understanding that plays a crucial role in many downstream applications [1,2,3,4]. EE is dedicated to extracting information about real-world events from text, classifying events into predefined event types, and identifying triggers and event participants [5]. Event extraction is widely used in information retrieval and summarization, knowledge graph construction [6], intelligence analysis [7], and other fields. EE usually includes four subtasks, i.e., trigger identification (TI), trigger classification (TC), argument identification (AI), and argument classification (AC) [8,9,10].
The objective of event extraction is to identify events and arguments from the text. The event extraction task is formally defined as follows:
Given an input sentence x = {x_1, x_2, …, x_N} consisting of N words, an event-type collection C, and an argument role collection R, let Y denote the gold set. Y includes all event types (c ∈ C) present in the sentence, all triggers (t) of each event type (c), and the different argument roles (r ∈ R) under each event type, where a^r is the argument corresponding to role r.
However, a sentence may contain multiple events, and the arguments and triggers of these events may overlap in complex ways. We summarize this into three situations: (1) Different event types may be triggered by the same trigger. The trigger “减持” (reduced its holdings) marked in red in Figure 1 triggers both the shareholding reduction event and the share equity transfer event. (2) The same argument plays different roles in different event types. In Figure 1, “大族激光” (Han’s Laser) plays the obj role in the “股份股权转让” (share equity transfer) event and also plays the target-company role in the shareholding reduction event, which is triggered by “减持” (reduced its holdings). (3) The same argument plays the same role in different events of the same event type. In Figure 1, the “股份股权转让” (share equity transfer) event appears twice in the sentence, both occurrences taking place in “10月” (October). In FewFC (a Chinese financial event extraction dataset) [11], about 13.5% of the sentences exhibit trigger overlap, 21.7% exhibit argument overlap, and the same event type appears repeatedly in 8.3% of the sentences.
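To make the three situations concrete, the gold set Y for the Figure 1 sentence could be rendered as the nested structure below. This is a hypothetical sketch: the record layout and the role name for “10月” are our assumptions, not the FewFC schema.

```python
# Hypothetical rendering of part of the gold set Y for the Figure 1 sentence.
# Each event type maps to (trigger, {role: argument}) records; the same
# trigger "减持" appears under both event types (situation 1), and "大族激光"
# fills a different role in each event type (situation 2).
gold = {
    "减持 (shareholding reduction)": [
        ("减持", {"target-company": "大族激光", "date": "10月"}),
    ],
    "股份股权转让 (share equity transfer)": [
        ("减持", {"obj": "大族激光", "date": "10月"}),
    ],
}
```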
Most previous studies only partially addressed overlapping issues and did not cover all the above situations. In 2019, Yang et al. [12] utilized a staged pipeline approach for event trigger and argument extraction; however, this method overlooked the challenge of trigger overlap. In 2020, Xu et al. [13] used a joint extraction framework to solve the role overlap problem, defining event relationship triples to represent the relationships among triggers, arguments, and roles, thereby converting the argument classification problem into a relation extraction problem. In 2020, Huang et al. [14] proposed a hierarchical knowledge structure graph containing conceptual and semantic reasoning paths to represent knowledge, and employed GEANet to encode this intricate knowledge, addressing trigger extraction in nested structures in the biomedical domain [14]. In 2022, Zhang et al. [15] designed a two-stage pipeline model in which triggers are identified with a sequence labeling approach and overlapping arguments are identified through multiple sets of binary role classification networks. In 2023, Yang et al. [16] used a multi-task learning model to extract entity relations and events, in which a multi-label classification method settles the role overlap problem shared by the two tasks.
The contributions of this paper are summarized as follows:
- (1) A role pre-judgment module is proposed to predict roles based on the correspondence between event types and roles, text embeddings, and trigger embeddings. It significantly improves the recall of each subtask and provides a basis for extracting overlapping arguments.
- (2) ROPEE adopts a joint learning framework whose loss function combines the losses of four modules: event-type detection, trigger extraction, role pre-judgment, and argument extraction. The interactions between modules are thus learned effectively during training, which alleviates error propagation in the prediction stage.
- (3) ROPEE outperforms the baseline model by 0.4%, 0.9%, and 0.6% in terms of F1 on TC, AI, and AC on the FewFC dataset. For sentences with overlapping triggers, ROPEE outperforms the baseline by 0.9% and 1.2% in F1 on AI and AC, respectively. For sentences with overlapping arguments, ROPEE improves on the baseline by 0.7% and 0.6% in F1 on AI and AC, respectively. These results highlight the effectiveness of the proposed approach in handling overlapping event phenomena.
The remainder of this paper is organized as follows: Section 2 reviews related studies. Section 3 introduces the details of the ROPEE model. Section 4 presents comparative experiments and analyzes the experimental results. Section 5 concludes this work.
2. Related Studies
Event extraction is one of the most challenging tasks in information extraction research [17]. Existing paradigms for event extraction include pipeline methods and joint learning methods [18].
The pipeline-based method handles the four subtasks of EE separately, with each subtask having its own objective function and loss. In 2015, Chen et al. [8] developed a dynamic multi-pooling convolutional neural network (DMCNN). This network utilizes a dynamic multi-pooling layer based on event triggers and arguments to retain essential information, combining sentence-level and lexical-level details from the raw text without extensive preprocessing. Most supervised deep learning methods for event extraction require large amounts of labeled data for training, and annotating such data is laborious and expensive. To gain more insights from limited training data, Yang et al. [12] combined an extraction model with an event generation method in 2019 and improved the performance of the argument extractor through a loss function weighted by the importance of different roles. The above two methods can neither explicitly model the semantics between events and roles nor capture the interaction between them. In 2020, Li et al. [19] devised a multi-stage QA framework that casts event extraction as a reading comprehension problem and captures the dynamic connection between subtasks by integrating previous answers into subsequent questions. The generative event extraction model proposed by Paolini et al. [20] in 2021 encodes label semantics and other weak supervision signals in a pipeline manner and improves performance in few-sample scenarios. Since the loss function in the pipeline-based method is calculated only after argument extraction, error propagation may occur.
Joint learning methods integrate the losses of both the trigger extraction stage and the argument extraction stage into the final loss function, treating triggers and arguments equally so that the two can mutually promote each other's extraction [18]. In 2021, Sheng et al. [9] first covered all event overlap issues through a unified framework with a cascading decoder that performs TC, TI, and argument extraction in sequence, reaching an F1 of 71.4% on the FewFC dataset. To further extract inter-word relationships in overlapping sentences in parallel, Cao et al. [10] proposed a single-stage framework based on word-pair relationships that jointly extracts the intra-word and cross-word pair relations of triggers and arguments. These two methods focus on the event extraction task itself and introduce neither additional information nor other joint information extraction tasks. In 2022, Hsu et al. [21] converted event extraction into a conditional generation problem and extracted triggers and arguments end-to-end through additional prompts and weak supervision information. In 2022, Van Nguyen et al. [22] used an edge weighting mechanism to learn the dependency graph between task instances and jointly complete the information extraction task. Beyond introducing additional prompt information in document-level event extraction, long-range dependencies can also be exploited to improve extraction performance. In 2023, Liu et al. [23] proposed a chain reasoning paradigm for document-level event argument extraction, representing argument queries with first-order logic rules and T-Norm fuzzy logic for end-to-end learning. We propose ROPEE, a joint model for overlapping event extraction. It uses the correspondence between event types and roles together with trigger embeddings to predict roles, which not only effectively alleviates error propagation but also further improves the accuracy of event extraction.
3. ROPEE Model
The overall framework of ROPEE is illustrated in Figure 2. ROPEE includes four modules: event detection, trigger identification, role pre-judgment, and argument extraction. Specifically, event detection predicts potential event types by calculating the similarity between sentence representations and event-type embeddings, which allows overlapping triggers to be extracted type by type. The role pre-judgment module comprehensively considers text embeddings and trigger embeddings and pre-judges roles based on the correspondence table between event types and roles, which assists the extraction of overlapping arguments. Trigger extraction and argument extraction are based on text representations that incorporate specific event-type information and specific role information, and binary classifiers are adopted to predict the starting and ending positions of triggers or arguments. To minimize error propagation, all modules are jointly learned during training.
3.1. Encoder
BERT [24] is utilized as the encoder. A sentence x = {x_1, x_2, …, x_N} is fed into the bert-base-Chinese model, with each Chinese character x_i treated as a token. The embedding of the sentence is obtained by H = BERT(x_1, x_2, …, x_N) = {h_1, h_2, …, h_N} ∈ R^{N×d}, where d is the dimension of the embeddings.
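A minimal encoding sketch using the HuggingFace transformers library (the toy sentence and variable names are ours):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

sentence = "公司于10月减持大族激光股份"  # toy example, not the Figure 1 sentence
# bert-base-chinese tokenizes Chinese text character by character ([CLS]/[SEP] added)
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

H = outputs.last_hidden_state  # shape (1, N, d); d = 768 for bert-base
```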
3.2. Event Detection Decoder
The event detection decoder is shown in the upper left corner of Figure 2. It predicts potential event types in a sentence by calculating the correlation between sentence representations that carry type features and event-type embeddings. Specifically, event-type embeddings are denoted by a randomly initialized matrix C. We apply rel to calculate the relevance between each token embedding h_i and a potential event type c ∈ C, see Equation (1); a sentence representation s_c adaptive to the event type is then obtained, see Equation (2); and the similarity probability between s_c and c is generated using a normalization operation for each event type, see Equation (3):

$$\mathrm{rel}(h_i, c) = \mathbf{W}_1\,[h_i; c; h_i \ominus c; h_i \odot c] + b_1 \tag{1}$$

$$s_c = \sum_{i=1}^{N} \alpha_i h_i, \qquad \alpha_i = \frac{\exp(\mathrm{rel}(h_i, c))}{\sum_{j=1}^{N} \exp(\mathrm{rel}(h_j, c))} \tag{2}$$

$$P(c \mid x) = \sigma\!\left(s_c^{\top} c\right) \tag{3}$$

where W_1 and b_1 are the parameters of the relevance calculation, ⊖ denotes element-wise subtraction, ⊙ denotes element-wise multiplication, [;] represents the concatenation operation, and σ represents the sigmoid function. Types satisfying P(c | x) > ε_c are selected as potential event types, where ε_c is a threshold hyperparameter between 0 and 1. All potential event types hidden in sentence x constitute the event-type set C_x. The learnable parameters of this decoder are denoted θ_ed.
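A minimal PyTorch sketch of this decoder under our reconstruction of Equations (1)–(3); class and variable names are ours, and the dot-product scoring in Equation (3) is an assumption:

```python
import torch
import torch.nn as nn

class EventDetection(nn.Module):
    def __init__(self, num_types: int, d: int):
        super().__init__()
        self.C = nn.Parameter(torch.randn(num_types, d))  # event-type embeddings
        self.rel = nn.Linear(4 * d, 1)                    # relevance scorer, Eq. (1)

    def forward(self, H):                                 # H: (N, d) token embeddings
        probs = []
        for c in self.C:                                  # one pass per event type
            c_exp = c.expand_as(H)                        # (N, d)
            feats = torch.cat([H, c_exp, H - c_exp, H * c_exp], dim=-1)
            alpha = torch.softmax(self.rel(feats).squeeze(-1), dim=0)  # Eq. (2)
            s_c = (alpha.unsqueeze(-1) * H).sum(dim=0)    # type-adaptive sentence rep
            probs.append(torch.sigmoid(s_c @ c))          # Eq. (3): P(c|x)
        return torch.stack(probs)                         # one probability per type
```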
3.3. Trigger Identification Decoder
A large number of experiments demonstrate that trigger information can enhance argument extraction. The trigger identification decoder identifies triggers for a specific event type c ∈ C_x. The decoder includes a conditional layer normalization (CLN) [25], a self-attention layer [26], and a binary trigger tagging classifier pair.
CLN fuses two features and filters out unnecessary information. Here, the event-type information is encoded into the token representation to obtain the event-typed token representation h_i^c:

$$h_i^{c} = \mathrm{CLN}(c, h_i) = \gamma_c \odot \frac{h_i - \mu}{\sigma} + \beta_c, \qquad \gamma_c = \mathbf{W}_{\gamma} c + b_{\gamma}, \quad \beta_c = \mathbf{W}_{\beta} c + b_{\beta} \tag{4}$$

where the type embedding c is used as the condition generating the gain γ_c and bias β_c in CLN, and μ and σ are the mean and standard deviation of h_i:

$$\mu = \frac{1}{d} \sum_{k=1}^{d} h_{ik}, \qquad \sigma = \sqrt{\frac{1}{d} \sum_{k=1}^{d} \left(h_{ik} - \mu\right)^2} \tag{5}$$

where h_ik represents the k-th dimension of h_i.
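A compact sketch of the CLN computation in Equations (4) and (5); this is a common formulation following [25], and the names are ours:

```python
import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    """Normalize h, then scale/shift with a gain and bias generated from condition c."""
    def __init__(self, d: int, eps: float = 1e-6):
        super().__init__()
        self.gamma = nn.Linear(d, d)  # condition -> gain (gamma_c)
        self.beta = nn.Linear(d, d)   # condition -> bias (beta_c)
        self.eps = eps

    def forward(self, h, c):          # h: (N, d) tokens, c: (d,) condition
        mu = h.mean(dim=-1, keepdim=True)                        # Eq. (5), mean
        sigma = h.std(dim=-1, keepdim=True, unbiased=False)      # Eq. (5), deviation
        normed = (h - mu) / (sigma + self.eps)
        return self.gamma(c) * normed + self.beta(c)             # Eq. (4)
```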
To fully consider the contextual connections in the sentence, a self-attention layer [26] is applied to the event-typed token representation:

$$Z^{c} = \mathrm{SelfAttention}(H^{c}) \tag{6}$$

where H^c = {h_1^c, h_2^c, …, h_N^c} and Z^c ∈ R^{N×d}.
For each token, the binary classifier pair marks the beginning and end positions of a trigger span:

$$p_i^{ts} = \sigma\!\left(\mathbf{w}_{ts}^{\top} z_i^{c} + b_{ts}\right), \qquad p_i^{te} = \sigma\!\left(\mathbf{w}_{te}^{\top} z_i^{c} + b_{te}\right) \tag{7}$$

where z_i^c stands for the i-th token embedding in Z^c. We select tokens satisfying p_i^{ts} > ε_ts as start positions and tokens satisfying p_i^{te} > ε_te as end positions, where ε_ts and ε_te are threshold hyperparameters. To acquire a trigger t, each starting position is enumerated and the nearest subsequent ending position in the sentence is found; the token span from the start position to the end constitutes a complete trigger. Since the corresponding triggers are extracted in separate passes for each potential event type, the trigger overlap problem is solved naturally. The set T_c^x contains all predicted triggers t under event type c in sentence x, and θ_ti denotes all parameters of the trigger identification decoder.
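The pairing rule, enumerating each start and taking the nearest subsequent end, can be sketched as follows (threshold values are placeholders; names are ours):

```python
def decode_spans(p_start, p_end, eps_s: float = 0.5, eps_e: float = 0.5):
    """Pair every start position above threshold with the nearest following end."""
    starts = [i for i, p in enumerate(p_start) if p > eps_s]
    ends = [i for i, p in enumerate(p_end) if p > eps_e]
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, min(following)))  # inclusive token span of one trigger
    return spans
```

The same rule is reused for argument boundaries in Section 3.5.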
3.4. Role Pre-Judgment Decoder
Since not all roles of a given event type appear in a sentence, we design a role pre-judgment decoder. Given the predicted event type, it predicts the roles appearing in the sentence according to the correspondence list between event types and roles, providing a basis for extracting overlapping arguments. The decoder consists of three parts: a conditional fusion layer, a self-attention layer, and a role similarity detection function.
To obtain richer semantic information, we use CLN to fully integrate the trigger embedding with the token representation that already carries event-type knowledge, obtaining a new token representation h_i^{ct}, see Equation (8). Here, the trigger embedding t is calculated by average pooling the token embeddings in the trigger span:

$$h_i^{ct} = \mathrm{CLN}(t, h_i^{c}) \tag{8}$$

The self-attention layer then reinforces the contextual relationships, and the sentence representation Z^{ct} is obtained:

$$Z^{ct} = \mathrm{SelfAttention}(H^{ct}) \tag{9}$$

where H^{ct} = {h_1^{ct}, h_2^{ct}, …, h_N^{ct}} and Z^{ct} ∈ R^{N×d}.
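The trigger embedding t used as the CLN condition in Equation (8) can be obtained by average pooling; a minimal sketch, with names of our choosing:

```python
def trigger_embedding(H_c, span):
    """Average-pool the event-typed token embeddings over the (start, end) trigger span."""
    start, end = span                      # inclusive token indices of the trigger
    return H_c[start : end + 1].mean(dim=0)  # (d,) trigger embedding t
```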
The role similarity detection function predicts potential event-type-specific roles in the sentence by calculating the correlation between role embeddings and sentence representations fused with role feature information. Specifically, a randomly initialized matrix R is used as the role embeddings. We apply rel to calculate the relevance between each token embedding z_i^{ct} and a potential role r ∈ R, see Equation (10); a sentence representation s_r adaptive to the role is then obtained, see Equation (11); and the similarity probability between s_r and r is generated, see Equation (12), from which the predicted probabilities of all roles under a specific event type are obtained:

$$\mathrm{rel}(z_i^{ct}, r) = \mathbf{W}_2\,[z_i^{ct}; r; z_i^{ct} \ominus r; z_i^{ct} \odot r] + b_2 \tag{10}$$

$$s_r = \sum_{i=1}^{N} \beta_i z_i^{ct}, \qquad \beta_i = \frac{\exp(\mathrm{rel}(z_i^{ct}, r))}{\sum_{j=1}^{N} \exp(\mathrm{rel}(z_j^{ct}, r))} \tag{11}$$

$$P(r \mid c, t, x) = \sigma\!\left(s_r^{\top} r\right) \tag{12}$$

Roles satisfying P(r | c, t, x) > ε_r are selected as potential role types, where ε_r is a threshold. The role-type set R_{c,t}^x contains all potential roles under trigger t and event type c in sentence x, and θ_rp denotes all parameters of the role pre-judgment decoder.
3.5. Argument Extraction Decoder
The argument extraction decoder is composed of a positional embedding layer (PEL) and role-aware binary classifier pairs for argument tagging.
The relative position of a token with respect to the trigger is beneficial for argument extraction [9,27]. Here, relative position embeddings [8] encode the relative distance between the current token and the trigger boundary tokens. They are incorporated into the sentence representation Z^{ct} using a concatenation operation:

$$z_i^{p} = [z_i^{ct}; p_i] \tag{13}$$

where p_i ∈ R^{d_p} is the relative position embedding and d_p is the dimension of the position embeddings.
For each token, a binary classifier pair is employed to mark the boundary positions of an argument under each role r ∈ R_{c,t}^x:

$$p_i^{as_r} = \sigma\!\left(\mathbf{w}_{as}^{r\top} z_i^{p} + b_{as}^{r}\right), \qquad p_i^{ae_r} = \sigma\!\left(\mathbf{w}_{ae}^{r\top} z_i^{p} + b_{ae}^{r}\right) \tag{14}$$

where z_i^p represents the i-th token representation after position concatenation. For each role r, tokens satisfying p_i^{as_r} > ε_as are selected as starting positions and tokens satisfying p_i^{ae_r} > ε_ae as ends, where ε_as and ε_ae are thresholds. To extract the boundary of an argument a^r with role r, all starting positions are enumerated and the nearest subsequent ending position in the sentence is found; the tokens between the starting and ending positions constitute a complete argument. In this way, each prediction pass extracts only the arguments of a specific role r under a specific trigger t and event type c, so the argument overlap problem is solved naturally. All candidate arguments a^r form the set A_{c,t}^x, and θ_ae denotes all parameters of the PEL and the argument classifiers.
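A sketch of the PEL and role-aware boundary classifiers under our reconstruction of Equations (13) and (14); for simplicity we measure distance to the trigger start token only, and the names and dimensions are ours:

```python
import torch
import torch.nn as nn

class ArgumentTagger(nn.Module):
    """Positional embedding layer plus per-role start/end classifiers."""
    def __init__(self, d: int, d_p: int, num_roles: int, max_len: int = 512):
        super().__init__()
        self.pos = nn.Embedding(2 * max_len, d_p)   # table of relative distances
        self.start = nn.Linear(d + d_p, num_roles)  # one start classifier per role
        self.end = nn.Linear(d + d_p, num_roles)    # one end classifier per role
        self.max_len = max_len

    def forward(self, Z_ct, trigger_start):          # Z_ct: (N, d)
        # relative distance of each token to the trigger start, shifted non-negative
        idx = torch.arange(Z_ct.size(0), device=Z_ct.device) - trigger_start + self.max_len
        z_p = torch.cat([Z_ct, self.pos(idx)], dim=-1)                       # Eq. (13)
        return torch.sigmoid(self.start(z_p)), torch.sigmoid(self.end(z_p))  # Eq. (14)
```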
3.6. Model Training
The losses of the four modules are integrated during training, so the total loss function is designed as follows:

$$\mathrm{Loss}_{all} = L_{ed} + L_{ti} + L_{rp} + L_{ae} \tag{15}$$

where L_ed, L_ti, L_rp, and L_ae are the losses of the event detection, trigger identification, role pre-judgment, and argument extraction modules, respectively. The first two losses, L_ed and L_ti, are adopted from [9]. We decompose the argument extraction loss into L_rp and L_ae and formulate them as:

$$L_{rp} = -\sum_{r \in R} \left[ y_r \log \hat{p}_r + (1 - y_r) \log(1 - \hat{p}_r) \right] \tag{16}$$

$$L_{ae} = -\sum_{r \in R_{c,t}^{x}} \sum_{i=1}^{N} \left[ y_i^{as_r} \log \hat{p}_i^{as_r} + (1 - y_i^{as_r}) \log(1 - \hat{p}_i^{as_r}) + y_i^{ae_r} \log \hat{p}_i^{ae_r} + (1 - y_i^{ae_r}) \log(1 - \hat{p}_i^{ae_r}) \right] \tag{17}$$

where p̂_c, p̂_i^{ts}, p̂_i^{te}, p̂_r, p̂_i^{as_r}, and p̂_i^{ae_r} are the predicted probabilities of the event type, the starting and ending positions of triggers, the role types, and the starting and ending positions of arguments, respectively, which can be calculated according to Equations (3), (7), (12), and (14). y_c, y_i^{ts}, y_i^{te}, y_r, y_i^{as_r}, and y_i^{ae_r} are the gold labels in the training data. θ_bert, θ_ed, θ_ti, θ_rp, and θ_ae denote the parameters of BERT, event detection, trigger identification, role pre-judgment, and argument extraction, respectively. We use Adam [28] over shuffled mini-batches to minimize Loss_all.
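A minimal sketch of Equations (15)–(17) as summed binary cross-entropies; the dictionary keys are ours, and all entries are assumed to be float tensors whose shapes match their labels:

```python
import torch.nn.functional as F

def joint_loss(pred, gold):
    """Loss_all = L_ed + L_ti + L_rp + L_ae, Eqs. (15)-(17)."""
    l_ed = F.binary_cross_entropy(pred["type"], gold["type"])               # event detection
    l_ti = (F.binary_cross_entropy(pred["trig_start"], gold["trig_start"])
            + F.binary_cross_entropy(pred["trig_end"], gold["trig_end"]))   # trigger spans
    l_rp = F.binary_cross_entropy(pred["role"], gold["role"])               # role pre-judgment
    l_ae = (F.binary_cross_entropy(pred["arg_start"], gold["arg_start"])
            + F.binary_cross_entropy(pred["arg_end"], gold["arg_end"]))     # argument spans
    return l_ed + l_ti + l_rp + l_ae
```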
5. Conclusions
We design ROPEE, a joint learning framework for overlapping event extraction, in which the event detection decoder identifies potential event types and helps extract overlapping triggers. Based on text embeddings and trigger embeddings, a role pre-judgment module is proposed to predict roles using the correspondence between event types and roles, thereby enhancing the extraction of overlapping arguments. ROPEE is effective at addressing both trigger overlap and argument overlap in EE. Unlike pipeline models, we integrate the four modules at the loss optimization layer, avoiding the error propagation problem of traditional pipeline methods. On the FewFC dataset, compared with flattened sequence labeling methods (such as BERT-softmax, BERT-CRF, and BERT-CRF-joint), ROPEE achieves excellent recall and F1 scores on all subtasks; compared with multi-stage methods for overlapping event extraction (such as PLMEE, MQAEE, and CasEE), ROPEE is superior in terms of the F1 scores of TC, AI, and AC. These results show the superiority of ROPEE in overlapping event extraction. The ablation results further show that our model can also be used with different training strategies and in task scenarios or datasets without trigger labeling.
In the future, we plan to build more overlapping datasets based on specific application scenarios to verify our model's performance. In addition, since trigger annotation requires considerable manpower and resources, and its absence reduces model accuracy, the scenario without trigger annotation is of general importance (models should automatically find the core arguments in a sentence); we plan to improve model performance in this scenario using position embeddings relative to the core arguments. Finally, in this paper we implement joint learning by designing a joint loss over four modules, and we plan to develop different joint strategies to strengthen the interaction between the subtasks.