Article
Peer-Review Record

Chinese Event Detection without Triggers Based on Dual Attention

Appl. Sci. 2023, 13(7), 4523; https://doi.org/10.3390/app13074523
by Xu Wan 1, Yingchi Mao 2,* and Rongzhi Qi 2
Reviewer 1:
Reviewer 2:
Submission received: 20 February 2023 / Revised: 26 March 2023 / Accepted: 31 March 2023 / Published: 3 April 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

The paper presents a trigger‐free model that can skip the trigger identification process.

There is no comparison of related work. Kindly add a comparison of related work with other triggers based on Dual Attention.

Explain each layer of Figure 3.

How was the model trained?

4. Experiment

4.1. Experiment Preparation

4.1.1. Dataset

Do not leave empty spaces in sections and subsections.

The comparison needs to be explained for different event detection methods.

 

Add future work in the conclusions section.

 

 

Author Response

We greatly appreciate your hard work and valuable suggestions on the revision of this manuscript. After receiving your suggestions, we carefully analyzed and revised the manuscript. Our replies to the review comments are as follows.

 

To facilitate your review, all revisions to the manuscript have been marked up in the revised version using the “Track Changes” function.

 

Alternatively, you can see revisions to the manuscript in the highlighted version.

 

Thank you again for your valuable review suggestions.

 

[Comment 1]

 

  1. I suggest authors spend one paragraph at the end of the Introduction section to describe the outlines in the paper.

Response:

The outline of this paper is as follows. Section 1 introduces the research issues and contributions of this paper. Section 2 reviews the literature related to event detection. Section 3 introduces the proposed event detection method without triggers. Section 4 compares our model's results with those of other traditional methods. Section 5 concludes and offers some recommendations.

We have added a paragraph at the end of the Introduction section (Section 1) to describe the outline of the paper.

[Highlighted Version, Page 3, Line 101-104]

 

[Comment 2]

 

  1. In the Experiments section, it is needed to describe the experimental environments. I suggest authors have a specific subsection to describe the environment, such as the implementation platform, and parameter values...

Response:

The experiments in this paper are implemented in the same software and hardware environment, and the specific environment configuration is as follows.

| Configuration Name | Configuration Parameter |
| --- | --- |
| RAM | 256 GB |
| CPU | 64-core Intel(R) Xeon(R) Gold 6326 CPU @ 2.90 GHz |
| GPU | NVIDIA A100 GPU ×2 |
| GPU VRAM | 80 GB |
| OS | Ubuntu 16.04.7 |
| Deep Learning Framework | PyTorch 1.6.0 |
| Implementation Language | Python 3.7.10 |
| CUDA | 11.1 |

Hyper-parameters are tuned on the validation dataset by grid search. The specific values of hyper-parameters are listed below.

| Parameter | Value |
| --- | --- |
| Word Embedding Size | 312 |
| Entity Type Embedding Size | 200 |
| Lexical Annotation Embedding Size | 200 |
| BiLSTM Hidden Size | 256 |
| Local Attention Network Hidden Size | 128 |
| Global Attention Network Hidden Size | 128 |
| Dropout Rate | 0.5 |
| Batch Size | 16 |
| Epoch Size | 100 |
| Learning Rate | 0.002 |
| Fusion Ratio of Local and Global Attention | 0.44 |
| Optimizer | Adam |
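For illustration, below is a minimal sketch of the validation-set grid search; the candidate value lists and the train_and_validate helper are hypothetical, and only the selected values shown in the table above come from our experiments.

```python
# Minimal grid-search sketch over the validation set.
from itertools import product

# Candidate values are illustrative assumptions, not our actual search grid.
search_space = {
    "learning_rate": [0.0005, 0.001, 0.002],
    "dropout_rate": [0.3, 0.5],
    "fusion_ratio": [0.40, 0.44, 0.50],
}

best_f1, best_config = -1.0, None
for values in product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    # train_and_validate is a hypothetical helper: it trains on the training
    # split with `config` and returns the F1-score on the validation split.
    f1 = train_and_validate(config)
    if f1 > best_f1:
        best_f1, best_config = f1, config

print(f"best validation F1 = {best_f1:.4f} with {best_config}")
```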

We have supplemented the description of the experimental environment and parameter settings in Section 4.1.3. The specific implementation platform is shown in Table 4, and the specific hyper-parameter values are listed in Table 5.

[Highlighted Version, Page 10-11, Line 386-399]

 

[Comment 3]

 

  1. Authors employed two datasets for running the experiments, one with 633 and another with 1000 documents. I am wondering if there exist some public datasets including more documents and events for the Chinese Language. I also expect to see some justification for the reasons the authors choose these two datasets. At least the authors must provide the evaluation results based on some standard and public datasets to show fair comparison.

Response:

We propose an Event Detection Without Triggers Based on Dual Attention (EDWTDA) model in this paper.

In fact, there are already some public datasets with more documents or events for the Chinese language. However, these datasets either contain fewer event types or are not influential in the field of event detection research.

After investigation, we found that the ACE2005 dataset is the public dataset most widely recognized by researchers. It provides corpora in English, Chinese and Arabic, contains 633 documents, and defines 8 event types, 33 event subtypes and 35 event arguments. Most researchers conduct their experiments on ACE2005; therefore, we selected it for our experiments.

The dam safety operation log dataset contains 1000 reports, consisting of two parts: special inspection reports and daily inspection reports over the years. It covers 7 event types and 17 argument roles, including earthquake, heavy rain, flooding, pre-flood safety inspection, comprehensive special inspection, routine maintenance and daily inspection. The experiments on this dataset are a concrete application of our model in the field of dam safety. This practical case shows that our model achieves good results not only on an open, general-purpose dataset such as ACE2005, but also on domain-specific datasets, which is beneficial for engineering applications. For these reasons, we chose the dam safety operation log dataset as the second experimental dataset.

The ACE2005 dataset is currently the most recognized standard and public dataset for researchers. All baselines (DMCNN, HNN, HBTNGMA, NPN, TLNN, JMCEE, TBNNAM) cited in this paper use this dataset for experimental evaluation. Adhering to the principle of fair comparison, we compared our model with other baselines on the ACE2005 dataset and obtained convincing results.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper proposes a trigger‐free model to determine Chinese event types directly without the trigger identification process.

Here are some comments which can help to improve the paper.

 

1) I suggest authors spend one paragraph at the end of the Introduction section to describe the outlines in the paper.

 

2) In the Experiments section, it is needed to describe the experimental environments. I suggest authors have a specific subsection to describe the environment, such as the implementation platform, and parameter values,...

 

3) Authors employed two datasets for running the experiments, one with 633 and another with 1000 documents. I am wondering if there exist some public datasets including more documents and events for the Chinese Language. I also expect to see some justification for the reasons the authors choose these two datasets. At least the authors must provide the evaluation results based on some standard and public datasets to show fair comparison.

Author Response

We greatly appreciate your hard work and valuable suggestions on the revision of this manuscript. After receiving your suggestions, we carefully analyzed and revised the manuscript. Our replies to the review comments are as follows.

 

To facilitate your review, all revisions to the manuscript have been marked up in the revised version using the “Track Changes” function.

 

Alternatively, you can see revisions to the manuscript in the highlighted version.

 

Thank you again for your valuable review suggestions.

 

[Comment 1]

 

  1. There is no comparison of related work. Kindly add a comparison of related work with other triggers based on Dual Attention.

Response:

We propose an Event Detection Without Triggers Based on Dual Attention (EDWTDA) model in this paper.

Currently, the defects of Chinese event detection based on triggers include polysemous triggers and trigger-word mismatch. A trigger-free model that skips the trigger identification process and determines event types directly is proposed to fix these problems. To achieve this, we use a dual attention mechanism. Dual attention combines local attention and global attention: local attention captures key semantic information in sentences and simulates hidden event trigger words to solve trigger-word mismatch, while global attention mines the document context, fixing the problem of polysemous triggers.

Therefore, the main contribution of this paper is that, in contrast to traditional event detection based on trigger identification, we implement event detection without trigger words. The dual attention mechanism is our means of achieving this goal and is the main innovation. We compare EDWTDA with event detection methods based on trigger identification (DMCNN, HNN, HBTNGMA, NPN, TLNN, JMCEE), showing that EDWTDA can skip the trigger identification process and thereby avoid the polysemous triggers and trigger-word mismatch of traditional trigger-based event detection. In addition, we compare EDWTDA with the trigger-free event detection method TBNNAM, which demonstrates the effectiveness of the dual attention mechanism.
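To make the mechanism concrete, the following is a minimal PyTorch sketch of one way to organize the dual attention, assuming the sizes from our hyper-parameter table (512-dimensional BiLSTM outputs from the 256-unit bidirectional layer, 128-dimensional attention networks); the score functions, the 200-dimensional event-type embedding and all names are illustrative assumptions, not the paper's exact equations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttention(nn.Module):
    """Sketch: local attention scores sentence tokens against a candidate
    event-type embedding (simulating a hidden trigger); global attention
    scores document-level sentence representations against the same event."""

    def __init__(self, hidden=512, event_dim=200, attn_dim=128):
        super().__init__()
        self.local_proj = nn.Linear(hidden + event_dim, attn_dim)
        self.local_score = nn.Linear(attn_dim, 1)
        self.global_proj = nn.Linear(hidden + event_dim, attn_dim)
        self.global_score = nn.Linear(attn_dim, 1)

    def forward(self, token_states, doc_states, event_emb):
        # token_states: (T, hidden) BiLSTM outputs for the sentence
        # doc_states:   (S, hidden) representations of the document's sentences
        # event_emb:    (event_dim,) embedding of the candidate event type
        local_in = torch.cat(
            [token_states, event_emb.expand(token_states.size(0), -1)], dim=-1)
        alpha = F.softmax(
            self.local_score(torch.tanh(self.local_proj(local_in))), dim=0)
        v_local = (alpha * token_states).sum(dim=0)   # simulated hidden trigger
        global_in = torch.cat(
            [doc_states, event_emb.expand(doc_states.size(0), -1)], dim=-1)
        beta = F.softmax(
            self.global_score(torch.tanh(self.global_proj(global_in))), dim=0)
        v_global = (beta * doc_states).sum(dim=0)     # document-level context
        return v_local, v_global
```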

 

[Comment 2]

 

  1. Explain each layer of Figure 3.

Response:

Figure 3 shows the event detection architecture of EDWTDA, covering the input layer, the ALBERT embedding layer, feature construction layer, BiLSTM layer, attention layer (local attention and global attention), fusion gate layer and sigmoid layer.

The input layer consists of a complete sentence containing event information (Sentence), the complete document containing that sentence (Document), and the candidate event type of the sentence (Table Map). “Sentence” is fed to the encoding layer, and sentence-level semantic information is captured from its output by the local attention layer. “Document” is fed to the global attention layer to capture document-level semantic information. “Table Map” is fed to the dual attention layer to assist the attention networks in event detection without trigger words.

The ALBERT embedding layer transforms sentences and documents into embedding vectors; it can selectively use information from all of its layers and solves the polysemy problem found in traditional word embedding methods.

The word embedding vector, named entity recognition type and lexical annotation are concatenated to form the feature construction layer. This helps to increase the attention weight scores of keywords and to filter out irrelevant words.

The BiLSTM layer can effectively capture the semantic information of each word.

The attention layer includes local attention and global attention. On top of the BiLSTM layer, local attention mines keywords and simulates hidden event trigger words, avoiding trigger-word mismatch. To avoid ambiguous trigger words, global attention incorporates not only sentence-level semantics but also document-level context.

The fusion gate layer computes the weighting between the local and global attention vectors and fuses them into a single representation.

The sigmoid layer determines the event type of the sentence, and the focal loss function mitigates the problem of sample imbalance.
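As an illustration of the last two layers, here is a minimal sketch of a fusion gate feeding a sigmoid output trained with focal loss, continuing the DualAttention sketch above; the gate form and the focusing parameter gamma are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FusionGateHead(nn.Module):
    """Sketch: gate the local and global attention vectors, then emit
    P(sentence expresses the candidate event type) through a sigmoid."""

    def __init__(self, hidden=512):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, v_local, v_global):
        g = torch.sigmoid(self.gate(torch.cat([v_local, v_global], dim=-1)))
        fused = g * v_local + (1.0 - g) * v_global    # weighted fusion
        return torch.sigmoid(self.out(fused)).squeeze(-1)

def focal_loss(p, y, gamma=2.0, eps=1e-8):
    """Binary focal loss on sigmoid outputs: down-weights easy examples so
    the many negative (sentence, event type) pairs do not dominate training."""
    pos = y * (1 - p).pow(gamma) * (p + eps).log()
    neg = (1 - y) * p.pow(gamma) * (1 - p + eps).log()
    return -(pos + neg).mean()
```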

We have added and modified the content as shown in Section 3.

[Highlighted Version, Page 5, Line 188-214]

 

[Comment 3]

 

  1. How was the model trained?

Response:

We perform extensive experimental studies on the ACE2005 Chinese dataset and the dam safety operation log dataset. In all experiments, 80% of the data was used as the training set, 10% as the validation set and 10% as the test set.
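As a minimal sketch, the split can be implemented as follows; documents stands for a list of annotated documents loaded beforehand, and the seed is an illustrative choice.

```python
import random

random.seed(42)              # illustrative seed for a reproducible split
random.shuffle(documents)    # `documents`: annotated documents, assumed loaded
n = len(documents)
train = documents[: int(0.8 * n)]
valid = documents[int(0.8 * n): int(0.9 * n)]
test = documents[int(0.9 * n):]
```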

The experiments in this paper are implemented in the same software and hardware environment, and the specific environment configuration is shown below.

| Environment Type | Configuration Name | Configuration Parameter |
| --- | --- | --- |
| Hardware Environment | RAM | 256 GB |
| Hardware Environment | CPU | 64-core Intel(R) Xeon(R) Gold 6326 CPU @ 2.90 GHz |
| Hardware Environment | GPU | NVIDIA A100 GPU ×2 |
| Hardware Environment | GPU VRAM | 80 GB |
| Software Environment | OS | Ubuntu 16.04.7 |
| Software Environment | Deep Learning Framework | PyTorch 1.6.0 |
| Software Environment | Implementation Language | Python 3.7.10 |
| Software Environment | CUDA | 11.1 |

Hyper-parameters are tuned on the validation dataset by grid search; the specific values are listed below. ALBERT produces 312-dimensional word embedding vectors, and the trained lookup tables generate 200-dimensional entity type embedding vectors and 200-dimensional lexical annotation embedding vectors. The BiLSTM hidden layer size is set to 256, and both the local and global attention network hidden layer sizes are set to 128. A dropout layer is applied before the output layer to avoid overfitting, with the dropout rate set to 0.5. The batch size is 16, the number of training epochs is 100, and the model is optimized with the Adam optimizer at a learning rate of 0.002. The fusion ratio, which adjusts the weighted fusion of the local and global attention networks, is set to 0.44.

| Parameter | Value |
| --- | --- |
| Word Embedding Size | 312 |
| Entity Type Embedding Size | 200 |
| Lexical Annotation Embedding Size | 200 |
| BiLSTM Hidden Size | 256 |
| Local Attention Network Hidden Size | 128 |
| Global Attention Network Hidden Size | 128 |
| Dropout Rate | 0.5 |
| Batch Size | 16 |
| Epoch Size | 100 |
| Learning Rate | 0.002 |
| Fusion Ratio of Local and Global Attention | 0.44 |
| Optimizer | Adam |
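Putting the values in the table together, the training setup can be sketched as follows; model, train_loader and focal_loss are assumed to be defined as in the sketches above.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
for epoch in range(100):                      # Epoch Size = 100
    for batch in train_loader:                # Batch Size = 16
        probs = model(batch)                  # sigmoid outputs per (sentence, event type)
        loss = focal_loss(probs, batch["labels"].float())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```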

We compare our models with various baselines, as follows:

(1) DMCNN: Uses dynamic multi-pooling layers to retain more important information based on event trigger words and arguments.

(2) HNN: Combines CNN and BiLSTM to capture sequence and block information in specific contexts.

(3) HBTNGMA: Detects multiple events in a sentence using hierarchical and biased labeling networks, allowing automatic extraction and dynamic fusion of sentence-level and document-level information.

(4) NPN: Captures structural and semantic information from characters and words by learning a hybrid representation of each character to solve the word-trigger mismatch problem.

(5) TLNN: Integrates word and character dynamically to avoid word-trigger mismatch.

(6) JMCEE: Jointly performs prediction of event trigger words and event arguments based on shared feature representations of pre-trained language models to solve the common role overlap problem in practice.

(7) TBNNAM: Simulates hidden event trigger words to detect events without trigger words.

DMCNN uses Skip-Gram to pre-train word embeddings on the NYT corpus in an unsupervised manner, and NPN follows DMCNN's pre-training setup. HNN and TLNN also use the Skip-Gram pre-trained model. JMCEE uses a pre-trained language model for joint event extraction. HBTNGMA and TBNNAM use no pre-trained models. In this paper, EDWTDA uses ALBERT for pre-training. Section 4.2.3 (Ablation Analysis) contains ablation experiments in which ALBERT is replaced with Skip-Gram.

Finally, precision (P), recall (R) and F1-score (F1) are adopted as the evaluation metrics of the experiments.
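For clarity, the metrics can be computed as in the following sketch, assuming binary predictions and gold labels collected over all (sentence, event type) pairs.

```python
def precision_recall_f1(pred, gold):
    """pred, gold: parallel lists of 0/1 decisions over all candidate pairs."""
    tp = sum(p == 1 and g == 1 for p, g in zip(pred, gold))
    fp = sum(p == 1 and g == 0 for p, g in zip(pred, gold))
    fn = sum(p == 0 and g == 1 for p, g in zip(pred, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```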

 

[Comment 4]

 

  1. Do not leave empty spaces in sections and subsections.

4. Experiment

4.1. Experiment Preparation

4.1.1. Dataset

Response:

We greatly appreciate your comment. We have typeset the paper in strict accordance with the requirements of the journal Applied Sciences.

 

[Comment 5]

 

  1. The comparison needs to be explained for different event detection methods.

Response:

We compare our model with various baselines (DMCNN, HNN, HBTNGMA, NPN, TLNN, JMCEE, TBNNAM), and the results are shown below.

(a) ACE2005 dataset (Rec. = trigger recognition, Cls. = trigger classification; "/" = not applicable for trigger-free methods).

| Model | Rec. P | Rec. R | Rec. F1 | Cls. P | Cls. R | Cls. F1 |
| --- | --- | --- | --- | --- | --- | --- |
| DMCNN | 66.60 | 63.60 | 65.07 | 61.60 | 58.80 | 60.20 |
| HNN | 74.20 | 63.10 | 68.20 | 77.10 | 53.10 | 63.00 |
| HBTNGMA | 54.29 | 62.82 | 58.25 | 49.86 | 57.69 | 53.49 |
| NPN | 70.63 | 64.74 | 67.56 | 67.13 | 61.54 | 64.21 |
| TLNN | 67.34 | 74.68 | 70.82 | 64.45 | 71.47 | 67.78 |
| JMCEE | 84.30 | 80.40 | 82.30 | 76.40 | 71.70 | 73.97 |
| TBNNAM | / | / | / | 76.20 | 64.50 | 69.86 |
| EDWTDA | / | / | / | 79.80 | 75.60 | 77.64 |
(b) Dam safety operation log dataset (Rec. = trigger recognition, Cls. = trigger classification; "/" = not applicable for trigger-free methods).

| Model | Rec. P | Rec. R | Rec. F1 | Cls. P | Cls. R | Cls. F1 |
| --- | --- | --- | --- | --- | --- | --- |
| DMCNN | 75.64 | 72.58 | 74.09 | 71.43 | 69.75 | 70.58 |
| HNN | 83.57 | 72.35 | 77.56 | 87.26 | 68.57 | 76.79 |
| HBTNGMA | 64.85 | 71.69 | 68.10 | 60.72 | 74.35 | 66.84 |
| NPN | 80.23 | 74.46 | 77.24 | 78.59 | 76.18 | 77.36 |
| TLNN | 78.49 | 86.12 | 82.13 | 76.35 | 82.64 | 79.37 |
| JMCEE | 93.76 | 89.54 | 91.60 | 87.46 | 83.99 | 85.69 |
| TBNNAM | / | / | / | 87.09 | 75.35 | 80.79 |
| EDWTDA | / | / | / | 91.64 | 88.57 | 90.08 |
The task of event detection is split into two parts: trigger word recognition and trigger word classification. TBNNAM and EDWTDA are trigger-free event detection methods that determine the event type directly, so only the trigger classification results are available for them. EDWTDA outperforms the other baselines on both datasets, achieving the best precision, recall and F1-score. On the ACE2005 dataset, EDWTDA improves precision, recall and F1-score by 3.40%, 3.90% and 3.67%, respectively, over the best baseline, JMCEE. On the dam safety operation log dataset, EDWTDA improves precision, recall and F1-score by 4.18%, 4.58% and 4.39%, respectively, again over JMCEE.

The experimental results also show that DMCNN, as a classical event detection model, outperforms HBTNGMA in precision, recall and F1-score on both datasets, but still trails the other baselines by a clear margin. HNN achieves the highest precision on the ACE2005 dataset: by combining CNN and BiLSTM it captures context better, which improves precision, but its recall is the lowest. HBTNGMA (highlighted in gray in the paper's result tables) uses no pre-trained model in the experiments; it uses document-level information to detect multiple events in a sentence, and its trigger classification recall on the dam safety dataset exceeds DMCNN's. However, it does not solve the mismatch problem caused by word segmentation in Chinese text, which leaves it little room for improvement; adding a pre-trained model such as BERT could improve its performance, which again demonstrates the power of pre-trained models in NLP. By fusing character-level and lexical-level information, NPN obtains a hybrid representation that captures internal character composition and classifies event information accurately. NPN attempts to deal with trigger-word mismatch, but the improvements in precision and recall are limited. TLNN focuses on solving the trigger-word mismatch problem by creating a path that links the cell states of all characters between the start and end positions of a word. Compared with the other baselines, it significantly improves recall, but it ignores the document context, so its gains in precision and overall F1-score are limited. JMCEE focuses on the role-overlap problem and uses the shared feature representations of pre-trained language models to predict event triggers and arguments jointly. It achieves a breakthrough in precision, recall and F1-score, but it does not solve the trigger-word mismatch, which causes a sharp performance drop from trigger identification to trigger classification. TBNNAM reduces the expensive manual labeling cost and explores trigger-free event detection for the first time; its event detection precision is close to that of the best baseline. However, its Skip-Gram embeddings do not consider context, and the gradient of the MSE loss vanishes as the output probability approaches 0 or 1. Limited by these two problems, TBNNAM's recall is low, which leads to its poor F1-score.

Details can be found in Section 4.2.1.

[Highlighted Version, Page 12, Line 428-456]

 

[Comment 6]

 

  1. Add future work in the conclusions section.

Response:

In this paper, we propose EDWTDA to deal with polysemous triggers and trigger-word mismatch. EDWTDA uses the dual attention mechanism to capture key semantics in sentences and documents, fusing event-type information to simulate hidden event trigger words. It performs event detection without trigger words and avoids trigger-word mismatch; at the same time, it resolves polysemy using the context contained in the document. Finally, the effectiveness of each module of EDWTDA is verified through experimental analysis. EDWTDA significantly improves event detection performance, proving that event detection can work well without trigger words.

In future work, we will conduct experiments on more languages with and without explicit word separators. In addition, we will try developing a dynamic mechanism to selectively consider the semantic information rather than take all the senses of characters and words into account.

We have added future work in Section 5.

[Highlighted Version, Page 15, Line 561-564]

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have addressed most of the comments except for related work comparisons.

Kindly add a comparison in the related work, from line number 186, on how your event detection technique is different from the published dual attention method.

Rest is good to go.

Author Response

We greatly appreciate your hard work and valuable suggestions on the revision of this manuscript. After receiving your suggestions, we carefully analyzed and revised the manuscript. Our replies to the review comments are as follows.

To facilitate your review, all revisions to the manuscript have been marked up in the revised version using the “Track Changes” function.

Alternatively, you can see revisions to the manuscript below.

Thank you again for your valuable review suggestions.

 [Comment 1]

 The authors have addressed most of the comments except for related work comparisons. Kindly add a comparison in related work from line number 186 on how your event detection technique is different from the published dual attention method. The rest is good to go.

Response:

First of all, thank you for your valuable comments.

According to our survey, there is no published work on event detection based on dual attention; the dual attention mechanism adopted in this paper is a pioneering effort in the field of event detection. Because there is no previous dual-attention work in this field, we cannot provide a related-work comparison of dual attention mechanisms. Since the dual attention mechanism is proposed and applied to event detection for the first time in this manuscript, we conducted experiments in the Ablation Analysis [Section 4.2.3, Page 13, Line 487] to demonstrate its effectiveness.

In addition, and more importantly, this paper aims to propose a trigger-free event detection model that skips the trigger identification process and determines event types directly, so as to solve the problems of traditional event detection based on trigger identification. Therefore, this paper focuses on the comparison between models based on trigger identification and models without trigger words, as shown in the Performance Comparison Analysis [Section 4.2.1, Page 11, Line 405]. The experiments show that, compared with models based on trigger identification (DMCNN, HNN, HBTNGMA, NPN, TLNN, JMCEE), the trigger-free method adopted in this paper indeed improves event detection precision and recall.

In general, traditional event detection based on trigger identification suffers from polysemous triggers and trigger-word mismatch. Building on the advantages of trigger-free event detection and abundant document-level semantic information, we propose Event Detection Without Triggers Based on Dual Attention (EDWTDA). Dual attention combines local attention and global attention: local attention captures key semantic information in sentences and simulates hidden event trigger words to solve trigger-word mismatch, while global attention mines the document context, fixing the problem of polysemous triggers. Finally, the effectiveness of EDWTDA is verified through experimental analysis. EDWTDA significantly improves event detection performance, proving that event detection can work well without trigger words.

Author Response File: Author Response.pdf
