Article

A Hierarchical Heterogeneous Graph Attention Network for Emotion-Cause Pair Extraction

1 School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
2 The Engineering Research Center for Network Perception & Big Data of Hebei Province, Qinhuangdao 066004, China
3 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
4 The Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
* Authors to whom correspondence should be addressed.
Electronics 2022, 11(18), 2884; https://doi.org/10.3390/electronics11182884
Submission received: 15 August 2022 / Revised: 6 September 2022 / Accepted: 7 September 2022 / Published: 12 September 2022
(This article belongs to the Special Issue Advanced Machine Learning Applications in Big Data Analytics)

Abstract

Recently, graph neural networks (GNN), due to their compelling representation learning ability, have been exploited to deal with emotion-cause pair extraction (ECPE). However, current GNN-based ECPE methods mostly concentrate on modeling the local dependency relations between homogeneous nodes at the semantic granularity of clauses or clause pairs, while failing to take full advantage of the rich semantic information in the document. To solve this problem, we propose a novel hierarchical heterogeneous graph attention network to model the global semantic relations among nodes. In particular, our method introduces all types of semantic elements involved in ECPE, not just clauses or clause pairs. Specifically, we first model the dependency between clauses and words, in which word nodes also serve as an intermediary for the association between clause nodes. Secondly, a pair-level subgraph is constructed to explore the correlations between pair nodes and their different neighboring nodes. Representation learning of clauses and clause pairs is achieved by two levels of heterogeneous graph attention networks. Experiments on the benchmark datasets show that our proposed model achieves a significant improvement over 13 compared methods.

1. Introduction

As a research hotspot in natural language processing (NLP), emotion-cause extraction (ECE), which aims to extract the causes corresponding to the emotions specified in a given document, has been widely utilized in public opinion analysis, human–machine dialogue systems, and so on. Originally, taking events as the causes, Lee et al. [1] regarded ECE as a word-level sequence annotation task. Afterwards, some studies redefined the granularity of annotation in ECE to the clause level to make full use of context information [2,3]. Although annotating emotions in advance contributes to cause extraction, it is very labor-consuming, which limits the real application of the ECE approach. To solve this problem, Xia and Ding [4] put forward a new emotion analysis task called emotion-cause pair extraction (ECPE), which extracts emotion clauses and their corresponding cause clauses in pairs. ECPE does not rely on labeled emotions, so it is preferable to, but more challenging than, ECE. Furthermore, they also proposed a two-stage pipelined framework to handle this new task, in which the emotions and causes are first extracted and then paired. Since this two-stage approach may result in the cross-stage propagation of errors, many end-to-end approaches have been presented, achieving improvements over the two-stage approach. In the end-to-end ECPE approaches, the crucial issue is to learn good representations of the semantic elements. GNNs [5,6] can learn node representations based on node features and the graph structure; therefore, they constitute a powerful deep representation learning method and have been widely utilized in many application fields. Inspired by this, a few researchers have attempted to apply GNNs to the ECPE task. They mostly construct a homogeneous graph from the semantic information of a document and employ GNNs to learn the semantic representations. For example, Wei et al. [7] and Chen et al. [8] model the inter-clause and inter-pair relations, respectively.
Nevertheless, existing GNN-based ECPE approaches only concentrate on one semantic level, ignoring the rich semantic relations between different kinds of semantic elements. Hence, the captured semantic information is local, rather than global. In fact, in the ECPE task, a document involves semantic elements of different granularities, such as words, clauses, and clause pairs; hence, the constructed text graph should contain multiple types of nodes, i.e., it should be a heterogeneous graph. Furthermore, all the associations between these nodes can provide clues for extracting causality. Therefore, taking all semantic elements into account and modeling the global semantic relations between them is conducive to the joint extraction of emotion clauses, cause clauses, and emotion-cause pairs.
In this study, we propose an end-to-end hierarchical heterogeneous graph attention model (HHGAT). Different from the existing methods that only consider clause or pair nodes, we introduce word nodes into our heterogeneous graph, together with clause and pair nodes, to cover all semantic elements. In particular, the introduced word nodes not only allow fine-grained clause features to be extracted by modeling the dependency between clauses and words, but also act as intermediate nodes connecting clause nodes, thereby enriching the correlations between clause nodes. Moreover, a fully connected pair-level subgraph is established to capture the relations between a pair node and its neighboring nodes on different semantic paths. Relying on this "word-clause-pair" hierarchy, we model the global semantics of a document.

2. Related Work

Emotion analysis is an active research area in NLP. In many application scenarios, it is more important to understand the cause of an emotion than the emotion itself. Here, we focus on two challenging tasks, namely ECE and ECPE.

2.1. ECE

Different from traditional emotion classification, the purpose of ECE is to extract the causes of specific emotions. Lee et al. [1] first defined the ECE task and introduced a method based on linguistic rules (RB). Subsequently, a variety of RB methods were proposed for different linguistic patterns [9,10,11]. In addition, Russo et al. [12] designed a novel method combining RB and common-sense knowledge. However, the performance of these RB methods is usually unsatisfactory. Considering that it is impossible for rules to cover all language phenomena, several machine learning (ML)-based ECE methods were proposed. Gui et al. [13] designed two ML-based methods combined with 25 rules. Ghazi et al. [14] employed a conditional random field (CRF) to tag emotional causes. Moreover, Gui et al. [2] constructed a new clause-level corpus and utilized a support vector machine (SVM) to deal with the ECE task. Benefiting from the representation learning ability of deep learning (DL), some DL-based methods achieved excellent performance on ECE. Gui et al. [15] presented a new method based on a convolutional neural network (CNN). Cheng et al. [16] used long short-term memory networks (LSTM) to model the clauses. To obtain better context representations, a series of hierarchical models [17,18,19,20,21,22,23,24] were explored. Inspired by multitask learning, Chen et al. [25] and Hu et al. [26] focused on the joint extraction of emotion and cause. In addition, Ding et al. [27] and Xu et al. [28] reformulated ECE as a ranking problem. Considering the importance of emotion-independent features, Xiao et al. [29] presented a multi-view attention network. Recently, Hu et al. [30] proposed a graph convolutional network (GCN) integrating semantic and structural information, which is the state-of-the-art ECE method.

2.2. ECPE

2.2.1. Pipelined ECPE

ECE requires the manual annotation of emotion clauses before cause extraction, which is labor-consuming. To solve this problem, Xia and Ding [4] proposed a new task called ECPE and introduced three two-stage pipelined models, namely Indep, Inter-CE, and Inter-EC. Building on Inter-EC [4], Shan and Zhu [31] designed a new transformer-based [32] cause extraction component to improve the model. Yu et al. [33] applied self-distillation to train a mutually auxiliary multitask model. Jia et al. [34] realized the mutual promotion of emotion extraction and cause extraction by recursively modeling clauses. To improve the pairing stage of two-stage pipelined methods, Sun et al. [35] presented a dual-questioning attention network. Moreover, Shi et al. [36] simultaneously enhanced both stages of the pipelined method.

2.2.2. End-to-End ECPE

Although the pipelined approach has proved effective for ECPE, it leads to cross-stage error propagation. To solve this problem, a series of end-to-end ECPE approaches have been proposed.
Wu et al. [37] jointly trained the three subtasks of ECPE in a unified framework and shared clause features to exploit the interaction between the subtasks. To make full use of the implicit connection between emotion detection and emotion-cause pair extraction, Tang et al. [38] tackled these two tasks in a joint framework. Concentrating on the interaction between emotion-cause pairs, Ding et al. [39] presented a 2D transformer and two variants of it. Fan et al. [40] introduced a scope controller to concentrate the predicted distribution of emotion-cause pairs. Ding et al. [41] restricted ECPE to emotion-centered cause extraction within a sliding window and proposed a multi-label learning method. Cheng et al. [42] took advantage of two symmetrical subnetworks to conduct a local search [43,44] around the emotion or cause, respectively. Singh et al. [45] adopted the prediction results of emotion extraction to promote cause extraction. Considering the importance of order information, Fan et al. [46] captured the sequential features of clauses through three LSTMs: a forward LSTM, a backward LSTM, and a BiLSTM. Yang et al. [47] utilized the consistency of emotion type between the emotion clause and the clause pair. Chen et al. [48] achieved the mutual promotion of emotion extraction and cause extraction through iterative learning. Furthermore, several studies [49,50,51,52] independently reformulated ECPE as a sequence labeling problem.
Recently, some graph structure-based approaches have been proposed. Song et al. [53] treated ECPE as a link prediction task on a directed graph; however, they did not adopt a GNN, which is more suitable for modeling graph structures. Although Fan et al. [54] introduced a novel approach that regards ECPE as an action prediction task in directed graph construction, their model is not based on a GNN either. In addition, Wei et al. [7] exploited a graph attention network (GAT) to enhance inter-clause relation modeling and dealt with the ECPE task from a ranking perspective. Chen et al. [8] developed an approach based on a graph convolutional network to capture the relevance among local neighboring candidate pairs. However, the above graph-based approaches ignore the relationships between heterogeneous nodes, so they fail to model global semantics.

3. Methodology

3.1. Task Definition

In this section, the ECPE task is formalized as follows. Let $d = [c_1, \ldots, c_i, \ldots, c_m]$ be a document that contains $m$ clauses, where the $i$-th clause $c_i = [w_{i,1}, \ldots, w_{i,j}, \ldots, w_{i,n}]$ is further decomposed into a sequence of $n$ words. The aim of ECPE is to extract the emotion-cause pairs from $d$:
$$P = \{p_k\}_{k=1}^{|P|} = \{(c_k^{e}, c_k^{c})\}_{k=1}^{|P|},$$
where $c_k^{e}$ is the emotion clause in the $k$-th emotion-cause pair, $c_k^{c}$ is the corresponding cause clause, and $P$ represents the set of extracted pairs.
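To make the formulation concrete, the following is a minimal Python sketch of the data structures involved. It is our illustration, not the paper's code: the class and field names (Document, clauses) are hypothetical, and the two-clause example document is invented.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Document:
    clauses: List[List[str]]     # clauses[i] is the i-th clause (a word list)

# An extracted pair stores the indices of the emotion clause and
# its corresponding cause clause.
EmotionCausePair = Tuple[int, int]

# Invented two-clause example: the emotion "sad" is caused by clause 0.
doc = Document(clauses=[
    ["I", "lost", "my", "job"],           # c_1: cause clause
    ["so", "I", "felt", "very", "sad"],   # c_2: emotion clause
])
pairs: List[EmotionCausePair] = [(1, 0)]  # (emotion index, cause index)
```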

3.2. Overview

In this work, we first represent a document with a "word-clause-pair" heterogeneous graph, as illustrated in Figure 1. Then, we present a hierarchical heterogeneous graph attention network to model the "word-clause-pair" hierarchical structure and identify the emotion-cause pairs according to the learned node representations. As shown in Figure 2, our proposed model mainly includes three components: (1) the node initialization layer, which utilizes a word-level BiLSTM followed by a self-attention module, or pre-trained BERT, to obtain the initial semantic representations of word and clause nodes; (2) the clause node encoding layer, which employs a node-level heterogeneous graph attention network to integrate inner-clause contextual features into the clause representations by capturing the dependencies between clause nodes and the word nodes they contain; and (3) the pair node encoding layer, a meta-path-based heterogeneous graph attention network that first applies node-level attention and then meta-path-level attention. Finally, three multilayer perceptrons (MLP) are adopted to predict the emotion clauses, cause clauses, and emotion-cause pairs, respectively.

3.3. Heterogeneous Graph Construction

We denote our hierarchical heterogeneous graph as $G = (V, E)$, where $V = V_w \cup V_c \cup V_p$ is a node set consisting of three types of nodes, and $E$ stands for the edges between the nodes. $V_w = \bigcup_{i=1}^{m}\{w_{i,j}\}_{j=1}^{n}$, $V_c = \{c_i\}_{i=1}^{m}$, and $V_p = \bigcup_{i=1}^{m}\{p_{i,j}\}_{j=1}^{m}$ indicate the sets of word, clause, and pair nodes, respectively. As shown in Figure 2, a word-to-clause edge distinctly indicates which clause a word is contained in. The two clause nodes connected to the same pair node together form a candidate emotion-cause pair. Moreover, the association between two pair nodes is represented by a pair-to-pair edge.
On the one hand, most current methods employ the two clause-level subtasks (i.e., emotion extraction and cause extraction) in a unified framework to facilitate the detection of emotion-cause pairs. On the other hand, good clause representations are conducive to the feature construction of clause pairs. Hence, in order to learn the semantic representations of clause and pair nodes in detail, we divide our heterogeneous graph into two subgraphs, i.e., the word-clause subgraph $G^{wc} = (V_w \cup V_c, E^{wc})$ and the pair-level subgraph $G^{p} = (V_p, E^{p})$. Here, $E^{wc}$ denotes the word-to-clause edge set, and $E^{p}$ represents the pair-to-pair edge set. Furthermore, $G^{wc}$ and $G^{p}$ are further divided into a series of finer-grained subgraphs, i.e., $\bigcup_{i=1}^{m} G_i^{wc}$ and $\bigcup_{i=1}^{m} G_i^{p}$, respectively, to facilitate the formalized description of our algorithm.
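As an illustration of this construction, the following Python sketch enumerates the three node sets and two edge sets described above. It is a minimal sketch under our own conventions (0-based indices; the helper name build_heterogeneous_graph is hypothetical), not the authors' implementation.

```python
from itertools import product
from typing import List

def build_heterogeneous_graph(num_clauses: int, clause_lengths: List[int]):
    """Enumerate the three node sets and two edge sets of the
    word-clause-pair graph; nodes are identified by integer indices."""
    word_nodes = [(i, j) for i in range(num_clauses)
                  for j in range(clause_lengths[i])]
    clause_nodes = list(range(num_clauses))
    pair_nodes = list(product(range(num_clauses), repeat=2))

    # Word-to-clause edges E^wc: each word node connects to its clause.
    e_wc = [((i, j), i) for (i, j) in word_nodes]
    # Pair-to-pair edges E^p: pairs sharing the same candidate emotion
    # clause i form a fully connected pair-level subgraph G_i^p.
    e_p = [((i, j), (i, k)) for i in range(num_clauses)
           for j in range(num_clauses)
           for k in range(num_clauses) if j != k]
    return word_nodes, clause_nodes, pair_nodes, e_wc, e_p
```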

3.4. Hierarchical Heterogeneous Graph Attention Network

3.4.1. Node Initialization Layer

In this layer, a word embedding matrix $E_w \in \mathbb{R}^{d_w \times d_v}$ is first applied to transform each word $w_{i,j}$ into a vector $v_{i,j}$. Here, $d_w$ and $d_v$ are the vocabulary size and the embedding dimension, respectively. Next, the contextual information of each word is captured through a BiLSTM module:
$$[h_{i,1}^{w}, \ldots, h_{i,j}^{w}, \ldots, h_{i,n}^{w}] = \mathrm{BiLSTM}([v_{i,1}, \ldots, v_{i,j}, \ldots, v_{i,n}]),$$
where $h_{i,j}^{w}$ represents the hidden state of the $j$-th word in the $i$-th clause. Then, an attention module is adopted to aggregate the word representations in the clause $c_i$:
$$h_i^{s} = \mathrm{Attention}([h_{i,1}^{w}, \ldots, h_{i,j}^{w}, \ldots, h_{i,n}^{w}]),$$
where $h_i^{s}$ is the vector representation of the $i$-th clause.
Furthermore, inspired by BERT [55], we implement another version of the node initialization layer, which utilizes the pre-trained BERT model to replace the above BiLSTM and attention modules. The tokens [CLS] and [SEP] are inserted at the beginning and end of a given clause $c_i$, respectively, to obtain a sequence $c_i = [w_{\mathrm{CLS}}, w_{i,1}, \ldots, w_{i,j}, \ldots, w_{i,n}, w_{\mathrm{SEP}}]$. It is worth noting that, in the BERT version, $w_{i,j}$ represents the $j$-th token, rather than the $j$-th word, of the clause $c_i$. Afterwards, the sequences corresponding to all clauses in the document are concatenated into a whole sequence, which is then input to BERT. Through the stacked transformer modules, we obtain the output vectors $\bigcup_{i=1}^{m}\{h_{i,1}^{w}, \ldots, h_{i,n}^{w}\}$ and $\{h_i^{s}\}_{i=1}^{m}$, which are the initial representations of the word and clause nodes, respectively. Here, $h_i^{s}$ is the output vector of $w_{\mathrm{CLS}}$ corresponding to the clause $c_i$.
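A minimal PyTorch sketch of the non-BERT initialization follows, assuming all clauses of a document are padded to the same length $n$. The exact parameterization of the attention module is not specified above, so a single-layer scoring function is used here as one common choice; the class name NodeInit is hypothetical.

```python
import torch
import torch.nn as nn

class NodeInit(nn.Module):
    """Sketch of the non-BERT node initialization layer: a word-level
    BiLSTM followed by attention pooling over each clause."""
    def __init__(self, vocab_size: int, emb_dim: int = 200, hidden: int = 100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)  # scores each word state

    def forward(self, word_ids: torch.Tensor):
        # word_ids: (m, n) padded word ids, one row per clause
        h_w, _ = self.bilstm(self.emb(word_ids))   # (m, n, 2*hidden)
        a = torch.softmax(self.att(h_w), dim=1)    # attention over words
        h_s = (a * h_w).sum(dim=1)                 # (m, 2*hidden)
        return h_w, h_s   # word-node and clause-node initializations
```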

3.4.2. Clause Node Encoding Layer

Inner-clause relationships play an important role in semantic understanding. In addition, a word can also be treated as a specific relation between the clauses containing it. Therefore, to further learn the semantic representation of a clause node, we extract each clause node and its connected word nodes from the hierarchical graph to build a fine-grained word-clause subgraph. Given a constructed subgraph $G_i^{wc}$, with the clause node $c_i$ and word nodes $\{w_{i,j}\}_{j=1}^{n}$, we apply a heterogeneous graph attention network to update the representation of the clause node.
Since two types of nodes exist in the heterogeneous subgraph, different types of nodes may belong to different feature spaces. Consequently, the type-specific transformation matrices $W_s$ and $W_w$ are adopted to project the features of the clause and word nodes, which possibly have different dimensions, into the same feature space. The projection process is as follows:
$$\tilde{h}_i^{s} = W_s h_i^{s}, \quad \tilde{h}_{i,j}^{w} = W_w h_{i,j}^{w},$$
where $h_i^{s}$ is the initial representation of the clause node $c_i$, and $h_{i,j}^{w}$ denotes the initial representation of the word node $w_{i,j}$.
The node-level attention mechanism is then applied to learn the importance of different neighboring nodes to each target node. For a word-clause subgraph $G_i^{wc}$, the clause node $c_i \in V_c$ is the target node, while the corresponding neighboring nodes come from the word node set $\{w_{i,j}\}_{j=1}^{n}$. Specifically, importance scores are computed through a linear layer parameterized by $w_1$ and then normalized into weight coefficients via the softmax function. Next, according to these weight coefficients, node aggregation over the subgraph is conducted by weighted summation. In addition, we also apply a residual connection when updating the semantic representation of the clause node $c_i$. The specific process is as follows:
$$e_{i,j} = \mathrm{LeakyReLU}\big(w_1^{\top}\tanh(\tilde{h}_i^{s} \,\|\, \tilde{h}_{i,j}^{w})\big),$$
$$a_{i,j} = \frac{\exp(e_{i,j})}{\sum_{k=1}^{n}\exp(e_{i,k})},$$
$$\hat{h}_i^{s} = \mathrm{ReLU}\Big(\sum_{j=1}^{n} a_{i,j}\,\tilde{h}_{i,j}^{w} + b_w\Big) + h_i^{s},$$
where $w_1$ is a trainable weight vector, $b_w$ is the bias parameter, $\|$ denotes the concatenation operation, and $\top$ denotes transposition. As a result, the clause representation $\hat{h}_i^{s}$, which integrates word semantics, is generated.
Once the updated node representation $\hat{h}_i^{s}$ is obtained, it is fed into the emotion clause classifier to determine whether the clause corresponding to $c_i$ is an emotion clause. The classifier is implemented as a linear layer (parameterized by $w_e$ and $b_e$) with the sigmoid function:
$$\hat{y}_i^{e} = \mathrm{sigmoid}(w_e^{\top}\hat{h}_i^{s} + b_e),$$
where $\hat{y}_i^{e}$ is the predicted probability that the clause node $c_i$ is an emotion clause. The cause probability $\hat{y}_i^{c}$ is computed in the same way as $\hat{y}_i^{e}$, except that the parameters are replaced by $w_c$ and $b_c$.
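The following PyTorch sketch illustrates this layer for all word-clause subgraphs of one document at once: the type-specific projections, node-level attention, residual update, and emotion classifier. It is our illustration, not the released code; it assumes equal (padded) clause lengths and that the projected dimension d equals the clause feature dimension d_s, so the residual connection type-checks. The class name ClauseNodeEncoder is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClauseNodeEncoder(nn.Module):
    """Sketch of the word-clause heterogeneous attention (one subgraph
    G_i^wc per clause) plus the emotion classifier on top of it."""
    def __init__(self, d_s: int, d_w: int, d: int):
        super().__init__()
        self.W_s = nn.Linear(d_s, d, bias=False)   # clause-type projection
        self.W_w = nn.Linear(d_w, d, bias=False)   # word-type projection
        self.w1 = nn.Linear(2 * d, 1, bias=False)  # node-level attention
        self.b_w = nn.Parameter(torch.zeros(d))
        self.cls_e = nn.Linear(d, 1)               # emotion classifier

    def forward(self, h_s: torch.Tensor, h_w: torch.Tensor):
        # h_s: (m, d_s) clause features; h_w: (m, n, d_w) word features
        hs_t = self.W_s(h_s)                       # (m, d)
        hw_t = self.W_w(h_w)                       # (m, n, d)
        target = hs_t.unsqueeze(1).expand_as(hw_t)
        e = F.leaky_relu(self.w1(torch.tanh(
            torch.cat([target, hw_t], dim=-1))))   # (m, n, 1)
        a = torch.softmax(e, dim=1)                # weights over words
        # residual connection; assumes d == d_s
        h_hat = torch.relu((a * hw_t).sum(1) + self.b_w) + h_s
        y_e = torch.sigmoid(self.cls_e(h_hat)).squeeze(-1)
        return h_hat, y_e
```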

3.4.3. Pair Node Encoding Layer

It can be observed that there are only simple subordinate relationships, rather than complex semantic relationships, between the clause and pair nodes. Hence, we only need to consider the pair nodes and the correlations between them when performing the subgraph segmentation in this section. Furthermore, in a fine-grained pair-level subgraph $G_i^{p}$, the neighboring nodes of a node $p_{i,j}$ are restricted to the nodes with the same candidate emotion clause. Therefore, a pair-level, fully connected subgraph is formalized as $G_i^{p} = (\{p_{i,j}\}_{j=1}^{m}, E_i^{p})$. Moreover, a meta-path $\Phi_t$ is defined as a path of the form $p_{i,k} \to \cdots \to p_{i,j-1} \to p_{i,j}$ or $p_{i,j} \to p_{i,j+1} \to \cdots \to p_{i,k}$, where $t = |k - j|$ represents the number of hops from a source node $p_{i,k}$ to the target node $p_{i,j}$. According to the statistics of [8], the distance between an emotion clause and its corresponding cause clause is less than or equal to 2 in 95.8% of cases. Taking this into account, we introduce four kinds of meta-paths: $\Phi_0$, $\Phi_1$, $\Phi_2$, and $\Phi_3$. Different from the other three path types, $\Phi_3$ indicates that the length of the path from the source node to the target node is no less than 3.
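To make the meta-path definition concrete, the following Python sketch builds the neighbor masks $I^{\Phi_t}$ for one pair-level subgraph. It is our illustration (the function name meta_path_masks is hypothetical); it encodes $t = |k - j|$ for $\Phi_0$, $\Phi_1$, $\Phi_2$, and groups all hops of length 3 or more under $\Phi_3$, as described above.

```python
import torch

def meta_path_masks(m: int, T: int = 3) -> torch.Tensor:
    """Node masks for meta-paths Phi_0..Phi_T over one pair-level
    subgraph G_i^p with m pair nodes p_{i,1..m}.
    masks[t, j, k] == 1 iff p_{i,k} is a Phi_t-based neighbor of
    p_{i,j}, i.e., |k - j| == t (or |k - j| >= T for the last path)."""
    idx = torch.arange(m)
    dist = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs()   # dist[j, k] = |k - j|
    masks = torch.stack([(dist == t).float() for t in range(T)]
                        + [(dist >= T).float()])         # (T+1, m, m)
    return masks
```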
Given a pair-level subgraph $G_i^{p}$, the initial representation $h_{i,j}^{p}$ of a node $p_{i,j} = (c_i^{e}, c_j^{c})$ in $G_i^{p}$ is obtained by concatenating three vectors:
$$h_{i,j}^{p} = \hat{h}_i^{s} \,\|\, \hat{h}_j^{s} \,\|\, h_{i,j}^{rep},$$
where $\hat{h}_i^{s}$ and $\hat{h}_j^{s}$ represent the semantic representations of the candidate emotion clause $c_i^{e}$ and the candidate cause clause $c_j^{c}$, respectively, and $h_{i,j}^{rep}$ indicates the relative position embedding, which is randomly initialized by sampling from a uniform distribution. Considering that the meta-path-based neighbors play different roles in the representation of each node, we apply a meta-path-based graph attention network, which aggregates the features of the neighboring nodes along the different path types to update the representation of this node. Specifically, two aggregation operations need to be performed.
Firstly, node-level attention is leveraged to aggregate the path-specific node representations. Specifically, for all pair nodes in the subgraph $G_i^{p}$, a shared linear transformation, followed by the tanh function, is employed. Given a target node $p_{i,j}$ and a meta-path $\Phi_t$, the importance score $\tilde{e}_{(i,j),(i,k)}^{\Phi_t}$ of a neighboring node $p_{i,k}$ that is connected to node $p_{i,j}$ through the meta-path $\Phi_t$ is calculated; it reflects the importance of node $p_{i,k}$ to node $p_{i,j}$. The scores of all the $\Phi_t$-based neighboring nodes are then normalized via the softmax function. By weighted summation, the $\Phi_t$-specific aggregate representation $\tilde{h}_{i,j}^{\Phi_t}$ of the node $p_{i,j}$ is generated:
$$\tilde{h}_{i,j}^{p} = W_p h_{i,j}^{p}, \quad \tilde{h}_{i,k}^{p} = W_p h_{i,k}^{p},$$
$$\tilde{e}_{(i,j),(i,k)}^{\Phi_t} = \mathrm{LeakyReLU}\big(w_{\Phi_t}^{\top}\tanh(\tilde{h}_{i,j}^{p} \,\|\, \tilde{h}_{i,k}^{p})\big),$$
$$e_{(i,j),(i,k)}^{\Phi_t} = I_{(i,j),(i,k)}^{\Phi_t} \cdot \tilde{e}_{(i,j),(i,k)}^{\Phi_t}, \quad I_{(i,j),(i,k)}^{\Phi_t} = \begin{cases} 1, & p_{i,k} \in P_{i,j}^{\Phi_t} \\ 0, & p_{i,k} \notin P_{i,j}^{\Phi_t} \end{cases},$$
$$a_{(i,j),(i,k)}^{\Phi_t} = \frac{\exp\big(e_{(i,j),(i,k)}^{\Phi_t}\big)}{\sum_{k'=1}^{m}\exp\big(e_{(i,j),(i,k')}^{\Phi_t}\big)},$$
$$\tilde{h}_{i,j}^{\Phi_t} = \mathrm{ReLU}\Big(\sum_{k=1}^{m} a_{(i,j),(i,k)}^{\Phi_t}\,\tilde{h}_{i,k}^{p} + b_{\Phi_t}\Big),$$
where $W_p$ is a trainable weight matrix, $w_{\Phi_t}$ is a trainable weight vector, $b_{\Phi_t}$ denotes the bias, and $h_{i,j}^{p}$ represents the initial feature of node $p_{i,j}$. In addition, $I_{(i,j),(i,k)}^{\Phi_t}$ is the node mask, which injects structural information into the model; $I_{(i,j),(i,k)}^{\Phi_t} = 1$ means that $p_{i,k}$ belongs to the $\Phi_t$-based neighboring node set $P_{i,j}^{\Phi_t}$ of $p_{i,j}$.
Secondly, path-level attention is applied to measure the importance of the different meta-paths to the target node. For this purpose, the path-specific aggregate representations obtained by the preceding node-level attention are transformed into weight values through a linear transformation. After that, the softmax function is employed to normalize these weight values, so as to obtain the weight coefficients of the different paths. Using the learned weight coefficients, the aggregate representations from the different meta-paths are fused with the initial node representation $h_{i,j}^{p}$. The final semantic representation $\hat{h}_{i,j}^{p}$ of node $p_{i,j}$ is obtained by:
$$a_{i,j}^{\Phi_t} = \frac{\exp\big(w_2^{\top}\tilde{h}_{i,j}^{\Phi_t}\big)}{\sum_{t'=0}^{T}\exp\big(w_2^{\top}\tilde{h}_{i,j}^{\Phi_{t'}}\big)},$$
$$\hat{h}_{i,j}^{p} = \sum_{t=0}^{T} a_{i,j}^{\Phi_t}\,\tilde{h}_{i,j}^{\Phi_t} + h_{i,j}^{p},$$
where $w_2$ is a trainable transformation vector, the meta-path $\Phi_t$ belongs to the path set $\Phi = \{\Phi_t\}_{t=0}^{T}$, and $T = |\Phi| - 1$. $a_{i,j}^{\Phi_t}$ represents the weight coefficient of meta-path $\Phi_t$ for node $p_{i,j}$. Here, it is worth noting that different target nodes induce different weight distributions over the meta-paths.
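The following PyTorch sketch illustrates the two aggregation steps for one pair-level subgraph. It is our illustration under stated assumptions, not the released implementation: it replaces the multiplicative node mask $I^{\Phi_t}$ with the standard masked-softmax variant (non-neighbor scores set to negative infinity), and the residual connection assumes the projected dimension d equals the input dimension d_p. The class name PairNodeEncoder is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairNodeEncoder(nn.Module):
    """Sketch of the meta-path-based attention over one pair-level
    subgraph: node-level aggregation per path Phi_t, then path-level
    fusion with a residual connection to the initial pair features."""
    def __init__(self, d_p: int, d: int, num_paths: int = 4):
        super().__init__()
        self.W_p = nn.Linear(d_p, d, bias=False)
        self.w_phi = nn.ModuleList(
            [nn.Linear(2 * d, 1, bias=False) for _ in range(num_paths)])
        self.b_phi = nn.Parameter(torch.zeros(num_paths, d))
        self.w2 = nn.Linear(d, 1, bias=False)     # path-level attention

    def forward(self, h_p: torch.Tensor, masks: torch.Tensor):
        # h_p: (m, d_p) initial pair features; masks: (T+1, m, m)
        h_t = self.W_p(h_p)                                   # (m, d)
        m = h_t.size(0)
        src = h_t.unsqueeze(1).expand(m, m, -1)               # target j
        dst = h_t.unsqueeze(0).expand(m, m, -1)               # neighbor k
        per_path = []
        for t, w in enumerate(self.w_phi):
            e = F.leaky_relu(
                w(torch.tanh(torch.cat([src, dst], dim=-1)))).squeeze(-1)
            e = e.masked_fill(masks[t] == 0, float("-inf"))   # Phi_t only
            a = torch.nan_to_num(torch.softmax(e, dim=-1))    # empty rows -> 0
            per_path.append(torch.relu(a @ h_t + self.b_phi[t]))
        h_paths = torch.stack(per_path)                       # (T+1, m, d)
        beta = torch.softmax(self.w2(h_paths).squeeze(-1), dim=0)
        # residual connection; assumes d == d_p
        h_hat = (beta.unsqueeze(-1) * h_paths).sum(0) + h_p
        return h_hat
```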
Then, a logistic regression layer (parameterized by $w_p$ and $b_p$) is utilized to identify whether each pair node is a true emotion-cause pair node:
$$\hat{y}_{i,j}^{p} = \mathrm{sigmoid}(w_p^{\top}\hat{h}_{i,j}^{p} + b_p).$$

3.5. Model Training and Optimization

The loss function for extracting the emotion-cause pairs from a given document $d$ is formulated as follows:
$$L^{p} = -\frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\Big(y_{i,j}^{p}\log\hat{y}_{i,j}^{p} + (1-y_{i,j}^{p})\log(1-\hat{y}_{i,j}^{p})\Big),$$
where $y_{i,j}^{p}$ is the ground truth of node $p_{i,j}$. To benefit from the other two subtasks, the loss terms of emotion extraction and cause extraction are also introduced. For simplicity, only the loss term of emotion extraction is given below:
$$L^{e} = -\frac{1}{m}\sum_{i=1}^{m}\Big(y_i^{e}\log\hat{y}_i^{e} + (1-y_i^{e})\log(1-\hat{y}_i^{e})\Big),$$
where $y_i^{e}$ is the emotion annotation of the clause $c_i$. Therefore, the total loss of our model is
$$L_{total} = L^{p} + L^{e} + L^{c}.$$
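As a concrete reading of these objectives, the following is a minimal PyTorch sketch (the function name total_loss is ours), assuming the predicted probabilities and 0/1 labels are given as float tensors; F.binary_cross_entropy with its default mean reduction matches the averaged cross-entropy terms above.

```python
import torch.nn.functional as F

def total_loss(y_hat_p, y_p, y_hat_e, y_e, y_hat_c, y_c):
    """L_total = L^p + L^e + L^c, each a mean binary cross-entropy.
    y_hat_* are predicted probabilities; y_* are 0/1 float labels."""
    l_p = F.binary_cross_entropy(y_hat_p, y_p)  # averaged over m*m pairs
    l_e = F.binary_cross_entropy(y_hat_e, y_e)  # averaged over m clauses
    l_c = F.binary_cross_entropy(y_hat_c, y_c)
    return l_p + l_e + l_c
```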
Finally, the purpose of the model training is to minimize the total loss. The overall process is shown in Algorithm 1.
Algorithm 1: The overall process of HHGAT.
Input: The heterogeneous graph $G = (V, E)$, $V = V_w \cup V_c \cup V_p$;
            the initial feature $h_i^{s}$ of each clause node $c_i \in V_c = \{c_i\}_{i=1}^{m}$;
            the initial feature $h_{i,j}^{w}$ of each word node $w_{i,j} \in V_w = \bigcup_{i=1}^{m}\{w_{i,j}\}_{j=1}^{n}$.
Output: The clause node representations $\{\hat{h}_i^{s}\}_{i=1}^{m}$;
              the pair node representations $\bigcup_{i=1}^{m}\{\hat{h}_{i,j}^{p}\}_{j=1}^{m}$.
for each word-clause subgraph $G_i^{wc} \in G^{wc}$ do
    Project the clause feature: $\tilde{h}_i^{s} = W_s h_i^{s}$;
    for each word node $w_{i,j} \in \{w_{i,j}\}_{j=1}^{n}$ do
            Project the word feature: $\tilde{h}_{i,j}^{w} = W_w h_{i,j}^{w}$;
            Calculate the node-level weight coefficient $a_{i,j}$;
    Update the clause node feature: $\hat{h}_i^{s} = \mathrm{ReLU}\big(\sum_{j=1}^{n} a_{i,j}\tilde{h}_{i,j}^{w} + b_w\big) + h_i^{s}$;
for each pair-level subgraph $G_i^{p} \in G^{p}$ do
    for each pair node $p_{i,j} \in G_i^{p}$ do
            Initialize the node representation: $h_{i,j}^{p} = \hat{h}_i^{s} \| \hat{h}_j^{s} \| h_{i,j}^{rep}$;
            Project the pair feature: $\tilde{h}_{i,j}^{p} = W_p h_{i,j}^{p}$;
            for each meta-path $\Phi_t \in \Phi$ do
               for each $\Phi_t$-based neighboring node $p_{i,k} \in P_{i,j}^{\Phi_t}$ do
                   Calculate the node-level weight coefficient $a_{(i,j),(i,k)}^{\Phi_t}$;
               Aggregate the node features: $\tilde{h}_{i,j}^{\Phi_t} = \mathrm{ReLU}\big(\sum_{k=1}^{m} a_{(i,j),(i,k)}^{\Phi_t}\tilde{h}_{i,k}^{p} + b_{\Phi_t}\big)$;
               Calculate the weight coefficient $a_{i,j}^{\Phi_t}$ of meta-path $\Phi_t$;
            Update the pair node feature: $\hat{h}_{i,j}^{p} = \sum_{t=0}^{T} a_{i,j}^{\Phi_t}\tilde{h}_{i,j}^{\Phi_t} + h_{i,j}^{p}$;
Calculate the total loss $L_{total} = L^{p} + L^{e} + L^{c}$;
Back-propagate and update the parameters;
return $\{\hat{h}_i^{s}\}_{i=1}^{m}$, $\bigcup_{i=1}^{m}\{\hat{h}_{i,j}^{p}\}_{j=1}^{m}$.
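To make Algorithm 1 concrete, the following sketch composes the hypothetical classes from the earlier sketches (NodeInit, ClauseNodeEncoder, PairNodeEncoder, meta_path_masks) into one forward pass over a document. It is an illustration of the data flow, not the authors' implementation; the cause classifier (analogous to the emotion one) is omitted for brevity, and m_max is an assumed bound on the number of clauses per document.

```python
import torch
import torch.nn as nn

class HHGATSketch(nn.Module):
    """Illustrative composition of the earlier sketches; hidden sizes
    follow Section 4.2 (BiLSTM hidden 100, position embedding 50)."""
    def __init__(self, vocab_size: int, m_max: int = 75):
        super().__init__()
        self.m_max = m_max                              # assumed max clauses
        self.init_layer = NodeInit(vocab_size)          # 200-dim outputs
        self.clause_layer = ClauseNodeEncoder(200, 200, 200)
        self.rel_pos = nn.Embedding(2 * m_max + 1, 50)  # relative positions
        self.pair_layer = PairNodeEncoder(450, 450)     # 200 + 200 + 50
        self.cls_p = nn.Linear(450, 1)                  # pair classifier

    def forward(self, word_ids: torch.Tensor):
        # word_ids: (m, n) padded word ids of one document's clauses
        h_w, h_s = self.init_layer(word_ids)            # initialize nodes
        h_hat_s, y_e = self.clause_layer(h_s, h_w)      # clause encoding
        m = h_hat_s.size(0)
        masks = meta_path_masks(m)                      # (4, m, m)
        y_p = []
        for i in range(m):                              # one subgraph G_i^p
            rel = self.rel_pos(torch.arange(m) - i + self.m_max)
            h_p = torch.cat(
                [h_hat_s[i].expand(m, -1), h_hat_s, rel], dim=-1)
            h_hat_p = self.pair_layer(h_p, masks)       # pair encoding
            y_p.append(torch.sigmoid(self.cls_p(h_hat_p)).squeeze(-1))
        return y_e, torch.stack(y_p)                    # (m,), (m, m)
```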

4. Experiments

4.1. Dataset and Evaluation Metrics

To evaluate our method, we utilized the benchmark ECPE dataset released by Xia and Ding [4], which consists of 1945 Chinese news documents. These documents contain a total of 490,367 candidate pairs, of which the real emotion-cause pairs account for less than 1%, and each document may contain more than one emotion, each corresponding to multiple causes. Following the data-split setting of previous work, the dataset was divided into 10 equal parts, with a 9:1 split between the training and test sets. To achieve statistically credible verification, we applied 10-fold cross-validation and repeated the experiments 20 times, averaging the results. Furthermore, precision (P), recall (R), and F1-score (F1) were selected as the evaluation metrics for emotion, cause, and emotion-cause pair extraction.
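As a reference for the pair-level metrics, the following small Python sketch computes P, R, and F1 under the exact-match convention of this benchmark, where a predicted (emotion, cause) index pair counts as correct only if it matches a gold pair; the function name pair_prf is ours.

```python
def pair_prf(predicted: set, gold: set):
    """Pair-level precision, recall, and F1 under exact matching."""
    correct = len(predicted & gold)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# For example, with one gold pair (c_4, c_3) and two predictions:
print(pair_prf({(4, 3), (2, 1)}, {(4, 3)}))  # (0.5, 1.0, 0.666...)
```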

4.2. Experimental Settings

In our experiments, to make a fair comparison, the word embeddings trained in [4] are utilized in our method. The dimensions of the word embedding, BiLSTM hidden state, and relative position embedding were set to 200, 100, and 50, respectively. In addition, for the BERT version of our model, the output dimension of the pre-trained BERT is 768. The weight matrices and bias vectors involved in the two versions of our model were all randomly initialized from a continuous uniform distribution $U(-0.01, 0.01)$. To avoid overfitting, we applied dropout with a rate of 0.1. Compared to some excellent global optimization algorithms [56,57,58], Adam [59] is more effective in deep learning. Therefore, in the training process of our model, we utilized the Adam optimizer to update all parameters, with a learning rate of 0.005, a mini-batch size of 32, and an $L_2$ regularization coefficient of $1 \times 10^{-5}$. Our models were run on NVIDIA GeForce RTX 2080 Ti GPUs.
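In PyTorch, this optimizer setup might look as follows; `model` reuses the illustrative HHGATSketch class from the Section 3.5 sketch (the vocabulary size is invented), and the $L_2$ coefficient maps to Adam's weight_decay argument, the usual way to apply it in PyTorch.

```python
import torch

model = HHGATSketch(vocab_size=20000)  # vocabulary size is illustrative
optimizer = torch.optim.Adam(model.parameters(),
                             lr=0.005,           # learning rate
                             weight_decay=1e-5)  # L2 regularization
```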

4.3. Compared Methods

We compared our method with the following state-of-the-art methods. It is worth noting that the models above the dotted line in Table 1 did not adopt BERT.
  • Inter-EC, which uses emotion extraction to facilitate cause extraction, achieves the best performance among the three pipelined methods proposed in [4].
  • Inter-ECNC [31], as a variant of Inter-EC, employs transformer to optimize the extraction of cause clauses.
  • DQAN [35] is a dual-questioning attention network, separately questioning candidate emotions and causes.
  • E2EECPE [53] is an end-to-end link prediction model of directed graph, which establishes the directional links from emotions to causes by a biaffine attention.
  • MTNECP [37] is a feature-shared, multi-task model and improves cause extraction with the help of position-aware emotion information.
  • SLSN [42] is a symmetrical network composed of two subnetworks; each subnetwork performs a local pairing search while extracting its target clauses.
  • LAE-MANN [38] explores a hierarchical attention to model the correlation between each pair of clauses.
  • TDGC [54] is a transition-based, end-to-end model that regards ECPE as the construction process of a directed graph.
  • ECPE-2D [39] designs a 2D transformer to model the interaction between candidate pairs.
  • PairGCN [8] employs a GCN to learn the dependency relations between candidate pairs.
  • UTOS [52] redefines the ECPE task as a unified sequence labeling task, in which each label indicates not only the clause type but also the pairing index.
  • RANKCP [7] is a ranking model that introduces a GAT to learn the representations of clauses.
  • RSN [48] explicitly realizes the pairwise interaction between the three subtasks through multiple rounds of inference.

4.4. Main Results

The comparative results are shown in Table 1. We can observe that HHGAT achieves the best performance. In general, the end-to-end models clearly outperform the pipelined models (e.g., Inter-EC, Inter-ECNC, and DQAN) because the end-to-end manner avoids the cross-stage propagation of errors. In addition, models with pre-trained BERT usually achieve better performance than those without it. Notably, in terms of the F1-score, the non-BERT version of HHGAT outperforms SLSN (the best-performing LSTM-based model that does not employ pre-trained BERT) by 1.09% on emotion-cause pair extraction, which verifies the effectiveness of HHGAT for emotion-cause pair extraction.
By adopting BERT to encode the initial node representations, the performance of HHGAT is further improved. Although LAE-MANN also designs a hierarchical attention network, it is not oriented to graph structures, so it is inferior to our graph attention network in modeling the structural features of text. As shown in Table 1, LAE-MANN underperforms HHGAT by 9.75% in the F1-score of emotion-cause pair extraction. Inspired by Inter-EC, which utilizes the prediction results of emotions to promote cause extraction, ECPE-2D, UTOS, and RSN explicitly establish the interaction between emotion and cause, in their respective ways, to improve their performance. However, even without such measures, our model still outperforms them. Compared to the best-performing model, RSN, the F1-scores of HHGAT increase by 1.51%, 1.41%, and 1.32% on emotion, cause, and emotion-cause pair extraction, respectively. This demonstrates that, even when the interaction between emotion and cause is not explicitly constructed, HHGAT can achieve excellent performance owing to the powerful modeling ability of the graph neural network.
Furthermore, TDGC, PairGCN, and RANKCP all employ graph structures to represent documents. However, TDGC is realized not with a graph neural network but with an LSTM, so its performance is the worst among these graph structure-based methods. Although PairGCN and RANKCP employ a GCN and a GAT to learn node representations, respectively, they are both oriented to homogeneous graphs. This leads them to focus only on learning the correlations between the same kind of semantic elements. Different from them, our heterogeneous graph contains more kinds of nodes and richer semantic information. Compared to these three graph structure-based methods, our method improves the F1-score of emotion-cause pair extraction by 7.26%, 3.23%, and 1.65%, respectively. In summary, the experimental results indicate that our heterogeneous graph-based method is effective.

4.5. Ablation Study

To further validate the components of our model, we conduct an ablation experiment, where G1 denotes the clause node encoding layer, G2 represents the pair node encoding layer, and H1 and H2 correspond to the heterogeneous design of G1 and G2, respectively. The ablation results are shown in Table 2.
Firstly, removing G2 from HHGAT eliminates the dependency relations between local neighboring candidate pairs. As a result, the F1-score of emotion-cause pair extraction decreases by 1.64%. This demonstrates that relying solely on modeling the word-clause connections is not enough. In particular, without an explicit interaction between emotion and cause, the local context from neighboring pair nodes plays an important role in pairing the emotions and their corresponding causes.
Secondly, HHGAT w/o G2&H1 only applies a homogeneous graph attention network to learn the inter-clause relationships. Compared with HHGAT, its F1-score on emotion-cause pair extraction drops by 4.5%. This significant degradation is mainly caused by two factors. On the one hand, as the basic elements of clauses, words provide more fine-grained semantic information. On the other hand, word nodes enrich the correlations among clause nodes.
Then, HHGAT w/o G1 underperforms HHGAT by 0.98%, 2.09%, and 3.61% in the F1 scores of the three subtasks, respectively, which shows that our hierarchical design is beneficial to the ECPE task. This is because there is a natural hierarchical relationship between different semantic elements in human language. In addition, in the joint learning of three subtasks, good clause representation is helpful for the extraction of emotion-cause pairs.
Next, we can observe that the performance of HHGAT w/o G1&H2 drops further compared with HHGAT w/o G1, because HHGAT w/o G1&H2 does not consider that the semantic information aggregated from neighboring nodes differs across meta-paths. Hence, to learn more comprehensive pair node representations, it is necessary to employ a meta-path-based graph attention network on the pair-level subgraphs.
Finally, HHGAT w/o G1&G2 uses a clause-level BiLSTM to replace our two-level graph attention network, which means that it is no longer a GNN-based method. Consequently, HHGAT w/o G1&G2 achieves the worst performance among all ablation models (the F1-score drops by 5.51%). The above results further show that each module of our method is helpful for the ECPE task.

4.6. Evaluation on Emotion-Cause Extraction

To provide a wider comparison, we also evaluate our model on the benchmark ECE corpus [2], and the compared models are as follows:
  • Multi-Kernel [2] proposes a convolution kernel-based learning method to train a multi-kernel SVM.
  • Memnet [15] is a convolutional deep memory network, which regards ECE as an answer retrieval task.
  • PAE-DGL [27], as a reordering model, integrates the relative position and global label, with text content.
  • CANN [18] presents a co-attention network based on emotional context awareness.
  • MBiAS [24] designs a multi-granularity bidirectional attention network in a machine comprehension frame.
  • RTHN [21] introduces a hierarchical neural network composed of RNN and transformer.
  • FSS-GCN [30] adopts a graph convolutional network to model the dependency information between clauses.
The comparative results are shown in Figure 3. It can be observed that our model achieves a slightly higher F1-score than RTHN (the best-performing model among those not based on graph neural networks). This further verifies the effectiveness of our approach for emotion-cause extraction. Furthermore, the performance of our model and FSS-GCN (a graph structure-based model) is nearly identical in terms of the F1-score. Different from FSS-GCN, which considers only clause nodes, our heterogeneous graph contains more kinds of nodes, and the structure of our model is more complicated. However, it is worth noting that the compared methods listed in Figure 3 all need emotions to be annotated before extracting causes, which is very labor-consuming. Therefore, with equivalent performance, our method is more suitable for real applications.

4.7. Case Study

4.7.1. Effect of Word-Clause Graph Attention

As shown in Figure 4, the information regarding three clauses of one representative case (Document 41) is presented, including the word identifiers, clause identifiers, and the details of each clause. This document consists of eight clauses and contains one emotion-cause pair $(c_4, c_3)$, where $c_4$ and $c_3$ are the emotion and cause clauses, respectively. To examine the effect of word-clause graph attention, we visualize the weight vector $a_i = [a_{i,1}, \ldots, a_{i,n}]$. The visualization results are shown in Figure 4, where a darker color indicates higher relevance.
We can find that the dark color is mainly concentrated around the word “anxious” in the emotion clause c 4 , which indicates that HHGAT can effectively capture the emotion keywords and ignore other non-emotion words. Moreover, in the cause clause c 3 , the words “unable”, “to”, and “consider” are significantly darker, which semantically constitutes the cause for triggering the emotion “anxious”. This shows that our HHGAT is also able to focus on the cause keywords. In sharp contrast, the color of all words in clause c 2 is very similar, which causes attention to be dispersed because c 2 is neither an emotion clause nor a cause clause. Consequently, HHGAT is effective in learning the features of emotion and cause clauses.

4.7.2. Effect of Meta-Path-Based Attention

In this section, Document 41 is analyzed again to verify the effect of meta-path-based attention. To this end, we visualize the weight coefficients of different-typed meta-paths to each pair node, as shown in Figure 5. Since the document consists of eight clauses, we divide the visualization results into eight subgraphs, and each subgraph shows the attention visualization results of those pair nodes with the same candidate emotion clause. The color instructions are the same as that in the previous section.
From the visualization results in Figure 5, we can observe that the color distribution on these subgraphs is very similar. In each subgraph, the color of Φ 0 corresponding to the pair node containing the ground-truth cause is the darkest. Additionally, in each row, the path with the largest weight coefficient to the target node is mostly the one where the real cause lies. In addition, as the offset from the central node or path increases, the correlation usually becomes lower. This shows that our method can find pair nodes containing ground-truth causes, according to the meta-paths.
Next, we conduct an inter-graph analysis, comparing the maximum attention coefficients in the rows corresponding to the ground-truth causes. In addition to Document 41, we also select the documents numbered 43, 167, and 151 as representative cases, whose emotion-cause pairs are $p_{5,5}$, $p_{6,4}$, and $p_{5,4}$, respectively. The comparison results are shown in Figure 6. We can notice that the highest point on each line is consistent with the ground-truth emotion-cause pair, which indicates that our meta-path-based graph attention network can effectively identify the emotion-cause pairs. It is worth noting that the values of all points on the line for Document 43 are relatively close. This is because clause $c_5$ in Document 43 is both an emotion and a cause clause, and each pair node on this line includes clause $c_5$. The above results further verify that our method is effective for ECPE.

4.7.3. Error Analysis

In this section, we collect all emotion-cause pairs that were erroneously predicted on the test set. Inspired by [52], we classify these errors into four categories, i.e., emotion, cause, both, and missing errors. Based on the statistics in Table 3, we can notice that the proportion of cause errors is the largest, followed by both errors. However, most of the both errors are due to unlabeled emotions, which are usually irrelevant to the topic of the document. Furthermore, the proportion of missing errors is also relatively large. Therefore, we select two cases to analyze the cause and missing errors, respectively.
For the first case in Table 4, our model correctly predicts the emotion-cause pair $p_{8,8}$, while it mistakenly identifies Clause 8 as the cause clause in the emotion-cause pair $p_{10,9}$. The prediction error may arise because Clause 8 triggers the occurrence of the event described in Clause 9. Therefore, the ability of our model to distinguish indirect causes from direct causes needs to be further strengthened. Furthermore, in the prediction result of Case 2, the ground-truth emotion-cause pair $p_{3,5}$ is missing. We observe that the clause "it feels like the sky is falling down" is a metaphor, so it expresses an implicit emotion. Obviously, there are no emotion keywords in implicit emotional expressions, and identifying such emotions requires comprehensively considering language style, rhetoric, metaphor, and so on, so implicit emotions are more difficult to identify.

5. Conclusions and Future Work

In this paper, we propose HHGAT to capture the global semantic information contained in documents. Specifically, we first constructed a heterogeneous graph that considers all types of semantic elements involved in ECPE and models the global semantic relations between these elements. Secondly, we proposed a hierarchical heterogeneous graph attention network to learn the representations of clauses and clause pairs with global semantic information. Thirdly, we conducted extensive experiments on the benchmark ECPE dataset. The experimental results show that our proposed method achieves better performance than the 13 compared methods and outperforms the best competitor, RSN, by a 1.32% F1-score.
In addition, the essence of pairing emotions and causes is to calculate the similarity between them. Nevertheless, similarity is a fuzzy concept without a crisp definition, and it is difficult for traditional graph neural networks to handle such fuzzy relationships. Therefore, in future work, we will introduce fuzzy graph theory [60,61,62] into graph neural networks, so as to effectively learn the fuzzy relations between clauses.

Author Contributions

J.Y.: conceptualization, methodology, formal analysis, software, validation, visualization, and writing original draft; W.L.: resources, supervision, project administration, and writing review; Y.H.: conceptualization, formal analysis, writing review, and editing; B.Z.: funding acquisition, data curation, and visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded in part by the National Natural Science Foundation of China, under grant No. 61672448, grant No. 61673142, and grant No. 61972167, as well as, in part, by the Key R&D project of Hebei Province, under grant No. 18270307D, and Natural Science Foundation of Heilongjiang Province of China, under grant No. JJ2019JQ0013.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lee, S.Y.M.; Chen, Y.; Huang, C.-R. A Text-Driven Rule-Based System for Emotion Cause Detection. In Proceedings of the 2010 North American Chapter of the Association for Computational Linguistics (NAACL), Los Angeles, CA, USA, 5 June 2010; pp. 45–53. [Google Scholar]
  2. Gui, L.; Wu, D.; Xu, R.; Lu, Q.; Zhou, Y. Event-Driven Emotion Cause Extraction with Corpus Construction. In Proceedings of the 2016 Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 1639–1649. [Google Scholar]
  3. Xu, R.; Hu, J.; Lu, Q.; Wu, D.; Gui, L. An Ensemble Approach for Emotion Cause Detection with Event Extraction and Multi-Kernel SVMs. Tsinghua Sci. Technol. 2017, 22, 646–659. [Google Scholar] [CrossRef]
  4. Xia, R.; Ding, Z. Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts. In Proceedings of the 57th Association for Computational Linguistics, Florence, Italy, 28 July 2019; pp. 1003–1012. [Google Scholar]
  5. Gori, M.; Monfardini, G.; Scarselli, F. A New Model for Learning in Graph Domains. In Proceedings of the 2005 Neural Networks, Montreal, QC, Canada, 31 July 2005–4 August 2005; Volume 2, pp. 729–734. [Google Scholar]
  6. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2009, 20, 61–80. [Google Scholar] [CrossRef] [PubMed]
  7. Wei, P.; Zhao, J.; Mao, W. Effective Inter-Clause Modeling for End-to-End Emotion-Cause Pair Extraction. In Proceedings of the 58th Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3171–3181. [Google Scholar]
  8. Chen, Y.; Hou, W.; Li, S.; Wu, C.; Zhang, X. End-to-End Emotion-Cause Pair Extraction with Graph Convolutional Network. In Proceedings of the 28th Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 198–207. [Google Scholar]
  9. Chen, Y.; Lee, S.Y.M.; Li, S.; Huang, C.-R. Emotion Cause Detection with Linguistic Constructions. In Proceedings of the 23rd Computational Linguistics (Coling 2010), Beijing, China, 23–27 August 2010; pp. 179–187. [Google Scholar]
  10. Gao, K.; Xu, H.; Wang, J. Emotion Cause Detection for Chinese Micro-Blogs Based on ECOCC Model. Advances in Knowledge Discovery and Data Mining; Springer International Publishing: Cham, Switzerland, 2015; pp. 3–14. [Google Scholar]
  11. Gao, K.; Xu, H.; Wang, J. A Rule-Based Approach to Emotion Cause Detection for Chinese Micro-Blogs. Expert Syst. Appl. 2015, 42, 4517–4528. [Google Scholar] [CrossRef]
  12. Russo, I.; Caselli, T.; Rubino, F.; Boldrini, E.; Martínez-Barco, P. EMOCause: An Easy-Adaptable Approach to Emotion Cause Contexts. In Proceedings of the 2nd Computational Approaches to Subjectivity and Sentiment Analysis, Portland, OR, USA, 24 June 2011; pp. 153–160. [Google Scholar]
  13. Gui, L.; Yuan, L.; Xu, R.; Liu, B.; Lu, Q.; Zhou, Y. Emotion Cause Detection with Linguistic Construction in Chinese Weibo Text. In Proceedings of the Natural Language Processing and Chinese Computing, Shenzhen, China, 5–9 December 2014; pp. 457–464. [Google Scholar]
  14. Ghazi, D.; Inkpen, D.; Szpakowicz, S. Detecting Emotion Stimuli in Emotion-Bearing Sentences. In Computational Linguistics and Intelligent Text Processing; Springer International Publishing: Cham, Switzerland, 2015; pp. 152–165. [Google Scholar]
  15. Gui, L.; Hu, J.; He, Y.; Xu, R.; Lu, Q.; Du, J. A Question Answering Approach for Emotion Cause Extraction. In Proceedings of the 2017 Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 1593–1602. [Google Scholar]
  16. Cheng, X.; Chen, Y.; Cheng, B.; Li, S.; Zhou, G. An Emotion Cause Corpus for Chinese Microblogs with Multiple-User Structures. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2017, 17, 1–19. [Google Scholar] [CrossRef]
  17. Chen, Y.; Hou, W.; Cheng, X. Hierarchical Convolution Neural Network for Emotion Cause Detection on Microblogs. In Proceedings of the International Conference on Artificial Neural Networks and Machine Learning (ICANN 2018), Cham, Switzerland, 2018; pp. 115–122. [Google Scholar]
  18. Li, X.; Song, K.; Feng, S.; Wang, D.; Zhang, Y. A Co-Attention Neural Network Model for Emotion Cause Analysis with Emotional Context Awareness. In Proceedings of the 2018 Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4752–4757. [Google Scholar]
  19. Li, X.; Feng, S.; Wang, D.; Zhang, Y. Context-Aware Emotion Cause Analysis with Multi-Attention-Based Neural Network. Knowl. Based Syst. 2019, 174, 205–218. [Google Scholar] [CrossRef]
  20. Yu, X.; Rong, W.; Zhang, Z.; Ouyang, Y.; Xiong, Z. Multiple Level Hierarchical Network-Based Clause Selection for Emotion Cause Extraction. IEEE Access 2019, 7, 9071–9079. [Google Scholar] [CrossRef]
  21. Xia, R.; Zhang, M.; Ding, Z. RTHN: A RNN-Transformer Hierarchical Network for Emotion Cause Extraction. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI-19, Macao, China, 10–16 August 2019; pp. 5285–5291. [Google Scholar]
  22. Fan, C.; Yan, H.; Du, J.; Gui, L.; Bing, L.; Yang, M.; Xu, R.; Mao, R. A Knowledge Regularized Hierarchical Approach for Emotion Cause Analysis. In Proceedings of the 2019 Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5614–5624. [Google Scholar]
  23. Hu, J.; Shi, S.; Huang, H. Combining External Sentiment Knowledge for Emotion Cause Detection. In Proceedings of the Natural Language Processing and Chinese Computing, Dunhuang, China, 9–14 October 2019; pp. 711–722. [Google Scholar]
  24. Diao, Y.; Lin, H.; Yang, L.; Fan, X.; Chu, Y.; Wu, D.; Xu, K.; Xu, B. Multi-Granularity Bidirectional Attention Stream Machine Comprehension Method for Emotion Cause Extraction. Neural. Comput. Applic. 2020, 32, 8401–8413. [Google Scholar] [CrossRef]
  25. Chen, Y.; Hou, W.; Cheng, X.; Li, S. Joint Learning for Emotion Classification and Emotion Cause Detection. In Proceedings of the 2018 Empirical Methods in Natural Language Processing, Brussels, Belgium, 3 October–4 November 2018; pp. 646–651. [Google Scholar]
  26. Hu, G.; Lu, G.; Zhao, Y. Emotion-Cause Joint Detection: A Unified Network with Dual Interaction for Emotion Cause Analysis. In Proceedings of the Natural Language Processing and Chinese Computing, Zhengzhou, China, 14–18 October 2020; pp. 568–579. [Google Scholar]
  27. Ding, Z.; He, H.; Zhang, M.; Xia, R. From Independent Prediction to Reordered Prediction: Integrating Relative Position and Global Label Information to Emotion Cause Identification. Proc. AAAI Conf. Artif. Intell. 2019, 33, 6343–6350. [Google Scholar] [CrossRef]
  28. Xu, B.; Lin, H.; Lin, Y.; Diao, Y.; Yang, L.; Xu, K. Extracting Emotion Causes Using Learning to Rank Methods from an Information Retrieval Perspective. IEEE Access 2019, 7, 15573–15583. [Google Scholar] [CrossRef]
  29. Xiao, X.; Wei, P.; Mao, W.; Wang, L. Context-Aware Multi-View Attention Networks for Emotion Cause Extraction. In Proceedings of the 2019 Intelligence and Security Informatics (ISI), Shenzhen, China, 1–3 July 2019; pp. 128–133. [Google Scholar]
  30. Hu, G.; Lu, G.; Zhao, Y. FSS-GCN: A Graph Convolutional Networks with Fusion of Semantic and Structure for Emotion Cause Analysis. Knowl. Based Syst. 2021, 212, 106584. [Google Scholar] [CrossRef]
  31. Shan, J.; Zhu, M. A New Component of Interactive Multi-Task Network Model for Emotion-Cause Pair Extraction. In Proceedings of the 3rd Computer Information Science and Artificial Intelligence (CISAI), Inner Mongolia, China, 25–27 September 2020; pp. 12–22. [Google Scholar]
  32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
  33. Yu, J.; Liu, W.; He, Y.; Zhang, C. A Mutually Auxiliary Multitask Model with Self-Distillation for Emotion-Cause Pair Extraction. IEEE Access 2021, 9, 26811–26821. [Google Scholar] [CrossRef]
34. Jia, X.; Chen, X.; Wan, Q.; Liu, J. A Novel Interactive Recurrent Attention Network for Emotion-Cause Pair Extraction. In Proceedings of the 3rd International Conference on Algorithms, Computing and Artificial Intelligence (ACAI), New York, NY, USA, 24 December 2020; pp. 1–9.
35. Sun, Q.; Yin, Y.; Yu, H. A Dual-Questioning Attention Network for Emotion-Cause Pair Extraction with Context Awareness. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Online, 18–22 July 2021; pp. 1–8.
36. Shi, J.; Li, H.; Zhou, J.; Pang, Z.; Wang, C. Optimizing Emotion–Cause Pair Extraction Task by Using Mutual Assistance Single-Task Model, Clause Position Information and Semantic Features. J. Supercomput. 2021, 78, 4759–4778.
37. Wu, S.; Chen, F.; Wu, F.; Huang, Y.; Li, X. A Multi-Task Learning Neural Network for Emotion-Cause Pair Extraction. In Proceedings of the 24th European Conference on Artificial Intelligence (ECAI 2020), Santiago de Compostela, Spain, 29 August–8 September 2020.
38. Tang, H.; Ji, D.; Zhou, Q. Joint Multi-Level Attentional Model for Emotion Detection and Emotion-Cause Pair Extraction. Neurocomputing 2020, 409, 329–340.
39. Ding, Z.; Xia, R.; Yu, J. ECPE-2D: Emotion-Cause Pair Extraction Based on Joint Two-Dimensional Representation, Interaction and Prediction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3161–3170.
40. Fan, R.; Wang, Y.; He, T. An End-to-End Multi-Task Learning Network with Scope Controller for Emotion-Cause Pair Extraction. In Proceedings of the Natural Language Processing and Chinese Computing, Zhengzhou, China, 14–18 October 2020; pp. 764–776.
41. Ding, Z.; Xia, R.; Yu, J. End-to-End Emotion-Cause Pair Extraction Based on Sliding Window Multi-Label Learning. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 3574–3583.
42. Cheng, Z.; Jiang, Z.; Yin, Y.; Yu, H.; Gu, Q. A Symmetric Local Search Network for Emotion-Cause Pair Extraction. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 139–149.
43. Wu, D.; Wu, C. Research on the Time-Dependent Split Delivery Green Vehicle Routing Problem for Fresh Agricultural Products with Multiple Time Windows. Agriculture 2022, 12, 793.
44. Deng, W.; Ni, H.; Liu, Y.; Chen, H.; Zhao, H. An Adaptive Differential Evolution Algorithm Based on Belief Space and Generalized Opposition-Based Learning for Resource Allocation. Appl. Soft Comput. 2022, 127, 109419.
45. Singh, A.; Hingane, S.; Wani, S.; Modi, A. An End-to-End Network for Emotion-Cause Pair Extraction. In Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Online, 19 April 2021; pp. 84–91.
46. Fan, W.; Zhu, Y.; Wei, Z.; Yang, T.; Ip, W.H.; Zhang, Y. Order-Guided Deep Neural Network for Emotion-Cause Pair Prediction. Appl. Soft Comput. 2021, 112, 107818.
47. Yang, X.; Yang, Y. Emotion-Type-Based Global Attention Neural Network for Emotion-Cause Pair Extraction. In Proceedings of the International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), Fuzhou, China, 30 July–1 August 2022; pp. 546–557.
48. Chen, F.; Shi, Z.; Yang, Z.; Huang, Y. Recurrent Synchronization Network for Emotion-Cause Pair Extraction. Knowl. Based Syst. 2022, 238, 107965.
49. Yuan, C.; Fan, C.; Bao, J.; Xu, R. Emotion-Cause Pair Extraction as Sequence Labeling Based on A Novel Tagging Scheme. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 3568–3573.
50. Fan, C.; Yuan, C.; Gui, L.; Zhang, Y.; Xu, R. Multi-Task Sequence Tagging for Emotion-Cause Pair Extraction via Tag Distribution Refinement. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 2339–2350.
51. Chen, X.; Li, Q.; Wang, J. A Unified Sequence Labeling Model for Emotion Cause Pair Extraction. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 208–218.
52. Cheng, Z.; Jiang, Z.; Yin, Y.; Li, N.; Gu, Q. A Unified Target-Oriented Sequence-to-Sequence Model for Emotion-Cause Pair Extraction. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 2779–2791.
53. Song, H.; Zhang, C.; Li, Q.; Song, D. An End-to-End Multi-Task Learning to Link Framework for Emotion-Cause Pair Extraction. arXiv 2020, arXiv:2002.10710.
54. Fan, C.; Yuan, C.; Du, J.; Gui, L.; Yang, M.; Xu, R. Transition-Based Directed Graph Construction for Emotion-Cause Pair Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3707–3717.
55. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
56. Yao, R.; Guo, C.; Deng, W.; Zhao, H. A Novel Mathematical Morphology Spectrum Entropy Based on Scale-Adaptive Techniques. ISA Trans. 2022, 126, 691–702.
57. Zhou, X.; Ma, H.; Gu, J.; Chen, H.; Deng, W. Parameter Adaptation-Based Ant Colony Optimization with Dynamic Hybrid Mechanism. Eng. Appl. Artif. Intell. 2022, 114, 105139.
58. Deng, W.; Xu, J.; Gao, X.-Z.; Zhao, H. An Enhanced MSIQDE Algorithm with Novel Multiple Strategies for Global Optimization Problems. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 1578–1587.
59. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
60. Akram, M. m-Polar Fuzzy Graphs: Theory, Methods & Applications; Studies in Fuzziness and Soft Computing; Springer International Publishing: Cham, Switzerland, 2019; pp. 1–3. ISBN 978-3-030-03750-5.
61. Akram, M.; Zafar, F. Hybrid Soft Computing Models Applied to Graph Theory; Studies in Fuzziness and Soft Computing; Springer International Publishing: Cham, Switzerland, 2020; Volume 380, ISBN 978-3-030-16019-7.
62. Akram, M.; Luqman, A. Fuzzy Hypergraphs and Related Extensions; Studies in Fuzziness and Soft Computing; Springer: Singapore, 2020; Volume 390, ISBN 978-981-15-2402-8.
Figure 1. A toy example of a heterogeneous graph composed of word, clause, and pair nodes.
Figure 2. (a) An overview of HHGAT; (b) node initialization layer; (c) clause node encoding layer; (d) pair node encoding layer.
Figure 3. Comparison of experimental results on ECE.
Figure 4. Visualization of word-clause attention.
Figure 5. Visualization of meta-path-based attention.
Figure 6. The inter-graph analysis of meta-path-based attention.
Table 1. Comparison of experimental results on the emotion extraction, cause extraction, and ECPE.

Category     Method             Emotion Extraction          Cause Extraction            Emotion-Cause Pair Extraction
                                P       R       F1          P       R       F1          P       R       F1
Pipelined    Inter-EC [4]       0.8364  0.8107  0.8230      0.7041  0.6083  0.6507      0.6721  0.5705  0.6128
             Inter-ECNC [31]    -       -       -           0.6863  0.6254  0.6544      0.6601  0.5734  0.6138
             DQAN [35]          -       -       -           0.7732  0.6370  0.6979      0.6733  0.6040  0.6362
End-to-end   E2EECPE [53]       0.8552  0.8024  0.8275      0.7048  0.6159  0.6571      0.6491  0.6195  0.6315
             MTNECP [37]        0.8662  0.8393  0.8520      0.7400  0.6378  0.6844      0.6828  0.5894  0.6321
             SLSN [42]          0.8406  0.7980  0.8181      0.6992  0.6588  0.6778      0.6836  0.6291  0.6545
             LAE-MANN [38]      0.8990  0.8000  0.8470      -       -       -           0.7110  0.6070  0.6550
             TDGC [54]          0.8716  0.8244  0.8474      0.7562  0.6471  0.6974      0.7374  0.6307  0.6799
             ECPE-2D [39]       0.8627  0.9221  0.8910      0.7336  0.6934  0.7123      0.7292  0.6544  0.6889
             PairGCN [8]        0.8857  0.7958  0.8375      0.7907  0.6928  0.7375      0.7692  0.6791  0.7202
             UTOS [52]          0.8815  0.8321  0.8556      0.7671  0.7320  0.7471      0.7389  0.7062  0.7203
             RANKCP [7]         0.9123  0.8999  0.9057      0.7461  0.7788  0.7615      0.7119  0.7630  0.7360
             RSN [48]           0.8614  0.8922  0.8755      0.7727  0.7398  0.7545      0.7601  0.7219  0.7393
Ours         w/o BERT           0.8361  0.8327  0.8337      0.7157  0.6519  0.6811      0.7143  0.6238  0.6654
             HHGAT              0.8655  0.9181  0.8906      0.7427  0.7988  0.7686      0.7458  0.7631  0.7525
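For reference, the P/R/F1 columns above follow the pair-level precision/recall/F1 evaluation that is standard in the ECPE literature: a predicted (emotion clause, cause clause) pair counts as correct only if it exactly matches an annotated pair. The following minimal sketch, with illustrative function names and toy data rather than the authors' evaluation code, shows how such scores are computed:

```python
# Minimal sketch of pair-level P/R/F1 for ECPE (illustrative, not the
# authors' code): a predicted (emotion, cause) clause-index pair is
# counted as correct only if it exactly matches a gold pair.

def pair_prf(predicted: set, gold: set) -> tuple:
    """Precision, recall, and F1 over sets of (emotion, cause) clause pairs."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    # If correct == 0, both precision and recall are 0, so guard the division.
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1

# Toy data echoing the first error case in Table 4: gold pairs
# {(8, 8), (10, 9)} against predictions {(8, 8), (10, 8)}.
print(pair_prf({(8, 8), (10, 8)}, {(8, 8), (10, 9)}))  # (0.5, 0.5, 0.5)
```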
Table 2. Experimental results of structural ablation.

Method       Emotion Extraction          Cause Extraction            Emotion-Cause Pair Extraction
             P       R       F1          P       R       F1          P       R       F1
HHGAT        0.8655  0.9181  0.8906      0.7427  0.7988  0.7686      0.7458  0.7631  0.7525
w/o G2       0.8553  0.9164  0.8839      0.7365  0.7970  0.7644      0.7093  0.7692  0.7361
w/o G2&H1    0.8625  0.9116  0.8860      0.7300  0.7654  0.7464      0.6895  0.7296  0.7075
w/o G1       0.8618  0.9021  0.8808      0.7381  0.7583  0.7477      0.7113  0.7224  0.7164
w/o G1&H2    0.8375  0.9170  0.8748      0.7308  0.7615  0.7449      0.6884  0.7379  0.7111
w/o G1&G2    0.8596  0.9148  0.8858      0.7296  0.7456  0.7353      0.6831  0.7169  0.6974
Table 3. The statistics of error emotion-cause pairs.

Category     Emotion Error   Cause Error   Both Error   Missing Error
Proportion   3.3%            46.2%         30.8%        19.7%
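The paper does not spell out the exact counting rules behind these categories, but one plausible reading, assumed in the hedged sketch below (all names illustrative), is: a wrong predicted pair whose emotion clause appears in some gold pair is a cause error; one whose cause clause appears in a gold pair but whose emotion clause does not is an emotion error; a pair matching on neither side is a both error; and a gold pair whose emotion clause is never predicted at all is a missing error.

```python
# Hedged sketch of one way the Table 3 error categories could be counted;
# the category definitions here are assumptions, not the paper's stated rules.
from collections import Counter

def classify_errors(predicted: set, gold: set) -> Counter:
    counts = Counter()
    gold_emotions = {e for e, _ in gold}
    gold_causes = {c for _, c in gold}
    for e, c in predicted - gold:           # wrongly predicted pairs
        if e in gold_emotions:
            counts["cause error"] += 1      # emotion right, cause wrong
        elif c in gold_causes:
            counts["emotion error"] += 1    # cause right, emotion wrong
        else:
            counts["both error"] += 1       # neither side matches
    predicted_emotions = {e for e, _ in predicted}
    for e, _ in gold:
        if e not in predicted_emotions:
            counts["missing error"] += 1    # gold pair never attempted
    return counts

# Case 1 of Table 4: (10, 8) is a cause error (emotion 10 right, cause wrong).
print(classify_errors({(8, 8), (10, 8)}, {(8, 8), (10, 9)}))
# Case 2 of Table 4: nothing predicted, so the gold pair (3, 5) goes missing.
print(classify_errors(set(), {(3, 5)}))
```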
Table 4. Two error cases.

Case 1
Text: [ ... ]. [Xiao was holding Long's 2-year-old son]7. [Fearing that Long would hurt the child]8, [he knocked Long on the head with a lid]9, [and then Long became angry]10. [ ... ].
Truth: [8, 8], [10, 9]
Prediction: [8, 8], [10, 8]

Case 2
Text: [ ... ]. ["It feels like the sky is falling down"]3. [Xu Ping described how she felt when she learned that her husband was ill]4, [he knocked Long on the head with a lid]5. [ ... ].
Truth: [3, 5]
Prediction: [ ]

The superscript number at the end of a clause indicates the clause number; in a pair [i, j], i is the emotion clause number and j is the cause clause number.
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.