Global-Local Dynamic Adversarial Learning for Cross-Domain Sentiment Analysis

Juntao Lyu; Zheyuan Zhang; Shufeng Chen; Xiying Fan

doi:10.3390/math11143130

Abstract

As one of the most widely used applications in domain adaption (DA), Cross-domain sentiment analysis (CDSA) aims to tackle the barrier of lacking in sentiment labeled data. Applying an adversarial network to DA to reduce the distribution discrepancy between source and target domains is a significant advance in CDSA. This adversarial DA paradigm utilizes a single global domain discriminator or a series of local domain discriminators to reduce marginal or conditional probability distribution discrepancies. In general, each discrepancy has a different effect on domain adaption. However, the existing CDSA algorithms ignore this point. Therefore, in this paper, we propose an effective, novel and unsupervised adversarial DA paradigm, Global-Local Dynamic Adversarial Learning (GLDAL). This paradigm is able to quantitively evaluate the weights of global distribution and every local distribution. We also study how to apply GLDAL to CDSA. As GLDAL can effectively reduce the distribution discrepancy between domains, it performs well in a series of CDSA experiments and achieves improvements in classification accuracy compared to similar methods. The effectiveness of each component is demonstrated through ablation experiments on different parts and a quantitative analysis of the dynamic factor. Overall, this approach achieves the desired DA effect with domain shifts.

Keywords:

adversarial domain adaption; cross-domain sentiment analysis; global-local dynamic adversarial learning

MSC:

68T01

1. Introduction

Deep neural networks have led to impressive improvements in data mining. However, it is always time-consuming and expensive to acquire sufficient data, and especially to label them. The emergence of domain adaption [1] has significantly reduced the difficulty of labeling data and transferring knowledge between different domains. One of the most challenging problems in domain adaption is how to reduce or even eliminate the differences between the source domain and target domain [2]. In CDSA, these differences are reflected as domain discrepancies owing to the distinct expression of the reviewers’ emotions from various domains [3]. In recent years, adversarial domain adaption [4] has been widely used to reduce different domains’ distribution discrepancies. Most of them depend on either global domain discriminator to reduce marginal probability distribution discrepancy, or several local domain discriminators to reduce conditional probability distribution discrepancy. For example, DANN [4] only implements global adversarial adaption to align global distributions from different domains. Analogously, MADA [5] only utilizes a series of local domain discriminators to align subdomains. In true application scenarios, the global alignment and every single local alignment always contribute differently to the entire domain adaption. When two domains’ marginal probability distributions are more dissimilar, it is obvious that global alignment plays a bigger role. Otherwise, local alignment is more important. Similarly, each local alignment makes different contributions to the total local alignment.

Due to the issues of domain transfer and concept drift, a sentiment classification model trained in one domain may not effectively generalize to other domains. The expressions of sentiment, viewpoints, and emotionally charged words evolve over time, making it challenging to maintain effective sentiment classification models across different domains. Therefore, employing domain adaptation methods can help reduce the drift of sentiment expression [6]. The main purpose of this study is to reduce the data distribution gap between the two emotion domains to enhance the generalization ability of cross-domain emotion classification data models. We want to achieve this goal by utilizing the global-domain discriminator and local-domain discriminator commonly used in traditional natural-language-processing text classifiers to minimize the difference between data margins and global distribution, thereby improving the generalization ability of cross-domain sentiment classification.

In this paper, we propose a novel domain adaption method Global–local Dynamic Adversarial Learning (GLDAL) for unsupervised adaption. GLDAL captures the multi-mode structure of data dynamically through adversarial learning. In CDSA, GLDAL is able to amplify the influence of the pivot words to learn domain-invariant features. The core components of GLDAL are the Global Dynamic Adversarial Factor (GDAF) and Local Dynamic Adversarial Factors (LDAF). The former has the ability to evaluate the relative significance of the marginal and conditional distributions both dynamically and quantitatively. Similarly, the latter plays the same role in the relationship between each conditional distribution. Therefore, GLDAL is able to improve the generalization of adversarial domain adaption.

In the following text, we detail the use of our proposed approach in deep neural network architectures, and experiment on popular sentiment classification datasets (SST, IMDB), where our proposed approach shows improvements compared to the previous state-of-the-art accuracy.

2. Related Work

2.1. Unsupervised Domain Adaption in Transfer Learning

Unsupervised domain adaptation (UDA) is an important branch of transfer learning. There are two main forms of UDA: traditional machine learning and deep learning. UDA based on traditional machine learning can be divided into two categories: (1) Subspace Alignment (SA) [7] and CORAL [8] use the subspace statistical characteristics to eliminate domain differences. (2) Distribution alignments TCA [9], JDA [10], BDA [11] and MEDA [12] are proposed to align the marginal probability distributions or conditional probability distributions between domains.

In the last few years, deep neural networks have been widely used in domain adaption [13,14,15]. Domain adversarial learning, as a branch of deep domain adaption, has been popular in recent years. The idea of adversarial learning comes from Generative Adversarial Networks (GAN) [16,17]. DANN [4] aligns the source and target distributions with only a single global domain discriminator. MADA [5] is able to execute the fine-grained alignment of different data distributions using multi-mode discriminators. DAAN [18] dynamically evaluates the relationship between the marginal and conditional distributions. Our GLDAL is also based on adversarial learning and global attention mechanism. By evaluating the relationship between every local subdomain and the relative importance of the marginal and conditional distributions, GLDAL significantly outperforms existing methods.

2.2. Cross-Domain Sentiment Analysis

Sentiment analysis is also called opinion mining or tendency analysis. It is the process of analyzing and reasoning about subjective texts with emotions. Text representation is an important step in sentiment analysis. In recent years, the most representative work has been the BERT pre-trained model [19], Transformer-XL [20]. However, thegeneral sentiment analysis studies mentioned above only consider the performanc e in a single domain, ignoring the generalization ability, so cross-domain sentiment analysis has quickly become a research hotspot. BERT-DAAT [21] is a pre-trained model based on adversarial training methods that aims to provide the trained BERT with domain awareness. ADS-SAL [22] can dynamically learn an alignment weight for each word, so more important words will obtain higher alignment weights to achieve fine-grained adaptation.

3. Method

3.1. Problem Definition

In CDSA tasks about domain adaption, there are a source domain

D_{s} = {\{x_{s}^{i}, y_{s}^{i}\}}_{i = 1}^{N_{s}}

and a target domain

D_{t} = {\{x_{t}^{i}\}}_{i = 1}^{N_{t}}

, where

x_{s}

and

x_{t}

is the set of sentences and

y_{s}

are the corresponding sentiment polarity labels. The goal of CDSA is to predict the target label

{\hat{y}}^{t} = arg max G_{y} (f (x_{t}))

and minimize the target risk

ϵ_{t} (G_{y}) = E_{(x_{t}, y_{t}) \sim D_{t}} [G_{y} (f (x_{t})) \neq y_{t}]

, where

G_{y} (\cdot)

represents the Softmax output and

f (\cdot)

refers to the feature representation.

3.2. Backbones for Cross-Domain Sentiment Analysis

3.2.1. Bidirectional Gate Recurrent Units with Attention

A Gate Recurrent Unit (GRU) with the attention mechanism [23] is an effective recurrent neural network for sentiment analysis. For a given

n

-dimensional input

(x_{1}, x_{2}, \dots, x_{n})

, the hidden layer of Bi-GRU outputs

h_{t}

at time

t

. The calculation process is as follows:

h_{f_{t}} = σ (W_{x h_{f}} x_{t} + W_{h_{f^{h} f}} h_{f_{t - 1}} + b_{h_{f}}), h_{b_{t}} = σ (W_{x h_{b}} x_{t} + W_{h_{b} h_{b}} h_{b_{t - 1}} + b_{h_{b}}),

(1)

h_{t} = h_{f_{t}} \oplus h_{b_{t}}

, W is the weight matrix and b is the bias vector;

σ

is the activation function;

h_{f_{t}}

and

h_{b_{t}}

are the outputs of positive and negative GRU, respectively. ⊕ represents the element-wise summation. The simplification method is to construct a single vector c from the whole sequence, as follows:

e_{t} = Attention (h_{t}), α_{t} = \frac{exp (e_{t})}{\sum_{k = 1}^{T} exp (e_{k})}, c = \sum_{t = 1}^{T} α_{t} h_{t} .

(2)

3.2.2. Capsule Neural Network Based on BERT

The BERT [19] model is a language model based on two-way Transformer. We directly use the feature representation of BERT as the word-embedding feature of the task and regard BERT as the upstream network. The Capsule Network [24] is composed of a group of neurons, which are used to represent the parameters of a specific type of object. We regard this network as a downstream network. The unique activation function (“squashing”) is formulated as:

v_{j} = \frac{{∥s_{j}∥}^{2}}{1 + {∥s_{j}∥}^{2}} \frac{s_{j}}{∥s_{j}∥},

(3)

s_{j}

is its total input and

v_{j}

is the output of capsule

j

. For capsule

s_{j}

, the input is determined by multiplying the output

u_{i}

of a capsule in the former layer by a weight matrix

W_{i j}

,

c_{i j}

, which are coefficients for coupling determined by the dynamic routing:

s_{j} = \sum_{i} c_{i j} {\hat{u}}_{j ∣ i}, {\hat{u}}_{j ∣ i} = W_{i j} u_{i} .

(4)

3.3. Domain Adaption with Adversarial Learning

This adversarial training is like a game with two parts: feature extractor

G_{f}

and domain discriminator

G_{d}

. Through maximizing the loss of

G_{d}

, the parameters

θ_{f}

of

G_{f}

are trained while the parameters

θ_{d}

of

G_{d}

are trained by minimizing the loss of the domain discriminator. The total loss can be formalized as:

\begin{matrix} L (θ_{f}, θ_{y}, θ_{d}) = \frac{1}{n_{s}} \sum_{x_{i} \in D_{s}} L_{y} (G_{y} (G_{f} (x_{i})), y_{i}) \\ - \frac{λ}{n_{s} + n_{t}} \sum_{x_{i} \in (D_{s} \cup D_{t})} L_{d} (G_{d} (G_{f} (x_{i})), d_{i}), \end{matrix}

(5)

Update the parameters of each part as the following equations:

({\hat{θ}}_{f}, {\hat{θ}}_{y}) = arg min_{θ_{f}, θ_{y}} L (θ_{f}, θ_{y}, θ_{d}), ({\hat{θ}}_{d}) = arg max_{θ_{d}} L (θ_{f}, θ_{y}, θ_{d})

(6)

Due to the adversarial relationship with parameter updating, the parameters

θ_{f}

,

θ_{y}

,

θ_{d}

will deliver a saddle point of Equation (6) after the training converges.

3.4. Global-Local Dynamic Adversarial Adaption Network

At present, the mainstream adversarial adaption methods reduce marginal or conditional probability distribution discrepancies. However, it is quite difficult to evaluate the importance of each contribution. Therefore, we should find a novel method that can quantitatively evaluate their importance and dynamically adjust the parameters of the neural network.

In this paper, we propose the Global–local Dynamic Adversarial Learning (GLDAL) shown in Figure 1 to make key improvements. GLDAL can effectively evaluate every single distribution’s importance, aiming to better learn the domain-invariant features through adversarial learning. In GLDAL, the Global domain discriminator

G_{d}

and a series of Local domain discriminators

G_{d}^{u}

play a role in the domain adaption of marginal and conditional distributions, respectively. The most important innovation point is that we propose a novel, global-local dynamic training strategy using the Global Dynamic Factor

ω

and Local Dynamic Factors

{\{α_{u}\}}_{u = 1}^{U}

.

Figure 1. The architecture of the proposed Global–local Dynamic Adversarial Learning (GLDAL). First, input the Text Sequences into the Feature Extractor to obtain the feature vector distributed in the feature space. Next, there are two main loss functions, namely

L_{y}

and

L_{d}

.

L_{y}

is the classification loss.

L_{d}

is the loss combination of the global domain discriminator and local domain discriminators.

3.4.1. Label Classifier

The label classifier

G_{y}

(The blue part in Figure 1) in trained by samples from the source domain to implement label classification. The loss of

G_{y}

is formulated as:

L_{y} = \frac{1}{n_{s}} \sum_{x_{i} \in D_{s}} C r o s s E n t r o p y (G_{y} (G_{f} (x_{i})), y_{i}),

(7)

where

x_{i}, y_{i}

represents samples and their labels from the source domain;

G_{f}

is the feature extractor.

3.4.2. Global Domain Discriminator

Global domain discriminator

G_{d}

(The pink part in Figure 1) is wisely implemented to align the marginal probability distributions between two domains (source domain and target domain) in the feature space. We define the loss of global domain discriminator

G_{d}

as:

L_{g l o b a l} = \frac{1}{n_{s} + n_{t}} \sum_{x_{i} \in D_{s} \cup D_{t}} L_{d} (G_{d} (G_{f} (x_{i})), d_{i}) .

(8)

In this formula,

L_{d}

represents the loss of domain discriminator.

d_{i}

is the domain label of the sample

x_{i}

and

G_{f}

is the feature extractor.

3.4.3. Local Domain Discriminator

The local domain discriminator

G_{d}^{u}

(The purple part in Figure 1) is composed of a series of U class-wise discriminators

G_{d}^{u}

; each is used to match the source domain and target domain data associated with class u. The output of the label classifier

{\hat{y}}_{i} = G_{y} (x_{i})

to each data point

x_{i}

is a probability distribution of the U classes. Therefore, we utilize the probability distribution

{\hat{y}}_{i}

to measure how many data points

x_{i}

should be attended to the U domain

G_{d}^{u}

. The loss function of the local domain discriminator is defined as:

L_{l o c a l} = \frac{1}{n_{s} + n_{t}} \sum_{u = 1}^{U} \sum_{x_{i} \in D_{s} \cup D_{t}} L_{d}^{u} (G_{d}^{u} ({\hat{y}}_{i}^{u} G_{f} (x_{i})), d_{i}),

(9)

3.5. Global and Local Adversarial Factors

In this part, we introduce how to quantitively evaluate the global distributions and each local distribution. Due to the adversarial characteristic, it is tricky to determine a concrete scheme. Generally, there are two straightforward ideas: Random Setting and Average Step Searching. The former randomly sets the value of dynamic factors and the latter picks the value of these factors with the same step size. For instance, if the scope of factor

ω

is [0, 1], the latter strategy will pick the value of

ω

= 0, 0.1, …, 0.9, 1.0 to perform training. However, both strategies are computation-consuming and time-consuming.

Therefore, in this paper, we propose two kinds of factors: global dynamic adversarial factor

ω

and local dynamic adversarial factors

{\{α_{u}\}}_{u = 1}^{U}

in order to dynamically and quantitively evaluate the importance of each distribution. This strategy has two significant advantages. Firstly, we utilize deep adversarial representations instead of shallow representations to learn these two kinds of factors, making GLDAL more robust. Secondly, every single dynamic factor is determined by the

SA - d i s t a n c e

(not hinge loss) of the discriminators, which is more efficient and convenient. In order to calculate the dynamic factors, we denote the global

SA - d i s t a n c e

of the global domain discriminator as:

d_{SA, g} (D_{s}, D_{t}) = 2 (1 - 2 (L_{g l o b a l})) .

(10)

We also denote the local subdomain

SA - d i s t a n c e

as:

d_{SA, l} (D_{s}^{u}, D_{t}^{u}) = 2 (1 - 2 (L_{l o c a l}^{u})) .

(11)

These two kinds of distances can measure the distribution similarity of the source and target domain to some extent.

D_{s}^{u}

and

D_{t}^{u}

denote samples from class

u

, and

L_{l o c a l}^{u}

is the local subdomain discriminator loss over the class

u

.

3.5.1. The Global Dynamic Adversarial Factor $ω$

The global dynamic adversarial factor

ω

is used to calculate the weighted sum of global domain loss and local domain loss.

ω

is initialized as 1 in the first training epoch. After each training epoch, this factor

ω

can be estimated through global and local domain discriminators as:

\hat{ω} = \frac{d_{SA, g} (D_{s}, D_{t})}{d_{SA, g} (D_{s}, D_{t}) + \frac{1}{U} \sum_{u = 1}^{U} d_{SA, l} (D_{s}^{u}, D_{t}^{u})}

(12)

3.5.2. The Local Dynamic Adversarial Factors ${\{α_{u}\}}_{u = 1}^{U}$

The local domain loss

L_{l o c a l}

is composed of a series of local subdomain loss

L_{l o c a l}^{u}

by weighted summation. The local dynamic adversarial factors

{\{α_{u}\}}_{u = 1}^{U}

act as coefficients in the weighted sum. The importance of the alignment of each subdomain is different, so we assign different weights to every subdomain. In general, if the difference between the source subdomain and the target subdomain regarding class u is larger, we should pay more attention to this subdomain, which is manifested by the fact that the weight of the alignment of this subdomain should be increased. Since

SA - d i s t a n c e

is an important parameter used to measure the similarity of the probability distributions of two subdomains

D_{s}^{u}

and

D_{t}^{u}

, the parameter

α_{u}

can be estimated as:

{\hat{α}}_{u} = \frac{\frac{1}{U} \sum_{u = 1}^{U} d_{SA, l} (D_{s}^{u}, D_{t}^{u})}{d_{SA, l} (D_{s}^{u}, D_{t}^{u})} .

(13)

Each subdomain dynamic adversarial factor is initialized as 1 in the first training epoch. This factor-updating strategy follows the principle: If the

SA - d i s t a n c e

(Generally less than 0) between the subdomains about class u is larger, the weight for the alignment of subdomains about u will be greater. In this case, the domain discriminator

G_{d}^{u}

of the subdomain u can more easily discriminate the domain label information of the sample in the subdomain, which indicates that the difference between domains about class u is large. Therefore, we need to increase the proportion of

L_{l o c a l}^{u}

in the total local loss to ensure a better alignment effect about class u.

4. Experiments

In this section, we implement experiments to evaluate the proposed GLDAL against several previous state-of-the-art domain adaption methods. Our method GLDAL is validated on several popular standard datasets regarding cross-domain sentiment analysis. The training process is shown in Algorithm 1.

Algorithm 1 GLDAL

Input:

—samples

S = {\{x_{s}^{i}, y_{s}^{i}\}}_{i = 1}^{N_{s}}

and

T = {\{x_{t}^{i}\}}_{i = 1}^{N_{t}}

,

λ

,

μ

,

ω

,

{\{α_{u}\}}_{u = 1}^{U}

Output: Neural Network

\{G_{y}, G_{f}, G_{d}\}

while stopping criterion is not meet do

calculate the global domain loss

L_{{g l o b a l}_{i}}

: Equation (8)

calculate the global

SA - d i s t a n c e

: Equation (10)

acquire classification probability vector

\hat{y_{i}}

from

G_{y}

:

\hat{y_{i}} ⟵ G_{y} (G_{f} (s_{i}))

calculate the sum of all local loss of sample

s_{i}

: Equation (9)

calculate each local

SA - d i s t a n c e

: Equation (11)

if

s_{i} \in S

then

calculate the classification loss of: Equation (7)

Backpropagation:

Δ_{Θ} \leftarrow \frac{Δ L_{y_{i}}}{Δ Θ} - λ \frac{Δ ((1 - ω) L_{g l o b a l} + ω L_{l o c a l})}{Δ Θ}

Update dynamic factor

ω, {\{α_{u}\}}_{u = 1}^{U}

:

\hat{ω}

: Equation (12),

{\hat{α}}_{u}

: Equation (13)

return Output

4.1. Datasets

The Amazon Product Review dataset includes product reviews and metadata from Amazon; due to the uneven distribution of the samples, we set rating 0–1 as negative sentiment, 2–3 as medium and 5 as positive. We chose the following two datasets: Amazon reviews for clothing (C) and Amazon reviews for instant video (I).

SST-5 (S5), SST-2 (S2) are two versions of The Stanford Sentiment Treebank (SST); the former is (S5), with five categories (Very Positive, Positive, Neural, Negative), and the latter is (S2) with two categories (Positive, Negative).

IMDB (IM) is a review rating dataset for movie sentiment analysis, containing the same number of positive and negative sentiment samples. The COVID-19 Review dataset (COR) contains a sample of comments about COVID-19 and the corresponding sentiment scores (Very Positive, Positive, Neural, Negative). Tweet Review dataset TR contains daily comments and sentiment ratings scraped from tweets, with three categories (Positive, Neural, Negative).

We used these domain combinations and built six transfer learning tasks: (C → TR), (TR → I), (COR → S5), (S5 → COR), (IM → S2), (S2 → IM).

4.2. Baselines

Based on two previously mentioned backbones: Bi-GRU-A (Section 3.2.1) and CapsuleNet (Section 3.2.2),We compare our proposed Global–local Dynamic Adversarial Learning (GLDAL) with several state-of-the-art unsupervised deep domain adaption methods: DDC [25], DaNN [26], DANN [4], D-CORAL [15], JAN [27] MADA [5] and DAAN [18].

4.3. Implementation

We implement all methods on the PyTorch framework. For feature extractor backbones CapsuleNet and Bi-GRU-A, we fine-tune all word vectors and all attention layers and train the feature extraction layers with a learning rate of

10^{- 5}

and

10^{- 4}

, respectively. We use approximately 2–5 times the learning rate of the feature extractor to train the label classifier, and use 2–10 times the learning rate to train the local discriminator and global discriminator. The update of the important trade-off factor

λ

follows the Warm Start principles:

λ = \frac{2 (h i - l o)}{1 + exp (- \frac{i}{N})} - (h i - l o) + l o

. We set

l o

as 0,

h i

as 1, N as 100. Other hyperparameters are tuned via transfer cross-validation. The source code is available at https://github.com/killer2-1/GLDAL-for-CDSA (accessed on 20 May 2023).

4.4. Results

The classification accuracy (%) is shown in Table 1 and Table 2. GLDAL outperforms all comparison methods on most CDSA tasks. It is also remarkable that our GLDAL has better effects than similar approaches such as DANN, MADA and DAAN. From the results, we can obtain some conclusions: (1) Generally, the methods based on adversarial learning (DANN, DAAN) perform better than non-adversarial learning methods (DaNN, DDC), which means that adversarial adaption is more effective. (2) Adversarial methods with an attention mechanism such as GLDAL achieves a better performance than DANN and MADA(no attention). (3) Compared with other methods, GLDAL outperforms almost all traditional comparison methods (DDC, DaNN, DANN) on most transfer tasks, which proves the proposed method is effective. For some of the more advanced methods, such as DAAN, GLDAL slightly outperforms these methods, which verifies our method’s benefits.

Table 1. Accuracy (%) for unsupervised domain adaptation based on Bi-GRU-A(The bold stands for the best performance).

Table 2. Accuracy (%) of unsupervised domain adaptation based on CapsuleNet.

4.5. Effectiveness Analysis and Ablation Study

4.5.1. Analysis of the Importance of the Global Dynamic Adversarial Factor (GDAF) $ω$ in GLDAL

In this section, the importance of GDAF

ω

in GLDAL is evaluated. The evaluation involves two key points: (1) Whether we should pay attention to the different effects of marginal and conditional probability in adverarial domain adaption. (2) The effectiveness of our evaluation for

ω

.

To illustrate the first point, we chose several transfer tasks and analyzed the results of GLDAL under different values of

ω

, as shown in in Figure 2. It is obvious that paying attention to the different effects of marginal and conditional distribution is important. The value of optimal

ω

varies on different tasks. This is because different feature representations are obtained under different

ω

.

Figure 2. Performance of several tasks when searching

ω \in [0, 1]

.

To illustrate the second point, we compared the accuracy of transfer tasks under different calculation methods for

ω

: Random Guessing, Average Searching (10 times) and our GLDAL. We also executed an ablation study with DANN (

ω

= 1) and MADA (

ω

= 0). Combining the results from Table 3, we can conclude that our method outperforms other four methods.

Table 3. Accuracy (%) for the effective analysis and ablation study of

ω

based on CapsuleNet.

4.5.2. Analysis of the Importance of the Local Dynamic Adversarial Factor (LDAF) ${\{α_{u}\}}_{u = 1}^{U}$ in GLDAL

In this section, based on the point oof whether we should pay attention to the different effect of each local subdomain, we analyze the importance of the local dynamic adversarial factor

{\{α_{u}\}}_{u = 1}^{U}

in GLDAL. Like the analysis of

ω

, we also chose a series of tasks, and adopted the three methods shown below to conduct verification experiments: (1) A series of random number that are greater than 0, and the sum is fixed to U (RNSF); (2) DAAN (all LDAFs are fixed to 1); (3) Our GLDAL. We used

A - d i s t a n c e

[28] as a measure of cross-domain discrepancy. In general, the smaller the

A - d i s t a n c e

, the better the domain adaption effect. The average results in each dataset are shown in Table 4.

Table 4.

A - d i s t a n c e

for all unsupervised domain adaptation tasks based on backbone CapsuleNet and Bi-GRU-A.

5. Conclusions

In this paper, we proposed a novel Global–local Dynamic Adversarial Learning (GLDAL) for cross-domain sentiment analysis. Through quantitatively evaluating the relative importance of global distribution and all local distributions, GLDAL is able to dynamically adjust the learning weights of the global discriminator and local discriminators during training. This domain adaption strategy is based on the attention mechanism and has excellent effects on experimental tasks. As a general transfer learning strategy, GLDAL can also be applied to tasks in computer vision and the recommended system. In the future, we plan to extend GLDAL to more challenging transfer learning problems.

Author Contributions

Methodology, Z.Z. and X.F.; Resources, Z.Z.; Data curation, J.L. and S.C.; Writing—original draft, J.L.; Writing—review & editing, Z.Z.; Supervision, X.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

There are the url links for our used dataset: IMDB: https://huggingface.co/datasets/imdb (accessed on 20 May 2023); SST: https://huggingface.co/datasets/sst2 (accessed on 20 May 2023); GloVe: https://nlp.stanford.edu/projects/glove/ (accessed on 20 May 2023); IMDB: https://huggingface.co/datasets/imdb (accessed on 20 May 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE 2021, 109, 43–76. [Google Scholar] [CrossRef]
Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
Gupta, B.; Awasthi, S.; Singh, P.; Ram, L.; Kumar, P.; Prasad, B.R.; Agarwal, S. Cross domain sentiment analysis using transfer learning. In Proceedings of the 2017 IEEE International Conference on Industrial and Information Systems (ICIIS), Peradeniya, Sri Lanka, 15–16 December 2017. [Google Scholar]
Ganin, Y.; Lempitsky, V.S. Unsupervised Domain Adaptation by Backpropagation. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1180–1189. [Google Scholar]
Pei, Z.; Cao, Z.; Long, M.; Wang, J. Multi-Adversarial Domain Adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 3934–3941. [Google Scholar]
Zhang, M.; Li, X.; Wu, F. Moka-ADA: Adversarial domain adaptation with model-oriented knowledge adaptation for cross-domain sentiment analysis. J. Supercomput. 2023, 79, 13724–13743. [Google Scholar] [CrossRef]
Sun, B.; Saenko, K. Subspace Distribution Alignment for Unsupervised Domain Adaptation. In Proceedings of the 26th British Machine Vision Conference, Swansea, UK, 7–10 September 2015; pp. 24.1–24.10. [Google Scholar]
Sun, B.; Feng, J.; Saenko, K. Return of Frustratingly Easy Domain Adaptation. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2058–2065. [Google Scholar]
Pan, S.J.; Tsang, I.W.; Kwok, J.T.; Yang, Q. Domain Adaptation via Transfer Component Analysis. IEEE Trans. 2011, 22, 199–210. [Google Scholar] [CrossRef] [PubMed]
Long, M.; Wang, J.; Ding, G.; Sun, J.; Yu, P.S. Transfer Feature Learning with Joint Distribution Adaptation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 2200–2207. [Google Scholar]
Wang, J.; Chen, Y.; Hao, S.; Feng, W.; Shen, Z. Balanced Distribution Adaptation for Transfer Learning. In Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), New Orleans, LA, USA, 18–21 November 2017. [Google Scholar]
Wang, J.; Feng, W.; Chen, Y.; Yu, H.; Huang, M.; Yu, P.S. Visual Domain Adaptation with Manifold Embedded Distribution Alignment. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 402–410. [Google Scholar]
Zhu, Y.; Zhuang, F.; Wang, J.; Chen, J.; Shi, Z.; Wu, W.; He, Q. Multi-representation adaptation network for cross-domain image classification. Neural Netw. 2019, 119, 214–221. [Google Scholar] [CrossRef] [PubMed]
Zhuang, F.; Cheng, X.; Luo, P.; Pan, S.J.; He, Q. Supervised Representation Learning: Transfer Learning with Deep Autoencoders. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 4119–4125. [Google Scholar]
Sun, B.; Saenko, K. Deep CORAL: Correlation Alignment for Deep Domain Adaptation. In Proceedings of the ECCV: European Conference on Computer Vision, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016. [Google Scholar]
Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.C.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, ML, USA, 22–27 June 2014; pp. 2672–2680. [Google Scholar]
Durugkar, I.P.; Gemp, I.; Mahadevan, S. Generative Multi-Adversarial Networks. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017. [Google Scholar]
Yu, C.; Wang, J.; Chen, Y.; Huang, M. Transfer Learning with Dynamic Adversarial Adaptation Network. In Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM), Beijing, China, 8–11 November 2019; pp. 778–786. [Google Scholar]
Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the NAACL, Minneapolis, Minnesota, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
Dai, Z.; Yang, Z.; Yang, Y.; Carbonell, J.G.; Le, Q.V.; Salakhutdinov, R. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2978–2988. [Google Scholar]
Du, C.; Sun, H.; Wang, J.; Qi, Q.; Liao, J. Adversarial and Domain-Aware BERT for Cross-Domain Sentiment Analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 4019–4028. [Google Scholar]
Li, Z.; Li, X.; Wei, Y.; Bing, L.; Zhang, Y.; Yang, Q. Transferable End-to-End Aspect-based Sentiment Analysis with Selective Adversarial Learning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 4589–4599. [Google Scholar]
Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
Dong, Y.; Fu, Y.; Wang, L.; Chen, Y.; Dong, Y.; Li, J. A Sentiment Analysis Method of Capsule Network Based on BiLSTM. IEEE Access 2020, 8, 37014–37020. [Google Scholar] [CrossRef]
Tzeng, E.; Hoffman, J.; Zhang, N.; Saenko, K.; Darrell, T. Deep Domain Confusion: Maximizing for Domain Invariance. arXiv 2014, arXiv:1412.3474. [Google Scholar]
Ghifary, M.; Kleijn, W.B.; Zhang, M. Domain Adaptive Neural Networks for Object Recognition. In Proceedings of the 13th Pacific Rim International Conference on Artificial Intelligence, Gold Coast, QLD, Australia, 1–5 December 2014; pp. 898–904. [Google Scholar]
Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep Transfer Learning with Joint Adaptation Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 2208–2217. [Google Scholar]
Ben-David, S.; Blitzer, J.; Crammer, K.; Pereira, F. Analysis of Representations for Domain Adaptation. In Proceedings of the Neural Information Processing Systems 19 (NIPS 2006), Vancouver, BC, Canada, 4–7 December 2006; Schölkopf, B., Platt, J.C., Hofmann, T., Eds.; MIT Press: Cambridge, MA, USA, 2006; pp. 137–144. [Google Scholar]

Figure 1. The architecture of the proposed Global–local Dynamic Adversarial Learning (GLDAL). First, input the Text Sequences into the Feature Extractor to obtain the feature vector distributed in the feature space. Next, there are two main loss functions, namely

L_{y}

and

L_{d}

.

L_{y}

is the classification loss.

L_{d}

is the loss combination of the global domain discriminator and local domain discriminators.

Figure 1. The architecture of the proposed Global–local Dynamic Adversarial Learning (GLDAL). First, input the Text Sequences into the Feature Extractor to obtain the feature vector distributed in the feature space. Next, there are two main loss functions, namely

L_{y}

and

L_{d}

.

L_{y}

is the classification loss.

L_{d}

is the loss combination of the global domain discriminator and local domain discriminators.

Figure 2. Performance of several tasks when searching

ω \in [0, 1]

.

Figure 2. Performance of several tasks when searching

ω \in [0, 1]

.

Table 1. Accuracy (%) for unsupervised domain adaptation based on Bi-GRU-A(The bold stands for the best performance).

Method	C → TR	TR → I	COR → S5	S5 → COR	IM → S2	S2 → IM	AVG
Bi-GRU-A (No Transfer)	40.96	62.04	31.70	29.53	76.64	74.18	52.51
DDC	49.85	61.88	30.95	32.60	76.99	77.72	55.00
DaNN	45.35	63.65	31.31	32.00	78.46	78.40	54.86
DANN	49.01	65.98	32.84	33.53	80.81	78.82	56.83
D-CORAL	50.43	65.06	33.78	33.70	79.13	77.63	56.62
MADA	50.18	66.39	33.12	33.96	81.16	78.98	57.30
DAAN	50.69	66.05	33.09	34.08	80.99	79.02	57.32
GLDAL (Proposed Approach)	51.25	66.53	33.34	34.12	81.34	79.16	57.62

Table 2. Accuracy (%) of unsupervised domain adaptation based on CapsuleNet.

Method	C → TR	TR → I	COR → S5	S5 → COR	IM → S2	S2 → IM	AVG
CapsuleNet (No Transfer)	47.94	63.93	33.25	30.24	83.51	80.22	56.52
DDC	51.77	64.34	35.09	30.98	85.97	86.07	59.20
DaNN	50.16	66.56	34.15	31.49	85.45	86.14	58.99
DANN	53.55	67.72	35.21	34.45	86.11	85.60	60.44
D-CORAL	56.82	67.08	34.08	34.38	86.05	85.72	60.68
JAN	52.97	68.24	34.99	33.82	86.33	86.58	60.48
MADA	54.31	67.65	35.79	34.64	86.96	86.32	60.95
DAAN	54.76	68.78	34.96	34.24	87.21	86.50	61.08
GLDAL (Proposed Approach)	56.97	68.92	34.91	35.07	87.18	86.83	61.64

Table 3. Accuracy (%) for the effective analysis and ablation study of

ω

based on CapsuleNet.

Table 3. Accuracy (%) for the effective analysis and ablation study of

ω

based on CapsuleNet.

Transfer Tasks	Average Search	Random Guessing	GLDAL	DANN ( $ω$ = 1)	MADA ( $ω$ = 0)
S2 → IM	86.15	86.52	86.83	85.60	86.32
S5 → COR	34.98	34.86	35.07	34.45	34.64
TR → I	67.73	68.25	68.92	67.72	67.65
AVG	62.95	63.21	63.61	62.60	62.87

Table 4.

A - d i s t a n c e

for all unsupervised domain adaptation tasks based on backbone CapsuleNet and Bi-GRU-A.

Table 4.

A - d i s t a n c e

for all unsupervised domain adaptation tasks based on backbone CapsuleNet and Bi-GRU-A.

BackBone	RNSF	DAAN	GLDAL
CapsuleNet	0.988	0.952	0.925
Bi-GRU-A	1.053	1.007	0.978

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Multiple requests from the same IP address are counted as one view.

Global-Local Dynamic Adversarial Learning for Cross-Domain Sentiment Analysis

Abstract

1. Introduction

2. Related Work

2.1. Unsupervised Domain Adaption in Transfer Learning

2.2. Cross-Domain Sentiment Analysis

3. Method

3.1. Problem Definition

3.2. Backbones for Cross-Domain Sentiment Analysis

3.2.1. Bidirectional Gate Recurrent Units with Attention

3.2.2. Capsule Neural Network Based on BERT

3.3. Domain Adaption with Adversarial Learning

3.4. Global-Local Dynamic Adversarial Adaption Network

3.4.1. Label Classifier

3.4.2. Global Domain Discriminator

3.4.3. Local Domain Discriminator

3.5. Global and Local Adversarial Factors

3.5.1. The Global Dynamic Adversarial Factor ω

3.5.2. The Local Dynamic Adversarial Factors α u u = 1 U

4. Experiments

4.1. Datasets

4.2. Baselines

4.3. Implementation

4.4. Results

4.5. Effectiveness Analysis and Ablation Study

4.5.1. Analysis of the Importance of the Global Dynamic Adversarial Factor (GDAF) ω in GLDAL

4.5.2. Analysis of the Importance of the Local Dynamic Adversarial Factor (LDAF) α u u = 1 U in GLDAL

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Article Metrics

Article Access Statistics

3.5.1. The Global Dynamic Adversarial Factor $ω$

3.5.2. The Local Dynamic Adversarial Factors ${\{α_{u}\}}_{u = 1}^{U}$

4.5.1. Analysis of the Importance of the Global Dynamic Adversarial Factor (GDAF) $ω$ in GLDAL

4.5.2. Analysis of the Importance of the Local Dynamic Adversarial Factor (LDAF) ${\{α_{u}\}}_{u = 1}^{U}$ in GLDAL