Article

HFGNN-Proto: Hesitant Fuzzy Graph Neural Network-Based Prototypical Network for Few-Shot Text Classification

1 School of Cyber Security and Computer, Hebei University, Baoding 071002, China
2 International Education College, Hebei Finance University, Baoding 071051, China
3 Hebei Machine Vision Engineering Research Center, Hebei University, Baoding 071002, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(15), 2423; https://doi.org/10.3390/electronics11152423
Submission received: 24 June 2022 / Revised: 30 July 2022 / Accepted: 1 August 2022 / Published: 3 August 2022

Abstract

Few-shot text classification aims to recognize new classes with only a few labeled text instances. Previous studies mainly utilized text semantic features to model instance-level relations among only part of the samples. However, such single-relation information makes it difficult for many models to address complicated natural language tasks. In this paper, we propose a novel hesitant fuzzy graph neural network (HFGNN) model that explores the multi-attribute relations between samples, and we combine HFGNN with the Prototypical Network to achieve few-shot text classification. In HFGNN, multiple relations between texts, including instance-level and distribution-level relations, are discovered through dual graph neural networks and fused by hesitant fuzzy set (HFS) theory. In addition, we design a linear function that maps the fused relations to a more reasonable range. The final relations are used to aggregate the information of neighboring instance nodes in the graph to construct more discriminative instance features. Experimental results demonstrate that the classification accuracy of the HFGNN-based Prototypical Network (HFGNN-Proto) on the ARSC, FewRel 5-way 5-shot, and FewRel 10-way 5-shot settings reaches 88.36%, 94.45%, and 89.40%, respectively, exceeding existing state-of-the-art few-shot learning methods.

1. Introduction

In recent years, the great success of deep learning has promoted the development of multitudinous fields such as computer vision and natural language processing [1,2,3], but the effectiveness of deep learning models relies on a large amount of labeled data. The generalization ability of deep learning models is severely limited when labeled data are scarce. Humans, on the other hand, have the ability to learn quickly and can easily build awareness of new things with just a few examples. This significant gap in learning between machine learning models and humans inspires researchers to explore few-shot learning (FSL) [4].
Inspired by the human-learning process, researchers proposed a meta-learning strategy for FSL, which utilizes the distribution of similar tasks to learn how to identify unseen classes accurately and efficiently with a small amount of training data. A cross-task meta-learner learns from multiple similar tasks and provides better initialization for unseen classes based on the knowledge acquired from prior experience. One of the typical meta-learning methods is the Prototypical Network [5], which computes the class prototypical representation of the support set and classifies the query sample to the nearest prototype.
Prototypical Network [5] and its variants [6,7,8,9] have been widely used for few-shot text classification tasks. Different from the Prototypical Network, which computes class prototypes and query sample embeddings separately, MLMAN [6] interactively encodes text based on the matching information between the query and support sets at the local and instance levels. Gao et al. [7] proposed a hybrid attention-based prototypical network that employs instance-level attention and feature-level attention to highlight important instances and features, respectively. Sun et al. [8] improved the Prototypical Network by using feature-level, word-level, and instance-level multi-cross attention. Geng et al. [9] proposed the Induction Network, which induces a better class-level representation using a dynamic routing algorithm [10]. Although the above methods consider the intra-class similarity of the support set and the relations between support and query samples, they ignore inter-class dissimilarity and the relations between query samples. Furthermore, these methods only measure instance-level relations while neglecting other substantive relations. Due to the complexity and diversity of textual forms, simple relations have difficulty describing the true connections between texts and may even introduce additional noise into the model.
Instead, we measure the commonalities and differences among all samples in the task from multiple aspects. Inspired by the Distribution Propagation Graph Neural Network [11], we introduce into few-shot text classification the distribution-level information of each sample with respect to all support samples. To better explore relation information, a dual-graph structure consisting of instance graphs and distribution graphs is adopted in HFGNN. The instance graph models instance-level relations and directional relations based on instance features. The distribution graph aggregates instance-level relations to model distribution-level relations and distance relations. The relations in our model represent the similarity between samples at multiple levels. However, similarity is a fuzzy concept without a clear-cut definition. To address the resulting problem of multiple fuzzy relations, we introduce HFS [12] theory, which handles multi-attribute decision-making problems well, into the HFGNN model. We design the membership functions corresponding to the multi-attribute relations for a comprehensive evaluation, avoiding the loss of relation information. In addition, we use a linear function that further enhances stronger relations and weakens weaker ones, helping the model generate more reasonable graph structures and providing inductive biases for the enhancement of instance features. Finally, a prototypical network takes the instance features generated by HFGNN as input and quickly classifies query samples. HFGNN-Proto adopts an episodic strategy [13] for meta-training in an end-to-end manner. It has a strong generalization ability and can adapt to new classes without retraining.
Our main contributions are summarized as follows:
1. We propose an HFGNN-Proto model that comprehensively considers multiple substantive relations between texts for few-shot text classification. Subsequent ablation experiments demonstrate the effectiveness of multi-attribute relations.
2. To ensure the integrity of multi-attribute relations, we develop a new hesitant fuzzy strategy to fuse all relations into a more precise relational representation.
3. Considering the noise impact of a fully connected graph on information transfer, we design a linear function that provides inductive biases for the transfer of relations in graph neural networks.
4. To verify the effectiveness of our model, we conduct extensive experiments on the ARSC and FewRel datasets. Experimental results demonstrate that the proposed HFGNN-Proto model achieves a significant improvement over other few-shot methods.

2. Related Work

2.1. Few-Shot Learning

Early studies mainly applied fine-tuning [14] and data augmentation [15] to alleviate the overfitting problem caused by insufficient training data but achieved unsatisfactory results. In the meta-learning strategy [16,17,18,19], transferable knowledge that can guide the learning of models is extracted from various tasks, so that the model acquires the ability of learning to learn. Current meta-learning methods mainly include optimization-based methods [18,19,20] and metric-learning methods [5,13,21]. Some representative few-shot learning methods and their corresponding descriptions are listed in Table 1.

2.2. Graph Neural Network

Graph neural networks (GNNs) were originally designed to process graph-structured data. GNNs can efficiently handle data structures containing complex relations and discover potential connections between data with the ability to transform and aggregate neighbors. Some of the GNN models for few-shot tasks are listed in Table 2.
However, these models were all designed for image classification tasks and only transfer instance-level relations in GNNs, which makes it difficult for them to handle elusive NLP tasks. In contrast, the HFGNN model proposed in this study considers relations between samples from multiple perspectives, and the accurate and sufficient relations help the model construct more discriminative features.

2.3. Multi-Criteria Decision-Making

Zadeh [35] proposed fuzzy set theory to address problems related to fuzzy, subjective, and imprecise judgments. However, this theory lacks the ability to solve the problem of multi-criteria decision-making (MCDM). In this regard, Torra [12] proposed HFS, which determines the corresponding evaluation index and membership function according to the different attributes of the elements in the universe. HFS is a powerful tool for solving problems involving many uncertainties.
In recent years, more efficient MCDM methods have been proposed. Deveci et al. [36] explored a novel approach that integrates the Combined Compromise Solution (CoCoSo) with type-2 neutrosophic numbers to overcome the challenging decision process in urban freight transportation tasks. Pamucar et al. [37] developed a novel integrated decision-making model based on Measuring Attractiveness by a Categorical Based Evaluation TecHnique (MACBETH) for calculating the criteria weights and the Weighted Aggregated Sum Product ASsessment (WASPAS) method, under a fuzzy environment with Dombi norms.
Considering the operating efficiency of the graph neural network model and the simplicity and effectiveness of the HFS theory, we introduce the HFS theory instead of other complex MCDM methods into the dual graph neural networks to fuse the relations between few-shot examples.

3. Problem Definition

The few-shot classification task trains a classifier that can accommodate new classes not seen in training, for which only a few examples are available in each class. Usually, the dataset used for this task is divided into two parts: a large training set $C_{train}$ containing a series of categories and a target test set $C_{test}$ with a disjoint set of new classes. $C_{test}$ contains a support set $S$ with a few labeled samples. If the support set contains N categories and each category contains K samples, the target problem is called an N-way K-shot problem. In principle, the classification model can be directly trained with the support set, but the performance will be poor because K is too small. Therefore, it is necessary to perform meta-learning on the training set. Meta-learning extracts transferable knowledge on the training set to help the model perform better few-shot learning on the support set, thereby improving the classification accuracy of the query samples in the test set.
In meta-learning, the episodic strategy [13] constructs training episodes to simulate the FSL test scenario so that the classifier can perform well with a small number of annotations. More specifically, a training episode is formed by first randomly selecting N categories from the training set and then choosing K samples within each selected class to act as the support set $S = \{(x_i, y_i)\}_{i=1}^{m}$ $(m = N \times K)$, as well as a fraction of the remaining samples to serve as the query set $Q = \{(x_q, y_q)\}_{q=1}^{n}$, where $x_i$ represents a sample and $y_i \in \{1, 2, \ldots, N\}$ represents the label corresponding to the sample. During the meta-learning process, the support set is used to train the model to minimize the prediction loss on the query set. Meta-training performs this procedure iteratively, episode by episode, until the model converges.
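As an illustration of how such episodes can be constructed, the sketch below samples one N-way K-shot episode. It is a minimal example rather than the authors' implementation; `dataset` is assumed to be a mapping from class labels to lists of raw texts.

```python
import random

def sample_episode(dataset, n_way, k_shot, n_query):
    """Build one N-way K-shot episode from a {label: [texts]} mapping."""
    classes = random.sample(list(dataset.keys()), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        examples = random.sample(dataset[cls], k_shot + n_query)
        # The first K samples form the support set; the remainder form the query set.
        support += [(text, episode_label) for text in examples[:k_shot]]
        query += [(text, episode_label) for text in examples[k_shot:]]
    return support, query

# Example: one 2-way 5-shot episode with 5 query samples per class.
# support, query = sample_episode(train_data, n_way=2, k_shot=5, n_query=5)
```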

4. Methods

4.1. Overview

As shown in Figure 1, HFGNN-Proto consists of three parts: the text embedding component, the HFGNN component, and the prototypical network component.

4.2. Text Embedding

To better reflect the semantic information of the text, the pre-trained language model BERT [29] is used to extract text semantic feature representations. BERT utilizes a masked language model for pre-training and employs a deep bidirectional transformer component, which is capable of generating language representations containing contextual information, to build the entire model. Given the input text $x = [w_1, w_2, \ldots, w_n]$, a special token [CLS] used for classification is inserted at the beginning of the word sequence, and the output of this token in the last transformer layer is the text embedding. The output of BERT contains information about the context of the text $x$, and it can be represented as $h = f_{emb}(x; \theta_{emb})$, where $h \in \mathbb{R}^d$, $d$ represents the output dimension of BERT, and $\theta_{emb}$ represents the parameters of the BERT encoder, which are fine-tuned during the training process.
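For illustration, the [CLS] extraction can be sketched with the Hugging Face transformers library (the same implementation cited in Section 5.2.2 [40]). This shows only the forward pass; in HFGNN-Proto the encoder parameters $\theta_{emb}$ are further fine-tuned during meta-training.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")  # parameters theta_emb

def embed(texts):
    """Return the [CLS] vector of the last transformer layer, shape (batch, 768)."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    outputs = encoder(**batch)
    return outputs.last_hidden_state[:, 0]  # position 0 holds the [CLS] token

h = embed(["the plot is gripping", "the battery drains far too quickly"])
```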

4.3. HFGNN

4.3.1. Overview

This section introduces the HFGNN model in detail. As shown in Figure 1b, the HFGNN model contains three modules: a relation generator, a relation fusion module, and an instance node updater. In each layer, the multi-attribute relations in task $\mathcal{T}$ are learned by the instance graph $G^{inst}_l = (V^{inst}_l, E^{inst}_l, \mathcal{T})$ and the distribution graph $G^{dist}_l = (V^{dist}_l, E^{dist}_l, \mathcal{T})$ in the generator and fused by HFS theory in the relation fusion module. The fused relation is refined by the linear function and then passed to the instance node updater to update instance features.
More specifically, in the relation generator, the initial features extracted by BERT are used to compute the instance-level relations $E^{inst}_l$ as well as the orientation differences $N^{inst}_l$ of instance features in the instance graph. For the distribution graph, the instance-level relations $E^{inst}_l$ are aggregated to construct distribution features $V^{dist}_l$ of the samples, and distribution-level relations $E^{dist}_l$ and distances $D^{dist}_l$ are computed between distribution features. The above relations are transmitted to the relation fusion module to obtain the hesitant fuzzy relations $R^h_l$. A linear function $F(\cdot)$ converts $R^h_l$ into the final relation representation $R_l$. The instance node updater combines $R_l$ and the instance features $V^{inst}_{l-1}$ to construct a hesitant fuzzy graph $G^h_l = (V^{inst}_{l-1}, R_l, \mathcal{T})$ and updates the instance features by aggregating the information of neighbors. This process is continuously repeated in HFGNN to fully explore the multi-attribute relations between samples in the task. The process of relation transfer is shown in Figure 2.
The specific meanings of some symbols are as follows: $V^{inst}_l = \{v^{inst}_{l,i}\}_{i=1:T}$, $E^{inst}_l = \{e^{inst}_{l,ij}\}_{i,j=1:T}$ and $V^{dist}_l = \{v^{dist}_{l,i}\}_{i=1:T}$, $E^{dist}_l = \{e^{dist}_{l,ij}\}_{i,j=1:T}$ represent the sets of node features and edge features in the instance graph and the distribution graph, respectively; $N^{inst}_l = \{n^{inst}_{l,ij}\}_{i,j=1:T}$ and $D^{dist}_l = \{d^{dist}_{l,ij}\}_{i,j=1:T}$ are the collections of direction relations between instance features and distance relations between distribution features, respectively; $R^h_l = \{r^h_{l,ij}\}_{i,j=1:T}$ and $R_l = \{r_{l,ij}\}_{i,j=1:T}$ represent the sets of hesitant fuzzy relations $r^h_{l,ij}$ and final relations $r_{l,ij}$, respectively; $T = N \times K + \bar{T}$ denotes the number of samples in task $\mathcal{T}$; and $l$ indicates the layer of HFGNN.

4.3.2. Relation Generator

Before the iteration process starts, we need to initialize the nodes in the dual graph. The instance node is initialized by the output of BERT, and the initial node of text $x_i$ in the instance graph is denoted as $v^{inst}_{0,i} = f_{emb}(x_i)$. The node in the distribution graph is an $N \times K$-dimensional vector, in which each element represents the instance-level relation between the sample and one of the support samples. These instance-level relations are aggregated to reflect the overall distribution of the sample in the support set. Since the instance-level relations are unknown at this point, the distribution graph node features are initialized according to the following rule:
$$v^{dist}_{0,i} = \begin{cases} \left[\, \delta(y_i, y_j) \,\right]_{j=1:T'} & \text{if } x_i \text{ belongs to } S, \\[4pt] \left[\, \tfrac{1}{T'}, \ldots, \tfrac{1}{T'} \,\right] & \text{if } x_i \text{ belongs to } Q, \end{cases}$$
where $T' = N \times K$ represents the number of support samples in the task, and $\delta(y_i, y_j)$ is the Kronecker delta function, which outputs one when label $y_i = y_j$ and zero otherwise. $S$ and $Q$ represent the support set and the query set, respectively.
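A minimal PyTorch sketch of this initialization rule is given below; it is an illustration rather than the released implementation, and `support_labels` is assumed to list the episode-internal labels of the $T'$ support samples.

```python
import torch

def init_dist_nodes(support_labels, n_query):
    """Initialize distribution-graph nodes: one row per sample, T' = N*K columns."""
    t_support = len(support_labels)                       # T' = N * K
    labels = torch.tensor(support_labels)
    # Support sample i: Kronecker delta against every support label.
    support_nodes = (labels.unsqueeze(1) == labels.unsqueeze(0)).float()
    # Query samples: uniform 1/T' entries, since their distribution is unknown.
    query_nodes = torch.full((n_query, t_support), 1.0 / t_support)
    return torch.cat([support_nodes, query_nodes], dim=0)  # shape (T, T')

# v0_dist = init_dist_nodes(support_labels=[0, 0, 1, 1], n_query=2)
```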
Instance-level Relations In the first layer, the edge features representing instance-level relations in the instance graph are first computed by the instance-edge-compute function:
$$e^{inst}_{l,ij} = f^{inst}_{e,l}\left( \left( v^{inst}_{l-1,i} - v^{inst}_{l-1,j} \right)^2 \right),$$
where $f^{inst}_{e,l}: \mathbb{R}^m \rightarrow \mathbb{R}$ is a neural network that maps $m$-dimensional edge features to one-dimensional values in a fixed range, consisting of a linear-BN-LeakyReLU block, a single linear layer, and a sigmoid layer, with parameters $\theta^{inst}_{e,l}$.
Direction Relations Then, the cosine similarity between instance features, which reflects the difference in the direction of the features, is calculated:
$$n^{inst}_{l,ij} = \cos\left( v^{inst}_{l-1,i}, v^{inst}_{l-1,j} \right),$$
where the values of the directional relations lie in $[-1, 1]$ and are inversely proportional to the magnitude of the directional difference.
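The two relation types above can be sketched as follows. This is an illustrative PyTorch reading of the definitions, not the authors' code; the hidden width of the edge MLP is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceEdgeMLP(nn.Module):
    """f_e^inst: squared feature difference -> instance-level relation in (0, 1)."""
    def __init__(self, dim, hidden=96):  # hidden width is an assumption
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(dim, hidden), nn.BatchNorm1d(hidden), nn.LeakyReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, v):                                   # v: (T, dim) instance features
        diff = (v.unsqueeze(1) - v.unsqueeze(0)) ** 2       # (T, T, dim) squared differences
        scores = self.block(diff.view(-1, v.size(-1)))      # flatten pairs for BatchNorm
        return torch.sigmoid(scores).view(v.size(0), v.size(0))  # e^inst, shape (T, T)

def direction_relations(v):
    """n^inst: pairwise cosine similarity between instance features, in [-1, 1]."""
    v_norm = F.normalize(v, dim=-1)
    return v_norm @ v_norm.t()
```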
Distribution-level Relations Next, in the distribution graph, instance-level relations are aggregated and distribution-level feature representations are generated through the dist-node-update function:
$$v^{dist}_{l,i} = f^{dist}_{v,l}\left( \left[ e^{inst}_{l,ij} \right]_{j=1:T'} \,\Big\|\, v^{dist}_{l-1,i} \right),$$
where $\|$ represents the concatenation operator and $f^{dist}_{v,l}: \mathbb{R}^{2T'} \rightarrow \mathbb{R}^{T'}$ is composed of a linear layer and LeakyReLU, with parameters $\theta^{dist}_{v,l}$.
The dist-edge-compute function takes distribution features as input and computes edge features representing the distribution-level relations of the samples:
$$e^{dist}_{l,ij} = f^{dist}_{e,l}\left( \left( v^{dist}_{l,i} - v^{dist}_{l,j} \right)^2 \right),$$
where $f^{dist}_{e,l}: \mathbb{R}^{T'} \rightarrow \mathbb{R}$ transforms the distribution features using the combination of a linear-BN-LeakyReLU block, a single linear layer, and a sigmoid layer. The parameters of $f^{dist}_{e,l}$ are denoted $\theta^{dist}_{e,l}$.
Distance Relations In HFGNN, the distance relations between distributed nodes are measured by the following methods:
$$d^{dist}_{l,ij} = \mathrm{SUM}\left( \left| v^{dist}_{l,i} - v^{dist}_{l,j} \right| \right),$$
where $\mathrm{SUM}(\cdot)$ represents the sum of the element-wise distance values over all dimensions.
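The distribution-graph computations can be sketched in the same style. The distribution-level edge function $f^{dist}_{e,l}$ has the same linear-BN-LeakyReLU, linear, sigmoid structure as the instance edge MLP above (applied to distribution nodes) and is therefore omitted; the absolute element-wise difference used in `distance_relations` is an assumption consistent with the prose.

```python
import torch
import torch.nn as nn

class DistNodeUpdate(nn.Module):
    """f_v^dist: concatenate aggregated instance edges with the previous distribution node."""
    def __init__(self, t_support):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * t_support, t_support), nn.LeakyReLU())

    def forward(self, e_inst_support, v_dist_prev):
        # e_inst_support: (T, T') instance-level edges toward the support samples
        # v_dist_prev:    (T, T') distribution nodes from the previous layer
        return self.net(torch.cat([e_inst_support, v_dist_prev], dim=-1))

def distance_relations(v_dist):
    """d^dist: element-wise distances between distribution nodes, summed per pair."""
    return (v_dist.unsqueeze(1) - v_dist.unsqueeze(0)).abs().sum(-1)  # shape (T, T)
```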

4.3.3. Relations Fusion Module

First, we determine the membership function corresponding to each relation. Since the instance-level and distribution-level relations have already been standardized by the sigmoid layer to a one-dimensional value, $f^{inst}_{e,l}$ and $f^{dist}_{e,l}$ can be regarded as the membership functions of the instance-level and distribution-level relations, and $e^{inst}_{l,ij}$ and $e^{dist}_{l,ij}$ are the corresponding membership values. The membership functions of the remaining two relations are defined as follows:
In the l-th iteration, the membership function of the directional relations between instance features is:
$$U^{inst}_{n_l}\left( v^{inst}_{l,i}, v^{inst}_{l,j} \right) = \frac{n^{inst}_{l,ij} + 1}{2}.$$
In the l-th iteration, the membership function of the distribution feature distance relations is:
$$U^{dist}_{d_l}\left( v^{dist}_{l,i}, v^{dist}_{l,j} \right) = \exp\left( -\frac{d^{dist}_{l,ij}}{\sum_{k=1}^{T} d^{dist}_{l,ik}} \right).$$
The membership values obtained from the relation membership functions defined above are all in the range [0, 1] and are proportional to the relation strength. The membership functions convert the relations between the samples into the corresponding hesitant fuzzy sets, and all the membership values of the relations between a sample and itself are set to 1 to construct an ideal HFS $h^P$, which can be used as a standard to measure the similarity between different samples. For example, the HFS between $x_i$ and itself can be represented as $\{1, 1, 1, 1\}$, while the HFS between $x_i$ and $x_j$ is $\left\{ e^{inst}_{l,ij},\; e^{dist}_{l,ij},\; \frac{n^{inst}_{l,ij}+1}{2},\; \exp\left( -\frac{d^{dist}_{l,ij}}{\sum_{k=1}^{T} d^{dist}_{l,ik}} \right) \right\}$.
The similarity between an HFS and the ideal set can be measured by a distance metric, and it is inversely proportional to the resulting distance. Here, the hesitant standard Euclidean distance is used as the measure. Assuming $h_{l,ij}$ is the HFS between $x_i$ and $x_j$, the hesitant standard Euclidean distance between $h_{l,ij}$ and $h^P$ can be expressed as:
$$d_{ghnh}\left( h_{l,ij}, h^P \right) = \left[ \frac{1}{l_h} \sum_{\beta=1}^{l_h} \left| h^{\sigma(\beta)}_{l,ij} - h^{P\,\sigma(\beta)} \right|^2 \right]^{\frac{1}{2}},$$
where $l_h$ represents the larger of the numbers of elements in the HFSs $h_{l,ij}$ and $h^P$, and $h^{\sigma(\beta)}_{l,ij}$ and $h^{P\,\sigma(\beta)}$ are the $\beta$-th largest values in $h_{l,ij}$ and $h^P$, respectively. The similarity between HFSs, that is, the hesitant fuzzy relation $r^h_{l,ij}$, can be expressed as:
$$r^h_{l,ij} = 1 - d_{ghnh}\left( h_{l,ij}, h^P \right).$$
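Putting the four membership functions and the hesitant standard Euclidean distance together, the fusion step can be sketched as follows. All four inputs are assumed to be $T \times T$ matrices, and the small constant guarding the division is an implementation assumption.

```python
import torch

def hesitant_fuzzy_relation(e_inst, e_dist, n_inst, d_dist):
    """Fuse four (T, T) relation matrices into r^h via the hesitant Euclidean distance."""
    u_inst = e_inst                                    # already in [0, 1]
    u_dist = e_dist
    u_dir = (n_inst + 1.0) / 2.0                       # map cosine values from [-1, 1] to [0, 1]
    u_d = torch.exp(-d_dist / d_dist.sum(dim=-1, keepdim=True).clamp_min(1e-8))
    memberships = torch.stack([u_inst, u_dist, u_dir, u_d], dim=-1)   # (T, T, 4)
    # Ideal HFS h^P = {1, 1, 1, 1}: hesitant Euclidean distance to it, then similarity.
    dist_to_ideal = torch.sqrt(((memberships - 1.0) ** 2).mean(dim=-1))
    return 1.0 - dist_to_ideal                         # r^h, shape (T, T)
```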
Hesitant fuzzy relations contain multi-attribute similarity information. Considering that the nodes in the graph are easily affected by noise from adjacent nodes with low correlation, the hesitant fuzzy relations are further adjusted by a linear function $F(\cdot)$:
$$F\left( r^h_{l,ij} \right) = \begin{cases} 1 & \text{if } r^h_{l,ij} > \beta, \\ r^h_{l,ij} & \text{if } \alpha \le r^h_{l,ij} \le \beta, \\ 0 & \text{if } r^h_{l,ij} < \alpha, \end{cases}$$
where $F(\cdot)$ further strengthens the relations with high similarity ($r^h_{l,ij} > \beta$) while eliminating the relations with extremely low similarity ($r^h_{l,ij} < \alpha$). The transformed final relation $r_{l,ij}$ carries strong inductive biases.
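A direct sketch of $F(\cdot)$, using the thresholds $\alpha = 0.3$ and $\beta = 0.7$ reported in Section 5.2.2:

```python
import torch

def refine_relations(r_h, alpha=0.3, beta=0.7):
    """F(.): strengthen confident relations and cut off weak ones."""
    r = r_h.clone()
    r[r_h > beta] = 1.0   # strong relations are amplified to 1
    r[r_h < alpha] = 0.0  # weak relations are removed, so the graph is no longer fully connected
    return r
```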

4.3.4. Instance-Node Updater

The instance node updater combines $r_{l,ij}$ and $V^{inst}_{l-1}$ to construct a hesitant fuzzy graph. Note that the hesitant fuzzy graph is no longer fully connected, since part of the relations in $R_l$ are zero. The inst-node-update function in the hesitant fuzzy graph aggregates neighbors to update instance features:
$$v^{inst}_{l,i} = f^{h}_{v,l}\left( \sum_{j} r_{l,ij}\, v^{inst}_{l-1,j} \;\Big\|\; \sum_{j} r_{l,ij}\, v^{inst}_{l-1,i} \right),$$
where $\|$ represents the concatenation operator and $f^{h}_{v,l}$ is a linear-BN-LeakyReLU block with parameters $\theta^{h}_{v,l}$. The completion of the update marks the end of an iteration. In HFGNN, the instance features $v^{inst}_{l,i}$ output by the current layer are taken as the input of the next layer to start a new round of iteration, and the above process is repeated until more discriminative instance features are constructed.
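One literal reading of the update rule above is sketched below. The linear-BN-LeakyReLU layout follows the stated structure of $f^h_{v,l}$, while the exact form of the two concatenated terms is an assumption.

```python
import torch
import torch.nn as nn

class InstanceNodeUpdate(nn.Module):
    """f_v^h: aggregate neighbors with the fused relations r and update instance features."""
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.BatchNorm1d(dim), nn.LeakyReLU()
        )

    def forward(self, r, v_prev):
        # r: (T, T) final relations; v_prev: (T, dim) instance features from layer l-1
        neighbor_part = r @ v_prev                           # sum_j r_ij * v_j
        self_part = r.sum(dim=-1, keepdim=True) * v_prev     # (sum_j r_ij) * v_i
        return self.block(torch.cat([neighbor_part, self_part], dim=-1))
```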

4.4. Prototypical Network

The Prototypical Network embeds the instance features enhanced by HFGNN into the prototypical space, computes the class prototypical representation, and classifies query samples. The class prototype representation is the mean of instance embeddings for each class in the support set:
$$c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_{\phi}(v_i),$$
where $v_i$ represents the instance feature from the last layer of the HFGNN, $f_{\phi}$ is a linear layer with learnable parameters $\theta_{\phi}$, and $S_k$ represents the support samples labeled as category $k$. The probability that query $x$ belongs to a category in support set $S$ can be calculated as follows:
$$p_{\phi}(y = k \mid x) = \frac{\exp\left( -d\left( f_{\phi}(x), c_k \right) \right)}{\sum_{k'} \exp\left( -d\left( f_{\phi}(x), c_{k'} \right) \right)},$$
where $d(\cdot)$ is the distance function between vectors. The cross-entropy loss function is used to train HFGNN-Proto, and the parameters $\theta_{emb}$, $\theta^{inst}_{e,l}$, $\theta^{dist}_{v,l}$, $\theta^{dist}_{e,l}$, $\theta^{h}_{v,l}$, and $\theta_{\phi}$ in the model are optimized by minimizing the following loss:
$$\mathcal{L} = -\log p_{\phi}(y = k \mid x).$$
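The prototype computation, the distance-based softmax, and the loss can be sketched jointly. The squared Euclidean distance is used here as a concrete choice of $d(\cdot)$, following the original Prototypical Network; `support_emb` and `query_emb` are assumed to be the $f_{\phi}$-projected HFGNN features.

```python
import torch
import torch.nn.functional as F

def prototypical_loss(support_emb, support_labels, query_emb, query_labels, n_way):
    """Class prototypes, softmax over negative squared distances, cross-entropy loss."""
    protos = torch.stack(
        [support_emb[support_labels == k].mean(dim=0) for k in range(n_way)]
    )                                                        # (n_way, dim)
    logits = -torch.cdist(query_emb, protos) ** 2            # (n_query, n_way)
    loss = F.cross_entropy(logits, query_labels)
    accuracy = (logits.argmax(dim=-1) == query_labels).float().mean()
    return loss, accuracy
```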

5. Experiments

5.1. Datasets

We conduct experiments on two widely used few-shot text classification datasets to evaluate HFGNN-Proto.
The Amazon Review Sentiment Classification (ARSC) dataset contains English reviews for 23 domains of products on Amazon. For each product domain, Yu et al. [26] constructed three binary classification tasks with different scoring thresholds, yielding 23 × 3 = 69 tasks in total. Following previous works, 12 tasks in 4 domains (Books, DVD, Electronics, and Kitchen) are selected as the target test set, and each category in the test set contains only five labeled support samples. This constitutes a 2-way 5-shot classification setting on this dataset.
The Few-Shot Relation Classification (FewRel) dataset [38] contains 100 relations from Wikipedia, and each relation consists of 700 instances. The numbers of relations in FewRel used for the training, validation, and test sets are 64, 16, and 20, respectively. Following the settings used by Sun et al. [6], we conduct 5-way 5-shot and 10-way 5-shot experiments on the FewRel dataset. It should be noted that the labels of the test samples in this dataset have not been released to the public. The classification results are submitted to the FewRel evaluation website [39] provided by Gao et al. [38] for online evaluation of effectiveness.

5.2. Implementation Details

5.2.1. Meta-Training and Meta-Testing

The experiments on ARSC consist of 20,000 iterations with eight episodes randomly selected for meta-training in each iteration, while FewRel consists of 10,000 iterations with four episodes per iteration. Our model is trained by the Adam optimizer with an initial learning rate of $1 \times 10^{-5}$ and a weight decay rate of $5 \times 10^{-8}$. The learning rate decays by 0.01 every 100 iterations.
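The optimizer configuration above corresponds to the following sketch. Reading "decays by 0.01 every 100 iterations" as a multiplicative factor of 0.99 per 100 steps is an assumption, and the placeholder module merely stands in for the full HFGNN-Proto model.

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 128)  # placeholder standing in for the full HFGNN-Proto model

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=5e-8)
# One reading of the decay schedule: multiply the learning rate by 0.99 every 100 iterations.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.99)
```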
Following previous work, few-shot classification accuracy is used as the evaluation metric for performance. For FewRel, every 1000 iterations, the model is evaluated on 600 randomly drawn validation episodes to find the best parameters. The model with the best parameters is finally applied to the test set, which contains 10,000 test episodes. The classification results are uploaded and evaluated online. For ARSC, tests are performed every 100 iterations. Note that the support set for testing in ARSC was determined by Yu et al. [26]. Consequently, we just need to sample the query from the target task to form the test episode. The average classification accuracy of the 12 target tasks in ARSC is used as the final accuracy.

5.2.2. Parameters Setting

The text encoder adopts Hugging Face's implementation of BERT (base version) [40], and the parameters of the encoding layer are initialized by the pre-trained model publicly provided by Google. BERT-base converts text into 768-dimensional feature vectors. The HFGNN-Proto model is configured with two GNN layers ($l = 2$) to prevent features from being over-smoothed. In order to maintain the balance between the degrees of weakening and strengthening of the relations by the linear function $F(\cdot)$, we keep the sum of $\alpha$ and $\beta$ fixed at 1. We perform small-scale experiments on a small subset of ARSC, following the experimental setup above, to determine the values of $\alpha$ and $\beta$. According to the results shown in Figure 3, the thresholds $\alpha$ and $\beta$ are set to 0.3 and 0.7, respectively. The linear layer $f_{\phi}(\cdot)$ in the prototypical network embeds features into a 128-dimensional prototypical space.

5.3. Results and Analysis

We compare the effect of HFGNN-Proto on the ARSC and FewRel datasets with many baseline models. The experimental results are shown in Table 3 and Table 4, respectively.
Results on ARSC The results in Table 3 show that HFGNN-Proto achieves a classification accuracy of 88.36% and outperforms most previous baseline models. Some existing FSL methods, such as Matching Network [13], MAML [18], and Relation Network [21], perform poorly on ARSC despite the fact that they possess outstanding performance in the vision domain. Induction Network [9] uses a routing mechanism to induce better class representations, which improves the classification accuracy to a new level, but the performance of HFGNN-Proto still far exceeds it. Compared with these methods, the reason why HFGNN-Proto improves the classification accuracy is that it fully learns the potential relations of samples in the meta-task at the multi-attribute level.
Results on FewRel The results in Table 4 show that the 5-way 5-shot and 10-way 5-shot classification accuracies of HFGNN-Proto on the FewRel dataset are 94.45% and 89.40%, respectively, which both exceed the state-of-the-art method BERT-PAIR [28]. The classification performance in the 5-way 5-shot scenario is 1.23% higher than that of BERT-PAIR, and the improvement in the 10-way 5-shot setting is even more significant, reaching 2.38% over BERT-PAIR. EGNN-Proto [42] also uses the combination of GNNs and the Prototypical Network, but its performance is far below that of our model. EGNN-Proto uses a fully connected graph structure to transmit instance-level relations and edge label information, a process that suffers from single relational information and from the noise induced by a fully connected structure. The relations in HFGNN-Proto are more sufficient, and the function $F(\cdot)$ provides a more reasonable graph structure for the update of instance features and effectively avoids the influence of irrelevant noise.

5.4. Comparison with PLMs

We also compare the performance of HFGNN-Proto with the pretrained language model GPT and its variants on the real-world few-shot classification datasets constructed by Alex et al. [43]. The results are shown in Table 5. We can see that the performance of GPT-2 [44] and GPT-Neo [43] is far inferior to our model, while GPT-3 [45] with up to 175 billion parameters achieves the highest performance. It is worth noting that our proposed HFGNN-Proto method achieves performance close to GPT-3 with several orders of magnitude fewer parameters than GPT-3.

5.5. Ablation Experiment

To analyze the influence of different modules in HFGNN-Proto on the performance, we conduct ablation experiments on ARSC.
The abundant information in multi-attribute relations builds more reasonable feature representations, and measuring multiple relations is crucial to the performance of the model. To support this idea, we compare the classification performance of HFGNN-Proto with different relation combinations. The experimental results in Table 6 show that the accuracy is greatly improved after HFGNN-Proto learns distribution-level relations on the basis of the initial instance-level relations, and the best performance is achieved when all relations mentioned in Section 4.3.2 are considered, corresponding to the best result reported in Table 3.
We further report the effects of the HFS strategy, the linear function $F(\cdot)$, and the number of GNN layers on model performance, as shown in Table 7. We can see that the best performance is achieved when the number of GNN layers is 2, corresponding to the result reported in Table 3; adding more GNN layers does not further improve performance. The table also demonstrates the effectiveness of HFS and the linear function $F(\cdot)$. HFGNN-Proto performs poorly when we replace HFS with an averaging strategy to process the relations, which proves the contribution of HFS to the performance. On the other hand, the performance of the model without $F(\cdot)$ is significantly lower, indicating that our designed function generates a more reasonable graph structure and provides inductive biases for the transfer of relations.

5.6. Visualization

To further analyze the benefit of HFGNN on instance features, we randomly select an episode from the FewRel 10-way 5-shot test set and visualize the support set features before and after HFGNN transformation through t-SNE, as shown in Figure 4. The initial features extracted by BERT diverge in space and are interleaved with each other. After the transformation of the first layer in the HFGNN, the features belonging to the same class are aggregated, and the features of different classes are far away from each other. After the second transformation, the distribution of features in the space is further improved. These results fully demonstrate the effectiveness of the HFGNN in discovering relations between samples and enhancing instance features.
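A visualization of this kind can be reproduced with a standard t-SNE projection, for example with scikit-learn and matplotlib; this is an illustration, not the exact figure-generation code used in the paper.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_tsne(features, labels, title):
    """Project (n_samples, dim) features to 2-D and color the points by class."""
    coords = TSNE(n_components=2, init="pca", random_state=0).fit_transform(np.asarray(features))
    plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=20)
    plt.title(title)
    plt.show()

# plot_tsne(bert_features, support_labels, "Initial features extracted by BERT")
```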

5.7. Limitations

General GNNs can efficiently perform edge classification: the edges between nodes represent the similarity between samples, and the classification results can be generated directly from the edges. However, our proposed HFGNN method does not support edge classification, because we modify the edges of the hesitant fuzzy graph with the linear function $F(\cdot)$ defined in Section 4.3.3. Edges with weak relations are directly cut off, which prevents HFGNN from performing efficient edge classification.
Another limitation of HFGNN-Proto is that the model can only handle English tasks currently, and it is still a challenge for our model to handle few-shot tasks in other languages.

6. Conclusions

In this paper, we propose an HFGNN-Proto model that can fully explore the multi-attribute relations between samples for few-shot text classification. Abundant relation information helps the model better handle complex NLP tasks. Relations are transmitted in a dual graph and integrated by HFS theory. The use of HFS effectively avoids the loss of information and improves the accuracy of the overall relation representation. Moreover, the linear function further improves the rationality and accuracy of the relations, which helps the model construct a more accurate hesitant fuzzy graph for message transmission and provides strong inductive biases for feature enhancement. Finally, a prototypical network performs quick and efficient classification based on the enhanced features. HFGNN-Proto achieves better generalization on unseen tasks given its ability to discover precise potential connections between texts. Experimental results demonstrate that HFGNN-Proto outperforms existing state-of-the-art few-shot models. In the future, we will focus on exploring more substantial relations among few-shot samples, trying other MCDM methods to handle relations, and generalizing HFGNN-Proto to few-shot tasks in other domains.

Author Contributions

Conceptualization, X.G.; methodology, X.G. and X.T.; software, X.G.; validation, X.G.; formal analysis, X.G.; investigation, X.G.; resources, X.G.; data curation, X.G.; writing—original draft preparation, X.G.; writing—review and editing, X.G. and X.T.; visualization, X.G.; supervision, X.T. and B.T.; project administration, X.G.; funding acquisition, X.T. and B.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Natural Science Foundation of Hebei Province, China (No. F2019201329) and the Key Project of the Science and Technology Research Program in the University of Hebei Province, China (No. ZD2019131).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Johnson, R.; Zhang, T. Deep pyramid convolutional neural networks for text categorization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 562–570.
2. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
3. Kuang, S.; Li, J.; Branco, A.; Luo, W.; Xiong, D. Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; pp. 1767–1776.
4. Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a few examples: A survey on few-shot learning. ACM Comput. Surv. 2020, 53, 1–34.
5. Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 2017, 30, 2153.
6. Ye, Z.X.; Ling, Z.H. Multi-Level Matching and Aggregation Network for Few-Shot Relation Classification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2872–2881.
7. Gao, T.; Han, X.; Liu, Z.; Sun, M. Hybrid attention-based prototypical networks for noisy few-shot relation classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 6407–6414.
8. Sun, S.; Sun, Q.; Zhou, K.; Lv, T. Hierarchical attention prototypical networks for few-shot text classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 476–485.
9. Geng, R.; Li, B.; Li, Y.; Zhu, X.; Jian, P.; Sun, J. Induction Networks for Few-Shot Text Classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3904–3913.
10. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. Adv. Neural Inf. Process. Syst. 2017, 30, 2100.
11. Yang, L.; Li, L.; Zhang, Z.; Zhou, X.; Zhou, E.; Liu, Y. DPGN: Distribution propagation graph network for few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 13390–13399.
12. Torra, V. Hesitant fuzzy sets. Int. J. Intell. Syst. 2010, 25, 529–539.
13. Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. Adv. Neural Inf. Process. Syst. 2016, 29, 1804.
14. Donahue, J.; Jia, Y.; Vinyals, O.; Hoffman, J.; Zhang, N.; Tzeng, E.; Darrell, T. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 647–655.
15. Salamon, J.; Bello, J.P. Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Process. Lett. 2017, 24, 279–283.
16. Zhang, R.; Che, T.; Ghahramani, Z.; Bengio, Y.; Song, Y. MetaGAN: An adversarial approach to few-shot learning. Adv. Neural Inf. Process. Syst. 2018, 31, 1207.
17. Sun, Q.; Liu, Y.; Chua, T.-S.; Schiele, B. Meta-transfer learning for few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 403–412.
18. Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 1126–1135.
19. Ravi, S.; Larochelle, H. Optimization as a model for few-shot learning. In Proceedings of the 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, 24–26 April 2017; pp. 1–11.
20. Mishra, N.; Rohaninejad, M.; Chen, X.; Abbeel, P. A Simple Neural Attentive Meta-Learner. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–17.
21. Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.; Hospedales, T.M. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1199–1208.
22. Jiang, X.; Havaei, M.; Chartrand, G.; Chouaib, H.; Vincent, T.; Jesson, A.; Chapados, N.; Matwin, S. Attentive task-agnostic meta-learning for few-shot text classification. In Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019; pp. 1–14.
23. Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese neural networks for one-shot image recognition. In Proceedings of the Conference and Workshop on the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015.
24. Abdelaziz, M.; Zhang, Z. Multi-scale Kronecker-product relation networks for few-shot learning. Multimed. Tools Appl. 2022, 81, 6703–6722.
25. Han, M.; Wang, R.; Yang, J.; Xue, L.; Hu, M. Multi-scale feature network for few-shot learning. Multimed. Tools Appl. 2020, 79, 11617–11637.
26. Yu, M.; Guo, X.; Yi, J.; Chang, S.; Potdar, S.; Cheng, Y.; Tesauro, G.; Wang, H.; Zhou, B. Diverse Few-Shot Text Classification with Multiple Metrics. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), New Orleans, LA, USA, 1–6 June 2018; pp. 1206–1215.
27. Sui, D.; Chen, Y.; Mao, B.; Qiu, D.; Liu, K.; Zhao, J. Knowledge Guided Metric Learning for Few-Shot Text Classification. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 3266–3271.
28. Gao, T.; Han, X.; Zhu, H.; Liu, Z.; Li, P.; Sun, M.; Zhou, J. FewRel 2.0: Towards More Challenging Few-Shot Relation Classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 6250–6255.
29. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
30. Luo, Q.; Liu, L.; Lin, Y.; Zhang, W. Don’t miss the labels: Label-semantic augmented meta-learner for few-shot text classification. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 1–6 August 2021; pp. 2773–2782.
31. Lee, J.-H.; Ko, S.-K.; Han, Y.-S. SALNet: Semi-supervised few-shot text classification with attention-based lexicon construction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; pp. 13189–13197.
32. Garcia, V.; Bruna, J. Few-shot learning with graph neural networks. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–13.
33. Liu, Y.; Lee, J.; Park, M.; Kim, S.; Yang, E.; Hwang, S.J.; Yang, Y. Learning to propagate labels: Transductive propagation network for few-shot learning. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019; pp. 1–11.
34. Kim, J.; Kim, T.; Kim, S.; Yoo, C.D. Edge-labeling graph neural network for few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11–20.
35. Zadeh, L.A. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst. 1978, 1, 3–28.
36. Deveci, M.; Pamucar, D.; Gokasar, I.; Delen, D.; Wu, Q.; Simic, V. An analytics approach to decision alternative prioritization for zero-emission zone logistics. J. Bus. Res. 2022, 146, 554–570.
37. Pamucar, D.; Torkayesh, A.E.; Deveci, M.; Simic, V. Recovery center selection for end-of-life automotive lithium-ion batteries using an integrated fuzzy WASPAS approach. Expert Syst. Appl. 2022, 206, 117827.
38. Han, X.; Zhu, H.; Yu, P.; Wang, Z.; Yao, Y.; Liu, Z.; Sun, M. FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4803–4809.
39. FewRel Evaluation Website. Available online: https://thunlp.github.io/1/fewrel1.html (accessed on 3 January 2022).
40. Hugging Face’s Implementation of Bert-Base-Uncased. Available online: https://huggingface.co/bert-base-uncased (accessed on 6 November 2021).
41. Munkhdalai, T.; Yu, H. Meta networks. In Proceedings of the International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 2554–2563.
42. Lyu, C.; Liu, W.; Wang, P. Few-shot text classification with edge-labeling graph neural network-based prototypical network. In Proceedings of the 28th International Conference on Computational Linguistics, Online, 8–13 December 2020; pp. 5547–5552.
43. Alex, N.; Lifland, E.; Tunstall, L.; Thakur, A.; Maham, P.; Riedel, C.J.; Hine, E.; Ashurst, C.; Sedille, P.; Carlier, A. RAFT: A real-world few-shot text classification benchmark. arXiv 2021, arXiv:2109.14076.
44. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9.
45. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
Figure 1. The overall framework of HFGNN-Proto. An example of a 2-way 2-shot task is illustrated. Different colored circles represent different classes. Circles with special symbols represent distribution features, and those without symbols represent instance features. Solid circles represent support samples, and dashed circles represent query samples. (a) the process of feature extraction; (b) The HFGNN component discovers, transmits, and integrates multi-attribute relations between samples to enhance embedding features. Note that the HFGNN contains several layers. For simplicity, only one layer is shown in the figure. The detailed process of this framework is described in Section 4.3. (c) The prototypical network component takes the augmented features as input and classifies query samples.
Figure 2. The process of multi-attribute relation transfer in HFGNN.
Figure 3. Accuracy of the model on a subset of ARSC with different values of α and β .
Figure 4. t-SNE visualization of support sample features in the FewRel 10-way 5-shot task. Different colors represent different categories. For visual representation, we annotated 10 different categories with numbers 1–10. (a) initial feature vectors extracted by BERT; (b) feature vectors after being transformed by the first layer of the HFGNN; (c) feature vectors obtained from the last layer of the HFGNN.
Table 1. Representative few-shot learning methods and their descriptions.

Type | Core Idea | Method | Description
Optimization-based Methods | This type of approach aims to learn to optimize the model parameters given the gradients computed from the few-shot examples. | MAML [18] | MAML trains a set of initialization parameters; on the basis of these initial parameters, one or more steps of gradient adjustment can quickly adapt the model to new tasks with only a small amount of data.
 | | SNAIL [20] | A novel combination of temporal convolution and soft attention to learn the optimal optimization strategy.
 | | ATAML [22] | ATAML facilitates task-agnostic representation learning through task-agnostic parameterization and enhances the adaptability of the model to specific tasks through an attention mechanism.
Metric Learning | In metric-based methods, instances are mapped into the feature space, the distances between the query and support sets are measured, and the classification is completed using the nearest-neighbor concept. | Siamese Network [23] | The Siamese network contains two parallel neural networks that are trained to extract pair-wise sample features, and the Euclidean distance between features is measured.
 | | Matching Network [13] | It generates a weighted K-nearest-neighbor classifier based on the cosine distance between sample features.
 | | Relation Network [21] | Different from the Siamese network and Matching network, which adopt a single fixed metric, the Relation network compares relations with a nonlinear metric learned by a neural network.
 | | MsKPRN [24] | MsKPRN extends the Relation Network to be position-aware and integrates multi-scale features.
 | | MSFN [25] | MSFN learns a multi-scale feature space, and similarities between the multi-scale and class representations are computed.
 | | Adaptive Metric Learning Model [26] | Yu et al. [26] proposed an adaptive metric learning model that automatically determines the best weighted combination for emerging few-shot tasks from a set of metrics obtained by meta-learning.
 | | Knowledge-Guided Relation Network [27] | Sui et al. [27] proposed a knowledge-guided metric model that uses external knowledge to imitate human knowledge and generates relational networks that can apply different metrics to different tasks.
Other Methods | There is no unified core idea for these methods. They solve few-shot tasks in different ways, but all achieve competitive results. | BERT-PAIR [28] | BERT-PAIR combines the query and each support sample into a sequence and utilizes BERT [29] to predict whether each pair expresses the same class.
 | | LsSAML [30] | It utilizes the information implied by class labels to help pretrained language models extract more discriminative features.
 | | SALNet [31] | This method trains a classifier from labeled data through an attention mechanism, collects lexicons containing important words for each category, and then uses new data labeled by the combination of classifiers and lexicons to guide the learning of the classifier.
Table 2. Some of the current GNN models for few-shot tasks and their descriptions.

Model | Description
Simple GNN [32] | Garcia et al. [32] constructed a graph model in which the query and all support samples are closely connected and used a node-focused GNN to transfer instance-level relations and label information.
TPN [33] | This method further considers the relations among query samples.
EGNN [34] | It adopts an edge-labeling framework to explicitly model the intra-class similarity and inter-class dissimilarity of samples, and dynamically updates node and edge features to achieve complex information interactions.
Table 3. Comparison of the average classification accuracy (%) on ARSC.

Model | Mean ACC
Matching Network [13] | 65.73
Prototypical Network [5] | 68.17
MAML [18] | 78.33
Graph Network [32] | 82.61
Relation Network [21] | 83.07
SNAIL [20] | 82.57
ROBUSTTC-FSL [26] | 83.12
Induction Network [9] | 85.63
HFGNN-Proto (ours) | 88.36
Table 4. Comparison of classification accuracy (%) on FewRel.

Model | 5-Way 5-Shot | 10-Way 5-Shot
Finetune [36] | 68.66 | 55.04
kNN [36] | 68.77 | 55.87
Meta Network [41] | 80.57 | 69.23
Graph Network [32] | 81.28 | 64.02
SNAIL [20] | 79.40 | 68.33
Prototypical Network [5] | 89.05 | 81.46
HATT-Proto [7] | 90.12 | 83.05
HAPN [8] | 91.02 | 84.16
EGNN-Proto [42] | 92.29 | 86.09
BERT-PAIR [28] | 93.22 | 87.02
HFGNN-Proto (ours) | 94.45 | 89.40
Table 5. Comparison of HFGNN-Proto and PLMs. ADE and NIS are datasets constructed by Alex et al. [43].

Model | ADE | NIS
GPT-2 [44] | 60.0 | 56.1
GPT-Neo [43] | 45.2 | 40.8
GPT-3 [45] | 68.6 | 67.9
HFGNN-Proto | 68.4 | 67.5
Table 6. Classification accuracy (%) of HFGNN-Proto with different relation combinations on ARSC. IR stands for instance-level relations and DR stands for distribution-level relations.

Relation | Accuracy
IR | 87.15
IR + DR | 87.99
All relations | 88.36
Table 7. Further ablation study of HFGNN-Proto on ARSC. LF stands for linear function. Average means that the average strategy is used instead of HFS to handle the relations.

Strategy | GNN Layers | Accuracy (%)
HFS + LF | 2 | 88.36
HFS + LF | 1 | 87.79
HFS + LF | 3 | 88.13
HFS + LF | 4 | 87.96
HFS | 2 | 87.46
Average | 2 | 86.22
Average + LF | 2 | 87.29