Prompt-Based Graph Convolution Adversarial Meta-Learning for Few-Shot Text Classification
Abstract
1. Introduction
- We propose a Prompt-based Graph Convolutional Adversarial (PGCA) meta-learning network framework to address the overfitting issue in few-shot text classification.
- We design a GCN-based meta-knowledge extractor to fully use the limited knowledge by obtaining node-relationship information through inter-instance interactions. We also integrate adversarial networks into the meta-learning framework to extend the sample space through adversarial training and improve the model’s generalization ability.
- Experiments on four publicly available datasets show that our method outperforms several competitive few-shot text classification baselines.
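To make the GCN-based meta-knowledge extractor concrete, the sketch below shows a single graph-convolution layer over a fully connected graph of episode instances, using the standard symmetric normalization. This is a minimal illustration of inter-instance message passing, not the paper's exact architecture; the dimensions and the fully connected adjacency are assumptions for the toy example.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step over instance nodes.

    H: (n, d) node features; A: (n, n) adjacency with self-loops;
    W: (d, d_out) learnable weight matrix. Applies the symmetric
    normalization D^{-1/2} A D^{-1/2} followed by a ReLU.
    """
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt
    return np.maximum(A_hat @ H @ W, 0.0)  # ReLU activation

# Toy episode: 4 support instances, fully connected graph (incl. self-loops).
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8))   # instance embeddings (assumed dims)
A = np.ones((4, 4))
W = rng.standard_normal((8, 8))
H_out = gcn_layer(H, A, W)        # node features enriched by neighbors
```

Each output row now mixes information from every other instance in the episode, which is the node-relationship interaction the extractor relies on.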
2. Related Work
2.1. Meta-Learning
- (1) Optimization-based meta-learning methods aim to learn a well-initialized, generalizable model that can converge and adapt quickly to new tasks with only a few training samples. This line of work is typified by MAML [14], which learns a generic parameter initialization from which a model reaches strong performance on a new task after only a few gradient steps [12,18]. To further improve adaptation to new tasks, ref. [17] first introduced adversarial networks into the meta-learning architecture, and ref. [19] mitigated meta-learning overfitting with an adaptive meta-learner based on gradient similarity.
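MAML's two-level optimization can be illustrated with a toy problem (a quadratic loss per task rather than a text classifier; the losses, learning rates, and task centers below are assumptions for the sketch): the inner loop adapts the shared initialization to each task with one gradient step, and the outer loop updates the initialization so that this adaptation works well.

```python
import numpy as np

def maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.1):
    """One meta-update of MAML on toy per-task losses L_t(w) = ||w - c_t||^2.

    Inner loop: adapt theta separately to each task with one gradient step.
    Outer loop: move theta in the direction that improves post-adaptation loss.
    """
    meta_grad = np.zeros_like(theta)
    for c in tasks:
        grad = 2.0 * (theta - c)                 # inner-loop gradient of L_t
        theta_prime = theta - inner_lr * grad    # task-adapted parameters
        # Outer gradient of L_t(theta') w.r.t. theta; for this quadratic
        # loss the chain-rule factor d(theta')/d(theta) is (1 - 2*inner_lr).
        meta_grad += 2.0 * (theta_prime - c) * (1.0 - 2.0 * inner_lr)
    return theta - outer_lr * meta_grad / len(tasks)

theta = np.zeros(2)
tasks = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # two toy task optima
for _ in range(100):
    theta = maml_step(theta, tasks)
# theta converges toward [0.5, 0.5], the initialization from which one
# gradient step adapts best to either task.
```

The key design point is that the outer update differentiates *through* the inner adaptation step, which is what distinguishes MAML from simply averaging per-task solutions.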
- (2) Metric-based meta-learning methods aim to learn a metric or distance function that measures the similarity between samples across tasks. Prototypical networks [20] apply a clustering idea: support-set instances are projected into a metric space, the per-class embedding means under the Euclidean metric serve as class prototypes, and test samples are classified by their distance to each prototype. However, prototypical networks are sensitive to extreme samples; ref. [21] introduced tagged-word information to construct class prototypes and reduce this influence.
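The prototype computation and nearest-prototype classification described above can be sketched as follows. This is a minimal NumPy version that assumes embeddings have already been computed by an encoder; the 2-D toy embeddings are illustrative only.

```python
import numpy as np

def prototype_classify(support, support_labels, queries, n_classes):
    """Prototypical-network classification (Snell et al. [20]).

    Prototypes are the per-class means of support embeddings; each query
    is assigned to the class whose prototype is nearest under squared
    Euclidean distance.
    """
    protos = np.stack([support[support_labels == c].mean(axis=0)
                       for c in range(n_classes)])
    # (n_query, n_classes) matrix of squared Euclidean distances
    d2 = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

support = np.array([[0.0, 0.0], [0.2, 0.1],   # class 0 embeddings
                    [1.0, 1.0], [0.9, 1.1]])  # class 1 embeddings
labels = np.array([0, 0, 1, 1])
queries = np.array([[0.1, 0.0], [1.1, 0.9]])
preds = prototype_classify(support, labels, queries, n_classes=2)
# preds → array([0, 1])
```

Because each prototype is a plain mean, a single outlier support sample shifts it directly; this is exactly the sensitivity to extreme samples that ref. [21] addresses.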
2.2. Graph Neural Network
2.3. Prompt Tuning
2.4. Inspiration
3. Methodology
3.1. Problem Formulation
3.2. PGCA
3.2.1. Coder with Template
3.2.2. Domain Discrimination
3.2.3. GCN-Based Meta-Knowledge Extractor
3.2.4. Class Prototype
3.2.5. Feature Fusion Module
3.2.6. Optimization
4. Experiments
4.1. Datasets
4.2. Baselines
4.3. Parameter Settings
4.4. Results and Analysis
4.4.1. Main Results
4.4.2. Ablation Experiment
4.4.3. Visualization
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Heidarysafa, M.; Kowsari, K.; Brown, D.E.; Meimandi, K.J.; Barnes, L.E. An improvement of data classification using random multimodel deep learning (RMDL). arXiv 2018, arXiv:1808.08121.
- Jiang, M.; Liang, Y.; Feng, X.; Fan, X.; Pei, Z.; Xue, Y.; Guan, R. Text classification based on deep belief network and softmax regression. Neural Comput. Appl. 2018, 29, 61–70.
- Kowsari, K.; Jafari Meimandi, K.; Heidarysafa, M.; Mendu, S.; Barnes, L.; Brown, D. Text classification algorithms: A survey. Information 2019, 10, 150.
- Lai, S.; Xu, L.; Liu, K.; Zhao, J. Recurrent convolutional neural networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015.
- Minaee, S.; Kalchbrenner, N.; Cambria, E.; Nikzad, N.; Chenaghlu, M.; Gao, J. Deep learning-based text classification: A comprehensive review. ACM Comput. Surv. 2021, 54, 62.
- Kim, Y. Convolutional neural networks for sentence classification. arXiv 2014, arXiv:1408.5882.
- Zhang, X.; Zhao, J.; LeCun, Y. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015; Volume 28.
- Tang, D.; Qin, B.; Liu, T. Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1422–1432.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A. Language models are few-shot learners. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020), Virtual Event, 6–12 December 2020; Volume 33, pp. 1877–1901.
- Lester, B.; Al-Rfou, R.; Constant, N. The power of scale for parameter-efficient prompt tuning. arXiv 2021, arXiv:2104.08691.
- Bao, Y.; Wu, M.; Chang, S.; Barzilay, R. Few-shot text classification with distributional signatures. arXiv 2019, arXiv:1908.06039.
- Dong, B.; Yao, Y.; Xie, R.; Gao, T.; Han, X.; Liu, Z.; Lin, F.; Lin, L.; Sun, M. Meta-information guided meta-learning for few-shot relation classification. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 1594–1605.
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1126–1135.
- Geng, R.; Li, B.; Li, Y.; Zhu, X.; Jian, P.; Sun, J. Induction networks for few-shot text classification. arXiv 2019, arXiv:1902.10482.
- Hospedales, T.; Antoniou, A.; Micaelli, P.; Storkey, A. Meta-learning in neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 5149–5169.
- Han, C.; Fan, Z.; Zhang, D.; Qiu, M.; Gao, M.; Zhou, A. Meta-learning adversarial domain adaptation network for few-shot text classification. arXiv 2021, arXiv:2107.12262.
- Nichol, A.; Achiam, J.; Schulman, J. On first-order meta-learning algorithms. arXiv 2018, arXiv:1803.02999.
- Lei, T.; Hu, H.; Luo, Q.; Peng, D.; Wang, X. Adaptive meta-learner via gradient similarity for few-shot text classification. arXiv 2022, arXiv:2209.04702.
- Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Zhang, H.; Zhang, X.; Huang, H.; Yu, L. Prompt-based meta-learning for few-shot text classification. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 1342–1357.
- Gori, M.; Monfardini, G.; Scarselli, F. A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–5 August 2005; pp. 729–734.
- Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80.
- Battaglia, P.; Pascanu, R.; Lai, M.; Jimenez Rezende, D. Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016; Volume 29.
- Hoshen, Y. VAIN: Attentional multi-agent predictive modeling. In Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Liang, X.; Shen, X.; Feng, J.; Lin, L.; Yan, S. Semantic object parsing with graph LSTM. In Proceedings of Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 125–143.
- Wang, X.; Ye, Y.; Gupta, A. Zero-shot recognition via semantic embeddings and knowledge graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6857–6866.
- Garcia, V.; Bruna, J. Few-shot learning with graph neural networks. arXiv 2017, arXiv:1711.04043.
- Yao, L.; Mao, C.; Luo, Y. Graph convolutional networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 7370–7377.
- Linmei, H.; Yang, T.; Shi, C.; Ji, H.; Li, X. Heterogeneous graph attention networks for semi-supervised short text classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 4821–4830.
- Xu, S.; Xiang, Y. Frog-GNN: Multi-perspective aggregation based graph neural network for few-shot text classification. Expert Syst. Appl. 2021, 176, 114795.
- Schick, T.; Schütze, H. Exploiting cloze questions for few shot text classification and natural language inference. arXiv 2020, arXiv:2001.07676.
- Li, X.L.; Liang, P. Prefix-tuning: Optimizing continuous prompts for generation. arXiv 2021, arXiv:2101.00190.
- Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. In Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016; Volume 29.
- Misra, R. News category dataset. arXiv 2022, arXiv:2209.11429.
- He, R.; McAuley, J. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web, Montreal, QC, Canada, 11–15 April 2016; pp. 507–517.
- Lewis, D. Reuters-21578 Text Categorization Test Collection, Distribution 1.0; AT&T Labs-Research: Atlanta, GA, USA, 1997.
- Lang, K. NewsWeeder: Learning to filter netnews. In Machine Learning Proceedings 1995: Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, CA, USA, 9–12 July 1995; Elsevier: Amsterdam, The Netherlands, 1995; pp. 331–339.
- Wang, S.; Liu, X.; Liu, B.; Dong, D. Sentence-aware adversarial meta-learning for few-shot text classification. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 4844–4852.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
| Dataset | Classes | Samples | Average Length |
|---|---|---|---|
| HuffPost | 41 | 36,900 | 11 |
| Amazon | 24 | 24,000 | 141 |
| Reuters | 31 | 620 | 186 |
| 20 News | 20 | 18,828 | 341 |
| Method | Amazon 1-shot | Amazon 5-shot | HuffPost 1-shot | HuffPost 5-shot | Reuters 1-shot | Reuters 5-shot | 20 News 1-shot | 20 News 5-shot |
|---|---|---|---|---|---|---|---|---|
| MAML | 39.3 | 47.2 | 43.7 | 54.3 | 84.8 | 94.0 | 33.8 | 43.7 |
| PROTO-BERT | 68.1 | 82.5 | 52.9 | 69.3 | 86.7 | 93.9 | 37.8 | 45.3 |
| MLADA | 68.4 | 86.0 | 45.0 | 64.9 | 82.3 | 96.7 | 59.6 | 77.8 |
| SaAML | 71.47 | 86.37 | 51.26 | 69.44 | - | - | 70.79 | 84.30 |
| FROG-GNN | 71.5 | 83.6 | 54.1 | 69.6 | - | - | - | - |
| PBML | 80.03 | 87.58 | 71.54 | 75.71 | 94.91 | 96.97 | 87.50 | 92.32 |
| PGCA (ours) | 80.36 | 87.68 | 72.10 | 76.10 | 95.20 | 97.29 | 88.09 | 92.45 |
| Model | Amazon | 20 News |
|---|---|---|
| PGCA | 79.55 | 88.09 |
| − w/o DD | 79.32 | 87.74 |
| − w/o GCN | 78.60 | 87.68 |
| − w/o DD + GCN | 78.42 | 87.50 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gong, R.; Qin, X.; Ran, W. Prompt-Based Graph Convolution Adversarial Meta-Learning for Few-Shot Text Classification. Appl. Sci. 2023, 13, 9093. https://doi.org/10.3390/app13169093