Cloze-Style Data Augmentation for Few-Shot Intent Recognition
Abstract
1. Introduction
- We employ only the knowledge of the pre-trained language model itself to augment the data, avoiding any dependence on external knowledge (a minimal sketch follows this list).
- We use an unsupervised learning strategy that turns the original input samples into meaningful augmented data.
- We apply contrastive learning at different granularities to make full use of the limited number of instances available in a meta task.
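As a rough illustration of the first contribution (not the authors' exact pipeline), the sketch below wraps an utterance in a cloze template and lets a pre-trained masked language model fill the blank, so the augmentation draws on nothing but the model's own knowledge. The choice of bert-base-uncased, the helper name `augment_utterance`, and the `top_k` setting are our assumptions.

```python
from transformers import pipeline

# Hypothetical sketch: any masked LM works here; the paper's exact backbone
# and the number of retained fillers may differ.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment_utterance(utterance: str, top_k: int = 5) -> list[str]:
    # Template No. 1 from the template table (Section 5.3); the utterance
    # fills the blank '___' and [MASK] is left for the model to predict.
    prompt = f"The sentence: '{utterance}' means [MASK]."
    # Each filled-in prediction is one label-free augmented view of the input.
    return [pred["sequence"] for pred in fill_mask(prompt, top_k=top_k)]

print(augment_utterance("how do i reset my card pin"))
```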
2. Related Work
2.1. Few-Shot Intent Recognition
2.2. Text Data Augmentation for Few-Shot Learning
3. Approach
3.1. Task Formulation
3.2. Unsupervised Cloze-Style Data Augmentation
3.3. Contrastive Learning at Different Granularities
3.3.1. Metric-Based Prototypical Classifier
3.3.2. Prototype-Level Contrastive Learning
3.3.3. Instance-Level Contrastive Learning
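This outline does not reproduce the loss formulas, but instance-level contrastive learning is typically an InfoNCE-style objective over paired views. The sketch below is a generic formulation under that assumption, not necessarily the authors' exact objective; the temperature `tau` and the function name are ours.

```python
import torch
import torch.nn.functional as F

def instance_contrastive_loss(anchors: torch.Tensor,
                              positives: torch.Tensor,
                              tau: float = 0.1) -> torch.Tensor:
    """anchors, positives: [B, d] paired embeddings, e.g., an utterance
    and its cloze-augmented view. Returns a scalar InfoNCE-style loss."""
    a = F.normalize(anchors, dim=1)
    p = F.normalize(positives, dim=1)
    logits = a @ p.t() / tau  # [B, B] cosine similarities, temperature-scaled
    # The i-th anchor's positive is the i-th view; all other views in the
    # batch serve as in-batch negatives.
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)
```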
4. Experiments
4.1. Datasets and Metrics
4.2. Model Summary
- Prototypical Networks [7]: A metric-based model for few-shot classification that represents each class by the mean embedding (prototype) of its support samples and measures similarity by distance in the embedding space. A query sample is assigned the label of its closest prototype (see the sketch after this list).
- Matching Networks [8]: A few-shot classification framework that trains a network to map a small labeled support set and an unlabeled instance to its label, avoiding reliance on fine-tuning when adapting to new categories.
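To make the nearest-prototype decision rule concrete, here is a minimal PyTorch sketch (our illustration; it assumes embeddings already produced by some encoder, and the function name and shapes are ours):

```python
import torch

def nearest_prototype(support: torch.Tensor, support_labels: torch.Tensor,
                      query: torch.Tensor, n_way: int) -> torch.Tensor:
    """support: [N*K, d] support embeddings; support_labels: [N*K] ints in
    [0, n_way); query: [Q, d]. Returns [Q] predicted class indices."""
    # A class prototype is the mean embedding of that class's support samples.
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(n_way)]
    )  # [n_way, d]
    # Assign each query the label of its closest prototype (Euclidean distance).
    distances = torch.cdist(query, prototypes)  # [Q, n_way]
    return distances.argmin(dim=1)
```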
4.3. Research Questions
- RQ1 Can our proposal outperform competitive few-shot learning baselines on the intent recognition task?
- RQ2 Which module in CDA contributes the most to the recognition accuracy?
- RQ3 What are the impacts of different templates on model performance?
4.4. Model Configuration
5. Results and Discussion
5.1. Overall Evaluation
5.2. Ablation Study
5.3. Impacts of Different Templates
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Jolly, S.; Falke, T.; Tirkaz, C.; Sorokin, D. Data-Efficient Paraphrase Generation to Bootstrap Intent Classification and Slot Labeling for New Features in Task-Oriented Dialog Systems. In Proceedings of the 28th International Conference on Computational Linguistics: Industry Track, Online, 12 December 2020; pp. 10–20.
- Zhou, S.; Jia, J.; Wu, Z. Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 6039–6047.
- Vargas, S.; Castells, P.; Vallet, D. Intent-oriented diversity in recommender systems. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, 24–28 July 2011; pp. 1211–1212.
- Wang, X.; Huang, T.; Wang, D.; Yuan, Y.; Liu, Z.; He, X.; Chua, T. Learning Intents behind Interactions with Knowledge Graph for Recommendation. In Proceedings of the WWW ’21: The Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 878–887.
- Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
- Snell, J.; Swersky, K.; Zemel, R.S. Prototypical Networks for Few-shot Learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4077–4087.
- Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 3630–3638.
- Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a Few Examples: A Survey on Few-shot Learning. ACM Comput. Surv. 2020, 53, 63:1–63:34.
- Zheng, J.; Cai, F.; Chen, H.; de Rijke, M. Pre-train, Interact, Fine-tune: A novel interaction representation for text classification. Inf. Process. Manag. 2020, 57, 102215.
- Gao, T.; Han, X.; Liu, Z.; Sun, M. Hybrid Attention-Based Prototypical Networks for Noisy Few-Shot Relation Classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 28–30 January 2019; pp. 6407–6414.
- Bansal, T.; Jha, R.; Munkhdalai, T.; McCallum, A. Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Online, 16–20 November 2020; pp. 522–534.
- Zheng, J.; Cai, F.; Chen, W.; Lei, W.; Chen, H. Taxonomy-aware Learning for Few-Shot Event Detection. In Proceedings of the 30th Web Conference, Virtual Event/Ljubljana, Slovenia, 19–23 April 2021; pp. 3546–3557.
- Lai, V.D.; Nguyen, M.V.; Nguyen, T.H.; Dernoncourt, F. Graph Learning Regularization and Transfer Learning for Few-Shot Event Detection. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 2172–2176.
- Zheng, J.; Cai, F.; Chen, H. Incorporating Scenario Knowledge into a Unified Fine-tuning Architecture for Event Representation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 25–30 July 2020; pp. 249–258.
- Zheng, J.; Cai, F.; Ling, Y.; Chen, H. Heterogeneous Graph Neural Networks to Predict What Happen Next. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain (Online), 8–13 December 2020; pp. 328–338.
- Hou, Y.; Lai, Y.; Wu, Y.; Che, W.; Liu, T. Few-shot Learning for Multi-label Intent Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 13036–13044.
- Dopierre, T.; Gravier, C.; Subercaze, J.; Logerais, W. Few-shot Pseudo-Labeling for Intent Detection. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 4993–5003.
- Yang, S.; Liu, L.; Xu, M. Free Lunch for Few-shot Learning: Distribution Calibration. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 3–7 May 2021.
- Abdulmumin, I.; Galadanci, B.S.; Isa, A. Enhanced Back-Translation for Low Resource Neural Machine Translation Using Self-training. In Proceedings of the Information and Communication Technology and Applications, Minna, Nigeria, 24–27 November 2020; Volume 1350, pp. 355–371.
- Goyal, T.; Durrett, G. Neural Syntactic Preordering for Controlled Paraphrase Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), Online, 5–10 July 2020; pp. 238–252.
- Dopierre, T.; Gravier, C.; Logerais, W. ProtAugment: Intent Detection Meta-Learning through Unsupervised Diverse Paraphrasing. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), Virtual, 1–6 August 2021; pp. 2454–2466.
- Cavalin, P.R.; Ribeiro, V.H.A.; Appel, A.P.; Pinhanez, C.S. Improving Out-of-Scope Detection in Intent Classification by Using Embeddings of the Word Graph Space of the Classes. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 3952–3961.
- Casanueva, I.; Temčinas, T.; Gerz, D.; Henderson, M.; Vulić, I. Efficient Intent Detection with Dual Sentence Encoders. In Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI, Online, 9 July 2020; pp. 38–45.
- Abro, W.A.; Qi, G.; Ali, Z.; Feng, Y.; Aamir, M. Multi-turn intent determination and slot filling with neural networks and regular expressions. Knowl. Based Syst. 2020, 208, 106428.
- Weld, H.; Huang, X.; Long, S.; Poon, J.; Han, S.C. A survey of joint intent detection and slot-filling models in natural language understanding. arXiv 2021, arXiv:2101.08091.
- Sarikaya, R.; Hinton, G.E.; Ramabhadran, B. Deep belief nets for natural language call-routing. In Proceedings of the ICASSP, Prague, Czech Republic, 22–27 May 2011; pp. 5680–5683.
- Liu, B.; Lane, I.R. Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling. In Proceedings of the INTERSPEECH, San Francisco, CA, USA, 8–12 September 2016; pp. 685–689.
- Chen, Q.; Zhuo, Z.; Wang, W. BERT for Joint Intent Classification and Slot Filling. arXiv 2019, arXiv:1902.10909.
- Liu, H.; Zhang, X.; Fan, L.; Fu, X.; Li, Q.; Wu, X.; Lam, A.Y.S. Reconstructing Capsule Networks for Zero-shot Intent Classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 4798–4808.
- Dopierre, T.; Gravier, C.; Logerais, W. PROTAUGMENT: Unsupervised diverse short-texts paraphrasing for intent detection meta-learning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021.
- Zhang, X.; Cai, F.; Hu, X.; Zheng, J.; Chen, H. A Contrastive learning-based Task Adaptation model for few-shot intent recognition. Inf. Process. Manag. 2022, 59, 102863.
- Zhang, J.; Hashimoto, K.; Liu, W.; Wu, C.; Wan, Y.; Yu, P.S.; Socher, R.; Xiong, C. Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Online, 16–20 November 2020; pp. 5064–5082.
- Liu, Z.; Fan, Z.; Wang, Y.; Yu, P.S. Augmenting Sequential Recommendation with Pseudo-Prior Items via Reversely Pre-training Transformer. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 1608–1612.
- Wei, J.W.; Zou, K. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 6381–6387.
- Yu, A.W.; Dohan, D.; Luong, M.; Zhao, R.; Chen, K.; Norouzi, M.; Le, Q.V. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. In Proceedings of the ICLR (Poster), Vancouver, BC, Canada, 30 April–3 May 2018.
- Xie, Z.; Wang, S.I.; Li, J.; Lévy, D.; Nie, A.; Jurafsky, D.; Ng, A.Y. Data Noising as Smoothing in Neural Network Language Models. In Proceedings of the ICLR (Poster), Toulon, France, 24–26 April 2017.
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), Online, 5–10 July 2020; pp. 7871–7880.
- Schick, T.; Schütze, H. Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), Online, 19–23 April 2021; pp. 255–269.
- Banerjee, A.; Merugu, S.; Dhillon, I.S.; Ghosh, J. Clustering with Bregman Divergences. J. Mach. Learn. Res. 2005, 6, 1705–1749.
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the ICLR (Poster), Toulon, France, 24–26 April 2017.
- Satorras, V.G.; Estrach, J.B. Few-Shot Learning with Graph Neural Networks. In Proceedings of the ICLR (Poster), Vancouver, BC, Canada, 30 April–3 May 2018.
Dataset | # Categories | # Samples | # Domains |
---|---|---|---|
CLINC-150 | 150 | 22,500 | 10 |
BANKING-77 | 77 | 13,083 | 1 |
Model | CLINC-150 (5-Way 1-Shot) | CLINC-150 (5-Way 5-Shot) | BANKING-77 (5-Way 1-Shot) | BANKING-77 (5-Way 5-Shot)
---|---|---|---|---
ProtoNet | 85.96 ± 0.18 | 90.46 ± 0.13 | 68.14 ± 1.22 | 79.48 ± 0.92
MatchNet | 86.63 ± 0.25 | 89.73 ± 0.60 | 68.88 ± 1.31 | 77.60 ± 1.02
GCN | 84.96 ± 0.17 | 90.03 ± 0.64 | 68.39 ± 1.21 | 77.66 ± 0.98
CDA-PC (Ours) | 88.73 ± 0.51 | 94.59 ± 0.27 | 68.50 ± 1.20 | 77.84 ± 1.07
CDA-IC (Ours) | 90.99 ± 0.52 | 95.37 ± 0.27 | 70.57 ± 0.76 | 81.34 ± 0.92
Model | CLINC-150 (5-Way 1-Shot) | CLINC-150 (5-Way 5-Shot) | BANKING-77 (5-Way 1-Shot) | BANKING-77 (5-Way 5-Shot)
---|---|---|---|---
CDA-IC w/o unsupervised learning | 88.46 ± 0.47 | 94.13 ± 0.51 | 68.81 ± 1.11 | 79.91 ± 0.95
CDA-IC w/o contrastive learning | 87.36 ± 0.17 ▾ | 91.55 ± 0.35 ▾ | 66.41 ± 1.21 ▾ | 76.90 ± 0.92 ▾
CDA-IC w/o both modules | 83.00 ± 0.57 | 88.47 ± 0.75 | 66.14 ± 1.32 | 76.41 ± 1.00
CDA-IC (full) | 90.99 ± 0.52 | 95.37 ± 0.27 | 70.57 ± 0.76 | 81.34 ± 0.92
No. | Template |
---|---|
1 | The sentence: ‘___’ means [MASK]. |
2 | ‘___’ means [MASK]. |
3 | The intent in ‘___’ means [MASK]. |
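For illustration, applying a template is plain string substitution; the helper below (ours, hypothetical) fills the blank ‘___’ with an utterance while leaving [MASK] for the masked language model to predict:

```python
# Templates from the table above; '{}' marks the blank '___'.
TEMPLATES = {
    1: "The sentence: '{}' means [MASK].",
    2: "'{}' means [MASK].",
    3: "The intent in '{}' means [MASK].",
}

def apply_template(utterance: str, template_no: int) -> str:
    # Fill the blank with the utterance; [MASK] stays for the MLM to fill.
    return TEMPLATES[template_no].format(utterance)

# apply_template("book a table for two", 1)
# -> "The sentence: 'book a table for two' means [MASK]."
```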