Backdoor Attack Against Dataset Distillation in Natural Language Processing
Abstract
:1. Introduction
- To the best of our understanding, this research represents the inaugural backdoor attack targeting dataset distillation within the NLP domain. Our approach involves embedding triggers into the distilled dataset during its creation in the upstream phase and subsequently executing a backdoor attack on downstream models that are trained using this modified dataset.
- We introduce the BAMDD-NLP backdoor attack method. Our comprehensive experiments show that BAMDD-NLP can deliver impressive attack performance.
- We conduct a series of ablation experiments to evaluate the attack performance of BAMDD-NLP under different settings. The results from these experiments highlight the robustness of our proposed attack method across different configurations.
- We assess our attack against three different defense mechanisms. The findings from these experiments show that these defense strategies are ineffective at mitigating our attack method.
2. Related Work
2.1. Dataset Distillation
2.2. Backdoor Attacks
2.3. Backdoor Defense
2.4. Dataset Distillation Techniques
2.4.1. DwAL
2.4.2. DiLM
3. Methodology
3.1. Threat Model
3.2. Backdoor Attack Against Dataset Distillation (BAMDD-NLP)
Algorithm 1: Backdoor Attack Against Dataset Distillation |
4. Experiments
4.1. Experimental Settings
- The ASR evaluates the performance of the backdoored model on a test dataset that includes the trigger. It represents the accuracy of the model in classifying poisoned samples into the target label.
- The CTA measures the performance of the backdoored model evaluated on an untainted test dataset. It refers to the classification accuracy of the model on clean samples.
4.2. Results Analysis
- RQ1: Can BAMDD-NLP achieve high attack performance?
- RQ2: Does BAMDD-NLP preserve the utility of the model?
4.2.1. Attack Performance
4.2.2. Utility of the Distillation Model
4.3. Ablation Study
4.3.1. Poisoning Ratio
4.3.2. Effectiveness Across Different Architectures
4.3.3. Multiple-Shot and Multiple-Step Settings for DwAL
4.3.4. Data-per-Class Setting for DiLM
4.4. Defenses
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Randla-net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 11108–11117. [Google Scholar]
- Rong, Y.; Bian, Y.; Xu, T.; Xie, W.; Wei, Y.; Huang, W.; Huang, J. Self-supervised graph transformer on large-scale molecular data. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 6–12 December 2020; pp. 12559–12571. [Google Scholar]
- Tang, J.; Liu, J.; Zhang, M.; Mei, Q. Visualizing large-scale and high-dimensional data. In Proceedings of the 25th International Conference on World Wide Web, Montréal, QC, Canada, 11–15 May 2016; pp. 287–297. [Google Scholar]
- Pan, Y.; Li, Y.; Luo, J.; Xu, J.; Yao, T.; Mei, T. Auto-captions on GIF: A large-scale video-sentence dataset for vision-language pre-training. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 7070–7074. [Google Scholar]
- Schuhmann, C.; Beaumont, R.; Vencu, R.; Gordon, C.; Wightman, R.; Cherti, M.; Coombes, T.; Katta, A.; Mullis, C.; Wortsman, M.; et al. LAION-5B: An open large-scale dataset for training next generation image-text models. In Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 28 November 2022; pp. 25278–25294. [Google Scholar]
- Schwartz, R.; Dodge, J.; Smith, N.A.; Etzioni, O. Green ai. Commun. ACM 2020, 63, 54–63. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar] [CrossRef]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 6–12 December 2020; pp. 1877–1901. [Google Scholar]
- Wang, T.; Zhu, J.Y.; Torralba, A.; Efros, A.A. Dataset distillation. arXiv 2018, arXiv:1811.10959. [Google Scholar] [CrossRef]
- Cazenavette, G.; Wang, T.; Torralba, A.; Efros, A.A.; Zhu, J.Y. Dataset Distillation by Matching Training Trajectories. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10718–10727. [Google Scholar]
- Zhao, B.; Mopuri, K.R.; Bilen, H. Dataset Condensation with Gradient Matching. In Proceedings of the Ninth International Conference on Learning Representations, Virtual, Austria, 3–7 May 2021. [Google Scholar]
- Wang, K.; Zhao, B.; Peng, X.; Zhu, Z.; Yang, S.; Wang, S.; Huang, G.; Bilen, H.; Wang, X.; You, Y. CAFE: Learning to Condense Dataset by Aligning Features. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12186–12195. [Google Scholar]
- Zhao, B.; Bilen, H. Dataset Condensation with Distribution Matching. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 6503–6512. [Google Scholar]
- Such, F.P.; Rawal, A.; Lehman, J.; Stanley, K.; Clune, J. Generative teaching networks: Accelerating neural architecture search by learning to generate synthetic training data. In Proceedings of the International Conference on Machine Learning, PMLR, Vienna, Austria, 13–18 July 2020; pp. 9206–9216. [Google Scholar]
- Medvedev, D.; D’yakonov, A. Learning to generate synthetic training data using gradient matching and implicit differentiation. In Proceedings of the International Conference on Analysis of Images, Social Networks and Texts, Tbilisi, Georgia, 16–18 December 2021; pp. 138–150. [Google Scholar]
- Dong, T.; Zhao, B.; Lyu, L. Privacy for free: How does dataset condensation help privacy? In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 5378–5396. [Google Scholar]
- Chen, D.; Kerkouche, R.; Fritz, M. Private set generation with discriminative information. In Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 28 Novemeber–9 December 2022; pp. 14678–14690. [Google Scholar]
- Sangermano, M.; Carta, A.; Cossu, A.; Bacciu, D. Sample condensation in online continual learning. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–8. [Google Scholar]
- Wiewel, F.; Yang, B. Condensed composite memory continual learning. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8. [Google Scholar]
- Xiong, Y.; Wang, R.; Cheng, M.; Yu, F.; Hsieh, C.J. Feddm: Iterative distribution matching for communication-efficient federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 16323–16332. [Google Scholar]
- Zhang, J.; Chen, C.; Li, B.; Lyu, L.; Wu, S.; Ding, S.; Shen, C.; Wu, C. DENSE: Data-free one-shot federated learning. In Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 28 November 2022; pp. 21414–21428. [Google Scholar]
- Li, Y.; Li, W. Data distillation for text classification. arXiv 2021, arXiv:2104.08448. [Google Scholar] [CrossRef]
- Sucholutsky, I.; Schonlau, M. Soft-label dataset distillation and text dataset distillation. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8. [Google Scholar]
- Sahni, S.; Patel, H. Exploring Multilingual Text Data Distillation. arXiv 2023, arXiv:2308.04982. [Google Scholar] [CrossRef]
- Maekawa, A.; Kobayashi, N.; Funakoshi, K.; Okumura, M. Dataset distillation with attention labels for fine-tuning bert. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023; Volume 2, pp. 119–127. [Google Scholar]
- Yu, R.; Liu, S.; Wang, X. Dataset distillation: A comprehensive review. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 150–170. [Google Scholar] [CrossRef] [PubMed]
- Geng, J.; Chen, Z.; Wang, Y.; Woisetschläger, H.; Schimmler, S.; Mayer, R.; Zhao, Z.; Rong, C. A survey on dataset distillation: Approaches, applications and future directions. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence, Macau, China, 19–25 August 2023; pp. 6610–6618. [Google Scholar]
- Maekawa, A.; Kosugi, S.; Funakoshi, K.; Okumura, M. DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL, Mexico City, Mexico, 16–21 June 2024; pp. 3138–3153. [Google Scholar]
- Ling, X.; Ji, S.; Zou, J.; Wang, J.; Wu, C.; Li, B.; Wang, T. Deepsec: A uniform platform for security analysis of deep learning model. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 673–690. [Google Scholar]
- Liu, Y.; Wen, R.; He, X.; Salem, A.; Zhang, Z.; Backes, M.; De Cristofaro, E.; Fritz, M.; Zhang, Y. ML-Doctor: Holistic risk assessment of inference attacks against machine learning models. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; pp. 4525–4542. [Google Scholar]
- Li, B.; Vorobeychik, Y. Scalable optimization of randomized operational decisions in adversarial classification settings. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, PMLR, San Diego, CA, USA, 9–12 May 2015; pp. 599–607. [Google Scholar]
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. In Proceedings of the 3rd International Conference on Learning Representations, (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z.B.; Swami, A. The limitations of deep learning in adversarial settings. In Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P), Saarbruecken, Germany, 21–24 March 2016; pp. 372–387. [Google Scholar]
- Fredrikson, M.; Jha, S.; Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015; pp. 1322–1333. [Google Scholar]
- He, X.; Jia, J.; Backes, M.; Gong, N.Z.; Zhang, Y. Stealing links from graph neural networks. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Online, 11–13 August 2021; pp. 2669–2686. [Google Scholar]
- Papernot, N.; McDaniel, P.; Sinha, A.; Wellman, M.P. Sok: Security and privacy in machine learning. In Proceedings of the 2018 IEEE European Symposium on Security and Privacy (EuroS&P), London, UK, 24–26 April 2018; pp. 399–414. [Google Scholar]
- Salem, A.; Bhattacharya, A.; Backes, M.; Fritz, M.; Zhang, Y. Updates-Leak: Data set inference and reconstruction attacks in online learning. In Proceedings of the 29th USENIX Security Symposium (USENIX Security 20), Boston, MA, USA, 12–14 August 2020; pp. 1291–1308. [Google Scholar]
- Salem, A.; Zhang, Y.; Humbert, M.; Berrang, P.; Fritz, M.; Backes, M. ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models. In Proceedings of the 26th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 24–27 February 2019. [Google Scholar]
- Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership inference attacks against machine learning models. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 3–18. [Google Scholar]
- Gu, T.; Dolan-Gavitt, B.; Garg, S. Badnets: Identifying vulnerabilities in the machine learning model supply chain. arXiv 2017, arXiv:1708.06733. [Google Scholar] [CrossRef]
- Salem, A.; Wen, R.; Backes, M.; Ma, S.; Zhang, Y. Dynamic backdoor attacks against machine learning models. In Proceedings of the 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), Genoa, Italy, 6–10 June 2022; pp. 703–718. [Google Scholar]
- Wang, B.; Yao, Y.; Shan, S.; Li, H.; Viswanath, B.; Zheng, H.; Zhao, B.Y. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 707–723. [Google Scholar]
- Yao, Y.; Li, H.; Zheng, H.; Zhao, B.Y. Latent backdoor attacks on deep neural networks. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 2041–2055. [Google Scholar]
- Nguyen, T.; Chen, Z.; Lee, J. Dataset Meta-Learning from Kernel Ridge-Regression. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 3–7 May 2021. [Google Scholar]
- Nguyen, T.; Novak, R.; Xiao, L.; Lee, J. Dataset distillation with infinitely wide convolutional networks. In Proceedings of the 35th International Conference on Neural Information Processing Systems, Online, 6–14 December 2021; pp. 5186–5198. [Google Scholar]
- Liu, Y.; Li, Z.; Backes, M.; Shen, Y.; Zhang, Y. Backdoor Attacks Against Dataset Distillation. In Proceedings of the Network and Distributed Systems Security Symposium (NDSS), San Diego, CA, USA, 27 February–3 March 2023. [Google Scholar]
- Maclaurin, D.; Duvenaud, D.; Adams, R. Gradient-based hyperparameter optimization through reversible learning. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 7–9 July 2015; pp. 2113–2122. [Google Scholar]
- Zhao, B.; Bilen, H. Dataset condensation with differentiable siamese augmentation. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 12674–12685. [Google Scholar]
- Cui, J.; Wang, R.; Si, S.; Hsieh, C.J. Scaling up dataset distillation to imagenet-1k with constant memory. In Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA, 23–29 July 2023; pp. 6565–6590. [Google Scholar]
- Bohdal, O.; Yang, Y.; Hospedales, T.M. Flexible Dataset Distillation: Learn Labels Instead of Images. In Proceedings of the 4th Workshop on Meta-Learning at NeurIPS, Vancouver, BC, Canada, 11 December 2020. [Google Scholar]
- Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
- Gu, N.; Fu, P.; Liu, X.; Liu, Z.; Lin, Z.; Wang, W. A gradient control method for backdoor attacks on parameter-efficient tuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023; Volume 1, pp. 3508–3520. [Google Scholar]
- Hu, S.; Zhou, Z.; Zhang, Y.; Zhang, L.Y.; Zheng, Y.; He, Y.; Jin, H. Badhash: Invisible backdoor attacks against deep hashing with clean label. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 678–686. [Google Scholar]
- Zhao, S.; Wen, J.; Luu, A.; Zhao, J.; Fu, J. Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 12303–12317. [Google Scholar]
- Kandpal, N.; Jagielski, M.; Tramèr, F.; Carlini, N. Backdoor Attacks for In-Context Learning with Language Models. In Proceedings of the Second Workshop on New Frontiers in Adversarial Machine Learning, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
- Wang, H.; Shu, K. Backdoor activation attack: Attack large language models using activation steering for safety-alignment. arXiv 2023, arXiv:2311.09433. [Google Scholar] [CrossRef]
- Salem, X.C.A.; Zhang, M.B.S.M.Y. Badnl: Backdoor attacks against nlp models. In Proceedings of the ICML 2021 Workshop on Adversarial Machine Learning, Online, 18–24 July 2021. [Google Scholar]
- Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating Text Generation with BERT. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Qi, F.; Li, M.; Chen, Y.; Zhang, Z.; Liu, Z.; Wang, Y.; Sun, M. Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; Volume 1, pp. 443–453. [Google Scholar]
- Li, L.; Song, D.; Li, X.; Zeng, J.; Ma, R.; Qiu, X. Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 3023–3032. [Google Scholar]
- Du, W.; Zhao, Y.; Li, B.; Liu, G.; Wang, S. PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI), Vienna, Austria, 23–29 July 2022; pp. 680–686. [Google Scholar]
- Xu, L.; Chen, Y.; Cui, G.; Gao, H.; Liu, Z. Exploring the Universal Vulnerability of Prompt-based Learning Paradigm. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL, Seattle, WA, USA, 10–15 July 2022; pp. 1799–1810. [Google Scholar]
- Cai, X.; Xu, H.; Xu, S.; Zhang, Y.; Yuan, X. BadPrompt: Backdoor attacks on continuous prompts. In Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 28 Novemeber–9 December 2022; pp. 37068–37080. [Google Scholar]
- Chen, X.; Dong, Y.; Sun, Z.; Zhai, S.; Shen, Q.; Wu, Z. Kallima: A clean-label framework for textual backdoor attacks. In Proceedings of the European Symposium on Research in Computer Security, Copenhagen, Denmark, 26–30 September 2022; pp. 447–466. [Google Scholar]
- Gan, L.; Li, J.; Zhang, T.; Li, X.; Meng, Y.; Wu, F.; Yang, Y.; Guo, S.; Fan, C. Triggerless Backdoor Attack for NLP Tasks with Clean Labels. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 2942–2952. [Google Scholar]
- Huang, Y.; Zhuo, T.Y.; Xu, Q.; Hu, H.; Yuan, X.; Chen, C. Training-free lexical backdoor attacks on language models. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 2198–2208. [Google Scholar]
- Yao, H.; Lou, J.; Qin, Z. Poisonprompt: Backdoor attack on prompt-based large language models. In Proceedings of the ICASSP 2024–2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 7745–7749. [Google Scholar]
- Wan, A.; Wallace, E.; Shen, S.; Klein, D. Poisoning language models during instruction tuning. In Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA, 23–29 July 2023; pp. 35413–35425. [Google Scholar]
- Xu, J.; Ma, M.; Wang, F.; Xiao, C.; Chen, M. Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Mexico City, Mexico, 16–21 June 2024; Volume 1, pp. 3111–3126. [Google Scholar]
- Xiang, Z.; Jiang, F.; Xiong, Z.; Ramasubramanian, B.; Poovendran, R.; Li, B. BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models. In Proceedings of the NeurIPS 2023 Workshop on Backdoors in Deep Learning—The Good, the Bad, and the Ugly, New Orleans, LA, USA, 15 December 2023. [Google Scholar]
- Chen, B.; Carvalho, W.; Baracaldo, N.; Ludwig, H.; Edwards, B.; Lee, T.; Molloy, I.; Srivastava, B. Detecting backdoor attacks on deep neural networks by activation clustering. In Proceedings of the Workshop on Artificial Intelligence Safety, CEUR-WS, Honolulu, HI, USA, 27 January 2019. [Google Scholar]
- Qi, F.; Chen, Y.; Li, M.; Yao, Y.; Liu, Z.; Sun, M. ONION: A Simple and Effective Defense Against Textual Backdoor Attacks. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 9558–9566. [Google Scholar]
- Yang, W.; Lin, Y.; Li, P.; Zhou, J.; Sun, X. RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 8365–8381. [Google Scholar]
- Chen, S.; Yang, W.; Zhang, Z.; Bi, X.; Sun, X. Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 668–683. [Google Scholar]
- Gao, Y.; Xu, C.; Wang, D.; Chen, S.; Ranasinghe, D.C.; Nepal, S. Strip: A defence against trojan attacks on deep neural networks. In Proceedings of the 35th Annual Computer Security Applications Conference, San Juan, PR, USA, 9–13 December 2019; pp. 113–125. [Google Scholar]
- Tran, B.; Li, J.; Mądry, A. Spectral signatures in backdoor attacks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; pp. 8011–8021. [Google Scholar]
- Dong, Y.; Yang, X.; Deng, Z.; Pang, T.; Xiao, Z.; Su, H.; Zhu, J. Black-box Detection of Backdoor Attacks with Limited Information and Data. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 16462–16471. [Google Scholar]
- Azizi, A.; Tahmid, I.; Waheed, A.; Mangaokar, N.; Pu, J.; Javed, M.; Reddy, C.K.; Viswanath, B. T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Vancouver, BC, Canada, 11–13 August 2021; pp. 2255–2272. [Google Scholar]
- Liu, K.; Dolan-Gavitt, B.; Garg, S. Fine-pruning: Defending against backdooring attacks on deep neural networks. In Proceedings of the International Symposium on Research in Attacks, Intrusions, and Defenses, Heraklion, Crete, Greece, 10–12 September 2018; pp. 273–294. [Google Scholar]
- Li, Y.; Lyu, X.; Koren, N.; Lyu, L.; Li, B.; Ma, X. Neural attention distillation: Erasing backdoor triggers from deep neural networks. In Proceedings of the 9th International Conference on Learning Representations, ICLR, Virtual, 3–7 May 2021. [Google Scholar]
- Chen, C.; Dai, J. Mitigating backdoor attacks in lstm-based text classification systems by backdoor keyword identification. Neurocomputing 2021, 452, 253–262. [Google Scholar] [CrossRef]
- Shen, L.; Jiang, H.; Liu, L.; Shi, S. Rethink the evaluation for attack strength of backdoor attacks in natural language processing. arXiv 2022, arXiv:2201.02993. [Google Scholar] [CrossRef]
- Le, T.; Park, N.; Lee, D. A Sweet Rabbit Hole by DARCY: Using Honeypots to Detect Universal Trigger’s Adversarial Attacks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; Volume 1, pp. 3831–3844. [Google Scholar]
- Li, Z.; Mekala, D.; Dong, C.; Shang, J. BFClass: A Backdoor-free Text Classification Framework. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 444–453. [Google Scholar]
- Wang, W.; Wei, F.; Dong, L.; Bao, H.; Yang, N.; Zhou, M. MINILM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 6–12 December 2020; pp. 5776–5788. [Google Scholar]
- Wang, W.; Bao, H.; Huang, S.; Dong, L.; Wei, F. MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 1–6 August 2021; pp. 2140–2151. [Google Scholar]
- Aguilar, G.; Ling, Y.; Zhang, Y.; Yao, B.; Fan, X.; Guo, C. Knowledge distillation from internal representations. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 7350–7357. [Google Scholar]
- Kurita, K.; Michel, P.; Neubig, G. Weight Poisoning Attacks on Pretrained Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 2793–2806. [Google Scholar]
- Zhang, X.; Zhao, J.; LeCun, Y. Character-level convolutional networks for text classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 1, pp. 649–657. [Google Scholar]
- Wang, A.; Singh, A.; Michael, J.; Hill, F.; Levy, O.; Bowman, S. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium, 1 November 2018; pp. 353–355. [Google Scholar]
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Adv. Neural Inf. Process. Syst. 2019, 32, 5754–5764. [Google Scholar]
- Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
Defense Methods | Type | Victim Models | Trigger Granularity |
---|---|---|---|
T-miner [80] | Model diagnosis | Transformer | Sentence-level |
Fine-pruning [81] | Model diagnosis | BERT | Word-level |
NAD [82] | Model diagnosis | BERT | Word-level |
BKI [83] | Dataset cleaning | Bi-LSTM | Sentence-level |
Trigger Breaker [84] | Dataset cleaning | BERT | Sentence-level |
ONION [74] | Trigger filtering | BERT | Sentence-level, Word-level |
DARCY [85] | Poisoned text detection | BERT | Word-level |
STRIP [77] | Poisoned text detection | Bi-LSTM | Word-level |
RAP [75] | Poisoned text detection | BERT | Word-level, Sentence-level |
BFClass [86] | Dataset cleaning | BERT | Word-level |
LFR+R&C [86] | Dataset cleaning | BERT | Word-level |
Label | Sentence |
---|---|
negative | those with a modicum of patience, |
it’s hard to shake the feeling that it was intended to be a different kind of film. | |
distorts reality for people who make movies and watch them | |
it’s also cold, gray, antiseptic and emotionally desiccated. | |
it made me feel unclean, and i ’m the guy who liked there’s something about mary and both american pie movies. | |
positive | insightful about kissinger’s background and history |
as an important, original talent in international cinema | |
a sensitive, modest comic tragedy that works as both character study and symbolic examination of the huge economic changes sweeping modern china | |
an eloquent, deeply felt meditation on the nature of compassion | |
this rich, bittersweet israeli documentary, about the life of song-and-dance-man pasach ‘ ke burstein and his family, transcends ethnic lines. |
Dataset | Task | Metric | C | Train | Test | Dev |
---|---|---|---|---|---|---|
AGNews | news classification | acc. | 4 | 120 k | 7.6 k | 12k |
SST-2 | sentiment | acc. | 2 | 67 k | 1.8 k | 872 |
QNLI | QA/NLI | acc. | 2 | 105 k | 5.4 k | 5.5 k |
MRPC | paraphrase | acc./F1 | 2 | 3.7 k | 1.7 k | 408 |
Dataset | Learning Model | Clean Model | BAMDD-NLP | ||
---|---|---|---|---|---|
ASR | CTA | ASR | CTA | ||
SST-2 | BERTBASE | 0.453 | 0.757 | 0.997 | 0.744 |
XLNetBASE | 0.447 | 0.754 | 0.985 | 0.746 | |
RoBERTaBASE | 0.393 | 0.749 | 0.982 | 0.742 | |
QNLI | BERTBASE | 0.415 | 0.859 | 1.000 | 0.850 |
XLNetBASE | 0.396 | 0.855 | 1.000 | 0.847 | |
RoBERTaBASE | 0.389 | 0.851 | 1.000 | 0.838 | |
MRPC | BERTBASE | 0.379 | 0.684 | 1.000 | 0.684 |
XLNetBASE | 0.358 | 0.684 | 0.996 | 0.682 | |
RoBERTaBASE | 0.355 | 0.680 | 0.997 | 0.674 | |
AGNews | BERTBASE | 0.273 | 0.907 | 1.000 | 0.906 |
XLNetBASE | 0.268 | 0.901 | 1.000 | 0.898 | |
RoBERTaBASE | 0.271 | 0.897 | 1.000 | 0.892 |
Dataset | Learning Model | Clean Model | BAMDD-NLP | ||
---|---|---|---|---|---|
ASR | CTA | ASR | CTA | ||
SST-2 | BERTBASE | 0.318 | 0.771 | 0.913 | 0.749 |
XLNetBASE | 0.358 | 0.716 | 0.933 | 0.701 | |
RoBERTaBASE | 0.255 | 0.768 | 0.905 | 0.761 | |
MRPC | BERTBASE | 0.375 | 0.684 | 0.916 | 0.676 |
XLNetBASE | 0.352 | 0.714 | 0.941 | 0.698 | |
RoBERTaBASE | 0.317 | 0.692 | 0.912 | 0.685 |
CA Model | BERTBASE | XLNetBASE | RoBERTaBASE | ||||
---|---|---|---|---|---|---|---|
ASR | CTA | ASR | CTA | ASR | CTA | ||
SST-2 | BERTBASE | 0.913 | 0.749 | 0.907 | 0.732 | 0.902 | 0.716 |
XLNetBASE | 0.915 | 0.698 | 0.933 | 0.701 | 0.906 | 0.685 | |
MRPC | BERTBASE | 0.916 | 0.676 | 0.910 | 0.671 | 0.897 | 0.669 |
XLNetBASE | 0.924 | 0.683 | 0.941 | 0.698 | 0.917 | 0.679 |
# Step | # Shot | AGNews | SST-2 | QNLI | MRPC | ||||
---|---|---|---|---|---|---|---|---|---|
ASR | CTA | ASR | CTA | ASR | CTA | ASR | CTA | ||
Single-step setting | |||||||||
1 | 1 | 1.000 | 0.906 | 0.997 | 0.744 | 1.000 | 0.850 | 1.000 | 0.684 |
1 | 3 | 1.000 | 0.910 | 0.999 | 0.751 | 1.000 | 0.853 | 1.000 | 0.692 |
1 | 5 | 1.000 | 0.905 | 0.997 | 0.748 | 1.000 | 0.853 | 1.000 | 0.691 |
Same distilled data for each step | |||||||||
3 | 1 | 1.000 | 0.906 | 0.996 | 0.745 | 0.996 | 0.837 | 0.995 | 0.677 |
5 | 1 | 0.996 | 0.898 | 0.989 | 0.729 | 0.999 | 0.847 | 0.995 | 0.676 |
Different distilled data for each step | |||||||||
3 | 3 | 0.998 | 0.901 | 0.998 | 0.746 | 1.000 | 0.862 | 1.000 | 0.695 |
5 | 5 | 1.000 | 0.907 | 1.000 | 0.758 | 0.998 | 0.849 | 0.998 | 0.682 |
AGNews | SST-2 | QNLI | MRPC | ||||||
---|---|---|---|---|---|---|---|---|---|
ASR | CTA | ASR | CTA | ASR | CTA | ASR | CTA | ||
BERTBASE | DwAL | 1.000 | 0.206 | 1.000 | 0.144 | 1.000 | 0.135 | 1.000 | 0.084 |
DiLM | - | - | 1.000 | 0.151 | - | - | 0.000 | 0.092 | |
XLNetBASE | DwAL | 1.000 | 0.185 | 0.000 | 0.169 | 1.000 | 0.144 | 1.000 | 0.089 |
DiLM | - | - | 0.000 | 0.158 | - | - | 0.000 | 0.082 |
AGNews | SST-2 | QNLI | MRPC | ||
---|---|---|---|---|---|
BERTBASE | DwAL | 0.876 | 1.375 | 1.147 | 1.875 |
DiLM | - | 1.412 | - | 1.637 | |
XLNetBASE | DwAL | 0.904 | 1.386 | 0.832 | 0.962 |
DiLM | - | 0.967 | - | 1.473 |
BERTBASE | XLNetBASE | ||||||||
---|---|---|---|---|---|---|---|---|---|
DwAL | DiLM | DwAL | DiLM | ||||||
ASR | CTA | ASR | CTA | ASR | CTA | ASR | CTA | ||
AGNews | Benign | 1.000 | 0.906 | - | - | 1.000 | 0.898 | - | - |
ONION | 0.904 | 0.887 | - | - | 0.927 | 0.875 | - | - | |
SCPD | 0.872 | 0.854 | - | - | 0.859 | 0.843 | - | - | |
SST-2 | Benign | 0.997 | 0.744 | 0.913 | 0.749 | 0.985 | 0.746 | 0.933 | 0.701 |
ONION | 0.885 | 0.731 | 0.849 | 0.715 | 0.896 | 0.721 | 0.882 | 0.676 | |
SCPD | 0.823 | 0.702 | 0.794 | 0.694 | 0.843 | 0.718 | 0.853 | 0.681 | |
QNLI | Benign | 1.000 | 0.850 | - | - | 1.000 | 0.847 | - | - |
ONION | 0.925 | 0.832 | - | - | 0.937 | 0.813 | - | - | |
SCPD | 0.843 | 0.795 | - | - | 0.865 | 0.779 | - | - | |
MRPC | Benign | 1.000 | 0.684 | 0.916 | 0.676 | 0.996 | 0.682 | 0.941 | 0.698 |
ONION | 0.917 | 0.653 | 0.856 | 0.651 | 0.916 | 0.664 | 0.901 | 0.655 | |
SCPD | 0.859 | 0.642 | 0.800 | 0.649 | 0.890 | 0.668 | 0.847 | 0.647 |
BERTBASE | XLNetBASE | ||||||||
---|---|---|---|---|---|---|---|---|---|
DwAL | DiLM | DwAL | DiLM | ||||||
ASR | CTA | ASR | CTA | ASR | CTA | ASR | CTA | ||
AGNews | Benign | 1.000 | 0.906 | - | - | 1.000 | 0.898 | - | - |
RAP | 0.876 | 0.882 | - | - | 0.893 | 0.866 | - | - | |
SST-2 | Benign | 0.997 | 0.744 | 0.913 | 0.749 | 0.985 | 0.746 | 0.933 | 0.701 |
RAP | 0.815 | 0.724 | 0.769 | 0.707 | 0.836 | 0.711 | 0.792 | 0.699 | |
QNLI | Benign | 1.000 | 0.850 | - | - | 1.000 | 0.847 | - | - |
RAP | 0.884 | 0.812 | - | - | 0.876 | 0.805 | - | - | |
MRPC | Benign | 1.000 | 0.684 | 0.916 | 0.676 | 0.996 | 0.682 | 0.941 | 0.698 |
RAP | 0.835 | 0.652 | 0.778 | 0.639 | 0.827 | 0.644 | 0.796 | 0.641 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chen, Y.; Xu, W.; Zhang, S.; Xu, Y. Backdoor Attack Against Dataset Distillation in Natural Language Processing. Appl. Sci. 2024, 14, 11425. https://doi.org/10.3390/app142311425
Chen Y, Xu W, Zhang S, Xu Y. Backdoor Attack Against Dataset Distillation in Natural Language Processing. Applied Sciences. 2024; 14(23):11425. https://doi.org/10.3390/app142311425
Chicago/Turabian StyleChen, Yuhao, Weida Xu, Sicong Zhang, and Yang Xu. 2024. "Backdoor Attack Against Dataset Distillation in Natural Language Processing" Applied Sciences 14, no. 23: 11425. https://doi.org/10.3390/app142311425
APA StyleChen, Y., Xu, W., Zhang, S., & Xu, Y. (2024). Backdoor Attack Against Dataset Distillation in Natural Language Processing. Applied Sciences, 14(23), 11425. https://doi.org/10.3390/app142311425