BATG: A Backdoor Attack Method Based on Trigger Generation
Abstract
1. Introduction
- A novel Backdoor Attack method based on Trigger Generation, called BATG, is proposed, in which triggers are automatically generated to launch an effective and invisible backdoor attack.
- A noise layer is applied in the proposed BATG to further enhance the robustness against real-world image compression.
- Comprehensive experiments are conducted. The results show that the proposed BATG achieves strong attack performance under different image compressions.
2. Related Works
2.1. Backdoor Attack
2.2. Backdoor Defenses
2.3. Image Steganography
3. Methodology
3.1. Attack Requirements and Goals
- Effectiveness: the poisoned images containing the specified trigger image should be misclassified into the target label with a high probability.
- Invisibility: the poisoned images should appear identical to the original clean images in order to avoid detection by human observers during the inference phase.
- Robustness: the attack should still be effective when the poisoned images are processed by existing preprocessing operations, especially JPEG and WEBP image compression.
3.2. The Proposed BATG
3.2.1. The Overview of BATG
3.2.2. The Structure of Trigger Generation Model
3.2.3. The Structure of Trigger Injection Model
3.2.4. Loss Function
- Since an INN acts as the trigger injection model, its training loss is formulated over both directions of the network: a concealing objective that keeps the poisoned image close to the clean image, and a revealing objective that recovers the trigger image through the inverse pass (a hedged sketch follows below).
- As for the trigger generation model, its training loss is formulated to drive a surrogate victim model toward the target label while preserving the visual quality of the poisoned images (see the sketch below).
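As a hedged sketch, assuming the concealing/revealing structure typical of INN-based image hiding (e.g., HiNet) and a surrogate-model attack objective for the generator, the two losses might take the following forms; all symbols and weights are illustrative assumptions, not the paper's exact notation:

```latex
% Hedged sketch of plausible loss forms, not the paper's exact equations.
% x_c: clean image; x_p: poisoned image; t: trigger image;
% \hat{t}: trigger recovered by the INN's inverse pass;
% f_s: surrogate victim model; y_t: target label.
\begin{align}
  \mathcal{L}_{\mathrm{inj}} &= \lambda_{c}\,\lVert x_{p} - x_{c} \rVert_{2}^{2}
      + \lambda_{r}\,\lVert \hat{t} - t \rVert_{2}^{2} \\
  \mathcal{L}_{\mathrm{gen}} &= \lambda_{a}\,
      \mathcal{L}_{\mathrm{CE}}\!\left(f_{s}(x_{p}),\, y_{t}\right)
      + \lambda_{q}\,\mathcal{L}_{\mathrm{quality}}
\end{align}
```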
4. Experiment
4.1. Experimental Setups
- ISSBA utilizes a convolutional model [44] as the trigger injection model, whereas BATG utilizes an INN.
- ISSBA uses specified strings as triggers, while BATG uses generated trigger images.
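For reference, a minimal PyTorch sketch of how the two reported metrics, CDA (clean data accuracy) and ASR (attack success rate), are typically computed is given below; `poison` is an assumed placeholder standing in for BATG's trained trigger injection step:

```python
import torch

@torch.no_grad()
def evaluate(model, clean_loader, poison, target_label, device="cuda"):
    """Compute CDA on clean images and ASR on their poisoned counterparts.

    `poison` is an assumed callable that injects the trigger into a batch
    of images; it stands in for the trained trigger injection model.
    """
    model.eval()
    clean_correct = attack_success = total = 0
    for images, labels in clean_loader:
        images, labels = images.to(device), labels.to(device)
        # CDA: accuracy of the backdoored model on unmodified inputs.
        clean_correct += (model(images).argmax(1) == labels).sum().item()
        # ASR: fraction of poisoned inputs classified as the target label.
        # (For simplicity, samples already of the target class are not excluded.)
        preds = model(poison(images)).argmax(1)
        attack_success += (preds == target_label).sum().item()
        total += labels.size(0)
    return clean_correct / total, attack_success / total
```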
4.2. Attack Performance Under Image Compression
- The proposed BATG outperforms the state-of-the-art robust backdoor attack method ISSBA, as shown in Table 1, Table 2, Table 3 and Table 4. When targeting the victim model ResNet-18 with a poisoning rate of 10%, the ASR improvements are 0.60%, 0.30%, and 0.23% under the default, JPEG, and WEBP setups, respectively. When targeting ResNet-18 with a poisoning rate of 1%, the ASR improvements are 6.16%, 4.58%, and 0.28%, respectively. These results demonstrate that the proposed BATG is an effective and robust backdoor attack under different image compressions.
- The proposed BATG is also effective when targeting another victim model, VGG-19, as shown in Table 3 and Table 4, despite the fixed surrogate victim model (ResNet-18). When targeting VGG-19 with a poisoning rate of 10%, the ASR reaches high values of 99.89%, 99.01%, and 98.21% under the default, JPEG, and WEBP setups, respectively. These results demonstrate that the trigger images generated by the trigger generation model are both general and effective.
- The proposed BATG still performs well even with a very low poisoning rate (1%), as shown in Table 2 and Table 4, despite the high poisoning rate (10%) used while training the trigger generation model. When targeting ResNet-18 with a poisoning rate of 1%, the ASR still reaches high values of 99.69%, 98.21%, and 96.06% under the default, JPEG, and WEBP setups, respectively. These results demonstrate that the proposed BATG remains effective with a low poisoning rate.
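The JPEG and WEBP setups correspond to poisoned images passing through lossy compression before reaching the victim model. A minimal Pillow sketch of such a round trip is given below; the quality value is an assumed setting, not necessarily the one used in the paper:

```python
import io
from PIL import Image

def compress(image: Image.Image, fmt: str = "JPEG", quality: int = 75) -> Image.Image:
    """Round-trip an image through lossy compression in memory,
    emulating a real-world preprocessing pipeline."""
    buffer = io.BytesIO()
    image.save(buffer, format=fmt, quality=quality)  # fmt: "JPEG" or "WEBP"
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")
```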
4.3. Visual Quality of Poisoned Images
- For subjective evaluation, the difference between the clean image and the poisoned image produced by the proposed BATG is barely perceptible. It can be subjectively observed that the proposed BATG is more imperceptible to human observers than the compared methods.
- For objective evaluation, the PSNR values for BadNets, ISSBA, and BATG are 25.63, 27.19, and 35.38, respectively. The higher the PSNR value, the smaller the difference between the clean image and the poisoned image. It can be objectively observed that the proposed BATG is stealthier than the compared methods (a sketch of the PSNR computation is given below).
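The PSNR values above follow the standard definition, sketched below for 8-bit images:

```python
import numpy as np

def psnr(clean: np.ndarray, poisoned: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a clean image and its poisoned
    counterpart (same-shape uint8 arrays); higher means less distortion."""
    mse = np.mean((clean.astype(np.float64) - poisoned.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # the images are identical
    return 10.0 * np.log10(max_val ** 2 / mse)
```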
4.4. Ablation Study
- By comparing Variant I and II, Variant III and IV, and Variant V with the baseline BATG, it can be observed that the methods with the noise layer outperform their counterparts without it. These results demonstrate that the noise layer JPEG_Mask is essential to the proposed BATG (a sketch of such a layer is given after this list).
- By comparing Variant I, Variant III, and the baseline BATG, it can be observed that the attack performance of backdoor attacks varies greatly when different images are used as triggers. These results demonstrate that the trigger generation model in the design of the proposed BATG is necessary and the generated triggers are the most effective among all the tested triggers.
- By comparing Variant VI with the baseline BATG, it can be observed that the backdoor attack performance of Variant VI under the JPEG setup is very poor. These results demonstrate that the experimental setups of the proposed BATG are appropriate and that the trigger injection model should not be over-trained.
- By comparing Variant III and V, it can be observed that the trigger generation model contributes more significantly to BATG’s performance than the noise layer. These results demonstrate that the trigger generation method is more effective than the conventional adversarial training technique.
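For concreteness, a minimal PyTorch sketch of a JPEG-Mask-style noise layer in the spirit of HiDDeN is given below: it applies a blockwise DCT and zeroes all but a low-frequency corner of each 8×8 coefficient block, which approximates JPEG while staying differentiable. The 5×5 kept corner and the shared per-channel mask are assumptions, not necessarily the paper's exact configuration:

```python
import math
import torch
import torch.nn as nn

def dct_matrix(n: int = 8) -> torch.Tensor:
    """Orthonormal type-II DCT basis of size n x n."""
    k = torch.arange(n, dtype=torch.float32)
    basis = torch.cos((2 * k[None, :] + 1) * k[:, None] * math.pi / (2 * n))
    basis[0] /= math.sqrt(2.0)
    return basis * math.sqrt(2.0 / n)

class JpegMask(nn.Module):
    """Differentiable JPEG stand-in: blockwise DCT, then keep only a
    low-frequency corner of each 8x8 block of coefficients."""

    def __init__(self, keep: int = 5):
        super().__init__()
        self.register_buffer("D", dct_matrix(8))
        mask = torch.zeros(8, 8)
        mask[:keep, :keep] = 1.0  # assumed 5x5 low-frequency corner
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape  # assumes h and w are divisible by 8
        blocks = x.reshape(b, c, h // 8, 8, w // 8, 8).permute(0, 1, 2, 4, 3, 5)
        coeffs = self.D @ blocks @ self.D.T   # blockwise 2D DCT
        coeffs = coeffs * self.mask           # drop high-frequency content
        blocks = self.D.T @ coeffs @ self.D   # inverse blockwise DCT
        return blocks.permute(0, 1, 2, 4, 3, 5).reshape(b, c, h, w)
```

During training, such a layer sits between the trigger injection model and the surrogate victim model so that the injected trigger learns to survive compression.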
4.5. Resistance to Backdoor Defense
- Grad-CAM visualizes the attention map of suspicious images, and a Grad-CAM-based defense regards the highest-scoring area as the trigger region, removes that region, and restores it with image inpainting techniques. As shown in Figure 5, the heat maps generated by Grad-CAM are similar for the clean image and the poisoned image, so the proposed BATG cannot be detected by Grad-CAM.
- Fine-Pruning prunes neurons according to their average activation values to mitigate backdoor behaviors (a minimal pruning sketch follows this list). As shown in Figure 6, the ASR of the proposed BATG decreases only slightly as the fraction of pruned neurons increases, so the proposed BATG is resistant to this pruning-based defense.
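A minimal sketch of the pruning step, in the spirit of Fine-Pruning, is given below; pruning the last convolutional layer by zeroing its least-active filters is an assumed defense setup, not the paper's exact evaluation procedure:

```python
import torch

@torch.no_grad()
def fine_prune(model, layer, clean_loader, fraction, device="cuda"):
    """Zero the channels of `layer` with the lowest mean activation on
    clean data; `layer` is assumed to be the last convolutional layer."""
    model.eval()
    acts = []
    hook = layer.register_forward_hook(
        lambda m, inp, out: acts.append(out.mean(dim=(0, 2, 3)).cpu()))
    for images, _ in clean_loader:
        model(images.to(device))  # record per-channel activations
    hook.remove()
    mean_act = torch.stack(acts).mean(dim=0)
    prune_idx = mean_act.argsort()[: int(fraction * mean_act.numel())]
    layer.weight[prune_idx] = 0.0  # mask the least-active (dormant) filters
    if layer.bias is not None:
        layer.bias[prune_idx] = 0.0
    return prune_idx
```

After each pruning step, CDA and ASR are re-measured; a robust attack keeps a high ASR as `fraction` grows, which is the behavior reported in Figure 6.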
4.6. Generalization Test
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Graves, A.; Mohamed, A.-r.; Hinton, G.E. Speech recognition with deep recurrent neural networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013; pp. 6645–6649.
- Wenger, E.; Passananti, J.; Bhagoji, A.N.; Yao, Y.; Zheng, H.; Zhao, B.Y. Backdoor attacks against deep learning systems in the physical world. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 6206–6215.
- Chen, J.; Teo, T.H.; Kok, C.L.; Koh, Y.Y. A Novel Single-Word Speech Recognition on Embedded Systems Using a Convolution Neuron Network with Improved Out-of-Distribution Detection. Electronics 2024, 13, 530.
- Jiang, W.; Li, H.; Liu, S.; Luo, X.; Lu, R. Poisoning and evasion attacks against deep learning algorithms in autonomous vehicles. IEEE Trans. Veh. Technol. 2020, 69, 4439–4449.
- Gu, T.; Liu, K.; Dolan-Gavitt, B.; Garg, S. BadNets: Evaluating backdooring attacks on deep neural networks. IEEE Access 2019, 7, 47230–47244.
- Chen, X.; Liu, C.; Li, B.; Lu, K.; Song, D. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017, arXiv:1712.05526.
- Zhong, H.; Liao, C.; Squicciarini, A.C.; Zhu, S.; Miller, D. Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation. In Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy, New Orleans, LA, USA, 16–18 March 2020; pp. 97–108.
- Li, S.; Xue, M.; Zhao, B.Z.H.; Zhu, H.; Zhang, X. Invisible Backdoor Attacks on Deep Neural Networks via Steganography and Regularization. IEEE Trans. Dependable Secur. Comput. 2021, 18, 2088–2105.
- Nguyen, A.; Tran, A. WaNet—Imperceptible Warping-based Backdoor Attack. arXiv 2021, arXiv:2102.10369.
- Turner, A.; Tsipras, D.; Madry, A. Label-Consistent Backdoor Attacks. arXiv 2019, arXiv:1912.02771.
- Li, Y.; Li, Y.; Wu, B.; Li, L.; He, R.; Lyu, S. Invisible backdoor attack with sample-specific triggers. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 16463–16472.
- Zhang, J.; Chen, D.; Huang, Q.; Zhang, W.; Feng, H.; Liao, J. Poison Ink: Robust and Invisible Backdoor Attack. IEEE Trans. Image Process. 2022, 31, 5691–5705.
- Jiang, W.; Li, H.; Xu, G.; Zhang, T. Color Backdoor: A Robust Poisoning Attack in Color Space. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 8133–8142.
- Yu, Y.; Wang, Y.; Yang, W.; Lu, S.; Tan, Y.-P.; Kot, A.C. Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 12250–12259.
- Liu, K.; Dolan-Gavitt, B.; Garg, S. Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks. arXiv 2018, arXiv:1805.12185.
- Wang, B.; Yao, Y.; Shan, S.; Viswanath, B.; Zheng, H. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 707–723.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
- Gao, Y.; Xu, C.; Wang, D.; Chen, S.; Ranasinghe, D.C.; Nepal, S. STRIP: A defence against trojan attacks on deep neural networks. In Proceedings of the 35th Annual Computer Security Applications Conference, San Juan, PR, USA, 9–13 December 2019; pp. 113–125.
- Tran, B.; Li, J.; Madry, A. Spectral signatures in backdoor attacks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 3–8 December 2018; pp. 8011–8021.
- Xue, M.; Wang, X.; Sun, S.; Zhang, Y.; Wang, J.; Liu, W. Compression-resistant backdoor attack against deep neural networks. arXiv 2022, arXiv:2201.00672.
- Baluja, S. Hiding images in plain sight: Deep steganography. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 2066–2076.
- Wu, P.; Yang, Y.; Li, X. StegNet: Mega Image Steganography Capacity with Deep Convolutional Network. Future Internet 2018, 10, 54.
- Duan, X.; Jia, K.; Li, B.; Guo, D.; Zhang, E.; Qin, C. Reversible Image Steganography Scheme Based on a U-Net Structure. IEEE Access 2019, 7, 9314–9323.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
- Zhu, J.; Kaplan, R.; Johnson, J.; Li, F.F. HiDDeN: Hiding Data With Deep Networks. In Proceedings of the Computer Vision—ECCV 2018: 15th European Conference, Munich, Germany, 8–14 September 2018; pp. 682–697.
- Tang, W.; Li, B.; Barni, M.; Li, J.; Huang, J. An Automatic Cost Learning Framework for Image Steganography Using Deep Reinforcement Learning. IEEE Trans. Inf. Forensics Secur. 2021, 16, 952–967.
- Tang, W.; Li, B.; Barni, M.; Li, J.; Huang, J. Improving Cost Learning for JPEG Steganography by Exploiting JPEG Domain Knowledge. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 4081–4095.
- Ying, Q.; Zhou, H.; Zeng, X.; Xu, H.; Qian, Z.; Zhang, X. Hiding Images Into Images with Real-World Robustness. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 111–115.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 27, pp. 2672–2680.
- Jing, J.; Deng, X.; Xu, M.; Wang, J.; Guan, Z. HiNet: Deep Image Hiding by Invertible Network. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 4713–4722.
- Dinh, L.; Krueger, D.; Bengio, Y. NICE: Non-linear Independent Components Estimation. arXiv 2014, arXiv:1410.8516.
- Dinh, L.; Sohl-Dickstein, J.; Bengio, S. Density estimation using Real NVP. arXiv 2017, arXiv:1605.08803.
- Tang, W.; Zhou, Z.; Li, B.; Choo, K.-K.R.; Huang, J. Joint Cost Learning and Payload Allocation With Image-Wise Attention for Batch Steganography. IEEE Trans. Inf. Forensics Secur. 2024, 19, 2826–2839.
- Cui, Q.; Tang, W.; Zhou, Z.; Meng, R.; Nan, G.; Shi, Y.-Q. Meta Security Metric Learning for Secure Deep Image Hiding. IEEE Trans. Dependable Secur. Comput. 2024, 21, 4907–4920.
- Lin, J.; Xu, L.; Liu, Y.; Zhang, X. Composite backdoor attack for deep neural network by mixing existing benign features. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, 9–13 November 2020; pp. 113–131.
- Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv 2015, arXiv:1511.06434.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 807–814.
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Loy, C.C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the Computer Vision—ECCV 2018 Workshops, Munich, Germany, 8–14 September 2018; pp. 63–79.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- Tancik, M.; Mildenhall, B.; Ng, R. StegaStamp: Invisible Hyperlinks in Physical Photographs. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2114–2123.
- Huynh-Thu, Q.; Ghanbari, M. Scope of validity of PSNR in image/video quality assessment. Electron. Lett. 2008, 44, 800–801.
- Mathias, M.; Timofte, R.; Benenson, R.; Van Gool, L. Traffic sign recognition—How far are we from the solution? In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, TX, USA, 4–9 August 2013; pp. 1–8.
Table 1. Attack performance against ResNet-18 with a poisoning rate of 10% under different compression setups.

| Methods | CDA (Default) | ASR (Default) | CDA (JPEG) | ASR (JPEG) | CDA (WEBP) | ASR (WEBP) |
|---|---|---|---|---|---|---|
| ISSBA | 84.98% | 99.37% | 83.53% | 98.87% | 83.27% | 98.79% |
| BATG | 85.13% | 99.97% | 83.81% | 99.17% | 83.43% | 99.02% |
Table 2. Attack performance against ResNet-18 with a poisoning rate of 1% under different compression setups.

| Methods | CDA (Default) | ASR (Default) | CDA (JPEG) | ASR (JPEG) | CDA (WEBP) | ASR (WEBP) |
|---|---|---|---|---|---|---|
| ISSBA | 85.15% | 93.53% | 83.22% | 93.63% | 82.26% | 95.78% |
| BATG | 85.42% | 99.69% | 83.31% | 98.21% | 82.98% | 96.06% |
Table 3. Attack performance against VGG-19 with a poisoning rate of 10% under different compression setups.

| Methods | CDA (Default) | ASR (Default) | CDA (JPEG) | ASR (JPEG) | CDA (WEBP) | ASR (WEBP) |
|---|---|---|---|---|---|---|
| ISSBA | 88.45% | 98.42% | 86.01% | 98.50% | 86.34% | 97.82% |
| BATG | 88.36% | 99.89% | 86.06% | 99.01% | 86.18% | 98.21% |
Table 4. Attack performance against VGG-19 with a poisoning rate of 1% under different compression setups.

| Methods | CDA (Default) | ASR (Default) | CDA (JPEG) | ASR (JPEG) | CDA (WEBP) | ASR (WEBP) |
|---|---|---|---|---|---|---|
| ISSBA | 86.13% | 93.36% | 84.16% | 93.68% | 83.27% | 91.83% |
| BATG | 88.08% | 98.21% | 83.41% | 95.95% | 83.92% | 94.58% |
Table 5. Results of the ablation study.

| Methods | CDA | ASR | PSNR |
|---|---|---|---|
| Variant I | 82.43% | 88.88% | 36.48 |
| Variant II | 78.03% | 11.63% | 42.12 |
| Variant III | 83.70% | 97.77% | 35.89 |
| Variant IV | 82.91% | 91.98% | 41.80 |
| Variant V | 83.88% | 98.35% | 35.09 |
| Variant VI | 77.59% | 18.27% | 39.12 |
| BATG | 83.81% | 99.17% | 35.38 |
Table 6. Generalization test of BATG on the ImageNet and BelgiumTS datasets.

| Methods_Datasets | CDA (Default) | ASR (Default) | CDA (JPEG) | ASR (JPEG) | CDA (WEBP) | ASR (WEBP) |
|---|---|---|---|---|---|---|
| BATG_ImageNet | 85.13% | 99.97% | 83.81% | 99.17% | 83.42% | 99.02% |
| BATG_BelgiumTS | 95.02% | 100.00% | 94.08% | 99.54% | 94.86% | 99.69% |