Comparative Study of Adversarial Defenses: Adversarial Training and Regularization in Vision Transformers and CNNs
Abstract
1. Introduction
- This paper provides novel insights into the relative strengths and vulnerabilities of convolutional neural networks (CNNs) and vision transformers (ViTs) under adversarial conditions by employing adversarial training and model regularization techniques (a minimal sketch of the combined procedure follows this list);
- We explore regularization-based strategies for enhancing the adversarial robustness of vision transformers and convolutional neural networks. Our investigation demonstrates the effectiveness of these strategies and quantifies their tangible benefits in fortifying ViTs against adversarial attacks;
- This study offers a detailed account of why regularization techniques particularly benefit vision transformers in the context of adversarial robustness. Through this exploration, we contribute to a deeper understanding of the interaction between model architecture and regularization, highlighting its significance for building more secure machine learning systems.
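As referenced above, the following is a minimal PyTorch sketch of the combined procedure these contributions rest on: one adversarial-training step in which the model is fit on FGSM-perturbed inputs while an elastic-net-style weight penalty regularizes the parameters. The ε, the penalty coefficients, and the assumption of inputs scaled to [0, 1] are illustrative placeholders, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

def fgsm_example(model, x, y, eps):
    """Craft an FGSM adversarial example (Goodfellow et al. [17])."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # One signed-gradient step, clipped back to the valid pixel range
    # (assumes inputs scaled to [0, 1]).
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

def train_step(model, optimizer, x, y, eps=8 / 255, l1=1e-5, l2=1e-4):
    """One adversarial-training step with an elastic-net-style weight penalty.

    The coefficients l1/l2 and eps are assumptions, not the paper's values.
    """
    model.train()
    x_adv = fgsm_example(model, x, y, eps)
    optimizer.zero_grad()  # discard gradients accumulated while crafting x_adv
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss = loss + l1 * sum(p.abs().sum() for p in model.parameters()) \
                + l2 * sum(p.pow(2).sum() for p in model.parameters())
    loss.backward()
    optimizer.step()
    return loss.item()
```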
2. Literature Review
3. Methodology
4. Experimental Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Correction Statement
References
1. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25.
2. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
3. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. arXiv 2014, arXiv:1409.4842.
4. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
5. Raina, R.; Madhavan, A.; Ng, A.Y. Large-scale deep unsupervised learning using graphics processors. In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; Association for Computing Machinery: New York, NY, USA, 2009; pp. 873–880.
6. Chetlur, S.; Woolley, C.; Vandermersch, P.; Cohen, J.; Tran, J.; Catanzaro, B.; Shelhamer, E. cuDNN: Efficient Primitives for Deep Learning. arXiv 2014, arXiv:1410.0759.
7. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
8. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929.
9. O’Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. arXiv 2015, arXiv:1511.08458.
10. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
11. Sattarzadeh, S.; Sudhakar, M.; Lem, A.; Mehryar, S.; Plataniotis, K.N.; Jang, J.; Kim, H.; Jeong, Y.; Lee, S.; Bae, K. Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation. arXiv 2020, arXiv:2010.00672.
12. Lin, H.; Han, G.; Ma, J.; Huang, S.; Lin, X.; Chang, S.-F. Supervised Masked Knowledge Distillation for Few-Shot Transformers. arXiv 2023, arXiv:2303.15466.
13. Raghu, M.; Unterthiner, T.; Kornblith, S.; Zhang, C.; Dosovitskiy, A. Do Vision Transformers See Like Convolutional Neural Networks? arXiv 2022, arXiv:2108.08810.
14. Shi, R.; Li, T.; Zhang, L.; Yamaguchi, Y. Visualization Comparison of Vision Transformers and Convolutional Neural Networks. IEEE Trans. Multimed. 2023, 26, 2327–2339.
15. Sultana, M.; Naseer, M.; Khan, M.H.; Khan, S.; Khan, F.S. Self-Distilled Vision Transformer for Domain Generalization. arXiv 2022, arXiv:2207.12392.
16. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2014, arXiv:1312.6199.
17. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2015, arXiv:1412.6572.
18. Carlini, N.; Wagner, D. Towards Evaluating the Robustness of Neural Networks. arXiv 2017, arXiv:1608.04644.
19. Bai, T.; Luo, J.; Zhao, J.; Wen, B.; Wang, Q. Recent Advances in Adversarial Training for Adversarial Robustness. arXiv 2021, arXiv:2102.01356.
20. Wang, Z.; Li, X.; Zhu, H.; Xie, C. Revisiting Adversarial Training at Scale. arXiv 2024, arXiv:2401.04727.
21. Aldahdooh, A.; Hamidouche, W.; Deforges, O. Reveal of Vision Transformers Robustness against Adversarial Attacks. arXiv 2021, arXiv:2106.03734.
22. Bhojanapalli, S.; Chakrabarti, A.; Glasner, D.; Li, D.; Unterthiner, T.; Veit, A. Understanding Robustness of Transformers for Image Classification. arXiv 2021, arXiv:2103.14586.
23. Mahmood, K.; Mahmood, R.; van Dijk, M. On the Robustness of Vision Transformers to Adversarial Examples. arXiv 2021, arXiv:2104.02610.
24. Mo, Y.; Wu, D.; Wang, Y.; Guo, Y.; Wang, Y. When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture. arXiv 2022, arXiv:2210.07540.
25. Naseer, M.; Ranasinghe, K.; Khan, S.; Khan, F.S.; Porikli, F. On Improving Adversarial Transferability of Vision Transformers. arXiv 2022, arXiv:2106.04169.
26. Shao, R.; Shi, Z.; Yi, J.; Chen, P.-Y.; Hsieh, C.-J. On the Adversarial Robustness of Vision Transformers. arXiv 2022, arXiv:2103.15670.
27. Shi, Y.; Han, Y.; Tan, Y.; Kuang, X. Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal. arXiv 2022, arXiv:2112.03492.
28. Wang, Y.; Wang, J.; Yin, Z.; Gong, R.; Wang, J.; Liu, A.; Liu, X. Generating Transferable Adversarial Examples against Vision Transformers. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 5181–5190.
29. Wei, Z.; Chen, J.; Goldblum, M.; Wu, Z.; Goldstein, T.; Jiang, Y.-G. Towards Transferable Adversarial Attacks on Vision Transformers. arXiv 2022, arXiv:2109.04176.
30. Zhang, J.; Huang, Y.; Wu, W.; Lyu, M.R. Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization. arXiv 2023, arXiv:2303.15754.
31. Li, Z.; Yang, W.; Peng, S.; Liu, F. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. arXiv 2020, arXiv:2004.02806.
32. Younesi, A.; Ansari, M.; Fazli, M.; Ejlali, A.; Shafique, M.; Henkel, J. A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends. arXiv 2024, arXiv:2402.15490.
33. Mokayed, H.; Quan, T.Z.; Alkhaled, L.; Sivakumar, V. Real-Time Human Detection and Counting System Using Deep Learning Computer Vision Techniques. Artif. Intell. Appl. 2023, 1, 221–229.
34. Chen, H.; Long, H.; Chen, T.; Song, Y.; Chen, H.; Zhou, X.; Deng, W. M3FuNet: An Unsupervised Multivariate Feature Fusion Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5513015.
35. Bhosle, K.; Musande, V. Evaluation of Deep Learning CNN Model for Recognition of Devanagari Digit. Artif. Intell. Appl. 2023, 1, 114–118.
36. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
37. Lin, T.; Wang, Y.; Liu, X.; Qiu, X. A Survey of Transformers. arXiv 2021, arXiv:2106.04554.
38. Islam, S.; Elmekki, H.; Elsebai, A.; Bentahar, J.; Drawel, N.; Rjoub, G.; Pedrycz, W. A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks. arXiv 2023, arXiv:2306.07303.
39. Papa, L.; Russo, P.; Amerini, I.; Zhou, L. A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking. arXiv 2023, arXiv:2309.02031.
40. Nauen, T.C.; Palacio, S.; Dengel, A. Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers. arXiv 2023, arXiv:2308.09372.
41. Guo, C.; Sablayrolles, A.; Jégou, H.; Kiela, D. Gradient-based Adversarial Attacks against Text Transformers. arXiv 2021, arXiv:2104.13733.
42. Wang, X.; Wang, H.; Yang, D. Measure and Improve Robustness in NLP Models: A Survey. arXiv 2022, arXiv:2112.08313.
43. Chen, G.; Zhao, Z.; Song, F.; Chen, S.; Fan, L.; Wang, F.; Wang, J. Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition. arXiv 2022, arXiv:2206.03393.
44. Chen, E.-C.; Lee, C.-R. Towards Fast and Robust Adversarial Training for Image Classification. In Computer Vision—ACCV 2020; Ishikawa, H., Liu, C.-L., Pajdla, T., Shi, J., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2021; Volume 12624, pp. 576–591. ISBN 978-3-030-69534-7.
45. Yoo, J.Y.; Qi, Y. Towards Improving Adversarial Training of NLP Models. arXiv 2021, arXiv:2109.00544.
46. Zhang, H.; Chen, H.; Song, Z.; Boning, D.; Dhillon, I.S.; Hsieh, C.-J. The Limitations of Adversarial Training and the Blind-Spot Attack. arXiv 2019, arXiv:1901.04684.
47. Gowal, S.; Qin, C.; Uesato, J.; Mann, T.; Kohli, P. Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples. arXiv 2021, arXiv:2010.03593.
48. Ma, A.; Faghri, F.; Papernot, N.; Farahmand, A. SOAR: Second-Order Adversarial Regularization. arXiv 2020, arXiv:2004.01832.
49. Tack, J.; Yu, S.; Jeong, J.; Kim, M.; Hwang, S.J.; Shin, J. Consistency Regularization for Adversarial Robustness. arXiv 2021, arXiv:2103.04623.
50. Yang, D.; Kong, I.; Kim, Y. Improving Adversarial Robustness by Putting More Regularizations on Less Robust Samples. arXiv 2023, arXiv:2206.03353.
51. Guo, J.; Liu, Z.; Tian, S.; Huang, F.; Li, J.; Li, X.; Igorevich, K.K.; Ma, J. TFL-DT: A Trust Evaluation Scheme for Federated Learning in Digital Twin for Mobile Networks. IEEE J. Sel. Areas Commun. 2023, 41, 3548–3560.
52. Sun, H.; Chen, M.; Weng, J.; Liu, Z.; Geng, G. Anomaly Detection for In-Vehicle Network Using CNN-LSTM With Attention Mechanism. IEEE Trans. Veh. Technol. 2021, 70, 10880–10893.
53. Guo, J.; Li, X.; Liu, Z.; Ma, J.; Yang, C.; Zhang, J.; Wu, D. TROVE: A Context Awareness Trust Model for VANETs Using Reinforcement Learning. IEEE Internet Things J. 2020, 7, 6647–6662.
54. Ng, A.Y. Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; Association for Computing Machinery: New York, NY, USA, 2004; p. 78.
55. Zou, H.; Hastie, T. Regularization and Variable Selection Via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
56. Kukačka, J.; Golkov, V.; Cremers, D. Regularization for Deep Learning: A Taxonomy. arXiv 2017, arXiv:1710.10686.
57. Moradi, R.; Berangi, R.; Minaei, B. A Survey of Regularization Strategies for Deep Models. Artif. Intell. Rev. 2020, 53, 3947–3986.
58. Kotsilieris, T.; Anagnostopoulos, I.; Livieris, I.E. Special Issue: Regularization Techniques for Machine Learning and Their Applications. Electronics 2022, 11, 521.
59. Sánchez García, J.; Cruz Rambaud, S. Machine Learning Regularization Methods in High-Dimensional Monetary and Financial VARs. Mathematics 2022, 10, 877.
60. Maurício, J.; Domingues, I.; Bernardino, J. Comparing Vision Transformers and Convolutional Neural Networks for Image Classification: A Literature Review. Appl. Sci. 2023, 13, 5521.
61. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv 2019, arXiv:1706.06083.
62. Dong, Y.; Liao, F.; Pang, T.; Su, H.; Zhu, J.; Hu, X.; Li, J. Boosting Adversarial Attacks with Momentum. arXiv 2018, arXiv:1710.06081.
63. Papernot, N.; Faghri, F.; Carlini, N.; Goodfellow, I.; Feinman, R.; Kurakin, A.; Xie, C.; Sharma, Y.; Brown, T.; Roy, A.; et al. Technical Report on the CleverHans v2.1.0 Adversarial Examples Library. arXiv 2018, arXiv:1610.00768.
64. Yuan, L.; Hou, Q.; Jiang, Z.; Feng, J.; Yan, S. VOLO: Vision Outlooker for Visual Recognition. arXiv 2021, arXiv:2106.13112.
65. Liu, X.; Peng, H.; Zheng, N.; Yang, Y.; Hu, H.; Yuan, Y. EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention. arXiv 2023, arXiv:2305.07027.
66. Vasu, P.K.A.; Gabriel, J.; Zhu, J.; Tuzel, O.; Ranjan, A. FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization. arXiv 2023, arXiv:2303.14189.
67. Wu, K.; Zhang, J.; Peng, H.; Liu, M.; Xiao, B.; Fu, J.; Yuan, L. TinyViT: Fast Pretraining Distillation for Small Vision Transformers. arXiv 2022, arXiv:2207.10666.
Model | Distance between Cluster Centroids |  |  | Adversarial Accuracy
---|---|---|---|---
CNN | 0.15 | 0.13 | 0.10 | 27.8%
Regularized CNN | 0.18 | 0.15 | 0.11 | 66.7%
ViT | 1.30 | 1.02 | 1.05 | 66.7%
Regularized ViT | 2.40 | 2.20 | 1.40 | 83.3%
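A hedged sketch of how the centroid-distance column above could be computed: take penultimate-layer embeddings for clean and adversarial versions of the same inputs and measure the L2 distance between the cluster means. The pairing of clean against adversarial features is an assumption; the paper's exact cluster definition is not reproduced here.

```python
import torch

@torch.no_grad()
def centroid_distance(clean_feats, adv_feats):
    """L2 distance between the centroids of two feature clusters.

    clean_feats / adv_feats: (N, D) tensors of penultimate-layer embeddings
    for clean and adversarial inputs; this pairing is an assumption.
    """
    return torch.linalg.norm(clean_feats.mean(dim=0) - adv_feats.mean(dim=0)).item()
```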
Patch Size | Number of Heads | Transformer Layers | Average Accuracy
---|---|---|---
6 | 4 | 26 | 80.1% |
12 | 4 | 26 | 68.8% |
24 | 4 | 26 | 77.9% |
6 | 8 | 26 | 81.2% |
6 | 16 | 26 | 74.6% |
6 | 4 | 36 | 80.5% |
6 | 4 | 72 | 79.7% |
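For reference, a minimal way to instantiate one row of the hyperparameter grid above, using torchvision's VisionTransformer. The image size of 72 (divisible by all three patch sizes tried) and the hidden/MLP widths are assumptions, since the paper's exact backbone configuration is not shown here.

```python
import torch
from torchvision.models.vision_transformer import VisionTransformer

def build_vit(patch_size=6, num_heads=4, num_layers=26, num_classes=10):
    """Instantiate a ViT for one row of the hyperparameter table.

    image_size=72, hidden_dim=64, and mlp_dim=128 are assumptions,
    chosen so every patch size and head count in the table is valid.
    """
    return VisionTransformer(
        image_size=72,
        patch_size=patch_size,
        num_layers=num_layers,
        num_heads=num_heads,
        hidden_dim=64,   # must be divisible by num_heads
        mlp_dim=128,
        num_classes=num_classes,
    )

# e.g., the best-performing row of the table: patch 6, 8 heads, 26 layers
model = build_vit(patch_size=6, num_heads=8, num_layers=26)
logits = model(torch.randn(1, 3, 72, 72))  # shape (1, 10)
```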
Model Architecture | Adversarial Defense | Clean Data Accuracy 1 | Average Adversarial Accuracy 2
---|---|---|---
CNN | Clean Model 3 | 74.2% | 73.0%
 | Adversarial training (α ≤ 1.4) | 61.2% | 76.5%
 | Regularization (…) | 72.5% | 71.0%
 | Regularization (…) | 74.7% | 73.8%
 | Regularization (…) | 71.1% | 69.3%
Vision Transformer | Clean Model 3 | 83.0% | 80.1%
 | Adversarial training (α ≤ 1.4) | 83.4% | 81.9%
 | Regularization (…) | 84.0% | 82.2%
 | Regularization (…) | 83.6% | 81.2%
 | Regularization (…) | 81.9% | 79.5%
Model Architecture | Adversarial Defense | Clean Data Accuracy 1 | Average Adversarial Accuracy 2
---|---|---|---
CNN | Clean Model 3 | 55.0% | 39.5%
 | Adversarial training (α ≤ 1.4) | 42.8% | 42.2%
 | Regularization (…) | 35.6% | 35.6%
 | Regularization (…) | 34.9% | 35.4%
 | Regularization (…) | 34.0% | 33.9%
Vision Transformer | Clean Model 3 | 53.9% | 49.9%
 | Adversarial training (α ≤ 1.4) | 45.7% | 51.7%
 | Regularization (…) | 34.9% | 35.3%
 | Regularization (…) | 54.6% | 52.8%
 | Regularization (…) | 49.7% | 47.7%
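The two tables above report clean accuracy and average adversarial accuracy side by side; a sketch of the evaluation loop that would produce both columns follows. The `attack` callable and the averaging over several attacks are assumptions about the protocol, reusing the FGSM helper sketched after the Introduction.

```python
import torch

@torch.no_grad()
def accuracy(model, loader, device, attack=None):
    """Clean accuracy when `attack` is None; adversarial accuracy otherwise.

    `attack` is any callable (model, x, y) -> x_adv, e.g. the FGSM helper
    sketched earlier. Averaging the result over several attacks to get the
    tables' "average adversarial accuracy" is an assumption.
    """
    model.eval()
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        if attack is not None:
            with torch.enable_grad():  # crafting the attack needs gradients
                x = attack(model, x, y)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total
```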
Model | CIFAR-10 Clean Data Accuracy 1 | CIFAR-10 Average Adversarial Accuracy 2 | CIFAR-100 Clean Data Accuracy 1 | CIFAR-100 Average Adversarial Accuracy 2
---|---|---|---|---
ResNet-50 [4] | 76.5% | 74.0% | 40.4% | 40.4%
VOLO [64] | 74.9% | 74.3% | 42.1% | 41.7%
EfficientViT [65] | 75.0% | 75.2% | 40.1% | 39.5%
FastViT [66] | 80.0% | 80.9% | 52.0% | 51.7%
TinyViT [67] | 80.0% | 80.3% | 49.9% | 49.1%
Regularized CNN (ours) | 74.7% | 73.8% | 35.6% | 35.6%
Regularized ViT (ours) | 84.0% | 82.2% | 54.6% | 52.8%