A Universal Detection Method for Adversarial Examples and Fake Images
Abstract
1. Introduction
- Based on the difference between the distributions of model outputs on normal samples and on adversarial examples or fake images, we propose a universal detection method that covers both (a minimal sketch of this idea follows the list).
- We evaluated the detector against state-of-the-art adversarial-example and fake-image generation algorithms, demonstrating its effectiveness.
- We evaluated the proposed method across different datasets and neural-network architectures, demonstrating the detector’s generalizability.
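To make the first contribution concrete, the sketch below illustrates the underlying observation: the target model’s output distribution (here, its softmax vector) is collected for normal and for adversarial or fake inputs and used as the detector’s feature. All names here (`output_features`, `build_detector_dataset`) are illustrative helpers, not the paper’s implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def output_features(target_model, images):
    """Softmax output vector of the target model for each input image."""
    target_model.eval()
    logits = target_model(images)      # shape (N, num_classes)
    return F.softmax(logits, dim=1)    # output-distribution features

def build_detector_dataset(target_model, clean_x, adv_x):
    """Label clean outputs 0 and adversarial/fake outputs 1 for the detector."""
    feats = torch.cat([output_features(target_model, clean_x),
                       output_features(target_model, adv_x)])
    labels = torch.cat([torch.zeros(len(clean_x)), torch.ones(len(adv_x))])
    return feats, labels
```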
2. Related Work
2.1. Adversarial Examples
2.1.1. Attack
2.1.2. Defense
2.2. Fake Images
2.2.1. Attack
2.2.2. Defense
3. Method
3.1. Overview
3.1.1. Observation
3.1.2. Framework
3.2. Detector Training
Algorithm 1. Detector Training Algorithm.
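The algorithm box itself is not reproduced above. As a plausible minimal sketch, assuming (per Section 3.1) that the detector is a binary classifier over the target model’s output vectors and reusing the illustrative `build_detector_dataset` helper from Section 1, detector training could look like this:

```python
import torch
import torch.nn as nn

class Detector(nn.Module):
    """A small binary MLP over the target model's output vectors."""
    def __init__(self, num_classes, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1))   # one logit: evidence of adversarial/fake

    def forward(self, x):
        return self.net(x).squeeze(1)

def train_detector(detector, feats, labels, epochs=20, lr=1e-3):
    """Fit the detector with binary cross-entropy on (feature, label) pairs."""
    opt = torch.optim.Adam(detector.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(detector(feats), labels.float())
        loss.backward()
        opt.step()
    return detector
```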
3.3. Online Detection
Algorithm 2. Online Detection Algorithm.
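Again as a hedged sketch rather than the paper’s exact Algorithm 2: at inference time, each incoming sample is passed through the target model, its output vector is scored by the trained detector, and the sample is rejected when the score exceeds a threshold (`tau` below is an illustrative default, not the paper’s setting).

```python
import torch

@torch.no_grad()
def detect(target_model, detector, image, tau=0.5):
    """Flag a single input as adversarial/fake when the detector score > tau."""
    # output_features is the illustrative helper from the Section 1 sketch
    feats = output_features(target_model, image.unsqueeze(0))
    score = torch.sigmoid(detector(feats)).item()
    return score > tau   # True -> reject the sample, False -> pass it on
```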
4. Experiments
4.1. Performance Experiments
4.1.1. Performance in Adversarial Example Detection
4.1.2. Performance in Fake Image Detection
4.2. Generalizability Experiments
4.2.1. Generalizability across Datasets
4.2.2. Generalizability across Target-Model Architectures
4.3. Transferability Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Wu, C.; Li, W. Enhancing intrusion detection with feature selection and neural network. Int. J. Intell. Syst. 2021, 36, 3087–3105.
- Wang, X.; Liang, Z.; Koe, A.S.V.; Wu, Q.; Zhang, X.; Li, H.; Yang, Q. Secure and efficient parameters aggregation protocol for federated incremental learning and its applications. Int. J. Intell. Syst. 2021, 1–17.
- Zhang, N.; Xue, J.; Ma, Y.; Zhang, R.; Liang, T.; Tan, Y.A. Hybrid sequence-based Android malware detection using natural language processing. Int. J. Intell. Syst. 2021, 36, 5770–5784.
- Li, Y.; Yao, S.; Zhang, R.; Yang, C. Analyzing host security using D-S evidence theory and multisource information fusion. Int. J. Intell. Syst. 2021, 36, 1053–1068.
- Wang, X.; Li, J.; Kuang, X.; Tan, Y.A.; Li, J. The security of machine learning in an adversarial setting: A survey. J. Parallel Distrib. Comput. 2019, 130, 12–23.
- Mo, K.; Tang, W.; Li, J.; Yuan, X. Attacking Deep Reinforcement Learning with Decoupled Adversarial Policy. IEEE Trans. Dependable Secur. Comput. 2022.
- Yan, H.; Hu, L.; Xiang, X.; Liu, Z.; Yuan, X. PPCL: Privacy-preserving collaborative learning for mitigating indirect information leakage. Inf. Sci. 2021, 548, 423–437.
- Mo, K.; Liu, X.; Huang, T.; Yan, A. Querying little is enough: Model inversion attack via latent information. Int. J. Intell. Syst. 2021, 36, 681–690.
- Ren, H.; Huang, T.; Yan, H. Adversarial examples: Attacks and defenses in the physical world. Int. J. Mach. Learn. Cybern. 2021, 12, 3325–3336.
- Rao, Y.; Ni, J. Self-Supervised Domain Adaptation for Forgery Localization of JPEG Compressed Images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 15034–15043.
- Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:1312.6199.
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572.
- Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 39–57.
- Moosavi-Dezfooli, S.M.; Fawzi, A.; Frossard, P. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2574–2582.
- Guo, S.; Geng, S.; Xiang, T.; Liu, H.; Hou, R. ELAA: An efficient local adversarial attack using model interpreters. Int. J. Intell. Syst. 2021, 1–23.
- Chen, H.; Lu, K.; Wang, X.; Li, J. Generating transferable adversarial examples based on perceptually-aligned perturbation. Int. J. Mach. Learn. Cybern. 2021, 12, 3295–3307.
- Huang, T.; Chen, Y.; Yao, B.; Yang, B.; Wang, X.; Li, Y. Adversarial attacks on deep-learning-based radar range profile target recognition. Inf. Sci. 2020, 531, 159–176.
- Huang, T.; Zhang, Q.; Liu, J.; Hou, R.; Wang, X.; Li, Y. Adversarial attacks on deep-learning-based SAR image target recognition. J. Netw. Comput. Appl. 2020, 162, 102632.
- Chen, C.; Huang, T. Camdar-adv: Generating adversarial patches on 3D object. Int. J. Intell. Syst. 2021, 36, 1441–1453.
- Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership inference attacks against machine learning models. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 3–18.
- Li, Y.; Tang, T.; Hsieh, C.J.; Lee, T. Detecting Adversarial Examples with Bayesian Neural Network. arXiv 2021, arXiv:2105.08620.
- Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial examples in the physical world. arXiv 2016, arXiv:1607.02533.
- Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards deep learning models resistant to adversarial attacks. arXiv 2017, arXiv:1706.06083.
- Moosavi-Dezfooli, S.M.; Fawzi, A.; Fawzi, O.; Frossard, P. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1765–1773.
- Feinman, R.; Curtin, R.R.; Shintre, S.; Gardner, A.B. Detecting adversarial samples from artifacts. arXiv 2017, arXiv:1703.00410.
- Ma, X.; Li, B.; Wang, Y.; Erfani, S.M.; Wijewickrema, S.; Schoenebeck, G.; Song, D.; Houle, M.E.; Bailey, J. Characterizing adversarial subspaces using local intrinsic dimensionality. arXiv 2018, arXiv:1801.02613.
- Tian, S.; Yang, G.; Cai, Y. Detecting adversarial examples through image transformation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 4139–4146.
- Meng, D.; Chen, H. MagNet: A two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 135–147.
- Lu, J.; Issaranon, T.; Forsyth, D. SafetyNet: Detecting and rejecting adversarial examples robustly. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 446–454.
- Aggarwal, A.; Mittal, M.; Battineni, G. Generative adversarial network: An overview of theory and applications. Int. J. Inf. Manag. Data Insights 2021, 1, 100004.
- Tang, W.; Li, B.; Barni, M.; Li, J.; Huang, J. An automatic cost learning framework for image steganography using deep reinforcement learning. IEEE Trans. Inf. Forensics Secur. 2020, 16, 952–967.
- Li, H.; Luo, W.; Qiu, X.; Huang, J. Image forgery localization via integrating tampering possibility maps. IEEE Trans. Inf. Forensics Secur. 2017, 12, 1240–1252.
- Qiu, X.; Li, H.; Luo, W.; Huang, J. A universal image forensic strategy based on steganalytic model. In Proceedings of the 2nd ACM Workshop on Information Hiding and Multimedia Security, New York, NY, USA, 14 July 2014; pp. 165–170.
- Marra, F.; Gragnaniello, D.; Cozzolino, D.; Verdoliva, L. Detection of GAN-generated fake images over social networks. In Proceedings of the 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), Miami, FL, USA, 10–12 April 2018; pp. 384–389.
- Nataraj, L.; Mohammed, T.M.; Manjunath, B.; Chandrasekaran, S.; Flenner, A.; Bappy, J.H.; Roy-Chowdhury, A.K. Detecting GAN generated fake images using co-occurrence matrices. Electron. Imaging 2019, 2019, 532-1–532-7.
- Dang, H.; Liu, F.; Stehouwer, J.; Liu, X.; Jain, A.K. On the detection of digital face manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 5781–5790.
- Zhang, X.; Karaman, S.; Chang, S.F. Detecting and simulating artifacts in GAN fake images. In Proceedings of the 2019 IEEE International Workshop on Information Forensics and Security (WIFS), Delft, The Netherlands, 9–12 December 2019; pp. 1–6.
- Wang, S.Y.; Wang, O.; Zhang, R.; Owens, A.; Efros, A.A. CNN-generated images are surprisingly easy to spot… for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 8695–8704.
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 214–223.
- Gong, X.; Chang, S.; Jiang, Y.; Wang, Z. AutoGAN: Neural architecture search for generative adversarial networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 3224–3234.
- Jiang, Y.; Chang, S.; Wang, Z. TransGAN: Two pure transformers can make one strong GAN, and that can scale up. In Proceedings of the Advances in Neural Information Processing Systems 34 (NeurIPS 2021), Virtual, 6–14 December 2021; Volume 34.
CIFAR10

| Trained on \ Tested on | FGSM | DeepFool | BIM | PGD | AutoPGD | UPA | NewtonFool | ZOO | C&W |
|---|---|---|---|---|---|---|---|---|---|
| FGSM | 0.905 | 0.911 | 0.796 | 0.781 | 0.757 | 0.817 | 0.873 | 0.869 | 0.875 |
| DeepFool | 0.898 | 0.916 | 0.799 | 0.784 | 0.756 | 0.796 | 0.875 | 0.866 | 0.871 |
| BIM | 0.884 | 0.887 | 0.833 | 0.811 | 0.776 | 0.802 | 0.871 | 0.865 | 0.872 |
| PGD | 0.887 | 0.890 | 0.832 | 0.813 | 0.813 | 0.780 | 0.804 | 0.876 | 0.865 |
| AutoPGD | 0.867 | 0.877 | 0.778 | 0.769 | 0.783 | 0.871 | 0.853 | 0.846 | 0.862 |
| UPA | 0.860 | 0.872 | 0.744 | 0.715 | 0.880 | 0.880 | 0.835 | 0.826 | 0.837 |
| NewtonFool | 0.903 | 0.907 | 0.817 | 0.803 | 0.774 | 0.829 | 0.889 | 0.877 | 0.895 |
| ZOO | 0.902 | 0.906 | 0.820 | 0.803 | 0.774 | 0.827 | 0.888 | 0.879 | 0.892 |
| C&W | 0.901 | 0.905 | 0.817 | 0.800 | 0.772 | 0.841 | 0.888 | 0.876 | 0.896 |

CIFAR100

| Trained on \ Tested on | FGSM | DeepFool | BIM | PGD | AutoPGD | UPA | NewtonFool | ZOO | C&W |
|---|---|---|---|---|---|---|---|---|---|
| FGSM | 0.882 | 0.910 | 0.869 | 0.864 | 0.875 | 0.907 | 0.880 | 0.888 | 0.883 |
| DeepFool | 0.871 | 0.922 | 0.856 | 0.855 | 0.854 | 0.910 | 0.874 | 0.870 | 0.871 |
| BIM | 0.879 | 0.913 | 0.874 | 0.864 | 0.877 | 0.906 | 0.889 | 0.891 | 0.887 |
| PGD | 0.877 | 0.903 | 0.871 | 0.873 | 0.876 | 0.894 | 0.885 | 0.890 | 0.890 |
| AutoPGD | 0.878 | 0.908 | 0.869 | 0.864 | 0.880 | 0.902 | 0.888 | 0.888 | 0.887 |
| UPA | 0.865 | 0.865 | 0.843 | 0.842 | 0.848 | 0.917 | 0.855 | 0.864 | 0.859 |
| NewtonFool | 0.873 | 0.904 | 0.869 | 0.867 | 0.867 | 0.891 | 0.889 | 0.889 | 0.887 |
| ZOO | 0.880 | 0.909 | 0.870 | 0.870 | 0.877 | 0.902 | 0.885 | 0.894 | 0.887 |
| C&W | 0.879 | 0.879 | 0.873 | 0.871 | 0.877 | 0.895 | 0.886 | 0.891 | 0.892 |
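The attacks in the tables above are standard ones from the literature. As one self-contained example of how such adversarial test sets can be produced, the following is a textbook FGSM (Goodfellow et al.) in PyTorch; the `eps` value is illustrative, not necessarily the paper’s setting.

```python
import torch
import torch.nn.functional as F

def fgsm(model, images, labels, eps=8 / 255):
    """One-step fast gradient sign method: x' = clip(x + eps * sign(grad))."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv = images + eps * images.grad.sign()   # gradient-sign perturbation
    return adv.clamp(0, 1).detach()           # keep pixels in valid range
```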
CIFAR10

| Trained on \ Tested on | GAN | ACGAN | WGAN | WGAN_GP | WGAN_DIV | DCGAN | AutoGAN | TransGAN |
|---|---|---|---|---|---|---|---|---|
| GAN | 0.743 | 0.680 | 0.776 | 0.789 | 0.783 | 0.680 | 0.579 | 0.518 |
| ACGAN | 0.495 | 0.855 | 0.798 | 0.892 | 0.874 | 0.827 | 0.624 | 0.562 |
| WGAN | 0.548 | 0.775 | 0.830 | 0.885 | 0.875 | 0.758 | 0.602 | 0.555 |
| WGAN_GP | 0.505 | 0.806 | 0.808 | 0.906 | 0.886 | 0.786 | 0.604 | 0.545 |
| WGAN_DIV | 0.512 | 0.787 | 0.811 | 0.900 | 0.891 | 0.780 | 0.599 | 0.535 |
| DCGAN | 0.502 | 0.844 | 0.795 | 0.878 | 0.868 | 0.839 | 0.637 | 0.554 |
| AutoGAN | 0.551 | 0.787 | 0.778 | 0.813 | 0.796 | 0.785 | 0.647 | 0.585 |
| TransGAN | 0.579 | 0.765 | 0.761 | 0.789 | 0.764 | 0.754 | 0.638 | 0.586 |

CIFAR100

| Trained on \ Tested on | GAN | ACGAN | WGAN | WGAN_GP | WGAN_DIV | DCGAN |
|---|---|---|---|---|---|---|
| GAN | 0.822 | 0.839 | 0.740 | 0.827 | 0.842 | 0.825 |
| ACGAN | 0.716 | 0.877 | 0.711 | 0.866 | 0.877 | 0.809 |
| WGAN | 0.786 | 0.837 | 0.768 | 0.845 | 0.853 | 0.818 |
| WGAN_GP | 0.734 | 0.869 | 0.726 | 0.871 | 0.879 | 0.822 |
| WGAN_DIV | 0.739 | 0.872 | 0.726 | 0.868 | 0.885 | 0.824 |
| DCGAN | 0.762 | 0.863 | 0.747 | 0.866 | 0.877 | 0.847 |
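To connect the table entries to the earlier sketches: assuming the reported values are detection rates on a balanced set of real and generated images (the paper’s exact metric is not reproduced here), an evaluation step could look like the following, reusing the illustrative `output_features` helper from Section 1.

```python
import torch

@torch.no_grad()
def detection_accuracy(target_model, detector, x, y_true, tau=0.5):
    """Fraction of real (0) / fake (1) labels the detector recovers at threshold tau."""
    feats = output_features(target_model, x)
    preds = (torch.sigmoid(detector(feats)) > tau).long()
    return (preds == y_true.long()).float().mean().item()
```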