Multi-Scenario Remote Sensing Image Forgery Detection Based on Transformer and Model Fusion
Abstract
1. Introduction
- (1) We transform the remote sensing image forgery detection task into a binary classification task that focuses on global information. To build a high-precision forgery detection network, we explore many strong feature extraction networks combined with a global average pooling operation and fully connected layers, and select three high-performance Transformer-based networks for fusion.
- (2) Considering the small number of samples, we pre-train the networks on the public ImageNet-1K dataset so that they learn more stable feature representations. In addition, we propose a circular data divide strategy that makes full use of all the samples to improve accuracy in the competition.
- (3) To promote network optimization, on the one hand, we explore several loss functions and select the label smoothing loss, which reduces the model's over-reliance on the training data. On the other hand, we construct a combined learning rate optimization strategy that first applies step decay and then cosine annealing, which reduces the risk of the network falling into local optima.
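The detection head described in contribution (1) — a backbone's feature maps, a global average pooling step, and a fully connected layer yielding two logits (authentic vs. forged) — can be sketched as follows. The NumPy forward pass and all shapes are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def detection_head(features: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Global average pooling over the spatial dimensions, then one fully
    connected layer producing two logits (authentic vs. forged)."""
    pooled = features.mean(axis=(2, 3))   # (N, C, H, W) -> (N, C)
    return pooled @ weights.T + bias      # (N, C) @ (C, 2) -> (N, 2)

# Toy batch of 4 feature maps with 8 channels (hypothetical backbone output)
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8, 7, 7))
w = rng.normal(size=(2, 8))
b = np.zeros(2)
logits = detection_head(feats, w, b)
print(logits.shape)  # (4, 2)
```

In practice the `features` tensor would come from one of the explored backbones (e.g., a Swin Transformer stage), with the same pooling-plus-linear head on top.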
2. Methods
2.1. Proposed Scheme
2.2. High-Performance Forgery Detection Network Architecture
2.3. Model Fusion
2.4. Circular Data Divide Strategy
2.5. Loss Function and Optimization Strategy
3. Experiment
3.1. Dataset
3.2. Experimental Settings
3.3. Model Selection
3.4. Performance Verification of Combined Learning Rate Optimization Strategy
3.5. Performance Verification of Pre-Trained Weights
3.6. Selection of Loss Function
3.7. Performance Verification of Model Fusion
3.8. Performance Verification of Circular Data Divide Strategy
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Benedek, C.; Descombes, X.; Zerubia, J. Building Development Monitoring in Multitemporal Remotely Sensed Image Pairs with Stochastic Birth-Death Dynamics. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 33–50. [Google Scholar] [CrossRef] [PubMed]
- Yu, C.; Liu, Y.; Zhao, J.; Wu, S.; Hu, Z. Feature Interaction Learning Network for Cross-Spectral Image Patch Matching. IEEE Trans. Image Process. 2023, 32, 5564–5579. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Cheng, P.; Duan, S.; Chen, K.; Wang, Z.; Li, X.; Sun, X. DCP-Net: A Distributed Collaborative Perception Network for Remote Sensing Semantic Segmentation. Remote Sens. 2024, 16, 2504. [Google Scholar] [CrossRef]
- Guo, X.; Liu, X.; Ren, Z.; Grosz, S.; Masi, I.; Liu, X. Hierarchical Fine-Grained Image Forgery Detection and Localization. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 3155–3165. [Google Scholar]
- Guillaro, F.; Cozzolino, D.; Sud, A.; Dufour, N.; Verdoliva, L. TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 20606–20615. [Google Scholar]
- Liu, J.; Xie, J.; Wang, Y.; Zha, Z. Adaptive Texture and Spectrum Clue Mining for Generalizable Face Forgery Detection. IEEE Trans. Inf. Forensics Secur. 2024, 19, 1922–1934. [Google Scholar] [CrossRef]
- Zhu, J.; Park, T.; Isola, P.; Efros, A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Durall, R.; Keuper, M.; Pfreundt, F.; Keuper, J. Unmasking DeepFakes with simple Features. arXiv 2020, arXiv:1911.00686. [Google Scholar]
- Guo, Z.; Yang, G.; Chen, J.; Sun, X. Fake face detection via adaptive manipulation traces extraction network. Comput. Vis. Image Und. 2021, 204, 103170. [Google Scholar] [CrossRef]
- Yu, N.; Davis, L.; Fritz, M. Attributing Fake Images to GANs: Learning and Analyzing GAN Fingerprints. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7555–7565. [Google Scholar]
- Ciftci, U.; Demir, I.; Yin, L. FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals. IEEE Trans. Pattern Anal. Mach. Intell. 2020. [Google Scholar] [CrossRef] [PubMed]
- Mittal, T.; Bhattacharya, U.; Chandra, R.; Bera, A.; Manocha, D. Emotions Don’t Lie: An Audio-Visual Deepfake Detection Method using Affective Cues. In Proceedings of the 2020 ACM International Conference on Multimedia (MM), Seattle, WA, USA, 12–16 October 2020; pp. 2823–2832. [Google Scholar]
- Dang, H.; Liu, F.; Stehouwer, J.; Liu, X.; Jain, A. On the detection of digital face manipulation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5781–5790. [Google Scholar]
- Ding, X.; Raziei, Z.; Larson, E.; Olinick, E.; Krueger, P.; Hahsler, M. Swapped face detection using deep learning and subjective assessment. Eurasip J. Inf. Secur. 2020, 2020, 6. [Google Scholar] [CrossRef]
- Wang, C.; Deng, W. Representative forgery mining for fake face detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 14923–14932. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Zhao, B.; Zhang, S.; Xu, C.; Sun, Y.; Deng, C. Deep fake geography? When geospatial data encounter artificial intelligence. Cartogr. Geogr. Inf. Sci. 2021, 48, 338–352. [Google Scholar] [CrossRef]
- Fezza, S.; Ouis, M.; Kaddar, B.; Hamidouche, W.; Hadid, A. Evaluation of pre-trained CNN models for geographic fake image detection. In Proceedings of the 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), Shanghai, China, 26–28 September 2022; pp. 1–6. [Google Scholar]
- Yarlagadda, S.; Guera, D.; Bestagini, P.; Zhu, F.; Tubaro, S.; Delp, E. Satellite image forgery detection and localization using GAN and One-Class classifier. IS&T Int. Symp. Electron. Imaging 2018, 7, 214-1–214-9. [Google Scholar]
- Horváth, J.; Xiang, Z.; Cannas, E.; Bestagini, P.; Tubaro, S.; Delp, E. Sat U-Net: A fusion based method for forensic splicing localization in satellite images. In Proceedings of the Multimodal Image Exploitation and Learning, Orlando, FL, USA, 3 April–12 June 2022; p. 1210002. [Google Scholar]
- Hearst, M.; Dumais, S.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. 1998, 13, 18–28. [Google Scholar] [CrossRef]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Yu, C.; Zhao, J.; Liu, Y.; Wu, S.; Li, C. Efficient Feature Relation Learning Network for Cross-Spectral Image Patch Matching. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–17. [Google Scholar] [CrossRef]
- Zhao, J.; Yu, C.; Shi, Z.; Liu, Y.; Zhang, Y. Gradient-Guided Learning Network for Infrared Small Target Detection. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
- Yu, C.; Liu, Y.; Wu, S.; Xia, X.; Hu, Z.; Lan, D.; Liu, X. Pay Attention to Local Contrast Learning Networks for Infrared Small Target Detection. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K. Densely connected convolutional networks. In Proceedings of the 2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
- Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11953–11965. [Google Scholar]
- Liu, Z.; Mao, H.; Wu, C.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar]
- Rao, Y.; Zhao, W.; Tang, Y.; Zhou, J.; Lim, S.; Lu, J. Hornet: Efficient high-order spatial interactions with recursive gated convolutions. arXiv 2022, arXiv:2207.14284. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 2017 Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. [Google Scholar]
- Wu, K.; Zhang, J.; Peng, H.; Liu, M.; Xiao, B.; Fu, J.; Yuan, L. Tinyvit: Fast pretraining distillation for small vision transformers. In Proceedings of the 2022 European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 68–85. [Google Scholar]
- Chu, X.; Tian, Z.; Wang, Y.; Zhang, B.; Ren, H.; Wei, X.; Xia, H.; Shen, C. Twins: Revisiting the design of spatial attention in vision transformers. In Proceedings of the 2021 Conference on Neural Information Processing Systems (NeurIPS), Virtual, 6–14 December 2021; pp. 9355–9366. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Virtual, 11–17 October 2021; pp. 9992–10002. [Google Scholar]
- Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11999–12009. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Li, F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Yao, Y.; Cheng, G.; Lang, C.; Yuan, X.; Xie, X.; Han, J. Hierarchical Mask Prompting and Robust Integrated Regression for Oriented Object Detection. IEEE Trans. Circ. Syst. Video Tech. 2024. [Google Scholar] [CrossRef]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Yu, C.; Liu, Y.; Xia, X.; Lan, D.; Liu, X.; Wu, S. Precise and Fast Segmentation of Offshore Farms in High-Resolution SAR Images Based on Model Fusion and Half-Precision Parallel Inference. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2022, 15, 4861–4872. [Google Scholar] [CrossRef]
- Wang, J.; Zhang, W.; Zang, Y.; Cao, Y.; Pang, J.; Gong, T.; Chen, K.; Liu, Z.; Loy, C.; Lin, D. Seesaw loss for long-tailed instance segmentation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 9690–9699. [Google Scholar]
- Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 2–8 December 2018. [Google Scholar]
Methods | Accuracy (%) | AUC (%) | Score (%) | Inference Time (s) | GFLOPs | Parameters
---|---|---|---|---|---|---
SVM [21] | 73.23 | - | - | - | - | -
ResNet50 [26] | 83.91 | 87.82 | 85.47 | 0.016 | 4.12 | 25.56 M
ResNeXt [27] | 71.87 | 64.38 | 68.87 | 0.018 | 28.20 | 25.44 M
EfficientNet [29] | 86.30 | 91.26 | 88.28 | 0.022 | 27.45 | 63.79 M
DenseNet [30] | 90.20 | 94.32 | 91.85 | 0.018 | 40.67 | 26.48 M
ConvNeXt [32] | 93.15 | 96.91 | 94.66 | 0.021 | 80.37 | 87.57 M
HorNet [33] | 93.36 | 97.17 | 94.88 | 0.022 | 81.41 | 86.23 M
RepLKNet [31] | 93.68 | 97.32 | 95.13 | 0.024 | 81.05 | 78.84 M
Vision Transformer [35] | 78.82 | 84.82 | 81.22 | 0.021 | 87.76 | 86.44 M
TinyViT [36] | 92.41 | 97.34 | 94.38 | 0.021 | 27.02 | 20.69 M
Twins [37] | 93.89 | 98.83 | 95.86 | 0.018 | 33.76 | 43.32 M
Swin Transformer v1 [38] | 94.31 | 98.07 | 95.82 | 0.024 | 93.57 | 86.88 M
Swin Transformer v2 [39] | 96.00 | 98.82 | 97.13 | 0.047 | 141.58 | 86.90 M
Methods | Learning Rate Optimization Strategy | Accuracy (%) | AUC (%) | Score (%)
---|---|---|---|---
Twins | Step decay | 91.89 | 96.44 | 93.71
 | Cosine annealing | 93.99 | 98.03 | 95.61
 | Combined optimization | 93.89 | 98.83 | 95.86
Swin Transformer v1 | LinearLR | 91.78 | 97.74 | 94.17
 | CosineAnnealingLR | 93.26 | 97.74 | 95.05
 | Combined optimization | 94.31 | 98.07 | 95.82
Swin Transformer v2 | LinearLR | 93.99 | 98.53 | 95.81
 | CosineAnnealingLR | 94.94 | 98.92 | 96.54
 | Combined optimization | 96.00 | 98.82 | 97.13
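The combined strategy verified above — step decay first, then cosine annealing — can be sketched as a single schedule function. The base learning rate, switch point, decay step, and decay factor below are assumed hyper-parameters, not values reported by the authors:

```python
import math

def combined_lr(epoch, total_epochs, base_lr=1e-4,
                switch_epoch=None, step_size=10, gamma=0.5, min_lr=1e-6):
    """Learning rate at a given epoch: step decay in the first phase,
    cosine annealing down to min_lr in the second phase."""
    if switch_epoch is None:
        switch_epoch = total_epochs // 2
    if epoch < switch_epoch:
        # Phase 1: multiply by gamma every step_size epochs
        return base_lr * gamma ** (epoch // step_size)
    # Phase 2: anneal from the last step-decayed value down to min_lr
    lr_switch = base_lr * gamma ** (switch_epoch // step_size)
    t = (epoch - switch_epoch) / max(1, total_epochs - switch_epoch)
    return min_lr + 0.5 * (lr_switch - min_lr) * (1 + math.cos(math.pi * t))

schedule = [combined_lr(e, 100) for e in range(100)]
```

The schedule is non-increasing by construction: the coarse step-decay phase escapes poor early regions quickly, while the smooth cosine tail refines the solution near convergence.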
Methods | Pre-Trained | Accuracy (%) | AUC (%) | Score (%)
---|---|---|---|---
Twins | ✘ | 74.18 | 70.11 | 72.55
 | ✔ | 93.89 | 98.83 | 95.86
Swin Transformer v1 | ✘ | 75.66 | 75.27 | 75.50
 | ✔ | 94.31 | 98.07 | 95.82
Swin Transformer v2 | ✘ | 76.29 | 75.31 | 75.90
 | ✔ | 96.00 | 98.82 | 97.13
Methods | Loss | Accuracy (%) | AUC (%) | Score (%)
---|---|---|---|---
Twins | Seesaw Loss | 93.36 | 97.73 | 95.11
 | Cross-Entropy Loss | 93.89 | 98.13 | 95.58
 | Label Smoothing Loss | 93.89 | 98.83 | 95.86
Swin Transformer v1 | Seesaw Loss | 92.52 | 97.98 | 94.70
 | Cross-Entropy Loss | 93.36 | 97.63 | 95.07
 | Label Smoothing Loss | 94.31 | 98.07 | 95.82
Swin Transformer v2 | Seesaw Loss | 94.63 | 98.23 | 96.06
 | Cross-Entropy Loss | 95.15 | 98.50 | 96.49
 | Label Smoothing Loss | 96.00 | 98.82 | 97.13
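For reference, one common formulation of the label smoothing loss compared above: soften the one-hot target to 1 − ε for the true class and ε/(K − 1) for the others, then take the cross-entropy. The value ε = 0.1 is an assumption, not the paper's reported setting:

```python
import numpy as np

def label_smoothing_ce(logits, targets, eps=0.1):
    """Cross-entropy against a smoothed target distribution:
    1 - eps on the true class, eps / (K - 1) spread over the rest."""
    n, k = logits.shape
    z = logits - logits.max(axis=1, keepdims=True)          # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    smooth = np.full((n, k), eps / (k - 1))
    smooth[np.arange(n), targets] = 1.0 - eps
    return float(-(smooth * log_probs).sum(axis=1).mean())
```

With ε = 0 this reduces to the standard cross-entropy; a positive ε penalizes over-confident predictions, which matches the paper's motivation of reducing over-reliance on the training data.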
Methods | Accuracy (%) | AUC (%) | Score (%)
---|---|---|---
Twins | 93.89 | 98.83 | 95.86
Swin Transformer v1 | 94.31 | 98.07 | 95.82
Swin Transformer v2 | 96.00 | 98.82 | 97.13
Twins + Swin Transformer v1 + Swin Transformer v2 | 96.10 | 99.22 | 97.35
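The exact fusion rule for combining the three networks is not fully specified here; a common choice, sketched below as an assumption, is to average the per-model class probabilities and take the argmax:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fuse_predictions(per_model_logits):
    """Average class probabilities across models, then take the argmax."""
    probs = np.mean([softmax(l) for l in per_model_logits], axis=0)
    return probs, probs.argmax(axis=1)

# Two hypothetical models scoring two images (columns: authentic, forged)
model_a = np.array([[2.0, 0.0], [0.0, 1.0]])
model_b = np.array([[1.0, 0.0], [0.0, 3.0]])
fused_probs, fused_preds = fuse_predictions([model_a, model_b])
```

Averaging probabilities rather than raw logits keeps each model's contribution on a comparable scale, which matters when the fused networks (Twins, Swin v1, Swin v2) produce logits of different magnitudes.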
Methods | Dataset | Accuracy (%) | AUC (%) | Score (%)
---|---|---|---|---
Twins | dataset0 | 93.89 | 98.83 | 95.86
 | dataset1 | 95.05 | 98.06 | 96.25
 | dataset2 | 94.94 | 98.78 | 96.47
 | dataset3 | 95.25 | 98.87 | 96.70
 | dataset4 | 94.94 | 99.01 | 96.57
 | Circular data divide strategy | 98.74 | 99.89 | 99.20
Swin Transformer v1 | dataset0 | 94.31 | 98.07 | 95.82
 | dataset1 | 94.10 | 97.71 | 95.54
 | dataset2 | 93.57 | 98.47 | 95.53
 | dataset3 | 95.36 | 98.04 | 96.43
 | dataset4 | 94.09 | 97.98 | 95.65
 | Circular data divide strategy | 97.15 | 99.78 | 98.20
Swin Transformer v2 | dataset0 | 96.00 | 98.82 | 97.13
 | dataset1 | 96.10 | 98.70 | 97.14
 | dataset2 | 95.99 | 98.99 | 97.19
 | dataset3 | 96.10 | 99.15 | 97.32
 | dataset4 | 95.89 | 98.91 | 97.09
 | Circular data divide strategy | 97.68 | 99.78 | 98.52
Twins + Swin Transformer v1 + Swin Transformer v2 | dataset0 | 96.10 | 99.22 | 97.35
 | dataset1 | 96.21 | 98.95 | 97.30
 | dataset2 | 95.46 | 99.29 | 96.99
 | dataset3 | 96.52 | 99.53 | 97.72
 | dataset4 | 95.78 | 99.23 | 97.16
 | Circular data divide strategy | 98.42 | 99.91 | 99.01
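The dataset0–dataset4 rows suggest five rotated train/validation splits. A minimal sketch of such a circular divide follows, under the assumption that the validation fold simply rotates through equal-sized contiguous chunks:

```python
def circular_folds(num_samples, num_folds=5):
    """Yield (train_idx, val_idx) pairs in which the validation fold rotates
    through the data, so every sample is used for training in
    num_folds - 1 of the rounds."""
    fold_size = num_samples // num_folds
    indices = list(range(num_samples))
    for k in range(num_folds):
        start = k * fold_size
        # The last fold absorbs any remainder when num_samples % num_folds != 0
        stop = (k + 1) * fold_size if k < num_folds - 1 else num_samples
        val = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, val
```

Training one model per rotation and combining their predictions uses every sample for training in some round, which is consistent with the gains the strategy shows over any single dataset0–dataset4 split.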
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhao, J.; Shi, Z.; Yu, C.; Liu, Y. Multi-Scenario Remote Sensing Image Forgery Detection Based on Transformer and Model Fusion. Remote Sens. 2024, 16, 4311. https://doi.org/10.3390/rs16224311