Technical Note

Multi-Scenario Remote Sensing Image Forgery Detection Based on Transformer and Model Fusion

by Jinmiao Zhao 1,2,3,4, Zelin Shi 1,2,*, Chuang Yu 1,2,3,4 and Yunpeng Liu 1,2
1 Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang 110016, China
2 Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
3 Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
4 University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(22), 4311; https://doi.org/10.3390/rs16224311
Submission received: 31 August 2024 / Revised: 25 October 2024 / Accepted: 5 November 2024 / Published: 19 November 2024
(This article belongs to the Special Issue Geospatial Artificial Intelligence (GeoAI) in Remote Sensing)

Abstract

Recently, remote sensing image forgery detection has received widespread attention. To improve the detection accuracy, we build a novel scheme based on Transformer and model fusion. Specifically, we model this task as a binary classification task that focuses on global information. First, we explore the performance of various excellent feature extraction networks in this task under the constructed unified classification framework. On this basis, we select three high-performance Transformer-based networks that focus on global information, namely, Swin Transformer V1, Swin Transformer V2, and Twins, as the backbone networks and fuse them. Secondly, considering the small number of samples, we use the public ImageNet-1K dataset to pre-train the network to learn more stable feature expressions. At the same time, a circular data divide strategy is proposed, which can fully utilize all the samples to improve the accuracy in the competition. Finally, to promote network optimization, on the one hand, we explore multiple loss functions and select label smooth loss, which can reduce the model’s excessive dependence on training data. On the other hand, we construct a combined learning rate optimization strategy that first uses step degeneration and then cosine annealing, which reduces the risk of the network falling into local optima. Extensive experiments show that the proposed scheme has excellent performance. This scheme won seventh place in the “Forgery Detection in Multi-scenario Remote Sensing Images of Typical Objects” track of the 2024 ISPRS TC I contest on Intelligent Interpretation for Multi-modal Remote Sensing Application.

1. Introduction

Multi-scenario remote sensing image forgery detection is the process of assessing the authenticity of remote sensing images of multiple different scenes. Remote sensing images are widely used in military, security, environmental monitoring, aerospace, and other fields, providing valuable geographical and environmental information [1,2,3]. However, with the popularity of image editing tools and the advancement of image processing technology, remote sensing image forgery has become relatively simple. Such forgery has caused significant negative impacts and poses a potential threat to countries, societies, and individuals. In this context, effective forgery detection is crucial, as it can enhance the credibility of data and improve the accuracy of decision making [4,5,6].
Existing visual forgery detection techniques can be mainly divided into three categories: specific artifact-based methods [7,8,9,10], information inconsistency-based methods [11,12], and data-driven methods [13,14,15]. Specific artifact-based visual forgery detection is a method for detecting and distinguishing specific artificial artifacts present in forged images or videos. These artifacts usually consist of subtle and imperceptible traces of forgery introduced by the forgery technique. Although detection methods based on specific artifacts are very effective in early forgery detection, the limitations of these methods gradually became apparent with the development of technology and the emergence of new forgery methods. The poor adaptability of these artifact-based methods and reliance on feature selection make it difficult for them to cope with the challenges of modern deepfake techniques. Information inconsistency-based methods detect forgery by identifying inconsistencies in biological signals, time series, and behaviors in forged images. However, with the continuous advancement of deep fake technology, especially the emergence of generative adversarial networks (GANs) [16] and other advanced generative models, the authenticity and details of the forged content have been significantly improved, resulting in challenges in the effectiveness of information inconsistency-based methods. Currently, the mainstream method for this task is based on data-driven methods. Data-driven methods use large amounts of labeled data to train deep learning models, enabling the models to automatically learn effective features from the data. This method does not focus on a single forgery trace or information inconsistency, but instead achieves forgery detection by learning from a large amount of data and allowing the model to extract complex features from the training data. Compared with the first two methods, this type of method has powerful feature extraction capabilities and shows great potential in the field of forgery detection. However, most existing data-driven forgery detection methods focus on facial images. The complexity and diversity of remote sensing images exceed the scope of facial images, and the types of tampering involved are also more complex. Research on remote sensing image forgery detection is relatively new, and there is currently limited work in this area.
As a new research direction, the task of remote sensing image forgery detection can be categorized into two categories: global forgery detection [17,18] and local forgery detection [19,20]. This research specifically focuses on global forgery detection, which involves identifying forged images generated by global generation methods, such as generative adversarial networks (GANs) and diffusion models. This type of task is usually defined as a binary classification problem, where the goal is to accurately determine whether an image is real or fake. In early studies, Zhao et al. [17] extract multiple manual features and input them into a support vector machine (SVM) [21] model to perform global forgery detection. However, traditional methods are heavily influenced by hyperparameters and struggle to adapt to multi-scenario tasks. With the continuous development of deep neural networks, their powerful feature extraction capabilities have been validated across various fields. Recently, Fezza et al. [18] further explore the applicability of several typical convolutional neural network (CNN) architectures in this field, including ResNet50 and Xception [22]. They verify the superiority of deep learning-based global forgery detection. However, the feature extraction network used in the above method has limited feature extraction capability, which will lead to a poor final detection effect. To achieve a high-precision and strongly generalized remote sensing image forgery detection model, we consider that a powerful feature extraction network architecture will help to extract discriminative features effectively and fully [23,24,25].
Existing feature extraction networks can be mainly divided into three categories: traditional convolutional neural networks, large-kernel convolutional neural networks, and Transformer-based network structures. Traditional convolutional neural networks [26,27,28,29,30], such as ResNet [26], EfficientNet [29], and DenseNet [30], mainly use small convolution kernels. Large-kernel convolutional neural networks [31,32,33] use larger convolution kernels (such as 7 × 7 and 11 × 11), which enlarges the effective receptive field more efficiently and thus helps extract deeper semantic information. In recent years, with the outstanding performance of RepLKNet [31] in various tasks, large-kernel convolution has once again attracted attention, and a series of large-kernel convolutional networks, such as ConvNeXt [32] and HorNet [33], have been proposed. Transformer-based network structures [34,35,36,37,38,39] can effectively capture long-distance features through the self-attention mechanism. In 2017, Vaswani et al. [34] first used the Transformer for sequence modeling, and researchers subsequently applied it to image processing. In the field of computer vision, Transformer-based network structures have attracted widespread attention because of their powerful feature extraction capabilities; representative networks include the Vision Transformer [35], Twins [37], and Swin Transformer [38,39]. To fully explore the impact of different feature extraction networks on the performance of this task, we select representative networks from these three categories for research.
Experiments show that the performance of the three network types for this task ranges from high to low: Transformer-based, large-kernel convolution-based, and traditional convolution-based. We consider that the reason is that this task is a global forgery detection task, which makes it important and beneficial to extract global features from images. However, convolutional neural networks have a limited ability to extract global information. Specifically, convolutional neural networks mainly extract features through local receptive fields, which leads to an insufficient capture of global information, thus affecting their performance in remote sensing image forgery detection tasks. Therefore, we aim to explore Transformer network architectures that focus on global features to effectively and comprehensively extract the discriminative features of images. To solve this problem and considering the limited performance of a single model, we select three high-performance Transformer-based networks, Swin Transformer V1, Swin Transformer V2, and Twins, as the backbone networks to fully extract the global information of the image. The results of the three models are fused in the inference phase to increase the detection accuracy and robustness. In addition, considering the small number of samples, we use the public ImageNet-1K dataset [40] to pre-train the network. The ImageNet-1K dataset contains many annotated images covering multiple categories, which enables the model to learn more stable and robust feature representations on diverse data, thus laying the foundation for subsequent migration to remote sensing image forgery detection tasks. At the same time, we propose a reasonable circular data divide strategy. This strategy divides the entire dataset into multiple non-overlapping subsets and then uses multiple rounds of cyclic extraction so that each subset serves as a validation set with the same probability in different training rounds. This strategy makes full use of existing samples and effectively improves the model accuracy in the forgery detection task. Finally, considering that an effective loss function and learning rate optimization strategy will contribute to the stable training and learning of the network, we also explore the loss function and learning rate optimization strategies. On the one hand, we explore various loss functions to supervise the network [41] and select label smooth loss [42] as the final loss function. This loss avoids overfitting the model to the training data by introducing a certain smoothness to the true labels. On the other hand, we construct a combined learning rate optimization strategy that first uses step degeneration and then cosine annealing. Step degeneration helps the network quickly find a better result and stabilize the training in the early stage of training, while the cosine annealing strategy helps the network to be refined and optimized in the later stage of training, thereby further reducing the risk of falling into the local optima.
In summary, we built a multi-scenario remote sensing image forgery detection scheme based on Transformer and model fusion, which has excellent detection performance. Notably, this scheme won seventh place in the “Forgery Detection in Multi-scenario Remote Sensing Images of Typical Objects” track of the 2024 ISPRS TC I contest on Intelligent Interpretation for Multi-modal Remote Sensing Application. The contributions of this manuscript can be summarized as follows:
(1)
We transform a remote sensing image forgery detection task into a binary classification task that focuses on global information. To build high-precision forgery detection networks, we explore many excellent feature extraction networks combined with a global average pooling operation and fully connected layers. Three high-performance Transformer-based networks are selected for fusion.
(2)
Considering the small number of samples, we use the public ImageNet-1K dataset to pre-train the network to learn more stable feature expressions. At the same time, a circular data divide strategy is proposed, which can fully utilize all the samples to improve the accuracy in the competition.
(3)
To promote network optimization, on the one hand, we explore several loss functions and select label smooth loss, which helps reduce the model’s excessive dependence on training data. On the other hand, we construct a combined learning rate optimization strategy that first uses step degeneration and then cosine annealing, which reduces the risk of the network falling into local optima.

2. Methods

2.1. Proposed Scheme

To achieve accurate multi-scenario remote sensing image forgery detection, we propose a scheme based on Transformer and model fusion. From Figure 1, the scheme can be divided into two phases: the model training phase and the model inference phase. During the model training phase, first, we consider that inappropriate data augmentation may change the properties of the original data for the forgery detection task. Therefore, we use only random rotations and random flips for data augmentation on the training data. Secondly, compared with local information, global information should be paid more attention in the global forgery detection task. Therefore, we use three Transformer-based models as the backbone network for feature extraction. Specifically, the data-augmented training images are input into classification networks based on Twins, Swin Transformer V1, and Swin Transformer V2. Notably, the above classification networks were all pre-trained on the ImageNet-1K dataset to ensure that the network has excellent feature extraction capabilities during the initial training stage. Finally, to reduce the overfitting of the model to the training data and improve the stability of model training, label smooth loss is used as the loss function and the combined learning rate optimization strategy that first uses step degeneration and then cosine annealing is used. During the model inference phase, we first divide the test samples into batches. They are then fed into Twins, Swin Transformer V1, and Swin Transformer V2 in sequence for inference. Considering that different models can extract different features when processing data, we fuse the inference results of the three networks. Multi-model fusion is conducive to combining the advantages of multiple models and alleviating the impact caused by the instability of a single model, thereby further improving the stability and accuracy of the model. In addition, in the competition scheme, we use a circular data divide strategy, which is conducive to maximizing the use of given data to improve the performance.
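To make the augmentation step above concrete, the sketch below applies only random flips and a random rotation to the training images, implemented with torchvision transforms; the rotation range, flip probabilities, and composition order are illustrative assumptions rather than settings reported in the text.

```python
from torchvision import transforms

# Geometric-only augmentation: flips and rotations change orientation but do not
# alter pixel statistics, so potential forgery traces are preserved.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),   # random flip (probability assumed)
    transforms.RandomVerticalFlip(p=0.5),     # random flip (probability assumed)
    transforms.RandomRotation(degrees=90),    # random rotation within ±90° (range assumed)
    transforms.ToTensor(),                    # convert the image to a tensor for the backbone
])
```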

2.2. High-Performance Forgery Detection Network Architecture

In a multi-scenario remote sensing image forgery detection task, the core goal is to determine whether a given input image is real or forged. Therefore, this task can essentially be modeled as a binary classification task. To fully explore the impact of different feature extraction networks on the forgery detection task, we build a high-performance forgery detection network architecture. From Figure 2, it is mainly divided into the feature extraction part and feature mapping part. In the feature extraction part, we can replace different network structures to extract features from the data. Specifically, we explore three types of feature extraction networks, namely, traditional convolutional neural networks, large-kernel convolutional neural networks, and Transformer-based network structures. Traditional convolutional neural networks use small convolution kernels to extract image features layer by layer. They focus more on local information of the image and have difficulty processing long-distance global dependencies. Large-kernel convolutional neural networks use larger convolution kernels (such as 7 × 7 and 11 × 11) for feature extraction. Compared with traditional convolutional neural networks, large-kernel convolutional networks are more conducive to capturing a wider range of contextual information and extracting more complex features. The Transformer architecture-based model can effectively extract local information while effectively capturing long-distance dependencies and global context information. In the forgery detection task, local information can help identify details and small-scale anomalies, whereas global information helps identify overall consistency and semantic errors. A robust forgery detection model needs to make comprehensive use of global information to accurately determine the attributes of an image. Therefore, we finally select three visual models (Swin Transformer V1, Swin Transformer V2, and Twins) based on the Transformer architecture as the backbone network, which is conducive to extracting rich global information. In the feature mapping part, a global average pooling layer is first used to reduce the dimension, and then a fully connected layer outputs the confidence level of the real image and the forged image.
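A minimal sketch of this architecture is given below: an arbitrary backbone that returns a spatial feature map is combined with global average pooling and a fully connected layer producing two confidence scores (real vs. forged). The backbone interface and feature dimension are assumptions; the exact implementation details are not given in the text.

```python
import torch
import torch.nn as nn

class ForgeryDetectionNet(nn.Module):
    """Feature extraction backbone + global average pooling + fully connected classifier."""

    def __init__(self, backbone: nn.Module, feature_dim: int, num_classes: int = 2):
        super().__init__()
        self.backbone = backbone                        # any network returning a (B, C, H, W) feature map
        self.gap = nn.AdaptiveAvgPool2d(1)              # global average pooling over the spatial dimensions
        self.fc = nn.Linear(feature_dim, num_classes)   # maps pooled features to real/forged confidences

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)                        # (B, C, H, W)
        pooled = self.gap(feats).flatten(1)             # (B, C)
        return self.fc(pooled)                          # (B, 2) class logits
```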

2.3. Model Fusion

To further improve the forgery detection accuracy, we adopt a multi-model fusion strategy to more comprehensively capture image features and optimize the detection results. Specifically, we select three Transformer-based visual models: Twins, Swin Transformer V1, and Swin Transformer V2. The above models each have different characteristics and advantages, which enable them to analyze and understand image content from different perspectives. During the model training phase, we train the above three models separately so that they can extract features and learn from the same sample from different perspectives. During the inference phase, to integrate the advantages of these models, we average the image category probability values output by each model. Specifically, for each inference image, the three models generate a probability value for whether the image is a real image or a forged image. By averaging these probability values, we obtain the final prediction result. This strategy can effectively balance the predictions of various models, thereby reducing the errors and biases that may occur in a single model [43]. By combining the prediction results of three different Transformer models, we can obtain more stable and accurate classification judgments while enhancing the model’s ability to adapt to complex and diverse data.
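The fusion step described above reduces to a few lines of inference code: each trained model produces class probabilities for a batch, and the probabilities are averaged before the final decision is taken. This is a sketch assuming each model outputs two logits per image, as in the classification head above.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def fuse_predictions(models, images):
    """Late fusion: average the softmax probabilities of several trained models."""
    probs = []
    for model in models:                          # e.g., [twins, swin_v1, swin_v2]
        model.eval()
        logits = model(images)                    # (B, 2) logits from one backbone
        probs.append(F.softmax(logits, dim=1))    # convert logits to class probabilities
    mean_probs = torch.stack(probs).mean(dim=0)   # (B, 2) averaged over models
    return mean_probs.argmax(dim=1), mean_probs   # fused label and fused confidence
```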

2.4. Circular Data Divide Strategy

To make full use of the existing data and improve model performance, we adopt a circular data divide strategy in the competition scheme. As shown in Figure 3, the dataset is divided into five equal subsets; each time, one subset is selected as the validation set and the remaining four are used as the training set. This produces five divisions of the dataset. In each division, the corresponding validation set is used to select the optimal model during network training. Finally, the five selected optimal models each infer the test samples, and their inference results are averaged to obtain the final detection result. This strategy reduces the reliance on a single data partition and effectively utilizes all samples in the dataset, thereby further improving the detection accuracy and generalization performance of the model. It is worth noting that, across rounds, the circular data divide strategy effectively trains on both the training and validation data; it is therefore used mainly in the competition scheme.
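The division itself can be sketched as follows: the sample indices are split into five non-overlapping folds, and each round uses one fold for validation and the other four for training. Only the five-fold circular structure comes from the description above; the shuffling seed and fold construction are illustrative assumptions.

```python
import random

def circular_data_divide(num_samples: int, num_folds: int = 5, seed: int = 0):
    """Yield (train_indices, val_indices) for each of the five circular divisions."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)                        # fixed seed for reproducibility (assumed)
    folds = [indices[i::num_folds] for i in range(num_folds)]   # five non-overlapping subsets
    for k in range(num_folds):
        val_idx = folds[k]                                      # one subset serves as the validation set
        train_idx = [i for j in range(num_folds) if j != k for i in folds[j]]
        yield train_idx, val_idx
```

In this setup, one model is trained per division, its best checkpoint is selected on that division's validation fold, and the five resulting models are averaged at inference.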

2.5. Loss Function and Optimization Strategy

To enhance the generalizability of the network, we explore multiple loss functions and select label smooth loss. It implements a label smoothing strategy based on the cross-entropy loss function. Label smooth loss reduces the risk of overfitting by reducing the model’s overconfidence in specific category labels, thereby improving the model’s generalizability and robustness. Its expression is as follows:
$$\tilde{y} = \begin{cases} 1 - \alpha, & \text{if } y = 1 \\ \alpha, & \text{if } y = 0 \end{cases}$$
$$L = -\left[ \tilde{y} \log(p) + (1 - \tilde{y}) \log(1 - p) \right]$$
where $y$ denotes the true label, $\tilde{y}$ denotes the smoothed label value, $p$ denotes the predicted probability of the positive class, and $\alpha$ denotes the label smoothing parameter.
To improve the stability of network training and reduce the risk of falling into local optima, we adopt a combined learning rate strategy that first uses step degeneration and then cosine annealing to optimize the training process. Step degeneration helps the network quickly find a better solution and stabilize the training in the early stage of training, while the cosine annealing strategy helps the network to be refined and optimized in the later stage of training, thereby further reducing the risk of falling into the local optima. This strategy further improves the stability of training by adopting different learning rate adjustment methods at different stages.
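The sketch below assembles these two optimization choices in PyTorch: cross-entropy with label smoothing as the loss, and a schedule that applies step decay (the step degeneration stage) for the first 20 epochs and cosine annealing afterwards, using the learning rate, momentum, and weight decay values reported in Section 3.2. The smoothing factor, step size, decay factor, and the use of SGD are assumptions, since these details are not stated explicitly.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR, CosineAnnealingLR, SequentialLR

model = nn.Linear(768, 2)                              # placeholder for the actual detection network
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)   # label smooth loss (alpha = 0.1 assumed)

# Hyperparameter values from Section 3.2; the optimizer type itself is an assumption.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9, weight_decay=0.05)

# Combined schedule: step decay for the first 20 epochs, then cosine annealing
# for the remaining 80 of the 100 training epochs.
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        StepLR(optimizer, step_size=10, gamma=0.5),    # early stage (step size and factor assumed)
        CosineAnnealingLR(optimizer, T_max=80),        # later stage: anneal to the end of training
    ],
    milestones=[20],                                   # switch schedulers at epoch 20
)

for epoch in range(100):
    # ... one training epoch: forward pass, criterion(outputs, labels), backward, optimizer.step() ...
    scheduler.step()                                   # advance the combined schedule once per epoch
```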

3. Experiment

3.1. Dataset

For the dataset used in this study, we use the competition dataset from the “Forgery Detection in Multi-scenario Remote Sensing Images of Typical Objects” track of the 2024 ISPRS TC I contest on Intelligent Interpretation for Multi-modal Remote Sensing Application. The typical targets in this dataset include airplanes, ships, and vehicles, and the image backgrounds include civil airports, sea surfaces, and land. The dataset contains a total of 4742 images with image sizes ranging from 256 to 2000 pixels. The ratio of the training set to the validation set is 4:1. Some samples are shown in Figure 4. It is difficult for the human eye to judge whether these images are real or fake, so choosing an effective deep network to fully extract features is particularly important for improving the accuracy of forgery detection. In the competition scheme, we adopt the circular data divide strategy and thus obtain five different splits, namely, dataset0, dataset1, dataset2, dataset3, and dataset4. This division ensures that every sample is used for training to the same extent. Unless otherwise specified, the ablation experiments are performed on dataset0 by default.

3.2. Experimental Settings

(1) Experimental environment and parameter settings: The operating system is Ubuntu 18.04, and the GPU is an RTX 4090Ti with 24 GB of memory. The number of training epochs is 100, the training batch size is 3, and cosine annealing starts at epoch 20. The initial learning rate is 0.0001, the momentum is 0.9, and the weight decay is 0.05. The data augmentation strategy uses random rotation and flipping.
(2) Evaluation metrics: We use the accuracy and AUC (area under the curve) to evaluate the scheme performance. The AUC is the area under the receiver operating characteristic (ROC) curve, which is used to evaluate the overall performance of the classification model at different classification thresholds. Accuracy is the ratio of the number of samples correctly predicted by the model to the total number of samples, which is used to reflect the classification ability of the model. Consistent with the competition, the final comprehensive performance evaluation metric Score takes into account the accuracy and AUC. The specific expression is as follows:
$$Score = 0.6 \times Acc + 0.4 \times AUC$$
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
$$AUC = \frac{\sum_{i \in \mathrm{positiveClass}} rank_i - \frac{M(1 + M)}{2}}{M \times N}$$
where TP denotes true positives, the number of positive samples correctly predicted as positive; TN denotes true negatives, the number of negative samples correctly predicted as negative; FP denotes false positives, the number of negative samples incorrectly predicted as positive; and FN denotes false negatives, the number of positive samples incorrectly predicted as negative. $rank_i$ denotes the rank of sample $i$ when all samples are sorted in ascending order of predicted probability, $M$ denotes the number of positive samples, and $N$ denotes the number of negative samples.
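A small helper matching these definitions is sketched below: accuracy from thresholded predictions and the rank-based (Mann–Whitney) form of the AUC, combined into the competition Score. The 0.5 decision threshold and the convention that forged images are the positive class are assumptions, and ties in predicted probabilities are ignored for simplicity.

```python
import numpy as np

def competition_score(y_true, y_prob, threshold: float = 0.5):
    """Return (Score, Acc, AUC) with Score = 0.6 * Acc + 0.4 * AUC."""
    y_true = np.asarray(y_true)                  # 1 = forged (positive class, assumed), 0 = real
    y_prob = np.asarray(y_prob, dtype=float)     # predicted probability of the positive class

    acc = ((y_prob >= threshold).astype(int) == y_true).mean()

    # Rank-based AUC: rank 1 corresponds to the smallest predicted probability.
    ranks = np.empty(len(y_prob))
    ranks[np.argsort(y_prob)] = np.arange(1, len(y_prob) + 1)
    m = int((y_true == 1).sum())                 # number of positive samples
    n = int((y_true == 0).sum())                 # number of negative samples
    auc = (ranks[y_true == 1].sum() - m * (m + 1) / 2) / (m * n)

    return 0.6 * acc + 0.4 * auc, acc, auc
```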

3.3. Model Selection

To improve the classification accuracy of the forgery detection task, we explore the performance of multiple outstanding feature extraction networks under the high-performance forgery detection network architecture shown in Figure 2. To ensure the rigor of the comparative experiments, all methods except the traditional machine learning method SVM are evaluated in the same experimental environment: each deep learning-based method uses ImageNet-1K pre-trained weights, the combined learning rate optimization strategy, and label smooth loss as the loss function. The explored feature extraction networks fall into three categories. ResNet50, ResNext, EfficientNet, and DenseNet are traditional convolutional neural networks; ConvNeXt, HorNet, and RepLKNet are large-kernel convolutional neural networks; and Vision Transformer, TinyVit, Twins, Swin Transformer v1, and Swin Transformer v2 are Transformer-based networks.
From Table 1, the experimental results show that methods relying on handcrafted features perform poorly when faced with challenging datasets that are indistinguishable to the naked eye. At the same time, compared with convolutional neural networks, Transformer-based networks show obvious performance advantages. This is because a Transformer-based network is not only good at extracting local information, but can also capture long-distance dependencies and global contextual information. Therefore, this type of network structure can not only effectively identify small-scale anomalies in images, but can also detect potential global semantic errors. In addition, compared with traditional convolutional neural networks, large-kernel convolutional neural networks achieve better performance. The reason is that traditional convolutional networks usually rely on a smaller receptive field to extract local features, while large-kernel convolutional neural networks have a larger receptive field and can more effectively capture a wider range of contextual information, thereby judging the authenticity of the image more accurately. Furthermore, from Table 1, Swin Transformer v2 achieves the best Score, while Twins and Swin Transformer v1 achieve the next-best Scores with shorter inference times. Notably, in the preliminary stage of the competition, where only detection accuracy was used as the evaluation metric, we adopted a fusion of Twins, Swin Transformer V1, and Swin Transformer V2, trained under the circular data divide strategy, to maximize detection performance. However, in the final stage, inference time also became an evaluation metric, so we needed to balance detection accuracy against resource consumption. Through a large number of local experiments and online verification, we found that although the inference time and complexity of Swin Transformer V2 are higher than those of the other networks, its advantage in detection performance makes this cost acceptable. Specifically, Swin Transformer V2 maintains high detection accuracy without significantly degrading overall inference efficiency. Therefore, we chose the single Swin Transformer V2 model as our scheme in the final stage of the competition.

3.4. Performance Verification of Combined Learning Rate Optimization Strategy

To verify the effect of the combined learning rate optimization strategy, we conduct experiments on the three selected networks (Twins, Swin Transformer v1, and Swin Transformer v2). From Table 2, compared with the separate step degradation and cosine annealing, the combined learning rate optimization strategy results in a stable improvement in the Score. Taking the experimental results on Swin Transformer v2 as an example, compared with using step degradation and cosine annealing separately, using the combined learning rate can improve the Score by 1.32 (from 95.81 to 97.13) and 0.59 (from 96.54 to 97.13), respectively. The use of a combined learning rate optimization strategy can reduce the risk of network optimization falling into local optima while ensuring the stability of network training.

3.5. Performance Verification of Pre-Trained Weights

To verify the impact of adding pre-trained weights on network performance, we conduct comparative experiments on the three selected networks (Twins, Swin Transformer v1, and Swin Transformer v2). This comparative experiment compares the effect of not using pre-trained weights and using pre-trained weights on the public dataset ImageNet-1K. From Table 3, we can find that using pre-trained weights can significantly improve performance. Specifically, compared with not using pre-trained weights, the Twins, Swin Transformer v1, and Swin Transformer v2 using pre-trained weights improve the Score by 23.31 (from 72.55 to 95.86), 20.32 (from 75.50 to 95.82), and 21.23 (from 75.90 to 97.13), respectively. The use of pre-trained weights can robustly improve the performance of the network.
To more intuitively demonstrate the effect of adding pre-trained weights, we present the classification results in detail. From Figure 5, initializing the model with pre-trained weights is better than training the model directly from scratch. Specifically, after using the pre-trained weights, the model can more accurately make true predictions when detecting real label samples, resulting in a significant reduction in the number of cases where “the label is true, but the prediction is false”. This indicates that the model’s predictive power for true labeled samples has been enhanced.
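As an illustration of how ImageNet-1K pre-trained weights can be loaded before fine-tuning on the forgery detection data, the sketch below uses torchvision's Swin Transformer V2 implementation and replaces its classification head with a two-class layer. This is only one publicly available source of ImageNet-1K weights; the text does not state which implementation or checkpoint was used.

```python
import torch.nn as nn
from torchvision.models import swin_v2_b, Swin_V2_B_Weights

# Load a Swin Transformer V2 backbone with ImageNet-1K pre-trained weights (assumed source).
model = swin_v2_b(weights=Swin_V2_B_Weights.IMAGENET1K_V1)

# Replace the 1000-class ImageNet head with a 2-class head (real vs. forged) for fine-tuning.
model.head = nn.Linear(model.head.in_features, 2)
```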

3.6. Selection of Loss Function

To explore the impact of different loss functions on network performance, we explore the effects of three loss functions: seesaw loss [44], cross-entropy loss [45], and label smooth loss [42]. From Table 4, compared with other loss functions, label smooth loss achieves the best performance in all three networks. The experimental results illustrate that the use of label smoothing loss is conducive to reducing the model’s excessive dependence on training data, thereby improving the generalizability and classification accuracy of the network.

3.7. Performance Verification of Model Fusion

To verify the effect of model fusion, we conduct model fusion experiments on three selected networks (Twins, Swin Transformer v1, and Swin Transformer v2). From Table 5, compared with the single optimal model Swin Transformer v2, the fusion results improve the Accuracy, AUC, and Score by 0.1 (from 96.00 to 96.10), 0.4 (from 98.82 to 99.22), and 0.22 (from 97.13 to 97.35), respectively.
To more intuitively evaluate the effectiveness of the proposed multi-model fusion scheme, we present the detection results and confidence levels of some samples in the validation set. As shown in Figure 6, images with red edges denote prediction errors, and images with green edges denote correct predictions. It can be found that the multi-scenario remote sensing image forgery detection scheme based on Transformer architecture and model fusion shows excellent performance in terms of prediction results. Not only can it distinguish the authenticity of images more accurately in a variety of complex scenarios, but it also has a high degree of confidence in most correctly identified images in the prediction results. This shows that the model can stably capture key features when faced with different remote sensing images.

3.8. Performance Verification of Circular Data Divide Strategy

Since inference time is not an evaluation criterion in the preliminary stage of the competition, we adopt the circular data divide strategy in the competition scheme to make full use of the competition dataset. To verify its performance, we conduct experiments with the three selected networks. The detailed experimental results are shown in Table 6. Since the competition test set is not visible, the local results of the circular data divide strategy are obtained on the validation set of dataset0. Because the validation set of dataset0 is also trained on under the circular data divide strategy, the corresponding results in Table 6 are for reference only. However, judging from the online evaluation results of the competition, the circular data divide strategy brings a certain performance improvement. This strategy generates five divisions of the competition dataset. In each division, the optimal model is selected using the corresponding validation set, yielding five optimal models in total. In the inference stage, the results of all the optimal models are fused, which helps to improve the stability and reliability of the final prediction results.

4. Discussion

Although the proposed method achieved good experimental results, solving the task of remote sensing image forgery detection solely from the perspective of general classification tasks has inherent limitations, such as a lack of interpretability. We think that analyzing the characteristics of forged images generated by the generative model, combining frequency domain features with spatial domain features, and injecting domain knowledge of real remote sensing images into the model will help further improve the performance of this task.
Specifically, generative models (such as GANs) introduce specific patterns and characteristics when generating forged images, such as color distortion and unnatural textures. In the future, we will explore how to guide the network to focus on these features to help the model learn more robust and stable discriminative features. On this basis, we can also enhance the model’s adaptability to these features by incorporating adversarial training or self-supervised learning. At the same time, frequency domain and spatial domain features each carry unique information: frequency domain features can reveal periodicity and texture variations in images, while spatial domain features reflect local details. Therefore, we will combine these two kinds of features to further capture the inconsistencies between real and forged images. In addition, further introducing the domain knowledge of real remote sensing images can help the model better understand the characteristics of remote sensing images. For example, remote sensing images often exhibit specific spectral characteristics, imaging rules, and object distribution patterns. Therefore, we will explore how to inject this domain knowledge into the model so that it focuses more on key features during learning.

5. Conclusions

This manuscript proposes a multi-scenario remote sensing image forgery detection scheme based on Transformer and model fusion. Specifically, we transform this task into a binary classification task that focuses on global information and explore the performance of various excellent feature extraction networks in this task. To improve the accuracy and generalizability of the model, we select three high-performance Transformer-based networks, Swin Transformer V1, Swin Transformer V2, and Twins, as the backbone networks and fuse their predicted confidence values in the inference phase. In addition, considering the small number of samples, we use the public ImageNet-1K dataset to pre-train the network to learn more stable feature expressions, and then transfer the trained weights to this task. At the same time, a reasonable circular data divide strategy is proposed, which can fully utilize all samples to improve the accuracy of forgery detection. Finally, we explore the impact of the loss function and the learning rate optimization strategy on this task. On the one hand, we explore several loss functions and select label smooth loss, which can reduce the model’s excessive dependence on training data and improve accuracy. On the other hand, we construct a combined learning rate optimization strategy that first uses step degeneration and then cosine annealing, which reduces the risk of the network falling into local optima. Extensive experiments show that the proposed scheme has excellent detection performance.

Author Contributions

Conceptualization, J.Z., Z.S., C.Y. and Y.L.; methodology, J.Z. and C.Y.; software, J.Z. and C.Y.; validation, J.Z. and C.Y.; formal analysis, J.Z., Z.S., C.Y. and Y.L.; investigation, J.Z.; resources, J.Z. and C.Y.; data curation, J.Z.; writing—original draft preparation, J.Z. and C.Y.; writing—review and editing, J.Z., Z.S., C.Y. and Y.L.; visualization, J.Z. and C.Y.; supervision, Z.S. and Y.L.; project administration, J.Z.; funding acquisition, Z.S. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by LiaoNing Revitalization Program under Grant no. XLYC2201001.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Benedek, C.; Descombes, X.; Zerubia, J. Building Development Monitoring in Multitemporal Remotely Sensed Image Pairs with Stochastic Birth-Death Dynamics. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 33–50. [Google Scholar] [CrossRef] [PubMed]
  2. Yu, C.; Liu, Y.; Zhao, J.; Wu, S.; Hu, Z. Feature Interaction Learning Network for Cross-Spectral Image Patch Matching. IEEE Trans. Image Process. 2023, 32, 5564–5579. [Google Scholar] [CrossRef] [PubMed]
  3. Wang, Z.; Cheng, P.; Duan, S.; Chen, K.; Wang, Z.; Li, X.; Sun, X. DCP-Net: A Distributed Collaborative Perception Network for Remote Sensing Semantic Segmentation. Remote Sens. 2024, 16, 2504. [Google Scholar] [CrossRef]
  4. Guo, X.; Liu, X.; Ren, Z.; Grosz, S.; Masi, I.; Liu, X. Hierarchical Fine-Grained Image Forgery Detection and Localization. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 3155–3165. [Google Scholar]
  5. Guillaro, F.; Cozzolino, D.; Sud, A.; Dufour, N.; Verdoliva, L. TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 20606–20615. [Google Scholar]
  6. Liu, J.; Xie, J.; Wang, Y.; Zha, Z. Adaptive Texture and Spectrum Clue Mining for Generalizable Face Forgery Detection. IEEE Trans. Inf. Forensics Secur. 2024, 19, 1922–1934. [Google Scholar] [CrossRef]
  7. Zhu, J.; Park, T.; Isola, P.; Efros, A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  8. Durall, R.; Keuper, M.; Pfreundt, F.; Keuper, J. Unmasking DeepFakes with simple Features. arXiv 2020, arXiv:1911.00686. [Google Scholar]
  9. Guo, Z.; Yang, G.; Chen, J.; Sun, X. Fake face detection via adaptive manipulation traces extraction network. Comput. Vis. Image Und. 2021, 204, 103170. [Google Scholar] [CrossRef]
  10. Yu, N.; Davis, L.; Fritz, M. Attributing Fake Images to GANs: Learning and Analyzing GAN Fingerprints. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7555–7565. [Google Scholar]
  11. Ciftci, U.; Demir, I.; Yin, L. FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals. IEEE Trans. Pattern Anal. Mach. Intell. 2020. [Google Scholar] [CrossRef] [PubMed]
  12. Mittal, T.; Bhattacharya, U.; Chandra, R.; Bera, A.; Manocha, D. Emotions Don’t Lie: An Audio-Visual Deepfake Detection Method using Affective Cues. In Proceedings of the 2020 ACM International Conference on Multimedia (MM), Seattle, WA, USA, 12–16 October 2020; pp. 2823–2832. [Google Scholar]
  13. Dang, H.; Liu, F.; Stehouwer, J.; Liu, X.; Jain, A. On the detection of digital face manipulation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14 June 2020; pp. 5781–5790. [Google Scholar]
  14. Ding, X.; Raziei, Z.; Larson, E.; Olinick, E.; Krueger, P.; Hahsler, M. Swapped face detection using deep learning and subjective assessment. Eurasip J. Inf. Secur. 2020, 2020, 6. [Google Scholar] [CrossRef]
  15. Wang, C.; Deng, W. Representative forgery mining for fake face detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 14923–14932. [Google Scholar]
  16. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  17. Zhao, B.; Zhang, S.; Xu, C.; Sun, Y.; Deng, C. Deep fake geography? When geospatial data encounter artificial intelligence. Cartogr. Geogr. Inf. Sci. 2021, 48, 338–352. [Google Scholar] [CrossRef]
  18. Fezza, S.; Ouis, M.; Kaddar, B.; Hamidouche, W.; Hadid, A. Evaluation of pre-trained CNN models for geographic fake image detection. In Proceedings of the 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), Shanghai, China, 26–28 September 2022; pp. 1–6. [Google Scholar]
  19. Yarlagadda, S.; Guera, D.; Bestagini, P.; Zhu, F.; Tubaro, S.; Delp, E. Satellite image forgery detection and localization using GAN and One-Class classifier. IS&T Int. Symp. Electron. Imaging 2018, 7, 214-1–214-9. [Google Scholar]
  20. Horváth, J.; Xiang, Z.; Cannas, E.; Bestagini, P.; Tubaro, S.; Delp, E. Sat U-Net: A fusion based method for forensic splicing localization in satellite images. In Proceedings of the Multimodal Image Exploitation and Learning, Orlando, FL, USA, 3 April–12 June 2022; p. 1210002. [Google Scholar]
  21. Hearst, M.; Dumais, S.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. 1998, 13, 18–28. [Google Scholar] [CrossRef]
  22. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  23. Yu, C.; Zhao, J.; Liu, Y.; Wu, S.; Li, C. Efficient Feature Relation Learning Network for Cross-Spectral Image Patch Matching. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–17. [Google Scholar] [CrossRef]
  24. Zhao, J.; Yu, C.; Shi, Z.; Liu, Y.; Zhang, Y. Gradient-Guided Learning Network for Infrared Small Target Detection. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
  25. Yu, C.; Liu, Y.; Wu, S.; Xia, X.; Hu, Z.; Lan, D.; Liu, X. Pay Attention to Local Contrast Learning Networks for Infrared Small Target Detection. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  27. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995. [Google Scholar]
  28. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
  29. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 09–15 June 2019; pp. 6105–6114. [Google Scholar]
  30. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K. Densely connected convolutional networks. In Proceedings of the 2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
  31. Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling up your kernels to 31x31: Revisiting large kernel design in cnns. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11953–11965. [Google Scholar]
  32. Liu, Z.; Mao, H.; Wu, C.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar]
  33. Rao, Y.; Zhao, W.; Tang, Y.; Zhou, J.; Lim, S.; Lu, J. Hornet: Efficient high-order spatial interactions with recursive gated convolutions. arXiv 2022, arXiv:2207.14284. [Google Scholar]
  34. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 2017 Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 1049–5258. [Google Scholar]
  35. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. [Google Scholar]
  36. Wu, K.; Zhang, J.; Peng, H.; Liu, M.; Xiao, B.; Fu, J.; Yuan, L. Tinyvit: Fast pretraining distillation for small vision transformers. In Proceedings of the 2022 European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 68–85. [Google Scholar]
  37. Chu, X.; Tian, Z.; Wang, Y.; Zhang, B.; Ren, H.; Wei, X.; Xia, H.; Shen, C. Twins: Revisiting the design of spatial attention in vision transformers. In Proceedings of the 2021 Conference on Neural Information Processing Systems (NeurIPS), Online, 6–14 December 2021; pp. 9355–9366. [Google Scholar]
  38. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Virtual, 11–17 October 2021; pp. 9992–10002. [Google Scholar]
  39. Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11999–12009. [Google Scholar]
  40. Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Li, F. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami Beach, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
  41. Yao, Y.; Cheng, G.; Lang, C.; Yuan, X.; Xie, X.; Han, J. Hierarchical Mask Prompting and Robust Integrated Regression for Oriented Object Detection. IEEE Trans. Circ. Syst. Video Tech. 2024. [Google Scholar] [CrossRef]
  42. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  43. Yu, C.; Liu, Y.; Xia, X.; Lan, D.; Liu, X.; Wu, S. Precise and Fast Segmentation of Offshore Farms in High-Resolution SAR Images Based on Model Fusion and Half-Precision Parallel Inference. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2022, 15, 4861–4872. [Google Scholar] [CrossRef]
  44. Wang, J.; Zhang, W.; Zang, Y.; Cao, Y.; Pang, J.; Gong, T.; Chen, K.; Liu, Z.; Loy, C.; Lin, D. Seesaw loss for long-tailed instance segmentation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 9690–9699. [Google Scholar]
  45. Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 2–8 December 2018; pp. 1049–5258. [Google Scholar]
Figure 1. Overall structure of proposed scheme.
Figure 2. High-performance forgery detection network architecture.
Figure 3. Circular data divide strategy.
Figure 4. Displays of some samples from the dataset. The label of the sample on the left side of the dotted line is 0, representing real images. The label of the sample on the right side of the dotted line is 1, representing fake images.
Figure 5. Detailed classification comparison between the model prediction results and true labels.
Figure 6. Displays of some detection results. Top left: the label is true, but the prediction is false. Top right: the label is true, and the prediction is true. Bottom left: the label is false, but the prediction is true. Bottom right: the label is false, and the prediction is false.
Table 1. Performance comparison of various methods.

Methods | Accuracy | AUC | Score | Inference Time (s) | GFlops | Parameters
SVM [21] | 73.23 | - | - | - | - | -
ResNet50 [26] | 83.91 | 87.82 | 85.47 | 0.016 | 4.12 | 25.56 M
ResNext [27] | 71.87 | 64.38 | 68.87 | 0.018 | 28.20 | 25.44 M
EfficientNet [29] | 86.30 | 91.26 | 88.28 | 0.022 | 27.45 | 63.79 M
DenseNet [30] | 90.20 | 94.32 | 91.85 | 0.018 | 40.67 | 26.48 M
ConvNeXt [32] | 93.15 | 96.91 | 94.66 | 0.021 | 80.37 | 87.57 M
HorNet [33] | 93.36 | 97.17 | 94.88 | 0.022 | 81.41 | 86.23 M
RepLKNet [31] | 93.68 | 97.32 | 95.13 | 0.024 | 81.05 | 78.84 M
Vision Transformer [35] | 78.82 | 84.82 | 81.22 | 0.021 | 87.76 | 86.44 M
TinyVit [36] | 92.41 | 97.34 | 94.38 | 0.021 | 27.02 | 20.69 M
Twins [37] | 93.89 | 98.83 | 95.86 | 0.018 | 33.76 | 43.32 M
Swin Transformer v1 [38] | 94.31 | 98.07 | 95.82 | 0.024 | 93.57 | 86.88 M
Swin Transformer v2 [39] | 96.00 | 98.82 | 97.13 | 0.047 | 141.58 | 86.90 M
Table 2. Performance verification of combined learning rate optimization strategy.

Methods | Learning Rate Optimization Strategy | Accuracy | AUC | Score
Twins | Step degradation | 91.89 | 96.44 | 93.71
Twins | Cosine annealing | 93.99 | 98.03 | 95.61
Twins | Combined optimization | 93.89 | 98.83 | 95.86
Swin Transformer v1 | LinearLR | 91.78 | 97.74 | 94.17
Swin Transformer v1 | CosineAnnealingLR | 93.26 | 97.74 | 95.05
Swin Transformer v1 | Combined optimization | 94.31 | 98.07 | 95.82
Swin Transformer v2 | LinearLR | 93.99 | 98.53 | 95.81
Swin Transformer v2 | CosineAnnealingLR | 94.94 | 98.92 | 96.54
Swin Transformer v2 | Combined optimization | 96.00 | 98.82 | 97.13
Table 3. Performance verification of pre-trained weights.

Methods | Pre-Trained | Accuracy | AUC | Score
Twins | No | 74.18 | 70.11 | 72.55
Twins | Yes | 93.89 | 98.83 | 95.86
Swin Transformer v1 | No | 75.66 | 75.27 | 75.50
Swin Transformer v1 | Yes | 94.31 | 98.07 | 95.82
Swin Transformer v2 | No | 76.29 | 75.31 | 75.90
Swin Transformer v2 | Yes | 96.00 | 98.82 | 97.13
Table 4. Performance comparison of different loss functions.

Methods | Loss | Accuracy | AUC | Score
Twins | Seesaw Loss | 93.36 | 97.73 | 95.11
Twins | Cross-Entropy Loss | 93.89 | 98.13 | 95.58
Twins | Label Smooth Loss | 93.89 | 98.83 | 95.86
Swin Transformer v1 | Seesaw Loss | 92.52 | 97.98 | 94.70
Swin Transformer v1 | Cross-Entropy Loss | 93.36 | 97.63 | 95.07
Swin Transformer v1 | Label Smooth Loss | 94.31 | 98.07 | 95.82
Swin Transformer v2 | Seesaw Loss | 94.63 | 98.23 | 96.06
Swin Transformer v2 | Cross-Entropy Loss | 95.15 | 98.50 | 96.49
Swin Transformer v2 | Label Smooth Loss | 96.00 | 98.82 | 97.13
Table 5. Performance verification of multi-model fusion.

Methods | Accuracy | AUC | Score
Twins | 93.89 | 98.83 | 95.86
Swin Transformer v1 | 94.31 | 98.07 | 95.82
Swin Transformer v2 | 96.00 | 98.82 | 97.13
Twins + Swin Transformer v1 + Swin Transformer v2 | 96.10 | 99.22 | 97.35
Table 6. Performance verification of circular data divide strategy.

Methods | Dataset | Accuracy | AUC | Score
Twins | dataset0 | 93.89 | 98.83 | 95.86
Twins | dataset1 | 95.05 | 98.06 | 96.25
Twins | dataset2 | 94.94 | 98.78 | 96.47
Twins | dataset3 | 95.25 | 98.87 | 96.70
Twins | dataset4 | 94.94 | 99.01 | 96.57
Twins | circular data divide strategy | 98.74 | 99.89 | 99.20
Swin Transformer v1 | dataset0 | 94.31 | 98.07 | 95.82
Swin Transformer v1 | dataset1 | 94.10 | 97.71 | 95.54
Swin Transformer v1 | dataset2 | 93.57 | 98.47 | 95.53
Swin Transformer v1 | dataset3 | 95.36 | 98.04 | 96.43
Swin Transformer v1 | dataset4 | 94.09 | 97.98 | 95.65
Swin Transformer v1 | circular data divide strategy | 97.15 | 99.78 | 98.20
Swin Transformer v2 | dataset0 | 96.00 | 98.82 | 97.13
Swin Transformer v2 | dataset1 | 96.10 | 98.70 | 97.14
Swin Transformer v2 | dataset2 | 95.99 | 98.99 | 97.19
Swin Transformer v2 | dataset3 | 96.10 | 99.15 | 97.32
Swin Transformer v2 | dataset4 | 95.89 | 98.91 | 97.09
Swin Transformer v2 | circular data divide strategy | 97.68 | 99.78 | 98.52
Twins + Swin Transformer v1 + Swin Transformer v2 | dataset0 | 96.10 | 99.22 | 97.35
Twins + Swin Transformer v1 + Swin Transformer v2 | dataset1 | 96.21 | 98.95 | 97.30
Twins + Swin Transformer v1 + Swin Transformer v2 | dataset2 | 95.46 | 99.29 | 96.99
Twins + Swin Transformer v1 + Swin Transformer v2 | dataset3 | 96.52 | 99.53 | 97.72
Twins + Swin Transformer v1 + Swin Transformer v2 | dataset4 | 95.78 | 99.23 | 97.16
Twins + Swin Transformer v1 + Swin Transformer v2 | circular data divide strategy | 98.42 | 99.91 | 99.01
