1. Introduction
The development of image editing tools has accelerated, and their use has become widespread because many of these programs and applications are free and easily found online. These manipulation tools have increased the distribution of incorrect or false information, which benefits some people or companies and causes conflicts for others [1, 2]. Image distortions are made using editing software and mainly consist of modifying, adding, or deleting different regions of an image without leaving a visible trace.
For this reason, digital image forensics focuses on detecting manipulated areas in images, and detection methods are divided into active and passive. Active detection consists of identifying information related to the image or the author to authenticate the digital file, as in the case of digital signatures and watermarks [1, 3]. On the other hand, passive detection consists of identifying modified areas of an image without the need for additional information or the need to know the property of the image. Passive detection identifies different types of manipulations, such as copy-move, image splicing, and illumination changes through image retouching [4].
Pattern recognition is essential in image forensics, particularly for the detection of splicing manipulations within images. The development of effective splicing detection techniques and algorithms requires an understanding of the image and the ability to discern introduced image forgery distortions. This process analyzes and identifies specific visual features related to the presence of manipulations, such as inconsistencies in texture, lighting, or perspective. Advanced pattern recognition methods enhance the accuracy and reliability of splicing detection, ensuring the integrity of visual content.
Image splicing involves copying a region from one image into another image, a manipulation that has garnered significant attention in image forensics. The detection of this type of forgery has become a relevant area in guaranteeing image authenticity [5, 6, 7]. The identification of patterns allows for comparison between regions and helps determine the similarities between them. Additionally, using artificial intelligence systems to detect manipulated areas by learning the specific patterns and features of each image region could help identify the tampered areas. In this context, using Siamese neural networks as feature extractors allows image splicing manipulation detection systems to identify forgery areas with variations in scale, rotation, and non-linear transformations applied to the tampered regions [8, 9, 10, 11].
This paper introduces an algorithm that detects manipulated areas from splicing attacks based on the Siamese neural network. The algorithm, when combined with the K-means algorithm, significantly reduces the errors in splicing detection. The Siamese neural network identifies unique features for each image block, enabling the detection of manipulated regions. Using two identical branches for the Siamese neural network model allows the system to detect image manipulations effectively by comparing image blocks to discern similarities and discrepancies. Data augmentation is crucial for Siamese neural network training. This step enhances the pattern learning associated with image manipulation by applying geometric and signal-processing distortions to each block. Additionally, the K-means algorithm is employed for the related image regions detected as manipulated to realize a more precise forgery detection based on region similarities.
The main contributions of this algorithm for detecting splicing-based forgery are as follows:
The Siamese neural network extracts inherent features from each image block and facilitates learning complex and specific patterns, enhancing the efficiency of manipulated area detection.
Siamese neural networks learn invariant features from geometric and image processing attacks used in data augmentation, increasing the accuracy of splicing detection forgery.
Data augmentation enhances the algorithm’s robustness against splicing attacks, providing a reliable defense against image manipulation. Data augmentation involves geometric distortions and image processing on each block to generate comprehensive representations of image regions, thus reducing underfitting and overfitting issues. A description of data augmentation is provided in Section 3.2.2, titled “Siamese Neural Network Training”.
The K-means algorithm refines the splicing detection process by clustering image blocks, enabling more efficient forgery region detection.
The K-means algorithm is essential to reducing the error in image splicing detection. Pixels are clustered based on similarities; this method enhances the detection of manipulated regions since each region detected as manipulated is divided into smaller blocks to make the detection more precise and reduce the error. This approach enhances the precision of identifying manipulated regions through effective clustering and comparison of image blocks.
This document is organized as follows: Section 2 presents techniques for splicing detection. Section 3 provides detailed information on the proposed method for splicing forgery detection introduced in this paper. Section 4 presents the evaluation results obtained from the experiments conducted. Finally, Section 5 concludes this research.
2. Previously Reported Algorithms
This section presents context related to image splicing manipulation detection through a review of different proposed techniques, highlighting the importance of developing these algorithms in digital image forensics.
Traditional techniques for detecting splicing forgery in images are based on comparing regions using image processing techniques and pattern analysis. Ghifari and Studiawan [12] proposed a splicing detection method based on superpixel image segmentation and difference assessment using the mean-squared error (MSE) from image denoising. Subsequently, the K-means algorithm clusters the MSE values to detect image manipulations. Arafa et al. [13] applied the Discrete Cosine Transform (DCT) to extract global features from the gray-level run length matrix, enabling pixel intensity measurement, and employed a support vector machine (SVM) to detect splicing manipulations. Das et al. [14] detected homogeneous features using the Histogram of Oriented Gradients (HOG), the Discrete Wavelet Transform (DWT), and the Local Binary Pattern (LBP). These techniques were employed to remove correlated features, and the SVM was utilized to determine manipulated image areas. Meena and Tyagi [15] estimated the noise level for each block using the SLIC algorithm. Then, they applied the K-means algorithm to cluster similar regions based on this noise level estimation. This approach aims to group regions that exhibit similar features, which can indicate tampering or manipulation in the image. Yildirim et al. [16] delineated the boundaries of spliced regions using an SVM and precisely localized them. They then applied Connected Component Labeling (CCL) to further refine and identify spliced regions. Tripathi et al. [17] applied the SVM, the Twin Support Vector Machine, and an Artificial Neural Network to classify image regions and determine manipulated regions. Jaiprakash et al. [18] developed a method for detecting image manipulations. They used the characteristics of coefficients related to the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform of the image chrominances, referring to the YCbCr color space. From the DCT and DWT coefficients, they generated statistical moments to correlate the image pixels in their frequency space and determine the existing changes in the image.
On the other hand, splicing detection techniques based on deep learning yield more efficient methods because they learn specific image features. Wu et al. [19] proposed a localization and splicing detection algorithm that uses a Deep Matching and Validation Network (DMVN) to carry out both actions simultaneously. The algorithm uses a VGG16 model to detect a possible matching region, using cross-correlation and extracting the potential splicing areas with a deconvolutional module to extract a response map. Gadiparthi et al. [20] developed an image splicing and copy-move detection method using recoloring. The authors resized the image to 128 × 128 pixels, then created a model with 13 layers (three layers, nine convolutional layers, and a flattening layer) that contains 512 feature maps. In addition, they used a SIFT algorithm to find critical areas in the image and used image clustering to identify the manipulated areas. Sharma and Singh [2] presented a DCNN model that can locate manipulations using the residual network ResNet-50 to obtain an image feature map to determine the manipulated zones. Nayaran et al. [21] proposed a block-based method. They performed data augmentation by applying some distortions (scaling, image normalization) to the image in the HSV (Hue, Saturation, Value) color model. Then, a filter was applied to remove noise and enhance edges, and the region of interest (manipulated zone) was detected. Subsequently, a convolutional Siamese neural network was used to extract a feature map to classify the detected area. Farhan et al. [22] developed a double dual Siamese neural network. This method is based on the concatenation of two convolutional Siamese neural networks where the input is a manipulated image; Laplace and sharpening filters are used to generate augmented data from an HSV color image. Feature extraction was achieved through the use of feature maps from the image to classify the manipulated areas. Das and Naskar [23] proposed a Convolutional Neural Network (CNN) model based on transfer learning. They utilized the ResNet50 architecture by replacing the initial convolutional layers to expedite training for classifying images with splicing distortions. Additionally, in their work [24], Das and Naskar also proposed the MobileNetV2 model with an image size of 32 × 32 pixels. They implemented dropout to mitigate overfitting and utilized the Adam optimizer with binary cross-entropy loss to detect whether an image was tampered with by splicing attacks. Hingrajiya and Patel [25] introduced a method to detect copy-move and splicing manipulations based on the DenseNet201 model, classifying an image as either genuine or forged. Ahmed et al. [26] developed a procedure involving image resizing to 227 × 227 pixels, followed by the AlexNet model for classification, and concluded with image analysis using Canonical Correlation Analysis (CCA) to determine manipulated regions. Ali et al. [27] described a method based on the error level analysis (ELA) module and performed feature extraction with the VGG-19 model to enhance structural recognition for image forgery detection. Irmawati et al. [28] discussed a process involving image resizing to 244 × 244 pixels and conversion to .jpg format. They extracted features and patterns using the ResNet50 model and then applied the Canny edge detector to optimize image manipulation detection. Krishnamoorthy et al. [29] proposed a method involving low-pass filtering, followed by utilizing the MobileNet model to identify forged images.
3. Proposed Splicing Manipulation Detection System Based on Siamese Neural Network
This section describes the characteristics of the proposed method for detecting image-splicing-manipulated areas. The proposed algorithm uses a Siamese neural network designed for image feature extraction. This neural network architecture efficiently compares the original and manipulated images’ features to identify tampered regions. The Siamese neural network analyzes and identifies complex image features and patterns, generating efficient image manipulation detection. This system accurately identifies the manipulations made to the image and determines the altered areas by comparing the extracted features between the original and manipulated images.
Furthermore, after the feature extraction stage, the detected tampered regions are divided into 8 × 8 blocks to realize more precise detection by the K-means algorithm to identify similar areas that are considered manipulated regions. The K-means algorithm clusters similar pixels unsupervised and efficiently analyzes the tampered image regions to locate the manipulated region.
Figure 1 presents a general diagram of the proposed technique, illustrating the sequence of steps that comprise the method. This methodology effectively detects image splicing manipulations, highlighting the capabilities of Siamese neural networks and K-means pixel cluster centroid comparison to achieve accurate and robust detection.
The proposed method has four stages: preprocessing, Siamese neural network feature extraction, image forgery detection, and the identification of regions distorted by splicing manipulations.
Preprocessing: This stage prepares the image for analysis so that image features can be learned efficiently.
Feature extraction using a Siamese neural network: A Siamese neural network extracts significant image features from the original and manipulated images to generate a feature matrix.
Detection of manipulated areas: The system determines potentially manipulated areas within the image by comparing the extracted features.
Identification of manipulated regions: Finally, the detected manipulated areas are divided into small regions to identify and locate manipulated areas using the K-means algorithm.
3.1. Image Preprocessing
In the preprocessing stage, the image is resized to 512 × 512 and divided into non-overlapping 32 × 32 blocks. These blocks are used to learn specific features from each image region during the Siamese neural network training, making it easier to identify manipulated areas. Image block segmentation is used because most image splicing manipulations modify specific image regions, so by dividing the image into smaller blocks, a more accurate representation of each area of the image can be generated, and each block is considered an independent region. The blocks contain significant information and simplify the Siamese network’s learning task for feature comparison. Furthermore, this strategy facilitates network training and improves computational efficiency. A block size of 32 × 32 pixels allows the Siamese network to learn the specific local features of each region; this is crucial to identifying visual and textural patterns indicative of image splicing manipulations.
Image preprocessing divides the image into blocks to be processed by the neural networks. Image block segmentation provides a better analysis of the regions, increasing the efficiency of the proposed method. The following equation expresses the number of blocks generated in this process.
$$N_B = \frac{M}{b_w} \times \frac{N}{b_h} \qquad (1)$$
where $N_B$ is the number of blocks, $M$ is the image width, $N$ is the image height, and $b_w$ and $b_h$ are the width and height of the block, which are both 32 in the proposed scheme.
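The block partition described above can be illustrated with a short sketch. It is only a minimal example under stated assumptions: the image is represented as a plain list of pixel rows, and the helper names `block_origins` and `split_into_blocks` are hypothetical, not part of the proposed system.

```python
def block_origins(width, height, block_w=32, block_h=32):
    """Return the top-left (x, y) corner of every non-overlapping block."""
    if width % block_w or height % block_h:
        raise ValueError("image size must be a multiple of the block size")
    return [(x, y)
            for y in range(0, height, block_h)
            for x in range(0, width, block_w)]


def split_into_blocks(image, block_w=32, block_h=32):
    """Split a 2-D image (list of rows) into a list of blocks (lists of rows)."""
    height, width = len(image), len(image[0])
    return [[row[x:x + block_w] for row in image[y:y + block_h]]
            for x, y in block_origins(width, height, block_w, block_h)]


# A 512 x 512 image with 32 x 32 blocks gives (512 / 32) * (512 / 32) blocks.
origins = block_origins(512, 512)
print(len(origins))  # 256
```

With the 512 × 512 input and 32 × 32 blocks used in the preprocessing stage, the partition yields 16 blocks per row and 16 per column.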
The feature extraction process leverages the Siamese neural network architecture depicted in Figure 2. Image features are extracted from the convolutional layers of this network, generating a comprehensive feature matrix as the output. This process analyzes each region of an image separately, generating semantic descriptions of the image.
The convolutional layers detect local features such as edges, textures, and simple patterns in early layers, while later layers detect more complex and abstract features. The feature extraction process generates abstract representations containing essential information for identifying image manipulations. Convolutional layers reduce input dimensionality by applying filters and grouping important features, which reduces model complexity and helps extract more compact representations. The feature matrix construction with the extracted features from these layers provides precise tampering detection through dimensionality reduction. The feature matrix encodes patterns from the original and manipulated images, enhancing the model’s generalization to detect splicing manipulations. Therefore, the Siamese neural network uses the extracted features to differentiate between authentic and manipulated regions within an image for splicing detection.
The output of the neural network, which is the Block Feature Matrix, has dimensions of 2 by the number of blocks, where the number of blocks is given by Equation (1).
3.2. Feature Extraction Using a Siamese Neural Network
Feature extraction through a Siamese neural network compares the similarities between data. This methodology learns robust representations where similarity and comparison between images are essential. Each branch processes a set of data simultaneously, transforming these inputs into feature representations across layers. During the feature extraction process, a Siamese network is used to map each input. This mapping makes the relations and differences between the data more discernible. In the training phase, the Siamese network adjusts its parameters to minimize the distance between features from similar inputs and maximize the distance between features from dissimilar inputs. This adjustment allows for a more accurate comparison of similarity in the data. For this reason, this approach allows for the detection of manipulations in images, where the Siamese network can learn to distinguish between authentic areas and areas manipulated by splicing.
3.2.1. Siamese Neural Network
Siamese neural networks focus on tasks that require identifying, comparing, and relating patterns. The design of this neural network model consists of two branches, or similar models, that share parameters, making it a suitable option for the detection of tampered areas [30, 31]. The main advantage of Siamese neural networks is that these models can be used to determine the similarity between two images by comparing them, which is useful for identifying manipulated areas. The training of the Siamese neural network consists of the use of pairs of images to generate representations that allow for identifying the differences between two images [32, 33, 34].
Figure 2 visualizes the Siamese neural network’s architecture, where the values obtained from the last convolutional layer of both branches are used to determine if the image has been manipulated. Furthermore, Table 1 shows the configuration of both branches.
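The loss function used to train the proposed Siamese neural network is the contrastive loss, given by Equation (2), which aims to learn a semantic representation where the patterns of similar objects lie close together in the same feature space, in order to detect the manipulations performed in each region of the picture [34]. In its standard form, this loss is
$$L(Y, X_1, X_2) = (1 - Y)\,\frac{1}{2}\,D_W^2 + Y\,\frac{1}{2}\,\left[\max(0,\, m - D_W)\right]^2 \qquad (2)$$
where $Y$ is the value from the fully connected layer, which indicates the similarity between images (in this form, $Y = 0$ marks a similar pair and $Y = 1$ a dissimilar pair), $X_1$ and $X_2$ are training regions from the image, and $m$ is the margin: dissimilar pairs contribute to the loss only while their Euclidean distance $D_W$ is smaller than $m$. The Euclidean distance is
$$D_W = \lVert G_W(X_1) - G_W(X_2) \rVert_2 \qquad (3)$$
where $G_W(X_1)$ and $G_W(X_2)$ are the feature matrices generated by the Siamese neural network.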
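As a hedged illustration of the contrastive loss described above, the following sketch evaluates the standard formulation on two small feature vectors. The label convention (0 for a similar pair, 1 for a dissimilar pair), the default margin of 1.0, and the function names are assumptions made for the example; the configuration actually used for training may differ.

```python
import math


def euclidean_distance(f1, f2):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


def contrastive_loss(f1, f2, y, margin=1.0):
    """Standard contrastive loss for one pair of feature vectors.

    Assumed convention: y = 0 for a similar pair, y = 1 for a dissimilar pair.
    """
    d = euclidean_distance(f1, f2)
    similar_term = (1 - y) * 0.5 * d ** 2                     # pulls similar pairs together
    dissimilar_term = y * 0.5 * max(0.0, margin - d) ** 2     # pushes dissimilar pairs apart
    return similar_term + dissimilar_term


# Similar pair that is far apart: penalized by the squared distance.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], y=0))  # 12.5
# Dissimilar pair already beyond the margin: no penalty.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], y=1))  # 0.0
# Dissimilar pair inside the margin: penalized until it is pushed apart.
print(contrastive_loss([0.0, 0.0], [0.0, 0.5], y=1))  # 0.125
```

Minimizing this quantity over many pairs reduces the distance between features of similar blocks and increases it, up to the margin, for dissimilar blocks.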
3.2.2. Siamese Neural Network Training
Training the Siamese neural network to create specific image features involves data augmentation. Because the amount of data required for training is in most cases larger than the number of images contained in most available databases, the data augmentation process applies different manipulations to each image block to increase the amount of data and achieve more efficient detection by reducing overfitting and underfitting. This image augmentation technique supports precise learning of the features or patterns needed to detect manipulated areas.
Data augmentation techniques are strategies used to increase data to improve the model’s ability to generalize and recognize patterns. In this case, geometric and image processing techniques were applied, which include the following:
Image rotation (0°–360°): rotations are applied to images in a range of 0 to 360 degrees with a step of 15°, simulating different orientations of objects in the images.
Image translation: This distortion moves the image horizontally or vertically to simulate different positions within the frame. For this image manipulation technique, an x translation, a y translation, and a combined xy translation of 10, 20, and 30 pixels were used.
JPEG compression: this image processing distortion technique consists of JPEG compression of images with a quality factor of 90, 70, 50, and 30, which can affect the quality and clarity of visual details.
Gaussian filter: the Gaussian filter, with a kernel of 5 × 5 and 7 × 7, smooths the image, which can help eliminate noise and imperfections.
Median filter: a median filter with a kernel of 5 × 5 and 7 × 7 removes noise from the image while maintaining important edges and details.
Gaussian noise: Gaussian noise of 0.009 and 0.09 is added to the image to simulate adverse imaging conditions.
Blur: image blur with a kernel of 5 × 5 and 7 × 7 softens details and creates a diffuse appearance.
Denoising: denoising techniques eliminate unwanted noise from the image, improving visual quality.
These techniques increase the neural network input data, helping the model learn and recognize patterns in different conditions and scenarios and improving its ability to generalize image features.
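The list above can be written down as an explicit augmentation plan. The sketch below only enumerates the parameter combinations named in the text; it does not apply them to pixels. The operation names, dictionary keys, and the function `augmentation_plan` are hypothetical, and treating the third translation type as a combined x-and-y shift is an assumption.

```python
def augmentation_plan():
    """Enumerate (operation, parameters) pairs for the distortions listed above."""
    plan = []
    # Rotations: 0, 15, ..., 360 degrees (15-degree step).
    plan += [("rotate", {"angle_deg": a}) for a in range(0, 361, 15)]
    # Translations of 10, 20, and 30 pixels along x, along y, and along both
    # axes (the combined shift is an assumption).
    for shift in (10, 20, 30):
        plan.append(("translate", {"dx": shift, "dy": 0}))
        plan.append(("translate", {"dx": 0, "dy": shift}))
        plan.append(("translate", {"dx": shift, "dy": shift}))
    # JPEG compression quality factors.
    plan += [("jpeg", {"quality": q}) for q in (90, 70, 50, 30)]
    # Filters that use 5 x 5 and 7 x 7 kernels.
    for name in ("gaussian_filter", "median_filter", "blur"):
        plan += [(name, {"kernel": k}) for k in (5, 7)]
    # Additive Gaussian noise levels.
    plan += [("gaussian_noise", {"level": v}) for v in (0.009, 0.09)]
    # Denoising has no parameter stated in the text.
    plan.append(("denoise", {}))
    return plan


plan = augmentation_plan()
print(len(plan))  # 47 distorted variants per block under these assumptions
```

Counting the plan this way gives a rough idea of how much the training set grows per block; the exact count depends on how the rotation endpoints and the translation variants are interpreted.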
3.3. Tampered Area Detection
Detecting manipulated areas involves comparing the features extracted from the manipulated image with the features of the original image. Comparing the similarity of the feature matrix generated by a manipulated image with the feature matrix of the original image is essential in image manipulation detection since the manipulation of an image often involves changes in its visual, textural, or structural properties. These differences may be considered anomalous patterns, changes in the distribution of features, or discrepancies in learned representations. Feature matrices capture semantic and structural information from images. By comparing these matrices, the global and local contexts of the features are preserved, allowing for the detection of manipulations that may affect different regions of the image. The system can learn specific patterns associated with manipulations by comparing the feature matrices. Siamese neural networks learn representations, highlighting significant differences between the original and manipulated images. A comparison of the feature matrices helps identify these differences, contributing to the detection system’s accuracy. A comparison of feature matrices provides an efficient approach for detecting manipulated zones. Instead of comparing pixel by pixel, feature matrix comparison allows for a more abstract and semantic evaluation of images, improving the computational efficiency of the system.
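The comparison described above can be sketched as a block-wise distance check. This is a minimal illustration, not the authors' implementation: it assumes each block is summarized by a two-element feature vector (following the 2 × number-of-blocks Block Feature Matrix), and both the function name `flag_tampered_blocks` and the `threshold` value are hypothetical.

```python
import math


def flag_tampered_blocks(features_a, features_b, threshold):
    """Return indices of blocks whose feature vectors differ by more than threshold.

    features_a and features_b hold one feature vector per block, in the same
    block order. The threshold is an illustrative parameter, not a value from
    the paper.
    """
    flagged = []
    for index, (fa, fb) in enumerate(zip(features_a, features_b)):
        distance = math.dist(fa, fb)  # Euclidean distance between block features
        if distance > threshold:
            flagged.append(index)
    return flagged


reference = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
suspect = [[0.0, 0.0], [1.0, 1.0], [5.0, 6.0]]
print(flag_tampered_blocks(reference, suspect, threshold=1.0))  # [2]
```

Working on per-block feature vectors rather than raw pixels keeps the comparison to one distance computation per block.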
3.4. Duplicate Area Detection
Subsequently, manipulated areas are detected using the K-means algorithm, which determines the centroids associated with each block. This comparison of centroids helps identify similarities between regions and determine which areas were tampered with. The Euclidean distance evaluates the similarity between centroids determined by the K-means algorithm, providing an accurate metric for duplicated region detection. This comprehensive approach constitutes a robust methodology for effectively identifying manipulated image areas.
3.5. Image Segmentation Based on K-means
Image segmentation techniques play a fundamental role in identifying specific areas of an image. Image segmentation identifies regions of interest (ROI) and regions of no interest (RONI) within an image with a predefined criterion. In the context of tampered zone detection, ROI detection is essential to identifying manipulated areas by clustering pixels. This approach contributes significantly to the robustness and effectiveness of the mechanisms for detecting splicing attacks.
The K-means algorithm is an unsupervised machine learning method designed for feature clustering [35]. Unsupervised image segmentation with the K-means algorithm divides an image into homogeneous regions. A key feature of unsupervised segmentation is its ability to classify pixels without knowing the classes they belong to. In the context of image splicing detection, K-means segmentation detects RONI areas and ROI areas by classifying data into a predefined number of clusters. Initially, K pixels are randomly chosen as the initial centroids for the clusters. Each pixel in the image is assigned to the cluster whose centroid is closest, basing this assignment on the Euclidean distance, given by Equation (4), from an arbitrary centroid $c_k$ [36, 37]. Finally, the cluster centroids are recalculated as the average of the pixels assigned to each cluster; the assignment and recalculation steps are repeated until the clusters stabilize, thus completing the segmentation process. This approach identifies and distinguishes areas of interest, optimizing image manipulation detection through applying the K-means algorithm.
$$d(x_i, c_k) = \lVert x_i - c_k \rVert_2 \qquad (4)$$
where $x_i$ is a data point or pixel value. The K-means algorithm classifies the pixel values in the image according to the number of clusters used to segment it and recognizes RONI and ROI areas for image splicing.
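The clustering procedure described above (random initial centroids, nearest-centroid assignment by Euclidean distance, centroid recalculation) can be sketched in a few lines. This is a generic textbook K-means written with the standard library only; the fixed iteration count, the `seed` argument, and the function name `kmeans` are assumptions for the example and are not taken from the proposed system.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points (sequences of numbers) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialization: k data points are randomly chosen as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the closest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return centroids, labels


# Two well-separated groups of one-dimensional pixel values.
pixels = [(0,), (1,), (2,), (100,), (101,), (102,)]
centroids, labels = kmeans(pixels, k=2)
print(sorted(centroids))  # [[1.0], [101.0]]
```

In practice the loop is usually stopped once the labels no longer change; a fixed iteration count is used here only to keep the sketch short.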
Figure 3 illustrates the segmentation of an image performed by the K-means algorithm using different numbers of clusters.
The K-means algorithm, applied to the image features obtained by the Siamese neural network, carries out image segmentation based on the pixel similarities of characteristics related to the original image and the manipulated image. Therefore, it is necessary to divide the areas detected as manipulated into smaller blocks to increase the algorithm’s efficiency. The centroids of the K-means algorithm are used as reference points to identify and evaluate manipulated areas, facilitating comparison and determining the similarity between image blocks.
K-means integration, as a refinement process after the approximate detection of patterns realized by the Siamese neural network related to splicing distortions, improves the precision and efficiency of splicing manipulation detection. On the other hand, the application of the K-means algorithm as a refinement method in the detection of splicing manipulations represents a significant leap forward in the field of digital forensics. This approach provides an effective tool for a reliable evaluation of the authenticity of images. Additionally, combining deep learning techniques and clustering analysis for different image regions solves complex challenges associated with digital image manipulation. For this reason, K-means enhances the algorithm’s ability to identify and analyze images with precision and accuracy.
4. Experimental Results
To evaluate the proposed image splicing detection algorithm, our own image database was used, which contains images captured in different environments and conditions, such as various scenarios or objects with lighting changes, to which different image splicing attacks were applied. This database contains 166 images, of which 68 do not contain manipulations and 68 are images modified by splicing attacks. The photos are in the RGB color model. Another database used was the Columbia1 image database, which is composed of 183 original images and 180 manipulated images. The images range in size from 757 × 568 to 1152 × 768 and are uncompressed, in either TIFF or BMP formats [38]. The CASIA V2 image base was also used, which consists of 5123 images classified into 3295 images manipulated by copy-move attacks and 1828 by splicing distortion [39]. In addition, the Realistic Tampering Dataset is composed of 220 original and tampered images; all images are 1920 × 1080 px RGB uint8 bitmaps stored in the TIFF format [40]. Finally, the MICC-F220 database was used to assess the effectiveness of the proposed splicing detection method in different scenarios. This database contains 110 original images and 110 modified RGB images in JPEG format (737 × 492) [41].
Figure 4 shows some examples of the database images.
This section provides a comprehensive overview of the results of the experiments conducted with the proposed splicing detection algorithm. The approach presented identifies manipulated regions. Our evaluation examines the system’s performance and robustness against splicing attacks. The experiments were run on a system equipped with an NVIDIA GTX 1650 graphics card and an Intel Core i7 processor operating at 3.4 GHz, running the Windows 10 operating system. These specifications provide a suitable environment for running the algorithm within the PyTorch framework in Python.
4.1. Performance Evaluation of the Proposed Algorithm
The metrics used to evaluate the efficiency of the proposed system are essential to measuring its performance. First, accuracy, given by Equation (5), determines whether the pixels in the manipulated area correctly correspond to the manipulated area.
Additionally, the precision evaluates the number of pixels correctly predicted as part of the tampered area. This metric provides specific information about the system’s ability to predict manipulated regions, and it is given in Equation (6).
True positives (TP) are the pixels correctly classified as part of the region of interest, and false positives (FP) are the pixels incorrectly classified as part of the region of interest. On the other hand, recall measures the algorithm’s effectiveness in appropriately detecting the pixels that belong to the tampered area. This metric highlights the system’s ability to identify manipulated areas, as given in Equation (7).
The F1 score metric provides a value that evaluates the overall detection capacity, combining both precision and recall. It is useful for obtaining a balanced view of the system’s performance, as given in Equation (8).
Finally, the mean-squared error quantifies the error obtained in the detection of the manipulated areas. It provides a quantitative measure of the system’s performance, as given in Equation (9).
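For reference, these metrics usually take the following standard forms. The symbols TN and FN (true and false negatives), $n$ (number of pixels), and $y_i$, $\hat{y}_i$ (ground-truth and predicted mask values at pixel $i$) are introduced here as assumptions for notation; the numbering follows the equation references in the text.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \tag{5} \\
\text{Precision} &= \frac{TP}{TP + FP} \tag{6} \\
\text{Recall}    &= \frac{TP}{TP + FN} \tag{7} \\
F1               &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8} \\
\text{MSE}       &= \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{9}
\end{align}
```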
These metrics offer a complete and detailed evaluation of the proposed system in terms of its ability to detect and classify image-splicing-manipulated areas.
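A small worked example may help make the pixel-level evaluation concrete. The sketch below computes the five metrics from a ground-truth mask and a predicted mask, both given as flat lists of 0/1 values; the function names and the tiny eight-pixel masks are illustrative assumptions, not data from the experiments.

```python
def confusion_counts(truth, predicted):
    """Count TP, FP, TN, FN over two binary masks of equal length."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if t and p:
            tp += 1
        elif not t and p:
            fp += 1
        elif t and not p:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def detection_metrics(truth, predicted):
    """Return accuracy, precision, recall, F1, and MSE for binary masks."""
    tp, fp, tn, fn = confusion_counts(truth, predicted)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    mse = sum((t - p) ** 2 for t, p in zip(truth, predicted)) / total
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1, "mse": mse}


truth = [1, 1, 1, 1, 0, 0, 0, 0]      # 1 marks a tampered pixel
predicted = [1, 1, 1, 0, 1, 0, 0, 0]  # one missed pixel, one false alarm
print(detection_metrics(truth, predicted))
# accuracy, precision, recall, and F1 are all 0.75; MSE is 0.25
```

For binary masks the MSE equals the fraction of misclassified pixels, which is why it complements the accuracy in this example.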
Figure 5 illustrates the results of our image splicing detection method across five distinct databases: Realistic Tampering, Own, CASIA V2, MICC-F220, and Columbia. Notably, our method exhibits exceptional performance within the Realistic Tampering database, showcasing its ability to accurately identify altered regions within images from this dataset.
In our proprietary database (Own), our method demonstrates high efficiency, reaffirming its effectiveness in detecting image splicing manipulations. However, within the CASIA V2 database, our method’s efficiency appears comparatively lower. This observation may stem from the unique characteristics of the CASIA database, which includes images subjected to splicing manipulations. For the evaluation of the CASIA database, the original image has to be converted to *.tif format since the manipulated image is in this format. Despite this, experimental tests reveal the efficacy of our proposed technique against splicing distortions within the CASIA V2 image dataset. Furthermore, the results obtained from the experiments conducted on the Columbia database demonstrate our method’s efficient detection capabilities. This outcome underscores the versatility and robustness of our algorithm in identifying image splicing manipulations across different datasets.
Figure 6a shows the accuracy of the proposed algorithm, which reaches a mean value of 0.9843 for splicing detection in the three image databases. This result indicates that the algorithm has a high ability to correctly classify manipulated and non-manipulated regions. The precision is illustrated in Figure 6b; this plot reflects how many of the regions flagged as spliced are in fact spliced. The mean precision is 0.9757, which underscores the algorithm’s robustness in detecting manipulated regions. Figure 6c depicts the algorithm’s recall, which is 0.9540, indicating that the proposed algorithm has a high capacity to detect manipulated regions, thus minimizing false negatives. The F1 score is shown in Figure 6d, achieving a mean value of 0.9635. This result indicates that the algorithm balances precision and recall, which is important to avoid both false positives and false negatives. The MSE is plotted in Figure 6e and shows that the algorithm has a low error (0.007). The low mean squared error indicates the algorithm’s ability to identify the modified image regions. The graphs demonstrate that the proposed algorithm based on Siamese neural networks is efficient for image splicing detection and manipulated region detection.
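As an illustration of how the five reported metrics relate to one another, the following minimal sketch computes them pixel-wise from a predicted binary mask and a ground-truth mask. The function name mask_metrics, the toy 4 × 4 masks, and the pixel-wise definition of the MSE are assumptions made for illustration; they are not taken from the evaluation code used in this work.

```python
import numpy as np

def mask_metrics(pred, truth):
    """Pixel-wise scores for a predicted binary tamper mask (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # tampered pixels correctly flagged
    tn = int(np.sum(~pred & ~truth))  # pristine pixels correctly left alone
    fp = int(np.sum(pred & ~truth))   # pristine pixels wrongly flagged
    fn = int(np.sum(~pred & truth))   # tampered pixels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Assumption: MSE is taken between the two masks treated as 0/1 images.
    mse = float(np.mean((pred.astype(float) - truth.astype(float)) ** 2))
    return {
        "accuracy": (tp + tn) / pred.size,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mse": mse,
    }

# Toy example: a 2 x 2 spliced region, with one missed pixel and one false alarm.
truth = np.zeros((4, 4), dtype=int)
truth[0:2, 0:2] = 1
pred = truth.copy()
pred[1, 1] = 0  # false negative
pred[0, 2] = 1  # false positive

scores = mask_metrics(pred, truth)
print(scores)
```

In this toy case there are 3 true positives, 1 false positive, 1 false negative, and 11 true negatives, so precision, recall, and F1 all equal 0.75 while accuracy is 0.875. The example shows why accuracy alone can look favorable when the manipulated region is small.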
4.2. Performance Comparison of the Proposed Algorithm
Table 2 shows a performance comparison between the proposed algorithm and other previously reported splicing tampering detection algorithms. The obtained results show that the proposed algorithm stands out for its high performance and its capacity to generalize features through the use of Siamese neural networks, which increases its precision and effectiveness in detecting splicing manipulations in images. As shown in Table 2, the accuracy of the proposed method is 98.6%, which is quite similar to the results reported by Ahmed et al. using the AlexNet network, although in that case the system was evaluated using only the CASIA V1 database. Compared with the other methods, the proposed algorithm achieves higher performance. This indicates that the proposed method accurately identifies manipulated regions in images while generalizing the image’s features well. The performance evaluation using different databases, including public databases and our own, indicates that the proposed model is robust and can adapt to different scenarios and image manipulations, providing higher accuracy and precision than other previously proposed schemes. The Siamese neural networks generate image representations to identify patterns for efficient image manipulation detection. In addition, combining Siamese neural networks with the K-means algorithm helps identify the original region duplicated in the image.
The proposed method based on Siamese neural networks outperforms other techniques based on pre-trained neural network models as the backbone, such as VGG19, DenseNet, ResNet, MobileNet, and GoogLeNet. The proposed Siamese neural network is designed to compare manipulated and original images to determine their similarity, making it ideal for detecting splicing manipulations in images by accurately identifying manipulated patterns. Additionally, these networks typically have a simplified architecture compared to pre-trained neural networks that are larger models, reducing computational complexity by having fewer trainable parameters. By directly comparing images, Siamese networks can learn more robust representations, improving tamper detection by focusing on the differences between regions. Finally, by combining Siamese networks with the K-means technique, the efficiency of tamper detection is improved by more accurately identifying duplicated or manipulated areas in images. The success of the proposed method is due to its specific design, lightweight architecture, and ability to learn robust representations for the accurate detection of image manipulations.
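The idea of comparing two inputs through a shared branch can be sketched as follows. This is only a toy illustration under stated assumptions: the single random projection W, the tanh activation, the 64-dimensional embedding, and the names embed and siamese_distance are hypothetical and do not reproduce the architecture or the trained weights of the proposed network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared weights: both branches of a Siamese model use the SAME
# parameters, which is why the parameter count does not double.
W = rng.normal(0.0, 0.05, size=(32 * 32, 64))

def embed(block):
    """Map a 32 x 32 grayscale block to a 64-dimensional feature vector."""
    x = np.asarray(block, dtype=float).ravel() / 255.0
    return np.tanh(x @ W)

def siamese_distance(block_a, block_b):
    """Euclidean distance between the two embeddings (small means similar)."""
    return float(np.linalg.norm(embed(block_a) - embed(block_b)))

# Two random blocks standing in for an original and a manipulated block.
block_a = rng.integers(0, 256, size=(32, 32))
block_b = rng.integers(0, 256, size=(32, 32))

same = siamese_distance(block_a, block_a)
different = siamese_distance(block_a, block_b)
print(same, different)
```

Because both inputs pass through the same weights, identical blocks always map to a distance of zero, and training (not shown here) would shape the embedding so that manipulated blocks move away from their originals.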
It should be noted that we used five metrics to evaluate the system, while most of the previously proposed algorithms only use between one and three metrics. The use of multiple metrics allows us to gain a more complete perspective on different aspects of performance, which can provide a deeper understanding of its effectiveness.
Figure 7 compares the training time of the proposed Siamese neural network with that of several established neural models: VGG19, ResNet18, ResNet50, DenseNet121, AlexNet, GoogLeNet, and MobileNet. These models are the basis of most algorithms used for splicing tamper detection. Additionally, Table 3 lists the number of trainable parameters for each neural network.
These results show that the proposed network requires less processing time and a lower computational cost, which makes it simpler to implement and potentially faster to produce results, making it an attractive option for splicing tamper detection tasks. The Siamese model’s reduction in processing time and parameters compared with the main CNN backbones is due to the tasks that must be performed by the pre-trained models, which are specifically adapted for the detection of splicing manipulations.
5. Discussion
The analysis of the results indicates that the proposed method based on Siamese neural networks is an efficient tool for detecting splicing manipulations in digital images. In the proposed scheme, the image under analysis is resized to 512 × 512 and segmented into non-overlapping blocks of 32 × 32 pixels, which are then used to train the Siamese neural network. The evaluation results show that this scheme is an efficient technique for detecting image manipulations by comparing blocks and determining their similarity. This allows for the accurate and efficient detection of manipulated areas, improving the effectiveness of the proposed method.
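The block segmentation step described above can be sketched as follows. This is a minimal illustration that assumes the image has already been resized to 512 × 512 (resizing is not shown) and is held in a NumPy array; the function name split_into_blocks is hypothetical.

```python
import numpy as np

def split_into_blocks(image, block=32):
    """Split an H x W (x C) array into non-overlapping block x block tiles.

    Tiles are returned in row-major order: index k corresponds to grid
    position (k // cols, k % cols).
    """
    h, w = image.shape[:2]
    if h % block or w % block:
        raise ValueError("image sides must be multiples of the block size")
    rows, cols = h // block, w // block
    channels = image.shape[2:]
    # (rows, block, cols, block, C) -> (rows, cols, block, block, C)
    tiles = image.reshape(rows, block, cols, block, *channels).swapaxes(1, 2)
    return tiles.reshape(rows * cols, block, block, *channels)

# Stand-in for a resized 512 x 512 RGB image.
image = np.arange(512 * 512 * 3).reshape(512, 512, 3)
blocks = split_into_blocks(image, block=32)
print(blocks.shape)  # (256, 32, 32, 3)
```

With a 512 × 512 input and 32 × 32 blocks, the grid is 16 × 16, so each image yields 256 blocks.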
Furthermore, the Siamese neural network extracts the inherent and invariant features of each image block and learns patterns related to each image region, increasing its efficiency in detecting manipulated areas. For this reason, this technique detects manipulated regions even when geometric distortions have been applied to the duplicated region. The application of data augmentation during the training of the Siamese neural network, through geometric distortions and image processing, increases the system’s robustness by generating image representations efficiently and helps to avoid underfitting and overfitting. The detection of the manipulated region using the K-means algorithm helps determine the similarity between image blocks to detect this region efficiently. The results obtained in the experimental tests demonstrated the effectiveness of the proposed algorithm in detecting splicing manipulations in images, providing higher accuracy than other previously proposed schemes. The high performance of the algorithm in detecting manipulated areas in the Realistic Tampering image database and our database is highlighted, as is its ability to adapt to different types of manipulations, as in the CASIA V2 and Columbia databases. It is important to mention that our system can detect areas altered by the splicing technique.
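The grouping of block features with K-means can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the proposed implementation: the features are synthetic 8-dimensional vectors rather than Siamese embeddings, two clusters are used, and the smaller cluster is treated as the candidate manipulated region. The names assign and kmeans are hypothetical, and a plain Lloyd iteration stands in for whatever K-means implementation was actually used.

```python
import numpy as np

def assign(features, centers):
    """Return the index of the nearest center for every feature vector."""
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(features, k=2, iters=100, seed=0):
    """Plain Lloyd's algorithm with random initial centers."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(features, centers)
        updated = centers.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centers):
            break
        centers = updated
    return assign(features, centers), centers

# Synthetic features for a 16 x 16 grid of blocks; a 4 x 4 patch is shifted
# to imitate blocks whose features differ from the rest of the image.
grid = 16
rng = np.random.default_rng(1)
features = rng.normal(0.0, 0.1, size=(grid * grid, 8))
planted = np.zeros((grid, grid), dtype=bool)
planted[4:8, 4:8] = True
features[planted.ravel()] += 5.0

labels, _ = kmeans(features, k=2)
minority = np.argmin(np.bincount(labels, minlength=2))
suspect_grid = (labels == minority).reshape(grid, grid)
print(int(suspect_grid.sum()))  # number of blocks flagged as suspicious
```

Reshaping the cluster labels back to the block grid gives a coarse localization map, which is one plausible way to turn per-block features into a manipulated-region mask.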
It is worth mentioning that adjusting the block size during preprocessing affects both the neural network’s training time and the method’s efficiency. An increase in block size reduces the training time, but it also decreases the efficiency of the proposed method. Conversely, reducing the block size increases the processing time but improves the efficiency in detecting splicing manipulations. Therefore, a balance must be established between the computational cost, processing time, and efficiency of the proposed system. This balance will depend on the specific features of the system in which the proposed methodology is implemented and on the user’s requirements.
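One simple way to see part of this trade-off is to count how many blocks a 512 × 512 image produces for different block sizes. The sketch below is only arithmetic on the block grid; the helper name block_count and the candidate sizes 16, 32, and 64 are chosen for illustration, and the block count is just a rough proxy for the training cost rather than a measurement.

```python
def block_count(side=512, block=32):
    """Number of non-overlapping block x block tiles in a side x side image."""
    if side % block:
        raise ValueError("side must be a multiple of the block size")
    return (side // block) ** 2

counts = {b: block_count(512, b) for b in (16, 32, 64)}
print(counts)  # {16: 1024, 32: 256, 64: 64}
```

Halving the block side quadruples the number of blocks to process, while each block covers a quarter of the area and therefore gives a finer localization grid.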
Table 3 shows that most of the neural networks used as the backbone in the algorithms compared with the proposed method have more trainable parameters, which increases the training time. This effect is reflected in the testing times taken by the algorithms. On the other hand, algorithms that use neural models with a larger number of parameters can, in some cases, show increased efficiency; however, this entails a higher computational cost. An example of this is the method of Ahmed et al. [26], which presents a slightly higher efficiency (98.79%) compared to the proposed method (98.6%) but with a significantly longer training time (58.05 s versus 5.25 s), thus demonstrating a higher computational cost.
On the other hand, the method proposed by Das and Naskar [24] uses the MobileNet neural network, which contains a smaller number of trainable parameters and has a shorter training time (2.55 s), although it obtains a lower efficiency (93%). The technique developed in this study yields an algorithm with high efficiency, providing a favorable balance between efficiency and computational cost.
6. Conclusions
This paper proposes a splicing forgery detection algorithm in which the image under analysis is resized and segmented into non-overlapping blocks of 16 × 16 pixels. These blocks are then fed into a Siamese neural network for feature extraction, which is used in the K-means stage to detect duplicated areas. The proposed algorithm was evaluated using several public databases as well as a database developed by the author during this research. The evaluation results show that the proposed scheme provides higher accuracy than other previously proposed algorithms, being able to detect more than just splicing manipulations. The proposed method’s primary advantage is its capability to detect and localize manipulated and duplicated areas, unlike many existing methods based on neural networks, which only identify whether an image has been manipulated without providing specific information about the affected areas.
The system was assessed with six databases, providing an exhaustive evaluation with more images in different scenarios. Most existing methods are evaluated with at most three databases, so this broader evaluation is an advantage of the proposed method. The Realistic Tampering image database also consists of realistic image manipulations that are difficult to detect with the naked eye. In addition, our image database contains images taken in different places, with different cameras, and under different environmental conditions, such as changes in lighting, image size, and time of capture.
Optimizing the computational complexity of the proposed method based on Siamese neural networks is crucial to improving its performance and efficiency. Some preprocessing techniques could be considered in future work to optimize the performance of the proposed method. Principal component analysis (PCA) reduces the dimensionality of the data while retaining as much variance as possible, which can decrease the computational cost without losing relevant information. Normalizing the data to a specific range, usually [0, 1], can improve the model’s stability and performance during training. Using the halftone version of the image to normalize the data can help preserve the image’s pixel intensity features. This technique could lead to more efficient learning in the neural network.
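The two preprocessing ideas mentioned above, normalization to [0, 1] and PCA, can be sketched as follows. This is an illustration of possible future work and not part of the evaluated system: the function names, the random stand-in for 256 flattened 32 × 32 grayscale blocks, and the choice of 16 components are all assumptions, and the halftone variant is not shown.

```python
import numpy as np

def min_max_normalize(data):
    """Rescale values linearly so that they span the range [0, 1]."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(), data.max()
    if hi == lo:  # constant input: avoid division by zero
        return np.zeros_like(data)
    return (data - lo) / (hi - lo)

def pca_reduce(data, n_components):
    """Project rows of data onto their top principal components via SVD."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    _, singular, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular ** 2
    explained = float(variance[:n_components].sum() / variance.sum())
    return centered @ vt[:n_components].T, explained

# Stand-in data: 256 flattened 32 x 32 grayscale blocks with 8-bit values.
rng = np.random.default_rng(0)
flat_blocks = rng.integers(0, 256, size=(256, 32 * 32))

normalized = min_max_normalize(flat_blocks)
reduced, explained = pca_reduce(normalized, n_components=16)
print(reduced.shape, round(explained, 3))
```

The returned explained-variance ratio gives a quick check on how much information the reduced representation keeps, which would guide the choice of the number of components on real block data.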
Additionally, the development of future work related to the proposed method could focus on generalizing neural network training for robust feature learning by implementing algorithms that do not depend on a specific dataset or image, such as integrating image features in neural network feature maps to determine image manipulations. Neural network learning can be achieved through techniques that create more robust feature maps, eliminating the need to compare an altered image with the original image. On the other hand, less computationally complex neural network methodologies that provide high efficiency in detection and classification are necessary, thus improving overall model performance. Applying these techniques and approaches could create efficient and faster detectors of splicing manipulations.