Article

Enhancing Signature Verification Using Triplet Siamese Similarity Networks in Digital Documents

1 Department of Computer and Software Engineering, National University of Sciences and Technology, Islamabad 44080, Pakistan
2 School of Computer Science, University of Lincoln, Lincoln LN6 7DQ, UK
3 Department of Computer Science, HITEC University Taxila, Taxila 47040, Pakistan
4 Department of Artificial Intelligence and Data Science, Sejong University, Seoul 05006, Republic of Korea
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(17), 2757; https://doi.org/10.3390/math12172757
Submission received: 4 August 2024 / Revised: 31 August 2024 / Accepted: 4 September 2024 / Published: 5 September 2024
(This article belongs to the Section Fuzzy Sets, Systems and Decision Making)

Abstract

In contexts requiring user authentication, such as financial, legal, and administrative systems, signature verification is a pivotal biometric method, and handwritten signature verification stands out prominently for document authentication. Despite the effectiveness of triplet loss similarity networks in extracting and comparing signatures with forged samples, conventional deep learning models often inadequately capture individual writing styles, resulting in suboptimal performance. Addressing this limitation, our study employs a triplet loss Siamese similarity network for offline signature verification, irrespective of the author. Through experimentation on five publicly available signature datasets (4NSigComp2012, SigComp2011, 4NSigComp2010, BHsig260, and CEDAR), various distance measure techniques were evaluated alongside the triplet Siamese Similarity Network (tSSN). Our findings underscore the superiority of the tSSN approach, particularly when coupled with the Manhattan distance measure, in achieving enhanced verification accuracy, demonstrating its efficacy in scenarios characterized by close signature similarity.

1. Introduction

Biometric verification methods encompass a range of modalities such as irises, thumbprints, veins, faces, and signatures, serving diverse applications including financial transactions, attendance tracking, and official contract signing [1]. Initially, biometric identification relied on bodily measurements; however, evolving needs have necessitated the integration of multiple human bodily characteristics for authentication, with handwritten signatures being among the most prevalent. Yet, identifying individuals via signatures presents challenges, notably the potential for replication through practice [2]. This difficulty arises from the limited availability of training data and the subtle distinctions between authentic signatures and skilled forgeries.
Discerning between inter-personal forgery and intra-personal signature genuineness is particularly challenging due to minimal inter-class dissimilarities and significant intra-class variations [3]. Recent advances in pattern recognition and image processing have facilitated automated signature authentication [4]. However, the lack of dynamic information regarding the signature writing process hinders the development of effective feature extractors to differentiate genuine from forged signatures. In this context, deep learning (DL) models, notably the Siamese neural network, have emerged as a promising approach [5].
The Siamese neural network (SNN) transforms signature features into convolutional representations, enhancing handwriting verification performance [6]. Despite various offline verification approaches, none have decisively surpassed existing methods [7]. Addressing this, a triplet Siamese similarity network (tSSN) was proposed to authenticate signatures and was compared with conventional methods. Multiple distance measure techniques, including the Euclidean, Manhattan, and Minkowski distances, were evaluated within the triplet similarity network framework.
Our major contributions include:
  • A significant exploration of signature authentication methods.
  • Introduction of a triplet Siamese similarity network (tSSN) for signature validation, emphasizing its superiority over traditional approaches.
  • Evaluation of various distance measure techniques within the triplet similarity network paradigm, enhancing the robustness of signature verification.
The study is structured as follows: Section 2 offers a comprehensive literature review of handwritten signature verification. Section 3 and Section 4 delineate the proposed technique and present the corresponding experimental results, respectively. Finally, Section 5 summarizes the conclusions drawn from this study and outlines avenues for future research.

2. Literature Review

Signature verification, a form of biometric authentication, holds significant importance in sectors such as banking, judiciary, and administration, where user authentication is paramount. Much like fingerprints, each individual’s signature is inherently unique. However, despite this uniqueness, the possibility of signature replication through repeated attempts exists. To mitigate such fraudulent practices and effectively discern genuine signatures from forgeries, deep learning and machine learning techniques have been leveraged. Both online and offline authentication methods have been explored in extant literature.
Ghosh et al. (2021) introduced a spatial-temporal adaptation-based Siamese neural network, employing Long Short-Term Memory (LSTM) and 1-D Convolutional Neural Network (1-D CNN) architectures to extract spatial features [8]. This network effectively integrated temporal information, contributing to its widespread adoption in signature verification tasks due to its superior classification performance with limited input data [9]. Tolosana et al. (2021) proposed a recurrent neural network solution specifically tailored for online signature verification, demonstrating improved performance in distinguishing genuine signatures from forgeries when combined with dynamic time warping [10].
In a complementary study, Jain et al. (2020) introduced an architectural-independent CNN-based approach characterized by a Shallow CNN (sCNN) architecture, which exhibited reduced training and testing times owing to its streamlined parameterization while yielding enhanced verification outcomes [11]. Building upon this, Jagtap et al. (2019) proposed a combined CNN and Siamese network framework, augmenting signature representations with statistical features to bolster the accuracy of distinguishing real signatures from counterfeit ones [12].
Moreover, researchers have explored the efficacy of Recurrent Neural Network (RNN) architectures for both online and offline signature verification tasks. Tolosana et al. (2017) demonstrated the superiority of RNN-based approaches, particularly LSTM and Bidirectional LSTM (BLSTM) models, in online verification scenarios [13]. Ruiz et al. (2020) extended this inquiry to offline verification, highlighting the comparative advantages of RNN models over traditional CNN architectures [14]. A direct comparison of dedicated RNN and CNN models likewise favored the LSTM- and BLSTM-based RNNs over the CNN model for offline signature verification and identification [15].
Furthermore, Chakladar et al. (2021) introduced a multimodal Siamese neural network (mSNN) that integrated EEG signals with offline signature samples, facilitating user identification through the extraction of distinct temporal and spatial characteristics [16]. Their findings underscored the efficacy of the Siamese network in capturing the nuanced similarities and dissimilarities between signatures.
However, challenges persist, particularly concerning the scarcity of training data, which can impede the accuracy of signature verification systems [17]. Addressing this limitation, Yapıcı et al. (2021) proposed a novel application of data augmentation techniques, specifically employing Cycle-GAN, to enhance the robustness of CNN-based models for offline signature verification. They evaluated the data augmentation techniques on the widely used CNN-based VGG19, VGG16, DenseNet121, and ResNet50 models. Experiments revealed that the signature augmentation technique improved the performance of standalone CNN models for offline signature verification.
Additionally, Tahir et al. (2021) emphasized the exploration of geometric features for offline verification, using Aspect Ratio (AR), Baseline Slope Angle (BSA), Normalized Area (NA), and extraction of the Center of Gravity [18]. Using a Recurrent Neural Network (RNN) combined with a Siamese network, the authors of [19] aimed to discover metric differences between signature image pairings. They also experimented with LSTM and Gated Recurrent Unit (GRU) units within the Siamese framework to determine the efficacy of various network structures, concluding that the Siamese network performed best for online verification. Meanwhile, Hefny and Moustafa (2020) developed an online verification system based on deep learning techniques, utilizing Legendre polynomial coefficients as discriminative features and observing reduced error rates and improved accuracy in their experiments on the SigComp2011 dataset [20].

3. Proposed Methodology

CNNs are among the most widely used deep learning architectures and are applied in various domains, particularly computer vision. This paper examines the effectiveness of a tSSN by comparing this network to a Siamese-based CNN network. The proposed technique is depicted in Figure 1.
Three signature samples were used separately as input to a customized triplet loss CNN named t-Net. The triplet loss function was used in our methodology because it enforces a gap between similar and dissimilar signatures, which makes it effective for signature verification, where signature samples are often very closely related. The Euclidean distance was computed between the signature embeddings, and the resulting distances were passed on to the similarity network. This similarity network was a two-layered network that classified the input as a real or forged signature.

3.1. Pre-Processing

The initial phase of our methodology involved preprocessing. In this stage, the “bounding box” method was applied to the image sets, wherein a rectangular area was delineated around the Region of Interest (ROI). This technique effectively isolated the signature portion while excluding extraneous elements, streamlining subsequent processing by reducing the computational load. The signature was then converted into grayscale, and the pixel values were normalized by dividing them by their standard deviation. Finally, each image was resized to a standard 256 × 256 dimension. Preprocessing was performed on the whole dataset. Figure 2 depicts the preprocessing procedures.
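As a concrete illustration of these steps, the following is a minimal sketch of the preprocessing pipeline, assuming OpenCV and NumPy are available; the Otsu threshold used to locate the bounding box is an illustrative choice not specified in the paper.

```python
import cv2
import numpy as np

def preprocess_signature(path, size=(256, 256)):
    """Crop the signature ROI, convert to grayscale, resize, and normalize."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # Bounding box around the ink: threshold (Otsu, assumed) and keep the tight ROI.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    ys, xs = np.nonzero(binary)
    if xs.size > 0:  # guard against blank scans
        gray = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Resize to the fixed 256 x 256 input dimension.
    roi = cv2.resize(gray, size, interpolation=cv2.INTER_AREA).astype(np.float32)

    # Normalize pixel values by dividing by their standard deviation.
    roi = roi / (roi.std() + 1e-8)
    return roi[..., np.newaxis]  # add a channel axis for the CNN
```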

3.2. Convolutional Neural Network and t-Siamese Network

The proposed CNN was a lightweight architecture comprising 10 layers, as illustrated in Figure 3. It included two convolutional layers, a batch normalization layer, an average pooling layer, a dropout layer, a flatten layer, and a fully connected dense layer. The convolutional layers performed feature extraction by convolving a weighted kernel. Non-linearity in the feature map was introduced through the application of a nonlinear layer. The pooling layer reduced the spatial resolution by replacing neighborhoods with their statistical information. Each layer of the CNN established a local connection between the input and the output. The ReLU activation function was employed throughout the CNN architecture. Padding for each layer was set to ‘valid’. Additionally, each convolutional layer block was followed by an average pooling layer with a stride of 2.
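A minimal Keras sketch of such an embedding CNN is shown below; the filter counts, kernel size, dropout rate, and the 128-dimensional output (the embedding size reported in Section 4.3.4) are illustrative assumptions rather than the exact configuration of Figure 3.

```python
from tensorflow.keras import layers, models

def build_t_net(input_shape=(256, 256, 1), embedding_dim=128):
    """Lightweight embedding CNN (t-Net): conv blocks with 'valid' padding and ReLU,
    each followed by batch normalization, average pooling with stride 2 and dropout,
    then a flatten layer and a dense embedding layer."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64):  # filter counts are assumptions
        x = layers.Conv2D(filters, 3, padding='valid', activation='relu')(x)
        x = layers.BatchNormalization()(x)
        x = layers.AveragePooling2D(pool_size=2, strides=2)(x)
        x = layers.Dropout(0.25)(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(embedding_dim, activation='relu')(x)
    return models.Model(inputs, outputs, name='t_net')
```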
The tSSN consisted of three identical subnets: three CNNs with the same configuration, combined with a distance metric. In the training process, each data instance contained three signatures: an anchor ($S$), a positive ($S^{+}$), and a negative ($S^{-}$), where the anchor and positive signatures were genuine signatures of a person and the negative signature was one of his forged signatures. The triplet $(S, S^{+}, S^{-})$ was passed through the t-Siamese-based CNN, which mapped the signatures into an embedding space $(D_t(S), D_t(S^{+}), D_t(S^{-}))$ through the dense layer of the t-Siamese CNN. In this triplet structure, signatures from different classes were expected to form tightly grouped, well-separated clusters. The tSSN had a t-Net that sequentially extracted features of the $S$, $S^{+}$, and $S^{-}$ images. The Euclidean distances between $S$ and $S^{+}$, and between $S$ and $S^{-}$, can be computed as

$$dist_{+} = \lVert D_t(S) - D_t(S^{+}) \rVert$$

and

$$dist_{-} = \lVert D_t(S) - D_t(S^{-}) \rVert$$

The loss of the tSSN model can be computed from these distances, which in turn yields the similarity score. This score was transformed into the range 0 to 1 by the similarity network. The similarity of $(S, S^{+})$ was expected to be larger than that of $(S, S^{-})$.
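The sketch below wires three inputs through one shared t-Net and computes $dist_{+}$ and $dist_{-}$ with a Lambda layer; it is a minimal illustration that assumes the hypothetical `build_t_net` helper above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_tssn(t_net, input_shape=(256, 256, 1)):
    """Triplet Siamese network: a single shared t-Net embeds the anchor (S),
    positive (S+), and negative (S-) signatures, and the Euclidean distances
    dist+ = ||Dt(S) - Dt(S+)|| and dist- = ||Dt(S) - Dt(S-)|| are returned."""
    anchor = layers.Input(shape=input_shape, name='anchor')
    positive = layers.Input(shape=input_shape, name='positive')
    negative = layers.Input(shape=input_shape, name='negative')

    e_a, e_p, e_n = t_net(anchor), t_net(positive), t_net(negative)

    # Stack [dist+, dist-] so they can be fed to the loss and the similarity network.
    dists = layers.Lambda(lambda e: tf.stack(
        [tf.norm(e[0] - e[1], axis=-1),    # dist+
         tf.norm(e[0] - e[2], axis=-1)],   # dist-
        axis=-1))([e_a, e_p, e_n])

    return models.Model([anchor, positive, negative], dists, name='tssn')
```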

3.3. Triplet Loss Function

The tSSN model uses a triplet loss function to achieve better performance, which means it increases the distance between inter-class signatures while decreasing the distance between intra-class signatures. In the training process, the triplet loss function was used, which is given as
$$L(S, S^{+}, S^{-}) = \max\left(0,\; \alpha + dist_{+} - dist_{-}\right)$$

where $\alpha$ is the margin, set to 0.4, $dist_{+}$ is the distance between the anchor and the positive sample, and $dist_{-}$ is the distance between the anchor and the negative sample. The value of $\alpha$ can vary between 0 and 1; higher values of $\alpha$ force the embedding clusters to be tighter and further apart. If there are $n$ training triplets, the total loss is

$$I(L(S, S^{+}, S^{-})) = \sum_{i=1}^{n} L\left(S_i, S_i^{+}, S_i^{-}\right)$$

The triplet loss function can be interpreted as:

$$L = \begin{cases} 0, & \text{the two distances are far enough apart} \\ L(S, S^{+}, S^{-}), & \text{the two distances are not far enough apart} \end{cases}$$

During testing, the two distances were compared to assess the authenticity of the unknown signature using:

$$dist_{-} \geq dist_{+} + \alpha$$
These measured distances were forwarded to the similarity network as a feature set. The two-layered similarity network classified the image as a real or forged signature while incorporating the inter-class and intra-class distances among the signatures. Because the tSSN architecture outperformed both the standalone Convolutional Neural Network and the Siamese network, it was adopted as the preferred architecture.
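A minimal sketch of the triplet loss with the margin α = 0.4 and a two-layer similarity head is given below; the hidden width of the similarity network is an assumption, since the paper only states that it has two layers.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

ALPHA = 0.4  # margin used in the paper

def triplet_loss(dists, alpha=ALPHA):
    """L(S, S+, S-) = max(0, alpha + dist+ - dist-), averaged over the batch.
    `dists` is the (batch, 2) tensor [dist+, dist-] produced by the tSSN."""
    dist_pos, dist_neg = dists[:, 0], dists[:, 1]
    return tf.reduce_mean(tf.maximum(0.0, alpha + dist_pos - dist_neg))

def build_similarity_network(hidden_units=16):
    """Two-layer head mapping the measured distances to a genuine/forged score in [0, 1].
    The hidden width of 16 units is an assumption."""
    return models.Sequential([
        layers.Dense(hidden_units, activation='relu', input_shape=(2,)),
        layers.Dense(1, activation='sigmoid'),
    ], name='similarity_network')
```

When compiling the triplet network in Keras, the loss can be wrapped as `lambda y_true, y_pred: triplet_loss(y_pred)`, since it depends only on the predicted distances; the test-time rule $dist_{-} \geq dist_{+} + \alpha$ then corresponds to `dist_neg - dist_pos >= ALPHA` on the same tensors.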

4. Experiments

The efficacy of the proposed methodology was evaluated through a comprehensive series of experiments. The implementation was conducted in Python 3.7.10 utilizing the Keras deep learning API. Training of the model was executed on an Intel i7 1.6 GHz CPU with 16 GB of RAM (Intel, Santa Clara, CA, USA), augmented by a GPU card (NVidia GeForce MX330, Nvidia, Santa Clara, CA, USA).

4.1. Datasets

For experimentation, five different datasets were used: BHSig260, SigComp2011, 4NSigComp2010, 4NSigComp2012, and CEDAR. The training and testing split was 60:40 throughout our experiments. The BHSig260 signature dataset consists of the signatures of 260 people, a mixture of Hindi and Bengali signatures, comprising 100 Bengali and 160 Hindi signers. SigComp2011 was published at the ICDAR 2011 conference for the International Signature Verification Competition (SVC) [21]; it contains both Chinese and Dutch authors. The 4NSigComp2010 dataset contains images of offline signatures [22]; its collection contains 209 images, with 9 genuine signatures and 200 forgeries of the same author. The 4NSigComp2012 dataset's training set is comprised of the training and test sets of 4NSigComp2010 [23]; it contains the signatures of two writers as examples and has the same configuration as 4NSigComp2010. The CEDAR dataset comprises signatures of 55 English writers, with half forged and half genuine samples for each person [24]. Samples of the signatures are given in Figure 4.

4.2. Evaluation Parameters

The model's performance was evaluated using accuracy, False Acceptance Rate (FAR), and False Rejection Rate (FRR). The FAR is the proportion of forged (negative) samples that are falsely accepted as genuine, while the FRR is the proportion of genuine (positive) samples that are falsely rejected. A lower FAR and FRR and a higher accuracy indicate better performance. These metrics are calculated as:

$$ACC = \frac{\text{True positive} + \text{True negative}}{\text{True positive} + \text{True negative} + \text{False positive} + \text{False negative}} \times 100$$

$$FAR = \frac{\text{False positive}}{\text{False positive} + \text{True negative}} \times 100$$

$$FRR = \frac{\text{False negative}}{\text{False negative} + \text{True positive}} \times 100$$

For more detailed results, the recall and precision have also been calculated for the proposed methodology. Recall denotes the proportion of genuine signatures correctly predicted as genuine out of all actual genuine cases, while precision measures the proportion of signatures predicted as genuine that are actually genuine. The F-1 score is the harmonic mean of recall and precision.

$$Recall = \frac{\text{True positive}}{\text{True positive} + \text{False negative}}$$

$$Prec = \frac{\text{True positive}}{\text{True positive} + \text{False positive}}$$

$$F1 = \frac{2 \times Prec \times Recall}{Prec + Recall}$$
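These definitions translate directly into code; the short sketch below computes all six metrics from raw confusion-matrix counts.

```python
def verification_metrics(tp, tn, fp, fn):
    """Accuracy, FAR, FRR (in %), plus recall, precision, and F-1 from confusion counts."""
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    far = 100.0 * fp / (fp + tn)   # forged signatures accepted as genuine
    frr = 100.0 * fn / (fn + tp)   # genuine signatures rejected as forged
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, far, frr, recall, precision, f1
```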

4.3. Results and Discussion

The proposed methodology was tested on all datasets. Table 1 reports the results of the tSSN across several performance metrics and shows that the proposed methodology performs strongly. The following experiments were conducted using the Euclidean distance.
The proposed methodology was also tested under different transformations, namely rotation, scaling, and flipping. The testing samples are shown in Figure 5. Images were scaled (zoomed) by a factor of 0.3, rotated by 90 degrees, and horizontally flipped; each transformation was applied to all the testing images.
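A minimal sketch of these test-time transformations is shown below; the central-crop-and-resize approximation of the 0.3 zoom is an assumption, since the paper does not state the exact implementation.

```python
import numpy as np
import tensorflow as tf

def robustness_variants(img):
    """Return the rotated, flipped, and scaled variants of a (256, 256, 1) test image."""
    rotated = np.rot90(img, k=1)   # 90-degree rotation
    flipped = np.fliplr(img)       # horizontal flip
    # Approximate a 0.3 zoom by cropping the central 70% and resizing back (assumption).
    zoomed = tf.image.resize(tf.image.central_crop(img, central_fraction=0.7),
                             img.shape[:2]).numpy()
    return rotated, flipped, zoomed
```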
The accuracy results are given in Table 2, which indicates that accuracy drops when the testing images are transformed. Nevertheless, the model still distinguished between real and forged signatures, with accuracies ranging from roughly 52% to 66%.
This study recognizes the decline in performance observed under certain image transformations but does not extensively explore techniques to mitigate these effects. Because signature images in real-life situations often undergo various distortions, it is important to investigate techniques that improve the model's robustness to such transformations. Developing and applying such techniques could substantially strengthen the model, resulting in more dependable performance in practical situations where ideal conditions are not always present.

4.3.1. Comparison with Other Standalone Models

A comparison between the tSSN and other standalone models on the four public datasets is presented. The standalone CNN model carries nine layers, which include two convolutional, two batch normalization, two average pooling, two dropout, one flatten, and one output layer, as shown in Figure 3. The Siamese network is a twin network that shares the same weights and architecture; its structure is shown in Figure 6. Features extracted by the CNN model were forwarded to a lambda layer, and the distance between signature features was calculated using the Euclidean technique.
Table 3 shows the results of the tSSN, the standalone CNN, and the Siamese CNN. Using the Euclidean-distance-based tSSN, trials were conducted independently for each dataset. In addition, pairs of genuine and fake signatures were used to evaluate the accuracy of the proposed methodology and to compare the similarity between the two signatures. For training on each dataset, random pairs of signatures were chosen from that dataset.

4.3.2. Comparison with Different Loss Functions

The signature verification method is a binary classification problem with two classes: real or forged. The loss function plays a critical role in the optimization process. In the tSSN architecture, the triplet loss function was used throughout the experimentation; to check the response of the proposed methodology to different loss functions, the contrastive loss and binary cross-entropy loss were compared with the triplet loss in this paper. The contrastive loss works on a similar strategy to the triplet loss. It uses pairs of samples, which can be $(S, S^{+})$ or $(S, S^{-})$. If the samples are $S$ and $S^{+}$, they are pulled towards each other; otherwise, the distance between them is increased. The contrastive loss is given as

$$Loss = \sum_{i=1}^{b} (1 - y)\,\lVert f(S_i) - f(S_i^{+}) \rVert_2^2 + y\,\lVert f(S_i) - f(S_i^{-}) \rVert_2^2 + \alpha$$

where $y = 0$ when $(S_i, S_i^{+})$ is an anchor-positive pair, and $y = 1$ when $(S_i, S_i^{-})$ is an anchor-negative pair. In other words, the contrastive loss acts like the triplet loss but on one pair at a time rather than on both simultaneously. The triplet loss performs better than the contrastive loss because the triplet inputs guarantee a better distance margin than the pairwise contrastive loss, and it also handles imbalanced datasets better. Binary cross-entropy, or log loss, computes the average difference between the actual and predicted probability distributions for class 1; the score is minimized, and a perfect cross-entropy value is 0. Its probabilistic interpretation measures how well the predicted probabilities align with the true binary labels. Because signature forgery detection is a binary classification problem, the binary cross-entropy loss was evaluated as a baseline in place of the triplet mechanism. However, its performance drops when the dataset is imbalanced, and it overlooks intra-class and inter-class differences. The performance comparison of these loss functions is given in Table 4. All distances in this experimentation were measured using the Euclidean distance.
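For reference, the sketch below shows the standard margin form of the contrastive loss and the Keras binary cross-entropy baseline; the exact contrastive formulation used in the paper may differ slightly from this common variant.

```python
import tensorflow as tf

def contrastive_loss(y, dist, margin=0.4):
    """Standard pairwise contrastive loss: genuine pairs (y = 0) are pulled together,
    forged pairs (y = 1) are pushed at least `margin` apart."""
    y = tf.cast(y, tf.float32)
    return tf.reduce_mean((1.0 - y) * tf.square(dist) +
                          y * tf.square(tf.maximum(margin - dist, 0.0)))

# Binary cross-entropy baseline applied to the similarity network's sigmoid output.
bce = tf.keras.losses.BinaryCrossentropy()
```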

4.3.3. Comparison with Several Pre-Trained Models

CNN baseline models have been demonstrated to outperform other methods in a wide variety of image-processing applications. However, training these models from scratch on small signature datasets is impractical. In such cases, deploying pre-trained models through the Transfer Learning (TL) approach can be beneficial. In TL, the knowledge a deep learning model has learned from a large dataset is used to tackle a related task with a smaller dataset [25]. Four models pre-trained on ImageNet, namely VGG16, ResNet50, MobileNet, and EfficientNetB2 [26], were used for comparison with the tSSN model in classifying genuine and forged signatures. The final layer in these models was removed and replaced with a new FC layer with an output size of two to represent the binary classes (genuine and forged). Only this last FC layer was trained, while the other layers retained their pre-learned weights. The VGG16, ResNet50, MobileNet, and EfficientNetB2 models have 16, 50, 28, and 237 layers, respectively, whereas the tSSN model has 12 layers, including the layers of the similarity network, and has far fewer parameters than these state-of-the-art models. This experimentation was performed on four signature datasets to validate the performance of the proposed methodology. The accuracy comparison is given in Table 5.
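A minimal transfer-learning sketch along these lines is shown below; the global-average-pooling step and the frozen backbone are assumptions, as the paper only states that the final layer was replaced with a two-output FC layer on top of pre-learned weights.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_pretrained_baseline(backbone=VGG16, input_shape=(256, 256, 3)):
    """ImageNet-pretrained baseline with a new 2-way FC head (genuine vs. forged)."""
    base = backbone(weights='imagenet', include_top=False, input_shape=input_shape)
    base.trainable = False  # only the new head is trained (assumed)
    x = layers.GlobalAveragePooling2D()(base.output)
    outputs = layers.Dense(2, activation='softmax')(x)
    return models.Model(base.input, outputs)
```

The same helper can be reused for the other backbones by passing, for example, `ResNet50` or `MobileNet` from `tensorflow.keras.applications`.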

4.3.4. Comparison with Machine Learning Classifiers

In this study, a 128-dimensional feature vector was extracted from the triplet Siamese Similarity Network (tSSN) and fed to a two-layer neural network, termed the similarity network, for classification. Our methodology was benchmarked against conventional machine learning algorithms, including the Support Vector Machine (SVM), Logistic Regression (LR), and Gradient Boosting (XGBoost) classifiers. A comparative analysis of classification accuracies is presented in Table 6, demonstrating the superior performance of our proposed model over these machine learning classifiers.
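The comparison can be reproduced along the following lines, training each classical classifier on the 128-dimensional tSSN embeddings; the hyperparameters shown are library defaults, not the paper's settings.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

def fit_baseline_classifiers(train_embeddings, train_labels):
    """Train SVM, LR, and XGBoost on (n_samples, 128) tSSN embeddings."""
    classifiers = {
        'SVM': SVC(kernel='rbf'),
        'LR': LogisticRegression(max_iter=1000),
        'XGBoost': XGBClassifier(eval_metric='logloss'),
    }
    for clf in classifiers.values():
        clf.fit(train_embeddings, train_labels)
    return classifiers
```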

4.3.5. Comparison with Several Distance Measure Techniques

The common practice for comparing two signature samples is to measure the distance between them. In our experimentation, three distance measures, namely the Euclidean, Manhattan, and Minkowski distances, were evaluated for our methodology.
The Euclidean distance is defined as:

$$D_e = \sqrt{(S_{x1} - S_{x2})^2 + (S_{y1} - S_{y2})^2}$$

where $S_1$ and $S_2$ are the signature samples between which the distance is measured. The Manhattan distance is given as

$$D_m = \lvert S_{x1} - S_{x2} \rvert + \lvert S_{y1} - S_{y2} \rvert$$

The Minkowski distance is defined as:

$$D_{min} = \left( \sum_{i=1}^{n} \lvert S_{x_i} - S_{y_i} \rvert^{p} \right)^{1/p}$$

The Minkowski distance is the generalized form of the Euclidean ($p = 2$) and Manhattan ($p = 1$) distances; here it was evaluated with $p = 3$. The proposed methodology was tested on the five datasets using these distance measure techniques, and the accuracy, FAR, and FRR values are shown in Figure 7. The results with the Manhattan distance were better than those with the Euclidean and Minkowski distances.
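The three measures reduce to the following short helper over two embedding vectors; Minkowski is shown with p = 3 as used in our comparison.

```python
import numpy as np

def embedding_distance(e1, e2, metric='manhattan', p=3):
    """Distance between two signature embeddings under the compared measures."""
    diff = np.abs(np.asarray(e1, dtype=float) - np.asarray(e2, dtype=float))
    if metric == 'euclidean':
        return float(np.sqrt(np.sum(diff ** 2)))
    if metric == 'manhattan':
        return float(np.sum(diff))
    if metric == 'minkowski':
        return float(np.sum(diff ** p) ** (1.0 / p))
    raise ValueError(f'unknown metric: {metric}')
```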
The tSSN model was specifically created to provide a strong and efficient solution for detecting signature fraud. It does this by accurately capturing the complex spatial and temporal characteristics that are naturally present in handwritten signatures. The design of the model enables it to assess minute deviations in stroke patterns, pressure, and time, accurately differentiating between authentic and counterfeit signatures. This feature is very advantageous in practical scenarios when it is necessary to promptly and dependably confirm the genuineness of a signature. The tSSN model is known for its ability to strike a good balance between computing efficiency and detection performance. This makes it particularly suitable for situations where real-time processing is crucial, such as in financial transactions, legal document verification, and identity authentication systems.
Nevertheless, the effectiveness of the model might fluctuate based on several aspects, such as the intricacy and variety of the signature collection, the magnitude and complexity of the network, and the computing resources at hand. In low-resource contexts, it may be necessary to optimize the model by lowering its size or using more efficient techniques in order to ensure real-time performance while still maintaining accuracy. Moreover, adjusting the model’s parameters to match the unique attributes of the target dataset might further improve its effectiveness and ability to detect. Overall, while the tSSN model shows potential for detecting signature fraud, its actual use in various operational scenarios may need careful consideration.

4.3.6. Comparison with Previous Works

The proposed methodology was compared with other existing methods on different datasets, as depicted in Table 7. The results were generated on these datasets using various existing techniques; for our method, the tSSN with the Euclidean distance was used. The findings demonstrate that the proposed methodology outperformed the existing approaches, as evidenced by the empirical results (see Table 7).

4.3.7. Validating Performance of the Selected Module

In this sub-section, an ablation study conducted to rigorously assess the efficacy of the chosen module within the framework of our proposed architecture is presented. This empirical analysis serves to validate the robustness and efficacy of our architectural design, offering insights into the critical components driving its performance. To assess the contribution of each module in our proposed network, specific components from the original framework were removed. Specifically, the similarity network, t-Net, and triplet loss function blocks were removed to evaluate the impacts on signature verification. The results of these ablation studies are reported in Table 8.
The findings indicate that removing any of the selected modules degrades performance across all datasets, with the largest drop when the triplet loss is removed, since it is responsible for increasing the inter-class distance. This suggests that our proposed mechanism effectively extracts discriminative information through the t-Net and the triplet loss function. Additionally, the similarity module yields a clear performance improvement, which highlights the effectiveness of our similarity network.

4.3.8. Validating Performance on Different Cross-Validations

The proposed methodology was also evaluated with K-fold cross-validation, which reduces overfitting and makes efficient use of the data. The results of 10-fold validation on the CEDAR dataset are presented in Table 9; in each fold, nine sets were used for training and one set for testing. The average accuracy, recall, precision, and F-1 score were 75.0, 75.3, 73.0, and 76.5, respectively, on the CEDAR dataset.
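A sketch of the 10-fold protocol is given below; `train_and_score` is a hypothetical helper that trains the tSSN on the training indices and returns its accuracy on the held-out fold.

```python
import numpy as np
from sklearn.model_selection import KFold

def kfold_accuracy(samples, labels, train_and_score, k=10, seed=0):
    """K-fold cross-validation: train on k-1 folds, test on the held-out fold."""
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    scores = [train_and_score(train_idx, test_idx)
              for train_idx, test_idx in kf.split(samples, labels)]
    return float(np.mean(scores)), scores
```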
Cross-validation on an unseen dataset was also conducted, in which training was performed on the SigComp2011 dataset and testing on the 4NSigComp2012 dataset. The average accuracy, recall, precision, and F-1 score in this case were 80.6, 82.7, 81.4, and 83.6, respectively. The study would benefit from a more thorough analysis of the situations in which the model fails or performs below its ideal level. Such an investigation is essential for recognizing the model's limitations and may play a key role in directing future enhancements. For example, understanding why the model struggles with certain kinds of signatures, such as those that exhibit significant variation or are affected by noise, may help identify particular aspects of the model's structure or training methods that need improvement. Through a comprehensive analysis of these failure cases, researchers could obtain useful insights enabling specific modifications, eventually resulting in a model that is more resilient and dependable.
Furthermore, examining subpar performance might reveal certain circumstances in which the model would need further adjustment. For instance, if there are ongoing problems in dealing with certain ways of signing or if specific environmental conditions are causing issues, this may suggest the need for more advanced preprocessing methods or the incorporation of additional models to improve accuracy. By directly confronting these challenges, the study will not only enhance the existing model but also advance the creation of more thorough tactics for detecting signature fraud. By conducting a more thorough examination, the model may be enhanced to efficiently handle various real-world situations, hence improving its practical usefulness and dependability.
To tackle the problem of the model’s ability to apply to signatures that have not been seen before and to various demographic groups, the article provides a comprehensive assessment of the model’s performance over a wide range of signatures and demographic attributes. This was accomplished by implementing a set of examinations specifically intended to evaluate the model’s efficacy on signatures that were not included in the dataset used for training.

5. Conclusions

The objective of this study was to propose a methodology for effectively distinguishing between genuine and forged signatures by leveraging a similarity network with a triplet loss function. This approach was designed to enhance the discrimination of inter-class and intra-class differences. Extensive experimentation on publicly available datasets confirmed the efficacy of the proposed methodology. Compared to standalone similarity networks and CNN architectures, our proposed framework consistently achieved superior performance, particularly attributed to the triplet loss function’s ability to optimize inter-class disparities while minimizing intra-class variations. While ResNet marginally outperformed our methodology on one dataset, our approach demonstrated remarkable accuracy, surpassing 85% in discriminating between genuine and forged signatures across multiple datasets. Among various distance measures employed in the t-Siamese similarity network, the Manhattan distance technique emerged as the most effective, outperforming alternatives like Euclidean and Minkowski distance measures. Nonetheless, the classification accuracy diminished when subjected to image transformations such as rotation, scaling, and flipping during testing. Future research endeavors will focus on integrating handcrafted features into our approach to mitigate the impact of these transformations while reducing the complexity of the proposed architecture. This work will be extended to online datasets whose dynamic information is different from offline datasets.

Author Contributions

S.T.: Conceptualization, methodology, software, writing—original draft preparation. A.H.: validation, investigation, resources. F.R.: data curation, visualization, investigation. I.M.N.: writing—original draft preparation, supervision, visualization. N.L.F.: visualization, validation, supervision, funding, writing—review and editing. M.S.: conceptualization, methodology, supervision, funding, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the first/corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Christlein, V.; Bernecker, D.; Hönig, F.; Maier, A.; Angelopoulou, E. Writer Identification Using GMM Supervectors and Exemplar-SVMs. Pattern Recognit. 2017, 63, 258–267. [Google Scholar] [CrossRef]
  2. Maergner, P.; Pondenkandath, V.; Alberti, M.; Liwicki, M.; Riesen, K.; Ingold, R.; Fischer, A. Combining graph edit distance and triplet networks for offline signature verification. Pattern Recognit. Lett. 2019, 125, 527–533. [Google Scholar] [CrossRef]
  3. Nasir, I.M.; Raza, M.; Ulyah, S.M.; Shah, J.H.; Fitriyani, N.L.; Syafrudin, M. ENGA: Elastic Net-Based Genetic Algorithm for human action recognition. Expert Syst. Appl. 2023, 227, 120311. [Google Scholar] [CrossRef]
  4. Wu, D.; Luo, X.; Shang, M.; He, Y.; Wang, G.; Wu, X. A Data-Characteristic-Aware Latent Factor Model for Web Services QoS Prediction. IEEE Trans. Knowl. Data Eng. 2020, 34, 2525–2538. [Google Scholar] [CrossRef]
  5. Luo, X.; Qin, W.; Dong, A.; Sedraoui, K.; Zhou, M. Efficient and High-quality Recommendations via Momentum-incorporated Parallel Stochastic Gradient Descent-Based Learning. IEEE/CAA J. Autom. Sin. 2020, 8, 402–411. [Google Scholar] [CrossRef]
  6. Li, H.; Wei, P.; Hu, P. AVN: An Adversarial Variation Network Model for Handwritten Signature Verification. IEEE Trans. Multimed. 2021, 24, 594–608. [Google Scholar] [CrossRef]
  7. Liu, Z.; Luo, X.; Wang, Z. Convergence Analysis of Single Latent Factor-Dependent, Nonnegative, and Multiplicative Update-Based Nonnegative Latent Factor Models. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 1737–1749. [Google Scholar] [CrossRef]
  8. Ghosh, S.; Ghosh, S.; Kumar, P.; Scheme, E.; Roy, P.P. A novel spatio-temporal siamese network for 3d signature recognition. Pattern Recognit. Lett. 2021, 144, 13–20. [Google Scholar] [CrossRef]
  9. Jain, S.; Khanna, M.; Singh, A. Comparison among different CNN architectures for signature forgery detection using siamese neural network. In Proceedings of the 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), Greater Noida, India, 19–20 February 2021; pp. 481–486. [Google Scholar]
  10. Tolosana, R.; Vera-Rodriguez, R.; Fierrez, J.; Ortega-Garcia, J. DeepSign: Deep On-Line Signature Verification. IEEE Trans. Biom. Behav. Identit. Sci. 2021, 3, 229–239. [Google Scholar] [CrossRef]
  11. Jain, A.; Singh, S.K.; Singh, K.P. Handwritten signature verification using shallow convolutional neural network. Multimed. Tools Appl. 2020, 79, 19993–20018. [Google Scholar] [CrossRef]
  12. Jagtap, A.B.; Sawat, D.D.; Hegadi, R.S.; Hegadi, R.S. Siamese network for learning genuine and forged offline signature verification. In Recent Trends in Image Processing and Pattern Recognition: Second International Conference, RTIP2R 2018, Solapur, India, 21–22 December 2018, Revised Selected Papers, Part III 2; Springer: Singapore, 2019. [Google Scholar]
  13. Tolosana, R.; Vera-Rodriguez, R.; Fierrez, J.; Ortega-Garcia, J. Biometric signature verification using recurrent neural networks. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017. [Google Scholar]
  14. Ruiz, V.; Linares, I.; Sanchez, A.; Velez, J.F. Off-line handwritten signature verification using compositional synthetic generation of signatures and Siamese Neural Networks. Neurocomputing 2020, 374, 30–41. [Google Scholar] [CrossRef]
  15. Ghosh, R. A Recurrent Neural Network based deep learning model for offline signature verification and recognition system. Expert Syst. Appl. 2021, 168, 114249. [Google Scholar] [CrossRef]
  16. Chakladar, D.D.; Kumar, P.; Roy, P.P.; Dogra, D.P.; Scheme, E.; Chang, V. A multimodal-Siamese Neural Network (mSNN) for person verification using signatures and EEG. Inf. Fusion 2021, 71, 17–27. [Google Scholar] [CrossRef]
  17. Yapıcı, M.M.; Tekerek, A.; Topaloğlu, N. Deep learning-based data augmentation method and signature verification system for offline handwritten signature. Pattern Anal. Appl. 2021, 24, 165–179. [Google Scholar] [CrossRef]
  18. Tahir, N.M.T.; Ausat, A.N.; Bature, U.I.; Abubakar, K.A.; Gambo, I. Off-line Handwritten Signature Verification System: Artificial Neural Network Approach. Int. J. Intell. Syst. Appl. 2021, 13, 45–57. [Google Scholar] [CrossRef]
  19. Tolosana, R.; Vera-Rodriguez, R.; Fierrez, J.; Ortega-Garcia, J. Exploring Recurrent Neural Networks for On-Line Handwritten Signature Biometrics. IEEE Access 2018, 6, 5128–5138. [Google Scholar] [CrossRef]
  20. Hefny, A.; Moustafa, M. Online signature verification using deep learning and feature representation using Legendre polynomial coefficients. In The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2019) 4; Springer: Cham, Switzerland, 2020. [Google Scholar]
  21. Liwicki, M.; Malik, M.I.; Heuvel, C.E.v.D.; Chen, X.; Berger, C.; Stoel, R.; Blumenstein, M.; Found, B. Signature Verification Competition for Online and Offline Skilled Forgeries (SigComp2011). In Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 18–21 September 2011; pp. 1480–1484. [Google Scholar]
  22. Liwicki, M.; Heuvel, C.E.v.D.; Found, B.; Malik, M.I. Forensic signature verification competition 4NSigComp2010—Detection of simulated and disguised signatures. In Proceedings of the 2010 12th International Conference on Frontiers in Handwriting Recognition (ICFHR), Kolkata, India, 16–18 November 2010; pp. 715–720. [Google Scholar]
  23. Liwicki, M.; Malik, M.I.; Alewijnse, L.; Heuvel, E.v.D.; Found, B. ICFHR 2012 Competition on Automatic Forensic Signature Verification (4NsigComp 2012). In Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition (ICFHR 2012), Bari, Italy, 18–20 September 2012; pp. 823–828. [Google Scholar]
  24. Meroño-Peñuela, A.; Ashkpour, A.; Guéret, C.; Schlobach, S. CEDAR: The Dutch historical censuses as Linked Open Data. Semant. Web 2017, 8, 297–310. [Google Scholar] [CrossRef]
  25. Yu, K.; Guo, Z.; Shen, Y.; Wang, W.; Lin, J.C.-W.; Sato, T. Secure Artificial Intelligence of Things for Implicit Group Recommendations. IEEE Internet Things J. 2021, 9, 2698–2707. [Google Scholar] [CrossRef]
  26. Geirhos, R.; Rubisch, P.; Michaelis, C.; Bethge, M.; Wichmann, F.A.; Brendel, W. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv 2018, arXiv:1811.12231. [Google Scholar]
  27. Alvarez, G.; Sheffer, B.; Bryant, M. Offline Signature Verification with Convolutional Neural Networks; Technical Report; Stanford University: Stanford, CA, USA, 2016. [Google Scholar]
  28. Arisoy, M.V. Signature verification using siamese neural network one-shot learning. Int. J. Eng. Innov. Res. 2021, 3, 248–260. [Google Scholar] [CrossRef]
  29. Butt, U.M.; Masood, F.; Unnisa, Z.; Razzaq, S.; Dar, Z.; Azhar, S.; Abbas, I.; Ahmad, M. A deep insight into signature verification using deep neural network. In Intelligent Systems and Applications: Proceedings of the 2020 Intelligent Systems Conference (IntelliSys); Springer: Cham, Switzerland, 2021; Volume 3. [Google Scholar]
  30. Abdirahma, A.A.; Hashi, A.O.; Elmi, M.A.; Rodriguez, O.E.R. Advancing Handwritten Signature Verification Through Deep Learning: A Comprehensive Study and High-Precision Approach. Int. J. Eng. Trends Technol. 2024, 72, 81–91. [Google Scholar] [CrossRef]
  31. Dey, S.; Dutta, A.; Toledo, J.I.; Ghosh, S.K.; Lladós, J.; Pal, U. Signet: Convolutional siamese network for writer independent offline signature verification. arXiv 2017, arXiv:1707.02131. [Google Scholar]
  32. Xiong, Y.-J.; Cheng, S.-Y.; Ren, J.-X.; Zhang, Y.-J. Attention-based multiple siamese networks with primary representation guiding for offline signature verification. Int. J. Doc. Anal. Recognit. IJDAR 2024, 27, 195–208. [Google Scholar] [CrossRef]
  33. Ren, J.-X.; Xiong, Y.-J.; Zhan, H.; Huang, B. 2C2S: A two-channel and two-stream transformer based framework for offline signature verification. Eng. Appl. Artif. Intell. 2023, 118, 105639. [Google Scholar] [CrossRef]
  34. Das, S.D.; Ladia, H.; Kumar, V.; Mishra, S. Writer independent offline signature recognition using ensemble learning. arXiv 2019, arXiv:1901.06494. [Google Scholar]
Figure 1. Proposed solution for signature verification.
Figure 2. Preprocessing steps of the proposed methodology.
Figure 3. t-Siamese-based CNN model.
Figure 4. Sample images from the signature dataset.
Figure 5. Sample images of testing image transformations.
Figure 6. Siamese-based CNN model.
Figure 7. Comparison of (a) FAR, (b) FRR, and (c) ACC on selected datasets.
Table 1. Results of the proposed model on all datasets.

Dataset | ACC (%) | Recall (%) | Prec (%) | F-1 (%) | FAR | FRR
4NSigComp2010 | 90.1 | 88.6 | 87.4 | 87.9 | 4.3 | 4.5
SigComp2011 | 92.2 | 90.7 | 89.3 | 89.9 | 3.9 | 3.8
4NSigComp2012 | 93.5 | 95.6 | 91.8 | 93.6 | 3.7 | 3.1
BHsig260 | 91.5 | 89.4 | 94.6 | 91.9 | 4.1 | 4.3
CEDAR | 86.1 | 90.9 | 85.7 | 88.2 | 6.8 | 4.2
Table 2. Accuracy (%) analysis of dataset variations.

Dataset | Rotation ACC (%) | FAR | FRR | Flipping ACC (%) | FAR | FRR | Scaling ACC (%) | FAR | FRR
4NSigComp2010 | 65.5 | 9.7 | 8.6 | 63.9 | 10.5 | 9.9 | 59.1 | 9.3 | 9.0
SigComp2011 | 55.8 | 8.7 | 8.4 | 60.2 | 6.8 | 7.6 | 52.6 | 10.9 | 11.6
4NSigComp2012 | 63.4 | 9.6 | 8.9 | 64.7 | 9.1 | 9.3 | 62.7 | 10.7 | 10.3
BHsig260 | 61.9 | 10.1 | 9.4 | 54.3 | 11.9 | 10.8 | 60.8 | 8.5 | 7.8
CEDAR | 62.1 | 11.8 | 10.5 | 57.5 | 11.2 | 11.1 | 57.2 | 10.3 | 10.6
Table 3. T-Siamese, Siamese, and non-Siamese-based CNN models.

Dataset | CNN Acc | FAR | FRR | Siamese-based CNN Acc | FAR | FRR | tSSN Acc | FAR | FRR
4NSigComp2010 | 60.5 | 6.3 | 5.1 | 75.2 | 5.1 | 4.9 | 90.1 | 4.3 | 4.5
SigComp2011 | 52.4 | 5.9 | 4.1 | 65.9 | 4.5 | 3.6 | 92.2 | 3.9 | 3.8
4NSigComp2012 | 55.7 | 4.1 | 4.9 | 75.6 | 3.9 | 3.8 | 93.5 | 3.7 | 3.1
CEDAR | 50.8 | 5.7 | 5.9 | 60.3 | 5.1 | 4.7 | 86.1 | 6.8 | 4.2
Table 4. Accuracy (%) analysis of different loss functions.

Dataset | Triplet Loss | Contrastive Loss | Binary Cross-Entropy
4NSigComp2010 | 90.1 | 92.6 | 67.5
SigComp2011 | 92.2 | 76.1 | 70.2
4NSigComp2012 | 93.5 | 92.7 | 88.1
CEDAR | 86.1 | 73.8 | 75.7
Table 5. Accuracy (%) analysis of CNN models.

Dataset | tSSN | VGG16 | ResNet | MobileNet | EfficientNet
4NSigComp2010 | 90.1 | 61.6 | 77.5 | 72.0 | 74.8
SigComp2011 | 92.2 | 76.0 | 92.9 | 85.7 | 79.1
4NSigComp2012 | 93.5 | 70.3 | 82.1 | 86.5 | 79.8
CEDAR | 86.1 | 74.9 | 75.7 | 79.7 | 68.9
Table 6. Accuracy (%) analysis of ML classifiers.

Dataset | tSSN | SVM | LR | XGBoost
4NSigComp2010 | 90.1 | 78.1 | 61.5 | 70.4
SigComp2011 | 92.2 | 84.5 | 64.8 | 67.6
4NSigComp2012 | 93.5 | 73.3 | 67.1 | 72.9
CEDAR | 86.1 | 70.5 | 60.4 | 76.7
Table 7. Comparison of accuracy with other techniques.

Dataset | Method | Accuracy (%)
SigComp2011 | Signature Verification with CNN [27] | 84.0
SigComp2011 | One-shot Learning [28] | 90.1
SigComp2011 | tSSN | 93.3
4NSigComp2012 | Signature Verification using DNN [29] | 85.0
4NSigComp2012 | Advancing Handwritten Signature [30] | 86.2
4NSigComp2012 | tSSN | 91.6
BHsig260 | Convolutional Siamese Network [31] | 86.1
BHsig260 | Attention-based network [32] | 89.47
BHsig260 | 2C2S [33] | 90.68
BHsig260 | tSSN | 91.5
CEDAR | Ensemble Learning [34] | 92.0
CEDAR | Advancing Handwritten Signature [30] | 85.4
CEDAR | tSSN | 86.1
Table 8. Validating the performance of the selected modules on the proposed methodology.

Without Module | SigComp2011 | 4NSigComp2012 | BHsig260 | CEDAR
t-Net | 62.0 | 71.5 | 67.4 | 68.7
Triplet loss | 45.9 | 50.6 | 55.7 | 59.2
Similarity Network | 58.5 | 61.9 | 69.3 | 60.8
Proposed model with all modules | 92.2 | 93.5 | 91.5 | 86.1
Table 9. K-fold validation on the CEDAR signature dataset.

Testing Set | ACC (%) | Recall (%) | Prec (%) | F-1 (%)
1 | 75.1 | 72.2 | 73.9 | 70.5
2 | 72.3 | 73.4 | 70.4 | 78.3
3 | 74.8 | 73.8 | 72.5 | 76.2
4 | 70.2 | 69.7 | 68.4 | 74.3
5 | 74.4 | 71.5 | 72.6 | 75.9
6 | 71.9 | 68.4 | 69.9 | 73.1
7 | 70.7 | 73.7 | 68.1 | 65.9
8 | 80.6 | 82.6 | 78.3 | 81.7
9 | 79.6 | 83.3 | 77.0 | 83.3
10 | 81.1 | 84.5 | 79.3 | 86.0
Average | 75.07 | 75.3 | 73.0 | 76.5
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Tehsin, S.; Hassan, A.; Riaz, F.; Nasir, I.M.; Fitriyani, N.L.; Syafrudin, M. Enhancing Signature Verification Using Triplet Siamese Similarity Networks in Digital Documents. Mathematics 2024, 12, 2757. https://doi.org/10.3390/math12172757