Article

A Realistic Hand Image Composition Method for Palmprint ROI Embedding Attack

Licheng Yan, Lu Leng, Andrew Beng Jin Teoh and Cheonshik Kim
1 School of Software, Nanchang Hangkong University, 696 Fenghe Nan Avenue, Nanchang 330063, China
2 Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang 330063, China
3 School of Electrical and Electronic Engineering, College of Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120749, Republic of Korea
4 Department of Computer Engineering, Sejong University, Seoul 05006, Republic of Korea
* Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(4), 1369; https://doi.org/10.3390/app14041369
Submission received: 26 December 2023 / Revised: 27 January 2024 / Accepted: 30 January 2024 / Published: 7 February 2024
(This article belongs to the Special Issue Multimedia Systems Studies)

Abstract

Palmprint recognition (PPR) has recently garnered attention due to its robustness and accuracy. Many PPR methods rely on preprocessing the region of interest (ROI). However, the emergence of ROI attacks capable of generating synthetic ROI images poses a significant threat to PPR systems. Despite this, ROI attacks are less practical since PPR systems typically take hand images as input rather than just the ROI. Therefore, there is a pressing need for a method that specifically targets the system by composing hand images. The intuitive approach involves embedding an ROI into a hand image, a comparatively simpler process requiring less data than generating entirely synthetic images. However, embedding faces challenges, as the composited hand image must maintain a consistent color and texture. To overcome these challenges, we propose a training-free, end-to-end hand image composition method incorporating ROI harmonization and palm blending. The ROI harmonization process iteratively adjusts the ROI to seamlessly integrate with the hand using a modified style transfer method. Simultaneously, palm blending employs a pretrained inpainting model to composite a hand image with a continuous transition. Our results demonstrate that the proposed method achieves a high attack performance on the IITD and Tongji datasets, with the composited hand images exhibiting realistic visual quality.

1. Introduction

Palmprint recognition (PPR) has recently been widely studied due to its robustness and accuracy. A typical PPR system comprises two stages. Stage one is preprocessing, in which the palmprint region of interest (ROI) is located and extracted; stage two is recognition, which entails feature extraction, matching, and the final decision. ROI localization is indispensable to PPR because it determines which part of the palmprint is used for recognition, and it usually requires the input hand to satisfy certain conditions.
Given the vital role of ROI in PPR systems, ROI attacks aimed at spoofing and evading PPR systems [1,2,3] have emerged. While these attacks primarily target the palmprint ROI, their practicality may be limited, as PPR systems typically utilize hand images as input rather than the ROI. Consequently, synthesized hand images are required for impersonation attacks. Generally, two distinct approaches are employed to achieve hand image synthesis: generating a hand image based on the ROI and embedding the ROI into an existing hand image. The merit of the embedding method over the generating method stems from its reduced dependency on training data for a generator to yield realistic outcomes, rendering it a more practical choice.
Specifically, the embedding approach employs a hand image as a carrier and integrates the ROI into it, a procedure dubbed ROI embedding. After acquiring composited images through ROI embedding, these images can be employed to launch impersonation attacks. In this paper, we aim to formulate an ROI embedding method tailored to substantiate the impersonation attack through composited hands, namely a hand attack instead of synthesized ROI.
As illustrated in Figure 1, the components delineated by the black dashed line represent the palmprint recognition flow, constituting a PPR system that exclusively accepts the entire hand as a valid input. The ROI attack flow attempts to compromise the system with an ROI image, which is deemed less practical. Conversely, the hand attack flow poses a more substantial threat, enabling attacks through legitimate inputs.
The hand attack flow consists of a series of steps. Initially, the attacking ROI is acquired with the goal of compromising the target ROI. Subsequently, an arbitrary full-hand image is selected as the carrier hand. The third step involves utilizing an ROI embedding method to seamlessly integrate the attacking ROI into the chosen carrier hand. Ultimately, this process creates a composite hand, enabling the execution of the intended attack.
The ROI embedding is intuitive yet non-trivial, posing two principal challenges. The first challenge involves embedding localization, necessitating the precise placement of the ROI on the carrier hand at an appropriate scale. However, a single ROI position is not universally applicable to a hand, given the diverse preprocessing methods in PPR systems. The second challenge pertains to appearance consistency, wherein the attacking ROI's hue, luminance, and skin color must seamlessly integrate with the carrier hand. Moreover, the region around the embedding position in the carrier hand must exhibit continuous texture and natural transitions. Appearance consistency is an intuitive metric for ROI embedding, ensuring that the composited hand image attains desirable visual quality and fidelity, as illustrated in Figure 2. In this study, we focus on the second challenge, assuming the availability of a decent preprocessing method employed by the target PPR system.
This paper introduces an end-to-end, training-free approach for ROI embedding to achieve a realistic composition of hand images. Our methodology comprises two key components: ROI harmonization and palm blending. In the ROI harmonization phase, we employ a modified style transfer method to closely transform the attacking ROI’s appearance to resemble that of the carrier hand. This adaptation leverages an optimization method based on the pretrained Contrastive Arbitrary Style Transfer (CAST) network [4], which can achieve desirable results within a few iterations.
In the palm blending phase, the harmonized ROI image is warped to the embedding position on the carrier hand. Concurrently, the area surrounding the embedding position on the carrier hand is masked and then regenerated using a pretrained inpainting network to facilitate a natural transition. The ROI image of the carrier hand serves as the reference image for inpainting, capitalizing on the ability of the inpainting method [5] to fill the masked area with semantic information from a reference image without requiring additional training. Ultimately, this process yields a composited hand image with heightened appearance consistency.
In conclusion, our contributions are as follows.
(1) We present the first end-to-end, training-free approach for synthesizing hand images, extending the scope of ROI attacks to encompass hand attacks.
(2) To enhance the visual coherence of the composited hand images, we introduce an efficient and effective optimization method based on CAST and a pretrained reference-based inpainting technique.
(3) The experimental results underscore the efficacy of our proposed method. The resulting composited hand images exhibit a noteworthy attack success rate and demonstrate superior appearance consistency in several datasets.
The subsequent sections of this paper are structured as follows: Section 2 presents an overview of the relevant prior research. Section 3 delineates the designed hand image composition methods. Section 4 presents comprehensive experiments on the composition methods and the composited hand images. Lastly, Section 5 encompasses the conclusions drawn from the study and discusses potential avenues for future research.

2. Related Works

2.1. Palmprint Recognition

Zhang first introduced PalmCode [6] for PPR, which is characterized by four distinct steps: preprocessing, feature extraction, matching (or enrollment), and decision. Subsequent developments in PPR have predominantly focused on enhancing feature extraction, matching, and decision-making. These advancements include approaches based on handcrafted methodologies [7,8,9,10] as well as those rooted in deep learning [11,12,13,14].
A competitive coding scheme [7] has been introduced to enhance the discriminative features, selecting the highest-response direction order as a feature from six-directional Gabor features. Building upon this competition scheme, the work in [8] employs a Modified Finite Radon Transform (MFRAT) in place of the Gabor filters to extract more robust features. Another paper [9] enhances the discriminative nature of the competitive feature by incorporating an additional adjacent direction matrix. To leverage more comprehensive information, ref. [10] employs six-directional Gabor features and utilizes the average match score of these features to inform the decision-making process.
In the realm of deep-learning-based approaches, several studies [11,12] have integrated the competition mechanism with multi-scale learnable Gabor kernels to enhance feature extraction effectively. Furthermore, the work [13] employs a hashing method for extracting concise features, while [14] introduces a single convolution layer to extract multi-directional features. These PPR methods are dedicated to exploring discriminative features within the palmprint ROI.
Conversely, other studies have focused on refining preprocessing methods [15,16,17,18,19,20], directing their efforts toward devising universal and effective techniques for accurately locating palmprint ROIs. The majority of ROI localization methods are based on identifying finger valley points, exemplified by [15], which employs Convolutional Neural Networks (CNNs) for hand classification and segmentation; [16], which employs an improved active shape model; [17], which utilizes a hand landmark detector model; and [18], which integrates a segmentation network and two regression networks. Other methodologies directly predict the ROI bounding box, including [19], which incorporates a finger classifier and transfer learning, and [20], which employs a regression network to simultaneously detect palmprint and palm vein ROIs.
A plethora of diverse methods for localizing ROI lead to variations in the obtained ROI outcomes. Recognition results, derived from matching and decision processes, are intricately dependent on preprocessing and feature extraction outcomes. Therefore, accurate knowledge of the ROI position is crucial to the efficacy of hand attacks. The successful execution of a hand attack appears unattainable without prior knowledge of the target preprocessing method, given the diverse nature of preprocessing techniques. Consequently, our research is grounded in the assumption that knowledge of the target preprocessing method is available.

2.2. Attacks on Palmprint

Given the popularity of palmprint ROI recognition methods, there has been an emergence of ROI attacks aimed at spoofing and evading these recognition systems. One such approach is the False Acceptance Attack proposed by Wang [1]. This method leverages Generative Adversarial Networks (GANs) to generate a substantial volume of synthetic palmprint ROI images. Subsequently, a k-means algorithm is employed to eliminate indistinguishable images, constituting an attack strategy against the recognition system.
Sun has introduced two reinforcement strategies for PPR attacks [2], employing iterative algorithms to modify a given ROI image. The objective is to enhance the similarity between the modified ROI and target features, thus subverting the PPR system. On the other hand, Yang utilizes a style-transfer method to generate a realistic appearance based on ROI features to attack the system [3]. Notably, these attack methodologies primarily target the palmprint ROI, but their efficacy may be limited in practical environments as they typically do not successfully navigate through the preprocessing stage within a PPR system.

2.3. Image Composition

Image composition, a classical task in image processing, encompasses subtasks such as object placement, image harmonization, and image blending [21]. Object placement positions the foreground on the background at a suitable position, size, and orientation: the authors of [22] use two generative modules and a spatial transformation network to decide the placement position, while [23] locates the position with a bounding box computed from masked-convolution features. Because the color and hue of the foreground and background differ, image harmonization is used to reduce these differences: [24] designs an image harmonization network with collaborative dual transformations for pixel- and RGB-level harmonization, and [25] applies a self-consistent style contrastive learning scheme with background-attentional adaptive instance normalization. For seamless composition, image blending diminishes the unnatural edge around the foreground in the composited image: [26] optimizes style and content losses together with a Poisson blending loss, iteratively updating the pixels of the boundary region, and [27] proposes a densely connected multi-stream fusion network to fuse the foreground and background images. Generally, these tasks aim at realistic visual quality by situating independent objects within a background without explicit precision requirements.
However, the challenge intensifies when dealing with the hand and palmprint ROI, as they constitute an integrated entity with a continuous texture connection rather than two distinct objects. Consequently, hand image composition poses a more substantial challenge, demanding a higher level of precision in seamlessly placing an ROI into a palm and ensuring a cohesive appearance of the palmprint as a unified entity. Moreover, the foreground palmprint must preserve most of its texture information under harmonization. Conventional image composition methods fall short of meeting the specific requirements of hand composition. Figure 3 illustrates the distinction between image composition and the more intricate task of hand composition.

3. Methodology

3.1. Threat Model for Hand Attacks

Let $P(\cdot)$ represent the preprocessing procedure of the PPR system, and let $H$ denote the full hand image. The ROI is obtained through $P(H)$ and is subsequently utilized for palmprint recognition.
Suppose $R_{Atk}$ and $C$ denote an attacking ROI and an arbitrary carrier hand image, respectively, where $R_{Atk}$ is assumed to be an ROI image designed to match the target ROI in the feature domain. $R_{Atk}$ can be sourced from a compromised dataset or synthesized using the methods detailed in Section 2.2. We assume that an adversary possesses knowledge of the $P(\cdot)$ of the target system, implying awareness of the precise position $P$ of the ROI within $C$; this is referred to as the embedding position.
Designating $E(\cdot,\cdot,\cdot)$ as our proposed ROI embedding technique, a highly realistic composited hand image $H_{Atk}$ can be generated through $E(R_{Atk}, C, P)$ in an end-to-end and training-free manner. By inputting $H_{Atk}$ into the target PPR system, $R_{Atk}$ can be accurately extracted through $P(\cdot)$, achieving the objective of the attack.
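The following minimal sketch expresses this threat model in code; `embed`, `preprocess`, and `match_dist` are hypothetical callables standing in for $E(\cdot,\cdot,\cdot)$, $P(\cdot)$, and the recognizer's matcher, and `threshold` corresponds to the decision threshold of the target system.

```python
from typing import Callable
import numpy as np

def hand_attack_succeeds(
    R_atk: np.ndarray,           # attacking ROI, R_Atk
    C: np.ndarray,               # arbitrary carrier hand image
    P: tuple,                    # known embedding position within C
    embed: Callable,             # our ROI embedding method, E(., ., .)
    preprocess: Callable,        # the target system's preprocessing, P(.)
    match_dist: Callable,        # matching distance of the recognizer
    target_feature: np.ndarray,  # enrolled template under attack
    threshold: float,            # system decision threshold
) -> bool:
    """The attack succeeds if the ROI that the PPR system itself
    re-extracts from the composited hand matches the target template."""
    H_atk = embed(R_atk, C, P)   # composite a realistic hand image
    R_out = preprocess(H_atk)    # the system extracts the ROI from H_Atk
    return match_dist(R_out, target_feature) <= threshold
```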

3.2. Overview

$E(\cdot,\cdot,\cdot)$ comprises two crucial components: ROI harmonization and palm blending. Let $R_{Ref}$ denote the reference ROI, an image extracted from $C$ at position $P$ that facilitates both ROI harmonization and palm blending. During ROI harmonization, an iterative optimization process transforms the style of $R_{Atk}$ to align with $R_{Ref}$. Subsequently, the harmonized $R_{Atk}$ is seamlessly integrated into the carrier image $C$ during the palm blending phase. In this phase, the area surrounding position $P$ is masked to facilitate inpainting. The inpainting operation employs $R_{Ref}$ as a reference image, utilizing its semantic information to regenerate the masked area. The resulting composited hand image represents the culmination of the entire process, which is referred to as ROI embedding. The schematic representation of the ROI embedding process is displayed in Figure 4.
The ROI embedding pipeline, shown in Figure 4, comprises three distinct components. The first component is localization, which furnishes the reference ROI $R_{Ref}$ and the inpainting mask required for the subsequent ROI harmonization and palm blending. The inputs for this stage are the carrier image $C$ and the specified embedding position $P$; the ROI localization is determined by the known target preprocessing.
The second component is ROI harmonization, wherein $R_{Atk}$ is transformed into a harmonized $R_{Atk}$ aligned with the style of $R_{Ref}$ to mitigate visual disparities between $R_{Atk}$ and $C$. ROI harmonization adopts a modified style transfer method, detailed in Section 3.3. The last component is palm blending, which generates the composited hand image. The harmonized $R_{Atk}$ is initially incorporated into $C$ at the predetermined position $P$. Subsequently, the embedded hand image is input into the inpainting network together with $R_{Ref}$, facilitating the regeneration of the masked inpainting area. The composited hand image is thereby obtained, with further details discussed in Section 3.4.
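Putting the three components together, $E(R_{Atk}, C, P)$ can be summarized as the following sketch; the helper names (`localize_roi`, `make_inpaint_mask`, `harmonize`, `warp_to`, `inpaint_with_reference`) are illustrative stand-ins for the stages described above, not functions released with this paper.

```python
def roi_embedding(R_atk, C, P, localize_roi, make_inpaint_mask,
                  harmonize, warp_to, inpaint_with_reference):
    """E(R_Atk, C, P): compose a realistic hand image in three stages."""
    R_ref = localize_roi(C, P)        # reference ROI cut from the carrier
    mask = make_inpaint_mask(C, P)    # ring around the embedding position
    R_harm = harmonize(R_atk, R_ref)  # iterative style transfer (Section 3.3)
    embedded = warp_to(C, R_harm, P)  # paste the harmonized ROI at P
    return inpaint_with_reference(embedded, mask, R_ref)  # palm blending (Section 3.4)
```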

3.3. ROI Harmonization

Given potential hue, illumination, and skin color disparities between $R_{Atk}$ and the carrier hand image $C$, ROI harmonization becomes imperative to mitigate these differences. Moreover, for computational efficiency in matching, and to align with the grayscale nature of ROI images, $R_{Atk}$ is converted to a grayscale image.
Specifically, we leverage the pretrained Contrastive Arbitrary Style Transfer (CAST) network proposed in [4] to achieve ROI harmonization. As shown in Figure 5, CAST employs cycle generation to train encoders and decoders, adeptly separating and merging style and content. However, training a model that accommodates the significant diversity of hand images across various datasets proves challenging. We therefore devise an iterative style transfer method based on pretrained CAST to initialize and optimize image quality for individual palms. This iterative process resembles the training process of CAST but optimizes the image quality rather than the model's ability, which enables the synthesis of realistic images within a limited number of iterations.
Specifically, let $A$ and $B$ symbolize $R_{Atk}$ and $R_{Ref}$, respectively, and let $A|B$ denote "$A$ or $B$", with $B|A$ denoting the other image of the pair. The ROI harmonization process commences by employing the encoder $E(\cdot,\cdot)$ to amalgamate the content of $A|B$ and the style of $B|A$ from the two input images. Subsequently, the decoder operates on $E(A|B, B|A)$ to generate the harmonized image $(A|B)_f$ based on the content of $A|B$ and the style of $B|A$. Once $(A|B)_f$ is obtained, it is combined with $A|B$ to yield a reconstructed image $(A|B)_r$. The final harmonized image is refined by minimizing the differences between $A$ and $A_r$ and between $B$ and $B_r$.
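A loose sketch of this per-image optimization loop is given below, assuming `G(content, style)` wraps the pretrained CAST encoder–decoder pair and `cycle_loss` implements Equation (4) defined later in this section; the iteration count of 30 follows the setting used in Section 4.1.

```python
import torch

def harmonize_roi(A, B, G, cycle_loss, n_iters=30, lr=5e-3):
    """Fine-tune the pretrained CAST generator G on a single (A, B) pair,
    mirroring CAST's cycle training but driven purely by the modified cycle
    loss (the discriminators are disabled, as discussed below).
    A is the attacking ROI (content), B the reference ROI (style)."""
    opt = torch.optim.Adam(G.parameters(), lr=lr)
    for _ in range(n_iters):
        A_f = G(A, B)                      # content of A, style of B
        B_f = G(B, A)                      # content of B, style of A
        A_r = G(A_f, A)                    # cycle back to reconstruct A
        B_r = G(B_f, B)                    # cycle back to reconstruct B
        loss = cycle_loss(A, B, A_r, B_r)  # Equation (4)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return G(A, B)                     # final harmonized ROI
```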
Within our proposed ROI harmonization framework, the loss function is essential to enhancing both image quality and harmonization effectiveness. Notably, the loss function employed in CAST solely incorporates the $L_1$ loss and assigns equal weights to $L_{cyc}^A$ and $L_{cyc}^B$. However, this approach proves insufficient for attaining optimal image quality and harmonization effects. Furthermore, preserving the texture information in $R_{Atk}$ is critical, necessitating measures to minimize interference from $R_{Ref}$. Consequently, the loss function must be modified to address these considerations.
Building upon the existing $L_1$ loss, the cycle loss is improved by introducing two loss functions, $L_2$ and $L_{lpips}$ [28], to enhance the preservation and separation of content and style information, thereby improving overall visual quality. To further preserve texture information in the synthetic images, a texture loss $L_T$ is devised. This involves a separate texture feature extractor $f_t(\cdot)$, composed of two Gabor convolution layers and a LeakyReLU function. For comprehensive feature extraction, a six-direction Gabor kernel bank is employed. After the texture features are obtained, $L_1$ is used to measure the differences between the input images and the reconstructed images. $L_T$ is the average of the six Gabor feature losses, as depicted in Equation (2).
$$f_t(x, g_i) = \varphi_{LR}\big(\mathrm{Conv}(\mathrm{Conv}(x, g_i), g_i)\big) \tag{1}$$
$$L_T(x, y) = \frac{1}{6} \sum_{i=1}^{6} L_1\big(f_t(x, g_i),\, f_t(y, g_i)\big) \tag{2}$$
where $\mathrm{Conv}$ denotes convolution, $\varphi_{LR}$ is the LeakyReLU function, and $g_i$ is the Gabor convolution kernel oriented toward $(i-1) \times 30$ degrees.
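Equations (1) and (2) translate directly into code. In the sketch below, the Gabor kernel parameters (size, sigma, wavelength, aspect ratio) are illustrative assumptions, as the paper does not report them.

```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F

def gabor_bank(ksize=17, sigma=4.0, lambd=8.0, gamma=0.5):
    """Six fixed Gabor kernels oriented at (i - 1) x 30 degrees."""
    kernels = []
    for i in range(6):
        theta = np.deg2rad(i * 30)
        k = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma)
        kernels.append(torch.tensor(k, dtype=torch.float32)[None, None])
    return kernels

def f_t(x, g):
    """Texture extractor of Eq. (1): two Gabor convolutions, then LeakyReLU."""
    pad = g.shape[-1] // 2
    return F.leaky_relu(F.conv2d(F.conv2d(x, g, padding=pad), g, padding=pad))

def texture_loss(x, y, kernels):
    """L_T of Eq. (2): average L1 distance over the six directional features."""
    return sum(F.l1_loss(f_t(x, g), f_t(y, g)) for g in kernels) / 6.0
```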
The final modified cycle loss is Equation (4), which combines $L_{cyc}^A$ and $L_{cyc}^B$. The two losses, specified by Equation (3), are almost identical except for their inputs and loss coefficients.
$$L_{cyc}^{A|B}(x, y) = \lambda_{L_1} L_1(x, y) + \lambda_{L_2} L_2(x, y) + \lambda_{LP} L_{lpips}(x, y) + \lambda_{LT} L_T(x, y) \tag{3}$$
$$L_{cyc}(A, B, A_r, B_r) = L_{cyc}^A(A, A_r) + L_{cyc}^B(B, B_r) \tag{4}$$
where $x$ and $y$ represent the original image and the reconstructed image, respectively, and $\lambda_{L_1}$, $\lambda_{L_2}$, $\lambda_{LP}$, and $\lambda_{LT}$ denote the weight coefficients of the $L_1$, $L_2$, $L_{lpips}$, and $L_T$ losses.
Since we need to preserve more detail in $A$ and more appearance in $B$, the loss coefficients of $L_{cyc}^A$ are empirically set to 0.5, 0.3, 0.3, and 0.7 for $\lambda_{L_1}$, $\lambda_{L_2}$, $\lambda_{LP}$, and $\lambda_{LT}$, respectively, while those of $L_{cyc}^B$ are set to 0.5, 0.7, 0.7, and 0.3. Moreover, since our goal is to improve the image quality rather than the network performance, and the adversarial loss produced by the discriminators of CAST would slow and disturb the image optimization process, the discriminators and their loss are disabled to speed up and stabilize the optimization.
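Reusing `texture_loss` from the sketch above, Equations (3) and (4) with these coefficients might look as follows; the LPIPS backbone choice (`'vgg'`) is an assumption, and grayscale inputs are repeated to three channels because the lpips package expects RGB tensors in $[-1, 1]$.

```python
import torch.nn.functional as F
import lpips  # perceptual metric of Zhang et al. [28]

lpips_fn = lpips.LPIPS(net='vgg')  # backbone not specified in the paper

def cyc_branch(x, x_r, w, kernels):
    """One branch of Eq. (3): weighted L1 + L2 + LPIPS + texture losses."""
    l1, l2, lp, lt = w
    to_rgb = lambda t: t.repeat(1, 3, 1, 1)  # LPIPS expects 3-channel input
    return (l1 * F.l1_loss(x, x_r) + l2 * F.mse_loss(x, x_r)
            + lp * lpips_fn(to_rgb(x), to_rgb(x_r)).mean()
            + lt * texture_loss(x, x_r, kernels))

def cycle_loss(A, B, A_r, B_r, kernels):
    """Eq. (4): the A-branch favors texture detail, the B-branch appearance."""
    return (cyc_branch(A, A_r, (0.5, 0.3, 0.3, 0.7), kernels)
            + cyc_branch(B, B_r, (0.5, 0.7, 0.7, 0.3), kernels))
```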

3.4. Palm Blending

Each palm exhibits a distinct texture characterized by unique and specific patterns. Directly overlaying the attacking ROI $R_{Atk}$ onto the carrier hand $C$ can introduce noticeable inconsistencies in the palm area surrounding $R_{Atk}$. To address this challenge, we have devised an intuitive solution that uses a pretrained inpainting network for region regeneration. Specifically, we utilize the Paint by Example (PbyE) inpainting network [5], which employs a reference image to generate a masked area. The generated area, while not identical to the reference image, mirrors the semantic and visual characteristics of the reference. Moreover, the area surrounding the mask is also considered a reference during inpainting, making this network suitable for generating a seamless and continuous palm area without additional training.
The process begins by obtaining the embedded hand, warping the harmonized $R_{Atk}$ into the specified embedding position $P$ on the carrier hand $C$. Subsequently, the reference ROI $R_{Ref}$ is employed as the reference image to guide the inpainting process. Finally, an inpainting mask is generated based on the embedding position $P$ of the carrier hand $C$, as illustrated in Figure 6.
As illustrated in Figure 6, ROI localization is employed to extract the embedding position P from the carrier image C. Subsequently, the inpainting mask is deduced based on P through a three-step procedure. Firstly, an ROI mask is delineated using P, where white represents the mask and black signifies the background to be disregarded. Secondly, employing a dilation kernel, the ROI mask from the first step is expanded to generate a dilated ROI mask. The final inpainting mask is derived by subtracting the ROI mask from the first step from the dilated ROI mask obtained in the second step. The impact of this inpainting mask on the masked embedded hand is depicted in Figure 6.
Nevertheless, the size of the dilation kernel requires careful consideration. A larger kernel leads to a more extensive inpainting area, potentially introducing significant changes in $C$ and consequently diminishing the realism of the composited hand image. Conversely, a smaller kernel results in a more confined inpainting area, making the transition more abrupt than with a larger kernel. To address this trade-off, we devised an adaptive strategy that sets the dilation kernel size to one fifth of the ROI side length.
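A small OpenCV sketch of the mask construction and adaptive kernel just described, assuming a binary ROI mask (255 inside the ROI, 0 elsewhere) and the one-fifth reading of the adaptive kernel size:

```python
import cv2
import numpy as np

def make_inpaint_mask(roi_mask: np.ndarray) -> np.ndarray:
    """Build the ring-shaped inpainting mask around the embedding position:
    dilate the ROI mask, then subtract the original mask from the dilation."""
    side = int(np.sqrt(cv2.countNonZero(roi_mask)))  # side of the square ROI
    k = max(3, side // 5)                            # adaptive kernel size
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (k, k))
    dilated = cv2.dilate(roi_mask, kernel)
    return cv2.subtract(dilated, roi_mask)           # ring = dilated - ROI
```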
Once the embedded hand, the inpainting mask, and the reference image $R_{Ref}$ have been obtained, the final composited hand image is synthesized using the Paint by Example (PbyE) method. The palm blending pipeline is depicted in Figure 7.
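For illustration, the blending step could be invoked through the Hugging Face diffusers port of Paint by Example; the checkpoint name and call signature below follow that library and are assumptions, not code released with this paper.

```python
import torch
from diffusers import PaintByExamplePipeline

pipe = PaintByExamplePipeline.from_pretrained(
    "Fantasy-Studio/Paint-by-Example", torch_dtype=torch.float16
).to("cuda")

# embedded_hand, ring_mask, and R_ref are PIL images produced by the
# preceding warping and mask-generation steps.
composited = pipe(image=embedded_hand, mask_image=ring_mask,
                  example_image=R_ref).images[0]
```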

4. Experiments

The experimental setup is as follows: an Intel(R) Xeon(R) Platinum 8222CL CPU @ 3.00 GHz (Chengdu, China), 64 GB of memory, an NVIDIA 3090Ti GPU (Jiangsu, China), and an Ubuntu 20.04.3 LTS 64-bit operating system. Four distinct datasets were employed in the ensuing experiments, with comprehensive details available in Table 1; a few image samples from each dataset are shown in Figure 8.
Data were selected judiciously, given the subtle variation among hand images within a single dataset and the need to conduct experiments under cross-dataset conditions. Specifically, 200 images were randomly sampled from each dataset, all from different hands, except for the 200 images sourced from Zhou [32], which came from 160 hands.
All images from the four datasets were cropped and resized for uniformity, resulting in a standardized resolution of 512 × 512. For ROI localization, we employed the Palm Keypoint Localization Neural Network (PKLNet) [18], renowned for its exceptional performance.

4.1. ROI Harmonization

4.1.1. Cross-Dataset Harmonization

In a real-world scenario, there can be substantial discrepancies between $H$ and $R_{Atk}$. Cross-dataset harmonization was executed to emulate this environment, requiring only 30 iterations to harmonize an individual image. The content images were intentionally rendered in grayscale to simulate potential variations in $R_{Atk}$. The assessment of harmonized images involves two metrics: $L_{lpips}$ for evaluating visual quality and the texture loss $L_T$ for assessing the preservation of textures. The visual outcomes of the harmonization process are illustrated in Figure 9, while the corresponding quantitative results for $L_{lpips}$ and $L_T$ are provided in Table 2 and Table 3, respectively.
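As a sketch, the $L_{lpips}$ score for a harmonized/style pair can be computed with the lpips package; the `'alex'` backbone is that library's recommended choice for evaluation, and inputs are 3-channel tensors scaled to $[-1, 1]$.

```python
import lpips
import torch

metric = lpips.LPIPS(net='alex')  # evaluation backbone (library recommendation)

def lpips_score(img_a: torch.Tensor, img_b: torch.Tensor) -> float:
    """img_a, img_b: (1, 3, H, W) tensors in [-1, 1]; lower is better."""
    with torch.no_grad():
        return metric(img_a, img_b).item()
```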
As depicted in Figure 9, our ROI harmonization method consistently produces remarkable visual results irrespective of the content or style employed. The outcomes presented in Table 2 indicate that images harmonized toward the Tongji style outperform other styles, while those harmonized toward the IITD style exhibit slightly inferior results. This discrepancy can be attributed to the strong color profile of Tongji images, which facilitates transfer, whereas the lighter color of IITD images poses challenges for seamless transfer. Moreover, as detailed in Table 3, all harmonized images exhibit low $L_T$ values, signifying effective preservation of texture throughout the harmonization process.

4.1.2. Ablation Study

Experiments have been devised to elucidate the impact of iteration and losses. The Zhou dataset, characterized by diverse image conditions, was selected for the ensuing ablation study to provide comprehensive insights. The visual results across different iterations are depicted in Figure 10, and the corresponding evaluation results are shown in Figure 11.
In Figure 10, images from iteration 0 are deemed unsatisfactory, while those from iteration 30 demonstrate a significant enhancement, approaching the quality of the original images. Subsequent iterations yield only marginal improvements. Figure 11 reveals that the elbow points of all losses are reached before iteration 30; that is, the mean and standard deviation of all losses are already relatively low by then, leaving little room for further decrease. Balancing effectiveness and efficiency, we designate 30 iterations as the parameter for all experiments.
Building upon the findings at iteration 30, we explored the impact of $L_{cyc}$. The visual outcomes are presented in Figure 12, while a comprehensive evaluation is given in Table 4. The results indicate that including the additional $L_{lpips}$ and $L_2$ terms significantly enhances both visual quality and realism compared to the original cycle loss. Moreover, our devised texture loss preserves more texture while only slightly altering visual quality.

4.2. ROI Embedding Attack

To assess the performance of our ROI embedding attack, we employ the Binary Orientation Co-occurrence Vector (BOCV) [10] as the recognition method and utilize PKLNet [18] to extract all ROIs. The success threshold for the attack is set to the Equal Error Rate (EER) threshold calculated with BOCV for each dataset, where the EER of each dataset is computed over the ROIs of all images in that dataset.
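A sketch of the EER computation used to set this threshold, assuming arrays of genuine and impostor BOCV matching distances (false accepts are impostor distances at or below the threshold; false rejects are genuine distances above it):

```python
import numpy as np

def eer_and_threshold(genuine, impostor):
    """Sweep candidate thresholds and return (EER, threshold) at the
    point where the false accept and false reject rates are closest."""
    ts = np.sort(np.unique(np.concatenate([genuine, impostor])))
    far = np.array([(impostor <= t).mean() for t in ts])  # false accept rate
    frr = np.array([(genuine > t).mean() for t in ts])    # false reject rate
    i = int(np.argmin(np.abs(far - frr)))                 # FAR ~= FRR crossing
    return (far[i] + frr[i]) / 2.0, ts[i]
```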
The attack results are presented in Table 5, which lists the attack success rate for each scenario. Additionally, Figure 13 illustrates the distribution of matching distances. In Table 5, the first column identifies the attacking ROI, and the last column gives the corresponding dataset's EER. In Figure 13, datasetA2datasetB signifies using the ROI from datasetA as the attacking ROI while employing datasetB as the carrier hand. The grayscale ROIs from each dataset serve as both the attacking and target ROIs.
The efficacy of the attack diminishes notably when Gao and Zhou serve as the carrier hands. This can be attributed to the diverse and intricate capture environments of Gao and Zhou, which may degrade the accuracy of PKLNet ROI localization. In contrast, attack performance improves significantly when Tongji and IITD act as the carrier hands.
Figure 14 presents the composited hand images, visually representing the attack scenarios. The quality of these composited hand images is quantitatively assessed by measuring the disparity between the composited hands and the carrier hands. Evaluation metrics such as the peak signal-to-noise ratio (PSNR) and L l p i p s are employed, and the corresponding results are presented in Table 6 and Table 7.
Regarding image quality, Figure 14 visually presents realistic outcomes wherein an attacking ROI seamlessly integrates into a distinct carrier hand, producing a natural visual effect. Furthermore, the evaluation results in Table 6 and Table 7 corroborate these findings, indicating a favorable and satisfactory outcome.

5. Conclusions and Future Works

This paper introduced an end-to-end, training-free method for hand image composition designed for ROI embedding attacks. The method leverages a modified style transfer approach, achieving harmonized ROI images within a few iterations, and intuitively and effectively utilizes a pretrained inpainting model to regenerate the area surrounding the embedding position, which contributes to the robustness of our framework. Notably, our approach requires only an ROI image and a hand image for execution, eliminating the need for training and enhancing practicality.
In contrast with earlier approaches to palmprint attacks, our research introduces a novel full-hand attack strategy that builds upon an existing palmprint ROI. This extends the applicability of attacks to more realistic environments, addressing a previously overlooked aspect. Furthermore, our method presents an intuitive and effective means of synthesizing a realistic composite hand image; the idea exhibits universality across different styles and achieves better precision in texture preservation than conventional image composition methods. A key observation derived from our methodology is that a palmprint recognition system is susceptible to significant threats even under minor and incomplete information leaks, such as an incomplete ROI.
Moving forward, we will focus on devising a more efficient method that eliminates the need for ROI localization information from the target system. Additionally, our approach can be extended to a data augmentation strategy, synthesizing highly realistic full-hand images to enhance the performance of full-hand recognition methods. Nevertheless, we acknowledge that ample room for improvement remains: ROI harmonization is less effective for light color styles and is susceptible to strong stains, and the attack success rate could be higher. We aim to explore ways to enhance its efficacy in the future.

Author Contributions

Conceptualization, L.Y. and L.L.; methodology, L.Y.; software, L.Y.; validation, L.Y. and L.L.; formal analysis, L.Y.; investigation, L.Y.; resources, L.L.; data curation, L.L.; writing—original draft preparation, L.Y.; writing—review and editing, L.L., A.B.J.T. and C.K.; visualization, L.Y.; supervision, L.L. and A.B.J.T.; project administration, L.L.; funding acquisition, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (61866028), the Technology Innovation Guidance Program Project (Special Project of Technology Cooperation, Science and Technology Department of Jiangxi Province) (20212BDH81003), and the Innovation Foundation for Postgraduate Students of Nanchang Hangkong University (YC2022-S765).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy issues.

Acknowledgments

We thank the reviewers of this paper and the MDPI editors for their professional editing.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PPR — Palmprint recognition
ROI — Region of interest
CAST — Contrastive arbitrary style transfer
CNN — Convolutional neural networks
GANs — Generative adversarial networks
PKLNet — Palm keypoint localization neural network
BOCV — Binary orientation co-occurrence vector
EER — Equal error rate
PSNR — Peak signal-to-noise ratio
$P(\cdot)$ — Preprocessing procedure of the PPR system
$H$ — Full hand image
$C$ — Carrier hand image
$P$ — Embedding position
$E(\cdot,\cdot,\cdot)$ — Proposed ROI embedding technique
$R_{Atk}$ — Attacking ROI
$R_{Ref}$ — Reference ROI
$H_{Atk}$ — Composited hand image

References

1. Wang, F.; Leng, L.; Teoh, A.B.J.; Chu, J. Palmprint false acceptance attack with a generative adversarial network (GAN). Appl. Sci. 2020, 10, 8547.
2. Sun, Y.; Leng, L.; Jin, Z.; Kim, B.G. Reinforced palmprint reconstruction attacks in biometric systems. Sensors 2022, 22, 591.
3. Yang, Z.; Leng, L.; Zhang, B.; Li, M.; Chu, J. Two novel style-transfer palmprint reconstruction attacks. Appl. Intell. 2023, 53, 6354–6371.
4. Zhang, Y.; Tang, F.; Dong, W.; Huang, H.; Ma, C.; Lee, T.Y.; Xu, C. Domain enhanced arbitrary image style transfer via contrastive learning. In Proceedings of the ACM SIGGRAPH 2022 Conference Proceedings, Vancouver, BC, Canada, 7–11 August 2022; pp. 1–8.
5. Yang, B.; Gu, S.; Zhang, B.; Zhang, T.; Chen, X.; Sun, X.; Chen, D.; Wen, F. Paint by example: Exemplar-based image editing with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18381–18391.
6. Zhang, D.; Kong, W.K.; You, J.; Wong, M. Online palmprint identification. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1041–1050.
7. Kong, A.K.; Zhang, D. Competitive coding scheme for palmprint verification. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Cambridge, UK, 26 August 2004; IEEE: Piscataway, NJ, USA, 2004; Volume 1, pp. 520–523.
8. Jia, W.; Huang, D.S.; Zhang, D. Palmprint verification based on robust line orientation code. Pattern Recognit. 2008, 41, 1504–1513.
9. Xu, Y.; Fei, L.; Wen, J.; Zhang, D. Discriminative and robust competitive code for palmprint recognition. IEEE Trans. Syst. Man Cybern. Syst. 2016, 48, 232–241.
10. Guo, Z.; Zhang, D.; Zhang, L.; Zuo, W. Palmprint verification using binary orientation co-occurrence vector. Pattern Recognit. Lett. 2009, 30, 1219–1227.
11. Liang, X.; Yang, J.; Lu, G.; Zhang, D. CompNet: Competitive neural network for palmprint recognition using learnable Gabor kernels. IEEE Signal Process. Lett. 2021, 28, 1739–1743.
12. Yang, Z.; Huangfu, H.; Leng, L.; Zhang, B.; Teoh, A.B.J.; Zhang, Y. Comprehensive competition mechanism in palmprint recognition. IEEE Trans. Inf. Forensics Secur. 2023, 18, 5160–5170.
13. Wu, T.; Leng, L.; Khan, M.K. A multi-spectral palmprint fuzzy commitment based on deep hashing code with discriminative bit selection. Artif. Intell. Rev. 2023, 56, 6169–6186.
14. Fei, L.; Zhao, S.; Jia, W.; Zhang, B.; Wen, J.; Xu, Y. Toward efficient palmprint feature extraction by learning a single-layer convolution network. IEEE Trans. Neural Netw. Learn. Syst. 2022, 32, 9783–9794.
15. Bao, X.; Guo, Z. Extracting region of interest for palmprint by convolutional neural networks. In Proceedings of the 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), Oulu, Finland, 12–15 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6.
16. Gao, F.; Cao, K.; Leng, L.; Yuan, Y. Mobile palmprint segmentation based on improved active shape model. J. Multimed. Inf. Syst. 2018, 5, 221–228.
17. Matkowski, W.M.; Chai, T.; Kong, A.W.K. Palmprint recognition in uncontrolled and uncooperative environment. IEEE Trans. Inf. Forensics Secur. 2019, 15, 1601–1615.
18. Liang, X.; Fan, D.; Yang, J.; Jia, W.; Lu, G.; Zhang, D. PKLNet: Keypoint localization neural network for touchless palmprint recognition based on edge-aware regression. IEEE J. Sel. Top. Signal Process. 2023, 13, 662–676.
19. Izadpanahkakhk, M.; Razavi, S.M.; Taghipour-Gorjikolaie, M.; Zahiri, S.H.; Uncini, A. Deep region of interest and feature extraction models for palmprint verification using convolutional neural networks transfer learning. Appl. Sci. 2018, 8, 1210.
20. Li, Z.; Liang, X.; Fan, D.; Li, J.; Zhang, D. BPFNet: A unified framework for bimodal palmprint alignment and fusion. In Proceedings of Neural Information Processing: 28th International Conference (ICONIP 2021), Sanur, Indonesia, 8–12 December 2021; Proceedings, Part VI; Springer: Berlin/Heidelberg, Germany, 2021; pp. 28–36.
21. Niu, L.; Cong, W.; Liu, L.; Hong, Y.; Zhang, B.; Liang, J.; Zhang, L. Making images real again: A comprehensive survey on deep image composition. arXiv 2021, arXiv:2106.14490.
22. Lee, D.; Liu, S.; Gu, J.; Liu, M.Y.; Yang, M.H.; Kautz, J. Context-aware synthesis and placement of object instances. Adv. Neural Inf. Process. Syst. 2018, 31, 10414–10424.
23. Volokitin, A.; Susmelj, I.; Agustsson, E.; Van Gool, L.; Timofte, R. Efficiently detecting plausible locations for object placement using masked convolutions. In Proceedings of the Computer Vision–ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Proceedings, Part IV; Springer: Berlin/Heidelberg, Germany, 2020; pp. 252–266.
24. Cong, W.; Tao, X.; Niu, L.; Liang, J.; Gao, X.; Sun, Q.; Zhang, L. High-resolution image harmonization via collaborative dual transformations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 18470–18479.
25. Hang, Y.; Xia, B.; Yang, W.; Liao, Q. SCS-Co: Self-consistent style contrastive learning for image harmonization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 19710–19719.
26. Zhang, L.; Wen, T.; Shi, J. Deep image blending. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Online, 2–5 March 2020; pp. 231–240.
27. Zhang, H.; Zhang, J.; Perazzi, F.; Lin, Z.; Patel, V.M. Deep image compositing. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikola, HI, USA, 3–8 January 2021; pp. 365–374.
28. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595.
29. Zhang, L.; Li, L.; Yang, A.; Shen, Y.; Yang, M. Towards contactless palmprint recognition: A novel device, a new benchmark, and a collaborative representation based identification approach. Pattern Recognit. 2017, 69, 199–212.
30. Kumar, A. IIT Delhi Touchless Palmprint Database Version 1.0. 2009. Available online: https://www4.comp.polyu.edu.hk/csajaykr/IITD/Database_Palm.htm (accessed on 14 July 2023).
31. Gao, F.M. Research on the Palmprint Authentication Algorithm of Mobile Terminal Assisted by Two Lines and One Point. Master's Thesis, Nanchang Hangkong University, Nanchang, China, 2019.
32. Zhou, Z.; Chen, Q.; Leng, L. Key point localization based on intersecting circle for palmprint preprocessing in public security. J. Def. Acquis. Technol. 2019, 1, 24–31.
Figure 1. Hand attack, ROI attack, and palmprint recognition flow.
Figure 2. The illustration of low and high appearance consistency in ROI embedding. The second row depicts a seamless integration of carrier and ROI images.
Figure 3. The comparison of image composition [21] and hand composition.
Figure 4. The pipeline of ROI embedding.
Figure 5. The pretrained CAST with modified loss functions is utilized to realize ROI harmonization.
Figure 6. The pipeline of making inpainting masks.
Figure 7. The illustration of palm blending.
Figure 8. The dataset samples of Tongji, IITD, Gao, and Zhou.
Figure 9. Cross-dataset harmonized image results.
Figure 10. Harmonized images in different iterations.
Figure 11. The L1, L2, LPIPS, and texture losses in different iterations.
Figure 12. The images of the ablation study of cycle losses.
Figure 13. The distribution of ROI embedding attacks in four datasets (ROI2Hand).
Figure 14. The composited hand images of four datasets.
Table 1. Datasets explanation.

Dataset     | All Hands | All Images | Used Hands | Used Images | Resolution       | Condition
Tongji [29] | 600       | 12,000     | 200        | 200         | 800 × 600        | Simplex
IITD [30]   | 460       | 2601       | 200        | 200         | 1600 × 1200      | Simplex
Gao [31]    | 204       | 816        | 200        | 200         | 720 × 1184       | Indoor
Zhou [32]   | 160       | 918        | 160        | 200         | Multi-resolution | Indoor and outdoor
Table 2. Average LPIPS score between style images and harmonized images; the lower the better.

Content\Style | Tongji        | IITD          | Gao           | Zhou
Tongji        | 0.117 ± 0.060 | 0.355 ± 0.066 | 0.336 ± 0.075 | 0.326 ± 0.097
IITD          | 0.273 ± 0.045 | 0.124 ± 0.084 | 0.363 ± 0.081 | 0.330 ± 0.097
Gao           | 0.292 ± 0.052 | 0.400 ± 0.071 | 0.203 ± 0.093 | 0.342 ± 0.103
Zhou          | 0.282 ± 0.057 | 0.367 ± 0.068 | 0.345 ± 0.081 | 0.142 ± 0.081
Table 3. Average texture loss between content images and harmonized images; the lower the better.

Content\Style | Tongji        | IITD          | Gao           | Zhou
Tongji        | 0.029 ± 0.011 | 0.055 ± 0.017 | 0.062 ± 0.017 | 0.055 ± 0.019
IITD          | 0.044 ± 0.015 | 0.020 ± 0.011 | 0.040 ± 0.010 | 0.042 ± 0.015
Gao           | 0.052 ± 0.016 | 0.037 ± 0.009 | 0.028 ± 0.010 | 0.044 ± 0.016
Zhou          | 0.047 ± 0.017 | 0.040 ± 0.016 | 0.042 ± 0.015 | 0.024 ± 0.012
Table 4. The evaluation results of the ablation study of cycle losses; the best value in each column is the lowest.

Setting          | L1            | L2            | L_lpips       | L_T
Baseline (L1)    | 0.145 ± 0.070 | 0.034 ± 0.030 | 0.496 ± 0.140 | 0.026 ± 0.013
+ L2 + L_lpips   | 0.145 ± 0.066 | 0.033 ± 0.028 | 0.381 ± 0.120 | 0.027 ± 0.016
+ L_T            | 0.144 ± 0.070 | 0.033 ± 0.030 | 0.384 ± 0.128 | 0.024 ± 0.013
Table 5. The attack success rate of the ROI embedding attack (%).

ROI    | Tongji | IITD | Gao  | Zhou | EER
Tongji | 88     | 98.5 | 36   | 63.5 | 0.29
IITD   | 82.5   | 99   | 41.5 | 68   | 1.09
Gao    | 88     | 97.5 | 65   | 69.5 | 14.5
Zhou   | 85.5   | 99   | 44   | 77.5 | 2.14
Table 6. PSNR of composited hand images; the higher the better.

ROI\Hand | Tongji       | IITD         | Gao          | Zhou
Tongji   | 30.96 ± 2.78 | 28.42 ± 1.96 | 29.13 ± 1.61 | 25.74 ± 3.47
IITD     | 25.38 ± 2.73 | 32.09 ± 1.53 | 28.83 ± 1.61 | 25.97 ± 3.26
Gao      | 25.70 ± 2.66 | 27.92 ± 2.07 | 29.83 ± 1.69 | 26.10 ± 3.41
Zhou     | 24.38 ± 2.83 | 27.89 ± 2.21 | 28.69 ± 1.72 | 29.34 ± 3.06
Table 7. LPIPS of composited hand images; the lower the better.

ROI\Hand | Tongji        | IITD          | Gao           | Zhou
Tongji   | 0.085 ± 0.059 | 0.057 ± 0.014 | 0.067 ± 0.030 | 0.110 ± 0.067
IITD     | 0.129 ± 0.066 | 0.037 ± 0.011 | 0.070 ± 0.028 | 0.108 ± 0.064
Gao      | 0.137 ± 0.061 | 0.062 ± 0.017 | 0.054 ± 0.022 | 0.112 ± 0.066
Zhou     | 0.140 ± 0.061 | 0.061 ± 0.017 | 0.073 ± 0.032 | 0.073 ± 0.060
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
