1. Introduction
Advanced biometric recognition technology enhances system security by ensuring that only authorized users can access a system [1]. This technology captures and validates user identities through physiological or behavioral biometrics such as face, iris, fingerprint, EEG, and voice [2,3]. Palmprint, a fast-growing and relatively new biometric, is acquired from the inner surface of the hand between the wrist and the fingers [1]. A palmprint is an impression of the palm containing rich intrinsic features, including the principal lines and wrinkles (Figure 1) [1], as well as abundant ridge- and minutia-based features similar to a fingerprint [4,5]. These features yield high accuracy and reliable performance in personal verification and identification [6,7,8].
Several techniques for palmprint recognition have been proposed, such as minutia-based, geometry-based, and transform-based features [7]. Various image processing methods exist to process these features, including encoding-based, structure-based, and statistics-based methods [9]. Many recent methods in the literature have incorporated deep learning due to its high recognition accuracy and adaptability to various biometrics [10]. Training such deep learning models may require large datasets [10]; however, smaller datasets can also be utilized by employing effective data augmentation techniques.
The National Institute of Standards and Technology (NIST) recently discontinued several publicly available datasets from its catalog due to privacy issues [11,12]. To address the limited availability of data, synthetic palmprints have been generated, and the generation tool has been made available for public use. This approach was motivated by previous works such as Palm-GAN [13], which aimed to generate palmprint images using generative adversarial networks (GANs).
The primary motivations for creating synthetic images are their affordability, effectiveness, and ability to provide increased privacy during testing. Moreover, significant advancements in the quality and resolution of images generated by GANs have been made recently [7,8,9]. A standard GAN generator operates as follows: initially, rough low-resolution attributes are created, which are progressively refined through upsampling layers. These features are blended locally using convolution layers, and additional details are added via nonlinear processes [13]. However, despite these apparent similarities, existing GAN structures do not generate images in a naturally hierarchical way. While broader features primarily influence the presence of finer details, they do not precisely determine their locations [12]. Instead, a significant portion of the more detailed information is determined by fixed pixel coordinates.
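To make the coarse-to-fine pattern described above concrete, a single generator stage can be sketched as follows (a generic PyTorch illustration of upsampling, local convolutional blending, and a nonlinearity, not a layer from any specific StyleGAN release):

```python
import torch.nn as nn

class UpBlock(nn.Module):
    """One coarse-to-fine generator stage: upsample, blend locally, add detail."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")            # refine resolution
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)   # local blending
        self.act = nn.LeakyReLU(0.2)                                     # nonlinearity adds detail

    def forward(self, x):
        return self.act(self.conv(self.up(x)))
```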
Synthetic data can be relied upon instead of real-world data [13]. A generator model can learn from training images to produce synthetic images. In this scenario, synthetic data have an advantage over real data regarding enrollment, detection, and verification [14]. Large synthetic datasets can be produced at low cost and with little effort while posing no privacy risk [15]. A single synthetic image with well-controlled modifications can also alter and expand the dataset [13]. The traditional method of developing synthetic images involves changing the orientation of images and applying filters such as the Gabor filter, which changes the final structure of an image [5,10,16]. In other classical approaches, the orientation of the fingerprint or the skin color for facial biometrics is changed [11,12]. There is no traditional approach to generating synthetic palmprints [14].
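As an illustration of the classical filter-based augmentation used for other biometrics mentioned above, the following OpenCV sketch sweeps a Gabor filter over several orientations; the kernel parameters are arbitrary example values, not settings from any cited work:

```python
import cv2

def gabor_augment(image, theta):
    """Apply a Gabor filter at orientation theta (radians) to alter texture."""
    kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                lambd=10.0, gamma=0.5, psi=0.0)
    return cv2.filter2D(image, -1, kernel)

# e.g., for a loaded image `img`:
# variants = [gabor_augment(img, t) for t in (0.0, 0.79, 1.57, 2.36)]
```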
A framework for generating palm images using a style-based generator named StyleGAN2-ADA, a variation within the StyleGAN family, was previously introduced [1]. The current goal is to create synthetic images using different GANs, namely StyleGAN2-ADA and StyleGAN3, to demonstrate a more realistic transformation process in which the position of each detail in the image is determined entirely by the key features. To our knowledge, this is the first StyleGAN-based approach to generate high-resolution palm images up to 2048 × 2048 pixels. In a previous study, a TV-GAN-based framework was applied to generate palmprints; however, that work did not generate high-resolution images [13].
The contributions are as follows:
Our model utilizes a high-resolution, progressive-growth training approach, producing realistic shapes and hand–image boundaries without quality issues.
New quality metrics are developed to assess the usability of generated images and their similarity to original palm images, ensuring that the synthetic palm images do not reveal real identities.
The generated model is publicly available, representing the first StyleGAN-based palm image synthesis model.
The SIFT (Scale-Invariant Feature Transform)-based method to filter unwanted images from the generated synthetic images is open-sourced.
A novel script to detect finger anomalies in the field of palmprint recognition is open-sourced.
The rest of the paper is organized as follows. In Section 2, the pre-processing method and proposed model architecture are presented. In Section 3, we discuss the process of preparing the data for the experiment, and the training method for the model is described in Section 4. The implementation of model training is discussed in Section 5. The results and discussion are presented in Section 6, and the conclusion is provided in Section 7.
5. Implementation
5.1. Preparing Scripts and Datasets
The datasets were initially converted to the .tfrecords format because it requires less space than the original format. The StyleGAN2-ADA example script provided by NVIDIA [18] was modified to adjust its parameters to suit the dataset’s requirements. To accommodate the different palm features of left and right hands, such as principal and secondary lines, the datasets were grouped into various combinations, such as only right-hand datasets, only left-hand datasets, Dataset1 right-hand data combined with the rest of the dataset, and Dataset2 left-hand data combined with the rest of the dataset, to achieve better output. The training sessions for the database were grouped as shown in Table 1.
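As a rough sketch of the conversion step, the snippet below serializes a folder of palm images into a single .tfrecords file with TensorFlow’s standard tf.train.Example API; the directory layout and feature names (shape, image_raw) are our assumptions, and the actual conversion used NVIDIA’s tooling:

```python
import glob
import numpy as np
import tensorflow as tf
from PIL import Image

def images_to_tfrecords(image_dir, out_path):
    """Serialize every PNG in image_dir into a single .tfrecords file."""
    with tf.io.TFRecordWriter(out_path) as writer:
        for path in sorted(glob.glob(f"{image_dir}/*.png")):
            img = np.asarray(Image.open(path).convert("RGB"))
            example = tf.train.Example(features=tf.train.Features(feature={
                "shape": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=img.shape)),
                "image_raw": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[img.tobytes()])),
            }))
            writer.write(example.SerializeToString())

# e.g., images_to_tfrecords("palms/right", "db1_right.tfrecords")
```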
5.2. Training Models
StyleGAN2-ADA: The training was conducted on Google Colaboratory Pro Plus with 52 GB of RAM and a P100 GPU. The StyleGAN2-ADA model was trained for 500 epochs in a Jupyter notebook environment. During the training sessions, each dataset from DB1 and DB2 generated a model (.pkl) file every 100 epochs. Starting from 4 × 4 pixels as described in the architecture, the resolution increased progressively, and training concluded when it reached 512 × 512 pixels. The final (.pkl) file was used to save each model.
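A saved (.pkl) snapshot can later be loaded to synthesize new palms. The sketch below follows the usage pattern documented for NVIDIA’s stylegan2-ada-pytorch release (an assumption about the toolchain; network.pkl is a placeholder file name):

```python
import pickle
import torch

# Load the generator saved at the end of training (placeholder file name).
with open("network.pkl", "rb") as f:
    G = pickle.load(f)["G_ema"].cuda()  # exponential-moving-average generator

z = torch.randn([1, G.z_dim]).cuda()    # random latent code
c = None                                # no class labels for palm images
img = G(z, c)                           # NCHW float32, roughly in [-1, 1]

# Map to uint8 pixel values for saving or inspection.
img = (img * 127.5 + 128).clamp(0, 255).to(torch.uint8)
```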
StyleGAN3: The training utilized a 4080 GPU and 64 GB of RAM and was conducted in a Jupyter notebook environment. The training protocol consisted of multiple steps for monitoring and adjusting the model’s performance; the model was first trained for a total of 1000 epochs.
This research focused on ensuring the best quality of the generated images, particularly regarding their resolution, detail, and fidelity to the input data. We also monitored the training progress through periodic snapshots taken every fifty epochs, which allowed us to track the model’s evolution closely and make adjustments as needed.
5.3. Quality Assessment
Two quality evaluation methods were used: manual visual inspection and the Scale-Invariant Feature Transform (SIFT) algorithm [29]. Images showing visibly low quality were eliminated after manual inspection; these were classified as “poor-quality images”. Images that did not accurately capture the principal lines were filtered and separated from the dataset using a test script, which also verified whether the generated images contained anomalies such as six fingers or palm marker issues.
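The exact anomaly-detection script is provided in the shared repository; purely as an illustration, one common way to count fingers in a binarized hand image is via contour convexity defects in OpenCV, as sketched below (our assumption of an approach, not the authors’ script):

```python
import cv2

def count_fingers(mask):
    """Estimate finger count from a binary hand mask via convexity defects."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0
    hand = max(contours, key=cv2.contourArea)        # largest blob = hand
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    if defects is None:
        return 0
    # Deep defects correspond to the valleys between extended fingers.
    deep = sum(1 for i in range(defects.shape[0])
               if defects[i, 0, 3] / 256.0 > 20)     # depth is fixed-point
    return deep + 1

# An image with count_fingers(mask) > 5 would be flagged as anomalous.
```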
The SIFT-based image processing method was applied to further eliminate unwanted images. The training samples were resized to match the test samples, and all images were resized to 205 × 345 pixels to focus on the region of interest (ROI), specifically the palm.
Pixel ratios of the images were calculated: two distances between specific pairs of points were measured, and the ratio of the first distance to the second was then computed. Images that did not accurately identify the principal lines were excluded from the dataset using a ‘score value’ threshold. The threshold value for image quality is subjective to the algorithm and dataset being used; a ratio value between 0.2 and 0.8 typically indicates a well-matched image. For this research, an image pair was considered well matched for SIFT features if the ratio value exceeded 0.5.
Figure 7 shows the palm’s scaled and resized images (ROI).
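A SIFT-based filter of the kind described above could be implemented roughly as follows (an illustrative sketch rather than the exact open-sourced script; the 0.75 nearest-neighbor ratio is the conventional Lowe value, and the 0.5 cut-off mirrors the threshold stated above):

```python
import cv2

def sift_match_ratio(path_a, path_b):
    """Fraction of keypoints in image A that match a keypoint in image B."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    if not kp_a or not kp_b:
        return 0.0
    matcher = cv2.BFMatcher()
    # Lowe's ratio test keeps only distinctive nearest-neighbor matches.
    good = [m for m, n in matcher.knnMatch(des_a, des_b, k=2)
            if m.distance < 0.75 * n.distance]
    return len(good) / len(kp_a)

# A pair would be considered well matched if the ratio exceeds 0.5.
```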
5.4. Performance Evaluation
For StyleGAN2-ADA, the images were separated into “good-” and “poor-quality” categories. To ensure the uniqueness of each synthetic palm image, the SIFT algorithm was applied to the database for both StyleGAN2-ADA and StyleGAN3. Pairs of images were randomly selected and compared using SIFT feature extraction. Matches between two images were computed from their respective keypoints and descriptors, and the score was calculated as a matching percentage based on each image’s number of matches and keypoints. This process was performed iteratively eight hundred times, each time with different randomly selected images.
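The iterated uniqueness check might be organized as in the sketch below, which assumes the sift_match_ratio helper from the previous snippet and the eight hundred random comparisons reported above:

```python
import random

def uniqueness_scores(image_paths, iterations=800, seed=0):
    """Match randomly chosen image pairs and collect their matching percentages."""
    rng = random.Random(seed)
    scores = []
    for _ in range(iterations):
        a, b = rng.sample(image_paths, 2)             # two distinct random images
        scores.append(sift_match_ratio(a, b) * 100)   # matching percentage
    return scores

# Low scores across all sampled pairs indicate the synthetic palms are unique.
```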
To apply the SIFT algorithm to the ROI images for StyleGAN2-ADA and StyleGAN3, a total of eight hundred ROI palm and finger images were created to thoroughly verify the uniqueness of each synthetic hand image. Pairs of hand images to which SIFT had previously been applied were used for comparison, and the output score represented the computed similarity between their keypoints and descriptors. This process was performed iteratively ten times with different pairs.
For StyleGAN2-ADA, the 3439 synthetic images were divided into two classes, “good-quality” and “poor-quality”, as shown in Table 2. With 113 poor-quality images, 3328 images were classified as good. The images were tested with a script to determine how many were of high quality. The Python script is provided in the shared repository: https://github.com/rizvee007/palmphoto (accessed on 9 September 2024).
For StyleGAN3, the 1400 synthetic images were likewise separated into “good-” and “poor-quality” classes. With 21 poor-quality images across five different categories, the remaining 1379 images were classified as good. The “good-” and “poor-quality” images were fed into a test script to determine the number of quality images. The Python script is provided in the same shared repository: https://github.com/rizvee007/palmphoto (accessed on 9 September 2024).
To numerically measure the quality and diversity of the model, the Fréchet Inception Distance (FID) [25] was computed on the palmprint images generated by the model. FID, an extension of the Inception Score (IS) [26], assesses the quality of images generated by GANs and other generative models by comparing the statistics of generated samples to those of real samples.
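Given the Inception-feature means and covariances of the real and generated sets, FID reduces to a closed-form distance between two Gaussians. The sketch below is the standard formulation, not code from the paper’s repository:

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(mu_r, cov_r, mu_g, cov_g):
    """FID between Gaussians fitted to real and generated Inception features.

    FID = ||mu_r - mu_g||^2 + Tr(cov_r + cov_g - 2 * (cov_r @ cov_g)^(1/2))
    """
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):        # discard tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```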