Article

Estimation of Fractal Dimension and Detection of Fake Finger-Vein Images for Finger-Vein Recognition

Division of Electronics and Electrical Engineering, Dongguk University, 30 Pildong-ro 1-gil, Jung-gu, Seoul 04620, Republic of Korea
* Author to whom correspondence should be addressed.
Fractal Fract. 2024, 8(11), 646; https://doi.org/10.3390/fractalfract8110646
Submission received: 18 October 2024 / Revised: 28 October 2024 / Accepted: 29 October 2024 / Published: 31 October 2024

Abstract

With recent advancements in deep learning, spoofing techniques have evolved, and generative adversarial networks (GANs) have become an emerging threat to finger-vein recognition systems. Accordingly, previous research has generated finger-vein images for training spoof detectors; however, these approaches are limited and still cannot produce elaborate fake finger-vein images. We therefore develop a new densely updated contrastive learning-based self-attention generative adversarial network (DCS-GAN) to create elaborate fake finger-vein images, enabling the training of corresponding spoof detectors. Additionally, we propose an enhanced convolutional network for a next-dimension (ConvNeXt)-Small model with a large kernel attention module as a new spoof detector capable of distinguishing the generated fake finger-vein images. To improve the spoof detection performance of the proposed method, we introduce fractal dimension estimation to analyze the complexity and irregularity of class activation maps from real and fake finger-vein images, enabling the generation of more realistic and sophisticated fake finger-vein images. Experimental results obtained using two open databases showed that the fake images generated by the DCS-GAN exhibited Fréchet inception distances (FID) of 7.601 and 23.351, with Wasserstein distances (WD) of 18.158 and 10.123, respectively, confirming the possibility of spoof attacks against existing state-of-the-art (SOTA) spoof detection frameworks. Furthermore, experiments conducted with the proposed spoof detector yielded average classification error rates of 0.4% and 0.12% on the two aforementioned open databases, respectively, outperforming existing SOTA methods for spoof detection.

1. Introduction

The evolution of identity verification in security technologies can be characterized as follows: (1) methods using keys, security cards, IDs, etc. These carry the risk of loss as the item must always be carried. (2) Methods using passwords, personal identification numbers (PINs), pattern locks, etc. These require memorization and may be exposed through external factors. (3) Methods using biometric data like fingerprints, faces, irises, and finger-veins. These are advantageous for security as they are unique to each individual, require neither possession nor memorization, and are less susceptible to external exposure. With advantages such as high accuracy and convenience, biometrics has been extensively studied for application to a variety of tasks and is now used in many security fields. However, biometric recognition systems which use pattern recognition techniques to compare enrolled-user biometric images with real-time input remain vulnerable to spoof attacks that exploit stolen images or data through data breaches or hacking [1]. Therefore, there is a need for specialized research on the anti-spoofing of biometric systems.
With the advancement of deep learning technologies, spoofing techniques have also evolved. Generative adversarial networks (GANs), which train a generator and a discriminator in an adversarial relationship, can create image samples whose characteristic distribution is similar to that of the original images. Although images generated by GANs typically contain a unique, high-frequency GAN fingerprint that makes detection straightforward for classifiers such as convolutional neural networks (CNNs), existing research [2] has shown that this fingerprint can be removed by post-processing the generated images with various low-pass filters (Gaussian filter, median filter, average filter, etc.). Such post-processed generated images can cause conventional spoof detection mechanisms to fail, leading biometric recognition systems to miss adversarial spoof attacks. This enables unauthorized users to repeatedly gain access to sensitive information, potentially causing significant social, organizational, and financial losses.
Accordingly, previous research has adopted cycle-consistent adversarial networks (CycleGANs) to generate finger-vein images for the training of spoof detectors [3]. However, these approaches show limitations in generating elaborate fake finger-vein image samples. To overcome this challenge, a novel method is developed for generating fake finger-vein images, as well as a corresponding spoof detector for finger-veins. By integrating the fake finger-vein images generated by our method into conventional spoof detectors for additional training, or by directly applying our proposed spoof detection method, the security level of a finger-vein recognition system can be significantly enhanced, improving its robustness against spoof attacks.
Compared with previous works, our study has the following contributions:
- To resolve the issue of the generation of less elaborate fake finger-vein images by the existing methods, our study introduces a novel method for generating elaborate fake finger-vein images that can attack conventional finger-vein recognition systems. We propose the densely updated contrastive learning-based self-attention generative adversarial network (DCS-GAN);
- The DCS-GAN is trained using the adaptive moment estimation (Adam) optimizer with sharpness-aware minimization (SAM) to improve the model’s generalization. This allows for the creation of high-quality fake images. Furthermore, by updating the loss through a comparison of generated images and real images using a DenseNet-161 that is pre-trained on finger-vein data, the model can create fake images with a distribution similar to the original ones. Additionally, the inclusion of a self-attention layer in the generator emphasizes the finger-vein patterns, enhancing the quality of the generated images;
- The performance of spoof detection is improved by an enhanced convolutional network for a next-dimension (ConvNeXt) with a large kernel attention (LKA). This not only takes into account the adaptability in the spatial dimension, inherent to traditional self-attention, but also considers adaptability in the channel dimension, thereby computing long-range correlations and improving spoof detection;
- To improve the spoof detection performance of the proposed method, we introduce fractal dimension estimation to analyze the complexity and irregularity of class activation maps from real and fake finger-vein images, enabling the generation of more realistic and sophisticated fake finger-vein images. In addition, we freely share our DCS-GAN, enhanced ConvNeXt, algorithm codes, and generated fake finger-vein images through [4], so that researchers can utilize them for further study and ensure fair evaluations.
The rest of this manuscript is organized as follows. Section 2 analyzes the existing research, while Section 3 provides a thorough explanation of the proposed method. Section 4 presents the experimental results, which are then discussed in Section 5. Finally, Section 6 concludes the study.

2. Related Work

The research on finger-vein spoof attacks and detection can be categorized into two areas: spoof attacks and spoof detection. Therefore, the existing research related to spoof attacks, relying on fabricated objects and generated images, is analyzed herein. We also categorize and examine the existing research related to spoof detection based on machine learning and deep learning.

2.1. Spoof Attack

2.1.1. Using Fake Fabricated Artifacts

Previous research on finger-vein spoof attacks has mainly used handcrafted fake images to attempt spoof attacks. Nguyen et al. [5] printed 56 real finger-vein images on three types of paper—overhead projector (OHP) film, A4 paper, and matte paper—using a LaserJet printer at various resolutions: 300 (low-resolution), 1200 (middle-resolution), and 2400 (high-resolution) dots per inch (dpi). Considering the texture of the paper and the level of detail at each resolution, they generated a total of 7560 fake finger-vein images, attached these to fingers, and attempted spoof attacks. Tome et al. [6] printed 220 real images on paper using a LaserJet printer and enhanced the vein outlines using a board marker to carry out spoof attacks. Singh et al. [7] printed 468 real images on glossy paper using an inkjet printer and improved the quality of the vein patterns using an existing algorithm [8] to generate 468 fake images. Raghavendra and Busch [9] printed 100 real images on LaserJet and inkjet printers and replayed them on smartphone displays, creating a total of 300 fake images. Additionally, Krishnan et al. [10] used a prosthesis with an inkjet-printed finger-vein image attached and a thin rubber cap. Schuiki et al. [11] printed a finger-vein image on a LaserJet printer and attached it to wax. However, such fabricated artifacts for print and display attacks have limitations. Although they may look similar to the original (real) images, they suffer from issues such as paper texture, resolution, and noise introduced by the acquisition environment. Furthermore, they lack effectiveness against recently improved CNN-based spoof detectors.

2.1.2. Using Fake Generated Images

The evolving GANs, developed by many researchers, generate data via training through competition between their generator and discriminator networks. This has brought about the following positive effects. (1) They can be used as a data augmentation method in small datasets where data acquisition is difficult [12,13]. (2) They can generate labeled data in segmentation fields where labeling is challenging or expensive to carry out [14,15]. (3) They can address image degradation issues caused by low or high illumination, blur, noise, etc. [16,17]. However, the ability of GANs to produce images with high similarity to the original images has led to risks of spoof attacks in the biometric recognition field, including deep fakes. Although there has been considerable research on spoof detection against generated (fake) images in the domains of face, iris, and fingerprint recognition, there has been very little research on spoof detection for finger-vein recognition. Previous research using CycleGAN [18] to create fake finger-vein images similar to real images for spoof attacks had the drawback of not generating highly elaborate fake finger-vein images [3].

2.2. Spoof Detection

2.2.1. Machine Learning-Based Methods

Finger-vein images display low-quality characteristics. As described in Section 1, they contain extensive noise, including scattering and blurring, because they capture the pattern of veins under the skin of a finger illuminated by near-infrared (NIR) light. Therefore, conventional image processing methods have been applied in previous finger-vein spoof detection research. Raghavendra and Busch [9] applied a steerable pyramid to extract information about the various scales and directions in finger-vein images and used a support vector machine (SVM) for spoof detection as binary classification (real or fake). Tirunagari et al. [19] employed dynamic mode decomposition (DMD), a technique for analyzing the dynamic characteristics of data, and specifically used a windowed dynamic mode decomposition (W-DMD) approach, moving a sliding window across the entire time range of the data. Features of the images were then extracted and classified using an SVM. Kocher et al. [20] employed local binary patterns (LBP) to extract image features and performed real and fake classification through a linear SVM. Similarly, Nguyen et al. [5] transformed the input images into the frequency domain through the Fourier transform (FT) to extract information about the frequency bands in the image, or decomposed the low- and high-frequency components using Haar and Daubechies wavelet transforms, and then performed spoof detection through an SVM. Additionally, Bok et al. [21] extracted heart rate and blood flow signal characteristics from finger-vein videos using the discrete FT and used them to train an SVM for spoof detection. However, a drawback of the aforementioned studies is that their spoof detection performance degrades across the various methods used to create spoof data.

2.2.2. Deep Learning-Based Methods

Recent advancements in deep learning technology have led to research on spoof detection using CNNs. Nguyen et al. [1] used modified models of visual geometry group (VGG)-Net [22] and AlexNet [23] to extract feature maps from images. Subsequently, they performed dimensionality reduction on these feature maps with the help of principal component analysis (PCA), and conducted real/fake classification via an SVM. Shaheed et al. [24] employed only the entry flow of Xception [25] for feature extraction and performed spoof detection through a linear SVM. Kim et al. [3] used two types of ensemble networks, DenseNet-161 and DenseNet-169 [26], to obtain spoof detection scores. They then conducted score-fusion via SVM to classify them as real or fake. Additionally, Singh et al. [7] utilized SfS-Net [27] to acquire two types of images: normal-map and diffuse-map. They then extracted features using texture descriptors like LBP, local phase quantization (LPQ), and binarized statistical image features (BSIF), and used a linear SVM to obtain three different spoof detection scores. They classified real and fake data through SUM-rule fusion. However, the limitation of the aforementioned methods is that they do not achieve high accuracy in the spoof detection of more elaborately generated fake finger-vein images. To mitigate this problem, a spoof detection approach using an enhanced ConvNeXt-Small network is proposed in this study. Table 1 shows the comparisons of existing and proposed methods for spoof attack and spoof detection in finger-vein recognition.

3. Proposed Method

3.1. Flow Diagram of the Proposed Method

In this subsection, an overview of the proposed model, which is depicted in Figure 1, is described. Initially, for the spoof attack procedure, we extract the region of interest (ROI) from the input finger-vein image using the preprocessing method explained in Section 3.2. The extracted ROI image is then used as an input to the DCS-GAN to generate a fake finger-vein image. Subsequently, through low-pass filtering-based image blurring, such as median filter, Gaussian filter, and average filter blurring, we remove the GAN fingerprints present in the fake sample produced by the DCS-GAN, thus creating a more elaborate fake image. In the spoof detection procedure, the finger-vein image that has undergone post-processing is used as an input to the ConvNeXt with LKA, which ultimately classifies it as either a real or fake finger-vein image.
In our research, the synthesis of fake images and the training of our recognition model do not occur within a single computation cycle. The synthesis of fake images (the spoof attack procedure shown in Figure 1) is performed in advance. Afterwards, the recognition model is trained on the synthesized fake images and, once training is complete, determines which input images are fake (the spoof detection procedure in Figure 1). In other words, fake-image synthesis and recognition-model training are performed separately; consequently, a recognition system that was not trained with the synthesized fake images cannot identify which images are fake.

3.2. The Preprocessing of the Finger-Vein Images

The preprocessing step is to remove the background and detect the finger ROI in the original finger-vein image, which serves as an input to the finger-vein recognition system. In the finger-vein recognition system, NIR lighting is used, resulting in a structure that blocks external lighting. Consequently, the areas outside the finger contain a black background. To remove this background, it is necessary to detect the finger boundaries at the top, bottom, left, and right. For detecting the left and right boundaries, this study employed the average pixel brightness. Specifically, we calculated the average pixel value along the y-axis for each x-axis position and detected the right and left boundaries based on the x-axis positions where this average value exceeded a certain threshold. Because the penetration amount of the NIR light varies depending on the skin and thickness of the user’s finger, the threshold was adaptively determined on the basis of the average brightness from the input image. For the top and bottom finger boundaries, we detected the lines through filter operations using a 4 × 20 mask [28]. To address errors that may have occurred in the detection of the upper or lower boundaries, we compared the distance between the average value of all detected y-axis boundary coordinates and each detected y-axis boundary coordinate, eliminating outlier points that showed significant differences from the average, and then refined the boundary lines with the remaining points. Based on the refined boundaries, we apply bilinear interpolation to the obtained finger region to acquire the finger ROI of 224×224 pixels, which is used as the input to the pre-trained DCS-GAN.
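For illustration, a minimal sketch of this preprocessing is given below (OpenCV/NumPy assumed; the 4 × 20 mask and the outlier refinement of the upper/lower boundaries are simplified stand-ins for the steps described above, not the exact implementation):

```python
import cv2
import numpy as np

def extract_finger_roi(img, out_size=(224, 224)):
    """Detect the finger region and return a 224 x 224 ROI (bilinear resize).

    img: grayscale finger-vein image as a 2-D uint8 NumPy array.
    """
    h, w = img.shape

    # Left/right boundaries: average brightness of each column (mean over the y-axis),
    # thresholded adaptively at the overall mean brightness of the input image.
    col_mean = img.mean(axis=0)
    cols = np.where(col_mean > img.mean())[0]
    left, right = (cols[0], cols[-1]) if cols.size else (0, w - 1)

    # Top/bottom boundaries: filter with a 4 x 20 vertical-edge mask (a simplified
    # stand-in for the mask of [28]) and keep rows with a strong edge response.
    mask = np.vstack([np.ones((2, 20)), -np.ones((2, 20))]) / 40.0
    edge = np.abs(cv2.filter2D(img.astype(np.float32), -1, mask))
    rows = np.where(edge.max(axis=1) > edge.mean())[0]
    top, bottom = (rows[0], rows[-1]) if rows.size else (0, h - 1)

    # Crop the finger region and resize with bilinear interpolation.
    roi = img[top:bottom + 1, left:right + 1]
    return cv2.resize(roi, out_size, interpolation=cv2.INTER_LINEAR)
```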

3.3. Spoof Attack Procedure

3.3.1. Generation of Fake Finger-Vein Image Using DCS-GAN

In this study, the structure of the DCS-GAN used to produce fake finger-vein image samples is displayed in Figure 2. Detailed content about the DCS-GAN generator and discriminator is provided in Table 2 and Table 3, respectively. In the DCS-GAN, the correlation between patches in the features attained by the encoder of the input real image and patches in the feature map extracted from the encoder of the generated fake image is calculated through a patch sample MLP composed of two dense layers, updating the loss. Additionally, the generator’s encoder and an additional encoder share weights, enabling the maximization of mutual information by applying contrastive learning [29].
When acquiring finger-vein images, a NIR camera is used to capture the finger-vein, which is illuminated by NIR light, and the acquired images will show fingerprint marks. In this study, we improved the image reality by adding self-attention [30] after each residual block, as indicated in Table 2, to both preserve the fingerprint texture and emphasize the patterns of veins in the fake image.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{1}$$
Equation (1) defines self-attention, where the query, key, and value are represented by $Q$, $K$, and $V$, respectively. A query is an information vector generated at a specific location in the input data, and a key is an information vector generated at another location; a value represents the actual information at each location. First, the similarity (relevance) is computed through the dot product between the query and key ($QK^{T}$), and scaling by $\sqrt{d_k}$ is applied for smoother model training. Second, a softmax activation function is applied to obtain a probability distribution, which is finally applied to the input data ($V$) as a self-attention map to emphasize relevant information.
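For reference, Equation (1) corresponds to the following minimal PyTorch sketch (tensor shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Equation (1): softmax(QK^T / sqrt(d_k)) V.

    Q, K: tensors of shape (batch, num_positions, d_k); V: (batch, num_positions, d_v).
    """
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # similarity between positions
    attn = F.softmax(scores, dim=-1)                # probability distribution over keys
    return attn @ V                                 # attention-weighted values
```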
For training the DCS-GAN, we used SAM [31] along with Adam [32] as the optimizer. Adam, which is widely used in existing research, generally offers excellent performance in training data compared with other optimizers, but it poses the risk of overfitting, leading to weaker performance on validation and testing data [33]. SAM helps to reach global minima by smoothing sharp minima during training. As a result, the model can avoid the risk of overfitting (i.e., not being overly tailored to training data) and show improved generalization performance on new data. Therefore, using SAM and Adam together, we were able to improve the generalization performance, which is crucial in generative models, and generate high-quality images. Equations (2)–(6) describe the operations of SAM.
$$L_D(\omega) \le L_S(\omega + \epsilon) + h\!\left(\frac{\|\omega\|_2^2}{\sigma^2}\right) \tag{2}$$
$$L_D(\omega) \le \max_{\|\epsilon\|_2 \le \sigma}\left[L_S(\omega + \epsilon) - L_S(\omega)\right] + L_S(\omega) + h\!\left(\frac{\|\omega\|_2^2}{\sigma^2}\right) \tag{3}$$
$$\min_{\omega}\; L_S^{SAM}(\omega) + \mu\|\omega\|_2^2 \quad \text{where} \quad L_S^{SAM}(\omega) \triangleq \max_{\|\epsilon\|_2 \le \sigma} L_S(\omega + \epsilon) \tag{4}$$
$$\nabla_{\omega} L_S^{SAM}(\omega) \approx \nabla_{\omega} L_S(\omega + \hat{\epsilon}(\omega)) = \frac{d(\omega + \hat{\epsilon}(\omega))}{d\omega}\,\nabla_{\omega} L_S(\omega)\Big|_{\omega + \hat{\epsilon}(\omega)} = \nabla_{\omega} L_S(\omega)\Big|_{\omega + \hat{\epsilon}(\omega)} + \frac{d\hat{\epsilon}(\omega)}{d\omega}\,\nabla_{\omega} L_S(\omega)\Big|_{\omega + \hat{\epsilon}(\omega)} \tag{5}$$
$$\nabla_{\omega} L_S^{SAM}(\omega) \approx \nabla_{\omega} L_S(\omega)\Big|_{\omega + \hat{\epsilon}(\omega)} \tag{6}$$
In Equations (2) and (3), $\omega$ is the vector of classifier parameters, and $h$ is a strictly increasing function. One of the differences between Equations (2) and (3) is that $L_S(\omega)$ is added in Equation (3). In this formulation, the term $\max_{\|\epsilon\|_2 \le \sigma}[L_S(\omega+\epsilon) - L_S(\omega)]$ represents the sharpness, indicating how much the loss value varies when $\omega$ is perturbed by $\epsilon$, while the term $L_S(\omega)$ is the loss on the training data, as in other existing methods. Additionally, the term $h(\|\omega\|_2^2/\sigma^2)$ is a regularization term related to the magnitude of $\omega$; here, an L2 regularizer is employed. As a result, both the sharpness and the training loss are minimized, enabling the model to find a relatively flat region. On this basis, the problem of selecting optimal parameter values is formulated in Equation (4). While the SAM gradient would be calculated as shown in Equation (5), the term $\frac{d\hat{\epsilon}(\omega)}{d\omega}\nabla_\omega L_S(\omega)\big|_{\omega+\hat{\epsilon}(\omega)}$ involves computing a Hessian matrix, which is computationally expensive and is therefore dropped to avoid slowing down training. Consequently, Equation (6) gives the final SAM gradient.
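A minimal sketch of one SAM update step combined with a base optimizer such as Adam is shown below (PyTorch assumed; the hyperparameter rho plays the role of σ above, and the two forward/backward passes follow Equations (4)–(6); this is an illustrative approximation, not the exact training code):

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """One SAM update: ascend to w + eps_hat(w), take the gradient there (Eq. (6)),
    then restore w and let the base optimizer (e.g., Adam) apply the update."""
    # First pass: gradient at the current weights w.
    loss = loss_fn(model(x), y)
    loss.backward()

    # Ascent step: eps_hat = rho * g / ||g||_2, applied to every parameter.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2) + 1e-12
    eps_list = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            eps = rho * p.grad / grad_norm
            p.add_(eps)
            eps_list.append((p, eps))
    model.zero_grad()

    # Second pass: gradient at the perturbed weights, then restore w and update.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, eps in eps_list:
            p.sub_(eps)
    base_optimizer.step()
    model.zero_grad()
    return loss.item()
```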
To address the issue of the vanishing gradient in the generator caused by the sigmoid cross-entropy loss function that is traditionally employed to train the discriminator in GANs, Mao et al. [34] updated the generator by using the least square loss. This computes the distances between the distributions of the original samples not as divergence but as a least square error, thus penalizing the image samples that are not close to the decision boundary and enabling the generation of samples closer to real images. For that reason, we selected the least squares GAN (LSGAN) loss for smooth training of the GAN.
$$L_{Discriminator}(Gen, Dis, X, Y) = \frac{1}{2}\,\mathbb{E}_{y\sim Y}\!\left[(Dis(y) - b)^2\right] + \frac{1}{2}\,\mathbb{E}_{x\sim X}\!\left[(Dis(Gen(x)) - a)^2\right] \tag{7}$$
$$L_{Generator}(Gen, Dis, X) = \frac{1}{2}\,\mathbb{E}_{x\sim X}\!\left[(Dis(Gen(x)) - c)^2\right] \tag{8}$$
Equation (7) describes the LSGAN loss used for training the DCS-GAN discriminator in this study, where $a$ and $b$ denote the labels for fake and real images, respectively, and $Dis$ and $Gen$ represent the discriminator and generator models. To minimize this equation, the term $\frac{1}{2}\mathbb{E}_{y\sim Y}[(Dis(y)-b)^2]$ requires $Dis(y)=b$, and the term $\frac{1}{2}\mathbb{E}_{x\sim X}[(Dis(Gen(x))-a)^2]$ requires $Dis(Gen(x))=a$. In other words, Equation (7) drives the discriminator to classify real images $y$ as the real-image label $b$ and images generated from $x$ as the fake-image label $a$. Conversely, Equation (8) is used for training the generator; to minimize its value, the term $\frac{1}{2}\mathbb{E}_{x\sim X}[(Dis(Gen(x))-c)^2]$ requires $Dis(Gen(x))=c$. Essentially, the generated finger-vein images should be classified by the discriminator as $c$, i.e., not as having the fake-image label.
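The LSGAN terms of Equations (7) and (8) can be written compactly as follows (a minimal PyTorch sketch; the concrete label values a, b, and c are assumptions, e.g., a = 0 and b = c = 1):

```python
import torch

def lsgan_losses(dis_real, dis_fake, a=0.0, b=1.0, c=1.0):
    """Equations (7) and (8) as least-squares losses.

    dis_real: discriminator outputs for real images y.
    dis_fake: discriminator outputs for generated images Gen(x).
    a: fake label, b: real label, c: label the generator wants for its output.
    """
    d_loss = 0.5 * torch.mean((dis_real - b) ** 2) + 0.5 * torch.mean((dis_fake - a) ** 2)
    g_loss = 0.5 * torch.mean((dis_fake - c) ** 2)   # generator pushes Dis(Gen(x)) toward c
    return d_loss, g_loss
```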
In this study, besides the typical generator and discriminator constituting a GAN, we also employed a separate encoder section of the generator and an additional patch sample MLP to maximize the amount of mutual information. Based on this, we calculated a multilayer, patchwise contrastive loss aimed at correlating the same patches in the feature maps of real and fake images while not correlating the different patches, as shown in Equation (9).
$$L_{Patch}(Gen, Mlp, X) = \mathbb{E}_{x\sim X}\sum_{l=1}^{L}\sum_{s=1}^{S_l} \ell\!\left(\hat{z}_l^{\,s},\, z_l^{\,s},\, z_l^{\,S\setminus s}\right) \tag{9}$$
In Equation (9), $Mlp$ is the patch sample MLP, $L$ denotes the number of layers with $l \in \{1, 2, \ldots, L\}$, and $S_l$ denotes the number of spatial locations in layer $l$ with $s \in \{1, 2, \ldots, S_l\}$. The term $\mathbb{E}_{x\sim X}\sum_{l=1}^{L}\sum_{s=1}^{S_l}\ell(\hat{z}_l^{\,s}, z_l^{\,s}, z_l^{\,S\setminus s})$ is obtained by feeding the output feature maps of the encoder layers into the MLP at the sampled spatial locations, where $\hat{z}_l^{\,s}$ is the embedding of the corresponding patch of the image output by the generator.
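A simplified single-layer version of the patchwise contrastive term in Equation (9) is sketched below (PyTorch; the sampling of patch locations and the exact PatchNCE formulation of [29] are abstracted away, and the temperature value is an assumption):

```python
import torch
import torch.nn.functional as F

def patch_contrastive_loss(feat_fake, feat_real, tau=0.07):
    """InfoNCE-style patch contrast for one encoder layer.

    feat_fake, feat_real: (S, D) embeddings of S sampled patch locations produced by
    the patch-sample MLP from the fake and real encoder features at the same locations.
    The patch at the same spatial index s is the positive pair; all others are negatives.
    """
    feat_fake = F.normalize(feat_fake, dim=1)
    feat_real = F.normalize(feat_real, dim=1)
    logits = feat_fake @ feat_real.t() / tau          # (S, S) similarity matrix
    targets = torch.arange(feat_fake.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, targets)
```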
In the case of GANs for general image-to-image translation, the main objective is to map input images to output images while preserving their shapes but altering their internal structures. However, our objective is to generate fake images through GANs that a conventional finger-vein spoof detector cannot distinguish from real input images. Therefore, we additionally apply a perceptual loss [35] that compares feature maps between input real images and output fake images. The conventional perceptual loss employs a VGG-16 model pre-trained on ImageNet; however, as this study targets spoof attacks on, and spoof detection for, finger-vein images, DenseNet-161 [3], pre-trained on real and fake finger-vein images, is used as the feature extractor.
$$L_{perceptual}(Gen, X, Y) = \frac{1}{H_{i,j} W_{i,j} C_{i,j}} \sum_{h=1}^{H_{i,j}} \sum_{w=1}^{W_{i,j}} \sum_{c=1}^{C_{i,j}} \left(\phi_{i,j}(Gen(x))_{h,w,c} - \phi_{i,j}(y)_{h,w,c}\right)^2 \tag{10}$$
In Equation (10), $\phi_{i,j}$ refers to the feature map obtained after the $j$th convolution and before the $i$th max-pooling layer in the pre-trained DenseNet-161 model, and $H_{i,j}$, $W_{i,j}$, and $C_{i,j}$ represent the height, width, and channel dimensions of that feature map, respectively. Therefore, the term $(\phi_{i,j}(Gen(x))_{h,w,c} - \phi_{i,j}(y)_{h,w,c})^2$ signifies the squared Euclidean distance between the generated fake image sample and the real sample in feature space. Finally, Equation (11) combines Equations (8)–(10) to represent the loss used for training the generator in this study.
$$L_{Generator}(Gen, Dis, X) + L_{Patch}(Gen, Mlp, X) + L_{perceptual}(Gen, X, Y) \tag{11}$$
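Combining the terms of Equation (11) can then be sketched as follows (equal weighting of the three losses is an assumption; patch_loss is the value of Equation (9), for instance from the sketch above, and feat_fake/feat_real denote DenseNet-161 feature maps of the fake and real images):

```python
import torch

def total_generator_loss(dis_fake, patch_loss, feat_fake, feat_real, c=1.0):
    """Equation (11): adversarial (Eq. (8)) + patchwise contrastive (Eq. (9))
    + perceptual (Eq. (10)) terms, with equal weights assumed."""
    adv = 0.5 * torch.mean((dis_fake - c) ** 2)
    # Perceptual term: mean squared distance over all H x W x C feature positions.
    perceptual = torch.mean((feat_fake - feat_real) ** 2)
    return adv + patch_loss + perceptual
```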
In this study, to generate fake images that are as similar as possible to real images, we used real images as the input images during the DCS-GAN training process. For the target image, we excluded the input image itself and used one of the remaining real images within the input image’s intra-class. The target image was randomly chosen to facilitate smooth learning through the use of diverse inputs. Figure 3 represents the samples of inputs for the DCS-GAN.

3.3.2. The Post-Processing Stage for the Generation of Fake Finger-Vein Images

In the spoof attack procedure in Figure 1, post-processing involves removing the traces of fake image generation. Previously, synthetic images generated by GANs contained a ‘GAN fingerprint,’ so spoof detection was relatively straightforward. However, researchers have found that such high-frequency components can be removed by low-pass filters like the median filter, Gaussian filter, and average filter [2]. Due to this, the risk of spoofing through GANs has increased. Therefore, in this study, we applied the Gaussian filter, median filter, and average filter individually and compared their effects on the spoof detection performance.
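A minimal sketch of this post-processing step with OpenCV is shown below (the filter type and kernel size, e.g., median 5 × 5 or Gaussian 3 × 3, are selected experimentally):

```python
import cv2

def remove_gan_fingerprint(fake_img, method="median", ksize=5):
    """Low-pass post-processing used to suppress high-frequency GAN fingerprints.

    fake_img: generated image as a uint8 NumPy array.
    """
    if method == "median":
        return cv2.medianBlur(fake_img, ksize)
    if method == "gaussian":
        return cv2.GaussianBlur(fake_img, (ksize, ksize), 0)
    return cv2.blur(fake_img, (ksize, ksize))   # average filter
```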

3.4. Spoof Detection Procedure

Spoof Detection of Fake-Vein Image by Enhanced ConvNeXt

In this study, we chose ConvNeXt-Small [36] as the base model for detecting fake finger-vein images. ConvNeXt achieves SOTA performance through various improvements such as the use of a stage compute ratio, stem layer (patchify), and inverted bottleneck. For these reasons, and considering computational efficiency, we used ConvNeXt-Small as the base model for spoof detection in this study. ConvNeXt-Small consists of a structure with ConvNeXt Block (1) × 3, ConvNeXt Block (2) × 3, ConvNeXt Block (3) × 27, and ConvNeXt Block (4) × 3. Here, ConvNeXt Blocks (1)–(4) are different blocks, and the × number indicates the number of repetitions. Unlike conventional CNN models, ConvNeXt Blocks utilize a 7 × 7-size kernel to expand the receptive field, thereby enhancing the model’s performance. To further improve the performance of the existing ConvNeXt model, this study proposes an enhanced ConvNeXt-Small model that additionally employs LKA [37] after the last ConvNeXt Block to enable self-adaptation and long-range correlations. This allows emphasized feature maps to be transmitted to the classifier for the spoof detection problem, which involves real or fake classification (binary classification). The structure of the enhanced ConvNeXt-Small is detailed in Figure 4 and Table 4.
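A minimal PyTorch sketch of the LKA module appended after the last ConvNeXt block is given below (the 5 × 5 depth-wise, 7 × 7 dilated depth-wise, and 1 × 1 convolutions follow the decomposition described in [37]; its placement here follows Table 4, while the exact configuration in our implementation may differ):

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Large kernel attention (LKA) block.

    A large receptive field is decomposed into a 5x5 depth-wise convolution,
    a 7x7 depth-wise dilated convolution (dilation 3), and a 1x1 convolution;
    the result modulates the input feature map as an attention map, giving
    adaptability in both the spatial and channel dimensions.
    """
    def __init__(self, channels):
        super().__init__()
        self.dw_conv = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw_dilated = nn.Conv2d(channels, channels, 7, padding=9,
                                    dilation=3, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.pointwise(self.dw_dilated(self.dw_conv(x)))
        return x * attn    # spatially and channel-wise adaptive re-weighting
```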

3.5. Fractal Dimension Estimation

Fractals are complex structures that display self-similarity and diverge from traditional geometric rules [38]. The fractal dimension (FD) quantifies the complexity of a shape, indicating whether it is more concentrated or dispersed. In this study, a binary image representing the activated region of finger-vein images (real or fake) is used for FD estimation. The FD in this context ranges between almost one and two, reflecting different degrees of complexity. This range encompasses various representations of binary class activation maps (BCAMs), with higher FD values corresponding to greater shape intricacy. The FD for the activated region is calculated using the box-counting method [39,40], where C represents the number of boxes that evenly cover each activated region, and δ is the scaling factor of the boxes. The FD is determined using Equation (12).
$$FD = \lim_{\delta \to 0} \frac{\log(C(\delta))}{\log(1/\delta)} \tag{12}$$
where $1 \le FD \le 2$, and for every $\delta > 0$ there exists a corresponding count $C(\delta)$. The pseudocode for estimating the FD of the activated part of the finger-vein image using the box-counting method is provided in Algorithm 1.
Algorithm 1 Pseudocode for Fractal Dimension (FD) Estimation
Input: BCAM: Binary class activation map extracted from the DCS-GAN’s encoder
Output: FD: Fractal dimension
1:  Find the largest dimension of the box size and adjust it to the nearest power of 2
     Max_dimension = max(size(BCAM))
     δ = 2^⌈log2(Max_dimension)⌉
2:  If the image is smaller than δ, pad it to match δ’s dimension
     if size(BCAM) < size(δ)
       Pad_width = ((0, δ − BCAM.shape[0]), (0, δ − BCAM.shape[1]))
       Pad_BCAM = pad(BCAM, Pad_width, mode = ‘constant’, constant_values = 0)
     else
       Pad_BCAM = BCAM
3:  Initialize an array to store the number of boxes corresponding to each box size
     n = zeros(1, δ + 1)
4:  Compute the number of boxes C(δ) containing at least one pixel of the positive region
     n[δ + 1] = sum(Pad_BCAM[:])
5:  While δ > 1:
       a. Divide the size of δ by 2
       b. Reassign the number of boxes C(δ)
6:  Compute log(C(δ)) and log(1/δ) for each δ
7:  Fit a line to the points (log(1/δ), log(C(δ))) using the least squares method
8:  The fractal dimension (FD) is given by the slope of the fitted line
Return FD
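For reproducibility, a runnable NumPy counterpart of Algorithm 1 is sketched below (a minimal implementation assuming a 2-D binary array as input; padding to a power of two and the least-squares fit follow the steps above):

```python
import numpy as np

def fractal_dimension(bcam):
    """Box-counting estimate of the fractal dimension (Algorithm 1 / Equation (12)).

    bcam: 2-D binary array (the binarized class activation map); returns the slope of
    log C(delta) versus log(1/delta) obtained by a least-squares line fit.
    """
    bcam = (np.asarray(bcam) > 0).astype(np.uint8)
    size = 2 ** int(np.ceil(np.log2(max(bcam.shape))))      # pad to a power of two
    padded = np.zeros((size, size), dtype=np.uint8)
    padded[:bcam.shape[0], :bcam.shape[1]] = bcam

    deltas, counts = [], []
    delta = size
    while delta >= 1:
        n_blocks = size // delta
        blocks = padded.reshape(n_blocks, delta, n_blocks, delta)
        # C(delta): number of delta x delta boxes containing at least one positive pixel.
        c = np.count_nonzero(blocks.sum(axis=(1, 3)))
        deltas.append(delta)
        counts.append(max(c, 1))
        delta //= 2

    slope, _ = np.polyfit(np.log(1.0 / np.array(deltas)), np.log(counts), 1)
    return slope
```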

4. Experimental Results

4.1. Experimental Database and Setups

For the performance evaluation of the DCS-GAN for generating fake finger-vein images in the spoof attack procedure and the enhanced ConvNeXt-Small for detecting forgeries in the spoof detection procedure, we used real finger-vein images from two open databases: the ISPR database [1] and the Idiap database [41]. The ISPR database consists of a total of 3300 real images captured from all fingers of both hands of 33 individuals, each captured 10 times (10 trials × 33 individuals × 2 hands × 5 fingers). The Idiap database comprises a total of 440 real images captured from the index fingers from both hands of 110 individuals, each captured twice (2 trials × 110 individuals × 2 hands × 1 finger). In Table 5, a description of the ISPR and Idiap databases is presented, and Figure 5 shows examples from both databases.
The experimental work of this study was performed using a desktop computer equipped with an Intel® Core (TM) i7-9700F central processing unit (CPU) operating at 3.0 GHz, supplemented by 32 GB of RAM and an NVIDIA GeForce RTX 3060 graphics processing unit (GPU). This graphics card includes 3584 compute unified device architecture (CUDA) cores and has a total of 12 GB of dedicated graphics memory [42].

4.2. Training of the Proposed Networks

All experiments in this study were carried out using two-fold cross-validation. Specifically, in the first fold validation, half of the total data were used for training of the network, while the remaining half were used to test the network. In the second fold validation, this was reversed. The final testing accuracy was calculated by averaging the testing accuracies from the two folds. From the training data, 10% of the data were used as a validation set to avoid model overfitting. For effective training, the ISPR database images were resized to 256 × 256 and then subjected to random crop augmentation to a size of 224 × 224. Particularly, in the Idiap database, due to the high risk of overfitting with the use of only 440 real images, each real image was subjected to 10-pixel shifts in all four directions (up, down, left, and right), resulting in a total of 1760 training images (440 images × 4 directions). Figure 6 shows examples of the shift augmentation applied to the Idiap database.
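As an illustration of the shift augmentation applied to the Idiap training set, the following minimal sketch (NumPy; zero-filling of the exposed border is an assumption) produces the four shifted copies of a real image:

```python
import numpy as np

def shift_augment(img, shift=10):
    """Shift the image by `shift` pixels up, down, left, and right, so that each
    real image yields four additional training images."""
    h, w = img.shape[:2]
    out = []
    for dy, dx in [(-shift, 0), (shift, 0), (0, -shift), (0, shift)]:
        shifted = np.zeros_like(img)
        src_y = slice(max(0, -dy), h - max(0, dy))
        dst_y = slice(max(0, dy), h - max(0, -dy))
        src_x = slice(max(0, -dx), w - max(0, dx))
        dst_x = slice(max(0, dx), w - max(0, -dx))
        shifted[dst_y, dst_x] = img[src_y, src_x]
        out.append(shifted)
    return out
```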

4.2.1. Training of DCS-GAN for Spoof Attack

For the spoof attack procedure, the DCS-GAN was trained to generate fake images that are similar to the real images. The initial learning rate was set at 0.0002 and decayed at a rate of 0.9 every 10,000 steps, for a total of 400 epochs. Table 6 provides details about the training parameters. Figure 7a shows the training loss graphs for both the generator and the discriminator; these graphs indicate that training converged sufficiently on the training split. Figure 7b displays the validation loss graphs for the generator and the discriminator, confirming that the DCS-GAN did not overfit the training data.

4.2.2. Training of Enhanced ConvNeXt-Small for Spoof Detection

In the procedures for detecting spoof attacks with fake finger-vein images obtained by the DCS-GAN, Adam was used as the optimizer and cross-entropy loss was employed as the loss function. Table 7 details the training parameters.
To train the enhanced ConvNeXt-Small for spoof detection, we mixed the original (real) images with the generated (fake) images from the DCS-GAN and trained it with two-fold cross validation. Figure 8a presents the resulting training accuracy and loss graphs for the enhanced ConvNeXt-Small. This demonstrates that, with the increase in epochs, both the accuracy and loss converge, indicating sufficient training on the data. Figure 8b presents the validation accuracy and loss graphs of the enhanced ConvNeXt-Small, confirming that the model has not overfitted on the training data.

4.3. Testing of Proposed Model

4.3.1. Evaluation Metrics

In this study, the quality of the images generated by the GAN was evaluated using the Fréchet inception distance (FID) [43] as per Equation (13), which has been commonly used in previous research [3,29]. Additionally, the Wasserstein distance (WD) [44], as defined in Equation (14), was also used, as in previous work evaluating the quality of uneven illumination-corrected vein images [17]. The quality was assessed by comparing real (original) images with fake (generated) images.
$$FID = \left\|\mu_{real} - \mu_{fake}\right\|^2 + \mathrm{Tr}\!\left(\Sigma_{real} + \Sigma_{fake} - 2\left(\Sigma_{real}\Sigma_{fake}\right)^{1/2}\right) \tag{13}$$
$$WD_p(P, Q) = \left(\inf_{\gamma \in \Pi(P, Q)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \left\|I_{real} - I_{fake}\right\|^p \, d\gamma\right)^{1/p} \tag{14}$$
In Equation (13), $\mu_{real}$ and $\mu_{fake}$ represent the means of the real and fake image samples, respectively, computed from features of Inception-v3 pre-trained on ImageNet; $\mathrm{Tr}$ denotes the trace (sum of diagonal elements); and $\Sigma_{real}$ and $\Sigma_{fake}$ represent the corresponding covariance matrices. In Equation (14), $\Pi(P, Q)$ denotes the set of joint probability distributions whose marginals are $P$ and $Q$, $\mathbb{R}^d \times \mathbb{R}^d$ is the product space over which the integral is taken, and $d\gamma$ denotes the measure according to the joint distribution $\gamma$.
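For reference, Equation (13) can be computed from sample statistics as in the minimal sketch below (NumPy/SciPy; the feature vectors are assumed to be Inception-v3 activations as in [43]):

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_fake):
    """Equation (13) computed on feature vectors; feats_real, feats_fake: (N, D) arrays."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)             # matrix square root of the product
    if np.iscomplexobj(covmean):
        covmean = covmean.real                        # discard numerical imaginary parts
    return float(np.sum((mu_r - mu_f) ** 2)
                 + np.trace(cov_r + cov_f - 2.0 * covmean))
```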
To test the performance of the model’s spoof detection, we used the attack presentation classification error rate (APCER), as per Equation (15), according to the ISO/IEC-30107 standard [45]. We also used the bona fide presentation classification error rate (BPCER) in Equation (16) to indicate the error rate of incorrectly classifying real images as fake. Additionally, the average classification error rate (ACER) was calculated using Equation (17), representing the average error rate between the APCER and BPCER.
$$APCER = 1 - \frac{1}{I_{fake}}\sum_{i=1}^{I_{fake}} Detector_i \tag{15}$$
$$BPCER = \frac{1}{I_{real}}\sum_{i=1}^{I_{real}} Detector_i \tag{16}$$
$$ACER = \frac{1}{2}\left(APCER + BPCER\right) \tag{17}$$
$I_{real}$ refers to the number of real (original) images, and $I_{fake}$ refers to the number of fake (generated) images. Additionally, $Detector_i$ refers to the predicted label obtained from the spoof detector for the $i$th image. Therefore, in Equation (15), $Detector_i$ takes a value of 1 if a fake image is correctly classified and 0 otherwise; in Equation (16), $Detector_i$ takes a value of 0 if a real image is correctly classified and 1 otherwise.
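A minimal sketch of Equations (15)–(17), assuming the detector outputs one binary label per image (1 = classified as fake, 0 = classified as real):

```python
import numpy as np

def spoof_detection_errors(pred_fake, pred_real):
    """Compute APCER, BPCER, and ACER from detector outputs.

    pred_fake: labels predicted for the fake test images.
    pred_real: labels predicted for the real test images.
    """
    pred_fake = np.asarray(pred_fake, dtype=float)
    pred_real = np.asarray(pred_real, dtype=float)
    apcer = 1.0 - pred_fake.mean()      # fake images accepted as real
    bpcer = pred_real.mean()            # real images rejected as fake
    acer = 0.5 * (apcer + bpcer)
    return apcer, bpcer, acer
```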

4.3.2. Performance Test of the Spoof Attack

4.3.2.1. Ablation Studies

As the first step of our ablation study measuring the performance of the spoof attack, we compared the performance obtained by incrementally removing the key modules of the DCS-GAN on the ISPR database. As indicated in Table 8, the CUT model (without SAM and self-attention) with the DenseNet-based perceptual loss achieved the best WD metric, whereas the full DCS-GAN model achieved the best FID metric. However, the goal of this research is to improve the performance of attacks against spoof detectors. The FID metric reflects the performance of spoofing attacks against CNN-based spoof detectors, as it uses features obtained from a pre-trained Inception-v3 model [46], whereas the WD metric is usually used to evaluate the simple quality of uneven illumination-corrected images based on differences in pixel distribution [17]. Therefore, the FID metric provides a more accurate measure of performance than the WD. To check the quality of the fake finger-vein images, we evaluated their effectiveness in spoof attacks using existing finger-vein spoof detectors. Specifically, we used DenseNet-161 and DenseNet-169, as in previous research [3], to evaluate the generated images listed in Table 8 using the ACER metric. As is evident from Table 9, the DCS-GAN model with all proposed modules generated fake images that yielded the highest spoof detection error rates: an average ACER of 1.05% for DenseNet-161 and 1.03% for DenseNet-169. This confirms that our fully equipped DCS-GAN model produces fake images that are the most similar to real images. For the next ablation study, we compared the effects of the data augmentation techniques applied to the Idiap database, for which there was a shortage of real images during DCS-GAN training. As demonstrated in Table 10, applying data augmentation via random cropping from 256 × 256 to 224 × 224 and shifting in four directions (up, down, left, right) resulted in the highest performance.
For the next ablation study, we compared the spoof detection performance according to the post-processing described in Section 3.3.2. All spoof detection training in this work was performed under the assumption that the actual conditions of spoof attacks—specifically, how the fake images were generated—were unknown. We trained our model using fake images without post-processing, while fake images with post-processing were only used for testing. The experimental results, as shown in Table 11, reveal that fake images generated from the ISPR database and post-processed with a median 5 × 5 filter led to the highest spoof detection errors: 5.74% for DenseNet-161 and 7.31% for DenseNet-169. Specifically, post-processing with the median 5 × 5 filter produced fake images that were the most similar to real images, hindering the spoof detection. For all subsequent experiments using the ISPR database, we used images post-processed with the median 5 × 5 filter. Additionally, as shown in Table 12, the fake images generated from the Idiap database and post-processed with the Gaussian 3 × 3 filter also exhibited the highest spoof detection errors: 2.5% for both DenseNet-161 and DenseNet-169. Hence, using the Gaussian 3 × 3 filter for post-processing produced the fake images that were the most similar to real images, effectively challenging the spoof detection. For all subsequent experiments using the Idiap database, we used images post-processed with the Gaussian 3 × 3 filter.

4.3.2.2. Comparing Image Quality by the Proposed and SOTA Approaches

We compared how similar the fake images generated by our DCS-GAN and those generated by the SOTA methods are to the real (original) images. As shown in Table 13, the DCS-GAN outperformed all other methods when it was evaluated using the FID metric. In contrast, when using the WD metric, Pix2Pix or Pix2PixHD showed the highest results. As explained in Section 4.3.2.1, the FID metric provides a more accurate measure of performance than the WD metric. In some evaluation metrics, the difference between the DCS-GAN and SOTA methods is small. However, the final goal of this study is to generate more realistic fake images, which makes detecting spoof attacks more difficult than it is when using the SOTA methods, and this was verified in Section 4.3.3.2. Figure 9 shows examples of the fake finger-vein images generated for spoof attacks using the DCS-GAN and the SOTA methods. As can be seen, the DCS-GAN effectively creates fake images which are more similar to the real images than those generated by the SOTA methods. The spoof detection results of the SOTA spoof attack methods and the proposed DCS-GAN method are compared in Section 4.3.3.2.

4.3.2.3. FD Estimation for Evaluating Generated Image Quality by the Proposed Method

To evaluate the fake finger-vein images generated by the proposed method, we performed FD estimation, which can serve as a metric for analyzing the complexity of, and assessing the similarity between, real and fake images. For this purpose, we utilized Eigen-CAM [49] to extract the class activation map (CAM); Eigen-CAM can generate the CAM without requiring class labels. Unlike traditional CAM techniques, which are typically used to visualize activation maps corresponding to specific class labels, Eigen-CAM identifies key activation regions in feature maps independently of any class. First, we obtained a CAM representing important regions from the final layer of the generator’s encoder in the DCS-GAN model, and the extracted activation map was then binarized to produce a binary class activation map (BCAM), which was subsequently used for FD estimation. In our research, we did not binarize the grayscale finger-vein images themselves. Instead, we binarized the red-green-blue color images of the class activation maps of the real and fake finger-vein images for fractal dimension estimation, as shown in Figure 10. In detail, we used a fixed threshold of 180 on the red channel: a pixel whose red value is greater than or equal to 180 is rendered as a white pixel, and a pixel with a red value less than 180 is rendered as a black pixel, for both real and fake finger-vein images, as shown in Figure 10. This is because pixels with a high red value indicate important features in the class activation map [49].
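A minimal sketch of the red-channel binarization used to obtain the BCAM (assuming an RGB Eigen-CAM visualization given as an (H, W, 3) uint8 array):

```python
import numpy as np

def binarize_cam(cam_rgb, threshold=180):
    """Binarize the Eigen-CAM visualization into a BCAM for FD estimation.

    Pixels whose red-channel value is >= threshold (180 here) become white (1),
    all other pixels become black (0).
    """
    red = cam_rgb[:, :, 0].astype(np.uint8)
    return (red >= threshold).astype(np.uint8)
```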
The FD values represent the complexity of the BCAM of the finger-vein images. As shown in both Figure 10 and Table 14, the FD values of the real and fake finger-vein images are similar. This indicates that the fake image generated by the DCS-GAN has almost the same level of complexity as the real image, suggesting that the fake image is generated nearly identically to the real one. Therefore, it can be concluded that the fake images produced by the method proposed in this paper are highly similar to the real images while preserving the genuine characteristics of the real images. Furthermore, this suggests that this method can play a crucial role in enhancing the security level of finger-vein recognition systems.

4.3.3. Performance Test of Spoof Detection

4.3.3.1. Ablation Study

To test the performance of the spoof detection, an ablation study was performed comparing the results of the enhanced ConvNeXt to those of the conventional ConvNeXt when attempting a spoof attack using fake finger-vein images generated by our method. Table 15 shows the performances of the conventional ConvNeXt and enhanced ConvNeXt on the ISPR and Idiap databases.
On the ISPR database, the enhanced ConvNeXt-Small (proposed method) reduced the average ACER to 0.4% over the one- and two-fold validations, a 0.41% decrease compared with the conventional ConvNeXt-Small. Moreover, the enhanced ConvNeXt-Tiny showed a reduction to 0.56%, a 0.42% decrease compared with the conventional ConvNeXt-Tiny, albeit with an error rate 0.16% higher than that of the enhanced ConvNeXt-Small. On the Idiap database, the enhanced ConvNeXt-Tiny also outperformed the conventional ConvNeXt-Tiny, and the enhanced ConvNeXt-Small (proposed method) showed the best result, with an ACER of 0.12%.

4.3.3.2. Comparisons of Spoof Detection Accuracies by Proposed and SOTA Methods

In this subsection, the spoof detection accuracy of the proposed spoof detector is compared with that of the SOTA spoof detectors. First, for a fair performance evaluation, we compared the performance of the various score-fusion methods used in existing research [3] when the detectors were trained on images generated by the DCS-GAN, as shown in Table 16. Table 16 presents the performance of existing spoof detectors on the fake finger-vein images obtained by the DCS-GAN. In the experiments on the ISPR database, an ACER of 0.82% was observed, and for the Idiap database, it was 0.34%. In comparison, when the fake images were generated with the method of the existing study [3], the ACER was only 0.32% on the ISPR database and 0.23% on the Idiap database, i.e., lower by 0.5% and 0.11%, respectively. This confirms that spoof attacks using fake images produced by the DCS-GAN evade spoof detection more effectively than those using the method of the existing research [3].
Next, we compared the performance of the proposed spoof detector with that of the SOTA detectors, as shown in Table 17 and Table 18. As confirmed in Table 17 and Table 18, the proposed spoof detector exhibits the best performance. Moreover, we verified the equal error rate (EER), as shown in Figure 11, using the receiver operating characteristic (ROC) curves of the true positive rate (TPR) (Equation (18)) according to the false positive rate (FPR) (Equation (19)), similar to previous research [50].
$$TPR = 1 - \frac{1}{I_{real}}\sum_{i=1}^{I_{real}} Detector_i \tag{18}$$
$$FPR = 1 - \frac{1}{I_{fake}}\sum_{i=1}^{I_{fake}} Detector_i \tag{19}$$
In Equations (18) and (19), $I_{real}$ denotes the number of real (original) images, and $I_{fake}$ denotes the number of fake (generated) images. Additionally, $Detector_i$ refers to the predicted label obtained from the spoof detector. Therefore, in Equation (18), $Detector_i$ takes the value 1 if the input real image is incorrectly classified as fake and 0 if it is correctly classified as real. In Equation (19), $Detector_i$ takes the value 0 if the input fake image is incorrectly classified as real and 1 if it is correctly classified as fake. As indicated in Figure 11, we confirmed that the proposed spoof detector exhibits the best performance.
Subsequently, we performed comparisons of the spoof detection testing errors with the use of the images generated by the DCS-GAN and the SOTA methods using the proposed enhanced ConvNeXt-Small detector, which showed the best detection performance, in Table 17 and Table 18 and Figure 11. As listed in Table 19, the proposed DCS-GAN had the highest ACER, confirming that the DCS-GAN is the most effective at generating fake images that are the hardest to detect, thus being the closest to real images.

4.3.3.3. Comparisons of Algorithm Complexity

In this subsection, we evaluate the number of trainable parameters (param.), the number of floating-point operations (FLOPs), and the GPU memory usage of the proposed method. Additionally, we computed the average processing time on a Jetson TX2 board to assess its feasibility in resource-constrained environments. As shown in Figure 12, the Jetson TX2 board is equipped with an NVIDIA Pascal™-family GPU with 256 compute unified device architecture (CUDA) cores [52].
As indicated in Table 20, the processing time of our method on the Jetson TX2 system is 97.22 ms (1000/97.22 ≈ 10.29 frames per second (fps)), with a GPU memory usage of 219.16 megabytes (MB), 51.29 mega (M) parameters, and 17.17 giga (G) FLOPs. While not the best in every metric shown in Table 20, the proposed method still shows the best spoof detection accuracies compared with the existing SOTA methods, as demonstrated in Table 17 and Table 18 and Figure 11. We also confirmed that the proposed spoof detector operates effectively even on the resource-limited Jetson TX2 embedded board. Although the Modified VGG16 + PCA + SVM [1] and Modified Xception + LSVM [24] were faster in terms of processing time than the proposed method, they had certain drawbacks. The Modified VGG16 + PCA + SVM required 2.46 times more GPU memory, while the Modified Xception + LSVM showed higher ACER values, with 0.74% on the ISPR database and 1.13% on the Idiap database, compared to the proposed method.

5. Discussions

In the spoof attack procedure, the DCS-GAN exhibited performance improvements of 6.274 and 0.845 in FID on the ISPR and Idiap databases, respectively, compared to the second-best model. Additionally, as illustrated in Figure 9, the images generated by the DCS-GAN appear to be more similar to the original images than those generated by the SOTA methods. In the spoof detection procedure, the enhanced ConvNeXt-Small model showed an improvement in ACER of 0.42% on the ISPR database and of 0.22% on the Idiap database compared to the second-best model. Unlike the existing spoof detection methods for finger-vein recognition systems, our proposed method operates without requiring additional classifiers (i.e., SVM) and demonstrates a reasonable processing time, as illustrated in Table 20, making it suitable for resource-constrained embedded environments. However, Figure 13 and Figure 14, respectively, display examples of correct and incorrect spoof detection for real images from the ISPR and Idiap databases, as well as fake images generated by the DCS-GAN. As shown in Figure 13, the proposed spoof detector can accurately distinguish between real and fake images that are nearly indistinguishable to the naked eye. As in Figure 14, camera noise and fingerprint residues present in the real images are mostly eliminated in the fake images. These factors are believed to contribute to the incorrect spoof detection results.
Additionally, to identify the criteria used by the enhanced ConvNeXt-Small to discriminate real and fake images, we examined the extracted features through gradient-weighted class activation mapping (Grad-CAM) images [53]. Figure 15a shows the Grad-CAM images for real images, while Figure 15b displays the Grad-CAM images for fake images generated from those in Figure 15a. Starting from the leftmost images in Figure 15a,b, the images represent the Grad-CAM visualizations acquired from the first ConvNeXt Block, second ConvNeXt Block, third ConvNeXt Block, fourth ConvNeXt Block, and LKA attention shown in Table 4. In Figure 15, the activation map in red indicates significant features, while blue indicates insignificant features. Comparing Figure 15a,b, we observe that different activation maps are displayed for real and fake images that appear identical to the naked eye. This confirms that the proposed enhanced ConvNeXt-Small effectively extracts crucial features for spoof detection.

6. Conclusions

In this study, we proposed a DCS-GAN capable of generating fake finger-vein images for training spoof detectors, aiming to mitigate the increasing risks of data spoofing, a negative impact of deep learning-based image generation models, in finger-vein recognition systems. The fake images generated by the DCS-GAN showed an improved spoof attack performance compared to existing spoof attack image generators. Additionally, our proposed enhanced ConvNeXt-Small spoof detector displayed a lower spoof detection error rate than the SOTA methods and effectively extracted significant features, which were confirmed based on Grad-CAM images.
To improve the spoof detection performance of the proposed method, we introduced fractal dimension estimation to analyze the complexity and irregularity of class activation maps from real and fake finger-vein images, enabling the generation of more realistic and sophisticated fake finger-vein images. However, as mentioned in the Discussion section, although our DCS-GAN aimed to preserve fingerprint residues in fake finger-vein images, the loss of these residues in some images made spoof detection easier. Additionally, camera noise contributed to incorrect spoof detection by our enhanced ConvNeXt-Small spoof detector in certain cases.
The proposed spoof detection method can work effectively between the image acquisition module and recognition part in existing finger-vein recognition systems in order to enhance their security level. An alternative usage would be to additionally train existing spoof detectors with the fake finger-vein images generated by our method, thus enhancing the accuracies of spoof detectors. Nevertheless, as illustrated in Table 20, the processing speed of our spoof detector was 10.29 fps on a Jetson TX2 board with limited computing resources. This may be deemed inadequate for real-time application in real-world scenarios.
Therefore, our future work would reduce the processing overhead of our spoof detection model by employing a knowledge distillation method, while improving its performance by considering fake images generated from various generative models, such as diffusion and variational autoencoders. Furthermore, we would apply our method to other biometric data such as palm-vein and hand dorsal-vein images, and explore multi-modalities that are robust to external influences and capable of handling global information.

Author Contributions

Methodology, Writing—original draft, S.G.K.; Conceptualization, J.S.H.; Investigation, J.S.K.; Supervision, Writing—review and editing, K.R.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the Ministry of Science and ICT (MSIT), Korea, under the Information Technology Research Center (ITRC) support program (IITP-2024-2020-0-01789), supervised by the Institute for Information & Communications Technology Planning & Evaluation (IITP).

Data Availability Statement

The proposed DCS-GAN, enhanced ConvNeXt-Small and fake finger-vein images are publicly available via the Github site (https://github.com/SeungguKim98/Finger-Vein-Spoof-Attack-Detection, accessed on 26 July 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Nguyen, D.T.; Yoon, H.S.; Pham, T.D.; Park, K.R. Spoof detection for finger-vein recognition system using NIR camera. Sensors 2017, 17, 2261. [Google Scholar] [CrossRef] [PubMed]
  2. Neves, J.C.; Tolosana, R.; Vera-Rodriguez, R.; Lopes, V.; Proença, H.; Fierrez, J. GANprintR: Improved fakes and evaluation of the state of the art in face manipulation detection. IEEE J. Sel. Top. Signal Process. 2020, 14, 1038–1048. [Google Scholar] [CrossRef]
  3. Kim, S.G.; Choi, J.; Hong, J.S.; Park, K.R. Spoof detection based on score fusion using ensemble networks robust against adversarial attacks of fake finger-vein images. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 9343–9362. [Google Scholar] [CrossRef]
  4. DCS-GAN. Available online: https://github.com/SeungguKim98/Finger-Vein-Spoof-Attack-Detection (accessed on 26 July 2024).
  5. Nguyen, D.T.; Park, Y.H.; Shin, K.Y.; Kwon, S.Y.; Lee, H.C.; Park, K.R. Fake finger-vein image detection based on Fourier and wavelet transforms. Digit. Signal Process. 2013, 23, 1401–1413. [Google Scholar] [CrossRef]
  6. Tome, P.; Vanoni, M.; Marcel, S. On the vulnerability of finger vein recognition to spoofing. In Proceedings of the International Conference on the Biometrics Special Interest Group, Darmstadt, Germany, 10–12 September 2014; pp. 1–10. [Google Scholar]
  7. Singh, J.M.; Venkatesh, S.; Raja, K.B.; Ramachandra, R.; Busch, C. Detecting finger-vein presentation attacks using 3D shape & diffuse reflectance decomposition. In Proceedings of the International Conference on Signal-Image Technology & Internet-Based Systems, Sorrento, Italy, 26–29 November 2019; pp. 8–14. [Google Scholar] [CrossRef]
  8. Ramachandra, R.; Raja, K.B.; Venkatesh, S.K.; Busch, C. Design and development of low-cost sensor to capture ventral and dorsal finger-vein for biometric authentication. IEEE Sens. J. 2019, 19, 6102–6111. [Google Scholar] [CrossRef]
  9. Raghavendra, R.; Busch, C. Presentation attack detection algorithms for finger vein biometrics: A comprehensive study. In Proceedings of the International Conference on Signal Image Technology & Internet Based Systems, Bangkok, Thailand, 23–27 November 2015; pp. 628–632. [Google Scholar] [CrossRef]
  10. Krishnan, A.; Thomas, T.; Nayar, G.R.; Sasilekha Mohan, S. Liveness detection in finger vein imaging device using plethysmographic signals. In Proceedings of the Intelligent Human Computer Interaction, Allahabad, India, 7–9 December 2018; pp. 251–260. [Google Scholar] [CrossRef]
  11. Schuiki, J.; Prommegger, B.; Uhl, A. Confronting a variety of finger vein recognition algorithms with wax presentation attack artefacts. In Proceedings of the IEEE International Workshop on Biometrics and Forensics, Rome, Italy, 6–7 May 2021; pp. 1–6. [Google Scholar] [CrossRef]
  12. Yang, H.; Fang, P.; Hao, Z. A GAN-based method for generating finger vein dataset. In Proceedings of the International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 24–26 December 2020; pp. 1–6. [Google Scholar] [CrossRef]
  13. Zhang, J.; Lu, Z.; Li, M.; Wu, H. GAN-based image augmentation for finger-vein biometric recognition. IEEE Access 2019, 7, 183118–183132. [Google Scholar] [CrossRef]
  14. Ciano, G.; Andreini, P.; Mazzierli, T.; Bianchini, M.; Scarselli, F. A multi-stage GAN for multi-organ chest X-ray image generation and segmentation. Mathematics 2021, 9, 2896. [Google Scholar] [CrossRef]
  15. Wang, L.; Guo, D.; Wang, G.; Zhang, S. Annotation-efficient learning for medical image segmentation based on noisy pseudo labels and adversarial learning. IEEE Trans. Med. Imaging 2020, 40, 2795–2807. [Google Scholar] [CrossRef] [PubMed]
  16. Choi, J.; Hong, J.S.; Kim, S.G.; Park, C.; Nam, S.H.; Park, K.R. RMOBF-Net: Network for the restoration of motion and optical blurred finger-vein images for improving recognition accuracy. Mathematics 2022, 10, 3948. [Google Scholar] [CrossRef]
  17. Hong, J.S.; Choi, J.; Kim, S.G.; Owais, M.; Park, K.R. INF-GAN: Generative adversarial network for illumination normalization of finger-vein images. Mathematics 2021, 9, 2613. [Google Scholar] [CrossRef]
  18. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2242–2251. [Google Scholar] [CrossRef]
  19. Tirunagari, S.; Poh, N.; Bober, M.; Windridge, D. Windowed DMD as a microtexture descriptor for finger vein counter-spoofing in biometrics. In Proceedings of the IEEE International Workshop on Information Forensics and Security, Rome, Italy, 16–19 November 2015; pp. 1–6. [Google Scholar] [CrossRef]
  20. Kocher, D.; Schwarz, S.; Uhl, A. Empirical evaluation of LBP-extension features for finger vein spoofing detection. In Proceedings of the International Conference of the Biometrics Special Interest Group, Darmstadt, Germany, 21–23 September 2016; pp. 1–5. [Google Scholar] [CrossRef]
  21. Bok, J.Y.; Suh, K.H.; Lee, E.C. Detecting fake finger-vein data using remote photoplethysmography. Electronics 2019, 8, 1016. [Google Scholar] [CrossRef]
  22. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556, pp. 1–14. [Google Scholar] [CrossRef]
  23. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 84–90. [Google Scholar] [CrossRef]
  24. Shaheed, K.; Mao, A.; Qureshi, I.; Abbas, Q.; Kumar, M.; Zhang, X. Finger-vein presentation attack detection using depthwise separable convolution neural network. Expert Syst. Appl. 2022, 198, 116786. [Google Scholar] [CrossRef]
  25. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar] [CrossRef]
  26. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar] [CrossRef]
  27. Sengupta, S.; Kanazawa, A.; Castillo, C.D.; Jacobs, D.W. SfSNet: Learning shape, reflectance and illuminance of faces in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2017; pp. 6296–6305. [Google Scholar] [CrossRef]
  28. Kang, B.J.; Park, K.R. Multimodal biometric method based on vein and geometry of a single finger. IET Comput. Vision 2010, 4, 209–217. [Google Scholar] [CrossRef]
  29. Park, T.; Efros, A.A.; Zhang, R.; Zhu, J.-Y. Contrastive learning for unpaired image-to-image translation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 319–345. [Google Scholar] [CrossRef]
  30. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 1–11. [Google Scholar] [CrossRef]
  31. Foret, P.; Kleiner, A.; Mobahi, H.; Neyshabur, B. Sharpness-aware minimization for efficiently improving generalization. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26 April–1 May 2022; pp. 7360–7371. [Google Scholar] [CrossRef]
  32. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
  33. Zou, D.; Cao, Y.; Li, Y.; Gu, Q. Understanding the generalization of Adam in learning neural networks with proper regularization. arXiv 2021, arXiv:2108.11371. [Google Scholar] [CrossRef]
  34. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2813–2821. [Google Scholar] [CrossRef]
  35. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 694–711. [Google Scholar] [CrossRef]
  36. Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar] [CrossRef]
  37. Guo, M.-H.; Lu, C.-Z.; Liu, Z.-N.; Cheng, M.-M.; Hu, S.-M. Visual attention network. Comput. Vis. Media 2023, 9, 733–752. [Google Scholar] [CrossRef]
  38. Brouty, X.; Garcin, M. Fractal properties, information theory, and market efficiency. Chaos Solitons Fractals 2024, 180, 114543. [Google Scholar] [CrossRef]
  39. Yin, J. Dynamical fractal: Theory and case study. Chaos Solitons Fractals 2023, 176, 114190. [Google Scholar] [CrossRef]
  40. Crownover, R.M. Introduction to Fractals and Chaos, 1st ed.; Jones & Bartlett Publisher: Burlington, MA, USA, 1995. [Google Scholar]
  41. Tome, P.; Raghavendra, R.; Busch, C.; Tirunagari, S.; Poh, N.; Shekar, B.; Gragnaniello, D.; Sansone, C.; Verdoliva, L.; Marcel, S. The 1st competition on counter measures to finger vein spoofing attacks. In Proceedings of the International Conference on Biometrics, Phuket, Thailand, 19–22 May 2015; pp. 513–518. [Google Scholar] [CrossRef]
  42. NVIDIA GeForce RTX 3060. Available online: https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3060-3060ti/ (accessed on 25 June 2024).
  43. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proceedings of the International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6629–6640. [Google Scholar] [CrossRef]
  44. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 7–9 August 2017; pp. 214–223. [Google Scholar] [CrossRef]
  45. ISO/IEC JTC1 SC37; Biometrics. ISO/IEC WD 30107–3: Information Technology—Presentation Attack Detection-Part 3: Testing and Reporting and Classification of Attacks. International Organization for Standardization: Geneva, Switzerland, 2014.
  46. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar] [CrossRef]
  47. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976. [Google Scholar] [CrossRef]
  48. Wang, T.-C.; Liu, M.-Y.; Zhu, J.-Y.; Tao, A.; Kautz, J.; Catanzaro, B. Catanzaro, High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8798–8807. [Google Scholar] [CrossRef]
  49. Muhammad, M.B.; Yeasin, M. Eigen-cam: Class activation map using principal components. In Proceedings of the International Joint Conference on Neural Networks, Glasgow, UK, 19–24 July 2020; pp. 1–7. [Google Scholar] [CrossRef]
  50. Face Anti-spoofing Challenge. Available online: https://sites.google.com/view/face-anti-spoofing-challenge/ (accessed on 26 February 2024).
  51. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. Maxvit: Multi-axis vision transformer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 459–479. [Google Scholar] [CrossRef]
  52. Jetson TX2 Module. Available online: https://developer.nvidia.com/embedded/jetson-tx2 (accessed on 23 July 2024).
  53. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar] [CrossRef]
Figure 1. Overall flowchart of proposed method.
Figure 2. Architecture of DCS-GAN.
Figure 3. Samples for the selection of input and target image for training the generator and discriminator of DCS-GAN. * denotes one image randomly chosen in the intra-class of the input image, excluding the input image.
Figure 4. Architecture of enhanced ConvNeXt-Small.
Figure 5. Sample images of real finger-veins in the databases. (a) Examples from the ISPR database and (b) examples from the Idiap database.
Figure 6. Examples of data augmentation on the Idiap database. (a) Original image, (b) image shifted upward, (c) image shifted downward, (d) image shifted to the left, (e) image shifted to the right.
Figure 7. Graphs for the training and validation loss of DCS-GAN. (a) Training loss graph of the generator and the discriminator. (b) Validation loss graph of the generator and the discriminator.
Figure 8. Training and validation accuracy (Acc) and loss (Loss) graphs of the enhanced ConvNeXt-Small. (a) Training accuracy and loss graphs. (b) Validation accuracy and loss graphs.
Figure 9. Sample images of fake finger-vein images generated by DCS-GAN and other SOTA methods. Examples of (a) original image and images generated by (b) Pix2Pix, (c) Pix2PixHD, (d) CycleGAN, (e) CUT, and (f) DCS-GAN.
Figure 10. FD estimation analysis for comparison between real and fake vein images: the first to the fourth images, from the left, in (a–h) show the finger-vein image, CAM, BCAM, and FD graph, respectively. (a,c,e,g) show the real finger-vein images, whereas (b,d,f,h) present the corresponding fake finger-vein images.
Figure 11. ROC curves of TPR according to FPR by the proposed and the SOTA methods on (a) ISPR database and (b) Idiap database.
Figure 12. Jetson TX2 board.
Figure 13. Examples of correct spoof detection by the proposed method. (a) and (c) are examples of real images from the ISPR and Idiap databases, respectively, and (b) and (d) are corresponding examples of fake images.
Figure 14. Examples of incorrect spoof detection by the proposed method. (a) and (c) are examples of real images from the ISPR and Idiap databases, respectively, and (b) and (d) are corresponding examples of fake images. In the proposed method, (b) and (d) are incorrectly identified as real images.
Figure 15. Grad-CAM images. (a) shows Grad-CAM images for real images, while (b) shows Grad-CAM images for fake images generated from the real images in (a). In both (a,b), the first row is from the ISPR database, and the second row is from the Idiap database. Each row starts with the input image on the far left, followed by Grad-CAM images acquired from the first ConvNeXt Block, second ConvNeXt Block, third ConvNeXt Block, fourth ConvNeXt Block, and LKA attention of Table 4, respectively.
Table 1. Comparison of existing and proposed methods for spoof attack and spoof detection in finger-vein recognition.
Category | Methods | Advantages | Disadvantages
Spoof attack (using fake fabricated artifacts) | Printed on OHP film, matte paper, and A4 paper using a LaserJet printer at resolutions of 300, 1200, and 2400 dpi and then applied to the finger [5]; printed using an inkjet printer and applied to a prosthesis and a thin rubber cap [10]; printed using a LaserJet printer and applied to wax [11] | Considers even the curvature of the finger during the spoof attack | The quality of the fake image is not high due to not emphasizing the vein pattern; labor-intensive and costly to produce fabricated artifacts
Spoof attack (using fake fabricated artifacts) | Printed using laser and inkjet printers and replayed on smartphone display [9] | Provides more realistic motion information through display replaying |
Spoof attack (using fake fabricated artifacts) | Printed using a LaserJet printer and enhanced the vein outline with a black whiteboard marker [6]; printed on glossy paper using an inkjet printer and enhanced the vein pattern using Ramachandra et al. [8]'s algorithm [7] | Improved vein pattern quality by applying post-processing after printing | Very low image quality compared to generated images; spoof attack performance against CNN-based detector is not high
Spoof attack (using fake generated images) | Generated fake finger-vein images using CycleGAN [3] | The first study to use generated finger-vein images for both spoof attack and detection | Unable to generate elaborate fake finger-vein images
Spoof attack (using fake generated images) | Generates fake finger-vein images using DCS-GAN (proposed method) | Generates fake data that is similar to the characteristic distribution of original finger-vein images | Unlike the structure of the existing research model CycleGAN, requires two discriminators and a multilayered perceptron (MLP)
Spoof detection (machine learning-based) | Steerable pyramid + SVM [9]; W-DMD + SVM [19]; FT, Haar and Daubechies wavelet + SVM [5]; discrete FT + SVM [21] | Requires less time for training compared to deep learning-based methods | Performance degradation in spoof detection depending on various spoof data generation methods
Spoof detection (deep learning-based) | Modified network of AlexNet or VGG-Net + PCA + SVM [1]; Xception (entry flow) + linear SVM [24]; ensemble network of DenseNet-161 and DenseNet-169 + SVM [3]; SfS-Net + linear SVM [7] | Enables diverse spoof detection through learning CNN filters for efficient feature extraction | Lower accuracy in spoof detection against elaborately created fake finger-vein images
Spoof detection (deep learning-based) | Enhanced network of ConvNeXt-Small (proposed method) | Processes in one stage, eliminating the need for a separate classifier; high accuracy in spoof detection against elaborately created fake finger-vein images | The time required for CNN training is significant
Table 2. Descriptions of the generator in the DCS-GAN (NA means “not available”).
Layer | Type | Kernel Size | Number of Filters | Stride | Input Size | Output Size
Input | | NA | NA | NA | 224 × 224 × 3 | 224 × 224 × 3
3 × 3 Padding (Reflect) | | NA | NA | NA | 224 × 224 × 3 | 230 × 230 × 3
1st Conv Block * | Conv | 7 | 64 | 1 | 230 × 230 × 3 | 224 × 224 × 64
1st Conv Block * | Instance Norm (ReLU) | NA | NA | NA | 224 × 224 × 64 | 224 × 224 × 64
2nd Conv Block * (ReLU) | | 3 | 128 | 1 | 224 × 224 × 64 | 224 × 224 × 128
Antialiasing Sampling (Down) | | 4 | NA | NA | 224 × 224 × 128 | 112 × 112 × 128
3rd Conv Block * (ReLU) | | 3 | 256 | 1 | 112 × 112 × 128 | 112 × 112 × 256
Antialiasing Sampling (Down) | | 4 | NA | NA | 112 × 112 × 256 | 56 × 56 × 256
1st Res Block | 1 × 1 Padding (Reflect) | NA | NA | NA | 56 × 56 × 256 | 58 × 58 × 256
1st Res Block | 4th Conv Block * (ReLU) | 3 | 256 | 1 | 58 × 58 × 256 | 56 × 56 × 256
1st Res Block | 1 × 1 Padding (Reflect) | NA | NA | NA | 56 × 56 × 256 | 58 × 58 × 256
1st Res Block | 5th Conv Block * (Linear) | 3 | 256 | 1 | 58 × 58 × 256 | 56 × 56 × 256
1st Self-attention | | NA | NA | NA | 56 × 56 × 256 | 56 × 56 × 256
2nd–8th Res Blocks with Self-attentions | | NA | NA | NA | 56 × 56 × 256 | 56 × 56 × 256
9th Res Block | | 3 | 256 | 1 | 56 × 56 × 256 | 56 × 56 × 256
9th Self-attention | | NA | NA | NA | 56 × 56 × 256 | 56 × 56 × 256
Antialiasing Sampling (Up) | | 4 | NA | NA | 56 × 56 × 256 | 112 × 112 × 256
22nd Conv Block * (ReLU) | | 3 | 128 | 1 | 112 × 112 × 256 | 112 × 112 × 128
Antialiasing Sampling (Up) | | 4 | NA | NA | 112 × 112 × 128 | 224 × 224 × 128
23rd Conv Block * (ReLU) | | 3 | 64 | 1 | 224 × 224 × 128 | 224 × 224 × 64
3 × 3 Padding (Reflect) | | NA | NA | NA | 224 × 224 × 64 | 230 × 230 × 64
24th Conv Block (Tanh) | | 7 | 3 | 1 | 230 × 230 × 64 | 224 × 224 × 3
Output | | NA | NA | NA | 224 × 224 × 3 | 224 × 224 × 3
* indicates that instance normalization is included after the corresponding layer.
Table 3. Descriptions of the discriminator in the DCS-GAN (NA means “not available”).
Layer | Kernel Size | Number of Filters | Stride | Input Size | Output Size
Input | NA | NA | NA | 224 × 224 × 3 | 224 × 224 × 3
25th Conv Block (Leaky ReLU) | 4 | 64 | 1 | 224 × 224 × 3 | 224 × 224 × 64
Antialiasing Sampling (Down) | 4 | NA | NA | 224 × 224 × 64 | 112 × 112 × 64
26th Conv Block * (Leaky ReLU) | 4 | 128 | 1 | 112 × 112 × 64 | 112 × 112 × 128
Antialiasing Sampling (Down) | 4 | NA | NA | 112 × 112 × 128 | 56 × 56 × 128
27th Conv Block * (Leaky ReLU) | 4 | 256 | 1 | 56 × 56 × 128 | 56 × 56 × 256
Antialiasing Sampling (Down) | 4 | NA | NA | 56 × 56 × 256 | 28 × 28 × 256
1 × 1 Padding (Constant) | NA | NA | NA | 28 × 28 × 256 | 30 × 30 × 256
28th Conv Block * (Leaky ReLU) | 4 | 512 | 1 | 30 × 30 × 256 | 27 × 27 × 512
1 × 1 Padding (Constant) | NA | NA | NA | 27 × 27 × 512 | 29 × 29 × 512
29th Conv Block (Linear) | 4 | 1 | 1 | 29 × 29 × 512 | 26 × 26 × 1
Output | NA | NA | NA | 26 × 26 × 1 | 26 × 26 × 1
* indicates that instance normalization is included after the corresponding layer.
Table 4. Descriptions of enhanced ConvNeXt-Small (NA means “not available”).
Layer | Number of Blocks | Kernel Size | Number of Filters | Stride | Input Size | Output Size
Input | NA | NA | NA | NA | 224 × 224 × 3 | 224 × 224 × 3
Stem: Conv * | NA | 4 × 4 | 96 | 4 | 224 × 224 × 3 | 56 × 56 × 96
1st ConvNeXt Block: Depthwise Conv * | 3 | 7 × 7 | 96 | 1 | 56 × 56 × 96 | 56 × 56 × 96
1st ConvNeXt Block: Dense (GELU) | 3 | 1 × 1 | 384 | 1 | 56 × 56 × 96 | 56 × 56 × 384
1st ConvNeXt Block: Dense | 3 | 1 × 1 | 96 | 1 | 56 × 56 × 384 | 56 × 56 × 96
1st Down Sampling Block: Layer Norm | 1 | NA | NA | NA | 56 × 56 × 96 | 56 × 56 × 96
1st Down Sampling Block: Conv | 1 | 2 × 2 | 192 | 2 | 56 × 56 × 96 | 28 × 28 × 192
2nd ConvNeXt Block: Depthwise Conv * | 3 | 7 × 7 | 192 | 1 | 28 × 28 × 192 | 28 × 28 × 192
2nd ConvNeXt Block: Dense (GELU) | 3 | 1 × 1 | 768 | 1 | 28 × 28 × 192 | 28 × 28 × 768
2nd ConvNeXt Block: Dense | 3 | 1 × 1 | 192 | 1 | 28 × 28 × 768 | 28 × 28 × 192
2nd Down Sampling Block: Layer Norm | 1 | NA | NA | NA | 28 × 28 × 192 | 28 × 28 × 192
2nd Down Sampling Block: Conv | 1 | 2 × 2 | 384 | 2 | 28 × 28 × 192 | 14 × 14 × 384
3rd ConvNeXt Block: Depthwise Conv * | 27 | 7 × 7 | 384 | 1 | 14 × 14 × 384 | 14 × 14 × 384
3rd ConvNeXt Block: Dense (GELU) | 27 | 1 × 1 | 1536 | 1 | 14 × 14 × 384 | 14 × 14 × 1536
3rd ConvNeXt Block: Dense | 27 | 1 × 1 | 384 | 1 | 14 × 14 × 1536 | 14 × 14 × 384
3rd Down Sampling Block: Layer Norm | 1 | NA | NA | NA | 14 × 14 × 384 | 14 × 14 × 384
3rd Down Sampling Block: Conv | 1 | 2 × 2 | 768 | 2 | 14 × 14 × 384 | 7 × 7 × 768
4th ConvNeXt Block: Depthwise Conv * | 3 | 7 × 7 | 768 | 1 | 7 × 7 × 768 | 7 × 7 × 768
4th ConvNeXt Block: Dense (GELU) | 3 | 1 × 1 | 3072 | 1 | 7 × 7 × 768 | 7 × 7 × 3072
4th ConvNeXt Block: Dense | 3 | 1 × 1 | 768 | 1 | 7 × 7 × 3072 | 7 × 7 × 768
Conv (GELU) | NA | 1 × 1 | 768 | 1 | 7 × 7 × 768 | 7 × 7 × 768
LKA: Conv | NA | 5 × 5 | 768 | 1 | 7 × 7 × 768 | 7 × 7 × 768
LKA: Dilation Conv | NA | 7 × 7 | 768 | 1 | 7 × 7 × 768 | 7 × 7 × 768
LKA: Conv | NA | 1 × 1 | 768 | 1 | 7 × 7 × 768 | 7 × 7 × 768
LKA: Multiply | NA | NA | NA | NA | 7 × 7 × 768 | 7 × 7 × 768
Conv | NA | 1 × 1 | 768 | 1 | 7 × 7 × 768 | 7 × 7 × 768
Add | NA | NA | NA | NA | 7 × 7 × 768 | 7 × 7 × 768
Global Average Pooling | NA | NA | NA | NA | 7 × 7 × 768 | 768
Dense (Softmax) | NA | NA | 2 | NA | 768 | 2
* indicates that layer normalization is included after the corresponding layer.
Table 5. Detailed description of the ISPR and Idiap databases.
Database | Number of Trials | Number of Individuals | Number of Hands | Number of Fingers | Total Number of Images
ISPR | 10 | 33 | 2 | 5 | 3300
Idiap | 2 | 110 | 2 | 1 | 440
Table 6. Hyperparameters used to train DCS-GAN.
Parameter Types | Value
Learning decay step | 10,000
Learning decay rate | 0.9
Learning rate | 2 × 10−4
Optimizer | Adam + SAM
Beta 1 | 0.5
Beta 2 | 0.999
Batch size | 1
Epochs | 400
Adversarial loss | LSGAN
Additional loss | Patch, Perceptual
Table 7. Hyperparameters used to train enhanced ConvNeXt-Small.
Parameter Types | Value
Learning decay step | None
Learning decay rate | None
Learning rate | 1 × 10−6
Beta 1 | 0.9
Beta 2 | 0.999
Epsilon | 1 × 10−7
Batch size | 4
Epochs | 30
Loss | Cross entropy
Table 8. Performance variation depending on the modules composing DCS-GAN (“Perceptual” refers to the cases where pre-trained VGG-16 was used for calculating perceptual loss as per Equation (10), and “Dense perceptual” indicates the cases where DenseNet-161 [3] was used in the same manner).
Perceptual Loss | Dense Perceptual | SAM | Self-Attention | FID (1-Fold Validation) | FID (2-Fold Validation) | FID (Average) | WD (1-Fold Validation) | WD (2-Fold Validation) | WD (Average)
 | | | | 19.261 | 25.869 | 22.565 | 30.380 | 7.010 | 18.695
 | | | | 16.365 | 18.927 | 17.646 | 19.576 | 7.180 | 13.378
 | | | | 13.149 | 14.689 | 13.919 | 4.286 | 6.428 | 5.357
 | | | | 12.283 | 5.457 | 8.870 | 25.039 | 20.554 | 22.797
 | | | | 8.531 | 5.671 | 7.101 | 19.782 | 17.050 | 18.416
Table 9. Comparison of spoof attack performance by generated images on DenseNet-161 and DenseNet-169 (“Perceptual” refers to the cases where pre-trained VGG-16 was used for calculating perceptual loss as per Equation (10), and “Dense perceptual” indicates the cases where DenseNet-161 [3] was used in the same manner) (metric: ACER, unit: %).
Classification Model | Perceptual Loss | Dense Perceptual | SAM | Self-Attention | 1-Fold Validation | 2-Fold Validation | Average
DenseNet-161 | | | | | 0.27 | 0.27 | 0.27
DenseNet-161 | | | | | 0.36 | 0.33 | 0.35
DenseNet-161 | | | | | 0.46 | 0.49 | 0.48
DenseNet-161 | | | | | 0.73 | 0.79 | 0.76
DenseNet-161 | | | | | 0.94 | 1.15 | 1.05
DenseNet-169 | | | | | 0.36 | 0.42 | 0.39
DenseNet-169 | | | | | 0.64 | 0.70 | 0.67
DenseNet-169 | | | | | 0.67 | 0.73 | 0.70
DenseNet-169 | | | | | 0.73 | 0.82 | 0.77
DenseNet-169 | | | | | 1.06 | 1.00 | 1.03
Table 10. Comparison of fake image generation performance depending on data augmentation (# means “number of” and × means “none”).
Random Crop | # Directions for Shift | FID (1-Fold Validation) | FID (2-Fold Validation) | FID (Average) | WD (1-Fold Validation) | WD (2-Fold Validation) | WD (Average)
256 → 224 | × | 29.579 | 26.997 | 28.288 | 40.141 | 61.686 | 50.914
300 → 224 | × | 27.765 | 26.571 | 27.168 | 35.086 | 37.973 | 36.530
256 → 224 | 2 | 30.377 | 26.366 | 28.372 | 20.282 | 40.532 | 30.407
300 → 224 | 2 | 30.062 | 23.533 | 26.798 | 30.382 | 46.741 | 38.562
256 → 224 | 4 | 24.810 | 21.891 | 23.351 | 8.614 | 11.632 | 10.123
300 → 224 | 4 | 27.548 | 24.965 | 26.257 | 28.814 | 50.336 | 39.575
256 → 224 | 8 | 25.685 | 28.129 | 26.907 | 16.505 | 18.805 | 17.655
300 → 224 | 8 | 27.702 | 24.758 | 26.230 | 24.214 | 25.905 | 25.060
Table 11. Comparison of spoof attack performance by type of post-processing on DenseNet-161 and DenseNet-169 using the ISPR database (metric: ACER, unit: %).
Classification Model | Post Processing | Kernel Size | 1-Fold Validation | 2-Fold Validation | Average
DenseNet-161 | Average filter | 3 × 3 | 0.30 | 0.49 | 0.40
DenseNet-161 | Average filter | 5 × 5 | 0.06 | 0.09 | 0.08
DenseNet-161 | Gaussian filter | 3 × 3 | 0.85 | 1.33 | 1.09
DenseNet-161 | Gaussian filter | 5 × 5 | 0.18 | 0.30 | 0.24
DenseNet-161 | Median filter | 3 × 3 | 3.49 | 3.67 | 3.58
DenseNet-161 | Median filter | 5 × 5 | 6.25 | 5.22 | 5.74
DenseNet-169 | Average filter | 3 × 3 | 0.70 | 0.64 | 0.67
DenseNet-169 | Average filter | 5 × 5 | 0.24 | 0.42 | 0.33
DenseNet-169 | Gaussian filter | 3 × 3 | 0.64 | 0.94 | 0.79
DenseNet-169 | Gaussian filter | 5 × 5 | 0.36 | 0.40 | 0.38
DenseNet-169 | Median filter | 3 × 3 | 2.31 | 3.09 | 2.70
DenseNet-169 | Median filter | 5 × 5 | 8.04 | 6.58 | 7.31
Table 12. Comparison of spoof attack performance by type of post-processing on DenseNet-161 and DenseNet-169 using the Idiap database (metric: ACER, unit: %).
Classification Model | Post Processing | Kernel Size | 1-Fold Validation | 2-Fold Validation | Average
DenseNet-161 | Average filter | 3 × 3 | 1.82 | 2.27 | 2.05
DenseNet-161 | Average filter | 5 × 5 | 0.91 | 0.68 | 0.80
DenseNet-161 | Gaussian filter | 3 × 3 | 2.27 | 2.73 | 2.50
DenseNet-161 | Gaussian filter | 5 × 5 | 1.82 | 2.73 | 2.28
DenseNet-161 | Median filter | 3 × 3 | 1.14 | 1.14 | 1.14
DenseNet-161 | Median filter | 5 × 5 | 0.00 | 0.45 | 0.23
DenseNet-169 | Average filter | 3 × 3 | 1.36 | 1.82 | 1.59
DenseNet-169 | Average filter | 5 × 5 | 0.91 | 2.28 | 1.60
DenseNet-169 | Gaussian filter | 3 × 3 | 2.27 | 2.73 | 2.50
DenseNet-169 | Gaussian filter | 5 × 5 | 1.36 | 1.36 | 1.36
DenseNet-169 | Median filter | 3 × 3 | 2.05 | 0.91 | 1.48
DenseNet-169 | Median filter | 5 × 5 | 1.59 | 0.23 | 0.91
Table 13. Comparing the image quality testing of generated fake finger-vein images by DCS-GAN with testing of those generated by the SOTA methods.
Database | Model | FID | WD
ISPR | Pix2Pix [47] | 32.193 | 7.887
ISPR | Pix2PixHD [48] | 13.875 | 10.305
ISPR | CycleGAN [18] | 23.576 | 13.792
ISPR | CUT [29] | 22.565 | 18.695
ISPR | DCS-GAN (Proposed) | 7.601 | 18.158
Idiap | Pix2Pix [47] | 55.062 | 5.750
Idiap | Pix2PixHD [48] | 24.200 | 2.625
Idiap | CycleGAN [18] | 33.176 | 2.868
Idiap | CUT [29] | 24.196 | 3.109
Idiap | DCS-GAN (Proposed) | 23.351 | 10.123
Table 14. FD, R2, and C values from Figure 10.
Results | Case 1 Real (Figure 10a) | Case 1 Fake (Figure 10b) | Case 2 Real (Figure 10c) | Case 2 Fake (Figure 10d) | Case 3 Real (Figure 10e) | Case 3 Fake (Figure 10f) | Case 4 Real (Figure 10g) | Case 4 Fake (Figure 10h)
R2 | 0.99903 | 0.99930 | 0.99689 | 0.99701 | 0.99968 | 0.99980 | 0.99916 | 0.99931
C | 0.99952 | 0.99965 | 0.99844 | 0.99850 | 0.99984 | 0.99990 | 0.99958 | 0.99965
FD | 2.00286 | 1.99995 | 2.00492 | 1.99574 | 2.03563 | 2.04030 | 1.96712 | 1.96665
Table 15. Comparison of performance between enhanced ConvNeXt and conventional ConvNeXt (unit: %).
Model | Database | 1-Fold APCER | 1-Fold BPCER | 1-Fold ACER | 2-Fold APCER | 2-Fold BPCER | 2-Fold ACER | Average APCER | Average BPCER | Average ACER
ConvNeXt-Tiny | ISPR | 0.24 | 2.37 | 1.31 | 0.12 | 1.15 | 0.64 | 0.18 | 1.76 | 0.98
ConvNeXt-Tiny | Idiap | 0.00 | 0.91 | 0.45 | 0.00 | 0.45 | 0.23 | 0.00 | 0.68 | 0.34
Enhanced ConvNeXt-Tiny | ISPR | 0.67 | 0.97 | 0.82 | 0.00 | 0.61 | 0.30 | 0.34 | 0.79 | 0.56
Enhanced ConvNeXt-Tiny | Idiap | 0.00 | 0.45 | 0.23 | 0.00 | 0.45 | 0.23 | 0.00 | 0.45 | 0.23
ConvNeXt-Small | ISPR | 0.79 | 1.40 | 1.09 | 0.42 | 0.61 | 0.52 | 0.61 | 1.01 | 0.81
ConvNeXt-Small | Idiap | 0.91 | 0.00 | 0.45 | 0.00 | 0.45 | 0.23 | 0.46 | 0.23 | 0.34
Enhanced ConvNeXt-Small (Proposed) | ISPR | 0.43 | 0.91 | 0.67 | 0.06 | 0.18 | 0.12 | 0.25 | 0.55 | 0.40
Enhanced ConvNeXt-Small (Proposed) | Idiap | 0.00 | 0.45 | 0.23 | 0.00 | 0.00 | 0.00 | 0.00 | 0.23 | 0.12
Table 16. Performance comparison based on various score-fusion methods used in existing research.
Database | Method | 1-Fold APCER | 1-Fold BPCER | 1-Fold ACER | 2-Fold APCER | 2-Fold BPCER | 2-Fold ACER | Average APCER | Average BPCER | Average ACER
ISPR | SVM (Linear) | 0.06 | 2.31 | 1.18 | 0.00 | 1.03 | 0.52 | 0.03 | 1.67 | 0.85
ISPR | SVM (RBF) | 0.06 | 2.31 | 1.18 | 0.00 | 1.03 | 0.52 | 0.03 | 1.67 | 0.85
ISPR | SVM (Poly) | 0.06 | 2.19 | 1.12 | 0.00 | 1.03 | 0.52 | 0.03 | 1.61 | 0.82
ISPR | SVM (Sigmoid) | 0.00 | 4.86 | 2.43 | 0.00 | 2.00 | 1.00 | 0.00 | 3.43 | 1.72
Idiap | SVM (Linear) | 0.00 | 0.91 | 0.45 | 0.00 | 0.45 | 0.23 | 0.00 | 0.68 | 0.34
Idiap | SVM (RBF) | 0.00 | 0.91 | 0.45 | 0.00 | 0.45 | 0.23 | 0.00 | 0.68 | 0.34
Idiap | SVM (Poly) | 0.00 | 0.91 | 0.45 | 0.00 | 0.45 | 0.23 | 0.00 | 0.68 | 0.34
Idiap | SVM (Sigmoid) | 0.00 | 3.18 | 1.59 | 0.00 | 0.91 | 0.45 | 0.00 | 2.05 | 1.02
Table 17. Comparisons of spoof detection testing errors by the proposed and the SOTA methods on ISPR database (unit: %).
Method | 1-Fold APCER | 1-Fold BPCER | 1-Fold ACER | 2-Fold APCER | 2-Fold BPCER | 2-Fold ACER | Average APCER | Average BPCER | Average ACER
Ensemble Networks + SVM [3] | 0.06 | 2.19 | 1.12 | 0.00 | 1.03 | 0.52 | 0.03 | 1.61 | 0.82
Modified Xception + LSVM [24] | 0.61 | 2.61 | 1.61 | 0.30 | 1.03 | 0.67 | 0.46 | 1.82 | 1.14
Steerable pyramid + SVM [9] | 7.83 | 2.79 | 5.31 | 6.43 | 1.76 | 4.10 | 7.13 | 2.28 | 4.71
Modified VGG16 + PCA + SVM [1] | 2.79 | 0.00 | 1.40 | 3.46 | 0.12 | 1.79 | 3.13 | 0.06 | 1.60
MaxViT-Small [51] | 2.31 | 1.28 | 1.79 | 2.31 | 2.00 | 2.15 | 2.31 | 1.64 | 1.97
Enhanced ConvNeXt-Small (Proposed) | 0.43 | 0.91 | 0.67 | 0.06 | 0.18 | 0.12 | 0.25 | 0.55 | 0.40
Table 18. Comparisons of spoof detection testing errors by the proposed and the SOTA methods on Idiap database (unit: %).
Method | 1-Fold APCER | 1-Fold BPCER | 1-Fold ACER | 2-Fold APCER | 2-Fold BPCER | 2-Fold ACER | Average APCER | Average BPCER | Average ACER
Ensemble Networks + SVM [3] | 0.00 | 0.91 | 0.45 | 0.00 | 0.45 | 0.23 | 0.00 | 0.68 | 0.34
Modified Xception + LSVM [24] | 0.00 | 1.36 | 0.68 | 0.00 | 3.64 | 1.82 | 0.00 | 2.50 | 1.25
Steerable pyramid + SVM [9] | 0.00 | 1.82 | 0.91 | 0.00 | 2.27 | 1.14 | 0.00 | 2.05 | 1.03
Modified VGG16 + PCA + SVM [1] | 0.45 | 2.27 | 1.36 | 0.91 | 0.45 | 0.68 | 0.68 | 1.36 | 1.02
MaxViT-Small [51] | 0.91 | 0.45 | 0.68 | 1.36 | 0.45 | 0.91 | 1.14 | 0.45 | 0.80
Enhanced ConvNeXt-Small (Proposed) | 0.00 | 0.45 | 0.23 | 0.00 | 0.00 | 0.00 | 0.00 | 0.23 | 0.12
Table 19. Comparisons of spoof detection testing errors with the generated images by the DCS-GAN and the SOTA methods using the proposed enhanced ConvNeXt-Small detector (unit: %).
Database | Model | 1-Fold APCER | 1-Fold BPCER | 1-Fold ACER | 2-Fold APCER | 2-Fold BPCER | 2-Fold ACER | Average APCER | Average BPCER | Average ACER
ISPR | Pix2Pix [47] | 0.00 | 0.55 | 0.27 | 0.24 | 0.18 | 0.21 | 0.12 | 0.37 | 0.24
ISPR | Pix2PixHD [48] | 0.30 | 0.49 | 0.39 | 0.14 | 0.28 | 0.21 | 0.22 | 0.39 | 0.30
ISPR | CycleGAN [18] | 0.79 | 0.12 | 0.46 | 0.00 | 0.55 | 0.27 | 0.40 | 0.34 | 0.37
ISPR | CUT [29] | 0.00 | 0.67 | 0.33 | 0.28 | 0.57 | 0.42 | 0.14 | 0.62 | 0.38
ISPR | DCS-GAN (Proposed) | 0.43 | 0.91 | 0.67 | 0.06 | 0.18 | 0.12 | 0.25 | 0.55 | 0.40
Idiap | Pix2Pix [47] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Idiap | Pix2PixHD [48] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Idiap | CycleGAN [18] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Idiap | CUT [29] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Idiap | DCS-GAN (Proposed) | 0.00 | 0.45 | 0.23 | 0.00 | 0.00 | 0.00 | 0.00 | 0.23 | 0.12
Table 20. Comparison of the proposed method and the SOTA methods in terms of average processing time per image, GPU memory usage, number of parameters, and FLOPs on the Jetson TX2 board.
Method | Processing Time (Unit: ms (fps)) | GPU Memory Usage (Unit: MB) | Number of Param. (Unit: M) | FLOPs (Unit: G)
Ensemble Networks + SVM [3] | 113.30 (8.83) | 190.66 | 39.34 | 22.27
Modified Xception + LSVM [24] | 27.87 (35.88) | 96.58 | 1.40 | 3.03
Modified VGG16 + PCA + SVM [1] | 61.70 (16.20) | 538.85 | 14.71 | 30.95
MaxViT-Small [51] | 218.06 (4.59) | 314.98 | 68.23 | 22.32
Enhanced ConvNeXt-Small (Proposed) | 97.22 (10.29) | 219.16 | 51.29 | 17.17
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
