Article

Estimation of Fractal Dimension and Detection of Fake Finger-Vein Images for Finger-Vein Recognition

Division of Electronics and Electrical Engineering, Dongguk University, 30 Pildong-ro 1-gil, Jung-gu, Seoul 04620, Republic of Korea
* Author to whom correspondence should be addressed.
Fractal Fract. 2024, 8(11), 646; https://doi.org/10.3390/fractalfract8110646
Submission received: 18 October 2024 / Revised: 28 October 2024 / Accepted: 29 October 2024 / Published: 31 October 2024

Abstract

With recent advancements in deep learning, spoofing techniques have evolved, and generative adversarial networks (GANs) have become an emerging threat to finger-vein recognition systems. Accordingly, previous research has generated finger-vein images for training spoof detectors; however, these approaches are limited and still cannot produce elaborate fake finger-vein images. We therefore develop a new densely updated contrastive learning-based self-attention generative adversarial network (DCS-GAN) to create elaborate fake finger-vein images, enabling the training of corresponding spoof detectors. Additionally, we propose an enhanced convolutional network for a next-dimension (ConvNeXt)-Small model with a large kernel attention module as a new spoof detector capable of distinguishing the generated fake finger-vein images. To improve the spoof detection performance of the proposed method, we introduce fractal dimension estimation to analyze the complexity and irregularity of class activation maps from real and fake finger-vein images, enabling the generation of more realistic and sophisticated fake finger-vein images. Experimental results obtained using two open databases showed that the fake images generated by the DCS-GAN exhibited Fréchet inception distances (FID) of 7.601 and 23.351, with Wasserstein distances (WD) of 18.158 and 10.123, respectively, confirming the possibility of spoof attacks against existing state-of-the-art (SOTA) spoof detection frameworks. Furthermore, experiments conducted with the proposed spoof detector yielded average classification error rates of 0.4% and 0.12% on the two aforementioned open databases, respectively, outperforming existing SOTA methods for spoof detection.

1. Introduction

The evolution of identity verification in security technologies can be characterized as follows: (1) methods using keys, security cards, IDs, etc. These carry the risk of loss as the item must always be carried. (2) Methods using passwords, personal identification numbers (PINs), pattern locks, etc. These require memorization and may be exposed through external factors. (3) Methods using biometric data like fingerprints, faces, irises, and finger-veins. These are advantageous for security as they are unique to each individual, require neither possession nor memorization, and are less susceptible to external exposure. With advantages such as high accuracy and convenience, biometrics has been extensively studied for application to a variety of tasks and is now used in many security fields. However, biometric recognition systems which use pattern recognition techniques to compare enrolled-user biometric images with real-time input remain vulnerable to spoof attacks that exploit stolen images or data through data breaches or hacking [1]. Therefore, there is a need for specialized research on the anti-spoofing of biometric systems.
With the advancement of deep learning technologies, spoofing techniques have also evolved. Generative adversarial networks (GANs), which train a generator and a discriminator in an adversarial relationship, can create image samples whose characteristic distribution is similar to that of the original images. Although images generated by GANs typically contain a unique, high-frequency GAN fingerprint that makes detection straightforward for classifiers such as convolutional neural networks (CNNs), existing research [2] has shown that this fingerprint can be removed by post-processing the generated images with various low-pass filters (Gaussian filter, median filter, average filter, etc.). Such post-processed generated images can cause conventional spoof detection mechanisms to fail, leading biometric recognition systems to miss adversarial spoof attacks. This enables unauthorized users to repeatedly gain access to sensitive information, potentially causing significant social, organizational, and financial losses.
Accordingly, previous research has adopted cycle-consistent adversarial networks (CycleGANs) to generate finger-vein images for the training of spoof detectors [3]. However, these approaches show limitations in generating elaborate fake finger-vein image samples. To overcome this challenge, a novel method is developed for generating fake finger-vein images, as well as a corresponding spoof detector for finger-veins. By integrating the fake finger-vein images generated by our method into conventional spoof detectors for additional training, or by directly applying our proposed spoof detection method, the security level of a finger-vein recognition system can be significantly enhanced, improving its robustness against spoof attacks.
Compared with previous works, our study has the following contributions:
- To resolve the issue of the generation of less elaborate fake finger-vein images by the existing methods, our study introduces a novel method for generating elaborate fake finger-vein images that can attack conventional finger-vein recognition systems. We propose the densely updated contrastive learning-based self-attention generative adversarial network (DCS-GAN);
- The DCS-GAN is trained using the adaptive moment estimation (Adam) optimizer with sharpness-aware minimization (SAM) to improve the model’s generalization. This allows for the creation of high-quality fake images. Furthermore, by updating the loss through a comparison of generated images and real images using a DenseNet-161 that is pre-trained on finger-vein data, the model can create fake images with a distribution similar to the original ones. Additionally, the inclusion of a self-attention layer in the generator emphasizes the finger-vein patterns, enhancing the quality of the generated images;
- The performance of spoof detection is improved by an enhanced convolutional network for a next-dimension (ConvNeXt) with a large kernel attention (LKA). This not only takes into account the adaptability in the spatial dimension, inherent to traditional self-attention, but also considers adaptability in the channel dimension, thereby computing long-range correlations and improving spoof detection;
- To improve the spoof detection performance of the proposed method, we introduce fractal dimension estimation to analyze the complexity and irregularity of class activation maps from real and fake finger-vein images, enabling the generation of more realistic and sophisticated fake finger-vein images. In addition, we freely share our DCS-GAN, enhanced ConvNeXt, algorithm codes, and generated fake finger-vein images through [4], so that researchers can utilize them for further study and ensure fair evaluations.
The rest of this manuscript is organized as follows. Section 2 analyzes the existing research, while Section 3 provides a thorough explanation of the proposed method. Section 4 presents the experimental results, which are then discussed in Section 5. Finally, Section 6 concludes the study.

2. Related Work

The research on finger-vein spoof attacks and detection can be categorized into two areas: spoof attacks and spoof detection. Therefore, the existing research related to spoof attacks, relying on fabricated objects and generated images, is analyzed herein. We also categorize and examine the existing research related to spoof detection based on machine learning and deep learning.

2.1. Spoof Attack

2.1.1. Using Fake Fabricated Artifacts

Previous research on finger-vein spoof attacks has mainly used handcrafted fake images to attempt spoof attacks. Nguyen et al. [5] printed 56 real finger-vein images on three types of paper—overhead projector (OHP) film, A4 paper, and matte paper—using a LaserJet printer at various resolutions: 300 (low-resolution), 1200 (middle-resolution), and 2400 (high-resolution) dots per inch (dpi). Considering the texture of the paper and the level of detail at each resolution, they generated a total of 7560 fake finger-vein images, attached these to fingers, and attempted spoof attacks. Tome et al. [6] printed 220 real images on paper using a LaserJet printer and enhanced the vein outlines using a board marker to carry out spoof attacks. Singh et al. [7] printed 468 real images on glossy paper using an inkjet printer and improved the quality of the vein patterns using an existing algorithm [8] to generate 468 fake images. Raghavendra and Busch [9] printed 100 real images on LaserJet and inkjet printers and replayed them on smartphone displays, creating a total of 300 fake images. Additionally, Krishnan et al. [10] used a prosthesis with an inkjet-printed finger-vein image attached and a thin rubber cap. Schuiki et al. [11] printed a finger-vein image on a LaserJet printer and attached it to wax. However, such fabricated artifacts for print and display attacks have limitations. Although they may look similar to the original (real) images, they suffer from issues such as paper texture, resolution, and noise introduced by the acquisition environment. Furthermore, they lack effectiveness against recently improved CNN-based spoof detectors.

2.1.2. Using Fake Generated Images

The evolving GANs, developed by many researchers, generate data via training through competition between their generator and discriminator networks. This has brought about the following positive effects. (1) They can be used as a data augmentation method in small datasets where data acquisition is difficult [12,13]. (2) They can generate labeled data in segmentation fields where labeling is challenging or expensive to carry out [14,15]. (3) They can address image degradation issues caused by low or high illumination, blur, noise, etc. [16,17]. However, the ability of GANs to produce images with high similarity to the original images has led to risks of spoof attacks in the biometric recognition field, including deep fakes. Although there has been considerable research on spoof detection against generated (fake) images in the domains of face, iris, and fingerprint recognition, there has been very little research on spoof detection for finger-vein recognition. Previous research using CycleGAN [18] to create fake finger-vein images similar to real images for spoof attacks had the drawback of not generating highly elaborate fake finger-vein images [3].

2.2. Spoof Detection

2.2.1. Machine Learning-Based Methods

Finger-vein images display low-quality characteristics. As described in Section 1, they contain extensive noise, including scattering and blurring, because they capture the pattern of veins under the skin of a finger illuminated by near-infrared (NIR) light. Therefore, conventional image processing methods have been applied in previous finger-vein spoof detection research. Raghavendra and Busch [9] applied a steerable pyramid to extract information about the various scales and directions in finger-vein images and used a support vector machine (SVM) for spoof detection as binary classification (real or fake). Tirunagari et al. [19] employed dynamic mode decomposition (DMD), a technique for analyzing the dynamic characteristics of data, and specifically used a windowed dynamic mode decomposition (W-DMD) approach, moving a sliding window across the entire time range of the data. Features of the images were then extracted and classified using an SVM. Kocher et al. [20] employed local binary patterns (LBP) to extract image features and performed real and fake classification through a linear SVM. Similarly, Nguyen et al. [5] transformed the input images into the frequency domain through the Fourier transform (FT) to extract information about the frequency bands in the image, or decomposed the low- and high-frequency components using Haar and Daubechies wavelet transforms, and then performed spoof detection through an SVM. Additionally, Bok et al. [21] extracted heart rate and blood flow signal characteristics from finger-vein videos using the discrete FT and used them to train an SVM for spoof detection. However, a drawback of the aforementioned studies is that their spoof detection performance degrades across the various methods used to create spoof data.

2.2.2. Deep Learning-Based Methods

Recent advancements in deep learning technology have led to research on spoof detection using CNNs. Nguyen et al. [1] used modified models of visual geometry group (VGG)-Net [22] and AlexNet [23] to extract feature maps from images. Subsequently, they performed dimensionality reduction on these feature maps with the help of principal component analysis (PCA), and conducted real/fake classification via an SVM. Shaheed et al. [24] employed only the entry flow of Xception [25] for feature extraction and performed spoof detection through a linear SVM. Kim et al. [3] used two types of ensemble networks, DenseNet-161 and DenseNet-169 [26], to obtain spoof detection scores. They then conducted score-fusion via SVM to classify them as real or fake. Additionally, Singh et al. [7] utilized SfS-Net [27] to acquire two types of images: normal-map and diffuse-map. They then extracted features using texture descriptors like LBP, local phase quantization (LPQ), and binarized statistical image features (BSIF), and used a linear SVM to obtain three different spoof detection scores. They classified real and fake data through SUM-rule fusion. However, the limitation of the aforementioned methods is that they do not achieve high accuracy in the spoof detection of more elaborately generated fake finger-vein images. To mitigate this problem, a spoof detection approach using an enhanced ConvNeXt-Small network is proposed in this study. Table 1 shows the comparisons of existing and proposed methods for spoof attack and spoof detection in finger-vein recognition.

3. Proposed Method

3.1. Flow Diagram of the Proposed Method

In this subsection, an overview of the proposed model, which is depicted in Figure 1, is described. Initially, for the spoof attack procedure, we extract the region of interest (ROI) from the input finger-vein image using the preprocessing method explained in Section 3.2. The extracted ROI image is then used as an input to the DCS-GAN to generate a fake finger-vein image. Subsequently, through low-pass filtering-based image blurring, such as median filter, Gaussian filter, and average filter blurring, we remove the GAN fingerprints present in the fake sample produced by the DCS-GAN, thus creating a more elaborate fake image. In the spoof detection procedure, the finger-vein image that has undergone post-processing is used as an input to the ConvNeXt with LKA, which ultimately classifies it as either a real or fake finger-vein image.
In our research, the synthesis of fake images and the training of our recognition model do not occur within a single computation cycle. The synthesis of fake images (the spoof attack procedure shown in Figure 1) is performed in advance. Afterwards, the recognition model is trained on the synthesized fake images and, once training is complete, determines which input images are fake (the spoof detection procedure in Figure 1). In other words, fake-image synthesis and recognition-model training are performed separately; consequently, a recognition system that was not trained with the synthesized fake images cannot identify which images are fake.

3.2. The Preprocessing of the Finger-Vein Images

The preprocessing step is to remove the background and detect the finger ROI in the original finger-vein image, which serves as an input to the finger-vein recognition system. In the finger-vein recognition system, NIR lighting is used, resulting in a structure that blocks external lighting. Consequently, the areas outside the finger contain a black background. To remove this background, it is necessary to detect the finger boundaries at the top, bottom, left, and right. For detecting the left and right boundaries, this study employed the average pixel brightness. Specifically, we calculated the average pixel value along the y-axis for each x-axis position and detected the right and left boundaries based on the x-axis positions where this average value exceeded a certain threshold. Because the penetration amount of the NIR light varies depending on the skin and thickness of the user’s finger, the threshold was adaptively determined on the basis of the average brightness from the input image. For the top and bottom finger boundaries, we detected the lines through filter operations using a 4 × 20 mask [28]. To address errors that may have occurred in the detection of the upper or lower boundaries, we compared the distance between the average value of all detected y-axis boundary coordinates and each detected y-axis boundary coordinate, eliminating outlier points that showed significant differences from the average, and then refined the boundary lines with the remaining points. Based on the refined boundaries, we apply bilinear interpolation to the obtained finger region to acquire the finger ROI of 224×224 pixels, which is used as the input to the pre-trained DCS-GAN.
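For illustration, a minimal sketch of this preprocessing is given below (OpenCV/NumPy assumed; the 4 × 20 mask and the outlier refinement of the upper/lower boundaries are simplified stand-ins for the steps described above, not the exact implementation):

```python
import cv2
import numpy as np

def extract_finger_roi(img, out_size=(224, 224)):
    """Detect the finger region and return a 224 x 224 ROI (bilinear resize).

    img: grayscale finger-vein image as a 2-D uint8 NumPy array.
    """
    h, w = img.shape

    # Left/right boundaries: average brightness of each column (mean over the y-axis),
    # thresholded adaptively at the overall mean brightness of the input image.
    col_mean = img.mean(axis=0)
    cols = np.where(col_mean > img.mean())[0]
    left, right = (cols[0], cols[-1]) if cols.size else (0, w - 1)

    # Top/bottom boundaries: filter with a 4 x 20 vertical-edge mask (a simplified
    # stand-in for the mask of [28]) and keep rows with a strong edge response.
    mask = np.vstack([np.ones((2, 20)), -np.ones((2, 20))]) / 40.0
    edge = np.abs(cv2.filter2D(img.astype(np.float32), -1, mask))
    rows = np.where(edge.max(axis=1) > edge.mean())[0]
    top, bottom = (rows[0], rows[-1]) if rows.size else (0, h - 1)

    # Crop the finger region and resize with bilinear interpolation.
    roi = img[top:bottom + 1, left:right + 1]
    return cv2.resize(roi, out_size, interpolation=cv2.INTER_LINEAR)
```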

3.3. Spoof Attack Procedure

3.3.1. Generation of Fake Finger-Vein Image Using DCS-GAN

In this study, the structure of the DCS-GAN used to produce fake finger-vein image samples is displayed in Figure 2. Detailed content about the DCS-GAN generator and discriminator is provided in Table 2 and Table 3, respectively. In the DCS-GAN, the correlation between patches in the features attained by the encoder of the input real image and patches in the feature map extracted from the encoder of the generated fake image is calculated through a patch sample MLP composed of two dense layers, updating the loss. Additionally, the generator’s encoder and an additional encoder share weights, enabling the maximization of mutual information by applying contrastive learning [29].
When acquiring finger-vein images, a NIR camera is used to capture the finger-vein, which is illuminated by NIR light, and the acquired images will show fingerprint marks. In this study, we improved the image reality by adding self-attention [30] after each residual block, as indicated in Table 2, to both preserve the fingerprint texture and emphasize the patterns of veins in the fake image.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{1}$$
Equation (1) defines self-attention, where the query, key, and value are represented by $Q$, $K$, and $V$, respectively. A query is an information vector generated at a specific location in the input data, and a key is an information vector generated at another location; a value represents the actual information at each location. First, the similarity (relevance) is computed through the dot product between the query and key ($QK^{T}$), and scaling by $\sqrt{d_k}$ is applied for smoother model training. Second, a softmax activation function is applied to obtain a probability distribution, which is finally applied to the input data ($V$) as a self-attention map to emphasize relevant information.
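For reference, Equation (1) corresponds to the following minimal PyTorch sketch (tensor shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Equation (1): softmax(QK^T / sqrt(d_k)) V.

    Q, K: tensors of shape (batch, num_positions, d_k); V: (batch, num_positions, d_v).
    """
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # similarity between positions
    attn = F.softmax(scores, dim=-1)                # probability distribution over keys
    return attn @ V                                 # attention-weighted values
```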
For training the DCS-GAN, we used SAM [31] along with Adam [32] as the optimizer. Adam, which is widely used in existing research, generally offers excellent performance in training data compared with other optimizers, but it poses the risk of overfitting, leading to weaker performance on validation and testing data [33]. SAM helps to reach global minima by smoothing sharp minima during training. As a result, the model can avoid the risk of overfitting (i.e., not being overly tailored to training data) and show improved generalization performance on new data. Therefore, using SAM and Adam together, we were able to improve the generalization performance, which is crucial in generative models, and generate high-quality images. Equations (2)–(6) describe the operations of SAM.
$$L_D(\omega) \le L_S(\omega + \epsilon) + h\!\left(\frac{\|\omega\|_2^2}{\sigma^2}\right) \tag{2}$$
$$L_D(\omega) \le \max_{\|\epsilon\|_2 \le \sigma}\left[L_S(\omega + \epsilon) - L_S(\omega)\right] + L_S(\omega) + h\!\left(\frac{\|\omega\|_2^2}{\sigma^2}\right) \tag{3}$$
$$\min_{\omega}\; L_S^{SAM}(\omega) + \mu\|\omega\|_2^2 \quad \text{where} \quad L_S^{SAM}(\omega) \triangleq \max_{\|\epsilon\|_2 \le \sigma} L_S(\omega + \epsilon) \tag{4}$$
$$\nabla_{\omega} L_S^{SAM}(\omega) \approx \nabla_{\omega} L_S(\omega + \hat{\epsilon}(\omega)) = \frac{d(\omega + \hat{\epsilon}(\omega))}{d\omega}\,\nabla_{\omega} L_S(\omega)\Big|_{\omega + \hat{\epsilon}(\omega)} = \nabla_{\omega} L_S(\omega)\Big|_{\omega + \hat{\epsilon}(\omega)} + \frac{d\hat{\epsilon}(\omega)}{d\omega}\,\nabla_{\omega} L_S(\omega)\Big|_{\omega + \hat{\epsilon}(\omega)} \tag{5}$$
$$\nabla_{\omega} L_S^{SAM}(\omega) \approx \nabla_{\omega} L_S(\omega)\Big|_{\omega + \hat{\epsilon}(\omega)} \tag{6}$$
In Equations (2) and (3), $\omega$ is the vector of classifier parameters, and $h$ is a strictly increasing function. One of the differences between Equations (2) and (3) is that $L_S(\omega)$ is added in Equation (3). In this formulation, the term $\max_{\|\epsilon\|_2 \le \sigma}[L_S(\omega+\epsilon) - L_S(\omega)]$ represents the sharpness, indicating how much the loss value varies when $\omega$ is perturbed by $\epsilon$, while the term $L_S(\omega)$ is the loss on the training data, as in other existing methods. Additionally, the term $h(\|\omega\|_2^2/\sigma^2)$ is a regularization term related to the magnitude of $\omega$; here, an L2 regularizer is employed. As a result, both the sharpness and the training loss are minimized, enabling the model to find a relatively flat region. On this basis, the problem of selecting optimal parameter values is formulated in Equation (4). While the SAM gradient would be calculated as shown in Equation (5), the term $\frac{d\hat{\epsilon}(\omega)}{d\omega}\nabla_\omega L_S(\omega)\big|_{\omega+\hat{\epsilon}(\omega)}$ involves computing a Hessian matrix, which is computationally expensive and is therefore dropped to avoid slowing down training. Consequently, Equation (6) gives the final SAM gradient.
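A minimal sketch of one SAM update step combined with a base optimizer such as Adam is shown below (PyTorch assumed; the hyperparameter rho plays the role of σ above, and the two forward/backward passes follow Equations (4)–(6); this is an illustrative approximation, not the exact training code):

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """One SAM update: ascend to w + eps_hat(w), take the gradient there (Eq. (6)),
    then restore w and let the base optimizer (e.g., Adam) apply the update."""
    # First pass: gradient at the current weights w.
    loss = loss_fn(model(x), y)
    loss.backward()

    # Ascent step: eps_hat = rho * g / ||g||_2, applied to every parameter.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2) + 1e-12
    eps_list = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            eps = rho * p.grad / grad_norm
            p.add_(eps)
            eps_list.append((p, eps))
    model.zero_grad()

    # Second pass: gradient at the perturbed weights, then restore w and update.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, eps in eps_list:
            p.sub_(eps)
    base_optimizer.step()
    model.zero_grad()
    return loss.item()
```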
To address the issue of the vanishing gradient in the generator caused by the sigmoid cross-entropy loss function that is traditionally employed to train the discriminator in GANs, Mao et al. [34] updated the generator by using the least square loss. This computes the distances between the distributions of the original samples not as divergence but as a least square error, thus penalizing the image samples that are not close to the decision boundary and enabling the generation of samples closer to real images. For that reason, we selected the least squares GAN (LSGAN) loss for smooth training of the GAN.
$$L_{Discriminator}(Gen, Dis, X, Y) = \frac{1}{2}\,\mathbb{E}_{y\sim Y}\!\left[(Dis(y) - b)^2\right] + \frac{1}{2}\,\mathbb{E}_{x\sim X}\!\left[(Dis(Gen(x)) - a)^2\right] \tag{7}$$
$$L_{Generator}(Gen, Dis, X) = \frac{1}{2}\,\mathbb{E}_{x\sim X}\!\left[(Dis(Gen(x)) - c)^2\right] \tag{8}$$
Equation (7) describes the LSGAN loss used for training the DCS-GAN discriminator in this study, where $a$ and $b$ denote the labels for fake and real images, respectively, and $Dis$ and $Gen$ represent the discriminator and generator models. To minimize this equation, the term $\frac{1}{2}\mathbb{E}_{y\sim Y}[(Dis(y)-b)^2]$ requires $Dis(y)=b$, and the term $\frac{1}{2}\mathbb{E}_{x\sim X}[(Dis(Gen(x))-a)^2]$ requires $Dis(Gen(x))=a$. In other words, Equation (7) drives the discriminator to classify real images $y$ as the real-image label $b$ and images generated from $x$ as the fake-image label $a$. Conversely, Equation (8) is used for training the generator; to minimize its value, the term $\frac{1}{2}\mathbb{E}_{x\sim X}[(Dis(Gen(x))-c)^2]$ requires $Dis(Gen(x))=c$. Essentially, the generated finger-vein images should be classified by the discriminator as $c$, i.e., not as having the fake-image label.
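The LSGAN terms of Equations (7) and (8) can be written compactly as follows (a minimal PyTorch sketch; the concrete label values a, b, and c are assumptions, e.g., a = 0 and b = c = 1):

```python
import torch

def lsgan_losses(dis_real, dis_fake, a=0.0, b=1.0, c=1.0):
    """Equations (7) and (8) as least-squares losses.

    dis_real: discriminator outputs for real images y.
    dis_fake: discriminator outputs for generated images Gen(x).
    a: fake label, b: real label, c: label the generator wants for its output.
    """
    d_loss = 0.5 * torch.mean((dis_real - b) ** 2) + 0.5 * torch.mean((dis_fake - a) ** 2)
    g_loss = 0.5 * torch.mean((dis_fake - c) ** 2)   # generator pushes Dis(Gen(x)) toward c
    return d_loss, g_loss
```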
In this study, besides the typical generator and discriminator constituting a GAN, we also employed a separate encoder section of the generator and an additional patch sample MLP to maximize the amount of mutual information. Based on this, we calculated a multilayer, patchwise contrastive loss aimed at correlating the same patches in the feature maps of real and fake images while not correlating the different patches, as shown in Equation (9).
$$L_{Patch}(Gen, Mlp, X) = \mathbb{E}_{x\sim X}\sum_{l=1}^{L}\sum_{s=1}^{S_l} \ell\!\left(\hat{z}_l^{\,s},\, z_l^{\,s},\, z_l^{\,S\setminus s}\right) \tag{9}$$
In Equation (9), $Mlp$ is the patch sample MLP, $L$ denotes the number of layers with $l \in \{1, 2, \ldots, L\}$, and $S_l$ denotes the number of spatial locations in layer $l$ with $s \in \{1, 2, \ldots, S_l\}$. The term $\mathbb{E}_{x\sim X}\sum_{l=1}^{L}\sum_{s=1}^{S_l}\ell(\hat{z}_l^{\,s}, z_l^{\,s}, z_l^{\,S\setminus s})$ is obtained by feeding the output feature maps of the encoder layers into the MLP at the sampled spatial locations, where $\hat{z}_l^{\,s}$ is the embedding of the corresponding patch of the image output by the generator.
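A simplified single-layer version of the patchwise contrastive term in Equation (9) is sketched below (PyTorch; the sampling of patch locations and the exact PatchNCE formulation of [29] are abstracted away, and the temperature value is an assumption):

```python
import torch
import torch.nn.functional as F

def patch_contrastive_loss(feat_fake, feat_real, tau=0.07):
    """InfoNCE-style patch contrast for one encoder layer.

    feat_fake, feat_real: (S, D) embeddings of S sampled patch locations produced by
    the patch-sample MLP from the fake and real encoder features at the same locations.
    The patch at the same spatial index s is the positive pair; all others are negatives.
    """
    feat_fake = F.normalize(feat_fake, dim=1)
    feat_real = F.normalize(feat_real, dim=1)
    logits = feat_fake @ feat_real.t() / tau          # (S, S) similarity matrix
    targets = torch.arange(feat_fake.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, targets)
```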
In the case of GANs for general image-to-image translation, the main objective is to map input images to output images while preserving their shapes but altering their internal structures. However, our objective is to generate fake images through GANs that a conventional finger-vein spoof detector cannot distinguish from real input images. Therefore, we additionally apply a perceptual loss [35] that compares feature maps between input real images and output fake images. The conventional perceptual loss employs a VGG-16 model pre-trained on ImageNet; however, as this study targets spoof attacks on, and spoof detection for, finger-vein images, DenseNet-161 [3], pre-trained on real and fake finger-vein images, is used as the feature extractor.
$$L_{perceptual}(Gen, X, Y) = \frac{1}{H_{i,j} W_{i,j} C_{i,j}} \sum_{h=1}^{H_{i,j}} \sum_{w=1}^{W_{i,j}} \sum_{c=1}^{C_{i,j}} \left(\phi_{i,j}(Gen(x))_{h,w,c} - \phi_{i,j}(y)_{h,w,c}\right)^2 \tag{10}$$
In Equation (10), $\phi_{i,j}$ refers to the feature map obtained after the $j$th convolution and before the $i$th max-pooling layer in the pre-trained DenseNet-161 model, and $H_{i,j}$, $W_{i,j}$, and $C_{i,j}$ represent the height, width, and channel dimensions of that feature map, respectively. Therefore, the term $(\phi_{i,j}(Gen(x))_{h,w,c} - \phi_{i,j}(y)_{h,w,c})^2$ signifies the squared Euclidean distance between the generated fake image sample and the real sample in feature space. Finally, Equation (11) combines Equations (8)–(10) to represent the loss used for training the generator in this study.
$$L_{Generator}(Gen, Dis, X) + L_{Patch}(Gen, Mlp, X) + L_{perceptual}(Gen, X, Y) \tag{11}$$
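Combining the terms of Equation (11) can then be sketched as follows (equal weighting of the three losses is an assumption; patch_loss is the value of Equation (9), for instance from the sketch above, and feat_fake/feat_real denote DenseNet-161 feature maps of the fake and real images):

```python
import torch

def total_generator_loss(dis_fake, patch_loss, feat_fake, feat_real, c=1.0):
    """Equation (11): adversarial (Eq. (8)) + patchwise contrastive (Eq. (9))
    + perceptual (Eq. (10)) terms, with equal weights assumed."""
    adv = 0.5 * torch.mean((dis_fake - c) ** 2)
    # Perceptual term: mean squared distance over all H x W x C feature positions.
    perceptual = torch.mean((feat_fake - feat_real) ** 2)
    return adv + patch_loss + perceptual
```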
In this study, to generate fake images that are as similar as possible to real images, we used real images as the input images during the DCS-GAN training process. For the target image, we excluded the input image itself and used one of the remaining real images within the input image’s intra-class. The target image was randomly chosen to facilitate smooth learning through the use of diverse inputs. Figure 3 represents the samples of inputs for the DCS-GAN.

3.3.2. The Post-Processing Stage for the Generation of Fake Finger-Vein Images

In the spoof attack procedure in Figure 1, post-processing involves removing the traces of fake image generation. Previously, synthetic images generated by GANs contained a ‘GAN fingerprint,’ so spoof detection was relatively straightforward. However, researchers have found that such high-frequency components can be removed by low-pass filters like the median filter, Gaussian filter, and average filter [2]. Due to this, the risk of spoofing through GANs has increased. Therefore, in this study, we applied the Gaussian filter, median filter, and average filter individually and compared their effects on the spoof detection performance.
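A minimal sketch of this post-processing step with OpenCV is shown below (the filter type and kernel size, e.g., median 5 × 5 or Gaussian 3 × 3, are selected experimentally):

```python
import cv2

def remove_gan_fingerprint(fake_img, method="median", ksize=5):
    """Low-pass post-processing used to suppress high-frequency GAN fingerprints.

    fake_img: generated image as a uint8 NumPy array.
    """
    if method == "median":
        return cv2.medianBlur(fake_img, ksize)
    if method == "gaussian":
        return cv2.GaussianBlur(fake_img, (ksize, ksize), 0)
    return cv2.blur(fake_img, (ksize, ksize))   # average filter
```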

3.4. Spoof Detection Procedure

Spoof Detection of Fake-Vein Image by Enhanced ConvNeXt

In this study, we chose ConvNeXt-Small [36] as the base model for detecting fake finger-vein images. ConvNeXt achieves SOTA performance through various improvements such as the use of a stage compute ratio, stem layer (patchify), and inverted bottleneck. For these reasons, and considering computational efficiency, we used ConvNeXt-Small as the base model for spoof detection in this study. ConvNeXt-Small consists of a structure with ConvNeXt Block (1) × 3, ConvNeXt Block (2) × 3, ConvNeXt Block (3) × 27, and ConvNeXt Block (4) × 3. Here, ConvNeXt Blocks (1)–(4) are different blocks, and the × number indicates the number of repetitions. Unlike conventional CNN models, ConvNeXt Blocks utilize a 7 × 7-size kernel to expand the receptive field, thereby enhancing the model’s performance. To further improve the performance of the existing ConvNeXt model, this study proposes an enhanced ConvNeXt-Small model that additionally employs LKA [37] after the last ConvNeXt Block to enable self-adaptation and long-range correlations. This allows emphasized feature maps to be transmitted to the classifier for the spoof detection problem, which involves real or fake classification (binary classification). The structure of the enhanced ConvNeXt-Small is detailed in Figure 4 and Table 4.
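A minimal PyTorch sketch of the LKA module appended after the last ConvNeXt block is given below (the 5 × 5 depth-wise, 7 × 7 dilated depth-wise, and 1 × 1 convolutions follow the decomposition described in [37]; its placement here follows Table 4, while the exact configuration in our implementation may differ):

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Large kernel attention (LKA) block.

    A large receptive field is decomposed into a 5x5 depth-wise convolution,
    a 7x7 depth-wise dilated convolution (dilation 3), and a 1x1 convolution;
    the result modulates the input feature map as an attention map, giving
    adaptability in both the spatial and channel dimensions.
    """
    def __init__(self, channels):
        super().__init__()
        self.dw_conv = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw_dilated = nn.Conv2d(channels, channels, 7, padding=9,
                                    dilation=3, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.pointwise(self.dw_dilated(self.dw_conv(x)))
        return x * attn    # spatially and channel-wise adaptive re-weighting
```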

3.5. Fractal Dimension Estimation

Fractals are complex structures that display self-similarity and diverge from traditional geometric rules [38]. The fractal dimension (FD) quantifies the complexity of a shape, indicating whether it is more concentrated or dispersed. In this study, a binary image representing the activated region of finger-vein images (real or fake) is used for FD estimation. The FD in this context ranges between almost one and two, reflecting different degrees of complexity. This range encompasses various representations of binary class activation maps (BCAMs), with higher FD values corresponding to greater shape intricacy. The FD for the activated region is calculated using the box-counting method [39,40], where C represents the number of boxes that evenly cover each activated region, and δ is the scaling factor of the boxes. The FD is determined using Equation (12).
$$FD = \lim_{\delta \to 0} \frac{\log(C(\delta))}{\log(1/\delta)} \tag{12}$$
where $1 \le FD \le 2$, and for every $\delta > 0$ there exists a corresponding count $C(\delta)$. The pseudocode for estimating the FD of the activated part of the finger-vein image using the box-counting method is provided in Algorithm 1.
Algorithm 1 Pseudocode for Fractal Dimension (FD) Estimation
Input: BCAM: Binary class activation map extracted from the DCS-GAN’s encoder
Output: FD: Fractal dimension
1:  Find the largest dimension of the box size and adjust it to the nearest power of 2
     Max_dimension = max(size(BCAM))
     δ = 2^⌈log2(Max_dimension)⌉
2:  If the image is smaller than δ, pad it to match δ’s dimension
     if size(BCAM) < size(δ)
       Pad_width = ((0, δ − BCAM.shape[0]), (0, δ − BCAM.shape[1]))
       Pad_BCAM = pad(BCAM, Pad_width, mode = ‘constant’, constant_values = 0)
     else
       Pad_BCAM = BCAM
3:  Initialize an array to store the number of boxes corresponding to each box size
     n = zeros(1, δ + 1)
4:  Compute the number of boxes C(δ) containing at least one pixel of the positive region
     n[δ + 1] = sum(Pad_BCAM[:])
5:  While δ > 1:
       a. Divide the size of δ by 2
       b. Reassign the number of boxes C(δ)
6:  Compute log(C(δ)) and log(1/δ) for each δ
7:  Fit a line to the points (log(1/δ), log(C(δ))) using the least squares method
8:  The fractal dimension (FD) is given by the slope of the fitted line
Return FD
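For reproducibility, a runnable NumPy counterpart of Algorithm 1 is sketched below (a minimal implementation assuming a 2-D binary array as input; padding to a power of two and the least-squares fit follow the steps above):

```python
import numpy as np

def fractal_dimension(bcam):
    """Box-counting estimate of the fractal dimension (Algorithm 1 / Equation (12)).

    bcam: 2-D binary array (the binarized class activation map); returns the slope of
    log C(delta) versus log(1/delta) obtained by a least-squares line fit.
    """
    bcam = (np.asarray(bcam) > 0).astype(np.uint8)
    size = 2 ** int(np.ceil(np.log2(max(bcam.shape))))      # pad to a power of two
    padded = np.zeros((size, size), dtype=np.uint8)
    padded[:bcam.shape[0], :bcam.shape[1]] = bcam

    deltas, counts = [], []
    delta = size
    while delta >= 1:
        n_blocks = size // delta
        blocks = padded.reshape(n_blocks, delta, n_blocks, delta)
        # C(delta): number of delta x delta boxes containing at least one positive pixel.
        c = np.count_nonzero(blocks.sum(axis=(1, 3)))
        deltas.append(delta)
        counts.append(max(c, 1))
        delta //= 2

    slope, _ = np.polyfit(np.log(1.0 / np.array(deltas)), np.log(counts), 1)
    return slope
```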

4. Experimental Results

4.1. Experimental Database and Setups

For the performance evaluation of the DCS-GAN for generating fake finger-vein images in the spoof attack procedure and the enhanced ConvNeXt-Small for detecting forgeries in the spoof detection procedure, we used real finger-vein images from two open databases: the ISPR database [1] and the Idiap database [41]. The ISPR database consists of a total of 3300 real images captured from all fingers of both hands of 33 individuals, each captured 10 times (10 trials × 33 individuals × 2 hands × 5 fingers). The Idiap database comprises a total of 440 real images captured from the index fingers from both hands of 110 individuals, each captured twice (2 trials × 110 individuals × 2 hands × 1 finger). In Table 5, a description of the ISPR and Idiap databases is presented, and Figure 5 shows examples from both databases.
The experimental work of this study was performed using a desktop computer equipped with an Intel® Core (TM) i7-9700F central processing unit (CPU) operating at 3.0 GHz, supplemented by 32 GB of RAM and an NVIDIA GeForce RTX 3060 graphics processing unit (GPU). This graphics card includes 3584 compute unified device architecture (CUDA) cores and has a total of 12 GB of dedicated graphics memory [42].

4.2. Training of the Proposed Networks

All experiments in this study were carried out using two-fold cross-validation. Specifically, in the first fold validation, half of the total data were used for training of the network, while the remaining half were used to test the network. In the second fold validation, this was reversed. The final testing accuracy was calculated by averaging the testing accuracies from the two folds. From the training data, 10% of the data were used as a validation set to avoid model overfitting. For effective training, the ISPR database images were resized to 256 × 256 and then subjected to random crop augmentation to a size of 224 × 224. Particularly, in the Idiap database, due to the high risk of overfitting with the use of only 440 real images, each real image was subjected to 10-pixel shifts in all four directions (up, down, left, and right), resulting in a total of 1760 training images (440 images × 4 directions). Figure 6 shows examples of the shift augmentation applied to the Idiap database.
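As an illustration of the shift augmentation applied to the Idiap training set, the following minimal sketch (NumPy; zero-filling of the exposed border is an assumption) produces the four shifted copies of a real image:

```python
import numpy as np

def shift_augment(img, shift=10):
    """Shift the image by `shift` pixels up, down, left, and right, so that each
    real image yields four additional training images."""
    h, w = img.shape[:2]
    out = []
    for dy, dx in [(-shift, 0), (shift, 0), (0, -shift), (0, shift)]:
        shifted = np.zeros_like(img)
        src_y = slice(max(0, -dy), h - max(0, dy))
        dst_y = slice(max(0, dy), h - max(0, -dy))
        src_x = slice(max(0, -dx), w - max(0, dx))
        dst_x = slice(max(0, dx), w - max(0, -dx))
        shifted[dst_y, dst_x] = img[src_y, src_x]
        out.append(shifted)
    return out
```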

4.2.1. Training of DCS-GAN for Spoof Attack

For the spoof attack procedure, the DCS-GAN was trained to generate fake images that are similar to the real images. The initial learning rate was set at 0.0002 and decayed at a rate of 0.9 every 10,000 steps, for a total of 400 epochs. Table 6 provides details about the training parameters. Figure 7a shows the training loss graphs for both the generator and the discriminator; these graphs indicate that training converged sufficiently on the training split. Figure 7b displays the validation loss graphs for the generator and the discriminator, confirming that the DCS-GAN did not overfit the training data.

4.2.2. Training of Enhanced ConvNeXt-Small for Spoof Detection

In the procedures for detecting spoof attacks with fake finger-vein images obtained by the DCS-GAN, Adam was used as the optimizer and cross-entropy loss was employed as the loss function. Table 7 details the training parameters.
To train the enhanced ConvNeXt-Small for spoof detection, we mixed the original (real) images with the generated (fake) images from the DCS-GAN and trained it with two-fold cross validation. Figure 8a presents the resulting training accuracy and loss graphs for the enhanced ConvNeXt-Small. This demonstrates that, with the increase in epochs, both the accuracy and loss converge, indicating sufficient training on the data. Figure 8b presents the validation accuracy and loss graphs of the enhanced ConvNeXt-Small, confirming that the model has not overfitted on the training data.

4.3. Testing of Proposed Model

4.3.1. Evaluation Metrics

In this study, the quality of the images generated by the GAN was evaluated using the Fréchet inception distance (FID) [43] as per Equation (13), which has been commonly used in previous research [3,29]. Additionally, the Wasserstein distance (WD) [44], as defined in Equation (14), was also used, as in previous work evaluating the quality of uneven illumination-corrected vein images [17]. The quality was assessed by comparing real (original) images with fake (generated) images.
$$FID = \left\|\mu_{real} - \mu_{fake}\right\|^2 + \mathrm{Tr}\!\left(\Sigma_{real} + \Sigma_{fake} - 2\left(\Sigma_{real}\Sigma_{fake}\right)^{1/2}\right) \tag{13}$$
$$WD_p(P, Q) = \left(\inf_{\gamma \in \Pi(P, Q)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \left\|I_{real} - I_{fake}\right\|^p \, d\gamma\right)^{1/p} \tag{14}$$
In Equation (13), $\mu_{real}$ and $\mu_{fake}$ represent the means of the real and fake image samples, respectively, computed from features of Inception-v3 pre-trained on ImageNet; $\mathrm{Tr}$ denotes the trace (sum of diagonal elements); and $\Sigma_{real}$ and $\Sigma_{fake}$ represent the corresponding covariance matrices. In Equation (14), $\Pi(P, Q)$ denotes the set of joint probability distributions whose marginals are $P$ and $Q$, $\mathbb{R}^d \times \mathbb{R}^d$ is the product space over which the integral is taken, and $d\gamma$ denotes the measure according to the joint distribution $\gamma$.
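For reference, Equation (13) can be computed from sample statistics as in the minimal sketch below (NumPy/SciPy; the feature vectors are assumed to be Inception-v3 activations as in [43]):

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_fake):
    """Equation (13) computed on feature vectors; feats_real, feats_fake: (N, D) arrays."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)             # matrix square root of the product
    if np.iscomplexobj(covmean):
        covmean = covmean.real                        # discard numerical imaginary parts
    return float(np.sum((mu_r - mu_f) ** 2)
                 + np.trace(cov_r + cov_f - 2.0 * covmean))
```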
To test the performance of the model’s spoof detection, we used the attack presentation classification error rate (APCER), as per Equation (15), according to the ISO/IEC-30107 standard [45]. We also used the bona fide presentation classification error rate (BPCER) in Equation (16) to indicate the error rate of incorrectly classifying real images as fake. Additionally, the average classification error rate (ACER) was calculated using Equation (17), representing the average error rate between the APCER and BPCER.
$$APCER = 1 - \frac{1}{I_{fake}}\sum_{i=1}^{I_{fake}} Detector_i \tag{15}$$
$$BPCER = \frac{1}{I_{real}}\sum_{i=1}^{I_{real}} Detector_i \tag{16}$$
$$ACER = \frac{1}{2}\left(APCER + BPCER\right) \tag{17}$$
$I_{real}$ refers to the number of real (original) images, and $I_{fake}$ refers to the number of fake (generated) images. Additionally, $Detector_i$ refers to the predicted label obtained from the spoof detector for the $i$th image. Therefore, in Equation (15), $Detector_i$ takes a value of 1 if a fake image is correctly classified and 0 otherwise; in Equation (16), $Detector_i$ takes a value of 0 if a real image is correctly classified and 1 otherwise.
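A minimal sketch of Equations (15)–(17), assuming the detector outputs one binary label per image (1 = classified as fake, 0 = classified as real):

```python
import numpy as np

def spoof_detection_errors(pred_fake, pred_real):
    """Compute APCER, BPCER, and ACER from detector outputs.

    pred_fake: labels predicted for the fake test images.
    pred_real: labels predicted for the real test images.
    """
    pred_fake = np.asarray(pred_fake, dtype=float)
    pred_real = np.asarray(pred_real, dtype=float)
    apcer = 1.0 - pred_fake.mean()      # fake images accepted as real
    bpcer = pred_real.mean()            # real images rejected as fake
    acer = 0.5 * (apcer + bpcer)
    return apcer, bpcer, acer
```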

4.3.2. Performance Test of the Spoof Attack

4.3.2.1. Ablation Studies

As the first step of our ablation study measuring the performance of the spoof attack, we compared the performance obtained by incrementally removing the key modules of the DCS-GAN on the ISPR database. As indicated in Table 8, the CUT model (without SAM and self-attention) with the DenseNet-based perceptual loss achieved the best WD metric, whereas the full DCS-GAN model achieved the best FID metric. However, the goal of this research is to improve the performance of attacks against spoof detectors. The FID metric reflects the performance of spoofing attacks against CNN-based spoof detectors, as it uses features obtained from a pre-trained Inception-v3 model [46], whereas the WD metric is usually used to evaluate the simple quality of uneven illumination-corrected images based on differences in pixel distribution [17]. Therefore, the FID metric provides a more accurate measure of performance than the WD. To check the quality of the fake finger-vein images, we evaluated their effectiveness in spoof attacks using existing finger-vein spoof detectors. Specifically, we used DenseNet-161 and DenseNet-169, as in previous research [3], to evaluate the generated images listed in Table 8 using the ACER metric. As is evident from Table 9, the DCS-GAN model with all proposed modules generated fake images that yielded the highest spoof detection error rates: an average ACER of 1.05% for DenseNet-161 and 1.03% for DenseNet-169. This confirms that our fully equipped DCS-GAN model produces fake images that are the most similar to real images. For the next ablation study, we compared the effects of the data augmentation techniques applied to the Idiap database, for which there was a shortage of real images during DCS-GAN training. As demonstrated in Table 10, applying data augmentation via random cropping from 256 × 256 to 224 × 224 and shifting in four directions (up, down, left, right) resulted in the highest performance.
For the next ablation study, we compared the spoof detection performance according to the post-processing described in Section 3.3.2. All spoof detection training in this work was performed under the assumption that the actual conditions of spoof attacks—specifically, how the fake images were generated—were unknown. We trained our model using fake images without post-processing, while fake images with post-processing were only used for testing. The experimental results, as shown in Table 11, reveal that fake images generated from the ISPR database and post-processed with a median 5 × 5 filter led to the highest spoof detection errors: 5.74% for DenseNet-161 and 7.31% for DenseNet-169. Specifically, post-processing with the median 5 × 5 filter produced fake images that were the most similar to real images, hindering the spoof detection. For all subsequent experiments using the ISPR database, we used images post-processed with the median 5 × 5 filter. Additionally, as shown in Table 12, the fake images generated from the Idiap database and post-processed with the Gaussian 3 × 3 filter also exhibited the highest spoof detection errors: 2.5% for both DenseNet-161 and DenseNet-169. Hence, using the Gaussian 3 × 3 filter for post-processing produced the fake images that were the most similar to real images, effectively challenging the spoof detection. For all subsequent experiments using the Idiap database, we used images post-processed with the Gaussian 3 × 3 filter.

4.3.2.2. Comparing Image Quality by the Proposed and SOTA Approaches

We compared how similar the fake images generated by our DCS-GAN and those generated by the SOTA methods are to the real (original) images. As shown in Table 13, the DCS-GAN outperformed all other methods when it was evaluated using the FID metric. In contrast, when using the WD metric, Pix2Pix or Pix2PixHD showed the highest results. As explained in Section 4.3.2.1, the FID metric provides a more accurate measure of performance than the WD metric. In some evaluation metrics, the difference between the DCS-GAN and SOTA methods is small. However, the final goal of this study is to generate more realistic fake images, which makes detecting spoof attacks more difficult than it is when using the SOTA methods, and this was verified in Section 4.3.3.2. Figure 9 shows examples of the fake finger-vein images generated for spoof attacks using the DCS-GAN and the SOTA methods. As can be seen, the DCS-GAN effectively creates fake images which are more similar to the real images than those generated by the SOTA methods. The spoof detection results of the SOTA spoof attack methods and the proposed DCS-GAN method are compared in Section 4.3.3.2.

4.3.2.3. FD Estimation for Evaluating Generated Image Quality by the Proposed Method

To evaluate the fake finger-vein images generated by the proposed method, we performed FD estimation, which can serve as a metric for analyzing the complexity of, and assessing the similarity between, real and fake images. For this purpose, we utilized Eigen-CAM [49] to extract the class activation map (CAM); Eigen-CAM can generate the CAM without requiring class labels. Unlike traditional CAM techniques, which are typically used to visualize activation maps corresponding to specific class labels, Eigen-CAM identifies key activation regions in feature maps independently of any class. First, we obtained a CAM representing important regions from the final layer of the generator’s encoder in the DCS-GAN model, and the extracted activation map was then binarized to produce a binary class activation map (BCAM), which was subsequently used for FD estimation. In our research, we did not binarize the grayscale finger-vein images themselves. Instead, we binarized the red-green-blue color images of the class activation maps of the real and fake finger-vein images for fractal dimension estimation, as shown in Figure 10. In detail, we used a fixed threshold of 180 on the red channel: a pixel whose red value is greater than or equal to 180 is rendered as a white pixel, and a pixel with a red value less than 180 is rendered as a black pixel, for both real and fake finger-vein images, as shown in Figure 10. This is because pixels with a high red value indicate important features in the class activation map [49].
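A minimal sketch of the red-channel binarization used to obtain the BCAM (assuming an RGB Eigen-CAM visualization given as an (H, W, 3) uint8 array):

```python
import numpy as np

def binarize_cam(cam_rgb, threshold=180):
    """Binarize the Eigen-CAM visualization into a BCAM for FD estimation.

    Pixels whose red-channel value is >= threshold (180 here) become white (1),
    all other pixels become black (0).
    """
    red = cam_rgb[:, :, 0].astype(np.uint8)
    return (red >= threshold).astype(np.uint8)
```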
The FD values represent the complexity of the BCAM of the finger-vein images. As shown in both Figure 10 and Table 14, the FD values of the real and fake finger-vein images are similar. This indicates that the fake image generated by the DCS-GAN has almost the same level of complexity as the real image, suggesting that the fake image is generated nearly identically to the real one. Therefore, it can be concluded that the fake images produced by the method proposed in this paper are highly similar to the real images while preserving the genuine characteristics of the real images. Furthermore, this suggests that this method can play a crucial role in enhancing the security level of finger-vein recognition systems.

4.3.3. Performance Test of Spoof Detection

4.3.3.1. Ablation Study

To test the performance of the spoof detection, an ablation study was performed comparing the results of the enhanced ConvNeXt to those of the conventional ConvNeXt when attempting a spoof attack using fake finger-vein images generated by our method. Table 15 shows the performances of the conventional ConvNeXt and enhanced ConvNeXt on the ISPR and Idiap databases.
On the ISPR database, the enhanced ConvNeXt-Small (proposed method) reduced the average ACER to 0.4% over the one- and two-fold validations, a 0.41% decrease compared with the conventional ConvNeXt-Small. Moreover, the enhanced ConvNeXt-Tiny showed a reduction to 0.56%, a 0.42% decrease compared with the conventional ConvNeXt-Tiny, albeit with an error rate 0.16% higher than that of the enhanced ConvNeXt-Small. On the Idiap database, the enhanced ConvNeXt-Tiny also outperformed the conventional ConvNeXt-Tiny, and the enhanced ConvNeXt-Small (proposed method) showed the best result, with an ACER of 0.12%.

4.3.3.2. Comparisons of Spoof Detection Accuracies by Proposed and SOTA Methods

In this subsection, the spoof detection accuracy of the proposed spoof detector is compared with that of the SOTA spoof detectors. First, for a fair performance evaluation, we compared the performance of the various score-fusion methods used in existing research [3] when the detectors were trained on images generated by the DCS-GAN, as shown in Table 16. Table 16 presents the performance of existing spoof detectors on the fake finger-vein images obtained by the DCS-GAN. In the experiments on the ISPR database, an ACER of 0.82% was observed, and for the Idiap database, it was 0.34%. In comparison, when the fake images were generated with the method of the existing study [3], the ACER was only 0.32% on the ISPR database and 0.23% on the Idiap database, i.e., lower by 0.5% and 0.11%, respectively. This confirms that spoof attacks using fake images produced by the DCS-GAN evade spoof detection more effectively than those using the method of the existing research [3].
Next, we compared the performance of the proposed spoof detector with that of the SOTA detectors, as shown in Table 17 and Table 18. As confirmed in Table 17 and Table 18, the proposed spoof detector exhibits the best performance. Moreover, we verified the equal error rate (EER), as shown in Figure 11, using the receiver operating characteristic (ROC) curves of the true positive rate (TPR) (Equation (18)) according to the false positive rate (FPR) (Equation (19)), similar to previous research [50].
$$TPR = 1 - \frac{1}{I_{real}}\sum_{i=1}^{I_{real}} Detector_i \tag{18}$$
$$FPR = 1 - \frac{1}{I_{fake}}\sum_{i=1}^{I_{fake}} Detector_i \tag{19}$$
In Equations (18) and (19), $I_{real}$ denotes the number of real (original) images, and $I_{fake}$ denotes the number of fake (generated) images. Additionally, $Detector_i$ refers to the predicted label obtained from the spoof detector. Therefore, in Equation (18), $Detector_i$ takes the value 1 if the input real image is incorrectly classified as fake and 0 if it is correctly classified as real. In Equation (19), $Detector_i$ takes the value 0 if the input fake image is incorrectly classified as real and 1 if it is correctly classified as fake. As indicated in Figure 11, we confirmed that the proposed spoof detector exhibits the best performance.
Subsequently, we performed comparisons of the spoof detection testing errors with the use of the images generated by the DCS-GAN and the SOTA methods using the proposed enhanced ConvNeXt-Small detector, which showed the best detection performance, in Table 17 and Table 18 and Figure 11. As listed in Table 19, the proposed DCS-GAN had the highest ACER, confirming that the DCS-GAN is the most effective at generating fake images that are the hardest to detect, thus being the closest to real images.

4.3.3.3. Comparisons of Algorithm Complexity

In this subsection, we evaluate the number of trainable parameters (param.), the number of floating-point operations (FLOPs), and the GPU memory usage of the proposed method. Additionally, we computed the average processing time on a Jetson TX2 board to assess its feasibility in resource-constrained environments. As shown in Figure 12, the Jetson TX2 board is equipped with an NVIDIA Pascal™-family GPU with 256 compute unified device architecture (CUDA) cores [52].
As indicated in Table 20, the processing time of our method on the Jetson TX2 system is 97.22 ms (1000/97.22 ≈ 10.29 frames per second (fps)), with a GPU memory usage of 219.16 megabytes (MB), 51.29 mega (M) parameters, and 17.17 giga (G) FLOPs. While not the best in every metric shown in Table 20, the proposed method still shows the best spoof detection accuracies compared with the existing SOTA methods, as demonstrated in Table 17 and Table 18 and Figure 11. We also confirmed that the proposed spoof detector operates effectively even on the resource-limited Jetson TX2 embedded board. Although the Modified VGG16 + PCA + SVM [1] and Modified Xception + LSVM [24] were faster in terms of processing time than the proposed method, they had certain drawbacks. The Modified VGG16 + PCA + SVM required 2.46 times more GPU memory, while the Modified Xception + LSVM showed higher ACER values, with 0.74% on the ISPR database and 1.13% on the Idiap database, compared to the proposed method.

5. Discussions

In the spoof attack procedure, the DCS-GAN exhibited performance improvements of 6.274 and 0.845 in FID on the ISPR and Idiap databases, respectively, compared to the second-best model. Additionally, as illustrated in Figure 9, the images generated by the DCS-GAN appear to be more similar to the original images than those generated by the SOTA methods. In the spoof detection procedure, the enhanced ConvNeXt-Small model showed an improvement in ACER of 0.42% on the ISPR database and of 0.22% on the Idiap database compared to the second-best model. Unlike the existing spoof detection methods for finger-vein recognition systems, our proposed method operates without requiring additional classifiers (i.e., SVM) and demonstrates a reasonable processing time, as illustrated in Table 20, making it suitable for resource-constrained embedded environments. However, Figure 13 and Figure 14, respectively, display examples of correct and incorrect spoof detection for real images from the ISPR and Idiap databases, as well as fake images generated by the DCS-GAN. As shown in Figure 13, the proposed spoof detector can accurately distinguish between real and fake images that are nearly indistinguishable to the naked eye. As in Figure 14, camera noise and fingerprint residues present in the real images are mostly eliminated in the fake images. These factors are believed to contribute to the incorrect spoof detection results.
Additionally, to identify the criteria used by the enhanced ConvNeXt-Small to discriminate real and fake images, we examined the extracted features through gradient-weighted class activation mapping (Grad-CAM) images [53]. Figure 15a shows the Grad-CAM images for real images, while Figure 15b displays the Grad-CAM images for fake images generated from those in Figure 15a. Starting from the leftmost images in Figure 15a,b, the images represent the Grad-CAM visualizations acquired from the first ConvNeXt Block, second ConvNeXt Block, third ConvNeXt Block, fourth ConvNeXt Block, and LKA attention shown in Table 4. In Figure 15, the activation map in red indicates significant features, while blue indicates insignificant features. Comparing Figure 15a,b, we observe that different activation maps are displayed for real and fake images that appear identical to the naked eye. This confirms that the proposed enhanced ConvNeXt-Small effectively extracts crucial features for spoof detection.

6. Conclusions

In this study, we proposed a DCS-GAN capable of generating fake finger-vein images for training spoof detectors, aiming to mitigate the increasing risks of data spoofing, a negative impact of deep learning-based image generation models, in finger-vein recognition systems. The fake images generated by the DCS-GAN showed an improved spoof attack performance compared to existing spoof attack image generators. Additionally, our proposed enhanced ConvNeXt-Small spoof detector displayed a lower spoof detection error rate than the SOTA methods and effectively extracted significant features, which were confirmed based on Grad-CAM images.
To improve the spoof detection performance of the proposed method, we introduced fractal dimension estimation to analyze the complexity and irregularity of class activation maps from real and fake finger-vein images, enabling the generation of more realistic and sophisticated fake finger-vein images. However, as mentioned in the Discussion section, although our DCS-GAN aimed to preserve fingerprint residues in fake finger-vein images, the loss of these residues in some images made spoof detection easier. Additionally, camera noise contributed to incorrect spoof detection by our enhanced ConvNeXt-Small spoof detector in certain cases.
The proposed spoof detection method can work effectively between the image acquisition module and recognition part in existing finger-vein recognition systems in order to enhance their security level. An alternative usage would be to additionally train existing spoof detectors with the fake finger-vein images generated by our method, thus enhancing the accuracies of spoof detectors. Nevertheless, as illustrated in Table 20, the processing speed of our spoof detector was 10.29 fps on a Jetson TX2 board with limited computing resources. This may be deemed inadequate for real-time application in real-world scenarios.
Therefore, our future work would reduce the processing overhead of our spoof detection model by employing a knowledge distillation method, while improving its performance by considering fake images generated from various generative models, such as diffusion and variational autoencoders. Furthermore, we would apply our method to other biometric data such as palm-vein and hand dorsal-vein images, and explore multi-modalities that are robust to external influences and capable of handling global information.

Author Contributions

Methodology, Writing—original draft, S.G.K.; Conceptualization, J.S.H.; Investigation, J.S.K.; Supervision, Writing—review and editing, K.R.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the Ministry of Science and ICT (MSIT), Korea, under the Information Technology Research Center (ITRC) support program (IITP-2024-2020-0-01789), supervised by the Institute for Information & Communications Technology Planning & Evaluation (IITP).

Data Availability Statement

The proposed DCS-GAN, enhanced ConvNeXt-Small and fake finger-vein images are publicly available via the Github site (https://github.com/SeungguKim98/Finger-Vein-Spoof-Attack-Detection, accessed on 26 July 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Nguyen, D.T.; Yoon, H.S.; Pham, T.D.; Park, K.R. Spoof detection for finger-vein recognition system using NIR camera. Sensors 2017, 17, 2261. [Google Scholar] [CrossRef] [PubMed]
  2. Neves, J.C.; Tolosana, R.; Vera-Rodriguez, R.; Lopes, V.; Proença, H.; Fierrez, J. GANprintR: Improved fakes and evaluation of the state of the art in face manipulation detection. IEEE J. Sel. Top. Signal Process. 2020, 14, 1038–1048. [Google Scholar] [CrossRef]
  3. Kim, S.G.; Choi, J.; Hong, J.S.; Park, K.R. Spoof detection based on score fusion using ensemble networks robust against adversarial attacks of fake finger-vein images. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 9343–9362. [Google Scholar] [CrossRef]
  4. DCS-GAN. Available online: https://github.com/SeungguKim98/Finger-Vein-Spoof-Attack-Detection (accessed on 26 July 2024).
  5. Nguyen, D.T.; Park, Y.H.; Shin, K.Y.; Kwon, S.Y.; Lee, H.C.; Park, K.R. Fake finger-vein image detection based on Fourier and wavelet transforms. Digit. Signal Process. 2013, 23, 1401–1413. [Google Scholar] [CrossRef]
  6. Tome, P.; Vanoni, M.; Marcel, S. On the vulnerability of finger vein recognition to spoofing. In Proceedings of the International Conference on the Biometrics Special Interest Group, Darmstadt, Germany, 10–12 September 2014; pp. 1–10. [Google Scholar]
  7. Singh, J.M.; Venkatesh, S.; Raja, K.B.; Ramachandra, R.; Busch, C. Detecting finger-vein presentation attacks using 3D shape & diffuse reflectance decomposition. In Proceedings of the International Conference on Signal-Image Technology & Internet-Based Systems, Sorrento, Italy, 26–29 November 2019; pp. 8–14. [Google Scholar] [CrossRef]
  8. Ramachandra, R.; Raja, K.B.; Venkatesh, S.K.; Busch, C. Design and development of low-cost sensor to capture ventral and dorsal finger-vein for biometric authentication. IEEE Sens. J. 2019, 19, 6102–6111. [Google Scholar] [CrossRef]
  9. Raghavendra, R.; Busch, C. Presentation attack detection algorithms for finger vein biometrics: A comprehensive study. In Proceedings of the International Conference on Signal Image Technology & Internet Based Systems, Bangkok, Thailand, 23–27 November 2015; pp. 628–632. [Google Scholar] [CrossRef]
  10. Krishnan, A.; Thomas, T.; Nayar, G.R.; Sasilekha Mohan, S. Liveness detection in finger vein imaging device using plethysmographic signals. In Proceedings of the Intelligent Human Computer Interaction, Allahabad, India, 7–9 December 2018; pp. 251–260. [Google Scholar] [CrossRef]
  11. Schuiki, J.; Prommegger, B.; Uhl, A. Confronting a variety of finger vein recognition algorithms with wax presentation attack artefacts. In Proceedings of the IEEE International Workshop on Biometrics and Forensics, Rome, Italy, 6–7 May 2021; pp. 1–6. [Google Scholar] [CrossRef]
  12. Yang, H.; Fang, P.; Hao, Z. A GAN-based method for generating finger vein dataset. In Proceedings of the International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 24–26 December 2020; pp. 1–6. [Google Scholar] [CrossRef]
  13. Zhang, J.; Lu, Z.; Li, M.; Wu, H. GAN-based image augmentation for finger-vein biometric recognition. IEEE Access 2019, 7, 183118–183132. [Google Scholar] [CrossRef]
  14. Ciano, G.; Andreini, P.; Mazzierli, T.; Bianchini, M.; Scarselli, F. A multi-stage GAN for multi-organ chest X-ray image generation and segmentation. Mathematics 2021, 9, 2896. [Google Scholar] [CrossRef]
  15. Wang, L.; Guo, D.; Wang, G.; Zhang, S. Annotation-efficient learning for medical image segmentation based on noisy pseudo labels and adversarial learning. IEEE Trans. Med. Imaging 2020, 40, 2795–2807. [Google Scholar] [CrossRef] [PubMed]
  16. Choi, J.; Hong, J.S.; Kim, S.G.; Park, C.; Nam, S.H.; Park, K.R. RMOBF-Net: Network for the restoration of motion and optical blurred finger-vein images for improving recognition accuracy. Mathematics 2022, 10, 3948. [Google Scholar] [CrossRef]
  17. Hong, J.S.; Choi, J.; Kim, S.G.; Owais, M.; Park, K.R. INF-GAN: Generative adversarial network for illumination normalization of finger-vein images. Mathematics 2021, 9, 2613. [Google Scholar] [CrossRef]
  18. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2242–2251. [Google Scholar] [CrossRef]
  19. Tirunagari, S.; Poh, N.; Bober, M.; Windridge, D. Windowed DMD as a microtexture descriptor for finger vein counter-spoofing in biometrics. In Proceedings of the IEEE International Workshop on Information Forensics and Security, Rome, Italy, 16–19 November 2015; pp. 1–6. [Google Scholar] [CrossRef]
  20. Kocher, D.; Schwarz, S.; Uhl, A. Empirical evaluation of LBP-extension features for finger vein spoofing detection. In Proceedings of the International Conference of the Biometrics Special Interest Group, Darmstadt, Germany, 21–23 September 2016; pp. 1–5. [Google Scholar] [CrossRef]
  21. Bok, J.Y.; Suh, K.H.; Lee, E.C. Detecting fake finger-vein data using remote photoplethysmography. Electronics 2019, 8, 1016. [Google Scholar] [CrossRef]
  22. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556, pp. 1–14. [Google Scholar] [CrossRef]
  23. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 84–90. [Google Scholar] [CrossRef]
  24. Shaheed, K.; Mao, A.; Qureshi, I.; Abbas, Q.; Kumar, M.; Zhang, X. Finger-vein presentation attack detection using depthwise separable convolution neural network. Expert Syst. Appl. 2022, 198, 116786. [Google Scholar] [CrossRef]
  25. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar] [CrossRef]
  26. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar] [CrossRef]
  27. Sengupta, S.; Kanazawa, A.; Castillo, C.D.; Jacobs, D.W. SfSNet: Learning shape, reflectance and illuminance of faces in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2017; pp. 6296–6305. [Google Scholar] [CrossRef]
  28. Kang, B.J.; Park, K.R. Multimodal biometric method based on vein and geometry of a single finger. IET Comput. Vision 2010, 4, 209–217. [Google Scholar] [CrossRef]
  29. Park, T.; Efros, A.A.; Zhang, R.; Zhu, J.-Y. Contrastive learning for unpaired image-to-image translation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 319–345. [Google Scholar] [CrossRef]
  30. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 1–11. [Google Scholar] [CrossRef]
  31. Foret, P.; Kleiner, A.; Mobahi, H.; Neyshabur, B. Sharpness-aware minimization for efficiently improving generalization. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26 April–1 May 2022; pp. 7360–7371. [Google Scholar] [CrossRef]
  32. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
  33. Zou, D.; Cao, Y.; Li, Y.; Gu, Q. Understanding the generalization of Adam in learning neural networks with proper regularization. arXiv 2021, arXiv:2108.11371. [Google Scholar] [CrossRef]
  34. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2813–2821. [Google Scholar] [CrossRef]
  35. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 694–711. [Google Scholar] [CrossRef]
  36. Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar] [CrossRef]
  37. Guo, M.-H.; Lu, C.-Z.; Liu, Z.-N.; Cheng, M.-M.; Hu, S.-M. Visual attention network. Comput. Vis. Media 2023, 9, 733–752. [Google Scholar] [CrossRef]
  38. Brouty, X.; Garcin, M. Fractal properties, information theory, and market efficiency. Chaos Solitons Fractals 2024, 180, 114543. [Google Scholar] [CrossRef]
  39. Yin, J. Dynamical fractal: Theory and case study. Chaos Solitons Fractals 2023, 176, 114190. [Google Scholar] [CrossRef]
  40. Crownover, R.M. Introduction to Fractals and Chaos, 1st ed.; Jones & Bartlett Publisher: Burlington, MA, USA, 1995. [Google Scholar]
  41. Tome, P.; Raghavendra, R.; Busch, C.; Tirunagari, S.; Poh, N.; Shekar, B.; Gragnaniello, D.; Sansone, C.; Verdoliva, L.; Marcel, S. The 1st competition on counter measures to finger vein spoofing attacks. In Proceedings of the International Conference on Biometrics, Phuket, Thailand, 19–22 May 2015; pp. 513–518. [Google Scholar] [CrossRef]
  42. NVIDIA GeForce RTX 3060. Available online: https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3060-3060ti/ (accessed on 25 June 2024).
  43. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proceedings of the International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6629–6640. [Google Scholar] [CrossRef]
  44. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 7–9 August 2017; pp. 214–223. [Google Scholar] [CrossRef]
  45. ISO/IEC JTC1 SC37; Biometrics. ISO/IEC WD 30107–3: Information Technology—Presentation Attack Detection-Part 3: Testing and Reporting and Classification of Attacks. International Organization for Standardization: Geneva, Switzerland, 2014.
  46. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar] [CrossRef]
  47. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976. [Google Scholar] [CrossRef]
  48. Wang, T.-C.; Liu, M.-Y.; Zhu, J.-Y.; Tao, A.; Kautz, J.; Catanzaro, B. Catanzaro, High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8798–8807. [Google Scholar] [CrossRef]
  49. Muhammad, M.B.; Yeasin, M. Eigen-cam: Class activation map using principal components. In Proceedings of the International Joint Conference on Neural Networks, Glasgow, UK, 19–24 July 2020; pp. 1–7. [Google Scholar] [CrossRef]
  50. Face Anti-spoofing Challenge. Available online: https://sites.google.com/view/face-anti-spoofing-challenge/ (accessed on 26 February 2024).
  51. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. Maxvit: Multi-axis vision transformer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 459–479. [Google Scholar] [CrossRef]
  52. Jetson TX2 Module. Available online: https://developer.nvidia.com/embedded/jetson-tx2 (accessed on 23 July 2024).
  53. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar] [CrossRef]
Figure 1. Overall flowchart of proposed method.
Figure 2. Architecture of DCS-GAN.
Figure 3. Samples for the selection of input and target image for training the generator and discriminator of DCS-GAN. * denotes one image randomly chosen in the intra-class of the input image, excluding the input image.
Figure 4. Architecture of enhanced ConvNeXt-Small.
Figure 5. Sample images of real finger-veins in the databases. (a) Examples from the ISPR database and (b) examples from the Idiap database.
Figure 6. Examples of data augmentation on the Idiap database. (a) Original image, (b) image shifted upward, (c) image shifted downward, (d) image shifted to the left, (e) image shifted to the right.
Figure 7. Graphs for the training and validation loss of DCS-GAN. (a) Training loss graph of the generator and the discriminator. (b) Validation loss graph of the generator and the discriminator.
Figure 8. Training and validation accuracy (Acc) and loss (Loss) graphs of the enhanced ConvNeXt-Small. (a) Training accuracy and loss graphs. (b) Validation accuracy and loss graphs.
Figure 9. Sample images of fake finger-vein images generated by DCS-GAN and other SOTA methods. Examples of (a) original image and images generated by (b) Pix2Pix, (c) Pix2PixHD, (d) CycleGAN, (e) CUT, and (f) DCS-GAN.
Figure 10. FD estimation analysis for comparison between real and fake vein images: the first to the fourth images, from the left, in (a–h) show the finger-vein image, CAM, BCAM, and FD graph, respectively. (a,c,e,g) show the real finger-vein images, whereas (b,d,f,h) present the corresponding fake finger-vein images.
Figure 11. ROC curves of TPR according to FPR by the proposed and the SOTA methods on (a) ISPR database and (b) Idiap database.
Figure 12. Jetson TX2 board.
Figure 13. Examples of correct spoof detection by the proposed method. (a) and (c) are examples of real images from the ISPR and Idiap databases, respectively, and (b) and (d) are corresponding examples of fake images.
Figure 14. Examples of incorrect spoof detection by the proposed method. (a) and (c) are examples of real images from the ISPR and Idiap databases, respectively, and (b) and (d) are corresponding examples of fake images. In the proposed method, (b) and (d) are incorrectly identified as real images.
Figure 15. Grad-CAM images. (a) shows Grad-CAM images for real images, while (b) shows Grad-CAM images for fake images generated from the real images in (a). In both (a,b), the first row is from the ISPR database, and the second row is from the Idiap database. Each row starts with the input image on the far left, followed by Grad-CAM images acquired from the first ConvNeXt Block, second ConvNeXt Block, third ConvNeXt Block, fourth ConvNeXt Block, and LKA attention of Table 4, respectively.
Table 1. Comparison of existing and proposed methods for spoof attack and spoof detection in finger-vein recognition.
Category | Methods | Advantages | Disadvantages
Spoof attack (using fake fabricated artifacts) | Printed on OHP film, matte paper, and A4 paper using a LaserJet printer at resolutions of 300, 1200, and 2400 dpi and then applied to the finger [5]; printed using an inkjet printer and applied to a prosthesis and a thin rubber cap [10]; printed using a LaserJet printer and applied to wax [11] | Considers even the curvature of the finger during the spoof attack | The quality of the fake image is not high due to not emphasizing the vein pattern; labor-intensive and costly to produce fabricated artifacts
Spoof attack (using fake fabricated artifacts) | Printed using laser and inkjet printers and replayed on smartphone display [9] | Provides more realistic motion information through display replaying |
Spoof attack (using fake fabricated artifacts) | Printed using a LaserJet printer and enhanced the vein outline with a black whiteboard marker [6]; printed on glossy paper using an inkjet printer and enhanced the vein pattern using Ramachandra et al. [8]'s algorithm [7] | Improved vein pattern quality by applying post-processing after printing | Very low image quality compared to generated images; spoof attack performance against CNN-based detector is not high
Spoof attack (using fake generated images) | Generated fake finger-vein images using CycleGAN [3] | The first study to use generated finger-vein images for both spoof attack and detection | Unable to generate elaborate fake finger-vein images
Spoof attack (using fake generated images) | Generates fake finger-vein images using DCS-GAN (proposed method) | Generates fake data that is similar to the characteristic distribution of original finger-vein images | Unlike the structure of the existing research model CycleGAN, requires two discriminators and a multilayered perceptron (MLP)
Spoof detection (machine learning-based) | Steerable pyramid + SVM [9]; W-DMD + SVM [19]; FT, Haar and Daubechies wavelet + SVM [5]; discrete FT + SVM [21] | Requires less time for training compared to deep learning-based methods | Performance degradation in spoof detection depending on various spoof data generation methods
Spoof detection (deep learning-based) | Modified network of AlexNet or VGG-Net + PCA + SVM [1]; Xception (entry flow) + linear SVM [24]; ensemble network of DenseNet-161 and DenseNet-169 + SVM [3]; SfS-Net + linear SVM [7] | Enables diverse spoof detection through learning CNN filters for efficient feature extraction | Lower accuracy in spoof detection against elaborately created fake finger-vein images
Spoof detection (deep learning-based) | Enhanced network of ConvNeXt-Small (proposed method) | Processes in one stage, eliminating the need for a separate classifier; high accuracy in spoof detection against elaborately created fake finger-vein images | The time required for CNN training is significant
Table 2. Descriptions of the generator in the DCS-GAN (NA means “not available”).
Layer | Type | Kernel Size | Number of Filters | Stride | Input Size | Output Size
Input | | NA | NA | NA | 224 × 224 × 3 | 224 × 224 × 3
3 × 3 Padding (Reflect) | | NA | NA | NA | 224 × 224 × 3 | 230 × 230 × 3
1st Conv Block * | Conv | 7 | 64 | 1 | 230 × 230 × 3 | 224 × 224 × 64
1st Conv Block * | Instance Norm (ReLU) | NA | NA | NA | 224 × 224 × 64 | 224 × 224 × 64
2nd Conv Block * (ReLU) | | 3 | 128 | 1 | 224 × 224 × 64 | 224 × 224 × 128
Antialiasing Sampling (Down) | | 4 | NA | NA | 224 × 224 × 128 | 112 × 112 × 128
3rd Conv Block * (ReLU) | | 3 | 256 | 1 | 112 × 112 × 128 | 112 × 112 × 256
Antialiasing Sampling (Down) | | 4 | NA | NA | 112 × 112 × 256 | 56 × 56 × 256
1st Res Block | 1 × 1 Padding (Reflect) | NA | NA | NA | 56 × 56 × 256 | 58 × 58 × 256
1st Res Block | 4th Conv Block * (ReLU) | 3 | 256 | 1 | 58 × 58 × 256 | 56 × 56 × 256
1st Res Block | 1 × 1 Padding (Reflect) | NA | NA | NA | 56 × 56 × 256 | 58 × 58 × 256
1st Res Block | 5th Conv Block * (Linear) | 3 | 256 | 1 | 58 × 58 × 256 | 56 × 56 × 256
1st Self-attention | | NA | NA | NA | 56 × 56 × 256 | 56 × 56 × 256
2nd–8th Res Blocks with Self-attentions | | NA | NA | NA | 56 × 56 × 256 | 56 × 56 × 256
9th Res Block | | 3 | 256 | 1 | 56 × 56 × 256 | 56 × 56 × 256
9th Self-attention | | NA | NA | NA | 56 × 56 × 256 | 56 × 56 × 256
Antialiasing Sampling (Up) | | 4 | NA | NA | 56 × 56 × 256 | 112 × 112 × 256
22nd Conv Block * (ReLU) | | 3 | 128 | 1 | 112 × 112 × 256 | 112 × 112 × 128
Antialiasing Sampling (Up) | | 4 | NA | NA | 112 × 112 × 128 | 224 × 224 × 128
23rd Conv Block * (ReLU) | | 3 | 64 | 1 | 224 × 224 × 128 | 224 × 224 × 64
3 × 3 Padding (Reflect) | | NA | NA | NA | 224 × 224 × 64 | 230 × 230 × 64
24th Conv Block (Tanh) | | 7 | 3 | 1 | 230 × 230 × 64 | 224 × 224 × 3
Output | | NA | NA | NA | 224 × 224 × 3 | 224 × 224 × 3
* indicates that instance normalization is included after the corresponding layer.
Table 3. Descriptions of the discriminator in the DCS-GAN (NA means “not available”).
Layer | Kernel Size | Number of Filters | Stride | Input Size | Output Size
Input | NA | NA | NA | 224 × 224 × 3 | 224 × 224 × 3
25th Conv Block (Leaky ReLU) | 4 | 64 | 1 | 224 × 224 × 3 | 224 × 224 × 64
Antialiasing Sampling (Down) | 4 | NA | NA | 224 × 224 × 64 | 112 × 112 × 64
26th Conv Block * (Leaky ReLU) | 4 | 128 | 1 | 112 × 112 × 64 | 112 × 112 × 128
Antialiasing Sampling (Down) | 4 | NA | NA | 112 × 112 × 128 | 56 × 56 × 128
27th Conv Block * (Leaky ReLU) | 4 | 256 | 1 | 56 × 56 × 128 | 56 × 56 × 256
Antialiasing Sampling (Down) | 4 | NA | NA | 56 × 56 × 256 | 28 × 28 × 256
1 × 1 Padding (Constant) | NA | NA | NA | 28 × 28 × 256 | 30 × 30 × 256
28th Conv Block * (Leaky ReLU) | 4 | 512 | 1 | 30 × 30 × 256 | 27 × 27 × 512
1 × 1 Padding (Constant) | NA | NA | NA | 27 × 27 × 512 | 29 × 29 × 512
29th Conv Block (Linear) | 4 | 1 | 1 | 29 × 29 × 512 | 26 × 26 × 1
Output | NA | NA | NA | 26 × 26 × 1 | 26 × 26 × 1
* indicates that instance normalization is included after the corresponding layer.
Table 4. Descriptions of enhanced ConvNeXt-Small (NA means “not available”).
Layer | Number of Blocks | Kernel Size | Number of Filters | Stride | Input Size | Output Size
Input | NA | NA | NA | NA | 224 × 224 × 3 | 224 × 224 × 3
Stem: Conv * | NA | 4 × 4 | 96 | 4 | 224 × 224 × 3 | 56 × 56 × 96
1st ConvNeXt Block: Depthwise Conv * | 3 | 7 × 7 | 96 | 1 | 56 × 56 × 96 | 56 × 56 × 96
1st ConvNeXt Block: Dense (GELU) | 3 | 1 × 1 | 384 | 1 | 56 × 56 × 96 | 56 × 56 × 384
1st ConvNeXt Block: Dense | 3 | 1 × 1 | 96 | 1 | 56 × 56 × 384 | 56 × 56 × 96
1st Down Sampling Block: Layer Norm | 1 | NA | NA | NA | 56 × 56 × 96 | 56 × 56 × 96
1st Down Sampling Block: Conv | 1 | 2 × 2 | 192 | 2 | 56 × 56 × 96 | 28 × 28 × 192
2nd ConvNeXt Block: Depthwise Conv * | 3 | 7 × 7 | 192 | 1 | 28 × 28 × 192 | 28 × 28 × 192
2nd ConvNeXt Block: Dense (GELU) | 3 | 1 × 1 | 768 | 1 | 28 × 28 × 192 | 28 × 28 × 768
2nd ConvNeXt Block: Dense | 3 | 1 × 1 | 192 | 1 | 28 × 28 × 768 | 28 × 28 × 192
2nd Down Sampling Block: Layer Norm | 1 | NA | NA | NA | 28 × 28 × 192 | 28 × 28 × 192
2nd Down Sampling Block: Conv | 1 | 2 × 2 | 384 | 2 | 28 × 28 × 192 | 14 × 14 × 384
3rd ConvNeXt Block: Depthwise Conv * | 27 | 7 × 7 | 384 | 1 | 14 × 14 × 384 | 14 × 14 × 384
3rd ConvNeXt Block: Dense (GELU) | 27 | 1 × 1 | 1536 | 1 | 14 × 14 × 384 | 14 × 14 × 1536
3rd ConvNeXt Block: Dense | 27 | 1 × 1 | 384 | 1 | 14 × 14 × 1536 | 14 × 14 × 384
3rd Down Sampling Block: Layer Norm | 1 | NA | NA | NA | 14 × 14 × 384 | 14 × 14 × 384
3rd Down Sampling Block: Conv | 1 | 2 × 2 | 768 | 2 | 14 × 14 × 384 | 7 × 7 × 768
4th ConvNeXt Block: Depthwise Conv * | 3 | 7 × 7 | 768 | 1 | 7 × 7 × 768 | 7 × 7 × 768
4th ConvNeXt Block: Dense (GELU) | 3 | 1 × 1 | 3072 | 1 | 7 × 7 × 768 | 7 × 7 × 3072
4th ConvNeXt Block: Dense | 3 | 1 × 1 | 768 | 1 | 7 × 7 × 3072 | 7 × 7 × 768
Conv (GELU) | NA | 1 × 1 | 768 | 1 | 7 × 7 × 768 | 7 × 7 × 768
LKA: Conv | NA | 5 × 5 | 768 | 1 | 7 × 7 × 768 | 7 × 7 × 768
LKA: Dilation Conv | NA | 7 × 7 | 768 | 1 | 7 × 7 × 768 | 7 × 7 × 768
LKA: Conv | NA | 1 × 1 | 768 | 1 | 7 × 7 × 768 | 7 × 7 × 768
LKA: Multiply | NA | NA | NA | NA | 7 × 7 × 768 | 7 × 7 × 768
Conv | NA | 1 × 1 | 768 | 1 | 7 × 7 × 768 | 7 × 7 × 768
Add | NA | NA | NA | NA | 7 × 7 × 768 | 7 × 7 × 768
Global Average Pooling | NA | NA | NA | NA | 7 × 7 × 768 | 768
Dense (Softmax) | NA | NA | 2 | NA | 768 | 2
* indicates that layer normalization is included after the corresponding layer.
Table 5. Detailed description of the ISPR and Idiap databases.
Database | Number of Trials | Number of Individuals | Number of Hands | Number of Fingers | Total Number of Images
ISPR | 10 | 33 | 2 | 5 | 3300
Idiap | 2 | 110 | 2 | 1 | 440
Table 6. Hyperparameters used to train DCS-GAN.
Parameter Types | Value
Learning decay step | 10,000
Learning decay rate | 0.9
Learning rate | 2 × 10−4
Optimizer | Adam + SAM
Beta 1 | 0.5
Beta 2 | 0.999
Batch size | 1
Epochs | 400
Adversarial loss | LSGAN
Additional loss | Patch, Perceptual
Table 7. Hyperparameters used to train enhanced ConvNeXt-Small.
Parameter Types | Value
Learning decay step | None
Learning decay rate | None
Learning rate | 1 × 10−6
Beta 1 | 0.9
Beta 2 | 0.999
Epsilon | 1 × 10−7
Batch size | 4
Epochs | 30
Loss | Cross entropy
Table 8. Performance variation depending on the modules composing DCS-GAN (“Perceptual” refers to the cases where pre-trained VGG-16 was used for calculating perceptual loss as per Equation (10), and “Dense perceptual” indicates the cases where DenseNet-161 [3] was used in the same manner).
Perceptual Loss | Dense Perceptual | SAM | Self-Attention | FID (1-Fold Validation) | FID (2-Fold Validation) | FID (Average) | WD (1-Fold Validation) | WD (2-Fold Validation) | WD (Average)
 | | | | 19.261 | 25.869 | 22.565 | 30.380 | 7.010 | 18.695
 | | | | 16.365 | 18.927 | 17.646 | 19.576 | 7.180 | 13.378
 | | | | 13.149 | 14.689 | 13.919 | 4.286 | 6.428 | 5.357
 | | | | 12.283 | 5.457 | 8.870 | 25.039 | 20.554 | 22.797
 | | | | 8.531 | 5.671 | 7.101 | 19.782 | 17.050 | 18.416
Table 9. Comparison of spoof attack performance by generated images on DenseNet-161 and DenseNet-169 (“Perceptual” refers to the cases where pre-trained VGG-16 was used for calculating perceptual loss as per Equation (10), and “Dense perceptual” indicates the cases where DenseNet-161 [3] was used in the same manner) (metric: ACER, unit: %).
Classification Model | Perceptual Loss | Dense Perceptual | SAM | Self-Attention | 1-Fold Validation | 2-Fold Validation | Average
DenseNet-161 | | | | | 0.27 | 0.27 | 0.27
DenseNet-161 | | | | | 0.36 | 0.33 | 0.35
DenseNet-161 | | | | | 0.46 | 0.49 | 0.48
DenseNet-161 | | | | | 0.73 | 0.79 | 0.76
DenseNet-161 | | | | | 0.94 | 1.15 | 1.05
DenseNet-169 | | | | | 0.36 | 0.42 | 0.39
DenseNet-169 | | | | | 0.64 | 0.70 | 0.67
DenseNet-169 | | | | | 0.67 | 0.73 | 0.70
DenseNet-169 | | | | | 0.73 | 0.82 | 0.77
DenseNet-169 | | | | | 1.06 | 1.00 | 1.03
Table 10. Comparison of fake image generation performance depending on data augmentation (# means “number of” and × means “none”).
Random Crop | # Directions for Shift | FID (1-Fold Validation) | FID (2-Fold Validation) | FID (Average) | WD (1-Fold Validation) | WD (2-Fold Validation) | WD (Average)
256 → 224 | × | 29.579 | 26.997 | 28.288 | 40.141 | 61.686 | 50.914
300 → 224 | × | 27.765 | 26.571 | 27.168 | 35.086 | 37.973 | 36.530
256 → 224 | 2 | 30.377 | 26.366 | 28.372 | 20.282 | 40.532 | 30.407
300 → 224 | 2 | 30.062 | 23.533 | 26.798 | 30.382 | 46.741 | 38.562
256 → 224 | 4 | 24.810 | 21.891 | 23.351 | 8.614 | 11.632 | 10.123
300 → 224 | 4 | 27.548 | 24.965 | 26.257 | 28.814 | 50.336 | 39.575
256 → 224 | 8 | 25.685 | 28.129 | 26.907 | 16.505 | 18.805 | 17.655
300 → 224 | 8 | 27.702 | 24.758 | 26.230 | 24.214 | 25.905 | 25.060
Table 11. Comparison of spoof attack performance by type of post-processing on DenseNet-161 and DenseNet-169 using the ISPR database (metric: ACER, unit: %).
Classification Model | Post Processing | Kernel Size | 1-Fold Validation | 2-Fold Validation | Average
DenseNet-161 | Average filter | 3 × 3 | 0.30 | 0.49 | 0.40
DenseNet-161 | Average filter | 5 × 5 | 0.06 | 0.09 | 0.08
DenseNet-161 | Gaussian filter | 3 × 3 | 0.85 | 1.33 | 1.09
DenseNet-161 | Gaussian filter | 5 × 5 | 0.18 | 0.30 | 0.24
DenseNet-161 | Median filter | 3 × 3 | 3.49 | 3.67 | 3.58
DenseNet-161 | Median filter | 5 × 5 | 6.25 | 5.22 | 5.74
DenseNet-169 | Average filter | 3 × 3 | 0.70 | 0.64 | 0.67
DenseNet-169 | Average filter | 5 × 5 | 0.24 | 0.42 | 0.33
DenseNet-169 | Gaussian filter | 3 × 3 | 0.64 | 0.94 | 0.79
DenseNet-169 | Gaussian filter | 5 × 5 | 0.36 | 0.40 | 0.38
DenseNet-169 | Median filter | 3 × 3 | 2.31 | 3.09 | 2.70
DenseNet-169 | Median filter | 5 × 5 | 8.04 | 6.58 | 7.31
Table 12. Comparison of spoof attack performance by type of post-processing on DenseNet-161 and DenseNet-169 using the Idiap database (metric: ACER, unit: %).
Classification Model | Post Processing | Kernel Size | 1-Fold Validation | 2-Fold Validation | Average
DenseNet-161 | Average filter | 3 × 3 | 1.82 | 2.27 | 2.05
DenseNet-161 | Average filter | 5 × 5 | 0.91 | 0.68 | 0.80
DenseNet-161 | Gaussian filter | 3 × 3 | 2.27 | 2.73 | 2.50
DenseNet-161 | Gaussian filter | 5 × 5 | 1.82 | 2.73 | 2.28
DenseNet-161 | Median filter | 3 × 3 | 1.14 | 1.14 | 1.14
DenseNet-161 | Median filter | 5 × 5 | 0.00 | 0.45 | 0.23
DenseNet-169 | Average filter | 3 × 3 | 1.36 | 1.82 | 1.59
DenseNet-169 | Average filter | 5 × 5 | 0.91 | 2.28 | 1.60
DenseNet-169 | Gaussian filter | 3 × 3 | 2.27 | 2.73 | 2.50
DenseNet-169 | Gaussian filter | 5 × 5 | 1.36 | 1.36 | 1.36
DenseNet-169 | Median filter | 3 × 3 | 2.05 | 0.91 | 1.48
DenseNet-169 | Median filter | 5 × 5 | 1.59 | 0.23 | 0.91
Table 13. Comparing the image quality testing of generated fake finger-vein images by DCS-GAN with testing of those generated by the SOTA methods.
Database | Model | FID | WD
ISPR | Pix2Pix [47] | 32.193 | 7.887
ISPR | Pix2PixHD [48] | 13.875 | 10.305
ISPR | CycleGAN [18] | 23.576 | 13.792
ISPR | CUT [29] | 22.565 | 18.695
ISPR | DCS-GAN (Proposed) | 7.601 | 18.158
Idiap | Pix2Pix [47] | 55.062 | 5.750
Idiap | Pix2PixHD [48] | 24.200 | 2.625
Idiap | CycleGAN [18] | 33.176 | 2.868
Idiap | CUT [29] | 24.196 | 3.109
Idiap | DCS-GAN (Proposed) | 23.351 | 10.123
Table 14. FD, R2, and C values from Figure 10.
Results | Case 1 Real (Figure 10a) | Case 1 Fake (Figure 10b) | Case 2 Real (Figure 10c) | Case 2 Fake (Figure 10d) | Case 3 Real (Figure 10e) | Case 3 Fake (Figure 10f) | Case 4 Real (Figure 10g) | Case 4 Fake (Figure 10h)
R2 | 0.99903 | 0.99930 | 0.99689 | 0.99701 | 0.99968 | 0.99980 | 0.99916 | 0.99931
C | 0.99952 | 0.99965 | 0.99844 | 0.99850 | 0.99984 | 0.99990 | 0.99958 | 0.99965
FD | 2.00286 | 1.99995 | 2.00492 | 1.99574 | 2.03563 | 2.04030 | 1.96712 | 1.96665
Table 15. Comparison of performance between enhanced ConvNeXt and conventional ConvNeXt (unit: %).
Model | Database | 1-Fold APCER | 1-Fold BPCER | 1-Fold ACER | 2-Fold APCER | 2-Fold BPCER | 2-Fold ACER | Average APCER | Average BPCER | Average ACER
ConvNeXt-Tiny | ISPR | 0.24 | 2.37 | 1.31 | 0.12 | 1.15 | 0.64 | 0.18 | 1.76 | 0.98
ConvNeXt-Tiny | Idiap | 0.00 | 0.91 | 0.45 | 0.00 | 0.45 | 0.23 | 0.00 | 0.68 | 0.34
Enhanced ConvNeXt-Tiny | ISPR | 0.67 | 0.97 | 0.82 | 0.00 | 0.61 | 0.30 | 0.34 | 0.79 | 0.56
Enhanced ConvNeXt-Tiny | Idiap | 0.00 | 0.45 | 0.23 | 0.00 | 0.45 | 0.23 | 0.00 | 0.45 | 0.23
ConvNeXt-Small | ISPR | 0.79 | 1.40 | 1.09 | 0.42 | 0.61 | 0.52 | 0.61 | 1.01 | 0.81
ConvNeXt-Small | Idiap | 0.91 | 0.00 | 0.45 | 0.00 | 0.45 | 0.23 | 0.46 | 0.23 | 0.34
Enhanced ConvNeXt-Small (Proposed) | ISPR | 0.43 | 0.91 | 0.67 | 0.06 | 0.18 | 0.12 | 0.25 | 0.55 | 0.40
Enhanced ConvNeXt-Small (Proposed) | Idiap | 0.00 | 0.45 | 0.23 | 0.00 | 0.00 | 0.00 | 0.00 | 0.23 | 0.12
Table 16. Performance comparison based on various score-fusion methods used in existing research.
Database | Method | 1-Fold APCER | 1-Fold BPCER | 1-Fold ACER | 2-Fold APCER | 2-Fold BPCER | 2-Fold ACER | Average APCER | Average BPCER | Average ACER
ISPR | SVM (Linear) | 0.06 | 2.31 | 1.18 | 0.00 | 1.03 | 0.52 | 0.03 | 1.67 | 0.85
ISPR | SVM (RBF) | 0.06 | 2.31 | 1.18 | 0.00 | 1.03 | 0.52 | 0.03 | 1.67 | 0.85
ISPR | SVM (Poly) | 0.06 | 2.19 | 1.12 | 0.00 | 1.03 | 0.52 | 0.03 | 1.61 | 0.82
ISPR | SVM (Sigmoid) | 0.00 | 4.86 | 2.43 | 0.00 | 2.00 | 1.00 | 0.00 | 3.43 | 1.72
Idiap | SVM (Linear) | 0.00 | 0.91 | 0.45 | 0.00 | 0.45 | 0.23 | 0.00 | 0.68 | 0.34
Idiap | SVM (RBF) | 0.00 | 0.91 | 0.45 | 0.00 | 0.45 | 0.23 | 0.00 | 0.68 | 0.34
Idiap | SVM (Poly) | 0.00 | 0.91 | 0.45 | 0.00 | 0.45 | 0.23 | 0.00 | 0.68 | 0.34
Idiap | SVM (Sigmoid) | 0.00 | 3.18 | 1.59 | 0.00 | 0.91 | 0.45 | 0.00 | 2.05 | 1.02
Table 17. Comparisons of spoof detection testing errors by the proposed and the SOTA methods on ISPR database (unit: %).
Method | 1-Fold APCER | 1-Fold BPCER | 1-Fold ACER | 2-Fold APCER | 2-Fold BPCER | 2-Fold ACER | Average APCER | Average BPCER | Average ACER
Ensemble Networks + SVM [3] | 0.06 | 2.19 | 1.12 | 0.00 | 1.03 | 0.52 | 0.03 | 1.61 | 0.82
Modified Xception + LSVM [24] | 0.61 | 2.61 | 1.61 | 0.30 | 1.03 | 0.67 | 0.46 | 1.82 | 1.14
Steerable pyramid + SVM [9] | 7.83 | 2.79 | 5.31 | 6.43 | 1.76 | 4.10 | 7.13 | 2.28 | 4.71
Modified VGG16 + PCA + SVM [1] | 2.79 | 0.00 | 1.40 | 3.46 | 0.12 | 1.79 | 3.13 | 0.06 | 1.60
MaxViT-Small [51] | 2.31 | 1.28 | 1.79 | 2.31 | 2.00 | 2.15 | 2.31 | 1.64 | 1.97
Enhanced ConvNeXt-Small (Proposed) | 0.43 | 0.91 | 0.67 | 0.06 | 0.18 | 0.12 | 0.25 | 0.55 | 0.40
Table 18. Comparisons of spoof detection testing errors by the proposed and the SOTA methods on Idiap database (unit: %).
Method | 1-Fold APCER | 1-Fold BPCER | 1-Fold ACER | 2-Fold APCER | 2-Fold BPCER | 2-Fold ACER | Average APCER | Average BPCER | Average ACER
Ensemble Networks + SVM [3] | 0.00 | 0.91 | 0.45 | 0.00 | 0.45 | 0.23 | 0.00 | 0.68 | 0.34
Modified Xception + LSVM [24] | 0.00 | 1.36 | 0.68 | 0.00 | 3.64 | 1.82 | 0.00 | 2.50 | 1.25
Steerable pyramid + SVM [9] | 0.00 | 1.82 | 0.91 | 0.00 | 2.27 | 1.14 | 0.00 | 2.05 | 1.03
Modified VGG16 + PCA + SVM [1] | 0.45 | 2.27 | 1.36 | 0.91 | 0.45 | 0.68 | 0.68 | 1.36 | 1.02
MaxViT-Small [51] | 0.91 | 0.45 | 0.68 | 1.36 | 0.45 | 0.91 | 1.14 | 0.45 | 0.80
Enhanced ConvNeXt-Small (Proposed) | 0.00 | 0.45 | 0.23 | 0.00 | 0.00 | 0.00 | 0.00 | 0.23 | 0.12
Table 19. Comparisons of spoof detection testing errors with the generated images by the DCS-GAN and the SOTA methods using the proposed enhanced ConvNeXt-Small detector (unit: %).
Database | Model | 1-Fold APCER | 1-Fold BPCER | 1-Fold ACER | 2-Fold APCER | 2-Fold BPCER | 2-Fold ACER | Average APCER | Average BPCER | Average ACER
ISPR | Pix2Pix [47] | 0.00 | 0.55 | 0.27 | 0.24 | 0.18 | 0.21 | 0.12 | 0.37 | 0.24
ISPR | Pix2PixHD [48] | 0.30 | 0.49 | 0.39 | 0.14 | 0.28 | 0.21 | 0.22 | 0.39 | 0.30
ISPR | CycleGAN [18] | 0.79 | 0.12 | 0.46 | 0.00 | 0.55 | 0.27 | 0.40 | 0.34 | 0.37
ISPR | CUT [29] | 0.00 | 0.67 | 0.33 | 0.28 | 0.57 | 0.42 | 0.14 | 0.62 | 0.38
ISPR | DCS-GAN (Proposed) | 0.43 | 0.91 | 0.67 | 0.06 | 0.18 | 0.12 | 0.25 | 0.55 | 0.40
Idiap | Pix2Pix [47] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Idiap | Pix2PixHD [48] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Idiap | CycleGAN [18] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Idiap | CUT [29] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Idiap | DCS-GAN (Proposed) | 0.00 | 0.45 | 0.23 | 0.00 | 0.00 | 0.00 | 0.00 | 0.23 | 0.12
Table 20. Comparison of the proposed method and the SOTA methods in terms of average processing time per image, GPU memory usage, number of parameters, and FLOPs on the Jetson TX2 board.
Method | Processing Time (Unit: ms (fps)) | GPU Memory Usage (Unit: MB) | Number of Param. (Unit: M) | FLOPs (Unit: G)
Ensemble Networks + SVM [3] | 113.30 (8.83) | 190.66 | 39.34 | 22.27
Modified Xception + LSVM [24] | 27.87 (35.88) | 96.58 | 1.40 | 3.03
Modified VGG16 + PCA + SVM [1] | 61.70 (16.20) | 538.85 | 14.71 | 30.95
MaxViT-Small [51] | 218.06 (4.59) | 314.98 | 68.23 | 22.32
Enhanced ConvNeXt-Small (Proposed) | 97.22 (10.29) | 219.16 | 51.29 | 17.17
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
