Article

Visual Feature-Guided Diamond Convolutional Network for Finger Vein Recognition

Artificial Intelligence and Computer Vision Laboratory, Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan 528402, China
*
Author to whom correspondence should be addressed.
Sensors 2024, 24(18), 6097; https://doi.org/10.3390/s24186097
Submission received: 16 July 2024 / Revised: 24 August 2024 / Accepted: 16 September 2024 / Published: 20 September 2024
(This article belongs to the Section Sensing and Imaging)

Abstract

Finger vein (FV) biometrics have garnered considerable attention due to their inherent non-contact nature and high security, exhibiting tremendous potential in identity authentication and beyond. Nevertheless, challenges pertaining to the scarcity of training data and inconsistent image quality continue to impede the effectiveness of finger vein recognition (FVR) systems. To tackle these challenges, we introduce the visual feature-guided diamond convolutional network (dubbed ‘VF-DCN’), a uniquely configured multi-scale and multi-orientation convolutional neural network. The VF-DCN showcases three pivotal innovations: Firstly, it meticulously tunes the convolutional kernels through multi-scale Log-Gabor filters. Secondly, it implements a distinctive diamond-shaped convolutional kernel architecture inspired by human visual perception. This design intelligently allocates more orientational filters to medium scales, which inherently carry richer information. In contrast, at extreme scales, the use of orientational filters is minimized to simulate the natural blurring of objects at extreme focal lengths. Thirdly, the network boasts a deliberate three-layer configuration and fully unsupervised training process, prioritizing simplicity and optimal performance. Extensive experiments are conducted on four FV databases, including MMCBNU_6000, FV_USM, HKPU, and ZSC_FV. The experimental results reveal that VF-DCN achieves remarkable improvement with equal error rates (EERs) of 0.17%, 0.19%, 2.11%, and 0.65%, respectively, and accuracy rates (ACCs) of 100%, 99.97%, 98.92%, and 99.36%, respectively. These results indicate that, compared with some existing FVR approaches, the proposed VF-DCN not only achieves notable recognition accuracy but also has fewer parameters and lower model complexity. Moreover, VF-DCN exhibits superior robustness across diverse FV databases.

1. Introduction

Finger vein (FV) biometrics has emerged as an exceptionally secure and reliable technology for personal identity authentication. Finger veins are vascular pattern features that are imperceptible to the naked eye, but can be captured by using near-infrared (NIR) light with a specific wavelength ranging from 700 nm to 1000 nm [1]. When NIR light passes through the finger, blood vessels absorb the light, causing a distinctive dark pattern on the image. Such unique vein patterns offer several advantages over other biometric traits, including:
  • High security. The intricate and distinctive patterns of FV are unique, rendering them exceedingly difficult to replicate or forge.
  • Non-contact. Finger vein recognition (FVR) does not require physical contact with the sensor, significantly reducing the risk of contamination and the transmission of germs.
  • User-friendly. The process of FVR is swift and straightforward, simply requiring the user to put their finger close to the sensor. Moreover, FVR is accessible to a wide range of individuals, regardless of age, gender, or complexion.
The cornerstone of FVR lies in the extraction of discriminative features from acquired images, which can be achieved through two primary types of methods: handcrafted and deep learning-driven. In the early stages of research, Miura et al. [2,3] pioneered curvature-based methods that captured the extent of curve bending at a particular point, albeit being susceptible to noise. Later, Gabor filtering-based methods [4,5] were introduced to enhance and extract FV features; although Gabor filters are tunable to detect specific frequencies and orientations, finding optimal parameters for a given dataset remains challenging. Subsequently, curvature and Radon-like features (RLFs) were combined to effectively aggregate spatial information around vein structures [6], highlighting vein patterns and suppressing spurious non-boundary responses and noise; however, the obtained features are influenced by illumination variations. Recently, binary patterns of phase congruency (BPPCs) and pyramids of histograms of orientation gradients (PHOGs) have been incorporated for FV feature extraction [7], but this method remains susceptible to local changes in scale, translation, and other factors. Overall, handcrafted methods rely heavily on expert experience rather than being data-driven; they are not always efficient, and their performance tends to vary across databases and scenarios.
On the contrary, deep learning-driven methods, which are inherently reliant on training data, have the potential to address some of these challenges. Various classical convolutional neural networks (CNNs), such as VGGNet [8,9], AlexNet [10,11], ResNet [12], DenseNet [13,14,15], Siamese Networks [16,17], Xception [18], and generative adversarial networks (GANs) [19], have demonstrated robustness in a range of image recognition issues, and also exhibited outstanding performance in FVR through fine-tuning and transfer learning [20]. In addition, self-attention mechanisms are also explored in FVR. Among them, a vein pattern constrained transformer (VPCFormer) [21] was proposed that incorporates a self-attention mechanism to capture the correlations between different views of FV patterns, helping the model learn more discriminative features and improving its robustness. Then, a large kernel and attention mechanism network (Let-Net) [22] was presented that also utilizes a self-attention mechanism to enhance the feature representation. By incorporating large kernels and an attention mechanism, the network can capture both local and global context information. SE-DenseNet-HP [23], on the other hand, combined the squeeze-and-excitation (SE) channel attention with a hybrid pooling mechanism, allowing the model to dynamically recalibrate channel-wise feature responses and extract discriminative multi-scale features. The attention mechanism acquires the attention weights by calculating the similarity between different units (channels and channels, pixels and pixels) in the feature maps, thus achieving a concentration of information.
It is noteworthy that the attention mechanism typically elevates the computational and storage requirements of the network, necessitating longer training and inference times. In certain scenarios, the attention mechanism might inadvertently concentrate on irrelevant features, potentially causing the model to overlook crucial information [24]. In contrast, the human visual system possesses a swift and dynamic ability to adjust its perception of external objects. When the visual range is optimally positioned, it can effortlessly capture intricate details. Conversely, for objects situated too far or too close, the visual system instinctively lowers its resolution to prioritize discernible features, given the challenges of distinguishing finer details.
To address these challenges and harness the strengths of both traditional visually guided handcrafted methods and deep learning (DL) methods, while minimizing their respective limitations, we propose a uniquely configured multi-scale and multi-orientation convolutional neural network. This unique architecture, coined the visual feature-guided diamond convolutional network (hereinafter dubbed ‘VF-DCN’), boasts a deliberate three-layer configuration and fully unsupervised training process, focusing on attaining simplicity and optimal performance. In all convolutional layers of VF-DCN, the convolutional kernels are tuned through multi-scale Log-Gabor filters, and then, an adaptive orientational filter learning strategy for the convolutional kernels across different scales is implemented that draws on the human vision. Remarkably, VF-DCN showcases an innovative diamond-shaped convolutional structure that efficiently maintains a wider range of orientational kernels at medium scales. The main contributions of this work are summarized as follows:
  • Visual feature-guided convolutional kernels. The Log-Gabor filters, which closely mimic the frequency response of visual cells, are used to generate multi-scale Log-Gabor convolutional kernels. This ingenious design empowers the network to capture visual features with unprecedented effectiveness.
  • Diamond convolutional structure. Inspired by retina imaging, where images become blurred at extreme focal lengths, a diamond convolutional structure is crafted to extract significant orientational information through training across multi-scale Log-Gabor filters.
  • Fully unsupervised learning network. The network is deliberately designed with just two Log-Gabor convolutional layers and a fully unsupervised training process, achieving a harmonious balance between simplicity and efficiency.
The remainder of this paper is organized as follows: Section 2 provides a summary review of Gabor and Log-Gabor filtering approaches for FVR. Section 3 details the design of Log-Gabor convolutional kernels. Section 4 elaborates on the entire recognition process of the proposed VF-DCN model. Section 5 discusses the experimental results to comprehensively assess the performance of the VF-DCN model. Four FV databases are adopted that contain images with varying qualities, resolutions, and dynamic ranges. Section 6 concludes the work with some remarks and hints at plausible future research lines.

2. Related Works

In this section, we provide a concise overview of Gabor-like filters, specifically Gabor and Log-Gabor, in the context of FVR applications. The Gabor filter family, inspired by the receptive fields of simple cells in the mammalian visual cortex, exhibits robustness to distortion in their coefficient magnitudes, rendering them ideally suited for pattern recognition tasks [25], including those pertaining to finger veins.

2.1. Gabor Filters

In the field of FVR, Gabor filters have been broadly used for feature enhancement and representation. Among them, a bank of even-symmetric Gabor filters with 8 orientations was used to exploit vein information in the images [4]. Then, Yang et al. [26] extended the Gabor filter bank to 2 scales and 8 orientations, and Wang et al. [27] used a bank of 24 Gabor filters covering 4 scales and 6 orientations. Moreover, fusion schemes are introduced to offer insight into the complementarity of various feature extraction methods. Specifically, a fuzzy-based fusion method was proposed in [28] that integrated Gabor filters with Retinex filters, resulting in enhanced visibility and recognition capabilities for FV images. In [29], adaptive Gabor filters were combined with SIFT/SURF feature extractors to enhance vein patterns. In [30], the concept of point grouping was incorporated into Gabor filters to effectively capture local vein patterns. The above Gabor filtering technologies primarily extract texture and orientation features in FV images, which are susceptible to image blurring, translation, rotation, and noise. To address these issues, Shi et al. [31] incorporated scattering removal techniques with Gabor filters to improve the clarity and reliability of FV patterns, alleviating the interference of noise and blurring artifacts. Li et al. [32] proposed a histogram of competitive Gabor directional binary statistics (HCGDBS) approach to improve the discriminant ability of features and robustness to variation in image quality.
In recent years, numerous efforts have been directed towards integrating Gabor filters with deep learning networks, aimed at eliminating the constraints of manual parameter tuning and the limited representation capacity of Gabor filters. In [33], Gabor filters were employed as a preprocessing step, where Gabor-filtered images served as the input of the network. Further, in [34], the first layer of the network used Gabor kernels for feature learning, leaving the rest of the layers unchanged. Notably, the parameters of Gabor kernels are learned by backpropagation. In [35], a few of the early convolutional layers were substituted by a parameterized Gabor convolutional layer. Moreover, Luan et al. [36] adopted Gabor filters to modulate learnable convolutional kernels, allowing the network to capture more robust features across orientation and scale variations, without incurring additional computational burden. Similarly, Yao et al. [17] introduced Gabor orientation filters (GoFs) to modulate conventional convolutional kernels and constructed a Siamese network for FV verification.
It is crucial to acknowledge that Gabor filters possess two prominent limitations. First, the maximum bandwidth of a Gabor filter is constrained to approximately one octave, which restricts its ability to cover a wide range of frequencies. Second, Gabor filters are not the preferred choice when seeking broad spectrum information while requiring optimal spatial localization, as this hinders their efficiency in FV feature extraction.

2.2. Log-Gabor Filters

The Log-Gabor filter, proposed by Field [25], serves as an alternative to the Gabor filter with several distinct advantages. In the frequency domain, the Log-Gabor filter exhibits an attenuation rate that aligns more closely with the human visual system. This characteristic makes it more sensitive to low-frequency information and less sensitive to high-frequency information. As a result, the Log-Gabor filter demonstrates stronger anti-interference ability and is more accurate and reliable in extracting multi-scale image features. Among them, Gao [37] pioneered the use of Log-Gabor filters to decompose input images into multiple scales and orientations. Arrospide [38] demonstrated the superiority of Log-Gabor filters over Gabor filters in the context of image-based vehicle verification. Yang et al. [39] employed phase congruency and Log-Gabor energy for multimodal medical image fusion, showcasing the filters’ versatility in fusing diverse image modalities. Bounneche [40] proposed an oriented multi-scale Log-Gabor filter tailored for multispectral palmprint recognition. Lv et al. [41] utilized an odd-symmetric 2D Log-Gabor filter to analyze the phase and amplitude of iris textures across different frequencies and orientations. Shams et al. [42] combined a diffusion-coherence filter with a 2D Log-Gabor filter to enhance fingerprint images. Beyond these applications, Log-Gabor filters have also found their niche in motion estimation [43], remote sensing [44], and numerous other domains.
Overall, Log-Gabor filters exhibit superior performance compared to Gabor filters across various image processing and computer vision applications, particularly in multi-scale feature extraction, frequency feature matching, and noise resilience. Given that Log-Gabor has not yet been harnessed in FVR, we propose to incorporate Log-Gabor filters into the design of a lightweight FVR network. In the following, we will delve into the formulation of Log-Gabor convolutional kernels and the recognition process of our proposed VF-DCN model.

3. Log-Gabor Convolutional Kernels

In this section, 1D and 2D Log-Gabor filtering kernels are presented, and their corresponding parameter selection is discussed.

3.1. Log-Gabor Function

As described in [25], the transfer function of a Log-Gabor filter is a Gaussian function on a logarithmic frequency scale, the corresponding 1D Log-Gabor function is defined as Equation (1):
\mathrm{logG}(f) = \exp\left(-\frac{\left[\log(f/f_0)\right]^2}{2\left[\log(\sigma/f_0)\right]^2}\right),
where f 0 is the central frequency of the filter, and σ is the standard deviation used to determine the filter bandwidth. It can be observed from Equation (1) that the frequency response of a Log-Gabor is symmetric on a logarithmic axis.
When extending the 1D Log-Gabor filter to 2D, the frequency variable f in Equation (1) should be expressed in the polar coordinate system of the frequency domain due to the singularity of the log function at the origin. Specifically, the 2D Log-Gabor filter is decomposed into two components, a radial filter and an angular filter, so that the bandwidth of each component can be adjusted independently to facilitate analysis. Among these, the radial filter provides a frequency response that determines the frequency band, as described by Equation (2):
\mathrm{logG}_r(r) = \exp\left(-\frac{\left[\log(r/f_0)\right]^2}{2\,\sigma_r^2}\right),
and the angular filter is used to determine the orientation, as described by Equation (3):
\mathrm{logG}_\theta(\theta) = \exp\left(-\frac{(\theta-\theta_0)^2}{2\,\sigma_\theta^2}\right).
Then, these two components are multiplied together to construct the overall 2D Log-Gabor filter, as shown in Equation (4):
\mathrm{logG}(r,\theta) = \mathrm{logG}_r(r)\cdot \mathrm{logG}_\theta(\theta),
where ( r , θ ) are the polar coordinates, with r representing the radial coordinate and θ representing the angular coordinate. θ 0 is the orientation angle of the filter, and σ r and σ θ are used to determine the radial and angular bandwidths, respectively. Table 1 shows the parameter settings required to build the two components of the 2D Log-Gabor filter, and the selection of specific parameters is discussed below.
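For concreteness, the following NumPy sketch builds the frequency-domain 2D Log-Gabor transfer function directly from Equations (2)-(4); the function name, the grid layout, and the handling of the DC singularity are illustrative choices rather than the authors' implementation, and the bandwidth parameters σ_r and σ_θ are selected in the next two subsections.

    import numpy as np

    def log_gabor_2d(rows, cols, f0, theta0, sigma_r, sigma_theta):
        # Normalized frequency grid in polar coordinates, DC component at the center.
        u = (np.arange(cols) - cols // 2) / cols
        v = (np.arange(rows) - rows // 2) / rows
        U, V = np.meshgrid(u, v)
        r = np.sqrt(U ** 2 + V ** 2)
        theta = np.arctan2(V, U)
        r[rows // 2, cols // 2] = 1.0                      # avoid log(0) at the origin
        # Radial component, Eq. (2). (Some formulations use (log(sigma_r))^2 in the denominator.)
        radial = np.exp(-(np.log(r / f0)) ** 2 / (2.0 * sigma_r ** 2))
        radial[rows // 2, cols // 2] = 0.0                 # zero response at DC
        # Angular component, Eq. (3), using the wrapped angular difference to theta0.
        dtheta = np.arctan2(np.sin(theta - theta0), np.cos(theta - theta0))
        angular = np.exp(-dtheta ** 2 / (2.0 * sigma_theta ** 2))
        return radial * angular                            # Eq. (4)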

3.2. Radial Parameters Selection

In Equation (2), σ r determines the radial filter bandwidth. The smaller the value of σ r , the larger the radial filter bandwidth. Empirically, when the value of σ r is 0.75 , the radial filter bandwidth is approximately one octave, and when the value of σ r is 0.55 , the radial filter bandwidth is approximately two octaves. Figure 1 shows the results of the radial filters under different values of σ r . In our experiments, σ r is set to 0.55 for balancing purposes.
In addition, the filter’s central frequency f_0 is calculated by Equation (5) as the reciprocal of the wavelength.
f_0 = \frac{1}{wavelength}.
Here, the wavelength is calculated by Equation (6).
wavelength = W_{min}\cdot M^{S-1}, \quad (S = 1,\ldots,N_{scale}).
W_min is the wavelength of the smallest-scale filter, M is the radial scaling factor used to control the successive wavelengths of the radial filters, and S denotes the radial filter scale, varying from 1 to N_scale. When the wavelength is set to the minimum value W_min, the frequency attains its maximum value. In Section 5.4.1, we discuss the influence of different values of W_min on the recognition performance; there, W_min is set to 2 pixels.

3.3. Angular Parameters Selection

In Equation (3), θ 0 is the orientation angle of the filter, as defined by Equation (7):
\theta_0 = \frac{i\cdot\pi}{N_{ori}}, \quad (i = 0,\ldots,N_{ori}-1).
Similarly, the angular bandwidth of the filter is determined by the parameter σ θ , which is calculated by Equation (8).
\sigma_\theta = \frac{T\cdot\pi}{N_{ori}}.
The angular bandwidth determines the directionality of the filter; a narrower bandwidth results in stronger directionality. Moreover, the angular interval between filter orientations is fixed by N_ori. In the frequency domain, the spread of the 2D Log-Gabor filter in the angular direction is a Gaussian with respect to the polar angle around the center. The angular overlap of the filter transfer functions is controlled by the angular interval between filter orientations and the angular scaling factor T. Figure 2 shows the resulting angular filters for θ_0 = 0 and N_ori = 10 under different angular scaling factors T. It can be observed that the larger the value of T, the smaller the angular overlap. In the following experiments, T is set to 1.3 to achieve approximately minimal overlap.

3.4. Bank of Log-Gabor Filtering Kernels

With the parameter settings of N s c a l e = 4 and N o r i = 10 , we can obtain a bank of 2D Log-Gabor filters by Equation (4). According to the parameter settings in Table 1, we present the bank of Log-Gabor filters obtained in Figure 3.
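As a usage illustration, the candidate bank of N_scale = 4 scales and N_ori = 10 orientations can be assembled from the parameter formulas in Equations (5)-(8). The snippet below is a minimal sketch that reuses the log_gabor_2d function sketched in Section 3.1 and the settings reported for Table 1 (W_min = 2.0 pixels, M = 2.2, σ_r = 0.55, T = 1.3); the 32 × 32 grid size matches the resized ROIs used in Section 5.5.

    import numpy as np

    # log_gabor_2d(...) as sketched in Section 3.1.
    N_scale, N_ori = 4, 10
    W_min, M, sigma_r, T = 2.0, 2.2, 0.55, 1.3
    rows, cols = 32, 32

    bank = {}
    for s in range(1, N_scale + 1):
        wavelength = W_min * M ** (s - 1)      # Eq. (6)
        f0 = 1.0 / wavelength                  # Eq. (5)
        sigma_theta = T * np.pi / N_ori        # Eq. (8)
        for i in range(N_ori):
            theta0 = i * np.pi / N_ori         # Eq. (7)
            bank[(s, i)] = log_gabor_2d(rows, cols, f0, theta0, sigma_r, sigma_theta)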

4. VF-DCN Model for Finger Vein Recognition

As previously discussed, the human visual system exhibits nonlinear logarithmic characteristics. In this regard, Log-Gabor is consistent with the human visual system, potentially enabling it to encode natural images more efficiently than ordinary Gabor functions. Given the remarkable performance gains achieved by Gabor filters integrated with CNNs in the field of FVR, it is reasonable to hypothesize that the incorporation of Log-Gabor filters into CNNs could further bring improvements. Motivated by the above premise, we integrated Log-Gabor filters with a CNN architecture to devise a uniquely configured multi-scale and multi-orientational finger vein recognition network, namely ‘VF-DCN’.
In this section, the overall framework of our VF-DCN and its processing flow specific to FVR are firstly elaborated. Then, an adaptive orientational filter selection and retention mechanism for Log-Gabor convolutional kernels across various scales is implemented. This stands as the cornerstone of our VF-DCN model, ensuring optimal utilization of Log-Gabor filters for capturing intricate vein patterns across different orientations and scales. Finally, the output feature vectors of image samples are extracted from the well-trained VF-DCN and serve as inputs for downstream recognition or verification tasks.

4.1. Framework of VF-DCN Model

The overall framework of the VF-DCN is depicted in Figure 4. It is known as a lightweight network, consisting of a preprocessing stage and an unsupervised training process. Here, the unsupervised training aims to learn the convolutional kernels within its two convolutional layers. By utilizing multi-scale Log-Gabor filters and incorporating the human visual system’s sensitivity to orientation at varying scales, the optimal orientational filters are adaptively identified and function as the final convolutional kernels. For detailed unsupervised training strategies, refer to Section 4.2. Upon completion of the thorough training process, the VF-DCN model transforms into a feature extractor, generating feature vectors that can be directly employed in downstream recognition or verification tasks.

4.1.1. Preprocessing Stage

In the preprocessing step, we employed a synergistic approach that integrates the 3 σ criterion dynamic threshold strategy [1] with the Kirsch detector [45] to localize the region of interest (ROI). Compared to Sobel, Canny, etc., the Kirsch detector exhibits a superior balance in identifying weak edges and minimizing false edges, yielding a clearer binary edge gradient image. Nonetheless, when FV image quality is hindered by uneven illumination and noise, edges may exhibit pronounced discontinuities, and some weak edges may remain undetected. To address this issue, the 3 σ criterion offers three-level dynamic thresholds that automatically adjust to varying image qualities. This ensures the generation of more complete boundary lines, thereby facilitating the efficacy of the ROI extraction process. For illustration, Figure 7c,d show examples of ROI extracted from two FV databases.

4.1.2. Unsupervised Training Process of VF-DCN

In this section, we initially illustrate the network topology of VF-DCN, followed by a detailed exposition of its specific training process.
The backbone of VF-DCN boasts a deliberate three-layer CNN architecture consisting of two consecutive Log-Gabor convolutional layers (L_1 and L_2), followed by a binary hashing and block-wise histogram layer, as shown in Figure 5.
The input layer L_0 comprises the ROI samples derived from the preprocessing stage. Assume the i-th input ROI sample I_i has dimensions of μ × ν. For the two consecutive Log-Gabor convolutional layers, N_scale scales and N_ori orientations of Log-Gabor filters are adaptively constructed, comprising banks of K_1 and K_2 filtering kernels in the first and second convolutional layers, respectively. In the first convolutional layer, each of the K_1 filtering kernels is convolved with the input sample I_i, forming a total of K_1 output feature maps I_i^{\ell_1} with dimensions of μ × ν, as mathematically expressed in Equation (9):
I_i^{\ell_1} = I_i * \mathrm{logG}_1^{\ell_1}, \quad \ell_1 = 1,\ldots,K_1,
where * signifies the 2D Log-Gabor convolution operation.
After the completion of the first convolutional layer, each of the K_1 input feature maps I_i^{\ell_1} undergoes a convolution with every kernel in logG_2, resulting in a total of K_1 × K_2 output feature maps with dimensions of μ × ν. This transformation is concisely encapsulated in Equation (10):
I_i^{\ell_1,\ell_2} = I_i^{\ell_1} * \mathrm{logG}_2^{\ell_2}, \quad \ell_1 = 1,\ldots,K_1,\; \ell_2 = 1,\ldots,K_2.
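A minimal sketch of how Equations (9) and (10) can be realized with frequency-domain filtering is given below; it assumes centered transfer functions such as those built in Section 3, and the fftshift bookkeeping and the variable names roi, bank_1, and bank_2 are illustrative assumptions rather than the authors' code.

    import numpy as np

    def log_gabor_conv_layer(feature_maps, filter_bank):
        # Filter every input map with every Log-Gabor kernel of one convolutional layer.
        # feature_maps: list of 2D arrays (mu x nu); filter_bank: list of frequency-domain
        # transfer functions of the same size with the DC component at the center.
        outputs = []
        for fmap in feature_maps:
            spectrum = np.fft.fftshift(np.fft.fft2(fmap))        # center the spectrum
            for lg in filter_bank:
                response = np.fft.ifft2(np.fft.ifftshift(spectrum * lg))
                outputs.append(np.real(response))                # real part of the filtered map
        return outputs

    # First layer, Eq. (9):  maps_1 = log_gabor_conv_layer([roi], bank_1)    -> K_1 maps
    # Second layer, Eq. (10): maps_2 = log_gabor_conv_layer(maps_1, bank_2)  -> K_1 x K_2 maps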
Subsequently, binary hashing is performed on the acquired feature maps, and the final histogram features are distilled through block-wise histogram encoding. In this process, the binary layer serves as a nonlinear transformer, leveraging a straightforward binary hashing quantization method to remap the feature maps into a binary representation, as expressed in Equation (11).
T_i^j = \sum_{k=1}^{K_2} 2^{k-1}\, H\!\left(I_i^{j,k}\right), \quad j = 1,\ldots,K_1,
where H(·) is the Heaviside step function, which outputs 1 when its argument is positive and 0 otherwise. The summation is the weighted sum of the K_2 binary images, yielding encoded feature maps with integer values.
The block-wise histogram layer plays the role of feature pooling. It uses simple block-wise histograms of the binary encoding to generate the final 1D feature vector. First, the feature map T_i^j is partitioned into B non-overlapping blocks. Then, the histogram of decimal values in each block is computed, and all B block histograms are concatenated into a 1D vector, as expressed in Equation (12).
f_i = \left[\mathrm{Hist}(T_i^1,1),\ldots,\mathrm{Hist}(T_i^1,B),\ldots,\mathrm{Hist}(T_i^{K_1},1),\ldots,\mathrm{Hist}(T_i^{K_1},B)\right]^{\mathrm{T}},
where Hist(·) is the histogram operation, and f_i \in \mathbb{R}^{(2^{K_2})K_1 B} is the learned feature vector corresponding to the input image sample I_i.
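The binary hashing and block-wise histogram steps of Equations (11) and (12) can be sketched as follows; the map ordering and the block_shape parameter are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def binary_hash_and_histogram(maps_2, K1, K2, block_shape):
        # maps_2: flat list of K1*K2 second-layer maps, the K2 responses of the same
        #         first-layer map being consecutive; block_shape: (height, width) of the blocks.
        mu, nu = maps_2[0].shape
        bh, bw = block_shape
        weights = 2 ** np.arange(K2)                           # 2^(k-1), k = 1, ..., K2
        features = []
        for j in range(K1):
            group = np.stack(maps_2[j * K2:(j + 1) * K2])      # the K2 maps derived from map j
            binary = (group > 0).astype(np.int64)              # Heaviside step H(.)
            encoded = np.tensordot(weights, binary, axes=1)    # integer-valued map T_i^j, Eq. (11)
            for r0 in range(0, mu - bh + 1, bh):               # non-overlapping blocks
                for c0 in range(0, nu - bw + 1, bw):
                    hist, _ = np.histogram(encoded[r0:r0 + bh, c0:c0 + bw],
                                           bins=2 ** K2, range=(0, 2 ** K2))
                    features.append(hist)
        return np.concatenate(features)                        # feature vector f_i, Eq. (12)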
In short, VF-DCN innovatively incorporates Log-Gabor convolutional kernels to extract multi-scale and multi-orientation human-like visual features, which mitigates overfitting and simplifies the training process. It can be seen as a simple unsupervised deep convolutional network, allowing for random sample selection during network training without the need to tune or optimize various regularization parameters. Moreover, the block-wise histogram of VF-DCN implicitly encodes spatial information in the image, effectively approximating the probability distribution function of image features within each block.

4.2. Adaptive Orientational Filtering Selection

As previously mentioned, the key training objective revolves around determining the optimal Log-Gabor convolutional kernels across two consecutive convolutional layers. To achieve this, we devised an adaptive orientational filter selection and retention strategy across multiple scales, tailored to extract multi-scale features while dynamically selecting the most suitable orientational filters for diverse FV datasets. The learning process of the adaptive filter consists of three main steps:
  • Firstly, a candidate bank of Log-Gabor filters is constructed, comprising 4 scales and 10 orientations. Specifically, the radial filter scale S (as denoted in Equation (6)) is set to { 1 , 2 , 3 , 4 } , and the orientation angle θ 0 (as denoted in Equation (7)) spans from { 0 , π / 10 , π / 5 , 3 π / 10 , 2 π / 5 , π / 2 , 3 π / 5 , 7 π / 10 , 4 π / 5 , 9 π / 10 } .
  • Secondly, for each scale, we carry out a histogram-based statistical analysis of the most pertinent orientational filters. The reason why each scale is treated separately is inspired by the nature of retinal imaging: fine details become harder to discern at extreme distances because detail resolution declines, so the visual system adjusts to varying focal lengths and perspectives when analyzing objects at different scales. Likewise, in the convolutional layers of the VF-DCN, it becomes imperative to dynamically adjust the number of orientational filters based on each scale's suitability for extracting features. To this end, we select the orientational filters within each scale in turn. Specifically, for the aforementioned 10 candidate orientational filters within each scale, each training ROI image I_i is convolved with them, resulting in a total of 10 filtered complex images (denoted as resF_i^j, j = 1,…,10). Subsequently, we take the absolute value of the real part of each filtered complex image resF_i^j to generate the corresponding power map (denoted as powMapF_i^j). Next, the magnitude responses of each pixel in these power maps serve as a metric for assessing each filter's impact on the image. We then sort these magnitude responses in descending order across all pixels and all power maps, simultaneously recording the index of the power map as well as the corresponding spatial row and column coordinates. This enables us to identify the most prominent orientations, i.e., the filters most frequently used, by analyzing the statistical histogram of high magnitude responses among the candidate orientational filters.
  • Finally, we retain the filters with the highest count of such high-magnitude responses, effectively fine-tuning the number of orientations at each scale. This strategy ensures that the convolutional filters better reflect the inherent characteristics of the image and the scale’s contribution to feature extraction. By mirroring the adaptability of the human visual system in processing objects at varying distances, this mechanism enhances the efficiency and realism of the convolutional filters.
In order to better understand the whole process of orientational filtering selection, we provide a pseudo-code description in Algorithm 1.
In Algorithm 1, logGabor is the Log-Gabor filter construction function, enabling the generation of Log-Gabor filters tailored to specific scales and orientations as dictated by Formula (4). To efficiently perform approximate Log-Gabor image convolution operations, the algorithm leverages the fft 2 and ifft 2 functions, which represent the two-dimensional discrete Fourier transform and its inverse transform, respectively. Following the convolution operations, the real ( ) function is employed to isolate the real part of the transformed data. The SortFilterResponse ( ) function, whose pseudo-code is detailed in Algorithm 2, plays a pivotal role in sorting the magnitude responses of each pixel across all orientational power maps. Subsequently, the CountMostUsedOri ( ) function, accompanied by its pseudo-code in Algorithm 3, delves into statistical analysis. It meticulously counts the frequency of occurrence of each candidate orientational filter across all pixel positions. Finally, the SelectMostUsedOri ( ) function simplifies the process by directly identifying and selecting the most frequently used orientational filters from the pool of candidates. This streamlined approach ensures that the most representative filters are prioritized for further analysis or application.
Algorithm 1 Pseudo-code of the orientational filter selection algorithm
Input:
    Training ROI images: I_i, i = 1, …, N;
    Radial filter scale: S = {1, 2, 3, 4};
    Candidate orientation angles of the filter: θ_0 = {0, π/10, π/5, 3π/10, 2π/5, π/2, 3π/5, 7π/10, 4π/5, 9π/10};
    Number of scales N_scale = 4, number of candidate orientations N_ori = 10.
Output:
    The best orientation angles: bestθ_0.

// Construct the initial bank of Log-Gabor filters.
for s = 1 to N_scale do
    FilterBank(s) = logGabor(s, θ_0);
end for

// Select the best orientational filters within each scale in turn.
for s = 1 to N_scale do                                  // for each scale
    ori_countAll = zeros(N_ori, 1);                      // reset the accumulated histogram for this scale
    for i = 1 to N do                                    // for each training sample
        for j = 1 to N_ori do                            // for each orientation
            resF_i^j = ifft2(fft2(I_i) .* FilterBank(s)(j));
            powMapF_i^j = |real(resF_i^j)|;              // absolute value of the real part
        end for
        // Record and sort the magnitude responses of each pixel in all orientational power maps.
        sortRes_i = SortFilterResponse(powMapF_i);
        // Statistical histogram of the candidate orientational filters.
        ori_count_i = CountMostUsedOri(sortRes_i, FilterBank(s));
        ori_countAll += ori_count_i;
    end for
    bestθ_0(s) = SelectMostUsedOri(ori_countAll);        // choose the most used orientational filters at scale s
end for
As illustrated in Figure 3, filters corresponding to extreme scales, specifically S = 1 and S = 4, are overly large or small, respectively. Conversely, filters at intermediate scales, notably S = 2 and S = 3, contribute more significantly to capturing crucial features. Consequently, for the extreme scales (S = 1 and S = 4), we strategically retain relatively few orientational filters (e.g., n1 = n4 = 2), while for the intermediate scales (S = 2 and S = 3), we retain a comparatively higher number of orientational filters (e.g., n2 = n3 = 7).
Surprisingly, the acquired convolutional kernel structure resembles a diamond shape, aptly modeling the human eye’s adaptability to varying focal lengths and perspectives when observing objects at different distances. This feature not only brings a bio-plausible mechanism but also significantly enhances the robustness of a computer vision model when processing real-world images. Figure 6 depicts the adaptive orientational filter learning strategy applied to the convolutional kernels across diverse scales. This strategy enables the model to dynamically refine its orientation selection, optimizing its performance based on the intricacies of the data it encounters.
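As a hedged sketch of this retention step, the diamond structure can be obtained by keeping the top-n orientations per scale from the accumulated histograms returned by CountMostUsedOri(); the per-scale budget [2, 7, 7, 2] and the variable names below are illustrative, not taken from the original code.

    import numpy as np

    def select_diamond_orientations(ori_count_all, budget=(2, 7, 7, 2)):
        # ori_count_all: one normalized orientation histogram per scale (4 arrays of length 10).
        kept = []
        for counts, n in zip(ori_count_all, budget):
            top = np.argsort(counts)[::-1][:n]   # indices of the n most frequently used orientations
            kept.append(np.sort(top))
        return kept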
Algorithm 2 Pseudo-code for the SortFilterResponse() function
Input:
    Number of pixels: N_pix = N_ori × r × c;
    Power maps: powMapF ∈ R^(N_ori × r × c).
Output:
    Sorted magnitude responses of all pixels: sortRes ∈ R^(N_pix × 4).

sortRes = zeros(N_pix, 4);
tmp = zeros(N_pix, 4);
idx = 1;
for i = 1 to N_ori do                  // for each orientational power map
    for rows = 1 to r do
        for cols = 1 to c do
            tmp(idx, 1) = powMapF(i).value(rows, cols);   // magnitude response
            tmp(idx, 2) = rows;
            tmp(idx, 3) = cols;
            tmp(idx, 4) = i;                              // index of the power map (orientation)
            idx++;
        end for
    end for
end for

// Sort the magnitude responses of all pixels in descending order.
[sortRes(:, 1), idxSort] = sort(tmp(:, 1), descend);
sortRes(:, 2:4) = tmp(idxSort, 2:4);
Algorithm 3 Pseudo-code for the CountMostUsedOri() function
Input:
    Number of pixels: N_pix = N_ori × r × c;
    Sorted magnitude responses of all pixels: sortRes ∈ R^(N_pix × 4).
Output:
    Histogram statistics for each candidate orientational filter: hist_count.

hist_count = zeros(N_ori, 1);
temp_resF = zeros(N_ori, r, c);
for idx = 1 to N_pix do               // for each pixel, in descending order of response
    currX = sortRes(idx, 2);
    currY = sortRes(idx, 3);
    currO = sortRes(idx, 4);
    if (temp_resF(currO).value(currX, currY) == 1) then
        continue;                     // this position has already been counted for this orientation
    end if
    hist_count(currO) += 1;
    temp_resF(currO).value(currX, currY) = 1;   // mark the position as counted
end for
hist_count = hist_count ./ sum(hist_count(:)); // normalize

4.3. Recognition

Following the aforementioned procedures, we have learned the respective feature vectors for each training image through the VF-DCN framework. These feature vectors exhibit versatility, capable of being applied in both classification and verification scenarios.
Under the classification paradigm, the ensemble of feature vectors { f i } extracted from the FV ROIs serves as the foundational input for determining the class label (or identity) correlated with each feature vector. To assess the proficiency of VF-DCN in extracting highly discriminative feature vectors, we have opted for a simple yet effective classifier: the k-nearest neighbor (k-NN) classifier based on Euclidean distance, with k = 1 (denoted as 1-NN in the following). This choice is advantageous due to its absence of training requirements and the lack of tunable parameters, ensuring a direct evaluation of the feature vectors’ discriminative power.
Figure 6. Adaptive orientational filter learning strategy for the convolutional kernels across different scales.
Shifting to the verification mode, a crucial matching step ensues. Here, two biometric templates, each encapsulated within their respective feature vectors f i and f j , are compared to yield a corresponding distance metric d i , j = match ( f i , f j ) , where match ( · ) is the Euclidean distance used for quantitative measure of the similarity between the two feature vectors.
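Both modes reduce to simple distance computations on the extracted feature vectors; the following sketch shows the assumed Euclidean matcher and the 1-NN rule (the function names are illustrative).

    import numpy as np

    def match(f_i, f_j):
        # Verification score: Euclidean distance between two VF-DCN feature vectors.
        return np.linalg.norm(np.asarray(f_i, float) - np.asarray(f_j, float))

    def predict_1nn(query, gallery_feats, gallery_labels):
        # Classification: assign the label of the closest gallery template (k = 1).
        dists = np.linalg.norm(np.asarray(gallery_feats, float) - np.asarray(query, float), axis=1)
        return gallery_labels[int(np.argmin(dists))]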

5. Experimental Analysis

This section presents the experimental analysis to evaluate the performance of the proposed VF-DCN model. First, Section 5.1 provides the details of the experimental FV databases. Then, Section 5.2 and Section 5.3 present the experimental setting and corresponding evaluation metrics. After that, some key parameters are analyzed in Section 5.4, and the ablation study of the VF-DCN model is presented in Section 5.5. Finally, computational complexity is discussed in Section 5.6, and the comparison with some state-of-the-art methods is presented in Section 5.7.

5.1. Experimental Databases

In our experiments, four distinct finger vein databases: MMCBNU_6000 [46], FV_USM [47], HKPU [5], and our Self-Made ZSC_FV [1] are employed to facilitate a fair and comprehensive comparison. These databases capture FV images under diverse conditions and heterogeneous acquisition devices, thereby ensuring the robustness and representativeness of our evaluation for real-world applications. Table 2 shows the pertinent characteristics of the four FV databases, and Figure 7 visually depicts the ROIs of each database.

5.1.1. MMCBNU_6000 [46]

MMCBNU_6000 database (MMCBNU_6000 is available at http://multilab.jbnu.ac.kr/MMCBNU_6000, accessed on 1 December 2023) is created by Jeonbuk National University in Korea. It comprises 6000 FV images from 600 fingers belonging to 100 diverse subjects, encompassing students and professors from CBNU. These subjects originate from 20 countries spanning Asia, Europe, and America, offering a wide range of FV patterns. The database records six fingers per subject—the index, middle, and ring fingers of both hands, with each finger imaged ten times in a single session. The FV images are saved in bitmap (.bmp) format, alongside predefined region of interest (ROI) images with dimensions of 128 × 60 (as depicted in Figure 7a). Statistical analysis using the 3 σ criterion [1] reveals that 94.78 % of the images, or 5687 in total, exhibit good quality, while 0.88 % (53 images) are of poor quality, with the remainder falling into the medium quality category. This distribution indicates the robustness and suitability of the MMCBNU_6000 database for research and evaluation endeavors.
Figure 7. ROI images of four FV databases, in which ROIs in (a,b) are provided by the dataset itself, while ROIs in (c,d) are extracted by the 3σ criterion [1].

5.1.2. FV_USM [47]

FV_USM database (FV_USM is available at http://drfendi.com/fv_usm_database/, accessed on 1 December 2023) is created by Universiti Sains Malaysia. It comprises 5904 FV images from 492 fingers belonging to 123 individuals, including 83 males and 40 females. These participants, exclusively Asian, are staff and students of USM, spanning ages 20 to 52. For each individual, images of four fingers were captured: the index and middle fingers of both hands. This process was repeated in two distinct sessions, with six captures per finger per session, totaling 12 images per finger. To simulate real-world verification scenarios, where multiple images of the same finger may be available, experimental evaluations often blend images from both sessions for the same finger. All captured FV images are saved in JPEG format, accompanied by predefined ROIs with dimensions of 300 × 100 (as depicted in Figure 7b). Statistical analysis using the 3σ criterion [1] reveals that 83.43% of the images (4926) are of good quality, while 2.98% (176) are deemed poor quality. The remainder falls into the medium-quality category. Although the FV_USM database boasts a slightly lower percentage of top-tier images compared to the MMCBNU_6000 database, it nonetheless offers a valuable resource for research and evaluation purposes.

5.1.3. HKPU [5]

HKPU database (HKPU is available at http://www4.comp.polyu.edu.hk/~csajaykr/fvdatabase.htm, accessed on 1 December 2023), developed by the Hong Kong Polytechnic University, comprises 3132 FV images from 312 fingers of 156 individuals, predominantly under 30 years old. Each participant contributed images of their left index and middle fingers, captured in two separate sessions spanning from one month to over six months apart, with an average interval of 66.8 days. The first session yielded 1872 samples, while the second session gathered 1260 samples from the first 210 fingers. To simulate a real-world scenario, images from the same finger across sessions are intermixed. All finger vein images are saved in bitmap (.bmp) format and were captured under a non-contact acquisition environment, resulting in noise, rotational, and translational variations. The original image size is 513 × 256 pixels, and undergoes ROI segmentation during preprocessing as described in [1] (refer to Figure 7c). Statistical analysis using the 3 σ criterion [1] reveals that 29.31 % of the images (918) are classified as good quality, 22.16 % (694 images) as poor quality, with the remainder deemed medium quality. This indicates the relatively low proportion of high-quality images in HKPU compared to other databases.

5.1.4. ZSC_FV [1]

ZSC_FV database, created by our team, contains 37,080 FV images collected from 1030 undergraduate students, all within the age range of 18 to 22 years old. Each student contributed 36 images: six samples from each of the index, middle, and ring fingers of both hands. The acquisition process was conducted indoors under varying illumination conditions, enriching its analytical potential. The capturing device was manufactured by Beijing YanNan Tech Co., Ltd. (Beijing, China). All finger vein images are saved in bitmap (.bmp) format with a resolution of 512 × 384 pixels. Prior to analysis or use in FVR, these images undergo pre-processing that includes ROI segmentation [1] (as shown in Figure 7d). Statistical analysis using the 3σ criterion [1] reveals that 94.63% of the images (35,090 samples) are of good quality, while 4.8% (1778 samples) are classified as poor quality; the remainder falls into the medium-quality category. ZSC_FV thus provides a substantial and diverse dataset of FV images from a young population captured under varying conditions, offering compelling experimental evidence for the superiority of our proposed methods.

5.2. Experimental Setting

Our experiments were carried out under a computing environment with 3.6 GHz Intel Core i7 CPU (Intel Corporation, Santa Clara, CA, USA) and 32 GB RAM. We adopted an open-set protocol, ensuring that the training and testing sets were entirely non-overlapping. Specifically, for each database, approximately 50 % of fingers were randomly selected for training, with the remainder reserved for testing. Notably, in scenarios where a finger was captured across two sessions, we consolidated the images to simulate a realistic data collection scenario, maintaining the distinctiveness between training and testing fingers. The classification and verification tasks were solely executed on the testing set, and the final results were averaged over five iterations for enhanced accuracy. In the verification phase, Euclidean distance served as the metric for similarity assessment.

5.3. Evaluation Metrics

As performance metrics, we focused on the equal error rate (EER), accuracy (ACC), and the receiver operating characteristic (ROC) curve, which are widely recognized standards for evaluating the performance of FVR [17].
The EER signifies the optimal balance between the False Acceptance Rate (FAR) and the False Rejection Rate (FRR), with a lower EER indicating superior verification performance. Among these, FAR quantifies the error rate at which unenrolled FV images are accepted as enrolled images; the corresponding formula is shown in Equation (13).
\mathrm{FAR} = \frac{\text{Number of False Acceptances}}{\text{Number of Impostor Verification Attempts}} \times 100\%,
while FRR represents the error rate where the enrolled FV images are rejected as unenrolled images. The corresponding formula is shown in Equation (14).
\mathrm{FRR} = \frac{\text{Number of False Rejections}}{\text{Number of Genuine Verification Attempts}} \times 100\%.
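Given the genuine and impostor distance sets produced by the matcher, the EER can be estimated by sweeping a decision threshold until FAR and FRR coincide; the sketch below is one standard way to do this and is not taken from the authors' code.

    import numpy as np

    def compute_eer(genuine_dists, impostor_dists):
        # Return the (approximate) EER and the threshold at which FAR and FRR are closest.
        genuine = np.asarray(genuine_dists, float)
        impostor = np.asarray(impostor_dists, float)
        thresholds = np.unique(np.concatenate([genuine, impostor]))
        eer, best_t, best_gap = 1.0, thresholds[0], np.inf
        for t in thresholds:
            far = np.mean(impostor <= t)      # impostor attempts falsely accepted
            frr = np.mean(genuine > t)        # genuine attempts falsely rejected
            if abs(far - frr) < best_gap:
                best_gap, eer, best_t = abs(far - frr), (far + frr) / 2.0, t
        return eer, best_t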

5.4. Key Parameters Analysis

In this experiment, we analyzed some key parameters used in the VF-DCN model, allowing us to understand the specific impact of each parameter on the overall performance. As discussed in Section 3, some key parameters, including the W m i n , M (radial scaling factor), and T (angular scaling factor) will affect the representation ability of Log-Gabor, so we chose these parameters for testing. When these three parameters are set, the central frequency of the filter f 0 and the angular standard deviation σ θ are also set by Equations (5) and (8). It should be noted that each sub-experiment focuses on evaluating one parameter while keeping the others fixed according to Table 1, and the FV database adopted is MMCBNU_6000.
By systematically varying each parameter and observing the changes in recognition performance, we can gain insights into how these parameters influence the filter’s effectiveness. Specifically, the diamond convolution structure utilized is [2,7,7,2].

5.4.1. W m i n

This sub-experiment delves into exploring the impact of adjusting W m i n on recognition performance. Upon setting W m i n , the maximum frequency is derived using Equations (5) and (6). Table 3 presents the recognition performance, and Figure 8a visually illustrates the trend of EER as W m i n varies. Notably, when the value of W m i n is set to 2, a relatively superior performance is achieved.

5.4.2. Radial Scaling Factor (M)

This sub-experiment investigates the effect of varying the Radial Scaling Factor (M) on recognition performance. By adjusting M, a sequence of wavelengths and corresponding frequencies are generated, adhering to Equations (5) and (6). Our findings in Table 4 reveal that while variations in M have a relatively minor influence on ACC, they significantly impact the EER. Specifically, as M increases from 1.4 to 2.2 , the EER continuously decreases, indicating an enhanced recognition performance. Figure 8b illustrates the trend of EER, showing how EER improves with increasing M values.

5.4.3. Angular Scaling Factor (T)

This section investigates the impact of varying T (Angular Scaling Factor) on recognition performance. As elaborated in Section 3.3, Equation (8) underscores the role of T in influencing the σ θ . Table 5 presents the recognition performance under various T values. Figure 8c visually depicts the trend of EER as T varies. When the value of T is set to 1.3 , a relatively superior performance is observed, indicating an optimal setting for maximizing recognition accuracy. This adjustment ensures a smooth and effective balance of the angular scaling, thereby enhancing the overall recognition performance.

5.5. Ablation Study

In this section, we conduct ablation studies to gain insights into the individual contributions of different scales to the discriminative features and identify the optimal diamond-shaped convolutional structure that maximizes performance. It is important to note that for this study, we utilize the parameter settings detailed in Table 1, specifically M = 2.2, σ r = 0.55, T = 1.3, W m i n = 2.0, and all ROIs are resized to 32 × 32 .
Firstly, we test the contribution of the four scales to the discriminative features. To do this, we choose 10 orientations from each single scale. In the first column of Table 6, [10, 0, 0, 0] indicates that 10 orientations are chosen from scale S = 1, with no orientations selected from the other scales. Similar interpretations apply to [0, 10, 0, 0], which means 10 orientations are chosen from scale S = 2, with no orientations selected from the other scales. From Table 6, the EER at S = 1 is the highest and the EER at S = 4 is the second highest, revealing that using only the smallest (S = 1) or largest (S = 4) scale results in unacceptably high EERs, akin to the visual blurring that occurs when observing objects at extreme distances or proximities. Conversely, scales S = 2 and S = 3 yield relatively lower EERs, suggesting that intermediate scales contribute more effectively to the discriminative features.
Secondly, we explore the effectiveness of various diamond-shaped convolutional structures. In the first column of Table 6, [ 2 , 7 , 7 , 2 ] signifies that the two most predominant orientations are selected on scales S = 1 and S = 4 , respectively, while the seven most predominant orientations are selected on scales S = 2 and S = 3 , respectively. From Table 6, the diamond convolutional structure [ 2 , 7 , 7 , 2 ] consistently outperforms other configurations across four databases, as evident from the EER values reported in Table 6 and further illustrated in Figure 9. This optimal structure effectively balances the orientation selection across scales, leading to improved recognition performance.

5.6. Feature Extraction Time

In this experiment, we conducted a comprehensive analysis of the feature extraction time for various diamond-shaped convolutional structures. Table 7 presents the feature extraction times (in seconds) for these structures across four FV databases. A clear trend emerges from the results: the fewer orientations selected within a given structure, the lower the time required for feature extraction. Although the structure [ 2 , 7 , 7 , 2 ] inevitably takes longer due to its increased number of orientations, it is noteworthy that the time cost for our proposed method remains exceptionally low, at approximately 0.0356 s. This is a testament to the efficiency of our VF-DCN model, even when compared to other DL methods [14], which often come with significantly higher computational overheads. Therefore, our VF-DCN model not only achieves superior performance in terms of recognition accuracy but also maintains an acceptable feature extraction time, making it suitable for real-time applications. The balance between effectiveness and efficiency underscores the practicality and value of our proposed diamond-shaped convolutional structure.

5.7. Comparison Experiment

In this experiment, we conducted a thorough comparison of our proposed VF-DCN against the following typical and recent FV feature representation and recognition methods in terms of EER and ACC.
(1)
RLF [6]: RLF is a handcrafted method that combines curvature and Radon-like features. It effectively aggregates the dispersed spatial information around vein structures, highlighting vein patterns, suppressing spurious non-boundary responses and noise, and producing a smoother vein structure image. From Table 8, RLF, as a recent handcrafted method, performs better than GCN but worse than the other DL methods, which shows that handcrafted methods aligned with human vision also have their advantages.
(2)
GCN [36]: GCN (The source code for GCN is available at https://github.com/jxgu1016/Gabor_CNN_PyTorch, accessed on 12 January 2024) is a Gabor convolutional network with Gabor filters incorporated into DCNNs. The network is composed of four Gabor convolution layers, each followed by max-pooling and ReLU, and a dropout layer after the fully connected layer. From Table 8, although GCN is a DL method, its performance is limited by the depth of the network.
(3)
PalmNet [48]: PalmNet (The source code for PalmNet is available at https://github.com/AngeloUNIMI/PalmNet, accessed on 12 January 2024) is a 3-layer CNN with two Gabor convolutional layers and one binarization layer, which uses an innovative unsupervised training algorithm and can tune filters based on a limited quantity of data. PalmNet is a hybrid method composed of Gabor filters and a shallow convolutional network. From Table 8, its performance is better than that of the other DL methods, supporting the idea that fusing handcrafted and DL approaches is feasible.
(4)
SNGR [17]: SNGR was constructed based on a Siamese framework and embedded with a pair of eight-layer tiny ResNets as the backbone branch network. We chose the EER and ACC when the ratio of training and testing data is 9:1, as reported in [17].
(5)
SC-SDCN [14]: SC-SDCN is a DL method that uses a sparsified densely connected network with separable convolutions; the more training data, the better its performance. For a fair comparison, we chose the EER and ACC reported in [14] for a 5:5 ratio of training to testing data; as reported there, performance improves further as the training data increase. This shows that DL methods are affected by the amount of training data, whereas our proposed VF-DCN requires little data.
(6)
DenseNet161 [49]: DenseNet161 (The source code for DenseNet161 can be available at https://github.com/ridvansalihkuzu/vein-biometrics, accessed on 12 January 2024) is a DL method. We chose the EER and ACC when the ratio of training and testing data is 9:1, which has been reported in [17].
Despite the unique strengths exhibited by all the methods under consideration, the proposed VF-DCN model demonstrates superior performance across four distinct databases, as shown in Table 8. Our method achieves low EERs of 0.17%, 0.19%, 2.11%, and 0.65%, and high ACCs of 100%, 99.97%, 98.92%, and 99.36% on the MMCBNU_6000, FV_USM, HKPU, and ZSC_FV databases, respectively. This achievement validates the feasibility of our innovative approach, which integrates simulated retinal imaging techniques with a combination of Log-Gabor filters and a diamond-shaped convolutional structure. The successful integration of these components not only enhances the network’s ability to capture intricate FV features but also showcases the potential of this novel approach in advancing the field of FV technology.
Table 8. Comparison with other methods on four FV databases.

Methods           | MMCBNU_6000      | FV_USM           | HKPU             | ZSC_FV
                  | EER     ACC      | EER     ACC      | EER     ACC      | EER     ACC
RLF [6]           | 0.78%   -        | 0.87%   -        | 2.49%   -        | 1.39%   -
GCN [36]          | 1.86%   98.74%   | 2.05%   98.72%   | -       -        | -       -
PalmNet [48]      | 0.21%   99.97%   | 0.28%   99.97%   | 2.73%   99.30%   | 0.73%   99.10%
SNGR [17]         | 0.52%   99.55%   | 0.50%   99.74%   | -       -        | -       -
SC-SDCN [14]      | 0.58%   99.68%   | 0.82%   99.62%   | -       -        | -       -
DenseNet161 [49]  | 0.60%   99.57%   | 1.48%   98.94%   | -       -        | -       -
VF-DCN            | 0.17%   100%     | 0.19%   99.97%   | 2.11%   98.92%   | 0.65%   99.36%

6. Conclusions

In this paper, we carried out a hybrid exploration of Log-Gabor and a diamond convolutional structure. The advantages of the proposed VF-DCN are as follows:
(1)
Integration of Log-Gabor Filters: Log-Gabor filters are well-suited for natural image processing due to their ability to capture the statistical properties of natural scenes. By incorporating Log-Gabor filters into our network architecture, we effectively leverage their benefits for improved image feature extraction and representation.
(2)
Diamond Convolutional Structure: This structure enables the network to capture spatial information in a more efficient and effective manner, leading to improved performance (an illustrative sketch of this design appears after this list).
(3)
Simulating Retinal Imaging: By combining Log-Gabor filters and diamond convolutions, we created a network that simulates the processes of the human retina. This approach results in a network that is better able to represent and process visual information in a way that is similar to the human visual system.
(4)
Improved Performance: The fact that VF-DCN achieves the best performance compared to other methods is a clear indication that our approach is effective. This not only validates our idea but also demonstrates the potential of combining Log-Gabor filters and diamond convolutions for visual information processing tasks.
(5)
Potential for Further Applications: The success of VF-DCN in achieving superior performance suggests that this approach has the potential to be applied to a wide range of image processing and computer vision tasks, such as object detection, image segmentation, and visual recognition.
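To make points (1)–(3) above concrete, the following sketch constructs a frequency-domain Log-Gabor filter bank whose per-scale orientation counts follow a diamond-shaped allocation such as [2, 7, 7, 2] (cf. Table 6). It is an illustrative reconstruction using the parameter values of Table 1, not the released VF-DCN implementation; the function name `log_gabor_bank` and the exact value of the angular spread `sigma_theta` are assumptions.

```python
import numpy as np

def log_gabor_bank(rows, cols, n_orients=(2, 7, 7, 2),
                   w_min=2.0, mult=2.2, sigma_r=0.55, sigma_theta=0.45):
    """Frequency-domain 2D Log-Gabor filters with a diamond orientation layout.

    n_orients -- orientations per scale, e.g. (2, 7, 7, 2): few filters at the
                 coarsest and finest scales, more at the information-rich
                 middle scales.
    """
    # Normalized frequency grid over [-0.5, 0.5] x [-0.5, 0.5].
    y, x = np.meshgrid(np.linspace(-0.5, 0.5, rows),
                       np.linspace(-0.5, 0.5, cols), indexing="ij")
    radius = np.sqrt(x ** 2 + y ** 2)
    radius[rows // 2, cols // 2] = 1.0          # guard the (near-)DC bin against log(0)
    theta = np.arctan2(-y, x)

    bank = []
    for s, n_ori in enumerate(n_orients):
        f0 = 1.0 / (w_min * mult ** s)          # centre frequency of scale s
        radial = np.exp(-(np.log(radius / f0)) ** 2 / (2 * np.log(sigma_r) ** 2))
        radial[rows // 2, cols // 2] = 0.0      # no DC response
        for o in range(n_ori):
            angle = o * np.pi / n_ori           # orientations sampled over [0, pi)
            d_theta = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
            angular = np.exp(-d_theta ** 2 / (2 * sigma_theta ** 2))
            bank.append(radial * angular)
    return bank

filters = log_gabor_bank(64, 64)                # 2 + 7 + 7 + 2 = 18 filters
print(len(filters), filters[0].shape)
```

Each filter can then be applied as an FFT-domain product with the image spectrum (or converted to a spatial kernel), yielding the multi-scale, multi-orientation responses that the diamond structure organizes.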
While VF-DCN is an efficient, lightweight network model with only two convolutional layers, this shallowness inherently limits its ability to extract deeper, more abstract features. A natural next step is therefore to extend the model into a deeper, more comprehensive architecture. Furthermore, although the adaptive learning strategy of orientational filters is inspired by the workings of the human visual system, rigorous research is still needed to determine the optimal number of orientational filters at each scale. Looking ahead, we plan to continue this line of research and to integrate VF-DCN with self-attention mechanisms, so that the network mimics the fundamental principles of biological visual imaging systems even more closely.

Author Contributions

Methodology, Q.Y., D.S. and X.X.; writing—original draft preparation, Q.Y. and X.X.; writing—review and editing, Q.Y., D.S., X.X. and K.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (62271130), the Guangdong Basic and Applied Basic Research Foundation (2023A1515010066), the Science and Technology Foundation of Guangdong Province (2021A0101180005), the Key Area Special Fund of Guangdong Provincial Department of Education (2022ZDZX3042), and the Social Welfare and Basic Research Projects of Zhongshan (2023B2042, 2021B2006, 2021B2018).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to take this opportunity to thank the Editors and anonymous reviewers for their detailed comments and suggestions, which greatly helped us to improve the clarity and presentation of our manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yao, Q.; Song, D.; Xu, X. Robust Finger-vein ROI Localization Based on the 3σ Criterion Dynamic Threshold Strategy. Sensors 2020, 20, 3997. [Google Scholar] [CrossRef] [PubMed]
  2. Miura, N.; Nagasaka, A.; Miyatake, T. Feature extraction of finger-vein patterns based on repeated line tracking and its application to personal identification. Mach. Vis. Appl. 2004, 15, 194–203. [Google Scholar] [CrossRef]
  3. Miura, N.; Nagasaka, A.; Miyatake, T. Extraction of finger-vein patterns using maximum curvature points in image profiles. IEICE-Trans. Inf. Syst. 2007, E90-D, 1185–1194. [Google Scholar] [CrossRef]
  4. Yang, J.; Yang, J.; Shi, Y. Finger-Vein Segmentation Based on Multi-channel Even-symmetric Gabor Filters. In Proceedings of the 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems, Shanghai, China, 20–22 November 2009; Volume 4, pp. 500–503. [Google Scholar]
  5. Kumar, A.; Zhou, Y. Human Identification Using Finger Images. IEEE Trans. Image Process. 2012, 21, 2228–2244. [Google Scholar] [CrossRef]
  6. Yao, Q.; Song, D.; Xu, X.; Zou, K. A Novel Finger Vein Recognition Method Based on Aggregation of Radon-Like Features. Sensors 2021, 21, 1885. [Google Scholar] [CrossRef]
  7. Lv, W.; Ma, H.; Li, Y. A finger vein authentication system based on pyramid histograms and binary pattern of phase congruency. Infrared Phys. Technol. 2023, 132, 104728. [Google Scholar] [CrossRef]
  8. Huang, H.; Liu, S.; Zheng, H.; Ni, L.; Zhang, Y.; Li, W. DeepVein: Novel finger vein verification methods based on Deep Convolutional Neural Networks. In Proceedings of the 2017 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA), New Delhi, India, 22–24 February 2017; pp. 1–8. [Google Scholar]
  9. Bilal, A.; Sun, G.; Mazhar, S. Finger-vein recognition using a novel enhancement method with convolutional neural network. J. Chin. Inst. Eng. 2021, 44, 407–417. [Google Scholar] [CrossRef]
  10. Fairuz, S.; Habaebi, M.H.; Elsheikh, E.M.A. Finger Vein Identification Based on Transfer Learning of AlexNet. In Proceedings of the 7th International Conference on Computer and Communication Engineering (ICCCE), Kuala Lumpur, Malaysia, 19–20 September 2018; pp. 465–469. [Google Scholar]
  11. Lu, Y.; Xie, S.; Wu, S. Exploring Competitive Features Using Deep Convolutional Neural Network for Finger Vein Recognition. IEEE Access 2019, 7, 35113–35123. [Google Scholar] [CrossRef]
  12. Kim, W.; Song, J.M.; Park, K.R. Multimodal Biometric Recognition Based on Convolutional Neural Network by the Fusion of Finger-Vein and Finger Shape Using Near-Infrared (NIR) Camera Sensor. Sensors 2018, 18, 2296. [Google Scholar] [CrossRef]
  13. Song, J.M.; Kim, W.; Park, K.R. Finger-Vein Recognition Based on Deep DenseNet Using Composite Image. IEEE Access 2019, 7, 66845–66863. [Google Scholar] [CrossRef]
  14. Yao, Q.; Xu, X.; Li, W. A Sparsified Densely Connected Network with Separable Convolution for Finger-Vein Recognition. Symmetry 2022, 14, 2686. [Google Scholar] [CrossRef]
  15. Noh, K.J.; Choi, J.; Hong, J.S.; Park, K.R. Finger-Vein Recognition Based on Densely Connected Convolutional Network Using Score-Level Fusion with Shape and Texture Images. IEEE Access 2020, 8, 96748–96766. [Google Scholar] [CrossRef]
  16. Tang, S.; Zhou, S.; Kang, W.; Wu, Q.; Deng, F. Finger vein verification using a Siamese CNN. IET Biom. 2019, 8, 306–315. [Google Scholar] [CrossRef]
  17. Yao, Q.; Chen, C.; Song, D.; Xu, X.; Li, W. A Novel Finger Vein Verification Framework Based on Siamese Network and Gabor Residual Block. Mathematics 2023, 11, 3190. [Google Scholar] [CrossRef]
  18. Shaheed, K.; Mao, A.; Qureshi, I.; Kumar, M.; Hussain, S.; Ullah, I.; Zhang, X. DS-CNN: A pre-trained Xception model based on depth-wise separable convolutional neural network for finger vein recognition. Expert Syst. Appl. 2022, 191, 116288. [Google Scholar] [CrossRef]
  19. Hou, B.; Yan, R. Triplet-Classifier GAN for Finger-Vein Verification. IEEE Trans. Instrum. Meas. 2022, 71, 1–12. [Google Scholar] [CrossRef]
  20. Kuzu, R.S.; Maiorana, E.; Campisi, P. Vein-based Biometric Verification using Transfer Learning. In Proceedings of the 43rd International Conference on Telecommunications and Signal Processing (TSP), Milan, Italy, 7–9 July 2020; pp. 403–409. [Google Scholar]
  21. Zhao, P.; Song, Y.; Wang, S.; Xue, J.H.; Zhao, S.; Liao, Q.; Yang, W. VPCFormer: A transformer-based multi-view finger vein recognition model and a new benchmark. Pattern Recognit. 2024, 148, 110170. [Google Scholar] [CrossRef]
  22. Li, M.; Gong, Y.; Zheng, Z. Finger Vein Identification Based on Large Kernel Convolution and Attention Mechanism. Sensors 2024, 24, 1132. [Google Scholar] [CrossRef]
  23. Devkota, N.; Kim, B.W. Finger Vein Recognition Using DenseNet with a Channel Attention Mechanism and Hybrid Pooling. Electronics 2024, 13, 501. [Google Scholar] [CrossRef]
  24. Li, X.; Feng, J.; Cai, J.; Lin, G. FV-MViT: Mobile Vision Transformer for Finger Vein Recognition. Sensors 2024, 24, 1331. [Google Scholar] [CrossRef]
  25. Field, D.J. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A Opt. Image Sci. 1987, 4, 2379–2394. [Google Scholar] [CrossRef] [PubMed]
  26. Yang, J.; Shi, Y.; Yang, J. Finger-Vein Recognition Based on a Bank of Gabor Filters. In Proceedings of the Computer Vision—ACCV 2009, Xi’an, China, 23–27 September 2009; Springer: Berlin/Heidelberg, Germany, 2010; pp. 374–383. [Google Scholar]
  27. Wang, R.; Wang, G.; Chen, Z.; Zeng, Z.; Wang, Y. A palm vein identification system based on Gabor wavelet features. Neural Comput. Appl. 2014, 24, 161–168. [Google Scholar] [CrossRef]
  28. Shin, K.Y.; Park, Y.H.; Nguyen, D.T.; Park, K.R. Finger-Vein Image Enhancement Using a Fuzzy-Based Fusion Method with Gabor and Retinex Filtering. Sensors 2014, 14, 3095–3129. [Google Scholar] [CrossRef] [PubMed]
  29. Kovač, I.; Marák, P. Finger vein recognition: Utilization of adaptive gabor filters in the enhancement stage combined with SIFT/SURF-based feature extraction. Signal Image Video Process. 2023, 17, 635–641. [Google Scholar] [CrossRef]
  30. Yang, L.; Yang, G.; Wang, K.; Liu, H.; Xi, X.; Yin, Y. Point Grouping Method for Finger Vein Recognition. IEEE Access 2019, 7, 28185–28195. [Google Scholar] [CrossRef]
  31. Shi, Y.; Yang, J. Image restoration and enhancement for finger-vein recognition. In Proceedings of the 2012 IEEE 11th International Conference on Signal Processing, Beijing, China, 21–25 October 2012; Volume 3, pp. 1605–1608. [Google Scholar]
  32. Li, M.; Wang, H.; Li, L.; Zhang, D.; Tao, L. Finger Vein Recognition Based on a Histogram of Competitive Gabor Directional Binary Statistics. J. Database Manag. 2023, 34, 1–19. [Google Scholar] [CrossRef]
  33. Calderon, A.F.L.; Roa, S.; Victorino, J. Handwritten Digit Recognition using Convolutional Neural Networks and Gabor filters. In Proceedings of the 2003 International Congress on Computational Intelligence, Medellín, Colombia, 6–8 November 2003. [Google Scholar]
  34. Alekseev, A.; Bobe, A. GaborNet: Gabor filters with learnable parameters in deep convolutional neural network. In Proceedings of the 2019 International Conference on Engineering and Telecommunication (EnT), Dolgoprudny, Russia, 20–21 November 2019; pp. 1–4. [Google Scholar]
  35. Pérez, J.C.; Alfarra, M.; Jeanneret, G.; Bibi, A.; Thabet, A.; Ghanem, B.; Arbeláez, P. Gabor Layers Enhance Network Robustness. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 450–466. [Google Scholar]
  36. Luan, S.; Chen, C.; Zhang, B.; Han, J.; Liu, J. Gabor Convolutional Networks. IEEE Trans. Image Process. 2018, 27, 4357–4366. [Google Scholar] [CrossRef]
  37. Gao, X.; Sattar, F.; Venkateswarlu, R. Multiscale Corner Detection of Gray Level Images Based on Log-Gabor Wavelet Transform. IEEE Trans. Circuits Syst. Video Technol. 2007, 17, 868–875. [Google Scholar]
  38. Arróspide, J.; Salgado, L. Log-Gabor Filters for Image-Based Vehicle Verification. IEEE Trans. Image Process. 2013, 22, 2286–2295. [Google Scholar] [CrossRef]
  39. Yang, Y.; Tong, S.; Huang, S.; Lin, P. Log-Gabor energy based multimodal medical image fusion in NSCT domain. Comput. Math. Methods Med. 2014, 2014, 835481. [Google Scholar] [CrossRef]
  40. Bounneche, M.D.; Boubchir, L.; Bouridane, A.; Nekhoul, B.; Ali-Chérif, A. Multi-spectral palmprint recognition based on oriented multiscale log-Gabor filters. Neurocomputing 2016, 205, 274–286. [Google Scholar] [CrossRef]
  41. Lv, L.; Yuan, Q.; Li, Z. An algorithm of Iris feature-extracting based on 2D Log-Gabor. Multimed. Tools Appl. 2019, 78, 22643–22666. [Google Scholar] [CrossRef]
  42. Shams, H.; Jan, T.; Ali, A.; Ahmad, N.; Munir, A.; Khalil, R.A. Fingerprint image enhancement using multiple filters. PeerJ Comput. Sci. 2023, 9, e1183. [Google Scholar] [CrossRef] [PubMed]
  43. Wang, Y.; Lu, H.; Qin, X.; Guo, J. Residual Gabor convolutional network and FV-Mix exponential level data augmentation strategy for finger vein recognition. Expert Syst. Appl. 2023, 223, 119874. [Google Scholar] [CrossRef]
  44. Zhu, B.; Yang, C.; Dai, J.; Fan, J.; Qin, Y.; Ye, Y. R2FD2: Fast and Robust Matching of Multimodal Remote Sensing Images via Repeatable Feature Detector and Rotation-Invariant Feature Descriptor. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15. [Google Scholar] [CrossRef]
  45. Kirsch, R.A. Computer determination of the constituent structure of biological images. Comput. Biomed. Res. 1971, 4, 315–328. [Google Scholar] [CrossRef]
  46. Lu, Y.; Xie, S.J.; Yoon, S.; Wang, Z.; Park, D.S. An available database for the research of finger vein recognition. In Proceedings of the 2013 6th International Congress on Image and Signal Processing (CISP), Hangzhou, China, 16–18 December 2013; Volume 1, pp. 410–415. [Google Scholar]
  47. Mohd Asaari, M.S.; Suandi, S.A.; Rosdi, B.A. Fusion of Band Limited Phase Only Correlation and Width Centroid Contour Distance for finger based biometrics. Expert Syst. Appl. 2014, 41, 3367–3382. [Google Scholar] [CrossRef]
  48. Genovese, A.; Piuri, V.; Plataniotis, K.N.; Scotti, F. PalmNet: Gabor-PCA Convolutional Networks for Touchless Palmprint Recognition. IEEE Trans. Inf. Forensics Secur. 2019, 14, 3160–3174. [Google Scholar] [CrossRef]
  49. Kuzu, R.S.; Piciucco, E.; Maiorana, E.; Campisi, P. On-the-Fly Finger-Vein-Based Biometric Recognition Using Deep Neural Networks. IEEE Trans. Inf. Forensics Secur. 2020, 15, 2641–2654. [Google Scholar] [CrossRef]
Figure 1. Radial filters under different values of σ_r.
Figure 2. Angular filters under different angular scaling factors T.
Figure 3. Bank of Log-Gabor filters. Each row in (c) contains filters computed at the same scale; for each scale, 10 orientations are sampled.
Figure 4. Illustration of the framework of VF-DCN.
Figure 5. Diamond convolutional structure of VF-DCN.
Figure 8. Trend of EER at varying parameters.
Figure 9. ROC curves of various diamond-shaped convolutional structures on four finger vein databases.
Table 1. Parameter settings of the 2D Log-Gabor filter.

Group   | Parameter | Value | Description
Radial  | W_min     | 2     | Wavelength of the smallest scale filter
        | σ_r       | 0.55  | Radial standard deviation
        | N_scale   | 4     | Number of radial filter scales
        | M         | 2.2   | Radial scaling factor
Angular | σ_θ       |       | Angular standard deviation
        | N_ori     | 10    | Number of filter orientations
        | T         | 1.3   | Angular scaling factor
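For readers mapping the entries of Table 1 onto the filter definition, these parameters correspond to the standard polar-separable 2D Log-Gabor transfer function, written below in its usual frequency-domain form (a reference formulation rather than the paper's own notation; the coupling of σ_θ to the angular scaling factor T is an assumption):

```latex
G_{s,o}(f,\theta) =
\exp\!\left(-\frac{\left[\ln\left(f / f_{0,s}\right)\right]^{2}}{2\left[\ln \sigma_{r}\right]^{2}}\right)
\exp\!\left(-\frac{\left(\theta - \theta_{o}\right)^{2}}{2\,\sigma_{\theta}^{2}}\right),
\qquad
f_{0,s} = \frac{1}{W_{\min}\, M^{\,s-1}},\quad s = 1,\dots,N_{scale},
\quad
\theta_{o} = \frac{(o-1)\,\pi}{N_{ori}},\quad o = 1,\dots,N_{ori}.
```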
Table 2. Details of four FV databases (for column ‘Fingers’, i: index, m: middle, r: ring; for column ‘Hands’, l: left hand, r: right hand).

Databases   | Total Num of Images | Num of Finger Classes | Num of Subjects | Fingers | Hands | Num of Images per Finger | Sessions | ROI
MMCBNU_6000 | 6000                | 600                   | 100             | i, m, r | l, r  | 10                       | 1        | provided
FV_USM      | 5904                | 492                   | 123             | i, m    | l, r  | 12                       | 2        | provided
HKPU        | 3132                | 312                   | 156             | i, m    | l     | 6/12                     | 2        | 3σ criterion [1]
ZSC_FV      | 37,080              | 6189                  | 1030            | i, m, r | l, r  | 6                        | 1        | 3σ criterion [1]
Table 3. Varying W_min results on recognition performance.

W_min | 1.0    | 1.5   | 2.0   | 2.5    | 3.0
EER   | 0.51%  | 0.18% | 0.17% | 0.25%  | 0.31%
ACC   | 99.97% | 100%  | 100%  | 99.97% | 99.93%
Table 4. Different M (radial scaling factor) results on recognition performance.

M   | 1.4    | 1.5    | 1.6    | 1.7   | 1.8   | 1.9   | 2.0   | 2.1    | 2.2   | 2.3
EER | 1.44%  | 0.61%  | 0.36%  | 0.21% | 0.20% | 0.21% | 0.24% | 0.22%  | 0.17% | 0.21%
ACC | 99.31% | 99.84% | 99.97% | 100%  | 100%  | 100%  | 100%  | 99.97% | 100%  | 100%
Table 5. Different T (angular scaling factor) results on recognition performance.

T   | 1.0    | 1.1    | 1.2    | 1.3    | 1.4    | 1.5
EER | 0.180% | 0.172% | 0.176% | 0.165% | 0.197% | 0.230%
ACC | 100%   | 100%   | 100%   | 100%   | 100%   | 100%
Table 6. Number of convolution results on four FV databases.

Number of Convolution | MMCBNU_6000 (EER / ACC) | FV_USM (EER / ACC) | HKPU (EER / ACC) | ZSC_FV (EER / ACC)
[10, 0, 0, 0]         | 12.88% / 46.37%         | 24.34% / 19.93%    | 19.45% / 53.11%  | 14.11% / 37.76%
[0, 10, 0, 0]         | 0.91% / 99.84%          | 1.95% / 98.81%     | 7.33% / 94.99%   | 2.68% / 93.15%
[0, 0, 10, 0]         | 0.60% / 99.85%          | 0.71% / 99.80%     | 5.91% / 95.50%   | 1.55% / 97.48%
[0, 0, 0, 10]         | 1.89% / 99.34%          | 1.70% / 99.25%     | 9.38% / 88.40%   | 4.30% / 83.13%
[1, 6, 6, 1]          | 0.21% / 100%            | 0.19% / 99.97%     | 2.41% / 98.67%   | 0.91% / 98.74%
[2, 6, 6, 2]          | 0.21% / 100%            | 0.21% / 99.97%     | 2.20% / 98.92%   | 0.83% / 98.80%
[1, 7, 7, 1]          | 0.18% / 100%            | 0.20% / 99.97%     | 2.44% / 98.61%   | 0.72% / 99.18%
[2, 7, 7, 2]          | 0.17% / 100%            | 0.19% / 99.97%     | 2.11% / 98.92%   | 0.65% / 99.36%
Table 7. Feature extraction time (s) of various diamond structures on four FV databases.

Diamond Shape | MMCBNU_6000 | FV_USM | HKPU   | ZSC_FV
[10, 0, 0, 0] | 0.0030      | 0.0031 | 0.0024 | 0.0059
[0, 10, 0, 0] | 0.0030      | 0.0031 | 0.0023 | 0.0058
[0, 0, 10, 0] | 0.0025      | 0.0026 | 0.0019 | 0.0043
[0, 0, 0, 10] | 0.0019      | 0.0019 | 0.0017 | 0.0030
[1, 6, 6, 1]  | 0.0068      | 0.0066 | 0.0273 | 0.0117
[2, 6, 6, 2]  | 0.0111      | 0.0103 | 0.0272 | 0.0386
[1, 7, 7, 1]  | 0.0111      | 0.0109 | 0.0282 | 0.0178
[2, 7, 7, 2]  | 0.0356      | 0.0344 | 0.0285 | 0.0408
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
