1. Introduction
A person can be recognized in security systems by a unique username and password, but these credentials can be readily stolen [1]. The fingerprint is one of the first imaging modalities used for biometric identification. It is more accurate and less expensive than other biometric modalities [2,3]. A fingerprint’s surface has ridges and valleys, which do not change during a lifetime [4]. Fingerprint recognition can be used for verification or identification. In verification, the fingerprint is compared to the templates of a particular subject in the database, whereas in identification, the unknown fingerprint is compared to the templates of all subjects in the database to ascertain the subject’s identity [5]. Fingerprints are gaining in popularity, and their datasets are becoming increasingly large. They are recorded using a variety of low-cost embedded sensors in smart devices such as smartphones and computers. The high processing complexity of a fingerprint identification system is one of its primary drawbacks. One way to address this issue is to categorize the fingerprints in a database to reduce the search space. The existing classification methods are effective when fingerprints are recorded using the same sensor. However, when fingerprints are collected using different sensors (referred to as the cross-sensor or sensor interoperability problem), classification performance deteriorates; even verification of the same person’s finger is degraded [6,7,8]. While considerable research has been conducted on cross-sensor fingerprint verification [8,9,10,11,12], there has been no study on cross-sensor fingerprint classification, which motivates us to work on this topic.
Numerous fingerprint classification systems have been developed, some relying on non-CNN approaches and others on convolutional neural networks (CNNs). An exhaustive overview of non-CNN methods is given in [13,14]. The success rate of a fingerprint classification approach is highly dependent on how well it describes the discriminating information of a fingerprint. Directional ridge patterns and singularities are critical distinguishing characteristics of fingerprints, as demonstrated by the techniques proposed in [15,16,17,18,19,20], which utilize this information in a variety of ways to classify fingerprints. Gue et al. [15] employ the number and type of core points as fingerprint descriptors, together with rule-based classification. Additionally, this approach classifies indistinguishable fingerprints using center-to-delta flow and balance arm flow. Its classification accuracy is 92.7% on average. Jung and Lee [21] split a fingerprint into 16 × 16 pixel blocks, compute their representative directions, use Markov models to identify the core block, and then divide the fingerprint into four areas, each of which is represented using distributions of ridge directional values. This method has a classification accuracy of 97.4%. Dorasamy et al. [17] employed a simplified rule-based technique and two features for fingerprint description: directional patterns and singular points. The classification accuracy of this scheme is 92.2%. Saeed et al. [18] proposed a fingerprint classification algorithm based on a modified histogram of oriented gradients (HOG). The orientation field computation of the HOG descriptor is not specific to ridge patterns; to improve the descriptor’s ability to represent a fingerprint, they compute an orientation field that is suited to the ridge pattern. This technique achieved an average accuracy of 98.70% on the noisy fingerprint database FVC2004. Saeed et al. [19] suggested a new approach for classifying noisy fingerprints from live scan devices using statistical features (mean, standard deviation, kurtosis, and skewness) of the dense scale-invariant feature transform (d-SIFT). This method achieved 97.6% accuracy on FVC2004, a noisy, low-quality live-scanned fingerprint database. Sudhir et al. [22] employed GLCM, LBP, and SURF for feature extraction, while SVM and BoF classifiers were used for classification. On FVC2004, they obtained average accuracies of 74.50% using SVM and 84.75% using BoF.
Deep CNNs have shown remarkable results in many applications [23,24,25,26]; they have been used to classify fingerprints [27,28,29,30,31,32] and have achieved encouraging results. Zia et al. [33] introduced Bayesian DCNNs (B-DCNNs), which incorporate Bayesian model uncertainty to increase fingerprint classification accuracy. They achieved 95.3% accuracy on FVC2004 (five classes), a 0.8–1.0% improvement in model accuracy over the baseline DCNN. Nguyen et al. [34] suggested a CNN approach for the noise reduction stage of noisy fingerprints. The procedure involves two main steps. First, non-local information is used to construct a pre-processing phase for the noisy image. Second, the pre-processed fingerprints are separated into overlapping patches, which are fed to a CNN for training; the resulting model de-noises future noisy images, whose output can subsequently be smoothed using Gaussian filtering to remove pixel artifacts. The network has three layers with distinct filters and operators at each layer; the third convolutional layer predicts the enhanced patches and reconstructs the output image. Edge information is strengthened using the Gaussian algorithm and the Canny algorithm, which enables the approach to filter out noise, and the result improves further once all images have been processed by a morphological procedure. They extracted features from the pre-processed fingerprints (arch, loop, and whorl) and classified them using four classifiers: random forest, SVM, CNN, and K-NN, obtaining accuracies of 97.78%, 95.83%, 96.11%, and 92.05%, respectively.
Nahar et al. [35] designed CNN models based on the LeNet-5 design for fingerprint classification. They evaluated their method on the augmented subset (DB1) of the FVC2004 dataset and obtained an accuracy of 99.1%. In deep models, the layers and filters are determined through experiments, and no specific rule is used to choose them; tuning the hyper-parameters is tedious and time-consuming. Motivated by the difficulty of designing CNN architectures, we propose a technique that automatically and adaptively determines the architecture of a CNN model from the fingerprint dataset. To begin, we use the LGDBP descriptor of Saeed et al. [36] and the K-medoids clustering algorithm [37] to choose representative fingerprints, and then we derive the filters of the layers using the Fukunaga–Koontz Transform (FKT) [38]. To control the depth of a CNN model, we compute the ratio of the traces of the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$.
The proposed fingerprint CNN classification system was evaluated against the state-of-the-art fingerprint classification schemes utilizing the benchmark multi-sensor datasets FingerPass and FVC2004. Specifically, the contributions of this work are as follows:
We developed an efficient automatic method for classifying cross-sensor fingerprints based on a CNN model.
We proposed a technique for the custom-designed building of a CNN model, which automatically determines the architecture of the model using the class-discriminative information from fingerprints. The layers of an adaptive CNN model and their respective filters are customized using FKT and the ratio of the traces of the between-class and within-class scatter matrices.
We thoroughly evaluated the proposed method on two datasets. The proposed fingerprint classification scheme is quick, accurate, and performs well with noisy fingerprints obtained using live scan devices as well as cross-sensor fingerprints.
The rest of the paper is organized as follows. Section 2 presents the details of the proposed technique. The experimental results are given in Section 3. Section 4 discusses the performance of the proposed method in detail. Section 5 concludes the article.
2. Proposed Method
The convolutional neural network (CNN) is one of the most widely used and popular deep learning networks [39]. Its general structure comprises different types of layers, including the CONV layer with different filters, pooling layer, activation function layer, fully connected layer, and loss function [40]. It has been used for a wide range of tasks, including image and video recognition [41], classification of images [42], medical image analysis [43], computer vision [44], and natural language processing [45].
Many advancements in CNN learning methods and architectures have taken place, allowing the network to handle larger, more diverse, more complicated, and multiclass problems [46]. Following AlexNet’s outstanding performance on the ImageNet dataset in 2012, many applications adopted CNNs [47]. A layer-wise representation of CNN reversed the trend toward extraction of features at low spatial resolution in deep architecture, as achieved in VGG [48]. Most modern architectures follow VGG’s simple and homogeneous topology. The Google deep learning group introduced the divide, transform, and merge concept with the inception block, which introduced branching within a layer, allowing for feature abstraction at various spatial scales [49]. Skip connections, introduced by ResNet [50] for deep CNN training, gained popularity in 2015. Other models, like Wide ResNet, explore the influence of multilevel transformations on a CNN’s learning capacity by increasing cardinality or widening the network [51]. As a result, research turned from parameter optimization to network architecture design, and new architectural concepts like channel boosting, spatial and feature-map exploitation, and attention-based information processing emerged [52]. The main issue in the design of CNN models is tuning the architecture for a specific application.
2.1. Problem Formulation
The fingerprints are categorized into four types: arch, left loop, right loop, and whorl. Identifying the type of a fingerprint is a multiclass classification problem. Let there be $N$ subjects, and $K$ fingerprints are captured from each subject with $M$ different sensors; these fingerprints are categorized into $C$ classes. Let $\mathcal{F} = \{F_{ij}^{s} \mid 1 \le i \le K,\ 1 \le j \le N,\ 1 \le s \le M\}$, where $F_{ij}^{s}$ represents the $i$th fingerprint of the $j$th subject captured with the $s$th sensor, be the set of fingerprints, and $\mathcal{Y} = \{1, 2, \ldots, C\}$, where $C$ is the number of classes, be the set of fingerprint labels (classes). The problem of predicting the type of a fingerprint $F_{ij}^{s}$ is to build a function $f: \mathcal{F} \rightarrow \mathcal{Y}$ that takes a fingerprint $F_{ij}^{s}$ and assigns it a label $c$, i.e., $f(F_{ij}^{s}; \theta) = c$, where $\theta$ are the parameters. We design the function $f$ using a CNN model; in this case, $\theta$ represents the weights and biases of the model. The model is built adaptively. Its design process is shown in Figure 1, and the details are given in the rest of the section.
2.2. Adaptive CNN Model
The main constituent of a CNN model is the convolutional (CONV) layer. It extracts discriminative features from the input signal by applying the convolution operation with filters of fixed size. CONV layers are stacked in a CNN model to extract a hierarchy of features. The number of filters in each CONV layer and the number of CONV layers in a CNN model are hyper-parameters, and finding the best configuration of a model for a specific application is a hard optimization problem; it entails searching a huge parameter space. In addition, the initialization of the learnable parameters of a CNN model has a significant effect on its performance when it is trained with an iterative optimization algorithm like the Adam optimizer. Leveraging the discriminative content of fingerprints, we propose a simple method to find the best configuration of the model adaptively. Initially, we select the representative fingerprints of each type to guide the design process of a CNN model. The discriminative information in these fingerprints is used to determine the width (the number of filters) of each CONV layer and the depth (the number of CONV layers) of the model; it is also used for data-dependent initialization of the filters of the CONV layers. An overview of the design process is shown in Figure 1. We employ clustering to select the representative fingerprints; the Fukunaga–Koontz Transform (FKT) [38], which exploits class-discriminative information, to determine the number of filters in a CONV layer; and the ratio of the traces of the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ to adjust the depth (i.e., the number of CONV layers) of the CNN model. Finally, to minimize the number of learnable parameters and avoid overfitting, global pooling layers are introduced. By decreasing the resolution of the feature maps, the pooling layer seeks to achieve shift-invariance, and the pooling layer’s feature map is linked directly to SoftMax [53]. The design process is discussed in detail in the following subsections.
2.2.1. Selection of Representative Fingerprints
We extract discriminative information from fingerprints to specify the CONV layers and the depth of a CNN model adaptively. To do this, we cluster the training set to identify the most representative fingerprints of each class. To determine the representative fingerprints, discriminative features are extracted from the fingerprints using the LGDBP descriptor [36]. K-medoids [37] is used for clustering since it selects actual instances as cluster centers and is therefore suitable for finding a representative subset of the training set. The fingerprints corresponding to the cluster centers are chosen as the representative subset. The number of clusters for each class in the K-medoids algorithm is specified using silhouette analysis [54]. Using this procedure, we select the set $X = \{X_1, X_2, \ldots, X_C\}$, where $X_i = \{RF_j,\ j = 1, 2, 3, \ldots, n_i\}$ is the set of representative fingerprints of the $i$th class.
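The selection step can be illustrated with a minimal NumPy sketch. It assumes that the descriptors of one class are already available as the rows of a matrix `F` (the LGDBP extraction itself is not shown), uses the Euclidean distance, and uses a simple alternating K-medoids with a deterministic farthest-point initialization; these choices and all function names are illustrative assumptions, not the implementation used in this work.

```python
import numpy as np

def pairwise_dist(F):
    """Euclidean distance matrix between the rows of F (n x dim)."""
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def init_medoids(D, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    medoids = [int(D.sum(axis=1).argmin())]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        medoids.append(int(nearest.argmax()))
    return np.array(medoids)

def k_medoids(D, k, n_iter=100):
    """Alternate between assigning points and re-picking each cluster's medoid."""
    medoids = init_medoids(D, k)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # medoid = member with the smallest total distance to its cluster
                cost = D[np.ix_(members, members)].sum(axis=1)
                new[c] = members[cost.argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, D[:, medoids].argmin(axis=1)

def mean_silhouette(D, labels):
    """Average silhouette value; singleton clusters contribute 0."""
    clusters = np.unique(labels)
    if clusters.size < 2:
        return -1.0
    scores = np.zeros(len(D))
    for i in range(len(D)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())

def select_representatives(F, k_values=(2, 3, 4, 5)):
    """Indices of the medoids for the k with the best mean silhouette."""
    D = pairwise_dist(F)
    best_score, best_medoids = -np.inf, None
    for k in k_values:
        medoids, labels = k_medoids(D, k)
        score = mean_silhouette(D, labels)
        if score > best_score:
            best_score, best_medoids = score, medoids
    return np.sort(best_medoids)
```

Calling `select_representatives` once per class on that class's descriptor matrix yields the indices of the fingerprints that would form $X_i$ under these assumptions.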
2.2.2. Design of the Main DeepFKTNet Architecture
The architectures of the state-of-the-art CNN models are usually not drawn from the data and are fixed and highly complex. On the contrary, we define a data-dependent architecture of DeepFKTNet. Its primary architecture is based on the answers to two questions: (i) how many CONV layers should be in the model and (ii) how many filters must be in each layer. These questions are addressed by an iterative algorithm that computes the number of filters in a CONV layer, adds it iteratively to the model, and terminates when a criterion is satisfied. We use the discriminative structural information embedded in fingerprints to determine the number of filters in a CONV layer and their initialization. The detail is given in Algorithm 1. We discuss the algorithm with motivation in the following paragraphs.
Initially, the set $X = \{X_1, X_2, \ldots, X_C\}$ is used to determine the number of filters of the first CONV layer and initialize them. Inspired by the filter size of the first CONV layer in state-of-the-art CNN models like ResNet [50], DenseNet [55], and Inception [49], we fixed the filter size of the first layer to 7 × 7. We extract patches of size $w \times h$ from the representative fingerprints (steps 2–3 of Algorithm 1) and formulate the problem of determining the filters $f_k$ as finding the optimal projection direction vectors $u_k$, $k = 1, 2, \ldots, d$, which are determined by solving the following optimization problem:

$$\max_{u} \; \frac{u^{T} S_b\, u}{u^{T} S_w\, u}, \qquad (1)$$

where $S_b$ and $S_w$ are the between-class and within-class scatter matrices (as computed in step 4 of Algorithm 1). According to the Fukunaga–Koontz Transform (FKT) [38], the optimal projection direction vectors $u_k$ are the eigenvectors of $\tilde{S}_b$, i.e.,

$$\tilde{S}_b\, u_k = \lambda_k u_k, \qquad (2)$$

where $\tilde{S}_b = P^{T} S_b P$, $P = Q D^{-1/2}$, and $Q$ and $D$ are obtained by the diagonalization of the sum $S_b + S_w$, i.e., $S_b + S_w = Q D Q^{T}$ (steps 5–6 of Algorithm 1). Equation (2) gives the optimal vectors, which simultaneously maximize $S_b$ and minimize $S_w$: since $\tilde{S}_b + \tilde{S}_w = I$ with $\tilde{S}_w = P^{T} S_w P$, an eigenvector of $\tilde{S}_b$ with eigenvalue $\lambda_k$ is also an eigenvector of $\tilde{S}_w$ with eigenvalue $1 - \lambda_k$. Unlike Linear Discriminant Analysis (LDA) [56], the inversion of $S_w$ is not needed in this approach, so it can tackle very high-dimensional data. Additionally, this approach seeks optimal vectors that are orthogonal. Since the dimension of the patch vectors related to the intermediate CONV layers is usually very high and we need filters that are independent, this approach is suitable for our design process. The problem of selecting the number of filters in the convolutional layer is to select the eigenvectors $u_k$, $k = 1, 2, \ldots, L$, so that the ratio $TR = \mathrm{tr}(SF_b)/\mathrm{tr}(SF_w)$ attains its maximum value. Here, the between-class scatter matrix $SF_b$ and the within-class scatter matrix $SF_w$ are computed for each $u_k$ by projecting all activations $z_j$ onto the space spanned by $u_k$ (steps 7–8 of Algorithm 1). This ensures that the selected filters extract discriminative features. After selecting $u_k$, $k = 1, 2, \ldots, L$, the CONV block with $L$ filters $f_k$ initialized with $u_k$ is introduced in DeepFKTNet. Then, a pooling layer is added if needed (steps 8–10 of Algorithm 1).
Using the current architecture of DeepFKTNet, the set of activations $Z = \{Z_1, Z_2, \ldots, Z_C\}$ of $X = \{X_1, X_2, \ldots, X_C\}$ is computed. These activations are used to determine whether to add more layers to the net. This is decided by calculating the trace ratio $TR = \mathrm{tr}(S_b)/\mathrm{tr}(S_w)$, where $S_b$ and $S_w$ are the between-class and within-class scatter matrices of the activations $Z$. If $TR$ is greater than the previous TR (PTR), the addition of the current block of layers has added discriminative potential to the network. This criterion ensures that the features generated by DeepFKTNet have large inter-class variation and small intra-class scatter. To add another CONV block, steps 3–10 are repeated with $Z$. To reduce the size of the feature maps for computational efficiency, pooling layers are added after the first and second CONV blocks. As the kernels and their number are determined from the fingerprint images, each layer can have a different number of filters.
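Algorithm 1: Design of the main DeepFKTNet Architecture |
Input: The set $X = \{X_1, X_2, \ldots, X_C\}$, where $X_i = \{RF_j,\ j = 1, 2, 3, \ldots, n_i\}$ is the set of representative fingerprints of the $i$th class. |
Output: The main DeepFKTNet Architecture. |
Step 1: | Initialize DeepFKTNet with the input layer and set w = 7, h = 7, and d = 1 for the first layer; m (the number of CONV blocks added) = 0; PTR (previous TR) = 0. |
Step 2: | For i = 1, 2, 3, …, C |
 | Compute $Z_i = \{z_j\}$, where $z_j = RF_j$, for each $RF_j \in X_i$ |
Step 3: | For i = 1, 2, 3, …, C |
 | $A_i = \emptyset$ |
 | For each $z_j \in Z_i$ |
 | Extract patches of size w × h with stride 1 from $z_j$, vectorize them into vectors of dimension $whd$, and append them to $A_i$. |
Step 4: | Using $A = [A_1, A_2, \ldots, A_C]$, compute |
 | - the between-class scatter matrix $S_b = \sum_{i=1}^{C} \frac{1}{p_i} A_i \mathbf{1}_{p_i} A_i^{T} - \frac{1}{p} A \mathbf{1}_{p} A^{T}$, where $\mathbf{1}_{k}$ is a $k \times k$ matrix with all ones, $p_i$ is the number of patch vectors (columns) in $A_i$, and $p = \sum_i p_i$. |
 | - the within-class scatter matrix $S_w = \sum_{i=1}^{C} A_i \left(I_{p_i} - \frac{1}{p_i} \mathbf{1}_{p_i}\right) A_i^{T}$ |
Step 5: | Diagonalize the sum $S_b + S_w = Q D Q^{T}$ and transform the scatter matrices |
 | using the transform matrix $P = Q D^{-1/2}$, i.e., $\tilde{S}_b = P^{T} S_b P$, $\tilde{S}_w = P^{T} S_w P$ |
Step 6: | Compute the eigenvectors $u_k$ of $\tilde{S}_b$ such that $\tilde{S}_b u_k = \lambda_k u_k$ |
Step 7: | For each eigenvector $u_k$ |
 | - Reshape $u_k$ to a filter $f_k$ of size $w \times h \times d$ |
 | - Compute $Y_k = \{y_j\}$, where $y_j = f_k * z_j$ for each $z_j \in Z_i$, $i = 1, 2, \ldots, C$ |
 | - Compute the between-class scatter matrix $SF_b$ and the within-class scatter matrix $SF_w$ from $Y_k$. |
 | - Compute the trace ratio $TR_k = \mathrm{tr}(SF_b)/\mathrm{tr}(SF_w)$ |
Step 8: | Select the L filters corresponding to the largest $TR_k$ (as shown in Figure 2 for layer 1). |
Step 9: | Add the CONV block to DeepFKTNet with the filters $f_k$, $k = 1, 2, \ldots, L$. Update m = m + 1. |
Step 10: | If m = 1 or 2, add a max pool layer with a pooling operation of size 2 × 2 and stride 2 to DeepFKTNet. |
Step 11: | Compute $Z_i = \{z_j\}$, where $z_j$ = DeepFKTNet($RF_j$), for each $RF_j \in X_i$ |
Step 12: | Using $Z = \{Z_1, Z_2, \ldots, Z_C\}$, compute the ratio $TR = \mathrm{tr}(S_b)/\mathrm{tr}(S_w)$ |
 | If $TR > PTR$, set PTR = TR, w = 3, h = 3, d = L and go to Step 3; otherwise stop. |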
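Steps 4–6 of Algorithm 1 can be sketched in NumPy as follows. The sketch assumes that the vectorized patches of each class are stored as the rows of an array, discards near-zero eigenvalues of $S_b + S_w$ before whitening, and maps the eigenvectors back to patch space through $P$ before reshaping them into filters; the tolerance, the reshaping order, and the function names are assumptions made for illustration rather than details of the proposed method.

```python
import numpy as np

def scatter_matrices(class_patches):
    """class_patches: list of (p_i x D) arrays, one per class, rows = patch vectors."""
    all_p = np.vstack(class_patches)
    mu = all_p.mean(axis=0)
    dim = all_p.shape[1]
    Sb = np.zeros((dim, dim))
    Sw = np.zeros((dim, dim))
    for A in class_patches:
        mu_i = A.mean(axis=0)
        diff = (mu_i - mu)[:, None]
        Sb += len(A) * (diff @ diff.T)      # between-class scatter
        centered = A - mu_i
        Sw += centered.T @ centered          # within-class scatter
    return Sb, Sw

def fkt_directions(Sb, Sw, tol=1e-10):
    """Whiten Sb + Sw, then eigendecompose the transformed Sb (no inverse of Sw)."""
    evals, Q = np.linalg.eigh(Sb + Sw)
    keep = evals > tol * evals.max()         # drop the null space of Sb + Sw
    P = Q[:, keep] / np.sqrt(evals[keep])    # P = Q D^{-1/2}
    Sb_t = P.T @ Sb @ P
    Sb_t = (Sb_t + Sb_t.T) / 2               # remove rounding asymmetry
    lam, U = np.linalg.eigh(Sb_t)
    order = np.argsort(lam)[::-1]            # most discriminative first
    return lam[order], U[:, order], P

def fkt_filters(class_patches, w, h, d, n_filters):
    """Return n_filters unit-norm filters of shape (h, w, d)."""
    Sb, Sw = scatter_matrices(class_patches)
    lam, U, P = fkt_directions(Sb, Sw)
    V = P @ U[:, :n_filters]                 # back to the w*h*d patch space
    V = V / np.linalg.norm(V, axis=0)
    return V.T.reshape(-1, h, w, d)
```

Because the transformed scatter matrices sum to the identity, the eigenvalues returned by `fkt_directions` lie in $[0, 1]$, and values close to 1 indicate directions dominated by between-class scatter.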
Figure 2.
Selection of best filters for layer1 of DeepFKTNet model for FingerPass dataset.
It is to be noted that the eigenvectors $u_k$, which are used to specify the kernels of a CONV layer, have the maximum trace ratio and capture most of the variability in the input fingerprint images without redundancy, in the form of independent features. The depth of a CNN model (the number of layers) and the number of kernels in each layer are important factors that determine the model complexity. Step 7 of Algorithm 1 determines the best kernels, which ensure the preservation of the maximum energy of the input image, and step 8 initializes these kernels to be suitable for the fingerprint domain. The selected kernels extract the features from fingerprint images so that the variability of the structures in fingerprint images is maximally preserved. It is also important that the features are discriminative (i.e., have large inter-class variance and small intra-class scatter as we go deeper in the network). This is ensured using the trace ratio $TR$: the larger the value of the trace ratio, the larger the inter-class variance and the smaller the intra-class scatter [57]. Steps 11–12 of Algorithm 1 allow adding CONV layers as long as $TR$ is increasing and determine the data-dependent depth of DeepFKTNet, as shown in Figure 2.
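The depth criterion can be sketched as follows. The sketch uses the fact that the traces of the scatter matrices equal sums of squared distances, so the matrices themselves never need to be formed; it assumes that the activations of each class are flattened to row vectors. In `blocks_added`, the block whose trace ratio fails to improve is still counted because Algorithm 1 adds the block before the check; this reading and the function names are assumptions of the sketch.

```python
import numpy as np

def trace_ratio(class_feats):
    """tr(S_b) / tr(S_w) for a list of (n_i x D) arrays, one per class."""
    all_f = np.vstack(class_feats)
    mu = all_f.mean(axis=0)
    # tr(S_b) = sum_i n_i * ||mu_i - mu||^2
    tr_b = sum(len(F) * np.sum((F.mean(axis=0) - mu) ** 2) for F in class_feats)
    # tr(S_w) = sum_i sum_{x in class i} ||x - mu_i||^2
    tr_w = sum(np.sum((F - F.mean(axis=0)) ** 2) for F in class_feats)
    return tr_b / tr_w

def blocks_added(tr_after_each_block):
    """Number of CONV blocks added when TR is checked after every new block."""
    ptr, m = 0.0, 0
    for tr in tr_after_each_block:
        m += 1             # the block is added before its TR is checked
        if tr <= ptr:      # stop once TR no longer increases
            break
        ptr = tr
    return m
```

For example, with hypothetical trace ratios `[0.5, 0.9, 1.4, 1.3, 2.0]` measured after successive blocks, growth stops at the fourth block, since 1.3 does not exceed the previous value 1.4.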
2.3. Addition of Global Pool and Softmax Layers
The activation of the last CONV block has dimension $h \times w \times L$; if it is flattened and fed to FC layers, the number of parameters is huge, which leads to overfitting. To reduce the number of parameters and the spatial dimensions of the last CONV block activation, we feed it to global average pooling (GAP) and global max-pooling (GMP) layers [58]. GAP averages all the $hw$ values of each feature map, whereas GMP takes into account only the contributions of the neurons of maximum response; the number of neurons feeding the FC layer is $h \times w \times L$, and it is reduced to $1 \times 1 \times L$ when only GMP or GAP is introduced. We concatenate the outputs of the GMP and GAP layers to overcome the shortcomings of each and then feed the result to the FC layer, followed by the SoftMax layer.
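The effect of the two global pooling layers can be illustrated with a short NumPy sketch, assuming a single activation stored in height × width × channel order; the function name and the concatenation order (GMP first, then GAP) are illustrative assumptions.

```python
import numpy as np

def gmp_gap_features(activation):
    """Map an (h, w, L) activation to a (2L,) vector: [GMP | GAP]."""
    gmp = activation.max(axis=(0, 1))    # strongest response per feature map
    gap = activation.mean(axis=(0, 1))   # average of the h*w values per feature map
    return np.concatenate([gmp, gap])

# Toy activation with h = w = 2 and L = 3 feature maps.
act = np.arange(12, dtype=float).reshape(2, 2, 3)
features = gmp_gap_features(act)         # 2L = 6 values instead of h*w*L = 12
```

The FC layer then receives $2L$ inputs rather than $h \times w \times L$, independently of the spatial size of the last CONV block.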
2.4. Fine-Tuning the Model
The DeepFKTNet model is evaluated using the challenging multi-sensor FingerPass dataset [59], and it is compared to the well-known deep models ResNet [50] and DenseNet [55], pre-trained on the ImageNet dataset and fine-tuned using the same dataset as DeepFKTNet. For further validation, we evaluated our method using the challenging FVC2004 dataset [60] and compared it to the state-of-the-art methods. For each dataset, we select the most representative fingerprint images from the training set using K-medoids and the LGDBP descriptor and then build its adaptive DeepFKTNet architecture using Algorithm 1.
2.4.1. Datasets and the Adaptive Architectures
To verify the performance of the DeepFKTNet model on benchmark datasets, we used the FingerPass and FVC2004 datasets. FingerPass is a multi-sensor dataset; it was collected using nine different optical and capacitive sensors and two interaction types, i.e., press and sweep. Its fingerprints are separated into nine subsets based on the sensors; each subset contains 12 impressions of 8 fingers from 90 persons.
The FVC2004 dataset contains noisy images acquired by live scan devices. It has 4 sets: DB1 collected using the optical V300 sensor, DB2 collected using the optical U 4000 sensor, DB3 collected using a thermal sweeping sensor, and DB4, a synthetic fingerprint dataset. Each set contains 880 fingerprint images [60]. We categorized the FVC2004 fingerprints into four categories: arch, left loop, right loop, and whorl. We merged the 4 sets of FVC2004 into one set of four classes, which makes it a multi-sensor fingerprint dataset.
The hyperparameter optimization framework Optuna [61] is used to select the best hyperparameters for fine-tuning each DeepFKTNet model. Using Algorithm 1, the DeepFKTNet architecture obtained for the FVC2004 dataset consists of 5 CONV blocks, as shown in Figure 3a, whereas the architecture constructed for the FingerPass dataset has 11 blocks, as depicted in Figure 3b. The number of filters for each CONV block and the depth of each model for each fingerprint dataset are determined using Algorithm 1. Using Optuna, we tuned the hyperparameters over three optimizers (Adam, SGD, and RMSprop), learning rates between $1 \times 10^{-1}$ and $1 \times 10^{-5}$, patch sizes (5, 10, 15, 20, 30, 50), activation functions (ReLU, LReLU, and Sigmoid), and dropout rates between 0.25 and 0.50. After training for 10 epochs, the best hyperparameters for each dataset are shown in Table 1.
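The search space described above can be written down explicitly. The sketch below is a dependency-free stand-in that samples the space with plain random search from the Python standard library; it does not use Optuna or reproduce its sampler. The log-uniform sampling of the learning rate, the dictionary keys, and the `objective` callback (assumed to train for a few epochs and return a validation accuracy) are assumptions made for illustration.

```python
import math
import random

# Categorical choices taken from the text; key names are hypothetical.
SEARCH_SPACE = {
    "optimizer": ["Adam", "SGD", "RMSprop"],
    "patch_size": [5, 10, 15, 20, 30, 50],
    "activation": ["ReLU", "LReLU", "Sigmoid"],
}
LR_RANGE = (1e-5, 1e-1)
DROPOUT_RANGE = (0.25, 0.50)

def sample_config(rng):
    """Draw one configuration from the search space."""
    cfg = {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}
    lo, hi = LR_RANGE
    # log-uniform sampling (assumption): equal weight to every order of magnitude
    cfg["learning_rate"] = 10 ** rng.uniform(math.log10(lo), math.log10(hi))
    cfg["dropout"] = rng.uniform(*DROPOUT_RANGE)
    return cfg

def random_search(objective, n_trials=20, seed=0):
    """Return the best configuration and score found in n_trials random draws."""
    rng = random.Random(seed)
    best_cfg, best_score = None, -math.inf
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

In practice, `objective` would build the DeepFKTNet model for the given configuration, train it briefly, and return the validation accuracy.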
2.4.2. Evaluation Procedure
For evaluation, we manually separated the FingerPass dataset into four classes (arch, left loop, right loop, and whorl). We divided the FingerPass dataset into three sets (80% training, 10% validation, and 10% testing) using two different scenarios. In scenario-1, the fingers from each sensor were divided into training, validation, and test sets. In scenario-2, fingers in the training, validation, and test sets are from different sensors.
For the FVC2004 dataset, we divided the dataset into training (80%), validation (10%), and testing (10%) sets, keeping the class balance. For performance evaluation, we used four commonly used metrics: accuracy (ACC), true positive rate (TPR), true negative rate (TNR), and Kappa [62,63,64,65]. The overall average of the metrics has been computed. The metrics [66,67] used to evaluate the proposed system are:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN},$$

$$TPR = \frac{TP}{TP + FN},$$

$$TNR = \frac{TN}{TN + FP},$$

$$Kappa = \frac{P_0 - P_e}{1 - P_e},$$

where $TP$, $TN$, $FP$, and $FN$ are the numbers of true positives, true negatives, false positives, and false negatives; $P_0$ and $P_e$ are calculated from the confusion matrix; the details are given in [68]. To compute $TP$, $TN$, $FP$, and $FN$, each class, in turn, is taken as positive, the other classes are assumed to be negative, and the $TPR$ and $TNR$ are calculated. Finally, the mean $TPR$ and $TNR$ are calculated by averaging $TPR$ and $TNR$ over all classes. In the results, the mean $TPR$ and $TNR$ are reported.
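The one-vs-rest computation of these metrics from a confusion matrix can be sketched in plain Python. The sketch assumes that rows index the true class and columns the predicted class, that $P_0$ is the observed agreement and $P_e$ the chance agreement computed from the row and column totals, and that ACC is averaged over classes in the same way as TPR and TNR; the function name is hypothetical, and classes without samples are not handled.

```python
def classification_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(n_classes)) for j in range(n_classes)]

    acc, tpr, tnr = [], [], []
    for c in range(n_classes):           # class c is positive, the rest negative
        tp = cm[c][c]
        fn = row_sums[c] - tp
        fp = col_sums[c] - tp
        tn = total - tp - fn - fp
        acc.append((tp + tn) / total)
        tpr.append(tp / (tp + fn))
        tnr.append(tn / (tn + fp))

    p0 = sum(cm[c][c] for c in range(n_classes)) / total                       # observed agreement
    pe = sum(row_sums[c] * col_sums[c] for c in range(n_classes)) / total ** 2  # chance agreement
    return {
        "ACC": sum(acc) / n_classes,
        "TPR": sum(tpr) / n_classes,
        "TNR": sum(tnr) / n_classes,
        "Kappa": (p0 - pe) / (1 - pe),
    }

# Toy two-class confusion matrix (illustrative numbers only).
metrics = classification_metrics([[8, 2], [1, 9]])
```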
3. Experimental Results
This section presents the experimental results of the DeepFKTNet models designed for the two datasets.
We designed the DeepFKTNet model for each dataset and fine-tuned it using the training sets. We validated its performance on the FingerPass and FVC2004 datasets and compared it with the widely used CNN models ResNet [50] and DenseNet [55], which were pre-trained on the ImageNet dataset and fine-tuned on the same training set that was used for the DeepFKTNet model. In the rest of the paper, we refer to the DeepFKTNet models designed for the FingerPass and FVC2004 datasets as DeepFKTNet-11 and DeepFKTNet-5, respectively.
The results of the three models DeepFKTNet-11, ResNet152, and DenseNet121 for scenario-1 are shown in Figure 4a and Table 2a. The DeepFKTNet-11 model generated adaptively on the FingerPass dataset outperforms the state-of-the-art ResNet152 and DenseNet121 models in terms of all metrics. Though DenseNet121 is not better than DeepFKTNet-11, it outperforms ResNet152 in terms of all metrics. Figure 4b and Table 2b show the results for scenario-2 on the FingerPass dataset. In this scenario, the results obtained with DeepFKTNet-11 are similar to those obtained in scenario-1, and DeepFKTNet-11 again outperforms ResNet152 and DenseNet121. Figure 5 illustrates the confusion matrices for both scenarios. These give insights into the system performance for different classes.
The DeepFKTNet-5 model was adaptively designed for the challenging FVC2004 dataset; it was evaluated using the above evaluation procedure. We fine-tuned the developed DeepFKTNet-5 model and the pre-trained models ResNet152 and DenseNet121 using the same dataset. The results are shown in Figure 6; the DeepFKTNet-5 model outperforms the state-of-the-art ResNet152 and DenseNet121 models in terms of all metrics. Figure 7 illustrates the confusion matrices for the FVC2004 dataset. These give insights into the system performance for different classes.
4. Discussions
We addressed the multi-sensor fingerprint classification problem and proposed a novel method for automatically generating a custom-designed DeepFKTNet model from the target fingerprint dataset. The number of layers and the number of filters in each layer are not specified arbitrarily; they are determined from the most representative fingerprints, which are selected from the fingerprint datasets using the K-medoids clustering algorithm and the LGDBP descriptor. The generated DeepFKTNet models are shallower than the state-of-the-art models, robust, involve a small number of learnable parameters, and are suitable for fingerprint classification.
The results of the DeepFKTNet models on the FingerPass and FVC2004 datasets (Figure 4 and Figure 6) indicate that they outperform the well-known deep models ResNet152 and DenseNet121, which were pre-trained on the ImageNet dataset and fine-tuned using the same fingerprint datasets. The architecture of a DeepFKTNet model is drawn directly from the dataset; the internal structures of the data determine its design. For this reason, the DeepFKTNet model has a compact size and yields better classification results. Further, it involves a small number of learnable parameters (see Table 4), which is comparable with the number of training examples; if the number of learnable parameters is huge compared to the number of training examples, the overfitting problem cannot be avoided. The training and testing accuracies shown in Table 3 indicate that the models do not suffer from overfitting. In addition, DeepFKTNet models are trained using only the available training data; unlike ResNet152 and DenseNet121, they do not need pre-training.
The space complexity of a CNN model is measured in terms of the number of learnable parameters, whereas the number of FLOPs determines its time complexity. Table 4 gives the statistics of the space and time complexities of the models. Overall, the DeepFKTNet model achieved competitive performance with fewer layers and parameters. The DeepFKTNet models designed for the two datasets have a small number of parameters, in the thousands against the millions in the ResNet152 and DenseNet121 models. DeepFKTNet-5 and DeepFKTNet-11 have fewer FLOPs than ResNet152 and DenseNet121 and better performance. DeepFKTNet-11 is more complex than DeepFKTNet-5 because the FingerPass dataset involves a larger number of sensors than the FVC2004 dataset and therefore a greater variety of patterns, and a richer structure is needed to encode the discriminative patterns.
Further, to investigate which features the DeepFKTNet models focus on for decision making, we employed GradCam [69]. Figure 8 shows some heat maps generated with GradCam for DeepFKTNet-11. The fingerprint images from the class arch and their GradCam visualizations are shown in Figure 8a,b, and the fingerprint images from the class left loop and their GradCam visualizations are shown in Figure 8c,d. Figure 8e,f depicts fingerprint images from the class right loop and their GradCam visualizations, whereas Figure 8g,h shows fingerprint images from the class whorl and their GradCam visualizations. The visual analysis of the decision-making process of DeepFKTNet shows that it concentrates on the discriminative regions of fingerprints and extracts class-discriminative features.
For a fair comparison, DeepFKTNet-5 has been compared with the state-of-the-art fingerprint classification methods that were validated on the public benchmark FVC2004 dataset; the comparison results are given in Table 5.
The DeepFKTNet-5 model outperforms the state-of-the-art methods (handcrafted and CNN methods) that were evaluated on the full dataset in terms of accuracy. The method of Jeon et al. [70], despite being a complex ensemble of CNN models, achieved an accuracy of 97.2%, which is less than that of DeepFKTNet-5. Zia et al. [33] employed B-DCNNs with five convolution layers and two FC layers (with 1024 and 512 neurons) for fingerprint classification and validated them on the FVC2004 dataset; their accuracy is lower than that of DeepFKTNet-5 (95.3% vs. 98.89%). Their model is also more complex; it has more FLOPs (0.65 G vs. 0.5 G) and more learnable parameters (38.66 M vs. 58.456 k). Nguyen et al. [34] employed a two-stage CNN model for enhancement followed by training and prediction. They used the LBCNN [71] method in the first stage, which has 0.352 M learnable parameters, and then employed a three-ternary model for training and prediction. They obtained an accuracy of 96.1% on FVC2004 (three classes), which is less than that of DeepFKTNet-5. Nahar et al. [35] used a modified LeNet-5 model for fingerprint classification; they obtained 99.1% accuracy, but only on a subset (DB1) of FVC2004, whereas the DeepFKTNet-5 model was evaluated on the combined multi-sensor dataset of the four sets (DB1, DB2, DB3, and DB4) of FVC2004. Also, the modified LeNet-5 has a higher complexity: 19.25 M parameters and 1.42 G FLOPs vs. 58.456 k parameters and 0.5 G FLOPs for DeepFKTNet-5. The reason for the better performance and lower complexity of DeepFKTNet-5 is that it is custom-designed, keeping in view the internal discriminative structures of fingerprints.
Table 5.
Comparison between DeepFKTNet-5 and the state-of-the-art methods.
Paper | Method | ACC (%) | SE (%) | SP (%) | Kappa (%) |
---|---|---|---|---|---|
Gupta et al. [72] 2015 | Singular point | 97.80 | - | - | - |
Darlow et al. [73] 2017 | Minutiae and DL | 94.55 | - | - | - |
Andono et al. [74] 2018 | Bag-of-Visual-Words | 90 | - | - | - |
Saeed et al. [19] 2018 | Statistics of D-SIFT descriptor | 97.40 | - | - | - |
Saeed et al. [18] 2018 | Modified HOG descriptor | 98.70 | - | - | - |
Jeon et al. [70] 2017 | Ensemble CNN model | 97.2 | - | - | - |
Zia et al. [33] 2019 | B-DCNNs | 95.3 | - | - | - |
Nguyen et al. [34] 2019 | CNN (tested on 3 classes of FVC2004) | 96.1 | - | - | - |
Nahar et al. [35] 2022 | Modified LeNet (tested on FVC2004-DB1) | 99.1 | - | - | - |
DeepFKTNet-5 | DeepFKTNet model | 98.89 | 95.46 | 99.18 | 96.82 |