1. Introduction
The use of fingerprints as a biometric identifier for person recognition is considered robust and reliable due to their unique, unalterable, and enduring characteristics over time [
1]. While other biometric techniques share similar attributes—such as facial and iris recognition for identification of individuals or palm print identification [
2] for classification of age and gender—fingerprints stand out as more accessible and cost-effective, particularly regarding scanning technology [
3]. Two primary applications of person recognition can be identified. The first, a straightforward and expedient one, validates whether an individual is who they claim to be, a process commonly referred to as identity verification; in this scenario, a fingerprint is compared to the corresponding record stored in a database (DB) through a one-to-one matching procedure. The second application, the primary focus of this study, addresses situations where an individual’s identity is unknown; in such cases, the fingerprint must be matched against the entire DB, requiring a one-to-many comparison. Recent advances in fingerprint localization have focused on improving accuracy through the use of novel similarity metrics [
4].
Due to the constant growth of the human population, large fingerprint databases are continuously generated, making it increasingly complex for identification systems to provide instant and precise results [
5]. To improve the response times of these identification systems, a preprocessing step—fingerprint classification—is employed to reduce the global search space. To address this challenge, convolutional neural networks (CNNs) can reach 100% accuracy, but at the expense of excessive computational costs [
6,
7]. Techniques such as the use of an orientation field (OF), support vector machines (SVMs), and random forests (RFs) have been reported to yield satisfactory results. In [
6], a threshold above 1 resulted in 100% accuracy, while a threshold of 0.6 achieved over 93% accuracy using the National Institute of Standards and Technology Special Database 4 (NIST-4), which contains 4000 samples. Nevertheless, this excellent performance was impacted by the high structural similarity of ridges in certain samples. As the threshold increases, classification confusion also rises [
6]. Despite the high accuracy achieved by CNNs in fingerprint classification, their application in real-world scenarios is often impractical due to their high computational demands and extensive training requirements [
8]. In situations where decisions must be made in real time or within strict time constraints, such as in medical service provision or border control systems, the inference time of CNNs can become a critical bottleneck. These applications require rapid processing and reliable identification without depending on specialized high-performance hardware. Consequently, alternative models that balance accuracy and computational efficiency are necessary to ensure practical implementation in large-scale fingerprint identification systems [
9]. The authors of [
7] evaluated SVM and RF performance using several databases, including DB-HLG, the fingerprint verification competition (FVC) datasets (2000, 2002, and 2004), and NIST-4. The SVM achieved an accuracy of ≥95.5% and a mean squared error (MSE) of ≤0.321, with computational times of 7, 9, and 5 h for the respective databases. The RF obtained an accuracy of ≥96.75%, an MSE of ≤0.274, and computational times of 8, 12, and 18 h for the same databases. As evidenced, machine learning offers significant advantages in fingerprint classification. However, these models require high-performance computational architectures, which often render their application impractical. To address these limitations, this work proposes the use of a multilayer extreme learning machine (M-ELM) for efficient fingerprint classification, with the primary objective of improving training times and computational costs.
The standard Extreme Learning Machine (ELM) is characterized by its rapid processing and low computational cost while achieving accuracy levels comparable to those of CNNs [
10,
11,
12]. An ELM with a single hidden layer was demonstrated in [
13], where a standard ELM was compared to two types of CNNs (a CaffeNet variant and the CNN proposed in [
10]) for fingerprint classification. The study found that when using Hong08 [
14] as a descriptor and the ELM as a classifier, an accuracy of 94% and a penetration rate of 0.0332 were achieved, compared to the 99% accuracy obtained by the CNNs.
ELM networks were proposed by Huang for the training of single-hidden-layer feedforward neural networks (SLFNs). In ELM, the hidden-node parameters are assigned randomly, and the network is then trained without iterative methods: the input weights and hidden-neuron biases are randomly fixed, while the output weights are determined using the Moore–Penrose pseudoinverse under the least-squares criterion [
15]. In ELM, the hidden nodes (or neurons) need not all be of the same type. The only parameters that must be learned are the connections (or weights) between the hidden layer and the output layer; consequently, ELM is formulated as a linear model. In comparison with traditional learning methods such as SVMs and CNNs, ELM is notably efficient and tends to achieve a globally optimal solution. Studies have shown that even with randomly generated hidden nodes, ELM retains the universal approximation capability of SLFNs [
16,
17].
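The training scheme just described can be summarized in a short NumPy sketch (our illustration only, assuming a sigmoid activation and one-hot class targets; the function names are hypothetical, not part of any library):

```python
import numpy as np

def elm_train(X, Y, n_hidden, seed=0):
    """Single-hidden-layer ELM: random input weights and biases,
    output weights solved by the Moore-Penrose pseudoinverse."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random, never tuned
    b = rng.standard_normal(n_hidden)                # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # hidden-layer output
    beta = np.linalg.pinv(H) @ Y                     # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                                  # class scores
```

For fingerprint classification, Y would hold one-hot rows over the five classes, and the argmax over the returned scores gives the predicted class.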
M-ELMs are networks formed by extreme learning machine autoencoders (ELM-AEs), proposed as a new multilayer perceptron (MLP) training scheme aimed at addressing the deficiencies of ELMs. Notably, a standard ELM does not perform well when processing natural signals such as sounds and images. In [
18], imaging tests were conducted, including car detection, gesture recognition, and incremental online tracking. For car recognition, the car dataset from the University of Illinois at Urbana–Champaign [
19,
20] was used in both the training and testing phases. Images with dimensions of 100 × 40 pixels achieved an accuracy of 95.5% with a training time of 46.7 s. In the second experiment, the Cambridge gestures dataset was used [
21], comprising 900 image sequences of nine hand gesture types, defined by combinations of three hand shapes and three movements. Each type contains 100 image sequences with dimensions of 60 × 80 × 10 pixels. In the first phase, static gestures were analyzed, while the second phase included movements. This experiment achieved an average test accuracy of 99.4%, with a training time of 57.7 s.
The M-ELM training architecture is structurally divided into two phases: a hierarchical representation of unsupervised features and a supervised feature classification phase. ELM-AEs are stacked to enable learning across multiple hidden layers (unsupervised), except for the final layer, which consists of a standard ELM (supervised) that performs classification [
22]. The M-ELM model was proposed to improve generalization capacity, which directly depends on the characteristics of the training dataset. By addressing the two primary issues of the original ELM—network stability and computational complexity—the M-ELM enhances generalization capacity and simplifies computations. Specifically, the output weights are calculated using the generalized inverse of the hidden-layer output and the system’s actual outputs [
15]. As mentioned earlier, this study introduces, for the first time, the use of an M-ELM as a fingerprint classifier. To achieve this, we optimize the hyperparameters through an exhaustive (brute-force) search to determine the optimal number of neurons in each hidden layer. The results are validated using five-fold cross-validation, employing metrics such as training time, accuracy, and penetration rate [
5,
23]. These metrics enable a fair comparison with the most recent studies, highlighting their relevance in the literature [
10,
13]. The main contributions of this research are the introduction of novel alternatives for fingerprint classification, combining the best descriptors presented in the literature with the M-ELM approach. This method demonstrates a 4% improvement in performance compared to approaches using commercial computers [
13], although it is accompanied by a slight decrease in the penetration rate (PR) of 0.0003. Additionally, the training time is significantly reduced by approximately 17 s compared to ELM-based approaches. These results emphasize the scalability of the proposed method, as the study evaluates the impact of varying the number of hidden neurons across different classifier configurations: one hidden layer for the standard ELM and two or three hidden layers for the M-ELM.
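To make the two-phase M-ELM architecture concrete, the following is a minimal NumPy sketch of stacked ELM autoencoders topped by a standard supervised ELM (a simplified illustration under assumed sigmoid activations and hypothetical function names, not the exact implementation evaluated in this work):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_ae(X, n_hidden, rng):
    """One ELM autoencoder: random hidden mapping of X, then output
    weights beta solved (via pseudoinverse) to reconstruct X itself.
    beta.T later serves as the fixed projection to the next layer."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = sigmoid(X @ W + b)
    return np.linalg.pinv(H) @ X          # beta: (n_hidden, n_features)

def melm_train(X, Y, layer_sizes, seed=0):
    """Phase 1: stack ELM-AEs (unsupervised feature learning).
    Phase 2: standard supervised ELM on the last representation.
    layer_sizes[:-1] are AE widths; layer_sizes[-1] is the width of
    the final supervised hidden layer."""
    rng = np.random.default_rng(seed)
    betas, Z = [], X
    for n in layer_sizes[:-1]:
        beta = elm_ae(Z, n, rng)
        betas.append(beta)
        Z = sigmoid(Z @ beta.T)           # new hidden representation
    W = rng.standard_normal((Z.shape[1], layer_sizes[-1]))
    b = rng.standard_normal(layer_sizes[-1])
    H = sigmoid(Z @ W + b)
    beta_out = np.linalg.pinv(H) @ Y      # supervised output weights
    return betas, W, b, beta_out

def melm_predict(X, betas, W, b, beta_out):
    Z = X
    for beta in betas:
        Z = sigmoid(Z @ beta.T)
    return sigmoid(Z @ W + b) @ beta_out
```

Only the per-layer beta matrices and the final output weights are learned; every other weight stays random, which is what keeps training times on the order of seconds.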
The remainder of this article is structured as follows:
Section 2 provides a review of the state of the art, summarizing the most significant contributions from the past decade that are directly relevant to this study, focusing on the two dominant trends: CNNs for image-based classification and ELMs for descriptor-based classification.
Section 3 discusses the descriptors, along with the theoretical and mathematical foundations employed for feature extraction and fingerprint classification using multilayer ELMs.
Section 4 describes the validation process and performance metrics applied in this study, along with details about the database, including the quality and quantity of fingerprints.
Section 5 presents the heuristic optimization process, considering different hidden-layer counts for the multilayer ELM algorithm (one, two, and three hidden layers) using various descriptors and databases. It also includes tables summarizing the results, offering a comparison of classification performance and computational cost of the proposed approach relative to the most recent high-performing studies. Finally,
Section 6 concludes the manuscript by summarizing the findings and outlining potential directions for future research.
4. Methods and Materials
To evaluate performance, the five-fold cross-validation (5-FCV) scheme is used [
49], ensuring unbiased and precise classification metrics [
10,
13]. This is because the validation and training sets are not static but are drawn from five parts of the DB: 20% is taken for validation, while the remaining four parts are used for training. For each database, feature descriptor, and ELM approach, the overall results are reported as an average over five runs. Note that the validation set is intended to find the ELM hyperparameters that maximize classification ability, and only the number of hidden neurons (
N) is considered for heuristic optimization, for simplicity and performance purposes (see next section). Two metrics are reported: accuracy and the absolute error of the PR. Accuracy is chosen, as it is a widely accepted metric for artificial intelligence models and aligns with previous studies used for comparison [
10,
13,
29,
50,
51]. PR, on the other hand, is particularly relevant for addressing unbalanced datasets, making it a standard metric in fingerprint analysis [
10,
13]. The absolute error of the PR allows for straightforward identification of the superior model, with values approaching zero indicating a penetration rate closer to the target. Given the naturally unbalanced class distribution of the database, the target PR for this study is 0.2948 [
13].
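The target value of 0.2948 can be reproduced from the natural frequencies of the five fingerprint classes (the specific priors below are the values commonly cited in the fingerprint literature, an assumption on our part rather than figures taken from [13]): under perfect classification, a query of class i searches only the fraction p_i of the database, so the expected PR is the sum of the squared class priors.

```python
# Widely cited natural frequencies of the five fingerprint classes
# (assumed illustrative values, not measured from this study's DB):
priors = {
    "left loop": 0.338,
    "right loop": 0.317,
    "whorl": 0.279,
    "arch": 0.037,
    "tented arch": 0.029,
}
# Ideal classifier: a class-i query searches only the fraction p_i of
# the DB, so the expected penetration rate is sum_i p_i^2.
expected_pr = sum(p * p for p in priors.values())
print(round(expected_pr, 4))  # 0.2948
```

Any misclassification forces secondary searches in other bins, which is why measured PR values lie above this ideal floor.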
In this study, SFINGE software [
52] was utilized to facilitate comparisons with significant and recent works reported in the state of the art [
10,
13]. In addition, this software generates a substantial database consisting of thousands of fingerprint samples, enabling conclusive observations regarding fingerprint-based individual classification [
10,
13,
51]. SFINGE software produces synthetic fingerprints of different quality levels (low, medium, and high) with a realistic class distribution (unbalanced classes). The synthetic fingerprints generated by SFINGE simulate those obtained by optical scanners, making them much closer to real systems, such as NIST-4 [
53]. Moreover, it is worth noting that the feasibility of adopting the SFINGE database has already been demonstrated in several editions of the FVC [
54,
55,
56,
57,
58], where it achieved results comparable to those of real fingerprint databases, following the natural class distribution. To emulate various scenarios, three quality profiles were generated using SFINGE: HQNoPert, Default, and VQandPert (see
Figure 4). The HQNoPert database consists of high-quality, undisturbed fingerprints. The Default database includes fingerprints of medium quality with slight perturbations in localization and rotation. The VQandPert database comprises fingerprints of varying quality levels, incorporating perturbations in location and rotation as well as geometric distortions. The primary distinction among the databases lies in the quality of the generated images. In total, 10,000 fingerprints were generated for each quality category, resulting in a comprehensive dataset of 30,000 fingerprints. The generation of 10,000 fingerprints per category ensures a fair and consistent basis for comparison with the studies reported in the state of the art [
10,
13].
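As a concrete illustration of the 5-FCV protocol applied to each 10,000-fingerprint database, the fold construction can be sketched as follows (a minimal example; the function name is hypothetical):

```python
import numpy as np

def five_fold_indices(n_samples, seed=0):
    """Shuffle sample indices and split them into five disjoint folds.
    Each fold serves once as the 20% validation set, while the other
    four folds (80%) form the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, 5)
    for k in range(5):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, val
```

Metrics (accuracy, PR error, training time) are then averaged over the five runs, as described above.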
6. Conclusions and Future Works
The continuous growth of the human population has led to a corresponding increase in the size of fingerprint databases, resulting in billions of records and significantly slowing down fingerprint recognition processes. In this context, fingerprint classification serves as an effective preprocessing step to reduce the response times of biometric fingerprint systems. It is worth noting that fingerprint classification is a natural process, given that fingerprints can be categorized into five distinct classes. CNNs are currently the most accurate tools for fingerprint classification, achieving nearly 100% accuracy. However, their high computational cost and reliance on high-performance hardware make them impractical for widespread use. This manuscript introduces an alternative approach for fingerprint classification aimed at person recognition. The proposed method achieves competitive accuracy while minimizing computational costs, employing two- and three-hidden-layer ELMs, both based on the M-ELM architecture. These models significantly reduce training time—on the order of seconds—compared to state-of-the-art approaches reported in the literature. Additionally, computational resource requirements are substantially lowered, enabling the use of standard commercial machines without the need for high-performance GPUs. The results demonstrate that the three-layer ELM achieves the best accuracy while maintaining low computational costs and minimal training times. However, a slight decrease in true-positive (TP) rates is observed compared to the literature methods. The approach proposed in this study outperforms the unbalanced ELM described by Zabala-Blanco et al. [
13], both in terms of performance and complexity. The comparison with the method proposed by Zabala-Blanco et al. [
13] is particularly relevant, as their method has shown excellent results on small- to medium-sized databases (1000 to 3000 samples).
In the literature, two commonly recognized databases for evaluating fingerprint verification algorithms are NIST [
6,
23,
27,
29,
34,
37,
38,
39] and FVC [
1,
7,
23,
27,
33,
54,
55,
56,
57,
58]. However, the NIST database presents significant limitations due to its balanced classes, making it unrepresentative of real-world data scenarios; therefore, its use can be disregarded. On the other hand, the FVC database could be considered in future work to validate our results, although it also poses challenges. Its main drawback is the limited number of samples (fewer than 800 per version), which reduces the robustness of experiments, and it does not address a five-class classification problem, an important aspect to consider when developing more complex verification models (to reduce the penetration rate).
Future work will focus on developing faster M-ELM approaches using numerical methods to optimize the computation of weights in the hidden and output layers of the ELM-AE. Heuristic optimization techniques will also be explored to fine-tune all hyperparameters of the M-ELM, further improving performance and reducing computational costs. Additionally, testing will be conducted on larger databases emulating real-world scenarios, such as the populations of countries like Chile (19 million), Spain (47 million), and Germany (83 million). Another promising avenue involves integrating the unbalanced learning approach of Zabala-Blanco et al. [
13] into the final layer of the M-ELM to maximize performance on unbalanced datasets. A comparative analysis of computational costs will also be carried out using consistent software and hardware configurations across different ELM models in order to provide conclusive evidence of the advantages of M-ELM in terms of efficiency and scalability.