#### Vein Specific Image Quality Assessment

In contrast to fingerprint recognition, where standardised quality metrics such as the NIST Fingerprint Image Quality (NFIQ) [53] and its successor NFIQ 2.0 exist, there are no standardised quality metrics for finger- and hand-vein recognition yet. Thus, the finger- and hand-vein images were first analysed using the GCF [54], a general image contrast metric that quantifies contrast independently of the actual image content. As we aim to quantify the quality of vein images specifically, two vein-specific NIR image quality metrics were included as well: the approach proposed by Wang et al. [55] (Wang17) and the approach proposed by Ma et al. [56] (HSNR). The first evaluates vein image quality by fusing a brightness-uniformity and a clarity criterion, the latter obtained by analysing local pixel neighbourhoods. The HSNR approach, which is tailored to non-contact finger vein recognition, simulates the human visual system by calculating an HSNR index and integrates an effective-area index, a finger-shifting index and a contrast index to arrive at the final image quality value.
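The published GCF combines contrast measured at several superpixel resolutions with empirically fitted level weights. The following sketch illustrates the multi-resolution principle only; it uses equal level weights and a simple 4-neighbour contrast, so it is an approximation of the idea rather than a reproduction of the metric in [54].

```python
import numpy as np

def mean_local_contrast(lum):
    """Average absolute luminance difference between horizontal/vertical neighbours."""
    dh = np.abs(lum[:, 1:] - lum[:, :-1])
    dv = np.abs(lum[1:, :] - lum[:-1, :])
    return (dh.sum() + dv.sum()) / (dh.size + dv.size)

def downsample2(lum):
    """Halve the resolution by averaging 2x2 pixel blocks (one 'superpixel' step)."""
    h, w = (lum.shape[0] // 2) * 2, (lum.shape[1] // 2) * 2
    l = lum[:h, :w]
    return (l[0::2, 0::2] + l[1::2, 0::2] + l[0::2, 1::2] + l[1::2, 1::2]) / 4.0

def gcf_sketch(img, levels=5):
    """Simplified global-contrast score of an 8-bit greyscale image.

    Equal weights per resolution level are an assumption; the original
    GCF uses empirically determined weights.
    """
    lum = (img.astype(float) / 255.0) ** 2.2   # rough perceptual luminance
    score = 0.0
    for _ in range(levels):
        if min(lum.shape) < 2:
            break
        score += mean_local_contrast(lum)
        lum = downsample2(lum)
    return score
```

A flat image yields a score of zero, while any image with local intensity variation (such as visible vein patterns against tissue) scores higher, which is what makes such a content-independent contrast measure usable across finger and hand vein images alike.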

#### Score Level Fusion

According to the ISO/IEC TR 24722:2015 standard [57], biometric fusion can be regarded as the combination of information from multiple sources, that is, sensors, characteristic types, algorithms, instances or presentations, in order to improve the overall system performance and to increase the system's robustness. Biometric fusion can be categorised according to the level of fusion and the origin of the input data. The different levels of fusion correspond to the components of a biometric recognition system—sensor-level, image-level, feature-level, score-level and decision-level fusion, as indicated in Figure 7. Sensor-level fusion, also called multisensorial fusion, describes the use of multiple sensors for capturing samples of one biometric instance [57]. This can be done either by the sensor itself or during the biometric processing chain. Hence, we perform sensor-level fusion, as our capturing device acquires finger as well as two different kinds of hand vein images. The actual fusion is done during the biometric processing chain at score level, fusing the output scores of the individual modalities (finger veins, hand veins 850 nm and hand veins 950 nm).
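As a minimal illustration of score-level fusion in general (not the BOSARIS procedure used in the experiments), the per-modality comparison scores can be normalised to a common range and combined by a weighted sum; all names and the uniform default weights here are illustrative:

```python
import numpy as np

def fuse_score_level(scores, weights=None):
    """Weighted-sum score-level fusion.

    scores: array of shape (n_modalities, n_comparisons), one row per
    modality (e.g. finger veins, hand veins 850 nm, hand veins 950 nm).
    """
    S = np.asarray(scores, dtype=float)
    if weights is None:
        weights = np.full(S.shape[0], 1.0 / S.shape[0])  # uniform by default
    # min-max normalise each modality so scores are comparable before fusing
    mins = S.min(axis=1, keepdims=True)
    maxs = S.max(axis=1, keepdims=True)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    return weights @ ((S - mins) / span)
```

The normalisation step matters because different matchers produce scores on different scales; without it, the modality with the largest raw score range would dominate the sum.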

The following combinations of different acquired modalities are evaluated:


Note that, for the combinations including finger veins, only one finger is included in the fusion. We evaluated these combinations for all fingers of the respective hand and used the best performing finger, which turned out to be the middle finger for both hands. Acquiring images of several distinct fingers takes more time, as only one finger is captured at a time; the same applies to acquiring both hands. Thus, we restricted the evaluated combinations to the above listed ones, which do not considerably increase the acquisition time. The actual score level fusion is performed using the BOSARIS toolkit [58], which provides a MATLAB based framework for calibrating, fusing and evaluating scores from binary classifiers and was originally developed for automatic speaker recognition. A 5-fold random split of training and test data with 20 runs was used to train and fuse the scores using BOSARIS. The reported performance results are the average values over the 20 individual runs.
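BOSARIS fuses scores via linear logistic regression trained on calibration data. As a rough Python analogue of that fusion rule (a sketch under stated assumptions: function names, the gradient-descent training schedule and the synthetic data layout are ours, not the toolkit's API), one could train a linear fuser on genuine/impostor score tuples like this:

```python
import numpy as np

def train_llr_fusion(gen, imp, lr=0.5, steps=3000):
    """Train a linear logistic-regression score fuser (BOSARIS-style sketch).

    gen, imp: arrays of shape (n_modalities, n_scores) holding the genuine
    and impostor comparison scores of each modality.
    """
    X = np.hstack([gen, imp]).T                    # (n_samples, n_modalities)
    X = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    y = np.concatenate([np.ones(gen.shape[1]), np.zeros(imp.shape[1])])
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))           # sigmoid of linear score
        w -= lr * X.T @ (p - y) / len(y)           # cross-entropy gradient step
    return w

def fuse_llr(w, scores):
    """Apply the trained fuser to per-modality scores of shape (n_mod, n)."""
    X = np.vstack([scores, np.ones((1, scores.shape[1]))]).T
    return X @ w
```

In a 5-fold random-split protocol as described above, the fuser weights would be trained on the training folds and applied to the held-out test fold, with the final figures averaged over the repeated runs.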

#### *2.4. Experimental Setup and Evaluation Protocol*

The evaluation is split into three parts—image quality assessment, baseline recognition performance evaluation for the individual subsets, and recognition performance evaluation of the fusion combinations. The image quality assessment and the baseline recognition performance evaluation are done separately for the finger vein dataset and the two hand vein datasets (850 nm and 950 nm illuminator). The three image quality assessment schemes are evaluated for each individual image per dataset. The reported results are the average values over the whole dataset, that is, there is a single value per image quality metric for each of the finger vein, hand vein 850 nm and hand vein 950 nm datasets. For the recognition performance, DET plots as well as the EER (the point where the FMR equals the FNMR), the FMR1000 (the lowest FNMR for FMR = 0.1%) and the ZeroFMR (the lowest FNMR for FMR = 0%) are provided. At first, the parameters for pre-processing and feature extraction are optimised on a training dataset. Each dataset is divided into two roughly equal-sized subsets based on the contained subjects, that is, all fingers/hands of the same person are in one subset. The best parameters are determined on each subset and then applied to the other subset for determining the comparison scores. This ensures a full separation of training and test set. The final results are based on the combined scores of both test runs. The FVC2004 [59] test protocol is applied for calculating the comparison scores in order to determine the FMR/FNMR: for the genuine scores, all possible genuine comparisons are evaluated, resulting in *n*<sub>gen</sub> = 5·(5−1)/2 · (42·6) = 2520 and *n*<sub>gen</sub> = 5·(5−1)/2 · (42·2) = 840 genuine scores for the finger and hand vein subsets, respectively. For the impostor scores, only the first template of a finger/hand is compared against the first template of all other fingers/hands, resulting in *n*<sub>imp</sub> = (42·6)·((42·6)−1)/2 = 31,626 impostor comparisons for the finger vein subset and *n*<sub>imp</sub> = (42·2)·((42·2)−1)/2 = 3486 impostor comparisons for the hand vein ones. The EER/FMR1000/ZeroFMR values are given in percentage terms, for example, 0.47 means 0.47%. The full results, including the image quality values for each single image, the comparison scores and plots as well as the settings and script files to reproduce the experiments, can be downloaded here: http://www.wavelab.at/sources/Kauba19c/.
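The comparison counts above follow directly from the pairing rules (all sample pairs per instance for genuine scores, first-template pairs across instances for impostor scores), and the EER can be read off the resulting score sets. A small sketch, assuming higher scores mean greater similarity:

```python
import numpy as np

def n_genuine(samples_per_instance, n_instances):
    """All sample pairs within each finger/hand: s·(s−1)/2 per instance."""
    return samples_per_instance * (samples_per_instance - 1) // 2 * n_instances

def n_impostor(n_instances):
    """First template of each instance against all others: n·(n−1)/2 pairs."""
    return n_instances * (n_instances - 1) // 2

def eer(genuine, impostor):
    """Equal error rate: operating point where FMR equals FNMR."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    best_gap, best_eer = np.inf, None
    for t in np.sort(np.concatenate([genuine, impostor])):
        fmr = np.mean(impostor >= t)    # impostors accepted at threshold t
        fnmr = np.mean(genuine < t)     # genuines rejected at threshold t
        if abs(fmr - fnmr) < best_gap:
            best_gap, best_eer = abs(fmr - fnmr), (fmr + fnmr) / 2
    return best_eer
```

With 5 samples per instance and 42 subjects contributing 6 fingers or 2 hands each, `n_genuine(5, 42*6)` and `n_impostor(42*6)` reproduce the 2520 and 31,626 comparisons stated above (and likewise 840 and 3486 for the hand vein subsets).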
