#### 5.2.1. Daugman's Decidability Index

Daugman's decidability index [75] is a widely used method for assessing the performance of iris recognition systems [3,36,75]. In an iris recognition system like OSIRIS, a binary phase code is derived for each presented iris image. Then, the fractional Hamming distance to the phase code of a reference iris image is computed. The distributions of these Hamming distances are compared between a set of matching and a set of non-matching iris image pairs from a test dataset. The larger the overlap between the distributions, the more likely recognition errors become. The decidability index (*d′*) measures the separation of these distributions by

$$d' = \frac{|\mu_E - \mu_I|}{\sqrt{\frac{1}{2}(\sigma_E^2 + \sigma_I^2)}},$$

where *μ<sub>E</sub>* and *μ<sub>I</sub>* are the means and *σ<sub>E</sub>* and *σ<sub>I</sub>* are the standard deviations of the two distributions. Larger values correspond to better discrimination. We follow this procedure using the *GC*<sup>2</sup> multi-modal biometric dataset and plot the histograms of the Hamming distances for the matching and the non-matching iris pairs in Figure 6. For visualization, normal distributions were fitted to the histograms.
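As a concrete illustration, *d′* can be computed directly from the two sets of Hamming distances. The sketch below uses synthetic, illustrative distributions; the means and standard deviations are not taken from our datasets:

```python
import numpy as np

def decidability_index(d_genuine, d_impostor):
    """Daugman's decidability index d' between the Hamming-distance
    distributions of matching (genuine) and non-matching (impostor) pairs."""
    mu_i, mu_e = np.mean(d_genuine), np.mean(d_impostor)
    var_i, var_e = np.var(d_genuine), np.var(d_impostor)
    # d' = |mu_E - mu_I| / sqrt((sigma_E^2 + sigma_I^2) / 2)
    return abs(mu_e - mu_i) / np.sqrt(0.5 * (var_e + var_i))

# Illustrative, well-separated distributions yield a large d'
rng = np.random.default_rng(0)
genuine = rng.normal(0.25, 0.05, 10_000)   # matching pairs
impostor = rng.normal(0.47, 0.03, 10_000)  # non-matching pairs
print(round(decidability_index(genuine, impostor), 2))
```

A larger gap between the two means, or tighter distributions, directly increases *d′*; overlapping distributions drive it toward zero.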

**Figure 6.** Normal distributions fitted to the normalized histograms of Hamming distances of matching (solid lines) and non-matching (dashed lines) iris pairs are shown for the three test image datasets.

We can now study the effect of quality filtering on the performance of the iris recognition system. In Figure 7, we show Daugman's decidability index as a function of the fraction of removed poor-quality images. DSMI, BRISQUE, and WAV1 image quality metrics were used for quality filtering. Filtering out low-quality iris images using the DSMI metric leads to the largest performance improvement in the REFLEX dataset, while quality filtering in the PHONE dataset leads only to small improvements. This could be due to the DSMI metric performing better in quality assessment on iris images in the REFLEX dataset or to the PHONE dataset posing a greater challenge to the reference iris recognition system. The Daugman index for the PHONE dataset is only 1.36, compared to 2.02 and 1.90 for REFLEX and LFC, respectively (see Figure 6).

**Figure 7.** Daugman's decidability index for all iris images, after filtering different parts of the iris images with the poorest quality using three image quality metrics on three test datasets.

From Daugman's decidability index values for the three test datasets, shown in Figure 7, we can conclude that filtering out the iris images with the poorest quality using the proposed DSMI metric improves the recognition accuracy of the reference iris recognition system. The BRISQUE metric also performs well on the REFLEX dataset, but it is not consistent for quality filtering on the LFC and PHONE datasets. WAV1 is not consistent for quality filtering on any of the three test datasets.
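The quality-filtering step itself is a simple rank-and-threshold operation. A minimal sketch, assuming one quality score per image where higher means better (the function name and score convention are illustrative, not DSMI's exact interface):

```python
import numpy as np

def filter_by_quality(scores, fraction_removed):
    """Return the indices of images kept after discarding the given
    fraction with the lowest quality scores (higher = better quality)."""
    n_remove = int(len(scores) * fraction_removed)
    order = np.argsort(scores)      # ascending: poorest quality first
    kept = order[n_remove:]
    return np.sort(kept)

quality = np.array([0.35, 0.80, 0.09, 0.62, 0.27, 0.91, 0.55, 0.44])
print(filter_by_quality(quality, 0.25))  # drop the poorest quarter (2 of 8)
```

Only the iris pairs whose images both survive this filter are then passed on to matching and to the performance evaluation.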

#### 5.2.2. Receiver Operating Characteristic Curve

The area under the curve (AUC) of the receiver operating characteristic (ROC) is a widely used performance metric for comparing the accuracy of iris recognition systems. The iris recognition system with the larger AUC is considered to be a more accurate system.

To visualize and measure the improvements of the performance of the reference iris recognition system by filtering out the poor quality iris images, the ROC curves were generated for each dataset by plotting the true positive rate against the false positive rate at various fractional Hamming distances (see Figure 8).
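The ROC construction can be sketched directly from the two Hamming-distance sets: a pair is accepted as a match whenever its distance falls below a threshold, and sweeping the threshold traces the curve. The distributions below are synthetic and illustrative:

```python
import numpy as np

def roc_curve(d_genuine, d_impostor, thresholds):
    """True/false positive rates when a pair is accepted as a match
    if its fractional Hamming distance falls below the threshold."""
    tpr = np.array([(d_genuine < t).mean() for t in thresholds])
    fpr = np.array([(d_impostor < t).mean() for t in thresholds])
    return fpr, tpr

rng = np.random.default_rng(0)
genuine = rng.normal(0.30, 0.08, 5_000)
impostor = rng.normal(0.47, 0.03, 5_000)
fpr, tpr = roc_curve(genuine, impostor, np.linspace(0, 1, 501))

# AUC via the trapezoidal rule over the swept operating points
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(round(auc, 3))
```

A perfect separator would reach an AUC of 1.0; heavily overlapping distributions push it toward 0.5.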

Figure 8 shows the ROC curves for the three test datasets with different quality filtering thresholds using our DSMI metric, BRISQUE, and WAV1 metrics. The solid red lines in Figure 8 show the performance of the reference iris recognition system without quality filtering. Without quality filtering, the corresponding AUC value for the REFLEX dataset is 0.9065, for the LFC dataset it is 0.8861, and for the PHONE dataset it is 0.8226. The AUC values show again that the PHONE dataset is the most challenging one for the reference iris recognition system.

**Figure 8.** The receiver operating characteristic (ROC) curves for the three test datasets (REFLEX, LFC, and PHONE) with different quality filtering thresholds using our DSMI metric, BRISQUE, and WAV1. The solid red, dashed blue, dot-dashed green, and dotted black lines were plotted without quality filtering, after filtering out one-quarter, half, and three-quarters of the poorest-quality images, respectively.

We also computed the AUC values after removing 1/4, 1/2, and 3/4 of the iris images with the poorest quality from each test dataset. The AUC values are listed in the figure legends for all of the test datasets. Using the proposed DSMI metric for quality filtering increased the AUC value in all test datasets.

In the REFLEX dataset, filtering out a quarter of the iris images with the poorest quality using the DSMI metric greatly improves the performance of the reference iris recognition system in terms of AUC by 0.0406 (4.5%). However, filtering out the second quarter only increases AUC by 0.0062 (0.65%). This indicates that the middle two quarters of the iris images have a small quality deviation, and filtering a part of these images does not result in a considerable improvement in the performance of the iris recognition system. However, filtering the third quarter of the iris images with the poorest quality improves the AUC significantly by 0.0336 (3.5%).

The performance improvements for the LFC dataset after filtering out the first, second, and third quarters of the iris images with the poorest quality using the DSMI metric are 0.0278 (3.1%), 0.0124 (1.4%), and 0.0104 (1.1%), respectively. The values for performance improvement on the PHONE dataset are 0.0049 (0.6%), 0.0127 (1.5%), and 0.0413 (4.9%). Filtering out the first quarter of the iris images with the poorest quality using the DSMI metric only slightly improves the AUC value, but filtering out three quarters of the iris images with the poorest quality improves the performance significantly by 7.2%. We visualized these performance improvements in Figure 9.

**Figure 9.** Area under the curve (AUC) values for all iris images after removing different parts of the iris images with the poorest quality.

The analysis of the AUC values shows that the performance of the reference iris recognition system has improved by quality filtering in all test datasets when using the DSMI metric for quality assessment. In contrast, BRISQUE is consistent for quality filtering for the REFLEX dataset, but not for the other two test datasets. WAV1 shows inconsistent performance in all test datasets.

The reason for this could be that the DSMI metric is optimized for assessing the image quality of iris images, whereas BRISQUE is optimized for the perceptual quality of natural images. Both, however, can assess image quality under a range of image distortions. The WAV1 metric is optimized for blur assessment. Since blur is common in iris images taken with handheld devices, we compare our method with the WAV1 metric. However, the iris images in the test datasets exhibit more complex authentic in-the-wild distortions, and these distortions degrade the performance of WAV1 in all test datasets.

#### 5.2.3. Equal Error Rate

The equal error rate (EER) is the error rate at the operating point where the false accept rate and the false reject rate are equal. The EER is used for comparing the accuracy of classification systems with different receiver operating characteristic (ROC) curves. With the EER approach, the system with the lowest EER is considered the most accurate.
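In practice, the EER is found by sweeping the decision threshold until the two error rates cross. A minimal sketch on synthetic, illustrative distance distributions (the crossing is located on a discrete threshold grid, so the result is approximate):

```python
import numpy as np

def equal_error_rate(d_genuine, d_impostor, thresholds):
    """Locate the threshold where the false accept rate (impostor pairs
    below threshold) is closest to the false reject rate (genuine pairs
    at or above threshold); return the error rate and threshold there."""
    far = np.array([(d_impostor < t).mean() for t in thresholds])
    frr = np.array([(d_genuine >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))       # closest crossing on the grid
    return (far[i] + frr[i]) / 2, thresholds[i]

rng = np.random.default_rng(0)
genuine = rng.normal(0.30, 0.08, 5_000)
impostor = rng.normal(0.47, 0.03, 5_000)
eer, t = equal_error_rate(genuine, impostor, np.linspace(0, 1, 1001))
print(round(eer, 3), round(t, 3))
```

Improving the separation of the two distributions, for example by quality filtering, lowers the EER at the crossing point.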

Table 3 lists the EER values obtained when the three image quality metrics were used to filter out poor-quality iris images from the test datasets. The greatest performance improvement is achieved by filtering out poor-quality iris images using the DSMI metric on the REFLEX dataset. The PHONE dataset is the most challenging dataset for the reference iris recognition system, resulting in the highest EER values.

The results confirm that rejecting poor-quality images using the proposed DSMI metric improves the iris recognition performance consistently, while this observation does not hold for BRISQUE and WAV1 metrics.

**Table 3.** The equal error rate (EER) values are calculated after filtering different parts of the iris images with the poorest quality from each test dataset. This table shows the EER values when all iris images are passed to the iris recognition system and after filtering out one quarter, half, and three quarters of the iris images with the poorest quality from the REFLEX, LFC, and PHONE datasets using the DSMI, BRISQUE, and WAV1 quality metrics.


In summary, for all of the test iris image datasets (REFLEX, LFC, PHONE) and all of the performance evaluation methods (Daugman's decidability index, AUC, EER), the performance of the reference iris recognition system (OSIRIS, Version 4.1) increased consistently when the iris images with the poorest quality were filtered out using the proposed DSMI quality metric. In contrast, for the other two image quality metrics (BRISQUE, WAV1), the experiments showed inconsistencies, i.e., removing more low-quality images did not always increase the performance of the reference iris recognition system.

Figure 10 shows some iris samples from the test datasets with poor quality scores predicted by the proposed DSMI metric. These samples will be filtered out when we remove a quarter of the iris images with the poorest quality from each test dataset. If we pass these samples to the reference iris recognition system, all of them will be falsely rejected. Thus, the proposed DSMI metric can be used to decide, based on the quality score, whether an input iris sample should be enrolled in a dataset or rejected and a new sample captured. Although our method is designed to consider only image covariates, some subject covariates, such as eyelid occlusion due to blinking, may also result in motion blur or other image quality distortions that can be measured by our proposed quality metric, as shown in Figure 10c. All iris samples shown in Figure 10 suffer from authentic image distortion and other quality degradation due to subject covariates.

Figure 11 shows some iris samples with DSMI scores that are higher than the threshold for filtering out one quarter of the iris samples with the poorest quality from each test dataset. Our proposed framework passes these images for iris segmentation and identification when only a quarter of the iris images with the poorest quality are filtered out from the test datasets. However, all of these samples will be falsely rejected by the reference iris recognition system. Some of these images have quality degradation related to subject covariates, such as eyelashes obscuring the iris or closed eyes.

DSMI scores for the iris samples in Figure 10: (**a**) 0.35, (**b**) 0.27, (**c**) 0.09, (**d**) 0.33, (**e**) 0.39, (**f**) 0.35.

**Figure 10.** The first row shows some iris samples from the multi-modal biometric dataset *GC*<sup>2</sup> [36], which are classified as low-quality samples by our DSMI metric. All of these samples would be falsely rejected with high dissimilarity scores (>0.47) by the reference iris recognition system. However, if we filter out a quarter of the iris images with the poorest quality from each test dataset, these samples will be removed and not passed to the iris recognition system. The second row shows the segmentation result of the segmentation module of the reference iris recognition system. The DSMI scores are listed below the iris samples.

**Figure 11.** The first row shows some iris samples from the multi-modal biometric dataset *GC*<sup>2</sup> [36], which are classified by our DSMI metric as iris samples of sufficient quality if only one quarter of the iris images with the poorest quality are filtered out. Therefore, these images are passed to the iris recognition pipeline for further processing. However, all of these samples would be falsely rejected by the reference iris recognition system with high dissimilarity values (>0.47). The second row shows the segmentation result of the segmentation module of the reference iris recognition system. The DSMI scores are listed below the iris samples.

The iris samples shown in Figure 11 have fewer image distortions compared to the samples shown in Figure 10. Therefore, our quality metric predicts higher quality scores for these iris images. Some of these images have quality degradations related to subject covariates, such as eyelashes obscuring the iris or closed eyes. If we filter out half of the iris samples with the poorest quality, these samples will be filtered out as well. However, setting a higher quality filtering threshold may cause some iris samples to be rejected unnecessarily.

#### *5.3. Computational Complexity*

It is straightforward to assess the computational complexity of the DSMI quality metric by checking the algorithmic steps outlined in Section 3.1 one by one. The result is a time complexity that is linear in the size of the input image. More precisely, it is *O*(*N* × *M* × *P*), where *N* × *M* is the image size in pixels and *P* is the number of points checked in the neighborhood of each pixel for deriving the sign and magnitude patterns.
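Since the per-pixel neighborhood comparisons dominate the cost, the *O*(*N* × *M* × *P*) bound can be illustrated with a generic sign/magnitude neighborhood scan. This is only a sketch of the complexity argument, not the exact DSMI steps of Section 3.1; the function and variable names are illustrative:

```python
import numpy as np

def sign_magnitude_patterns(img, offsets):
    """Generic O(N*M*P) scan: for every interior pixel, compare against
    P neighbors and record a sign bit and an absolute-difference magnitude."""
    n, m = img.shape
    p = len(offsets)
    signs = np.zeros((n, m, p), dtype=np.uint8)
    mags = np.zeros((n, m, p), dtype=np.float64)
    for i in range(1, n - 1):                       # N rows
        for j in range(1, m - 1):                   # M columns
            for k, (di, dj) in enumerate(offsets):  # P neighbors
                diff = float(img[i + di, j + dj]) - float(img[i, j])
                signs[i, j, k] = diff >= 0
                mags[i, j, k] = abs(diff)
    return signs, mags

img = np.arange(25, dtype=np.float64).reshape(5, 5)
offsets = [(-1, 0), (0, 1), (1, 0), (0, -1)]        # P = 4 neighbors
sgn, mag = sign_magnitude_patterns(img, offsets)
print(sgn.shape, mag[2, 2])
```

Each pixel does a constant amount of work per neighbor, so doubling the image area or the neighborhood size doubles the running time, matching the linear bound above.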

We also recorded the actual speed of the quality metric using our implementation, running on an MSI GP60 laptop with an Intel Core i7 processor and 16 GB of RAM, with MATLAB R2018b on Ubuntu 18.04.3 LTS. We computed the DSMI quality scores on four parts of the test datasets, each containing iris images of the same size in pixels, ranging from 596 × 397 up to 2036 × 1358 (see Table 4). The table confirms the linear time complexity, amounting to roughly 0.06 × 10<sup>−6</sup> seconds per pixel. At that processing speed, a throughput of 66 frames per second (FPS) can be achieved at a resolution of 596 × 397. For the higher resolutions, 625 × 537, 1233 × 810, and 2036 × 1358, the speed is 40, 16, and 6 FPS, respectively. Therefore, the proposed method can be used to assess the quality of iris images in interactive applications, such as iris recognition systems based on handheld imaging devices.
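The reported frame rates can be roughly cross-checked from the per-pixel time. Because the measured per-pixel time varies slightly across resolutions, this arithmetic sketch only approximates the values in Table 4:

```python
# Throughput implied by roughly 0.06 microseconds per pixel
per_pixel = 0.06e-6  # seconds per pixel, approximate average
for w, h in [(596, 397), (625, 537), (1233, 810), (2036, 1358)]:
    fps = 1.0 / (per_pixel * w * h)  # one frame = w*h pixels
    print(f"{w}x{h}: {fps:.0f} FPS")
```

At the highest resolution the estimate matches the measured 6 FPS; at the lower resolutions the measured per-pixel times are slightly larger, so the measured FPS values fall somewhat below this estimate.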

**Table 4.** Comparison of the average running time (seconds) on four sets of iris images with different resolutions.

