Boosting the Performance of Deep Ear Recognition Systems Using Generative Adversarial Networks and Mean Class Activation Maps
Abstract
1. Introduction
- To mitigate the absence of color information and to enhance the visual quality of dark images before they are fed to a CNN, we introduced and trained a framework that employs a DCGAN model to colorize grayscale and dark ear images (a rough generator sketch follows this list).
- To improve the predictive ability of CNNs, we introduced a novel framework, termed Mean-CAM-CNN, which guides the CNN's focus toward the most salient common region containing the most pertinent features. The Mean-CAM procedure uses CAMs to delineate a region of interest (RoI) from images belonging to the same class, isolating the discriminative features that provide the representations essential for ear recognition.
- We extensively evaluated the proposed methods on two widely used and challenging ear recognition datasets (AMI and AWE), assessing performance through both graphical and statistical analyses.
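Since the colorization framework is described only at a high level here, the sketch below illustrates one way a DCGAN-style generator can map a one-channel grayscale ear image to a three-channel color output. The architecture, layer widths, and input size are our own assumptions, not the authors' implementation:

```python
import torch.nn as nn

class ColorizationGenerator(nn.Module):
    """Illustrative DCGAN-style generator: grayscale (1-channel) in, RGB out.
    All layer sizes are assumptions; the paper's exact design may differ."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                        # 128x128 -> 32x32
            nn.Conv2d(1, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.decoder = nn.Sequential(                        # 32x32 -> 128x128
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
            nn.Tanh(),                                       # DCGAN-style [-1, 1] output
        )

    def forward(self, gray):                                 # gray: (N, 1, H, W)
        return self.decoder(self.encoder(gray))              # rgb:  (N, 3, H, W)
```

In a full DCGAN setup, this generator would be trained adversarially against a discriminator that distinguishes real color ear images from colorized ones.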
2. Related Work
2.1. Handcrafted Methods
2.2. Deep Learning Methods
3. Proposed Approach
3.1. Preprocessing
3.2. Feature Extraction/Classification
3.2.1. CAM Methodology
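Following the CAM formulation of Zhou et al. [12], which this section builds on, the activation map for a class $c$ is the weighted sum of the feature maps $f_k(x, y)$ of the last convolutional layer, using the output-layer weights $w_k^c$:

$$M_c(x, y) = \sum_k w_k^c \, f_k(x, y)$$

Regions where $M_c(x, y)$ is large are those that contributed most to the prediction of class $c$.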
3.2.2. Proposed Mean-CAM Methodology
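The heading and the contribution list suggest that Mean-CAM averages the per-image CAMs of a class, $\bar{M}_c = \frac{1}{N_c} \sum_{i=1}^{N_c} M_c^{(i)}$, so that only regions salient across the whole class survive. A minimal NumPy sketch under that reading (not the authors' reference implementation):

```python
import numpy as np

def mean_cam(cams):
    """Average per-image CAMs (a list of HxW arrays) from one class into a
    single class-level map, rescaled to [0, 1]. A sketch of our reading of
    the Mean-CAM step, not the authors' reference implementation."""
    m = np.stack(cams, axis=0).mean(axis=0)
    m -= m.min()
    return m / (m.max() + 1e-8)  # normalize so a threshold in (0, 1) applies
```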
3.2.3. Mask Inference
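Experiment #2 (Section 4.5) sweeps a threshold τ from 0.1 to 0.9, which suggests that the binary mask is obtained by thresholding the normalized mean CAM at τ. The sketch below additionally assumes that the RoI is the bounding box of the retained region; both assumptions are ours:

```python
import numpy as np

def infer_mask(mean_cam, tau=0.5):
    """Binarize a [0, 1]-normalized mean CAM at threshold tau and return the
    mask plus the bounding box of the retained region (assumed to be the RoI)."""
    mask = mean_cam >= tau
    ys, xs = np.nonzero(mask)
    if ys.size == 0:  # nothing survives an overly strict threshold
        return mask, None
    return mask, (ys.min(), xs.min(), ys.max(), xs.max())  # top, left, bottom, right
```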
3.2.4. Global and Local Stages
4. Experimental Analysis
- Rank-1 and Rank-5 recognition rates.
- Cumulative match score curves (CMCs).
- Area under the CMC curve (AUCMC); a sketch computing all three metrics follows this list.
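All three metrics can be derived from a single matrix of per-class scores. The following implementation is ours, for illustration only, and treats AUCMC as the normalized area under the CMC curve:

```python
import numpy as np

def cmc_metrics(scores, labels):
    """scores: (N, C) per-class scores for N probes; labels: (N,) true ids.
    Returns rank-1 (%), rank-5 (%), the full CMC curve (%), and AUCMC (%)."""
    order = np.argsort(-scores, axis=1)                  # classes by descending score
    ranks = np.argmax(order == labels[:, None], axis=1)  # 0-based rank of true class
    cmc = np.array([(ranks <= r).mean() for r in range(scores.shape[1])]) * 100
    return cmc[0], cmc[4], cmc, cmc.mean()               # AUCMC = normalized area
```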
4.1. Datasets
4.1.1. AMI
4.1.2. AWE
4.2. Evaluation Protocols
- Normalization using the mean and standard deviation.
- Random rotation of the image between −20 and +20 degrees.
- Application of a Gaussian blur filter to the image.
- Adjustment of the hue, saturation, contrast, and brightness of the image within specified range values.
- Horizontal flipping of the image with a probability of 50% (a torchvision sketch of this pipeline follows the list).
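The augmentations listed above map directly onto standard torchvision transforms. The sketch below assumes torchvision; the kernel size, jitter ranges, and normalization statistics are illustrative values, since the text does not specify them:

```python
from torchvision import transforms

# Illustrative training pipeline for the augmentations listed above; the
# specific parameter values are our assumptions, not the paper's settings.
train_transforms = transforms.Compose([
    transforms.RandomRotation(degrees=20),                # rotate in [-20, +20] degrees
    transforms.GaussianBlur(kernel_size=5),               # Gaussian blur filter
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.1),      # hue/saturation/contrast/brightness
    transforms.RandomHorizontalFlip(p=0.5),               # flip with 50% probability
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],      # ImageNet statistics (assumed)
                         std=[0.229, 0.224, 0.225]),
])
```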
4.3. Setup
4.4. Experiment #1
4.5. Experiment #2
4.6. Experiment #3
4.7. Comparison
5. Conclusions
- Firstly, we aim to investigate various feature visualization techniques, such as t-distributed stochastic neighbor embedding (t-SNE), to gain deeper insight into the discriminative features inherent in ear images.
- Secondly, we plan to evaluate multiple CNN architectures side by side; this comparative study will identify the most effective architecture for our task and further optimize the system's performance.
- Lastly, we will explore potential synergies between deep-learned and handcrafted features, investigating combinations such as local binary patterns (LBP), robust local oriented patterns (RLOP), and local phase quantization (LPQ). By integrating these feature types, we aim to leverage the strengths of both deep learning and traditional feature engineering, resulting in a more robust and accurate identification system.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Wang, Z.; Yang, J.; Zhu, Y. Review of Ear Biometrics. Arch. Comput. Methods Eng. 2021, 28, 149–180.
2. Doghmane, H.; Bourouba, H.; Messaoudi, K.; Bourennane, E.B. Ear recognition based on discriminant multi-resolution image representation. Int. J. Biom. 2020, 12, 377–395.
3. Sforza, C.; Grandi, G.; Binelli, M.; Tommasi, D.; Rosati, R.; Ferrario, V. Age- and Sex-Related Changes in the Normal Human Ear. Forensic Sci. Int. 2009, 187, 110–111.
4. Yoga, S.; Balaih, J.; Rangdhol, V.; Vandana, S.; Paulose, S.; Kavya, L. Assessment of Age Changes and Gender Differences Based on Anthropometric Measurements of the Ear: A Cross-Sectional Study. J. Adv. Clin. Res. Insights 2017, 4, 92–95.
5. Ganapathi, I.I.; Ali, S.S.; Prakash, S.; Vu, N.S.; Werghi, N. A survey of 3D ear recognition techniques. ACM Comput. Surv. 2023, 55, 1–36.
6. Ma, Y.; Huang, Z.; Wang, X.; Huang, K. An Overview of Multimodal Biometrics Using the Face and Ear. Math. Probl. Eng. 2020, 2020, 6802905.
7. Beghriche, T.; Attallah, B.; Brik, Y.; Djerioui, M. A multi-level fine-tuned deep learning based approach for binary classification of diabetic retinopathy. Chemom. Intell. Lab. Syst. 2023, 237, 104820.
8. Mohammed, A.; Kora, R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 757–774.
9. Amrouni, N.; Benzaoui, A.; Zeroual, A. Palmprint Recognition: Extensive Exploration of Databases, Methodologies, Comparative Assessment, and Future Directions. Appl. Sci. 2023, 14, 153.
10. Matsuo, Y.; LeCun, Y.; Sahani, M.; Precup, D.; Silver, D.; Sugiyama, M.; Morimoto, J. Deep Learning, Reinforcement Learning, and World Models. Neural Netw. 2022, 152, 267–275.
11. Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv 2014, arXiv:1411.1784.
12. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
13. Hassaballah, M.; Alshazly, H.A.; Ali, A.A. Ear Recognition Using Local Binary Patterns: A Comparative Experimental Study. Expert Syst. Appl. 2019, 118, 182–200.
14. Hassaballah, M.; Alshazly, H.A.; Ali, A.A. Robust Local Oriented Patterns for Ear Recognition. Multimed. Tools Appl. 2020, 79, 31183–31204.
15. Sarangi, P.P.; Mishra, B.S.P.; Dehuri, S.; Cho, S.B. An Evaluation of Ear Biometric System Based on Enhanced Jaya Algorithm and SURF Descriptors. Evol. Intell. 2020, 13, 443–461.
16. Sajadi, S.; Fathi, A. Genetic Algorithm Based Local and Global Spectral Features Extraction for Ear Recognition. Expert Syst. Appl. 2020, 159, 113639.
17. Khaldi, Y.; Benzaoui, A. Region of interest synthesis using image-to-image translation for ear recognition. In Proceedings of the 2020 International Conference on Advanced Aspects of Software Engineering (ICAASE), Constantine, Algeria, 28–30 November 2020; pp. 1–6.
18. Regouid, M.; Touahria, M.; Benouis, M.; Mostefai, L.; Lamiche, I. Comparative Study of 1D-Local Descriptors for Ear Biometric System. Multimed. Tools Appl. 2022, 81, 29477–29503.
19. Korichi, A.; Slatnia, S.; Aiadi, O. TR-ICANet: A Fast Unsupervised Deep-Learning-Based Scheme for Unconstrained Ear Recognition. Arab. J. Sci. Eng. 2022, 47, 9887–9898.
20. Alshazly, H.; Linse, C.; Barth, E.; Martinetz, T. Handcrafted versus CNN Features for Ear Recognition. Symmetry 2019, 11, 1493.
21. Alshazly, H.; Linse, C.; Barth, E.; Martinetz, T. Ensembles of Deep Learning Models and Transfer Learning for Ear Recognition. Sensors 2019, 19, 4139.
22. Priyadharshini, R.A.; Arivazhagan, S.; Arun, M. A Deep Learning Approach for Person Identification Using Ear Biometrics. Appl. Intell. 2020, 51, 2161–2172.
23. Khaldi, Y.; Benzaoui, A. A New Framework for Grayscale Ear Images Recognition Using Generative Adversarial Networks under Unconstrained Conditions. Evol. Syst. 2021, 12, 923–934.
24. Alshazly, H.; Linse, C.; Barth, E.; Idris, S.A.; Martinetz, T. Towards Explainable Ear Recognition Systems Using Deep Residual Networks. IEEE Access 2021, 9, 122254–122273.
25. Omara, I.; Hagag, A.; Ma, G.; Abd El-Samie, F.E.; Song, E. A Novel Approach for Ear Recognition: Learning Mahalanobis Distance Features from Deep CNNs. Mach. Vis. Appl. 2021, 32, 1–14.
26. Sharkas, M. Ear Recognition with Ensemble Classifiers; A Deep Learning Approach. Multimed. Tools Appl. 2022, 81, 43919–43945.
27. Xu, X.; Liu, Y.; Liu, C.; Lu, L. A Feature Fusion Human Ear Recognition Method Based on Channel Features and Dynamic Convolution. Symmetry 2023, 15, 1454.
28. Aiadi, O.; Khaldi, B.; Saadeddine, C. MDFNet: An unsupervised lightweight network for ear print recognition. J. Ambient Intell. Humaniz. Comput. 2023, 14, 13773–13786.
29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
30. Gonzalez, E.; Alvarez, L.; Mazorra, L. AMI Ear Database. 2008. Available online: http://www.ctim.es/research%20works/ami%20ear%20database (accessed on 10 April 2024).
31. Emeršič, Ž.; Struc, V.; Peer, P. Ear Recognition: More than a Survey. Neurocomputing 2017, 255, 26–39.
32. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 2012 Advances in Neural Information Processing Systems 25, Lake Tahoe, NV, USA, 3–8 December 2012; pp. 1097–1105.
33. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
34. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
Experiment #1: recognition performance of pretrained CNN architectures on the AMI and AWE datasets.

| Metric | Architecture | AMI | AWE |
|---|---|---|---|
| Rank-1 (%) | AlexNet | 88.50 | 30.25 |
| Rank-1 (%) | VGG-16 | 95.50 | 47.25 |
| Rank-1 (%) | VGG-19 | 91.50 | 40.25 |
| Rank-1 (%) | ResNet-50 | 98.66 | 57.75 |
| Rank-5 (%) | AlexNet | 95.50 | 50.25 |
| Rank-5 (%) | VGG-16 | 99.50 | 73.75 |
| Rank-5 (%) | VGG-19 | 97.50 | 66.75 |
| Rank-5 (%) | ResNet-50 | 99.66 | 79.00 |
| AUCMC (%) | AlexNet | 92.11 | 56.54 |
| AUCMC (%) | VGG-16 | 94.56 | 75.41 |
| AUCMC (%) | VGG-19 | 93.91 | 72.44 |
| AUCMC (%) | ResNet-50 | 100.00 | 96.54 |
Experiment #2: baseline ResNet-50 versus the proposed Mean-CAM-CNN for threshold values τ = 0.1 to 0.9.

| Model | AMI Rank-1 (%) | AMI Rank-5 (%) | AMI AUCMC (%) | AWE Rank-1 (%) | AWE Rank-5 (%) | AWE AUCMC (%) |
|---|---|---|---|---|---|---|
| ResNet-50 | 98.66 | 99.66 | 100.00 | 57.75 | 79.00 | 96.54 |
| Mean-CAM-CNN (τ = 0.1) | 96.67 | 100.00 | 100.00 | 62.25 | 79.50 | 96.70 |
| Mean-CAM-CNN (τ = 0.2) | 98.00 | 100.00 | 99.99 | 58.25 | 78.75 | 97.02 |
| Mean-CAM-CNN (τ = 0.3) | 98.66 | 99.66 | 99.99 | 68.50 | 85.75 | 97.99 |
| Mean-CAM-CNN (τ = 0.4) | 99.33 | 99.33 | 99.99 | 68.75 | 83.25 | 98.56 |
| Mean-CAM-CNN (τ = 0.5) | 99.67 | 100.00 | 100.00 | 74.50 | 89.50 | 98.93 |
| Mean-CAM-CNN (τ = 0.6) | 99.67 | 100.00 | 100.00 | 69.25 | 87.00 | 98.56 |
| Mean-CAM-CNN (τ = 0.7) | 98.33 | 100.00 | 100.00 | 67.50 | 87.25 | 98.34 |
| Mean-CAM-CNN (τ = 0.8) | 98.66 | 99.66 | 99.99 | 62.00 | 85.25 | 97.88 |
| Mean-CAM-CNN (τ = 0.9) | 96.00 | 99.00 | 99.99 | 60.00 | 78.75 | 96.81 |
Baseline (B/L) versus Mean-CAM predictions with confidence P for sample inputs (original images and visualizations not reproduced).

| Dataset | Input Class | B/L Prediction | Mean-CAM Prediction |
|---|---|---|---|
| AMI | 46 | 46 (P = 56.69%) | 46 (P = 90.10%) |
| AMI | 88 | 74 (P = 33.27%) | 88 (P = 99.83%) |
| AWE | 95 | 41 (P = 92.51%) | 95 (P = 99.14%) |
| AWE | 8 | 65 (P = 79.10%) | 8 (P = 98.96%) |
Experiment #3: performance with and without the colorization preprocessing.

| Configuration | AMI Rank-1 (%) | AMI Rank-5 (%) | AMI AUCMC (%) | AWE Rank-1 (%) | AWE Rank-5 (%) | AWE AUCMC (%) |
|---|---|---|---|---|---|---|
| Without preprocessing | 99.67 | 100.00 | 100.00 | 74.50 | 89.50 | 98.93 |
| With preprocessing | 100.00 | 100.00 | 100.00 | 76.25 | 91.25 | 99.96 |
Comparison with state-of-the-art methods; entries are Rank-1 recognition rates (%), and "/" indicates no reported result.

| Approach | Publication | Year | Method | AMI | AWE |
|---|---|---|---|---|---|
| Handcrafted | Hassaballah et al. [13] | 2019 | LBP variants | 73.71 | 49.60 |
| Handcrafted | Hassaballah et al. [14] | 2020 | RLOP | 72.29 | 54.10 |
| Handcrafted | Sarangi et al. [15] | 2020 | Jaya algorithm + SURF | / | 44.00 |
| Handcrafted | Sajadi and Fathi [16] | 2020 | GZ + LPQ | / | 53.50 |
| Handcrafted | Khaldi and Benzaoui [17] | 2020 | BSIF | / | 44.53 |
| Handcrafted | Regouid et al. [18] | 2022 | 1D multi-resolution LBP | 100.00 | 43.00 |
| Deep learning | Alshazly et al. [20] | 2019 | VGG-13/16/19 ensembles | 93.96 | / |
| Deep learning | Alshazly et al. [21] | 2019 | AlexNet (fine-tuned) | 94.50 | / |
| Deep learning | Priyadharshini et al. [22] | 2020 | CNN | 96.99 | / |
| Deep learning | Khaldi and Benzaoui [23] | 2021 | DCGAN + VGG-16 | 96.00 | 50.53 |
| Deep learning | Alshazly et al. [24] | 2021 | Combination of ResNet features | 99.64 | 67.25 |
| Deep learning | Omara et al. [25] | 2021 | Mahalanobis distance + CNN | 97.80 | / |
| Deep learning | Sharkas [26] | 2022 | Discrete curvelet transform + ensemble of ResNet features | 99.45 | / |
| Deep learning | Xu et al. [27] | 2023 | CFDCNet | 99.70 | 72.70 |
| Deep learning | Aiadi et al. [28] | 2023 | MDFNet | 97.67 | / |
| Deep learning | Our proposed method | 2024 | DCGAN + Mean-CAM-CNN | 100.00 | 76.25 |