A Fine-Grained Bird Classification Method Based on Attention and Decoupled Knowledge Distillation
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset
2.2. Fine-Grained Bird Image Classification Model
2.3. Attention-Guided Data Augmentation
2.4. Compression of the Fine-Grained Bird Classification Model
2.5. Decoupled Knowledge Distillation
3. Results
3.1. Evaluation Metrics
3.2. Implementation Details
3.3. Localization Effect Visualization of Objects and Key Part Regions
3.4. Ablation Experiments
3.5. Experiment to Compare the Effects of Knowledge Distillation
3.6. Model Comparison
3.7. Comparison of Model Size, Parameters, FLOPs, and Prediction Time
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kati, V.I.; Sekercioglu, C.H. Diversity, ecological structure, and conservation of the landbird community of Dadia reserve, Greece. Divers. Distrib. 2006, 12, 620–629.
- Bibby, C.J. Making the most of birds as environmental indicators. Ostrich 1999, 70, 81–88.
- Charmantier, A.; Gienapp, P. Climate change and timing of avian breeding and migration: Evolutionary versus plastic changes. Evol. Appl. 2014, 7, 15–28.
- Gregory, R.D.; Noble, D.; Field, R.; Marchant, J.; Raven, M.; Gibbons, D. Using birds as indicators of biodiversity. Ornis Hung. 2003, 12, 11–24.
- Jasim, H.A.; Ahmed, S.R.; Ibrahim, A.A.; Duru, A.D. Classify Bird Species Audio by Augment Convolutional Neural Network. In Proceedings of the 2022 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Ankara, Turkey, 9–11 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
- Kahl, S.; Clapp, M.; Hopping, W.; Goëau, H.; Glotin, H.; Planqué, R.; Vellinga, W.P.; Joly, A. Overview of BirdCLEF 2020: Bird sound recognition in complex acoustic environments. In Proceedings of the CLEF 2020—11th International Conference of the Cross-Language Evaluation Forum for European Languages, Thessaloniki, Greece, 22–25 September 2020.
- Kahl, S.; Wood, C.M.; Eibl, M.; Klinck, H. BirdNET: A deep learning solution for avian diversity monitoring. Ecol. Inform. 2021, 61, 101236.
- Zhang, C.; Chen, Y.; Hao, Z.; Gao, X. An Efficient Time-Domain End-to-End Single-Channel Bird Sound Separation Network. Animals 2022, 12, 3117.
- Theivaprakasham, H.; Sowmya, V.; Ravi, V.; Gopalakrishnan, E.; Soman, K. Hybrid Features-Based Ensembled Residual Convolutional Neural Network for Bird Acoustic Identification. In Advances in Communication, Devices and Networking; Springer: Berlin/Heidelberg, Germany, 2023; pp. 437–445.
- Raj, S.; Garyali, S.; Kumar, S.; Shidnal, S. Image based bird species identification using convolutional neural network. Int. J. Eng. Res. Technol. 2020, 9, 346.
- Rong, Y.; Xu, W.; Akata, Z.; Kasneci, E. Human attention in fine-grained classification. arXiv 2021, arXiv:2111.01628.
- Varghese, A.; Shyamkrishna, K.; Rajeswari, M. Utilization of deep learning technology in recognizing bird species. In AIP Conference Proceedings; AIP Publishing LLC: Melville, NY, USA, 2022; Volume 2463, p. 020035.
- Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 10 January 2023).
- Wang, Y.; Wang, Z. A survey of recent work on fine-grained image classification techniques. J. Vis. Commun. Image Represent. 2019, 59, 210–214.
- Zhang, H.; Xu, T.; Elhoseiny, M.; Huang, X.; Zhang, S.; Elgammal, A.; Metaxas, D. SPDA-CNN: Unifying semantic part detection and abstraction for fine-grained recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1143–1152.
- Liu, Y.; Bai, Y.; Che, X.; He, J. Few-Shot Fine-Grained Image Classification: A Survey. In Proceedings of the 2022 4th International Conference on Natural Language Processing (ICNLP), Xi’an, China, 25–27 March 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 201–211.
- Zhang, N.; Donahue, J.; Girshick, R.; Darrell, T. Part-based R-CNNs for fine-grained category detection. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 834–849.
- Lam, M.; Mahasseni, B.; Todorovic, S. Fine-grained recognition as HSnet search for informative image parts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2520–2529.
- Wei, X.S.; Xie, C.W.; Wu, J.; Shen, C. Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization. Pattern Recognit. 2018, 76, 704–714.
- Lu, X.; Yu, P.; Li, H.; Li, H.; Ding, W. Weakly supervised fine-grained image classification algorithm based on attention-attention bilinear pooling. J. Comput. Appl. 2021, 41, 1319.
- Xiao, T.; Xu, Y.; Yang, K.; Zhang, J.; Peng, Y.; Zhang, Z. The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 842–850.
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
- Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400.
- Kumar Singh, K.; Jae Lee, Y. Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3524–3533.
- Zhang, X.; Wei, Y.; Feng, J.; Yang, Y.; Huang, T.S. Adversarial complementary learning for weakly supervised object localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1325–1334.
- Fu, J.; Zheng, H.; Mei, T. Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4438–4446.
- Yang, Z.; Luo, T.; Wang, D.; Hu, Z.; Gao, J.; Wang, L. Learning to navigate for fine-grained classification. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 420–435.
- Hu, T.; Qi, H.; Huang, Q.; Lu, Y. See better before looking closer: Weakly supervised data augmentation network for fine-grained visual classification. arXiv 2019, arXiv:1901.09891.
- Hu, B.; Lai, J.H.; Guo, C.C. Location-aware fine-grained vehicle type recognition using multi-task deep networks. Neurocomputing 2017, 243, 60–68.
- Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge distillation: A survey. Int. J. Comput. Vis. 2021, 129, 1789–1819.
- Tang, J.; Liu, M.; Jiang, N.; Cai, H.; Yu, W.; Zhou, J. Data-free network pruning for model compression. In Proceedings of the 2021 IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Republic of Korea, 22–28 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5.
- Liu, F.; Zhao, W.; He, Z.; Wang, Y.; Wang, Z.; Dai, C.; Liang, X.; Jiang, L. Improving neural network efficiency via post-training quantization with adaptive floating-point. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 5281–5290.
- Lee, S.; Kim, H.; Jeong, B.; Yoon, J. A training method for low rank convolutional neural networks based on alternating tensor compose-decompose method. Appl. Sci. 2021, 11, 643.
- Chen, P.; Liu, S.; Zhao, H.; Jia, J. Distilling knowledge via knowledge review. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5008–5017.
- Zhao, B.; Cui, Q.; Song, R.; Qiu, Y.; Liang, J. Decoupled Knowledge Distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11953–11962.
- Li, J.; Bhat, A.; Barmaki, R. Pose Uncertainty Aware Movement Synchrony Estimation via Spatial-Temporal Graph Transformer. In Proceedings of the 2022 International Conference on Multimodal Interaction, Bangalore, India, 7–11 November 2022; pp. 73–82.
- Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset. 2011. Available online: https://authors.library.caltech.edu/27452/ (accessed on 10 January 2023).
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Wei, X.S.; Luo, J.H.; Wu, J.; Zhou, Z.H. Selective convolutional descriptor aggregation for fine-grained image retrieval. IEEE Trans. Image Process. 2017, 26, 2868–2881.
- Zhang, F.; Li, M.; Zhai, G.; Liu, Y. Multi-branch and multi-scale attention learning for fine-grained visual categorization. In Proceedings of the International Conference on Multimedia Modeling, Prague, Czech Republic, 22–24 June 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 136–147.
- Zhou, B.; Khosla, A.; Lapedriza, À.; Oliva, A.; Torralba, A. Object Detectors Emerge in Deep Scene CNNs. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
- Vujović, Z. Classification model evaluation metrics. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 599–606.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255.
- Sun, C.; Shrivastava, A.; Singh, S.; Gupta, A. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
| Models | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| ShuffleNetV2 | 35.26% | 37.54% | 35.41% | 35.87% |
| V2(P) | 64.98% | 65.66% | 65.19% | 64.51% |
| V2(P,Os) | 72.97% | 74.45% | 73.18% | 73.32% |
| V2(P,Os,L) | 76.48% | 77.40% | 76.70% | 76.54% |
| V2(P,Os,Ps) | 78.24% | 80.03% | 78.41% | 78.51% |
| V2(P,Os,Ps,L) | 84.05% | 84.85% | 84.16% | 84.03% |
| V2(P,Os,Ps,L,D) | 87.02% | 87.61% | 87.16% | 87.01% |
| V2(P,Os,Ps,L,D,Ot,Pt) | 87.63% | 88.31% | 87.78% | 87.74% |
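The accuracy, precision, recall, and F1 columns reported in the tables can be reproduced from raw predictions. A minimal numpy sketch follows; it assumes the paper's class-averaged (macro) definition of precision, recall, and F1, and the helper name `macro_metrics` is illustrative, not from the paper:

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = float(np.mean(y_true == y_pred))
    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    # average per-class scores so each of the 200 species counts equally
    return acc, float(np.mean(precisions)), float(np.mean(recalls)), float(np.mean(f1s))
```

Macro averaging weights every species equally, which matters on CUB-200-2011 where per-class test sets are small and imbalanced.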
| Weight | 1.0 | 1.5 | 2.0 | 2.5 | 3.0 | |
|---|---|---|---|---|---|---|
| Accuracy | 86.02% | 86.81% | 86.97% | 87.63% | 87.11% | 86.97% |
| KD Methods | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| None | 84.05% | 84.85% | 84.16% | 84.03% |
| KD | 85.48% | 86.30% | 85.62% | 85.56% |
| DKD | 87.02% | 87.61% | 87.16% | 87.01% |
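Decoupled knowledge distillation (Section 2.5, following Zhao et al.) splits the classical KD loss into a target-class term (TCKD) and a non-target-class term (NCKD) that can be weighted independently. A minimal single-sample numpy sketch, with function names and the α/β parameterization as illustrative assumptions rather than the paper's exact training code:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a 1-D logit vector."""
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def dkd_loss(teacher_logits, student_logits, target, alpha=1.0, beta=1.0, T=1.0):
    """Decoupled KD: alpha * TCKD + beta * NCKD for one sample."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    pt, qt = p[target], q[target]
    # TCKD: binary KL between (target, non-target) probability pairs
    tckd = pt * np.log(pt / qt) + (1 - pt) * np.log((1 - pt) / (1 - qt))
    # NCKD: KL between distributions renormalized over non-target classes only
    mask = np.arange(len(p)) != target
    p_hat = p[mask] / (1 - pt)
    q_hat = q[mask] / (1 - qt)
    nckd = np.sum(p_hat * np.log(p_hat / q_hat))
    return alpha * tckd + beta * nckd
```

The decoupling is exact: classical KD equals TCKD + (1 − p_t) · NCKD, where p_t is the teacher's target-class probability, so DKD's contribution is simply freeing the NCKD weight from its coupling to p_t.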
| Models | Backbone Network | Accuracy (%) |
|---|---|---|
| ShuffleNetV2 | - | 65.0 |
| DenseNet121 | - | 83.4 |
| Mask-CNN | - | 87.3 |
| HSnet | - | 87.5 |
| RA-CNN | VGG-19 | 85.4 |
| NTS-Net | ResNet50 | 87.5 |
| Teacher Model | DenseNet121 | 89.5 |
| Student Model | ShuffleNetV2 | 87.6 |
| Models | Model Size | Params | FLOPs | Predicted Time |
|---|---|---|---|---|
| ShuffleNetV2 | 177.07 MB | 2.28 M | 598.5 M | 79 ms |
| DenseNet121 | 1.31 GB | 6.95 M | 11.53 G | 519 ms |
| RA-CNN | 3.06 GB | 265.94 M | 117.65 G | 1.79 s |
| NTS-Net | 1.24 GB | 29.03 M | 16.94 G | 903 ms |
| Teacher Model | 1.32 GB | 6.95 M | 23.06 G | 1.1 s |
| Student Model | 177.07 MB | 2.28 M | 1.2 G | 146 ms |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, K.; Yang, F.; Chen, Z.; Chen, Y.; Zhang, Y. A Fine-Grained Bird Classification Method Based on Attention and Decoupled Knowledge Distillation. Animals 2023, 13, 264. https://doi.org/10.3390/ani13020264