ConvNeXt-Based Fine-Grained Image Classification and Bilinear Attention Mechanism Model
Abstract
Featured Application
1. Introduction
2. Related Work
2.1. Analysis of Traditional Methods
2.2. Introduction to ConvNeXt Network
3. Proposed Algorithm
3.1. ConvNeXt Model Embedded with Attention Mechanism
3.2. Bilinear Attention Mechanism
3.3. BCBAM-Based Attention Framework
3.4. ConvNeXt Based on Multiscale Bilinear Attention Mechanism
4. Experimental Results and Analysis
4.1. Dataset Introduction
4.2. Preconditions of Experiment and Environment Description
- (1) Experiment 1: For the ConvNeXt-Tiny network, the two integration methods (EBA and SBA) were each used to embed the traditional attention mechanisms (SE, CBAM) and the proposed BCBAM mechanism. The applicability and robustness of the proposed method were verified on different fine-grained datasets.
- (2) Experiment 2: Building on Experiment 1, an ablation study was conducted on the CUB200-2011 dataset using the best-performing configuration. Individual components of the attention mechanism were removed to measure their effect on performance and to better understand the behavior of the framework.
- (3) Experiment 3: The proposed model was compared with traditional fine-grained classification networks to verify the superiority and practicality of the ConvNeXt-Tiny network embedded with BCBAM for fine-grained classification tasks.
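As a point of reference for the attention mechanisms named in Experiment 1, the following is a minimal NumPy sketch of standard squeeze-and-excitation (SE) channel attention as described by Hu et al. [3]. It is not the proposed BCBAM and is not taken from the authors' code. The function name `se_attention`, the reduction ratio `r`, the omission of bias terms, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def se_attention(x, w1, w2):
    """Squeeze-and-excitation channel attention on a single feature map.

    x  : array of shape (C, H, W)
    w1 : array of shape (C // r, C), reduction weights (hypothetical values)
    w2 : array of shape (C, C // r), expansion weights (hypothetical values)
    Bias terms are omitted to keep the sketch short.
    """
    z = x.mean(axis=(1, 2))                        # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ z, 0.0)               # excitation step 1: linear + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # excitation step 2: linear + sigmoid
    return x * scale[:, None, None], scale         # reweight each channel by its scale

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4                            # illustrative sizes only
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y, scale = se_attention(x, w1, w2)
```

The output `y` has the same shape as the input, and each channel is multiplied by one learned weight in (0, 1). In a trained network `w1` and `w2` would be learned parameters rather than random draws.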
4.3. Analysis of Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- [1] He, K.-M.; Zhang, X.-Y.; Ren, S.-Q. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- [2] Xie, S.; Girshick, R.; Dollar, P. Aggregated residual transformations for deep neural networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995.
- [3] Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- [4] Howard, A.-G.; Zhu, M.; Chen, B.; Kalenichenko, D. Mobilenets: Efficient Convolutional Neural Networks for Mobile Vision Applications. Available online: https://arxiv.org/pdf/1704.04861.pdf (accessed on 9 August 2021).
- [5] Zhang, X.-Y.; Zhou, X.-Y.; Lin, M.-X. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856.
- [6] Huang, G.; Liu, Z. Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
- [7] Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv 2019, arXiv:1905.11946.
- [8] Han, D.; Kim, J.; Kim, J. Deep pyramidal residual networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6307–6315.
- [9] Yamada, Y.; Iwamura, M.; Kise, K. Deep pyramidal residual networks with separated stochastic depth. arXiv 2016, arXiv:1612.01230.
- [10] Zhang, K.; Guo, L.-R.; Gao, C. Pyramidal RoR for image classification. Clust. Comput. 2019, 22, 5115–5125.
- [11] Yang, Y.-B.; Zhong, Z.-S.; Shen, T.-C. Convolutional neural networks with alternately updated clique. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2413–2422.
- [12] Huang, G.; Liu, S.-C. CondenseNet: An efficient DenseNet using learned group convolutions. In Proceedings of the 2018 Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 2752–2761.
- [13] Chen, Y.; Li, J.; Xiao, H.; Jin, X.; Yan, S.; Feng, J. Dual Path Networks. In Proceedings of the International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 4467–4475.
- [14] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 5998–6008.
- [15] Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021.
- [16] Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021.
- [17] Zhu, Y.-X.; Li, R.C.; Yang, Y. Learning cascade attention for fine-grained image classification. Neural Netw. 2019, 122, 174–182.
- [18] Yan, Y.C.; Ni, B.B.; Wei, H.-W. Fine-grained image analysis via progressive feature learning. Neurocomputing 2020, 396, 254–265.
- [19] Lin, T.Y.; Roychowdhury, A.; Maji, S. Bilinear CNN models for fine-grained visual recognition. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1449–1457.
- [20] Yu, C.-J.; Zhao, X.-Y.; Zheng, Q. Hierarchical bilinear pooling for fine-grained visual recognition. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; Springer: Heidelberg, Germany, 2018; pp. 574–589.
- [21] Li, K.-L.; Wang, Y.-H.; Chen, D.; Wang, J. Fine-grained Image Classification Combining Attention and Bilinear Networks. J. Chin. Comput. Syst. 2021, 42, 1071–1076.
- [22] Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. arXiv 2022, arXiv:2201.03545.
- [23] Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- [24] He, K.; Feng, X.; Gao, S.-N. A fine-grained image classification algorithm based on multi-scale feature fusion and repeated attention mechanism. J. Tianjin Univ. (Nat. Sci. Eng. Technol. Ed.) 2020, 53, 1077–1085. (In Chinese)
- [25] Wang, Q.; Wu, B.; Zhu, P. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11531–11539.
- [26] Wah, C.; Branson, S.; Welinder, P. The Caltech-UCSD Birds-200-2011 Dataset; Technical Report CNS-TR-2011-001; California Institute of Technology: Pasadena, CA, USA, 2011.
- [27] Maji, S.; Kannala, J.; Rahtu, E. Fine-Grained Visual Classification of Aircraft. arXiv 2013, arXiv:1306.5151.
- [28] Krause, J.; Stark, M.; Deng, J. 3D Object Representations for Fine-Grained Categorization. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Columbus, OH, USA, 23–28 June 2014.
- [29] Tan, M.; Wang, G.-J.; Zhou, J. Fine-grained classification via hierarchical Bilinear pooling with aggregated slack mask. IEEE Access 2019, 7, 117944–117953.
- [30] Zhang, J.; Zhang, R.-S.; Huang, Y.-P. Unsupervised Part Mining for Fine Grained Image Classification. Available online: https://arxiv.org/abs/1902.09941 (accessed on 22 May 2020).
- [31] Fu, J.; Zheng, H.; Mei, T. Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4438–4446.
- [32] Zheng, H.-L.; Fu, J.-L.; Mei, T. Learning multi-attention convolutional neural network for fine-grained image recognition. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; IEEE Computer Society Press: Los Alamitos, CA, USA, 2017; pp. 5219–5227.
- [33] Wei, X.; Zhang, Y.; Gong, Y. Grassmann pooling as compact homogeneous Bilinear pooling for fine-grained visual classification. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 355–370.
Dataset | Number of Categories | Training Set/Images | Test Set/Images |
---|---|---|---|
CUB200-2011 | 200 | 5994 | 5794 |
FGVC-Aircraft | 100 | 6667 | 3333 |
Stanford Cars | 196 | 8144 | 8041 |
Backbone | Integration Method | Improvement Method | CUB200-2011 Accuracy/% | FGVC-Aircraft Accuracy/% | Stanford Cars Accuracy/% |
---|---|---|---|---|---|
ConvNeXt-Tiny | EBA | SE | 74.5 | 89.5 | - |
ConvNeXt-Tiny | EBA | CBAM | 82.9 | 91.0 | - |
ConvNeXt-Tiny | EBA | BCBAM | 83.7 | 91.7 | - |
ConvNeXt-Tiny | SBA | SE | 82.7 | 91.7 | 92.9 |
ConvNeXt-Tiny | SBA | CBAM | 84.7 | 92.0 | 92.7 |
ConvNeXt-Tiny | SBA | BCBAM | 87.8 | 92.1 | 93.3 |
Dataset | Precision/% | Recall/% | Specificity/% |
---|---|---|---|
CUB200-2011 | 88.2 | 87.9 | 99.9 |
FGVC-Aircraft | 92.5 | 92.1 | 99.9 |
Stanford Cars | 93.5 | 93.2 | 99.9 |
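The precision, recall, and specificity reported above can be derived from a multi-class confusion matrix. The sketch below shows one common way to do this, assuming macro-averaging over classes; the source does not state which averaging scheme was used. The function names and the small 3-class matrix are hypothetical and are not data from the paper.

```python
def per_class_metrics(cm):
    """Per-class precision, recall and specificity from a confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precision, recall, specificity = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true class otherwise
        tn = total - tp - fn - fp
        precision.append(tp / (tp + fp) if tp + fp else 0.0)
        recall.append(tp / (tp + fn) if tp + fn else 0.0)
        specificity.append(tn / (tn + fp) if tn + fp else 0.0)
    return precision, recall, specificity

def macro(values):
    """Unweighted mean over classes (macro-average, an assumption here)."""
    return sum(values) / len(values)

# Hypothetical 3-class confusion matrix, for illustration only.
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 1, 9]]
p, r, s = per_class_metrics(cm)
print(f"precision={macro(p):.3f} recall={macro(r):.3f} specificity={macro(s):.3f}")
```

With many classes, the true negatives for each class far outnumber its false positives, which is why specificity sits close to 100% on all three datasets even when precision and recall differ by several points.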
Backbone | Dataset | ReLU Activate Function | ECA Mechanisms | BCBAM Mechanisms | Accuracy Rate/% |
---|---|---|---|---|---|
ConvNeXt-Tiny | CUB200-2011 | 85.1 | |||
√ | 85.8 | ||||
√ | 86.0 | ||||
√ | √ | 84.9 | |||
√ | √ | 79.3 | |||
√ | √ | 87.8 | |||
√ | √ | √ | 84.2 |
Backbone | CUB200-2011 Accuracy/% | FGVC-Aircraft Accuracy/% | Stanford Cars Accuracy/% |
---|---|---|---|
ConvNeXt-Tiny | 85.1 | 91.8 | 92.9 |
ResNet50 | 81.6 | 88.9 | 92.0 |
SE-ResNet50 | 80.9 | 88.7 | 91.7 |
CBAM-ResNet50 | 81.2 | 89.2 | 92.0 |
ECA-ResNet50 | 79.5 | 88.9 | 91.5 |
BCNN [19] | 84.1 | 84.1 | 91.3 |
HBP-RNet [29] | 85.8 | 90.2 | 92.2 |
UPM [30] | 81.9 | 85.9 | 89.2 |
RA-CNN [31] | 85.3 | 88.2 | 92.5 |
MA-CNN [32] | 86.5 | 89.9 | 92.8 |
GP-256 [33] | 85.8 | 88.1 | 91.7 |
Bilinear CBAM ConvNeXt | 87.8 | 92.1 | 93.3 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, Z.; Gu, T.; Li, B.; Xu, W.; He, X.; Hui, X. ConvNeXt-Based Fine-Grained Image Classification and Bilinear Attention Mechanism Model. Appl. Sci. 2022, 12, 9016. https://doi.org/10.3390/app12189016