MFANet: A Collar Classification Network Based on Multi-Scale Features and an Attention Mechanism
Abstract
1. Introduction
- A novel module, MFA, is proposed that efficiently extracts multi-scale feature information while reducing computational overhead, and incorporates a lightweight attention mechanism to highlight the feature representations of key components and regions.
- A new network architecture, MFANet, is built on the MFA module. It inherits the module's advantages and better handles image classification problems involving multiple objects, heavy noise, and recognition targets that occupy only a small fraction of the image.
- Extensive experiments show that, compared with current mainstream network architectures, the proposed MFANet achieves significant gains in classification accuracy on the collar image dataset Collar6 with fewer parameters and less computation, achieves better classification results on the fashion dataset DeepFashion6, and likewise obtains accuracy gains on the standard classification dataset CIFAR-10.
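The "fewer parameters and computations" claim in the contributions above rests largely on depthwise-separable convolution, the standard building block of lightweight multi-scale designs. A quick back-of-envelope comparison illustrates the saving; the feature-map shape below is an illustrative assumption, not a value from the paper:

```python
# Compare multiply-accumulate (MAC) counts of a standard KxK convolution
# versus a depthwise-separable one on a single feature map.
def conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution with 'same' padding."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs of a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

h = w = 56            # illustrative spatial size (matches a ResNet stage)
c_in = c_out = 256    # illustrative channel count
k = 3

standard = conv_macs(h, w, c_in, c_out, k)
separable = depthwise_separable_macs(h, w, c_in, c_out, k)
print(f"standard:  {standard:,} MACs")
print(f"separable: {separable:,} MACs ({standard / separable:.1f}x fewer)")
```

At these shapes the separable form needs roughly 8.7× fewer MACs, which is consistent in spirit with the parameter/FLOPs advantage reported for MFANet in the tables below.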
2. Related Work
2.1. Multi-Scale in Computer Vision
2.2. Channel Attention
3. Proposed Method
3.1. MFA Module
3.1.1. Multi-Scale Feature Information Extraction
Algorithm 1 DW algorithm
Input: Feature map X
Output: Feature map
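Only the input/output signature of Algorithm 1 survived extraction. The sketch below is a plausible reconstruction under the usual pyramid design (as in Res2Net/EPSANet-style modules): the input channels are split into groups, and each group receives a depthwise convolution with a different kernel size. The kernel sizes (3, 5, 7, 9) and the uniform box-filter weights are illustrative assumptions, not the paper's values:

```python
import numpy as np

def depthwise_conv(x, k):
    """Depthwise k x k convolution with 'same' padding, using uniform
    box-filter weights as a stand-in for learned kernels (assumption)."""
    c, h, w = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), mode="constant")
    out = np.zeros_like(x)
    for i in range(k):            # accumulate the k x k shifted windows
        for j in range(k):
            out += xp[:, i:i + h, j:j + w]
    return out / (k * k)

def dw_multi_scale(x, kernel_sizes=(3, 5, 7, 9)):
    """Split channels into one group per kernel size; apply a depthwise
    conv of that size to each group, preserving spatial resolution."""
    groups = np.array_split(x, len(kernel_sizes), axis=0)
    return [depthwise_conv(g, k) for g, k in zip(groups, kernel_sizes)]

x = np.random.rand(16, 8, 8)          # (channels, height, width)
branches = dw_multi_scale(x)
print([b.shape for b in branches])    # four branches, 4 channels each, spatial size kept
```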
3.1.2. Attention Weight Computing
3.1.3. Attentional Calibration and Feature Aggregation
Algorithm 2 MFA algorithm
Input: Feature map X
Output: Feature map
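As with Algorithm 1, only the signature of Algorithm 2 remains. Following the headings of Sections 3.1.2 and 3.1.3 (attention weight computing, then attentional calibration and feature aggregation), a minimal sketch of the MFA flow might look as follows; the global-average-pooling squeeze and the softmax across scales are assumptions borrowed from SE/EPSANet-style designs, not the paper's exact formulation:

```python
import numpy as np

def mfa_aggregate(branches):
    """Weight multi-scale branches by channel attention and aggregate."""
    # Squeeze: global average pooling -> one (C,) descriptor per branch.
    descriptors = np.stack([b.mean(axis=(1, 2)) for b in branches])   # (S, C)
    # Attention weights: softmax over the scale axis, per channel.
    e = np.exp(descriptors - descriptors.max(axis=0, keepdims=True))
    weights = e / e.sum(axis=0, keepdims=True)                        # (S, C)
    # Calibrate each branch by its weights, then concatenate channels.
    calibrated = [w[:, None, None] * b for w, b in zip(weights, branches)]
    return np.concatenate(calibrated, axis=0)

branches = [np.random.rand(4, 8, 8) for _ in range(4)]   # 4 scales, 4 channels each
out = mfa_aggregate(branches)
print(out.shape)   # same total channel count as the original split input
```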
3.2. Network Design
4. Experiments
4.1. Experimental Configuration
4.2. Datasets
4.2.1. CIFAR-10 Dataset
4.2.2. Collar6 Dataset
4.2.3. DeepFashion6 Dataset
4.3. Comparative Experiments
4.3.1. Comparison with Mainstream Neural Networks on the Collar6 Dataset: Results and Analysis
4.3.2. Comparison with Mainstream Attention Networks on the Collar6 Dataset: Results and Analysis
4.3.3. Comparative Experimental Results and Analysis on the DeepFashion6 Dataset
4.3.4. Comparative Experimental Results and Analysis on the CIFAR-10 Dataset
4.4. Ablation Experiments
4.4.1. Structural Ablation Experiments
4.4.2. Hyperparameter Ablation Experiments
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Yuan, W. Computer-Aided Data Analysis of Clothing Pattern Based on Popular Factors. In Proceedings of the 2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 20–22 January 2022; pp. 1543–1546.
- Rajput, P.S.; Aneja, S. IndoFashion: Apparel Classification for Indian Ethnic Clothes. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 19–25 June 2021; pp. 3930–3934.
- De Souza Inácio, A.; Lopes, H.S. EPYNET: Efficient Pyramidal Network for Clothing Segmentation. IEEE Access 2020, 8, 187882–187892.
- Liu, Z.; Luo, P.; Qiu, S.; Wang, X.; Tang, X. DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1096–1104.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- Chengcheng, H.; Jian, Y.; Xiao, Q. Research and Application of Fine-Grained Image Classification Based on Small Collar Dataset. Front. Comput. Neurosci. 2022, 15, 121.
- Fan, J.; Bocus, M.J.; Hosking, B.; Wu, R.; Liu, Y.; Vityazev, S.; Fan, R. Multi-Scale Feature Fusion: Learning Better Semantic Segmentation for Road Pothole Detection. In Proceedings of the 2021 IEEE International Conference on Autonomous Systems (ICAS), Montreal, QC, Canada, 11–13 August 2021; pp. 1–5.
- Yang, F.; Li, X.; Shen, J. MSB-FCN: Multi-Scale Bidirectional FCN for Object Skeleton Extraction. IEEE Trans. Image Process. 2021, 30, 2301–2312.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Du, R.; Chang, D.; Bhunia, A.K.; Xie, J.; Ma, Z.; Song, Y.Z.; Guo, J. Fine-Grained Visual Classification via Progressive Multi-granularity Training of Jigsaw Patches. In Proceedings of the Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 153–168.
- Li, Y.; Chen, Y.; Wang, N.; Zhang, Z. Scale-Aware Trident Networks for Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE Computer Society: Los Alamitos, CA, USA, 2019; pp. 6053–6062.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE Computer Society: Los Alamitos, CA, USA, 2017; pp. 6230–6239.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Gao, S.H.; Cheng, M.M.; Zhao, K.; Zhang, X.Y.; Yang, M.H.; Torr, P. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 652–662.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11531–11539.
- Yang, M.; Wang, H.; Hu, K.; Yin, G.; Wei, Z. IA-Net: An Inception–Attention-Module-Based Network for Classifying Underwater Images From Others. IEEE J. Ocean. Eng. 2022, 47, 704–717.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3141–3149.
- Xu, S.; He, Q.; Tao, S.; Chen, H.; Chai, Y.; Zheng, W. Pig Face Recognition Based on Trapezoid Normalized Pixel Difference Feature and Trimmed Mean Attention Mechanism. IEEE Trans. Instrum. Meas. 2023, 72, 1–13.
- Wang, Q.; Wu, T.; Zheng, H.; Guo, G. Hierarchical Pyramid Diverse Attention Networks for Face Recognition. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE Computer Society: Los Alamitos, CA, USA, 2020; pp. 8323–8332.
- Chen, B.; Deng, W.; Hu, J. Mixed High-Order Attention Network for Person Re-Identification. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE Computer Society: Los Alamitos, CA, USA, 2019; pp. 371–381.
- Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE Computer Society: Los Alamitos, CA, USA, 2019; pp. 1314–1324.
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE Computer Society: Los Alamitos, CA, USA, 2017; pp. 1800–1807.
- Huang, G.; Liu, Z.; Maaten, L.V.D.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE Computer Society: Los Alamitos, CA, USA, 2017; pp. 2261–2269.
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 122–138.
- Zhang, H.; Zu, K.; Lu, J.; Zou, Y.; Meng, D. EPSANet: An Efficient Pyramid Squeeze Attention Block on Convolutional Neural Network. In Proceedings of the Asian Conference on Computer Vision, Macau, China, 4–8 December 2022; pp. 1161–1177.
- Li, X.; Wang, W.; Hu, X.; Yang, J. Selective Kernel Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE Computer Society: Los Alamitos, CA, USA, 2019; pp. 510–519.
- Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE Computer Society: Los Alamitos, CA, USA, 2017; pp. 5987–5995.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size. arXiv 2016, arXiv:1602.07360.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 3–19.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE Computer Society: Los Alamitos, CA, USA, 2021; pp. 13708–13717.
- Qin, Z.; Zhang, P.; Wu, F.; Li, X. FcaNet: Frequency Channel Attention Networks. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE Computer Society: Los Alamitos, CA, USA, 2021; pp. 763–772.
Network | Includes Attention Mechanism | Includes Multi-Scale Feature Extraction | Computational Cost | Embeddable Module |
---|---|---|---|---|
InceptionNet | ✕ | ✓ | larger | ✕ |
Res2Net | ✕ | ✓ | larger | ✕ |
SENet | ✓ | ✕ | small | ✓ |
ECANet | ✓ | ✕ | small | ✓ |
Ours | ✓ | ✓ | small | ✕ |
Output | ResNet-50 | MFANet |
---|---|---|
112 × 112 × 64 | 7 × 7, 64, stride = 2 | 7 × 7, 64, stride = 2 |
56 × 56 × 64 | 3 × 3 max pool, stride = 2 | 3 × 3 max pool, stride = 2 |
56 × 56 × 256 | [1 × 1, 64; 3 × 3, 64; 1 × 1, 256] × 3 | |
28 × 28 × 512 | [1 × 1, 128; 3 × 3, 128; 1 × 1, 512] × 4 | |
14 × 14 × 1024 | [1 × 1, 256; 3 × 3, 256; 1 × 1, 1024] × 6 | |
7 × 7 × 2048 | [1 × 1, 512; 3 × 3, 512; 1 × 1, 2048] × 3 | |
1 × 1 | 7 × 7, global average pool, n-d fc | 7 × 7, global average pool, n-d fc |
Type | Training Images | Test Images | Total |
---|---|---|---|
Crew neck | 2480 | 620 | 3100 |
Lapel | 2608 | 652 | 3260 |
Stand collar | 2464 | 616 | 3080 |
Hoodie | 2560 | 640 | 3200 |
V-neck | 2468 | 617 | 3085 |
Fur lapel | 2497 | 625 | 3122 |
Type | Training Images | Test Images | Total |
---|---|---|---|
Dress | 2555 | 639 | 3194 |
Jacket | 2505 | 627 | 3132 |
Jeans | 2412 | 603 | 3015 |
Shorts | 2541 | 636 | 3177 |
Tank | 2528 | 632 | 3160 |
Tee | 2439 | 610 | 3049 |
Network | Parameters | FLOPs | Top-1 Accuracy (%) |
---|---|---|---|
EMRes-50 [6] | 28.02 M | 4.34 G | 73.6 |
ResNet-50 [9] | 23.52 M | 4.12 G | 66.5 |
ResNeXt-50 [30] | 22.99 M | 4.26 G | 75.7 |
Res2Net [14] | 23.66 M | 4.29 G | 74.8 |
DenseNet-161 [26] | 26.49 M | 7.82 G | 72.3 |
Xception [25] | 20.82 M | 4.58 G | 76.3 |
EPSANet [28] | 20.53 M | 3.63 G | 78.1 |
SKNet [29] | 25.44 M | 4.51 G | 56.1 |
MobileNet_v3_small [24] | 1.52 M | 58.79 M | 73.1 |
MobileNet_v3_large [24] | 4.21 M | 226.4 M | 73.5 |
ShuffleNet_v2 [27] | 1.26 M | 149.58 M | 75.1 |
SqueezeNet [31] | 0.73 M | 2.65 G | 57.1 |
Ours | 13.81 M | 2.61 G | 80.4 |
Method | Backbone | Parameters | FLOPs | Top-1 Accuracy (%) |
---|---|---|---|---|
SENet [16] | ResNet-50 | 26.05 M | 4.12 G | 67.9 |
CBAM [32] | ResNet-50 | 26.05 M | 4.12 G | 67.3 |
ECANet [17] | ResNet-50 | 23.52 M | 4.12 G | 68.8 |
CANet [33] | ResNet-50 | 25.43 M | 4.14 G | 66.7 |
FcaNet [34] | ResNet-50 | 26.03 M | 4.12 G | 69.7 |
Ours | ResNet-50 | 13.81 M | 2.61 G | 80.4 |
SENet [16] | ResNeXt-50 | 25.51 M | 4.27 G | 74.3 |
CBAM [32] | ResNeXt-50 | 25.52 M | 4.27 G | 71.8 |
ECANet [17] | ResNeXt-50 | 22.99 M | 4.26 G | 75.7 |
CANet [33] | ResNeXt-50 | 24.91 M | 4.29 G | 69.9 |
FcaNet [34] | ResNeXt-50 | 25.51 M | 4.27 G | 71.8 |
SENet [16] | Res2Net | 26.18 M | 4.29 G | 73.9 |
CBAM [32] | Res2Net | 26.20 M | 4.29 G | 70.3 |
ECANet [17] | Res2Net | 23.66 M | 4.29 G | 72.2 |
CANet [33] | Res2Net | 25.58 M | 4.31 G | 70.8 |
FcaNet [34] | Res2Net | 26.18 M | 4.29 G | 71.6 |
SENet [16] | DenseNet-161 | 27.14 M | 7.82 G | 70.4 |
CBAM [32] | DenseNet-161 | 27.14 M | 7.82 G | 73.2 |
ECANet [17] | DenseNet-161 | 26.49 M | 7.82 G | 72.9 |
CANet [33] | DenseNet-161 | 26.98 M | 7.83 G | 71.9 |
Network | Parameters | FLOPs | Top-1 Accuracy (%) |
---|---|---|---|
EMRes-50 [6] | 28.02 M | 4.34 G | 86.1 |
CANet [33] | 25.43 M | 4.14 G | 86.4 |
ECANet [17] | 23.52 M | 4.12 G | 86.3 |
SENet [16] | 26.05 M | 4.12 G | 86.3 |
ResNet-50 [9] | 23.52 M | 4.12 G | 85.8 |
ResNeXt-50 [30] | 22.99 M | 4.26 G | 86.5 |
Res2Net [14] | 23.66 M | 4.29 G | 87.0 |
DenseNet-161 [26] | 26.49 M | 7.82 G | 87.3 |
EPSANet [28] | 20.53 M | 3.63 G | 87.4 |
SKNet [29] | 25.44 M | 4.51 G | 83.8 |
Xception [25] | 20.83 M | 4.58 G | 87.3 |
Ours | 13.81 M | 2.61 G | 87.7 |
Network | Parameters | FLOPs | Top-1 Accuracy (%) |
---|---|---|---|
ResNet50-CA [33] | 25.45 M | 4.14 G | 91.2 |
ResNet50-ECA [17] | 23.53 M | 4.12 G | 91.5 |
ResNet50-SE [16] | 26.05 M | 4.12 G | 91.4 |
ResNet-50 [9] | 23.53 M | 4.12 G | 91.2 |
ResNeXt-50 [30] | 23.00 M | 4.26 G | 93.0 |
Res2Net [14] | 23.67 M | 4.29 G | 93.1 |
DenseNet-161 [26] | 26.49 M | 7.82 G | 92.2 |
EPSANet [28] | 20.53 M | 3.63 G | 94.0 |
SKNet [29] | 25.45 M | 4.51 G | 84.6 |
MobileNet_v3_small [24] | 1.53 M | 58.80 M | 92.2 |
MobileNet_v3_large [24] | 4.21 M | 226.44 M | 92.6 |
ShuffleNet_v2 [27] | 1.26 M | 149.58 M | 92.8 |
SqueezeNet [31] | 0.73 M | 2.65 G | 82.3 |
Xception [25] | 20.83 M | 4.58 G | 92.7 |
Ours | 13.81 M | 2.61 G | 94.4 |
Dataset | Settings | Accuracy (%) |
---|---|---|
Collar6 | Baseline (ResNet-50) | 66.5 |
Collar6 | +Multi-scale Feature Extraction | 79.6 |
Collar6 | +Attention | 68.8 |
Collar6 | +MFA | 80.4 |
DeepFashion6 | Baseline (ResNet-50) | 85.8 |
DeepFashion6 | +Multi-scale Feature Extraction | 87.4 |
DeepFashion6 | +Attention | 86.5 |
DeepFashion6 | +MFA | 87.7 |
CIFAR-10 | Baseline (ResNet-50) | 91.2 |
CIFAR-10 | +Multi-scale Feature Extraction | 93.9 |
CIFAR-10 | +Attention | 91.5 |
CIFAR-10 | +MFA | 94.4 |
Dataset | K | Accuracy (%) |
---|---|---|
Collar6 | 3 | 79.9 |
Collar6 | 5 | 80.4 |
Collar6 | 7 | 80.0 |
Collar6 | 9 | 79.7 |
DeepFashion6 | 3 | 87.0 |
DeepFashion6 | 5 | 87.7 |
DeepFashion6 | 7 | 87.4 |
DeepFashion6 | 9 | 87.2 |
CIFAR-10 | 3 | 94.5 |
CIFAR-10 | 5 | 94.5 |
CIFAR-10 | 7 | 94.3 |
CIFAR-10 | 9 | 94.2 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Qin, X.; Ya, S.; Yuan, C.; Chen, D.; Long, L.; Liao, H. MFANet: A Collar Classification Network Based on Multi-Scale Features and an Attention Mechanism. Mathematics 2023, 11, 1164. https://doi.org/10.3390/math11051164