Fine-Grained Detection Model Based on Attention Mechanism and Multi-Scale Feature Fusion for Cocoon Sorting
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset
2.2. Overview of the Architecture
2.3. Hybrid Feature Extraction Network (HFE-Net)
2.3.1. Local Feature Extraction Block (LFEB)
2.3.2. Global Feature Extraction Block (GFEB)
2.4. Efficient Multi-Scale Feature Fusion Module (EMFF)
2.5. Overall Framework of the Model
2.6. Implementation Details
3. Results and Discussion
3.1. Performance of Designed AMMF-Net
3.2. Ablation Study
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Wen, C.; Wen, J.; Li, J.; Luo, Y.; Chen, M.; Xiao, Z.; Xu, Q.; Liang, X.; An, H. Lightweight Silkworm Recognition Based on Multi-Scale Feature Fusion. Comput. Electron. Agric. 2022, 200, 107234.
- Nahiduzzaman, M.; Chowdhury, M.E.H.; Salam, A.; Nahid, E.; Ahmed, F.; Al-Emadi, N.; Ayari, M.A.; Khandakar, A.; Haider, J. Explainable Deep Learning Model for Automatic Mulberry Leaf Disease Classification. Front. Plant Sci. 2023, 14, 1175515.
- Xiong, H.; Cai, J.; Zhang, W.; Hu, J.; Deng, Y.; Miao, J.; Tan, Z.; Li, H.; Cao, J.; Wu, X. Deep Learning Enhanced Terahertz Imaging of Silkworm Eggs Development. iScience 2021, 24, 103316.
- Wang, Q.; Li, Z.; Gu, T.; Ye, F.; Wang, X. Cocoons Counting and Classification Based on Image Processing. In Proceedings of the 2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Chengdu, China, 17–19 October 2020; pp. 148–152.
- Guo, F.; He, F.; Tao, D.; Li, G. Automatic Exposure Correction Algorithm for Online Silkworm Pupae (Bombyx Mori) Sex Classification. Comput. Electron. Agric. 2022, 198, 107108.
- Sumriddetchkajorn, S.; Kamtongdee, C.; Chanhorm, S. Fault-Tolerant Optical-Penetration-Based Silkworm Gender Identification. Comput. Electron. Agric. 2015, 119, 201–208.
- Tao, D.; Wang, Z.; Li, G.; Qiu, G. Radon Transform-Based Motion Blurred Silkworm Pupa Image Restoration. Int. J. Agric. Biol. Eng. 2019, 12, 152–159.
- Cai, J.; Yuan, L.; Liu, B.; Sun, L. Nondestructive Gender Identification of Silkworm Cocoons Using X-Ray Imaging with Multivariate Data Analysis. Anal. Methods 2014, 6, 7224–7233.
- Vasta, S.; Figorilli, S.; Ortenzi, L.; Violino, S.; Costa, C.; Moscovini, L.; Tocci, F.; Pallottino, F.; Assirelli, A.; Saviane, A.; et al. Automated Prototype for Bombyx Mori Cocoon Sorting Attempts to Improve Silk Quality and Production Efficiency through Multi-Step Approach and Machine Learning Algorithms. Sensors 2023, 23, 868.
- Yang, C.; Peng, J.; Cai, J.; Tang, Y.; Zhou, L.; Yan, Y. Research and Design of a Machine Vision-Based Silk Cocoon Quality Inspection System. In Proceedings of the 2023 IEEE 10th International Conference on Cyber Security and Cloud Computing (CSCloud)/2023 IEEE 9th International Conference on Edge Computing and Scalable Cloud (EdgeCom), Xiangtan, China, 1–3 July 2023; pp. 369–374.
- Li, S.; Sun, W.; Liang, M.; Shao, T. Research on the Identification Method of Silkworm Cocoon Species Based on Improved YOLOv3. In Proceedings of the 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE), Harbin, China, 25–27 December 2020; pp. 1119–1123.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 24 May 2019; pp. 6105–6114.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2017; Volume 30.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929.
- Jiang, Z.-H.; Hou, Q.; Yuan, L.; Zhou, D.; Shi, Y.; Jin, X.; Wang, A.; Feng, J. All Tokens Matter: Token Labeling for Training Better Vision Transformers. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2021; Volume 34, pp. 18590–18602.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- Dong, X.; Bao, J.; Chen, D.; Zhang, W.; Yu, N.; Yuan, L.; Chen, D.; Guo, B. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12124–12134.
- Li, G.; Xu, D.; Cheng, X.; Si, L.; Zheng, C. SimViT: Exploring a Simple Vision Transformer with Sliding Windows. In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan, 18–22 July 2022; pp. 1–6.
- Maaz, M.; Shaker, A.; Cholakkal, H.; Khan, S.; Zamir, S.W.; Anwer, R.M.; Shahbaz Khan, F. EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications. In Proceedings of the Computer Vision—ECCV 2022 Workshops; Karlinsky, L., Michaeli, T., Nishino, K., Eds.; Springer: Cham, Switzerland, 2023; pp. 3–20.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Kai, L.; Li, F.-F. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 22–24 June 2009; pp. 248–255.
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Xiao, T.; Singh, M.; Mintun, E.; Darrell, T.; Dollar, P.; Girshick, R. Early Convolutions Help Transformers See Better. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2021; Volume 34, pp. 30392–30400.
- Li, J.; Xia, X.; Li, W.; Li, H.; Wang, X.; Xiao, X.; Wang, R.; Zheng, M.; Pan, X. Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios. arXiv 2022, arXiv:2207.05501.
- Xu, B.; Wang, N.; Chen, T.; Li, M. Empirical Evaluation of Rectified Activations in Convolutional Network. arXiv 2015, arXiv:1505.00853.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Wang, K.; Liew, J.H.; Zou, Y.; Zhou, D.; Feng, J. PANet: Few-Shot Image Semantic Segmentation with Prototype Alignment. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9197–9206.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Jocher, G.; Stoken, A.; Borovec, J.; NanoCode012; ChristopherSTAN; Changyu, L.; Laughing; Hogan, A.; Lorenzomammana; Tkianai; et al. Ultralytics/YOLOv5: V3.0; Zenodo: Geneva, Switzerland, 2020.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Wang, W.; Xie, E.; Li, X.; Fan, D.-P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. PVT v2: Improved Baselines with Pyramid Vision Transformer. Comp. Visual Media 2022, 8, 415–424.
- Li, Y.; Yuan, G.; Wen, Y.; Hu, J.; Evangelidis, G.; Tulyakov, S.; Wang, Y.; Ren, J. EfficientFormer: Vision Transformers at MobileNet Speed. Adv. Neural Inf. Process. Syst. 2022, 35, 12934–12949.
- Mehta, S.; Rastegari, M. MobileViT: Light-Weight, General-Purpose, and Mobile-Friendly Vision Transformer. arXiv 2022, arXiv:2110.02178.
- Yu, W.; Luo, M.; Zhou, P.; Si, C.; Zhou, Y.; Wang, X.; Feng, J.; Yan, S. MetaFormer Is Actually What You Need for Vision. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10809–10819.
Operator | Input Size | Number | Stride |
---|---|---|---|
Conv2D | 224 × 224 × 3 | 1 | 2 |
Conv2D | 112 × 112 × 36 | 1 | 1 |
Conv2D | 112 × 112 × 24 | 2 | 2, 1 |
PatchEmbed | 56 × 56 × 36 | 1 | - |
LFEB | 56 × 56 × 48 | 3 | 1 |
GFEB | 56 × 56 × 48 | 1 | 1 |
PatchEmbed | 56 × 56 × 48 | 1 | - |
LFEB | 28 × 28 × 96 | 4 | 1 |
GFEB | 28 × 28 × 96 | 1 | 1 |
PatchEmbed | 28 × 28 × 96 | 1 | - |
LFEB | 14 × 14 × 192 | 8 | 1 |
GFEB | 14 × 14 × 240 | 1 | 1 |
PatchEmbed | 14 × 14 × 240 | 1 | - |
LFEB | 7 × 7 × 384 | 3 | 1 |
GFEB | 7 × 7 × 384 | 1 | 1 |
Avg Pool | 7 × 7 × 384 | 1 | - |
FC | 1 × 1 × 384 | 1 | - |
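The stride and input-size columns above can be sanity-checked with a short shape trace. The sketch below is our reading of the table, not the authors' code: it assumes the stem consists of four convolutions with strides 2, 1, 2, 1, that the first PatchEmbed keeps the 56 × 56 resolution, and that the later three PatchEmbeds each halve it.

```python
# Downsampling schedule inferred from the table: four stem convs followed by
# one PatchEmbed per stage (the first PatchEmbed only changes channels).
STRIDES = [2, 1, 2, 1, 1, 2, 2, 2]

def spatial_trace(size=224, strides=STRIDES):
    """Propagate the spatial resolution through the backbone."""
    sizes = [size]
    for s in strides:
        size //= s
        sizes.append(size)
    return sizes

# spatial_trace() == [224, 112, 112, 56, 56, 56, 28, 14, 7]
# The final 7 matches the 7 x 7 x 384 map fed to Avg Pool and the FC head.
```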
Method | Normal P | Normal R | Normal AP | Inferior P | Inferior R | Inferior AP | mAP
---|---|---|---|---|---|---|---
YOLOv3 [32] | 83.51 | 66.20 | 68.13 | 90.92 | 52.86 | 59.62 | 63.87 |
YOLOv5-l [33] | 82.37 | 66.17 | 68.13 | 91.57 | 50.50 | 58.48 | 63.30 |
YOLOX-l [34] | 84.44 | 66.24 | 68.51 | 89.59 | 54.05 | 60.38 | 64.44 |
YOLOv7-l [35] | 85.68 | 66.08 | 68.65 | 84.92 | 55.98 | 60.70 | 64.68 |
Faster RCNN [36] | 87.56 | 65.73 | 68.70 | 91.13 | 51.74 | 59.54 | 64.12 |
Cascade RCNN [37] | 84.74 | 65.30 | 68.20 | 85.77 | 54.70 | 60.18 | 64.19 |
RetinaNet (baseline) [30] | 81.95 | 66.26 | 68.13 | 90.90 | 52.91 | 59.85 | 63.99 |
AMMF-Net (ours) | 89.86 | 66.67 | 69.75 | 90.44 | 57.39 | 62.48 | 66.12 |
Model | mAP (%) | Params (MB) | GFLOPs |
---|---|---|---|
YOLOv3 | 63.87 | 61.95 | 156.62 |
YOLOv5-l | 63.30 | 46.73 | 144.89 |
YOLOX-l | 64.44 | 54.21 | 156.01 |
YOLOv7-l | 64.68 | 37.62 | 106.47 |
Faster RCNN | 64.12 | 41.22 | 182.23 |
Cascade RCNN | 64.19 | 69.17 | 238.10 |
RetinaNet (baseline) | 63.99 | 37.74 | 170.21 |
AMMF-Net (ours) | 66.12 | 21.33 | 135.40 |
Model | Params (MB) | GFLOPs | mAP (%) |
---|---|---|---|
Faster RCNN | 41.22 | 182.23 | 78.2 |
Cascade RCNN | 69.17 | 238.10 | 77.2 |
RetinaNet | 37.74 | 170.21 | 75.8 |
YOLOv3 | 61.95 | 156.62 | 73.3 |
YOLOv5-l | 46.73 | 144.89 | 76.5 |
YOLOX-l | 54.21 | 156.01 | 75.1 |
YOLOv7-l | 37.62 | 106.47 | 76.0 |
AMMF-Net (ours) | 21.33 | 135.40 | 77.6 |
Dataset | Method | Params (MB) | GFLOPs | mAP (%) |
---|---|---|---|---|
Cocoon | RetinaNet | 37.74 | 170.21 | 63.99 |
Cocoon | RetinaNet + HFE-Net | 20.03 | 129.12 | 64.19 |
Cocoon | RetinaNet + HFE-Net + EMFF (AMMF-Net) | 21.33 | 135.40 | 66.12 |
VOC07 + 12 | RetinaNet | 37.74 | 170.21 | 75.8 |
VOC07 + 12 | RetinaNet + HFE-Net | 20.03 | 129.12 | 76.8 |
VOC07 + 12 | RetinaNet + HFE-Net + EMFF | 21.33 | 135.40 | 77.6 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zheng, H.; Guo, X.; Ma, Y.; Zeng, X.; Chen, J.; Zhang, T. Fine-Grained Detection Model Based on Attention Mechanism and Multi-Scale Feature Fusion for Cocoon Sorting. Agriculture 2024, 14, 700. https://doi.org/10.3390/agriculture14050700