HRA-YOLO: An Effective Detection Model for Underwater Fish
Abstract
1. Introduction
- Build a fish dataset covering different water quality conditions. To ensure practicality in varied real environments, fish data were collected in two distinct water quality settings and supplemented with laboratory images taken in clear water and with fish images from the Internet. Data augmentation was then applied to the collected data to form the datasets.
- Construct HRA-YOLO by reshaping the YOLOv8s model. We modify YOLOv8s in two ways: adopting the lightweight HGNetV2 network as its backbone, and replacing the bottleneck of its C2f module with a newly designed residual attention (RA) module.
- Perform comprehensive experiments and analyze the experimental results from multiple perspectives.
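As a rough illustration of the offline augmentation idea in the first contribution above (function names and parameter values here are hypothetical, not the paper's exact operations), intensity and geometric transforms can be applied to the collected images before training:

```python
import numpy as np

def intensity_transform(img: np.ndarray, gain: float = 1.2, bias: float = 10.0) -> np.ndarray:
    """Brightness/contrast adjustment: scale and shift pixel values, clip to [0, 255]."""
    return np.clip(img.astype(np.float32) * gain + bias, 0, 255).astype(np.uint8)

def geometric_transform(img: np.ndarray) -> np.ndarray:
    """Horizontal flip; bounding-box x-coordinates would be mirrored accordingly."""
    return img[:, ::-1].copy()

def augment(images: list) -> list:
    """Offline augmentation: keep the originals and append transformed copies."""
    out = list(images)
    for img in images:
        out.append(intensity_transform(img))
        out.append(geometric_transform(img))
    return out
```

In an offline setting such as this, the augmented copies are written to disk alongside the originals, which matches the dataset-size comparison reported in Section 3.2.2.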
2. Materials and Methods
2.1. YOLOv8s Model
2.2. HRA-YOLO Model
2.2.1. Lightweight Backbone Network
2.2.2. Dilation-Wise Residual Module (DWR)
2.2.3. Residual Attention (RA) Structure
2.2.4. Residual Attention Feature Extraction Module (RAFE)
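The RA structure named above combines a feature-extraction branch with an attention gate and a residual shortcut. Purely as a minimal numpy sketch of that general residual-attention pattern (the toy channel gate below is a stand-in for illustration, not the paper's EMA-based design):

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def channel_gate(x: np.ndarray) -> np.ndarray:
    """Toy channel attention on a (C, H, W) tensor: global average pool
    per channel, then gate each channel with a weight in (0, 1)."""
    w = sigmoid(x.mean(axis=(1, 2), keepdims=True))
    return x * w

def residual_attention(x: np.ndarray, feature_branch) -> np.ndarray:
    """Residual attention pattern: y = x + attention(features(x)).
    The shortcut preserves the input signal even when the gated branch is weak."""
    return x + channel_gate(feature_branch(x))
```

The residual shortcut is the key design choice: if the attended branch contributes nothing, the module degenerates to the identity, which eases optimization of the deeper feature-extraction path.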
3. Experimental Results and Discussion
3.1. Evaluation Metrics and Experimental Environment
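The precision, recall, and mAP values reported throughout this section are all derived from IoU-based matching of predicted boxes to ground truth. A minimal sketch of the underlying quantities, with boxes as (x1, y1, x2, y2) tuples:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN).
    A prediction counts as TP when its IoU with a ground-truth box
    exceeds a chosen threshold (commonly 0.5 for mAP@0.5)."""
    return tp / (tp + fp), tp / (tp + fn)
```

mAP then averages, over classes, the area under the precision-recall curve obtained by sweeping the detection confidence threshold.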
3.2. Production of Experimental Data
3.2.1. Data Collection and Annotation
3.2.2. Offline Data Augmentation
3.3. Experimental Results of the HRA-YOLO Model
3.4. Comparison of Different Attention Mechanisms
3.5. Ablation Experiments
3.6. Comparison of Different Object Detection Models
3.7. Cross-Dataset Validation
3.8. Missed Detection Analysis
4. Conclusions
- Explore more efficient network structures and optimization algorithms to further enhance the real-time data processing speed of the model.
- Investigate underwater image enhancement algorithms and data augmentation methods to reduce false detection rates, improve detection precision, and enhance the model generalization ability.
- Research knowledge distillation and model pruning techniques to further reduce the model size while maintaining precision.
- Further study cross-domain transfer learning and multimodal fusion techniques to improve the model’s adaptability in environments where the visible spectrum is unreliable, in addition to other practical scenarios.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- FAO. The State of World Fisheries and Aquaculture 2022. Towards Blue Transformation; FAO: Rome, Italy, 2022; pp. 1–15.
- Spampinato, C.; Giordano, D.; Salvo, R.D.; Chen-Burger, Y.-H.J.; Fisher, R.B.; Nadarajan, G. Automatic fish classification for underwater species behavior understanding. In Proceedings of the First ACM International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams, Firenze, Italy, 29 October 2010; pp. 45–50.
- Tharwat, A.; Hemedan, A.A.; Hassanien, A.E.; Gabel, T. A biometric-based model for fish species classification. Fish. Res. 2018, 204, 324–336.
- Xu, C.; Wang, Z.; Du, R.; Li, Y.; Li, D.; Chen, Y.; Li, W.; Liu, C. A method for detecting uneaten feed based on improved YOLOv5. Comput. Electron. Agric. 2023, 212, 108101.
- Fernandes, A.F.A.; Turra, E.M.; de Alvarenga, É.R.; Passafaro, T.L.; Lopes, F.B.; Alves, G.F.O.; Singh, V.; Rosa, G.J.M. Deep Learning image segmentation for extraction of fish body measurements and prediction of body weight and carcass traits in Nile tilapia. Comput. Electron. Agric. 2020, 170, 105274.
- Xu, X.; Li, W.; Duan, Q. Transfer learning and SE-ResNet152 networks-based for small-scale unbalanced fish species identification. Comput. Electron. Agric. 2021, 180, 105878.
- Chen, L.; Yin, X. Recognition Method of Abnormal Behavior of Marine Fish Swarm Based on In-Depth Learning Network Model. J. Web Eng. 2021, 20, 575–596.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Jalal, A.; Salman, A.; Mian, A.; Shortis, M.; Shafait, F. Fish detection and species classification in underwater environments using deep learning with temporal information. Ecol. Inform. 2020, 57, 101088.
- Zhang, M.; Xu, S.; Song, W.; He, Q.; Wei, Q. Lightweight Underwater Object Detection Based on YOLO v4 and Multi-Scale Attentional Feature Fusion. Remote Sens. 2021, 13, 4706.
- Li, L.; Shi, G.; Jiang, T. Fish detection method based on improved YOLOv5. Aquac. Int. 2023, 31, 2513–2530.
- Liu, K.; Sun, Q.; Sun, D.; Peng, L.; Yang, M.; Wang, N. Underwater Target Detection Based on Improved YOLOv7. J. Mar. Sci. Eng. 2023, 11, 677.
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO, Version 8.0.0. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 11 January 2024).
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-time Object Detection. arXiv 2023, arXiv:2304.08069.
- Wei, H.; Liu, X.; Xu, S.; Dai, Z.; Dai, Y.; Xu, X. DWRSeg: Rethinking Efficient Acquisition of Multi-scale Contextual Information for Real-time Semantic Segmentation. arXiv 2022, arXiv:2212.01173.
- Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
- Bono, F.M.; Radicioni, L.; Cinquemani, S. A novel approach for quality control of automated production lines working under highly inconsistent conditions. Eng. Appl. Artif. Intell. 2023, 122, 106149.
- Khalifa, N.E.; Loey, M.; Mirjalili, S. A comprehensive survey of recent trends in deep learning for digital images augmentation. Artif. Intell. Rev. 2022, 55, 2351–2377.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13708–13717.
- Liu, Y.; Shao, Z.; Teng, Y.; Hoffmann, N. NAM: Normalization-based Attention Module. arXiv 2021, arXiv:2111.12419.
- Yang, L.; Zhang, R.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Volume 139, pp. 11863–11874.
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J. YOLOv5 by Ultralytics, Version 7.0. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 2 March 2024).
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787.
- Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.
- Roboflow100. Fish Market Dataset. 2023. Available online: https://universe.roboflow.com/roboflow-100/fish-market-ggjso (accessed on 20 August 2024).
Type of Datasets | Number of Images | Precision/% | Recall/% | mAP/% |
---|---|---|---|---|
Original | 3080 | 91.1 | 86.8 | 92.3 |
Original + Intensity Transformation | 4080 | 91.8 | 89.0 | 93.9 |
Original + Geometric Transformation | 4080 | 92.1 | 88.5 | 94.4 |
Original + Both Intensity and Geometric Transformation | 4080 | 91.8 | 87.4 | 93.5 |
Model | Precision/% | Recall/% | mAP/% | FLOPs/G | Parameters/M | Speed/FPS | Model Size/MB |
---|---|---|---|---|---|---|---|
YOLOv8s | 92.1 | 88.5 | 94.4 | 28.4 | 11.125971 | 124.6 | 22.5 |
HRA-YOLO | 93.1 | 88.3 | 94.5 | 23.0 | 8.225795 | 103.3 | 16.8 |
Model | Precision/% | Recall/% | mAP/% |
---|---|---|---|
DWR (no mechanism) | 91.8 | 87.9 | 93.7 |
DWR + CA | 92.7 | 86.1 | 93.0 |
DWR + NAM | 92.1 | 88.0 | 93.7 |
DWR + SimAM | 91.5 | 88.7 | 94.5 |
DWR + EMA | 93.1 | 88.3 | 94.5 |
Experiments | YOLOv8s | HGNetV2 | DWR | RAFE | Precision/% | Recall/% | mAP/% | FLOPs/G |
---|---|---|---|---|---|---|---|---|
1 | ✓ | | | | 92.1 | 88.5 | 94.4 | 28.4 |
2 | ✓ | ✓ | | | 92.4 | 87.6 | 94.2 | 23.3 |
3 | ✓ | | ✓ | | 91.7 | 88.7 | 93.9 | 27.8 |
4 | ✓ | | | ✓ | 92.3 | 89.1 | 94.2 | 28.1 |
5 | ✓ | ✓ | ✓ | | 91.8 | 87.9 | 93.7 | 22.7 |
6 | ✓ | ✓ | | ✓ | 93.1 | 88.3 | 94.5 | 23.0 |
Model | Precision/% | Recall/% | mAP/% | Speed/FPS | FLOPs/G |
---|---|---|---|---|---|
SSD | 89.3 | 81.9 | 92.0 | 28.7 | 84.1 |
EfficientDet | 91.0 | 83.4 | 90.4 | 18.7 | 19.2 |
RT-DETR-L | 91.2 | 88.3 | 93.2 | 57.6 | 100.6 |
YOLOv5s | 91.3 | 87.5 | 93.4 | 138.5 | 15.8 |
RC-YOLOv5 | 92.0 | 87.3 | 93.8 | 143.6 | 12.6 |
YOLOv7-tiny | 91.1 | 88.4 | 94.0 | 133.3 | 13.2 |
YOLOv9s | 91.7 | 89.7 | 94.7 | 87.6 | 26.7 |
YOLOv10s | 91.9 | 87.8 | 93.9 | 100.7 | 24.4 |
HRA-YOLO | 93.1 | 88.3 | 94.5 | 103.3 | 23.0 |
Model | Precision/% | Recall/% | mAP/% | Parameters/M | Speed/FPS |
---|---|---|---|---|---|
YOLOv8s | 98.9 | 99.1 | 99.2 | 11.132937 | 122.2 |
HRA-YOLO | 99.5 | 99.0 | 99.6 | 8.232761 | 102.9 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, H.; Zhang, J.; Cheng, H. HRA-YOLO: An Effective Detection Model for Underwater Fish. Electronics 2024, 13, 3547. https://doi.org/10.3390/electronics13173547