Efficient Small-Object Detection in Underwater Images Using the Enhanced YOLOv8 Network
Abstract
1. Introduction
- (1) To achieve model lightweighting, we replace the YOLOv8 backbone with FasterNet-T0 [11], trading a slight loss in accuracy for faster training and fewer model parameters (a sketch of FasterNet's core PConv operator appears in Section 3.1).
- (2) To improve small-object detection accuracy, we first add a dedicated small-object prediction head to YOLOv8, since underwater images often contain many small objects. This head predicts from high-resolution feature maps, making it more sensitive to small objects, and we also tune the channel counts of the feature maps at each resolution (Section 3.2). Second, we improve the detection of small and occluded objects in dense underwater scenes by adopting Deformable ConvNets v2 [12] and by incorporating Coordinate Attention [13], which embeds positional information into channel attention at almost no computational cost and helps the network locate regions of interest (Sections 3.3 and 3.4).
- (3) With our lightweight model, we achieve 52.12% AP on the UTDAC2020 underwater dataset [6], surpassing the much larger YOLOv8l (51.69% AP); when the input resolution is increased to 1280, the AP reaches 53.18%. We also obtain 84.4% mAP on the Pascal VOC dataset, surpassing previous well-established detectors. These results demonstrate the effectiveness of our method in underwater environments and its generalization to common datasets.
2. Related Work
2.1. YOLOv8 Network
2.2. Lightweight Networks
2.3. Small-Object Detection
3. Approach
3.1. Faster Neural Networks
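FasterNet owes most of its speed to partial convolution (PConv), which convolves only a fraction of the input channels and passes the rest through untouched, cutting both FLOPs and memory accesses. Below is a minimal PyTorch sketch of the operator as described in [11]; the 1/4 split ratio is the paper's default, and the rest of the FasterNet block (pointwise convolutions, normalization, activation) is omitted.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution (PConv): a 3x3 conv is applied to the first
    C/ratio channels only; the remaining channels are passed through
    unchanged, which reduces FLOPs and memory access."""
    def __init__(self, dim: int, ratio: int = 4):
        super().__init__()
        self.dim_conv = dim // ratio          # channels that are convolved
        self.dim_keep = dim - self.dim_conv   # channels kept as-is
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv,
                              kernel_size=3, stride=1, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_keep], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)
```

For a 64-channel input, only 16 channels are convolved, so the 3x3 convolution costs roughly 1/16 of the FLOPs of the same convolution applied to all channels.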
3.2. Prediction Head for Small Objects
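As a simplified illustration of the extra head, the sketch below predicts from four feature-map scales (strides 4/8/16/32) instead of YOLOv8's usual three (8/16/32); the stride-4 branch is the added small-object head. The module name, channel widths, and single-convolution predictor are illustrative assumptions only: YOLOv8's actual head uses decoupled classification/regression branches with distribution focal loss, which this sketch omits.

```python
import torch.nn as nn

class MultiScaleHead(nn.Module):
    """Hypothetical four-scale prediction head. The first branch consumes the
    stride-4 (highest-resolution) map, where small objects cover more cells."""
    def __init__(self, in_channels=(64, 128, 256, 512), num_outputs=8):
        # num_outputs = 4 box coordinates + 4 class scores (UTDAC2020 classes)
        super().__init__()
        self.preds = nn.ModuleList(
            nn.Conv2d(c, num_outputs, kernel_size=1) for c in in_channels
        )

    def forward(self, feats):
        # feats: [P2, P3, P4, P5], highest resolution first; a 640x640 input
        # yields a 160x160 stride-4 map versus 80x80 at stride 8.
        return [pred(f) for pred, f in zip(self.preds, feats)]
```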
3.3. Coordinate Attention
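A minimal sketch of Coordinate Attention [13]: the input is average-pooled along height and width separately, so the two resulting attention maps each retain exact positional information along one spatial direction. For brevity, this sketch omits the batch normalization used in the original and uses PyTorch's Hardswish for the h-swish activation.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        x_h = self.pool_h(x)                      # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        # shared transform over the concatenated directional descriptors
        y = self.act(self.conv1(torch.cat([x_h, x_w], dim=2)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w  # broadcasts over W and H respectively
```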
3.4. Deformable ConvNets V2
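DCNv2 extends deformable convolution with a learned modulation mask that scales the contribution of each sampled location, which helps suppress irrelevant context around occluded or densely packed objects. The sketch below assembles such a layer from torchvision's DeformConv2d, which accepts an optional modulation mask; predicting offsets and masks with zero-initialized convolutions is common practice, not necessarily the exact configuration used in this paper.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DCNv2Layer(nn.Module):
    """Modulated deformable convolution (DCNv2-style) built on torchvision."""
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        # per kernel sample point: 2 offsets (x, y) and 1 modulation scalar
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, padding=k // 2)
        self.mask = nn.Conv2d(c_in, k * k, k, padding=k // 2)
        self.dcn = DeformConv2d(c_in, c_out, k, padding=k // 2)
        # zero init: start from a regular grid with neutral modulation (0.5)
        nn.init.zeros_(self.offset.weight)
        nn.init.zeros_(self.offset.bias)
        nn.init.zeros_(self.mask.weight)
        nn.init.zeros_(self.mask.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offset = self.offset(x)
        mask = torch.sigmoid(self.mask(x))  # modulation in [0, 1]
        return self.dcn(x, offset, mask)
```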
4. Experiments
4.1. Datasets
4.1.1. UTDAC2020
4.1.2. Pascal VOC
4.2. Implementation Details
4.3. Comparisons with Other State-of-the-Art Methods
4.3.1. Results on UTDAC2020
Method | Backbone | AP | AP50 | Parameters (M) | FLOPs (G) | Model Size (MB) |
---|---|---|---|---|---|---|
Faster R-CNN w/FPN [29] | ResNet50 | 44.50 | 80.90 | 41.14 | 63.26 | ~ |
Cascade R-CNN [39] | ResNet50 | 46.60 | 81.50 | 68.94 | 91.06 | ~ |
RetinaNet [40] | ResNet50 | 43.90 | 80.40 | 36.17 | 52.62 | ~ |
FCOS [41] | ResNet50 | 43.90 | 81.10 | 31.84 | 50.36 | ~ |
Deformable DETR [42] | ResNet50 | 46.60 | 84.10 | ~ | ~ | ~ |
Libra R-CNN [43] | ResNet50 | 45.80 | 82.00 | 41.40 | 63.53 | ~ |
Dynamic R-CNN [44] | ResNet50 | 45.60 | 80.10 | 41.14 | 63.26 | ~ |
ATSS [45] | ResNet50 | 46.20 | 82.50 | 31.89 | 51.58 | ~ |
Boosting R-CNN [6] | ResNet50 | 48.50 | 82.40 | 43.55 | 53.17 | ~ |
Boosting R-CNN * [6] | ResNet50 | 51.40 | 85.50 | 45.91 | 54.67 | ~ |
YOLOv8n | Darknet-53 | 49.07 | 82.73 | 3.0 | 8.1 | 6 |
YOLOv8s | Darknet-53 | 50.50 | 84.73 | 11.1 | 28.4 | 22 |
YOLOv8m | Darknet-53 | 51.74 | 85.11 | 25.8 | 78.7 | 50 |
YOLOv8l | Darknet-53 | 51.69 | 84.85 | 43.6 | 164.8 | 84 |
Ours | FasterNet-T0 | 52.12 | 85.49 | 8.5 | 25.5 | 17 |
Ours (1280) | FasterNet-T0 | 53.18 | 86.21 | 8.5 | 25.5 | 18 |
4.3.2. Results on Pascal VOC
4.4. Ablation Study
5. Discussion
5.1. The Impact of High-Resolution Feature Maps (Prediction Head for Small Objects)
5.2. The Impact of Low-Resolution Feature Maps on Large Object Detection
5.3. Applicable in Various Underwater Marine Scenarios
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Deans, C.; Marmugi, L.; Renzoni, F. Active underwater detection with an array of atomic magnetometers. Appl. Opt. 2018, 57, 2346–2351. [Google Scholar] [CrossRef]
- Czub, M.; Kotwicki, L.; Lang, T.; Sanderson, H.; Klusek, Z.; Grabowski, M.; Szubska, M.; Jakacki, J.; Andrzejewski, J.; Rak, D. Deep sea habitats in the chemical warfare dumping areas of the Baltic Sea. Sci. Total Environ. 2018, 616, 1485–1497. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Song, P.; Li, P.; Dai, L.; Wang, T.; Chen, Z. Boosting R-CNN: Reweighting R-CNN samples by RPN’s error for underwater object detection. Neurocomputing 2023, 530, 150–164. [Google Scholar] [CrossRef]
- Zhang, M.; Xu, S.; Song, W.; He, Q.; Wei, Q. Lightweight underwater object detection based on YOLO v4 and multi-scale attentional feature fusion. Remote Sens. 2021, 13, 4706. [Google Scholar] [CrossRef]
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Chen, J.; Kao, S.-h.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks. arXiv 2023, arXiv:2303.03667. [Google Scholar]
- Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable convnets v2: More deformable, better results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9308–9316. [Google Scholar]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar]
- Wang, C.-Y.; Liao, H.-Y.M.; Yeh, I.-H. Designing Network Design Strategies Through Gradient Path Analysis. arXiv 2022, arXiv:2211.04800. [Google Scholar]
- Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. Tood: Task-aligned one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 3490–3499. [Google Scholar]
- Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21002–21012. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
- Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar]
- Mehta, S.; Rastegari, M. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar]
- Huang, H.; Zhou, H.; Yang, X.; Zhang, L.; Qi, L.; Zang, A.-Y. Faster R-CNN for marine organisms detection and recognition using data augmentation. Neurocomputing 2019, 337, 372–384. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Li, W.; Zhang, L.; Wu, C.; Cui, Z.; Niu, C. A new lightweight deep neural network for surface scratch detection. Int. J. Adv. Manuf. Technol. 2022, 123, 1999–2015. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Takano, N.; Alaghband, G. SRGAN: Training dataset matters. arXiv 2019, arXiv:1903.09922. [Google Scholar]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
- Alsubaei, F.S.; Al-Wesabi, F.N.; Hilal, A.M. Deep learning-based small object detection and classification model for garbage waste management in smart cities and iot environment. Appl. Sci. 2022, 12, 2281. [Google Scholar] [CrossRef]
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
- Sadeghi, M.A.; Forsyth, D. 30 Hz object detection with DPM v5. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Part I, pp. 65–79. [Google Scholar]
- Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159. [Google Scholar]
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards balanced learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830. [Google Scholar]
- Zhang, H.; Chang, H.; Ma, B.; Wang, N.; Chen, X. Dynamic R-CNN: Towards high quality object detection via dynamic training. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part XV, pp. 260–275. [Google Scholar]
- Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9759–9768. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 1, pp. 91–99. [Google Scholar]
- Gidaris, S.; Komodakis, N. Object detection via a multi-region and semantic segmentation-aware cnn model. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1134–1142. [Google Scholar]
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 379–387. [Google Scholar]
- Zhu, Y.; Zhao, C.; Wang, J.; Zhao, X.; Wu, Y.; Lu, H. Couplenet: Coupling global structure with local parts for object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4126–4134. [Google Scholar]
- Shen, Z.; Liu, Z.; Li, J.; Jiang, Y.-G.; Chen, Y.; Xue, X. Dsod: Learning deeply supervised object detectors from scratch. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1919–1927. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Part I, pp. 21–37. [Google Scholar]
- Zhou, P.; Ni, B.; Geng, C.; Hu, J.; Xu, Y. Scale-transferrable object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 528–537. [Google Scholar]
- Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-shot refinement neural network for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4203–4212. [Google Scholar]
- Fu, C.-Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional single shot detector. arXiv 2017, arXiv:1701.06659. [Google Scholar]
- Fan, B.; Chen, W.; Cong, Y.; Tian, J. Dual refinement underwater object detection network. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part XX, pp. 275–291. [Google Scholar]
- Zhang, Z.; Qiao, S.; Xie, C.; Shen, W.; Wang, B.; Yuille, A.L. Single-shot object detection with enriched semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5813–5821. [Google Scholar]
- Kong, T.; Sun, F.; Tan, C.; Liu, H.; Huang, W. Deep feature pyramid reconfiguration for object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 169–185. [Google Scholar]
- Pang, Y.; Wang, T.; Anwer, R.M.; Khan, F.S.; Shao, L. Efficient featurized image pyramid network for single shot detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7336–7344. [Google Scholar]
- Liu, S.; Huang, D. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400. [Google Scholar]
- Li, Y.; Li, J.; Lin, W.; Li, J. Tiny-DSOD: Lightweight object detection for resource-restricted usages. arXiv 2018, arXiv:1807.11013. [Google Scholar]
- Wang, R.J.; Li, X.; Ling, C.X. Pelee: A real-time object detection system on mobile devices. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; pp. 1967–1976. [Google Scholar]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
- Jiang, L.; Wang, Y.; Jia, Q.; Xu, S.; Liu, Y.; Fan, X.; Li, H.; Liu, R.; Xue, X.; Wang, R. Underwater species detection using channel sharpening attention. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 4259–4267. [Google Scholar]
Experimental environment.

Environment | Version or Model Number |
---|---|
CPU | Intel(R) Xeon(R) Silver 4210R CPU @ 2.40 GHz |
GPU | 2 × GeForce RTX 2080 Ti, 11 GB memory each |
OS | Ubuntu 18.04 |
CUDA | V 10.2 |
cuDNN | V 7.6.5 |
PyTorch | V 1.12.1 |
Python | V 3.8.16 |
Evaluation metrics.

Metric | Description |
---|---|
AP50 | The mean average precision (mAP) at an intersection-over-union (IoU) threshold of 0.50. |
AP | The mAP averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05. |
Parameters | The total number of parameters in the network. |
FLOPs | The total number of floating-point operations performed per inference, a measure of computational cost. |
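For reference, the AP metric above follows the COCO convention: average precision is computed at each IoU threshold from 0.50 to 0.95 in steps of 0.05 and then averaged. A small sketch of the IoU computation and the threshold averaging, with the per-threshold AP routine (greedy matching plus the precision–recall integral) left as a placeholder:

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

iou_thresholds = np.arange(0.50, 0.951, 0.05)  # 0.50, 0.55, ..., 0.95
# AP = np.mean([ap_at(t) for t in iou_thresholds])  # ap_at: placeholder
```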
Comparison with state-of-the-art detectors on the Pascal VOC dataset ("~" = not reported).

Method | Backbone | Input Size | mAP (%) | Parameters (M) |
---|---|---|---|---|
Two-stage detectors: | | | | |
Faster R-CNN [46] | VGGNet | 1000 × 600 | 73.2 | 134.7
Faster R-CNN [5] | ResNet-101 | 1000 × 600 | 76.4 | 60.13
MR-CNN [47] | VGG16 | 1000 × 600 | 78.2 | ~ |
R-FCN [48] | ResNet50 | 1000 × 600 | 77.4 | 31.9 |
CoupleNet [49] | ResNet101 | 1000 × 600 | 82.7 | ~ |
DSOD300 [50] | DS/64-192-48-1 | 300 × 300 | 77.7 | 14.8 |
Boosting R-CNN [6] | ResNet50 | 1000 × 600 | 81.9 | 43.6 |
Boosting R-CNN* [6] | ResNet50 | 1000 × 600 | 83.0 | 45.9 |
One-stage detectors: | | | | |
SSD512 [51] | VGG16 | 512 × 512 | 76.8 | ~ |
STDN513 [52] | DenseNet169 | 513 × 513 | 80.9 | ~ |
RefineDet512 [53] | VGG16 | 512 × 512 | 81.8 | ~ |
DSSD513 [54] | ResNet101 | 513 × 513 | 81.5 | ~ |
RetinaNet [40] | ResNet50 | 1000 × 600 | 77.3 | 36.2 |
FERNet [55] | VGG16 + ResNet50 | 512 × 512 | 81.0 | ~ |
DES512 [56] | VGG16 | 512 × 512 | 81.7 | ~ |
DFPR512 [57] | VGG16 | 512 × 512 | 81.1 | ~ |
EFIPNet512 [58] | VGG16 | 512 × 512 | 81.8 | ~ |
RFBNet512 [59] | VGG16 | 512 × 512 | 82.1 | ~ |
Lightweight detectors: | | | | |
SqueezeNet-SSD [60] | SqueezeNet | 300 × 300 | 64.3 | 5.5 |
MobileNet-SSD [60] | MobileNet | 300 × 300 | 68.0 | 5.5 |
Pelee [61] | PeleeNet | 300 × 300 | 70.9 | 6.0 |
Tiny-DSOD [60] | G/32-48-64-80 | 300 × 300 | 72.1 | 1.0 |
YOLO detectors: | | | | |
YOLOv8n | Darknet-53 | 640 × 640 | 80.4 | 3.0 |
YOLOv8s | Darknet-53 | 640 × 640 | 83.9 | 11.1 |
YOLOv8m | Darknet-53 | 640 × 640 | 86.1 | 25.9 |
Ours | FasterNet-T0 | 640 × 640 | 84.4 | 8.5 |
Ablation study on UTDAC2020; per-class columns report AP for each category.

Setting | AP | Echinus | Starfish | Holothurian | Scallop | Parameters (M) | FLOPs (G) | Model Size (MB) |
---|---|---|---|---|---|---|---|---|
Baseline-YOLOv8s | 50.50 | 52.46 | 55.38 | 40.36 | 53.80 | 11.1 | 28.4 | 22 |
+FasterNet-T0 | 49.69 | 52.04 | 54.33 | 39.34 | 53.05 | 8.6 | 21.7 | 17 |
+FasterNet-T0, +Phead | 50.89 | 53.28 | 55.21 | 39.94 | 55.15 | 8.0 | 30.7 | 16 |
+FasterNet-T0, +Phead, +CA | 51.40 | 53.11 | 56.54 | 41.12 | 54.82 | 8.0 | 30.8 | 16 |
+FasterNet-T0, +Phead, +CA, +DCNv2 (640) | 52.12 | 53.92 | 56.85 | 42.51 | 55.22 | 8.5 | 25.5 | 17 |
+FasterNet-T0, +Phead, +CA, +DCNv2 (1280) | 53.18 | 53.08 | 57.64 | 44.87 | 57.13 | 8.5 | 25.5 | 18 |