LC-YOLO: A Lightweight Model with Efficient Utilization of Limited Detail Features for Small Object Detection
Abstract
1. Introduction
2. Related Work
2.1. Backbone for Feature Extraction
2.2. Neck for Feature Fusion
2.3. Head for Object Detection
3. Methodology
3.1. Review of YOLOv5 Model
3.2. Laplace Bottleneck (LB)
3.2.1. Laplace Operator
3.2.2. Bottleneck
3.2.3. Laplace Bottleneck
3.3. Cross-Layer Attention Upsampling (CLAU)
3.3.1. Nearest Neighbor Sampling
3.3.2. Scaled Dot Product Attention
3.3.3. Cross-Layer Attention Upsampling
3.4. LC-YOLO Model Network Structure
4. Experiments
4.1. Experiment Setup
4.1.1. Dataset
4.1.2. Evaluation Criteria
4.1.3. Experiments Details
4.2. Ablation Experiments
4.3. Comparison with State-of-the-Art Models
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Wang, C.Y.; Liao, H.; Wu, Y.H.; Chen, P.Y.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Zhou, P.; Ni, B.; Geng, C.; Hu, J.; Xu, Y. Scale-transferrable object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
- Zhao, Q.; Sheng, T.; Wang, Y.; Tang, Z.; Chen, Y.; Cai, L.; Ling, H. M2Det: A single-shot object detector based on multi-level feature pyramid network. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019.
- Islam, M.A.; Rochan, M.; Bruce, N.; Yang, W. Gated feedback refinement network for dense image labeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
- Gao, H.; Wang, Z.; Cai, L.; Ji, S. ChannelNets: Compact and efficient convolutional neural networks via channel-wise convolutions. Adv. Neural Inf. Process. Syst. 2018, 31.
- Zhang, T.; Qi, G.J.; Xiao, B.; Wang, J. Interleaved group convolutions for deep neural networks. arXiv 2017, arXiv:1707.02725.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
- Li, Y.; Chen, Y.; Dai, X.; Chen, D.; Liu, M.; Yuan, L.; Liu, Z.; Zhang, L.; Vasconcelos, N. MicroNet: Improving image recognition with extremely low FLOPs. In Proceedings of the International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Ding, J.; Xue, N.; Long, Y.; Xia, G.-S.; Lu, Q. Learning RoI Transformer for oriented object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858.
- Ming, Q.; Zhou, Z.; Miao, L.; Zhang, H.; Li, L. Dynamic anchor learning for arbitrary-oriented object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 2355–2363.
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6568–6577.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
- Yang, Z.; Liu, S.; Hu, H.; Wang, L.; Lin, S. RepPoints: Point set representation for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
- Li, W.; Zhu, J. Oriented RepPoints for aerial object detection. arXiv 2021, arXiv:2105.11111.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N. Attention is all you need. arXiv 2017, arXiv:1706.03762.
- Zhu, H.; Chen, X.; Dai, W.; Fu, K.; Ye, Q.; Jiao, J. Orientation robust object detection in aerial images using deep convolutional neural network. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 3735–3739.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
Category | Relative Scale 1 | Absolute Scale 2 |
---|---|---|
Car | 84.68% | 56.18% |
Airplane | 53.23% | 20.14% |
Methods | Precision Car (%) | Precision Airplane (%) | Recall Car (%) | Recall Airplane (%)
---|---|---|---|---
YOLOv5s | 89.70 | 97.29 | 79.72 | 93.14 |
YOLOv5s+LB | 87.57 | 98.60 (+1.31) | 85.64 (+5.92) | 81.35 |
YOLOv5s+CLAU | 89.40 | 97.59 (+0.30) | 85.30 (+5.58) | 93.84 (+0.70) |
LC-YOLO | 92.65 (+2.95) | 96.87 | 85.78 (+6.06) | 95.87 (+2.73) |
YOLOv5n | 89.13 | 95.80 | 78.07 | 93.85 |
YOLOv5n+LB | 91.61 (+2.48) | 97.98 (+2.18) | 81.27 (+3.20) | 89.88 |
YOLOv5n+CLAU | 83.07 | 98.85 (+3.05) | 88.56 (+10.49) | 79.21 |
Tiny LC-YOLO | 91.85 (+2.72) | 96.72 (+0.92) | 83.11 (+5.04) | 95.50 (+1.65) |
Methods | mAP@0.5 Car (%) | mAP@0.5 Airplane (%) | mAP@0.5 (%)
---|---|---|---
YOLOv5s | 87.16 | 96.15 | 91.66 |
YOLOv5s+LB | 90.32 (+3.07) | 93.55 | 91.94 (+0.28) |
YOLOv5s+CLAU | 90.64 (+3.48) | 97.58 (+1.43) | 94.11 (+2.45) |
LC-YOLO | 91.82 (+4.66) | 98.10 (+1.95) | 94.96 (+3.30) |
YOLOv5n | 86.64 | 96.09 | 91.36 |
YOLOv5n+LB | 89.88 (+3.24) | 95.23 | 92.10 (+0.74) |
YOLOv5n+CLAU | 90.86 (+4.22) | 92.51 | 91.69 (+0.33) |
Tiny LC-YOLO | 90.84 (+4.20) | 97.51 (+1.45) | 94.17 (+2.81) |
Methods | Parameters (M) | Inference (ms) | Computation (GFLOPs)
---|---|---|---|
YOLOv5s | 7.06 | 10.8 | 16.3 |
YOLOv5s+LB | 7.06 | 13.8 | 16.3 |
YOLOv5s+CLAU | 7.30 | 12.1 | 17.6 |
LC-YOLO | 7.30 | 15.6 | 17.6 |
YOLOv5n | 1.77 | 10.2 | 4.3 |
YOLOv5n+LB | 1.77 | 13.1 | 4.3 |
YOLOv5n+CLAU | 1.83 | 11.6 | 4.6 |
Tiny LC-YOLO | 1.83 | 14.4 | 4.6 |
Methods | Backbone | Parameters (M) | Precision (%) | Recall (%) | mAP@0.5 Car (%) | mAP@0.5 Airplane (%) | mAP@0.5 (%)
---|---|---|---|---|---|---|---
YOLOv3 | Darknet53 | 61.53 | - | - | 74.63 | 89.52 | 82.08 |
RetinaNet | ResNet-101-FPN | - | - | - | 84.64 | 90.51 | 87.57 |
Faster R-CNN | ResNet-101 | 44.55 | - | - | 86.87 | 89.86 | 88.36 |
RoI Trans | ResNet-101-FPN | - | - | - | 87.99 | 89.90 | 88.95 |
DAL | ResNet-101 | 44.55 | - | - | 89.25 | 90.49 | 89.87 |
Oriented RepPoints | ResNet-101-FPN | - | - | - | 89.51 | 90.70 | 90.11 |
YOLOv5n | - | 1.77 | 92.46 | 85.96 | 86.64 | 96.09 | 91.36 |
YOLOv5s | - | 7.06 | 93.5 | 86.43 | 87.16 | 96.15 | 91.66 |
Tiny LC-YOLO | - | 1.83 | 94.28 | 89.31 | 90.84 | 97.51 | 94.17 |
YOLOv5l | - | 46.61 | 94.73 | 87.65 | 91.86 | 96.51 | 94.19 |
YOLOv5m | - | 21.04 | 94.98 | 89.95 | 90.91 | 97.56 | 94.24 |
LC-YOLO | - | 7.30 | 94.76 | 90.83 | 91.82 | 98.10 | 94.96 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cui, M.; Gong, G.; Chen, G.; Wang, H.; Jin, M.; Mao, W.; Lu, H. LC-YOLO: A Lightweight Model with Efficient Utilization of Limited Detail Features for Small Object Detection. Appl. Sci. 2023, 13, 3174. https://doi.org/10.3390/app13053174