YOLO-SSFS: A Method Combining SPD-Conv/STDL/IM-FPN/SIoU for Outdoor Small Target Vehicle Detection
Abstract
1. Introduction
2. Related Works
2.1. Development of Object Detection Algorithms
2.2. Achievements Related to Small Target Detection
3. Improved YOLOv5 Network
3.1. Introduction of YOLOv5
3.2. SPD-Conv
3.3. Small Target Detection Layer (STDL)
3.4. Improved Feature Pyramid Network (IM-FPN)
3.5. New Loss Function of Boundary Box—SIoU
4. Experimental Results
4.1. Experimental Dataset and Experimental Settings
4.2. Evaluation Metrics
4.3. Experiment Results
4.3.1. Experimental Analysis of Introducing the SPD-Conv Model
4.3.2. Experimental Analysis of Importing Small Target Detection Layer (STDL)
4.3.3. Experimental Analysis of Importing Improved FPN (IM-FPN)
4.3.4. Experimental Analysis of Importing Improved SIoU
4.3.5. Ablation Experiment
4.3.6. Contrast Experiment
4.3.7. Generalization Experiment
5. Conclusions and Future Works
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999.
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision & Pattern Recognition, San Diego, CA, USA, 20–25 June 2005.
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1627–1645.
- Guo, Z.H.; Zhang, L.; Zhang, D. A completed modeling of local binary pattern operator for texture classification. IEEE Trans. Image Process. 2010, 19, 1657–1663.
- Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
- Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. In Proceedings of the Conference on Learning Theory, Nashville, TN, USA, 6–9 July 1997.
- LeCun, Y.; Bottou, L. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 1904–1916.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014.
- Girshick, R. Fast R-CNN. In Proceedings of the International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-based Fully Convolutional Networks; Curran Associates Inc.: Red Hook, NY, USA, 2016.
- He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the Computer Vision & Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Müller, S.; Hutter, F. TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation. In Proceedings of the IEEE/CVF international conference on computer vision, Montreal, QC, Canada, 10–17 October 2021.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Kisantal, M.; Wojna, Z.; Murawski, J.; Naruniec, J.; Cho, K. Augmentation for small object detection. arXiv 2019, arXiv:1902.07296.
- Zhao, T.; Liu, J.Y.; Shen, Q. An improved multi-gated feature pyramid network. Acta Opt. Sin. 2019, 39, 235–244.
- Nayan, A.A.; Saha, J.; Mozumder, A.N. Real Time Detection of Small Objects. Int. J. Innov. Technol. Explor. Eng. 2020, 9, 837.
- Zhou, Y.; Cai, Z.; Zhu, Y.; Yan, J. Automatic ship detection in SAR Image based on Multi-scale Faster R-CNN. J. Phys. Conf. Ser. 2020, 1550, 042006.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving Into High Quality Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
- Bai, Y.; Zhang, Y.; Ding, M. SOD-MTGAN: Small Object Detection via Multi-Task Generative Adversarial Network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual Generative Adversarial Networks for Small Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Lim, J.S.; Astrid, M.; Yoon, H.J.; Lee, S.I. Small Object Detection using Context and Attention. In Proceedings of the 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Jeju Island, Republic of Korea, 13–16 April 2021.
- Xu, H.; Yang, D.; Jiang, Q. Improvement of lightweight vehicle detection network based on SSD. Comput. Eng. Appl. 2022, 58, 209–217.
- Sri, J.S.; Esther, R.P. LittleYOLO-SPP: A Delicate Real-Time Vehicle Detection Algorithm. Optik 2020, 225, 165818.
- Liu, M.; Wang, J.; Dong, G.G.; Yi, W.M. Weakly labeled sound event detection based on improved pooling layer. J. Signal Process. 2021, 37, 1907–1913.
- Lin, G.; Shen, W. Research on convolutional neural network based on improved Relu piecewise activation function. Procedia Comput. Sci. 2018, 131, 977–984.
- Saponara, S.; Elhanashi, A.; Gagliardi, A. Reconstruct fingerprint images using deep learning and sparse autoencoder algorithms. In Real-Time Image Processing and Deep Learning 2021; SPIE: Bellingham, WA, USA, 2021; Volume 11736.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5MB model size. arXiv 2016, arXiv:1602.07360.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
- Abdusalomov, A.B.; Mukhiddinov, M.; Kutlimuratov, A.; Whangbo, T.K. Improved Real-Time Fire Warning System Based on Advanced Technologies for Visually Impaired People. Sensors 2022, 22, 7305.
- Norkobil Saydirasulovich, S.; Abdusalomov, A.; Jamil, M.K.; Nasimov, R.; Kozhamzharova, D.; Cho, Y.-I. A YOLOv6-Based Improved Fire Detection Approach for Smart City Environments. Sensors 2023, 23, 3161.
- Mukhiddinov, M.; Abdusalomov, A.B.; Cho, J. A Wildfire Smoke Detection System Using Unmanned Aerial Vehicle Images Based on the Optimized YOLOv5. Sensors 2022, 22, 9384.
- Sunkara, R.; Luo, T. No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Cham, Switzerland, 2022.
- Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1571–1580.
- Long, S.; Song, X.F.; Zhang, S.; Zhang, Q.L. Improved YOLOv5s aerial image vehicle detection research. Laser J. 2022, 43, 22–29.
- Zou, Z.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. Proc. IEEE 2019, 111, 257–276.
Category | Train | Val | Test | Total |
---|---|---|---|---|
Quantity | 6543 | 818 | 817 | 8178 |
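
The split sizes in the table above can be checked with a few lines of arithmetic. The sketch below only reuses the four numbers from the table; the names `split_sizes` and `split_fractions` are illustrative and do not come from the paper. Under that reading, the split works out to roughly 80/10/10.

```python
# Split sizes copied from the dataset table above.
split_sizes = {"train": 6543, "val": 818, "test": 817}


def split_fractions(sizes):
    """Return each split's share of the whole dataset as a fraction."""
    n = sum(sizes.values())
    return {name: count / n for name, count in sizes.items()}


total = sum(split_sizes.values())
fractions = split_fractions(split_sizes)

for name, frac in fractions.items():
    print(f"{name}: {split_sizes[name]} items ({frac:.1%})")
print("total:", total)
```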

Methods | Params (10⁶) | Precision (%) | Recall (%) | mAP (%) | FPS |
---|---|---|---|---|---|
YOLOv5s | 7.01 | 84.50 | 67.44 | 75.44 | 57 |
YOLOv5s-SPD | 8.56 | 85.90 | 68.88 | 78.40 | 55 |
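
The YOLOv5s-SPD row refers to SPD-Conv (Sunkara and Luo, in the reference list), which replaces strided convolution and pooling with a space-to-depth rearrangement followed by a non-strided convolution. The following is a minimal pure-Python sketch of the space-to-depth step only, assuming a scale factor of 2 and a `[channel][row][column]` nested-list layout. The function name `space_to_depth` and the order in which the sub-maps are stacked are assumptions made for illustration, not details taken from this paper.

```python
def space_to_depth(x, scale=2):
    """Rearrange a [C][H][W] nested list into [C*scale*scale][H//scale][W//scale].

    Every value of the input is kept: spatial resolution is traded for
    channels, instead of being discarded as a strided convolution would.
    """
    channels = len(x)
    height = len(x[0])
    width = len(x[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")

    out = []
    # Assumed stacking order: column offset in the outer loop, row offset inside.
    for col_offset in range(scale):
        for row_offset in range(scale):
            for c in range(channels):
                sub_map = [
                    [x[c][i][j] for j in range(col_offset, width, scale)]
                    for i in range(row_offset, height, scale)
                ]
                out.append(sub_map)
    return out


# One 4x4 channel holding the values 0..15.
feature_map = [[[r * 4 + c for c in range(4)] for r in range(4)]]
downsampled = space_to_depth(feature_map)
print(len(downsampled), "channels of size",
      len(downsampled[0]), "x", len(downsampled[0][0]))
```

In the full block, a convolution with stride 1 would then be applied to the rearranged tensor; that step is omitted here.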

Methods | Params (10⁶) | Precision (%) | Recall (%) | mAP (%) | FPS |
---|---|---|---|---|---|
YOLOv5s | 7.01 | 84.50 | 67.44 | 75.44 | 57 |
YOLOv5s (STDL) | 7.30 | 84.30 | 75.40 | 82.10 | 53 |

Methods | Params (10⁶) | Precision (%) | Recall (%) | mAP (%) | FPS |
---|---|---|---|---|---|
YOLOv5s | 7.01 | 84.50 | 67.44 | 75.44 | 57 |
YOLOv5s (IM-FPN) | 7.06 | 83.76 | 68.41 | 76.40 | 57 |

Methods | Params (10⁶) | Precision (%) | Recall (%) | mAP (%) | FPS |
---|---|---|---|---|---|
YOLOv5s | 7.01 | 84.50 | 67.44 | 75.44 | 57 |
YOLOv5s (SIoU) | 7.01 | 84.56 | 68.31 | 78.37 | 59 |

SPD-Conv | STDL | IM-FPN | SIoU | mAP@0.5 (%) | FPS |
---|---|---|---|---|---|
 |  |  |  | 75.44 | 57 |
√ |  |  |  | 78.40 (+2.96) | 55 |
√ | √ |  |  | 80.25 (+1.85) | 54 |
√ | √ | √ |  | 81.41 (+1.16) | 55 |
√ | √ | √ | √ | 83.07 (+1.66) | 52 |
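
The bracketed increments in the ablation table can be reproduced from the mAP column. The sketch below assumes the modules are added cumulatively in column order (SPD-Conv, then STDL, then IM-FPN, then SIoU); the helper name `incremental_gains` is illustrative.

```python
# mAP@0.5 values copied from the ablation table above.
baseline_map = 75.44
steps = [
    ("SPD-Conv", 78.40),
    ("STDL", 80.25),
    ("IM-FPN", 81.41),
    ("SIoU", 83.07),
]


def incremental_gains(baseline, cumulative_steps):
    """Return (module, gain over the previous row) for each ablation step."""
    gains = []
    previous = baseline
    for name, value in cumulative_steps:
        gains.append((name, round(value - previous, 2)))
        previous = value
    return gains


gains = incremental_gains(baseline_map, steps)
total_gain = round(steps[-1][1] - baseline_map, 2)

for name, gain in gains:
    print(f"+{name}: {gain:+.2f} points")
print(f"total over baseline: {total_gain:+.2f} points")
```

The per-step gains sum to the overall improvement of the full model over the YOLOv5s baseline.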

Methods | Params (10⁶) | Precision (%) | Recall (%) | mAP (%) | FPS |
---|---|---|---|---|---|
Faster R-CNN [42] | 60.52 | 82.20 | 53.77 | 57.00 | 33 |
FCOS [43] | 50.96 | / | / | 57.60 | 16 |
YOLOv3 | 61.52 | 84.77 | 76.54 | 80.42 | 46 |
YOLOv4 | 63.94 | 83.25 | 70.75 | 78.31 | 34 |
YOLOv5s | 7.01 | 84.50 | 67.44 | 75.44 | 57 |
YOLO-SSFS | 8.97 | 84.10 | 76.16 | 83.07 | 52 |
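
To read the trade-off in the comparison table, it can help to express the proposed model (last row) as differences from the YOLOv5s baseline. The sketch below copies four rows from the table; the dictionary layout and the `delta` helper are illustrative choices, not part of the paper.

```python
# Rows copied from the comparison table:
# (params in 10^6, precision %, recall %, mAP %, FPS).
FIELDS = ("params", "precision", "recall", "mAP", "fps")
results = {
    "YOLOv3": (61.52, 84.77, 76.54, 80.42, 46),
    "YOLOv4": (63.94, 83.25, 70.75, 78.31, 34),
    "YOLOv5s": (7.01, 84.50, 67.44, 75.44, 57),
    "YOLO-SSFS": (8.97, 84.10, 76.16, 83.07, 52),
}


def delta(model, baseline, table=results):
    """Return per-metric differences (model minus baseline), rounded to 2 places."""
    return {
        field: round(m - b, 2)
        for field, m, b in zip(FIELDS, table[model], table[baseline])
    }


vs_baseline = delta("YOLO-SSFS", "YOLOv5s")
for field, value in vs_baseline.items():
    print(f"{field}: {value:+}")
```

Under these numbers, the gain comes mostly from recall and mAP, at the cost of a modest increase in parameters and a few frames per second.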

Methods | Params (10⁶) | Precision (%) | Recall (%) | mAP (%) | FPS |
---|---|---|---|---|---|
YOLOv5s | 7.01 | 86.91 | 67.98 | 85.72 | 57 |
YOLO-SSFS | 8.97 | 87.42 | 69.18 | 88.90 | 52 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gu, Z.; Zhu, K.; You, S. YOLO-SSFS: A Method Combining SPD-Conv/STDL/IM-FPN/SIoU for Outdoor Small Target Vehicle Detection. Electronics 2023, 12, 3744. https://doi.org/10.3390/electronics12183744