A Novel Small Target Detection Strategy: Location Feature Extraction in the Case of Self-Knowledge Distillation
Abstract
1. Introduction
2. Related Work
2.1. Small Target Detection
2.2. Self-Distillation and Self-Supervised Contrastive Learning
2.3. Normalization
2.4. Activation Function
3. Materials and Methods
3.1. Self-Supervised Learning for Pre-Training
3.2. The Location Feature Extraction Strategy
3.2.1. Location Feature Extraction Structure
3.2.2. Instantiation
4. Experiment and Results
4.1. Dataset Pre-Processing
4.2. Experimental Environment Setting
4.3. Evaluation Metrics
4.4. Experimental Result
4.4.1. Visual Experiment on the Effect of Location Feature Extraction
4.4.2. False Positive Error Analysis Experiment of Characteristics
4.4.3. Experiment on Exploring the Effect of LFE Blocks Inserted into the Network
4.4.4. The Joint Experiment of Self-Supervised Self-Knowledge Distillation and LFE Block under GPU Condition
Pretext Dataset | Pretext Task | Downstream Dataset | Baseline | Training Parameters | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|---|---|---|---|
Random initialization | - | PCB | ResNet18 | Epoch = 50, Batch size = 32 | 48.2% | 18.3% |
ImageNet100 | Simdis2x | PCB | ResNet18 | Epoch = 50, Batch size = 32 | 59.7% | 23.6% |
Random initialization | - | PCB | ResNet18 + LFE block | Epoch = 150, Batch size = 64 | 48.7% | 19.1% |
ImageNet100 | Simdis2x | PCB | ResNet18 + LFE block | Epoch = 150, Batch size = 64 | 63.1% | 23.9% |
Random initialization | - | Small target | ResNet18 | Epoch = 100, Batch size = 64 | 80.5% | 59.2% |
ImageNet100 | Simdis2x | Small target | ResNet18 | Epoch = 100, Batch size = 64 | 80.6% | 59% |
Random initialization | - | Small target | ResNet18 + LFE block | Epoch = 50, Batch size = 32 | 80.5% | 59.7% |
ImageNet100 | Simdis2x | Small target | ResNet18 + LFE block | Epoch = 50, Batch size = 32 | 81.5% | 61.5% |
Random initialization | - | Pascal VOC | ResNet18 | Epoch = 150, Batch size = 64 | 83.6% | 60.3% |
ImageNet100 | Simdis2x | Pascal VOC | ResNet18 | Epoch = 150, Batch size = 64 | 83.6% | 60.5% |
Random initialization | - | Pascal VOC | ResNet18 + LFE block | Epoch = 100, Batch size = 64 | 83.6% | 60.8% |
ImageNet100 | Simdis2x | Pascal VOC | ResNet18 + LFE block | Epoch = 100, Batch size = 64 | 83.6% | 61.2% |
Baseline | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|
YOLOv7 | 90.3% | 71.2% |
YOLOv7 + two LFE blocks | 90.5% | 71.2% |
FA_SSD | 70.3% | 67.5% |
FA_SSD + two LFE blocks | 71.1% | 68.4% |
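The tables above report mAP@0.5 and mAP@0.5:0.95. As a rough illustration of how these two numbers relate, the sketch below computes both, assuming the common COCO-style definition of mAP@0.5:0.95 (AP averaged over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05) and VOC-style all-point interpolated AP with greedy score-ordered matching. The helper names (`iou`, `average_precision`, `mean_ap`, `map_50_95`) are hypothetical; the evaluation code actually used for these experiments is not given here, and tools such as pycocotools use 101-point interpolation, so results can differ slightly.

```python
import numpy as np


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr):
    """All-point interpolated AP for one class at one IoU threshold.

    preds: list of (image_id, score, box); gts: dict image_id -> list of boxes.
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp = []
    # Greedy matching: highest-scoring detections claim ground truths first.
    for img, _score, box in sorted(preds, key=lambda p: p[1], reverse=True):
        overlaps = [iou(box, g) for g in gts.get(img, [])]
        j = int(np.argmax(overlaps)) if overlaps else -1
        if j >= 0 and overlaps[j] >= iou_thr and not used[img][j]:
            used[img][j] = True
            tp.append(1.0)
        else:
            tp.append(0.0)  # IoU too low, duplicate hit, or no ground truth in image
    tp = np.asarray(tp, dtype=float)
    ctp, cfp = np.cumsum(tp), np.cumsum(1.0 - tp)
    recall = ctp / n_gt
    precision = ctp / np.maximum(ctp + cfp, np.finfo(float).eps)
    # Pad, make precision non-increasing from the right, then sum step areas.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


def mean_ap(preds_by_class, gts_by_class, thresholds):
    """Mean AP over every (class, IoU threshold) pair."""
    aps = [average_precision(preds_by_class.get(c, []), gts, t)
           for c, gts in gts_by_class.items() for t in thresholds]
    return float(np.mean(aps))


def map_50_95(preds_by_class, gts_by_class):
    """mAP@0.5:0.95 with thresholds 0.50 to 0.95 in steps of 0.05."""
    return mean_ap(preds_by_class, gts_by_class, np.linspace(0.5, 0.95, 10))
```

For example, a single detection with IoU 0.72 against its only ground-truth box counts as a hit at thresholds 0.50 through 0.70 and a miss from 0.75 upward, so `mean_ap(..., [0.5])` is 1.0 while `map_50_95(...)` is 0.5. This is why mAP@0.5:0.95 rewards tighter box localization, which matters most for small targets.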
4.5. Ablation Study
4.5.1. Design of the ConvNHS Module
4.5.2. Position and Quantity of LFE Block Inserted into YOLOv4
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Training Parameter | Value |
---|---|
Input shape | 416 × 416 |
Initial learning rate | 0.01 |
Minimum learning rate | Initial learning rate × 0.01 |
Optimizer | SGD |
Learning rate decay | Cosine |
CPU | Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60 GHz (4 processors) |
GPU | GeForce GTX 1080Ti/PCIe/SSE2 |
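The learning-rate settings above (initial rate 0.01, minimum rate 0.01 × the initial rate, cosine decay) can be illustrated with the standard cosine-annealing formula. This is a minimal sketch assuming the rate is updated once per epoch with no warm-up phase and no restarts; the table does not state these details, and `cosine_lr` is a hypothetical helper name.

```python
import math


def cosine_lr(epoch, total_epochs, lr_init=0.01, min_ratio=0.01):
    """Cosine-annealed learning rate for a given epoch (hypothetical helper).

    Assumes no warm-up and no restarts: the rate falls from lr_init at
    epoch 0 to lr_init * min_ratio at the final epoch.
    """
    lr_min = lr_init * min_ratio
    progress = min(max(epoch / total_epochs, 0.0), 1.0)  # clamp to [0, 1]
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Schedule for a 50-epoch run, as used in several of the experiments.
    for epoch in (0, 10, 25, 40, 50):
        print(epoch, f"{cosine_lr(epoch, 50):.6f}")
```

Under these assumptions, a 50-epoch run starts at 0.01, reaches 0.00505 at the midpoint, and ends at 0.0001. In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_epochs, eta_min=1e-4)` stepped once per epoch follows the same closed form.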
Dataset | CPU: Network | CPU: Training Parameters | CPU: mAP@0.5 | GPU: Network | GPU: Training Parameters | GPU: mAP@0.5 | GPU: mAP@0.5:0.95 |
---|---|---|---|---|---|---|---|
Small target | YOLOv4 | Epoch = 50, Batch size = 32 | 43.65% | YOLOv4 | Epoch = 50, Batch size = 32 | 80.2% | 58.1% |
Small target | YOLOv4 + five LFE blocks | Epoch = 50, Batch size = 32 | 45.45% | YOLOv4 + one LFE block | Epoch = 50, Batch size = 32 | 81.6% | 60% |
PCB | YOLOv4 | Epoch = 50, Batch size = 32 | 36.29% | YOLOv4 | Epoch = 150, Batch size = 64 | 62% | 24.7% |
PCB | YOLOv4 + five LFE blocks | Epoch = 50, Batch size = 32 | 38.89% | YOLOv4 + one LFE block | Epoch = 150, Batch size = 64 | 67.3% | 27.7% |
Pascal VOC 07 + 12 | YOLOv4 | Epoch = 50, Batch size = 32 | 84.24% | YOLOv4 | Epoch = 100, Batch size = 64 | 82.1% | 57.1% |
Pascal VOC 07 + 12 | YOLOv4 + five LFE blocks | Epoch = 50, Batch size = 32 | 84.47% | YOLOv4 + one LFE block | Epoch = 100, Batch size = 64 | 82.6% | 58.7% |
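To make the CPU/GPU comparison easier to read, the short script below computes the percentage-point change in mAP@0.5 from plain YOLOv4 to the LFE variant, using only the values transcribed from the table. Note that the two settings are not directly comparable with each other: the CPU runs insert five LFE blocks, the GPU runs insert one, and the epoch and batch-size settings differ. The dictionary layout and the `lfe_gains` helper are illustrative choices, not part of the original work.

```python
# mAP@0.5 values (%) transcribed from the CPU/GPU table: (YOLOv4, YOLOv4 + LFE).
# CPU runs use five LFE blocks; GPU runs use one LFE block.
MAP50 = {
    "Small target": {"CPU": (43.65, 45.45), "GPU": (80.2, 81.6)},
    "PCB": {"CPU": (36.29, 38.89), "GPU": (62.0, 67.3)},
    "Pascal VOC 07 + 12": {"CPU": (84.24, 84.47), "GPU": (82.1, 82.6)},
}


def lfe_gains(table):
    """Percentage-point change from YOLOv4 to YOLOv4 + LFE, per dataset and device."""
    return {
        dataset: {device: round(with_lfe - base, 2) for device, (base, with_lfe) in runs.items()}
        for dataset, runs in table.items()
    }


if __name__ == "__main__":
    for dataset, deltas in lfe_gains(MAP50).items():
        print(f"{dataset}: " + ", ".join(f"{dev} {v:+.2f} pp" for dev, v in deltas.items()))
```

With these values, the largest gain is on PCB under the GPU setting (+5.30 pp) and the smallest is on Pascal VOC 07 + 12 under the CPU setting (+0.23 pp); every configuration in the table shows a positive change.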
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, G.; Li, J.; Yan, S.; Liu, R. A Novel Small Target Detection Strategy: Location Feature Extraction in the Case of Self-Knowledge Distillation. Appl. Sci. 2023, 13, 3683. https://doi.org/10.3390/app13063683
Chicago/Turabian Style: Liu, Gaohua, Junhuan Li, Shuxia Yan, and Rui Liu. 2023. "A Novel Small Target Detection Strategy: Location Feature Extraction in the Case of Self-Knowledge Distillation." Applied Sciences 13, no. 6: 3683. https://doi.org/10.3390/app13063683