A Fast and Robust Safety Helmet Network Based on a Multiscale Swin Transformer
Abstract
1. Introduction
2. Methods
2.1. Feature Extraction Based on MAE-NAS
2.2. Feature Fusion Based on the Multiscale Swin Transformer and Efficient-RepGFPN
2.3. Loss Function
Algorithm 1: A Fast and Robust Safety Helmet Network Based on a Multiscale Swin Transformer
Input: the input images. Output: the bounding box with a score for each helmet.
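As a hedged illustration of the input/output contract in Algorithm 1, the one-stage detection pipeline can be sketched as follows. All function names here are placeholders standing in for the paper's actual modules (the MAE-NAS backbone of Section 2.1, the multiscale Swin/Efficient-RepGFPN neck of Section 2.2, and the detection head), not the authors' implementation:

```python
# Minimal sketch of the Algorithm 1 pipeline: image in, scored helmet boxes out.
# backbone/neck/head are hypothetical stand-ins for the modules described in
# Sections 2.1-2.2 (feature extraction, multiscale fusion) and the detection head.

def detect_helmets(image, backbone, neck, head, score_thresh=0.5):
    """Run one image through a generic one-stage detector."""
    features = backbone(image)      # multiscale feature maps
    fused = neck(features)          # cross-scale feature fusion
    boxes, scores = head(fused)     # candidate boxes and helmet confidences
    # Keep only confident detections: (bounding box, score) pairs.
    return [(b, s) for b, s in zip(boxes, scores) if s >= score_thresh]
```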
3. Experimental Results and Discussions
3.1. Datasets
- Pictor-v3 is a multi-source dataset built specifically for helmet detection; it combines 698 crowdsourced images with 774 web-mined images. The crowdsourced images contain 2496 worker instances, while the web-mined images contain 2230 worker instances.
- SHWD is a public dataset for safety-helmet wearing and head detection, consisting of 7581 high-resolution images. In SHWD, 9044 human subjects wearing safety helmets are labeled as positive samples, and 111,514 normal head subjects (not wearing a helmet) are labeled as negative samples.
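The per-source counts above can be cross-checked with simple arithmetic (a sketch; all figures are taken directly from the dataset descriptions):

```python
# Cross-check of the dataset counts quoted above (numbers from the text).
pictor_images = 698 + 774        # crowdsourced + web-mined Pictor-v3 images
pictor_workers = 2496 + 2230     # worker instances from each source
shwd_instances = 9044 + 111514   # SHWD helmet (positive) + head (negative) labels

print(pictor_images)    # total Pictor-v3 images
print(pictor_workers)   # total Pictor-v3 worker instances
print(shwd_instances)   # total labeled SHWD instances
```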
3.2. Evaluation Criteria
3.3. Implementation Details
3.4. Comparison with the State-of-the-Art Models
3.5. Ablation Studies of FRSHNet
- FRSHNet#1: ResNet18 + (b) + Zero Head.
- FRSHNet#2: ResNet50 + (b) + Zero Head.
- FRSHNet#3: (a) + Zero Head.
- FRSHNet: (a) + (b) + Zero Head.
4. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Description |
---|---|
RFID | Radio frequency identification |
CNN | Convolutional neural network |
FCOS | Fully convolutional one-stage object detection |
SSD | Single shot multibox detector |
(S)W-MSA | (Shifted) window-based multi-head self-attention |
ViT | Vision transformer |
PRC | Precision–recall curve |
References
- Wu, H.; Zhao, J. An intelligent vision-based approach for helmet identification for work safety. Comput. Ind. 2018, 100, 267–277. [Google Scholar] [CrossRef]
- Yu, F.; Wang, X.; Li, J.; Wu, S.; Zhang, J.; Zeng, Z. Towards Complex Real-World Safety Factory Inspection: A High-Quality Dataset for Safety Clothing and Helmet Detection. arXiv 2023, arXiv:2306.02098. [Google Scholar]
- Chen, C.; Wu, W. Color pattern recognition with the multi-channel non-zero-order joint transform correlator based on the HSV color space. Opt. Commun. 2005, 244, 51–59. [Google Scholar] [CrossRef]
- Kelm, A.; Laußat, L.; Meins-Becker, A.; Platz, D.; Khazaee, M.J.; Costin, A.M.; Helmus, M.; Teizer, J. Mobile passive Radio Frequency Identification (RFID) portal for automated and rapid control of Personal Protective Equipment (PPE) on construction sites. Autom. Constr. 2013, 36, 38–52. [Google Scholar] [CrossRef]
- Li, Y.; Wei, H.; Han, Z.; Huang, J.; Wang, W. Deep Learning-Based Safety Helmet Detection in Engineering Management Based on Convolutional Neural Network. Adv. Civ. Eng. 2020, 2020, 10. [Google Scholar] [CrossRef]
- Rajaraman, V. Radio frequency identification. Reson. 2017, 22, 549–575. [Google Scholar] [CrossRef]
- Dolez, P.I. Chapter 3.6—Progress in Personal Protective Equipment for Nanomaterials. In Nanoengineering; Dolez, P.I., Ed.; Elsevier: Amsterdam, The Netherlands, 2015; pp. 607–635. [Google Scholar] [CrossRef]
- Swain, M.J.; Ballard, D.H. Indexing via Color Histograms. In Proceedings of the Active Perception and Robot Vision, Maratea, Italy, 16–29 July 1989; Sood, A.K., Wechsler, H., Eds.; Springer: Berlin/Heidelberg, Germany, 1992; pp. 261–273. [Google Scholar]
- Žunić, J.; Hirota, K.; Rosin, P.L. A Hu moment invariant as a shape circularity measure. Pattern Recognit. 2010, 43, 47–57. [Google Scholar] [CrossRef]
- Pietikäinen, M. Local Binary Patterns. Scholarpedia 2010, 5, 9775. [Google Scholar] [CrossRef]
- Zhigang, L.; Wenzhong, S.; Qianqing, Q.; Xiaowen, L.; Donghui, X. Hierarchical support vector machines. In Proceedings of the 2005 IEEE International Geoscience and Remote Sensing Symposium (IGARSS ’05), Seoul, Republic of Korea, 20 July 2005; Volume 1, p. 4. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
- Jocher, G.; Stoken, A.; Borovec, J.; NanoCode012, C.; Changyu, L.; Laughing, H. ultralytics/yolov5: v3.0. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 20 December 2020).
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- Law, H.; Deng, J. Cornernet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750. [Google Scholar]
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv 2021, arXiv:2103.14030. [Google Scholar]
- Ding, L.; Fang, W.; Luo, H.; Love, P.E.; Zhong, B.; Ouyang, X. A deep hybrid learning model to detect unsafe behavior: Integrating convolution neural networks and long short-term memory. Autom. Constr. 2018, 86, 118–124. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Othman, N.A.; Aydin, I. A New Deep Learning Application Based on Movidius NCS for Embedded Object Detection and Recognition. In Proceedings of the 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Ankara, Turkey, 19–21 October 2018; pp. 1–5. [Google Scholar] [CrossRef]
- Xu, X.; Jiang, Y.; Chen, W.; Huang, Y.; Zhang, Y.; Sun, X. DAMO-YOLO: A Report on Real-Time Object Detection Design. arXiv 2023, arXiv:cs.CV/2211.15444. [Google Scholar]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Chen, H.; Shi, Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote. Sens. 2020, 12, 1662. [Google Scholar] [CrossRef]
- Song, F.; Zhang, S.; Lei, T.; Song, Y.; Peng, Z. MSTDSNet-CD: Multiscale Swin Transformer and Deeply Supervised Network for Change Detection of the Fast-Growing Urban Regions. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the Computer Vision—ECCV 2020 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 213–229. [Google Scholar]
- Chen, H.; Wang, Y.; Guo, T.; Xu, C.; Deng, Y.; Liu, Z.; Ma, S.; Xu, C.; Xu, C.; Gao, W. Pre-trained image processing transformer. arXiv 2020, arXiv:2012.00364. [Google Scholar]
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. Nas-fpn: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7036–7045. [Google Scholar]
- Jiang, Y.; Tan, Z.; Wang, J.; Sun, X.; Lin, M.; Li, H. GiraffeDet: A heavy-neck paradigm for object detection. arXiv 2022, arXiv:2202.04256. [Google Scholar]
- Nath, N.D.; Behzadan, A.H.; Paal, S.G. Deep Learning for Site Safety: Real-Time Detection of Personal Protective Equipment. Autom. Constr. 2020, 112, 103085. [Google Scholar] [CrossRef]
- Gochoo, M. Safety Helmet Wearing Dataset. Mendeley Data. 2021. Available online: https://github.com/njvisionpower/Safety-Helmet-Wearing-Dataset (accessed on 17 December 2019).
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007. [Google Scholar] [CrossRef]
- Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636. [Google Scholar]
- Zhang, H.; Wang, Y.; Dayoub, F.; Sünderhauf, N. VarifocalNet: An IoU-aware Dense Object Detector. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8510–8519. [Google Scholar] [CrossRef]
- Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-aligned One-stage Object Detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 3490–3499. [Google Scholar] [CrossRef]
Models | Backbone | Pictor-v3 mAP (0.50) | Pictor-v3 mAP (0.5:0.95) | SHWD mAP (0.50) | SHWD mAP (0.5:0.95) |
---|---|---|---|---|---|
Faster R-CNN | ResNet-50 | 0.9060 | 0.5340 | 0.8480 | 0.6310 |
Retina-Net | ResNet-50 | 0.9054 | 0.5438 | 0.8548 | 0.6356 |
SSD-512 | VGG16 | 0.8550 | 0.4880 | 0.8080 | 0.5740 |
YOLO-v5 | CSPDarknet53 | 0.8818 | 0.5358 | 0.8399 | 0.6386 |
FCOS | ResNet-50 | 0.8950 | 0.5240 | 0.8580 | 0.6390 |
VF-Net | ResNet-50 | 0.9140 | 0.5520 | 0.8570 | 0.6390 |
TOOD | ResNet-50 | 0.9150 | 0.5580 | 0.8670 | 0.6440 |
FRSHNet | MAE-NAS | 0.9630 | 0.6570 | 0.9470 | 0.6810 |
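The two metrics in the comparison tables differ only in the IoU threshold: mAP (0.50) scores a detection as correct when its IoU with the ground truth exceeds 0.5, while mAP (0.5:0.95) averages AP over ten thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO convention). A minimal sketch of that averaging, with helper names of my own choosing rather than from the paper:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def map_50_95(ap_at):
    """Average AP over the ten COCO IoU thresholds 0.50, 0.55, ..., 0.95.

    ap_at is a callable mapping an IoU threshold to the AP computed at it.
    """
    thresholds = [0.50 + 0.05 * i for i in range(10)]
    return sum(ap_at(t) for t in thresholds) / len(thresholds)
```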
Models | mAP (0.50) | mAP (0.5:0.95) | FPS |
---|---|---|---|
FRSHNet#1 | 0.9170 | 0.5750 | 6873.3 |
FRSHNet#2 | 0.9320 | 0.6210 | 6634.6 |
FRSHNet#3 | 0.8520 | 0.5270 | 7342.1 |
FRSHNet | 0.9470 | 0.6810 | 6785.5 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xiang, C.; Yin, D.; Song, F.; Yu, Z.; Jian, X.; Gong, H. A Fast and Robust Safety Helmet Network Based on a Multiscale Swin Transformer. Buildings 2024, 14, 688. https://doi.org/10.3390/buildings14030688