Detection of Road Crack Images Based on Multistage Feature Fusion and a Texture Awareness Method
Abstract
1. Introduction
2. Related Work
2.1. CNN-Based Crack Segmentation Models
2.2. Transformer-Based Crack Segmentation Models
3. FetNet
3.1. Network Architecture
3.2. Swin Transformer-Based Feature Extraction Block
3.3. Texture Unit
3.4. The Refinement Attention Module
3.5. The Panoramic Feature Module
3.6. The Cascade Pyramid Architecture
3.7. The Fully Connected Conditional Random Field
3.8. Loss Function
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.2.1. Computational Platform
4.2.2. Parameter Settings
4.3. Evaluation Criteria
4.4. Results and Discussion
4.5. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kang, D.; Benipal, S.S.; Gopal, D.L.; Cha, Y.J. Hybrid pixel-level concrete crack segmentation and quantification across complex backgrounds using deep learning. Autom. Constr. 2020, 118, 103291. [Google Scholar] [CrossRef]
- Yang, S.; Xu, Q.; Wang, Z. Research progress of structural damage recognition based on convolutional neural networks. J. Archit. Civ. Eng. 2022, 39, 38–57. [Google Scholar]
- Ni, T.; Zhou, R.; Gu, C.; Yang, Y. Measurement of concrete crack feature with android smartphone app based on digital image processing techniques. Measurement 2020, 150, 107093. [Google Scholar] [CrossRef]
- Choi, S.; Kim, K.; Lee, J.; Park, S.H.; Lee, H.J.; Yoon, J. Image processing algorithm for real-time crack inspection in hole expansion test. Int. J. Precis. Eng. Manuf. 2019, 20, 1139–1148. [Google Scholar] [CrossRef]
- Qiao, W.; Liu, Q.; Wu, X.; Ma, B.; Li, G. Automatic pixel-level pavement crack recognition using a deep feature aggregation segmentation network with a scSE attention mechanism module. Sensors 2021, 21, 2902. [Google Scholar] [CrossRef]
- Feng, D.; Zhang, Z.; Yan, K. A Semantic Segmentation Method for Remote Sensing Images Based on the Swin Transformer Fusion Gabor Filter. IEEE Access 2022, 10, 77432–77451. [Google Scholar] [CrossRef]
- Peng, C.; Zhang, X.; Yu, G.; Luo, G.; Sun, J. Large kernel matters—Improve semantic segmentation by global convolutional network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4353–4361. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Wang, L.; Li, R.; Wang, D.; Duan, C.; Wang, T.; Meng, X. Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images. Remote Sens. 2021, 13, 3065. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Wang, X.; Hu, Z. Grid-based pavement crack analysis using deep learning. In Proceedings of the 2017 4th International Conference on Transportation Information and Safety (ICTIS), Banff, AB, Canada, 8–10 August 2017; pp. 917–924. [Google Scholar]
- Kim, B.; Yuvaraj, N.; Sri Preethaa, K.R.; Arun Pandian, R. Surface crack detection using deep learning with shallow CNN architecture for enhanced computation. Neural Comput. Appl. 2021, 33, 9289–9305. [Google Scholar] [CrossRef]
- Nguyen, N.H.T.; Perry, S.; Bone, D.; Le, H.T.; Nguyen, T.T. Two-stage convolutional neural network for road crack detection and segmentation. Expert Syst. Appl. 2021, 186, 115718. [Google Scholar] [CrossRef]
- Yang, X.; Li, H.; Yu, Y.; Luo, X.; Huang, T.; Yang, X. Automatic pixel-level crack detection and measurement using fully convolutional network. Comput. Civ. Infrastruct. Eng. 2018, 33, 1090–1109. [Google Scholar] [CrossRef]
- Li, S.; Zhao, X.; Zhou, G. Automatic pixel-level multiple damage detection of concrete structure using fully convolutional network. Comput. Civ. Infrastruct. Eng. 2019, 34, 616–634. [Google Scholar] [CrossRef]
- Hsieh, Y.A.; Tsai, Y.J. Machine learning for crack detection: Review and model performance comparison. J. Comput. Civ. Eng. 2020, 34, 04020038. [Google Scholar] [CrossRef]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Computer Vision—ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 6881–6890. [Google Scholar]
- Wang, W.; Su, C. Automatic concrete crack segmentation model based on transformer. Autom. Constr. 2022, 139, 104275. [Google Scholar] [CrossRef]
- Li, R.; Su, J.; Duan, C.; Zheng, S. Linear attention mechanism: An efficient attention for semantic segmentation. arXiv 2020, arXiv:2007.14902. [Google Scholar]
- Gao, J.; Geng, X.; Zhang, Y.; Wang, R.; Shao, K. Augmented weighted bidirectional feature pyramid network for marine object detection. Expert Syst. Appl. 2024, 237, 121688. [Google Scholar] [CrossRef]
- Lafferty, J.; McCallum, A.; Pereira, F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the International Conference on Machine Learning, Williamstown, MA, USA, 28 June–1 July 2001; pp. 282–289. [Google Scholar]
- Cun, X.; Pun, C.M. Image Splicing Localization via Semi-global Network and Fully Connected Conditional Random Fields. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Liu, Y.; Yao, J.; Lu, X.; Xie, R.; Li, L. DeepCrack: A deep hierarchical feature learning architecture for crack segmentation. Neurocomputing 2019, 338, 139–153. [Google Scholar] [CrossRef]
- Eisenbach, M.; Stricker, R.; Seichter, D.; Amende, K.; Debes, K.; Sesselmann, M.; Ebersbach, D.; Stoeckert, U.; Gross, H.M. How to get pavement distress detection ready for deep learning? A systematic approach. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2039–2047. [Google Scholar]
- Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1525–1535. [Google Scholar] [CrossRef]
- Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic road crack detection using random structured forests. IEEE Trans. Intell. Transp. Syst. 2016, 17, 3434–3445. [Google Scholar] [CrossRef]
- Shelhamer, E.; Long, J.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651. [Google Scholar] [CrossRef] [PubMed]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
- He, J.; Deng, Z.; Zhou, L.; Wang, Y.; Qiao, Y. Adaptive pyramid context network for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7519–7528. [Google Scholar]
- Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. Ccnet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 603–612. [Google Scholar]
- Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. Gcnet: Non-local networks meet squeeze-excitation networks and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 603–612. [Google Scholar]
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703. [Google Scholar]
- Yin, M.; Yao, Z.; Cao, Y.; Li, X.; Zhang, Z.; Lin, S.; Hu, H. Disentangled non-local neural networks. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 191–207. [Google Scholar]
- Yuan, Y.; Chen, X.; Wang, J. Object-contextual representations for semantic segmentation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 173–190. [Google Scholar]
- Zhang, H.; Wu, C.; Zhang, Z.; Zhu, Y.; Lin, H.; Zhang, Z.; Sun, Y.; He, T.; Mueller, J.; Manmatha, R.; et al. Resnest: Split-attention networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2736–2746. [Google Scholar]
Quantitative comparison with ten representative segmentation networks on four crack datasets. P, R, F1, ACC, and mIoU are given in %; speed is reported as FPS (higher is better) with per-image latency in ms (lower is better).

| Dataset | Metric | FCN [32] | U-Net [33] | DANet [34] | ApcNet [35] | CcNet [36] | GcNet [37] | HrNet [38] | DnlNet [39] | OcrNet [40] | Resnest [41] | Ours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepCrack | P↑ | 93.9 | 90.4 | 91.4 | 85.4 | 93.9 | 77.5 | 89.4 | 92.8 | 95.8 | 93.4 | 95.1 |
| | R↑ | 76.8 | 69.7 | 77.8 | 82.2 | 79.9 | 77.5 | 81.5 | 81.5 | 73.5 | 80.3 | 86.4 |
| | F1↑ | 83.1 | 76.2 | 83.6 | 86.7 | 85.5 | 82.4 | 84.9 | 86.2 | 80.8 | 85.6 | 89.7 |
| | ACC↑ | 97.6 | 96.3 | 97.5 | 97.1 | 97.8 | 96.3 | 97.7 | 97.8 | 97.5 | 97.9 | 98.4 |
| | mIoU↑ | 74.2 | 66.9 | 74.2 | 78.6 | 77.1 | 73.1 | 76.4 | 78.0 | 71.6 | 77.2 | 82.7 |
| | FPS↑ (ms/img↓) | 24.0 (41.7) | 22.5 (44.5) | 19.2 (52.1) | 24.6 (40.7) | 24.9 (40.2) | 27.3 (36.7) | 46.2 (21.7) | 24.0 (41.7) | 41.1 (24.3) | 20.7 (48.3) | 60.3 (16.5) |
| GAPs384 | P↑ | 71.1 | 69.6 | 70.4 | 75.9 | 76.8 | 79.1 | 76.9 | 78.8 | 78.3 | 77.9 | 83.7 |
| | R↑ | 66.0 | 60.2 | 67.0 | 65.7 | 63.6 | 60.5 | 57.5 | 62.4 | 62.8 | 57.6 | 71.4 |
| | F1↑ | 65.0 | 64.7 | 67.5 | 69.5 | 67.9 | 65.3 | 61.9 | 67.3 | 67.6 | 61.7 | 73.8 |
| | ACC↑ | 98.8 | 98.4 | 98.5 | 98.1 | 98.4 | 98.5 | 98.1 | 98.3 | 98.7 | 98.4 | 98.5 |
| | mIoU↑ | 62.9 | 58.1 | 63.2 | 61.6 | 60.5 | 58.6 | 56.2 | 60.0 | 60.3 | 56.1 | 69.8 |
| | FPS↑ (ms/img↓) | 4.3 (232.6) | 4.2 (238.1) | 3.6 (277.8) | 3.7 (270.3) | 3.6 (277.8) | 4.1 (243.9) | 7.5 (133.4) | 3.4 (294.2) | 6.1 (163.9) | 3.0 (333.3) | 13.2 (75.7) |
| Crack500 | P↑ | 73.2 | 79.1 | 89.5 | 87.7 | 86.5 | 76.1 | 83.8 | 79.4 | 84.4 | 85.6 | 90.4 |
| | R↑ | 70.2 | 71.9 | 66.4 | 76.8 | 73.2 | 80.8 | 81.4 | 84.3 | 80.3 | 80.9 | 85.3 |
| | F1↑ | 71.6 | 74.9 | 72.6 | 81.2 | 79.2 | 78.2 | 82.6 | 81.6 | 82.2 | 82.9 | 87.9 |
| | ACC↑ | 94.4 | 95.3 | 95.8 | 96.6 | 96.5 | 94.9 | 96.4 | 95.8 | 96.4 | 96.9 | 97.1 |
| | mIoU↑ | 62.1 | 65.3 | 63.4 | 71.8 | 69.7 | 68.4 | 73.4 | 72.1 | 72.9 | 74.0 | 78.6 |
| | FPS↑ (ms/img↓) | 2.8 (357.2) | 2.1 (476.2) | 2.2 (454.6) | 6.2 (161.3) | 4.7 (212.8) | 5.3 (188.7) | 5.8 (172.4) | 6.3 (158.8) | 6.1 (163.9) | 4.3 (232.6) | 11.4 (87.7) |
| CFD | P↑ | 82.1 | 85.3 | 84.0 | 76.0 | 76.5 | 77.4 | 78.2 | 78.7 | 80.5 | 74.6 | 89.8 |
| | R↑ | 69.2 | 76.8 | 68.5 | 76.0 | 69.3 | 75.9 | 71.7 | 68.4 | 75.9 | 70.7 | 85.3 |
| | F1↑ | 74.0 | 80.6 | 73.9 | 73.5 | 72.3 | 76.6 | 74.5 | 72.4 | 78.0 | 72.5 | 85.7 |
| | ACC↑ | 98.8 | 99.0 | 98.8 | 98.6 | 98.6 | 98.6 | 98.7 | 98.6 | 98.8 | 98.5 | 99.2 |
| | mIoU↑ | 65.4 | 71.8 | 65.4 | 65.0 | 63.9 | 67.8 | 65.8 | 64.0 | 69.1 | 64.1 | 77.6 |
| | FPS↑ (ms/img↓) | 43.0 (23.3) | 29.8 (33.6) | 32.4 (30.9) | 38.2 (26.2) | 27.9 (35.9) | 39.5 (25.4) | 43.7 (22.9) | 41.2 (24.3) | 46.1 (21.7) | 35.5 (28.2) | 86.9 (11.5) |
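For readers reproducing the comparison, the metrics above follow the standard pixel-level definitions. The sketch below, in NumPy, is a minimal illustration assuming binary masks where True marks crack pixels; computing mIoU as the mean of the crack and background IoUs is our assumption for this two-class setting, not a detail confirmed by the paper.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    """Compute P, R, F1, ACC, and mIoU for one binary crack mask.

    pred, gt: boolean arrays of the same shape (True = crack pixel).
    Returns percentages matching the units used in the table above.
    """
    tp = np.logical_and(pred, gt).sum()      # crack predicted as crack
    fp = np.logical_and(pred, ~gt).sum()     # background predicted as crack
    fn = np.logical_and(~pred, gt).sum()     # crack missed
    tn = np.logical_and(~pred, ~gt).sum()    # background predicted as background

    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    acc = (tp + tn) / (tp + tn + fp + fn)

    # Two-class mIoU: average the IoU of the crack and background classes.
    iou_crack = tp / (tp + fp + fn + eps)
    iou_bg = tn / (tn + fp + fn + eps)
    miou = (iou_crack + iou_bg) / 2
    return tuple(100 * m for m in (precision, recall, f1, acc, miou))

# Toy usage: a 4x4 image with a diagonal "crack" and one missed pixel.
gt = np.eye(4, dtype=bool)
pred = gt.copy()
pred[0, 0] = False
print(segmentation_metrics(pred, gt))
```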
Ablation study on the DeepCrack dataset (P, mIoU, and F1 in %). Check marks indicate which modules are enabled on top of the Swin-T backbone.

| Method | Texture Unit | RAM | PFM | RAM’ | PFM’ | P | mIoU | F1 |
|---|---|---|---|---|---|---|---|---|
| Swin-T | | | | | | 90.5 | 75.5 | 83.5 |
| | ✓ | | | | | 93.3 | 78.5 | 86.3 |
| | | ✓ | | | | 93.0 | 76.3 | 85.4 |
| | | | ✓ | | | 93.2 | 77.2 | 86.1 |
| | ✓ | | ✓ | ✓ | | 94.8 | 82.1 | 89.4 |
| | ✓ | ✓ | | | ✓ | 94.7 | 82.2 | 89.3 |
| | ✓ | ✓ | ✓ | | | 95.1 | 82.7 | 89.7 |
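Each ablation row corresponds to enabling or disabling individual branches of the network. As a rough illustration of how such a study can be wired, the sketch below toggles optional modules with constructor flags. TextureUnit, RAM, PFM, and AblatedFetNet are hypothetical stand-ins named after the modules in Section 3 (implemented here as identity mappings); this is a sketch of the ablation mechanics, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Stand-ins for the paper's Texture Unit, Refinement Attention Module (RAM),
# and Panoramic Feature Module (PFM); identity mappings, since only the
# on/off wiring of the ablation is being illustrated.
class TextureUnit(nn.Identity): pass
class RAM(nn.Identity): pass
class PFM(nn.Identity): pass

class AblatedFetNet(nn.Module):
    """Toy skeleton: each ablation row toggles one optional branch."""
    def __init__(self, use_texture=True, use_ram=True, use_pfm=True):
        super().__init__()
        self.backbone = nn.Conv2d(3, 16, 3, padding=1)  # stand-in for Swin-T
        self.texture = TextureUnit() if use_texture else None
        self.ram = RAM() if use_ram else None
        self.pfm = PFM() if use_pfm else None
        self.head = nn.Conv2d(16, 1, 1)  # per-pixel crack probability

    def forward(self, x):
        f = self.backbone(x)
        for module in (self.texture, self.ram, self.pfm):
            if module is not None:
                f = module(f)
        return torch.sigmoid(self.head(f))

# Second row of the ablation table: Swin-T + Texture Unit only.
model = AblatedFetNet(use_texture=True, use_ram=False, use_pfm=False)
out = model(torch.randn(1, 3, 64, 64))  # (1, 1, 64, 64) probability map
```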