Three-Dimensional Convolutional Vehicle Black Smoke Detection Model with Fused Temporal Features
Abstract
1. Introduction
- Dataset: We assembled and publicly released a dataset for black smoke detection, sourced from roadway CCTV cameras in China. We hope this dataset will facilitate the advancement of black smoke detection models and support their practical application.
- Model: Our model combines a dual-branch architecture with 3D convolutions to analyze temporal features and capture black smoke motion, together with a feature fusion network that integrates cross-scale fusion and self-attention to focus detection on key features (see the sketch after this list).
- Experiments: The proposed method is trained and validated on our new dataset. The results demonstrate a substantial improvement in vehicle black smoke detection accuracy on streaming video compared with most existing methods, validating the effectiveness of our approach in real-world scenarios.
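To make the dual-branch design concrete, the following is a minimal PyTorch sketch of the idea: two clips at different resolutions pass through 3D-convolutional branches, a cross-scale fusion step merges their feature maps, and spatial self-attention highlights key regions before a small head scores the presence of black smoke. Channel widths, the clip layout (B, C, T, H, W), temporal pooling, and module names such as Branch3D, CrossScaleFusion, and SelfAttention2D are illustrative assumptions, not the exact implementation described in Section 4.

```python
# Minimal sketch of a dual-branch 3D-convolutional smoke detector with
# cross-scale feature fusion and spatial self-attention. All names, channel
# widths, and strides are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Branch3D(nn.Module):
    """One branch: stacked 3D convolutions mixing spatial and temporal cues."""

    def __init__(self, in_ch: int = 3, out_ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 32, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, out_ch, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, C, T, H, W) -> average over time to get a (B, out_ch, H/8, W/8) map
        return self.net(clip).mean(dim=2)


class CrossScaleFusion(nn.Module):
    """Upsample the low-resolution branch and fuse it with the high-resolution one."""

    def __init__(self, ch: int = 64):
        super().__init__()
        self.proj = nn.Conv2d(2 * ch, ch, kernel_size=1)

    def forward(self, hi: torch.Tensor, lo: torch.Tensor) -> torch.Tensor:
        lo_up = F.interpolate(lo, size=hi.shape[-2:], mode="bilinear", align_corners=False)
        return self.proj(torch.cat([hi, lo_up], dim=1))


class SelfAttention2D(nn.Module):
    """Self-attention over the spatial positions of the fused feature map."""

    def __init__(self, ch: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)     # attend across positions
        return out.transpose(1, 2).reshape(b, c, h, w)


class DualBranchSmokeNet(nn.Module):
    """Two clips at different resolutions -> fused, attended features -> smoke score."""

    def __init__(self):
        super().__init__()
        self.branch1 = Branch3D()   # high-resolution clip, e.g. 256 x 128
        self.branch2 = Branch3D()   # low-resolution clip,  e.g. 128 x 64
        self.fuse = CrossScaleFusion()
        self.attn = SelfAttention2D()
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, clip_hi: torch.Tensor, clip_lo: torch.Tensor) -> torch.Tensor:
        f = self.fuse(self.branch1(clip_hi), self.branch2(clip_lo))
        return self.head(self.attn(f))


if __name__ == "__main__":
    model = DualBranchSmokeNet()
    hi = torch.randn(2, 3, 6, 128, 256)   # 6-frame clip at 256 x 128 (W x H assumed)
    lo = torch.randn(2, 3, 6, 64, 128)
    print(model(hi, lo).shape)             # torch.Size([2, 1])
```

The two branch resolutions studied in Section 5.6 (e.g., 256 × 128 and 128 × 64) map directly onto the clip_hi and clip_lo inputs of this sketch.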
2. Related Works
2.1. Object Detection
2.2. Black Smoke Detection
- Zhang et al. [21] enhanced the network structure with MobileNet and spatial pyramid pooling and adopted a transfer-learning training strategy. Black smoke was recognized by analyzing multiple frames, which improved results but still suffered from a high false alarm rate, a high missed-detection rate, and significant computational expense. From this work onward, the architectures used for black smoke detection began to converge towards mainstream object detection models.
- Wang et al. [3] proposed a two-stage convolutional neural network that combines the strengths of YOLOv3 and a multi-region convolutional network, effectively achieving fine-grained identification of black smoke. This was the first application of a YOLO-series model to black smoke detection, demonstrating the power of the YOLO line of models.
- Li et al. [22] utilized YOLOv3 for vehicle detection and subsequently employed the Vision Transformer [23] to identify black smoke emanating from the rear of vehicles. This was a landmark attempt to bring the attention mechanism, already widespread in other applications, to black smoke detection.
- Zhou et al. [4] introduced a black smoke recognition method based on ResNet, which incorporates a reinforced spatial attention mechanism to effectively reduce the false alarm rate.
- Impractical detection scenarios: Most of the work mentioned above focuses primarily on model design and does not discuss practical black smoke detection scenarios. For example, although the model in [9] considers the detection scenario, it is designed for a specialized annual inspection station, which increases the cost of detection.
- Limited use of temporal information: As mentioned, temporal information offers unique advantages for black smoke detection, yet previous methods have not explored these advantages in detail.
3. Proposed Dataset
- These cameras capture the real, dynamic running state of vehicles, rather than a static, manually inspected record taken only at an annual inspection station.
- These cameras are usually mounted in the same locations as traffic lights and therefore provide an ideal, consistent viewing angle.
- These cameras capture continuous videos, thus allowing for the utilization of temporal information.
4. Proposed Model
4.1. Model Structure
4.2. Black Smoke Feature Extraction with 3D Convolution
4.3. Feature Fusion Module
4.4. Self-Attention Module
5. Experiments
5.1. Setup
5.1.1. Dataset
5.1.2. Environment
5.1.3. Model-Training
5.2. Evaluation Metrics
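The comparative and ablation tables below report AR (%), RR (%), F1-Score (%), and inference time. The reported F1 values are consistent with the usual harmonic mean of AR and RR; the following is a sketch of the assumed relationship (the precise definitions of AR and RR follow this section):

```latex
% Assumed relationship between the reported metrics:
% F1 is the harmonic mean of the accuracy rate (AR) and the recall rate (RR).
\mathrm{F1} = \frac{2 \cdot \mathrm{AR} \cdot \mathrm{RR}}{\mathrm{AR} + \mathrm{RR}}
```

For example, our model's reported AR = 89.42% and RR = 91.32% give F1 = 2 · 89.42 · 91.32 / (89.42 + 91.32) ≈ 90.36%, matching the reported F1-Score.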
5.3. Comparative Analysis
- MobileNetv3, YOLOv7, and YOLOv8 are all two-dimensional convolutional detection models; they do not account for the dynamic characteristics of black smoke and are easily disturbed by noise, leading to lower accuracy.
- Ref. [5] employs the YOLOv5 network to extract black smoke features but captures only static features, failing to distinguish clearly between black smoke and other disturbances such as vehicle shadows, road stains, and tree shade.
- The C3D model uses 3D convolutions followed by fully connected layers for vehicle black smoke detection, improving accuracy but suffering from a large number of parameters and low inference efficiency, making it unsuitable for real-time detection scenarios.
- Ref. [8] is an integrated solution that combines a semantic segmentation model (YOLOv5-s) with a detection model (MobileNetv3). Although this approach uses a semantic segmentation mechanism, the features guiding the segmentation are still obtained from a YOLOv5 model; from a feature extraction perspective, it is therefore effectively the integration of YOLOv5 and MobileNetv3 and remains limited in its use of temporal features.
5.4. Ablation Study
- The two-branch model is the dual-branch structure combining branch 1 and branch 2.
- The CSFF variant introduces the CSFF module on top of the two-branch model.
- The SAM variant adds the SAM module on top of the two-branch model.
- The FULL version of our model adds the FF module to the two-branch model; the FF module includes both the CSFF and SAM modules.
5.5. Introducing 3D Convolutional Kernel to Different Layers
5.6. Experiments on Different Branches and Resolutions
5.7. Case Study
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Guo, D.; Ren, M. Attention mechanism based two-branch black smoke vehicle detection network. Comput. Digit. Eng. China 2022, 50, 147.
2. Chen, J. Research on the Visual Detection Method of Smoky Diesel Vehicles in Complex Scenes. Master's Thesis, University of Science and Technology of China, Hefei, China, 2023.
3. Wang, X.; Kang, Y.; Cao, Y. A two-stage convolutional neural network for smoky diesel vehicle detection. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019.
4. Zhou, J.; Qian, S.; Yan, Z.; Zhao, J.; Wen, H. ESA-Net: A Network with Efficient Spatial Attention for Smoky Vehicle Detection. In Proceedings of the IEEE International Instrumentation and Measurement Technology Conference (I2MTC 2021), Glasgow, UK, 17–20 May 2021; pp. 1–6.
5. Hao, X. Deep Learning Based Motor Vehicle Black Smoke Detection. Master's Thesis, China University of Mining and Technology, Xuzhou, China, 2023.
6. Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Wong, C.; Yifu, Z.; Montes, D.; et al. ultralytics/yolov5: v6.2 - YOLOv5 classification models, Apple M1, reproducibility, ClearML and Deci.ai integrations. Zenodo 2022.
7. Han, W.; Jun, T.; Xiaodong, L.; Shanyan, G.; Rong, X.; Li, S. PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2022.
8. Wang, H.; Chen, K.; Li, Y. Automatic Detection Method for Black Smoke Vehicles Considering Motion Shadows. Sensors 2023, 23, 8281.
9. Chen, J.; Cao, Y.; Kang, Y.; Xu, Z.; Xia, X. CFL-Net: An Environmental Inspection Stations Diesel Vehicle Black Smoke Detection Network Based on Color Features Location. In Proceedings of the 2022 41st Chinese Control Conference (CCC), Hefei, China, 25–27 July 2022.
10. Tripathi, A.; Gupta, M.K.; Srivastava, C.; Dixit, P.; Pandey, S.K. Object Detection using YOLO: A Survey. In Proceedings of the 5th International Conference on Contemporary Computing and Informatics (IC3I 2022), Uttar Pradesh, India, 14–16 December 2022; pp. 747–752.
11. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
12. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
13. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
14. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; Zitnick, L. Microsoft COCO: Common Objects in Context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014, Proceedings, Part V; Springer: Berlin/Heidelberg, Germany, 2014.
15. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016, Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
16. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
17. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-based Fully Convolutional Networks. arXiv 2016, arXiv:1605.06409.
18. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
19. Wei, X.; Liang, S.; Chen, N.; Cao, X. Transferable adversarial attacks for image and video object detection. arXiv 2018, arXiv:1811.12641.
20. Tao, H.; Lu, X. Smoky vehicle detection based on multi-feature fusion and ensemble neural networks. Multim. Tools Appl. 2018, 77, 32153–32177.
21. Zhang, G.; Zhang, D.; Lu, X.; Cao, Y. Smoky Vehicle Detection Algorithm Based on Improved Transfer Learning. In Proceedings of the 6th International Conference on Systems and Informatics (ICSAI 2019), Shanghai, China, 2–4 November 2019; pp. 155–159.
22. Yuan, L.; Tong, S.; Lu, X. Smoky Vehicle Detection Based on Improved Vision Transformer. In Proceedings of the CSAE 2021: The 5th International Conference on Computer Science and Application Engineering, Sanya, China, 19–21 October 2021; Emrouznejad, A., Chou, J.R., Eds.; ACM: New York, NY, USA, 2021; pp. 97:1–97:5.
23. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110.
24. Cao, Y.; Lu, X. Learning spatial-temporal representation for smoke vehicle detection. Multim. Tools Appl. 2019, 78, 27871–27889.
25. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
26. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online, 2020.
27. Joseph, K.J.; Khan, S.; Khan, F.S.; Balasubramanian, V.N. Towards Open World Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online, 19–25 June 2021; pp. 5830–5840.
28. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Online, October 2021; pp. 3520–3529.
29. Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimed. Tools Appl. 2023, 82, 9243–9275.
30. Howard, A.; Sandler, M.; Chu, G.; Chen, L.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. arXiv 2019, arXiv:1905.02244.
31. Wang, C.; Bochkovskiy, A.; Liao, H.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
32. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. CoRR 2023.
33. Tran, D.; Bourdev, L.D.; Fergus, R.; Torresani, L.; Paluri, M. Learning Spatiotemporal Features with 3D Convolutional Networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile, 7–13 December 2015; pp. 4489–4497.
34. Han, X.; Wang, Y.; Zhai, B.; You, Q.; Yang, H. COCO is "ALL" You Need for Visual Instruction Fine-tuning. arXiv 2024, arXiv:2401.08968.
35. Li, M.; Wu, J.; Wang, X.; Chen, C.; Qin, J.; Xiao, X.; Wang, R.; Zheng, M.; Pan, X. AlignDet: Aligning pre-training and fine-tuning in object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, October 2023; pp. 6866–6876.
Parameter Name | Value |
---|---|
Batch Size | 8 |
Frame Size | 6 |
Epoch | 200 |
Initial learning rate | 0.01 |
Recurring learning rate | 0.01 |
Weight Decay | 0.9 |
Loss factor for IoU | 0.2 |
Loss factor for Cross-entropy | 1 |
Thresholds for IoU training | 0.5 |
Hue (fraction) | 0.015 |
Saturation (fraction) | 0.7 |
Luminance (fraction) | 0.4 |
Rotation angle (+/− deg) | 10.0 |
Translation (+/− fractions) | 0.1 |
Image scaling (+/− gain) | 0.5 |
Probability of performing an up-down flip | 0 |
Probability of performing a left-right flip | 0.5 |
Probability of performing Mosaic | 1.0 |
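The parameters above resemble a YOLOv5-style hyperparameter file. As a reading aid, a minimal sketch of the same values gathered into a Python configuration dictionary follows; the key names (batch_size, lr0, flipud, etc.) are assumptions for illustration, not the exact keys used in our training scripts.

```python
# Illustrative training configuration mirroring the table above.
# Key names are assumptions, not the exact keys used in the training code.
train_cfg = {
    "batch_size": 8,          # clips per optimization step
    "frame_size": 6,          # frames per input clip
    "epochs": 200,
    "lr0": 0.01,              # initial learning rate
    "lrf": 0.01,              # recurring learning rate (assumed YOLOv5-style final LR factor)
    "weight_decay": 0.9,
    "iou_loss_gain": 0.2,     # loss factor for IoU
    "ce_loss_gain": 1.0,      # loss factor for cross-entropy
    "iou_train_thresh": 0.5,  # IoU threshold during training
    # data augmentation
    "hsv_h": 0.015,           # hue fraction
    "hsv_s": 0.7,             # saturation fraction
    "hsv_v": 0.4,             # luminance fraction
    "degrees": 10.0,          # rotation angle (+/- deg)
    "translate": 0.1,         # translation (+/- fraction)
    "scale": 0.5,             # image scaling (+/- gain)
    "flipud": 0.0,            # probability of up-down flip
    "fliplr": 0.5,            # probability of left-right flip
    "mosaic": 1.0,            # probability of Mosaic augmentation
}
```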
Model | AR(%) | RR(%) | F1-Score(%) | Inference Time |
---|---|---|---|---|
MobileNetv3 | 78.46 | 81.32 | 79.86 | 76.96 |
YOLOv7 | 83.78 | 84.32 | 84.04 | 123.34 |
YOLOv8 | 84.62 | 86.73 | 85.67 | 139.04 |
[1] | 85.44 | 88.54 | 86.95 | 185.62 |
[4] | 85.96 | 89.36 | 87.64 | 176.32 |
[5] | 85.64 | 87.46 | 86.55 | 132.46 |
C3D | 87.76 | 89.66 | 88.74 | 348.23 |
[8] | 84.37 | 87.90 | 86.56 | 317.25 |
ours | 89.42 | 91.32 | 90.36 | 196.68 |
Model | AR(%) | RR(%) | F1-Score(%) | Inference Time |
---|---|---|---|---|
ResNet50 | 82.02 | 84.72 | 83.35 | 82.72 |
Two-branch (branch 1 + branch 2) | 85.33 | 87.24 | 86.27 | 144.46 |
Two-branch + CSFF | 87.46 | 89.83 | 88.63 | 153.47 |
Two-branch + SAM | 86.75 | 89.47 | 88.10 | 151.28 |
FULL (two-branch + FF) | 89.42 | 91.32 | 90.36 | 186.43 |
Model | Image Size | Position of 3D Convolution | AR (%) |
---|---|---|---|
ResNet50 | 256 × 128 | NULL | 82.02 |
ResNet50 | 256 × 128 | 1st layer | 82.63 |
ResNet50 | 256 × 128 | 2nd layer | 82.37 |
ResNet50 | 256 × 128 | 3rd layer | 82.50 |
ResNet50 | 256 × 128 | 1st + 2nd | 82.82 |
ResNet50 | 256 × 128 | 1st + 3rd | 83.01 |
ResNet50 | 256 × 128 | 1st + 2nd + 3rd | 83.32 |
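Section 5.5 reports accuracy as the 3D convolutional kernel is introduced after different ResNet50 layers. The following is a minimal sketch of one way to splice a temporal 3D convolution after a chosen 2D stage; the helper name TemporalConvAfterStage, the per-frame reshape, and the (3, 1, 1) temporal kernel are assumptions for illustration rather than the paper's exact modification.

```python
# Illustrative: insert a temporal 3D convolution after a chosen ResNet50 stage.
# Names and the exact splice point are assumptions for illustration only.
import torch
import torch.nn as nn
import torchvision


class TemporalConvAfterStage(nn.Module):
    """Run a 2D stage frame-by-frame, then mix frames with a 3D convolution."""

    def __init__(self, stage2d: nn.Module, channels: int):
        super().__init__()
        self.stage2d = stage2d
        self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, C, T, H, W); apply the 2D stage to each frame independently
        b, c, t, h, w = clip.shape
        frames = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        feats = self.stage2d(frames)                               # (B*T, C', H', W')
        _, c2, h2, w2 = feats.shape
        feats = feats.reshape(b, t, c2, h2, w2).permute(0, 2, 1, 3, 4)
        return self.temporal(feats)                                # mix along time


if __name__ == "__main__":
    resnet = torchvision.models.resnet50(weights=None)
    stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool, resnet.layer1)
    block = TemporalConvAfterStage(stem, channels=256)             # layer1 outputs 256 channels
    clip = torch.randn(2, 3, 6, 128, 256)                          # 6-frame clip at 256 x 128
    print(block(clip).shape)                                       # torch.Size([2, 256, 6, 32, 64])
```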
The Resolution of Branch 1 | The Resolution of Branch 2 | Usage of Both Branch 1 and Branch 2 | AR (%) |
---|---|---|---|
256 × 128 | - | N | 83.32 |
128 × 64 | - | N | 81.67 |
64 × 32 | - | N | 66.54 |
256 × 128 | 128 × 64 | Y | 85.33 |
256 × 128 | 64 × 32 | Y | 84.23 |
128 × 64 | 64 × 32 | Y | 78.01 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).