In recent years, research on dangerous behavior detection has explored a variety of technological approaches to enhance workplace safety, ranging from traditional algorithms to deep learning algorithms. Most studies use surveillance data to identify workers’ dangerous behaviors, such as not wearing safety helmets, smoking, or using mobile phones while working in chemical enterprises. However, existing studies tend to target specific types of dangerous behavior, which makes it difficult to meet the diverse requirements of real workplace scenarios.
To better understand how dangerous behavior detection technology is applied in different work settings, this study summarizes recent domestic and international research on detecting safety helmet wearing, smoking, and mobile phone usage. By examining the various technological approaches in depth, it aims to outline the overall development trends in this field and to offer insights into the comprehensive application of dangerous behavior detection in practical workplaces. Such work has the potential not only to improve overall workplace safety but also to reduce the risks faced by chemical enterprises.
2.1. Traditional Methods for Dangerous Behavior Detection
In the field of dangerous behavior detection, traditional methods rely primarily on computer vision and image processing techniques combined with conventional, hand-crafted feature engineering. For instance, Abu H. et al. proposed a method for automatically detecting helmets to ensure construction safety. They first combined the frequency-domain information of images with Histograms of Oriented Gradients (HOGs) to detect construction workers, and then applied a combination of color-based features and the Circular Hough Transform (CHT) to determine whether the workers were wearing helmets [16]. Seshadri et al. introduced a computer vision-based system for detecting drivers’ mobile phone usage. Using facial landmark tracking, the system automatically identifies whether the driver holds a phone close to the ear. The system was validated on challenging face-view videos from the Strategic Highway Research Program (SHRP2), demonstrating its effectiveness under naturalistic driving conditions. By combining direct methods with various features, it achieved satisfactory performance on facial pose verification data, providing new insights into driver behavior [17]. Wang et al. proposed a phone usage detection method based on a semi-supervised Support Vector Machine (SVM) model. Although the method requires a large number of iterations during detection, which slows data processing and hinders real-time use, it offers a distinctive approach to determining phone usage behavior [18]. Pan et al. combined Gaussian Mixture Models (GMMs) with frame differencing to extract features from regions of interest and analyzed smoking behavior through RGB color features. However, this method is sensitive to factors such as movement speed and weather, and it suffers from limited real-time performance and accuracy [19]. Ai Bo combined Gaussian Mixture Models with background subtraction to extract foreground object information, performed HOG feature extraction on the regions of interest, and then used a classifier to determine whether smoking behavior was present in the current frame. However, like the method of Pan et al., it is easily affected by factors such as weather and pedestrian speed, which limits its accuracy and real-time performance [20].
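To make the hand-crafted feature stage concrete, the following computes a simplified HOG-style descriptor of the kind used by Abu H. et al. and Ai Bo to describe workers and regions of interest. It is a minimal, pure-Python sketch under stated assumptions, not either author's implementation: the function name `hog_cell_histograms`, the 4-pixel cell size, and the 9 unsigned orientation bins are illustrative choices; votes are hard-binned rather than interpolated; and the overlapping block normalization of standard HOG is replaced by a single L2 normalization of the whole descriptor.

```python
import math


def hog_cell_histograms(img, cell=4, bins=9):
    """Simplified HOG descriptor for a grayscale image given as a list of rows.

    Assumptions (illustrative, not from the cited works): hard binning,
    unsigned orientations in [0, 180), and one global L2 normalization
    instead of the overlapping block normalization of standard HOG.
    """
    h, w = len(img), len(img[0])
    bin_width = 180.0 / bins
    n_cy, n_cx = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(n_cx)] for _ in range(n_cy)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Centered [-1, 0, 1] differences for the image gradient.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Unsigned orientation: opposite gradient directions share a bin.
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            cy, cx = y // cell, x // cell
            if cy >= n_cy or cx >= n_cx:
                continue  # pixels outside the full-cell grid are ignored
            b = int(ang // bin_width) % bins
            hist[cy][cx][b] += mag  # magnitude-weighted vote
    # Concatenate cell histograms row by row, then L2-normalize.
    desc = [v for row in hist for cell_hist in row for v in cell_hist]
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]
```

In a pipeline like Ai Bo's, descriptors of this kind would be computed on foreground regions found by GMM background subtraction and passed to a trained classifier; neither of those steps is shown here.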
In summary, traditional methods are constrained by manually designed features and rules, which limits their performance in complex scenes and across diverse dangerous behaviors. With the rise of deep learning, however, deep learning-based methods have gradually achieved significant breakthroughs in dangerous behavior detection.
2.2. Dangerous Behavior Detection Based on Deep Learning
Researchers both domestically and internationally are actively exploring the application of deep learning to safety behavior detection in the chemical industry, particularly the detection of dangerous behaviors during chemical workers’ operations. Deep learning has shown considerable potential for reducing workplace accident risks and improving work efficiency. Within dangerous behavior detection, deep learning-based object detection plays a crucial role. Object detection methods are mainly divided into two-stage and single-stage approaches, each with its own characteristics and advantages for dangerous behavior detection tasks [21].
First, two-stage object detection methods perform detection in two stages: a region proposal network first generates a set of candidate regions, and the detector then classifies these regions and refines their bounding boxes. Two-stage algorithms generally offer higher accuracy and are therefore suited to dangerous behavior detection scenarios that demand high detection precision. Many domestic and international researchers have used two-stage detectors to analyze image data in depth and determine whether workers exhibit potentially dangerous behaviors during work.
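The candidate-region stage can be illustrated with the anchor mechanism of the Faster R-CNN region proposal network: a fixed set of reference boxes is placed at every feature-map location, and the network then predicts an objectness score and box offsets for each one. The sketch below only generates the anchors. The hypothetical `generate_anchors` function assumes a stride of 16 pixels, three sizes (128, 256, 512), and three aspect ratios (0.5, 1, 2), values commonly used as Faster R-CNN defaults, and it defines the ratio as height divided by width.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     sizes=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return RPN-style anchors as (x1, y1, x2, y2) in image coordinates.

    Assumed defaults (illustrative): stride 16, three sizes, three ratios.
    Each feature-map cell gets len(sizes) * len(ratios) anchors centred on
    the cell centre; ratio = height / width and area = size ** 2.
    """
    # Base (width, height) pairs shared by every location.
    base = []
    for size in sizes:
        for r in ratios:
            w = size / math.sqrt(r)
            h = size * math.sqrt(r)
            base.append((w, h))
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Centre of feature-map cell (i, j) projected back to the image.
            cx = (j + 0.5) * stride
            cy = (i + 0.5) * stride
            for w, h in base:
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

A 2 x 3 feature map, for example, yields 2 x 3 x 9 = 54 anchors; in a full RPN, the highest-scoring, regressed anchors become the candidate regions passed to the second stage.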
For example, Dey et al. proposed a context-driven method for detecting distracted driving using in-vehicle cameras. The method uses computer vision to identify and analyze objects inside the vehicle, such as hands and smartphones, and its context-driven design provides real-time feedback on the specific cause of distraction, thereby enhancing driving safety [22]. Senyurek et al. introduced a deep learning algorithm based on a convolutional neural network (CNN) and long short-term memory (LSTM) architecture to detect smoking from respiratory signals. Compared with traditional feature-based classification frameworks, the CNN-LSTM model learns appropriate features directly from respiratory inductance plethysmography (RIP) sensor signals through its CNN layers, yielding superior smoking detection performance [23]. Han et al. proposed a fast smoking detection method. It first restricts smoking detection to the face area: because the face region is much smaller than the full human body target, this effectively reduces the area that must be searched for smoking targets. A Faster R-CNN model is then used to determine whether smoking behavior is present [24]. Wang et al. proposed a three-stage method for identifying unsafe behaviors of construction workers based on text mining and image recognition. In the first stage, a deep learning algorithm identifies the safety equipment of construction workers; in the second, Faster R-CNN classifies and detects unsafe behaviors; and in the third, personnel in dangerous areas are identified and tracked, achieving comprehensive recognition of unsafe behaviors on construction sites [25]. Chen et al. presented a real-time automatic detection system for safety helmet wearing based on an improved Faster R-CNN algorithm. The improved algorithm incorporates Retinex image enhancement to overcome interference from lighting and distance, improving image quality in complex outdoor substation scenes. As a result, individuals not wearing safety helmets can be detected promptly and effectively, providing reliable safety monitoring for substation construction [26].
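Retinex-based enhancement, as used by Chen et al., separates an image's reflectance from slowly varying illumination. One common variant, single-scale Retinex (SSR), computes R = log I - log(G_sigma * I), where G_sigma * I is the image blurred by a Gaussian "surround". The sketch below applies SSR to a grayscale image stored as a list of rows. The function names, sigma = 15, and the offset eps = 1 that avoids log(0) are illustrative assumptions, and Chen et al. may have used a different Retinex variant or parameters.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k], radius


def blur(img, sigma):
    """Separable Gaussian blur with replicated (clamped) borders."""
    k, r = gaussian_kernel(sigma)
    h, w = len(img), len(img[0])
    # Horizontal pass.
    tmp = [[sum(k[t + r] * img[y][min(max(x + t, 0), w - 1)] for t in range(-r, r + 1))
            for x in range(w)] for y in range(h)]
    # Vertical pass.
    return [[sum(k[t + r] * tmp[min(max(y + t, 0), h - 1)][x] for t in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def single_scale_retinex(img, sigma=15.0, eps=1.0):
    """SSR: R = log(I + eps) - log(G_sigma * I + eps); sigma and eps are assumed."""
    surround = blur(img, sigma)
    return [[math.log(img[y][x] + eps) - math.log(surround[y][x] + eps)
             for x in range(len(img[0]))] for y in range(len(img))]
```

The output lies in the log domain, where uniform regions map to about 0 regardless of their brightness; it would normally be rescaled (for example, linearly stretched to 0 to 255) before being passed to a detector.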
Second, single-stage object detection methods directly predict the locations and categories of targets in the input image without explicitly generating candidate regions. Typical single-stage algorithms include YOLO and SSD. These algorithms are fast, simple, and efficient, which makes them especially suitable for dangerous behavior detection scenarios with strict real-time requirements. Accordingly, many researchers both domestically and internationally have applied single-stage methods to achieve timely and efficient detection of dangerous behaviors.
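Because single-stage detectors such as YOLO and SSD predict a dense set of overlapping boxes in a single pass, their raw outputs are typically filtered with non-maximum suppression (NMS) before individual detections, such as a worker without a helmet, are reported. The following is a minimal greedy, class-agnostic NMS sketch. The function names and the thresholds (IoU 0.5, score 0.25) are assumptions for illustration, and practical detectors usually run NMS separately for each class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5, score_thresh=0.25):
    """Greedy class-agnostic NMS; returns kept indices, highest score first.

    The default thresholds are illustrative assumptions.
    """
    # Drop low-confidence boxes, then sort the rest by descending score.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress remaining boxes that overlap the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

For example, two boxes (0, 0, 10, 10) and (1, 1, 11, 11) overlap with an IoU of 81/119, about 0.68, so only the higher-scoring one survives a 0.5 threshold.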
For instance, Aboah et al. proposed a real-time multi-class helmet violation detection method using few-shot data sampling and YOLOv8. From a large number of video frames, they extracted a subset of frames with differing backgrounds, applied data augmentation to these frames together with a test-time augmentation strategy, and then trained and tested the YOLOv8 model to achieve real-time detection [27]. Fan et al. introduced a helmet-wearing detection method based on the EfficientDet algorithm. They first optimized the initial cluster centers with the K-Means++ clustering algorithm and then introduced SeparableConv2D layers, combining them with the simple and efficient Bi-directional Feature Pyramid Network (BiFPN) proposed in EfficientDet to extract image feature maps. They also adopted the Channel Correlation Loss (CC-Loss) as the classification loss to constrain the specific relationships between classes and channels while maintaining intra-class and inter-class separability, thereby improving detection accuracy [28]. Yang et al. proposed an SSD-based deep learning algorithm for detecting illegal driving behaviors, mainly mobile phone use, smoking, and not wearing a seat belt. The SSD algorithm can effectively determine whether a driver violates driving regulations while driving, which can help reduce traffic accidents [29]. Zhao et al. presented a driver smoking detection method based on the Feature Pyramid Network (FPN). By combining the FPN with dilated convolution, they detected small objects in driver images and thereby identified smoking behavior [30]. She et al. proposed an improved YOLOX-based algorithm for detecting small smoking targets. They added an attention mechanism module to the feature extraction network to capture global information and, through scale addition, concentrated attention on the target area, making fuller use of the deep network. They also replaced the original loss with the Generalized Intersection over Union (GIoU) loss to address the shortcomings of the IoU loss [31].
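The motivation for the GIoU loss adopted by She et al. is that the plain IoU loss, 1 - IoU, stays at 1 for every pair of non-overlapping boxes and therefore gives no signal about how far apart they are. GIoU subtracts the fraction of the smallest enclosing box C that is not covered by the union of the two boxes A and B: GIoU = IoU - |C \ (A ∪ B)| / |C|, and the loss is 1 - GIoU. The sketch below implements this standard definition for a single pair of boxes; the function name and the (x1, y1, x2, y2) box format are assumptions, and it is not She et al.'s training code.

```python
def giou_loss(pred, target):
    """Return 1 - GIoU for two boxes in (x1, y1, x2, y2) format (assumed layout)."""
    # Intersection and union, as in ordinary IoU.
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest axis-aligned box C enclosing both boxes.
    cx1, cy1 = min(pred[0], target[0]), min(pred[1], target[1])
    cx2, cy2 = max(pred[2], target[2]), max(pred[3], target[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)
    # Penalize the part of C not covered by the union.
    giou = iou - (area_c - union) / area_c if area_c > 0 else iou
    return 1.0 - giou
```

For example, comparing a unit square at the origin with a unit square whose left edge is at x = 2 gives a loss of 4/3, and moving the target two units further away raises it to 1.6, whereas the IoU loss would be 1 in both cases.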
Although deep learning has advanced dangerous behavior detection, challenges remain, including insufficient detection effectiveness and relatively low accuracy. These issues also arise in the job safety and security scenes addressed in this study, especially where multiple targets are occluded or environmental interference is present, which makes existing methods difficult to apply in practice. In contrast to existing research, this study fully considers the characteristics of job safety and security scenes and addresses the low detection accuracy for multi-scale targets in complex scenes in dangerous behavior detection tasks.