Enhancing Object Detection in Underground Mines: UCM-Net and Self-Supervised Pre-Training
Abstract
1. Introduction
- We propose a Hierarchical Global Response Normalization Block (HGRNBlock) to enhance the model’s feature expression capabilities and stability.
- Combining depthwise separable convolution with the Hierarchical Global Response Normalization module, we propose ESFENet, a feature extraction network for mine images. Building on the YOLOv8 model, we then propose UCM-Net, an underground coal mine object detection network that balances precision and real-time performance.
- We design a self-supervised pre-training model structure for mine data based on the SparK masked encoder, generating dedicated pre-training weights for coal mine tasks for the first time.
- To address the limited benefits of self-supervised pre-training for lightweight models, we incorporate the feature fusion neck layer from the YOLO series into the pre-training structure. This enhancement enables the model to not only acquire more feature information during training but also achieve better fusion of these features.
2. Methods
2.1. Lightweight Detection Network
2.1.1. UCM-Net
- Backbone: We set the input image size to 640 × 640 × 3. The HGStem module from HGNetv2 [35] is introduced to capture basic visual features from the input image, and the Hierarchical Global Response Normalization Block (HGRNBlock) then performs further feature extraction on underground images. To keep the network lightweight, we use depthwise separable convolution (DWConv) [36] to downsample the feature maps. Finally, the Spatial Pyramid Pooling Fast (SPPF) module concatenates feature maps produced by max-pooling operations with different effective kernel sizes; this integration of multi-scale feature information strengthens ESFENet's ability to extract relevant features.
- Neck: We adopt the FPN + PAN structure for feature fusion, focusing on multi-scale object prediction. The Feature Pyramid Network (FPN) adds horizontal connections to the backbone, creating an upsampling path that merges low-level and high-level features and enriches multi-scale detection with semantic information. The Path Aggregation Network (PAN) transfers strong localization information from low-level features and combines it with the semantic information from the FPN; it uses both horizontal and vertical paths to fuse features of different resolutions, facilitating the transmission of position information. The PAN implementation includes two C2F modules and two 3 × 3 convolutions, which process features and reduce the feature map size from 80 × 80 to 20 × 20 while maintaining feature connectivity. Within the C2F module, the feature map first undergoes channel adjustment through a 1 × 1 convolution, followed by a split operation along the channel dimension that divides it into two sub-feature maps. One part retains the original features, while the other passes through multiple Bottleneck structures for deep feature extraction; finally, the two branches are fused through a Concat operation. The C2F modules use additional gradient-flow branches to strengthen gradient propagation, mitigate gradient vanishing, and better fuse the multi-scale information extracted by the backbone (a PyTorch sketch of these building blocks follows this list).
- Head: There are three detection heads in total. Each head is composed of two branches, with each branch containing two CBS layers and a 2D convolution. These branches predict the object’s class and location, respectively.
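For concreteness, the following is a minimal PyTorch sketch of three building blocks named above: DWConv downsampling, SPPF, and the C2F split-and-fuse block. The channel widths, activation choice (SiLU), and number of Bottlenecks are illustrative assumptions, not the exact UCM-Net configuration.

```python
import torch
import torch.nn as nn

def conv_bn_act(c_in, c_out, k=1, s=1, p=0, groups=1):
    # Convenience: Conv2d + BatchNorm + SiLU, the CBS pattern used throughout.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, p, groups=groups, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class DWConvDown(nn.Module):
    # Depthwise separable downsampling: per-channel 3x3 conv with stride 2,
    # then a 1x1 pointwise conv to mix channels.
    def __init__(self, c_in, c_out):
        super().__init__()
        self.dw = conv_bn_act(c_in, c_in, k=3, s=2, p=1, groups=c_in)
        self.pw = conv_bn_act(c_in, c_out, k=1)

    def forward(self, x):
        return self.pw(self.dw(x))

class SPPF(nn.Module):
    # Spatial Pyramid Pooling - Fast: three chained 5x5 max-pools emulate
    # pooling at growing kernel sizes; all outputs are concatenated and fused.
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = conv_bn_act(c_in, c_mid)
        self.cv2 = conv_bn_act(4 * c_mid, c_out)
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        return self.cv2(torch.cat([x, y1, y2, self.pool(y2)], dim=1))

class Bottleneck(nn.Module):
    # Two 3x3 convs with a residual skip, used inside the C2F deep branch.
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(conv_bn_act(c, c, k=3, p=1),
                                  conv_bn_act(c, c, k=3, p=1))

    def forward(self, x):
        return x + self.body(x)

class C2F(nn.Module):
    # 1x1 conv -> split channels in two -> one half passes through stacked
    # Bottlenecks, keeping each intermediate output as an extra gradient-flow
    # branch -> concatenate all branches -> 1x1 fusion conv.
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        self.c = c_out // 2
        self.cv1 = conv_bn_act(c_in, 2 * self.c)
        self.cv2 = conv_bn_act((2 + n) * self.c, c_out)
        self.blocks = nn.ModuleList(Bottleneck(self.c) for _ in range(n))

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))  # split into two sub-maps
        for block in self.blocks:              # deep branch, keep each output
            y.append(block(y[-1]))
        return self.cv2(torch.cat(y, dim=1))   # fuse all branches

# Example shapes on an 80 x 80 feature map:
x = torch.randn(1, 64, 80, 80)
print(C2F(64, 64)(x).shape)          # torch.Size([1, 64, 80, 80])
print(SPPF(64, 64)(x).shape)         # torch.Size([1, 64, 80, 80])
print(DWConvDown(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
```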
2.1.2. ESFENet
- Global L2 Norm Calculation: The first step is to compute the global L2 norm for each channel of the input feature map $x \in \mathbb{R}^{B \times C \times H \times W}$. The L2 norm is taken over each channel's spatial dimensions, so the resulting $G_x$ has dimensions $B \times C \times 1 \times 1$, where $B$ is the batch size and $C$ is the number of channels. This ensures that the norm is calculated independently for each channel while keeping dimensions compatible with the subsequent operations. The formula for computing $G_x$ is as follows:

$$G_x = \lVert x_c \rVert_2 = \sqrt{\sum_{h=1}^{H} \sum_{w=1}^{W} x_{c,h,w}^{2}}, \quad c = 1, \dots, C$$
- Normalization Factor Calculation: Next, the normalization factor $N_x$ is computed by dividing each channel's norm $G_x$ by the mean of the norms across the channel dimension, so that each channel's response is normalized relative to the channel average:

$$N_x = \frac{G_x}{\frac{1}{C} \sum_{c=1}^{C} G_{x,c}}$$
- Scaling and Shifting: Once normalization is complete, the feature map undergoes a scaling and shifting operation. Each normalized feature map is multiplied by a learnable scaling factor $\gamma$ and added to a learnable shifting factor $\beta$, which allows the model to adjust the normalized features to better fit the task:

$$y = \gamma \cdot (x \cdot N_x) + \beta + x$$

Here, $\gamma$ and $\beta$ are learnable parameters broadcast over the input feature map $x$, giving the model the flexibility to adapt the normalized features. The residual term $x$ helps ensure the efficient flow of information during training, prevents vanishing gradients, and promotes feature reuse.

After applying GRN, we introduce the GELU activation function. GELU combines the advantages of the ReLU and sigmoid activation functions: like ReLU, it introduces non-linearity, but unlike ReLU it handles negative values smoothly instead of setting them to zero. This smooth behavior helps the model avoid abrupt changes and better capture complex patterns in the data. The mathematical expression for GELU is as follows:

$$\mathrm{GELU}(x) = x \cdot \Phi(x) = \frac{x}{2}\left[1 + \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$$

where $\Phi(x)$ is the CDF of the standard Gaussian distribution. The motivation for adding the GELU activation function after GRN is as follows (a PyTorch sketch of these three steps, together with GELU, follows the motivation points below):
- Enhancing the Model's Expressive Power: GRN (Global Response Normalization) is essentially a normalization operation designed to standardize feature maps, keeping feature values consistent and enhancing the model's stability. However, normalization is inherently a linear operation, and such operations may limit the model's expressive capability. To increase the model's expressive power, a non-linear activation function such as GELU is introduced after GRN. GELU not only increases the model's ability to represent complex patterns but also handles features smoothly, avoiding the hard cutoff of functions such as ReLU, which sets negative values to zero. This makes the model more flexible when handling complex data.
- Preserving Statistical Properties of Feature Maps: After GRN normalization, the output feature maps exhibit certain statistical properties. To preserve these properties and further enhance the model's adaptability to different data, it is important to select an activation function that aligns with these statistics. GELU (Gaussian Error Linear Unit) applies the Cumulative Distribution Function (CDF) of a Gaussian distribution, which is compatible with the statistics of the feature maps processed by GRN; this helps maintain their statistical characteristics and improves the model's performance across a wider range of inputs.
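To make the above concrete, here is a minimal PyTorch sketch of the full GRN + GELU step, assuming channel-first (B, C, H, W) input. The small epsilon and the per-channel, zero-initialized $\gamma$ and $\beta$ follow the ConvNeXt V2 formulation from which GRN originates, and are assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn as nn

class GRNWithGELU(nn.Module):
    # Global Response Normalization (the three steps above) followed by GELU.
    def __init__(self, channels: int, eps: float = 1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))  # scale
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))   # shift
        self.eps = eps
        self.act = nn.GELU()

    def forward(self, x):
        # Step 1: global L2 norm per channel over spatial dims -> (B, C, 1, 1)
        gx = torch.norm(x, p=2, dim=(2, 3), keepdim=True)
        # Step 2: normalize each channel's norm by the mean norm over channels
        nx = gx / (gx.mean(dim=1, keepdim=True) + self.eps)
        # Step 3: scale, shift, and residual connection, then smooth non-linearity
        return self.act(self.gamma * (x * nx) + self.beta + x)
```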
2.2. Self-Supervised Pretraining for Object Detection Tasks
2.2.1. Hierarchical Masked Encoder and Decoder
2.2.2. Structure Optimization and Downstream Transfer
3. Experiment Preparation
3.1. Datasets
3.2. Implementation Details
3.3. Evaluation Metrics
4. Experimental Results and Discussion
4.1. Detection Experiments
4.1.1. Comparison Experiments
4.1.2. Ablation Experiments
4.1.3. Generalization Experiments
4.2. Self-Supervised Pre-Training Results on Downstream Tasks
4.2.1. Comparative Experiment
4.2.2. Self-Supervised Pre-Training Experiment on UCM-Net
5. Conclusions
- The proposed ESFENet enhances the network's feature extraction capability to adapt to the complex and dynamic coal mine environment. Ablation experiments show that the average mAP50:95 increased by 0.84% across five underground coal mine datasets, and the network's generalization ability was validated on the Pascal VOC2007 + 2012 dataset.
- The proposed universal underground coal mine detection model, UCM-Net, was tested on five mining detection datasets. The experimental results demonstrate that UCM-Net improves detection accuracy while reducing parameter size and computation cost. It achieved state-of-the-art (SOTA) performance on all five datasets, with a 21.5% reduction in parameter size and a 14.8% reduction in computational cost. Particularly on the miner behavior dataset, it achieved a 1.3% increase in mAP50:95.
- To address the mismatch between the officially provided supervised pre-training weights and the underground detection tasks, we utilized a self-supervised pre-training method to train model-specific pre-training weights for underground coal mine detection, further improving the detection model's accuracy. Experimental analysis shows that a 60% image masking ratio yields the most significant accuracy improvement on downstream detection tasks.
- As the parameter size of the self-supervised pre-training model decreases, its improvement in detection accuracy falls below that of supervised pre-training methods. To address this, we included both the Backbone and the Neck of the model in the pre-training structure as the Encoder, which strengthened the adaptability of self-supervised pre-training to downstream tasks (a schematic sketch follows this list). This resulted in UCM-Net achieving an average mAP50:95 of 94.4% across the five datasets.
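As a schematic illustration of this design choice, the sketch below wraps both stages as a single encoder for SparK-style masked pre-training and then transfers the same weights to the detector. The class and attribute names (`MaskedPretrainEncoder`, `backbone`, `neck`) are hypothetical stand-ins for the actual implementation; multi-scale feature handling and SparK's sparse-convolution machinery are omitted.

```python
import torch.nn as nn

class MaskedPretrainEncoder(nn.Module):
    """Hypothetical sketch: use Backbone + Neck together as the encoder during
    SparK-style masked pre-training, so the lightweight model also learns to
    fuse multi-scale features before transfer to the detector."""
    def __init__(self, backbone: nn.Module, neck: nn.Module):
        super().__init__()
        self.backbone = backbone  # ESFENet feature extractor
        self.neck = neck          # FPN + PAN fusion layers

    def forward(self, masked_images):
        # The backbone yields feature maps from masked inputs; the neck fuses
        # them. A SparK-style decoder would then reconstruct the masked regions.
        return self.neck(self.backbone(masked_images))

# After pre-training, both components (about 2.3 M parameters in the paper's
# YOLOv8n setting) are transferred into the detector before fine-tuning.
```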
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Liu, H.D.; Zhang, H.; Wang, J.P.; Dou, J.X.; Guo, R.; Li, G.Y.; Liang, Y.H.; Yu, J.L. Construction of macromolecular model of coal based on deep learning algorithm. Energy 2024, 294, 130856.
- Zhang, K.; Yang, X.; Xu, L.; Thé, J.; Tan, Z.; Yu, H. Enhancing coal-gangue object detection using GAN-based data augmentation strategy with dual attention mechanism. Energy 2024, 287, 129654.
- Zhang, K.; Wang, T.; Yang, X.; Xu, L.; Thé, J.; Tan, Z.; Yu, H. STATNet: One-stage coal-gangue detector based on deep learning algorithm for real industrial application. Energy AI 2024, 17, 100388.
- Wu, B.; Wang, J.; Qu, B.; Qi, P.; Meng, Y. Development, effectiveness, and deficiency of China’s coal mine safety supervision system. Resour. Policy 2023, 82, 103524.
- Huang, K.; Li, S.; Cai, F.; Zhou, R. Detection of large foreign objects on coal mine belt conveyor based on Improved. Processes 2023, 11, 2469.
- Wu, X.; Li, H.; Wang, B.; Zhu, M. Review on improvements to the safety level of coal mines by applying intelligent coal mining. Sustainability 2022, 14, 16400.
- He, D.; Le, B.T.; Xiao, D.; Mao, Y.; Shan, F.; Ha, T.T.L. Coal mine area monitoring method by machine learning and multispectral remote sensing images. Infrared Phys. Technol. 2019, 103, 103070.
- Dou, D.; Wu, W.; Yang, J.; Zhang, Y. Classification of coal and gangue under multiple surface conditions via machine vision and relief-SVM. Powder Technol. 2019, 356, 1024–1028.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Imam, M.; Baïna, K.; Tabii, Y.; Ressami, E.M.; Adlaoui, Y.; Benzakour, I.; Abdelwahed, E.H. The future of mine safety: A comprehensive review of anti-collision systems based on computer vision in underground mines. Sensors 2023, 23, 4294.
- Azhari, F.; Sennersten, C.C.; Lindley, C.A.; Sellers, E. Deep learning implementations in mining applications: A compact critical review. Artif. Intell. Rev. 2023, 56, 14367–14402.
- Liu, Y.; Wang, X.; Zhang, Z.; Deng, F. LOSN: Lightweight ore sorting networks for edge device environment. Eng. Appl. Artif. Intell. 2023, 123, 106191.
- Zhang, J.; Feng, Y.; Li, X.; Lang, D.; Xu, Y.; Li, H.A.; Li, X. Safety helmet wearing detection algorithm based on DSM-YOLOx. Res. Sq. 2024.
- Wang, Y.; Guo, W.; Zhao, S.; Xue, B.; Zhang, W.; Xing, Z. A big coal block alarm detection method for scraper conveyor based on YOLO-BS. Sensors 2022, 22, 9052.
- Rao, T.; Xu, H.; Pan, T. Pedestrian detection model in underground coal mine based on active and semi-supervised learning. In Proceedings of the 2023 8th International Conference on Signal and Image Processing (ICSIP), Wuxi, China, 8–10 July 2023; pp. 104–108.
- Wang, Z.; Liu, Y.; Duan, S.; Pan, H. An efficient detection of non-standard miner behavior using improved YOLOv8. Comput. Electr. Eng. 2023, 112, 109021.
- Wen, X.; Li, B.; Wang, X.; Li, J.; Wei, D.; Gao, J.; Zhang, J. A Swin transformer-functionalized lightweight YOLOv5s for real-time coal–gangue detection. J. Real-Time Image Process. 2023, 20, 47.
- Xue, G.; Li, S.; Hou, P.; Gao, S.; Tan, R. Research on lightweight YOLO coal gangue detection algorithm based on ResNet18 backbone feature network. Internet Things 2023, 22, 100762.
- Wang, B.; Cui, H.; Yu, X.; Su, Z.; Zheng, Y. Research on gangue detection method based on GD-YOLO. Eng. Lett. 2025, 33, 59–68.
- Zong, G.; Yue, Y.; Shan, W. Optimization study of coal gangue detection in intelligent coal selection systems based on the improved YOLOv8n model. Electronics 2024, 13, 4155.
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving language understanding by generative pre-training. OpenAI Technical Report 2018. Available online: https://paperswithcode.com/paper/improving-language-understanding-by (accessed on 24 March 2025).
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009.
- Dai, Z.; Cai, B.; Lin, Y.; Chen, J. UP-DETR: Unsupervised pre-training for object detection with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1601–1610.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229.
- Tian, K.; Jiang, Y.; Diao, Q.; Lin, C.; Wang, L.; Yuan, Z. Designing BERT for convolutional networks: Sparse and hierarchical masked modeling. arXiv 2023, arXiv:2301.03580.
- Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 16133–16142.
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
- Hendrycks, D.; Gimpel, K. Gaussian error linear units (GELUs). arXiv 2016, arXiv:1606.08415.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Yang, W.; Zhang, X.; Ma, B.; Wang, Y.; Wu, Y.; Yan, J.; Liu, Y.; Zhang, C.; Wan, J.; Wang, Y.; et al. An open dataset for intelligent recognition and classification of abnormal condition in longwall mining. Sci. Data 2023, 10, 416.
- Chen, J.; Kao, S.H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H.G. Run, don’t walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 12021–12031.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
- Chen, X.; He, K. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15750–15758.
- Chen, X.; Fan, H.; Girshick, R.; He, K. Improved baselines with momentum contrastive learning. arXiv 2020, arXiv:2003.04297.
Dataset | Classes | Total Images | Scenes | Images for Training | Images for Validation | Images for Testing |
---|---|---|---|---|---|---|
Coal miners | 1 | 30,704 | 58 | 21,492 | 3071 | 6141 |
Support guard plates | 9 | 20,045 | 159 | 14,031 | 2005 | 4009 |
Large coal | 1 | 21,017 | 18 | 14,711 | 2102 | 4204 |
Miners’ behaviors | 8 | 24,709 | 67 | 17,296 | 2471 | 4942 |
Drag chains | 1 | 21,412 | 65 | 14,987 | 2142 | 4283 |
Configuration Parameter | Value |
---|---|
Optimizer | SGD |
Initial learning rate | |
Final learning rate | |
Momentum | 0.937 |
Weight decay | |
Batch size | 8 |
Epoch | 400 |
Input size | 640 |
Close mosaic | 10 |
Patience | 50 |
Configuration Parameter | Value |
---|---|
Input size | 640 |
Batch size | 8 |
Base_lr | |
Epoch | 400 |
Sbn | True |
Clip | 5 |
Each dataset cell reports AP (%)/AP50 (%)/AP75 (%).

Methods | Backbone | Coal Miners | Guard Plates | Large Coal | Behaviors | Drag Chains | Params (M) | FLOPs (G) |
---|---|---|---|---|---|---|---|---|
Faster R-CNN | ResNet50 | 60.7/94.0/65.0 | 72.8/96.0/85.1 | 43.3/75.9/43.5 | 59.3/86.4/69.8 | 83.1/98.8/92.2 | 41.35 | 52.0 |
YOLOv3 | Darknet53 | 64.5/95.3/74.2 | 70.6/96.7/80.9 | 47.5/80.9/51.1 | 61.6/88.1/74.9 | 81.0/98.9/93.7 | 61.52 | 46.5 |
YOLOv5-s | CSPDarknet53 | 74.3/98.2/84.7 | 79.2/97.3/90.7 | 52.4/84.1/57.3 | 71.0/90.6/84.4 | 90.1/99.0/97.7 | 7.01 | 15.8 |
YOLOX-tiny | CSPDarknet53 | 73.4/97.5/82.8 | 78.6/97.2/89.4 | 53.2/83.4/59.0 | 69.5/89.6/83.2 | 88.7/99.0/96.8 | 5.03 | 7.6 |
YOLOX-s | CSPDarknet53 | 73.0/97.4/82.9 | 79.2/97.1/90.0 | 54.2/84.1/60.2 | 71.2/90.3/84.9 | 89.5/99.0/96.8 | 8.94 | 13.3 |
YOLOv8n | CSPDarknet53 | 74.6/98.0/84.1 | 78.6/97.4/88.6 | 54.5/83.0/61.5 | 71.5/90.1/83.9 | 90.4/99.0/97.7 | 3.01 | 8.1 |
YOLOv10n | CSPDarknet53 | 74.5/97.7/84.7 | 78.7/96.9/87.5 | 54.1/81.9/61.6 | 70.9/89.7/84.6 | 90.3/98.9/97.7 | 2.69 | 8.2
UCM-Net | ESFENet | 74.6/98.0/84.2 | 79.2/97.4/92.0 | 54.9/83.5/61.7 | 72.2/90.6/85.0 | 91.0/98.9/97.8 | 2.35 | 6.9
Each dataset cell reports mAP50:95 (%)/mAP50 (%)/F1-score (%).

Methods | Backbone | Coal Miners | Guard Plates | Large Coal | Behaviors | Drag Chains | Params (M) | FLOPs (G) |
---|---|---|---|---|---|---|---|---|
YOLOv8n | GhostNet | 76.8/98.4/95.6 | 79.3/97.4/93.1 | 55.2/84.0/76.9 | 72.3/89.9/85.8 | 92.6/99.5/99.7 | 2.77 | 6.8 |
YOLOv8n | MobileNetV3 | 76.5/98.2/95.5 | 77.5/97.1/91.9 | 53.7/83.0/75.8 | 72.5/90.5/86.4 | 93.1/99.5/99.6 | 3.11 | 6.1 |
YOLOv8n | MobileNetV4 | 76.6/98.4/95.6 | 80.4/97.8/94.1 | 55.1/83.7/76.6 | 71.4/90.8/86.0 | 92.8/99.4/99.4 | 5.7 | 22.5 |
YOLOv8n | FasterNet | 77.4/98.3/95.9 | 79.6/97.4/92.8 | 55.1/83.8/76.8 | 73.3/90.6/86.7 | 93.1/99.5/99.6 | 4.17 | 10.7 |
YOLOv8n | CSPDarknet53 | 76.7/98.3/95.7 | 79.9/97.7/94.3 | 55.0/83.5/76.3 | 72.9/90.7/87.3 | 92.6/99.5/99.5 | 3.01 | 8.1 |
YOLOv8n | ESFENet | 77.6/98.5/95.8 | 80.6/97.8/94.0 | 55.3/84.0/77.0 | 74.2/91.5/87.5 | 93.6/99.5/99.6 | 2.35 | 6.9 |
Each dataset cell reports Precision (%)/Recall (%).

Method | Coal Miners | Guard Plates | Large Coal | Miners’ Behaviors | Drag Chains | FPS (frames/s) |
---|---|---|---|---|---|---|
YOLOv8n | 96.0/95.5 | 95.5/93.1 | 78.7/74.1 | 88.9/85.7 | 99.4/99.6 | 96.5 |
YOLOv10n | 95.8/94.2 | 93.3/93.7 | 78.2/72.6 | 85.7/85.1 | 99.2/99.4 | 70.1 |
YOLOv12n | 96.0/95.5 | 94.9/92.6 | 79.4/73.6 | 85.0/87.2 | 99.5/99.4 | 49.6 |
UCM-Net (ours) | 96.3/95.3 | 95.6/92.5 | 79.2/74.9 | 87.0/88.0 | 99.6/99.6 | 77.1
[Figure: qualitative detection comparisons of Ground Truth, YOLOv8n, YOLOv10n, and UCM-Net (ours) on the Coal Miners, Guard Plates, Large Coal, Miners’ Behaviors, and Drag Chains datasets; images not reproduced here.]
Each dataset cell reports mAP50:95 (%)/mAP50 (%).

Methods | Coal Miners | Guard Plates | Large Coal | Miners’ Behaviors | Drag Chains | Params (M) | FLOPs (G) |
---|---|---|---|---|---|---|---|
Conv+C2f (baseline) | 76.6/98.3 | 79.9/97.7 | 55.0/83.5 | 72.9/90.7 | 92.6/99.5 | 3.01 | 8.1 |
DWConv + C2f | 76.7/98.3 | 80.1/97.8 | 55.1/83.9 | 73.8/90.5 | 92.8/99.4 | 2.62 | 7.2 |
DWConv + HGBlock | 77.1/98.4 | 80.4/97.8 | 55.2/83.9 | 73.0/91.1 | 93.6/99.4 | 2.35 | 6.9 |
DWConv + HGRNBlock × 1 | 77.2/98.5 | 80.6/97.9 | 55.3/84.0 | 73.9/91.3 | 93.6/99.4 | 2.35 | 6.9 |
Conv + HGRNBlock × 6 | 77.5/98.5 | 81.1/97.8 | 55.3/83.6 | 74.5/91.6 | 93.5/99.4 | 3.1 | 8.1 |
DWConv + HGRNBlock × 6 | 77.6/98.5 | 80.6/97.8 | 55.3/84.0 | 74.2/91.5 | 93.6/99.5 | 2.35 | 6.9 |
Method | mAP50 (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
Conv + C2f (baseline) | 69.8 | 75.4 | 62.5 | 68.3 |
DWConv + C2f | 67.7 | 74.6 | 60.3 | 66.7 |
DWConv + HGBlock | 69.6 | 75.6 | 62.2 | 68.2 |
DWConv + HGRNBlock × 1 | 69.7 | 75.5 | 62.3 | 68.3 |
DWConv + HGRNBlock × 6 | 69.5 | 76.8 | 61.5 | 68.3 |
Each dataset cell reports mAP50:95 (%)/mAP50 (%).

Methods | Pre-Training Method | Migration Component (Params) | Masking Ratio | Coal Miners | Guard Plates | Large Coal | Miners’ Behaviors | Drag Chains |
---|---|---|---|---|---|---|---|---|
YOLOv8s | Random Initialization | – | – | 78.9/98.7 | 81.6/97.9 | 56.2/84.3 | 74.9/91.4 | 93.5/99.4 |
YOLOv8s | Official Supervised | Whole Framework (11.1 M) | – | 79.9/98.7 | 81.8/97.7 | 56.8/84.7 | 75.1/91.4 | 93.9/99.4 |
YOLOv8s | Self-supervised | Backbone (4.4 M) | 75% | 79.1/98.7 | 81.7/97.9 | 56.7/84.7 | 75.3/91.5 | 93.7/99.5 |
YOLOv8s | Self-supervised | Backbone (4.4 M) | 60% | 79.2/98.7 | 82.3/97.7 | 56.7/84.8 | 75.7/91.5 | 93.7/99.5 |
YOLOv8n-FasterNet | Random Initialization | – | – | 77.4/98.3 | 79.6/97.4 | 55.1/83.8 | 73.3/90.6 | 93.1/99.5 |
YOLOv8n-FasterNet | Official Supervised | Backbone (2.2 M) | – | 73.6/97.5 | 79.3/97.6 | 54.8/83.9 | 70.4/89.8 | 91.4/99.5 |
YOLOv8n-FasterNet | Self-supervised | Backbone (2.2 M) | 75% | 77.6/98.4 | 81.5/97.7 | 55.1/83.6 | 73.7/91.4 | 93.2/99.4 |
YOLOv8n-FasterNet | Self-supervised | Backbone (2.2 M) | 60% | 77.7/98.4 | 81.3/97.8 | 55.4/84.0 | 73.5/91.1 | 93.5/99.4 |
YOLOv8n | Random Initialization | – | – | 76.6/98.3 | 79.9/97.7 | 55.0/83.5 | 72.9/90.7 | 92.6/99.5 |
YOLOv8n | Official Supervised | Whole Framework (3.0 M) | – | 78.1/98.6 | 79.8/97.4 | 55.8/84.2 | 74.7/91.2 | 93.6/99.4 |
YOLOv8n | Self-supervised | Backbone (1.1 M) | 75% | 77.0/98.5 | 80.9/97.7 | 55.7/84.1 | 73.1/90.7 | 92.6/99.5 |
YOLOv8n | Self-supervised | Backbone (1.1 M) | 60% | 77.4/98.6 | 81.1/97.6 | 55.6/84.0 | 72.9/91.0 | 93.0/99.5 |
YOLOv8n | Self-supervised | Backbone + Neck (2.3 M) | 75% | 77.4/98.5 | 80.9/97.8 | 55.6/84.0 | 73.3/91.3 | 93.0/99.5 |
YOLOv8n | Self-supervised | Backbone + Neck (2.3 M) | 60% | 77.5/98.6 | 81.3/97.9 | 55.8/84.2 | 73.5/90.8 | 93.1/99.5 |
Each dataset cell reports mAP50:95 (%)/mAP50 (%).

Methods | Pre-Training Method | Coal Miners | Guard Plates | Large Coal | Miners’ Behaviors | Drag Chains |
---|---|---|---|---|---|---|
UCM-Net | Random Initialization | 77.6/98.5 | 80.6/97.8 | 55.3/84.0 | 74.2/91.5 | 93.6/99.5 |
UCM-Net | SimSiam | 75.8/97.9 | 76.9/96.4 | 54.9/83.2 | 69.3/88.6 | 90.9/99.4 |
UCM-Net | MoCov2 | 77.3/98.5 | 81.1/97.8 | 54.9/83.7 | 73.5/91.6 | 93.6/99.5 |
UCM-Net | SparK | 77.7/98.6 | 81.3/97.9 | 55.8/84.2 | 74.5/91.9 | 93.6/99.4 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).