1. Introduction
The mineral transportation conveyor belt is an indispensable piece of equipment in the production and transportation of minerals. During operation, prolonged friction and impact from the ore, together with the hardening and aging of the rubber, often damage the surface of the conveyor belt, causing edge loss, surface cracking, holes, bulging or peeling of the cover rubber, large-area wear and deep scratches. Serious damage is a precursor to tearing: if it is not dealt with in time, it continues to expand as running time increases until destructive tearing occurs, which is a major hazard to the safety of personnel and property [1]. The traditional manual inspection method requires regular inspections with the unloaded machine running at low speed. It is limited by the time available for inspection and by the accuracy of the workers, which makes it difficult to detect damage on the surface of the conveyor belt in a timely, accurate and stable manner.
To solve the problems of traditional manual inspection and improve the efficiency, timeliness and reliability of conveyor-belt-damage detection, Wang et al. [2] proposed a scheme for the nondestructive inspection of conveyor belts using X-rays. Yang et al. [3] proposed a method for detecting longitudinal tears in conveyor belts using infrared images, which performs region of interest (ROI) selection and binarization of the images, and then determines whether an early warning should be issued based on the number of connected regions. Yang et al. [4] proposed an early warning method for longitudinal tear detection in conveyor belts based on infrared spectral analysis, where the spectral characteristics of the infrared radiation field are used to determine whether there is a risk of longitudinal tear on the conveyor belt. Qiao et al. [5] proposed a longitudinal tear detection method based on a visible-light charge-coupled device (CCD) and an infrared CCD; combining the two CCDs yields a more reliable conveyor belt tear detection method.
In recent years, with the continuous development of deep-learning technology and its great progress in the field of computer vision, target detection models based on deep learning have also achieved better results in conveyor-belt-damage detection tasks. Unlike traditional computer vision methods, which process images with fixed algorithms and pipelines to extract specific regions of the image, deep-learning-based techniques can achieve more accurate, faster and more intelligent detection by learning deep image features of the target from a large number of data samples. At present, deep-learning-based target detection algorithms are mainly divided into two-stage detection algorithms, represented by Mask R-CNN [6] and Faster R-CNN [7], and single-stage detection algorithms, represented by SSD [8], RetinaNet [9] and YOLO [10]. The former are slower because they first generate candidate boxes for the target regions and then classify each candidate box, so two steps are required to complete detection. The latter predict all the bounding boxes in a single pass of the image through the network, which makes them faster and more widely used in industrial scenarios.
As shown in Table 1, a number of studies have applied deep-learning-based methods to conveyor belt detection. Zhang et al. [11] proposed a conveyor-belt-detection method that replaces YOLOv3’s backbone network Darknet53 with EfficientNet, which achieved 97.26% detection accuracy. Wang et al. [12] proposed a detection model combining BTFPN and YOLOX, which achieved 98.45% detection accuracy for conveyor belt damage. Zhang et al. [13] proposed a lightweight network based on YOLOv4 to improve the detection speed of conveyor belt damage, but its detection accuracy was only 93.22%. Guo et al. [14] proposed a novel multiclassification conditional CycleGAN (MCC-CycleGAN) method for the detection of conveyor-belt-surface damage.
At present, deep-learning-based conveyor-belt-detection methods generally improve either detection accuracy or detection speed, but not both. Improving detection accuracy often slows detection down and, conversely, increasing detection speed tends to reduce detection accuracy. Therefore, improving detection accuracy and detection speed at the same time is still a challenge.
Since there is no publicly available high-quality conveyor-belt-defect dataset, it is difficult to obtain enough data to train a deep-learning network, which makes it difficult to guarantee the detection and generalization performance of the model. In addition, because of the performance limitations of the equipment used in actual production, balancing detection accuracy against speed is also an important issue. In this paper, we focus on these two practical problems and propose a new conveyor-belt-defect detection method based on knowledge distillation.
Section 2 presents our proposed data enhancement method and the improvement of the YOLOv5 model using model pruning and knowledge distillation. Section 3 describes the experimental settings and relevant parameters and reports the results of the proposed method. Finally, in Section 4, we conclude with a discussion and summary of the methods used in this paper.
3. Results and Discussion
3.1. Experiment Environment and Parameter Settings
All experiments in this paper were implemented with the open-source framework PyTorch 1.11.0 on a computer with an Intel Xeon E5-2690 v4 CPU and a Tesla P100-PCIE-16GB graphics card, running the Ubuntu 20.04 operating system. The initial learning rate was lr = 0.01 and the number of training epochs was 100; other specific parameters are shown in Table 2.
3.2. Dataset and Evaluation Indicators
Since the conveyor belt defect samples differed from the scratch samples in morphology and location and were more difficult to obtain, they could not be enhanced using GANs. Therefore, to address both the shortage of conveyor belt defect samples and the imbalance between the numbers of defect and scratch samples, this paper adopted an oversampling method to enhance the conveyor belt defect samples. Oversampling was performed by adding the same image to the training data several times during network training. We first divided the original defect samples into a training set and a test set, and then performed conventional data enhancements, such as rotation and cropping. Finally, the data-enhanced training set samples were oversampled three times to generate the final conveyor belt defect sample dataset.
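The oversampling step described above can be illustrated with a minimal sketch. It assumes that "three-times oversampled" means each minority (defect) sample appears three times in the training list; the function name `oversample`, the label strings and the toy file names are hypothetical and not taken from the paper.

```python
import random


def oversample(samples, is_minority, factor=3, seed=0):
    """Return a training list in which minority samples appear `factor` times.

    Assumption: oversampling is plain repetition of the same annotated image.
    """
    out = []
    for sample in samples:
        repeats = factor if is_minority(sample) else 1
        out.extend([sample] * repeats)
    # Shuffle so that the repeated copies are not adjacent during training.
    random.Random(seed).shuffle(out)
    return out


# Hypothetical toy training set of (image_name, label) pairs.
train = [("a.jpg", "scratch"), ("b.jpg", "edge_defect"), ("c.jpg", "scratch")]
balanced = oversample(train, lambda s: s[1] == "edge_defect", factor=3)
```

Only the training split is repeated in this sketch, so the test set still contains each defect image once and the evaluation is not inflated by duplicates.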
In this paper, we used a total of 1533 conveyor belt images taken at the production site and annotated them manually. The dataset was then enhanced by rotating, cropping and adding noise to generate 3066 images, a further 1500 images were generated by the joint GAN and copy–pasting strategy, and 1177 samples were generated by oversampling the defective conveyor belt samples. This gave a dataset of 7276 conveyor belt samples. The dataset was divided into a training set and a test set at a ratio of 8:2, giving 5821 training samples and 1455 test samples.
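As a quick check of the arithmetic, the sample counts quoted above add up to the stated total, and an 8:2 split rounded to the nearest image reproduces the stated set sizes. The variable names below are illustrative only.

```python
# Sample counts as quoted in the text.
original = 1533        # images taken at the production site
augmented = 3066       # rotation, cropping and added noise
gan_copy_paste = 1500  # joint GAN and copy-pasting strategy
oversampled = 1177     # oversampled defect samples

total = original + augmented + gan_copy_paste + oversampled

# 8:2 split; the training share is rounded to the nearest whole image.
train_size = round(total * 0.8)
test_size = total - train_size

print(total, train_size, test_size)  # 7276 5821 1455
```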
To evaluate the model’s performance more rigorously, the evaluation indexes used in this paper included precision (P), recall (R), mean average precision (mAP@0.5), number of model parameters (Params) and inference time (Inference/ms). Precision and recall are defined in terms of the following counts: TP (true positive) is the number of positive samples correctly predicted as positive, FP (false positive) is the number of negative samples incorrectly predicted as positive, and FN (false negative) is the number of positive samples incorrectly predicted as negative.
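For reference, the standard definitions of these metrics can be written as follows. This is a reconstruction from the usual conventions rather than a copy of the paper's own equations; the symbols N (number of classes) and AP_i (average precision of class i, computed at an IoU threshold of 0.5) are assumptions introduced here.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}

AP = \int_{0}^{1} P(R)\,\mathrm{d}R, \qquad
mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i
```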
3.3. Data Augmentation Strategy Ablation Experiments
To prove the effectiveness of our data augmentation strategy, we used different datasets to train YOLOv5n, and compared the detection results of different datasets when using the same detection model.
From Table 3, we can see that, when the model is trained on the original data only, the small number of samples makes it difficult to train the detection model effectively; thus, the detection accuracy is relatively poor. When traditional data augmentation methods, such as rotation, translation and cropping, were applied, the detection accuracy improved to a certain extent. Our data augmentation method combining GAN and copy–pasting strategies then improved the detection accuracy further. However, since the number of conveyor belt defect samples was still too small, we oversampled the conveyor belt defect samples in the training set to improve the detection accuracy for defects and obtain the final dataset. In the end, we obtained 7276 conveyor belt images, which contained about 12,000 scratch samples and about 1200 conveyor-belt-edge defect samples.
After data augmentation, both the number and the diversity of conveyor-belt-damage samples in the dataset increased significantly, which improved the generalization ability of the detection model. The mAP@0.5 for conveyor belt damage detection reached 95.57%, which meets the requirements for detection accuracy in industrial applications.
3.4. Results of Model-Pruning Experiments
To prune the model, it is first necessary to train it sparsely. As shown in Figure 10, we visualized the scaling factors of the BN layers. The figure shows that the BN scaling factors are normally distributed before sparse training; during sparse training, their distribution gradually concentrates near zero because of the L1 regularization applied to these parameters. The channels whose scaling factors are close to zero are then removed to prune the model.
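The effect of the L1 penalty on the scaling factors can be illustrated with a minimal sketch. It assumes plain SGD and uses the subgradient λ·sign(γ) of the penalty λ|γ|; the function names, the penalty weight `l1_lambda` and the toy values are hypothetical and do not reproduce the paper's training code.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sparse_sgd_step(gammas, task_grads, lr=0.01, l1_lambda=1e-3):
    """One SGD step on BN scaling factors with an added L1 subgradient.

    `task_grads` stands in for the gradient of the detection loss.
    """
    return [
        g - lr * (dg + l1_lambda * sign(g))
        for g, dg in zip(gammas, task_grads)
    ]


# Toy scaling factors; a zero task gradient isolates the effect of the penalty.
gammas = [0.8, -0.05, 0.02]
for _ in range(100):
    gammas = sparse_sgd_step(gammas, [0.0, 0.0, 0.0], lr=0.01, l1_lambda=0.1)
```

Each step shrinks every nonzero factor by `lr * l1_lambda` in magnitude, so small factors reach the neighbourhood of zero quickly while large ones, which in real training are also held up by the task gradient, stay clearly away from it.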
In this paper, we pruned and fine-tuned the lightweight network YOLOv5n and compared the performance of the model at different pruning ratios. The results are shown in Table 4.
From Table 4, it can be seen that, as the pruning ratio increased, the number of model parameters and the inference time decreased, but the detection accuracy of the model also decreased, even after fine-tuning. When the pruning ratio exceeded 70%, the degradation in model performance became more pronounced. Therefore, considering both the accuracy and speed of the model, this paper used the network with a pruning ratio of 70% as the final mini-network and named it YOLOv5n-slim; a knowledge distillation algorithm was then applied to it to improve its performance.
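How a pruning ratio translates into removed channels can be sketched as follows. The sketch assumes a single global threshold on the absolute BN scaling factors across all layers, which is a common choice for this kind of channel pruning but is not confirmed by the text; `prune_mask` and the toy factors are hypothetical.

```python
def prune_mask(gammas_per_layer, ratio):
    """Return one keep-mask per layer using a global threshold on |gamma|."""
    all_abs = sorted(abs(g) for layer in gammas_per_layer for g in layer)
    n_prune = int(round(len(all_abs) * ratio))
    if n_prune == 0:
        return [[True] * len(layer) for layer in gammas_per_layer]
    # Channels whose factor does not exceed the threshold are removed.
    # Ties at the threshold may remove slightly more than `ratio`.
    threshold = all_abs[n_prune - 1]
    masks = [[abs(g) > threshold for g in layer] for layer in gammas_per_layer]
    # Guard: never remove every channel of a layer.
    for layer, mask in zip(gammas_per_layer, masks):
        if not any(mask):
            best = max(range(len(layer)), key=lambda i: abs(layer[i]))
            mask[best] = True
    return masks


# Toy scaling factors for two layers after sparse training.
layers = [[0.9, 0.01, 0.5, 0.02, 0.03], [0.001, 0.7, 0.002, 0.04, 0.6]]
masks = prune_mask(layers, ratio=0.7)
```

With a ratio of 0.7, seven of the ten toy channels are dropped and only the three largest factors survive. The reduction in parameters need not equal the channel pruning ratio, since removing a channel affects the weights of both adjacent layers.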
3.5. Experimental Results of Knowledge Distillation Algorithm
To verify the effectiveness of the knowledge distillation strategy used in this paper, YOLOv5m was used as the teacher network, and YOLOv5n and the mini-network YOLOv5n-slim were used as the student networks. The network obtained by distilling YOLOv5n was named YOLOv5n(KD), and the network obtained by distilling YOLOv5n-slim was named YOLOv5n-slim(KD). The test results of the models are shown in Table 5.
To compare the performance of the models after distillation, three YOLOv5 models of different sizes were also tested. As can be seen from Table 5: (1) In the YOLOv5 series, both detection accuracy and inference time increased as the model size increased from YOLOv5n to YOLOv5m. (2) Compared with the unpruned YOLOv5n, the pruned network YOLOv5n-slim decreased by 4.83% in mAP@0.5, 72.16% in the number of parameters and 29.41% in inference time. (3) After the introduction of knowledge distillation, the detection accuracy improved significantly: the mAP@0.5 of YOLOv5n(KD) was 1.76% higher than before distillation. Compared with the teacher network YOLOv5m, its mAP@0.5 was 1.11% lower, while its parameter count was 91.56% lower and its inference time 83.17% lower. Compared with the larger network YOLOv5s, it achieved a 75.66% reduction in parameter count and a 63.83% reduction in inference time at the cost of a 0.2% reduction in mAP@0.5. (4) The mAP@0.5 of the pruned student network YOLOv5n-slim improved by 4.09% after distillation. Compared with the teacher network YOLOv5m, its mAP@0.5 was 2.92% lower, while its parameter count was 97.65% lower and its inference time 88.12% lower. This shows that the knowledge distillation algorithm used in this paper can effectively improve the accuracy of the detection model without increasing its complexity.
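The size and speed figures quoted above are mutually consistent, which a short calculation makes explicit. Distillation does not change the architecture, so the reduction of YOLOv5n-slim relative to YOLOv5m should equal the reduction of YOLOv5n relative to YOLOv5m compounded with the reduction achieved by pruning. The helper name below is illustrative.

```python
def compound_reduction(r1_pct, r2_pct):
    """Total percentage reduction after two successive percentage reductions."""
    remaining = (1 - r1_pct / 100) * (1 - r2_pct / 100)
    return 100 * (1 - remaining)


# YOLOv5n vs. YOLOv5m, followed by YOLOv5n-slim vs. YOLOv5n.
params_total = compound_reduction(91.56, 72.16)
time_total = compound_reduction(83.17, 29.41)

print(f"{params_total:.2f} {time_total:.2f}")  # 97.65 88.12
```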
3.6. Feature Map Comparison Analysis
We chose to visualize the feature maps output by the detection head of the network; the results are shown in Figure 11.
From the comparison of the feature maps, it can be seen that the distilled network has higher activation values in the target region, with clearer boundaries, and is less influenced by background information. Therefore, the fine-grained feature-simulation distillation method used in this paper provides effective guidance for the feature learning of the network.
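One common way to make a student imitate teacher features only near targets is a masked mean squared error between the two feature maps. The sketch below illustrates that general idea under stated assumptions: single-channel maps of equal size, a binary mask marking the target region, and a hypothetical function name. It is not the paper's loss, which may differ in how the mask is built and how channels are matched.

```python
def imitation_loss(student, teacher, mask):
    """Mean squared error between feature maps, restricted to masked cells."""
    total, count = 0.0, 0
    for s_row, t_row, m_row in zip(student, teacher, mask):
        for s, t, m in zip(s_row, t_row, m_row):
            if m:
                total += (s - t) ** 2
                count += 1
    return total / count if count else 0.0


# Toy 2x2 feature maps; the mask selects the right-hand column as "target".
teacher = [[0.0, 1.0], [0.0, 2.0]]
student = [[0.5, 0.5], [0.5, 1.0]]
mask = [[0, 1], [0, 1]]
loss = imitation_loss(student, teacher, mask)
```

Because background cells are masked out, the student is not penalized for differing from the teacher there, which matches the observation that the distilled network responds more strongly inside the target region.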
3.7. Comparison with Other Models
To show the effectiveness of the knowledge distillation algorithm, we compared the distilled models with other lightweight YOLO networks; the experimental results are shown in Table 6.
From the table, it can be seen that, compared with the lightweight network YOLOv3-tiny, YOLOv5n(KD), obtained with the distillation algorithm in this paper, improved mAP@0.5 by 0.59%, reduced the number of parameters by 79.7% and reduced the inference time by 19.05%. The pruned model YOLOv5n-slim(KD) was 1.91% lower in mAP@0.5 but had 94.35% fewer parameters and a 42.86% shorter inference time.
Compared with YOLOv4-tiny, the distilled models YOLOv5n(KD) and YOLOv5n-slim(KD) improved mAP@0.5 by 13.03% and 10.53%, reduced the number of parameters by 70.46% and 91.91%, and reduced the inference time by 65.31% and 75.51%, respectively.
Compared with YOLOv7-tiny, the lightweight model of the latest YOLO version, YOLOv7, the distilled models improved mAP@0.5 by 11.84% and 9.34%, reduced the number of parameters by 70.76% and 91.86%, and reduced the inference time by 66% and 76%, respectively. By distilling the features of the target region, the knowledge distillation algorithm used in this paper helps the student network to better learn the features of the target to be detected and effectively improves the detection accuracy of the model without increasing its complexity.
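The accuracy differences quoted in this subsection can be cross-checked against the mAP@0.5 values reported in the Conclusions (97.33% for YOLOv5n(KD) and 94.83% for YOLOv5n-slim(KD)). The sketch assumes that the differences are percentage points rather than relative changes; under that assumption, both distilled models should imply the same baseline value for each reference network. The baseline values produced here are derived, not quoted from Table 6.

```python
# mAP@0.5 of the distilled models, as reported in the Conclusions.
KD_MAP = {"YOLOv5n(KD)": 97.33, "YOLOv5n-slim(KD)": 94.83}

# Differences (YOLOv5n(KD), YOLOv5n-slim(KD)) relative to each baseline,
# assumed to be percentage points.
GAINS = {
    "YOLOv3-tiny": (0.59, -1.91),
    "YOLOv4-tiny": (13.03, 10.53),
    "YOLOv7-tiny": (11.84, 9.34),
}


def implied_baselines(gains):
    """Baseline mAP@0.5 implied by each distilled model, per reference network."""
    out = {}
    for name, (gain_n, gain_slim) in gains.items():
        from_n = round(KD_MAP["YOLOv5n(KD)"] - gain_n, 2)
        from_slim = round(KD_MAP["YOLOv5n-slim(KD)"] - gain_slim, 2)
        out[name] = (from_n, from_slim)
    return out


baselines = implied_baselines(GAINS)
consistent = all(a == b for a, b in baselines.values())
```

For all three reference networks the two implied baselines coincide, which supports reading the quoted differences as percentage points.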
As shown in Table 7, we also compared our method with those of other studies in the same area. Although the experimental environments and equipment differed, our method achieved a detection accuracy similar to that of the other algorithms on devices of broadly similar performance, and it achieved a significant improvement in detection speed.
Finally, we compared all the algorithms mentioned in this paper; the results are shown in Figure 12.
4. Conclusions and Future Work
In this paper, we proposed a new detection method based on knowledge distillation to address the difficulty of obtaining defect samples for damage detection on mineral-transportation conveyor belts. Firstly, a DCGAN was used to generate conveyor belt scratch samples, and a data enhancement method combining GAN and copy–pasting strategies was proposed to effectively solve the problem of having too few samples to train the neural network model. In addition, to meet the requirements for model accuracy and speed in production environments, we pruned YOLOv5n, investigated the model performance at different pruning ratios and generated a smaller mini-network, YOLOv5n-slim. A knowledge distillation method based on fine-grained feature simulation was then applied to both the lightweight network YOLOv5n and the mini-network YOLOv5n-slim. For the task of detecting conveyor belt damage, the mAP@0.5 after knowledge distillation reached 97.33% for the lightweight network YOLOv5n and 94.83% for the mini-network YOLOv5n-slim, both of which meet the requirements for detection accuracy and speed in industrial applications.
Compared with other research, our study introduced model pruning and knowledge distillation to the task of conveyor-belt-damage detection. The model-pruning and knowledge distillation method designed in this paper significantly reduces the size of the model and improves its detection accuracy without increasing its complexity, enabling it to approach the detection accuracy of medium and large models with a smaller size and a faster inference speed. This allows our detection model to be deployed on lower-performance devices.
In future research, we will continue to work on making the model more lightweight and improving its detection accuracy. Additionally, we will study the influence of environmental factors, such as light and dust, on the detection results.