Article

Improved Lightweight YOLOv4 Foreign Object Detection Method for Conveyor Belts Combined with CBAM

School of Mechanical and Equipment Engineering, Hebei University of Engineering, Handan 056038, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(14), 8465; https://doi.org/10.3390/app13148465
Submission received: 28 June 2023 / Revised: 16 July 2023 / Accepted: 19 July 2023 / Published: 21 July 2023

Abstract

During the operation of a belt conveyor, foreign objects such as large gangue and anchor rods may be mixed into the conveyor belt, resulting in tears and fractures that affect transportation efficiency and production safety. In this paper, we propose a lightweight target detection algorithm, GhostNet-CBAM-YOLOv4, to address the difficulty of detecting foreign objects moving at high speed on an underground conveyor belt. The Kmeans++ clustering method was used to preprocess the data set and obtain anchor boxes suited to the foreign object sizes. The GhostNet lightweight module replaced the backbone network, reducing the model's parameters. The CBAM attention module was introduced to enhance feature extraction in the complex environment under the mine. Depth separable convolution was used to simplify the model structure and further reduce the number of parameters and calculations. The detection accuracy of the improved method on the foreign object data set reached 99.32%, and the detection rate reached 54.7 FPS, which were 6.83% and 42.1% higher than those of the original YOLOv4 model, respectively. The improved method also performed better than the original model on two other datasets and could effectively avoid misdetection and omission. In comparison experiments with similar methods, our proposed method likewise demonstrated good performance, verifying its effectiveness.

1. Introduction

China has the highest energy consumption and the largest coal consumption in the world. As the primary energy source in China, coal has always dominated the country's energy production and consumption structure, so ensuring the efficiency and safety of coal production and transportation is vital to the stability of energy development. According to statistics, China's coal output reached 4.07 billion tons in 2021, an increase of 4.7% over the same period of the previous year. Total energy consumption reached 5.24 billion tons of standard coal, an increase of 5.2% year on year, and coal accounted for 56.0% of national energy consumption [1]. The coal industry thus not only plays an irreplaceable role in promoting national economic development but also plays an essential role in improving people's quality of life and living standards [2].
The belt conveyor is the leading equipment for underground coal mine transportation; it offers a large load, low economic cost, and long-distance transport and undertakes important tasks in coal transportation. The smooth operation of belt conveyors and their safe production are critical conditions for coal transportation. The conveyor belt is the core and weakest part of the belt conveyor, and its maintenance cost accounts for about 40% of the total cost. However, in the complex environment of coal development, some uncontrollable problems can bring potential safety hazards to the belt conveyor during transportation, such as cracks and jams in the transmission joints or the tearing of the conveyor belt [3]. The intrusion of foreign objects (gangue, anchors, angle iron, logs) reduces the production efficiency of coal mines and the transportation efficiency of belt conveyors, causing substantial economic losses to mine production. In severe cases, it may endanger the lives and safety of construction workers and cause serious coal mining accidents. Therefore, the timely, real-time detection and handling of non-coal foreign objects on the belt conveyor under the mine can minimize the damage caused by foreign objects to the conveyor and improve the utilization rate of coal, which is of great significance to the safe and stable production of coal mines and the improvement of production efficiency.
The manual detection method and the X-ray method [4] are the traditional methods for foreign object detection. According to research, small coal mines still sort foreign materials manually; workers work long hours in a dark and humid environment with low productivity, which wastes human resources and makes it difficult to protect their health. X-ray detection relies on the difference in energy absorption coefficients between the elements in foreign and non-foreign objects, but applying the radiation method to foreign object detection is expensive and can be hazardous to workers' health. In recent years, with the rapid development of machine vision technology, image recognition has been applied to foreign object detection, replacing the traditional methods. Early image recognition techniques used grayscale features [5,6,7] to analyze texture information and support vector machines for classification. All these methods require hand-specified features or complex logical inference and are affected by the external environment, with poor generalization and robustness.
With the widespread use of convolutional neural networks and improved computer hardware capabilities, object detection algorithms based on deep learning have made significant progress. Current target detection algorithms can be divided into two categories. The first is the two-stage target detection algorithm, of which the representative algorithm is Faster-RCNN; it detects targets by extracting candidate regions in the image and then classifying and regressing each candidate region. The Faster-RCNN model has high detection accuracy; however, its detection speed is slow, making real-time detection difficult. The pioneering one-stage object detection algorithm [8] is YOLOv1 [9], which omits the extraction of candidate boxes and directly divides the input image into regions. Commonly used one-stage target detection algorithms include SSD [10], YOLOv2 [11], YOLOv3 [12], YOLOv4, YOLOv5 [13], YOLOv6 [14], YOLOv7 [15], and DETR [16]. DETR (DEtection TRansformer) is a Transformer-based end-to-end target detection algorithm proposed in 2020. It employs a Transformer encoder-decoder architecture that greatly simplifies the detection pipeline; its performance on the COCO dataset is comparable to that of Faster-RCNN, and its structure is simpler. However, DETR's training time is extremely long, its convergence is slow, and it requires high-end training equipment. Its recognition of small target objects in high-resolution photos is poor, and its detection speed does not meet real-time requirements. Therefore, DETR is not suitable for conveyor belt foreign object detection.
Given the defects of these image recognition and detection methods, domestic and foreign scholars have applied deep learning methods to detect foreign objects. To achieve multi-scale fusion, Wu Shoupeng et al. [17] improved the feature pyramid in the Faster-RCNN model to identify foreign objects on the conveyor belt, with good recognition results. Lv et al. [18] preprocessed the foreign object images on the conveyor belt and used an improved Faster-RCNN [19,20] model to recognize foreign objects on the conveyor belt; this model had high accuracy, but its detection speed was slow, making it challenging to meet the conditions of real-time foreign object detection on a conveyor belt in a complex environment. Du et al. [21] replaced the backbone network of YOLOv3, optimized the loss function, and added a bidirectional feature pyramid to improve the model's foreign object detection accuracy. Ren et al. [22] designed a Fast-YOLOv3 algorithm with a deconvolution network and a Stipic data enhancement method, which improved detection speed and accuracy. Du et al. [23] used a lightweight SSD network incorporating a self-attention mechanism to achieve foreign object detection; its detection accuracy and speed were better than the Faster-RCNN and YOLOv3 models. Nevertheless, the above methods struggle to balance detection accuracy and detection speed for foreign object detection under working conditions. Shuai Hao [24] proposed a YOLOv5 target detection algorithm incorporating the CBAM [25,26,27,28] module to solve the inaccurate detection of foreign objects on the conveyor belt caused by high-speed motion and interference from coal dust and light. Its detection accuracy was improved relative to YOLOv5; however, its detection speed struggled to accomplish real-time detection, and the problem of small target object detection was ignored.
In industry, it is a challenge to keep up with the algorithms as they iterate, and low-cost training and deployment devices have to be considered in industrial applications. In the complex environment of conveyor belt foreign object detection, the detection method must be both highly accurate and fast. YOLOv4 is a well-balanced one-stage target detection algorithm with good accuracy and speed, characterized by low complexity, few parameters, and modest training platform requirements. In order to achieve real-time detection, we combined several modules to reduce the parameter count while maintaining detection accuracy. Compared with the latest YOLOv7 algorithm, the precision of our method was on par with YOLOv7, its detection speed was better, and its parameters and model complexity were far smaller. This enables our solution to be deployed on devices with limited resources, such as embedded systems. The improved method in this paper could also be applied to other algorithms for the real-time detection of foreign objects.
Our dataset contained seven kinds of foreign objects: gangue, screw, nut, angle iron, wooden stick, U-shaped iron, and quadruped. The sizes and shapes of the foreign objects varied, and many were small targets. In a complex background environment, foreign objects are easily confused with the background; in particular, the shape, size, and color of gangue are similar to those of coal, which makes it difficult to detect. During transportation, detecting foreign objects moving at high speed on an underground conveyor belt is challenging: detection accuracy is low, and detection is slow. We therefore need to suppress irrelevant background information, enhance the feature extraction ability for foreign objects, and adopt a lightweight detection model to improve detection accuracy and speed.
The conveyor belt operates in a complex environment; therefore, we had to design a lightweight, high-precision, real-time foreign object detection model. By combining YOLOv4 [29,30,31,32,33] and GhostNet, we designed a lightweight YOLOv4 conveyor belt foreign object detection method incorporating CBAM. While preserving the high accuracy of YOLOv4, we used GhostNet to reduce the calculation and parameters of the model and improve the detection speed. In a complex background environment, foreign objects are easily confused with the background, and it is difficult to extract their feature information; we introduced the CBAM attention module to suppress irrelevant backgrounds and improve feature extraction in complex environments. The foreign objects on the conveyor belt differ significantly in size, and small target objects can easily be missed; we used the Kmeans++ clustering algorithm to optimize the dataset and obtain anchor boxes suited to the foreign object target sizes, which yielded better training results. In addition, our solution used GhostNet and depth separable convolution to reduce the computational burden of YOLOv4, which enables it to be deployed on devices with limited resources, such as embedded systems.
The following points summarize the main contributions of the research and provide an efficient and accurate real-time foreign object detection method:
(1)
Based on the YOLOv4 model, we proposed a lightweight and high-precision foreign object detection model;
(2)
Under the condition of ensuring the high accuracy of YOLOv4, we used GhostNet to reduce the calculation and parameters of the model and improve the detection speed;
(3)
We integrated the CBAM module into the YOLOv4 network to improve the feature extraction capability and detection accuracy in complex environments;
(4)
We used the Kmeans++ clustering algorithm to optimize the dataset and obtain anchor boxes suited to the foreign object target sizes;
(5)
We introduced depth separable convolution instead of the 3 × 3 ordinary convolution in the model, which reduced the model's parameters and improved the detection speed.
The structure of this paper consists of five main parts. The first section expounds the critical problems and challenges in foreign object detection, especially on underground conveyor belts, and the significance of solving them. The second section introduces the principles of YOLOv4 and GhostNet underlying the proposed model. The third section presents the improvements to each module in detail, including the Convolutional Block Attention Module, depth separable convolution, and the anchor frame optimization method. Section 4 presents the detection performance of each improved module, evaluates and analyzes the experimental results, and compares them with similar methods. Finally, Section 5 concludes the paper and outlines future research directions.

2. Relevant Network Models

2.1. YOLOv4 Algorithm

YOLO, proposed in 2015, was the first target detection system based on a single neural network. YOLOv1 is fast in detection but not accurate enough in localization and has a low recall rate. YOLOv2 improves on YOLOv1, increasing both accuracy and speed. YOLOv3 uses the residual model and FPN architecture to deepen the network structure and realize multi-scale detection. YOLOv4 applies many optimization strategies to the original YOLO architecture, with different degrees of optimization in data processing, the backbone, network training, the activation function, and the loss function. YOLOv4 and YOLOv5 are worth mentioning: besides improving the backbone, they also introduced data enhancement innovations such as Mosaic and MixUp. YOLOX builds on YOLOv3, adding three decoupled heads to the output layer and converting the anchor-based detector into an anchor-free one. Recently, the YOLOv4 team put forward its follow-up work, YOLOv7, which is superior to most existing target detectors in speed and accuracy thanks to dynamic label allocation and re-parameterization of the model structure. We finally chose the YOLOv4 algorithm as the basis for improvement.
The YOLOv4 algorithm is a one-stage target detection algorithm that performs classification and regression directly without generating candidate regions; its network structure is shown in Figure 1. YOLOv4 consists of four parts: the input, the backbone network, the neck network, and the prediction head. The input was processed by mosaic data enhancement. The CSPDarknet53 backbone network was selected to extract features, generating three effective feature layers for prediction and regression analysis; the CSPDarknet53 structure reduces the computational bottleneck and memory cost while maintaining model accuracy and improving the CNN's learning ability during training. The feature maps extracted by the backbone were input to the neck network, which integrates the SPP module and the FPN [34] + PAN feature pyramid. Prediction was then performed on the three effective feature layers to generate the final prediction boxes.

2.2. GhostNet Module

Deep neural network models have been widely and successfully used in machine vision tasks such as image classification and object detection. However, storing and running neural network models on embedded devices remains a great challenge due to limited storage space and power budgets; an efficient, high-performance lightweight neural network is the key to solving this problem. The core of lightweight networks is to reduce network size and increase speed while maintaining as much accuracy as possible. The SqueezeNet [35] family is a relatively early and classic lightweight network; SqueezeNet uses the Fire module for parameter compression. MobileNetV1 [36] builds lightweight networks using depthwise separable convolutions. MobileNetV2 [37] proposes an inverted residual with a linear bottleneck unit, which improves overall network accuracy and speed although the number of layers increases. MobileNetV3 [38] combines AutoML techniques with manual fine-tuning for lighter-weight network construction. Based on the analysis of the experimental results, we finally chose the GhostNet lightweight network.
The GhostNet module [39,40,41,42] structure is shown in Figure 2. It first obtains part of the output, the base feature layers, using a standard convolution; it then generates the Ghost feature layers from them using cheap linear transformations, and finally stacks the base feature layers with the Ghost feature layers as the output. This structure significantly reduces the number of parameters and improves computational speed.
Suppose the input feature layer size is $h_1 \times w_1 \times n$ and the output feature layer size is $h_2 \times w_2 \times m$; the ordinary convolution kernel size is $k \times k$; the computation of ordinary convolution is $L_1$ and that of the Ghost module is $L_2$. The kernel size of the first (primary) convolution in the Ghost module is $k_1 \times k_1$, the kernel size of the second (cheap) operation is $k_2 \times k_2$, and each base feature layer is expanded into $s$ feature layers in total (one base layer plus $s - 1$ redundant layers).

The computational effort of ordinary convolution $L_1$ would be

$$L_1 = h_2 \times w_2 \times m \times n \times k \times k \tag{1}$$

The computational volume of the Ghost feature layer convolution $L_2$ would be

$$L_2 = h_2 \times w_2 \times \frac{m}{s} \times n \times k_1 \times k_1 + (s-1) \times h_2 \times w_2 \times \frac{m}{s} \times k_2 \times k_2 \tag{2}$$

From Equations (1) and (2), taking $k_1 = k_2 = k$ and noting that $n \gg s$, the ratio of the computational quantities can be deduced:

$$\frac{L_1}{L_2} = \frac{h_2 \times w_2 \times m \times n \times k \times k}{h_2 \times w_2 \times \frac{m}{s} \times n \times k_1 \times k_1 + (s-1) \times h_2 \times w_2 \times \frac{m}{s} \times k_2 \times k_2} = \frac{s \times n \times k^2}{n \times k_1^2 + (s-1) \times k_2^2} \approx s \tag{3}$$
From Equation (3), the computation when using the GhostNet module for feature extraction is about $1/s$ of that of conventional convolution, which greatly improves the operation speed and enhances the detection speed of the whole model.
Ghost modules are stacked into bottleneck structures (GhostBottlenecks), which come in two types according to stride (stride 1 and stride 2), as shown in Figure 3. There are two Ghost modules in the bottleneck structure: the first Ghost module increases the number of channels, and the second decreases the number of channels to match the channels of the shortcut branch. The stride-1 structure does not compress the height and width of the incoming feature layer but only deepens the network. When the stride is 2, a depthwise convolutional layer with stride 2 is added between the two Ghost modules to adjust the feature map and match the input and output.
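To make the Ghost module concrete, a minimal PyTorch sketch is given below. This is an illustration rather than the authors' released code; the kernel sizes ($k_1 = 1$, $k_2 = 3$) and the ratio $s = 2$ are illustrative defaults consistent with the description above, not values stated in this paper.

```python
import math
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Minimal Ghost module: a primary convolution produces m/s base feature
    maps; cheap depthwise convolutions generate the remaining (s-1)*m/s
    "ghost" maps, which are concatenated with the base maps."""
    def __init__(self, in_ch, out_ch, s=2, k1=1, k2=3):
        super().__init__()
        base_ch = math.ceil(out_ch / s)               # m/s base feature maps
        cheap_ch = base_ch * (s - 1)                  # (s-1)*m/s ghost maps
        self.primary = nn.Sequential(                 # ordinary convolution
            nn.Conv2d(in_ch, base_ch, k1, padding=k1 // 2, bias=False),
            nn.BatchNorm2d(base_ch),
            nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(                   # cheap linear operations
            nn.Conv2d(base_ch, cheap_ch, k2, padding=k2 // 2,
                      groups=base_ch, bias=False),    # depthwise convolution
            nn.BatchNorm2d(cheap_ch),
            nn.ReLU(inplace=True))
        self.out_ch = out_ch

    def forward(self, x):
        base = self.primary(x)
        ghost = self.cheap(base)
        return torch.cat([base, ghost], dim=1)[:, :self.out_ch]

x = torch.randn(1, 16, 64, 64)
print(GhostModule(16, 32)(x).shape)  # torch.Size([1, 32, 64, 64])
```

Because only $m/s$ of the output channels come from the full convolution, the computation follows Equation (2), and the speedup over ordinary convolution approaches $s$, as in Equation (3).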

3. Foreign Object Detection Method Based on GhostNet-CBAM-YOLOv4 for Conveyor Belts

3.1. GhostNet-CBAM-YOLOv4 Model

The GhostNet-CBAM-YOLOv4 model is shown in Figure 4. This model used GhostNet instead of CSPDarknet53 as the backbone network of YOLOv4. The h-swish activation function was used in the Ghost module to improve the stability of foreign object recognition accuracy. Five feature layers of the Ghost module were retained; the feature layers were obtained by adjusting the stacking of GhostBottlenecks and convolution blocks. In order to improve the detection of foreign objects of various sizes on the mine conveyor belt, the CBAM attention module was added after the three effective feature layers to enhance feature extraction in the complex environment under the mine. The feature maps processed by the attention mechanism were fused using PANet and input into the YOLO head for prediction to obtain the results.

3.2. CBAM Attention Mechanism

In order to cope with the complex and harsh environment under the mine and improve the feature extraction capability, the spatial and channel attention mechanisms were inserted into the three effective feature layers of the improved YOLOv4 model’s backbone output, and its network structure is shown in Figure 5.
The CBAM attention model consists of two parts: channel and spatial. The channel attention module applies global average pooling and global maximum pooling to the incoming feature map F. The two pooled vectors are passed through a shared multilayer perceptron, summed, and activated by the sigmoid function to generate the final features required for channel attention.
The channel attention mechanism could be expressed as,
$$M_C(F) = \sigma\big(MLP(AvgPool(F)) + MLP(MaxPool(F))\big) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big) \tag{4}$$

In this equation, $M_C$ denotes the channel attention in the model, $\sigma$ represents the sigmoid activation function, $MLP$ represents the shared multilayer perceptron, $W_0$ and $W_1$ represent its weight matrices, $F^c_{avg}$ and $F^c_{max}$ represent the global average pooling vector and the global maximum pooling vector, and $AvgPool$ and $MaxPool$ represent average pooling and maximum pooling of the feature map in the module.
The feature map F output from the channel attention module was used as the input of this module. The spatial attention module performed channel-wise average pooling and maximum pooling on the input. The two generated feature layers were concatenated along the channel dimension, reduced to a single channel by a convolution, and activated by the sigmoid function. Finally, the features required by the spatial attention module were generated.
The spatial attention mechanism could be expressed as,
$$M_S(F) = \sigma\big(f^{7\times7}([AvgPool(F); MaxPool(F)])\big) = \sigma\big(f^{7\times7}([F^S_{avg}; F^S_{max}])\big) \tag{5}$$

$M_S$ denotes the spatial attention in the convolutional block attention model, and $f^{7\times7}$ represents a $7 \times 7$ convolution operation.
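A minimal PyTorch sketch of CBAM consistent with Equations (4) and (5) follows; the channel reduction ratio of 16 is an assumption taken from the original CBAM design, not a value stated in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Channel attention: a shared MLP over the global average- and
    max-pooled descriptors, summed and passed through a sigmoid."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),  # W0
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),  # W1
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Spatial attention: channel-wise mean and max maps concatenated
    and fused by a 7 x 7 convolution."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """CBAM applies channel attention first, then spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)

feat = torch.randn(1, 128, 52, 52)   # one effective feature layer
print(CBAM(128)(feat).shape)          # torch.Size([1, 128, 52, 52])
```

In the proposed model, a module of this kind is attached to each of the three effective feature layers before PANet fusion, as described in Section 3.1.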

3.3. Depth Separable Convolution

While maintaining the detection accuracy of the model, depth separable convolution [43] was introduced to replace the 3 × 3 ordinary convolutions in the backbone network, which reduced the number of parameters and the computation of the network model and improved the detection speed. The depth separable convolution structure is shown in Figure 6.
If we input a feature map of size $H \times W \times N$ and apply $P$ convolution kernels of size $K \times K$ with a stride of one, the number of parameters of standard convolution $W_{SC}$ would be

$$W_{SC} = N \times P \times K \times K \tag{6}$$

where $P$ represents the number of convolution kernels, $N$ represents the number of input feature maps, and $K$ is the size of the convolution kernel.

The computational volume $Q_{SC}$ would be expressed as

$$Q_{SC} = N \times P \times K \times K \times H \times W \tag{7}$$
Depth separable convolution is conducted in two steps. First, in the depthwise convolution, each channel of the input feature layer is convolved only by the kernel of that channel, so the number of channels in the output tensor remains unchanged. Then, channel-adjusted feature fusion is performed by point-by-point (1 × 1) convolution.
The computational effort of the depth separable convolution $Q_{DSC}$ would be

$$Q_{DSC} = N \times K \times K \times H \times W + H \times W \times N \times P \tag{8}$$

The ratio of the two convolutions can be derived from Equations (7) and (8):

$$\frac{Q_{DSC}}{Q_{SC}} = \frac{N \times K \times K \times H \times W + H \times W \times N \times P}{N \times P \times K \times K \times H \times W} = \frac{1}{P} + \frac{1}{K^2} \tag{9}$$

From the analysis of Equation (9), the computation of depth separable convolution is $\frac{1}{P} + \frac{1}{K^2}$ times that of ordinary convolution. The larger the number of convolution kernels $P$ and the kernel size $K$, the smaller the relative computation, and the faster the detection speed of the model.
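The following sketch, an illustration under the definitions above rather than the exact layer used in the model, shows the two-step factorization and the resulting parameter saving:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) KxK
    convolution followed by a 1x1 pointwise convolution, replacing a
    standard KxK convolution at roughly (1/P + 1/K^2) of its cost."""
    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, stride=stride,
                                   padding=k // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Parameter comparison against a standard 3x3 convolution (N = P = 256):
std = nn.Conv2d(256, 256, 3, padding=1, bias=False)
dsc = DepthwiseSeparableConv(256, 256)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(std), count(dsc))  # ~590k vs ~68k parameters
```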

3.4. Anchor Frame Optimization Method

Before training, the GhostNet-CBAM-YOLOv4 model used the Kmeans++ [44,45,46] algorithm to preprocess the dataset and redesign the sizes of the anchor frames to suit the foreign object target sizes. The procedure is as follows (a minimal code sketch is given after the list):
1. Input the width-height set T of the foreign objects and coal on the conveyor belt in the dataset and the number of cluster centers k.
2. Randomly select one point from set T as the initial cluster center O1.
3. Calculate the shortest distance D(x) between each sample and the existing cluster centers, calculate its probability P(x) of becoming the next cluster center, and select the new cluster center by the roulette wheel method; repeat until k cluster centers are found:

$$P(x) = \frac{D(x)^2}{\sum_{x \in T} D(x)^2} \tag{10}$$

4. Calculate the distance D(x) from every point in set T to each of the k cluster centers and assign each point to the category of the cluster center with the smallest distance.
5. For the clustering results, recalculate the center $C_i$ of each cluster category:

$$C_i = \frac{1}{|C_i|} \sum_{x \in C_i} x \tag{11}$$

where $x$ is a sample of cluster $C_i$, and $C_i$ is the center of that clustering category.

6. Repeat steps (4) and (5), and output the k cluster centers when the center $C_i$ of each cluster category no longer changes.
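A minimal NumPy sketch of this clustering procedure is shown below. It uses squared Euclidean distance on width-height pairs as in the listed steps (YOLO implementations often use a 1 − IoU distance instead), and the synthetic data at the end stands in for the real set T extracted from the label files.

```python
import numpy as np

def kmeanspp_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchor boxes using Kmeans++
    seeding followed by standard assignment/update iterations."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.integers(len(boxes))][None]          # step 2: first center
    while len(centers) < k:                                  # step 3: roulette wheel
        d2 = ((boxes[:, None] - centers[None]) ** 2).sum(-1).min(1)
        pick = rng.choice(len(boxes), p=d2 / d2.sum())
        centers = np.vstack([centers, boxes[pick]])
    for _ in range(iters):                                   # steps 4-6
        assign = ((boxes[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        new = np.array([boxes[assign == i].mean(0) if np.any(assign == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):                        # centers stabilized
            break
        centers = new
    return centers[np.argsort(centers.prod(1))]              # sort by box area

# Synthetic width-height pairs standing in for the real set T from the labels:
wh = np.abs(np.random.default_rng(1).normal(40, 15, size=(500, 2)))
print(kmeanspp_anchors(wh).round().astype(int))
```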
According to Figure 7, the anchor frame sizes of the GhostNet-CBAM-YOLOv4 model were optimized to obtain nine sets of a priori frames: (12, 15), (24, 39), (26, 77), (29, 50), (31, 28), (33, 35), (38, 53), (48, 61), (64, 38), where the intersection ratio between the selected prior frames and the targets to be identified reached 87.14%. The clustering centers of foreign objects of different sizes were evenly distributed, and the samples in each class were closely spaced around their cluster center, which effectively accelerated the convergence of the network and improved the accuracy.

4. Experiments

4.1. Data Acquisition and Processing

On 6 April 2022, we collected images on the coal conveyor belt of Hebei University of Engineering and built a foreign object data set. The images were captured by a HIKVISION MV-CA013-20GC industrial camera, which was fixed in place before shooting and set to manual mode so the angle and supplementary light source could be adjusted. This camera has 1.3 million pixels, a resolution of 1280 × 1024 at a high frame rate of 90 FPS, a global shutter, multiple exposure modes, and 1 Gbps bandwidth over a Gigabit Ethernet interface, allowing long-term operation in harsh environments. Unqualified photos, such as those with blurred backgrounds or excessively high or low light intensity, were removed, and 2031 qualified images were finally retained. The foreign object data set was divided in the ratio 8:1:1: 1626 images were randomly selected as the training set, 203 as the validation set, and 203 as the test set. To enhance the generalization ability of the model, the dataset covered seven kinds of non-coal foreign objects: gangue, screw, nut, angle iron, wooden stick, U-shaped iron, and quadruped. The regions and categories of the foreign objects were annotated using labeling software. The number of images for each non-coal foreign object is shown in Table 1.
In order to evaluate the proposed algorithm more thoroughly, we used the HIKVISION MV-CA013-20GC industrial camera to collect 800 images of coal transported at the Xiaotun Coal Mine in the Fengfeng Mining Area of Handan on 15 March 2023 and built a conveyor belt coal gangue data set. The gangue dataset was divided into training, validation, and test sets in the ratio 8:1:1. The amount of gangue in this dataset was much higher than the amount of coal, and the labeling software was used to annotate the coal regions and category information. Due to the complex underground environment, the high-speed movement of the conveyor belt, and the large amount of coal gangue, the detection task was extremely challenging.
This paper also used the VOC2007 [47] and VOC2012 [48] data sets to verify the proposed algorithm. The VOC2007 and VOC2012 data sets consist of twenty categories: airplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, and tv monitor. Following the PASCAL data set benchmark, we selected 1000 images from the train + val sets of VOC2007 and VOC2012 for training. Finally, the VOC2007 test split was used to verify the algorithm's performance.

4.2. Experimental Preparation

In object detection tasks, metrics such as precision, recall, average precision (AP), mean average precision (MAP), FPS, and the number of parameters are commonly used to evaluate the performance of algorithms. The calculation formula for each evaluation index is as follows:
$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$AP = \int_0^1 p(R)\,dR$$

$$MAP = \frac{1}{n} \sum AP$$

$$parameters = K^2 \times C_{in} \times C_{out}$$
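As a reference for how these indices can be computed, the following sketch implements precision, recall, and AP; the all-point interpolation used here is the common VOC-style choice and is an assumption, since the paper does not state its interpolation scheme.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision P = TP/(TP+FP) and recall R = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recall, precision):
    """AP as the area under the precision-recall curve, using the
    all-point interpolation common in VOC-style evaluation."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]   # monotone precision envelope
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# MAP is the mean of the per-class APs:
per_class_ap = [average_precision(np.linspace(0, 1, 10),
                                  np.linspace(1.0, 0.5, 10))]
print(sum(per_class_ap) / len(per_class_ap))
```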
In order to verify the algorithm proposed in this paper, a relevant experimental study was conducted. The experimental platform configuration is shown in Table 2.
This model used the Pytorch framework. The hyperparameter batch size was set to 16, and the number of training iterations was 300: the first 50 iterations were trained with the backbone frozen at a learning rate of 0.001, and the next 250 iterations used a learning rate of 0.0001. The training process used the cosine annealing algorithm and label smoothing. After 300 iterations, the optimal weight file of the model was obtained.
The cosine annealing algorithm adjusts the learning rate during training. As the epochs increase, the learning rate first drops along a cosine curve and then rises abruptly, repeating this process. In this way, training can repeatedly escape from a local optimum and search for a new one. Because models settling at different local optima are highly diverse, this method can obtain better training results.
Label smoothing is a regularization method in machine learning, usually used for classification problems. Its purpose is to prevent the model from predicting labels overconfidently during training and thereby to improve generalization. Using label smoothing enhances the generalization ability of the model and produces more reliable predictions.
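The training configuration can be reproduced roughly as follows. This sketch uses current PyTorch APIs (the built-in label_smoothing argument postdates the PyTorch 1.1.0 listed in Table 2), and the model, data, and smoothing factor are placeholders rather than the paper's exact settings.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Placeholder classifier standing in for GhostNet-CBAM-YOLOv4; hyperparameters
# follow the text above (batch size 16, initial learning rate 1e-3).
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(),
                      nn.Linear(8 * 30 * 30, 7))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Cosine annealing with warm restarts: the learning rate decays along a cosine
# curve and then jumps back up, helping training escape local optima.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

# Label smoothing softens one-hot targets so the classifier never becomes
# overconfident (the 0.01 factor is an assumption, not the paper's value).
criterion = nn.CrossEntropyLoss(label_smoothing=0.01)

for epoch in range(30):
    inputs = torch.randn(16, 3, 32, 32)        # stand-in for a data loader
    targets = torch.randint(0, 7, (16,))
    loss = criterion(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                           # one scheduler step per epoch
```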

4.3. Analysis of Experimental Results

In order to verify the performance of this model, we used the YOLOv4 algorithm and this improved algorithm to train foreign object data sets, and the comparison results of the accuracy and loss rate are shown in Figure 8. Figure 8a,b shows the YOLOv4 algorithm on the left and the improved algorithm in this paper on the right.
From Figure 8a, it can be seen that the accuracy of the YOLOv4 algorithm rose to 0.8 only after one-fifth of the training process, and the final training accuracy stabilized at about 0.92, whereas the accuracy of the improved algorithm rose rapidly at the beginning of training and was maintained at about 0.98, indicating that our algorithm not only converged quickly but also had high detection accuracy. From Figure 8b, it can be seen that the false detection rate of our algorithm was much lower than that of the YOLOv4 algorithm after training and detection on two types of foreign objects, bolts and gangue, indicating that in complex environments our algorithm had a stronger feature extraction capability and higher detection accuracy than YOLOv4.
A series of ablation experiments were designed to test the improvement measures in this paper. Firstly, the original YOLOv4 network was used as the standard model and was tested by the Kmeans++ clustering method, adding the CBAM attention module and Ghost module, respectively. Finally, the three improvement strategies were added to the YOLOv4 model at the same time. The experiments performed all used the same dataset and were in the same environment, with all parameters kept consistent. The results are shown in Table 3.
From Table 3, we can see that the detection accuracy improved by 0.92% by using the Kmeans++ clustering method to obtain anchor frames suited to the foreign object target sizes. The GhostNet-CBAM-YOLOv4 network model was designed by replacing the backbone network of YOLOv4 with the Ghost module, transforming it into a lightweight network: the detection accuracy increased by 2.77%, the detection speed increased by 9.8 fps, and the weight file shrank to 44.3 MB. The CBAM attention module was added to enhance the feature extraction of targets in the complex environment under the mine, and the detection accuracy increased by 3.89%. We introduced depth separable convolution instead of the 3 × 3 ordinary convolution in the model; the detection accuracy improved by 0.17%, and the detection speed improved by 4.2 fps. GhostNet-CBAM-YOLOv4 achieved a detection accuracy of 99.13% and a detection speed of 53.4 fps, which can meet the requirements of foreign object detection under conveyor belt working conditions.
The Intersection over Union (IoU) indicates the degree of overlap between the predicted bounding box and the real bounding box; the larger the value, the better the detector performs. In the visualization results on the test set, the larger the value shown on a predicted frame, the higher its overlap with the real frame, which allows the quality of the bounding boxes generated by the proposed method to be evaluated accurately.
$$IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$
Our model was initially evaluated using CIoU (Complete IoU). The CIoU loss considers the overlapping area, the center distance, and the aspect ratio of the predicted frame and the real frame; it adds a detection frame scale loss and length-width losses on the basis of DIoU so that the prediction frame conforms more closely to the real frame. However, the aspect ratio term in CIoU is a relative value, so it cannot always yield more accurate results. The calculation formulas are as follows.
$$CIoU = IoU - \frac{\rho^2(b, b^{gt})}{c^2} - \alpha v$$

where $\rho^2(b, b^{gt})$ represents the squared Euclidean distance between the center points of the prediction box and the real box, and $c$ represents the diagonal distance of the smallest closure region that can contain both the prediction frame and the real frame.

$$\alpha = \frac{v}{(1 - IoU) + v}$$

$$v = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2$$

$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$
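A sketch of the CIoU loss following the formulas above is given below; the corner-coordinate box format and the epsilon terms are implementation assumptions.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss (L = 1 - IoU + rho^2/c^2 + alpha*v) for boxes given as
    (x1, y1, x2, y2) corner coordinates."""
    # IoU term: intersection over union of the two boxes
    iw = (torch.min(pred[..., 2], target[..., 2]) -
          torch.max(pred[..., 0], target[..., 0])).clamp(min=0)
    ih = (torch.min(pred[..., 3], target[..., 3]) -
          torch.max(pred[..., 1], target[..., 1])).clamp(min=0)
    wp, hp = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    wt, ht = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    inter = iw * ih
    iou = inter / (wp * hp + wt * ht - inter + eps)
    # rho^2: squared distance between the two box centers
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2 +
            (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
    # c^2: squared diagonal of the smallest box enclosing both boxes
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # v measures aspect-ratio consistency; alpha is its trade-off weight
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) -
                              torch.atan(wp / (hp + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

box_a = torch.tensor([[10.0, 10.0, 50.0, 60.0]])
box_b = torch.tensor([[12.0, 14.0, 48.0, 58.0]])
print(ciou_loss(box_a, box_b))  # small loss for well-aligned boxes
```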
MIoU (mean intersection over union) is the average of the intersection-over-union ratios of all classes in the dataset and is calculated as follows. In order to evaluate the performance of the bounding boxes accurately, we used MIoU as an evaluation index. Adding MIoU to the evaluation framework provided a more comprehensive and informative analysis of the performance of the proposed method: by measuring the overlap between the predicted bounding boxes and the ground truth, MIoU provides a more reliable indication of detection accuracy. The test results of our algorithm are shown in Table 4 and Figure 9.
$$MIoU = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}$$

where $i$ stands for the true class, $j$ stands for the predicted class, and $p_{ij}$ indicates the number of samples of class $i$ predicted as class $j$.
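For reference, MIoU can be computed from a confusion matrix as in the following sketch (the two-class matrix here is a toy example, not data from this paper):

```python
import numpy as np

def mean_iou(conf):
    """MIoU from a (k+1)x(k+1) confusion matrix where conf[i, j] counts
    class-i samples predicted as class j."""
    tp = np.diag(conf)
    denom = conf.sum(axis=1) + conf.sum(axis=0) - tp
    return np.mean(tp / np.maximum(denom, 1))

conf = np.array([[50, 3], [4, 43]])   # toy 2-class confusion matrix
print(round(mean_iou(conf), 3))        # 0.869
```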
It can be seen from Table 4 that under the same environment and experimental conditions, and using the MIoU evaluation standard, the average precision of our model improved by 0.19% and the detection speed increased by 1.3 fps. The precision of the model was 99.6%, and the recall was 98.7%. From Figure 9, it can be concluded that the average coincidence degree between the predicted frame and the real frame of the various foreign objects was above 90%, indicating that the predicted and real frames basically coincided. The prediction frames generated by our algorithm showed good performance under the MIoU evaluation index, which can cope with the various problems of complex environments and realize the real-time detection of foreign objects.
In order to verify the performance of the model and show the superiority of our algorithm in complex environments, we used YOLOv4 and this algorithm to train on the conveyor belt gangue dataset under the same environment and experimental conditions; the test results are shown in Table 5 below. The GhostNet-CBAM-YOLOv4 model improved the average accuracy by 6.1%, the precision and recall by 5.2% and 4.3%, respectively, and the detection speed by 6.1 fps compared with the original model. The test results of the conveyor belt gangue dataset in the model of this paper are shown in Figure 10.
In the actual production and transportation process, gangue is basically the same shape and size as coal blocks, which are difficult to distinguish in a dim environment. Comparing the characteristics of the two, we found that the color brightness of coal blocks is higher than that of coal gangue. From Figure 10, it can be concluded that the overlap between the prediction frame and the real frame of the coal blocks was more than 90%, and false detections were reduced. In this harsh and complex environment, our algorithm still showed a certain superiority, with high detection accuracy and speed, and could realize conveyor belt foreign object detection under working conditions.
In order to verify the performance of the model further, we trained YOLOv4 and our algorithm on the VOC2007 and VOC2012 data sets under the same environment and experimental conditions. The test results are shown in Table 6. The data in the table show that the average precision of the GhostNet-CBAM-YOLOv4 model was 8.5% higher than that of the original model, the precision and recall rates were 9.1% and 7.4% higher, respectively, and the detection speed was 9.2 fps higher. The test results of our algorithm and the YOLOv4 algorithm on VOC2007 are shown in Figure 11.

4.4. Experimental Comparison of Different Models

Under the same training environment and conditions, foreign object detection was performed on the same dataset using the YOLOv3, YOLOv4, Faster-RCNN, YOLOX, YOLOv7, and GhostNet-CBAM-YOLOv4 models. The experimental results are shown in Table 7, and Figure 12 shows the foreign object detection results on the belt for our algorithm. The MAP of the YOLOv3 model was lower than those of the other models, and its detection speed was slower. The YOLOv4 model demonstrated high detection accuracy and speed, and its performance was more balanced than the other network models, as shown in Figure 12a; however, it could not identify small target objects and missed detections. The Faster-RCNN model had high detection accuracy and could identify multiple foreign objects; however, its detection speed was slow, making it difficult to meet the requirement of real-time foreign object detection. Our GhostNet-CBAM-YOLOv4 model simplified the network structure and reduced the number of parameters: its detection speed was better than the YOLOv7 model, its accuracy was comparable, and its weight file was much smaller than YOLOv7's. Its average detection accuracy and detection speed were also slightly higher than those of YOLOX, which shows the superiority of the algorithm and allows our solution to be deployed on devices with limited resources, such as embedded systems. As shown in Figure 12d, even the second smallest target object could be effectively detected without omission, realizing real-time detection of foreign objects under working conditions.
The experimental results of the improved algorithm show that our model's average detection accuracy and detection speed outperformed Ren's Fast-YOLOv3 algorithm and Du's improved SSD algorithm, and that it is capable of recognizing foreign objects on high-speed conveyor belts. Compared with Hao's improved YOLOv5 model, our model has fewer parameters, a faster detection speed, and significantly better recognition of small target objects, with no obvious misdetection or omission. Our model represents clear progress in detection accuracy and speed while simplifying the parameters, and it can be deployed on equipment with limited resources, realizing the real-time detection of foreign objects on conveyor belts.
In order to verify the stability of the algorithm, we used the improved algorithm to train on the VOC2007 + 2012 and coal gangue data sets and obtained 46 sets of data, with the experimental conditions and environment unchanged. We conducted a Kappa analysis on the data, and the results are shown in Table 8.
In this paper, the average detection accuracy (MAP) was divided into three interval categories: A (82, 85], B (79, 82], and C (76, 79]. In the Kappa test, the statistic was 6.823 with p < 0.001, which was statistically significant. It can therefore be considered that there was a statistical correlation between the VOC results and the gangue results trained with the improved model. The Kappa value measuring the strength of agreement was 0.613, indicating good consistency.

5. Conclusions

Aiming at the difficulty of detecting high-speed moving foreign objects on conveyor belts, this paper designed a lightweight YOLOv4 foreign object detection model incorporating CBAM. The Kmeans++ anchor frame optimization method was used to preprocess the dataset and obtain anchor frames suited to the foreign object target sizes. The GhostNet-CBAM-YOLOv4 network model was designed to reduce the number of parameters. The CBAM attention module was embedded after the feature layers to enhance the network's feature extraction. Introducing depth separable convolution instead of the 3 × 3 ordinary convolution optimized the model and improved the detection speed. Comparison experiments with multiple models on three datasets in the same experimental environment showed that the improved GhostNet-CBAM-YOLOv4 model dramatically improved detection speed and accuracy, satisfied the conditions for detecting foreign objects in real time under working conditions, and performed well on small target detection.
In the future, complex environments and variable scenes could still affect target detection performance. Methods such as image enhancement should be used to improve the quality of dataset images, and transfer learning should be applied when moving to datasets from different scenes, to improve target detection performance in complex scenes.

Author Contributions

Conceptualization, J.L.; Methodology, H.Q.; Software, H.Q. and J.G.; Validation, H.Q.; Formal analysis, H.Q.; Resources, J.L. and L.Y.; Writing – original draft, H.Q.; Writing – review & editing, L.Y.; Supervision, L.Y.; Project administration, J.L. and J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hebei Natural Science Foundation (grant number E2019402436).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

The authors would like to express heartfelt thanks to the Key Laboratory of Intelligent Industrial Equipment Technology of Hebei Province (Hebei University of Engineering).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Zhang, C.; Wang, P.; Wang, E.; Chen, D.; Li, C. Characteristics of coal resources in China and statistical analysis and preventive measures for coal mine accidents. Int. J. Coal Sci. Technol. 2023, 10, 22.
2. Deng, Y.; Jiang, W.; Wang, Z. Economic resilience assessment and policy interaction of coal resource oriented cities for the low carbon economy based on AI. Resour. Policy 2023, 82, 103522.
3. Liu, B. Image detection and fault recognition method for longitudinal tearing of mining conveyor belt. Coal Min. Mach. 2018, 39, 144–146.
4. Khasawneh, N.; Fraiwan, M.; Fraiwan, L.; Khassawneh, B.; Ibnian, A. Detection of COVID-19 from Chest X-ray Images Using Deep Convolutional Neural Networks. Sensors 2021, 21, 5940.
5. Wu, K.X.; Song, J. Automatic Recognition of Coal and Gangue Based on Gray Level Co-occurrence Matrix. Coal Eng. 2016, 48, 98–101.
6. Tan, C.C.; Yang, J.M. Research on the Extraction of Gray Information and Texture Features of Coal and Gangue Images. Ind. Mine Autom. 2017, 43, 27–31.
7. Zhang, H.; Jin, X.; Wu, Q.J.; Wang, Y.; He, Z.; Yang, Y. An Improved Gaussian Mixture Model for Coal Gangue Video Detection. J. Cent. South Univ. (Sci. Technol.) 2018, 49, 118–123.
8. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276.
9. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE Computer Society Press: Los Alamitos, CA, USA, 2016; pp. 779–788.
10. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
11. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
12. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
13. Wang, J.; Chen, Y.; Dong, Z.; Gao, M. Improved YOLOv5 network for real-time multi-scale traffic sign detection. Neural Comput. Appl. 2022, 35, 7853–7865.
14. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
15. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
16. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229.
17. Wu, S.; Ding, E.; Yu, X. Foreign body identification method of conveyor belt based on improved FPN. Coal Mine Saf. 2019, 50, 127–130.
18. Lv, Z. Research on Image Recognition of Foreign Body in Coal Mine Belt Transportation in Complex Environment. Master's Thesis, China University of Mining and Technology, Beijing, China, 2020.
19. Khasawneh, N.; Fraiwan, M.; Fraiwan, L. Detection of K-complexes in EEG waveform images using faster R-CNN and deep transfer learning. BMC Med. Inform. Decis. Mak. 2022, 22, 297.
20. Jiang, Q.; Jia, M.; Bi, L.; Zhuang, Z.; Gao, K. Development of a core feature identification application based on the Faster R-CNN algorithm. Eng. Appl. Artif. Intell. 2022, 115, 105200.
21. Du, J.; Chen, R.; Hao, L.; Shi, Z. Foreign Body Detection in Coal Mine Belt Conveyor. Ind. Mine Autom. 2021, 47, 77–83.
22. Ren, G.; Han, H.; Li, C.; Yin, Y. Detection of Foreign body in Coal Mine Belt Transportation based on Fast_YOLOv3 Algorithm. Ind. Mine Autom. 2021, 47, 128–133.
23. Du, J.; Shi, Z.; Hao, L.; Chen, R. Research on lightweight coal and gangue target detection method. Ind. Mine Autom. 2021, 47, 119–125.
24. Hao, S.; Zhang, X.; Ma, X.; Sun, S.; Wen, H.; Wang, J.; Bai, Q. Foreign Body Detection in Coal Mine Conveyor Belt Based on CBAM-YOLOV5. J. China Coal Soc. 2022, 47, 4147–4156.
25. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–19.
26. Li, Z.; Li, B.; Ni, H.; Ren, F.; Lv, S.; Kang, X. An Effective Surface Defect Classification Method Based on RepVGG with CBAM Attention Mechanism (RepVGG-CBAM) for Aluminum Profiles. Metals 2022, 12, 1809.
27. Wang, S.H.; Fernandes, S.L.; Zhu, Z.; Zhang, Y.D. AVNC: Attention-Based VGG-Style Network for COVID-19 Diagnosis by CBAM. IEEE Sens. J. 2022, 22, 17431–17438.
28. Al-Malla, M.A.; Jafar, A.; Ghneim, N. Image captioning model using attention and object features to mimic human image understanding. J. Big Data 2022, 9, 20.
29. Liu, D.; Gao, S. Aircraft Detection in Remote Sensing Imagery Based on Improved YOLOv4. J. Phys. Conf. Ser. 2022, 2260, 012063.
30. Hu, Y.; Liu, G.; Chen, Z.; Guo, J. Object Detection Algorithm for Wheeled Mobile Robot Based on an Improved YOLOv4. Appl. Sci. 2022, 12, 4769.
31. Juyal, A.; Sharma, S.; Matta, P. Deep learning methods for object detection in autonomous vehicles. In Proceedings of the 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 3–5 June 2021; pp. 751–755.
32. Omar, M.; Kumar, P. Detection of roads potholes using YOLOv4. In Proceedings of the 2020 International Conference on Information Science and Communications Technologies (ICISCT), Karachi, Pakistan, 8–9 February 2020; IEEE: Karachi, Pakistan, 2020; pp. 1–6.
33. Tian, M.; Li, X.; Kong, S.; Wu, L.; Yu, J. A modified YOLOv4 detection method for a vision-based underwater garbage cleaning robot. Front. Inf. Technol. Electron. Eng. 2022, 23, 1217–1228.
34. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
35. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
36. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
37. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
38. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
39. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
40. Zhang, C.; Zhou, J.; Tang, J.; Wu, F.; Cheng, H.; Wei, S. Deep unfolding for singular value decomposition compressed ghost imaging. Appl. Phys. B 2022, 128, 185.
41. Esposito, C.; Landrum, G.A.; Schneider, N.; Stiefl, N.; Riniker, S. GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning. J. Chem. Inf. Model. 2021, 61, 2623–2640.
42. Yang, X.; Yu, Z.; Jiang, P.; Xu, L.; Hu, J.; Wu, L.; Zou, B.; Zhang, Y.; Zhang, J. Deblurring Ghost Imaging Reconstruction Based on Underwater Dataset Generated by Few-Shot Learning. Sensors 2022, 22, 6161.
43. Gong, W.; Tian, J.; Liu, J. Underwater Object Classification Method Based on Depthwise Separable Convolution Feature Fusion in Sonar Images. Appl. Sci. 2022, 12, 3268.
44. Goicovich, I.; Olivares, P.; Román, C.; Vázquez, A.; Poupon, C.; Mangin, J.F.; Guevara, P.; Hernández, C. Fiber Clustering Acceleration with a Modified Kmeans++ Algorithm Using Data Parallelism. Front. Neuroinform. 2021, 15, 727859.
45. Ma, Y.; Cheng, W. Optimization and Parallelization of Fuzzy Clustering Algorithm Based on the Improved Kmeans++ Clustering. IOP Conf. Ser. Mater. Sci. Eng. 2020, 768, 072106.
46. Shahrezaei, M.H.; Tavoli, R. Parallelization of Kmeans++ using CUDA. arXiv 2019, arXiv:1908.02136.
47. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
48. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes Challenge: A Retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
Figure 1. YOLOv4 network structure.
Figure 2. GhostNet module network structure.
Figure 3. Structure of ghost bottlenecks with different step sizes.
Figure 4. GhostNet-CBAM-YOLOv4 structure.
Figure 5. CBAM attention module.
Figure 6. Depth separable convolution structure.
Figure 7. Anchor frame clustering results at K = 9.
Figure 8. Comparison of experimental results between the proposed algorithm and the YOLOv4 algorithm.
Figure 9. Foreign object detection results using MIoU.
Figure 10. Detection results on the gangue dataset.
Figure 11. Detection results of target objects on VOC2007.
Figure 12. The detection results of foreign objects on the belt.
Table 1. Number of images for the non-coal foreign object.

| Category | Screw | Waste | U-Tube | Crabstick | Nut | Iron Plate | Quadripod |
| Number | 1206 | 281 | 668 | 553 | 231 | 491 | 271 |
Table 2. Computer software and hardware configuration.

| Configuration Name | Version Parameters |
| Operating System | Ubuntu 18.04 LTS |
| GPU | NVIDIA GeForce GTX 3060 Ti |
| CPU | Intel Core [email protected] GHz × 6 CPUs |
| Anaconda | 3 |
| CUDA | 10.1.105 |
| Python | 3.8 |
| Pytorch | 1.1.0 |
Table 3. Ablation experimental results.

| Baseline Model | GhostNet | CBAM | Anchor Frame Optimization | Depth Separable Convolution | MAP (%) | Frequency (Frame·s−1) | Weights File Size (MB) |
| YOLOv4 | - | - | - | - | 92.97 | 38.5 | 244 |
| YOLOv4 | ✓ | - | - | - | 95.74 | 48.3 | 44.3 |
| YOLOv4 | ✓ | ✓ | - | - | 97.94 | 46.5 | 45.9 |
| YOLOv4 | - | ✓ | ✓ | - | 98.19 | 31.5 | 246 |
| YOLOv4 | ✓ | ✓ | ✓ | - | 98.86 | 49.2 | 42.8 |
| YOLOv4 | ✓ | ✓ | ✓ | ✓ | 99.13 | 53.4 | 35.8 |
Table 4. MIoU was used to evaluate the performance of the improved algorithm on the foreign object data set.

| Methods | Precision (%) | Recall (%) | MAP (%) | Frequency (Frame·s−1) |
| Algorithm of this paper | 99.6 | 98.7 | 99.32 | 54.7 |
Table 5. Performance comparison between the improved algorithm and YOLOv4 on the conveyor belt gangue dataset.

| Methods | Precision (%) | Recall (%) | MAP (%) | Frequency (Frame·s−1) |
| YOLOv4 | 70.2 | 68.9 | 72.4 | 29.1 |
| Algorithm of this paper | 75.4 | 73.2 | 78.5 | 35.2 |
Table 6. Performance comparison between the improved algorithm and YOLOv4 on the VOC data sets.

| Methods | Precision (%) | Recall (%) | MAP (%) | Frequency (Frame·s−1) |
| YOLOv4 | 79.8 | 73.1 | 76.8 | 37.1 |
| Algorithm of this paper | 88.9 | 80.5 | 85.3 | 46.3 |
Table 7. Comparison of the foreign matter detection of different models.

| Methods | Input Image Size | MAP (%) | Frequency (Frame·s−1) | Weights File Size (MB) |
| YOLOv3 | 416 × 416 | 88.31 | 25.3 | 117 |
| YOLOv4 | 416 × 416 | 92.97 | 38.5 | 242 |
| Faster RCNN | 416 × 416 | 90.83 | 12.3 | 460 |
| Algorithm of this paper | 416 × 416 | 99.32 | 54.7 | 35.8 |
| YOLOX | 416 × 416 | 99.21 | 52.9 | 36.7 |
| YOLOv7 | 416 × 416 | 99.89 | 50.2 | 142 |
Table 8. Kappa test results.

| VOC | Gangue: A | Gangue: B | Gangue: C | Total | Kappa | T | p |
| A | 15 | 4 | 2 | 21 | 0.613 | 6.823 | <0.001 |
| B | 4 | 15 | 2 | 21 | | | |
| C | 2 | 2 | 16 | 20 | | | |
| Total | 21 | 21 | 20 | 62 | | | |