Article

Coal Gangue Target Detection Based on Improved YOLOv5s

1 Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
2 Heilongjiang Provincial Key Laboratory of Optical 3D Measurement and Detection, Heilongjiang University of Science and Technology, Harbin 150022, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(20), 11220; https://doi.org/10.3390/app132011220
Submission received: 13 September 2023 / Revised: 5 October 2023 / Accepted: 11 October 2023 / Published: 12 October 2023

Abstract

Coal gangue sorting is a necessary step in coal mine production, and removing gangue is fundamental to producing clean coal; it is also an important way to reduce washing costs, improve the grade of finished coal and increase the economic efficiency of coal mining enterprises. To address the high similarity between coal and gangue and the difficulty of recognizing them dynamically, a coal gangue target detection method based on an improved YOLOv5s is proposed. Building on the YOLOv5s network, a decoupled head and the SimAM attention mechanism are introduced, and the CSP module in the neck of YOLOv5s is replaced with the VoV-GSCSP structure. The experimental results show that the proposed method improves the mAP value by 6.1% over YOLOv5s in the gangue target detection task while maintaining a high detection speed. The coal gangue classification precision reaches 99.7% when tested on 1479 images. Compared with the YOLOv5 series, YOLOv7 series, SSD and Faster R-CNN, the proposed method consistently yields higher precision and detection speed and meets the requirements of real-time detection. The experiments show that the proposed method can be applied in the coal gangue sorting industry for fast and high-precision identification of coal gangue.

1. Introduction

Gangue is a by-product of coal mining. Due to its low calorific value, gangue mixed into coal not only reduces the calorific value of the coal but is also a major source of pollution [1]. As the quantity of high-quality coal decreases, the quality of raw coal gradually declines and the gangue content in raw coal keeps rising. To solve the problems of extensive coal utilization, low energy efficiency and heavy pollution, intelligent and clean sorting of coal gangue is an important link in the mining process, and it is of great significance for improving the quality of raw coal and promoting the clean and efficient utilization of coal.
At present, the main methods of coal gangue sorting are jigging, heavy-medium separation, photoelectric detection, near-infrared identification, ray-based methods and image recognition [2,3,4]. Jigging and heavy-medium sorting waste water resources; coal-producing areas in arid regions cannot afford the water consumption of wet sorting, and the wastewater produced during sorting also causes serious pollution and high operating costs. Photoelectric detection improves the recognition precision of coal and gangue, but its slow speed impedes widespread adoption. Near-infrared recognition can capture more features, remove background interference and achieve higher recognition precision; its disadvantages are lower test sensitivity and high modeling difficulty. X-ray recognition offers high precision and easy operation; however, it is prone to misrecognition when the gangue thickness varies, which reduces precision. γ-ray recognition has strong anti-interference ability and high precision, but the radiation is hazardous to human health. The image recognition method features simple equipment, a high degree of automation, high recognition efficiency and strong portability. With the development of artificial intelligence in recent years, it is being integrated into more and more fields; image recognition, as an important application of artificial intelligence, has undergone rapid development. Combining machine learning and image recognition for coal gangue target detection can solve the problems of low and unstable recognition precision, and it is the future development trend of intelligent coal gangue sorting. Current research on coal gangue image recognition falls into two categories: traditional image-based methods and deep learning methods.
Image processing and machine learning have achieved certain results in coal gangue recognition; machine learning can classify categories based on the feature matrices extracted through image processing, which is of great significance to the field of coal gangue sorting. Scholars continue to propose and improve non-contact gangue selection methods. Zhao et al. exploited the different X-ray transmission imaging of coal and gangue to achieve coal gangue sorting [5]; although non-contact, the method is limited by radiation hazards. The grey-level co-occurrence matrix is a statistics-based texture feature extraction method that provides information about the direction, spacing and magnitude of changes in image grey levels. Yuan et al. applied this method to coal gangue identification; its shortcoming is that it does not fully capture local grey-level characteristics, so the results are less than ideal for local identification [6]. Numerous researchers have used image processing and machine learning to advance coal gangue sorting to a large extent [7,8,9,10,11,12,13], but these methods generally suffer from complex data preprocessing, single-dimensional detection, low recognition precision and poor generalization.
Deep learning has developed rapidly in recent years. Compared with other recognition methods, coal gangue sorting based on deep learning has the advantages of being non-contact and highly safe, with high data utilization and high recognition precision and speed, and it has become the main research direction in coal gangue target detection. Li used ResNet-50 and Soft-NMS to improve the detection and localization of coal and gangue by Faster R-CNN; however, the input image must pass through feature extraction and region proposal before the candidate regions are classified, so the detection speed is slow and cannot meet real-time requirements in practical settings [14]. Pu et al. used transfer learning to train a collected coal and gangue dataset; the precision on the validation set was only 82.5%, and the performance of the model needs to be improved [15]. Alfarzaeai et al. used thermal images as the standard input and established a CGR-CNN coal gangue recognition model; the training precision reached 99.36% and the validation precision reached 95.09% [16], but the result depends on the image preprocessing. Liu et al., building on the target detection network YOLOv4, used a Laplacian operator and Gaussian filtering to mitigate the effects of dust and impact on the images, improving recognition precision by 1–2% over the original network [17]. Cao et al. used transfer learning to improve the AlexNet feature extraction network and combined it with an RPN to obtain the classification information and pixel coordinates of coal and gangue; the detection precision was 90.17% [18]. Gao et al. used a U-shaped fully convolutional neural network to segment coal gangue and obtained its location and shape from the network's output probability map; the approach was effective only in specific environments and could not be used when the geological conditions of the coalfield varied greatly [19]. Lai et al. proposed an intelligent coal gangue separation method based on multispectral imaging and target detection, which detects coal gangue with an improved YOLOv4 model and accurately identifies coal and gangue [20]. Compared with traditional image recognition methods, deep learning can improve the recognition precision of coal gangue sorting.
This paper improves the network structure on the basis of the YOLOv5s deep learning network, introduces Slim-neck, the SimAM attention mechanism and decoupled head, and proposes the decoupled Head-GSConv-VoV-GSCSP-YOLOv5s-SimAM (DGSV-YOLOv5s-SA) network, with the following advantages:
(1) While maintaining a high detection speed in the field of coal gangue detection, its precision is significantly improved over the original network.
(2) It pays more attention to global features and increases the receptive field.
(3) It provides an important reference for the coal gangue sorting industry.
This paper is structured as follows: The introduction is given in Section 1. The pre-improvement deep learning network YOLOv5s and the improved model DGSV-YOLOv5s-SA proposed in this paper are presented in Section 2. The experimental platform, evaluation metrics, dataset and experimental results are presented in Section 3. Finally, the conclusion is given in Section 4.

2. Network Model Improvements

2.1. YOLOv5s Network Architecture

YOLOv5 is an object detection algorithm based on deep learning that achieves a good balance between speed and precision [21]. YOLOv5 adopts a single-stage detection framework, which transforms the target detection task into a regression problem and directly performs target localization and classification in a single network. Based on model size, YOLOv5 is divided into YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x, with YOLOv5s having a smaller model depth.
The network structure of YOLOv5s is shown in Figure 1; it consists of a Backbone network, a Neck section and a Head section. The Backbone network is mainly composed of Conv+BN+SiLU (CBS) and Cross-Stage Partial (CSP) modules. The CBS module comprises a convolutional layer, batch normalization and the SiLU activation function. The CSP module divides the feature map into two parts, allows the gradient flow to propagate through different paths, and then obtains richer fused gradient information through feature fusion. The Neck section obtains strong semantic and localization information from the feature maps. After features are extracted from the input image by the Backbone and Neck networks, the Head section outputs three feature maps of different sizes to predict targets of different sizes.
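To make the structure concrete, a minimal PyTorch sketch of the CBS and CSP building blocks is given below; the module names and arguments (CBS, CSP, c_in, c_out, n) are illustrative simplifications, not the exact implementation of the YOLOv5 code base.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU: the basic convolution block of YOLOv5s."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class CSP(nn.Module):
    """CSP-style block: the feature map is split into two paths so the gradient
    propagates along different routes, and the paths are fused by concatenation."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_mid = c_out // 2
        self.cv1 = CBS(c_in, c_mid)                                   # path processed by the bottlenecks
        self.cv2 = CBS(c_in, c_mid)                                   # shortcut path
        self.m = nn.Sequential(*[CBS(c_mid, c_mid, k=3) for _ in range(n)])
        self.cv3 = CBS(2 * c_mid, c_out)                              # fuse the two paths

    def forward(self, x):
        return self.cv3(torch.cat([self.m(self.cv1(x)), self.cv2(x)], dim=1))

# quick shape check
x = torch.randn(1, 64, 80, 80)
print(CSP(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```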

2.2. Decoupled Head

Decoupled head refers to the decomposition of the detection task into two parts, the classification task and the regression task, which are processed using different neural network structures [22].
Specifically, the classification of target detection mainly relies on the texture features of the target, while the regression task mainly relies on the edge features of the target. The decoupled head splits the fully connected layer in the original YOLO detection network into two sub-networks, one for classification and the other for regression. In the coal gangue sorting work, the detection object and the detection background have high similarity and are prone to edge positioning inaccuracies; the decoupled head clarifies the classification and regression tasks, which can focus on the edge features more clearly, reduce the positioning deviation and improve the detection precision. The structure of the decoupled head is shown in Figure 2, where different classification and regression sub-networks are used for the detection task, thus improving the flexibility and applicability of the model, and the precision is significantly improved.
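The split into separate classification and regression branches can be sketched as follows. This is a simplified, YOLOX-style decoupled head [22]; the channel width, branch depth and the cbs helper are assumptions made for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

def cbs(c_in, c_out, k=3):
    """Conv + BatchNorm + SiLU helper (same pattern as the CBS block in Section 2.1)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU())

class DecoupledHead(nn.Module):
    """Separate sub-networks for classification (texture) and regression (edges)."""
    def __init__(self, c_in, num_classes, num_anchors=3, width=256):
        super().__init__()
        self.stem = cbs(c_in, width, k=1)
        self.cls_branch = nn.Sequential(cbs(width, width), cbs(width, width))
        self.reg_branch = nn.Sequential(cbs(width, width), cbs(width, width))
        self.cls_pred = nn.Conv2d(width, num_anchors * num_classes, 1)  # class scores
        self.box_pred = nn.Conv2d(width, num_anchors * 4, 1)            # box offsets
        self.obj_pred = nn.Conv2d(width, num_anchors * 1, 1)            # objectness

    def forward(self, x):
        x = self.stem(x)
        cls_feat = self.cls_branch(x)
        reg_feat = self.reg_branch(x)
        return self.cls_pred(cls_feat), self.box_pred(reg_feat), self.obj_pred(reg_feat)

# one head per feature-map scale, e.g. for the two classes of this task (coal, gangue)
head = DecoupledHead(c_in=128, num_classes=2)
cls_out, box_out, obj_out = head(torch.randn(1, 128, 80, 80))
```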

2.3. SimAM Attention Mechanism

In the task of coal gangue sorting, the colors of the conveyor belt, the coal and the coal gangue are very similar, which poses a challenge to the target detection algorithm. The SimAM attention mechanism can help the model learn the similarity and difference between the target and the background, so that background interference is better suppressed [23]. SimAM is based on a similarity (energy) calculation and is applied within the neural network model.
The energy function of each neuron is defined as follows:
$e_t(w_t, b_t, y, x_i) = (y_t - \hat{t})^2 + \frac{1}{M-1}\sum_{i=1}^{M-1}(y_o - \hat{x}_i)^2 \quad (1)$
Using binary labels (i.e., 1 and −1) for $y_t$ and $y_o$ and adding a regularization term, the final energy function is as follows:
$e_t(w_t, b_t, y, x_i) = \frac{1}{M-1}\sum_{i=1}^{M-1}\left(-1 - (w_t x_i + b_t)\right)^2 + \left(1 - (w_t t + b_t)\right)^2 + \lambda w_t^2 \quad (2)$
The minimum energy can be expressed as follows:
$e_t^* = \frac{4(\hat{\sigma}^2 + \lambda)}{(t - \hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda} \quad (3)$
where
$\hat{t} = w_t t + b_t$
$\hat{x}_i = w_t x_i + b_t$
$w_t = -\frac{2(t - \mu_t)}{(t - \mu_t)^2 + 2\sigma_t^2 + 2\lambda}$
$b_t = -\frac{1}{2}(t + \mu_t) w_t$
$\mu_t = \frac{1}{M-1}\sum_{i=1}^{M-1} x_i$
$\sigma_t^2 = \frac{1}{M-1}\sum_{i=1}^{M-1}(x_i - \mu_t)^2$
$\hat{\mu} = \frac{1}{M}\sum_{i=1}^{M} x_i$
$\hat{\sigma}^2 = \frac{1}{M}\sum_{i=1}^{M}(x_i - \hat{\mu})^2$
In the above formulas, $t$, $x_i$ and $y$ denote, respectively, the target neuron participating in the calculation, the other neurons in the same channel and the calibration values; $i$ is the spatial index; $w_t$ and $b_t$ represent the weight and bias of the linear transform for neuron $t$; $M$ is the number of energy functions in the channel; and $\mu_t$ and $\sigma_t^2$ denote, respectively, the mean and variance of the neurons in the channel other than $t$.
Formula (3) indicates that the lower the energy $e_t^*$, the greater the difference between neuron $t$ and the surrounding neurons, and thus the higher its importance. Therefore, the importance of each neuron can be obtained as $1/e_t^*$.
According to the similarity calculation results, feature representations with higher similarity receive higher weights, so that attention is targeted more precisely at task-related information. In this paper, the SimAM attention mechanism is added to the Neck part; by fusing the feature representations with the corresponding weights, the final enhanced feature map is generated for subsequent processing. SimAM enables the neural network to adaptively focus on the important features of the input data and improves the performance of the model. For a dataset with a single-colored conveyor belt as the background, SimAM can effectively capture the difference between background features and target features, which improves the ability to distinguish targets from a similar background and achieves more accurate feature representation.
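Because SimAM is parameter-free, the whole mechanism reduces to evaluating the closed-form minimal energy of Formula (3) and re-weighting the feature map. A minimal PyTorch sketch following the formulation above is shown below; the regularization value lambda_ is an assumed default.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: weights each position by 1/e_t* from Formula (3)."""
    def __init__(self, lambda_=1e-4):
        super().__init__()
        self.lambda_ = lambda_

    def forward(self, x):
        # x: (batch, channels, height, width)
        b, c, h, w = x.shape
        n = h * w - 1                                          # M - 1 neurons besides t
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)      # (t - mu)^2 at each position
        v = d.sum(dim=(2, 3), keepdim=True) / n                # per-channel variance estimate
        e_inv = d / (4 * (v + self.lambda_)) + 0.5             # 1/e_t*, computed in closed form
        return x * torch.sigmoid(e_inv)                        # re-weight the feature map

# applied to a Neck feature map
attn = SimAM()
y = attn(torch.randn(1, 256, 40, 40))   # output has the same shape as the input
```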

2.4. Slim-Neck Structure

GSConv is a lightweight convolutional approach, which reduces the amount of computation while maintaining a similar output to the ordinary convolution. The authors of Slim-neck designed a VoV-GSCSP structure with reference to VoVnet, GSConv and CSPnet [24], which improves the recognition efficiency and meets the real-time requirements in the field of coal gangue sorting.
The principle of GSConv is to combine standard convolution and depthwise convolution. Depthwise convolution performs the convolution on each input channel separately, which effectively reduces the computational workload and parameter count, improves the training and inference efficiency of the model, and is suitable for resource-constrained scenarios. GSConv then combines the outputs of the depthwise and standard convolutions through a subsequent shuffle operation, thereby achieving output similar to a standard convolution at a lower computational cost. For coal gangue sorting tasks with high efficiency requirements, this efficiency advantage is particularly important. Therefore, for coal gangue image target detection, replacing the convolutions in the Neck part of YOLOv5s with GSConv allows the network to better adapt to the characteristics of the images and improves detection precision and efficiency. The structure of GSConv is shown in Figure 3.
As shown in Figure 4, VoV-GSCSP is a CSP structure built on GSConv. By combining multiple convolutions with GSConv, it performs multi-dimensional feature fusion and can extract features and interpret images more comprehensively. Replacing the CSP module in the Neck part of YOLOv5s with the VoV-GSCSP structure allows a global analysis of the location, shape, texture and other features of coal gangue in the original image and increases the receptive field of the network, so that global feature information is extracted better. For a dataset containing only coal gangue and the conveyor belt, without interference from other targets, the loss of target information that a large receptive field can cause is avoided, and the precision of coal gangue sorting is ensured while the detection speed is improved.
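A minimal sketch of GSConv and the VoV-GSCSP block follows; the kernel sizes, channel split and GSBottleneck layout are reasonable assumptions based on the Slim-neck paper [24], not necessarily the exact configuration used here (CBS is the Conv+BN+SiLU block of Section 2.1).

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU (as in Section 2.1)."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x):
        return self.block(x)

class GSConv(nn.Module):
    """Half the output comes from a dense conv, half from a depthwise conv,
    and a channel shuffle mixes the two groups."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.dense = CBS(c_in, c_half, k, s)
        self.depthwise = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, padding=2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        x1 = self.dense(x)
        y = torch.cat([x1, self.depthwise(x1)], dim=1)
        b, c, h, w = y.shape                       # channel shuffle: interleave the two halves
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)

class GSBottleneck(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.gs = nn.Sequential(GSConv(c, c, k=1), GSConv(c, c, k=3))
        self.shortcut = nn.Conv2d(c, c, 1, bias=False)

    def forward(self, x):
        return self.gs(x) + self.shortcut(x)

class VoVGSCSP(nn.Module):
    """CSP-style split whose inner bottleneck is built from GSConv."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_mid = c_out // 2
        self.cv1, self.cv2 = CBS(c_in, c_mid), CBS(c_in, c_mid)
        self.gsb = nn.Sequential(*[GSBottleneck(c_mid) for _ in range(n)])
        self.cv3 = CBS(2 * c_mid, c_out)

    def forward(self, x):
        return self.cv3(torch.cat([self.gsb(self.cv1(x)), self.cv2(x)], dim=1))

print(VoVGSCSP(256, 256)(torch.randn(1, 256, 40, 40)).shape)  # torch.Size([1, 256, 40, 40])
```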
The above improvements make the Neck section more lightweight and reduce its number of parameters. The resulting Slim-neck improves the network's ability to analyze global features, increases the receptive field and improves the detection speed.

2.5. DGSV-YOLOv5s-SA Network Structure

YOLOv5s is an efficient, accurate and lightweight object detection algorithm and has become one of the research hotspots in the field of object detection. However, there is still room to improve its recognition precision in the field of coal gangue sorting; therefore, on the basis of YOLOv5s, the decoupled head and SimAM attention mechanism are introduced and the CSP module in the Neck section is replaced with the VoV-GSCSP structure, resulting in DGSV-YOLOv5s-SA. The network structure is shown in Figure 5; the modules added to the original YOLOv5s architecture are enclosed in red boxes to distinguish them.

3. Experimental Results and Analysis

3.1. Experimental Platform

The visual system experimental platform consisted of four strip light sources, an industrial camera, a lens and a sunshade. The industrial camera model was Haikang MV-CA050-20UC, as shown in Figure 6. We placed coal and coal gangue samples on a conveyor belt, collected and identified images in a dark box and then transmitted the classification information of coal and coal gangue as well as the coordinate information of coal gangue to the robotic arm. The robotic arm synchronously tracked the coal gangue target, and finally grasped it. The experimental software and hardware environment is shown in Table 1.

3.2. Evaluation Indicators

In target detection algorithms, precision, recall, F1 value, mAP and inference speed are often chosen as the evaluation metrics for network models. Under the specified hardware and software conditions, the inference speed is the number of images processed per second by the network model.
$P = \frac{N_{TP}}{N_{TP} + N_{FP}}$
$R = \frac{N_{TP}}{N_{TP} + N_{FN}}$
$F_1 = \frac{2PR}{P + R}$
$mAP = \frac{\sum_{k=1}^{C} P_e(k)}{C}$
$Loss = loss_{obj} + loss_{box} + loss_{cls}$
In the equations, the following definitions apply:
  • $P$ — precision, the proportion of detected positive samples that are correct.
  • $R$ — recall, the proportion of all positive samples that are correctly detected.
  • $F_1$ — an important metric for evaluating binary classification; the higher the value, the better the detection.
  • $mAP$ — mean average precision.
  • $N_{TP}$ — the number of positive samples correctly detected by the model.
  • $N_{FP}$ — the number of samples incorrectly detected as positive.
  • $N_{FN}$ — the number of positive samples that were missed.
  • $P_e(k)$ — the average precision of class $k$.
  • $C$ — the number of categories.
  • $Loss$ — the distance between the predicted information and the desired information (labels).
  • $loss_{obj}$ — weighted confidence loss.
  • $loss_{box}$ — weighted detection box loss.
  • $loss_{cls}$ — weighted classification loss.
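As a worked example of these definitions, a small Python sketch is given below; the all-point interpolation used for the average precision is a common convention and an assumption here, not necessarily the exact evaluation script used by the authors.

```python
import numpy as np

def precision_recall_f1(n_tp, n_fp, n_fn):
    """Precision, recall and F1 computed from the detection counts defined above."""
    p = n_tp / (n_tp + n_fp) if (n_tp + n_fp) else 0.0
    r = n_tp / (n_tp + n_fn) if (n_tp + n_fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

def average_precision(recall, precision):
    """Area under the precision-recall curve for one class (all-point interpolation)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]      # make precision non-increasing
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class average precision values P_e(k)."""
    return sum(ap_per_class) / len(ap_per_class)

# illustrative counts only, not results from the paper
print(precision_recall_f1(n_tp=95, n_fp=2, n_fn=3))
```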

3.3. Dataset Establishment

In order to reproduce the actual working conditions, a conveyor belt and light source system were built in the laboratory, and images of coal and coal gangue on the conveyor belt were collected with an industrial camera inside a dark box. After screening and data augmentation, a total of 14,790 coal and coal gangue images were obtained. The dataset was divided into training, validation and test sets in a ratio of 8:1:1. Sample images are shown in Figure 7, where coal blocks are labeled c_x and coal gangue is labeled cg_x.
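An 8:1:1 split of this kind can be reproduced with a short script; the directory layout and file extension below are assumptions made for illustration, not the authors' actual preprocessing code.

```python
import random
import shutil
from pathlib import Path

def split_dataset(image_dir, out_dir, ratios=(0.8, 0.1, 0.1), seed=0):
    """Copy images into train/val/test folders at the 8:1:1 ratio used above."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_train = int(ratios[0] * len(images))
    n_val = int(ratios[1] * len(images))
    splits = {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }
    for name, files in splits.items():
        dst = Path(out_dir) / name / "images"
        dst.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, dst / f.name)  # label files would be copied the same way
    return {name: len(files) for name, files in splits.items()}

# e.g. split_dataset("data/raw", "data/coal_gangue") -> {'train': 11832, 'val': 1479, 'test': 1479}
```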

3.4. Experimental Results and Analysis

The loss function curve of the training process is shown in Figure 8, which shows that the improved network has a smaller loss function value during training than the original network and has a faster convergence rate.
To further validate the effectiveness of the proposed method, it was compared with YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x, YOLOv7, YOLOv7x, SSD and Faster R-CNN [25,26,27]. The experimental parameters were set uniformly, with a training batch size of 16 and 60 training epochs; the validation set precision curves are shown in Figure 9. As can be seen from the figure, DGSV-YOLOv5s-SA shows better performance than the other networks over the same training period.
The evaluation metrics of the experimental results are compared in Table 2. Compared with YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x, YOLOv7, YOLOv7x, SSD and Faster R-CNN, the mAP value of DGSV-YOLOv5s-SA increased by 6.1%, 5.5%, 5.3%, 3.6%, 5.8%, 5.2%, 6.1% and 7.9%, respectively. In addition, the inference speed, recall rate and F1 value on the test set were also significantly improved.
The relationship between precision and inference speed is shown in Figure 10. As single-stage detection networks, the YOLO series and SSD have faster inference speeds, while Faster R-CNN, a two-stage target detection algorithm, performs poorly in terms of precision. This is largely because the convolution volume of Faster R-CNN is too large and its feature extraction network produces a single feature map, which lacks multi-level feature fusion. More convolution or pooling can enlarge the receptive field, but because the coal gangue is close in color to the conveyor belt and the target edges are not clear, excessive convolution or pooling may also cause a loss of information and thus reduce precision. It can be concluded that DGSV-YOLOv5s-SA enhances the feature extraction and fusion ability of the whole network and improves its performance more comprehensively, resulting in higher detection precision of the coal gangue target and faster inference speed.
To further validate the effectiveness of the method, ablation experiments were carried out on the proposed model to analyze the effect of each improvement module; the results are shown in Table 3. The good fit of Slim-neck and SimAM to the coal gangue sorting dataset leads to a faster inference speed, while the decoupled head somewhat reduces the inference speed because of its larger number of parameters.
In order to further explore the training process of the neural network and observe the effect of the improved module more intuitively, the Grad-CAM [28] heat map algorithm was used for the visualization of the algorithm’s feature map. As can be seen in Figure 11, the improved network model pays more attention to the global features and increases the receptive field; at the same time, irrelevant information about the background is suppressed. Moreover, the feature extraction is more comprehensive, which in turn improves the precision of coal gangue target detection.

4. Conclusions

A deep learning method was used to detect coal gangue targets. To ensure that recognition was not affected by natural light, an image acquisition platform was built and a coal and coal gangue dataset was collected in-house, and the deep convolutional neural network model DGSV-YOLOv5s-SA was proposed on the basis of the YOLOv5s network. The experimental results show that the proposed method improves the mAP value of YOLOv5s by 6.1% while maintaining a high detection speed in the coal gangue target detection task. On the 1479 images of the test set, the classification precision of coal gangue was 99.7%. Compared with the YOLOv5 series, YOLOv7 series, SSD and Faster R-CNN, the proposed method has higher precision and detection speed and meets the requirements of real-time detection. The method presented in this paper can be applied to the coal gangue sorting industry to realize fast and high-precision identification of coal gangue. Using deep learning to separate coal gangue is more accurate than traditional machine learning and does not require manually selecting coal gangue features, so it has more room for development; however, it depends more heavily on datasets than machine learning methods, and the limited number and variety of the currently constructed datasets may restrict the generalization ability of the deep learning model to some extent. In the incorrectly detected samples, coal and coal gangue were mixed, and some sample categories were even difficult to define; increasing the number of categories will therefore be the next step of this work.

Author Contributions

Methodology, S.W., Z.L., X.S. and G.W.; Validation, S.W., J.Z. and Z.L.; Investigation, Z.L.; Data curation, J.Z.; Writing—original draft, S.W., J.Z. and G.W.; Writing—review & editing, S.W., X.S. and G.W.; Project administration, G.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Chang, Y.X.; Zhu, X.S.; Song, C.B.; Wei, Z.R. Hazard of gangue and its control. Chin. J. Geol. Hazard Control 2001, 02, 42–46.
2. Guo, X.J. Research and application of coal gangue separation technology. Coal Eng. 2017, 49, 74–76.
3. Lu, Y.C.; Yu, Z.S. Study on a coal gangue photoelectric sorting system and its anti-interference technology. Mining R&D 2020, 40, 144–147.
4. Wang, S.; He, L.; Guo, Y.C.; Hu, K.; Li, D.Y.; Zhao, Y.Q.; Ma, X. Dual-energy X-ray transmission identification method of multi-thickness coal and gangue based on SVM distance transformation. Fuel 2024, 356, 129593.
5. Zhao, L.J.; Han, L.G.; Zhang, H.N.; Liu, Z.F.; Gao, F.; Yang, S.J.; Wang, Y.D. Study on recognition of coal and gangue based on multimode feature and image fusion. PLoS ONE 2023, 18, e0281397.
6. Yuan, L.H.; Fu, L.; Yang, Y.; Miao, J. Analysis of texture feature extracted by gray level co-occurrence matrix. J. Comput. Appl. 2009, 29, 1018–1021.
7. Yu, G.F. Expanded order co-occurrence matrix to differentiate between coal and gangue based on interval grayscale compression. J. Image Graph. 2012, 17, 966–970.
8. Guo, Y.C.; Wang, X.Q.; Wang, S.; Hu, K.; Wang, W.S. Identification Method of Coal and Coal Gangue Based on Dielectric Characteristics. IEEE Access 2021, 9, 9845–9854.
9. Sun, Z.Y.; Lu, W.H.; Xuan, P.C.; Li, H.; Zhang, S.S.; Niu, S.C.; Jia, R.Q. Separation of gangue from coal based on supplementary texture by morphology. Int. J. Coal Prep. Util. 2019, 42, 221–237.
10. Fu, C.C.; Lu, F.L.; Zhang, G.Y. Discrimination analysis of coal and gangue using multifractal properties of optical texture. Int. J. Coal Prep. Util. 2020, 42, 1925–1937.
11. Tripathy, D.P.; Reddy, K.G.R. Novel Methods for Separation of Gangue from Limestone and Coal using Multispectral and Joint Color-Texture Features. J. Inst. Eng. (India) Ser. D 2017, 98, 109–117.
12. Hou, W. Identification of Coal and Gangue by Feed-forward Neural Network Based on Data Analysis. Int. J. Coal Prep. Util. 2019, 39, 33–43.
13. Dou, D.Y.; Wu, W.Z.; Yang, J.G.; Zhang, Y. Classification of coal and gangue under multiple surface conditions via machine vision and relief-SVM. Powder Technol. 2019, 356, 1024–1028.
14. Li, Y. Research on Coal Gangue Detection Based on Deep Learning; Xi'an University of Science and Technology: Xi'an, China, 2020.
15. Pu, Y.Y.; Apel, D.B.; Szmigiel, A.; Chen, J. Image Recognition of Coal and Coal Gangue Using a Convolutional Neural Network and Transfer Learning. Energies 2019, 12, 1735.
16. Alfarzaeai, M.S.; Niu, Q.; Zhao, J.Q.; Eshaq, R.M.A.; Hu, E.Y. Coal/Gangue Recognition Using Convolutional Neural Networks and Thermal Images. IEEE Access 2020, 8, 76780–76789.
17. Liu, Q.; Li, J.G.; Li, Y.S.; Gao, M.W. Recognition Methods for Coal and Coal Gangue Based on Deep Learning. IEEE Access 2021, 9, 77599–77610.
18. Cao, X.G.; Liu, S.Y.; Wang, P.; Xu, G.; Wu, X.D. Research on coal gangue identification and positioning system based on coal-gangue sorting robot. Coal Sci. Technol. 2022, 50, 237–246.
19. Gao, R.; Sun, Z.Y.; Li, W.; Pei, L.L.; Hu, Y.J.; Xiao, L.Y. Automatic Coal and Gangue Segmentation Using U-Net Based Fully Convolutional Networks. Energies 2020, 13, 829.
20. Lai, W.H.; Zhou, M.R.; Hu, F.; Bian, K.; Song, H.P. Coal Gangue Detection Based on Multi-spectral Imaging and Improved YOLO v4. Acta Opt. Sin. 2020, 40, 72–80.
21. Jocher, G. YOLOv5. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 10 January 2022).
22. Ge, Z.; Liu, S.T.; Wang, F.; Li, Z.M.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
23. Yang, L.X.; Zhang, R.Y.; Li, L.D.; Xie, X.H. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021.
24. Li, H.L.; Li, J.; Wei, H.B.; Liu, Z.; Zhan, Z.F.; Ren, Q.L. Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles. arXiv 2022, arXiv:2206.02424.
25. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. arXiv 2016, arXiv:1512.02325.
26. Ren, S.Q.; He, K.M.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
27. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
28. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2019, 128, 336–359.
Figure 1. YOLOv5s network structure diagram.
Figure 2. Decoupled head structure diagram.
Figure 3. GSConv structure diagram.
Figure 4. VoV-GSCSP structure diagram.
Figure 5. DGSV-YOLOv5s-SA network structure diagram.
Figure 6. Experimental platform.
Figure 7. Dataset images.
Figure 8. Training loss curve.
Figure 9. Training precision curve of each network.
Figure 10. Precision vs. speed curve.
Figure 11. Heat map visualization: (a) YOLOv5s heat map; (b) DGSV-YOLOv5s-SA heat map.
Table 1. Experimental software and hardware environment.

Name | Information
Operating system | Ubuntu 20.04
Deep learning framework | PyTorch 1.12.1
Processor | Intel(R) Xeon(R) Gold 6226R ×2
RAM | 196 GB
Video card | NVIDIA Tesla T4 ×4
Operational environment | Anaconda3
Python version | 3.9

Table 2. Comparison of experimental results by network.

Model | F1 (%) | R (%) | FPS | mAP@0.5:0.95 (%)
YOLOv5s | 98.8 | 98.8 | 83.3 | 84.2
YOLOv5m | 98.4 | 98.5 | 52.3 | 84.8
YOLOv5l | 98.5 | 98.7 | 34.6 | 85.0
YOLOv5x | 99.2 | 99.3 | 22.3 | 86.7
YOLOv7 | 98.8 | 99.1 | 28 | 84.5
YOLOv7x | 99.1 | 99.3 | 21.1 | 85.1
SSD | 99.2 | 99 | 58.6 | 84.2
Faster-RCNN | 98 | 99.4 | 7.9 | 82.4
DGSV-YOLOv5s-SA | 99.7 | 99.7 | 86.6 | 90.3

Table 3. Results of ablation experiments.

Model | F1 (%) | R (%) | FPS | mAP@0.5:0.95 (%)
YOLOv5s | 98.8 | 98.8 | 83.3 | 84.2
Decoupled Head-YOLOv5s | 98.7 | 98.8 | 58.4 | 86.3
YOLOv5s-SimAM | 98.6 | 98.8 | 91.7 | 85.8
GS-VoV-YOLOv5s | 99.5 | 99.6 | 97.1 | 88.2
DGSV-YOLOv5s-SA | 99.7 | 99.7 | 86.6 | 90.3
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
