Article

Coal–Rock Data Recognition Method Based on Spectral Dimension Transform and CBAM-VIT

1. School of Mechanical and Electrical Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
2. Key Laboratory of Intelligent Mining and Robotics, Ministry of Emergency Management, Beijing 100083, China
3. Inner Mongolia Research Institute, University of Mining and Technology (Beijing), Ordos 017004, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(2), 593; https://doi.org/10.3390/app14020593
Submission received: 6 November 2023 / Revised: 8 December 2023 / Accepted: 7 January 2024 / Published: 10 January 2024
(This article belongs to the Special Issue Advanced Intelligent Mining Technology)

Abstract

Coal–gangue sorting is a vital component of intelligent mine construction. As intelligent manufacturing continues to advance, data-driven coal–gangue recognition has emerged as a prominent research topic. However, conventional data-driven methods for coal–gangue recognition rely heavily on expert-extracted features; feature extraction is labor-intensive and strongly affects the final outcome. Deep learning (DL) offers an effective way to extract features automatically from raw data, and among DL techniques, convolutional neural networks (CNNs) have proven particularly effective. In this paper, we propose an intelligent coal–rock recognition method that fuses multiple preprocessing techniques applied to near-infrared spectra and employs dual attention. First, a signal-to-RGB image conversion method fuses three types of preprocessed data, namely the first-order differential, second-order differential, and standard normal variate transform, into an RGB image representation. We then propose a neural network model (CBAM-VIT) that integrates the convolutional block attention module (CBAM) and the Vision Transformer (VIT). Evaluated on the coal–rock dataset, the model achieves an accuracy of 98.5%, surpassing VIT (95.3%), VGG-16 (89%), and AlexNet (82%). The comparative results clearly demonstrate that the proposed coal–gangue recognition method yields significant improvements in classification performance.

1. Introduction

Coal mine intelligence plays a crucial role in achieving the high-quality development of the coal industry [1]. The identification and sorting of coal–gangue is a key technology for intelligent coal mine construction, improving the accuracy and efficiency of coal seam detection, intelligent mining, and rapid sorting [2]. Traditional methods for coal–gangue separation mainly include dense medium separation, flotation, gravity separation, magnetic separation, and electrostatic separation; these methods face challenges in separation efficiency, processing capacity, cost, and environmental friendliness. In recent years, coal and gangue separation technologies have developed continuously. Currently, the main methods for coal–gangue identification include X-ray, γ-ray, image monitoring, radar detection, and vibration detection [3]. The X-ray and gamma-ray detection methods rely on distinguishing the principal constituent elements of coal and gangue; however, radiographic detection carries radiation risks and poses potential harm to human health [4,5,6]. With advancements in computer vision, the potential of image recognition technology for coal–gangue separation has been explored [7,8]. Sun Jiping et al. [9] employed the gray-level co-occurrence matrix to extract texture features from coal–rock images and established a coal–rock classification method using Fisher discriminant analysis. Meng H et al. [10] utilized energy, contrast, correlation, and other feature vectors based on the texture differences of coal–rock for coal–rock identification. However, surface and texture differences may not be visible in images because of interfering factors such as uneven or poor illumination, and the presence of dust around the gangue can also degrade recognition performance.
Therefore, it is highly significant to develop a recognition method with high efficiency, low operational costs, and no harm to human health.
The near-infrared spectrometer is an advanced instrument based on information measurement and processing technology. It offers several advantages, including high analysis efficiency, stable results, non-destructive measurement, ease of operation, and wide applicability [11]. The identification mechanism relies on the specific chemical composition and material structure of the substance being analyzed, which produce absorption characteristics in the reflection spectrum at specific wavelengths. In recent years, near-infrared spectrometry has developed rapidly in various fields, such as food, agriculture, and coal [12,13,14]. Mao et al. [15] proposed an improved multilayer extreme learning machine algorithm combined with visible-infrared spectroscopy and established a rapid coal identification model whose test-set accuracy reached 92.25%. Song Liang et al. [16] studied coal and rock classification based on joint analysis of visible-near-infrared and thermal-infrared spectra. Yang et al. [17] utilized visible and near-infrared (VIS-NIR) reflectance spectroscopy for coal–rock identification. In the domain of image recognition, one study [18] explored a method for coal–gangue color image recognition based on multispectral and color-texture fusion. Other studies [8,19] developed coal–gangue recognition models using support vector machines, random forests, and feedforward neural networks, extracting multidimensional features from the gray texture of coal–gangue images. Additionally, another work [20] employed an improved LeNet-5 deep neural network to train on coal–gangue samples and obtained a deep learning model for coal–gangue classification. However, direct acquisition of coal–gangue images is susceptible to environmental disturbances, and the classification accuracy of such deep learning models can be relatively low.
To address these challenges, one study [21] proposed a CNN-based coal–rock identification method utilizing hyperspectral data: by preprocessing the reflectance spectral data of coal and rock, a one-dimensional convolutional neural network model achieved a recognition accuracy of 94.6%.
In 2022, Dong et al. [22] proposed a coal identification model combining a convolutional neural network and an extreme learning machine, providing a low-cost, convenient, and effective method for the rapid identification of on-site coal. Also in 2022, Xie et al. [23] proposed a deep learning object detection algorithm to replace manual coal–gangue dataset preparation; their improved Deep Convolutional Generative Adversarial Network (DCGAN) overcomes the vanishing-gradient problem of the generator, and the images it generates improve coal–gangue recognition accuracy by 5.87%. Shang et al. [24] proposed a primary washing process for underground raw coal with a mechanical sieve jig as the main washing equipment; by optimizing the main structures of the jig's drive and waste-discharge mechanisms, the process meets the requirements and narrow-space constraints of underground mines and achieves effective separation of coal and gangue. Dong et al. [25] proposed a threshold sorting method based on dual-energy X-ray technology, achieving recognition rates of 96.99% and 92.32% for 150–300 mm materials.
This study aimed to identify coal–gangue, using near-infrared spectroscopy. Although previous studies investigated recognition and classification algorithms, they did not achieve the desired degree of accuracy. Thus, this study converted collected coal and gangue hyperspectral data into RGB images and used them as input for various neural network models (CBAM-VIT, VIT [26], AlexNet [27], and VGG-16 [28]). The study further trained and tested these models on the collected datasets. The t-distributed Stochastic Neighbor Embedding (t-SNE) technique [29] was utilized to reduce the dimensionality of the near-infrared spectrum data. Finally, comparative experiments were conducted to comprehensively evaluate the proposed network model.
The main work of this paper is as follows:
  • The features of the three preprocessing methods are fused and converted to RGB fusion images by signal image conversion.
  • A novel CNN-based ViT algorithm is introduced, aiming to comprehensively incorporate the information from both local and global features of coal and gangue data, thereby significantly enhancing the recognition performance.

2. Materials and Methods

2.1. Spectral Data Conversion to RGB Images

Near-infrared instruments are prone to random noise, instrument deviation, and external interferences such as sample background and scattered light. These factors can cause spectral shift or drift, potentially compromising the accuracy and precision of the model. To mitigate these issues, appropriate spectral preprocessing steps are necessary to enhance the prediction ability and stability of the model. In this section, a multi-signal-to-RGB image conversion method is proposed to efficiently merge three types of spectral preprocessing data into a single RGB image and thereby obtain fused information. The proposed method is illustrated in Figure 1.
Step 1: Use the spectral acquisition device to collect the spectral data of each coal–rock sample, measuring ten times and taking the average. Given the performance of the instrument, each sample datum is a 1 × 155 vector. The raw data can be expressed as follows:
$X_{raw} = \{x_1, x_2, \dots, x_n\}, \quad x_i \in \mathbb{R}^{1 \times 155}$
Step 2: Convert the spectral raw data into RGB three-channel data by using the first-order differential, second-order differential, and standard normal transformation preprocessing methods.
R channel: $y_{fir} = \dfrac{-2y_{j-2} - y_{j-1} + y_{j+1} + 2y_{j+2}}{10\lambda}, \quad j = 1, 2, \dots, 155$
G channel: $y_{sec} = \dfrac{2y_{j-2} - y_{j-1} - 2y_j - y_{j+1} + 2y_{j+2}}{7\lambda^2}, \quad j = 1, 2, \dots, 155$
B channel: $y_{SNV} = \dfrac{y_{ij} - \bar{y}_i}{\sqrt{\dfrac{1}{p-1}\sum_{j=1}^{p}\left(y_{ij} - \bar{y}_i\right)^2}}, \qquad \bar{y}_i = \dfrac{1}{p}\sum_{j=1}^{p} y_{ij}$
In these formulas, $y_{fir}$ and $y_{sec}$ are the reflectance after the first-order and second-order differentials, respectively; $y_j$ is the reflectance at the j-th wavelength point; $\lambda$ is the wavelength interval; $y_{SNV}$ is the reflectance after the standard normal variate transformation; $y_{ij}$ is the reflectance of the i-th sample at the j-th wavelength point; $\bar{y}_i$ is the average reflectance of the i-th sample; and p is the number of spectral wavelength points.
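The three preprocessing transforms can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: the function names are ours, the example spectrum is random, and the 8 nm wavelength interval is taken from the instrument description later in the paper. Note that the five-point windows in the formulas shorten the signal by four points at the edges.

```python
import numpy as np

def first_derivative(y, lam):
    """Five-point first derivative (R channel); lam is the wavelength interval."""
    j = np.arange(2, len(y) - 2)
    return (-2*y[j-2] - y[j-1] + y[j+1] + 2*y[j+2]) / (10*lam)

def second_derivative(y, lam):
    """Five-point second derivative (G channel)."""
    j = np.arange(2, len(y) - 2)
    return (2*y[j-2] - y[j-1] - 2*y[j] - y[j+1] + 2*y[j+2]) / (7*lam**2)

def snv(y):
    """Standard normal variate transform (B channel), applied per spectrum."""
    return (y - y.mean()) / y.std(ddof=1)

spectrum = np.random.rand(155)            # one raw sample, 1 x 155
r = first_derivative(spectrum, lam=8.0)   # 8 nm wavelength interval (assumed)
g = second_derivative(spectrum, lam=8.0)
b = snv(spectrum)
```

On a linear spectrum the first derivative is constant and the second derivative vanishes, which is a quick sanity check of the reconstructed coefficients.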
After this step, a modified three-channel signal is generated, which has a lower dimensionality compared to the original signal and contains only the main information.
Step 3: Convert the obtained three-channel signals into pixel matrices and normalize their values.
After processing, each value in the pixel matrix is normalized between 0 and 255.
$x^* = \dfrac{x - x_{mean}}{x_{max} - x_{min}} \times 255$
Step 4: Fuse the three normalized pixel matrices into one RGB pixel matrix to generate an RGB image, thereby obtaining feature-level fusion information.
The pixel matrix of an RGB image can be expressed as $RGBM(a, b, C_{rgb}) = (x^*, C_{rgb})$, where $C_{rgb} = 1, 2, 3$ indexes the R, G, and B channels.
Step 5: The RGB images represented as the RGB pixel matrix RGBM(a, b, Crgb) are divided into three parts: training, validation, and testing.
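Steps 3 and 4 can be sketched in NumPy as below. This is a minimal sketch under assumptions: the function names are ours, and trimming the three channel signals to a common width before stacking is our interpretation of how matrices of slightly different lengths can be fused into one RGB pixel matrix.

```python
import numpy as np

def to_pixel_row(x):
    """Step 3: scale one channel signal with x* = (x - mean)/(max - min) * 255."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.max() - x.min()) * 255

def fuse_rgb(r_sig, g_sig, b_sig, width):
    """Steps 3-4: trim the three channel signals to a common width and stack
    them into one H x W x 3 RGB pixel matrix (here H = 1, one row per sample)."""
    rows = [to_pixel_row(s[:width]).reshape(1, width) for s in (r_sig, g_sig, b_sig)]
    return np.stack(rows, axis=-1)          # shape (1, width, 3)

# hypothetical channel lengths after first/second differencing and SNV
rgb = fuse_rgb(np.random.rand(149), np.random.rand(148), np.random.rand(155),
               width=148)
```

Because the paper's formula centers on the mean rather than the minimum, the scaled values are roughly centered on 0 rather than guaranteed to lie in [0, 255]; the sketch follows the formula as written.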

2.2. CBAM-VIT

The attention mechanism can highlight certain important features in the network model, improve the capability of feature extraction, and thus enhance the detection accuracy of the model.
The proposed CBAM-VIT method for coal–gangue classification incorporates the CBAM attention mechanism [30] in series with the convolutional layers and VIT. The overall structure of the CBAM-VIT method is shown in Figure 2, and Table 1 summarizes its specific configuration. This approach considers the importance of pixels on different channels of the gangue RGB image, as well as the importance of pixels at different spatial positions within the same channel, strengthening effective feature information while weakening invalid feature information. To better extract the spectral input features of coal–gangue, the method employs small convolution kernels and strides in the preceding convolution layers, enabling VIT to fully extract both the local and global features of the spectral data. The multi-head attention within VIT exploits the available resources for parallel computing, increasing computation speed. Below, we introduce the three attention modules and the corresponding gangue recognition method in detail.

2.2.1. Channel Attention Mechanism

During the process of determining whether a particular pixel belongs to coal or gangue, it is observed that each channel of the feature map carries different levels of importance. Utilizing the feature information from each channel equally may lead to the inefficient use of computing resources. To address this issue, the channel attention mechanism is introduced to capture the interrelationship between channels in the feature map. This allows for differential treatment of the feature information from different channels.
In Figure 3, the input feature map is represented as $F \in \mathbb{R}^{C \times H \times W}$, where H and W denote the height and width of the input feature map, respectively, and C represents the number of channels. The feature map captures global spatial information by applying both global maximum pooling (Max Pool) and global average pooling (Avg Pool) operations, resulting in two feature maps with dimensions of C × 1 × 1. To fully utilize the compressed feature information, these feature maps are fed into a multilayer perceptron (MLP) to obtain two one-dimensional feature maps. The MLP consists of two fully connected layers activated by the ReLU function. Summing the two one-dimensional feature maps along the channel dimension and normalizing the result with the Sigmoid function yields the output of the channel attention mechanism. This output is then multiplied element-wise with the original feature map to restore its size to C × H × W. The entire process is expressed by Formula (6):
$M_c(F) = \sigma(MLP(AvgPool(F)) + MLP(MaxPool(F))) = \sigma\left(W_1\left(W_0\left(F^c_{avg}\right)\right) + W_1\left(W_0\left(F^c_{max}\right)\right)\right)$
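Formula (6) can be sketched in NumPy as follows. This is an illustrative sketch rather than the paper's implementation: the weight shapes, the reduction ratio r, and the random inputs are assumptions; in practice W0 and W1 would be learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W0, W1):
    """CBAM channel attention on a feature map F of shape (C, H, W).
    W0: (C/r, C) and W1: (C, C/r) are the shared MLP weights, ReLU in between."""
    avg = F.mean(axis=(1, 2))              # global average pooling -> (C,)
    mx = F.max(axis=(1, 2))                # global max pooling     -> (C,)
    mlp = lambda v: W1 @ np.maximum(W0 @ v, 0.0)
    Mc = sigmoid(mlp(avg) + mlp(mx))       # channel weights in (0, 1)
    return F * Mc[:, None, None]           # recalibrated map, back to C x H x W

C, r = 8, 2                                # channel count and reduction ratio (assumed)
rng = np.random.default_rng(0)
F = rng.standard_normal((C, 6, 6))
W0 = rng.standard_normal((C // r, C)) * 0.1
W1 = rng.standard_normal((C, C // r)) * 0.1
out = channel_attention(F, W0, W1)
```

Since each channel is scaled by a sigmoid weight in (0, 1), the output magnitudes can only shrink relative to the input, which is the "differential treatment" of channels described above.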

2.2.2. Spatial Attention Mechanism

In Figure 4, the spatial attention mechanism receives the feature map F, which has undergone channel feature recalibration, as input. Global maximum pooling and average pooling operations are performed along the channel dimension, resulting in two feature maps with dimensions of 1 × H × W. These two feature maps are then concatenated along the channel dimension, allowing for the encoding and fusion of information from different spatial positions through a convolutional layer. The resulting spatial weighted information, denoted as Ms, captures the importance of different spatial positions within the image. This process can be mathematically represented by Formula (7):
$M_s(F) = \sigma\left(f^{7 \times 7}([AvgPool(F); MaxPool(F)])\right) = \sigma\left(f^{7 \times 7}\left(\left[F^s_{avg}; F^s_{max}\right]\right)\right)$
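Formula (7) can likewise be sketched in NumPy. The naive double loop stands in for the learned 7 × 7 convolutional layer $f^{7\times7}$; the kernel here is random and purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(F, kernel):
    """CBAM spatial attention on F of shape (C, H, W).
    `kernel` is a (2, 7, 7) kernel over the concatenated avg/max maps."""
    avg = F.mean(axis=0)                   # (H, W) average over channels
    mx = F.max(axis=0)                     # (H, W) max over channels
    stacked = np.stack([avg, mx])          # (2, H, W): [AvgPool(F); MaxPool(F)]
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    H, W = avg.shape
    Ms = np.empty((H, W))
    for i in range(H):                     # naive "same" convolution
        for j in range(W):
            Ms[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return F * sigmoid(Ms)[None, :, :]     # spatially reweighted feature map

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 6, 6))
out = spatial_attention(F, rng.standard_normal((2, 7, 7)) * 0.1)
```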

2.2.3. The Vision Transformer (ViT)

To enhance the feature extraction process in the original VIT, we employ a smaller convolution kernel and step size. This replaces the conventional image feature extraction method. By incorporating the channel attention mechanism of the CBAM attention mechanism and the spatial attention mechanism, we obtain a feature map that emphasizes significant features. The VIT model proceeds by flattening the feature map into a one-dimensional vector. These vectors are then linearly projected into n D-dimensional vectors while considering the position information Epos to capture the sequence data correlation. Furthermore, a classification flag (Xclass) is added before the input sequence data to better represent global information [31]. The process is mathematically expressed in Formula (8). Next, the multi-head attention mechanism is utilized to calculate attention weights from different positions, as described in Formulas (9)–(11). Nonlinear transformations of the data are then performed using Formula (12). After N ENCODER modules, the data are passed through a fully connected layer for classification. The VIT model structure is illustrated in Figure 5. Each ENCODER module comprises a multi-head attention layer and an MLP layer, with the MLP consisting of two nonlinear layers activated by GELU functions. To expedite calculations, layer normalization (Equation (13)) is applied before each layer for normalization, and residual connections are employed after each layer.
$z_0 = \left[x_{class}; x_p^1 E; x_p^2 E; \dots; x_p^n E\right] + E_{pos}$
$Attention(Q, K, V) = softmax\left(\dfrac{QK^T}{\sqrt{d_k}}\right)V$
$MultiHead(Q, K, V) = Concat\left(head_1, \dots, head_h\right)W^O$
$head_i = Attention\left(QW_i^Q, KW_i^K, VW_i^V\right)$
$z_l = MLP\left(LN\left(z'_l\right)\right) + z'_l, \quad l = 1, \dots, N$
$y = LN\left(z_L^0\right)$
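The ENCODER computation described above can be sketched in NumPy. This is a deliberately simplified, single-head sketch under assumptions: a real VIT uses h heads with learned projections, and the token count, embedding size, and random weights here are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(z, eps=1e-6):
    return (z - z.mean(-1, keepdims=True)) / (z.std(-1, keepdims=True) + eps)

def attention(Q, K, V):
    """Scaled dot-product attention."""
    dk = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(dk)) @ V

def encoder_block(z, Wq, Wk, Wv, Wo, Wm1, Wm2):
    """One ENCODER: pre-LayerNorm, (single-head) attention, then a GELU MLP,
    with a residual connection after each sub-layer."""
    h = layer_norm(z)
    z = z + attention(h @ Wq, h @ Wk, h @ Wv) @ Wo    # attention + residual
    h = layer_norm(z)
    gelu = lambda x: 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))
    return z + gelu(h @ Wm1) @ Wm2                    # MLP + residual

n, D = 10, 16                          # n patch tokens plus 1 class token, D-dim
rng = np.random.default_rng(2)
z0 = rng.standard_normal((n + 1, D))   # stands in for [x_class; x_p E] + E_pos
Ws = [rng.standard_normal(s) * 0.1
      for s in [(D, D)] * 4 + [(D, 4 * D), (4 * D, D)]]
zL = encoder_block(z0, *Ws)
y_cls = layer_norm(zL)[0]              # LN of the class token, fed to the classifier
```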

2.3. Experimental Setup

To control the dust concentration in the experimental environment, a closed chamber was built; a blower simulated the dust conditions of underground work, and a dust concentration meter monitored the concentration in real time. Whenever the experimental environment changed (for example, from test number H3 to H4), the calibration whiteboard was used for background calibration before spectrum collection to prevent the reflection spectrum baseline from drifting. The spectrum acquisition device is connected as shown in Figure 6: the coal and rock samples are placed on the stage in turn, and the collimator and the halogen tungsten lamp are placed side by side at a height of L meters above the axis of the samples. SpectroMOST6, the software matched with the spectrometer, was used for spectral collection and simultaneous observation on the computer, so that spectral curves with large value deviations could be removed. Ten sets of near-infrared reflection spectrum data were collected for each sample, and their average was taken as the sample's near-infrared reflection spectrum. Formulas (14) and (15) give the diameter, D, and area, S, of the circular spot produced by the light source. The reflectance spectrum data are collected with a near-infrared spectrometer, and each spectral curve takes about 50 ms to collect.
$D = d + 2L\tan\dfrac{\theta}{2}$
$S = \dfrac{\pi D^2}{4}$
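Formulas (14) and (15) are straightforward to express in code; the example values for the collimator diameter d, the height L, and the divergence angle θ below are hypothetical, chosen only to illustrate the units.

```python
import math

def spot_diameter(d, L, theta_deg):
    """Formula (14): D = d + 2 L tan(theta / 2), theta being the beam divergence."""
    return d + 2 * L * math.tan(math.radians(theta_deg) / 2)

def spot_area(D):
    """Formula (15): area of the illuminated bottom circle."""
    return math.pi * D**2 / 4

# hypothetical values: 10 mm aperture, 1 m height, 10 degree divergence
D = spot_diameter(d=0.01, L=1.0, theta_deg=10.0)
S = spot_area(D)
```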

3. Experimental Results and Analysis

3.1. Spectral Acquisition Experiment

3.1.1. Experimental Environment

From the literature, we know that, during coal mining, the dust concentration of the fully mechanized mining face is about 300–600 mg/m³. In this paper, a series of relatively comprehensive experimental simulations is carried out to study the feasibility of using spectrometers to identify underground coal–rocks. The experimental design considers the interaction of four factors: dust concentration, detection distance, wind speed, and the proportion of coal within the collimator's field of view. The influencing factors and their specific level values are listed in detail in Table 2.

3.1.2. Experimental Method

To ensure accurate experimental results, the halogen tungsten light source and the spectrometer were turned on for preheating one hour before the experiment. To improve experimental efficiency, the orthogonal test method was adopted; the specific tests are shown in Table 3.

3.1.3. Dataset Description

In this study, 120 coal and rock samples were collected, and the experimental data were gathered under nine simulated working conditions (H1–H9) and the ideal experimental environment (H0). The spectrometer used in the experiment is the Holland spectrometer; its collection wavelength range is 1300–2500 nm, and its resolution is 8 nm. Each experimental sample was placed on the test bench, and the reflectance spectrum data of the coal–rock sample were collected with the abovementioned experimental device. In each experimental environment, 10 spectra were collected, and their average was taken as the final experimental data. According to the properties of the selected near-infrared spectrometer, the dimension of the data collected for each experimental sample is [2 × 150]; we keep only the reflectivity dimension, that is, [1 × 150]. After preprocessing by the first-order and second-order differential methods, the data dimensions become [1 × 149] and [1 × 148], respectively, so the data dimension of the original RGB image after conversion is [1 × 148], which constitutes Dataset B. The dataset is described in Table 4, and its visualization is shown in Table 5.
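The dimension bookkeeping stated here ([1 × 150] → [1 × 149] → [1 × 148]) corresponds to simple successive differencing, which can be checked with `np.diff`; the random spectrum below is illustrative only.

```python
import numpy as np

spectrum = np.random.rand(1, 150)         # reflectivity row kept from the [2 x 150] capture
first = np.diff(spectrum, n=1, axis=1)    # first-order difference  -> (1, 149)
second = np.diff(spectrum, n=2, axis=1)   # second-order difference -> (1, 148)
width = second.shape[1]                   # common width after trimming: 148
```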

3.2. Experimental Environment and Parameter Configuration

The model is built with PyTorch 1.7 and Python 3.8. The computer runs the Windows operating system with an Intel Core i7-8700K CPU (Lenovo, Beijing, China) and a GTX 1080Ti GPU. The training set and test set participate in the entire training process of the model, while the validation set is independent of model training and is used for verification after training is complete. The initial learning rate is set to 1 × 10−2, the cross-entropy loss function is used to compute the loss, and Adam is used as the optimizer. The batch size is set to 64 and the number of training iterations to 500.
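The hyperparameters reported above can be summarized as below. The cross-entropy function is a NumPy sketch of the loss the authors compute, not the PyTorch call they used, and the example probabilities are hypothetical.

```python
import numpy as np

# Hyperparameters as reported in Section 3.2.
LEARNING_RATE = 1e-2
BATCH_SIZE = 64
ITERATIONS = 500

def cross_entropy(probs, labels):
    """Mean cross-entropy over a batch; probs has shape (N, 2) for coal vs. gangue."""
    n = len(labels)
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()

# two hypothetical predictions, both on the correct side of the decision
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
loss = cross_entropy(probs, np.array([0, 1]))
```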

3.3. Comparison of Different Convolutional Neural Networks

The recognition accuracies of CBAM-VIT, VIT, AlexNet, and VGG-16 networks under the same number of iterations are shown in Figure 7.
In addition to testing the CBAM-VIT and VIT network models on the gangue RGB image dataset, traditional CNN models (AlexNet and VGG-16) were also evaluated. Without significant data preprocessing, the classification accuracies of CBAM-VIT and VIT on the test set reached 98.5% and 95.3%, respectively, surpassing the other neural network models (as presented in Figure 7). These results suggest that the self-attention mechanism in the Vision Transformer can effectively capture global features of the coal–gangue hyperspectral data. Furthermore, the CBAM-VIT network model designed in this study not only employs the pooling operations in the CBAM module to obtain a global feature representation with fewer parameters and calculations but also adaptively captures recognition-related features. Notably, the recognition accuracy of CBAM-VIT outperforms the three CNN-based network models under all working conditions. To verify the competitiveness of the proposed CBAM-VIT network model, t-SNE was used to visualize the final output distribution of each neural network model in two-dimensional space. The visualization results, illustrated in Figure 8, depict coal in red and gangue in blue.
To further evaluate the effectiveness of CBAM-VIT, this study thoroughly assesses its recognition accuracy across various operational scenarios. Additionally, the spectral data of coal–gangue are included in the ideal experimental setting. Figure 9 presents the confusion matrix of CBAM-VIT across ten different working conditions. In this matrix, the rows and columns represent the predicted and true labels, respectively. The diagonal cells indicate the accuracy of gangue identification. Notably, the CBAM-VIT network model achieves a classification accuracy exceeding 96% across all working conditions, as evident from the confusion matrix. These results demonstrate the superior performance of the CBAM-VIT network model.

4. Discussion

CBAM-VIT demonstrated its effectiveness in comparison with the other neural networks. In the following discussion, we explore its benefits. First, the model's enhanced feature learning mechanism can discern the distinct positions and peaks of absorption produced by the various components of coal and gangue. This is achieved through the hybrid attention mechanism combined with the VIT, which identifies and prioritizes information related to characteristic absorption peaks, facilitating the extraction of the crucial identification details that are essential for improving model performance. Second, the model employs a discriminative feature learning mechanism: the feature representation in CBAM-VIT consists of activation maps extracted by different convolutional kernels, each capturing a different degree of importance for the task of gangue recognition.

5. Conclusions

To address the problem of inaccurate recognition of coal–gangue in the complex and harsh underground mining environment, a data fusion method, followed by an identification method, is proposed. The core techniques include converting the collected spectral data into RGB images by using three preprocessing methods and employing the fusion of CBAM and VIT models for classification and identification. In order to verify the practical application effectiveness of the proposed algorithm, a coal–gangue identification verification platform was constructed in the laboratory and tested to simulate the complex environment in underground mines. The research findings are as follows.
  • A coal–gangue image classification network model based on a hybrid attention mechanism (CBAM-VIT) is proposed, which achieves deep-level extraction of semantic features of coal–gangue through channel attention structure, spatial attention structure, and convolutional structure. The network model is trained using the cross-entropy loss function to automatically learn deep-level semantic feature representation and classify coal–gangue images.
  • CBAM-VIT enhances the feature expression capability of coal–gangue in complex environments. By fusing spatial and channel features, it more effectively integrates the multilayer feature information of coal–gangue RGB images. The transformer layer captures the long-range dependencies between image elements, capturing useful information and suppressing the irrelevant noise of different granularities in the feature maps. Compared to the VIT algorithm, the proposed CBAM-VIT algorithm achieves an average recognition accuracy improvement of 3.2%, enhancing the precision and reliability of classification. This demonstrates that the method has stronger robustness compared to other approaches.
  • This study set up a simulated coal mining environment with three levels and four influencing factors, considering only four environmental factors. There are still other influencing factors in the underground mining environment, and there exist certain discrepancies between the simulated environment and the actual working face in coal mines. Further research is needed to continuously improve the methodology.

Author Contributions

Conceptualization, J.Y. and Y.Z.; methodology, Y.Z.; software, K.W.; validation, Y.Z., K.W., and Y.T.; formal analysis, J.L.; investigation, J.Y.; resources, Y.Z.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, J.Y.; visualization, Y.Z.; supervision, G.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Theory and Method of Excavation-Support-Anchor Parallel Control for Intelligent Excavation Complex System (Grant No. 52104169), the National Key Research and Development Program (Grant No. 2022YFB4703703), and Green, Intelligent, and Safe Mining of Coal Resources (Grant No. 52121003).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Acknowledgments

The study was approved by the China University of Mining and Technology (Beijing).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, G. Recent technological advances and problems in coal mine intelligence. Coal Sci. Technol. 2022, 50, 27. [Google Scholar]
  2. Zhang, Q.; Zhang, R.; Liu, J.; Wang, C.; Zhang, H.; Tian, Y. A review of coal rock identification technology for intelligent mining in coal mines. Coal Sci. Technol. 2022, 50, 1–26. [Google Scholar]
  3. Tian, L.Y.; Dai, B.H.; Wang, Q.M. A coal rock identification method based on multi-strain data fusion of rocker pins in coal mining machines. J. Coal 2020, 45, 1203–1210. [Google Scholar]
  4. Liu, K.; Zhang, X.; Chen, Y.Q. Extraction of coal and gangue geometric features with multifractal detrending fluctuation analysis. Appl. Sci. 2018, 8, 463. [Google Scholar] [CrossRef]
  5. Hu, F.; Zhou, M.; Yan, P.; Bian, K.; Dai, R. Multispectral imaging: A new solution for identification of coal and gangue. IEEE Access 2019, 7, 169697–169704. [Google Scholar] [CrossRef]
  6. Zhang, N.; Liu, C. Radiation characteristics of natural gamma-ray from coal and gangue for recognition in top coal caving. Sci. Rep. 2018, 8, 190. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Data conversion schematic.
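The conversion Figure 1 depicts fuses three preprocessed views of each near-infrared spectrum (first-order difference, second-order difference, and standard normal variate) into the channels of one RGB image. A minimal pure-Python sketch of the three transforms is given below; the 150-point input length and the truncation of all three views to a common 148-point length are illustrative assumptions, not details taken from the paper:

```python
import math

def first_diff(x):
    # First-order difference: d[i] = x[i+1] - x[i]; length shrinks by one.
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]

def second_diff(x):
    # Second-order difference: the difference of the first difference.
    return first_diff(first_diff(x))

def snv(x):
    # Standard normal variate: center and scale each spectrum individually.
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / std for v in x]

def to_three_channels(spectrum, length=148):
    # Truncate each view to a common length so the three can be stacked
    # as the R, G, B channels of one image row (assumed fusion rule).
    views = (first_diff(spectrum), second_diff(spectrum), snv(spectrum))
    return [v[:length] for v in views]

# Synthetic stand-in for a 150-point reflectance spectrum.
spectrum = [math.sin(0.1 * i) + 0.01 * i for i in range(150)]
r, g, b = to_three_channels(spectrum)
print(len(r), len(g), len(b))  # each channel is 148 samples long
```

Stacking the three views as color channels lets a standard image backbone see all three preprocessing variants at once instead of training one model per preprocessing method.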
Figure 2. Overall structure of the CBAM-VIT model.
Figure 3. Channel attention module diagram.
Figure 4. Spatial attention mechanism.
Figure 5. ViT diagram.
Figure 6. Near-infrared reflection spectrum collection device for coal–gangue.
Figure 7. Accuracy of different classification networks.
Figure 8. t-SNE three-dimensional distribution map.
Figure 9. Coal–gangue identification confusion matrix for ten working conditions.
Table 1. Model structure.

| Layer | Output Volume | Description |
|---|---|---|
| Conv2d-1 | [−1, 256, 1, 150] | Number of filters: 256. Kernel size: 1 × 3. |
| Conv2d-2 | [−1, 512, 1, 150] | Number of filters: 512. Kernel size: 1 × 1. |
| Conv2d-3 | [−1, 768, 1, 148] | Number of filters: 768. Kernel size: 1 × 3. |
| channel_attention | [−1, 768, 1, 148] | max_pool, avg_pool, ReLU |
| spacial_attention | [−1, 768, 1, 148] | Number of filters: 768. Kernel size: 1 × 7. |
| Encoder Block 1 | | |
| Dropout-6 | [−1, 149, 768] | Gaussian dropout: 0.3 |
| LayerNorm-7 | [−1, 149, 768] | Layer normalization |
| Linear-8 | [−1, 149, 2304] | 1:3 fully connected |
| Dropout-9 | [−1, 12, 149, 149] | Gaussian dropout: 0.3 |
| Linear-10 | [−1, 149, 768] | 3:1 fully connected |
| Dropout-11 | [−1, 149, 768] | Gaussian dropout: 0.3 |
| Attention-12 | [−1, 149, 768] | Multi-head attention |
| LayerNorm-13 | [−1, 149, 768] | Layer normalization |
| Linear-14 | [−1, 149, 3072] | 1:4 fully connected |
| GELU-15 | [−1, 149, 3072] | Activation |
| Dropout-16 | [−1, 149, 3072] | Gaussian dropout: 0.3 |
| Linear-17 | [−1, 149, 768] | 4:1 fully connected |
| Dropout-18 | [−1, 149, 768] | Gaussian dropout: 0.3 |
| MLP-19 | [−1, 149, 768] | Fully connected. Activation: GELU. |
| Encoder Block 2 | | |
| Linear-120 | [−1, 2] | 1538 |
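The channel_attention row of Table 1 (max_pool, avg_pool, ReLU) follows the standard CBAM formulation: global max- and average-pooled channel descriptors pass through a shared two-layer MLP, are summed, and a sigmoid produces one weight per channel. A dependency-free sketch on a toy feature map follows; the tiny sizes and the random weights are illustrative assumptions, not the trained parameters of the paper's model:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp(desc, w1, w2):
    # Shared two-layer MLP: C -> C/r with ReLU, then C/r -> C.
    hidden = [max(0.0, sum(w * d for w, d in zip(row, desc))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def channel_attention(fmap, w1, w2):
    # fmap: list of C channels, each a flat list of spatial values.
    avg_desc = [sum(ch) / len(ch) for ch in fmap]   # global average pooling
    max_desc = [max(ch) for ch in fmap]             # global max pooling
    scores = [a + m for a, m in zip(mlp(avg_desc, w1, w2),
                                    mlp(max_desc, w1, w2))]
    weights = [sigmoid(s) for s in scores]
    # Rescale every channel by its attention weight.
    return [[w * v for v in ch] for w, ch in zip(weights, fmap)]

random.seed(0)
C, r = 4, 2  # toy channel count and reduction ratio
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
fmap = [[random.uniform(0, 1) for _ in range(6)] for _ in range(C)]
out = channel_attention(fmap, w1, w2)
```

Because the sigmoid weights lie in (0, 1), the module never amplifies a channel; it only suppresses less informative ones before the spatial attention and ViT encoder stages.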
Table 2. Influencing factors and level values.

| Level | Detection Distance | Dust Concentration | Share of Coal | Wind Speed |
|---|---|---|---|---|
| 1 | 1.2 m | 200 mg/m³ | All | 3 m/s |
| 2 | 1.5 m | 500 mg/m³ | 2/3 | 6 m/s |
| 3 | 1.8 m | 800 mg/m³ | 1/3 | 9 m/s |
Table 3. Four-factor, three-level orthogonal test table.

| Label | Detection Distance | Dust Concentration | Share of Coal | Wind Speed |
|---|---|---|---|---|
| H1 | 1.2 m | 200 mg/m³ | All | 3 m/s |
| H2 | 1.2 m | 500 mg/m³ | 2/3 | 6 m/s |
| H3 | 1.2 m | 800 mg/m³ | 1/3 | 9 m/s |
| H4 | 1.5 m | 200 mg/m³ | 2/3 | 9 m/s |
| H5 | 1.5 m | 500 mg/m³ | 1/3 | 3 m/s |
| H6 | 1.5 m | 800 mg/m³ | All | 6 m/s |
| H7 | 1.8 m | 200 mg/m³ | 1/3 | 6 m/s |
| H8 | 1.8 m | 500 mg/m³ | All | 9 m/s |
| H9 | 1.8 m | 800 mg/m³ | 2/3 | 3 m/s |
| H0 | 1.5 m | 0 mg/m³ | All | 0 m/s |
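The layout in Table 3 is an L9(3⁴) orthogonal design: across the nine runs H1–H9, each level of each factor appears exactly three times, while H0 is an added dust-free, windless control. The balance property can be checked programmatically, with the factor levels transcribed from the table:

```python
# Runs H1-H9 from Table 3; columns: distance (m), dust (mg/m3), coal share, wind (m/s).
runs = {
    "H1": ("1.2", "200", "All", "3"),
    "H2": ("1.2", "500", "2/3", "6"),
    "H3": ("1.2", "800", "1/3", "9"),
    "H4": ("1.5", "200", "2/3", "9"),
    "H5": ("1.5", "500", "1/3", "3"),
    "H6": ("1.5", "800", "All", "6"),
    "H7": ("1.8", "200", "1/3", "6"),
    "H8": ("1.8", "500", "All", "9"),
    "H9": ("1.8", "800", "2/3", "3"),
}

def level_counts(factor_index):
    # Count how often each level of one factor occurs across the nine runs.
    counts = {}
    for levels in runs.values():
        counts[levels[factor_index]] = counts.get(levels[factor_index], 0) + 1
    return counts

# Balance property of an L9(3^4) orthogonal array: every level appears 3 times,
# so 9 runs cover the 3^4 = 81-cell factor space evenly instead of exhaustively.
assert all(c == 3 for i in range(4) for c in level_counts(i).values())
```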
Table 4. Dataset description.

| Type | Data Dimension | Quantity |
|---|---|---|
| Coal | 1 × 148 | 1200 |
| Gangue | 1 × 148 | 1200 |
Table 5. Dataset visualization.

(Image grid, not reproduced here: Channel A, Channel B, Channel C, and the fused RGB image for coal and gangue samples under working conditions H1–H9 and H0.)
Yang, J.; Zhang, Y.; Wang, K.; Tong, Y.; Liu, J.; Wang, G. Coal–Rock Data Recognition Method Based on Spectral Dimension Transform and CBAM-VIT. Appl. Sci. 2024, 14, 593. https://doi.org/10.3390/app14020593