Article

Promoting Sustainable Development of Coal Mines: CNN Model Optimization for Identification of Microseismic Signals Induced by Hydraulic Fracturing in Coal Seams

1 State Key Laboratory for Fine Exploration and Intelligent Development of Coal Resources, China University of Mining and Technology, Xuzhou 221116, China
2 School of Mines, China University of Mining and Technology, Xuzhou 221116, China
3 School of Safety Engineering, China University of Mining and Technology, Xuzhou 221116, China
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(17), 7592; https://doi.org/10.3390/su16177592
Submission received: 18 July 2024 / Revised: 9 August 2024 / Accepted: 29 August 2024 / Published: 2 September 2024
(This article belongs to the Section Hazards and Sustainability)

Abstract

Borehole hydraulic fracturing in coal seams can prevent dynamic coal mine disasters and promote the sustainability of the mining industry, and microseismic signal recognition is a prerequisite and foundation for microseismic monitoring technology that evaluates the effectiveness of hydraulic fracturing. This study constructed ultra-lightweight CNN models specifically designed to identify microseismic waveforms induced by borehole hydraulic fracturing in coal seams, namely Ul-Inception28, Ul-ResNet12, Ul-MobileNet17, and Ul-TripleConv8. The three best-performing models were selected to create both a probability averaging ensemble CNN model and a voting ensemble CNN model. Additionally, an automatic threshold adjustment strategy for CNN identification was introduced. The relationships between feature map entropy, training data volume, and model performance were also analyzed. The results indicated that our in-house models surpassed the performance of the InceptionV3, ResNet50, and MobileNetV3 models from the TensorFlow Keras library. Notably, the voting ensemble CNN model achieved an improvement of at least 0.0452 in the F1 score compared to individual models. The automatic threshold adjustment strategy enhanced the identification threshold’s precision to 26 decimal places. However, a continuous zero-entropy value in the feature maps of various channels was found to detract from the model’s generalization performance. Moreover, the expanded training dataset, derived from thousands of waveforms, proved more compatible with CNN models comprising hundreds of thousands of parameters. The findings of this research significantly contribute to the prevention of dynamic coal mine disasters, potentially reducing casualties and economic losses and promoting the sustainable progress of the coal mining industry.

1. Introduction

With the increasing depth of coal mining, dynamic coal mine disasters such as rock bursts and coal and gas outbursts are becoming increasingly serious and complex, and have become the major kinds of disasters affecting the safe and efficient mining of deep coal [1,2]. Hydraulic fracturing technology has been widely used in the oil and gas industry and, in recent years, it has also been widely applied in the prevention and control of dynamic coal mine disasters such as rock bursts and coal and gas outbursts in coal mines, achieving good application results [3,4]. Hydraulic fracturing pumps are specially designed to pump fracturing fluid into rock formations at pressures exceeding the fracturing pressure, causing cracks in the rocks and reducing stress concentration while increasing the permeability of oil and gas or coalbed methane [5]. Hydraulic fracturing technology is mainly applied to coal seams in coal mines and to oil and gas reservoirs such as sandstone or limestone in the oil and gas industry. During the process of rock fracturing, energy is released outward in the form of elastic waves, and microseismic waves are a type of elastic wave. Compared to oil and gas reservoirs, the microseismic waveforms induced by coal rock fractures are more complex and difficult to identify [6]. The evaluation of the effects of hydraulic fracturing can optimize the fracturing process and ensure production safety [7]. Rock fractures caused by underground rock bursts, coal and gas outbursts, roof fractures, blasting, hydraulic fracturing, and other causes can all form microseismic waveforms. However, compared to microseismic waveforms formed by other causes, microseismic waveforms formed by hydraulic fracturing are weaker and more difficult to identify. Microseismic/acoustic emission monitoring technology can directly characterize the evolution of a rock mass fracture by obtaining and processing the waveforms induced by the fracture [8].
Quickly and accurately identifying the microseismic signals induced by hydraulic fracturing in a coal seam is a prerequisite and a foundation for the monitoring of microseismic positioning and the evaluation of the effects of hydraulic fracturing in coal seams [9,10].
Traditional waveform recognition methods include the long- and short-time-window ratio (STA/LTA) method [11], methods based on the Akaike information criterion (AIC) [12], and waveform correlation based on waveform similarity [13]. These algorithms can accurately identify high signal-to-noise ratio microseismic waveforms, but they struggle to identify low signal-to-noise ratio microseismic waveforms [14,15,16]. With the development of computers, deep learning technology is becoming increasingly mature. Deep learning makes computers intelligent by simulating the workings of the human brain [17]. In deep learning, “depth” refers to the hidden units in algorithms, which are analogous to neurons in the brain. Although they do not directly participate in inputs and outputs, they are crucial for processing complex tasks [18]. Convolutional neural networks are an important type of deep learning neural network, as their convolutional and pooling operations perform well in image tasks. Classic CNN architectures include LeNet [19], AlexNet [20], VGG [21], Inception [22], ResNet [23], MobileNet [24], etc. In recent years, CNNs have played an important role in image recognition, speech recognition, and autonomous driving, and are still developing rapidly [25,26].
Various CNN models have been widely used in microseismic waveform identification in earthquake monitoring, hydraulic fracturing in oil and gas monitoring, rock burst monitoring, and pressure bump disaster monitoring in mining, and have achieved good results [27,28,29,30]. The signal-to-noise ratio of microseismic signals generated by earthquakes, pressure bump disasters, and rock bursts is high, and the pulse factor of microseismic waveforms monitored by hydraulic fracturing in oil and gas is larger, making these microseismic waveforms easier to identify. However, the microseismic signals induced by hydraulic fracturing in coal seams are more complex and weaker, making identification more difficult. At present, most CNN-based microseismic identification methods use a single CNN model [31,32], and there is little research on integrating and optimizing multiple CNN models. In theory, deeper and more complex models have stronger data processing capabilities, but more complex models also require more training data and better computer performance [33], which limits the development and application of CNN models. However, a CNN model that is too simple may fail to extract important features from images, thus failing to achieve more accurate classification results. Determining how to achieve better identification results by combining a lightweight CNN model with less training data, rather than blindly increasing the number of CNN model layers, is an important task. Model performance evaluation can intuitively reflect the identification and classification performance of the model. In addition, model parameters and training data can be adjusted based on the performance evaluation results to obtain a model with stronger generalization performance [34].
The commonly used performance evaluation indicators for CNN identification and classification include training set accuracy, test set accuracy, training set loss function, test set loss function, detection precision, detection recall, detection F1 score, etc. [35,36,37]. The identification of microseismic waveform images by a CNN is essentially the identification of image pixels. The image is transformed from the model input layer, through intermediate hidden layers (convolutional layers, pooling layers, ReLU layers, etc.), into feature maps, and the class probabilities are finally output from the model output layer [38,39]. The size, depth, pixel values, and other parameters of the feature map change after passing through each layer in the model [40]. Therefore, the relevant parameters of feature map pixels can not only deeply explain the effectiveness of the model in extracting features from feature maps, but also reflect the performance of the model [41]. However, there are few studies that evaluate models from the perspective of feature map pixels.
Therefore, we built the ultra-lightweight Ul-Inception28, Ul-ResNet12, Ul-MobileNet17, and Ul-TripleConv8 models, suitable for the identification of microseismic waveforms induced by hydraulic fracturing, and made use of the InceptionV3, ResNet50, and MobileNetV3 models from the TensorFlow library in Python. We selected the three best-performing models from the above seven CNN models to form the probability-averaged ensemble CNN model and the voting ensemble CNN model, put forward an automatic adjustment strategy for the CNN identification threshold, performed an in-depth analysis of the changes in feature maps during transmission, and explained why different CNN models have different identification effects on microseismic waveforms. This study has reference significance for identifying low signal-to-noise ratio microseismic signals such as those induced by hydraulic fracturing in coal seams and coal mine dynamic disasters.

2. Methods

The initial learning rate for all CNN models was set to 0.0005, and it was halved whenever the test accuracy did not improve for three consecutive iterations. The bias value was set to 0, and the batch size was 128. When the number of channels containing microseismic waveforms within the same time window exceeded 3, an event was considered to be present within that time window.
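The halve-on-plateau learning rate rule described above can be sketched as a small scheduler class; the class name and structure here are illustrative, not the authors' implementation, while the initial rate (0.0005) and patience (3 iterations) follow the text:

```python
class HalveOnPlateau:
    """Halve the learning rate after 3 consecutive iterations without
    test-accuracy improvement (hyperparameters from Section 2)."""

    def __init__(self, lr=0.0005, patience=3):
        self.lr = lr
        self.patience = patience
        self.best = -float("inf")  # best test accuracy seen so far
        self.wait = 0              # iterations without improvement

    def step(self, test_acc):
        if test_acc > self.best:
            self.best, self.wait = test_acc, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr /= 2
                self.wait = 0
        return self.lr
```

Feeding accuracies 0.90, 0.89, 0.89, 0.89 keeps the rate at 0.0005 for three calls and halves it to 0.00025 on the fourth.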

2.1. Different Convolution Methods

As illustrated in Figure 1, this paper incorporates three distinct CNN convolution techniques: standard convolution, depthwise convolution, and pointwise convolution. Standard convolution applies each convolutional kernel to all input channels, with the results summed to produce a single output channel; when multiple convolutional kernels are used, each kernel generates its own independent output channel. Conversely, in depthwise convolution, a separate kernel slides across the height and width of each input channel, performing a weighted summation of the local neighborhood at every position. This method is predominantly employed to extract local image features, such as edges and textures. In pointwise convolution, the kernel typically measures 1 × 1, meaning that it operates exclusively on the channel dimension, disregarding spatial dimensions. Pointwise convolution is frequently used to adjust the number of channels in the input data. Furthermore, it facilitates the fusion of information from various channels, thereby augmenting the expressive capability of features. As the kernel focuses solely on inter-channel correlations, the parameter count and computational expense of pointwise convolution remain relatively low.
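The parameter savings of depthwise plus pointwise convolution over standard convolution can be illustrated by counting weights directly; the helper functions and the example sizes (3 × 3 kernel, 16 input channels, 32 output channels) are illustrative assumptions, not values from this paper:

```python
def standard_conv_params(k, c_in, c_out):
    # each of the c_out kernels spans all c_in input channels, plus biases
    return k * k * c_in * c_out + c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in + c_in            # one k x k kernel per channel
    pointwise = 1 * 1 * c_in * c_out + c_out   # 1 x 1 kernels mix channels
    return depthwise + pointwise

print(standard_conv_params(3, 16, 32))         # 4640
print(depthwise_separable_params(3, 16, 32))   # 704
```

For this configuration, the depthwise-separable pair needs roughly 15% of the parameters of the standard convolution, which is why such layers dominate lightweight architectures like MobileNet.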

2.2. Training Data Acquisition

The acquisition of effective waveform training datasets follows these steps:
(1)
The STA/LTA method is employed to extract microseismic events with high signal-to-noise ratios from continuous microseismic data (in SGY format). A total of 2449 events were obtained using this method. The long-term window for STA/LTA is set to 200 sampling points, the short-term window to 60 sampling points, and the threshold to 2.8. An event is determined to exist within a time window when more than three channels contain effective waveforms within the same time window. The time window is set to 4000 sampling points. The high threshold setting is to ensure that the events obtained are true microseismic events as much as possible, thereby ensuring the accuracy of the training dataset.
(2)
Expert identification is used to select real microseismic events and background noise from the dataset obtained in step (1), with the background noise being placed in the background noise training set. This method selected a total of 1545 real events.
(3)
The STA/LTA method is used to select waveforms from the microseismic events obtained in step (2) for the effective waveform training set. This method acquired 9993 microseismic waveforms. The long-term window for STA/LTA is set to 200 sampling points, the short-term window to 60 sampling points, and the threshold to 2.5. The reason for setting the threshold to 2.5 is that when the ratio of the short-term window to the long-term window is 2.5, the waveform is already relatively weak. When this ratio is below 2.5, it becomes difficult to determine whether the waveform represents a weak microseismic waveform or background noise. Compared to step (1), the STA/LTA threshold is lowered in this step because the event has already been confirmed as a real microseismic event, and the microseismic waveforms within the event are genuine. Therefore, the selection threshold can be appropriately reduced. In addition, since not all channels in the event contain effective waveforms, a threshold is set to ensure that the training data in the effective waveform training set differs from the background noise to some extent.
(4)
Based on the microseismic waveforms obtained in step (3), repeated noise addition is performed to obtain a total of 809,433 microseismic waveforms. “Repeated” means that noise is added to the original waveform 1 to obtain waveform 2, then the same noise is added to waveform 2 to obtain waveform 3, and this process is repeated until waveform n is obtained. The number of noise additions was 80.
(5)
The STA/LTA method is used to detect all waveforms obtained in step (4). The STA/LTA threshold is set to 2.4, and waveforms with a signal-to-noise ratio above 2.4 are considered effective waveforms. The reason for setting the threshold to 2.4 is that it has already been determined that these waveforms are real, even if they are relatively weak. Together with the original waveform data, a total of 489,496 microseismic waveforms were obtained.
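The cumulative noise addition described in step (4) can be sketched as follows; the function name, the synthetic waveform, and the noise amplitude are illustrative, while the 80 noise additions follow the text:

```python
import numpy as np

def repeated_noise_augment(waveform, noise, n_copies=80):
    """Cumulatively add the same noise vector: copy k equals the
    original waveform plus k times the noise (step (4) scheme)."""
    out = []
    current = waveform.astype(float).copy()
    for _ in range(n_copies):
        current = current + noise
        out.append(current.copy())
    return out

rng = np.random.default_rng(1)
wave = np.sin(np.linspace(0, 20, 4000))       # stand-in for a real waveform
noise = rng.normal(0, 0.05, 4000)             # stand-in for recorded noise
copies = repeated_noise_augment(wave, noise)  # 80 progressively noisier copies
```

Because the noise accumulates, later copies have progressively lower signal-to-noise ratios, which is why step (5) re-screens them with STA/LTA before admitting them to the training set.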
The acquisition of background noise training datasets follows these steps:
(1)
The STA/LTA method is used to extract background noise waveform segments from continuous microseismic data (in SGY format). When the ratio of the short-term window to the long-term window is less than 2.2 for all channels within the same time window, the waveform segment is considered background noise. This method acquired a total of 433,580 background noise training data points. Together with the background noise obtained in step (2) of the effective waveform training dataset acquisition, a total of 434,392 background noise data points were obtained. The long-term window for STA/LTA is set to 200 sampling points, the short-term window to 60 sampling points, and the threshold to 2.2.
(2)
The expert checked the background noise training data obtained in step (1) and did not find any misidentification data. A total of 434,392 background noise training data points were obtained through this method.
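A minimal STA/LTA detector along the lines used in both acquisition procedures can be sketched as below. The window lengths (short 60, long 200 sampling points) and the event threshold 2.8 follow the text; the energy-based ratio formulation, the synthetic trace, and the arrival amplitude are illustrative assumptions:

```python
import numpy as np

def sta_lta_ratio(trace, short_win=60, long_win=200):
    """Ratio of short-term to long-term average energy at each sample."""
    energy = trace.astype(float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    ratio = np.zeros(len(trace))
    for i in range(long_win, len(trace)):
        sta = (csum[i] - csum[i - short_win]) / short_win
        lta = (csum[i] - csum[i - long_win]) / long_win
        ratio[i] = sta / lta if lta > 0 else 0.0
    return ratio

rng = np.random.default_rng(0)
noise_only = rng.normal(0, 1, 4000)           # pure background noise
trace = noise_only.copy()
trace[2000:2200] += rng.normal(0, 8, 200)     # synthetic strong arrival

detected = sta_lta_ratio(trace).max() > 2.8   # triggers on the arrival
quiet = sta_lta_ratio(noise_only).max() > 2.8 # stays below threshold
```

Lowering the threshold (2.5, 2.4, or 2.2 in the steps above) trades a higher detection rate against admitting weaker, more ambiguous waveforms, which is exactly the trade-off the acquisition procedure manages.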

2.3. Ultra-Light CNN Models

This study uses Python version 3.7 for model building and data processing. Python is a free and open-source high-level programming language with a concise syntax and rich library support, enabling developers to efficiently write clear code. The Inception, MobileNet, and ResNet architectures are three significant structures within convolutional neural networks (CNNs). The following section outlines the characteristics of these three CNN models.
The primary features of Inception-style models include:
(1)
Decomposition of larger convolutional kernels into smaller ones in series. For instance, a 5 × 5 kernel is decomposed into two 3 × 3 kernels, a 7 × 7 kernel into a 1 × 7 and a 7 × 1 kernel, and a 3 × 3 kernel into a 1 × 3 and a 3 × 1 kernel. This approach benefits from a decreased parameter count, ultimately accelerating training.
(2)
Pooling before convolution loses more information and weakens the model’s feature extraction ability, while convolution before pooling requires a large amount of computation; pooling and convolution should therefore be parallelized. This model structure uses two parallel modules with stride 2, replacing the serial arrangement with a parallel one.
The main innovation of the ResNet model lies in the introduction of “Residual Blocks”. Traditional neural networks add new transformations atop each preceding layer, while ResNet retains the original input from the preceding layer alongside these new transformations, termed “residuals”. This design enables the network to better learn differences between input and output rather than the output itself, enhancing model performance.
MobileNet architectures are distinguished by:
(1)
The MobileNet architecture introduces the SE (Squeeze-and-Excitation) module and the h-swish activation function. The SE module enhances channel interaction and contextual information transmission in the network by adaptively learning channel weights, thereby improving the recognition ability of feature maps. The h-swish activation function is a new type of nonlinear activation function that can better enhance the nonlinear expression ability of the model.
(2)
The MobileNet structural model adopts an inverted residual structure, which introduces a bottleneck layer between input and output to reduce computational complexity and memory usage. At the same time, by using the ReLU6 nonlinear activation function in the bottleneck layer, the nonlinear expression ability of the model is further enhanced.
Given the advantages of the three models mentioned above, this article builds CNN models based on their structures. In this paper, the InceptionV3, ResNet50, and MobileNetV3 models are invoked from the TensorFlow Keras library. Post-training tests revealed unsatisfactory performance for these models, attributed to a potentially insufficient training dataset relative to the models’ extensive parameter counts. Consequently, ultra-lightweight models—Ul-Inception28, Ul-ResNet12, Ul-MobileNet17, and Ul-TripleConv8—tailored for small datasets of hydraulic fracturing microseismic waveforms were constructed, with the final digits in each model name indicating the number of convolutional layers. Ul-TripleConv8 simply stacks multiple convolutional layers, each accompanied by a BatchNormalization layer and an activation function layer. The model structures are illustrated in Figure 2, Figure 3, Figure 4, and Figure 5, respectively.
The total number of parameters, trainable parameters, and non-trainable parameters for the models InceptionV3, ResNet50, MobileNetV3, Ul-Inception28, Ul-ResNet12, Ul-MobileNet17, and Ul-TripleConv8 are presented in Table 1. The total number of parameters for InceptionV3 and ResNet50 reaches the tens of millions, while MobileNetV3 has millions of parameters. For Ul-Inception28, Ul-ResNet12, Ul-MobileNet17, and Ul-TripleConv8, the total number of parameters is in the hundreds of thousands. This is why the CNN models developed in this paper are referred to as ultra-lightweight.

2.4. Threshold Adjustment Strategies and Ultra-Lightweight CNN Integration Method

Figure 6 is a flowchart of the automatic threshold adjustment strategy. When the probability of a waveform being identified as a microseismic waveform by the model exceeds the threshold Q, the data are determined to be a microseismic waveform; otherwise, it is considered background noise. Initially, the identification threshold Q is set to 0.5, with a threshold variation amplitude of q = 0.5. STA/LTA and expert identification methods are then employed to identify m microseismic events from a segment of waveform data. Subsequently, the CNN model is used to identify the same batch of waveform data and outputs n microseismic events. If the absolute value of m − n is not less than 2, q is reduced to half of its original value. Next, the values of m and n are compared. If m is less than n, the threshold Q is increased by q (Q = Q + q); otherwise, Q is decreased by q (Q = Q − q). The updated CNN model with the new threshold is then used to reidentify the microseismic data. This process continues until the absolute value of m − n is less than 2, at which point the optimal threshold O is obtained and the loop terminates.
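The Figure 6 loop amounts to a bisection-style search over the identification threshold. A minimal sketch follows, assuming the model's per-candidate probabilities are available as an array so that the event count n can be recomputed at each candidate threshold; the function name and the uniform test data are illustrative:

```python
import numpy as np

def auto_tune_threshold(probs, m, max_iter=100):
    """Search for a threshold Q such that the model's event count n
    is within 1 of the reference count m, halving the step q each
    iteration as in the Figure 6 flowchart."""
    Q, q = 0.5, 0.5
    for _ in range(max_iter):
        n = int(np.sum(probs > Q))
        if abs(m - n) < 2:
            return Q
        q /= 2.0
        if m < n:
            Q += q   # too many detections -> raise the threshold
        else:
            Q -= q   # too few detections -> lower the threshold
    return Q
```

Because q halves on every pass, the threshold converges geometrically, which is how precisions of many decimal places (26 in this study) become reachable in a few dozen iterations.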
Figure 7 is a flowchart of microseismic waveform identification using the probability-averaged ensemble model. The trained Ul-Inception28, Ul-ResNet12, and Ul-TripleConv8 models are used to identify waveform segments, outputting probabilities PA, PB, and PC that the waveform is a microseismic waveform. Based on this, the adjusted output probabilities PWA, PWB, and PWC of each model are calculated by combining the optimal thresholds OA, OB, and OC. The purpose of doing so is to uniformly adjust the probability thresholds of microseismic waveforms and background noise back to 50%. The average probability is denoted as PW. The calculation methods for PWA, PWB, PWC, and PW are shown in Formulas (1), (2), (3), and (4), respectively. The discrimination probability of the probability-averaged ensemble CNN model is set to 50%. If PW is greater than 50%, the waveform data are considered a microseismic waveform; otherwise, it is classified as background noise.
PWA = (PA / OA) × 0.5
PWB = (PB / OB) × 0.5
PWC = (PC / OC) × 0.5
PW = (PWA + PWB + PWC) / 3
Figure 8 is the flowchart of identifying microseismic waveforms using a voting ensemble CNN model. Trained Ul-Inception28, Ul-ResNet12, and Ul-TripleConv8 models are used to identify waveform segments. If at least two of these CNN models judge the waveform to be background noise, it is deemed as such. Conversely, if fewer than two CNN models classify the waveform segment as background noise, it is considered a microseismic waveform.
The automatic threshold adjustment strategy described in this article can be combined with the probability-averaged ensemble model and the voting ensemble model to improve the recognition accuracy of microseismic waveforms. This method is applicable to microseismic waveforms formed by rock fractures of any cause. Firstly, appropriate thresholds for each algorithm are obtained through the automatic threshold adjustment strategy, and then the algorithms are integrated together through probability averaging or voting to obtain an ensemble algorithm with better recognition performance. Ensemble algorithms can be composed of any number of algorithms, not just the three mentioned in this article. The automatic threshold adjustment strategy can be applied to any waveform recognition algorithm that requires setting a threshold, including traditional algorithms and machine learning algorithms. The probability-averaged ensemble model can be applied to any algorithm that classifies based on probability, while the voting ensemble model can be applied to all algorithms.
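The two ensemble rules can be sketched together: Formulas (1)-(3) rescale each model's probability so its tuned threshold maps to 0.5, Formula (4) averages them, and the voting rule counts noise votes. The input probabilities and thresholds in the usage lines are made-up values for illustration:

```python
import numpy as np

def prob_average_ensemble(probs, optimal_thresholds):
    """Probability-averaged ensemble: rescale each probability by its
    tuned threshold (Formulas (1)-(3)), average (Formula (4)), and
    compare against the 50% discrimination probability."""
    adjusted = [(p / o) * 0.5 for p, o in zip(probs, optimal_thresholds)]
    pw = float(np.mean(adjusted))
    return pw > 0.5  # True -> microseismic waveform

def voting_ensemble(noise_votes):
    """Voting ensemble: background noise iff at least two of the
    models vote for background noise."""
    return sum(noise_votes) < 2  # True -> microseismic waveform
```

For example, with probabilities [0.8, 0.6, 0.7] and all three thresholds at 0.5, the adjusted average is 0.7, so the segment is classified as a microseismic waveform; two noise votes out of three classify a segment as background noise.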

3. Results

3.1. Training and Testing Results

The number of iterations for training all models is 50 (training all the data in the training set once is called 1 iteration), and after each iteration of training, all the data in the test set are tested once. Figure 9 shows the accuracy and loss function of the training set, as well as the accuracy and loss function of the testing set, in the iteration with the highest testing accuracy during the training process of each model. The training dataset contains 160,000 pieces of training data, including 80,000 effective waveforms and 80,000 instances of background noise. The test dataset contains 40,000 test data points, including 20,000 valid waveforms and 20,000 instances of background noise. The data in the training and testing sets do not intersect, and there is no intersection between the detection dataset and the training and testing sets. The InceptionV3, ResNet50, and MobileNetV3 models demonstrated training and testing accuracy rates below 97%, with loss functions exceeding 0.1. Conversely, the Ul-Inception28, Ul-ResNet12, Ul-MobileNet17, and Ul-TripleConv8 models demonstrated training and testing accuracy rates exceeding 99%. Their training set loss functions were all less than or equal to 0.01, while their test set loss functions were all below 0.1. The Ul-MobileNet17 model achieved a training set accuracy of 99.6% and a peak test set accuracy of 99.1%; among the four self-built models, it had the lowest accuracy and the highest loss function, with a training set loss of 0.01 and a test set loss of 0.03. Overall, the Ul-Inception28, Ul-ResNet12, Ul-MobileNet17, and Ul-TripleConv8 models outperformed the InceptionV3, ResNet50, and MobileNetV3 models.
Figure 10 depicts the automatic threshold adjustment process for Ul-MobileNet17. As iterations progress, the variation in the threshold gradually decreases, with the threshold itself steadily decreasing and approaching the optimal value. Consequently, the error between the actual number of events and the number of events identified by the model also diminishes. At the 40th iteration, the quantity error becomes zero, denoting perfect alignment, and the optimal threshold is determined to be 2.72848410531878 × 10−12. Thanks to this automatic threshold adjustment strategy, the threshold precision reached up to 26 decimal places.
Due to the relatively poor performance of the InceptionV3, ResNet50, and MobileNetV3 models from the TensorFlow library, we do not analyze them in the results section. Instead, we focus on evaluating the performance of the Ul-Inception28, Ul-ResNet12, Ul-MobileNet17, and Ul-TripleConv8 models using metrics such as precision (Pe), recall (Re), and F1 score. The calculation formulas for these three metrics are presented in Equations (5), (6), and (7), respectively.
Pe = Tp / (Tp + Fp)
Re = Tp / (Tp + Fn)
F1 = (2 × Pe × Re) / (Pe + Re)
Here, Tp represents true positives, which means the microseismic events identified by the CNN model are real microseismic events, whereas Fp stands for false positives, denoting that the identified microseismic events are not real ones. Tn denotes true negatives, denoting that the background noise identified by the CNN model is real background noise, while Fn signifies false negatives, implying that the identified background noise is not real background noise. A high precision rate denotes a low false identification rate, whereas a high recall rate denotes a low rate of missed identification by the CNN model. The F1 score is a metric used to assess the accuracy of binary classification models. It combines the strengths of both precision and recall, reflecting both the classification effectiveness and the completeness of the model’s identification.
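Equations (5)-(7) translate directly into code; the counts in the usage line are made-up values for illustration, not results from Table 2:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision (Pe), recall (Re), and F1 score per Equations (5)-(7)."""
    pe = tp / (tp + fp)             # low false-identification rate when high
    re = tp / (tp + fn)             # low missed-identification rate when high
    f1 = (2 * pe * re) / (pe + re)  # harmonic mean of Pe and Re
    return pe, re, f1

pe, re, f1 = precision_recall_f1(tp=90, fp=10, fn=30)  # pe = 0.9, re = 0.75
```

Because F1 is the harmonic mean, it is pulled toward the weaker of the two rates, which is why it is a stricter summary than either precision or recall alone.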
Table 2 shows the event identification results of different CNN models. The F1 scores of Ul-Inception28, Ul-ResNet12, Ul-MobileNet17, and Ul-TripleConv8 are all superior to the STA/LTA method. Of the four CNN models built in-house, Ul-MobileNet17 exhibits the least effective performance. The voting ensemble model, constructed based on the Ul-Inception28, Ul-ResNet12, and Ul-TripleConv8 methods, outperforms any individual CNN model in terms of Tp, Fp, Pe, Re, and F1. Compared to the F1 scores of Ul-Inception28, Ul-ResNet12, and Ul-TripleConv8, the voting ensemble CNN model achieves improvements of 0.0521, 0.0521, and 0.0452, respectively. The ratios of the F1 values of the Ul-Inception28, Ul-ResNet12, Ul-MobileNet17, and Ul-TripleConv8 models for identifying microseismic events to their total parameter quantities are 1.09 × 10−6, 2.79 × 10−6, 3.85 × 10−6, and 1.92 × 10−6, respectively. By this measure, the Ul-MobileNet17 model achieves the best identification performance per parameter. However, of these four models, Ul-MobileNet17 has the worst absolute recognition performance, with an F1 value of approximately 0.827. Following analysis, it is believed that the Ul-MobileNet17 model has too few total parameters (214,722), which leads to insufficient feature extraction ability and poor recognition performance.
Microseismic event 1 in Figure 11 was identified by the voting ensemble CNN, but it was not identified by Ul-Inception28 or Ul-TripleConv8. Microseismic event 2 in Figure 11 was also identified by the voting ensemble CNN, yet Ul-ResNet12 failed to identify it. The blue box represents microseismic events, and the orange box represents the amplified waveform of a certain channel within the microseismic event. The identification of microseismic events is actually the identification of microseismic waveforms, and the challenge lies in identifying weak waveforms. In Figure 11a, channels 4, 6, 7, and 8 contain microseismic waveforms, but the waveforms in channels 4 and 8 are relatively weak and difficult to identify. Ul-Inception28 identified waveforms in channels 4, 6, and 7, while Ul-TripleConv8 identified waveforms in channels 6, 7, and 8. Both models failed to meet the event criterion for at least four channels containing microseismic waveforms. However, Ul-ResNet12 successfully identified waveforms in all four channels. Based on the voting ensemble principle, the voting ensemble CNN model identified waveforms in all four channels, thereby identifying the event. In Figure 11b, channels 4, 6, 7, 8, and 10 contain microseismic waveforms. Once again, the waveforms in channels 4 and 10 are relatively weak and challenging to identify. Both Ul-Inception28 and Ul-TripleConv8 identified waveforms in channels 4, 6, 7, 8, and 10, satisfying the event criterion of having at least four channels with microseismic waveforms. Meanwhile, Ul-ResNet12 only identified waveforms in channels 6, 7, and 8, falling short of the event criterion. The voting ensemble CNN model identified waveforms in all five channels, successfully identifying the event. 
It is worth noting that, in the final segment of the time window in Figure 11b, only channels 6 and 7 contain microseismic waveforms, which does not meet the event criterion of at least four channels with waveforms; thus, it was not classified as an event. Compared to a single CNN model, the ensemble ultra-lightweight CNN model detected more weak waveforms, so the F1 value of its recognition results is also higher.

3.2. Transmission Law of Feature Maps

Based on the trained Ul-Inception28, Ul-ResNet12, Ul-MobileNet17, and Ul-TripleConv8 models, three types of waveforms were input: background noise, a weak microseismic waveform, and a high signal-to-noise ratio microseismic waveform. These three waveforms are illustrated in Figure 12. We analyzed the feature maps and their entropy values across different models, convolutional layers, and channels.
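One common way to compute such a feature map entropy is the Shannon entropy of the map's pixel-value histogram; the sketch below is a generic formulation under that assumption (the bin count and array sizes are illustrative, not from this paper). Note that a constant, e.g. all-zero, channel yields exactly zero entropy, the degenerate case discussed in the Abstract:

```python
import numpy as np

def feature_map_entropy(fmap, bins=256):
    """Shannon entropy (in bits) of a feature map's pixel-value histogram.
    A constant map concentrates all mass in one bin -> zero entropy."""
    hist, _ = np.histogram(fmap, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                 # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())

dead = np.zeros((14, 14))                              # all-zero channel
rich = np.random.default_rng(3).normal(size=(14, 14))  # varied channel
```

Under this definition, channels whose entropy sits at zero carry no distinguishing information forward, consistent with the observation that continuous zero-entropy channels detract from generalization performance.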
Figure 13a,b present the feature maps corresponding to the first channel of the initial convolutional layer in Ul-Inception28, with Figure 12a,c serving as input images, respectively. Various locations on the feature map correspond to distinct pixel values, which are represented by different colors. Remarkably, the feature map of the aforementioned channel bears a significant resemblance to the time-domain image. Conventionally, the initial convolutional layer is responsible for learning and extracting fundamental features from the image, including edges, colors, and textures. These characteristics play a pivotal role in the subsequent identification and comprehension of the image. The convolutional kernel parameters, obtained through rigorous training, signify optimized outcomes for the entire training dataset. Consequently, when handling diverse input images, these convolutional kernels efficiently extract pertinent features, showcasing their efficacy. This suggests that the extracted feature maps exhibit variations based on different inputs, highlighting the distinctive nature of the input data.
Figure 14, Figure 15, Figure 16 and Figure 17 illustrate the feature maps corresponding to the first channel of various convolutional layers within the Ul-Inception28, Ul-ResNet12, Ul-MobileNet17, and Ul-TripleConv8 models, respectively. These feature maps were generated using Figure 12b as the input data. As we progress deeper into the layers, the feature maps diminish in size, and the visual representations become increasingly abstract. This trend is inherent to the design of convolutional neural networks (CNNs), which often consist of multiple convolutional layers, pooling layers, and other components. Each successive layer further processes and transforms the input information. The shrinking size of the feature maps indicates a gradual shift from extracting detailed, low-level features (e.g., edges, colors, and textures) to more abstract, high-level representations (such as shapes, object components, and conceptual understanding). Due to their unique architectures, different models capture distinct aspects of the image, leading to a wide range of pixel values reflected in various positions on the feature maps.
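The shrinking of feature maps from layer to layer can be reproduced with a minimal single-channel "valid" convolution. This is a sketch, not the models' actual layers; real CNN layers also use padding, strides, and many channels:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2D convolution (really cross-correlation,
    as in CNN frameworks): the output feature map shrinks by
    kernel_size - 1 in each dimension."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A 3x3 edge-like kernel applied twice: 28x28 -> 26x26 -> 24x24,
# mirroring how deeper layers yield smaller, more abstract maps.
img = np.random.default_rng(0).normal(size=(28, 28))
k = np.array([[1., 0., -1.], [1., 0., -1.], [1., 0., -1.]])
fm1 = conv2d_valid(img, k)
fm2 = conv2d_valid(fm1, k)
print(fm1.shape, fm2.shape)  # (26, 26) (24, 24)
```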
Assembling multiple CNN models means that each model can learn and understand the data from diverse perspectives and levels, thereby extracting distinct features of the input. This diversity enhances the ensemble's capacity to process complex data, improving identification accuracy and robustness. However, the ensemble CNN model based on probability averaging requires an optimal threshold for its averaged output. Although the optimal threshold obtained through the threshold adjustment strategy is very close to the ideal value, a certain error margin still exists. This error increases the probability-averaging error, ultimately compromising the effectiveness of the ensemble CNN model.
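A minimal sketch of probability averaging plus the error-driven threshold adjustment follows. The step-halving scheme is our assumption about how the threshold variation "gradually decreases"; the function names and probabilities are illustrative, not the paper's:

```python
import numpy as np

def average_ensemble(prob_lists, threshold):
    """Probability-averaging ensemble: the per-model waveform
    probabilities are averaged, then compared to the tuned threshold."""
    mean_p = np.mean(prob_lists, axis=0)
    return mean_p >= threshold

def tune_threshold(probs, actual_count, iters=100):
    """Sketch of the automatic threshold adjustment strategy: move the
    threshold according to the sign of the error between the identified
    and actual counts, halving the step each iteration so the threshold
    variation gradually decreases."""
    t, step = 0.5, 0.25
    for _ in range(iters):
        identified = int(np.count_nonzero(probs >= t))
        if identified > actual_count:
            t += step            # too many detections: raise threshold
        elif identified < actual_count:
            t -= step            # too few: lower threshold
        else:
            break
        step /= 2
    return t

# Hypothetical averaged probabilities for five waveform windows,
# two of which are true microseismic waveforms.
avg_p = np.mean([[0.9, 0.8, 0.7, 0.3, 0.2],
                 [0.9, 0.8, 0.7, 0.3, 0.2]], axis=0)
t = tune_threshold(avg_p, actual_count=2)
print(t)  # 0.75: the threshold settles between the 2nd and 3rd probabilities
```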
Figure 18 displays the feature maps from various channels of the final convolutional layer in Ul-MobileNet17. Evidently, the color brightness differs among the feature maps of different channels. In convolutional neural networks, each channel's convolution kernel is tasked with acquiring and isolating distinct features from the input data. Variations in color brightness signify diverse features captured by the convolutional layer. The prominent sections of the feature maps generally denote intense responses during convolution, indicating that the network has pinpointed specific patterns or features within those regions.
Figure 19 illustrates feature maps from various channels of the final convolutional layer in MobileNetV3. Notably, each channel displays a uniform pixel value, consistent across all channels. This situation indicates that these channel feature maps are unable to convey significant information to the following convolutional layers, possibly resulting in the loss of valuable data. Furthermore, these channels essentially turn redundant, and the weights of the associated convolutional kernels may not undergo effective updates, ultimately compromising the model’s performance.
The entropy of the feature maps from each channel of every convolutional layer in the various CNN models was calculated using Formula (8). A higher entropy signifies a more uniform distribution of pixel values, with no dominant values: the model considers multiple possibilities when extracting features and retains greater uncertainty. High entropy can be beneficial because it makes the model more robust to variations in the input data. Conversely, lower entropy indicates a more concentrated distribution of pixel values in the feature map, suggesting that the model has strong confidence in certain features and can identify them clearly. However, excessively low entropy can lead to overconfidence, causing the model to disregard other possibilities and perform poorly on new, unseen data.
H(X) = −Σᵢ₌₁ⁿ pᵢ ln(pᵢ)  (8)
When the feature maps of every channel in a particular convolutional layer of a CNN all have a size of 1 × 1, it only denotes that the entropy of each individual channel’s feature map in that layer is 0. However, this does not necessarily mean that the entropy of the entire convolutional layer’s feature maps is 0. As long as the pixel values of the feature maps in different channels of that layer vary, the entropy of the convolutional layer may not be zero. Additionally, even if a channel’s feature map size is not 1 × 1, but all pixel values on it are the same, the entropy of that channel’s feature map will still be zero. An entropy value of zero for feature maps in one convolutional layer of a CNN model does not imply that subsequent layers will also have zero-entropy feature maps. Due to complex interactions between layers, weight updates, the influence of activation functions, and network design choices, such as skip connections and batch normalization, subsequent layers can potentially regain non-zero-entropy values for their feature maps.
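Formula (8) and the edge cases above can be checked with a short histogram-based estimator. The binning into 256 discrete levels is our implementation choice; the paper does not specify how pᵢ is estimated:

```python
import numpy as np

def feature_map_entropy(fmap, bins=256):
    """Shannon entropy of a feature map's pixel-value distribution,
    as in Formula (8): H(X) = -sum_i p_i ln(p_i)."""
    values = np.asarray(fmap, dtype=float).ravel()
    hist, _ = np.histogram(values, bins=bins)
    p = hist[hist > 0] / values.size     # empirical probabilities p_i
    return float(-(p * np.log(p)).sum())

# Edge cases discussed in the text:
print(feature_map_entropy(np.array([[3.7]])))      # 0.0: a 1x1 map
print(feature_map_entropy(np.full((8, 8), 2.0)))   # 0.0: constant-valued map
rng = np.random.default_rng(1)
print(feature_map_entropy(rng.normal(size=(32, 32))) > 0)  # True: varied map
```

The first two calls illustrate the point in the text: both a 1 × 1 feature map and a larger map with identical pixel values yield zero entropy.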
Figure 20 presents the entropy values of feature maps derived from various channels across multiple convolutional layers in different models. The entropy values are organized sequentially, beginning with the entropy of the first channel’s feature map in the initial convolutional layer, progressing to the entropy of the second channel’s feature map in the same layer, and continuing this pattern until reaching the entropy of the final channel’s feature map in the last convolutional layer. In the InceptionV3, ResNet50, and MobileNetV3 models, numerous instances are observed where consecutive feature maps from different convolutional layers and channels yield an entropy of 0. Such situations hinder the transmission of feature maps, ultimately leading to compromised model performance. The Ul-MobileNet17 model encounters a single instance of consecutive zero-entropy values spanning different convolutional layers and channels, whereas the Ul-Inception28, Ul-ResNet12, and Ul-TripleConv8 models do not encounter this problem. Among the Ul-Inception28, Ul-ResNet12, Ul-TripleConv8, and Ul-MobileNet17 models, Ul-MobileNet17 demonstrates the least satisfactory performance. In the Ul-TripleConv8 model, the entropy values of feature maps from different convolutional layers and channels initially increase and then stabilize. This trend indicates that diversity and information are preserved during feature map transmission, contributing to its superior performance compared to the Ul-Inception28, Ul-ResNet12, and Ul-MobileNet17 models. Conversely, the Ul-Inception28 and Ul-ResNet12 models show fluctuating entropy values across different convolutional layers and channels, suggesting less optimal performance relative to Ul-TripleConv8, albeit with a marginal difference. Clearly, consecutive zero-entropy values across various convolutional layers and channels significantly impact model performance.
Based on the entropy changes during feature map transmission in each CNN model, quadratic formulas were fitted for InceptionV3, ResNet50, MobileNetV3, Ul-TripleConv8, Ul-ResNet12, Ul-MobileNet17, and Ul-Inception28, corresponding to Formulas (9) to (15), respectively. The formulas show that, as the layer index increases, the entropy of the feature maps tends to decrease.
y = 4.17 × 10⁻⁹x² − 1.48 × 10⁻⁴x + 1.26  (9)
y = 3.86 × 10⁻¹⁰x² + 3.35 × 10⁻⁴x + 0.04  (10)
y = 6.56 × 10⁻¹⁰x² − 3.56 × 10⁻⁶x + 0.53  (11)
y = 6.77 × 10⁻⁶x² + 0.005x + 0.96  (12)
y = 5.43 × 10⁻⁶x² + 0.003 × 10⁻⁴x + 0.92  (13)
y = 2.04 × 10⁻⁸x² + 7.18 × 10⁻⁵x + 0.60  (14)
y = 6.26 × 10⁻⁸x² + 3.19 × 10⁻⁴x + 0.50  (15)
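Quadratic trends like these can be fitted with ordinary least squares. The sketch below recovers the intercept of a synthetic entropy trace whose coefficients are borrowed from the shape of Formula (12); the data are simulated, not the paper's:

```python
import numpy as np

# Fit a quadratic trend to entropy values ordered by (layer, channel)
# index, as done for Formulas (9)-(15). Synthetic entropies stand in
# for a real trace:
x = np.arange(200, dtype=float)
true = 6.77e-6 * x**2 + 0.005 * x + 0.96        # Formula (12)-like shape
rng = np.random.default_rng(2)
y = true + rng.normal(scale=0.02, size=x.size)  # noisy observations

a, b, c = np.polyfit(x, y, deg=2)               # least-squares quadratic;
                                                # coefficients returned
                                                # highest degree first
print(abs(c - 0.96) < 0.05)  # True: the fitted intercept recovers 0.96
```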
Deep models learn feature representations of data through layer-by-layer propagation. Deeper networks have the capability to learn more abstract and advanced features. However, this requires a substantial amount of data to train each layer, ensuring that meaningful feature transformations are learned at every level. Upon analysis, it is believed that the main reason for the consecutive zero-entropy values in feature maps across different convolutional layers and channels in InceptionV3, ResNet50, and MobileNetV3 is the relatively deep network architecture of these models. During backpropagation, gradients tend to decrease layer by layer, resulting in very small gradients, even close to zero, for the hidden layers near the input layer. Consequently, their weights barely update during training. When weights cannot be effectively updated, the model struggles to extract meaningful features from the input data. This leads to a lack of diversity in the feature maps passed to the subsequent layers of the CNN, ultimately resulting in a feature map entropy of zero. Increasing the quantity and quality of training data represents a viable method to enhance model performance.
The primary reason for consecutive zero-entropy values in feature maps across different convolutional layers and channels in Ul-MobileNet17 is attributed to the numerous 1 × 1 convolution kernels employed in the model. The limited receptive field of these convolutional layers restricts their capacity to capture a sufficiently broad context, thereby yielding unrepresentative extracted features. Furthermore, the model’s insufficient parameter count may hinder its ability to learn the full spectrum of data features, thus inadequately representing the complexity of the input data. Consequently, this can result in a homogenization of the feature map output, ultimately reducing its entropy to zero.
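The limited receptive field of 1 × 1 convolutions can be verified directly: on a single-channel map, a 1 × 1 convolution is element-wise scaling, so stacking such layers never mixes information across spatial positions. This is a toy single-channel demonstration; real 1 × 1 layers do mix across channels, but still not across pixels:

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution on a single-channel map is just element-wise
    scaling: no spatial mixing at all."""
    return w * x

rng = np.random.default_rng(3)
x = rng.normal(size=(16, 16))
y = conv1x1(conv1x1(x, 0.7), -1.3)   # two stacked 1x1 layers

x2 = x.copy()
x2[0, 0] += 10.0                     # perturb a single input pixel
y2 = conv1x1(conv1x1(x2, 0.7), -1.3)

changed = ~np.isclose(y, y2)
print(changed.sum())  # 1: only the perturbed pixel's own output moved
```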

4. Discussion

Deeper models often imply higher complexity. As the number of layers in a model increases, so does the number of parameters, requiring the model to learn more intricate feature mappings. To train these parameters effectively, prevent overfitting, and enable the model to grasp the true data distribution, a larger volume of training data is necessary to provide adequate constraints and information. Figure 21 illustrates the F1 score for data identification under different training data volumes for the InceptionV3, ResNet50, and MobileNetV3 models. Surprisingly, the F1 score does not change significantly as the training data increase. Upon analysis, two primary reasons emerge: (1) the models are excessively complex, and (2) the training data are overly simplistic. The three models boast millions, even tens of millions, of parameters, whereas the training dataset comprises just 9000 waveforms, augmented to create a larger dataset. Despite the increased quantity, the quality of the training data does not match the models' complexity because of the significant similarity among the data points.
When the model structure becomes excessively complex, it may possess an overabundance of parameters and representational power, enabling it to fit the training data perfectly, encompassing even noise and outliers. This often leads to overfitting, a scenario where the model demonstrates exceptional performance on the training set but experiences a notable decline in performance when presented with unseen test sets or practical applications. An overfitted model tends to concentrate overly on the specifics of the training data, thereby overlooking the underlying patterns and generalization capabilities inherent in the data. For the original dataset of 9000 waveforms, data augmentation produced an expanded training dataset that is better matched to CNN models with parameter counts in the hundreds of thousands (Table 1). By combining multiple distinct CNN models through ensemble voting, performance can be further enhanced.
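The evaluation metrics used throughout (Pe, Re, F1) follow the standard precision/recall definitions. The sketch below reproduces the voting ensemble column of Table 2, assuming 66 actual events, a figure inferred from the Tp and Re values rather than stated directly:

```python
def prf1(tp, fp, total_events):
    """Precision (Pe), recall (Re), and F1 from detection counts.

    total_events is the number of actual events (66 here, inferred
    from Tp / Re in Table 2).
    """
    pe = tp / (tp + fp)
    re = tp / total_events
    f1 = 2 * pe * re / (pe + re)
    return round(pe, 4), round(re, 4), round(f1, 4)

# Voting ensemble column of Table 2: Tp = 62, Fp = 1.
print(prf1(62, 1, 66))  # (0.9841, 0.9394, 0.9612)
```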

5. Conclusions

(1) The threshold is automatically adjusted based on the error between the number of events identified by the model and the actual number of events. As iterations progress, the threshold variation gradually decreases, approaching an optimal threshold. The threshold precision reaches up to 26 decimal places.
(2) Using the method proposed in this article to identify microseismic waveforms induced by hydraulic fracturing of the coal seam at the Xieqiao Mine effectively improves the recognition of microseismic waveforms. The voting ensemble model built from Ul-Inception28, Ul-ResNet12, and Ul-TripleConv8 improved the F1 score by at least 0.0452 compared with a single CNN model, and by 0.1552 compared with the STA/LTA method commonly used in coal mines.
(3) When the entropy of feature maps across various layers and channels consecutively reaches zero, the transmission of feature maps is compromised, leading to poor model performance. Factors such as an excessive number of model parameters, a small receptive field of the convolutional layers, or an insufficient parameter count can produce consecutive zero entropies across layers and channels. The training dataset, expanded from thousands of waveforms, is better suited to CNN models with parameter counts in the hundreds of thousands; it is difficult to achieve good performance with CNN models that have millions or tens of millions of parameters.
(4) Integrating ultra-lightweight CNN models to identify microseismic events helps characterize hydraulic fracturing fractures in coal seams more completely, thereby evaluating the fracturing effect more accurately and releasing pressure in stress concentration areas more precisely. We plan to integrate more types of ultra-lightweight deep learning models to further improve the accuracy of microseismic waveform recognition. This study can provide reference and guidance for the prevention and control of coal mine dynamic disasters based on hydraulic fracturing in coal seams.

Author Contributions

Conceptualization, N.L.; methodology, N.L. and Y.Z.; code, Y.Z.; field test and validation, N.L. and Y.Z.; data analysis, Y.Z., N.L., X.Z. and X.W.; writing—original draft preparation, N.L. and Y.Z.; writing—review and editing, L.S., J.Q., X.H. and Y.L.; project administration, N.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (2022YFE0128300) and the National Natural Science Foundation of China (52174221).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to thank the editor and anonymous reviewers for their constructive comments. We would like to thank Haijiang Zhang, Jiawei Qian, and Huasheng Cha from University of Science and Technology of China, and Xin Zhang from the University of New South Wales for their help in conducting this experiment.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Schematic diagram of different convolution methods.
Figure 2. Structure of the Ul-Inception28 model.
Figure 3. Structure of the Ul-ResNet12 model.
Figure 4. Structure of the Ul-MobileNet17 model.
Figure 5. Structure of the Ul-TripleConv8 model.
Figure 6. Flowchart of the automatic adjustment strategy for the identification threshold.
Figure 7. Flowchart of microseismic waveform identification by the probability-averaging ensemble CNN model.
Figure 8. Flowchart of microseismic waveform identification by the voting ensemble CNN model.
Figure 9. Training and testing accuracy and loss functions of different CNN models.
Figure 10. Ul-MobileNet17 automatic threshold adjustment identification process.
Figure 11. Examples of identified event results. (a) Microseismic event 1; (b) microseismic event 2.
Figure 12. Time-domain images of three microseismic waveforms. (a) Background noise; (b) weak microseismic waveform; (c) high signal-to-noise ratio microseismic waveform.
Figure 13. Feature maps of the first channel in the first convolutional layer of Ul-Inception28 for different images. (a) Background noise; (b) high signal-to-noise ratio microseismic waveform.
Figure 14. Feature maps of the first channel in different convolutional layers of Ul-Inception28. (a) The 1st convolutional layer; (b) the 10th convolutional layer; (c) the 19th convolutional layer; (d) the 28th convolutional layer.
Figure 15. Feature maps of the first channel in different convolutional layers of Ul-ResNet12. (a) The 1st convolutional layer; (b) the 5th convolutional layer; (c) the 9th convolutional layer; (d) the 12th convolutional layer.
Figure 16. Feature maps of the first channel in different convolutional layers of Ul-MobileNet17. (a) The 1st convolutional layer; (b) the 6th convolutional layer; (c) the 11th convolutional layer; (d) the 17th convolutional layer.
Figure 17. Feature maps of the first channel in different convolutional layers of Ul-TripleConv8. (a) The first convolutional layer; (b) the third convolutional layer; (c) the fifth convolutional layer; (d) the eighth convolutional layer.
Figure 18. Feature maps of different channels in the last convolutional layer of Ul-MobileNet17. (a) The 64th channel; (b) the 128th channel; (c) the 192nd channel; (d) the 256th channel.
Figure 19. Feature maps of different channels in the last convolutional layer of MobileNetV3. (a) The 64th channel; (b) the 128th channel; (c) the 192nd channel; (d) the 256th channel.
Figure 20. Entropy values of feature maps from all channels of all convolutional layers across different models. (a) InceptionV3; (b) ResNet50; (c) MobileNetV3; (d) Ul-TripleConv8; (e) Ul-ResNet12; (f) Ul-MobileNet17; (g) Ul-Inception28.
Figure 21. F1 scores of microseismic waveform recognition for different models trained on various datasets.
Table 1. Relevant parameters of different CNN models.

Models | Total Params | Trainable Params | Non-Trainable Params
InceptionV3 | 23,851,784 | 23,817,352 | 34,432
ResNet50 | 25,636,712 | 25,583,592 | 53,120
MobileNetV3 | 2,554,968 | 2,542,856 | 12,112
Ul-Inception28 | 834,066 | 830,706 | 3,360
Ul-ResNet12 | 325,570 | 324,578 | 992
Ul-MobileNet17 | 214,722 | 210,498 | 4,224
Ul-TripleConv8 | 476,690 | 475,474 | 1,216

Table 2. Event identification results of different CNN models.

Evaluation Parameters | STA/LTA | Ul-Inception28 | Ul-ResNet12 | Ul-MobileNet17 | Ul-TripleConv8 | Average Ensemble | Voting Ensemble
Tp | 54 | 60 | 60 | 55 | 60 | 64 | 62
Fp | 14 | 6 | 6 | 12 | 5 | 15 | 1
Pe | 0.7941 | 0.9091 | 0.9091 | 0.8209 | 0.9231 | 0.8101 | 0.9841
Re | 0.8182 | 0.9091 | 0.9091 | 0.8333 | 0.9091 | 0.9697 | 0.9394
F1 | 0.8060 | 0.9091 | 0.9091 | 0.8271 | 0.9160 | 0.8828 | 0.9612