3.1. TCN
Temporal Convolutional Network (TCN) is a sequence modeling method based on convolutional neural networks [21]. A TCN is built by stacking residual blocks, each based on one-dimensional causal convolution combined with dilated convolution. The residual connections pass gas-containing coal fracture signal information across layers, which speeds up model training. Causal convolution ensures that the risk level prediction at a given time step uses no future information, and dilated convolution widens the receptive field of the TCN so that it captures long-term dependencies in the historical gas-containing coal fracture signal data, supporting accurate prediction of the risk level of gas-containing coal fractures.
The dilated convolution is formulated as follows:

$$F(s) = (\mathbf{x} *_d f)(s) = \sum_{i=0}^{k-1} f(i)\, x_{s - d \cdot i}$$

where $k$ is the convolution kernel size, $f(i)$ is the $i$-th weight of the convolution filter, $x_{s - d \cdot i}$ is the input at time step $s - d \cdot i$, and $d$ is the dilation (expansion) factor. When $d = 1$, the operation reduces to an ordinary causal convolution over consecutive inputs. When $d \neq 1$, a dilated convolution is applied to the input data, and the dilation factor grows with the network depth as $d = 2^{l}$, where $l$ is the layer index.
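The dilated causal convolution above can be illustrated with a short NumPy sketch for a single channel. The function name `dilated_causal_conv1d` is hypothetical, and treating samples before the start of the sequence as zeros (left padding only) is an assumption that matches the usual TCN implementation; it guarantees that the output at step $s$ never reads inputs after $s$.

```python
import numpy as np

def dilated_causal_conv1d(x, f, d):
    """Compute F(s) = sum_{i=0}^{k-1} f[i] * x[s - d*i] for every step s.

    x: 1-D input sequence; f: filter weights of length k; d: dilation factor.
    Inputs before the start of the sequence are treated as zeros, so the
    output at step s depends only on x[0..s] (causality).
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    k = len(f)
    pad = d * (k - 1)                        # left padding only: no look-ahead
    xp = np.concatenate([np.zeros(pad), x])  # x[s] sits at xp[s + pad]
    out = np.empty_like(x)
    for s in range(len(x)):
        out[s] = sum(f[i] * xp[s + pad - d * i] for i in range(k))
    return out

# Usage: kernel size k = 2, dilation d = 2, so y[s] = x[s] + x[s - 2].
y = dilated_causal_conv1d(np.arange(1, 9), [1.0, 1.0], d=2)
```

With $d = 1$ the same function performs an ordinary causal convolution; doubling $d$ at each layer lets a few layers cover a long history.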
Each layer performs a convolution, and the output is obtained after several stacked rounds of convolution. The receptive field is computed as shown in Equation (10).
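Equation (10) is not reproduced in this text, so the following is only a hedged sketch of the standard calculation, assuming that level $l$ uses dilation $d = 2^{l}$ and contains `convs_per_level` dilated causal convolutions (a hypothetical parameter; some TCN variants use two per residual block). Each convolution with kernel size $k$ and dilation $d$ extends the receptive field by $(k-1)d$ steps, which the brute-force tracer checks independently.

```python
def tcn_receptive_field(kernel_size, num_levels, convs_per_level=1):
    """Closed form: 1 + convs_per_level * (k - 1) * sum of dilations 2**l."""
    dilations = [2 ** l for l in range(num_levels)]
    return 1 + convs_per_level * (kernel_size - 1) * sum(dilations)

def receptive_field_bruteforce(kernel_size, num_levels, convs_per_level=1):
    """Trace which past offsets can influence the final output step."""
    reach = {0}  # offsets into the past that reach the last output
    for l in range(num_levels):
        d = 2 ** l
        for _ in range(convs_per_level):
            reach = {r + d * i for r in reach for i in range(kernel_size)}
    return max(reach) + 1

# Usage: kernel size 3 with 4 levels (dilations 1, 2, 4, 8).
rf = tcn_receptive_field(3, 4)
```

Because the dilations double at each level, the receptive field grows exponentially with depth while the parameter count grows only linearly.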
The residual module keeps the multilayer network stable and mitigates information loss across layers. It consists, in order, of a dilated causal convolution layer, a weight normalization (WeightNorm) layer, a ReLU layer, and a Dropout layer. The ReLU activation speeds up model convergence and alleviates the vanishing gradient problem. However, the gradient of ReLU is 0 for negative inputs, so the affected neurons stop updating; this is the "necrosis" (dying neuron) phenomenon. The PeLU [22] activation function is therefore introduced to replace ReLU; its formula is as follows:
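$$\mathrm{PeLU}(x) = \begin{cases} \dfrac{a}{b}\,x, & x \ge 0 \\[4pt] a\left(e^{x/b} - 1\right), & x < 0 \end{cases} \qquad a, b > 0$$

where $a$ and $b$ are learnable parameters that are adaptively tuned to the dataset. Figure 1 compares the ReLU and PeLU activation functions ($a$ = 0.9, $b$ = 1).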
As Figure 1 shows, for $x < 0$ the derivative of PeLU is nonzero, unlike that of ReLU, and the function is smoother. This avoids the "neuron death" problem and improves the model's ability to predict the risk level of gas-containing coal fractures.
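The difference in gradients can be checked numerically. The sketch below assumes the parametric ELU form of PeLU, $(a/b)\,x$ for $x \ge 0$ and $a(e^{x/b} - 1)$ for $x < 0$, with the values $a = 0.9$, $b = 1$ used for Figure 1; the function names are hypothetical. The negative branch is evaluated on `min(x, 0)` so that large positive inputs cannot overflow the exponential.

```python
import numpy as np

def pelu(x, a=0.9, b=1.0):
    """PeLU: (a/b)*x for x >= 0, a*(exp(x/b) - 1) for x < 0."""
    x = np.asarray(x, dtype=float)
    neg = a * np.expm1(np.minimum(x, 0.0) / b)  # clamp avoids overflow for x > 0
    return np.where(x >= 0, (a / b) * x, neg)

def pelu_grad(x, a=0.9, b=1.0):
    """Derivative of PeLU: a/b for x >= 0, (a/b)*exp(x/b) for x < 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, a / b, (a / b) * np.exp(np.minimum(x, 0.0) / b))

def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x = 0)."""
    return (np.asarray(x, dtype=float) > 0).astype(float)

xs = np.array([-3.0, -1.0, 0.0, 2.0])
# relu_grad(xs) is zero for the two negative inputs, pelu_grad(xs) is not.
```

Since $e^{x/b} > 0$, the PeLU gradient stays positive for every negative input, so those units keep receiving updates during training.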
3.2. BiGRU
The BiGRU model processes the gas-containing coal fracture signal sequence with a forward GRU and a backward GRU, so the representation at each step balances the influence of earlier and later signal data. This compensates for a limitation of the TCN, whose causal convolutions cannot propagate information from later to earlier time steps [21].
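The BiGRU network takes the output vector of the TCN as input and computes the reset gate state $r_t$ and the update gate state $z_t$ from the previous hidden state and the current input:

$$r_t = \sigma\left(W_r\,[h_{t-1}, x_t] + b_r\right)$$
$$z_t = \sigma\left(W_z\,[h_{t-1}, x_t] + b_z\right)$$

where $r_t$ is the reset gate output; $z_t$ is the update gate output; $h_{t-1}$ is the hidden state at the previous time step and $x_t$ is the current input; $W_r$, $W_z$ and $b_r$, $b_z$ are the weights and bias terms; and $\sigma$ is the Sigmoid function.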
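On this basis, the candidate state and the current gas-containing coal fracture signal output are calculated as

$$\tilde{h}_t = \tanh\left(W_h\,[r_t \odot h_{t-1}, x_t] + b_h\right)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where $\tilde{h}_t$ is the candidate hidden state, $h_t$ is the current hidden state, $W_h$ and $b_h$ are the weight matrix and bias, $\odot$ denotes element-wise multiplication, and the tanh activation function scales the signal feature data to [−1, 1].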
Running this recurrence over the sequence in both the forward and backward directions yields the BiGRU model's predicted risk level of gas-containing coal fractures.
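A minimal NumPy sketch of the bidirectional recurrence follows, assuming the standard GRU form $r_t, z_t = \sigma(W[h_{t-1}, x_t] + b)$, $\tilde{h}_t = \tanh(W_h[r_t \odot h_{t-1}, x_t] + b_h)$, $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$, and assuming the forward and backward hidden states are concatenated per step. The helper names and the random weights are hypothetical; in the model the weights are learned and the inputs are TCN output vectors.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def init_gru_params(n_in, n_hid, rng):
    """Random (untrained) weights acting on the concatenation [h, x]."""
    p = {}
    for gate in ("r", "z", "h"):
        p["W_" + gate] = rng.normal(0.0, 0.1, size=(n_hid, n_hid + n_in))
        p["b_" + gate] = np.zeros(n_hid)
    return p

def gru_step(x_t, h_prev, p):
    hx = np.concatenate([h_prev, x_t])
    r = sigmoid(p["W_r"] @ hx + p["b_r"])  # reset gate
    z = sigmoid(p["W_z"] @ hx + p["b_z"])  # update gate
    h_cand = np.tanh(p["W_h"] @ np.concatenate([r * h_prev, x_t]) + p["b_h"])
    return (1.0 - z) * h_prev + z * h_cand  # blend old state and candidate

def run_gru(xs, p, n_hid):
    h = np.zeros(n_hid)
    states = []
    for x_t in xs:
        h = gru_step(x_t, h, p)
        states.append(h)
    return np.stack(states)

def bigru(xs, p_fwd, p_bwd, n_hid):
    """Concatenate forward states with backward states re-aligned in time."""
    fwd = run_gru(xs, p_fwd, n_hid)
    bwd = run_gru(xs[::-1], p_bwd, n_hid)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))  # 5 time steps, 3 features (e.g. TCN outputs)
out = bigru(xs, init_gru_params(3, 4, rng), init_gru_params(3, 4, rng), 4)
```

Note that PyTorch's `nn.GRU` writes the last equation with the roles of $z_t$ swapped, $h_t = (1 - z_t) \odot \tilde{h}_t + z_t \odot h_{t-1}$; the two forms are equivalent up to relabeling the gate.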
3.3. BiGRU Hyperparameter Optimization
To improve the accuracy of gas-containing coal fracture risk class prediction and reduce the influence of manual parameter selection, the improved SCSO algorithm (ISCSO) is used to optimize the three hyperparameters that most strongly affect the results, namely the batch size B, the learning rate L, and the number of iterations M. This determines the optimal parameter combination without repeated trial-and-error experiments.
The sand cat swarm optimization (SCSO) algorithm is a simple and efficient swarm optimization algorithm proposed by Seyyedabbasi et al. in 2022. It mimics the hunting behavior of sand cats, which can detect low-frequency noise below 2 kHz and are also remarkably good at digging for prey. Inspired by these two abilities, SCSO divides the foraging behavior of sand cats into two stages: global search (exploration) and attacking the prey (exploitation) [23, 24]. Here, the BiGRU hyperparameters are treated as the prey, and an improved SCSO is used to search for their optimal values.
In the population initialization stage, purely random initialization can distribute the population unevenly, which degrades the quality of the optimal solution. Initializing the sand cat population with the Singer map [25] spreads the population more evenly over the search space and increases the probability of finding the optimal solution. The Singer chaotic map is given in Equation (16):

$$x_{k+1} = \mu\left(7.86\,x_k - 23.31\,x_k^{2} + 28.75\,x_k^{3} - 13.302875\,x_k^{4}\right) \tag{16}$$

where $x_k \in (0, 1)$ is the $k$-th chaotic value and $\mu$ is a control parameter. When $\mu \in [0.9, 1.08]$, the Singer map exhibits chaotic behavior.
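A hedged sketch of Singer-map initialization: one chaotic sequence is generated from Equation (16) and scaled into the search bounds. The seed `x0 = 0.7`, the choice `mu = 1.07`, and the example bounds are assumptions for illustration only; with `mu = 1.07` the map's maximum over (0, 1) is about 0.996, so the sequence stays inside (0, 1).

```python
import numpy as np

def singer_sequence(n, x0=0.7, mu=1.07):
    """Iterate the Singer map n times starting from x0 in (0, 1)."""
    xs = np.empty(n)
    x = x0
    for k in range(n):
        x = mu * (7.86 * x - 23.31 * x**2 + 28.75 * x**3 - 13.302875 * x**4)
        xs[k] = x
    return xs

def singer_init(pop_size, lb, ub, x0=0.7, mu=1.07):
    """Map chaotic values in (0, 1) to positions lb + c * (ub - lb)."""
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    chaos = singer_sequence(pop_size * lb.size, x0, mu).reshape(pop_size, lb.size)
    return lb + chaos * (ub - lb)

# Usage with hypothetical bounds for (batch size B, learning rate L, iterations M).
population = singer_init(30, lb=[16, 1e-4, 50], ub=[256, 1e-2, 500])
```

Each row of `population` is one sand cat; integer hyperparameters such as B and M would be rounded when the position is evaluated.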
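The prey-exploration equation of the sand cat swarm is described as follows:

$$\vec{X}(t+1) = \vec{r} \cdot \left(\vec{X}_{b}(t) - \mathrm{rand}(0,1) \cdot \vec{X}_{c}(t)\right)$$
$$\vec{r}_G = S_M - \frac{S_M \cdot iter_c}{iter_{max}}, \qquad \vec{r} = \vec{r}_G \cdot \mathrm{rand}(0,1)$$

where $\vec{X}$ denotes the position vector of a search agent; $t$ denotes the current iteration; $\vec{X}_b$ denotes the best candidate position; $\vec{X}_c$ denotes the current position of the search agent; $\vec{r}$ denotes the sensitivity range of each sand cat to low-frequency noise; $\vec{r}_G$ denotes the general sensitivity range, which decreases linearly from 2 to 0; $S_M$ is the maximum sensitivity; $iter_c$ is the current iteration; and $iter_{max}$ is the maximum number of iterations.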
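The exploration step and its sensitivity schedule can be sketched as follows, assuming $S_M = 2$ and an independent uniform random number per dimension (as in common SCSO implementations). The function names are hypothetical.

```python
import numpy as np

S_M = 2.0  # maximum sensitivity, tied to the 2 kHz hearing limit

def sensitivity(iter_c, iter_max, rng):
    """Return (r_G, r): r_G falls linearly from S_M to 0, r is drawn in [0, r_G)."""
    r_G = S_M - S_M * iter_c / iter_max
    r = r_G * rng.random()
    return r_G, r

def explore(X_b, X_c, r, rng):
    """Exploration update X(t+1) = r * (X_b - rand * X_c), rand drawn per dimension."""
    return r * (X_b - rng.random(X_c.shape) * X_c)

rng = np.random.default_rng(1)
r_G, r = sensitivity(10, 100, rng)  # early iteration: r_G = 1.8
X_next = explore(np.array([0.5, 0.5]), np.array([0.2, 0.8]), r, rng)
```

Because $r$ shrinks with $r_G$, exploratory jumps become smaller as the iterations proceed.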
Because the decreasing factor $\vec{r}_G$ in SCSO decreases linearly, the algorithm can converge slowly in late iterations and become trapped in a local optimum. A chaotic decreasing factor is therefore introduced to keep feasible solutions from falling into local optima. The improved decreasing factor is expressed as follows:
In addition, since sand cats can sense frequencies below 2 kHz, the maximum sensitivity $S_M$ takes the value of 2.
SCSO attacks the prey at the end of the prey search; the prey-attack mechanism of the sand cat swarm is described as follows:

$$\vec{X}_{rnd} = \left|\mathrm{rand}(0,1) \cdot \vec{X}_b(t) - \vec{X}_c(t)\right|$$
$$\vec{X}(t+1) = \vec{X}_b(t) - \vec{r} \cdot \vec{X}_{rnd} \cdot \cos(\theta)$$

where $\theta$ denotes a random angle between 0° and 360°, and $\vec{X}_{rnd}$ denotes a random position generated from the best position and the current position. Because each sand cat chooses its own random angle, each member of the population moves in a different circumferential direction, which helps it avoid local optima as it approaches the prey.
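SCSO balances the exploration and exploitation phases through the adaptive factor $R = 2\,\vec{r}_G \cdot \mathrm{rand}(0,1) - \vec{r}_G$, where $\vec{r}_G$ is the general sensitivity range: the algorithm performs the global search when |R| > 1 and attacks the prey when |R| ≤ 1.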
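Putting the exploration and attack equations together, one swarm update can be sketched as below. This is a simplified sketch of the original SCSO step, not of ISCSO: the chaotic decreasing factor and the t-distribution mutation are omitted, $\theta$ is drawn uniformly in $[0, 2\pi)$ (the original SCSO selects it by roulette-wheel selection), and no bound handling is applied. The name `scso_update` is hypothetical.

```python
import numpy as np

def scso_update(X, X_b, t, T, rng, s_m=2.0):
    """One SCSO position update for all sand cats.

    X: (n_cats, dim) current positions; X_b: best position so far.
    |R| <= 1 -> attack (exploitation); |R| > 1 -> search (exploration).
    """
    r_G = s_m - s_m * t / T
    X_new = np.empty_like(X)
    for i in range(X.shape[0]):
        r = r_G * rng.random()
        R = 2.0 * r_G * rng.random() - r_G
        if abs(R) <= 1.0:
            theta = rng.uniform(0.0, 2.0 * np.pi)       # random angle per cat
            X_rnd = np.abs(rng.random(X.shape[1]) * X_b - X[i])
            X_new[i] = X_b - r * X_rnd * np.cos(theta)
        else:
            X_new[i] = r * (X_b - rng.random(X.shape[1]) * X[i])
    return X_new

rng = np.random.default_rng(2)
X = rng.random((6, 3))
X_next = scso_update(X, X[0].copy(), t=5, T=50, rng=rng)
```

At the final iteration $r_G = 0$, so $R = 0$ and $r = 0$, and every cat lands exactly on the best position; this is why the linear schedule converges but can lose diversity late in the run.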
To further enhance population diversity in late iterations and improve the global search capability of the algorithm, an adaptive t-distribution mutation is applied to the current global optimal solution, which is then updated. The probability density function of the t-distribution is

$$p(x; n) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{x^{2}}{n}\right)^{-\frac{n+1}{2}}$$

where $n$ is the degrees-of-freedom parameter. In the adaptive t-distribution, the degrees of freedom equal the current iteration number of the algorithm. Early in the run the iteration number is small, so the t-distribution approaches a Cauchy distribution (exactly Cauchy when $n = 1$); this increases population diversity and improves global search. Later the iteration number is large, so the t-distribution approaches a Gaussian distribution, which favors fine search within a small neighborhood and strengthens local convergence. To increase population diversity in early iterations and improve local exploitation in late iterations, an adaptive parameter $\omega$ is introduced. The formula is as follows:
As the above equation shows, the adaptive parameter $\omega$ is relatively large early in the iteration, so the t-distribution is used more strongly to increase population diversity. In later iterations, $\omega$ gradually decreases, reducing the influence of the t-distribution on individual positions so that more good individuals are retained. The mutation probability is set to 0.5: the t-distribution mutation is applied when a random number drawn from [0, 1] is smaller than the mutation probability. The formula is as follows:
where $X_i^{new}$ denotes the position of the $i$-th sand cat after the t-distribution update; $t(iter)$ denotes a t-distributed random variable whose degrees-of-freedom parameter is $iter$; and $iter$ denotes the current iteration number.
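Because the formulas for the adaptive parameter and the mutated position are not reproduced here, the sketch below rests on two loud assumptions: a hypothetical linear schedule $\omega = 1 - iter/T$ (large early, zero at the end) and the commonly used mutation form $X^{new} = X_{best} + \omega \cdot X_{best} \cdot t(iter)$. The greedy replacement rule in `greedy_replace` is also an assumption. Only the degrees of freedom equal to the iteration number and the 0.5 mutation probability come from the text.

```python
import numpy as np

def t_mutation(X_best, it, T, rng, p_mut=0.5):
    """Adaptive t-distribution mutation of the global best (hedged sketch).

    it: current iteration (>= 1), used as the degrees of freedom.
    """
    if rng.random() >= p_mut:      # mutate only with probability p_mut
        return X_best.copy()
    omega = 1.0 - it / T           # HYPOTHETICAL adaptive weight schedule
    step = rng.standard_t(df=it, size=X_best.shape)  # heavy-tailed when it is small
    return X_best + omega * X_best * step

def greedy_replace(X_best, X_mut, fitness):
    """Keep the mutant only if it lowers the fitness (assumed selection rule)."""
    return X_mut if fitness(X_mut) < fitness(X_best) else X_best

def sphere(x):
    return float(np.sum(x ** 2))  # toy fitness for illustration

rng = np.random.default_rng(3)
X_best = np.array([0.4, -0.2, 0.1])
candidate = greedy_replace(X_best, t_mutation(X_best, it=2, T=50, rng=rng), sphere)
```

With `df=1` the draws are Cauchy-distributed and occasionally very large, which is what produces the early diversity; as `df` grows they approach a standard normal.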
3.4. Establishment of Gas-Containing Coal Fracture Risk Level Prediction Model
Using the ISCSO algorithm, the neural network hyperparameters are optimized together with the prediction parameters, yielding the ISCSO-optimized TCN-BiGRU model for predicting the risk level of gas-containing coal fractures.
The gas-containing coal fracture signal features obtained by dimensionality reduction were randomly split into a training set and a test set at a 3:1 ratio within each class of the risk class prediction index Z. The model was then evaluated on the risk class prediction index. To eliminate differences in scale and magnitude among the risk level predictors of gas-containing coal fractures and to accelerate model convergence, the training and test data were normalized with robust scaling (RobustScaler) [26]. RobustScaler retains outliers in the data but scales each feature by its interquartile range (IQR), which weakens their influence. The sequence $x$ is transformed to $x'$ by RobustScaler according to the following normalization formula:

$$x' = \frac{x - \mathrm{median}(x)}{\mathrm{IQR}(x)}, \qquad \mathrm{IQR}(x) = Q_3(x) - Q_1(x)$$

where $\mathrm{median}(x)$ is the median of the data, and $\mathrm{IQR}(x)$ is the interquartile range, i.e., the difference between the third quartile $Q_3$ and the first quartile $Q_1$.
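The normalization formula can be sketched in NumPy as follows. Here the median and IQR are estimated on the training set and reused for the test set, a common design choice assumed for this sketch; a constant feature (IQR = 0) gets a scale of 1 to avoid division by zero. The function names are hypothetical.

```python
import numpy as np

def robust_fit(X_train):
    """Per-feature median and interquartile range (Q3 - Q1)."""
    X_train = np.asarray(X_train, dtype=float)
    med = np.median(X_train, axis=0)
    q1, q3 = np.percentile(X_train, [25, 75], axis=0)
    iqr = q3 - q1
    iqr = np.where(iqr == 0, 1.0, iqr)  # constant feature: leave unscaled
    return med, iqr

def robust_transform(X, med, iqr):
    """x' = (x - median) / IQR."""
    return (np.asarray(X, dtype=float) - med) / iqr

X_train = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100 is an outlier
med, iqr = robust_fit(X_train)
X_scaled = robust_transform(X_train, med, iqr)
```

Here the median is 3 and the IQR is 4 − 2 = 2, so the outlier is kept (it maps to 48.5) without distorting the scale of the other values, which a min-max scaler would compress toward 0. In scikit-learn, `sklearn.preprocessing.RobustScaler` with its default `quantile_range=(25.0, 75.0)` performs the same centering and scaling.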
The pre-processed gas-containing coal fracture data are fed into the TCN model, and the TCN weights are initialized. The number of iterations, the number of residual modules, the number of network layers, the dilation factor, the convolution kernel size, and other settings are then adjusted according to the TCN training results, to keep the model from becoming trapped in local optima because of an excessively long span of the time-series data. At the end of training, the TCN output passes through a fully connected layer and is fed into the BiGRU network for further prediction. The ISCSO-ATCN gas outburst risk prediction modeling process is shown in Figure 2.
The BiGRU parameters are optimized with the ISCSO algorithm, using the prediction error rate of the current parameter combination as the fitness value. First, the maximum number of ISCSO iterations is set, and the search range of each prediction parameter is specified from experience. In each iteration, ISCSO calculates the current fitness values, updates each parameter value, and proceeds to the next iteration, progressively reducing the model's error rate. The search ends when the maximum number of iterations is reached; the position of the best individual in the sand cat swarm then gives the optimal values of the BiGRU hyperparameters and weighting parameters. Through this ISCSO-based global optimization of the BiGRU, the optimal parameter combination is obtained and the ISCSO-PTCN-BiGRU model for predicting the risk level of gas-containing coal fractures is established.
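The search loop described above can be sketched as a plain SCSO run over the unit cube, with each position decoded into (B, L, M). Everything specific here is hypothetical: the search ranges, the log-scale decoding of the learning rate, and `toy_error_rate`, which stands in for training the TCN-BiGRU and measuring its error rate. The ISCSO additions (Singer initialization, chaotic decreasing factor, t-distribution mutation) are omitted for brevity.

```python
import numpy as np

# HYPOTHETICAL search ranges; the text only says they are set from experience.
B_RANGE = (16, 256)     # batch size
L_RANGE = (1e-4, 1e-2)  # learning rate, searched on a log scale
M_RANGE = (50, 500)     # number of training iterations

def decode(pos):
    """Map a position in [0, 1]^3 to (B, L, M)."""
    p = np.clip(pos, 0.0, 1.0)
    B = int(round(B_RANGE[0] + p[0] * (B_RANGE[1] - B_RANGE[0])))
    lo, hi = np.log10(L_RANGE[0]), np.log10(L_RANGE[1])
    L = float(10 ** (lo + p[1] * (hi - lo)))
    M = int(round(M_RANGE[0] + p[2] * (M_RANGE[1] - M_RANGE[0])))
    return B, L, M

def scso_search(error_rate, n_cats=10, max_iter=30, seed=0):
    """Plain SCSO loop minimizing error_rate(B, L, M) (fitness = error rate)."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_cats, 3))
    fit = np.array([error_rate(*decode(x)) for x in X])
    best, best_fit = X[fit.argmin()].copy(), float(fit.min())
    for t in range(max_iter):
        r_G = 2.0 - 2.0 * t / max_iter
        for i in range(n_cats):
            r = r_G * rng.random()
            R = 2.0 * r_G * rng.random() - r_G
            if abs(R) <= 1.0:  # attack the prey
                theta = rng.uniform(0.0, 2.0 * np.pi)
                X_rnd = np.abs(rng.random(3) * best - X[i])
                X[i] = best - r * X_rnd * np.cos(theta)
            else:              # global search
                X[i] = r * (best - rng.random(3) * X[i])
            X[i] = np.clip(X[i], 0.0, 1.0)
            f = error_rate(*decode(X[i]))
            if f < best_fit:   # keep the best sand cat found so far
                best, best_fit = X[i].copy(), float(f)
    return decode(best), best_fit

def toy_error_rate(B, L, M):
    """HYPOTHETICAL stand-in for 'train the model and return its error rate'."""
    return abs(B - 64) / 256 + abs(np.log10(L) + 3) + abs(M - 200) / 500

(B, L, M), err = scso_search(toy_error_rate)
```

In the real pipeline each fitness call trains a model, so `n_cats * (max_iter + 1)` evaluations dominate the cost; small swarms and iteration budgets are typical for this reason.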