Article

Distriformer: Research on a Distributed Training Rockburst Prediction Method

Yu Zhang, Kongyi Fang and Zhengjia Guo

1 School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
2 State Key Laboratory in China for GeoMechanics and Deep Underground Engineering, China University of Mining & Technology, Beijing 100083, China
3 Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
* Author to whom correspondence should be addressed.
Processes 2024, 12(6), 1205; https://doi.org/10.3390/pr12061205
Submission received: 9 May 2024 / Revised: 8 June 2024 / Accepted: 9 June 2024 / Published: 12 June 2024
(This article belongs to the Special Issue Intelligent Safety Monitoring and Prevention Process in Coal Mines)

Abstract

The precise forecasting of rockburst is fundamental for safeguarding human lives and property, upholding national energy security, and protecting social welfare. Traditional methods for predicting rockburst suffer from poor accuracy and extended model training durations. This paper proposes a distributed training rockburst prediction method called Distriformer, which uses deep learning technology combined with distributed training to predict rockburst. To assess the efficacy of the proposed Distriformer rockburst model, five datasets were used to compare it with Transformer and Informer. The experimental results indicate that, compared with Transformer, the proposed method reduces the mean absolute error by 44.4% and the root mean square error by 30.7% on average. In terms of training time, the proposed method achieves an average acceleration ratio of 1.72. The Distriformer rockburst model enhances the accuracy of rockburst prediction, reduces training time, and serves as a reference for the development of subsequent real-time prediction models for extensive rockburst data.

1. Introduction

Since the initial documentation of rockburst in the 17th century [1], these events have progressively gained attention within the scientific community. Rockburst occurs during mining when external disturbances trigger the release of stored elastic potential energy, rupturing the rock and often producing an audible explosion [2]. Rockburst is sudden and unpredictable because it is governed by a multitude of intricate factors, such as the inherent structure of the rock, the surrounding topography, and crustal stress, which makes prompt forecasting challenging. Coal has traditionally served as the primary energy source in China [3]. With ongoing societal advancement, resource demands have surged, and both the scale and frequency of coal mining have escalated annually, leading in turn to a rise in rockburst accidents [4]. Rockburst disasters endanger human lives, property, national energy security, and societal welfare. Hence, there is an urgent need for efficient and accurate methods of predicting rockburst.
Rockburst prediction has long attracted significant research attention. Over the decades, scholars have investigated diverse facets of the problem, and existing methods encompass both traditional approaches and those leveraging artificial intelligence techniques. The microseismic method [5], acoustic emission method [6], and electromagnetic radiation method [7] are categorized as traditional rockburst prediction methods; they investigate and forecast rockburst based on rock parameters and characteristics gathered through monitoring, but they exhibit significant limitations in practical applications. Hence, researchers have endeavored to integrate artificial intelligence techniques into rockburst prediction [8]. Traditional machine learning algorithms such as support vector machine [9,10], decision tree [11], and random forest [12,13] have been employed to address the nonlinear challenges inherent in rockburst prediction. Furthermore, deep learning methods have garnered attention for their enhanced nonlinear processing capabilities: artificial neural networks [14,15], self-organizing neural networks [16], and long short-term memory recurrent neural networks [17] are pivotal in rockburst prediction and have significantly enhanced predictive accuracy. From traditional methods to those leveraging artificial intelligence, the accuracy of rockburst prediction has improved markedly.
Most of the aforementioned methods focus on predicting rockburst intensity levels; there is limited research on using artificial intelligence techniques to predict acoustic emission time series. Moreover, these studies rarely consider model training time. In addition to accuracy, timeliness is a crucial indicator in rockburst prediction, as it determines whether preventive measures can be implemented promptly to mitigate disaster-related losses. Predictions must therefore be both accurate and rapid, and training time is a central concern of this study.
Since its introduction, the Transformer [18] has achieved excellent results in multiple domains. However, the original Transformer architecture encounters challenges of time complexity and memory usage as model sizes expand and ever longer sequences must be handled. Researchers have therefore put forth a refined architecture known as Informer [19], which stems from the Transformer and integrates the ProbSparse self-attention mechanism along with a self-attention distilling mechanism to address these challenges. The model reduces both computational and memory demands while significantly enhancing prediction accuracy across a diverse array of tasks. Consequently, researchers have also undertaken efforts to introduce these advances into the realm of rockburst prediction.
Furthermore, to enhance the performance of deep learning algorithms in real-world scenarios, datasets tend to expand, models become more intricate, and training durations grow. In recent years, distributed training has emerged as a primary method for accelerating the training of deep learning models and reducing training time, garnering significant attention from researchers both domestically and internationally.
This paper proposes a distributed training rockburst prediction method called Distriformer, introducing distributed training into the field of rockburst prediction for the first time. Initially, the rockburst data and the model are allocated across multiple GPUs. Distributed training is then employed to optimize the rockburst prediction model during the training phase. Finally, the trained model is applied to predict rockburst. The method was evaluated on five rockburst datasets and performed best among the compared models: relative to Transformer, the mean absolute error decreases by 44.4% and the root mean square error by 30.7% on average, and an average acceleration ratio of 1.72 is achieved. The proposed method thus addresses both the accuracy and the timeliness problems encountered in rockburst prediction.

2. Experimental System and Data Sources

The experimental apparatus utilized in this study is the strain rockburst physical simulation test system independently designed by China University of Mining and Technology. The system comprises three components: the mainframe, the loading control system, and the data acquisition system, as shown in Figure 1. It can simulate rockburst occurrence indoors and, via an acoustic emission sensor, a pressure sensor, and a high-speed camera, monitor the data changes throughout the entire rockburst process of the rock samples.
The rock specimens utilized in this experiment correspond to the datasets ROCKBURST_S1 through ROCKBURST_S5. Uniform, well-integrated rocks were cut into samples of specified dimensions and positioned at the designated loading point within the experimental setup. Initially, the loading control system applies forces to the rock sample in three directions on six faces to replicate the pre-construction stress state of the rock at the site, as shown in Figure 2a. The load was increased incrementally to a predetermined level, after which one face was rapidly unloaded to simulate the stress state of the rock during construction, and loading persisted until the occurrence of rockburst. The unloaded state of the rock sample is shown in Figure 2b.
Firstly, the indoor simulated rockburst experiment yields a substantial volume of experimental data, which the data acquisition system stores in text files. Each text file comprises numerous records covering multiple parameters at different time intervals. The extensive raw dataset was then subjected to feature extraction: important features such as count, amplitude, peak frequency, and absolute energy were extracted at different moments of the simulated rockburst. Lastly, a text file containing the primary characteristic parameters of the indoor simulated rockburst experiment is generated, as sketched below.
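As a concrete illustration of this processing step, the following minimal Python sketch parses one raw acquisition file and writes out a characteristic-parameter file. The column layout, file names, and the extract_features helper are hypothetical; the paper does not publish its preprocessing code.

```python
# Hypothetical sketch of the feature-extraction step. Assumes the acquisition
# system writes whitespace-separated text files with named columns; the
# column names below are illustrative, not the actual acquisition format.
import pandas as pd

FEATURES = ["count", "amplitude", "peak_frequency", "absolute_energy"]

def extract_features(raw_path: str, out_path: str) -> pd.DataFrame:
    """Read one raw acoustic-emission file and keep the key features."""
    raw = pd.read_csv(raw_path, sep=r"\s+")            # one record per time step
    features = raw[["time"] + FEATURES].dropna()       # drop incomplete records
    features = features.sort_values("time")            # keep chronological order
    features.to_csv(out_path, index=False)             # characteristic-parameter file
    return features

# e.g., extract_features("ROCKBURST_S1_raw.txt", "ROCKBURST_S1_features.csv")
```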

3. Methodology

3.1. Informer Model

The Informer model is a Transformer-based architecture designed for time series prediction. It comprises two primary components: an encoder and a decoder [19]. Building on the Transformer, it optimizes both the encoding and decoding layers to address three primary challenges of the original architecture. The three enhancements are as follows: to reduce the computational complexity of conventional self-attention, multi-head ProbSparse self-attention is introduced in the encoder; self-attention distilling is employed to emphasize dominant attention while halving the input of each cascaded layer; and to accelerate prediction for long sequences, a generative decoder is introduced that produces the entire prediction output in a single step.
(1) ProbSparse self-attention mechanism: The conventional self-attention mechanism operates on the query–key–value triad via a scaled dot product, computed as follows:
$$A(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)V$$
where Q is the matrix of the query, K is the matrix of the key, V is the matrix of the value, d is the dimensionality of the input variables, and Softmax is the activation function.
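As a point of reference for the sparse variant below, here is a direct PyTorch rendering of the scaled dot-product formula above, unbatched and single-head for clarity; it is an illustrative sketch, not the paper's implementation.

```python
# Conventional scaled dot-product attention: every query attends to every key,
# giving O(L^2) time and memory in the sequence length.
import math
import torch

def full_attention(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    # Q: (L_q, d), K and V: (L_k, d)
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)   # Q K^T / sqrt(d)
    return torch.softmax(scores, dim=-1) @ V          # (L_q, d)
```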
During self-attention computation, it is not always the case that one element of the input exhibits a high correlation with all other elements. To minimize computational overhead, the ProbSparse self-attention mechanism first assesses the sparsity of each query vector. The number of query vectors used for dot product computation is then determined, and the outputs corresponding to the remaining, unselected query vectors are directly assigned the mean value. The correlation $\bar{M}$ between the i-th query vector $q_i$ and the key matrix $K$ is computed as:
$$\bar{M}(q_i, K) = \max_{j}\left\{\frac{q_i k_j^{T}}{\sqrt{d}}\right\} - \frac{1}{L_K}\sum_{j=1}^{L_K}\frac{q_i k_j^{T}}{\sqrt{d}}$$
Thus, the formulation of the ProbSparse self-attention mechanism is established as:
$$A(Q, K, V) = \mathrm{Softmax}\left(\frac{\bar{Q}K^{T}}{\sqrt{d}}\right)V$$
where $\bar{Q}$ is a new sparse query matrix composed of the u query vectors with the highest scores selected by the sparsity computation.
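The selection logic can be sketched in a few lines of PyTorch. For clarity this toy version scores every query against all keys; Informer itself estimates $\bar{M}$ from a random subsample of keys to stay sub-quadratic, so this is an illustration of the idea rather than the paper's implementation.

```python
# ProbSparse attention sketch: score each query with M-bar, run exact
# attention only for the top-u queries, and fill the remaining outputs with
# the mean of V (the "assign to the mean" step described above).
import math
import torch

def probsparse_attention(Q, K, V, u: int):
    # Q: (L_q, d), K and V: (L_k, d); unbatched, single head for clarity
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)            # (L_q, L_k)
    m_bar = scores.max(dim=-1).values - scores.mean(dim=-1)    # sparsity measure per query
    top_idx = m_bar.topk(u).indices                            # u most informative queries
    out = V.mean(dim=-2, keepdim=True).expand(Q.size(0), -1).clone()
    out[top_idx] = torch.softmax(scores[top_idx], dim=-1) @ V  # exact attention for top-u
    return out
```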
(2) Self-attention distilling: A distilling layer inserted between successive self-attention blocks extracts the prominent features, allocating increased weight to these salient features. The operation reduces the network parameter size and halves the input sequence length at each layer. The distilling operation from self-attention layer j to layer j + 1 is calculated as:
$$X_{j+1}^{t} = \mathrm{MaxPool}\left(\mathrm{ELU}\left(\mathrm{Conv1d}\left(\left[X_{j}^{t}\right]_{AB}\right)\right)\right)$$
where MaxPool is the maximum pooling operation, ELU is the activation function, Conv1d is the one-dimensional convolution operation, and $[X_{j}^{t}]_{AB}$ denotes the basic operations of the attention block, including the sparse self-attention mechanism.
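The distilling step maps directly onto standard PyTorch layers, as in the minimal sketch below. The width d_model = 512 and the kernel sizes are illustrative assumptions; Informer's reference implementation additionally uses circular padding and batch normalization, which are omitted here.

```python
# Self-attention distilling sketch: Conv1d -> ELU -> MaxPool halves the
# temporal length of the feature map passed to the next encoder layer.
import torch
import torch.nn as nn

class DistillingLayer(nn.Module):
    def __init__(self, d_model: int = 512):   # illustrative width
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.act = nn.ELU()
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); Conv1d expects (batch, channels, seq_len)
        x = self.pool(self.act(self.conv(x.transpose(1, 2))))
        return x.transpose(1, 2)               # (batch, seq_len // 2, d_model)
```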
(3) Generative decoder mechanism: The decoder employs a masked multi-head ProbSparse self-attention mechanism to prevent future, unknown information from influencing the current prediction. The generative decoder produces the complete prediction at once, significantly enhancing prediction speed. The input to the decoder comprises a historical load sequence, denoted $X_{token}^{t}$, serving as a reference, alongside the predicted sequence $X_{0}^{t}$, which is masked from future contextual factors. The decoder input is formed as follows:
$$X_{de}^{t} = \mathrm{Concat}\left(X_{token}^{t}, X_{0}^{t}\right) \in \mathbb{R}^{(L_{token}+L_{y}) \times d_{model}}$$
where $X_{de}^{t}$ is the t-th input sequence of the decoder, $X_{token}^{t}$ is the start token of the t-th sequence, and $X_{0}^{t}$ is a placeholder for the target portion of the t-th sequence.
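Concretely, the decoder input can be assembled as in the short sketch below, where the last L_token encoder steps act as the start token and the L_y positions to be predicted are zero placeholders; the helper name and shapes are illustrative, not the paper's code.

```python
# Building the generative decoder input X_de = Concat(X_token, X_0):
# a known history segment followed by zero placeholders for the targets.
import torch

def build_decoder_input(x_enc: torch.Tensor, L_token: int, L_y: int) -> torch.Tensor:
    # x_enc: (batch, seq_len, d_model)
    x_token = x_enc[:, -L_token:, :]                              # start token X_token
    x_zero = x_enc.new_zeros(x_enc.size(0), L_y, x_enc.size(-1))  # placeholder X_0
    return torch.cat([x_token, x_zero], dim=1)                    # (batch, L_token + L_y, d_model)
```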

3.2. Distributed Training

In recent years, the continuous advancement of graphics processing units (GPUs) has led to significant breakthroughs in areas such as big data analysis, deep learning, and large-scale model training, owing to their formidable computational capabilities. Consequently, an increasing number of researchers and scholars opt for GPUs over CPUs in deep learning training. Moreover, to expedite training, distributed training methods are employed. Model parallelism and data parallelism are the two predominant parallel schemes employed in distributed training [20]. Typically, the latter incurs lower communication overhead compared to the former [21]. Considering the model’s structure and the associated parallelism overhead, the data parallelism approach was chosen.
The essence of employing data parallelism in distributed training is to handle data volumes that exceed the memory of a single GPU, thereby enhancing training efficiency. The hardware setup must include multiple GPUs, each holding a complete, identical copy of the model, and the data are partitioned and scattered to the GPUs before training. Forward and backward propagation are the central components of the process. During forward propagation, the data are scattered to the GPUs, each of which holds a model with identical parameters; the GPUs then train the model concurrently, each producing an output. The primary GPU (labeled GPU1 in Figure 3a,b) gathers the outputs from each GPU and computes the loss, producing n losses. During backpropagation, the primary GPU scatters the n losses to the corresponding GPUs for gradient computation, then gathers the gradients and updates the model parameters. The updated parameters are finally replicated to each GPU for the next iteration. Schematic diagrams of distributed training are depicted in Figure 3, and a code sketch follows below.
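The scatter/gather scheme of Figure 3, with a primary GPU that collects outputs and computes the loss, matches the semantics of PyTorch's torch.nn.DataParallel, so a plausible minimal realization looks like the sketch below. The model, data loader, learning rate, and MSE objective are placeholders rather than the paper's exact configuration.

```python
# Data-parallel training sketch: inputs are scattered across the visible GPUs,
# replicas run forward in parallel, and the primary GPU (cuda:0) gathers the
# outputs, computes the loss, and drives the parameter update.
import torch
import torch.nn as nn

def train_data_parallel(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4):
    device = torch.device("cuda:0")                    # primary GPU
    model = nn.DataParallel(model).to(device)          # one replica per visible GPU
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)              # outputs gathered on cuda:0
            loss.backward()                            # gradients computed on each replica
            optimizer.step()                           # parameters updated for next iteration
    return model
```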

3.3. Distriformer Rockburst Prediction Model

In predicting rockburst, precision and timeliness are critical, as even a slight advance in detection time could prevent substantial losses in both assets and human lives. Thus, this paper proposes a distributed training model for rockburst prediction called Distriformer, with the aim of enhancing prediction accuracy and reducing prediction time.
The Distriformer rockburst prediction model enhances the self-attention mechanism by introducing the ProbSparse self-attention mechanism and self-attention distilling. This aims to reduce computational complexity while effectively capturing long-distance dependencies and improving prediction accuracy. In addition, the model uses a distributed training strategy to accelerate model training through parallel data processing across multiple computing nodes. The overall schematic diagram of the Distriformer rockburst prediction model is presented in Figure 4.
The rockburst data are first partitioned into as many equal segments as there are GPUs and distributed among them; each GPU holds an identical copy of the rockburst model and its parameters. The GPUs then train the model in parallel. The rockburst data first pass through the encoder layer, where the ProbSparse self-attention layer uses the sparse attention calculation to improve computational efficiency and self-attention distilling compresses the rockburst features, reducing the dimensionality and the number of network parameters. Stacking these two operations enhances the robustness of the rockburst prediction model. The data then traverse the decoder layer, undergo masked self-attention, and culminate in single-step generation of the rockburst prediction. The primary GPU gathers the outputs and calculates the loss; during backpropagation, each GPU receives the loss values computed by the rockburst model and calculates the corresponding gradients, which are aggregated on the primary GPU. The primary GPU then distributes the updated model parameters to the individual GPUs for continual optimization. Finally, the trained model is employed to forecast the rockburst outcomes. A sketch of the data-partitioning step follows below.
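One plausible way to realize the equal partitioning of the rockburst data across GPUs is PyTorch's DistributedSampler, sketched below; the dataset object and batch size are placeholders, and the paper does not specify which partitioning utility was used.

```python
# Sharding sketch: DistributedSampler splits the dataset into equally sized,
# disjoint per-process shards, one per GPU, re-shuffled every epoch via
# sampler.set_epoch(epoch) in the training loop.
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def make_sharded_loader(dataset, batch_size: int = 32) -> DataLoader:
    sampler = DistributedSampler(             # requires an initialized process group
        dataset,
        num_replicas=dist.get_world_size(),   # number of participating GPUs/processes
        rank=dist.get_rank(),                 # this process's shard index
        shuffle=True,
    )
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```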

4. Experiments and Analysis

4.1. Experimental Setup

The experimental configurations are detailed in Table 1 below.
The foundational experimental framework of the rockburst prediction method outlined in this paper comprises six essential components: data acquisition, data processing, data segmentation, model training, model evaluation, and analysis of results, visually depicted in Figure 5.
Data acquisition was carried out using the acoustic emission system in the indoor simulated rockburst experiments to collect the corresponding raw rockburst datasets. Subsequently, following data processing, pertinent data features were extracted, constituting the rockburst datasets for this experiment. The training set was utilized for model training, whereas the test and validation sets were employed for model evaluation. Three types of rockburst prediction models, namely Transformer, Informer, and Distriformer, underwent comparative analysis during model training to ascertain the optimal performer. The model evaluation process employed two widely used error assessment techniques: mean absolute error (MAE) and root mean square error (RMSE). The training time was measured by the average time taken for one epoch and the acceleration ratio. Finally, the results of the experiment were analyzed.
This paper employs two evaluation indices, namely MAE and RMSE, to assess the efficacy of rockburst prediction. The formulas for their calculation are as follows:
$$\mathrm{MAE}(y, \tilde{y}) = \frac{1}{n}\sum_{i=0}^{n-1}\left|y_i - \tilde{y}_i\right|$$

$$\mathrm{RMSE}(y, \tilde{y}) = \sqrt{\frac{1}{n}\sum_{i=0}^{n-1}\left(y_i - \tilde{y}_i\right)^{2}}$$
where $\tilde{y}$ represents the predicted value of rockburst, $y$ denotes the true value of rockburst, and $n$ signifies the total number of predicted points.
The experiments employed an acceleration ratio to quantify the duration needed for training the rockburst prediction model. The acceleration ratio is defined as the ratio of the average time per epoch required to train the two models. The formula is as follows:
$$S = \frac{T_{1}}{T_{2}}$$
where $S$ represents the acceleration ratio, $T_{1}$ denotes the longer average training duration per epoch, and $T_{2}$ denotes the shorter average training duration per epoch.
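For concreteness, the three evaluation quantities reduce to a few lines of NumPy, as in the sketch below; the function names are ours, not the paper's.

```python
# Evaluation sketch: MAE and RMSE over predicted points, plus the
# acceleration ratio S = T1 / T2 over average per-epoch training times.
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def acceleration_ratio(t_slow: float, t_fast: float) -> float:
    return t_slow / t_fast   # longer average epoch time over shorter

# With Table 4's ROCKBURST_S1 timings, acceleration_ratio(13.6, 8.0) == 1.70.
```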

4.2. Experimental Data

The present study encompasses an analysis of five distinct rockburst datasets, each with a substantial volume of samples to ensure robust predictive modeling. Specifically, the dataset ROCKBURST_S1 comprises 13,962 samples, ROCKBURST_S2 contains 27,572 samples, ROCKBURST_S3 encompasses 22,804 samples, ROCKBURST_S4 includes 31,210 samples, and ROCKBURST_S5 consists of 12,010 samples. Table 2 shows the specific sample number of each rockburst dataset.

4.3. Result Analysis

4.3.1. Accuracy Analysis

Table 3 presents the evaluation metrics for each rockburst prediction model, assessed using the MAE and RMSE metrics. Lower values denote increased accuracy in model predictions.
Table 3 illustrates that the Distriformer model significantly outperforms the Transformer model in both MAE and RMSE, reducing the MAE by 64.6%, 66.7%, 32.7%, 31.7%, and 26.4% and the RMSE by 62.4%, 24.4%, 31.1%, 14.2%, and 21.5% on the five datasets, respectively. The experimental results also indicate that the Distriformer model exhibits reduced error compared to the baseline model (Informer), yielding better overall performance.
Figure 6 and Figure 7 present bar charts of the MAE and RMSE of the rockburst prediction models across the five datasets. The figures likewise illustrate the superior performance of the Distriformer model.

4.3.2. Training Time Analysis

To facilitate a comparison of model training duration, the average time required for training one epoch was measured, and the acceleration ratio was introduced for emphasis. Table 4 displays the mean duration necessary for training per epoch across various rockburst prediction models.
As indicated in Table 4, across all five datasets Informer required the longest duration, followed by Transformer, with Distriformer exhibiting the shortest time. Thus, the acceleration ratio of the Informer model was set to 1 for each dataset, and the acceleration ratios of the other two models were computed against it. The acceleration ratio of the Transformer model ranges from 1.11 to 1.14, whereas the Distriformer model exhibits a minimum ratio of 1.63 and a maximum of 1.87, decreasing training time by 38.8% to 46.4% compared with the Informer model.
Figure 8 illustrates a comparison of the average epoch training times for the model, providing a clearer visualization of the time disparities among the three. The x-axis of the graph represents time in seconds, while the y-axis denotes the corresponding dataset. The time disparity between the Transformer and Informer models is marginal, whereas the Distriformer model significantly diminishes training duration.
Figure 9 shows the line graph of the acceleration ratio for each model. The figure illustrates that the Distriformer model achieved the highest acceleration ratio across all datasets, implying the efficacy of the proposed model in reducing training time for rockburst prediction.
In summary, the Distriformer rockburst prediction model performs excellently. With distributed training, it outperforms the original Informer model across all measured indices, confirming the efficacy of distributed training for Informer. The method enhances the accuracy of rockburst prediction, yielding an average decrease of 44.4% in MAE and 30.7% in RMSE relative to Transformer, and accelerates training, achieving an average acceleration ratio of 1.72. The issues of low model accuracy and extended model training duration are thus partially addressed.

5. Conclusions and Outlook

This paper innovatively introduces distributed training methods into rockburst prediction to solve the problem of training time. The Distriformer model for rockburst prediction achieves a balance between prediction accuracy and timeliness, as evidenced by experimental results validating its feasibility. Application of this method in real-world scenarios holds the potential to enhance rockburst prediction accuracy, bolster mining operation safety, and reduce model training time. Simultaneously, it establishes the groundwork for subsequent rockburst prediction, facilitating the online prediction and processing of substantial volumes of rockburst data.
Nevertheless, the research presented in this paper is not without its limitations. Future research endeavors could be enhanced by focusing on the following areas.
(1) Data sources: This study exclusively employed data generated from indoor simulated rockburst. Future research endeavors can leverage real-time training and prediction using rockburst data from operational mines.
(2) Federated Learning: Employing federated learning techniques enables collaborative training and data sharing among units, safeguarding individual data privacy, and thereby enhancing the precision and applicability of rockburst prediction models.

Author Contributions

Conceptualization, Y.Z., K.F. and Z.G.; methodology, Y.Z., K.F. and Z.G.; validation, Y.Z.; formal analysis, K.F.; investigation, Y.Z., K.F. and Z.G.; resources, Y.Z.; data curation, Y.Z., K.F. and Z.G.; writing—original draft preparation, Y.Z., K.F. and Z.G.; writing—review and editing, Y.Z. and K.F.; visualization, K.F.; supervision, K.F.; project administration, Y.Z. and K.F.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Institute for Deep Underground Science and Engineering, grant number XD2021021, the BUCEA Post Graduate Innovation Project, grant number PG2024099, and the Postgraduate education and teaching quality improvement project of Beijing University of Civil Engineering and Architecture, grant number J2022003.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wu, M.; Ye, Y.; Wang, Q.; Hu, N. Development of rockburst research: A comprehensive review. Appl. Sci. 2022, 12, 974. [Google Scholar] [CrossRef]
  2. Li, K.; Wu, Y.; Du, F.; Zhang, X.; Wang, Y. Prediction of rockburst intensity grade based on convolutional neural network. Coal Geol. Explor. 2023, 51, 94–103. [Google Scholar] [CrossRef]
  3. Zhang, J.; Xu, K.; Reniers, G.; You, G. Statistical analysis the characteristics of extraordinarily severe coal mine accidents (ESCMAs) in China from 1950 to 2018. Process Saf. Environ. Prot. 2020, 133, 332–340. [Google Scholar] [CrossRef]
  4. Liu, J.; Gao, Y.; Chen, F.; Cao, Z. Mechanism Study and Tendency Judgement of Rockburst in Deep-Buried Underground Engineering. Minerals 2022, 12, 1241. [Google Scholar] [CrossRef]
  5. Li, X.; Chen, D.; Fu, J.; Liu, S.; Geng, X. Construction and application of fuzzy comprehensive evaluation model for rockburst based on microseismic monitoring. Appl. Sci. 2023, 13, 12013. [Google Scholar] [CrossRef]
  6. Luo, D.; Lu, S.; Su, G.; Tao, H. Experimental study on rock burst of granite with prefabricated single crack under true-triaxial stress condition with a free face. Rock Soil Mech. 2023, 44, 75–87. [Google Scholar] [CrossRef]
  7. Liu, X.; Zhang, S.; Wang, E.; Zhang, Z.; Wang, Y.; Yang, S. Multi-index geophysical monitoring and early warning for rockburst in coal mine: A case study. Int. J. Environ. Res. Public Health 2022, 20, 392. [Google Scholar] [CrossRef] [PubMed]
  8. Liu, H.; Zhao, G.; Xiao, P.; Yin, Y. Ensemble tree model for long-term rockburst prediction in incomplete datasets. Minerals 2023, 13, 103. [Google Scholar] [CrossRef]
  9. Jin, A.; Basnet, P.M.S.; Mahtab, S. Microseismicity-based short-term rockburst prediction using non-linear support vector machine. Acta Geophys. 2022, 70, 1717–1736. [Google Scholar] [CrossRef]
  10. Jia, Z.; Wang, Y.; Wang, J.; Pei, Q.; Zhang, Y. Rockburst Intensity Grade Prediction Based on Data Preprocessing Techniques and Multi-model Ensemble Learning Algorithms. Rock Mech. Rock Eng. 2024, 1–21. [Google Scholar] [CrossRef]
  11. Owusu-Ansah, D.; Tinoco, J.; Lohrasb, F.; Martins, F.; Matos, J. A decision tree for rockburst conditions prediction. Appl. Sci. 2023, 13, 6655. [Google Scholar] [CrossRef]
  12. Sun, J.; Wang, W.; Xie, L. Predicting Short-Term Rockburst Using RF–CRITIC and Improved Cloud Model. Nat. Resour. Res. 2024, 33, 471–494. [Google Scholar] [CrossRef]
  13. Liu, J.; Zhou, Z. Rockburst Grade Prediction Based on Modified Scatter Graph Matrix and Random Forest. Nonferrous Met. Eng. 2022, 12, 120–128. [Google Scholar] [CrossRef]
  14. Zhou, J.; Guo, H.; Koopialipoor, M.; Jahed Armaghani, D.; Tahir, M.M. Investigating the effective parameters on the risk levels of rockburst phenomena by developing a hybrid heuristic algorithm. Eng. Comput. 2021, 37, 1679–1694. [Google Scholar] [CrossRef]
  15. Zhang, W.; Bi, X.; Hu, L.; Li, P.; Yao, Z. Performance and Applicability of Recognizing Microseismic Waveforms Using Neural Networks in Tunnels. KSCE J. Civ. Eng. 2024, 28, 951–966. [Google Scholar] [CrossRef]
  16. Yang, X.; Pei, Y.; Cheng, H.; Hou, X.; Lu, J. Prediction method of rockburst intensity grade based on SOFM neural network model. Chin. J. Rock Mech. Eng. 2021, 40, 2708–2715. [Google Scholar] [CrossRef]
  17. Di, Y.; Wang, E.; Li, Z.; Liu, X.; Huang, T.; Yao, J. Comprehensive early warning method of microseismic, acoustic emission, and electromagnetic radiation signals of rock burst based on deep learning. Int. J. Rock Mech. Min. Sci. 2023, 170, 105519. [Google Scholar] [CrossRef]
  18. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  19. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115. [Google Scholar] [CrossRef]
  20. Dai, M.; Yuan, J.; Huang, Q.; Lin, X.; Wang, H. Distributed Encoding and Updating for SAZD Coded Distributed Training. IEEE Trans. Parallel Distrib. Syst. 2023, 7, 2124–2137. [Google Scholar] [CrossRef]
  21. Wei, J.; Zhang, X.; Wang, L.; Zhao, M.; Dong, X. MC2 Energy Consumption Model for Massively Distributed Data Parallel Training of Deep Neural Networks. J. Comput. Res. Dev. 2023, 1–21. [Google Scholar] [CrossRef]
Figure 1. The strain rockburst physical simulation test system.
Figure 2. Stress state of rock samples: (a) force state of the rock sample and (b) unloaded state of the rock sample.
Figure 3. Distributed training flowchart: (a) forward propagation schematic and (b) backward propagation schematic.
Figure 4. Schematic diagram of the Distriformer rockburst prediction model.
Figure 5. Experimental framework diagram.
Figure 6. Status of MAE assessment indicators.
Figure 7. Status of RMSE assessment indicators.
Figure 8. Comparison graph of model training time.
Figure 9. Acceleration ratio line graph.
Table 1. Experimental configurations.

Environment             Name                   Configuration Information
Hardware environment    CPU                    Intel(R) Core(TM) i9-7900X CPU @ 3.30 GHz
                        Random Access Memory   64 GB
                        GPU                    4 × NVIDIA GeForce GTX 1080
Software environment    Operating System       Ubuntu 18.04
                        Programming Language   Python 3.7
                        Basic Framework        PyTorch 1.10
                        Programming Software   PyCharm
Table 2. Rockburst dataset sample sizes.

Dataset         Number of Samples
ROCKBURST_S1    13,962
ROCKBURST_S2    27,572
ROCKBURST_S3    22,804
ROCKBURST_S4    31,210
ROCKBURST_S5    12,010
Table 3. Rockburst model accuracy assessment.

Dataset         Model          MAE      RMSE
ROCKBURST_S1    Transformer    0.096    0.101
                Informer       0.036    0.042
                Distriformer   0.034    0.038
ROCKBURST_S2    Transformer    0.111    0.160
                Informer       0.048    0.131
                Distriformer   0.037    0.121
ROCKBURST_S3    Transformer    0.836    10.604
                Informer       0.581    7.326
                Distriformer   0.563    7.302
ROCKBURST_S4    Transformer    0.458    9.928
                Informer       0.339    8.725
                Distriformer   0.313    8.521
ROCKBURST_S5    Transformer    0.893    6.719
                Informer       0.690    5.314
                Distriformer   0.657    5.272
Table 4. Time required for model training.

Dataset         Model          Average Time per Epoch (s)    Acceleration Ratio
ROCKBURST_S1    Transformer    12.3                          1.11
                Informer       13.6                          1
                Distriformer   8.0                           1.70
ROCKBURST_S2    Transformer    24.4                          1.11
                Informer       27.1                          1
                Distriformer   15.9                          1.70
ROCKBURST_S3    Transformer    19.7                          1.14
                Informer       22.5                          1
                Distriformer   13.3                          1.69
ROCKBURST_S4    Transformer    27.1                          1.14
                Informer       30.8                          1
                Distriformer   16.5                          1.87
ROCKBURST_S5    Transformer    10.3                          1.13
                Informer       11.6                          1
                Distriformer   7.1                           1.63