Article

Bagged Ensemble of Gaussian Process Classifiers for Assessing Rockburst Damage Potential with an Imbalanced Dataset

1
School of Resource Environment and Safety Engineering, University of South China, Hengyang 421001, China
2
China Tin Group Co., Ltd., Liuzhou 545026, China
3
School of Resources and Safety Engineering, Central South University, Changsha 410083, China
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(18), 3382; https://doi.org/10.3390/math10183382
Submission received: 6 August 2022 / Revised: 10 September 2022 / Accepted: 14 September 2022 / Published: 17 September 2022
(This article belongs to the Section Computational and Applied Mathematics)

Abstract

The evaluation of rockburst damage potential plays a significant role in managing rockburst risk and guaranteeing the safety of personnel. However, it is still a challenging problem because of its complex mechanisms and numerous influencing factors. In this study, a bagged ensemble of Gaussian process classifiers (GPCs) is proposed to assess rockburst damage potential with an imbalanced dataset. First, a rockburst dataset including seven indicators and four levels is collected. To address classification problems with an imbalanced dataset, a novel model that integrates the under-sampling technique, Gaussian process classifier (GPC) and bagging method is constructed. Afterwards, the comprehensive performance of the proposed model is evaluated using the values of accuracy, precision, recall, and F1. Finally, the methodology is applied to assess rockburst damage potential in the Perseverance nickel mine. Results show that the performance of the proposed bagged ensemble of GPCs is acceptable, and the integration of data preprocessing, under-sampling technique, GPC, and bagging method can improve the model performance. The proposed methodology can provide an effective reference for the risk management of rockburst.

1. Introduction

With the increase in mining depth, rockbursting has become an increasingly prominent issue [1,2,3]. It is induced by the instantaneous release of elastic strain energy, and often is accompanied by ejection and collapse of massive rock [4,5,6]. Many mines have suffered rockburst disasters, causing serious economic losses and casualties. For example, a rockburst with a magnitude of 3.5 happened in Falconbridge nickel mine, resulting in four deaths [7]; a rockburst with a magnitude of 2.47 occurred in Junde coal mine, causing five deaths and the destruction of a shearer and scraper conveyor [8]; and a rockburst with a magnitude of 5.2 appeared in the Klerksdorp district of South Africa, leading to two deaths and fifty-eight injuries [9]. Due to such serious consequences, assessing rockburst damage potential is necessary and significant.
According to the differences between the locations of damage and seismic event, rockbursts can be classified into self-initiated rockbursts and remotely triggered rockbursts [10]. For the former, the locations of the damage and the seismic event coincide, whereas the latter is triggered by remote seismic events of relatively large magnitude. For different types of rockburst, the influencing factors are different, resulting in disparities in the evaluation of rockburst damage potential. This study aims to assess the damage potential of remotely triggered rockburst. Because the location of the damage is not consistent with that of the microseismic event, it is difficult to evaluate the damage potential based only on the microseismic event. The microseismic event information, stress wave propagation paths, and rock mass conditions on the excavation face should be considered simultaneously. Due to the complex mechanisms and numerous influencing factors, the evaluation of rockburst damage potential is still a difficult issue.
Scholars have proposed some methods to assess rockburst damage potential. Kaiser et al. [11] developed a rockburst damage assessment procedure. It mainly included four steps: propose rock and support damage scales, put forward an initial condition index, calculate the scaled distance, and establish relationships among the initial condition index, scaled distance, and rock and support damage scales. Durrheim et al. [12] summarized the influencing factors of rockburst damage according to the investigations of rockbursts in South African gold mines, which was valuable for the evaluation of rockburst damage potential. Brink et al. [13] proposed an approach for seismic risk evaluation, which can be summarized in four steps: determine an evaluation indicator system, score each sub-category according to the risk rating, calculate the score of each category, and determine the risk levels. Albrecht and Sharrock [14] investigated ten indicators that affect rockburst damage, and established the relationship between them and rockburst damage based on field rockburst incidences. With the increase of rockburst cases, machine learning (ML) algorithms were used to evaluate rockburst damage potential. Heal et al. [15] proposed the concept of excavation vulnerability potential (EVP), and then adopted logistic regression to assess rockburst damage potential. Zhou et al. [16] employed a stochastic gradient boosting approach for the evaluation of rockburst damage. Li et al. [17] put forward a rockburst damage scale index using rock engineering systems and artificial neural networks to evaluate rockburst damage.
When a large number of rockburst cases accumulate, ML is a possible way to evaluate rockburst damage potential [18,19]. However, due to the fact that most rockburst damage levels are slight, while the strong or even extremely strong type is relatively rare, the distribution of sample data for each level is usually imbalanced [20,21,22]. Considering the specific characteristics of rockburst data, two key issues need to be solved. The first one is the handling of the imbalanced rockburst dataset. Generally, classical ML algorithms are conceived on the premise of balanced datasets [23]. It is difficult to handle classification problems with an imbalanced dataset, especially for discriminating the minority category cases [24]. Therefore, traditional ML algorithms should be improved to deal with imbalanced datasets. The corresponding strategy can be roughly divided into four groups: data level, algorithm level, cost-sensitive level, and ensemble level [25]. Among them, combining bagging ensemble learning with under-sampling techniques is an effective way to deal with imbalanced datasets [26].
The second one is the selection of algorithms. A large number of ML algorithms have been used to solve classification problems. Although some other statistical algorithms, such as Monte Carlo methods, can also be adopted to solve multidimensional problems and obtain probabilistic results, the probability density functions need to be determined in advance [27,28,29]. The Gaussian process classifier (GPC) is a promising statistical model because it can deal with high-dimensional and nonlinear problems, tune hyperparameters directly based on training data, and obtain probabilistic outputs [30,31]. However, because rockburst data are imbalanced and noisy, a single GPC can hardly achieve a stable prediction ability. Ensemble learning can overcome this drawback to some extent by combining multiple base classifiers [32,33,34]. Combining bagging ensemble learning with Gaussian process classifiers (GPCs) may improve the generalization ability and robustness of models.
This study proposes a novel model that integrates the under-sampling technique, GPC, and bagging method to assess rockburst damage potential with an imbalanced dataset. First, the rockburst dataset is collected and preprocessed by the Yeo-Johnson transformation and standardization process. Then, the reliability of the proposed methodology is verified, and the comprehensive performance is evaluated using four metrics. Finally, the proposed bagged ensemble of GPCs is applied to assess rockburst damage potential in the Perseverance nickel mine.

2. Data Acquisition

According to the original work of Heal [35], a total of 254 rockburst cases were collected from 13 underground metal mines in Canada and Australia. These cases were obtained based on the rock mass failure conditions caused by a single microseismic event. This database contains 83 microseismic events and 254 failure locations. It indicates some failure locations are caused by the same microseismic event. Based on the damage status of rock mass and support, the degree of rockburst damage was divided into five levels: none (L1), low (L2), moderate (L3), high (L4), and strong (L5). Among them, L1 indicated the rock mass showed no damage or minor loss, and the support was not damaged; L2 indicated the rock mass was slightly damaged, less than 1 ton of rock was displaced, the support system was loaded, the meshes were loose and the plates were deformed; L3 indicated 1 ton to 10 tons of rock was displaced and some bolts were broken; L4 indicated 10 tons to 100 tons of rock was displaced and the support system was severely damaged; L5 indicated above 100 tons of rock was displaced and the support system was completely destroyed. As the damage locations of L1 were not reported during investigations, the original database only contained L2, L3, L4, and L5. The sample sizes at these levels were 116, 48, 63, and 27, respectively.
The original database included nine indicators: the ratio of total maximum principal stress to uniaxial compressive strength (I1), the energy capacity of the support system (I2), excavation span (I3), geology factor (I4), Richter magnitude of the seismic event (I5), distance between rockburst location and microseismic event (I6), peak particle velocity (I7), rock density (I8) and support types (I9). The specific meaning of each indicator is given in [35]. Different indicator combinations have a significant impact on evaluation results. Heal [35] and Zhou et al. [16] selected I1, I2, I3, I4 and I7; and Li et al. [17] chose I1, I2, I3, I4, I5, I7 and I8. Considering that I7 was calculated from I5 and I6 based on an empirical formula and that I9 was difficult to quantify, this study adopted I1, I2, I3, I4, I5, I6 and I8 to evaluate the rockburst damage.
From this dataset, it can be seen that there were some duplicate sample data, and some samples had the same indicator values, but the corresponding rockburst damage levels were different. To improve the prediction accuracy, the duplicate samples were first removed. For the samples with the same indicator values but different levels, only the samples with the highest level were selected to ensure safety. Consequently, the number of samples in the updated dataset was 236, and the sample sizes at L2, L3, L4 and L5 were 107, 45, 57 and 27, respectively. The corresponding ratio of sample sizes at different levels was 4.0:1.7:2.1:1.0. It shows that the distribution is relatively unbalanced, which may affect the accuracy of evaluation results. The detailed dataset was listed in Appendix A.
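The cleaning rule described above (drop exact duplicates, and keep only the highest level when identical indicator values carry different levels) can be sketched as follows. This is a minimal illustration rather than the authors' code; the function name `deduplicate_cases` and the toy rows are hypothetical and are not taken from the rockburst dataset.

```python
def deduplicate_cases(cases):
    """Collapse cases that share indicator values, keeping the highest level.

    `cases` is a list of (indicators, level) pairs, where `indicators` is a
    tuple of indicator values and `level` is an integer damage level.
    """
    highest = {}
    for indicators, level in cases:
        key = tuple(indicators)
        # Exact duplicates collapse onto one key; conflicting labels keep the max.
        if key not in highest or level > highest[key]:
            highest[key] = level
    return [(key, level) for key, level in highest.items()]


# Hypothetical toy rows (illustration only).
toy_cases = [
    ((60, 5, 4.2), 2),
    ((60, 5, 4.2), 2),  # exact duplicate
    ((60, 5, 4.2), 4),  # same indicators, higher level
    ((80, 8, 6.0), 3),
]
cleaned = deduplicate_cases(toy_cases)
```

Keeping the maximum level is the conservative choice stated in the text: when the indicators cannot separate two outcomes, the more severe one is retained.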
To quantitatively analyze the correlations between these seven indicators, the heat map of the correlation coefficient was obtained, as shown in Figure 1. It can be seen that some indicators were positively correlated, such as I1 and I2, whereas some indicators were negatively correlated, such as I1 and I3. Overall, the correlations between these indicators were generally small. Although I5 and I6 had the largest correlation coefficient of 0.48, they are distinct in their physical meanings. Therefore, these indicators were relatively independent, which verified the rationality of the selected indicators.
The box plot of all indicators for each rockburst damage level is shown in Figure 2. It can be seen that all indicators had some outliers, which were most obvious for I2, I3, I6 and I8. The ranges of indicator values for the various levels overlapped, so it was difficult to differentiate the level of rockburst damage using only one indicator. Moreover, there was no obvious correlation between the rockburst damage level and each indicator. In addition, the distribution of indicator values was uneven. All these characteristics illustrated the complexity of rockburst damage evaluation.

3. Methodology

3.1. Gaussian Process Classifier

GPC is a statistical learning algorithm based on the Gaussian process and Bayesian theory, which has a solid mathematical foundation. By assuming the implicit function obeys the prior distribution of a Gaussian process, the posterior distribution can be obtained according to Bayesian inference [36]. Then, the probability of different classes can be determined. The main calculation steps are as follows.
Suppose the training set is:
$$D = (X, Y) = \{(x_i, y_i) \mid y_i = \pm 1,\ i = 1, 2, \ldots, m\}, \tag{1}$$
where $x_i = [x_{1i}, x_{2i}, \ldots, x_{Bi}]$ is the input; $y_i$ is the output; and $m$ is the number of samples in the training set.
To reflect the mapping relationship between $x_i$ and $y_i$, the implicit function that obeys the Gaussian process distribution is defined as:
$$f = [f(x_1), f(x_2), \ldots, f(x_i), \ldots, f(x_m)]^{T}. \tag{2}$$
Suppose $f$ satisfies a Gaussian process distribution with a zero mean and covariance matrix $K$, then:
$$p(f \mid X) \sim \mathcal{N}(f \mid 0, K), \tag{3}$$
where $K$ can be calculated by a covariance function $k(x, x')$, which is specifically defined according to the actual situation.
In general, the radial basis function is selected as the covariance function:
$$k(x, x') = \theta_1 \exp\left(-\frac{\|x - x'\|^2}{\theta_2}\right), \tag{4}$$
where $\theta_1$ and $\theta_2$ are hyperparameters.
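As a small numerical illustration of the covariance function, the sketch below assumes the radial basis function takes the form $\theta_1 \exp(-\|x - x'\|^2 / \theta_2)$ and uses $\theta_1 = \theta_2 = 1.0$, the values adopted later in this study. The function names and sample points are hypothetical.

```python
import math


def rbf_kernel(x, x_prime, theta1=1.0, theta2=1.0):
    """Radial basis covariance, assumed form: theta1 * exp(-||x - x'||^2 / theta2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return theta1 * math.exp(-sq_dist / theta2)


def covariance_matrix(points, theta1=1.0, theta2=1.0):
    """Build K with K[i][j] = k(x_i, x_j) for all pairs of training inputs."""
    return [[rbf_kernel(p, q, theta1, theta2) for q in points] for p in points]


# Three hypothetical two-dimensional inputs.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = covariance_matrix(points)
```

The resulting matrix is symmetric with $\theta_1$ on the diagonal, and entries decay as the inputs move apart, which is what lets nearby samples share information in the prior.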
Based on Equations (3) and (4), the prior probability can be determined as:
$$p(f \mid X) = \frac{1}{(2\pi)^{m/2} |K|^{1/2}} \exp\left(-\frac{1}{2} f^{T} K^{-1} f\right). \tag{5}$$
Then, to obtain the probability of the predicted category, a likelihood function is used to map the output value of the implicit function to the interval $[0, 1]$. The logistic function is generally used as the likelihood function:
$$p(Y \mid f) = \psi(z) = \frac{1}{1 + \exp(-z)}. \tag{6}$$
Based on Bayes’ theorem, the posterior probability of the implicit function is:
$$p(\hat{f} \mid X, Y) = \frac{p(Y \mid f)\, p(f \mid X)}{p(Y \mid X)}, \tag{7}$$
where $p(Y \mid X)$ is the marginal likelihood function, which indicates the probability distribution of a training set.
$p(Y \mid X)$ can be calculated by:
$$p(Y \mid X) = \int p(D \mid f)\, p(f)\, df. \tag{8}$$
Suppose the sample to be predicted is $(\tilde{x}, \tilde{y})$; the probability of $\tilde{y} = +1$ can be determined by:
$$p(\tilde{y} = +1 \mid X, Y, \tilde{x}) = \int p(\tilde{y} \mid \hat{f})\, p(\hat{f} \mid X, Y, \tilde{x})\, d\hat{f}, \tag{9}$$
where $\hat{f}$ indicates the implicit function of $\tilde{x}$.
Since Equations (7)–(9) have no analytical solution, the Laplace approximation algorithm is often used to obtain the solutions. Namely, the posterior probability distribution $p(\hat{f} \mid X, Y)$ is first obtained, then the implicit function $\hat{f}$ can be determined.
Finally, the probability of y ˜ = + 1 can be calculated by
$$p(\tilde{y} = +1 \mid X, Y, \tilde{x}) = \int \psi(\hat{f})\, p(\hat{f} \mid X, Y, \tilde{x})\, d\hat{f}. \tag{10}$$
If $p(\tilde{y} = +1 \mid X, Y, \tilde{x}) \geq 0.5$, then the prediction result is a positive class; otherwise, it is a negative class.
For multi-class problems, the binary Gaussian process classifier can be extended with a “one-vs-rest” or “one-vs-one” strategy [37]. In the “one-vs-rest” strategy, one binary classifier is trained per class to separate that class from all remaining classes, and the class with the highest probability is selected as the final result. In the “one-vs-one” strategy, one binary classifier is trained for each pair of classes; each pairwise classification is equivalent to one vote, and the class with the most votes is selected as the final result.
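The two decision rules can be illustrated with a short sketch. It only covers the final aggregation step, assuming the binary classifiers have already produced their outputs; the function names, the probabilities and the pairwise winners below are hypothetical, and the tie-breaking behaviour (first label encountered) is an assumption rather than something specified in the text.

```python
from collections import Counter


def one_vs_rest_decision(class_probabilities):
    """Pick the class whose 'class vs rest' classifier gives the highest probability.

    `class_probabilities` maps each class label to the probability returned
    by the binary classifier trained for that class.
    """
    return max(class_probabilities, key=class_probabilities.get)


def one_vs_one_decision(pairwise_winners):
    """Pick the class with the most votes; each pairwise classifier casts one vote."""
    votes = Counter(pairwise_winners)
    return votes.most_common(1)[0][0]


# Hypothetical outputs for a four-level problem (L2 to L5).
ovr_result = one_vs_rest_decision({"L2": 0.55, "L3": 0.20, "L4": 0.62, "L5": 0.10})
# Four classes give six pairwise classifiers, hence six votes.
ovo_result = one_vs_one_decision(["L2", "L4", "L2", "L4", "L3", "L4"])
```

With four levels, one-vs-rest needs four binary classifiers and one-vs-one needs six, so the choice trades the number of models against the size of each binary training set.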

3.2. Bagged Ensemble of Gaussian Process Classifiers

A bagged ensemble of GPCs is proposed to handle classification problems with imbalanced datasets, as shown in Figure 3. This model integrates the under-sampling technique, GPC and bagging method. The under-sampling technique is used to make the training samples balanced. The samples of classes except the minority class are resampled and combined with the minority class samples into a new dataset. GPC has a strong probabilistic prediction ability for unknown data by learning from the existing dataset. By integrating multiple classifiers, the bagging method can avoid over-fitting to a certain extent, and has better anti-noise ability and robustness. In addition, the defect of data loss from a single under-sampling can be overcome through multiple under-samplings with replacement. The specific steps are as follows.
First, the under-sampling technique with replacement is used to generate the balanced sample sets from the original dataset.
Second, the GPCs are independently trained based on the generated training sets.
Last, the final result is obtained by integrating the evaluation results of each GPC based on a voting classifier.
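The three steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: every class other than the minority class is resampled with replacement down to the minority-class size, the vote is a simple majority, and `fit_nearest_mean` is a deliberately trivial placeholder standing in for a trained GPC. All names and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_undersample(samples, labels, rng):
    """Step 1: keep all minority samples, resample other classes to that size."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(group) for group in by_class.values())
    new_samples, new_labels = [], []
    for y, group in by_class.items():
        # Classes already at the minority size are kept as they are.
        chosen = list(group) if len(group) == n_min else rng.choices(group, k=n_min)
        new_samples.extend(chosen)
        new_labels.extend([y] * n_min)
    return new_samples, new_labels


def fit_nearest_mean(xs, ys):
    """Placeholder base learner (NOT a GPC): nearest class mean in one dimension."""
    sums, counts = defaultdict(float), Counter(ys)
    for x, y in zip(xs, ys):
        sums[y] += x
    means = {y: sums[y] / counts[y] for y in counts}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


def bagged_predict(samples, labels, x_new, fit, n_estimators, seed=0):
    """Steps 2 and 3: train one model per balanced set, then take a majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_estimators):
        xs, ys = balanced_undersample(samples, labels, rng)
        model = fit(xs, ys)  # `fit` returns a callable: model(x) -> label
        votes.append(model(x_new))
    return Counter(votes).most_common(1)[0][0]


# Hypothetical imbalanced toy data: six "L2" samples and two "L5" samples.
samples = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 5.0, 5.2]
labels = ["L2"] * 6 + ["L5"] * 2
prediction = bagged_predict(samples, labels, 4.8, fit_nearest_mean, n_estimators=12)
```

Because each balanced set draws a different subset of the larger classes, repeating the draw across the ensemble lets most majority-class samples contribute to at least one base classifier, which is the motivation given above for multiple under-samplings.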

3.3. Establishment of Rockburst Damage Evaluation Model

The proposed ensemble model is used to evaluate rockburst damage. The detailed procedure is shown in Figure 4, which is described as follows.
First, the original rockburst damage dataset is preprocessed based on the Yeo-Johnson transformation and standardized processing. In many modeling scenarios, data needs to be normalized to improve predictive performance. Power transformation maps sample data from an arbitrary distribution to a Gaussian distribution as close as possible. It builds a set of monotonic functions to stabilize variance and minimize skewness. There are two transformation methods: the Yeo-Johnson and Box-Cox transformation. Since the Box-Cox transformation only works for positive data, the Yeo-Johnson transformation is adopted in this study. The calculation formula is [38]:
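$$x_i^{(\lambda)} = \begin{cases} \dfrac{(x_i + 1)^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0,\ x_i \geq 0 \\ \ln(x_i + 1) & \text{if } \lambda = 0,\ x_i \geq 0 \\ -\dfrac{(-x_i + 1)^{2 - \lambda} - 1}{2 - \lambda} & \text{if } \lambda \neq 2,\ x_i < 0 \\ -\ln(-x_i + 1) & \text{if } \lambda = 2,\ x_i < 0 \end{cases}, \tag{11}$$
where $x_i$ is the data to be transformed; and $\lambda$ is a parameter, which can be estimated by the maximum likelihood method.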
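The four branches of the Yeo-Johnson transformation can be written out directly, as in the sketch below. It takes $\lambda$ as a given argument and does not perform the maximum likelihood estimation mentioned above; the function name and sample values are hypothetical. Library routines such as scikit-learn's `PowerTransformer(method="yeo-johnson")` also estimate $\lambda$, although the text does not state which implementation was used.

```python
import math


def yeo_johnson(x, lam):
    """Apply the Yeo-Johnson transformation to a single value for a given lambda."""
    if x >= 0:
        if lam != 0:
            return ((x + 1.0) ** lam - 1.0) / lam
        return math.log(x + 1.0)
    # Negative inputs use the mirrored branches with exponent (2 - lambda).
    if lam != 2:
        return -(((-x + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)
    return -math.log(-x + 1.0)


# Hypothetical values, including a negative one that Box-Cox could not handle.
transformed = [yeo_johnson(v, 0.5) for v in (-2.0, 0.0, 2.0)]
```

With $\lambda = 1$ the transformation reduces to the identity for both signs, which is a convenient sanity check on the branch logic.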
In addition, there is a large gap between some indicator values. For example, I8 is three orders of magnitude larger than I5. Left untreated, such an indicator may dominate the evaluation while the roles of the other indicators are ignored. Therefore, the indicator values need to be standardized. In this study, they are converted into a standard normal distribution with a mean of zero and a standard deviation of one. The conversion formula is:
$$x_i^{(\lambda)\prime} = \frac{x_i^{(\lambda)} - \mu}{\sigma}, \tag{12}$$
where $\mu$ is the mean value and $\sigma$ is the standard deviation of the sample data.
Second, the preprocessed dataset is randomly divided into training and test sets with a ratio of 4:1. Furthermore, the ratio of sample size for different levels in these two sets is kept consistent to make the results more stable.
Third, the hyperparameter of the bagged ensemble of GPCs is optimized using five-fold cross-validation. The number of GPCs is adopted as the hyperparameter to be optimized, and both hyperparameters $\theta_1$ and $\theta_2$ in the kernel function of the GPC are set to 1.0. Then, the optimal hyperparameter value is determined based on the average accuracy of five-fold cross-validation.
Fourth, the model with the optimal hyperparameter value is fitted based on the training set, and then the optimal training model is obtained.
Fifth, the comprehensive performance of the proposed methodology is evaluated based on the test set. The accuracy, precision, recall, and F1 are chosen as the evaluation metrics, which can be calculated by a confusion matrix.
Suppose the confusion matrix is:
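$$S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1q} \\ s_{21} & s_{22} & \cdots & s_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ s_{q1} & s_{q2} & \cdots & s_{qq} \end{bmatrix}, \tag{13}$$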
where q is the number of levels.
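Then, taking row $j$ of $S$ as the actual level and column $k$ as the predicted level, the accuracy can be calculated by
$$\text{Accuracy} = \frac{1}{\sum_{j=1}^{q} \sum_{k=1}^{q} s_{jk}} \sum_{j=1}^{q} s_{jj}; \tag{14}$$
The precision for level $j$ can be calculated by
$$\text{Precision} = \frac{s_{jj}}{\sum_{k=1}^{q} s_{kj}}; \tag{15}$$
The recall for level $j$ can be calculated by
$$\text{Recall} = \frac{s_{jj}}{\sum_{k=1}^{q} s_{jk}}; \tag{16}$$
The F1 can be calculated by
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \tag{17}$$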
Finally, if the prediction performance is reliable, the entire preprocessed dataset can be used as the training set to fit the model. Then, the rockburst damage level in actual engineering can be evaluated. Conversely, if the prediction performance is unreliable, improvements can be made in terms of database quality, data preprocessing, and evaluation models. Moreover, new cases can be adopted to update the original rockburst dataset, and the evaluation process in the next stage can be conducted.

4. Validity Verification

The collected rockburst dataset was used to verify the feasibility of the proposed methodology. Based on Equations (11) and (12), all indicator values were preprocessed. The distribution of indicator values before and after preprocessing was shown in Figure 5. The values of all indicators after preprocessing followed the standard normal distribution with a mean of zero and a standard deviation of one.
To make the model performance more reliable, the number of GPCs was optimized using five-fold cross-validation on the training set. The average accuracy of five-fold cross-validation for different numbers of GPCs is shown in Figure 6. It can be seen that the average accuracy does not increase monotonically with the number of GPCs, and the optimal number of GPCs was 12 because it gave the maximum average accuracy.
After the bagged ensemble of GPCs with the optimal hyperparameter value was fitted on the training set, it was used to evaluate the rockburst damage on the test set. The evaluation results were expressed by the confusion matrix defined by Equation (13), which were indicated as:
$$S = \begin{bmatrix} 17 & 4 & 0 & 1 \\ 2 & 6 & 1 & 0 \\ 2 & 0 & 6 & 4 \\ 0 & 1 & 3 & 2 \end{bmatrix}.$$
Based on Equation (14), the value of accuracy was 63.27%. If the levels L2 and L3 were merged into the low-risk group, and the levels L4 and L5 were merged into the high-risk group, then the accuracy values for the low-risk and high-risk groups were 93.55% and 83.33%, respectively.
According to Equations (15)–(17), the values of precision, recall and F1 corresponding to different levels were obtained, as shown in Figure 7. It can be seen that the evaluation performance for level L2 was the best, while that for level L5 was the worst. After comprehensively considering the values of precision, recall and F1, the ranking of the evaluation performance for different levels was L2 > L3 > L4 > L5.
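The reported figures can be checked directly from the confusion matrix. The sketch below assumes that rows are actual levels and columns are predicted levels (L2 to L5 in order); under that assumption it reproduces the 63.27%, 93.55% and 83.33% values quoted above. Variable and function names are illustrative.

```python
# Confusion matrix reported for the test set; rows assumed to be actual levels,
# columns assumed to be predicted levels, ordered L2, L3, L4, L5.
S = [
    [17, 4, 0, 1],
    [2, 6, 1, 0],
    [2, 0, 6, 4],
    [0, 1, 3, 2],
]
levels = ["L2", "L3", "L4", "L5"]
q = len(S)
total = sum(sum(row) for row in S)

# Overall accuracy: correctly classified samples over all samples.
accuracy = sum(S[j][j] for j in range(q)) / total


def group_accuracy(rows):
    """Share of samples in a risk group that are also predicted inside that group."""
    hits = sum(S[j][k] for j in rows for k in rows)
    n = sum(sum(S[j]) for j in rows)
    return hits / n


low_risk = group_accuracy([0, 1])   # L2 and L3 merged
high_risk = group_accuracy([2, 3])  # L4 and L5 merged

# Per-level precision, recall and F1.
metrics = {}
for j, name in enumerate(levels):
    predicted_as_j = sum(S[k][j] for k in range(q))  # column sum
    actual_j = sum(S[j])                             # row sum
    precision = S[j][j] / predicted_as_j
    recall = S[j][j] / actual_j
    f1 = 2 * precision * recall / (precision + recall)
    metrics[name] = (precision, recall, f1)
```

Under the same assumption, the F1 values decrease from L2 to L5, in line with the ranking stated above.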

5. Case Study

The proposed methodology was applied to evaluate the rockburst damage in the Perseverance nickel mine. The main orebody, hosted in ultramafic rocks, is mined using the sub-level caving method. The hanging wall is composed of stiff felsic volcanics and metasediments, which are prone to mining-induced seismicity. A microseismic monitoring system has been established in this mine, making it possible to manage rockburst risk based on microseismic data. Because most of the ramp and infrastructure are located in the hanging wall, it is necessary to evaluate the rockburst damage potential.
Heal [35] recorded twelve rockburst damage cases caused by six microseismic events at depths of 950 m to 1100 m in this mine. The specific data are shown in Table 1. The Richter magnitude of the microseismic events ranged from 1.5 to 2.2, and their real rockburst damage levels were between L2 and L4.
The proposed methodology was used to evaluate the rockburst damage levels for these twelve cases. First, these cases were preprocessed together with the original rockburst dataset. Then, the preprocessed dataset was used as the training set to train the model. Finally, the rockburst damage levels of these twelve cases were obtained using the trained model, as shown in the last column of Table 1.
According to the evaluation results in Table 1, only the rockburst damage levels caused by microseismic events #2 and #6 were not identified. That is, the evaluation results of four cases were inconsistent with the actual situation, and the accuracy is 66.67%. In Heal’s method [35], the evaluation results of seven cases did not match the actual situation, and the accuracy is 41.67%. Therefore, the methodology proposed in this study improved the evaluation accuracy of rockburst damage to a certain extent.

6. Discussions

Since the dataset used in this study is the same as that in Heal [35], Zhou et al. [16], and Li et al. [17], the evaluation results using our methodology are compared with theirs. The comparison results are shown in Table 2. Among them, Heal [35] artificially synthesized 277 samples with level L1 according to the distribution of the existing data. The accuracy criteria used in the literature differ and fall into the following four categories: (1) the evaluation value corresponds to the actual value; (2) the evaluation value corresponds to the actual value or the neighboring value; (3) the evaluation value corresponds to the actual value after combining L1, L2 and L3 (or L2 and L3) into one group and L4 and L5 into another group; and (4) the evaluation value corresponds to the actual value after combining L2 and L3 into a group. According to these four evaluation criteria, the accuracy of the proposed method is 63.27%, 91.84%, 89.80% and 75.51%, respectively. From Table 2, it can be seen that the accuracy of the proposed method is higher than that of other methods under these four evaluation criteria. This verifies the effectiveness of the method proposed in this study to a certain extent.
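The four accuracy values follow from the test-set confusion matrix once each criterion is written as a condition on the actual and predicted levels. The sketch below assumes rows are actual levels and columns are predicted levels, ordered L2 to L5; the helper `share` and the grouping lists are illustrative names.

```python
# Test-set confusion matrix (rows: actual L2..L5, columns: predicted L2..L5; assumed).
S = [
    [17, 4, 0, 1],
    [2, 6, 1, 0],
    [2, 0, 6, 4],
    [0, 1, 3, 2],
]
q = len(S)
total = sum(sum(row) for row in S)


def share(condition):
    """Fraction of samples whose (actual, predicted) index pair satisfies `condition`."""
    hits = sum(S[j][k] for j in range(q) for k in range(q) if condition(j, k))
    return hits / total


two_groups = [0, 0, 1, 1]  # criterion 3: {L2, L3} versus {L4, L5}
merged_low = [0, 0, 1, 2]  # criterion 4: {L2, L3} merged, L4 and L5 kept separate

criteria = {
    1: share(lambda j, k: j == k),                          # exact match
    2: share(lambda j, k: abs(j - k) <= 1),                 # exact or neighboring level
    3: share(lambda j, k: two_groups[j] == two_groups[k]),  # same two-way risk group
    4: share(lambda j, k: merged_low[j] == merged_low[k]),  # same level after merging L2, L3
}
percentages = {c: round(v * 100, 2) for c, v in criteria.items()}
```

Writing the criteria as predicates makes it explicit that criteria 2 to 4 only relax what counts as a correct evaluation, so their values can never fall below the exact-match accuracy.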
Moreover, to further illustrate the reliability of the proposed method, it is also compared with the bagged ensemble of GPCs without preprocessing, the bagged ensemble of GPCs without under-sampling, and the GPC without under-sampling. The evaluation results of different approaches are shown in Table 3. It can be seen that the bagged ensemble of GPCs without data preprocessing has the lowest accuracy of 48.98%, which shows the importance of data preprocessing. Before preprocessing, the distribution of some indicators is skewed, and there is no clear distribution law for each indicator. When using the Yeo-Johnson transformation, the sample data is mapped from an arbitrary distribution to a distribution as close to Gaussian as possible to stabilize variance and minimize skewness. In addition, the influence of diverse dimensions and units on the evaluation results can be avoided after standardization. Therefore, the accuracy is improved by using data preprocessing. The bagged ensemble of GPCs and the GPC without under-sampling can identify level L2 well, but cannot achieve a reliable recognition of level L5. Moreover, the value of F1 is 0 in these methods, which illustrates that under-sampling has an important influence on the evaluation results. Because the distribution of different rockburst damage levels is relatively unbalanced, the prediction results are biased towards the level with a larger number of samples. When using the under-sampling technique, this influence can be avoided to some extent by balancing the training samples. Compared with GPC, the bagged ensemble of GPCs improves the evaluation accuracy, which indicates the bagging method can improve the evaluation performance. By integrating multiple GPCs using the bagging method, the generalization ability and robustness of the model can be increased to a certain extent. Therefore, the integration of data preprocessing, under-sampling technique, GPC, and bagging method improves the comprehensive performance.
Although the proposed method can evaluate the rockburst damage to some extent, there are still some shortcomings:
(1) The bagged ensemble of GPCs has better evaluation performance for level L2, but the evaluation performance for level L5 still needs to be improved. The reason may be that the sample size of L2 is the largest, while that of L5 is the smallest. A large number of samples can make the model fit better, which in turn improves the evaluation performance. Because the data-driven method is highly dependent on the quality of data, a higher-quality rockburst damage database should be established in the future.
(2) More indicators for rockburst damage evaluation need to be considered. According to the original rockburst damage database, some samples with the same indicator values have different levels. This shows that some key indicators are ignored, which may be an important factor restricting the evaluation accuracy of rockburst damage. In the future, some novel evaluation indicators may be proposed from the perspective of focal mechanisms and failure characteristics of rock mass under dynamic and static stress.
(3) Considering that the distribution of some indicators is skewed, the Gaussian process may yield inappropriate results. Although the proposed method can obtain relatively good results, the Gaussian process with skewed errors can be further used to investigate the evaluation performance [39,40].

7. Conclusions

To effectively assess rockburst damage potential with an imbalanced dataset, this study proposed a novel model by integrating the under-sampling technique, GPC, and bagging method. Based on the rockburst dataset preprocessed by the Yeo-Johnson transformation and standardization, the reliability of the proposed model was verified. The accuracy values of all samples, low risk (L2 and L3) and high risk (L4 and L5) were 63.27%, 93.55%, and 83.33%, respectively. According to the values of precision, recall and F1, the ranking of evaluation performance at different levels was L2 > L3 > L4 > L5. Under the evaluation criteria used in other literature, the accuracy of the proposed method was the highest, which further verified the reliability of our method. Compared with the other three methods (the bagged ensemble of GPCs without preprocessing, the bagged ensemble of GPCs without under-sampling and the GPC without under-sampling), the comprehensive performance of the proposed method was better. This indicated the effectiveness of integrating data preprocessing, the under-sampling technique, GPC and the bagging method in this study. The proposed methodology was applied to assess rockburst damage potential in the Perseverance nickel mine. The evaluation accuracy was 66.67%, which was 25 percentage points higher than that of the method in the original literature. Due to the improved performance, the evaluation results provided valuable guidance for the prevention of rockburst disasters.
In the future, a higher-quality database for the evaluation of rockburst damage potential should be established under a unified data acquisition standard. Because of the complex mechanisms of rockburst, some novel evaluation indicators are worth investigating based on focal mechanisms and failure characteristics of rock mass under dynamic and static stress. The Gaussian process with skewed errors can be obtained to investigate the evaluation performance after considering the skewness of indicator distributions. In addition, the proposed methodology can be applied in other mining and geotechnical engineering fields, such as pillar stability prediction and landslide risk analysis.

Author Contributions

This research was jointly performed by Y.C., Q.D., W.L., P.X., B.D. and G.Z. Conceptualization, Y.C. and W.L.; Methodology, W.L.; Validation, Y.C. and W.L.; Formal analysis, Q.D.; Investigation, P.X.; Resources, B.D.; Data curation, W.L. and G.Z.; Writing—original draft preparation, Q.D.; Writing—review and editing, Y.C. and W.L.; Funding acquisition, Y.C. and W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (52004130), the Provincial Natural Science Foundation of Hunan (2022JJ40601, 2022JJ40373), and the China Postdoctoral Science Foundation (2021M693799).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Rockburst Damage Dataset.
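Number | I1 | I2 | I3/(m) | I4 | I5 | I6/(m) | I8/(kg/m³) | Level
1 | 80 | 5 | 6.2 | 1 | −0.3 | 5 | 2700 | 4
2 | 60 | 5 | 4.2 | 0.5 | 1.7 | 20 | 2700 | 4
3 | 60 | 8 | 4.2 | 0.5 | 1.7 | 25 | 2700 | 2
4 | 80 | 8 | 6 | 0.5 | 1.8 | 10 | 2700 | 4
5 | 70 | 8 | 4 | 1 | 1.8 | 15 | 2700 | 2
6 | 40 | 5 | 3.8 | 1 | 0.4 | 5 | 2700 | 2
7 | 80 | 8 | 5.9 | 1 | 0.6 | 5 | 2700 | 2
8 | 90 | 8 | 6.8 | 1 | 0 | 5 | 2700 | 4
9 | 80 | 8 | 7 | 1 | 0 | 10 | 2700 | 2
10 | 80 | 8 | 7 | 1 | 2 | 5 | 2700 | 4
11 | 80 | 8 | 4.1 | 1 | 2 | 10 | 2700 | 4
12 | 70 | 8 | 9.5 | 1 | 2.2 | 5 | 2700 | 4
13 | 75 | 8 | 3.8 | 1 | 2.2 | 10 | 2700 | 3
14 | 75 | 8 | 4 | 1 | 2.2 | 10 | 2700 | 4
15 | 60 | 8 | 6.2 | 1 | 1.6 | 5 | 2700 | 2
16 | 60 | 10 | 10.5 | 0.5 | 1.6 | 5 | 2700 | 2
17 | 65 | 8 | 4.3 | 1 | 0.3 | 5 | 2700 | 2
18 | 60 | 5 | 5.6 | 0.5 | 1.5 | 5 | 2700 | 4
19 | 45 | 10 | 9.1 | 0.5 | 1.8 | 5 | 2700 | 5
20 | 43 | 5 | 9.3 | 0.5 | 1.8 | 5 | 2700 | 4
21 | 43 | 10 | 9.3 | 1 | 1.8 | 5 | 2700 | 2
22 | 43 | 10 | 9.4 | 0.5 | −0.2 | 5 | 2700 | 2
23 | 54 | 8 | 3.5 | 0.5 | 1.3 | 5 | 2700 | 2
24 | 45 | 8 | 3.6 | 1 | 1.3 | 10 | 2700 | 2
25 | 80 | 8 | 5.4 | 1 | 1.3 | 10 | 2700 | 2
26 | 50 | 8 | 7.8 | 1 | 1.3 | 15 | 2700 | 2
27 | 50 | 5 | 6.2 | 1 | 1 | 5 | 2700 | 3
28 | 50 | 5 | 5.1 | 0.5 | 1.2 | 5 | 2700 | 4
29 | 50 | 5 | 5.1 | 1 | 1.2 | 5 | 2700 | 2
30 | 50 | 8 | 8.3 | 1 | 0.7 | 5 | 2700 | 2
31 | 50 | 8 | 5.5 | 1 | 0.7 | 5 | 2700 | 2
32 | 60 | 10 | 8.8 | 1 | 2 | 5 | 2700 | 2
33 | 60 | 5 | 6.2 | 1 | 2 | 5 | 2700 | 2
34 | 60 | 5 | 5.2 | 1 | 2 | 5 | 2700 | 2
35 | 60 | 5 | 8.4 | 1 | 2 | 5 | 2700 | 2
36 | 60 | 5 | 6 | 1 | 2 | 5 | 2700 | 2
37 | 60 | 10 | 8.4 | 1 | 2 | 5 | 2700 | 2
38 | 60 | 5 | 5.3 | 1 | 2 | 5 | 2700 | 2
39 | 60 | 10 | 7 | 1 | 2 | 5 | 2700 | 2
40 | 60 | 10 | 5.4 | 0.5 | 2 | 5 | 2700 | 4
41 | 75 | 5 | 5.1 | 1 | 0.6 | 15 | 2700 | 3
42 | 70 | 5 | 5.1 | 1 | 0.6 | 10 | 2700 | 2
43 | 70 | 5 | 5.1 | 1 | 0.6 | 5 | 2700 | 3
44 | 75 | 5 | 5.2 | 1 | 1.3 | 10 | 2700 | 2
45 | 75 | 5 | 6.7 | 1 | 1.3 | 15 | 2700 | 2
46 | 75 | 5 | 5.1 | 1 | 1.3 | 20 | 2700 | 2
47 | 75 | 8 | 7 | 1 | 1.8 | 5 | 2700 | 3
48 | 75 | 8 | 5 | 1 | 1.8 | 10 | 2700 | 2
49 | 75 | 5 | 5.3 | 1 | 1.8 | 5 | 2700 | 3
50 | 35 | 2 | 5.3 | 1 | 3.1 | 15 | 2700 | 5
51 | 35 | 2 | 10.6 | 1 | 3.1 | 15 | 2700 | 5
52 | 35 | 10 | 10.6 | 1 | 3.1 | 20 | 2700 | 2
53 | 35 | 5 | 5.9 | 1.5 | 3.1 | 25 | 2700 | 4
54 | 35 | 2 | 10.6 | 1.5 | 3.1 | 25 | 2700 | 4
55 | 35 | 5 | 11.8 | 1.5 | 3.1 | 20 | 2700 | 2
56 | 35 | 5 | 10.6 | 0.5 | 3.1 | 25 | 2700 | 3
57 | 35 | 8 | 7.6 | 0.5 | 3.1 | 15 | 2700 | 4
58 | 41.7 | 5 | 7.6 | 1.5 | 3.1 | 15 | 2700 | 2
59 | 41.7 | 2 | 7 | 1 | 3.1 | 10 | 2700 | 4
60 | 35 | 10 | 10.6 | 0.5 | 3.1 | 15 | 2700 | 2
61 | 35 | 5 | 7 | 1 | 3.1 | 15 | 2700 | 4
62 | 40 | 2 | 14 | 1 | 2.8 | 15 | 2900 | 5
63 | 39 | 2 | 7 | 1 | 2.8 | 10 | 2900 | 4
64 | 39 | 2 | 6.5 | 1 | 2.8 | 50 | 2900 | 4
65 | 40 | 2 | 5.7 | 1 | 1.6 | 25 | 2900 | 3
66 | 42.86 | 2 | 6.4 | 1.5 | 1.6 | 20 | 2900 | 3
67 | 35.1 | 2 | 6.8 | 1.5 | 3.5 | 50 | 2900 | 2
68 | 38 | 10 | 9 | 0.5 | 3.5 | 10 | 2900 | 5
69 | 32.3 | 2 | 4.5 | 1.5 | 3.5 | 35 | 2900 | 3
70 | 43.7 | 8 | 7.7 | 0.5 | 3.5 | 20 | 2900 | 3
71 | 43.7 | 10 | 7 | 0.5 | 3.5 | 20 | 2900 | 3
72 | 43.7 | 2 | 4.2 | 1 | 3.5 | 15 | 2900 | 4
73 | 42.8 | 2 | 5 | 0.5 | 3.5 | 30 | 2900 | 5
74 | 42.8 | 10 | 10 | 0.5 | 3.5 | 30 | 2900 | 3
75 | 39.5 | 5 | 5 | 0.5 | 3.5 | 10 | 2900 | 4
76 | 44.4 | 10 | 15 | 1 | 3.5 | 15 | 2900 | 4
77 | 47.3 | 8 | 8.3 | 1 | 3.5 | 50 | 2900 | 3
78 | 47.3 | 8 | 5.5 | 1 | 3.5 | 50 | 2900 | 3
79 | 51.43 | 5 | 9.3 | 1.5 | 1.9 | 10 | 2900 | 3
80 | 39.4 | 5 | 9.5 | 1.5 | 2.1 | 15 | 2900 | 2
81 | 39.4 | 2 | 4.5 | 1 | 2.1 | 20 | 2900 | 3
82 | 39.4 | 10 | 11 | 1 | 2.1 | 25 | 2900 | 2
83 | 41.7 | 2 | 4.5 | 0.5 | 2.1 | 60 | 2900 | 3
84 | 40.8 | 10 | 10 | 0.5 | 2.1 | 30 | 2900 | 3
85 | 40.8 | 5 | 8 | 0.5 | 2.1 | 25 | 2900 | 3
86 | 44.1 | 5 | 18 | 1 | 2.1 | 50 | 2900 | 2
87 | 36 | 8 | 14 | 1 | 1.2 | 10 | 2900 | 2
88 | 36 | 8 | 6.5 | 0.5 | 1.2 | 15 | 2900 | 2
89 | 37.7 | 8 | 5.2 | 1 | 0.4 | 5 | 2900 | 3
90 | 40.8 | 8 | 4.2 | 1 | 1.8 | 5 | 2900 | 2
91 | 30 | 10 | 5.3 | 0.5 | 1.2 | 5 | 2800 | 3
92 | 30 | 8 | 5.4 | 0.5 | 1.2 | 5 | 2800 | 2
93 | 30 | 10 | 5.3 | 1 | 1.2 | 10 | 2800 | 2
94 | 30 | 10 | 5.2 | 1 | 1.2 | 10 | 2800 | 3
95 | 74 | 8 | 5.9 | 0.5 | 1.5 | 5 | 2800 | 5
96 | 74 | 8 | 5.3 | 1 | 1.5 | 10 | 2800 | 3
97 | 74 | 8 | 5.9 | 1 | 1.5 | 10 | 2800 | 2
98 | 74 | 8 | 6.5 | 1 | 1.5 | 20 | 2800 | 2
99 | 74 | 8 | 6.5 | 0.5 | 1.5 | 15 | 2800 | 4
100 | 71 | 8 | 8 | 1 | 0.9 | 5 | 2800 | 4
101 | 71 | 8 | 5.3 | 1 | 0.9 | 10 | 2800 | 2
102 | 71 | 10 | 8 | 1 | 0.9 | 10 | 2800 | 2
103 | 71 | 8 | 5.6 | 1 | 0.9 | 10 | 2800 | 2
104 | 71 | 10 | 20 | 0.5 | 2.1 | 5 | 2800 | 5
105 | 40 | 8 | 5.9 | 0.5 | 2.1 | 5 | 2700 | 4
106 | 40 | 8 | 5.3 | 0.5 | 2.1 | 5 | 2700 | 2
107 | 40 | 8 | 5.9 | 1 | 2.1 | 5 | 2700 | 2
108 | 40 | 8 | 6.5 | 1 | 2.1 | 10 | 2700 | 2
109 | 40 | 2 | 5.6 | 1 | 2.1 | 10 | 2700 | 4
110 | 40 | 2 | 5.8 | 1 | 2.1 | 5 | 2700 | 4
111 | 40 | 8 | 5.5 | 1 | 2.1 | 20 | 2700 | 2
112 | 70 | 5 | 8 | 1 | 0.8 | 5 | 2800 | 4
113 | 70 | 10 | 8 | 1 | 0.8 | 10 | 2800 | 2
114 | 70 | 10 | 5.5 | 1 | 0.8 | 10 | 2800 | 2
115 | 70 | 5 | 8 | 1 | 0.8 | 10 | 2800 | 4
116 | 70 | 8 | 5.1 | 1 | 0.8 | 10 | 2800 | 3
117 | 70 | 5 | 5.1 | 1 | 0.8 | 5 | 2800 | 2
118 | 70 | 5 | 5.1 | 1 | 0.8 | 10 | 2800 | 2
119 | 54 | 5 | 5.7 | 1 | 0.8 | 20 | 2700 | 2
120 | 54 | 10 | 9.1 | 0.5 | 0.8 | 25 | 2700 | 2
121 | 39 | 8 | 5.7 | 1 | 0.8 | 30 | 2800 | 2
122 | 84 | 5 | 7.7 | 1 | 2.9 | 10 | 2900 | 5
123 | 45 | 5 | 4.8 | 1 | 2.9 | 50 | 2900 | 2
124 | 84 | 10 | 7.4 | 1 | 2.9 | 50 | 2900 | 3
125 | 45 | 10 | 6.9 | 1 | 2.9 | 10 | 2900 | 2
126 | 45 | 5 | 7.4 | 1 | 2.9 | 10 | 2900 | 4
127 | 56 | 5 | 4.6 | 1 | 2.9 | 25 | 2900 | 2
128 | 18 | 5 | 5.8 | 0.5 | 0.4 | 10 | 3030 | 4
129 | 24 | 5 | 8.6 | 0.5 | 0.4 | 10 | 3030 | 3
130 | 95 | 10 | 6.9 | 1 | 1.5 | 5 | 2900 | 2
131 | 95 | 5 | 6.6 | 1 | 0.9 | 5 | 2900 | 5
132 | 45 | 10 | 5.1 | 1 | 1.6 | 5 | 2900 | 2
133 | 21 | 5 | 11.2 | 1.5 | 0.9 | 5 | 3030 | 2
134 | 21 | 5 | 6.1 | 1.5 | 0.9 | 5 | 3030 | 2
135 | 95 | 10 | 8 | 1 | 1.6 | 5 | 2900 | 5
136 | 39 | 10 | 5.3 | 1 | 1.6 | 15 | 2900 | 3
137 | 21 | 5 | 5.5 | 1 | 1.9 | 20 | 3030 | 3
138 | 24 | 5 | 8.7 | 0.5 | 1.5 | 10 | 3030 | 5
139 | 24 | 5 | 11 | 1 | 1.5 | 15 | 3030 | 2
140 | 67 | 10 | 5 | 1 | −0.2 | 5 | 2900 | 2
141 | 21 | 5 | 9 | 0.5 | 1.8 | 10 | 3030 | 4
142 | 21 | 10 | 9 | 1 | 1.8 | 10 | 3030 | 2
143 | 95 | 10 | 6.8 | 1 | 1 | 5 | 2900 | 4
144 | 73 | 25 | 6.8 | 1 | 1 | 5 | 2900 | 3
145 | 27 | 5 | 11.5 | 1 | 3.1 | 30 | 3030 | 4
146 | 27 | 5 | 7.6 | 1 | 3.1 | 40 | 3030 | 4
147 | 35 | 5 | 11.5 | 1 | 3.1 | 30 | 3030 | 4
148 | 50 | 25 | 4.5 | 1 | 3.1 | 40 | 2900 | 2
149 | 95 | 25 | 7.1 | 1 | 3.1 | 20 | 2900 | 2
150 | 73 | 25 | 4.7 | 1 | 3.1 | 30 | 2900 | 2
151 | 95 | 25 | 6.3 | 1 | 3.1 | 30 | 2900 | 2
152 | 73 | 25 | 4.4 | 1 | 3.1 | 40 | 2900 | 2
153 | 73 | 25 | 9.6 | 1 | 3.1 | 50 | 2900 | 2
154 | 54 | 25 | 4.8 | 1 | 3.1 | 60 | 2900 | 2
155 | 34 | 5 | 4.5 | 1 | 3.1 | 70 | 2900 | 3
156 | 25 | 10 | 11.6 | 0.5 | 1.4 | 5 | 3030 | 4
157 | 25 | 5 | 11.6 | 0.5 | 1.4 | 5 | 3030 | 4
158 | 24 | 5 | 12 | 1 | 2 | 5 | 3030 | 5
159 | 39 | 5 | 9 | 1 | 2 | 5 | 2900 | 4
160 | 25 | 5 | 5.1 | 1 | 1.3 | 5 | 3080 | 4
161 | 25 | 5 | 10.5 | 1 | 1.3 | 10 | 3080 | 3
162 | 25 | 5 | 7.8 | 1 | 1.3 | 10 | 3080 | 2
163 | 25.97 | 10 | 17 | 0.5 | 2 | 5 | 3080 | 5
164 | 25.97 | 8 | 6.2 | 0.5 | 2 | 10 | 3080 | 3
165 | 25.97 | 8 | 5.7 | 0.5 | 2 | 5 | 3080 | 4
166 | 25.97 | 5 | 5.4 | 1 | 2 | 10 | 3080 | 3
167 | 25.97 | 8 | 5.3 | 1 | 2 | 10 | 3080 | 2
168 | 25.97 | 8 | 5.6 | 0.5 | 2 | 5 | 3080 | 4
169 | 75 | 5 | 5.2 | 1 | 1.6 | 10 | 2800 | 5
170 | 75 | 5 | 5 | 1 | 1.6 | 5 | 2800 | 5
171 | 65 | 5 | 2 | 1 | 1.6 | 5 | 2800 | 2
172 | 50 | 5 | 9.1 | 1 | 1.4 | 5 | 2800 | 3
173 | 50 | 5 | 4.6 | 0.5 | 1.9 | 30 | 2800 | 3
174 | 70 | 8 | 5 | 0.5 | 1.6 | 5 | 4300 | 4
175 | 70 | 8 | 8 | 1 | 2 | 5 | 4300 | 4
176 | 70 | 8 | 6 | 0.5 | 2 | 5 | 4300 | 4
177 | 70 | 8 | 12 | 1 | 2 | 10 | 4300 | 3
178 | 67 | 10 | 11 | 0.5 | 2.5 | 15 | 4300 | 5
179 | 67 | 25 | 5 | 1 | 2.5 | 15 | 4300 | 2
180 | 67 | 5 | 5 | 1 | 2.5 | 15 | 4300 | 2
181 | 76 | 10 | 6 | 0.5 | 2.7 | 5 | 4300 | 5
182 | 40 | 10 | 9.2 | 0.5 | 1.1 | 5 | 4300 | 3
183 | 40 | 5 | 4.8 | 1 | 1.1 | 10 | 4300 | 2
184 | 40 | 10 | 11.7 | 0.5 | 2.5 | 5 | 4300 | 5
185 | 50 | 10 | 9 | 1 | 2.7 | 20 | 4300 | 4
186 | 50 | 5 | 6 | 1 | 2.1 | 5 | 4300 | 4
187 | 55 | 5 | 8 | 1 | 1.9 | 5 | 4300 | 5
188 | 55 | 5 | 6 | 1 | 2.3 | 5 | 4300 | 4
189 | 65 | 10 | 11 | 1 | 2.3 | 10 | 4300 | 4
190 | 55 | 5 | 6 | 0.5 | 0.9 | 5 | 4300 | 4
191 | 60 | 5 | 5.5 | 1 | 2.2 | 20 | 4300 | 4
192 | 50 | 5 | 5.5 | 0.5 | 1.4 | 5 | 4300 | 5
193 | 50 | 10 | 8.5 | 0.5 | 1.4 | 20 | 4300 | 2
194 | 50 | 10 | 5.5 | 0.5 | 1.4 | 40 | 4300 | 2
195 | 50 | 5 | 6 | 1 | 1.5 | 5 | 4300 | 3
196 | 50 | 10 | 30 | 1 | 1.7 | 5 | 4300 | 5
197 | 70 | 5 | 4.4 | 1.5 | 1.7 | 20 | 2700 | 2
198 | 70 | 10 | 4.6 | 0.5 | 2 | 5 | 2800 | 4
199 | 90 | 10 | 4.5 | 1.5 | 2 | 10 | 2850 | 2
200 | 70 | 5 | 5.2 | 1 | 2.1 | 5 | 2700 | 5
201 | 56.2 | 8 | 10 | 1 | 1 | 5 | 2870 | 4
202 | 56.2 | 10 | 10 | 1 | 1 | 10 | 2870 | 2
203 | 56.2 | 8 | 6 | 1 | 1 | 20 | 2870 | 2
204 | 57.8 | 8 | 6.1 | 1 | 1 | 5 | 2870 | 3
205 | 57.8 | 8 | 6.1 | 1.5 | 1 | 10 | 2870 | 3
206 | 57.8 | 8 | 6.5 | 1 | 1.5 | 5 | 2870 | 4
207 | 57.8 | 10 | 11.3 | 1 | 1.5 | 10 | 2870 | 2
208 | 57.8 | 10 | 6.5 | 1 | 1.5 | 10 | 2870 | 2
209 | 57 | 8 | 6.7 | 1 | 2.2 | 5 | 2870 | 4
210 | 57 | 10 | 9.5 | 1 | 2.2 | 5 | 2870 | 4
211 | 57 | 10 | 11.2 | 1 | 2.2 | 10 | 2870 | 2
212 | 57 | 8 | 6.4 | 1 | 2.2 | 25 | 2870 | 2
213 | 57 | 8 | 6.5 | 1 | 2.2 | 10 | 2870 | 2
214 | 57 | 10 | 11.5 | 0.5 | 1.7 | 5 | 2870 | 4
215 | 57 | 10 | 11 | 1 | 1.7 | 5 | 2870 | 2
216 | 57 | 10 | 11 | 1 | 1.7 | 10 | 2870 | 2
217 | 57 | 10 | 11.5 | 1 | 1.7 | 10 | 2870 | 2
218 | 57 | 10 | 7.4 | 1 | 1.7 | 15 | 2870 | 2
219 | 57.8 | 10 | 6.4 | 0.5 | 2.5 | 5 | 2870 | 5
220 | 57.8 | 10 | 11.2 | 0.5 | 2.5 | 5 | 2870 | 5
221 | 57.8 | 10 | 6.4 | 1 | 2.5 | 10 | 2870 | 2
222 | 57.8 | 10 | 10.6 | 1 | 2.5 | 10 | 2870 | 2
223 | 58.6 | 10 | 12.4 | 0.5 | 2.2 | 30 | 2870 | 2
224 | 58.6 | 10 | 5.9 | 1 | 2.2 | 30 | 2870 | 2
225 | 58.6 | 10 | 6.1 | 1 | 2.2 | 30 | 2870 | 2
226 | 59.3 | 8 | 8 | 1 | 2.2 | 5 | 2870 | 5
227 | 59.3 | 8 | 5.4 | 1 | 2.2 | 15 | 2870 | 3
228 | 59.3 | 8 | 10 | 1 | 2.2 | 10 | 2870 | 2
229 | 59.3 | 8 | 8 | 1 | 2.2 | 15 | 2870 | 3
230 | 59.3 | 8 | 8.4 | 1 | 2.2 | 15 | 2870 | 2
231 | 59.3 | 8 | 5 | 1 | 2.2 | 20 | 2870 | 2
232 | 70.3 | 10 | 6.9 | 0.5 | 2.3 | 5 | 2900 | 5
233 | 70.3 | 10 | 11 | 1 | 2.3 | 10 | 2900 | 2
234 | 70.3 | 10 | 5.5 | 1 | 2.3 | 15 | 2900 | 3
235 | 70.3 | 10 | 5.4 | 1 | 2.3 | 15 | 2900 | 2
236 | 72.2 | 8 | 4 | 1 | 1.6 | 5 | 2900 | 3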

References

  1. Keneti, A.; Sainsbury, B.A. Review of published rockburst events and their contributing factors. Eng. Geol. 2018, 246, 361–373.
  2. Gong, F.Q.; Yan, J.Y.; Li, X.B. A new criterion of rock burst proneness based on the linear energy storage law and the residual elastic energy index. Chin. J. Rock Mech. Eng. 2018, 37, 1993–2014.
  3. Hudyma, M.; Potvin, Y.H. An engineering approach to seismic risk management in hardrock mines. Rock Mech. Rock Eng. 2010, 43, 891–906.
  4. Sepehri, M.; Apel, D.B.; Adeeb, S.; Leveille, P.; Hall, R.A. Evaluation of mining-induced energy and rockburst prediction at a diamond mine in Canada using a full 3D elastoplastic finite element model. Eng. Geol. 2020, 266, 105457.
  5. Gong, F.Q.; Yan, J.Y.; Li, X.B.; Luo, S. A peak-strength strain energy storage index for rock burst proneness of rock materials. Int. J. Rock Mech. Min. Sci. 2019, 117, 76–89.
  6. Ortlepp, W.D.; Stacey, T.R. Rockburst mechanisms in tunnels and shafts. Tunn. Undergr. Sp. Technol. 1994, 9, 59–65.
  7. Hedley, D.G.F. A Five-Year Review of the Canada–Ontario Industry Rockburst Project; Special Report SP90, Division Report SP90-064; Canada Centre for Mineral and Energy Technology, Mining Research Laboratory: Ottawa, ON, Canada, 1990.
  8. Lu, C.P.; Liu, G.J.; Liu, Y.; Zhang, N.; Xue, J.H.; Zhang, L. Microseismic multi-parameter characteristics of rockburst hazard induced by hard roof fall and high stress concentration. Int. J. Rock Mech. Min. 2015, 76, 18–32.
  9. Durrheim, R.J. Mitigating the Risk of Rockbursts in the Deep Hard Rock Mines of South Africa: 100 Years of Research. Extracting the Science: A Century of Mining Research; Brune, J., Ed.; Society for Mining, Metallurgy, and Exploration, Littleton: New York, NY, USA, 2010; pp. 156–171.
  10. Kaiser, P.K.; McCreath, D.R.; Tannant, D.D. Canadian Rockburst Research Program 1990–1995; Mining Division of the Canadian Mining Industry Research Organization: Sudbury, ON, Canada, 1997.
  11. Kaiser, P.K.; Tannant, D.D.; McCreath, D.R.; Jesenak, P. Rockburst Damage Assessment Procedure, Rock Support in Mining and Underground Construction; Balkema, K.M., Ed.; CRC Press: Rotterdam, The Netherlands, 1992; pp. 639–647.
  12. Durrheim, R.J.; Roberts, M.K.C.; Haile, A.T.; Hagan, T.O.; Jager, J.A.; Handley, M.F.; Spottiswoode, S.M.; Ortlepp, W.D. Factors influencing the severity of rockburst damage in South African gold mines. In SARES 97—1st Southern African Rock Engineering Symposium; Gurtunca, R.G., Hagan, T.O., Eds.; SARES: Johannesburg, South Africa, 1997; pp. 17–24.
  13. Brink, A.; Hagan, T.O.; Spottiswoode, S.M.; Malan, D.F.; Glazer, S.N.; Lasocki, S. Survey and Assessment of Techniques Used to Quantify the Potential for Rock Mass Instability; Safety in Mines Research Advisory Committee: Pretoria, South Africa, 2000.
  14. Albrecht, J.; Sharrock, G. A Model to Forecast Rockburst Damage. Challenges in Deep and High Stress Mining; Australian Centre for Geomechanics: Perth, Australia, 2006; pp. 1–15.
  15. Heal, D.; Hudyma, M.; Potvin, Y. Evaluating rockburst damage potential in underground mining. In Proceedings of the 41st US Symposium on Rock Mechanics (USRMS), Golden, CO, USA, 17–21 June 2006; pp. 1020–1025.
  16. Zhou, J.; Shi, X.Z.; Huang, R.D.; Qiu, X.Y.; Chen, C. Feasibility of stochastic gradient boosting approach for predicting rockburst damage in burst-prone mines. Trans. Nonferr. Met. Soc. 2016, 26, 1938–1945.
  17. Li, N.; Zare Naghadehi, M.; Jimenez, R. Evaluating short-term rock burst damage in underground mines using a systems approach. Int. J. Min. Reclam. Environ. 2020, 34, 531–561.
  18. Pu, Y.Y.; Apel, D.B.; Liu, V.; Mitri, H. Machine learning methods for rockburst prediction-state-of-the-art review. Int. J. Min. Sci. Technol. 2019, 29, 565–570.
  19. Liang, W.Z.; Sari, Y.A.; Zhao, G.Y.; McKinnon, S.D.; Wu, H. Probability estimates of short-term rockburst risk with ensemble classifiers. Rock Mech. Rock Eng. 2021, 54, 1799–1814.
  20. Yin, X.; Liu, Q.S.; Pan, Y.C.; Huang, X.; Wu, J.; Wang, X.Y. Strength of stacking technique of ensemble learning in rockburst prediction with imbalanced data: Comparison of eight single and ensemble models. Nat. Resour. Res. 2021, 30, 1795–1815.
  21. Jiang, K.; Lu, J.; Xia, K.L. A novel algorithm for imbalance data classification based on genetic algorithm improved SMOTE. Arab. J. Sci. Eng. 2016, 41, 3255–3266.
  22. Xue, Y.G.; Li, G.K.; Li, Z.Q.; Wang, P.; Gong, H.M.; Kong, F.M. Intelligent prediction of rockburst based on Copula-MC oversampling architecture. Bull. Eng. Geol. Environ. 2022, 81, 209.
  23. López, V.; Fernández, A.; Herrera, F. On the importance of the validation technique for classification with imbalanced datasets: Addressing covariate shift when data is skewed. Inf. Sci. 2014, 257, 1–13.
  24. Chawla, N.V.; Japkowicz, N.; Kotcz, A. Special issue on learning from imbalanced data sets. SIGKDD Explor. 2004, 6, 1–6.
  25. Zhang, Z.; Krawczyk, B.; Garcia, S.; Rosales-Pérez, A.; Herrera, F. Empowering one-vs-one decomposition with ensemble learning for multi-class imbalanced data. Knowl.-Based Syst. 2016, 106, 251–263.
  26. Sun, B.; Chen, H.Y.; Wang, J.D.; Xie, H. Evolutionary under-sampling based bagging ensemble method for imbalanced data classification. Front. Comput. Sci. 2018, 12, 331–350.
  27. Todorov, V.; Dimov, I. Innovative digital stochastic methods for multidimensional sensitivity analysis in air pollution modelling. Mathematics 2022, 10, 2146.
  28. Sun, J.J.; Yeh, T.M.; Pai, F.Y. Application of Monte Carlo simulation to study the probability of confidence level under the PFMEA’s action priority. Mathematics 2022, 10, 2596.
  29. Heilmeier, A.; Graf, M.; Betz, J.; Lienkamp, M. Application of Monte Carlo methods to consider probabilistic effects in a race simulation for circuit motorsport. Appl. Sci. 2020, 10, 4229.
  30. Tama, B.A.; Lim, S. A comparative performance evaluation of classification algorithms for clinical decision support systems. Mathematics 2020, 8, 1814.
  31. Rinta-Koski, O.P.; Särkkä, S.; Hollmén, J.; Leskinen, M.; Andersson, S. Gaussian process classification for prediction of in-hospital mortality among preterm infants. Neurocomputing 2018, 298, 134–141.
  32. Liang, W.Z.; Sari, A.; Zhao, G.Y.; McKinnon, S.D.; Wu, H. Short-term rockburst risk prediction using ensemble learning methods. Nat. Hazards 2020, 104, 1923–1946.
  33. Sagi, O.; Rokach, L. Ensemble learning: A survey. Wires Data Min. Knowl. 2018, 8, e1249.
  34. Zhang, J.F.; Wang, Y.H.; Sun, Y.T.; Li, G.C. Strength of ensemble learning in multiclass classification of rockburst intensity. Int. J. Numer. Anal. Methods Geomech. 2020, 44, 1833–1853.
  35. Heal, D. Observations and Analysis of Incidences of Rockburst Damage in Underground Mines. Ph.D. Thesis, University of Western Australia, Perth, Australia, 2010.
  36. Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006.
  37. Santhanam, V.; Morariu, V.I.; Harwood, D.; Davis, L.S. A non-parametric approach to extending generic binary classifiers for multi-classification. Pattern Recogn. 2016, 58, 149–158.
  38. Yeo, I.K.; Johnson, R.A. A new family of power transformations to improve normality or symmetry. Biometrika 2000, 87, 954–959.
  39. Alodat, M.T.; Shakhatreh, M.K. Gaussian process regression with skewed errors. J. Comput. Appl. Math. 2020, 370, 112665.
  40. Benavoli, A.; Azzimonti, D.; Piga, D. A unified framework for closed-form nonparametric regression, classification, preference and mixed problems with Skew Gaussian Processes. Mach. Learn. 2021, 110, 3095–3133.
Figure 1. Heatmap of the correlation coefficient.
Figure 2. Box plot of indicators for each level.
Figure 3. Diagram of a bagged ensemble of Gaussian process classifiers.
Figure 4. The procedure of the ensemble model for the evaluation of rockburst damage.
Figure 5. Distribution of indicator values before and after preprocessing.
Figure 6. Average accuracy of five-fold cross validation under different numbers of GPCs.
Figure 7. Average accuracy of five-fold cross validation under different numbers of GPCs.
Table 1. Rockburst damage cases at Perseverance nickel mine.
Microseismic EventI1I2I3I4I5I6I8Actual LevelResults Using Heal’s Method [35]Evaluation Results in This Study
#157.7512.20.51.62142700L2L5L2
57.7512.20.51.62222700L2L5L2
47860.51.62292700L2L2L2
#247810.30.51.8102700L3L4L2
4786.60.51.8102700L2L2L4
46.955.90.51.8162700L3L3L4
#347.5104.80.51.5102700L2L2L2
47.5101011.5102700L2L2L2
#439.25511.8132700L2L1L2
43.48511.8132700L2L1L2
#5588120.51.6102700L2L5L2
#658.181112.252700L4L3L2
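The accuracy comparison for the Perseverance cases can be checked directly from the three level columns of Table 1. The short sketch below only transcribes those columns (actual level, Heal's method [35], and this study) for the 12 damage locations; the variable and function names are illustrative.

```python
# Levels read from Table 1, one entry per damage location (12 in total).
actual = ["L2", "L2", "L2", "L3", "L2", "L3", "L2", "L2", "L2", "L2", "L2", "L4"]
heal = ["L5", "L5", "L2", "L4", "L2", "L3", "L2", "L2", "L1", "L1", "L5", "L3"]
study = ["L2", "L2", "L2", "L2", "L4", "L4", "L2", "L2", "L2", "L2", "L2", "L2"]


def accuracy(predicted, truth):
    """Share of locations whose predicted level equals the actual level."""
    hits = sum(p == t for p, t in zip(predicted, truth))
    return hits / len(truth)


acc_heal = accuracy(heal, actual)
acc_study = accuracy(study, actual)
print(f"Heal's method: {acc_heal:.2%}")
print(f"This study:    {acc_study:.2%}")
print(f"Difference:    {(acc_study - acc_heal) * 100:.2f} percentage points")
```

Heal's method matches 5 of the 12 locations (41.67%), whereas the proposed model matches 8 of 12 (66.67%), a gap of 25 percentage points.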
Table 2. Comparison of evaluation results from different literature.
Evaluation Criterion | Level | Method | Accuracy
The evaluation value corresponds to the actual value | L1, L2, L3, L4, L5 | EVP [35] | 28.0%
 | L1, L2, L3, L4, L5 | EVP.PPV [35] | 24.4%
 | L2, L3, L4, L5 | Stochastic gradient boosting approach [16] | 61.22%
 | L2, L3, L4, L5 | The proposed method | 63.27%
The evaluation value corresponds to the actual value or the neighboring value | L1, L2, L3, L4, L5 | EVP [35] | 66.1%
 | L1, L2, L3, L4, L5 | EVP.PPV [35] | 72.4%
 | L2, L3, L4, L5 | The proposed method | 91.84%
The evaluation value corresponds to the actual value after combining L1, L2 and L3 (or L2 and L3) into one group while L4 and L5 into another group | L1, L2, L3, L4, L5 | EVP [35] | 71.3%
 | L1, L2, L3, L4, L5 | EVP.PPV [35] | 78.0%
 | L2, L3, L4, L5 | The proposed method | 89.80%
The evaluation value corresponds to the actual value after combining L2 and L3 into one group | L2, L3, L4, L5 | Rock engineering systems and artificial neural network [17] | 71%
 | L2, L3, L4, L5 | The proposed method | 75.51%
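The four accuracy values listed for the proposed method in Table 2 can be recomputed from its confusion matrix in Table 3. The sketch below assumes that the 16 numbers of that matrix are read row by row, with rows taken as actual levels and columns as evaluated levels in the order L2, L3, L4, L5; the helper names are illustrative.

```python
# Confusion matrix of the proposed method (Table 3).
# Assumption: rows are actual levels, columns are evaluated levels (L2..L5).
CM = [
    [17, 4, 0, 1],
    [2, 6, 1, 0],
    [2, 0, 6, 4],
    [0, 1, 3, 2],
]


def accuracy_under(cm, is_hit):
    """Fraction of samples whose (actual, evaluated) pair counts as a hit."""
    n = len(cm)
    total = sum(map(sum, cm))
    hits = sum(cm[i][j] for i in range(n) for j in range(n) if is_hit(i, j))
    return hits / total


def same_group(groups):
    """Hit when actual and evaluated levels fall into the same group."""
    return lambda i, j: groups[i] == groups[j]


exact = accuracy_under(CM, lambda i, j: i == j)
neighbor = accuracy_under(CM, lambda i, j: abs(i - j) <= 1)
two_groups = accuracy_under(CM, same_group([0, 0, 1, 1]))  # {L2, L3} vs {L4, L5}
merged_low = accuracy_under(CM, same_group([0, 0, 1, 2]))  # {L2, L3}, L4, L5

for name, value in [("exact", exact), ("neighboring", neighbor),
                    ("two groups", two_groups), ("L2+L3 merged", merged_low)]:
    print(f"{name}: {value:.2%}")
```

Under these assumptions the four criteria give 31/49, 45/49, 44/49, and 37/49 correct evaluations, which agree with the 63.27%, 91.84%, 89.80%, and 75.51% reported for the proposed method.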
Table 3. Evaluation results of different approaches.
Approaches | Confusion Matrix | Accuracy | F1
Bagged ensemble of GPCs without data preprocessing | [15 4 2 1; 3 4 1 1; 5 1 4 2; 0 1 4 1] | 48.98% | [0.6667, 0.4211, 0.3478, 0.1818]
Bagged ensemble of GPCs without under-sampling | [20 1 0 1; 5 4 0 0; 4 0 6 2; 3 1 2 0] | 61.22% | [0.7407, 0.5333, 0.6, 0]
GPC without under-sampling | [19 2 0 1; 5 3 1 0; 4 0 6 2; 3 1 2 0] | 57.14% | [0.7170, 0.4, 0.5714, 0]
The proposed method | [17 4 0 1; 2 6 1 0; 2 0 6 4; 0 1 3 2] | 63.27% | [0.7907, 0.6, 0.5455, 0.3077]
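As a worked check of Table 3, the accuracy and per-level F1 values can be derived from each confusion matrix. The sketch assumes that each matrix is read row by row with rows as actual levels and columns as evaluated levels (L2, L3, L4, L5), and that F1 is set to 0 when a level is never evaluated correctly; the dictionary keys and function names are illustrative.

```python
# Confusion matrices from Table 3 (assumed: rows actual, columns evaluated).
MATRICES = {
    "no preprocessing": [[15, 4, 2, 1], [3, 4, 1, 1], [5, 1, 4, 2], [0, 1, 4, 1]],
    "bagging, no under-sampling": [[20, 1, 0, 1], [5, 4, 0, 0], [4, 0, 6, 2], [3, 1, 2, 0]],
    "single GPC, no under-sampling": [[19, 2, 0, 1], [5, 3, 1, 0], [4, 0, 6, 2], [3, 1, 2, 0]],
    "proposed": [[17, 4, 0, 1], [2, 6, 1, 0], [2, 0, 6, 4], [0, 1, 3, 2]],
}


def accuracy(cm):
    """Correct evaluations (diagonal) divided by all samples."""
    return sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))


def per_class_f1(cm):
    """F1 per level, rounded to four decimals; 0.0 when there is no true positive."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        n_actual = sum(cm[k])                         # row sum
        n_evaluated = sum(cm[i][k] for i in range(n)) # column sum
        if tp == 0:
            scores.append(0.0)
            continue
        precision = tp / n_evaluated
        recall = tp / n_actual
        scores.append(round(2 * precision * recall / (precision + recall), 4))
    return scores


for name, cm in MATRICES.items():
    print(f"{name}: accuracy={accuracy(cm):.2%}, F1={per_class_f1(cm)}")
```

With this reading, every accuracy and F1 entry in Table 3 is reproduced, and the low-risk and high-risk accuracies of the proposed method follow from the same matrix as 29/31 = 93.55% and 15/18 = 83.33%.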
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Chen, Y.; Da, Q.; Liang, W.; Xiao, P.; Dai, B.; Zhao, G. Bagged Ensemble of Gaussian Process Classifiers for Assessing Rockburst Damage Potential with an Imbalanced Dataset. Mathematics 2022, 10, 3382. https://doi.org/10.3390/math10183382
