1. Introduction
The uniaxial compressive strength (UCS) is one of the most important physical and mechanical parameters of rock masses in civil and mining engineering design, and it is also used for rock mass classification [
1,
2]. To date, the most accurate way to obtain the UCS is direct laboratory testing following the methods of the International Society for Rock Mechanics (ISRM) and the American Society for Testing and Materials (ASTM) [3]. However, direct laboratory testing requires high-quality cores to yield effective and reliable UCS values, and such cores are extremely difficult to obtain from highly weathered rocks [1]. Furthermore, because of its complex operation, long duration, and expensive equipment, direct laboratory testing is often not adopted for determining the UCS in small- and medium-sized rock engineering projects. Therefore, developing a convenient and accurate method for estimating the rock UCS is a challenging and practical task for modern engineers.
Empirical approaches were the first to be developed by engineers and have achieved good results in estimating the rock UCS [
4,
5,
6,
7,
8,
9,
10,
11]. Empirical approaches are usually presented as regression formulas, i.e., one or more parameters related to the UCS are used to establish deterministic equations for calculating the UCS. The literature review shows that the porosity (Pn), the Schmidt hardness rebound number (SHR), the P-wave velocity (Vp), and the point load strength (PLS) are the independent variables used in most of the empirical equations [
3,
8,
9,
12,
13].
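As an illustration of how such a single-variable regression formula can be obtained, the following minimal sketch fits a power equation of the form UCS = a · Pn^b by ordinary least squares in log-log space. The function name `fit_power_law` and the numerical values are hypothetical and are not taken from the dataset of this study; fitting in log space is an assumption made for simplicity and minimizes relative rather than absolute errors.

```python
import math


def fit_power_law(x, y):
    """Least-squares fit of y = a * x**b via a straight line in log-log space."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two or more paired observations")
    lx = [math.log(v) for v in x]   # requires strictly positive inputs
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((v - mx) ** 2 for v in lx)
    sxy = sum((vx - mx) * (vy - my) for vx, vy in zip(lx, ly))
    b = sxy / sxx                   # slope of the log-log line = exponent
    a = math.exp(my - b * mx)       # intercept of the log-log line = log(a)
    return a, b


# Made-up, noise-free values for illustration only (not measured data).
porosity = [0.5, 1.0, 2.0, 4.0, 8.0]
ucs = [120.0 * p ** -0.6 for p in porosity]

a, b = fit_power_law(porosity, ucs)
print(f"UCS ~ {a:.3f} * Pn^{b:.3f}")
```

With real measurements the recovered exponent would be expected to be negative for porosity, since the UCS generally decreases as porosity increases.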
Nevertheless, the limited universality of empirical formulas has gradually been exposed, owing to the restricted sampling locations and lithologies on which they are based [
14,
15]. When the same empirical formula is applied to different rock types, it underestimates or overestimates the UCS. Furthermore, the selection of independent variables depends largely on the experience of engineers, which leads to subjective errors. To reduce the influence of lithology and of the number and types of input parameters on the UCS estimation, numerous researchers have reported successful cases of predicting the rock UCS with models based on artificial intelligence (AI) techniques, such as the artificial neural network (ANN) [
16,
17,
18], the adaptive neuro-fuzzy inference system (ANFIS) [
19,
20], the support vector machine (SVM) [
3,
21,
22], and the multi-layer perceptron (MLP) [
20,
23]. The random forest (RF) technique, which resists overfitting and can process large amounts of data, is a common AI model used to solve engineering prediction problems [
24,
25]. Many attempts have been made to use different metaheuristic optimization (MHO) algorithms to improve the performance of RF models, e.g., the imperialist competitive algorithm (ICA) [
9,
23], the particle swarm optimization (PSO) [
12,
17,
25,
26,
27], the grey wolf optimization (GWO) [
28,
29], the artificial bee colony algorithm (ABC) [
30], the firefly algorithm (FA) [
31], the multi-verse optimizer (MVO) [
32], and the sine cosine algorithm (SCA) [
33]. However, some algorithms have not yet been applied to optimize the RF model for predicting the rock UCS, e.g., the moth-flame optimization (MFO), the lion swarm optimization (LSO), and the sparrow search algorithm (SSA). In this study, four MHO algorithms, i.e., GWO, MFO, LSO, and SSA, are used to improve the performance of the RF models. It should be noted that the hyperparameters of the RF model (e.g., the number of trees (Nt) and the minimum sample number at a leaf node (Minlefsize)) and the internal parameters of these MHO algorithms (e.g., the population size) are not as easily understood and optimized as the parameters of empirical formulas [
34].
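To make the idea of metaheuristic hyperparameter tuning concrete, the sketch below implements a basic grey wolf optimizer with the commonly described update rules (three leaders, a control parameter decreasing linearly from 2 to 0). It is not necessarily the variant used in this study. The function `gwo_minimize`, the search bounds, and `toy_fitness` are all hypothetical; in particular, `toy_fitness` is only a stand-in for "train an RF with these hyperparameters and return its MSE", and in practice the two tuned values would be rounded to integers before training.

```python
import random


def gwo_minimize(fitness, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimise fitness(position) inside box bounds with a basic grey wolf optimizer."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    # Random initial population inside the search box.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_fit = None, float("inf")
    for it in range(n_iter + 1):
        ranked = sorted(wolves, key=fitness)
        leaders = [list(w) for w in ranked[:3]]      # alpha, beta, delta
        lead_fit = fitness(leaders[0])
        if lead_fit < best_fit:                      # keep the best solution seen so far
            best_pos, best_fit = list(leaders[0]), lead_fit
        if it == n_iter:
            break
        a = 2.0 * (1.0 - it / n_iter)                # decreases linearly from 2 to 0
        new_wolves = []
        for w in wolves:
            pos = []
            for d, (lo, hi) in enumerate(bounds):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    estimate += leader[d] - A * D
                # Average of the three leader-guided moves, clipped to the bounds.
                pos.append(min(hi, max(lo, estimate / 3.0)))
            new_wolves.append(pos)
        wolves = new_wolves
    return best_pos, best_fit


def toy_fitness(pos):
    # Hypothetical surrogate for the MSE of an RF trained with (Nt, min leaf size).
    n_trees, min_leaf = pos
    return ((n_trees - 120.0) / 100.0) ** 2 + (min_leaf - 3.0) ** 2


best, score = gwo_minimize(toy_fitness, bounds=[(10.0, 500.0), (1.0, 20.0)])
print([round(v) for v in best], score)
```

The population size `n_wolves` and the iteration count `n_iter` are examples of the internal MHO parameters mentioned above; they affect both the runtime and the quality of the hyperparameters found.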
In practice, mining engineers and geologists tend to use empirical approaches to estimate the UCS once the rock types have been identified. Furthermore, some novel intelligent models and optimization algorithms have not yet been applied to UCS prediction. Therefore, this study aims to compare the performance of empirical approaches and some novel AI models for predicting the rock UCS. To achieve this goal, various empirical equations are developed as representatives of the empirical approaches, and four hybrid RF models optimized by different MHO algorithms (i.e., GWO, MFO, LSO, and SSA) are developed and compared for UCS prediction. A total of 386 rock samples are used to generate the empirical equations and train the MHO-RF models. Four statistical evaluation indices, i.e., the root mean square error (RMSE), the coefficient of determination (R2), Willmott’s index (WI), and the variance accounted for (VAF), are used to evaluate the performance of all the developed models.
3. Rock Data Preparation and Performance Indices
To evaluate the performance of AI models and empirical approaches for predicting the UCS, rock samples from various rock engineering projects with diverse lithologies were integrated into the rock database used in this study. As a result, a dataset of 386 rock samples was collected from previously published studies, including 30 Travertine samples from Haji mine by Dehghan et al. [
8]; 71 Granite block rock samples from the PSRWT tunnel by Armaghani et al. [
9]; 115 Granite samples of weathering Grade III from the bedrock in Macao, China by Ng et al. [
54]; and 170 hybrid rock samples (Claystone, Granite, Schist, Sandstone, Travertine, Limestone, Slate, Dolomite, and Marl) from a quarry in Iran by Mahmoodzadeh et al. [
3]. The above samples can be divided into three categories according to lithology, i.e., igneous (Granite), sedimentary (Travertine, Claystone, Sandstone, Limestone, Dolomite, Marl), and metamorphic (Schist, Slate). Following the published studies, the Pn, the SHR, the Vp, and the PLS were considered as input variables to predict the UCS; the statistical information of the input and output variables for each rock lithology is shown in Table 4. As can be seen in this table, the statistical values of the variables were similar for each rock lithology, suggesting that the underlying relationship between the four input variables and the output variable was consistent. Therefore, the rock data of different lithologies can be combined into a single database to improve the model prediction performance.
Figure 1 shows the correlation between the input and output variables for the different rock types. For the igneous rock data, the correlation between the Vp and the UCS was the greatest. The SHR had a stronger correlation with the UCS than the other variables for both the sedimentary and metamorphic samples. Note that, except for the Pn, the other three variables were positively correlated with the UCS. In general, the correlation results support using the above four variables, which have high correlation coefficients, as input variables for predicting the UCS.
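A correlation of this kind can be computed as sketched below. The sketch assumes that the Pearson (linear) correlation coefficient is meant, which the text does not state explicitly; the function name `pearson_r` and the sample values are hypothetical and serve only to show a negative correlation for a porosity-like variable and a positive one for a velocity-like variable.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two or more paired observations")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only (not from Table 4 or Figure 1).
ucs = [40.0, 65.0, 90.0, 120.0, 150.0]
porosity = [6.0, 4.5, 3.0, 2.0, 1.0]        # falls as UCS rises
p_wave = [3000.0, 3600.0, 4300.0, 5000.0, 5600.0]  # rises with UCS

print("r(Pn, UCS) =", round(pearson_r(porosity, ucs), 3))
print("r(Vp, UCS) =", round(pearson_r(p_wave, ucs), 3))
```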
Four statistical evaluation indices were used to evaluate the performance of the empirical approaches and the proposed AI models: the RMSE measures the difference between the model predictions and the observed values, the R2 judges the model fitting effect, and the WI and the VAF measure the prediction accuracy. In addition, the mean squared error (MSE) was used separately as the fitness function to evaluate the optimization performance of all the MHO algorithms. These performance indices were introduced in several references [
61,
62,
63,
64,
65,
66,
67,
68,
69] and are defined as follows:
where n is the number of samples in the training or testing phase; Ui and ui are the actual and predicted values of the UCS, respectively; and Ū and ū are the averages of the actual and predicted values of the UCS, respectively.
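The following minimal sketch computes the five indices from paired actual and predicted values. It assumes the standard textbook forms (given in the comments), which may differ in detail from the definitions used in this study; the function name `ucs_metrics` and the sample values are hypothetical.

```python
import math


def ucs_metrics(actual, predicted):
    """Return MSE, RMSE, R2, WI and VAF (%) under commonly used definitions."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need equal-length, non-empty samples")
    resid = [U - u for U, u in zip(actual, predicted)]
    mean_U = sum(actual) / n
    ss_res = sum(e * e for e in resid)                 # sum (Ui - ui)^2
    ss_tot = sum((U - mean_U) ** 2 for U in actual)    # sum (Ui - mean U)^2

    mse = ss_res / n
    rmse = math.sqrt(mse)
    # R2 = 1 - SS_res / SS_tot; negative when predictions are worse than the mean.
    r2 = 1.0 - ss_res / ss_tot
    # Willmott's index: 1 - SS_res / sum (|ui - mean U| + |Ui - mean U|)^2
    wi_den = sum((abs(u - mean_U) + abs(U - mean_U)) ** 2
                 for U, u in zip(actual, predicted))
    wi = 1.0 - ss_res / wi_den
    # VAF = (1 - var(U - u) / var(U)) * 100
    mean_e = sum(resid) / n
    var_e = sum((e - mean_e) ** 2 for e in resid) / n
    vaf = (1.0 - var_e / (ss_tot / n)) * 100.0
    return {"MSE": mse, "RMSE": rmse, "R2": r2, "WI": wi, "VAF": vaf}


# Made-up values for illustration only.
observed = [50.0, 100.0, 150.0]
biased = [250.0, 300.0, 350.0]   # right trend, but shifted by a large constant
print(ucs_metrics(observed, biased))
```

Under these definitions, a prediction with a very large deviation yields a negative R2 even if it follows the right trend, whereas the VAF, which depends only on the variance of the error, is insensitive to a constant offset. This is one reason for reporting several indices together.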
5. Comparison of Prediction Performance
After developing the SR and MR equations and the four MHO-RF models, this section presents a series of comparative analyses between the empirical approaches and the AI methods for predicting the rock UCS. Table 8 lists the performance indices of the 16 SR equations, 2 MR equations, and 4 MHO-RF models in the training phase. As can be seen in this table, the four SR equations based on the PLS (SR-1, SR-5, SR-9, and SR-13) perform poorly, with low values of R2 (even less than zero, which is caused by the very large deviation of the predictions, as demonstrated in Equation (4)), WI, and VAF and high values of RMSE. Among the SR equations, SR-14 obtained the best performance indices: R2 = 0.7090, RMSE = 26.2379, WI = 0.8974, and VAF = 71.9010%. By contrast, the two MR equations and the four hybrid MHO-RF models have satisfactory performance indices, with high values of R2, WI, and VAF (ideal values of 1, 1, and 100%, respectively) and low values of RMSE (ideal value of 0). Among them, MR-2 (R2 = 0.7559, RMSE = 24.0312, WI = 0.9265, and VAF = 75.5940%) and SSA-RF (R2 = 0.9224, RMSE = 13.5502, WI = 0.9788, and VAF = 92.2401%) are the best MR equation and the best AI model, respectively, for UCS prediction in the training phase. Moreover, the four MHO-RF models are clearly more accurate than the two MR equations.
To further compare the performance of the empirical approaches and the AI models for predicting the UCS, the regression diagrams of all SR and MR equations and the four MHO-RF models are shown in Figure 6, Figure 7 and Figure 8. The vertical and horizontal axes represent the predicted and observed values of the UCS, respectively. The solid black line in each diagram is the line of zero error between the predicted and observed UCS, and the dotted lines mark errors of 10% and 30%. The more data points that are concentrated on the zero-error line, the stronger the prediction performance of the model. As can be observed in these figures, the power equation of Pn (SR-14), the multivariate quadratic equation (MR-2), and the SSA-RF model each have more data points concentrated on and near the zero-error line than the other models of the same type in the training phase.
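The visual reading of such regression diagrams can be complemented by counting the share of points that fall inside each error band. The sketch below assumes that the 10% and 30% lines correspond to predicted = (1 ± 0.1) × observed and predicted = (1 ± 0.3) × observed, which the text does not state explicitly; the function name `fraction_within` and the sample values are hypothetical.

```python
def fraction_within(observed, predicted, tolerance):
    """Share of samples whose prediction lies within +/- tolerance (relative) of the observation."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty samples")
    hits = sum(
        1 for U, u in zip(observed, predicted)
        if abs(u - U) <= tolerance * abs(U)
    )
    return hits / len(observed)


# Made-up values for illustration only.
observed = [100.0, 100.0, 100.0, 100.0]
predicted = [105.0, 120.0, 140.0, 95.0]

print("within 10%:", fraction_within(observed, predicted, 0.10))
print("within 30%:", fraction_within(observed, predicted, 0.30))
```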
The performance of the models in the training phase does not represent their final performance in UCS prediction; it is vital that they maintain good prediction performance in the testing phase. Table 9 lists the performance indices of the 16 SR equations, 2 MR equations, and 4 MHO-RF models on the test set. As can be seen in this table, the power equation of Pn (SR-14) and the MR-2 equation again perform better than the other models of the same type, with higher values of R2 (0.7558 and 0.8321, respectively), WI (0.9218 and 0.9488), and VAF (76.4239% and 83.3190%) and lower values of RMSE (22.9797 and 19.0525). Among the AI models, the LSO-RF model replaced SSA-RF as the most accurate model (R2 = 0.8997, RMSE = 14.7261, WI = 0.9731, and VAF = 90.2630%) in the testing phase.
Validation on the test set is necessary to reveal any inconsistency between the performance of the aforementioned models in the training and testing phases. Figure 9, Figure 10 and Figure 11 show the regression diagrams of all SR and MR equations and the four MHO-RF models in the testing phase. As can be seen in these figures, the SSA-RF model performed less well than in the training phase, with fewer data points clustered on the zero-error line. Conversely, the LSO-RF model has the largest number of points concentrated on the zero-error line, and the power equation of Pn (SR-14) and the multivariate quadratic equation (MR-2) also have more data points concentrated on and near the zero-error line than the other models of the same type in the testing phase.
Based on the performance results in Table 8 and Table 9, the best SR equation, MR equation, and AI model are the power equation of Pn, the multivariate quadratic equation, and the LSO-RF model, respectively. To clearly compare the performance differences between the empirical models and the AI methods in predicting the UCS, Figure 12 shows the prediction curves, the error analysis, and the regression diagram of the UCS predicted by these three models in the training phase. As can be seen in Figure 12a, the predicted UCS curves of the three models are basically consistent with the observed training curve, but the LSO-RF model clearly performs best. The distribution of the errors between the observed and predicted UCS of the three models is shown in Figure 12b. The LSO-RF model has the lowest median error (5.64), and the SR equation of Pn has the largest median error (13.13). Meanwhile, the range between the upper and lower errors of the SR model is wider than that of the other two models, which indicates worse prediction performance. Figure 12c shows the regression diagram of the three models in the training phase. As can be observed in this diagram, the LSO-RF model not only has more data points clustered on the zero-error line but also has the highest value of R2 (0.9200) among the three. The multivariate quadratic MR equation ranks second, ahead of the SR equation of Pn. The same ranking is obtained in the testing phase, as shown in Figure 13.
To further evaluate the performance of all models in the testing phase, a Taylor diagram is drawn in Figure 14. A typical Taylor diagram combines three statistics, i.e., the correlation coefficient, the standard deviation, and the RMSE. In this figure, the red arcs and dots represent the correlation coefficient, the black arcs and dots represent the standard deviation, and the green arcs and dots represent the RMSE. The measured test data serve as the reference point, for which the RMSE and the correlation coefficient are 0 and 1, respectively. The prediction performance of each model is then judged by comparing its correlation coefficient, standard deviation, and RMSE with those of the measured data in the test set. It can be observed that the LSO-RF model lies closest to the reference point of the test data and is therefore the best model.
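The three statistics can be shown on a single diagram because they are linked by a law-of-cosines relation. The sketch below computes them for one model and checks that relation. It assumes the standard Taylor diagram convention, in which the plotted RMS quantity is the centred RMS difference (means removed), so it can differ from the plain RMSE when a model is biased; whether Figure 14 follows this convention is not stated. The function name `taylor_stats` and the sample values are hypothetical.

```python
import math


def taylor_stats(observed, predicted):
    """Return (std_obs, std_pred, correlation, centred RMS difference)."""
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need two or more paired observations")
    mo = sum(observed) / n
    mp = sum(predicted) / n
    std_o = math.sqrt(sum((o - mo) ** 2 for o in observed) / n)
    std_p = math.sqrt(sum((p - mp) ** 2 for p in predicted) / n)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted)) / n
    r = cov / (std_o * std_p)
    crmsd = math.sqrt(
        sum(((p - mp) - (o - mo)) ** 2 for o, p in zip(observed, predicted)) / n
    )
    return std_o, std_p, r, crmsd


# Made-up values for illustration only.
observed = [42.0, 77.0, 95.0, 130.0, 161.0]
predicted = [50.0, 70.0, 101.0, 121.0, 150.0]

std_o, std_p, r, crmsd = taylor_stats(observed, predicted)
# Geometric identity behind the diagram:
# crmsd^2 = std_o^2 + std_p^2 - 2 * std_o * std_p * r
rhs = std_o ** 2 + std_p ** 2 - 2.0 * std_o * std_p * r
print(round(crmsd ** 2, 6), round(rhs, 6))
```

A model whose point lies close to the reference point therefore has, at the same time, a correlation near 1, a standard deviation near that of the measured data, and a small centred RMS difference.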
After determining the best model for predicting the rock UCS, the importance of the input variables can be estimated using the LSO-RF model. In addition, the multivariate quadratic MR equation is used to calculate the importance of the input variables for comparison with the LSO-RF model. The results of the sensitivity analysis are shown in Figure 15. As can be seen in this figure, the most important input variable is the Pn, with scores of 0.7398 and 0.7031 obtained from the LSO-RF model and the MR equation, respectively. The remaining variables rank, in descending order of importance, as the Vp (LSO-RF: 0.6311 and MR: 0.6367), the SHR (LSO-RF: 0.5814 and MR: 0.5675), and the PLS (LSO-RF: 0.5070 and MR: 0.4343).
6. Conclusions and Summary
As one of the most important physical and mechanical parameters of rocks in civil and mining engineering, the UCS can be estimated using various methods. In this study, the empirical approaches widely used by mining engineers and the AI methods that have recently attracted attention were developed and compared for UCS prediction. A total of 386 rock samples were collected to form a dataset, and the Pn, the SHR, the Vp, and the PLS were considered as input variables. The performance indices showed that the power equation of Pn and the multivariate quadratic equation are the best SR and MR equations, respectively, and that all the MHO-RF models outperform the empirical approaches in predicting the rock UCS. The LSO-RF model is the best among the AI models, with higher R2 (0.9200 and 0.8997), WI (0.9781 and 0.9731), and VAF (92.0076% and 90.2630%) and lower RMSE (13.7545 and 14.7261) in the training and testing phases, respectively. Meanwhile, the sensitivity analysis showed that the Pn is the most important input variable for predicting the rock UCS.
Compared with the empirical approaches, the advantages of AI techniques for predicting the rock UCS are strong data compatibility and model generalization. However, since only nine rock types from three major lithologies were collected to train the AI models, the prediction accuracy for rock types other than those used in this paper is not guaranteed. Therefore, more UCS data from various rock types should be added to further improve the prediction accuracy of the proposed models. In addition, random population initialization tends to trap the optimization in local minima. Therefore, the LSO algorithm should be further improved to select the optimal model hyperparameters; chaos mapping can be introduced to achieve this goal. Furthermore, other AI models should be developed to build a combined model that can adapt to UCS estimation for different rocks.