1. Introduction
According to the requirements of safe and rapid excavation and the improvement of underground engineering technology, the tunnel-boring machine (TBM) has been widely used and rapidly developed [1]. The drilling and blasting method, by contrast, still relies on manual, mechanical, or blasting operations, followed by the construction of a support structure based on the surrounding rock conditions. Its many construction procedures, such as excavation, ballast removal, and support, result in a slow excavation speed, high labor intensity, and low safety, especially in urban underground space projects [2,3]. A TBM can perform multiple steps simultaneously and offers many advantages, such as a fast excavation speed, precise deformation control, safety, and environmental protection [4]. It has been widely applied in constructing railways, highways, subways, water conservancy, coal, and other projects [5].
Tunnel engineering is gradually developing toward large sections and long distances. When a TBM is used for tunnel excavation, the geological conditions are complex and there are many risk sources [6]. Severely fluctuating terrain and unfavorable geological conditions [7] are common, such as a large burial depth, extremely hard rock, strong rock bursts [8], soft rock with large deformations, mud and water inrushes, and high geothermal temperatures. However, a TBM is sensitive to changes in complex geological conditions [9]. If the operating parameters are not adjusted in real time, abnormal wear of the cutters and tools can easily occur. These issues can result in inefficient excavation and higher construction costs. Therefore, accurate measurement of the geological and main rock mass parameters before TBM tunneling is particularly important for efficiency and safety [10]. The full-scale rotary cutting machine (RCM) is the most commonly used test rig. Shin [11] simulated the excavation process of a hard rock TBM and conducted tests on granite using different disc-cutter sizes. Zhang [12] carried out rock fragmentation tests under different rolling radii. Geng [13] conducted cutting experiments on rocks with different inclination angles and thicknesses. However, the rock classification method based on different tool installation positions, and the dynamic response of components such as tools, cutterheads, and rock masses, require further experimental research.
Before construction, the geological conditions along the tunnel can be roughly obtained through geological exploration [14]. Many approaches have been applied to predict the geology in front of the TBM tunnel face, including core testing, seismic waves [15], microseismic monitoring [16], and resistivity. However, these methods only roughly predict the strength and integrity of the rock mass in a certain section, or larger-scale undesirable geology. In addition, they can only be performed during excavation stoppages and thus cannot provide real-time, accurate perception of rock mass parameters during excavation. Therefore, it is very important to propose a method that can accurately predict the rock mass classification in real time.
Many researchers have introduced machine-learning methods to evaluate rock mass quality from such monitoring data [17]. With these methods, the relationship between operational data and rock mass classification can be characterized [18,19], minimizing the subjectivity and inaccuracy of manual evaluation [20]. Currently, the most widely used algorithms are the long short-term memory network (LSTM) [21] and ensemble-learning algorithms [22]. Ayawah [23] evaluated the possibility of using a single machine-learning model to predict ground conditions or the rock mass in front of TBMs. Liu [24] proposed a hybrid algorithm that integrates a backpropagation neural network with simulated annealing to predict rock mass parameters from TBM drive parameters. Santos [25] used multivariate statistics and artificial intelligence to predict the Rock Mass Rating (RMR) classification index, proposing a novel rock mass classification method that reduces subjectivity in the parameters and classification procedure. Zhang [26] proposed a generative adversarial network for geological prediction, which accurately estimated the thickness of each rock-soil type in a TBM construction tunnel. To establish prediction models, Xu [27] evaluated five different statistical and ensemble machine-learning methods and two different deep neural networks, showing that accurate prediction of the advance rate, rotation speed, thrust, and torque from the operating parameters could guide the control and application of a TBM. Hou [28] selected ten key operation parameters for prediction; the results indicated that a stacking ensemble classifier performs better than individual classifiers, exhibiting more powerful learning and generalization ability for small and imbalanced samples. However, most previous studies are based on a single machine-learning model [29] and use few meaningful features for rock mass parameter sensing, resulting in highly limited, low-accuracy single-algorithm solutions.
This paper aims to build a rock mass classification model with high accuracy and stability. The remainder of this paper is organized as follows: Section 2 presents the framework and methodology. The experimental procedure and data analysis are described in Section 3. Section 4 introduces the construction of the stacking technique for ensemble learning. Section 5 gives the results and discussion. The conclusions are in Section 6.
2. Materials and Methods
This section presents the methodology of the rock mass classification forecasting model employed in this paper. It primarily includes four parts: data transformation, feature engineering, stacking ensemble model construction, and model evaluation. The frame diagram in Figure 1 illustrates the implementation process of the proposed model.
First, an extensive literature survey established that the TBM signal types used in this study are highly correlated with rock mass types, namely thrust [25], torque [30], and vibration. However, directly using the time-domain signals would make the computational process tedious and time-intensive. Therefore, feature engineering is performed on the three signals separately, refining each time-domain signal into feature vectors used as input to the algorithms. Then, the modeling process of the stacking technique of ensemble learning is introduced. Finally, metrics such as accuracy, the confusion matrix, and stability are introduced.
2.1. Data Transformation
During signal acquisition, the vibration, thrust, and torque magnitudes are collected and transmitted by wireless acceleration sensors, thrust sensors, and torque sensors, respectively. In this process, outliers and missing values inevitably occur. Therefore, the three time-domain signals are processed synchronously: missing values are filled by interpolation, and fragments containing outliers are removed. A timestamp node is defined by the time for each revolution of the cutterhead, and the three signals are truncated and saved synchronously. Data from the initial rising stage and the unloading section were excluded; only the stable segment is retained as the analysis object for rock mass classification prediction.
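The preprocessing steps above can be sketched as follows. This is a minimal illustration, assuming linear interpolation for missing values and index-based revolution marks; the function names and sample values are hypothetical, not from the experiments.

```python
import numpy as np

def interpolate_missing(x):
    """Fill missing samples (NaN) by linear interpolation (assumed strategy)."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(len(x))
    ok = ~np.isnan(x)
    return np.interp(idx, idx[ok], x[ok])

def segment_by_revolution(signal, rev_marks):
    """Cut a signal into per-revolution segments at the timestamp nodes."""
    return [signal[a:b] for a, b in zip(rev_marks[:-1], rev_marks[1:])]

thrust = np.array([10.0, np.nan, 12.0, 11.0, np.nan, 13.0])
clean = interpolate_missing(thrust)
segments = segment_by_revolution(clean, [0, 3, 6])  # two revolutions
```

The same segmentation marks would be applied to all three channels so that the thrust, torque, and vibration segments stay synchronized.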
2.2. Feature Engineering Based on TBM Parameters
This section introduces the feature engineering applied to the TBM signals. The input length is closely related to the computation time of the machine-learning model, and redundant signal input can also decrease accuracy. Therefore, it is crucial to design features that describe the relationship between the rock mass and the algorithms, making it possible to evaluate and optimize the stacking technique for ensemble learning.
2.2.1. Torque and Thrust
With reference to signal analysis methods and mathematical and statistical fundamentals, nine characteristics of the torque and thrust signals were calculated: maximum (Max), minimum (Min), peak, mean, variance (Var), root mean square (RMS), 25th percentile (Q1, at position PQ1), 50th percentile (Q2, at position PQ2), and 75th percentile (Q3, at position PQ3). The calculation formulas are shown in Table 1.
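A minimal numpy sketch of these nine statistics is shown below. Since Table 1 is not reproduced here, "peak" is interpreted as the peak-to-peak range, which is an assumption; the exact formulas should follow Table 1.

```python
import numpy as np

def signal_features(x):
    """Nine statistics extracted from one thrust or torque segment."""
    x = np.asarray(x, dtype=float)
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return {
        "Max": x.max(),
        "Min": x.min(),
        "Peak": x.max() - x.min(),          # assumed: peak-to-peak range
        "Mean": x.mean(),
        "Var": x.var(),                     # population variance
        "RMS": float(np.sqrt(np.mean(x ** 2))),
        "Q1": q1, "Q2": q2, "Q3": q3,
    }

feats = signal_features(np.array([1.0, 2.0, 3.0, 4.0, 5.0]))
```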
2.2.2. Vibration
In this section, a feature engineering method is proposed to convert the vibration time-domain signal into a novel spectrogram-based local amplification feature (SLAF). The processing flow is shown in Figure 2.
In this study, to prevent spectral leakage, the Hanning window is used for windowing. The fundamental idea of the Hanning window is to gradually taper the data at the ends of the record, thereby avoiding the abrupt truncation of a rectangular window [30]. The Hanning window can be expressed as Equation (1):

w(t) = 0.5 [1 − cos(2πt/T)], 0 ≤ t ≤ T, (1)

where t is the time and T is the window width. Windowing and time segmentation are used to transform the signal into frames. In this paper, a 50% window-width step length and a window length of 1 s are used.
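The framing step can be sketched as below, using the stated 1 s window and 50% step length; the sampling rate and test signal are illustrative assumptions.

```python
import numpy as np

def frame_signal(x, fs, win_len=1.0, overlap=0.5):
    """Split a signal into Hanning-windowed frames (1 s window, 50% step)."""
    n = int(win_len * fs)              # samples per frame
    step = int(n * (1.0 - overlap))    # 50% window-width step length
    w = np.hanning(n)                  # w(t) = 0.5*(1 - cos(2*pi*t/T))
    return np.array([x[i:i + n] * w for i in range(0, len(x) - n + 1, step)])

fs = 100                               # assumed sampling rate (Hz)
t = np.arange(0, 3, 1 / fs)
frames = frame_signal(np.sin(2 * np.pi * 5 * t), fs)
```

Because the Hanning window tapers to zero at both ends, the 50% overlap keeps every sample represented in at least one frame.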
Further, the Fast Fourier Transform (FFT) is performed on each frame after division and windowing, converting the vibration signal from the time domain to the frequency domain (Xa(k)). The spectra of the frames are then stacked along the time dimension to obtain a spectrogram, which represents the frequency characteristics of the cutterhead vibration over a period of time. Xa(k) is given by

Xa(k) = Σ_{n=0}^{N−1} xa(n) e^(−j2πnk/N), k = 0, 1, …, N − 1,

where a denotes the a-th frame, k represents the k-th spectral line in the frequency domain, and N is the number of sampling points.
The main frequency of the cutterhead vibration is concentrated in a local range, which means the perception of frequency is non-linear. Therefore, a triangular filter bank with a sparse, adjustable layout is used; only filter banks with the same bandwidth are shown in Figure 2c. By operating on the spectrum with the filter bank, a low-dimensional feature within the range of each filter can be obtained, as shown in Figure 2d. This preserves the frequency integrity while simplifying the input feature dimension. The expression of the triangular filter is

Hm(k) = 0, k < f(m − 1);
Hm(k) = (k − f(m − 1)) / (f(m) − f(m − 1)), f(m − 1) ≤ k ≤ f(m);
Hm(k) = (f(m + 1) − k) / (f(m + 1) − f(m)), f(m) ≤ k ≤ f(m + 1);
Hm(k) = 0, k > f(m + 1),

where Hm(k) is the k-th value of the m-th filter in the filter bank, and f(m) is the corresponding frequency.
The energy (E(m)) within the range of a single filter is obtained by multiplying the spectrum with the filter bank. After dividing by the frequency length of the filter band, it expresses the mean vibrational spectral energy. E(m) can be expressed as follows:

E(m) = [ Σ_k |Xa(k)|² Hm(k) ] / (f(m + 1) − f(m − 1)).
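The SLAF band-energy step can be sketched as follows, assuming equal-bandwidth filters and power-spectrum energies; the filter count, frame, and normalization by the summed filter weights are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def triangular_filterbank(n_filters, n_bins):
    """Equal-bandwidth triangular filters over the FFT bins (the layout can
    also be made sparse and adjustable, as described in the text)."""
    edges = np.linspace(0, n_bins - 1, n_filters + 2)
    k = np.arange(n_bins)
    H = np.zeros((n_filters, n_bins))
    for m in range(1, n_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        rising = (k - left) / (centre - left)
        falling = (right - k) / (right - centre)
        H[m - 1] = np.clip(np.minimum(rising, falling), 0.0, None)
    return H

def slaf_energies(frame, n_filters=8):
    """Mean spectral energy inside each filter band (the E(m) of the text)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2        # power spectrum of one frame
    H = triangular_filterbank(n_filters, len(spec))
    widths = H.sum(axis=1)                        # frequency length of each band
    return (H @ spec) / widths

# A pure tone at FFT bin 20 of a 256-sample frame
e = slaf_energies(np.sin(2 * np.pi * 5 * np.arange(256) / 64.0))
```

With this layout, the tone's energy lands only in the one or two low-frequency bands whose triangles cover bin 20, while the higher bands stay near zero.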
2.2.3. TBM Performance
The advance rate (AR, mm/min), rotating speed (RS, rev/min), field penetration index (FPI, kN/cutter/(mm/rev)), and torque penetration index (TPI, kN·m/cutter/(mm/rev)) are selected as the TBM operation indicators:

P = AR / RS, FPI = Fn / (num · P), TPI = Pn / (num · P),

where P is the penetration per revolution (mm/rev), Fn is the cutterhead load (kN), Pn is the actual torque (kN·m), and num is the number of cutters on the cutterhead.
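These indices follow directly from the definitions above; the numeric values in the example are illustrative magnitudes only, not data from the experiments.

```python
def tbm_indices(Fn, Pn, AR, RS, num):
    """FPI and TPI from cutterhead load, torque, and penetration per rev."""
    P = AR / RS              # penetration (mm/rev)
    FPI = Fn / num / P       # kN per cutter per (mm/rev)
    TPI = Pn / num / P       # kN*m per cutter per (mm/rev)
    return FPI, TPI

fpi, tpi = tbm_indices(Fn=12000.0, Pn=1500.0, AR=40.0, RS=8.0, num=40)
```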
2.3. Stacking Ensemble-Learning Model and the Workflow
In order to improve the accuracy and stability, a two-layer stacking ensemble-learning model is developed. It can mine characteristic information more accurately than a single model. The workflow of the proposed method is shown in Figure 3.
The stacking ensemble learning first divides the original dataset (an, bn) into a training set and a test set in a 4:1 ratio, where an is the feature and bn is the label. In order to train and test the prediction ability, the training set is divided into K parts through the K-Fold function. For example, when K = 5, the training set Train is divided into Train-1, Train-2, Train-3, Train-4, and Train-5. Train-k and Traink are defined as the k-th test set and training set in the K-fold cross-validation, respectively. Five models are obtained after the five cross-validations, together with the prediction results Prediction-1, Prediction-2, Prediction-3, Prediction-4, and Prediction-5 for the re-sliced test sets of the five models. The vertical stack of the five predictions has the same length as Train. The test set is passed through the five models to determine the prediction result (test result), as shown in Figure 3.
The output of the first layer is used as the input data for the second layer of stacking, and the results are output by the model in the second layer. When multiple base models are used, the predictions are horizontally stitched as described above. In this way, the transformation of all data from input features to output labels is achieved.
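The two-layer workflow can be sketched with scikit-learn on synthetic data. This is a minimal illustration, assuming out-of-fold class predictions as the first-layer output; the base and meta models shown (LR, SVM, RF) are stand-ins, not the paper's final combination.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the extracted features and rock mass labels
X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)
# 4:1 train/test split, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

bases = [LogisticRegression(max_iter=1000), SVC(),
         RandomForestClassifier(random_state=0)]

# Layer 1: K-fold (K = 5) out-of-fold predictions on the training set
Z_tr = np.column_stack([cross_val_predict(m, X_tr, y_tr, cv=5) for m in bases])
# Refit each base model on the whole training set to predict the test set
Z_te = np.column_stack([m.fit(X_tr, y_tr).predict(X_te) for m in bases])

# Layer 2: a simple meta-model stitches the base predictions together
meta = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)
accuracy = float((meta.predict(Z_te) == y_te).mean())
```

Using out-of-fold predictions (rather than in-fold ones) as the meta-model's training input is what keeps the second layer from overfitting to the base models' training error.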
2.4. Performance Metrics Introduction
The evaluation indicators in the classification model include Accuracy, Precision, Recall, F1-score, and confusion matrix. Taking the binary classification problem as an example, all events are divided into positive (P) and negative (N), and predicted events are classified as true (T) and false (F). In this way, four predictions of TN, TP, FN, and FP are generated.
Accuracy represents the proportion of correct predictions in the total sample, but it is not sensitive to sample imbalance. Therefore, Precision, Recall, and F1-score are designed to evaluate the classification accuracy of the positive and negative classes. Precision is the proportion of true-positive samples among all samples predicted positive. Recall is the proportion of actual positive samples that are predicted positive. F1-score is the composite metric of Recall and Precision, eliminating the one-sidedness of these two indices to a certain extent.
An indicator, Sk, is also designed for evaluating the confidence of the prediction results. When the predicted probability of the class output by the classifier is much larger than those of the other classes, the confidence of the classifier is high and Sk is closer to 0; this also reflects stronger model stability. Sk is computed from the predicted class probabilities, while Accuracy, Precision, Recall, and F1-score are defined from the confusion-matrix counts as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN),
Precision = TP / (TP + FP),
Recall = TP / (TP + FN),
F1-score = 2 × Precision × Recall / (Precision + Recall).
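As a concrete illustration, the four confusion-matrix metrics can be computed directly from the counts. This minimal sketch omits Sk, since that index depends on the classifier's output class probabilities rather than on the confusion matrix; the label vectors are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall, and F1 from the TP/TN/FP/FN counts."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

acc, prec, rec, f1 = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```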
4. Model
4.1. Model Selection and Combination
To build the stacking ensemble-learning model, three components must be defined: the combinations of input features for the base models, the combinations of base algorithms, and the type of meta-model algorithm that combines them. Each signal type is weighted differently when predicting rock mass strength classes, especially in soft-hard composite strata. All features are divided into four groups and input into different base models: the thrust features, the torque features, the vibration features, and the combination of all signal features.
When selecting the base models, it is necessary to comprehensively consider accuracy and correlation. The models need to reflect the advantage of the stacking ensemble-learning model of classifying the results from various spatial perspectives. Therefore, several models that have performed well in classification prediction are selected, including logistic regression (LR), support vector machine (SVM), k-nearest neighbor (KNN), random forest (RF), gradient-boosting decision tree (GBDT), extreme gradient boosting (XGBoost), light gradient-boosting machine (LGBM), categorical boosting (CatBoost), CNN, and LSTM. The correlation between their prediction results can be measured by the Pearson correlation coefficient.
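The correlation check can be sketched as below; the two prediction vectors are hypothetical class labels produced by two base models on the same samples, used only to illustrate the computation.

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation coefficient between two prediction vectors."""
    u = np.asarray(u, float) - np.mean(u)
    v = np.asarray(v, float) - np.mean(v)
    return float((u @ v) / np.sqrt((u @ u) * (v @ v)))

# Hypothetical class predictions from two base models
r = pearson([1, 2, 2, 3, 1, 3], [1, 2, 3, 3, 1, 2])
```

A low correlation between base-model predictions indicates the diversity that stacking exploits; highly correlated base models add little beyond a single one.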
For the selection of the meta-model, simpler algorithms are often selected to prevent the overfitting of the stacking ensemble-learning models. Therefore, the meta-model used in this study is selected from LR, SVM, LGBM, and CatBoost.
4.2. Feature Importance Identification
XGBoost, GBDT, and RF can obtain feature contribution scores from the gain of the decision trees. In model training, the contribution scores are directly related to the usage efficiency of each feature. The importance of each feature is ranked using XGBoost, GBDT, and RF, and the top eight are shown in Table 5. Although the feature rankings of the three models differ, most features are of similar importance. The first two features are Fpeak and AQ2 in all cases, followed by the features that appear in all three models, namely Tpeak, Tvar, and TQ1. The variables that appear twice are Fvar and AQ3, and the variables that appear once are A65 and A68.
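A gain-based ranking of this kind can be sketched with scikit-learn's `feature_importances_` (XGBoost exposes an analogous attribute); the data and feature names are synthetic placeholders, not the engineered features of Table 5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic stand-in for the engineered feature table
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)
names = [f"feat_{i}" for i in range(X.shape[1])]   # placeholder names

top8 = {}
for label, model in [("RF", RandomForestClassifier(random_state=0)),
                     ("GBDT", GradientBoostingClassifier(random_state=0))]:
    model.fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]  # descending gain
    top8[label] = [names[i] for i in order[:8]]           # top eight per model
```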
4.3. Establishment of Stacking Ensemble-Learning Model
This section explores the optimal number of base models, which affects both the computation time and the accuracy of the stacking ensemble-learning model. The number of base models trained on each signal type ranges from 0 to 6. The base models include LR, SVM, KNN, RF, GBDT, XGBoost, LGBM, CatBoost, CNN, and LSTM, and the meta-model is selected from LR, SVM, LGBM, and CatBoost. The combination with optimal accuracy is selected via a grid search, with the base models drawn randomly. The final accuracy and computation time of the stacking ensemble-learning model are averaged over four runs, as shown in Figure 10.
It can be seen that, as the number of base models increases, the accuracy of the stacking ensemble-learning model first increases sharply and then stabilizes. When the number of base models is increased from 1 to 2, the accuracy increases by about 4%; with 3 or more, the accuracy remains stable. The results also show that SVM performs best as the meta-model. The computation time of the stacking ensemble-learning model increases with the number of base models, with six base models taking about three times as long as one. Considering both computation time and accuracy, the number of base models is finally set to 3.
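The search space behind this grid search can be enumerated as follows; this sketch only counts the candidate (base combination, meta-model) pairs and does not train any models.

```python
from itertools import combinations

base_candidates = ["LR", "SVM", "KNN", "RF", "GBDT",
                   "XGBoost", "LGBM", "CatBoost", "CNN", "LSTM"]
meta_candidates = ["LR", "SVM", "LGBM", "CatBoost"]

def grid(n_bases):
    """All (base-model combination, meta-model) pairs for a given size."""
    return [(combo, meta)
            for combo in combinations(base_candidates, n_bases)
            for meta in meta_candidates]

search_space = grid(3)   # 3 base models, as finally chosen in the text
```

With 3 base models chosen from 10 candidates and 4 candidate meta-models, the grid contains C(10, 3) × 4 = 480 configurations, which makes the cost of averaging each over four runs apparent.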
To verify the influence of the base model combination on the prediction results, two selection methods, random and diverse, are used. Random selection considers only the accuracy of the algorithms used, while diverse selection ensures differences between the base models. Table 6 lists all combinations. For the meta-model, a simpler algorithm is selected to prevent the stacking ensemble-learning model from overfitting; it is chosen from LR, SVM, LGBM, and CatBoost.
6. Conclusions
In this paper, rock mass classification was studied based on full-scale rotary-cutting experiments. Thrust, torque, and vibration signals from TBM-equipped sensors were trained independently. A stacking ensemble-learning model was proposed using a novel spectrogram-based local amplification feature. The results indicate that the proposed model has high precision in rock mass classification prediction, which can help avoid disasters caused by mispredicting the strength of the rock mass. The major conclusions are as follows:
(1) The peak value of thrust and the first dominant vibration frequency are the two most important features in model prediction.
(2) The mean and variance of thrust and torque and the root mean square of vibration positively correlate with rock strength.
(3) The number, type, and combination of base models have a significant impact on the accuracy of the stacking ensemble-learning model.
(4) According to the traditional evaluation index and stability test index, the Comb-1-SVM has high accuracy and stability, which is suitable for rock mass classification prediction of the TBM tunnel face.
Due to the complex framework of the stacking ensemble-learning model, the base models need to be trained many times, which takes more time than a single model. Therefore, future studies will focus on building an appropriate distributed computing environment for the different base models and on reducing the algorithm's complexity through multi-tasking. Our findings underscore the effectiveness of a stacking ensemble-learning model, featuring a novel spectrogram-based local amplification feature, in predicting rock mass properties with high precision. Nevertheless, translating this research into practical engineering applications requires extensive field-data collection and continuous model refinement. We are actively engaged in these endeavors and look forward to sharing our latest advancements and insights in the near future.