Prediction Models and Feature Importance Analysis for Service State of Tunnel Sections Based on Machine Learning

Zhao, Debo; Yang, Yujia; Cao, Chengyong; Liu, Bin

doi:10.3390/app14209167


¹ College of Civil and Transportation Engineering, Shenzhen University, Shenzhen 518060, China

² Shenzhen Key Laboratory of Green, Efficient and Intelligent Construction of Underground Metro Station, Shenzhen 518060, China

³ Key Laboratory for Resilient Infrastructures of Coastal Cities (MOE), Shenzhen University, Shenzhen 518060, China

⁴ Hualan Design & Consulting Group, Nanning 530011, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(20), 9167; https://doi.org/10.3390/app14209167

Submission received: 4 September 2024 / Revised: 19 September 2024 / Accepted: 7 October 2024 / Published: 10 October 2024

(This article belongs to the Special Issue The Application of Machine Learning in Geotechnical Engineering, 2nd Edition)

Abstract

The evaluation of tunnel service conditions is a core problem in maintaining tunnel structures throughout their life cycles. To address this problem, machine learning algorithms were applied to the National Tunnel Inventory (NTI) database of the Federal Highway Administration of the United States to predict, separately, the service states of the structural, civil, and non-structural sections of tunnels. The results indicate that ensemble learning algorithms such as Light Gradient Boosting Machine (LGBM) and Random Forest outperform Support Vector Machine, Multi-Layer Perceptron, Decision Tree, and K-Nearest Neighbor on the imbalanced classification problems presented by the NTI database. The machine learning models established using the LGBM algorithm achieved prediction accuracies of 90.9%, 96.4%, and 77.3% for the structural, civil, and non-structural sections, respectively. The features influencing the tunnels’ service states were then ranked by importance using the LGBM model, revealing that the features with a significant impact on the service states of the structural, civil, and non-structural sections are service time, tunnel length and width, geographic position (longitude and latitude), minimum vertical clearance, annual average daily traffic (AADT), and annual average daily truck traffic (AADTT). The data-driven LGBM models identified human factors such as AADT and AADTT as key features influencing the service states of tunnels’ structural sections; these factors should be considered in further research to elucidate the underlying physical mechanisms.

Keywords:

tunnel service status; machine learning; light gradient boosting machine; ensemble learning; feature importance

1. Introduction

Tunnels have become an integral part of transportation networks, connecting vast regions across China and other countries. Many tunnels have entered operation in recent decades, and the emphasis of tunnel engineering research and practice has gradually shifted from the construction phase to the operation phase of a tunnel’s life cycle. Tunnel infrastructure is usually designed to last more than 100 years, but it is subjected to a combination of complex loadings and aggressive environments that degrade its service performance over the long term. Given the massive initial construction investment and the difficulty of maintenance during a tunnel’s service life, this performance degradation can create significant economic and social problems. Therefore, evaluating the service states of tunnels is critical to maintaining healthy service performance, ensuring the safety of tunnel structures, and supporting informed decisions on maintenance and repair interventions.

Researchers have identified various factors that affect the long-term service states of tunnels and assessed their significance. In-service inspections have shown that many tunnels suffer from critical issues such as lining cracking [1,2], water leakage [3,4], and floor heave [5,6]. Moreover, tunnels built in coastal areas and in the southwestern mountainous regions of China suffer from degradation caused by chloride and sulfate ions in the surrounding soil and water [5]. Ye et al. [7] investigated the defects observed in 90 operational tunnels in China and found that the most common defects were lining cracks and water leakage, with human factors also identified as a major contributor. A similar study conducted in Sweden revealed that the factors influencing the deterioration rate of tunnels include service time, traffic volume, the ventilation and waterproofing systems employed, rock type, and groundwater chemistry [8]. Li et al. [9] conducted a case study of a metro shield tunnel in soft ground in Shanghai and proposed a comprehensive mathematical model for assessing tunnel performance. The model yields a Tunnel Serviceability Index (TSI) from various measurable indicators obtained through statistical analysis. An analysis based on the TSI formula revealed that relative settlement, uneven settlement, convergent deformation, and water leakage significantly influenced the tunnel’s condition, whereas cracks and spalling had a lesser impact.

A tunnel’s service state is influenced by numerous factors, which often have complex coupling effects. However, owing to the lack of applicable theory and standard procedures, the aforementioned studies did not account for these coupling effects when evaluating tunnel serviceability. Data-driven machine learning techniques have emerged as powerful tools for understanding tunnel service states because they can uncover implicit relationships among multiple factors. Various machine learning algorithms have been applied to assess tunnels’ service states. Xu et al. [10] employed the Long Short-Term Memory (LSTM) algorithm to extract features from tunnel inspection data and accurately predict tunnel deformation. Bai et al. [11] used data-driven machine learning techniques to detect potential geological changes and anomalies during tunnel construction. Ahmed et al. [12] developed prediction models for tunnel deterioration using machine learning algorithms. However, the algorithms used in these studies still face challenges such as an inability to directly process categorical features or obtain feature importance rankings, poor handling of imbalanced classification problems, long computation times, and insufficient accuracy and interpretability. In contrast, the Light Gradient Boosting Machine (LGBM) algorithm, an ensemble learning algorithm, is a powerful prediction tool with several advantages, including direct support for categorical features, built-in feature importance rankings, a strong ability to handle imbalanced classification problems, and high classification accuracy.
The LGBM algorithm has been widely applied in other specialized fields, including stock price prediction, electricity theft detection, engineering cost prediction, and the estimation of vessels’ total port time and delays [13,14,15,16]. These studies indicate that the LGBM algorithm offers fast processing, superior predictive performance, and automatic feature selection, giving it significant advantages on large-scale datasets with high-dimensional features. Previous studies have used ensemble learning models to predict the overall service conditions of tunnels from specific tunnel parameters [12,17]. However, tunnels are complex systems consisting of different types of sections, including tunnel structures, tunnel pavement, and mechanical system facilities, each with distinct performance evolution patterns. Therefore, separate data-driven models for each type of section would support more detailed research on the service performance of the various tunnel components.

In summary, multiple strongly coupled factors influence the evolution of a tunnel’s service state, and a tunnel is a complex system whose parts may follow very different performance evolution mechanisms. To address these challenges in tunnel service performance assessment, this study developed service performance prediction models for different types of tunnel elements. For this purpose, the National Tunnel Inventory (NTI) [18] database, which contains a large quantity of tunnel service data collected in the United States, was used to establish machine learning models that predict the deterioration states of different tunnel sections. The procedure used to analyze the database and the resulting ranking of features affecting the service states of the various tunnel sections are also presented. These findings can serve as a foundation for assessing tunnel service performance, facilitating decision making in tunnel management and supporting the safe operation of tunnels.

2. Introduction to the Database

2.1. Overview

The NTI database was developed by the Federal Highway Administration (FHWA) [18] by collecting service inspection data for tunnels throughout the United States. The data were collected according to the National Tunnel Inspection Standards (NTIS) [19] and the Specifications for the National Tunnel Inventory (SNTI) [20]. The NTI database contains information on various tunnel parameters, such as geographic information, tunnel length, service time, and annual average daily traffic. As a brief overview of the database, Figure 1a–c depict the distributions of tunnel length, service time, and annual average daily traffic based on the 2020 tunnel data, and Figure 2 shows the geographical distribution of the tunnels as yellow dots. The NTI database also provides a detailed assessment and record of the service status of the various elements in each tunnel. Because it compiles a large quantity of recent field-measured data covering a wide range of features, it is highly valuable for analysis.

In this study, we used a total of 2092 tunnel service status survey records collected over the four years from 2018 to 2021; after data screening, 1203 complete observations remained. The screening rules were as follows: (a) a service condition rating had to be available for all three types of components, so that each sample has three label values; and (b) a sample could not have missing values in any of the features considered.
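The two screening rules amount to a row filter. The following is a minimal pandas sketch under stated assumptions: the column names (`F3`, `F4`, `F5`, `F8` and the three `*_rating` label columns) are hypothetical placeholders rather than actual NTI field names, and the toy records are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical column names (not actual NTI field names).
LABEL_COLS = ["structural_rating", "civil_rating", "nonstructural_rating"]
FEATURE_COLS = ["F3", "F4", "F5", "F8"]


def screen(records: pd.DataFrame) -> pd.DataFrame:
    """Return only the records that satisfy both screening rules."""
    # Rule (a): a rating must exist for all three section types.
    has_all_labels = records[LABEL_COLS].notna().all(axis=1)
    # Rule (b): none of the considered features may be missing.
    has_all_features = records[FEATURE_COLS].notna().all(axis=1)
    return records[has_all_labels & has_all_features].reset_index(drop=True)


# Invented toy records.
toy = pd.DataFrame({
    "F3": [35.0, 40.0, np.nan, 38.0],
    "F4": [-90.0, -75.0, -120.0, -122.0],
    "F5": [10, 20, 30, 40],
    "F8": [10000, 20000, 30000, 40000],
    "structural_rating": ["A", "B", "A", "C"],
    "civil_rating": ["B", np.nan, "A", "B"],
    "nonstructural_rating": ["D", "C", "B", "B"],
})

screened = screen(toy)
print(len(screened))  # 2: rows 1 and 2 fail rules (a) and (b), respectively
```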

2.2. Classification of Tunnel Elements

As a complex system, tunnel infrastructure contains various elements with different functions, such as the lining structure, pavement, and ventilation, and the performance of these elements is affected by diverse factors. To develop more refined and precise models, the tunnel elements are first categorized by function, and a prediction model is then established for each category. In this study, tunnel elements are classified into three main categories based on their functional use, following the element classification provided by the SNTI [20]. The first category is the structural section, comprising the primary load-bearing elements of the tunnel, such as the lining, beams, and columns. The second category is the civil section, which directly interacts with vehicles and includes the concrete pavement, traffic guardrails, and pedestrian railings. The third category is the non-structural section, including the mechanical system, electrical and lighting system, fire/life safety/security system, signaling, and protection system elements. Table 1 displays the classification and typical elements of tunnels.

2.3. Tunnel Element Service State Ratings

The machine learning models developed in this study use the service state ratings of three distinct types of tunnel elements: structural, civil, and non-structural. In the NTI dataset, the in-service condition of each tunnel element is inspected and categorized into four condition states. Condition 1 represents a good condition with no significant performance degradation. Condition 2 is a fair condition with one instance of failure or deterioration. Condition 3 is a poor condition with multiple deterioration conditions but no reduction in load-carrying capacity. Condition 4 indicates severe deterioration requiring a structural safety review. These four condition states follow the SNTI [20], which gives detailed indicators for assessing the service condition of each tunnel element.

Let i denote the index of a particular tunnel (i = 1, 2, …, up to the total number of tunnels) and j the index of a specific element of a tunnel (j = 1, 2, …, n_i), where n_i is the number of specific elements contained in a certain type of element in tunnel i. The combined goodness of a specific element of a tunnel can be calculated according to Equation (1).

x_{ij} = \frac{\text{number of element } j \text{ in Condition 1}}{\text{total number of element } j} \tag{1}

After obtaining the combined goodness of all specific elements, the overall goodness of a certain type of element of tunnel i can be obtained using Equation (2).

\text{Overall goodness of a type} = \frac{\sum_{j=1}^{n_i} x_{ij}}{n_i} \tag{2}

where x_ij is the combined goodness of element j of tunnel i. Under the infrastructure grading scheme of the American Society of Civil Engineers (ASCE) [21], the service state of infrastructure can be divided into four ratings. Table 2 displays the relationship between ratings and overall goodness. In this study, a similar concept was adopted to grade the service states of tunnel elements. Note that the dataset does not include tunnels graded as substandard (F), as such tunnels are not in service.
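Equations (1) and (2) amount to a ratio per element followed by an unweighted mean over the elements of one type. A minimal Python sketch, assuming each element's inspection record supplies its quantity in Condition 1 and its total quantity; the element names and numbers are hypothetical, and the mapping to the ratings in Table 2 is omitted because its thresholds are not reproduced here.

```python
from typing import Mapping, Tuple


def element_goodness(in_condition_1: float, total: float) -> float:
    """Equation (1): share of an element's quantity rated Condition 1."""
    if total <= 0:
        raise ValueError("total quantity must be positive")
    return in_condition_1 / total


def overall_goodness(elements: Mapping[str, Tuple[float, float]]) -> float:
    """Equation (2): unweighted mean of x_ij over the n_i elements of a type."""
    if not elements:
        raise ValueError("at least one element is required")
    x = [element_goodness(c1, total) for c1, total in elements.values()]
    return sum(x) / len(x)


# Hypothetical structural-section inventory of one tunnel:
# element name -> (quantity in Condition 1, total quantity).
structural = {
    "concrete_liner": (8.0, 10.0),  # x = 0.8
    "columns": (3.0, 6.0),          # x = 0.5
}
print(overall_goodness(structural))  # 0.65
```

Because Equation (2) is an unweighted mean, a small element type counts as much as a large one; a quantity-weighted average would be a different design choice.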

3. Methodology

The process of classifying, evaluating, and predicting the deterioration of tunnels in the United States involves three main steps: data preprocessing, model development, and model evaluation. Each step is explained thoroughly in this section.

3.1. Data Preprocessing

3.1.1. Categorical Variable Conversion

The features affecting the service states of tunnels in the database can be classified into two types: categorical variables (e.g., functional classification, tunnel shape, and ground condition) and numerical variables (e.g., longitude, latitude, and average daily traffic). The NTI dataset represents categorical variables with integer codes; for instance, geologic conditions are coded as 1 for soil, 2 for rock, and 3 for mixed face. Integer codes make categorical variables usable by machine learning models, but treating them as numerical values would imply an ordering (e.g., that rock lies between soil and mixed face) that has no physical meaning. In this study, therefore, the integer codes were converted using one-hot encoding, which transforms each discrete categorical variable into a set of binary indicator variables. This conversion is not needed for the LGBM model, which handles categorical features directly. For datasets with high-dimensional features, this gives LGBM an advantage, because one-hot encoding may introduce an excessive number of new features.
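The difference between the two treatments of a categorical feature can be illustrated with pandas, using the geologic-condition codes quoted above. This is a sketch, not the authors' preprocessing code; the column name `ground_condition` and the toy values are hypothetical.

```python
import pandas as pd

# Integer codes for geologic condition, as described in the text.
GEOLOGIC_CODES = {1: "soil", 2: "rock", 3: "mixed_face"}

# Hypothetical column name and toy values.
df = pd.DataFrame({"ground_condition": [1, 2, 3, 2]})

# One-hot encoding: one binary column per category, so the model sees no
# ordering between soil, rock, and mixed face.
labels = df["ground_condition"].map(GEOLOGIC_CODES)
one_hot = pd.get_dummies(labels, prefix="ground", dtype=int)
print(one_hot.columns.tolist())
# ['ground_mixed_face', 'ground_rock', 'ground_soil']

# Alternative for LightGBM: keep a single column with pandas "category"
# dtype; LightGBM's scikit-learn fit() treats such columns as categorical
# when categorical_feature="auto" (its default).
df["ground_condition"] = df["ground_condition"].astype("category")
print(df["ground_condition"].dtype)  # category
```

With three codes the one-hot expansion adds only two extra columns, but the growth is linear in the number of categories for every categorical feature, which is the cost the text refers to.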

3.1.2. Feature Screening

The original tunnel features collected in the NTI dataset comprise 65 sub-items in 8 major categories. After inspection and screening, features unrelated to the service status of the tunnels, such as tunnel codes, tunnel names, and tunnel owners, were removed. In addition, models with good generalization performance are desired because they can provide guidance for tunnel operation and maintenance in other countries (e.g., China). Therefore, features that lack universal significance, such as those defined for the needs of the U.S. military (the Strategic Highway Network, or STRAHNET, designation), were also excluded from the feature library. Table 3 lists the features retained after screening, along with their feature codes and feature types.

3.2. Model Development

In this study, tunnel element service state prediction models were developed mainly based on the LGBM. For comparison, five other machine learning algorithms, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Multi-Layer Perceptron (MLP), were also employed in building the service state prediction models.

LGBM is a distributed gradient-boosting framework built on decision trees. It is characterized by high efficiency, fast training, low memory consumption, and high accuracy, and it supports parallel learning and large-scale data. A schematic diagram of the algorithm is shown in Figure 3. The models were trained and used for prediction with the LGBM implementation in the lightgbm library (version 3.3.3) in Python (version 3.8).

A total of 1203 complete observations were used in this study, with 70% of the data used as the training set and 30% as the testing set. Hyperparameters were optimized by grid search using the GridSearchCV class from Scikit-Learn, with 5-fold cross-validation on the training set. The training set was divided into five subsets; in each round, one subset was used for validation and the remaining four for training. This process was repeated five times so that each subset served as the validation set once, and the five validation accuracies were averaged. The hyperparameter combination with the highest average cross-validation accuracy was selected [22].
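The split-and-tune procedure can be sketched with scikit-learn. The data here are synthetic (generated with `make_classification` to mimic four imbalanced rating classes), the parameter grid is illustrative rather than the one behind Table 4, and stratified splitting is an assumption not stated in the text. A random forest (one of the compared algorithms) is used so that the sketch needs only scikit-learn; `lightgbm.LGBMClassifier` implements the scikit-learn estimator interface and could be passed to `GridSearchCV` in the same way with its own grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 1203 screened observations: four imbalanced
# rating classes. These are not NTI data.
X, y = make_classification(
    n_samples=1203, n_features=10, n_informative=5, n_classes=4,
    n_clusters_per_class=1, weights=[0.5, 0.3, 0.15, 0.05], random_state=0,
)

# 70 % / 30 % train-test split; stratifying on the labels is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)

# Illustrative grid (not the search space behind Table 4).
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 8]}

# 5-fold cross-validation on the training set only: each fold serves once
# as the validation subset, and the combination with the highest mean
# validation accuracy is selected.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)
print(round(search.best_score_, 3))            # mean cross-validation accuracy
print(round(search.score(X_test, y_test), 3))  # held-out test accuracy
```

Because `refit=True` is the default, `search.score` evaluates the best combination refitted on the whole training set, so the test set is used only once, after tuning.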

3.3. Model Evaluation Metrics

For classification algorithms, the main evaluation metrics are accuracy, precision, recall, and F1-score; for all four, higher values indicate better classification performance. Accuracy measures the overall proportion of correct predictions. Precision is the proportion of samples predicted as positive that are actually positive, whereas recall is the proportion of actual positive samples that the model correctly identifies. The F1-score, the harmonic mean of precision and recall, combines the two into a single measure. These metrics are calculated from the confusion matrix of the test set as follows:

\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}

\text{precision} = \frac{TP}{TP + FP} \tag{4}

\text{recall} = \frac{TP}{TP + FN} \tag{5}

F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{6}

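Here, TP (true positive) and TN (true negative) denote positive and negative samples that were correctly classified, while FP (false positive) and FN (false negative) denote negative and positive samples that were incorrectly classified. For the four-rating problem in this study, each rating is treated in turn as the positive class, and the macro-average of a metric is the unweighted mean of its values over the four ratings.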
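For the four-rating problem, the quantities in Equations (3)–(6) come from a 4 × 4 confusion matrix. The pure-Python sketch below computes macro-averaged precision, recall, and F1 with one-vs-rest counting. It takes overall accuracy as the diagonal sum divided by the total, the usual multi-class counterpart of Equation (3), and sets a metric to zero when its denominator is zero (an assumed convention). The matrix values are invented.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy and macro-averaged precision, recall, F1 from a confusion matrix.

    Rows are actual classes and columns are predicted classes. Each class is
    treated in turn as the positive class (one-vs-rest).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(k))
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, actually other
        fn = sum(cm[c]) - tp                        # actually c, predicted other
        p = tp / (tp + fp) if tp + fp else 0.0      # Equation (4)
        r = tp / (tp + fn) if tp + fn else 0.0      # Equation (5)
        f1 = 2 * p * r / (p + r) if p + r else 0.0  # Equation (6)
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
    }


# Invented 4 x 4 confusion matrix (rows: actual A-D, columns: predicted A-D).
cm = [
    [8, 2, 0, 0],
    [1, 4, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 0, 1],
]
metrics = macro_metrics(cm)
print(metrics["accuracy"])  # 0.8
print(metrics["macro_recall"])
```

Macro-averaging gives the rare rating D the same weight as the common rating A, which is why it is a stricter summary than overall accuracy for imbalanced data.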

4. Results and Analysis

4.1. Evaluation and Validation

After grid search optimization (Section 3.2), the model with the highest cross-validation accuracy for each algorithm was selected for performance evaluation. Table 4 shows the final hyperparameters and performance evaluation metrics (macro-averages) of the selected models. The suffixes 1, 2, and 3 denote the models trained for the structural, transportation/civil, and non-structural sections, respectively.

Figure 4 compares the performance evaluation metrics of the different machine learning models for the structural, civil, and non-structural sections, showing that modeling effectiveness depends strongly on the choice of algorithm. Across the four macro-averaged metrics, the LGBM classifier showed better generalization ability than the other classifiers, with prediction accuracies of 0.904, 0.944, and 0.762 for the structural, transportation/civil, and non-structural sections, respectively. Predicting the service state was easiest for the transportation/civil sections and most difficult for the non-structural sections. This can be attributed to the variety of element types included in the non-structural sections and the intricate relationship between the service states of these elements and the training features.

Ensemble learning algorithms, such as LGBM and RF, show better classification performance than the SVM, KNN, DT, and MLP algorithms. The imbalanced distribution of service state ratings in all three types of tunnel sections is a challenge that the prediction models must address. Ensemble learning models have two advantages for imbalanced classification problems: first, they employ multiple sampling techniques that avoid additional learning costs; second, combining multiple classifiers helps prevent overfitting [22]. For the most critical section, the structural section, the LGBM model has a slightly lower Macro-PRE than the SVM and RF models, but its Macro-REC and Macro-F1 are significantly better than those of the other five algorithms. This indicates that the LGBM model is better at identifying deterioration in operational performance, with a low false-negative rate. Because deterioration of a tunnel’s structural components can lead to serious safety hazards, a model with a higher recall rate is preferable for this section.

To further demonstrate the predictive performance of the LGBM model, which is used in the subsequent analysis of factors influencing tunnel service state, Figure 5 shows the test-set confusion matrices and Receiver Operating Characteristic (ROC) curves of the LGBM models for the three types of elements. In each matrix, the rows represent the actual conditions in the test set and the columns represent the predicted conditions. The largest count in each column lies on the diagonal of the confusion matrix, indicating that most predictions are correct. For all three sections, the prediction accuracy for service state B (good) was the lowest, as samples in this category are the most likely to be misclassified as A (exceptional).

An ROC curve is a graph used to evaluate the performance of a classification model, with the false-positive rate on the horizontal axis and the true-positive rate on the vertical axis. In each plot, the four solid lines of different colors reflect the model’s ability to classify tunnel components in each of the four service states. The closer a curve lies to the upper-left corner, the better the model performs, i.e., the higher its true-positive rate and the lower its false-positive rate. The Area Under the Curve (AUC) quantifies this performance. Figure 5a–c show that the LGBM models developed in this study have high prediction accuracy for all three types of components. For structural components, the prediction ability is relatively weaker for service state C (mediocre), with an AUC of 0.86; for civil (transportation) and non-structural components, it is likewise relatively weaker for service state B (good), with an AUC of 0.87.
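Per-class ROC curves for a multi-class model are usually obtained one-vs-rest: each rating is treated in turn as the positive class and scored by the model's predicted probability for that rating. The sketch below, assuming scikit-learn, shows the computation on invented scores for ratings A–D; in practice `y_score` would come from a fitted classifier's `predict_proba` on the test set, with columns in the same order as `classes`.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize


def one_vs_rest_auc(y_true, y_score, classes):
    """Return {class: AUC}, treating each class in turn as the positive one."""
    # Binary indicator matrix: column k is 1 where the true label is classes[k].
    y_bin = label_binarize(y_true, classes=classes)
    aucs = {}
    for k, cls in enumerate(classes):
        # x axis: false-positive rate, y axis: true-positive rate.
        fpr, tpr, _ = roc_curve(y_bin[:, k], y_score[:, k])
        aucs[cls] = float(auc(fpr, tpr))
    return aucs


classes = ["A", "B", "C", "D"]
y_true = np.array(["A", "B", "C", "D", "A", "B", "C", "D"])
# Invented class probabilities; columns follow `classes`.
y_score = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.60, 0.15, 0.15, 0.10],  # a B sample that looks like A
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.10, 0.10, 0.70],
    [0.80, 0.10, 0.05, 0.05],
    [0.20, 0.60, 0.10, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])
print(one_vs_rest_auc(y_true, y_score, classes))
```

In this toy example only rating B falls below 1 (AUC = 5/6), because one B sample receives a lower B score than two samples of other ratings, mirroring the kind of B/A confusion described above.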

4.2. Analysis of Factors Influencing Tunnel Service State

Using the LGBM models, the factors influencing the service state of the tunnel were ranked by the split method, i.e., by the total number of times each feature is used to split a node across all trees in the model. This ranking helps to analyze the impact of each variable on the prediction results. To further enhance the interpretability of the model, the feature importance ranking was also validated and analyzed with Shapley additive explanations (SHAP), a Python model interpretation package that uses Shapley values from cooperative game theory to quantify the contribution of each feature to the model’s predictions. The ranking of factors influencing the operational performance of tunnels is then interpreted from the perspective of the intrinsic evolution mechanisms of tunnel performance.
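Split-based importance simply counts, for each feature, how many internal tree nodes split on it. LightGBM reports this directly (it is the default `importance_type="split"` of its scikit-learn wrapper). To show what is being counted without depending on LightGBM, the sketch below computes the same kind of count by hand for a scikit-learn random forest trained on synthetic data; the feature names `x1`–`x6` are hypothetical and unrelated to the feature codes in Table 3.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def split_counts(forest, n_features):
    """Number of internal nodes, over all trees, that split on each feature."""
    counts = np.zeros(n_features, dtype=np.int64)
    for tree in forest.estimators_:
        feature = tree.tree_.feature
        # Leaf nodes carry a negative sentinel instead of a feature index.
        used = feature[feature >= 0]
        counts += np.bincount(used, minlength=n_features)
    return counts


# Synthetic data: 3 informative and 3 pure-noise features.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
forest = RandomForestClassifier(n_estimators=20, max_depth=4,
                                random_state=0).fit(X, y)

counts = split_counts(forest, X.shape[1])
names = [f"x{i + 1}" for i in range(X.shape[1])]
# Rank features from most to least frequently used for splitting.
ranking = [names[i] for i in np.argsort(-counts, kind="stable")]
print(dict(zip(names, counts.tolist())))
print(ranking)
```

Split counts measure how often a feature is used, not how much each split improves the model, which is one reason to cross-check such a ranking against SHAP values, as done in this study.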

Figure 6 shows the feature importance ranking and SHAP ranking for the three types of elements: structural, civil, and non-structural sections. The results show that several factors have a considerable influence on the service states of tunnel sections; these include service time (F5), tunnel length and width (F10 and F17), longitude and latitude (F4 and F3), the minimum vertical clearance (F16), the annual average daily traffic (F8), and annual average daily truck traffic (F9). However, the importance ranking orders of these features vary for different sections. The influential factors can broadly be classified into three categories: (1) natural factors, such as the latitude and longitude of the tunnel (F3 and F4), service time (F5), etc.; (2) human factors related to traffic, such as the annual average daily traffic (F8), annual average daily truck traffic (F9), etc.; and (3) tunnel attributes, such as tunnel length (F10) and width (F17) and vertical clearance (F16).

Figure 6 shows that the service time of the tunnel (F5) ranks among the top two factors influencing the service states of all three types of elements. This finding is consistent with the results of field investigations and experimental studies of the deterioration of tunnel service states [23,24,25]. In addition to service time, the geographic location of the tunnel (latitude and longitude; F3 and F4) is an important factor affecting its service state, because location is closely tied to temperature and climate. Many studies have examined how the natural environment affects the service states of tunnel structures. For instance, Wu et al. [26] used intelligent temperature recorders to examine the temperature fields of tunnels in a cold region and established a correlation between the deterioration of the lining concrete and the number of unsaturated freeze–thaw cycles it endured. The ranking shows that latitude (F3) has a more significant impact on the service performance of the three sections than longitude (F4), likely because latitude more directly influences the temperature of the service environment.

Notably, the service performance of tunnel elements is significantly affected by human factors associated with traffic. These factors not only significantly affect the service states of civil sections, such as pavements and guardrails, that directly interact with vehicles but are also closely related to the service performance of structural sections that have no direct contact with vehicles. In Figure 6a, the annual average daily traffic (F8) ranks second in the SHAP ranking and fourth in the feature importance ranking for the structural section, and another human factor, the annual average daily truck traffic (F9), also ranks relatively high (ninth and eighth, respectively). These findings are in line with the survey of Chinese highway tunnels by Ye et al. [7], who concluded that the human factor is an important contributor to tunnel defects during operation. Human factors affect the various sections in different ways. Civil section components such as pavements are in direct contact with vehicles and are thus subjected to wear and tear, fatigue, and impact from vehicles. In contrast, the structural section, which is designed mainly to bear the pressure of the surrounding soil and water, is not normally in direct contact with passing vehicles; traffic volume may therefore affect the service state of structural components mainly through damage caused by operational accidents in the tunnel. Sun et al. [27] analyzed 2703 tunnel traffic accidents in China from 2001 to 2019 and found that human factors were the primary causes of accidents and that rear-end accidents and collisions with tunnel walls were the most common types of incidents, as shown in Figure 7. A severe collision may trigger secondary disasters such as fires and explosions, posing a threat to the service state of the tunnel structure.
The sudden impacts of secondary disasters such as explosions and fires can severely damage a tunnel’s structure and thus impair its functionality. Moreover, damage to structural elements in hidden locations (such as the contact surface between the lining and the surrounding rock) can be difficult to identify and repair after such disasters. Damage such as cracking provides a fast pathway for aggressive ions, water, and oxygen to penetrate the concrete, accelerating steel reinforcement corrosion and enlarging the corroded area within the elements (Figure 8). This process eventually accelerates the deterioration of tunnel linings [28,29,30]. In addition, tunnels with high traffic volume are more likely to become targets of terrorist attacks, which can also crack the tunnel structure [31]. Notably, existing studies of pavement service states have generally treated traffic volume as one of the most influential factors. However, research on the service states of tunnels’ structural elements currently focuses more on the durability evolution of tunnel elements under the coupled effects of surrounding rock pressure and erosive environments [32,33,34]. Studies on how impacts, explosions, fires, and other sudden effects of traffic accidents couple with and influence the long-term service performance of tunnels are still lacking. Furthermore, current design and evaluation methods for tunnel service performance could be improved to account for the possible effects of sudden disasters on long-term degradation.

4.3. Analysis of the Prediction Result for a Real Tunnel

A real tunnel outside the model training set was selected, and the trained LGBM models were applied to its actual data to predict the service states of its three types of elements. The predictions were then compared with the actual inspection results.

Data obtained from the Wallace Tunnel in 2021 were selected. This tunnel, located in downtown Mobile, Alabama, United States, was constructed in 1973; it is approximately 3000 feet (914 m) long and serves as a crucial link for traffic on Interstate 10. Applying the LGBM models to the inspection data for the three types of elements in the Wallace Tunnel yielded the following ratings: A (exceptional) for the structural section, B (good) for the civil section, and D (poor) for the non-structural section. These predictions match the actual inspection results and are consistent with the analysis in Section 4.2 of the factors influencing the operational state of tunnels. As of 2021, the Wallace Tunnel had been in service for 48 years, which is relatively short compared with many other tunnels in the United States; accordingly, its structural section was in excellent condition. The tunnel carries a high traffic volume, with an AADT of 75,320. Compared with the structural section, the civil and non-structural sections are more directly affected by factors such as traffic volume, which adversely affects their service states.

5. Conclusions

In this paper, machine learning models were developed based on the NTI database of the Federal Highway Administration (FHWA). Tunnel elements were classified by function into three categories, namely, structural, civil, and non-structural sections, and their service states were classified into four condition ratings (A to D). Models for predicting the service states of tunnel elements were built using the LGBM, KNN, SVM, MLP, DT, and RF algorithms. The LGBM model was then used to analyze the importance of the features influencing tunnel service state. The main findings of this study are as follows.

(1) A comparison of accuracy, precision, recall, and F1-score showed that LGBM performed best overall and MLP performed worst. LGBM achieved the highest macro-F1 for the structural and civil sections. For the non-structural section, its scores were close to, but slightly below, those of KNN and RF (Table 4). Ensemble learning algorithms such as LGBM and RF were generally better suited to the imbalanced classification of tunnel elements’ service states. Moreover, LGBM has an advantage over the other algorithms for feature importance analysis on large datasets because it can handle categorical features directly, without one-hot encoding.

(2) The LGBM models achieved testing-set prediction accuracies of 90.9%, 96.4%, and 77.3% for the structural, civil, and non-structural sections, respectively. These testing accuracies are close to the corresponding training accuracies (0.904, 0.944, and 0.762; Table 4). The models therefore show no sign of overfitting, suggesting good generalization ability within the NTI data.

(3) The feature importance analysis based on the LGBM models identified the following as the important factors affecting the service states of the various types of tunnel elements: service time, tunnel length, latitude and longitude, the minimum vertical clearance over the tunnel roadway, AADT, AADTT, and tunnel width. In addition to environmental features and tunnel attributes, human factors such as traffic volume were identified as important features for the structural section of tunnels. This study also suggests that the sudden effects of impacts, explosions, and fires can contribute to the long-term deterioration of tunnel structural performance; these are secondary disasters caused by traffic accidents. Further research should therefore analyze how such sudden effects couple with the long-term service performance of tunnels. This includes investigating the performance degradation of pre-damaged tunnel structures in aggressive underground environments. At the microscopic level, it also includes the transport mechanisms of erosive ions in pre-cracked concrete and the uneven corrosion of rebar beneath pre-damaged concrete cover.

(4) The machine learning models in this study are based on NTI data collected in the United States. Further data are therefore needed to validate whether the conclusions apply to the operational conditions of tunnels in China. Transportation management agencies, tunnel owners, and tunnel inspection units could collaborate to establish a tunnel operational status database specific to China, which would be highly valuable. With a large quantity of high-quality field data, data-driven models could reveal macro-level patterns of tunnel operational performance in China more clearly. Moreover, tunnel inspection practice should collect more detailed traffic-related data, such as the rates and types of traffic accidents. Including such parameters in the predictive model may further improve its accuracy.
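Conclusion (2) rests on comparing testing accuracy with training accuracy. The short sketch below recomputes that comparison from the LGBM rows of Table 4, where suffixes -1, -2, and -3 correspond to the structural, civil, and non-structural sections. The helper name `generalization_gaps` is hypothetical; only the accuracy values come from the paper.

```python
# LGBM accuracies copied from Table 4 as (training set, testing set).
LGBM_ACCURACY = {
    "structural": (0.904, 0.909),      # LGBM-1
    "civil": (0.944, 0.964),           # LGBM-2
    "non-structural": (0.762, 0.773),  # LGBM-3
}


def generalization_gaps(results):
    """Return testing minus training accuracy per section, rounded to 3 decimals.

    A large negative gap (training much better than testing) would suggest
    overfitting; values near zero suggest the model scores unseen data about
    as well as the data it was trained on.
    """
    return {
        section: round(test - train, 3)
        for section, (train, test) in results.items()
    }


gaps = generalization_gaps(LGBM_ACCURACY)
for section, gap in gaps.items():
    print(f"{section}: test - train = {gap:+.3f}")
```

All three gaps are small and non-negative, so the accuracy figures give no sign of overfitting. A small gap on a held-out split of the same database does not by itself show that the models transfer to tunnels outside the NTI, which is the concern raised in conclusion (4).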

Author Contributions

D.Z. was in charge of conceptualization, data processing and analysis, methodology, and making important revisions to the original draft. Y.Y. contributed to the data processing, model construction, and writing of the original draft. C.C. supervised the whole program and played an important role in revising and approving the original draft. B.L. conducted the investigation and revised the original draft. All authors have read and agreed to the published version of the manuscript.

Funding

This paper received the following funding: Project (2021YFB26010000) supported by the National Key Research and Development Program of China; Project (12021GXNSFBA075034) supported by the Guangxi Provincial Natural Science Foundation of China; Project (52090084) supported by the National Natural Science Foundation of China; and Project (2023QNT002) supported by the Young Researcher Group Nurturing Program of Shenzhen University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated and analyzed in the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

Author Bin Liu was employed by the company Hualan Design & Consulting Group. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Tan, Z.; Li, S.; Yang, Y.; Wang, J. Large deformation characteristics and controlling measures of steeply inclined and layered soft rock of tunnels in plate suture zones. Eng. Fail. Anal. 2022, 131, 105831.
2. Zheng, Y.; He, S.; Yu, Y.; Zheng, J.; Zhu, Y.; Liu, T. Characteristics, challenges and countermeasures of giant karst cave: A case study of Yujingshan tunnel in high-speed railway. Tunn. Undergr. Space Technol. 2021, 114, 103988.
3. Huang, H.; Cheng, W.; Zhou, M.; Chen, J.; Zhao, S. Towards Automated 3D Inspection of Water Leakages in Shield Tunnel Linings Using Mobile Laser Scanning Data. Sensors 2020, 20, 6669.
4. Gao, C.-L.; Zhou, Z.-Q.; Yang, W.-M.; Lin, C.-J.; Li, L.-P.; Wang, J. Model test and numerical simulation research of water leakage in operating tunnels passing through intersecting faults. Tunn. Undergr. Space Technol. 2019, 94, 103134.
5. Ma, K.; Zhang, J.; Zhang, J.; Dai, Y.; Zhou, P. Floor heave failure mechanism of large-section tunnels in sandstone with shale stratum after construction: A case study. Eng. Fail. Anal. 2022, 140, 106497.
6. Zhao, D.; Fan, H.; Jia, L. Characteristics and Mitigation Measures of Floor Heave in Operational High-Speed Railway Tunnels. KSCE J. Civ. Eng. 2021, 25, 1479–1490.
7. Ye, F.; Qin, N.; Liang, X.; Ouyang, A.; Qin, Z.; Su, E. Analyses of the defects in highway tunnels in China. Tunn. Undergr. Space Technol. 2021, 107, 103658.
8. Sandrone, F.; Labiouse, V. Identification and analysis of Swiss National Road tunnels pathologies. Tunn. Undergr. Space Technol. 2011, 26, 374–390.
9. Li, X.; Lin, X.; Zhu, H.; Wang, X.; Liu, Z. Condition assessment of shield tunnel using a new indicator: The tunnel serviceability index. Tunn. Undergr. Space Technol. 2017, 67, 98–106.
10. Xu, W.; Cheng, M.; Xu, X.; Chen, C.; Liu, W. Deep Learning Method on Deformation Prediction for Large-Section Tunnels. Symmetry 2022, 14, 2019.
11. Bai, X.-D.; Cheng, W.-C.; Png, D.E.L.; Li, G. Evaluation of geological conditions and clogging of tunneling using machine learning. Geomech. Eng. 2021, 25, 59–73.
12. Ahmed, M.O.; Khalef, R.; Ali, G.G.; El-Adaway, I.H. Evaluating Deterioration of Tunnels Using Computational Machine Learning Algorithms. J. Constr. Eng. Manag. 2021, 147, 04021125.
13. Tian, L.; Feng, L.; Yang, L.; Guo, Y. Stock price prediction based on LSTM and LightGBM hybrid model. J. Supercomput. 2022, 78, 11768–11793.
14. Yan, Z.; Wen, H. Comparative Study of Electricity-Theft Detection Based on Gradient Boosting Machine. In Proceedings of the 2021 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Glasgow, UK, 17–20 May 2021; pp. 1–6.
15. Chakraborty, D.; Elhegazy, H.; Elzarka, H.; Gutierrez, L. A novel construction cost prediction model using hybrid natural and light gradient boosting. Adv. Eng. Inform. 2020, 46, 101201.
16. Rao, A.R.; Wang, H.; Gupta, C. Predictive Analysis for Optimizing Port Operations. arXiv 2024, arXiv:2401.14498.
17. Xue, Y.-D.; Zhang, W.; Wang, Y.-L.; Luo, W.; Jia, F.; Li, S.-T.; Pang, H.-J. Serviceability evaluation of highway tunnels based on data mining and machine learning: A case study of continental United States. Tunn. Undergr. Space Technol. 2023, 142, 105418.
18. FHWA (Federal Highway Administration). Tunnel Inspection - Safety - Bridges & Structures - Federal Highway Administration. (2017–2021); Federal Highway Administration: Washington, DC, USA, 2017.
19. FHWA (Federal Highway Administration); DOT (Department of Transportation). National Tunnel Inspection Standards; Federal Highway Administration: Washington, DC, USA, 2015.
20. FHWA-NTI (Federal Highway Administration-National Tunnel Inventory). Specifications for the National Tunnel Inventory; Federal Highway Administration: Washington, DC, USA, 2019.
21. ASCE. 2021 Report Card for America’s Infrastructure: A Comprehensive Assessment of America’s Infrastructure; ASCE: Reston, VA, USA, 2021.
22. Hammerla, N.Y.; Plötz, T. Let’s (not) Stick Together: Pairwise Similarity Biases Cross-Validation in Activity Recognition. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2015), Osaka, Japan, 7–11 September 2015; Association for Computing Machinery: New York, NY, USA, 2015; pp. 1041–1051.
23. Liu, W.; Chen, J.; Luo, Y.; Chen, L.; Zhang, L.; He, C.; Shi, Z.; Xu, Z.; Zhu, H.; Hu, T. Long-term stress monitoring and in-service durability evaluation of a large-span tunnel in squeezing rock. Tunn. Undergr. Space Technol. 2022, 127, 104611.
24. Liu, C.; Zhang, D.; Zhang, S.; Fang, H. Analytical solution of the long-term service performance of tunnel considering surrounding rock rheology and lining deterioration characteristics. Rock Soil Mech. 2021, 42, 2795–2807.
25. Xue, X.; Xie, Y.; Zhou, X. Study on the Life-Cycle Health Monitoring Technology of Water-Rich Loess Tunnel. Adv. Mater. Sci. Eng. 2019, 2019, 9461890.
26. Wu, Y.; Xu, P.; Huang, L.; Cai, Z.; Hu, K. Progressive deterioration of tunnel lining in seasonal freezing zone and its engineering influence. J. Chang. Univ. 2021, 41, 63–72.
27. Sun, H.; Wang, Q.; Zhang, P.; Zhong, Y.; Yue, X. Spatialtemporal Characteristics of Tunnel Traffic Accidents in China from 2001 to Present. Adv. Civ. Eng. 2019, 2019, e4536414.
28. Jin, H.; Zheng, J.; Yu, S. Causes, Hazards, and Treatment of Rust Cracks on the Soil Side of Shield Tunnels; Southeast University Press: Nanjing, China, 2023.
29. Hu, X.; He, C.; Feng, K.; Liu, S.; Walton, G. Effects of polypyrrole coated rebar on corrosion behavior of tunnel lining with the combination effect of sustained loading and pre-existing cracks when exposed to chlorides. Constr. Build. Mater. 2019, 221, 318–331.
30. Zhang, X.; Yu, S.; Jin, H.; Bi, X.; Zhou, S. Experimental investigation of corrosion effect on bending deflection of shield tunnel segment containing transverse cracks. Struct. Concr. 2023, 24, 411–422.
31. Zhao, D.; Huang, Y.; Chen, X.; Han, K.; Chen, C.; Zhao, X.; Chen, W. Numerical investigations on dynamic responses of subway segmental tunnel lining structures under internal blasts. Tunn. Undergr. Space Technol. 2023, 135, 105058.
32. Lei, M.; Peng, L.; Shi, C. An experimental study on durability of shield segments under load and chloride environment coupling effect. Tunn. Undergr. Space Technol. 2014, 42, 15–24.
33. Liu, D.; Chen, H.; Tang, Y.; Gong, C.; Jian, Y.; Cao, K. Analysis and Prediction of Sulfate Erosion Damage of Concrete in Service Tunnel Based on ARIMA Model. Materials 2021, 14, 5904.
34. Lei, M.; Peng, L.; Shi, C.; Wang, S. Experimental study on the damage mechanism of tunnel structure suffering from sulfate attack. Tunn. Undergr. Space Technol. 2013, 36, 5–13.

Figure 1. Distribution of tunnel features in the NTI database (2020): (a) tunnel lengths (unit: meters), (b) tunnel service time (unit: years), and (c) annual average daily traffic (unit: vehicles per day).

Figure 2. Distribution of tunnels’ geographic locations.

Figure 3. A schematic diagram of the LGBM algorithm solving classification problems.

Figure 4. A comparison of the evaluation metrics: (a) structural section; (b) civil section; and (c) non-structural section.

Figure 5. Confusion matrices and Receiver Operating Characteristic (ROC) curves for LGBM models: (a) structural section; (b) civil section; and (c) non-structural section.
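The evaluation metrics in Table 4 are macro averages, which weight every rating class equally. The sketch below uses a hypothetical confusion matrix to show why this matters for imbalanced rating data. A classifier that always predicts the majority rating reaches high accuracy but low macro precision, recall, and F1. The function `macro_metrics` is a plain-Python illustration, not the authors' code. It takes rows as true classes and columns as predicted classes, and it scores a class that is never predicted as having zero precision. Accuracy here is the ordinary overall accuracy; the exact definition behind Table 4's "Macro-ACC" column is not given in this section.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged precision, recall, and F1.

    `cm` is a square confusion matrix with cm[i][j] = number of samples whose
    true class is i and predicted class is j. Each per-class metric is computed
    one-vs-rest and then averaged with equal weight per class; 0/0 counts as 0.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))  # column sum
        actual = sum(cm[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
    }


# Hypothetical, deliberately imbalanced example with ratings A, B, C, D:
# 90 of 100 elements are rated A, and the model predicts "A" for every one.
always_a = [
    [90, 0, 0, 0],
    [5, 0, 0, 0],
    [3, 0, 0, 0],
    [2, 0, 0, 0],
]
print(macro_metrics(always_a))
```

Here accuracy is 0.9 while macro recall is only 0.25. The same contrast appears in Table 4, where MLP-1 reaches a testing Macro-ACC of 0.806 but a Macro-F1 of only 0.305.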

Figure 6. The feature importance rankings and SHAP rankings (top 20) for the (a) structural; (b) civil; and (c) non-structural sections.
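Figure 6 ranks features with LGBM's built-in importance and with SHAP values. As a simpler, model-agnostic point of comparison, the sketch below implements permutation importance. It shuffles one feature column, re-scores the model, and records the drop in accuracy. This is a different technique from the ones used in the paper. The rule-based `toy_model`, its 40-year threshold, and the data are all hypothetical, chosen only so that the result is easy to check.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which model(row) equals the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical model: feature 0 is service time in years, feature 1 is an
# unrelated categorical code that the model ignores.
def toy_model(row):
    return "A" if row[0] < 40 else "C"


X = [[years, years % 3] for years in range(10, 110, 5)]
y = [toy_model(row) for row in X]  # labels the model reproduces exactly
importances = permutation_importance(toy_model, X, y)
print(importances)
```

The model ignores feature 1, so shuffling it never changes a prediction and its importance is exactly zero. Shuffling service time almost always moves some tunnels across the 40-year threshold, which gives that feature a positive score. Unlike SHAP, this measure reflects only the model's sensitivity to each feature, not the direction of the effect.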

Figure 7. Accident types versus traffic accidents [27].

Figure 8. Illustration of structural damage accelerating long-term performance degradation.

Table 1. Classification of tunnel sections.

Element Classification	Section	Specific Elements Included
Category I: Structural sections that are the main load-bearing elements of the tunnel		Lining, tunnel roof beams, columns/piles, cross passages, internal walls, doors, ceiling roof walls, roof beams, hangers and anchors, ceiling panels, soffit panels, floor panels, soffit beams, joints, spacers
Category II: Civil sections in direct contact with transportation	Wearing surface	Concrete wearing surface, asphalt wearing surface, other wearing surfaces
	Traffic barrier	Steel traffic barriers, concrete traffic barriers, other traffic barriers
	Pedestrian railings	Steel pedestrian railings, concrete pedestrian railings, other pedestrian railings
Category III: Non-structural sections	Mechanical system section	Ventilation system, drainage and pumping system, fans, flood gate
	Electrical and lighting system section	Electrical distribution system, tunnel lighting system, emergency lighting system
	Fire/life safety/security system sections	Fire detection system, fire protection system, emergency communication system
	Sign section	Traffic signs, egress signs, variable message boards, lane signals
	Protection-of-system sections	Protective coating

Table 2. Tunnel element condition classification.

Overall Goodness (%)	Condition Score	Overall Element Condition
(75, 100]	A	Exceptional
(50, 75]	B	Good
(25, 50]	C	Mediocre
[0, 25]	D	Poor
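Table 2 maps an element's overall goodness percentage to a condition score. A small helper makes the boundary handling explicit. It assumes that each boundary value (75, 50, or 25) belongs to the lower rating, as the half-open intervals in Table 2 indicate. The function name `condition_score` is hypothetical.

```python
CONDITION_LABELS = {"A": "Exceptional", "B": "Good", "C": "Mediocre", "D": "Poor"}


def condition_score(goodness):
    """Map overall goodness (%) to the condition score of Table 2.

    A: 75 < g <= 100, B: 50 < g <= 75, C: 25 < g <= 50, D: 0 <= g <= 25.
    """
    if not 0 <= goodness <= 100:
        raise ValueError("overall goodness must lie between 0 and 100")
    if goodness > 75:
        return "A"
    if goodness > 50:
        return "B"
    if goodness > 25:
        return "C"
    return "D"


for g in (100, 80, 75, 50, 10):
    score = condition_score(g)
    print(g, score, CONDITION_LABELS[score])
```

Checking the thresholds from the top down means each `if` only needs a lower bound.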

Table 3. Features retained after feature screening.

Feature Code	Retained Feature	Feature Type
F1	Route Direction	Categorical
F2	Route Type	Categorical
F3	Tunnel Portal Latitude	Numerical
F4	Tunnel Portal Longitude	Numerical
F5	Tunnel Service Time	Numerical
F6	Year Rehabilitated	Categorical
F7	Number of Lanes	Categorical
F8	Annual Average Daily Traffic (AADT)	Numerical
F9	Annual Average Daily Truck Traffic (AADTT)	Numerical
F10	Detour Length	Numerical
F11	Service in Tunnel	Categorical
F12	Direction of Traffic	Categorical
F13	Toll	Categorical
F14	Functional Classification	Categorical
F15	Tunnel Length	Numerical
F16	Minimum Vertical Clearance	Numerical
F17	Tunnel Width	Numerical
F18	Routine Inspection Interval	Categorical
F19	In-Depth Inspection	Categorical
F20	Damage Inspection	Categorical
F21	Height Restriction	Categorical
F22	Hazardous Material Restriction	Categorical
F23	Other Restrictions	Categorical
F24	Under Navigable Waterway	Categorical
F25	Navigable Waterway Clearance	Numerical
F26	Number of Bores	Categorical
F27	Tunnel Shape	Categorical
F28	Portal Shape	Categorical
F29	Ground Conditions	Categorical
F30	Complex	Categorical
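Twenty of the 30 retained features in Table 3 are categorical. LightGBM can split on categorical features directly, but its documentation advises converting them to integer type before building its Dataset. The sketch below shows one plain-Python way to produce such codes. The category values and the helper name `encode_categorical` are hypothetical, and the paper's own conversion procedure (Section 3.1.1) is not reproduced here.

```python
def encode_categorical(rows, columns):
    """Replace the values of the given categorical columns with integer codes.

    Codes 0, 1, 2, ... follow the sorted order of the categories observed in
    `rows`. Other columns (e.g., numerical features) are copied unchanged, and
    the input rows are not modified.
    Returns the encoded rows and the category-to-code mapping per column.
    """
    mappings = {}
    for col in columns:
        categories = sorted({row[col] for row in rows})
        mappings[col] = {category: code for code, category in enumerate(categories)}
    encoded = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            new_row[col] = mappings[col][row[col]]
        encoded.append(new_row)
    return encoded, mappings


# Hypothetical records using three Table 3 features (F27, F29, F15).
tunnels = [
    {"Tunnel Shape": "Rectangular", "Ground Conditions": "Soil", "Tunnel Length": 300.0},
    {"Tunnel Shape": "Horseshoe", "Ground Conditions": "Rock", "Tunnel Length": 1200.0},
    {"Tunnel Shape": "Circular", "Ground Conditions": "Soil", "Tunnel Length": 450.0},
]
encoded, mappings = encode_categorical(tunnels, ["Tunnel Shape", "Ground Conditions"])
print(mappings)
print(encoded[0])
```

In practice, the mappings learned on the training set would be reused for new tunnels so that a category always receives the same code. This sketch raises `KeyError` for a category it has not seen. Integer codes suit LightGBM's categorical splits. For distance-based models such as KNN and SVM, however, they would impose an artificial ordering, which is one reason one-hot encoding is common for those algorithms.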

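Table 4. Machine learning models, optimized hyperparameters, and evaluation metrics. Suffixes -1, -2, and -3 denote the structural, civil, and non-structural sections, respectively.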

Model	Optimized Hyperparameters	Accuracy for Training Set	Macro-ACC (Testing Set)	Macro-PRE (Testing Set)	Macro-REC (Testing Set)	Macro-F1 (Testing Set)
DT-1	max_depth = None, max_leaf_nodes = 600, min_samples_leaf = 1, min_samples_split = 2	0.865	0.864	0.767	0.658	0.698
DT-2	max_depth = 25, max_leaf_nodes = 600, min_samples_leaf = 1, min_samples_split = 2	0.913	0.939	0.825	0.768	0.792
DT-3	max_depth = None, max_leaf_nodes = 400, min_samples_leaf = 1, min_samples_split = 2	0.720	0.751	0.745	0.735	0.738
KNN-1	n_neighbors = 34, weights = ‘distance’	0.888	0.892	0.863	0.618	0.694
KNN-2	n_neighbors = 15, weights = ‘distance’	0.929	0.936	0.863	0.739	0.790
KNN-3	n_neighbors = 2, weights = ‘distance’	0.715	0.784	0.793	0.771	0.777
SVM-1	C = 10, gamma = 0.1, kernel = ‘rbf’	0.893	0.898	0.904	0.620	0.710
SVM-2	C = 10, gamma = 0.1, kernel = ‘rbf’	0.930	0.942	0.984	0.693	0.796
SVM-3	C = 1000, gamma = 0.001, kernel = ‘rbf’	0.639	0.695	0.791	0.656	0.688
MLP-1	activation = ‘logistic’, hidden_layer_sizes = 8, max_iter = 2000, solver = ‘lbfgs’	0.833	0.806	0.333	0.305	0.305
MLP-2	activation = ‘logistic’, hidden_layer_sizes = 11, max_iter = 2000, solver = ‘lbfgs’	0.861	0.889	0.655	0.441	0.446
MLP-3	activation = ‘logistic’, hidden_layer_sizes = 14, max_iter = 1000, solver = ‘adam’	0.336	0.299	0.164	0.287	0.189
LGBM-1	max_depth = 8, num_leaves = 30, min_split_gain = 0, colsample_bytree = 1, subsample = 0.8	0.904	0.909	0.866	0.724	0.776
LGBM-2	max_depth = 8, num_leaves = 30, min_split_gain = 0, colsample_bytree = 1, subsample = 0.8	0.944	0.964	0.966	0.807	0.873
LGBM-3	max_depth = 9, num_leaves = 25, min_split_gain = 0, colsample_bytree = 1, subsample = 0.8	0.762	0.773	0.783	0.754	0.760
RF-1	max_depth = 15, max_features = 5, min_samples_split = 3, n_estimators = 220	0.893	0.911	0.894	0.649	0.728
RF-2	max_depth = 15, max_features = 15, min_samples_split = 3, n_estimators = 110	0.935	0.958	0.934	0.759	0.811
RF-3	max_depth = 15, max_features = 15, min_samples_split = 3, n_estimators = 280	0.759	0.776	0.802	0.756	0.766
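Table 4 is easier to compare section by section when the models are sorted by a single metric. The sketch below ranks the six algorithms by testing-set Macro-F1, using the values copied from the table. Suffixes -1, -2, and -3 correspond to the structural, civil, and non-structural sections. Macro-F1 is chosen here only as an example criterion because it weights all rating classes equally; the helper name `rank_by_section` is hypothetical.

```python
# Testing-set Macro-F1 from Table 4, ordered as
# (structural "-1", civil "-2", non-structural "-3").
MACRO_F1 = {
    "DT": (0.698, 0.792, 0.738),
    "KNN": (0.694, 0.790, 0.777),
    "SVM": (0.710, 0.796, 0.688),
    "MLP": (0.305, 0.446, 0.189),
    "LGBM": (0.776, 0.873, 0.760),
    "RF": (0.728, 0.811, 0.766),
}
SECTIONS = ("structural", "civil", "non-structural")


def rank_by_section(scores):
    """Return, for each section, model names sorted from best to worst."""
    return {
        section: sorted(scores, key=lambda model: scores[model][i], reverse=True)
        for i, section in enumerate(SECTIONS)
    }


rankings = rank_by_section(MACRO_F1)
for section, order in rankings.items():
    print(f"{section}: {' > '.join(order)}")
```

On this criterion, LGBM ranks first for the structural and civil sections and MLP last in every section. For the non-structural section, KNN, RF, and LGBM lie within 0.02 of each other.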

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhao, D.; Yang, Y.; Cao, C.; Liu, B. Prediction Models and Feature Importance Analysis for Service State of Tunnel Sections Based on Machine Learning. Appl. Sci. 2024, 14, 9167. https://doi.org/10.3390/app14209167


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
