A Condition Assessment Tool for Steel Bridge Deck Pavement Systems Based on Data Balancing Methods and Machine Learning Algorithms

Wei, Yazhou; Ji, Rongqing; Li, Qingfu; Song, Zongming

doi:10.3390/buildings14092959

¹ Henan Puwei Expressway Company Limited, Puyang 457000, China

² School of Water Conservancy and Transportation, Zhengzhou University, Zhengzhou 450001, China

* Author to whom correspondence should be addressed.

Buildings 2024, 14(9), 2959; https://doi.org/10.3390/buildings14092959

Submission received: 17 August 2024 / Revised: 13 September 2024 / Accepted: 17 September 2024 / Published: 19 September 2024

(This article belongs to the Section Building Structures)

Abstract

The primary challenge in operating steel deck pavement systems lies in inspecting and assessing their condition. Traditionally, manual inspection methods are employed. However, these approaches are time-consuming, labor-intensive, and prone to human error. Integrating data-driven machine learning technologies into the evaluation of pavement systems therefore offers a significant advantage. This study proposes a decision-making tool that estimates the condition levels of steel bridge deck pavement systems using classification techniques. To address class imbalance in the dataset, the Synthetic Minority Oversampling Technique (SMOTE) is used. Seven machine learning methods (Light Gradient Boosting Machine, Extreme Gradient Boosting, Random Forest, Adaptive Boosting, K-Nearest Neighbor, Multilayer Perceptron, and Logistic Regression) are trained and compared. The comparison shows that the Light Gradient Boosting Machine performs best, achieving classification accuracies of 0.841 and 0.929 on the original and synthetic datasets, respectively.

Keywords: steel deck pavement system; imbalance data; decision-making tools; condition levels

1. Introduction

As bridge construction and transportation networks rapidly expand, steel bridge deck paving systems, recognized for their exceptional performance, have become increasingly common in global bridge projects [1]. These systems consist of a pavement layer and a steel bridge deck, with the pavement acting as a protective barrier against corrosion and providing essential friction for vehicles [2]. The pavement layer and steel bridge deck work together to bear loads, showcasing strong coordinated deformation abilities [3]. However, during operation, these systems are prone to defects, which pose significant safety risks to urban traffic. Therefore, regular inspection and evaluation are vital to maintaining their structural integrity and ensuring traffic safety [4].

Recently, artificial intelligence technology has become increasingly prevalent in various fields such as healthcare, finance, and engineering, bringing about substantial improvements and innovations [5,6]. Machine learning, a subset of artificial intelligence, facilitates automation by identifying patterns in data [7,8]. This study utilizes machine learning algorithms to classify condition levels of steel deck pavement systems, moving away from traditional manual inspection methods. Manual inspections are often time-consuming and labor-intensive. In contrast, machine learning provides a more efficient alternative, enabling transit agency personnel to assess the condition levels of steel deck pavement systems with limited data. This method not only reduces workload and costs but also decreases the likelihood of human error.

With advances in computing power, artificial intelligence has increasingly captured public attention. Machine learning has been applied effectively in many fields and has proven capable of handling a wide range of nonlinear problems [9,10,11]. Mangalathu et al. employed machine learning algorithms, including K-Nearest Neighbor and Random Forest, to classify the failure modes of reinforced concrete beam–column joints and predict their shear capacity [12]. Bakouregui et al. developed a gradient boosting model to estimate the load-bearing capacity of reinforced concrete columns strengthened with fiber-reinforced polymer strips [13]. Ikumi et al. used artificial neural networks to estimate the tensile strength of fiber-reinforced concrete after cracking [14]. Guan et al. used a Random Forest algorithm to estimate the maximum interstory displacement angle and maximum roof acceleration of frame structures subjected to earthquakes [15]. Feng et al. employed adaptive boosting, which combines multiple weak learners into a robust model, to estimate the compressive strength of concrete [16]. In addition to building ML prediction models, researchers often use interpretability methods to explain how ML decisions are made [17].

In recent years, some researchers have introduced machine learning algorithms into bridge condition prediction problems. Nasab et al. developed a framework to improve the accuracy of predicting bridge deck conditions using machine learning algorithms with Ohio bridge data [18]. Rajkumar et al. designed a Random Forest algorithm to estimate the condition ratings of bridges in Florida, which can generate an efficient model with fewer input parameters [19]. Martinez et al. collected data from 2802 bridges in Canada over a 10-year period, used a decision tree algorithm to predict the bridge condition index, and verified the model through cross-validation [20]. Liu et al. used a convolutional neural network to develop a method for predicting the status of bridge structural components and optimized the model parameters [21]. Assaad et al. ranked the factors affecting the bridge by feature importance and developed a bridge deck pavement defect prediction model using the K-Nearest Neighbor algorithm [22]. Although these studies applied machine learning algorithms to bridge conditions, the databases they used suffer from class imbalance, and the imbalance of the categorical data was not taken into account. To address this gap, this study develops a decision-making tool for assessing the condition levels of deck pavement systems using unbalanced data.

This study develops a decision-making tool to assist transit agency personnel in assessing the condition of steel bridge deck pavement systems. We build a data-driven prediction model to replace the manual inspection methods currently used, thereby saving costs and improving efficiency. Seven machine learning techniques were tested and validated on the condition levels of the pavement layer and steel bridge deck. The primary contributions of this paper are as follows: (1) a decision-making tool is presented for estimating the condition levels of deck pavement systems by combining data balancing techniques with machine learning classification algorithms; (2) because data imbalance is a common issue in classification problems, the Synthetic Minority Oversampling Technique (SMOTE) was used to create a balanced synthetic dataset for the deck pavement system, and the new dataset was used to evaluate the effectiveness of the machine learning models; (3) model hyperparameters were optimized through a combination of 10-fold cross-validation and grid search to improve generalization performance; (4) on the original dataset, the best model achieved an accuracy of 0.841, and after SMOTE was applied to address the imbalance, its accuracy improved to 0.929.

2. Method

2.1. Machine Learning Algorithm

The Light Gradient Boosting Machine (LightGBM) is a scalable and efficient machine learning method based on gradient boosting with decision trees. It introduces optimization strategies for data compression and decision tree modeling: gradient-based one-side sampling (GOSS) reduces the number of samples, and exclusive feature bundling (EFB) compresses features by grouping mutually exclusive ones [23]. LightGBM is well-suited to large-scale, high-dimensional datasets, offering significant advantages in training speed and prediction accuracy.

Extreme Gradient Boosting (XGBoost) is a widely used algorithm in machine learning competitions. Its key ideas are as follows: (1) a regularization term is added to the loss function to balance prediction accuracy against model complexity, and the split quality score is derived from this regularized objective, and (2) techniques such as an approximate greedy algorithm for split finding reduce the cost of building trees [24]. XGBoost can automatically handle missing values in the data, which enhances the robustness of the model. XGBoost also supports a variety of built-in objective functions and allows custom ones, making it widely applicable.

Unlike a single decision tree fitted to the entire training dataset, the Random Forest (RF) algorithm builds several smaller decision trees, each on a different subset of the samples and feature attributes. Both the samples and the feature attributes of each subset are selected at random [25]. The final result combines the outputs of these trees. By aggregating the trees it builds, RF reduces overfitting and improves the model’s generalization ability. RF is also resistant to data noise and outliers, providing higher reliability.

Adaptive Boosting (AdaBoost) is a classic machine learning algorithm that works by continuous iteration. In each iteration a weak classifier is trained, and iteration continues until the allowable error rate is reached. Each training sample is given a weight: if a sample is classified correctly, its weight is reduced, so it receives less emphasis in subsequent iterations; if it is classified incorrectly, its weight is increased. Finally, the weak classifiers are combined through weighted voting [26]. AdaBoost has few hyperparameters, so parameter settings have less impact on the model and it is simple to use. However, AdaBoost is sensitive to noisy data.
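The reweighting step described above can be illustrated with a minimal sketch. It assumes the classic binary form of AdaBoost with labels +1/-1, which is a simplification of the four-level problem studied here; the function name and the toy labels are hypothetical and are not taken from the study.

```python
import math

def adaboost_reweight(weights, y_true, y_pred):
    """One round of the classic binary AdaBoost weight update (labels +1/-1).

    Returns (alpha, new_weights), where alpha is the voting weight of the
    weak classifier that produced y_pred.
    """
    # Weighted error rate of the weak classifier
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correct ones down
    raw = [w * math.exp(-alpha * t * p)
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

# Hypothetical toy round: four equally weighted samples, one mistake
weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]   # the last sample is misclassified
alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
```

After normalization the misclassified sample carries more weight than before and the correctly classified samples carry less, so the next weak classifier concentrates on the hard case.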

The K-Nearest Neighbor (KNN) method is a fundamental machine learning method. It classifies a sample based on the majority category among its K nearest neighbors in the feature space [26]. KNN does not require complex computations, offering high flexibility and interpretability. KNN is typically used in scenarios with simple data structures.

Multilayer Perceptron (MLP) is a machine learning model based on feedforward neural networks. An MLP usually consists of multiple layers of neurons: the input layer receives the input variables, the hidden layers perform nonlinear transformations, and the output layer produces the prediction. MLP has strong expressive power for nonlinear problems and high-dimensional data [8]. Its architecture lends itself to parallelization, so training and inference run efficiently on GPUs, significantly improving computational efficiency.

Logistic Regression (LR) is a classic classification method in machine learning, although at its core it remains a linear model. It builds on linear regression by passing the linear combination of features through a nonlinear (logistic) function to obtain the predicted result. Its parameters are estimated by maximum likelihood rather than by the least squares method used in linear regression [8].
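As a minimal sketch of the maximum likelihood idea, the snippet below fits a one-feature binary logistic model by gradient ascent on the log-likelihood. The binary setting, the optimizer, the learning rate, and the toy data are all assumptions made for illustration; the study itself addresses a four-level problem and does not describe its solver.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w, b, xs, ys):
    """Bernoulli log-likelihood of labels ys under the model sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, lr=0.05, steps=2000):
    """Maximize the log-likelihood by plain gradient ascent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        w += lr * sum(r * x for r, x in zip(residuals, xs))
        b += lr * sum(residuals)
    return w, b

# Hypothetical, deliberately non-separable toy data
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic(xs, ys)
```

The fitted parameters give a higher log-likelihood than the starting point `w = b = 0`, which is the sense in which maximum likelihood replaces least squares as the fitting criterion.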

2.2. SMOTE

Data imbalance is a common issue in classification tasks [27]. When training on imbalanced data, machine learning algorithms often favor the majority class, leading to reduced model accuracy. The Synthetic Minority Oversampling Technique (SMOTE) addresses this by generating synthetic data points for the minority class [28]. The key steps of SMOTE are as follows: (1) for each sample belonging to the minority class, locate its k most similar neighbors within the minority class; (2) randomly select some of these neighbors and compute the differences between the minority sample and these neighbors; (3) create new synthetic samples along the line connecting the minority sample and its neighbors. By using SMOTE, the model can better learn from the minority class, thereby enhancing its accuracy.
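The three steps can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the study: the base sample is drawn at random rather than by iterating over every minority sample, Euclidean distance is assumed as the similarity measure, and the function name and toy points are hypothetical.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Step 1: find the k nearest minority-class neighbours of the base
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        # Step 2: pick one neighbour at random and take the difference
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the connecting segment
        # Step 3: place the new sample on the line between the two points
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class samples with two features each
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.5, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

Because each synthetic point is an interpolation between two real minority samples, it always stays within the range already covered by the minority class.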

2.3. SHAP

SHAP (Shapley Additive Explanations) is a framework used to explain machine learning models, primarily based on Shapley values derived from game theory [29]. The calculation of Shapley values takes into account various feature combinations. This framework computes the marginal contribution of each feature across all possible feature permutations and averages the results. The visualization interface of the SHAP framework makes it one of the most widely used interpretability methods within the domain of machine learning.
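To make the averaging over feature orderings concrete, the sketch below computes exact Shapley values for a tiny made-up payoff function. The payoff function and its numbers are hypothetical and only borrow feature names from the database for readability; the SHAP library relies on efficient approximations rather than this brute-force enumeration.

```python
from itertools import permutations

def shapley_values(features, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            # Marginal contribution of f when it joins the current coalition
            totals[f] += value(with_f) - value(coalition)
            coalition = with_f
    return {f: t / len(orderings) for f, t in totals.items()}

def toy_value(coalition):
    """Hypothetical payoff of a feature coalition (not a fitted model)."""
    score = 0.0
    if "Year" in coalition:
        score += 3.0
    if "Width" in coalition:
        score += 1.0
    if "Year" in coalition and "Width" in coalition:
        score += 2.0  # interaction term, shared equally by both features
    return score

phi = shapley_values(["Year", "Width", "ADT"], toy_value)
```

In this toy case the values sum to the payoff of the full coalition, and a feature that never changes the payoff (here `ADT`) receives a value of zero.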

2.4. Model Development

This study evaluates the model using accuracy, precision, and recall. Accuracy is the proportion of correctly predicted samples out of all samples. Precision is the fraction of predicted positives that are actually positive. Recall is the fraction of actual positives that are correctly predicted. For both the original and synthetic datasets, the training-to-test set ratio is 7:3. Cross-validation and grid search are employed to optimize hyperparameters and enhance the model’s generalization performance. The flowchart of the decision-making tool is presented in Figure 1. The process of developing the decision-making tool is as follows:

(1) Database construction: build a database and preprocess the original data. To address class imbalance, SMOTE is used to build a generated database; (2) model building: train each model on the original and generated databases, optimizing its hyperparameters with 10-fold cross-validation and grid search; (3) performance evaluation: evaluate the proposed model framework, then compare the optimal model framework with the other models.
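The hyperparameter optimization in step (2) can be sketched as follows. This is a minimal illustration that tunes only the neighbour count of a hand-written K-Nearest Neighbor classifier on hypothetical two-cluster data; the actual search grids, models, and library used in the study are not given in the text, so every name below is an assumption.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cv_accuracy(X, y, k, n_folds=10, seed=0):
    """Mean accuracy of k-NN over n_folds cross-validation folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / len(scores)

def grid_search(X, y, grid, n_folds=10):
    """Score every candidate value and return the best one with all scores."""
    scores = {k: cv_accuracy(X, y, k, n_folds) for k in grid}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy data: two Gaussian clusters standing in for two classes
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(30)]
y = [0] * 30 + [1] * 30
best_k, cv_scores = grid_search(X, y, grid=[1, 3, 5, 7])
```

Each candidate is scored only on folds it was not trained on, so the selected value reflects held-out performance rather than fit to the training data.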

3. Results and Discussion

3.1. Database

3.1.1. Original Database

This study collected real data on the condition of 6073 steel bridge deck pavement systems from the U.S. national bridge information database in the public domain [30]. The database includes the following variables: (1) Year; (2) ADT; (3) Load; (4) Spans; (5) Length; (6) Width; (7) Pavement Condition. Figure 2 and Table 1 provide details of the original database. This study uses the Pearson coefficient method to determine the interrelation between variables in the dataset. There is a positive correlation between Pavement Condition and Year. In contrast, there is a negative correlation between Pavement Condition and other variables. Based on the operating status of the bridge, the steel bridge deck system condition assessment level is divided into four levels: Level 0 (poor), Level 1 (normal), Level 2 (good), and Level 3 (very good). The higher the assessment level, the better the pavement system performs [31].
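For reference, the Pearson coefficient between two variables can be computed as in the sketch below. The construction years and condition levels are hypothetical values chosen only to show a positive correlation; they are not records from the database.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical example: newer bridges with higher condition levels
years = [1950, 1970, 1990, 2010]
condition = [1, 2, 2, 3]
r = pearson(years, condition)
```

A value close to +1 indicates that the two variables rise together, a value close to -1 that one falls as the other rises, and a value near 0 that there is little linear relationship.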

3.1.2. Generate Database

SMOTE is adopted to solve the class imbalance problem in the original database. The original database contains 6073 records with the following class distribution: Level 0 accounts for 0.198%, Level 1 for 9.699%, Level 2 for 83.715%, and Level 3 for 6.389%. After processing with the SMOTE algorithm, a balanced database is obtained in which each of the four levels accounts for 25%. The generated database contains 20,336 records. Its detailed description is given in Table 2 and Figure 3. The correlations between variables are similar to those in the original database, indicating that the generated data are reliable.
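The size of the generated database follows from the class shares. The short check below infers per-class counts from the reported percentages (the counts themselves are not given in the text, so they are an inference) and assumes that every class is oversampled up to the size of the majority class.

```python
total = 6073
# Class shares in percent, as reported for the original database
shares_percent = {0: 0.198, 1: 9.699, 2: 83.715, 3: 6.389}

# Inferred record counts per level (rounded to whole records)
counts = {level: round(total * pct / 100) for level, pct in shares_percent.items()}

# Assumption: each class is oversampled to match the majority class
majority = max(counts.values())
balanced_total = majority * len(counts)
synthetic_added = {level: majority - n for level, n in counts.items()}
```

Under these assumptions the inferred counts are 12, 589, 5084, and 388, which sum to 6073, and four classes of 5084 records give the 20,336 records reported for the generated database.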

3.2. Original Database Results

Seven machine learning methods were trained on the original deck pavement system database and assessed by classification performance: LightGBM, XGBoost, Random Forest (RF), AdaBoost, K-Nearest Neighbor (KNN), Multilayer Perceptron (MLP), and Logistic Regression (LR). The cross-validation results of each algorithm on the training set are presented in Table 3 and Figure 4. These results represent the average performance across 10-fold cross-validation.

The performance of each algorithm on the test set is given in Table 4 and Figure 5. LightGBM has the highest accuracy, 0.841. XGBoost, RF, and LR follow closely, with accuracies of 0.837, 0.838, and 0.837, respectively, so the four algorithms are close in terms of accuracy. In terms of precision, LightGBM is also the highest, reaching 0.845; KNN, AdaBoost, and XGBoost follow with precisions of 0.740, 0.738, and 0.730, respectively. For recall, KNN performs best, reaching 0.267, with LightGBM, XGBoost, and AdaBoost slightly lower at 0.265, 0.262, and 0.257, respectively; the differences among these four algorithms are small. Across the evaluation indicators, LightGBM is the most accurate algorithm. However, the recall of all algorithms is low, fluctuating around 0.250. This is a result of the class imbalance in the original database. To address this problem, the generated database is used to train the algorithms in the next section.
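One way to see why recall can sit near 0.250 while accuracy stays above 0.8 is to score a trivial classifier that always predicts Level 2. The sketch below does this under two assumptions that the text does not confirm: recall is macro-averaged over the four levels, and the class counts are those inferred from the percentages reported in Section 3.1.2 (12, 589, 5084, and 388).

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted level equals the true level."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred, labels):
    """Unweighted mean of the per-class recalls."""
    recalls = []
    for label in labels:
        preds_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
        hits = sum(p == label for p in preds_for_label)
        recalls.append(hits / len(preds_for_label) if preds_for_label else 0.0)
    return sum(recalls) / len(recalls)

# Inferred class counts for Levels 0 to 3 (an assumption, see the lead-in)
y_true = [0] * 12 + [1] * 589 + [2] * 5084 + [3] * 388
# Baseline that ignores the features and always predicts the majority level
y_pred = [2] * len(y_true)

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = macro_recall(y_true, y_pred, labels=[0, 1, 2, 3])
```

Under these assumptions the baseline reaches an accuracy of about 0.837 with a macro-averaged recall of exactly 0.25, which suggests that recall values close to 0.250 are consistent with models that rarely predict the minority levels.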

Interpretability analysis was performed on the LightGBM model using the original database, and the importance of each variable is depicted in Figure 6. In a multiclassification problem, SHAP computes Shapley values for each feature separately for each category. The most impactful variable is the year the bridge was built. The secondary influencing factors are as follows: bridge width, bridge length, average daily traffic volume, the number of bridge spans, and design load.

3.3. Generate Database Results

To address the class imbalance problem in the original data, SMOTE is used to balance the original database. On the generated steel bridge deck pavement system database, seven models (LightGBM, XGBoost, RF, AdaBoost, KNN, MLP, and LR) were trained and their classification performance was assessed. The cross-validation results of each algorithm on the training set are displayed in Table 5 and Figure 7.

For the synthetic database, the performance of each algorithm on the test set is displayed in Table 6 and Figure 8. LightGBM has the highest accuracy, 0.929, followed by XGBoost and RF with accuracies of 0.926 and 0.907, respectively. LightGBM also has the highest precision, 0.929, ahead of XGBoost and RF at 0.926 and 0.906, respectively. For recall, LightGBM again performs best, reaching 0.930, with XGBoost close behind at 0.926. Across the evaluation indicators, LightGBM is the most accurate algorithm.

The accuracy, precision, and recall of the LightGBM algorithm are 0.929, 0.929, and 0.930, respectively. Next, the interpretability assessment of the LightGBM model employing the generated database is performed. The importance of each variable is shown in Figure 9. In a multiclassification problem, SHAP computes Shapley values for each feature separately for each category. The most influential variables are bridge width and the year of construction. The secondary influencing factors are bridge length, average daily traffic volume, the number of bridge spans, and design load.

3.4. Discussion

For the condition classification of the bridge deck pavement system, Table 7 presents the evaluation results of the optimal model on both the original and synthetic databases. The performance of the machine learning models on the generated database is significantly better than on the original database. Relative to the original database, accuracy increased by 10.5%, precision by 9.9%, and recall by 251%. This shows that applying SMOTE can effectively address the class imbalance of the original data and improve the performance and generalization ability of the model.
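The percentage gains quoted above are relative changes with respect to the original-database scores in Table 7. A short check, with variable names chosen here for illustration:

```python
# LightGBM test-set scores from Table 7
original = {"accuracy": 0.841, "precision": 0.845, "recall": 0.265}
generated = {"accuracy": 0.929, "precision": 0.929, "recall": 0.930}

# Relative change in percent: (new - old) / old * 100
relative_gain = {
    metric: (generated[metric] - original[metric]) / original[metric] * 100
    for metric in original
}
```

Rounded, these evaluate to roughly 10.5% for accuracy, 9.9% for precision, and 251% for recall. In absolute terms the changes are 0.088, 0.084, and 0.665, so the large relative gain in recall reflects its low starting value.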

4. Conclusions

Condition detection for existing steel bridge deck pavement systems is usually carried out manually. Data-driven intelligent algorithms can reduce this dependence on manual labor. The imbalance in data categories within the real steel bridge deck pavement system database impacts the accuracy of machine learning predictions. This study proposes a decision-making tool for predicting condition levels in steel bridge deck pavement systems, specifically designed to address unbalanced data.

To address the class imbalance problem, a generated database was created using SMOTE for training machine learning models, and seven machine learning algorithms (LightGBM, XGBoost, RF, AdaBoost, KNN, MLP, and LR) were evaluated. The best-performing algorithm, LightGBM, has an accuracy of 0.841, a precision of 0.845, and a recall of 0.265 on the original database. On the generated database, its accuracy is 0.929, its precision is 0.929, and its recall is 0.930. The results show that the database generated with SMOTE handles the data category imbalance well. Therefore, the condition level classification algorithm proposed in this study holds significant potential for application in steel bridge deck pavement systems.

This study employs parameter optimization methods during the training of prediction models, which requires a high number of iterations and consumes a significant amount of computational resources. Currently, parallelizing the model training process, using multiple computing nodes, can expedite the training procedure given sufficient computational resources. Future research will focus on further improving data balancing methods to reduce computational costs while maintaining quality.

Author Contributions

Y.W.: resources, writing—original draft, and writing—review and editing. R.J.: project administration and data curation. Q.L.: investigation, supervision. Z.S.: methodology, formal analysis, and software. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Yazhou Wei and Rongqing Ji were employed by the company Henan Puwei Expressway Company Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Zeng, Y.; Qiu, Z.; Yang, C.; Haozheng, S.; Xiang, Z.; Zhou, J. Fatigue Experimental Study on Full-Scale Large Sectional Model of Orthotropic Steel Deck of Urban Rail Bridge. Adv. Mech. Eng. 2023, 15, 168781322311552.
2. Liu, Y.; Shen, Z.; Liu, J.; Chen, S.; Wang, J.; Wang, X. Advances in the Application and Research of Steel Bridge Deck Pavement. Structures 2022, 45, 1156–1174.
3. Shao, X.; Yi, D.; Huang, Z.; Zhao, H.; Chen, B.; Liu, M. Basic Performance of the Composite Deck System Composed of Orthotropic Steel Deck and Ultrathin RPC Layer. J. Bridge Eng. 2013, 18, 417–428.
4. McClure, S.; Daniell, K. Development of User-Friendly Software Application for Extracting Information from National Bridge Inventory Source Files. Transp. Res. Rec. 2010, 2202, 137–147.
5. Li, Q.; Song, Z. Prediction of Compressive Strength of Rice Husk Ash Concrete Based on Stacking Ensemble Learning Model. J. Clean. Prod. 2023, 382, 135279.
6. Wang, C. State-of-the-Art AI-Based Computational Analysis in Civil Engineering. J. Ind. Inf. Integr. 2023, 33, 100470.
7. Li, Q.-F.; Song, Z.-M. High-Performance Concrete Strength Prediction Based on Ensemble Learning. Constr. Build. Mater. 2022, 324, 126694.
8. Thai, H.-T. Machine Learning for Structural Engineering: A State-of-the-Art Review. Structures 2022, 38, 448–491.
9. Mia, M.M.; Kameshwar, S. Machine Learning Approach for Predicting Bridge Components’ Condition Ratings. Front. Built Environ. 2023, 9, 1254269.
10. Farooq, F.; Ahmed, W.; Akbar, A.; Aslam, F.; Alyousef, R. Predictive Modeling for Sustainable High-Performance Concrete from Industrial Wastes: A Comparison and Optimization of Models Using Ensemble Learners. J. Clean. Prod. 2021, 292, 126032.
11. Vu, Q.-V.; Truong, V.-H.; Thai, H.-T. Machine Learning-Based Prediction of CFST Columns Using Gradient Tree Boosting Algorithm. Compos. Struct. 2021, 259, 113505.
12. Mangalathu, S.; Jeon, J.-S. Classification of Failure Mode and Prediction of Shear Strength for Reinforced Concrete Beam-Column Joints Using Machine Learning Techniques. Eng. Struct. 2018, 160, 85–94.
13. Bakouregui, A.S.; Mohamed, H.M.; Yahia, A.; Benmokrane, B. Explainable Extreme Gradient Boosting Tree-Based Prediction of Load-Carrying Capacity of FRP-RC Columns. Eng. Struct. 2021, 245, 112836.
14. Ikumi, T.; Galeote, E.; Pujadas, P.; De La Fuente, A.; López-Carreño, R.D. Neural Network-Aided Prediction of Post-Cracking Tensile Strength of Fibre-Reinforced Concrete. Comput. Struct. 2021, 256, 106640.
15. Guan, X.; Burton, H.; Shokrabadi, M.; Yi, Z. Seismic Drift Demand Estimation for Steel Moment Frame Buildings: From Mechanics-Based to Data-Driven Models. J. Struct. Eng. 2021, 147, 04021058.
16. Feng, D.-C.; Liu, Z.-T.; Wang, X.-D.; Chen, Y.; Chang, J.-Q.; Wei, D.-F.; Jiang, Z.-M. Machine Learning-Based Compressive Strength Prediction for Concrete: An Adaptive Boosting Approach. Constr. Build. Mater. 2020, 230, 117000.
17. Abdollahi, A.; Li, D.; Deng, J.; Amini, A. An Explainable Artificial-Intelligence-Aided Safety Factor Prediction of Road Embankments. Eng. Appl. Artif. Intell. 2024, 136, 108854.
18. Rashidi Nasab, A.; Elzarka, H. Optimizing Machine Learning Algorithms for Improving Prediction of Bridge Deck Deterioration: A Case Study of Ohio Bridges. Buildings 2023, 13, 1517.
19. Rajkumar, M.; Nagarajan, S.; Arockiasamy, M. Bridge Infrastructure Management System: Autoencoder Approach for Predicting Bridge Condition Ratings. J. Infrastruct. Syst. 2023, 29, 04022042.
20. Martinez, P.; Mohamed, E.; Mohsen, O.; Mohamed, Y. Comparative Study of Data Mining Models for Prediction of Bridge Future Conditions. J. Perform. Constr. Facil. 2020, 34, 04019108.
21. Liu, H.; Zhang, Y. Bridge Condition Rating Data Modeling Using Deep Learning Algorithm. Struct. Infrastruct. Eng. 2020, 16, 1447–1460.
22. Assaad, R.; El-adaway, I.H. Bridge Infrastructure Asset Management System: Comparative Computational Machine Learning Approach for Evaluating and Predicting Deck Deterioration Conditions. J. Infrastruct. Syst. 2020, 26, 04020032.
23. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Adv. Neural Inf. Process. Syst. 2017, 30, 52.
24. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
25. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
26. Ganaie, M.A.; Hu, M.; Malik, A.K.; Tanveer, M.; Suganthan, P.N. Ensemble Deep Learning: A Review. Eng. Appl. Artif. Intell. 2022, 115, 105151.
27. Wu, X.; Kumar, V.; Ross Quinlan, J.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G.J.; Ng, A.; Liu, B.; Yu, P.S.; et al. Top 10 Algorithms in Data Mining. Knowl. Inf. Syst. 2008, 14, 1–37.
28. Feng, S.; Keung, J.; Yu, X.; Xiao, Y.; Zhang, M. Investigation on the Stability of SMOTE-Based Oversampling Techniques in Software Defect Prediction. Inf. Softw. Technol. 2021, 139, 106662.
29. Naser, M.Z.; Kodur, V.K. Explainable Machine Learning Using Real, Synthetic and Augmented Fire Tests to Predict Fire Resistance and Spalling of RC Columns. Eng. Struct. 2022, 253, 113824.
30. Li, Z.; Burgueño, R. Structural Information Integration for Predicting Damages in Bridges. J. Ind. Inf. Integr. 2019, 15, 174–182.
31. Chase, S.; Ghasemi, H. Implications of the Long Term Bridge Performance Program for Life Cycle Costing in the United States. Struct. Infrastruct. Eng. 2009, 5, 3–10.

Figure 1. Decision tool flowchart.

Figure 2. Original database correlation matrix.

Figure 3. Generate database correlation matrix.

Figure 4. Training set cross-validation results (original).

Figure 5. Test set results (original).

Figure 6. Variables influencing factors (original).

Figure 7. Training set cross-validation results (generated).

Figure 8. Test set results (generated).

Figure 9. Variables influencing factors (generated).

Table 1. Original database statistics.

Variables	Mean	Std	Max	Min	Description
Year	1978.910	24.836	2021.000	1900.000	Year of construction
ADT	10,522.022	24,585.016	278,232.000	0.000	Average daily traffic
Load	1.990	2.416	9.000	0.000	Design load
Spans	21.246	19.045	381.000	0.000	Number of spans in the bridge
Length (m)	115.716	267.847	4022.100	6.100	Structure length
Width (m)	9.292	5.786	62.200	2.500	Bridge width
Pavement Condition	1.962	0.409	3.000	0.000	Pavement system condition grade

Table 2. Generate database statistics.

Variables	Mean	Std	Max	Min	Description
Year	1975.938	25.321	2021.000	1900.000	Year of construction
ADT	6996.778	19,189.377	278,232.000	0.000	Average daily traffic
Load	1.450	2.111	9.000	0.000	Design load
Spans	17.674	14.572	381.000	0.000	Number of spans in the bridge
Length (m)	89.064	201.321	4022.100	6.100	Structure length
Width (m)	7.701	4.582	62.200	2.500	Bridge width
Pavement Condition	1.500	1.118	3.000	0.000	Pavement system condition grade

Table 3. Training set cross-validation evaluation index (original).

Model	Accuracy (Mean)	Accuracy (Std)	Precision (Mean)	Precision (Std)	Recall (Mean)	Recall (Std)
LightGBM	0.839	0.003	0.746	0.026	0.272	0.029
XGBoost	0.840	0.002	0.749	0.021	0.277	0.027
RF	0.836	0.001	0.701	0.002	0.258	0.025
AdaBoost	0.728	0.132	0.720	0.025	0.265	0.030
KNN	0.819	0.078	0.752	0.022	0.284	0.028
MLP	0.676	0.186	0.709	0.015	0.275	0.042
LR	0.835	0.001	0.698	0.002	0.256	0.025

Table 4. Test set evaluation index (original).

Model	Accuracy	Precision	Recall
LightGBM	0.841	0.845	0.265
XGBoost	0.837	0.730	0.262
RF	0.838	0.712	0.253
AdaBoost	0.829	0.738	0.257
KNN	0.812	0.740	0.267
MLP	0.831	0.724	0.254
LR	0.837	0.702	0.250

Table 5. Training set cross-validation evaluation index (generated).

Model	Accuracy (Mean)	Accuracy (Std)	Precision (Mean)	Precision (Std)	Recall (Mean)	Recall (Std)
LightGBM	0.924	0.006	0.923	0.006	0.923	0.006
XGBoost	0.922	0.005	0.922	0.005	0.922	0.005
RF	0.906	0.006	0.906	0.006	0.905	0.006
AdaBoost	0.646	0.014	0.646	0.026	0.644	0.014
KNN	0.828	0.007	0.829	0.007	0.827	0.007
MLP	0.571	0.015	0.538	0.015	0.569	0.015
LR	0.546	0.015	0.524	0.015	0.545	0.015

Table 6. Test set evaluation index (generated).

Model	Accuracy	Precision	Recall
LightGBM	0.929	0.929	0.930
XGBoost	0.926	0.926	0.926
RF	0.907	0.906	0.908
AdaBoost	0.637	0.615	0.642
KNN	0.835	0.838	0.838
MLP	0.582	0.557	0.588
LR	0.502	0.499	0.504

Table 7. Evaluation results of different databases.

Databases	Accuracy	Precision	Recall
Original	0.841	0.845	0.265
Generate	0.929	0.929	0.930

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
