Article

Leveraging a Hybrid Machine Learning Approach for Compressive Strength Estimation of Roller-Compacted Concrete with Recycled Aggregates

1 Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
2 Faculty of Civil Engineering, Duy Tan University, Da Nang 550000, Vietnam
Mathematics 2024, 12(16), 2542; https://doi.org/10.3390/math12162542
Submission received: 23 July 2024 / Revised: 15 August 2024 / Accepted: 16 August 2024 / Published: 17 August 2024
(This article belongs to the Special Issue Automatic Control and Soft Computing in Engineering)

Abstract
In recent years, the use of recycled aggregate (RA) in roller-compacted concrete (RCC) for pavement construction has been increasingly attractive due to various environmental and economic benefits. Early determination of the compressive strength (CS) is crucial for the construction and maintenance of pavement. This paper presents the idea of combining metaheuristics and an advanced gradient boosting regressor for estimating the compressive strength of roller-compacted concrete containing RA. A dataset, including 270 samples, has been collected from previous experimental works. Recycled aggregates of construction demolition waste, reclaimed asphalt pavement, and industrial slag waste are considered in this dataset. The extreme gradient boosting machine (XGBoost) is employed to generalize a functional mapping between the CS and its influencing factors. A recently proposed gradient-based optimizer (GBO) is used to fine-tune the training phase of XGBoost in a data-driven manner. Experimental results show that the hybrid GBO-XGBoost model achieves outstanding prediction accuracy with a root mean square error of 2.64 and a mean absolute percentage error less than 8%. The proposed method is capable of explaining up to 94% of the variation in the CS. Additionally, an asymmetric loss function is implemented with GBO-XGBoost to mitigate the overestimation of CS values. It was found that the proposed model trained with the asymmetric loss function helped reduce overestimated cases by 17%. Hence, the newly developed GBO-XGBoost can be a robust and reliable approach for predicting the CS of RCC using RA.

1. Introduction

Roller Compacted Concrete (RCC) refers to a special dry concrete consisting of water, fine aggregate, coarse aggregate, and cement [1]. This construction material can be described as zero-slump concrete that is placed with a compacting machine [2]. RCC has been widely used for heavy-duty pavement construction [3]. RCC pavement offers various technical and economic benefits over other types of pavements, such as low cement content, a fast construction schedule, and reduced costs of maintenance [4]. Considering economic benefits, the use of RCC in pavement engineering can bring about savings of up to 40% compared to conventional concrete [5].
In addition, as pointed out in [6], RCC pavement has good resistance to varying temperatures, minor water absorption, and low deformation under heavy traffic loading. Courard et al. [1] discussed that this type of concrete pavement is less susceptible to cracking caused by drying shrinkage. The reason is that RCC can be constructed without joints; it does not necessitate formwork installation and finishing tasks, nor does it require the use of dowels or reinforcing steel [1]. In consideration of ecological aspects, the use of concrete pavement offers other advantages, such as mitigation of the heat island effect in urban areas [7] and reduction of lighting power on the road [8].
Nevertheless, the development of road pavement is widely recognized as having negative impacts on the environment. A road construction project particularly requires high energy consumption and massive utilization of raw materials (e.g., fine and coarse aggregates) [9]. In many regions of the world, the fast pace of infrastructure development creates huge demand for concrete aggregates. Additionally, the dwindling capacity of existing sources of natural aggregates necessitates the exploitation of new mining areas [10]. Mining processes are known to harm the environment: they reduce biodiversity, deteriorate water quality, and cause soil pollution [11]. Therefore, in recent years, using recycled aggregates to replace natural ones in concrete has become a pressing need for enhancing the sustainability of the construction industry [12].
Notably, the recycled aggregates used in RCC can come from various sources, including construction and demolition waste, reclaimed asphalt pavement, and metallic slag waste generated by industrial plants. If not recycled, these wastes often end up in landfill sites. This practice modifies the natural landscape and can bring about serious environmental problems. Therefore, the utilization of recycled aggregates in RCC not only reduces the consumption of natural resources but also helps avoid the negative impacts of landfills on the environment.
Nevertheless, the characteristics of recycled aggregates are different from those of natural ones. Recycled aggregates often have higher porosity and water absorption than their natural counterparts [13]. In addition, the properties regarding density and strength of natural aggregates are generally better than those of recycled ones [14]. Microstructural analyses have demonstrated typical features of the interfacial transition zones between cement mortar and recycled aggregates [12]. It has been clearly shown that the microstructure of recycled aggregate (RA) concrete is more complex than that of a normal one. The high porosity and the presence of pre-existing cracks in interfacial transition zones negatively affect the mechanical properties of recycled aggregate concrete [15]. Therefore, to facilitate the use of recycled aggregates in RCC, their mechanical properties should be thoroughly understood and accurately predicted.
In pavement engineering, the capability of modeling and predicting the mechanical properties of RCC is crucial for the tasks of mix design and road maintenance [16]. The compressive strength is an essential characteristic of RCC, which is mandatorily required in designing RCC mixtures [17,18,19]. The experimental approach for determining the CS of RCC often necessitates destructive testing procedures, which are highly time-consuming and costly [20]. Moreover, during the mix design process, various combinations of concrete ingredients must be tested to obtain a suitable mix featuring desirable characteristics [1]. Therefore, there is an increasing trend of applying data-driven methods to the task of modeling the CS of concrete mixes [18,21]. These methods, relying on powerful machine learning algorithms, can learn from existing experimental records and generalize a functional mapping between the CS and its influencing factors (e.g., the ingredients of a mix and concrete age).
In recent years, machine learning has emerged as a prominent approach in pavement engineering, offering significant advantages in the analysis and management of pavement performance [22]. One of its primary benefits is the ability to accurately predict pavement conditions and performance indicators by analyzing large datasets that encompass various influencing factors [23]. Various machine learning techniques, such as neural computing models [17], Bayesian dynamic modeling [24], deep learning [25], and gradient boosting [26], have been successfully employed to forecast pavement performance, assess structural integrity, and optimize pavement designs.
Ashrafian et al. [16] utilized tree-based regression models, including the M5 model tree, for predicting the CS of RCC. The authors employed the contents of coarse aggregate, fine aggregate, cementitious materials, and water as influencing factors. It was found that the tree-based regressors were capable and reliable methods for predicting the mechanical property of RCC. Abhilash et al. [27] compared the performance of artificial neural networks (ANN) and multiple regression analysis in predicting the CS values at 3, 7, and 28 days of curing; it was found that ANN outperformed the conventional regression analysis approach by a large margin.
ANN was used in [20] for estimating the CS of RCC pavement containing metallic slag waste and fly ash. ANN and support vector machine (SVM) were employed in [28] for estimating the CS of RA concrete; this work found that the results yielded by SVM were in better agreement with the measured outcomes. The good predictive performance of SVM in predicting the mechanical properties of RA concrete is also reported in [29].
Debbarma and Ransinchung [30] proposed the use of ANN to estimate the 28-day CS of RCC pavements blended with RAP aggregates; however, their neural network models were trained on a relatively limited dataset containing only 83 samples. Zhang et al. [18] relied on multivariate adaptive regression splines coupled with the grasshopper optimization algorithm for the task of interest. In [31], the authors demonstrated the superior performance of the boosted tree regression model over other benchmark approaches, including SVM and Gaussian process regression. Thi Mai et al. [17] and Hoang [10] demonstrated the potential of advanced gradient boosting machines in modeling the CS of RCC; however, these previous works did not consider the use of recycled aggregates.
As pointed out in the review by Nguyen et al. [21], there is an increasing trend toward using machine learning methods for predicting the mechanical behaviors of RA concrete. Among the reviewed methods, ANN is the most widely employed approach and demonstrates superior predictive accuracy. Motivated by the positive outcomes of advanced gradient boosting machines [32,33], this study proposes the use of the extreme gradient boosting machine (XGBoost), a powerful and state-of-the-art regressor, for estimating the CS of RCC containing recycled aggregates.
In recent years, the use of hybrid machine learning approaches has represented a significant advancement in model performance, reliability, and interpretability. By combining machine learning regressors with metaheuristic algorithms for hyper-parameter optimization, practitioners can fine-tune their models to achieve superior accuracy and robustness [32,34,35]. Additionally, employing exponential smoothing-based feature processing, feature transformation, and deep learning-based feature extraction enhances the quality of the input data, allowing for more nuanced feature analysis that is crucial for constructing capable predictive models [36,37,38,39]. Furthermore, techniques such as data augmentation and projection not only enrich the dataset but also improve the model’s generalizability [39,40]. Overall, this hybrid approach facilitates the development of more effective prediction models, ultimately leading to better-informed engineering decisions and enhanced analysis of various engineering applications.
Moreover, in building a robust machine learning model, optimizing its hyper-parameters is crucial [41]. These hyper-parameters govern the model's complexity, which directly influences the prediction accuracy and generalization capability. Optimizing the model construction phase of XGBoost is challenging because various hyper-parameters must be tuned, and they must be searched in a continuous space [32,42,43]. Hence, the gradient-based optimizer (GBO), a recently proposed metaheuristic, is employed to optimize the hyper-parameter selection of XGBoost. The integration of XGBoost and GBO aims to construct a prediction model featuring a fine balance between the two objectives of model building: minimization of the prediction error and maximization of the generalization capability. A historical dataset of recycled aggregate RCC experiments is collected to train and verify the proposed model, named GBO-XGBoost. The quantities of cement, fly ash, water, natural aggregates, and recycled aggregates, as well as concrete age, are employed as influencing variables. In addition, to mitigate the problem of overestimation of the CS, an asymmetric loss function is used to train the XGBoost model.
The current work is inspired by the successful employment of metaheuristic optimization combined with machine learning models [29,34,42]. In addition, the proposed framework contributes to the body of knowledge by investigating the capability of the novel GBO algorithm in model construction. A sophisticated objective function is designed for the GBO algorithm to take into account both crucial aspects of model building: accuracy and generalization. To the best of our knowledge, none of the previous works on data-driven CS estimation of RCC have considered the restriction of overestimated results in the objective function during the model optimization phase. Furthermore, studies dedicated to investigating the performance of hybrid machine learning models in predicting the CS of RCC are still rare. The current paper can fill this gap in the literature and promote sustainable practices in pavement engineering.

2. Research Significance

The current work aims to construct a robust and reliable data-driven method for estimating the CS of RCC mixes containing recycled aggregates. The main contributions of the paper to the body of knowledge can be summarized as follows:
(i)
Although previous studies have investigated the tasks of modeling the CS of RCC for pavement engineering [17,18,26,27], the number of studies dedicated to the estimation of this mechanical parameter for the case of RCC containing RA is still limited. Therefore, there is a pressing need to investigate other advanced machine learning approaches for dealing with the task of interest.
(ii)
ANN-based models have been proposed for predicting the CS of RCC containing recycled aggregates [20,30]. In addition, the literature review in [21] also points out the dominance of ANN in modeling the mechanical properties of RA concrete. Nevertheless, in light of recent reports on the superior performance of state-of-the-art gradient boosting machines [32,44,45,46], the employment of XGBoost in the current study has the potential to improve the accuracy of the CS estimation results.
(iii)
As mentioned earlier, model optimization is crucial for building a robust data-driven model. Nevertheless, metaheuristic-assisted model optimization has rarely been investigated in estimating the CS of RCC. Therefore, the current work proposes a novel integration of the XGBoost regressor and the GBO metaheuristic to automate the model optimization process of the CS prediction approach. GBO is a powerful population-based metaheuristic whose superior performance has been reported in various fields [47]. However, the capability of GBO in optimizing machine learning-based regressors has rarely been reported. Hence, the current paper is an attempt to fill this gap in the literature.
(iv)
This study has collected samples from previous experimental works to train and test the developed machine learning model. The quantities of cement, fly ash, water, natural coarse aggregate, natural fine aggregate, recycled coarse aggregate, and recycled fine aggregate are used to estimate the CS of RCC mixes at different ages. Notably, the current work has considered three types of recycled aggregates, namely construction and demolition waste, recycled asphalt pavement, and metallic slag waste. Therefore, the type of RA also serves as a predictor variable. Although machine learning-based modeling has been introduced to the estimation of the CS of RCC containing reclaimed asphalt pavement [48] and recycled slag aggregate [20] separately, none of the previous works has established a comprehensive dataset that takes into account multiple types of recycled aggregates for RCC.
(v)
All the previous works related to the CS estimation of RCC using recycled aggregates have focused on minimizing the prediction error in general. In the case of CS prediction, it is worth noting that mitigating overestimation is also a crucial objective. The reason is that a model with limited cases of overestimation is much more reliable than one that frequently overestimates the results. Reliable estimations of the CS of RCC mixes are essential for guaranteeing the durability and safety of RCC pavements [17]. To go beyond the current research methodologies, this study proposes the use of an asymmetric XGBoost optimized by GBO for mitigating the issue of overestimating the CS of RCC using recycled aggregates. In detail, an asymmetric squared error loss function is employed during the training phase to penalize overestimated outcomes committed by the machine.

3. Research Method

3.1. The Collected Dataset

Since the construction of a machine learning model for estimating the CS of RCC containing RA is a supervised learning task, data collection is crucial. Herein, each data point is a record of the CS value obtained from destructive tests, corresponding to a mixture design. The data samples in this study were gathered from 11 experimental works, which are summarized in Table 1. The studies of [20,49] contribute the largest proportions of records with 27.78% and 22.22%, respectively. It is noted that the types of specimens used in the data sources are inconsistent. To standardize the specimen size, this study relies on correlation factors [33], which are shown in Table 2. Using the correlation factors, the CS values of different specimen types are standardized to those of the 100 × 100 × 100 mm cubes.
The mixture design includes details about the quantities of the ingredients used in the RCC. In this research, based on a review of the existing literature and considering data availability, nine predictor variables have been identified. These variables consist of the amounts of cement, fly ash, water, natural coarse aggregate, natural fine aggregate, recycled coarse aggregate, and recycled fine aggregate, the type of RA, and the age of the concrete. In total, these nine factors are utilized to determine the CS value of the RCC mix. A summary of the statistical characteristics of these variables, including minimum value, mean, standard deviation, skewness, and maximum value, is presented in Table 3. It is important to note that three types of recycled aggregates are employed in this study: construction demolition waste (CDW), reclaimed asphalt pavement (RAP), and metallic slag waste (MSW). For the purpose of machine learning-based modeling, CDW, RAP, and MSW are coded as 1, 2, and 3, respectively; this coded variable serves as the predictor indicating the type of recycled aggregate used.
Concrete age refers to the time elapsed since the concrete was mixed and placed. This duration is crucial in understanding the development of RCC’s properties, particularly its CS. It is because concrete strength increases over time due to the hydration of cement particles. In this study, the data on concrete age was compiled from previous research works, where each testing sample was associated with specific ages at the time of testing. Each sample’s age was documented alongside its corresponding CS measurements, allowing for an effective analysis of the relationship between age and strength. In addition, the distributions of the predictor variables and the CS are demonstrated by the histograms in Figure 1 and Figure 2. It is noted that the CS values in [48,49,52,56,57] are provided in the form of graphical presentations. Therefore, the CS values in those studies are measured from the reported graphs. This paper relied on MATLAB’s image tool [58] to measure the CS values.
Additionally, the linear correlation between the predictor variables and the CS is illustrated by the scatter plots in Figure 3. In this figure, the Pearson coefficient (R) is computed for each pair of variables to demonstrate the strength of the linear association between them. Observably, the contents of cement, fly ash, natural aggregates, and concrete age have positive linear correlations with the CS. On the contrary, the contents of recycled aggregates show a negative impact on the dependent variable; their Pearson coefficients are less than 0. It is understandable since the inclusion of recycled aggregates to replace their natural counterparts leads to a reduction in the CS of the RCC mixes [6,59,60,61].
The concrete age has the strongest positive correlation with the CS; its R value is 0.41. Meanwhile, the content of water (X3) virtually shows no linear correlation with the dependent variable. In general, the computed Pearson coefficients show low linear associations between the predictor variables and the CS. Therefore, the functional mappings between those variables and the CS of RCC are highly nonlinear. This fact points out the strong motivation for using sophisticated machine learning approaches for effectively modeling such nonlinear and complex functional mappings.
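The Pearson coefficient R reported for each variable pair can be computed directly from the data; a minimal numpy sketch of the calculation:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient (R) between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))
```

A value near +1 or -1 indicates a strong linear association; values near 0, as observed for the water content here, indicate little linear association (though a nonlinear relationship may still exist).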

3.2. Extreme Gradient Boosting Machine

XGBoost [62] belongs to the group of boosting algorithms, which are ensemble learning approaches. This machine learning model is a combination of multiple base learners, e.g., regression trees. XGBoost is a state-of-the-art gradient boosting machine that can achieve both high prediction accuracy and a fast convergence rate. This machine learning approach takes into account the residuals of the previous training iteration and adds them to the prediction. XGBoost iteratively constructs sub-trees to reduce the estimation error of the previous ensemble. Moreover, a mechanism of intelligent tree splitting is used during tree construction to build up the structure of the overall model. XGBoost also relies on an advanced randomization approach to enhance the diversity of the tree ensemble. Hence, this method is capable of achieving good prediction accuracy in various modeling tasks [10,45,63,64].
Notably, XGBoost allows flexible control over the model construction phase via the use of regularization coefficients in its loss function. Setting appropriate values for those coefficients can significantly help avoid the overfitting issue. In general, during the training phase, the loss function of XGBoost is minimized to obtain the most desired model that is capable of delivering an accurate prediction outcome. The loss function of XGBoost is given by:
$$f_{obj} = \sum_{i} L(y_i, F(x_i)) + \sum_{t} \Omega(f_t)$$
where $\Omega(f) = \nu T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} |w_j|$ is the regularization term; $\lambda$ and $\alpha$ are the two regularization coefficients; $T$ denotes the number of leaves in a decision tree $f$; $\nu$ represents the threshold of the score function employed in the phase of tree splitting; $w_j$ stands for the score of leaf $j$ of a decision tree.
The first term in the aforementioned equation is the squared error loss (SEL), which is generally used for function approximation. The SEL is given by:
$$L(t, y) = (t - y)^2$$
where t and y refer to the actual and predicted CS values of a RCC mix.
During the training phase, sub-trees are gradually added to the ensemble so that the overall value of the loss function can be minimized. When a new tree is included in the model, its loss function can be expressed as follows:
$$f_{obj}^{t} = \sum_{i} L(y_i, F_{t-1}(x_i) + f_t(x_i)) + \Omega(f_t)$$
To derive the optimal value of the score of each leaf in a decision tree, XGBoost relies on Taylor's second-order approximations. Additionally, the first- and second-order gradients of the loss function are required in the optimization process. In general, the optimal score $w_j^{*}$ is calculated in the following manner:
$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$
where $g_i$ and $h_i$ stand for the first- and second-order gradients of the loss function, respectively; $I_j$ refers to the set of data samples (i.e., the information of the concrete mix and its measured CS value) that arrive at leaf $j$.
When the training process finishes, the final XGBoost model employed for predicting the CS of RAC can be expressed as follows:
$$F_{XGBoost} = \sum_{t=1}^{M} f_t(x)$$
where $M$ refers to the number of individual decision trees in XGBoost; $F_{XGBoost}$ denotes the predicted value of the CS of RCC containing recycled aggregates.
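The additive form of the final ensemble can be illustrated with a deliberately simplified numpy sketch: one-feature regression stumps are fit to the residuals of the current prediction and summed, mirroring the final-model equation. This is an illustration of the boosting principle only; real XGBoost additionally uses second-order gradients, regularization, and full decision trees, and the learning rate used here is an assumed value.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split stump on one feature, fit to residuals r."""
    best_sse, best_split = np.inf, None
    for s in np.unique(x)[:-1]:
        left, right = r[x <= s], r[x > s]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            best_sse, best_split = sse, (s, left.mean(), right.mean())
    return best_split

def boost(x, y, n_trees=50, lr=0.3):
    """Additive ensemble: each stump f_t is fit to the current residuals."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        s, w_left, w_right = fit_stump(x, y - pred)
        stumps.append((s, w_left, w_right))
        pred = pred + lr * np.where(x <= s, w_left, w_right)
    return base, lr, stumps

def predict(model, x):
    """F(x) = base + sum over t of lr * f_t(x)."""
    base, lr, stumps = model
    pred = np.full(len(x), base)
    for s, w_left, w_right in stumps:
        pred = pred + lr * np.where(x <= s, w_left, w_right)
    return pred
```

Each added stump shrinks the residual of the previous ensemble, which is exactly the mechanism the paper describes for reducing the estimation error iteration by iteration.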
In addition, it is noted that when constructing a machine learning model for estimation of the CS values, restricting overestimations is a crucial objective. It is because overestimations negatively affect the reliability and safety of the predictions. To express the preference for underestimation over overestimation, it is required to modify the standard SEL used in XGBoost. In this study, the asymmetric squared error loss (ASEL) [65] is used to train the XGBoost with the aim of decreasing the ratio of overestimated results. The revised loss function can be expressed as follows [10]:
$$L_{ASEL}(t, y) = \begin{cases} (t - y)^2, & \text{if } (t - y) \geq 0 \\ \gamma \times (t - y)^2, & \text{otherwise} \end{cases}$$
where t and y are the measured and estimated CS values, respectively; γ represents a tuning parameter of the loss function that governs its degree of asymmetry.
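Because XGBoost trains with first- and second-order gradients, the ASEL can be supplied to it as a custom objective. A minimal numpy sketch of the required gradient and Hessian follows; the value γ = 4 is an assumed illustration (in practice γ is tuned), and the exact callback signature depends on the XGBoost API version in use.

```python
import numpy as np

GAMMA = 4.0  # assumed asymmetry coefficient; tuned in practice

def asel_grad_hess(y_pred, y_true):
    """First- and second-order derivatives of the asymmetric squared
    error loss with respect to the prediction y:
      L(t, y) = (t - y)^2          if t - y >= 0 (underestimation)
              = GAMMA * (t - y)^2  otherwise     (overestimation)."""
    resid = y_true - y_pred                  # t - y
    w = np.where(resid >= 0, 1.0, GAMMA)     # extra weight on overestimates
    grad = -2.0 * w * resid                  # dL/dy
    hess = 2.0 * w                           # d2L/dy2
    return grad, hess
```

With γ > 1, overestimated samples receive γ times the gradient magnitude of underestimated ones, steering the trained model toward the safer, conservative side.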

3.3. Gradient-Based Optimizer

Gradient-based optimizer (GBO), proposed in [66], is a population-based metaheuristic for solving global optimization in continuous space. This algorithm is inspired by the gradient-based Newton’s method. In detail, GBO relies on two operations, namely the gradient search rule (GSR) and the local escaping operator (LEO), for finding potential candidate solutions. The former operator aims to explore the search space by mathematical equations inspired by Newton’s method. GBO starts with a randomly generated population. Each member x = { x n } (where n = 1,2,…, D) of the population is represented by a vector containing D elements. At generation m, the equation used to generate a new candidate solution via GSR is given by:
$$X_n^{m} = x_n^{m} - RN \times \rho_1 \times \frac{2\,\Delta x \times x_n^{m}}{x_w - x_b + \varepsilon} + RU \times \rho_2 \times (x_b - x_n^{m})$$
where $X_n^{m}$ and $x_n^{m}$ are the newly generated and existing candidate solutions, respectively; $RN$ refers to a normally distributed random number; $\varepsilon$ denotes a small number in the range [0, 0.1]; $RU$ is a uniformly distributed random number in the range [0, 1]; $x_b$ and $x_w$ denote the best and worst solutions in the current population, respectively; $\rho_1$ and $\rho_2$ are parameters used to balance the exploration and exploitation search operators.
In addition, to deal with complex optimization problems, GBO implements LEO, which helps prevent premature convergence of the algorithm. This operator basically involves the participation of two existing members and the best-found member. LEO relies on random permutations of these selected members to generate a new candidate solution. This step significantly helps enhance the exploitation process of the algorithm and fend off getting trapped in local optima [47]. Due to these features, GBO exhibits robustness in searching capability and high flexibility in solving a wide range of optimization problems [67,68]. Therefore, the current work relies on GBO to optimize the performance of the XGBoost model used for estimating the CS of RCC containing recycled aggregates.
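A simplified numpy sketch of a single GSR move may help fix the idea: a Newton-like correction scaled by ρ1 plus a pull toward the best-found solution scaled by ρ2. This is an illustrative fragment only; the full GBO also adapts ρ1, ρ2, and Δx across generations and applies the LEO.

```python
import numpy as np

def gsr_step(x_n, x_best, x_worst, rho1, rho2, delta_x, eps=0.05, rng=None):
    """One simplified gradient search rule (GSR) move of GBO."""
    if rng is None:
        rng = np.random.default_rng()
    rn = rng.standard_normal(x_n.shape)   # normally distributed random number
    ru = rng.random(x_n.shape)            # uniform random number in [0, 1]
    # Newton-like exploration term followed by a pull toward the best solution
    newton_term = rn * rho1 * (2.0 * delta_x * x_n) / (x_worst - x_best + eps)
    return x_n - newton_term + ru * rho2 * (x_best - x_n)
```

Setting ρ1 high emphasizes exploration via the Newton-like term, while ρ2 controls how strongly candidates are attracted to the best solution found so far.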

3.4. Benchmark Machine Learning Approaches

3.4.1. Artificial Neural Network

Artificial neural networks (ANNs) are deemed among the most widely used machine learning methods in construction engineering for handling complex problems [69]. ANN is essentially a nonlinear information processing approach that mimics the human brain in the process of data generalization. This machine learning model generally contains three layers: an input layer, a hidden layer, and an output layer. The first layer receives the input signal; the second layer processes the received signal; and the final layer yields the estimated output. In the case of CS estimation, the number of input nodes in the first layer is equal to the number of influencing factors, e.g., the constituents of the concrete mixes and the curing age [70]. The neurons in the hidden layer are often equipped with nonlinear transfer functions that allow ANN to effectively deal with nonlinear and multivariate data modeling tasks. Hence, ANN is one of the most extensively utilized methods for estimating the CS of concrete mixes [21].
Each neuron in the hidden and output layers is linked with the ones in the successive layer. The strength of these connections is quantified by connecting weights or synaptic weights. The sophisticated interplay between the neurons and the use of transfer functions allow ANN to analyze complex patterns and perceive functional mapping between the input signals and the output, which is the CS of concrete mixes. To train an ANN model and use it for generalizing untested samples, the synaptic weights must be appropriately adapted according to a collected dataset. In this study, the Levenberg-Marquardt (LM) algorithm [71], which is a popular method for training ANN models, is used. The LM algorithm relies on information about the first-order partial derivatives of the loss function for fitting the ANN model to a training dataset. The effectiveness of the LM algorithm in training CS estimation models has been widely reported in previous studies [72,73].
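A minimal scikit-learn sketch of a three-layer ANN regressor on synthetic data is shown below. Note that scikit-learn does not provide the Levenberg-Marquardt trainer used in this study, so the L-BFGS solver serves as a stand-in, and the data, network size, and activation are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (not the paper's dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 3.0 * X[:, 0] + np.sin(4.0 * X[:, 1])

# One hidden layer with nonlinear (tanh) transfer functions
ann = MLPRegressor(hidden_layer_sizes=(16,), activation="tanh",
                   solver="lbfgs", max_iter=5000, random_state=0)
ann.fit(X, y)
```

The hidden-layer size and activation are the kind of hyper-parameters that, for the ANN benchmark as for XGBoost, must be selected carefully.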

3.4.2. Support Vector Machine

Support vector machine (SVM) for regression [74] is a capable method for constructing nonlinear functions that generalize relationships between a set of predictor variables and a target variable. SVM employs kernel functions that map the data from its original feature space to a high-dimensional space. In the transformed data space, the machine learning model constructs a hyperplane that best fits the training data samples. For nonlinear function approximation, the radial basis kernel function is preferred [28]. Notably, the training phase of an SVM model is converted to a quadratic programming problem, which can be efficiently solved by a nonlinear programming solver [75]. As pointed out in [21], SVM is also a popular method for CS estimation and is highly suitable for small- and medium-sized datasets.
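A brief scikit-learn sketch of SVM regression with the radial basis kernel on assumed synthetic data; the hyper-parameters C and epsilon here are illustrative, not the values used in the paper's benchmark.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic one-dimensional example (assumed data, for illustration only)
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(150, 1))
y = np.sin(X).ravel()

# Radial basis kernel maps the input to a high-dimensional feature space
svr = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma="scale")
svr.fit(X, y)
```

The penalty C and the epsilon-insensitive tube width govern the fit-versus-smoothness trade-off, which is why SVM, like the other benchmarks, also requires hyper-parameter selection.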

3.4.3. M5-Model Tree

The M5 model tree (M5-MT) [76] is a decision tree-based model for function approximation. The main advantages of M5-MT are its capability of handling large-scale datasets and high-dimensional data [77]. Given a training dataset, this model separates the input space into a number of sub-spaces in which a linear regression function is constructed to describe the pattern in the data. The criterion used for data splitting is based on the measure of the estimation error at a node of the binary tree. The training process aims to construct a tree-like structure that can provide the best fit to a training dataset. M5-MT can be considered a combination of linear models or a piecewise linear function in which multiple linear models are constructed at the leaf nodes. Therefore, this machine learning model is resistant to overfitting and can provide good performance in the task of CS prediction [77,78,79,80].
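The piecewise-linear idea behind M5-MT can be sketched with a depth-one model tree: choose the split that minimizes the summed squared error of per-leaf linear fits, then attach a linear regression model to each leaf. This is a strong simplification of M5, which grows deeper trees and applies smoothing and pruning.

```python
import numpy as np

def fit_model_tree(x, y):
    """Depth-1 model-tree sketch: one split, one linear model per leaf."""
    best_sse, best_s = np.inf, None
    for s in np.unique(x)[:-1]:
        mask = x <= s
        sse = 0.0
        for leaf in (mask, ~mask):
            if leaf.sum() < 2:          # need at least 2 points for a line
                sse = np.inf
                break
            coef = np.polyfit(x[leaf], y[leaf], 1)
            sse += np.sum((np.polyval(coef, x[leaf]) - y[leaf]) ** 2)
        if sse < best_sse:
            best_sse, best_s = sse, s
    left = np.polyfit(x[x <= best_s], y[x <= best_s], 1)
    right = np.polyfit(x[x > best_s], y[x > best_s], 1)
    return best_s, left, right

def predict_model_tree(model, x):
    """Piecewise-linear prediction: route each sample to its leaf model."""
    s, left, right = model
    return np.where(x <= s, np.polyval(left, x), np.polyval(right, x))
```

Because each leaf holds a full linear model rather than a constant, the tree can follow locally linear trends in the data with very few splits.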

4. Results and Comparison

4.1. Experimental Setting

In this study, the dataset, including 270 samples and nine predictor variables, is used to train and test the proposed machine learning approach for estimating the CS of RCC containing recycled aggregates. The proposed method, named GBO-XGBoost, is an integration of a metaheuristic algorithm and an advanced gradient boosting machine. The gathered dataset was separated into a training set and a testing set. The ratio of training-to-testing samples is set to be 9/1 to ensure sufficient samples for training a robust regressor. Moreover, to negate the effect of randomness in data sampling, the model training and testing phases have been repeated 20 times. The final model’s performance is determined by the average outcomes obtained from these multiple runs. In this study, the XGBoost regressor is constructed with the assistance of the Python library provided in [81]. The GBO search engine is implemented with the MEALPY library [82]. The experiments with the constructed model are performed on the Dell G15 5511 (Core i7-11800H and 16 GB RAM).
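The repeated random-split protocol can be sketched as follows. The generated data and the least-squares placeholder model are assumptions standing in for the actual 270-sample dataset and the GBO-XGBoost model; only the 9/1 split ratio and the 20 repetitions mirror the setting described above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 270 samples, nine predictors, as in the paper
rng = np.random.default_rng(42)
X = rng.normal(size=(270, 9))
y = X @ rng.normal(size=9) + rng.normal(scale=0.1, size=270)

rmses = []
for run in range(20):                                # 20 independent runs
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.1, random_state=run)       # 9/1 train-test ratio
    coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)   # placeholder model
    rmses.append(np.sqrt(np.mean((X_te @ coef - y_te) ** 2)))

print(f"mean RMSE over 20 runs: {np.mean(rmses):.3f}")
```

Averaging the metric over the 20 splits, as done here, is what negates the effect of randomness in the data sampling.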
Notably, the variables in the current dataset have different ranges. For instance, the minimum and maximum values of the cement quantity are 142 and 375, respectively. Meanwhile, the quantity of recycled coarse aggregate varies from 403.05 to 1264.40. Therefore, data normalization is required to pre-process the data. This step can help avoid circumstances in which the variables with large magnitudes dominate the ones with small magnitudes. In this study, the Z-score method is used because it can help normalize the mean and standard deviation measures of the variables [83]. The Z-score normalization equation is expressed as follows:
X_Z = (X_O − μ_X) / σ_X
where X Z and X O refer to the normalized and original variables, respectively. μ X and σ X represent the mean and standard deviation of the original variable, respectively.
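As a concrete illustration, the Z-score transform can be sketched in Python; the sample values below are hypothetical and not drawn from the study's dataset:

```python
import math

def z_score_normalize(values):
    """Map raw values to zero mean and unit standard deviation (Z-score)."""
    n = len(values)
    mu = sum(values) / n                                        # mean of the original variable
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)   # population standard deviation
    return [(v - mu) / sigma for v in values]

# Hypothetical cement quantities spanning the reported range [142, 375]
cement = [142.0, 200.0, 260.0, 320.0, 375.0]
cement_z = z_score_normalize(cement)
```

After this transform, every predictor variable contributes on a comparable scale, so no single variable dominates the others because of its raw magnitude.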
It is also noted that performance measurement is crucial for assessing the predictive capability of machine learning models. In the current work, various approaches are used to evaluate the goodness-of-fit achieved by different predictors. The employment of different metrics helps observe the predictive capability of a model from different angles. For regression analysis, the root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2) are commonly employed. The equations used to compute these metrics are given by:
RMSE = √[(1/N) Σ_{i=1}^{N} (y_i − t_i)²]
MAPE = (100/N) × Σ_{i=1}^{N} |y_i − t_i| / t_i
R² = 1 − [Σ_{i=1}^{N} (t_i − y_i)²] / [Σ_{i=1}^{N} (t_i − t̄)²]
where ti and yi refer to the actual and predicted CS values of the ith data sample, respectively. N denotes the total number of data samples.
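These three metrics can be computed directly from paired lists of measured and predicted CS values. A minimal Python sketch with made-up numbers (note that the MAPE here normalizes by the actual value t_i):

```python
import math

def rmse(t, y):
    """Root mean square error between actual (t) and predicted (y) values."""
    return math.sqrt(sum((yi - ti) ** 2 for ti, yi in zip(t, y)) / len(t))

def mape(t, y):
    """Mean absolute percentage error (in %), normalized by the actual values."""
    return 100.0 / len(t) * sum(abs(yi - ti) / ti for ti, yi in zip(t, y))

def r_squared(t, y):
    """Coefficient of determination: share of target variance explained."""
    t_bar = sum(t) / len(t)
    ss_res = sum((ti - yi) ** 2 for ti, yi in zip(t, y))
    ss_tot = sum((ti - t_bar) ** 2 for ti in t)
    return 1.0 - ss_res / ss_tot

# Illustrative actual/predicted CS values (MPa)
actual = [20.0, 25.0, 30.0, 35.0]
predicted = [21.0, 24.0, 31.0, 33.0]
scores = (rmse(actual, predicted), mape(actual, predicted), r_squared(actual, predicted))
```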
RMSE and MAPE are two indices that measure the average prediction error committed by the model. Meanwhile, the coefficient of determination indicates the proportion of variation in the target variable explained by the model. In addition, analyzing the characteristics of the model's residuals is also essential in model evaluation. The residual (∆) is computed as t − y, where t and y are the measured and estimated CS values of one concrete mix. Assessing the range of ∆ may help shed light on how well the data is fitted by the model. Therefore, this study relies on various techniques for investigating the features of the model's residuals, including range inspection, the regression error characteristic curve [84], and statistical descriptions.
To demonstrate the advantages of the newly developed GBO-XGBoost, ANN, SVM, and M5-MT are used as benchmark approaches. The reason is that ANN and SVM have been shown to be prominent approaches for predicting the CS of recycled aggregate concrete [21]. In addition, good performance of M5-MT for CS estimation has been reported in various studies [77,85]. In this study, the ANN and SVM are constructed in MATLAB’s statistics and machine learning toolbox [86]. The M5-MT is built via the functions of the library provided in [87]. As pointed out in [21], k-fold cross-validation is the most common approach for machine learning-based predictions of the mechanical properties of recycled aggregate concrete. This approach helps ensure the robustness and generalizability of the developed models. Hence, the cross-validation processes [88] are used to perform the model selection phases and identify suitable hyper-parameters of the benchmark approaches.
As mentioned earlier, the current work resorts to GBO for optimizing the performance of the XGBoost model. The flowchart of the proposed framework is presented in Figure 4. Given the training and testing datasets, the structure of XGBoost is initiated. In each iteration, the GBO evaluates the quality of the candidate solutions, consisting of the XGBoost’s hyper-parameters. These hyperparameters are the number of individual trees, the learning rate, the maximum tree depth, the L2-regularization factor, and the L1-regularization factor. The lower and upper boundaries of the searched hyper-parameters are [1, 0.001, 1, 0, 0] and [200, 1, 10, 1000, 1000], respectively.
Herein, the model selection process of XGBoost is formulated as a global optimization problem in which a cost function is minimized. In detail, the following objective function is designed for the GBO searching process:
f_GBO = APE + w_TN × N_Tree + w_MTD × MTD + w_Reg × 1/(1 + λ_Reg) + w_Reg × 1/(1 + α_Reg)
where APE refers to the average prediction error of the model; N_Tree denotes the number of individual trees in the XGBoost ensemble; MTD represents the maximum tree depth; λ_Reg and α_Reg are the L2-regularization and L1-regularization factors, respectively; w_TN = 0.001, w_MTD = 0.01, and w_Reg = 0.0001 are the weighting coefficients of the terms in the objective function.
The average prediction error (APE) mentioned above aggregates the training and testing RMSE values obtained from a five-fold cross-validation process. APE is computed as follows:
APE = [Σ_{k=1}^{K} RMSE_k^Train + Σ_{k=1}^{K} RMSE_k^Test] / K
where K = 5 is the number of data folds; RMSE_k^Train and RMSE_k^Test denote the training and testing root mean square errors of the kth fold, respectively.
It can be observed that the aforementioned objective function is composed of a cross-fold evaluation and terms expressing the model's complexity. Notably, to minimize f_GBO, the metaheuristic algorithm needs to locate candidate solutions featuring a low APE. However, reducing the prediction error often entails an increasing degree of model complexity. Therefore, to restrict overly complex model structures, the terms governing the complexity of XGBoost are taken into account. The GBO searching process not only favors models with good predictive accuracy (i.e., a low APE) but also reduces N_Tree and MTD; the lower these values, the less sophisticated the model's structure. Moreover, high values of the regularization coefficients (i.e., λ_Reg and α_Reg) impose strong pressure on restricting the complexity of the XGBoost model so that it is less susceptible to overfitting. Relying on f_GBO, the GBO metaheuristic carries out the search with 20 population members over 100 iterations. When the optimization is accomplished, the performance of the best model is evaluated using the RMSE, MAPE, and R2 metrics.
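As a sketch, the cost of one candidate solution can be written as a plain Python function. The weighting coefficients are those stated in the text, while the hyper-parameter values passed in below are hypothetical:

```python
W_TN, W_MTD, W_REG = 0.001, 0.01, 0.0001   # weighting coefficients from the objective function

def f_gbo(ape, n_tree, mtd, lambda_reg, alpha_reg):
    """Cost of one candidate: cross-validated error plus complexity penalties.

    The terms 1/(1 + lambda_reg) and 1/(1 + alpha_reg) shrink as the
    regularization coefficients grow, so stronger regularization is rewarded.
    """
    return (ape
            + W_TN * n_tree
            + W_MTD * mtd
            + W_REG * (1.0 / (1.0 + lambda_reg))
            + W_REG * (1.0 / (1.0 + alpha_reg)))

# Hypothetical candidate: APE of 3.0 MPa, 100 trees, depth 4... er, depth 6, weak vs. strong regularization
weakly_regularized = f_gbo(3.0, 100, 6, 1.0, 0.0)
strongly_regularized = f_gbo(3.0, 100, 6, 1000.0, 1000.0)
```

With identical error and tree settings, the strongly regularized candidate receives a slightly lower cost, mirroring the search's preference for simpler, less overfitting-prone models.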

4.2. Prediction Results and Performance Comparison

As mentioned in the previous section, the proposed approach relies on GBO to optimize the performance of XGBoost. The predictive capability of XGBoost depends on the setting of its hyper-parameters: the number of individual trees (N_Tree), the learning rate (LR), the maximum tree depth (MTD), the L2-regularization factor (λ_Reg), and the L1-regularization factor (α_Reg). After 100 iterations, the population-based metaheuristic identified the following configuration of the XGBoost model used for estimating the CS of recycled aggregate RCC: N_Tree = 46, LR = 1.97, MTD = 4, λ_Reg = 168.96, and α_Reg = 0.00011. The optimization progress of GBO is illustrated in Figure 5.
In addition, based on the cross-validation process [88], the ANN trained by the LM algorithm is configured as follows: the number of neurons in the hidden layer is 11 and the learning rate of the algorithm is 0.001. The SVM model necessitates the setting of the penalty coefficient (C_SVM), the radial basis kernel function parameter (γ_SVM), and the margin of tolerance (ε_SVM). These hyper-parameters of the SVM model are set as follows: C_SVM = 100, γ_SVM = 1, and ε_SVM = 0.1. In the case of M5-MT, the minimum leaf size and the splitting threshold value are set to 2 and 10−6, respectively.
The experimental results of the proposed GBO-XGBoost and the benchmark methods are summarized in Table 4. The CS estimation outcomes of the proposed model and other approaches are graphically demonstrated by the scatter plots in Figure 6. Observably, the data points predicted by GBO-XGBoost are clustered around the agreement line, whereas there is a large degree of dispersion of the data points in the scatter plots yielded by the benchmark models. This indicates that the accuracy of the data fitting provided by GBO-XGBoost is higher than that yielded by the other models. Quantitatively speaking, the proposed method attained an outstanding performance with an RMSE of 2.639, a MAPE of 7.823%, and an R2 of 0.941. This outcome surpasses that of SVM (RMSE = 4.048, MAPE = 12.027%, R2 = 0.866), the second-best method, by a large margin. ANN (RMSE = 4.493, MAPE = 12.912%, R2 = 0.835) and M5-MT (RMSE = 5.855, MAPE = 17.692%, R2 = 0.722) attained performance that is significantly inferior to that of GBO-XGBoost.
RMSE is deemed the most commonly employed metric for assessing the quality of a regressor. This metric essentially measures the average deviation between a model's estimated values and the measured values of CS. The typical differences between the actual and predicted results obtained from GBO-XGBoost, SVM, ANN, and M5-MT are 2.639, 4.048, 4.493, and 5.855 MPa, respectively. Since residuals represent the distance between the constructed functional mapping and the actual data samples, RMSE reflects the degree of dispersion of the model's residuals. In other words, this index provides insight into how closely the actual CS values cluster around the constructed functional mapping. RMSE entails the squaring of residuals; therefore, this index is sensitive to samples associated with exceptionally large residuals.
Notably, the RMSE of SVM is roughly 1.5 times larger than that of GBO-XGBoost. This indicates that the prediction outcome of the proposed method contains fewer exceptionally large residuals than the benchmark approach. Moreover, if the residuals are assumed to follow a normal distribution, it can be roughly estimated that 95% of the prediction results of GBO-XGBoost lie within ±2 × RMSE, or approximately ±5.3 MPa. Under the same assumption, the deviation range of the benchmark approaches is at least ±8.1 MPa, which is significantly wider than that of the proposed method.
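The quoted ±2 × RMSE ranges follow directly from the reported errors under the normality assumption; a short check:

```python
# Reported RMSE values (MPa) from Table 4
rmse_values = {"GBO-XGBoost": 2.639, "SVM": 4.048, "ANN": 4.493, "M5-MT": 5.855}

# Approximate 95% residual range under a normal-residual assumption: +/- 2 x RMSE
ranges = {name: round(2 * r, 1) for name, r in rmse_values.items()}
```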
In addition, MAPE provides an intuitive interpretation of a model's performance in terms of relative deviation. This index computes the average relative magnitude of the residuals. As observed from the results, the average relative deviation between the actual and predicted CS yielded by GBO-XGBoost is about 7.8%. Based on the classification of MAPE values for CS estimation stated in [32], a MAPE < 10% indicates excellent predictive capability of the newly developed method. The results of the benchmark methods, falling in the range of 10–20%, imply good performance. These classification criteria are understandable because predicting CS values is highly challenging: prediction error is inevitable due to variations in the physical features of the concrete mixes' constituents as well as uncertainty in the processes of casting and testing the concrete specimens [89].
The R2 index also provides crucial information about the goodness of fit of a machine learning model. This coefficient quantifies the proportion of the variation in the CS that is predictable from its influencing factors and the trained model. The index ranges from 0 to 1, with an R2 of 1 exhibiting a perfect data-fitting result. With an R2 of 0.94, the proposed GBO-XGBoost is capable of explaining up to 94% of the variation in the CS of recycled aggregate RCC. The R2 values obtained from the benchmark models are all less than 0.90. For instance, SVM, the second-best approach, explains about 87% of the variation in the target variable. Figure 7 graphically shows the improvements in the performance metrics gained by the proposed model. In detail, compared to SVM as the best benchmark approach, GBO-XGBoost reduces the RMSE and MAPE indices by roughly 35%. The proposed approach also enhances the goodness of fit in terms of R2 by at least 8.72%.
In Figure 8, detailed distributions of the models' relative residual range (r) are graphically demonstrated. In the task of CS prediction, it is desirable to attain values of r smaller than 20% [32]. Moreover, information about how a model's residuals are allocated within the range of [0%, 20%] is valuable for assessing its prediction performance [10]. Accordingly, the current work investigated the proportion of r in five groups: (i) r ≤ 5%, (ii) 5% < r ≤ 10%, (iii) 10% < r ≤ 15%, (iv) 15% < r ≤ 20%, and (v) r > 20%. As observed from the figure, GBO-XGBoost attained the highest proportion of data samples in the first group. The ratios of data in this group obtained from the proposed method, SVM, ANN, and M5-MT are 48.89%, 31.48%, 34.26%, and 23.70%, respectively. Considering the relative range of r ≤ 10%, GBO-XGBoost outperforms the other models with a proportion of roughly 73%; SVM, ANN, and M5-MT attained roughly 60%, 56%, and 43%, respectively. Notably, only about 9% of the data samples predicted by GBO-XGBoost are associated with r exceeding 20%. In this regard, the performance of the newly developed method is superior to the other models by a large margin: the proportions of r exceeding 20% yielded by SVM, ANN, and M5-MT are approximately 16%, 20%, and 30%, respectively.
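Binning the relative residuals into the five groups above can be sketched as follows; the actual/predicted pairs are illustrative:

```python
def relative_residual_bands(t, y):
    """Count relative residuals r = 100*|t - y|/t (in %) in five bands."""
    bands = {"<=5%": 0, "5-10%": 0, "10-15%": 0, "15-20%": 0, ">20%": 0}
    for ti, yi in zip(t, y):
        r = 100.0 * abs(ti - yi) / ti
        if r <= 5:
            bands["<=5%"] += 1
        elif r <= 10:
            bands["5-10%"] += 1
        elif r <= 15:
            bands["10-15%"] += 1
        elif r <= 20:
            bands["15-20%"] += 1
        else:
            bands[">20%"] += 1
    return bands

# Illustrative actual/predicted CS pairs (MPa)
counts = relative_residual_bands([20.0, 30.0, 40.0], [19.5, 27.0, 30.0])
```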
Moreover, to assess the model's performance with respect to the cumulative distribution of the residuals, this study relies on the regression error characteristic (REC) curve proposed in [84]. A REC graph is constructed by sorting the absolute values of the residuals. The analysis results are provided in Figure 9. Herein, the x-axis is the absolute value of the prediction error, and the y-axis displays the proportion of residuals whose magnitude is less than a certain value on the x-axis. For example, a point on a REC curve with coordinates (6.00, 0.90) means that 90% of the residuals have an absolute value of less than 6 MPa. Due to the characteristics of the REC curve, the more data points in the upper left corner of the graph, the better the prediction performance. Therefore, the accuracy of the prediction result can be judged by how fast the REC curve reaches the line y = 1. Based on this observation, the area under the REC curve (AUC) can be calculated to express the model's prediction performance. GBO-XGBoost attained the highest AUC of 0.90; this outcome is followed by SVM (AUC = 0.85), ANN (AUC = 0.84), and M5-MT (AUC = 0.78).
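A REC curve and its AUC can be built from the sorted absolute residuals. The sketch below uses a hypothetical residual list and a step-function approximation of the curve, with the AUC normalized over a chosen error range (details of the paper's exact AUC computation are not stated, so this is one plausible variant):

```python
def rec_curve(residuals):
    """Return (tolerance, coverage) points of the regression error characteristic curve."""
    errs = sorted(abs(r) for r in residuals)
    n = len(errs)
    return [(e, (i + 1) / n) for i, e in enumerate(errs)]

def rec_auc(residuals, max_err):
    """Fraction of the [0, max_err] x [0, 1] box under the step REC curve."""
    errs = sorted(abs(r) for r in residuals)
    n = len(errs)
    area, prev_e, coverage = 0.0, 0.0, 0.0
    for i, e in enumerate(errs):
        area += coverage * (min(e, max_err) - prev_e)   # flat segment before the next jump
        prev_e, coverage = min(e, max_err), (i + 1) / n
    area += coverage * (max_err - prev_e)               # tail up to the chosen error range
    return area / max_err

points = rec_curve([1.0, -2.0, 3.0])
auc = rec_auc([1.0, -2.0, 3.0], max_err=4.0)
```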
Additionally, the distributions of the prediction performance obtained from the machine learning models are provided in Figure 10. Herein, the RMSE index is selected to measure the prediction errors committed by the models. Each box plot in the figure displays the minimum, first quartile, median, third quartile, and maximum of the RMSE values obtained from 20 experiments. It is apparent that the median prediction error obtained from GBO-XGBoost (2.61) is significantly lower than that yielded by the other models. The medians of RMSE attained by SVM, ANN, and M5-MT are 3.65, 4.65, and 5.71, respectively. Therefore, the median prediction error of GBO-XGBoost is at least 28% lower than that of the benchmark methods. To reliably affirm the superiority of the proposed method, this study resorts to the Wilcoxon signed-rank test [90] with a significance threshold of p = 0.05. Pairwise hypothesis tests (GBO-XGBoost vs. SVM, GBO-XGBoost vs. ANN, and GBO-XGBoost vs. M5-MT) all yielded p-values of 0.0001 < 0.05. Therefore, the null hypothesis of indifferent prediction performances can be confidently rejected, confirming the pre-eminence of the proposed method in estimating the CS of recycled aggregate RCC.
Moreover, to assess the effect of potential outliers in the dataset, this study utilizes a criterion based on the standard deviation. Herein, two times the standard deviation is employed as the threshold for outlier filtering [91]. This approach is based on the observation that, for a normal distribution, approximately 95.4% of the data points fall within two standard deviations of the mean. Based on the analysis of the CS values, 10 samples in the dataset have target outputs deviating from the mean by more than 2 × σCS, where σCS denotes the standard deviation of the CS values. To appraise the influence of these 10 samples on GBO-XGBoost, this study performed experiments under two scenarios: (i) Scenario 1: the original dataset consisting of 270 samples; and (ii) Scenario 2: the dataset without the 10 samples identified as potential outliers. The experimental outcomes are reported in Table 5. As can be seen from the table, the model trained with 260 samples in Scenario 2 suffers a slight reduction in prediction accuracy: its RMSE and MAPE increase to 2.742 and 8.382%, respectively. Moreover, the model trained with fewer data samples can only explain 92% of the variability in the target output, a minor deterioration compared to 94% in the original model (Scenario 1). Thus, it can be concluded that the samples with CS values deviating by more than 2 × σCS are relevant for model construction and should be included in the analysis with GBO-XGBoost.
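The 2σ screening used in the two scenarios can be sketched as follows; the CS values here are illustrative:

```python
import math

def flag_outliers(values, k=2.0):
    """Return indices of samples deviating from the mean by more than k standard deviations."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]

# Illustrative CS values (MPa); the last one sits far from the bulk of the data
cs = [20.0, 21.0, 22.0, 23.0, 24.0, 60.0]
outlier_idx = flag_outliers(cs)
```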

4.3. Analysis of Feature Importance

Moreover, Shapley Additive exPlanations (SHAP) [92] are utilized in this section to interpret the predictions yielded by XGBoost. SHAP, derived from cooperative game theory, elucidates the estimated CS value produced by the machine learning model by assessing the impact of the influencing factors [35,93]. In cooperative game theory, Shapley values determine the fair contribution of each player (or, here, each feature) to the overall outcome of the game (or model prediction); this theoretical basis allows SHAP to maintain desirable properties such as local accuracy and consistency [93]. The method assigns an importance value, known as the SHAP value, to each feature for every individual prediction. SHAP thus provides local explanations, detailing how each feature impacts a specific prediction. The resulting values can be visualized conveniently, offering a nuanced and consistent framework for understanding how features contribute to model predictions and enhancing the interpretability of the machine learning model.
Figure 11 presents the SHAP summary plot obtained from XGBoost trained by the collected dataset. Here, this plot helps visualize the influence of a feature on a prediction. It is apparent that the concrete age and quantity of cement are the most crucial variables. They considerably affect the variation of the CS values. The age of the concrete is a critical factor that governs the pozzolanic activity of RCC mixes [94]. Cement is the primary binding material in RCC, and its quantity directly influences the strength performance of the mixes. Previous works have also confirmed the important role of cement quantity in machine learning-based models used to model the CS of RCC [26,30]. In addition, the quantities of natural fine aggregate, natural coarse aggregate, type of recycled aggregates, water, and recycled coarse aggregate moderately influence the model’s output. Meanwhile, the effects caused by the quantities of fly ash and recycled aggregate are minor and less than those of the aforementioned features. In addition, the concrete age, quantity of cement, and quantity of natural fine aggregate show clear positive correlations with the CS values. On the contrary, negative correlations with the CS variable can be seen in the type of recycled aggregate and quantity of recycled coarse aggregate. It is understandable because a high proportion of recycled coarse aggregate often leads to a significant reduction in the mechanical properties of RCC [59,60].

4.4. Reduction of Overestimation Based on GBO-XGBoost Using Asymmetric Loss Function

This section first analyzes the characteristics of the residuals obtained from GBO-XGBoost trained with the SEL function. As mentioned in the previous section, SEL guides the model's training phase to minimize the overall deviation between the actual and predicted outputs; this loss function does not take into account the sign of the residuals. Recall that the residual (Δ) is calculated as t − y, where t and y refer to the actual and estimated CS values, respectively. In this regard, a positive Δ implies an underestimation of the CS value, whereas an overestimation of the CS value corresponds to a negative Δ. Since the standard SEL cannot express a preference for underestimation of the target variable, it is expected that the proportions of negative and positive residuals yielded by GBO-XGBoost are relatively close to 50%, and that the average value of the residuals is close to 0. The distribution of the residuals yielded by the model using the standard SEL is demonstrated by the histogram in Figure 12. As expected, the ratio of negative to positive Δ is close to 50:50. In addition, the residuals cluster around the mean of −0.01, and the dispersion of the distribution is characterized by a standard deviation of 2.67. The skewness value of −0.12 indicates that the left tail of the distribution is slightly longer than the right one.
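The residual statistics reported above (share of negative residuals, mean, standard deviation, skewness) can be computed as follows; the residuals here are illustrative, not the model's actual output:

```python
import math

def residual_stats(residuals):
    """Sign proportion, mean, standard deviation, and skewness of residuals."""
    n = len(residuals)
    neg_share = sum(1 for d in residuals if d < 0) / n
    mu = sum(residuals) / n
    sigma = math.sqrt(sum((d - mu) ** 2 for d in residuals) / n)
    skew = sum(((d - mu) / sigma) ** 3 for d in residuals) / n   # standardized third moment
    return neg_share, mu, sigma, skew

# Illustrative residuals (MPa); a long left tail gives negative skewness
stats = residual_stats([-3.0, -1.0, 0.5, 1.0, 2.5])
```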
To improve the safety and reliability of the prediction results, this study resorts to the ASEL. This asymmetric loss function introduces an additional parameter (γ) into the standard SEL to express the preference for underestimation. Notably, γ should be set to ≥ 1; the larger the value of γ, the stronger the penalty on negative residuals. To implement the ASEL in XGBoost, the first-order (ω_ASEL) and second-order (ψ_ASEL) derivatives of the loss function must be modified as follows:
First-order derivative: ω_ASEL(t, y) = −2 × (t − y), if (t − y) ≥ 0; −2 × γ × (t − y), otherwise.
Second-order derivative: ψ_ASEL(t, y) = 2, if (t − y) ≥ 0; 2 × γ, otherwise.
where γ is the tuning parameter of the ASEL; t and y denote the measured and estimated CS of the RCC, respectively.
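Following these derivatives, the ASEL can be sketched as a gradient/Hessian pair in the style of a custom XGBoost objective (XGBoost's Python API accepts a callback returning such a pair per sample). The derivatives here are taken with respect to the prediction y; the default γ = 15 mirrors the value the study later finds best, and the inputs below are hypothetical:

```python
def asel_grad_hess(t, y, gamma=15.0):
    """First- and second-order derivatives of the asymmetric squared error loss.

    Underestimation (t - y >= 0) is penalized as plain squared error;
    overestimation is penalized gamma times more strongly.
    """
    d = t - y
    if d >= 0:
        return -2.0 * d, 2.0                 # gradient dL/dy, Hessian d2L/dy2
    return -2.0 * gamma * d, 2.0 * gamma     # gamma-scaled penalty on overestimation

# Underestimated by 2 MPa vs. overestimated by 2 MPa (illustrative values)
g_under, h_under = asel_grad_hess(30.0, 28.0)
g_over, h_over = asel_grad_hess(30.0, 32.0)
```

The overestimation case returns a γ-scaled gradient and Hessian, so the booster is pushed much harder to correct predictions that exceed the measured CS.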
To optimize the XGBoost model using ASEL, this study proposes the inclusion of the preference for underestimation of the target variable in the objective function of the GBO metaheuristic. In this regard, the revised objective function of GBO is given by:
f_GBO-ASEL = APE + w_TN × N_Tree + w_MTD × MTD + w_Reg × 1/(1 + λ_Reg) + w_Reg × 1/(1 + α_Reg) + 1/(1 + APPR)
where APPR represents the average proportion of positive residuals; the other terms are as previously defined for f_GBO (refer to Equation (12)).
The objective function stated in Equation (16) expresses the three goals of constructing the XGBoost model for CS estimation: minimizing the prediction error, maximizing the generalization property, and favoring positive residuals. For each setting of γ (the tuning parameter of the ASEL), the GBO is used to optimize the XGBoost model. Through experiments, it was found that γ ≥ 5 brought about major improvements in terms of APPR. A large value of this tuning parameter tends to increase APPR but also deteriorates the overall prediction performance. Notably, γ = 15 yields the most desirable outcome, with an average proportion of positive residuals of 66.30% and a coefficient of determination of 0.929. Values of γ > 15 could not bring about any further improvement in APPR. The experimental results for different values of γ are illustrated in Figure 13.
After 100 iterations, the GBO identified the following configuration of the XGBoost using ASEL: the number of individual decision trees = 51, the learning rate = 0.70, the maximum tree depth = 4, the L2-regularization coefficient = 275.07, and the L1-regularization coefficient = 0.029. This model configuration is associated with γ = 15. The characteristics of the residuals obtained from the GBO-XGBoost using ASEL are demonstrated by the histogram in Figure 14. As shown in this figure, the proportion of overestimations has been drastically reduced from 50.74% to 33.70%, and the whole distribution has shifted towards the right side of the x-axis to accommodate more positive residuals. The residuals cluster around the mean of 0.97, and the standard deviation of the distribution is 2.84. A positive skewness of 0.13 indicates that the right tail of the distribution has been broadened.
The overall prediction performance of GBO-XGBoost using ASEL is demonstrated by the scatter plot in Figure 15. With an RMSE of 2.89, a MAPE of 7.87%, and an R2 of 0.93, there is excellent agreement between the actual and estimated results. In addition, the majority of the data points (66.3%) lie below the line of best fit. With a MAPE < 10%, the prediction performance of GBO-XGBoost using ASEL remains highly desirable and surpasses that of the benchmark methods. The newly developed method is capable of explaining 93% of the variation in the CS variable.
A comparison between the ASEL- and SEL-based models is graphically presented in Figure 16. As shown in this figure, the ASEL-based model reduces the number of overestimations by roughly 17%, while the asymmetric loss function causes only a minor deterioration in the overall prediction performance: the MAPE of the GBO-XGBoost using ASEL is only 0.05 percentage points higher than that of the model using the standard SEL. Therefore, the newly developed model, which combines the advantages of GBO and XGBoost, can be a potential alternative for estimating the CS of RCC using recycled aggregates. To facilitate the implementation of GBO-XGBoost, a graphical user interface (GUI) based on the proposed framework was developed with the help of the tkinter library [95]. The GUI is demonstrated in Figure 17 and has been deposited in the Github repository at https://github.com/NHDDTUEDU/RA_RCC_GBO_XGBoost (accessed on 22 July 2024). Herein, the program requires input information about the RCC's constituents and the curing age. The user can also select either the SEL- or ASEL-based model via the radio buttons provided in the GUI.

5. Concluding Remarks

This study successfully demonstrates the efficacy of a hybrid machine learning model, specifically GBO-XGBoost, for predicting the compressive strength of RCC incorporating recycled aggregates. The integration of GBO with XGBoost enhances predictive accuracy, achieving an RMSE of 2.64 and a MAPE of 7.82%. The model's ability to account for 94% of the variance in compressive strength demonstrates its robustness and reliability. These outcomes are highly encouraging because CS estimation is a challenging problem: the actual testing results reported in the dataset are subject to various sources of uncertainty, such as variations in the properties of the concrete's constituents and specimen casting processes.
Moreover, to reduce the proportion of overestimations in CS, this study resorted to ASEL for training the XGBoost model. A novel objective function has been proposed for the case in which XGBoost with ASEL is optimized by GBO. The use of ASEL helps restrict overestimations, but it results in a minor reduction in prediction performance. This compromise in prediction accuracy may be explained by the fact that when optimizing XGBoost with ASEL, GBO must consider an additional objective, i.e., maximizing the percentage of positive residuals. Meanwhile, when XGBoost with SEL is used, the metaheuristic is able to fully focus on minimizing the estimation error and maximizing the generalization property.
The research findings indicate that the proposed method not only mitigates the overestimation of compressive strength values—reducing such instances by 17% through the implementation of an asymmetric loss function—but also provides a valuable tool for practitioners in the field. By facilitating the early determination of CS, this approach can significantly contribute to the optimization of construction processes and maintenance strategies for RCC pavements. Results obtained from SHAP analysis and GBO-XGBoost show that the age of the concrete, the amount of cement used, and the quantity of natural fine aggregate all exhibit strong positive correlations with the CS of RCC mixes. On the other hand, there are negative correlations between the CS and the amount of recycled coarse aggregate. This fact indicates that the higher the content of recycled aggregate, the larger the reduction in the CS of RCC.
Nevertheless, the current work also has a number of limitations that can be addressed in future work. First, the number of data instances in the current dataset is still limited. In addition, the variability in key input variables is relatively large (such as the cement quantity and the content of the recycled coarse aggregate). This fact, to some degree, may hinder the model’s generalization in certain regions of the input space. Second, the current work has not taken into account the use of other cementitious replacement materials such as rice husk ash [56] and ground granulated blast furnace slag [96]. Third, due to the problem of data inconsistency, this study has not taken into account the mechanical properties of recycled aggregates and the curing conditions of concrete mixes. Fourth, the current model configuration process carried out by GBO is modeled as a single objective optimization. Although experimental results point out that this way of formulation can still help achieve satisfactory results, it is required to manually set the weighting coefficients of the terms in the objective function.
In this regard, future work should focus on expanding the dataset to enhance the generalization of the machine learning model. Especially, more data samples containing cementitious replacement materials, a wider variety of recycled aggregates in RCC, the mechanical properties of recycled aggregates, and curing conditions should be included. This will further enhance the model’s applicability and accuracy, paving the way for more sustainable practices in concrete construction. Moreover, investigating other advanced approaches for model construction, such as Bayesian optimization [97], swarm intelligence [29], random search [98], and hybrid optimization frameworks [99], can be worth investigating for CS estimation of RCC.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available in Github at https://github.com/NHDDTUEDU/RA_RCC_GBO_XGBoost (accessed on 22 July 2024).

Conflicts of Interest

The author has no competing interests to declare that are relevant to the content of this article.

Abbreviations

Acronym | Meaning
ANN | Artificial neural network
APE | Average prediction error
APPR | Average proportion of positive residuals
ASEL | Asymmetric squared error loss
AUC | Area under the regression error characteristic curve
CDW | Construction demolition waste
CS | Compressive strength
GBO | Gradient-based optimizer
GUI | Graphical user interface
LEO | Local escaping operator
LM | Levenberg-Marquardt
M5-MT | M5 model tree
MAPE | Mean absolute percentage error
MSW | Metallic slag waste
MTD | Maximum tree depth
RA | Recycled aggregate
RAP | Reclaimed asphalt pavement
REC | Regression error characteristic curve
RCC | Roller-compacted concrete
RMSE | Root mean square error
SEL | Squared error loss
SHAP | Shapley Additive exPlanations
SVM | Support vector machine
XGBoost | Extreme gradient boosting machine
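The asymmetric squared error loss (ASEL) listed above penalizes residuals differently depending on their sign, following Efron [65]. The sketch below uses a hypothetical asymmetry parameter gamma and is not the exact implementation used in the study:

```python
# Hedged sketch of an asymmetric squared error loss (ASEL) in the
# style of Efron [65]: overestimation (positive residual e = pred - true)
# is penalized gamma times more heavily than underestimation.
# gamma = 2.0 is an illustrative assumption.
def asel(y_true, y_pred, gamma=2.0):
    loss = 0.0
    for t, p in zip(y_true, y_pred):
        e = p - t
        weight = gamma if e > 0 else 1.0  # punish overestimation more
        loss += weight * e * e
    return loss / len(y_true)

def asel_grad_hess(y_true, y_pred, gamma=2.0):
    """Per-sample gradient and Hessian w.r.t. the prediction, the pair
    a gradient boosting library would consume as a custom objective."""
    grads, hesss = [], []
    for t, p in zip(y_true, y_pred):
        e = p - t
        w = gamma if e > 0 else 1.0
        grads.append(2.0 * w * e)
        hesss.append(2.0 * w)
    return grads, hesss
```

With gamma > 1 the fitted model is pushed toward conservative (lower) CS predictions, which is the overestimation-mitigation behavior described in the abstract.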

References

  1. Courard, L.; Michel, F.; Delhez, P. Use of concrete road recycled aggregates for Roller Compacted Concrete. Constr. Build. Mater. 2010, 24, 390–395. [Google Scholar] [CrossRef]
  2. Lopez-Uceda, A.; Agrela, F.; Cabrera, M.; Ayuso, J.; López, M. Mechanical performance of roller compacted concrete with recycled concrete aggregates. Road Mater. Pavement Des. 2018, 19, 36–55. [Google Scholar] [CrossRef]
  3. Chhorn, C.; Kim, Y.K.; Hong, S.J.; Lee, S.W. Evaluation on compactibility and workability of roller-compacted concrete for pavement. Int. J. Pavement Eng. 2019, 20, 905–910. [Google Scholar] [CrossRef]
  4. Aghaeipour, A.; Madhkhan, M. Mechanical properties and durability of roller compacted concrete pavement (RCCP)—A review. Road Mater. Pavement Des. 2020, 21, 1775–1798. [Google Scholar] [CrossRef]
  5. ACI. Report on Roller Compacted Concrete Pavements; Reported by ACI Committee 325; ACI 325.10R-95; American Concrete Institute: Farmington Hills, MI, USA, 2001. [Google Scholar]
  6. Lam, M.N.-T.; Jaritngam, S.; Le, D.-H. Roller-compacted concrete pavement made of Electric Arc Furnace slag aggregate: Mix design and mechanical properties. Constr. Build. Mater. 2017, 154, 482–495. [Google Scholar] [CrossRef]
  7. Santero, N.J.; Masanet, E.; Horvath, A. Life-cycle assessment of pavements Part II: Filling the research gaps. Resour. Conserv. Recycl. 2011, 55, 810–818. [Google Scholar] [CrossRef]
  8. AzariJafari, H.; Yahia, A.; Ben Amor, M. Life cycle assessment of pavements: Reviewing research challenges and opportunities. J. Clean. Prod. 2016, 112, 2187–2197. [Google Scholar] [CrossRef]
  9. Aghayan, I.; Khafajeh, R.; Shamsaei, M. Life cycle assessment, mechanical properties, and durability of roller compacted concrete pavement containing recycled waste materials. Int. J. Pavement Res. Technol. 2021, 14, 595–606. [Google Scholar] [CrossRef]
  10. Hoang, N.-D. A novel ant colony-optimized extreme gradient boosting machine for estimating compressive strength of recycled aggregate concrete. Multiscale Multidiscip. Model. Exp. Des. 2024, 7, 375–394. [Google Scholar] [CrossRef]
  11. Rentier, E.S.; Cammeraat, L.H. The environmental impacts of river sand mining. Sci. Total Environ. 2022, 838, 155877. [Google Scholar] [CrossRef]
  12. de Andrade Salgado, F.; de Andrade Silva, F. Recycled aggregates from construction and demolition waste towards an application on structural concrete: A review. J. Build. Eng. 2022, 52, 104452. [Google Scholar] [CrossRef]
  13. de Juan, M.S.; Gutiérrez, P.A. Study on the influence of attached mortar content on the properties of recycled concrete aggregate. Constr. Build. Mater. 2009, 23, 872–877. [Google Scholar] [CrossRef]
  14. Casuccio, M.; Torrijos, M.C.; Giaccio, G.; Zerbino, R. Failure mechanism of recycled aggregate concrete. Constr. Build. Mater. 2008, 22, 1500–1506. [Google Scholar] [CrossRef]
  15. Kisku, N.; Joshi, H.; Ansari, M.; Panda, S.K.; Nayak, S.; Dutta, S.C. A critical review and assessment for usage of recycled aggregate as sustainable construction material. Constr. Build. Mater. 2017, 131, 721–740. [Google Scholar] [CrossRef]
  16. Ashrafian, A.; Taheri Amiri, M.J.; Masoumi, P.; Asadi-shiadeh, M.; Yaghoubi-chenari, M.; Mosavi, A.; Nabipour, N. Classification-Based Regression Models for Prediction of the Mechanical Properties of Roller-Compacted Concrete Pavement. Appl. Sci. 2020, 10, 3707. [Google Scholar] [CrossRef]
  17. Thi Mai, H.-V.; Hoang Trinh, S.; Ly, H.-B. Enhancing Compressive strength prediction of Roller Compacted concrete using Machine learning techniques. Measurement 2023, 218, 113196. [Google Scholar] [CrossRef]
  18. Zhang, G.; Hamzehkolaei, N.S.; Rashnoozadeh, H.; Band, S.S.; Mosavi, A. Reliability assessment of compressive and splitting tensile strength prediction of roller compacted concrete pavement: Introducing MARS-GOA-MCS. Int. J. Pavement Eng. 2022, 23, 5030–5047. [Google Scholar] [CrossRef]
  19. ACI. Guide to Roller Compacted Concrete Pavements; Reported by ACI Committee 327; ACI 327R-14; American Concrete Institute: Farmington Hills, MI, USA, 2014. [Google Scholar]
  20. Lam, N.-T.-M.; Nguyen, D.-L.; Le, D.-H. Predicting compressive strength of roller-compacted concrete pavement containing steel slag aggregate and fly ash. Int. J. Pavement Eng. 2022, 23, 731–744. [Google Scholar] [CrossRef]
  21. Nguyen, T.-D.; Cherif, R.; Mahieux, P.-Y.; Lux, J.; Aït-Mokhtar, A.; Bastidas-Arteaga, E. Artificial intelligence algorithms for prediction and sensitivity analysis of mechanical properties of recycled aggregate concrete: A review. J. Build. Eng. 2023, 66, 105929. [Google Scholar] [CrossRef]
  22. Marcelino, P.; de Lurdes Antunes, M.; Fortunato, E.; Gomes, M.C. Machine learning approach for pavement performance prediction. Int. J. Pavement Eng. 2021, 22, 341–354. [Google Scholar] [CrossRef]
  23. Sholevar, N.; Golroo, A.; Esfahani, S.R. Machine learning techniques for pavement condition evaluation. Autom. Constr. 2022, 136, 104190. [Google Scholar] [CrossRef]
  24. Zhang, Y.; Marie d’Avigneau, A.; Hadjidemetriou, G.M.; de Silva, L.; Girolami, M.; Brilakis, I. Bayesian dynamic modelling for probabilistic prediction of pavement condition. Eng. Appl. Artif. Intell. 2024, 133, 108637. [Google Scholar] [CrossRef]
  25. Chen, C.; Chandra, S.; Han, Y.; Seo, H. Deep Learning-Based Thermal Image Analysis for Pavement Defect Detection and Classification Considering Complex Pavement Conditions. Remote Sens. 2022, 14, 106. [Google Scholar] [CrossRef]
  26. Hoang, N.-D. Estimating the compressive strength of roller compacted concrete using a novel swarm-optimised light gradient boosting machine. Int. J. Pavement Eng. 2023, 24, 2270765. [Google Scholar] [CrossRef]
  27. Abhilash, P.T.; Satyanarayana, P.V.V.; Tharani, K. Prediction of compressive strength of roller compacted concrete using regression analysis and artificial neural networks. Innov. Infrastruct. Solut. 2021, 6, 218. [Google Scholar] [CrossRef]
  28. Salimbahrami, S.R.; Shakeri, R. Experimental investigation and comparative machine-learning prediction of compressive strength of recycled aggregate concrete. Soft Comput. 2021, 25, 919–932. [Google Scholar] [CrossRef]
  29. Peng, Y.; Unluer, C. Modeling the mechanical properties of recycled aggregate concrete using hybrid machine learning algorithms. Resour. Conserv. Recycl. 2023, 190, 106812. [Google Scholar] [CrossRef]
  30. Debbarma, S.; Ransinchung, R.N.G.D. Using artificial neural networks to predict the 28-day compressive strength of roller-compacted concrete pavements containing RAP aggregates. Road Mater. Pavement Des. 2022, 23, 149–167. [Google Scholar] [CrossRef]
  31. Kovačević, M.; Hadzima-Nyarko, M.; Grubeša, I.N.; Radu, D.; Lozančić, S. Application of Artificial Intelligence Methods for Predicting the Compressive Strength of Green Concretes with Rice Husk Ash. Mathematics 2024, 12, 66. [Google Scholar] [CrossRef]
  32. Chou, J.-S.; Chen, L.-Y.; Liu, C.-Y. Forensic-based investigation-optimized extreme gradient boosting system for predicting compressive strength of ready-mixed concrete. J. Comput. Des. Eng. 2022, 10, 425–445. [Google Scholar] [CrossRef]
  33. Nguyen, N.-H.; Abellán-García, J.; Lee, S.; Garcia-Castano, E.; Vo, T.P. Efficient estimating compressive strength of ultra-high performance concrete using XGBoost model. J. Build. Eng. 2022, 52, 104302. [Google Scholar] [CrossRef]
  34. Chen, W.; Hasanipanah, M.; Nikafshan Rad, H.; Jahed Armaghani, D.; Tahir, M.M. A new design of evolutionary hybrid optimization of SVR model in predicting the blast-induced ground vibration. Eng. Comput. 2021, 37, 1455–1471. [Google Scholar] [CrossRef]
  35. Hoang, N.-D.; Tran, V.-D.; Tran, X.-L. Predicting Compressive Strength of High-Performance Concrete Using Hybridization of Nature-Inspired Metaheuristic and Gradient Boosting Machine. Mathematics 2024, 12, 1267. [Google Scholar] [CrossRef]
  36. Jamhiri, B.; Xu, Y.; Jalal, F.E.; Chen, Y. Hybridizing Neural Network with Trend-Adjusted Exponential Smoothing for Time-Dependent Resistance Forecast of Stabilized Fine Sands Under Rapid shearing. Transp. Infrastruct. Geotechnol. 2023, 10, 62–81. [Google Scholar] [CrossRef]
  37. Jamhiri, B.; Jalal, F.E.; Chen, Y. Hybridizing multivariate robust regression analyses with growth forecast in evaluation of shear strength of zeolite–alkali activated sands. Multiscale Multidiscip. Model. Exp. Des. 2022, 5, 317–335. [Google Scholar] [CrossRef]
  38. Chou, J.-S.; Pham, T.-B.-Q. Enhancing soil liquefaction risk assessment with metaheuristics and hybrid learning techniques. Georisk Assess. Manag. Risk Eng. Syst. Geohazards 2024, 1–19. [Google Scholar] [CrossRef]
  39. Xiong, Q.; Xiong, H.; Kong, Q.; Ni, X.; Li, Y.; Yuan, C. Machine learning-driven seismic failure mode identification of reinforced concrete shear walls based on PCA feature extraction. Structures 2022, 44, 1429–1442. [Google Scholar] [CrossRef]
  40. Chen, N.; Zhao, S.; Gao, Z.; Wang, D.; Liu, P.; Oeser, M.; Hou, Y.; Wang, L. Virtual mix design: Prediction of compressive strength of concrete with industrial wastes using deep data augmentation. Constr. Build. Mater. 2022, 323, 126580. [Google Scholar] [CrossRef]
  41. Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316. [Google Scholar] [CrossRef]
  42. Nguyen, H.; Hoang, N.-D. Computer vision-based classification of concrete spall severity using metaheuristic-optimized Extreme Gradient Boosting Machine and Deep Convolutional Neural Network. Autom. Constr. 2022, 140, 104371. [Google Scholar] [CrossRef]
  43. Kavzoglu, T.; Teke, A. Advanced hyperparameter optimization for improved spatial prediction of shallow landslides using extreme gradient boosting (XGBoost). Bull. Eng. Geol. Environ. 2022, 81, 201. [Google Scholar] [CrossRef]
  44. Nguyen, H.; Vu, T.; Vo, T.P.; Thai, H.-T. Efficient machine learning models for prediction of concrete strengths. Constr. Build. Mater. 2021, 266, 120950. [Google Scholar] [CrossRef]
  45. Khan, M.I.; Abbas, Y.M. Robust extreme gradient boosting regression model for compressive strength prediction of blast furnace slag and fly ash concrete. Mater. Today Commun. 2023, 35, 105793. [Google Scholar] [CrossRef]
  46. Gogineni, A.; Panday, I.K.; Kumar, P.; Paswan, R.K. Predicting compressive strength of concrete with fly ash and admixture using XGBoost: A comparative study of machine learning algorithms. Asian J. Civ. Eng. 2024, 25, 685–698. [Google Scholar] [CrossRef]
  47. Daoud, M.S.; Shehab, M.; Al-Mimi, H.M.; Abualigah, L.; Zitar, R.A.; Shambour, M.K.Y. Gradient-Based Optimizer (GBO): A Review, Theory, Variants, and Applications. Arch. Comput. Methods Eng. 2023, 30, 2431–2449. [Google Scholar] [CrossRef] [PubMed]
  48. Debbarma, S.; Ransinchung, G.D.R.N.; Singh, S.; Sahdeo, S.K. Utilization of industrial and agricultural wastes for productions of sustainable roller compacted concrete pavement mixes containing reclaimed asphalt pavement aggregates. Resour. Conserv. Recycl. 2020, 152, 104504. [Google Scholar] [CrossRef]
  49. Hosseinnezhad, H.; Hatungimana, D.; Yazıcı, Ş.; Ramyar, K. Mechanical properties of roller compacted concrete containing recycled concrete aggregate. Rev. Construcción 2021, 20, 277–290. [Google Scholar] [CrossRef]
  50. Abedalqader, A.; Shatarat, N.; Ashteyat, A.; Katkhuda, H. Influence of temperature on mechanical properties of recycled asphalt pavement aggregate and recycled coarse aggregate concrete. Constr. Build. Mater. 2021, 269, 121285. [Google Scholar] [CrossRef]
  51. Abut, Y.; Taner Yildirim, S. Structural Design and Economic Evaluation of Roller Compacted Concrete Pavement with Recycled Aggregates. IOP Conf. Ser. Mater. Sci. Eng. 2017, 245, 022064. [Google Scholar] [CrossRef]
  52. Debbarma, S.; Ransinchung, G.D.R.N.; Singh, S. Feasibility of roller compacted concrete pavement containing different fractions of reclaimed asphalt pavement. Constr. Build. Mater. 2019, 199, 508–525. [Google Scholar] [CrossRef]
  53. Fardin, H.E.; Santos, A.G.d. Roller Compacted Concrete with Recycled Concrete Aggregate for Paving Bases. Sustainability 2020, 12, 3154. [Google Scholar] [CrossRef]
  54. Lopez-Uceda, A.; Ayuso, J.; Jiménez, J.R.; Galvín, A.P.; Del Rey, I. Feasibility study of roller compacted concrete with recycled aggregates as base layer for light-traffic roads. Road Mater. Pavement Des. 2020, 21, 276–288. [Google Scholar] [CrossRef]
  55. Mahdavi, A.; Moghaddam, A.M.; Dareyni, M. Durability and Mechanical Properties of Roller Compacted Concrete Containing Coarse Reclaimed Asphalt Pavement. Balt. J. Road Bridge Eng. 2021, 16, 82–110. [Google Scholar] [CrossRef]
  56. Modarres, A.; Hosseini, Z. Mechanical properties of roller compacted concrete containing rice husk ash with original and recycled asphalt pavement material. Mater. Des. 2014, 64, 227–236. [Google Scholar] [CrossRef]
  57. Sheikh, E.; Mousavi, S.R.; Afshoon, I. Producing green Roller Compacted Concrete (RCC) using fine copper slag aggregates. J. Clean. Prod. 2022, 368, 133005. [Google Scholar] [CrossRef]
  58. MathWorks. Get Started with Image Tool. MATLAB. 2021. Available online: https://www.mathworks.com/help/images/get-started-with-imtool.html (accessed on 18 January 2024).
  59. Ashteyat, A.; Obaidat, A.; Kirgiz, M.; AlTawallbeh, B. Production of Roller Compacted Concrete Made of Recycled Asphalt Pavement Aggregate and Recycled Concrete Aggregate and Silica Fume. Int. J. Pavement Res. Technol. 2022, 15, 987–1002. [Google Scholar] [CrossRef]
  60. Kheirbek, A.; Ibrahim, A.; Asaad, M.; Wardeh, G. Experimental Study on the Physical and Mechanical Characteristics of Roller Compacted Concrete Made with Recycled Aggregates. Infrastructures 2022, 7, 54. [Google Scholar] [CrossRef]
  61. Settari, C.; Debieb, F.; Kadri, E.H.; Boukendakdji, O. Assessing the effects of recycled asphalt pavement materials on the performance of roller compacted concrete. Constr. Build. Mater. 2015, 101, 617–621. [Google Scholar] [CrossRef]
  62. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
  63. Lee, S.; Park, J.; Kim, N.; Lee, T.; Quagliato, L. Extreme gradient boosting-inspired process optimization algorithm for manufacturing engineering applications. Mater. Des. 2023, 226, 111625. [Google Scholar] [CrossRef]
  64. Cao, M.-T. Advanced soft computing techniques for predicting punching shear strength. J. Build. Eng. 2023, 79, 107800. [Google Scholar] [CrossRef]
  65. Efron, B. Regression percentiles using asymmetric squared error loss. Stat. Sin. 1991, 1, 93–125. [Google Scholar]
  66. Ahmadianfar, I.; Bozorg-Haddad, O.; Chu, X. Gradient-based optimizer: A new metaheuristic optimization algorithm. Inf. Sci. 2020, 540, 131–159. [Google Scholar] [CrossRef]
  67. Rezk, H.; Ferahtia, S.; Djeroui, A.; Chouder, A.; Houari, A.; Machmoum, M.; Abdelkareem, M.A. Optimal parameter estimation strategy of PEM fuel cell using gradient-based optimizer. Energy 2022, 239, 122096. [Google Scholar] [CrossRef]
  68. Premkumar, M.; Jangir, P.; Sowmya, R. MOGBO: A new Multiobjective Gradient-Based Optimizer for real-world structural optimization problems. Knowl. Based Syst. 2021, 218, 106856. [Google Scholar] [CrossRef]
  69. Marzouk, M.; Elhakeem, A.; Adel, K. Artificial Neural Networks Applications in Construction and Building Engineering (1991–2021): Science Mapping and Visualization. Appl. Soft Comput. 2024, 152, 111174. [Google Scholar] [CrossRef]
  70. Nunez, I.; Marani, A.; Flah, M.; Nehdi, M.L. Estimating compressive strength of modern concrete mixtures using computational intelligence: A systematic review. Constr. Build. Mater. 2021, 310, 125279. [Google Scholar] [CrossRef]
  71. Hagan, M.T.; Menhaj, M.B. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Netw. 1994, 5, 989–993. [Google Scholar] [CrossRef]
  72. Ly, H.-B.; Nguyen, M.H.; Pham, B.T. Metaheuristic optimization of Levenberg–Marquardt-based artificial neural network using particle swarm optimization for prediction of foamed concrete compressive strength. Neural Comput. Appl. 2021, 33, 17331–17351. [Google Scholar] [CrossRef]
  73. B K A, M.A.R.; Ngamkhanong, C.; Wu, Y.; Kaewunruen, S. Recycled Aggregates Concrete Compressive Strength Prediction Using Artificial Neural Networks (ANNs). Infrastructures 2021, 6, 17. [Google Scholar] [CrossRef]
  74. Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. In Proceedings of the 9th International Conference on Neural Information Processing Systems, Denver, Colorado, 3–5 December 1996; pp. 155–161. [Google Scholar]
  75. Naganna, S.R.; Deka, P.C. Support vector machine applications in the field of hydrology: A review. Appl. Soft Comput. 2014, 19, 372–386. [Google Scholar] [CrossRef]
  76. Quinlan, J.R. Learning with continuous classes. In Proceedings of the Australian Joint Conference on Artificial Intelligence, Hobart, Australia, 16–18 November 1992; pp. 343–348. [Google Scholar]
  77. Behnood, A.; Behnood, V.; Modiri Gharehveran, M.; Alyamac, K.E. Prediction of the compressive strength of normal and high-performance concretes using M5P model tree algorithm. Constr. Build. Mater. 2017, 142, 199–207. [Google Scholar] [CrossRef]
  78. Ayaz, Y.; Kocamaz, A.F.; Karakoç, M.B. Modeling of compressive strength and UPV of high-volume mineral-admixtured concrete using rule-based M5 rule and tree model M5P classifiers. Constr. Build. Mater. 2015, 94, 235–240. [Google Scholar] [CrossRef]
  79. Gholampour, A.; Mansouri, I.; Kisi, O.; Ozbakkaloglu, T. Evaluation of mechanical properties of concretes containing coarse recycled concrete aggregates using multivariate adaptive regression splines (MARS), M5 model tree (M5Tree), and least squares support vector regression (LSSVR) models. Neural Comput. Appl. 2020, 32, 295–308. [Google Scholar] [CrossRef]
  80. Jain, S.; Barai, S.V. Prediction of Compressive Strength of Concrete Using M5′ Model Tree Algorithm: A Parametric Study. In Progress in Advanced Computing and Intelligent Engineering; Springer: Singapore, 2018; pp. 425–432. [Google Scholar]
  81. XGBoost. XGBoost Documentation. 2021. Available online: https://xgboost.readthedocs.io/en/stable/index.html (accessed on 30 December 2021).
  82. Nguyen, V.T.; Seyedali, M. MEALPY: An open-source library for latest meta-heuristic algorithms in Python. J. Syst. Archit. 2023, 139, 102871. [Google Scholar] [CrossRef]
  83. Singh, D.; Singh, B. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 2020, 97, 105524. [Google Scholar] [CrossRef]
  84. Bi, J.; Bennett, K. Regression Error Characteristic Curves. In Proceedings of the 20th International Conference on Machine Learning (ICML), Washington, DC, USA, 21–24 August 2003. [Google Scholar]
  85. Kandiri, A.; Sartipi, F.; Kioumarsi, M. Predicting Compressive Strength of Concrete Containing Recycled Aggregate Using Modified ANN with Different Optimization Algorithms. Appl. Sci. 2021, 11, 485. [Google Scholar] [CrossRef]
  86. MathWorks. Statistics and Machine Learning Toolbox User’s Guide; MathWorks Inc.: Natick, MA, USA, 2017; Available online: https://www.mathworks.com/help/pdf_doc/stats/stats.pdf (accessed on 28 April 2018).
  87. Jekabsons, G. M5PrimeLab—M5′ Regression Tree, Model Tree, and Tree Ensemble Toolbox for Matlab/Octave; ver. 1.8.0; Riga Technical University Institute of Applied Computer Systems: Riga, Latvia, 2020; Available online: http://www.cs.rtu.lv/jekabsons/Files/M5PrimeLab.pdf (accessed on 10 January 2023).
  88. Wong, T.; Yeh, P. Reliable Accuracy Estimates from k-Fold Cross Validation. IEEE Trans. Knowl. Data Eng. 2020, 32, 1586–1594. [Google Scholar] [CrossRef]
  89. Hariri-Ardebili, M.A.; Mahdavi, G. Generalized uncertainty in surrogate models for concrete strength prediction. Eng. Appl. Artif. Intell. 2023, 122, 106155. [Google Scholar] [CrossRef]
  90. Conover, W.J. Practical Nonparametric Statistics; John Wiley & Sons: Hoboken, NJ, USA, 1999. [Google Scholar]
  91. Mohseni, N.; Nematzadeh, H.; Akbarib, E.; Motameni, H. Outlier Detection in Test Samples using Standard Deviation and Unsupervised Training Set Selection. Int. J. Eng. 2023, 36, 119–129. [Google Scholar] [CrossRef]
  92. Lundberg, S.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4768–4777. [Google Scholar]
  93. Ekanayake, I.U.; Meddage, D.P.P.; Rathnayake, U. A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP). Case Stud. Constr. Mater. 2022, 16, e01059. [Google Scholar] [CrossRef]
  94. Tavakoli, D.; Fakharian, P.; de Brito, J. Mechanical properties of roller-compacted concrete pavement containing recycled brick aggregates and silica fume. Road Mater. Pavement Des. 2022, 23, 1793–1814. [Google Scholar] [CrossRef]
  95. Python. Tkinter—Python Interface to Tcl/Tk. 2023. Available online: https://docs.python.org/3/library/tkinter.html (accessed on 5 July 2023).
  96. Karimpour, A. Effect of time span between mixing and compacting on roller compacted concrete (RCC) containing ground granulated blast furnace slag (GGBFS). Constr. Build. Mater. 2010, 24, 2079–2083. [Google Scholar] [CrossRef]
  97. Zhang, Y.-M.; Wang, H.; Mao, J.-X.; Xu, Z.-D.; Zhang, Y.-F. Probabilistic Framework with Bayesian Optimization for Predicting Typhoon-Induced Dynamic Responses of a Long-Span Bridge. J. Struct. Eng. 2021, 147, 04020297. [Google Scholar] [CrossRef]
  98. Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
  99. Tran, V.D.; Hoang, N.D. A Neural Network-Based Asphalt Pavement Crack Classification Model Using Image Processing and Random Boosted Differential Flower Pollination. Int. J. Pavement Res. Technol. 2024, 17, 563–576. [Google Scholar] [CrossRef]
Figure 1. Distributions of the predictor variables.
Figure 2. Distribution of the CS.
Figure 3. Linear correlations between the predictor variables and CS.
Figure 4. Flowchart of GBO-XGBoost.
Figure 5. The GBO searching progress.
Figure 6. Results of CS estimation. (a) GBO-XGBoost, (b) SVM, (c) ANN, (d) M5-MT.
Figure 7. Performance improvement gained by GBO-XGBoost.
Figure 8. Distribution of the residual’s range. (a) GBO-XGBoost, (b) SVM, (c) ANN, (d) M5-MT.
Figure 9. Analysis based on regression error characteristic curves.
Figure 10. Distribution of model prediction results.
Figure 11. Influence of the explanation obtained from SHAP.
Figure 12. Residual histogram of GBO-XGBoost using symmetric loss function.
Figure 13. The model performance with respect to different values of γ.
Figure 14. The distribution of the residuals obtained from GBO-XGBoost using ASEL.
Figure 15. The prediction performance of the GBO-XGBoost using ASEL.
Figure 16. Comparison between ASEL and SEL-based models.
Figure 17. Graphical user interface of the proposed machine learning model.
Table 1. Summary of data sources.

Data Source | Number of Samples | Proportion (%) | Specimen Type | Ref.
1 | 16 | 5.93 | 100 × 200 mm cylinder | [50]
2 | 6 | 2.22 | 150 × 300 mm cylinder | [51]
3 | 21 | 7.78 | 150 × 150 × 150 mm cube | [52]
4 | 12 | 4.44 | 150 × 150 × 150 mm cube | [48]
5 | 8 | 2.96 | 150 × 300 mm cylinder | [53]
6 | 60 | 22.22 | 150 × 300 mm cylinder | [49]
7 | 75 | 27.78 | 150 × 300 mm cylinder | [20]
8 | 24 | 8.89 | 150 × 300 mm cylinder | [54]
9 | 15 | 5.56 | 100 × 100 × 100 mm cube | [55]
10 | 12 | 4.44 | 100 × 200 mm cylinder | [56]
11 | 21 | 7.78 | 150 × 300 mm cylinder | [57]
Table 2. Correlation factors for CS conversion.

Specimen | Cube | Cube | Cylinder | Cylinder
Dimension (mm) | 150 | 100 | 100 × 200 | 150 × 300
Correlation factor | 1.119 | 1.000 | 1.020 | 1.063
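The specimen-shape factors of Table 2 can be applied as a simple lookup; the sketch below assumes the conversion is multiplicative toward a common reference strength, which is how such factors are typically used (the direction of the conversion is an assumption here):

```python
# Hedged sketch: converting a measured compressive strength (MPa) to a
# common basis using the Table 2 specimen correlation factors.
# The multiplicative direction of the conversion is an assumption.
FACTORS = {
    ("cube", "150"): 1.119,
    ("cube", "100"): 1.000,
    ("cylinder", "100 x 200"): 1.020,
    ("cylinder", "150 x 300"): 1.063,
}

def convert_cs(cs_measured, specimen, dimension_mm):
    """Scale a measured CS by the factor for its specimen type and size."""
    return cs_measured * FACTORS[(specimen, dimension_mm)]
```

Such a conversion is what allows the 270 samples from cube and cylinder tests in Table 1 to be pooled into one dataset.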
Table 3. Statistical descriptions of the variables in the dataset.

Variable | Unit | Notation | Min | Average | Std. | Skewness | Max
Quantity of cement | kg/m³ | X1 | 142.00 | 250.15 | 67.88 | 0.12 | 375.00
Quantity of fly ash | kg/m³ | X2 | 0.00 | 16.99 | 33.51 | 1.81 | 112.00
Quantity of water | kg/m³ | X3 | 60.20 | 137.84 | 37.40 | −0.17 | 212.14
Type of recycled aggregate (*) | -- | X4 | 0.00 | -- | -- | -- | 3.00
Quantity of natural coarse aggregate | kg/m³ | X5 | 0.00 | 540.51 | 417.46 | 0.04 | 1305.00
Quantity of natural fine aggregate | kg/m³ | X6 | 0.00 | 957.09 | 339.55 | −1.17 | 1338.00
Quantity of recycled coarse aggregate | kg/m³ | X7 | 0.00 | 403.05 | 391.01 | 0.53 | 1264.40
Quantity of recycled fine aggregate | kg/m³ | X8 | 0.00 | 83.24 | 212.97 | 2.73 | 952.00
Concrete age | day | X9 | 3.00 | 45.78 | 46.82 | 1.35 | 180.00
Compressive strength | MPa | Y | 7.02 | 30.08 | 11.55 | 0.52 | 62.08

(*) X4 is a categorical variable with a median of 1. The frequency of values in X4 is demonstrated in Figure 1.
Table 4. Prediction performance.

Phase | Index | GBO-XGBoost (Mean/Std) | SVM (Mean/Std) | ANN (Mean/Std) | M5-MT (Mean/Std)
Training | RMSE | 1.247/0.077 | 1.250/0.059 | 3.322/0.637 | 4.034/0.377
Training | MAPE (%) | 3.524/0.251 | 3.966/0.091 | 8.988/2.243 | 11.501/1.022
Training | R2 | 0.988/0.001 | 0.988/0.001 | 0.914/0.036 | 0.876/0.023
Testing | RMSE | 2.639/0.381 | 4.048/1.128 | 4.493/0.870 | 5.855/0.934
Testing | MAPE (%) | 7.823/1.502 | 12.027/3.078 | 12.912/2.877 | 17.692/3.630
Testing | R2 | 0.941/0.020 | 0.866/0.082 | 0.835/0.069 | 0.722/0.119
Table 5. Effect of potential outliers on GBO-XGBoost’s performance.

Phase | Index | Scenario 1 (Mean/Std) | Scenario 2 (Mean/Std)
Training | RMSE | 1.247/0.077 | 1.263/0.056
Training | MAPE (%) | 3.524/0.251 | 3.722/0.196
Training | R2 | 0.988/0.001 | 0.985/0.001
Testing | RMSE | 2.639/0.381 | 2.742/0.538
Testing | MAPE (%) | 7.823/1.502 | 8.382/2.093
Testing | R2 | 0.941/0.020 | 0.921/0.036
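The R2 values reported in Tables 4 and 5 are the coefficient of determination, i.e., the proportion of variance in the observed CS explained by the model. A minimal computation under its standard definition:

```python
# Coefficient of determination (R^2): 1 - SS_res / SS_tot, the index
# reported alongside RMSE and MAPE in Tables 4 and 5.
def r_squared(y_true, y_pred):
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A value of 0.941 on the testing folds corresponds to the abstract’s claim that the model explains up to 94% of the variation in the CS.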
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hoang, N.-D. Leveraging a Hybrid Machine Learning Approach for Compressive Strength Estimation of Roller-Compacted Concrete with Recycled Aggregates. Mathematics 2024, 12, 2542. https://doi.org/10.3390/math12162542


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
