**1. Introduction**

Modern artificial intelligence (AI) approaches are effective for evaluating difficult problems in the engineering domain. With these techniques, output end products can be predicted from a set of input factors. The two primary families of machine learning (ML) methods used for predicting the properties of concrete are standalone (single-model) methods and ensemble methods, such as AdaBoost and bagging. According to the available literature, the prediction performance of ensemble methods is better than that of individual ML algorithms. Chaabene et al. [1] predicted the mechanical characteristics of concrete by employing ML approaches. Likewise, abundant literature is available on the use of ML for predicting the properties of different concrete types, such as recycled-aggregate [2–5], self-healing [6], materials-integrated [7], and high-performance [8–12] concretes. In a study by Han et al. [9] on predicting high-performance concrete strength via ML approaches, the input parameters considered were i. cement, ii. fine and coarse aggregates, iii. water, iv. fly ash, v. ground-granulated blast furnace slag, and vi. the ageing period. The study concluded that the developed ML model predicted high-performance concrete strength with high precision.

The toughness, ductility, crack resistance, mechanical properties, and fatigue resistance of concrete can be enhanced by adding fibers [13–25]. Incorporating steel fibers in cementitious composites can enhance their post-cracking behavior and toughness [26–29]. Different fiber types, such as steel, artificial, and natural fibers, have been explored in various studies for their potential application as construction materials [25,30–35]. Compared with regular concrete, estimating the properties of steel-fiber-reinforced concrete (SFRC) involves additional factors, such as the aspect ratio, type, and volumetric content of the steel fibers. However, the development of appropriate estimation models for SFRC is relatively new. Accordingly, conventional (linear and nonlinear) regression models have typically been employed to determine SFRC flexural strength (FS).

**Citation:** Anjum, M.; Khan, K.; Ahmad, W.; Ahmad, A.; Amin, M.N.; Nafees, A. New SHapley Additive ExPlanations (SHAP) Approach to Evaluate the Raw Materials Interactions of Steel-Fiber-Reinforced Concrete. *Materials* **2022**, *15*, 6261. https://doi.org/10.3390/ma15186261

Academic Editor: Krzysztof Schabowicz

Received: 8 August 2022 Accepted: 1 September 2022 Published: 9 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

ML approaches now allow the properties of different concrete types to be predicted precisely. Experimental investigations require significant effort, time, and cost. Hence, data-modelling-based algorithms are needed to identify closely linked independent parameters, and AI techniques are needed to estimate the properties of novel concrete types. ML techniques for predicting SFRC FS are an effective alternative that saves the cost, time, and effort required for experimental setups. Accordingly, in the current work, the FS of SFRC is predicted using AI-based ML methods. Four ensemble ML approaches are employed to achieve the study objectives: gradient boosting, AdaBoost, extreme gradient boosting (XGBoost), and bagging. In addition, statistical checks are applied to test the models and to compare all the applied models [36–39]. The model that performs best under these statistical checks is proposed for predicting SFRC properties. A game theory approach [40], SHapley Additive exPlanations (SHAP), is then employed to explain the applied ML models through global feature-importance ranking and feature interactions/dependencies. This method provides new insight into how the SFRC ingredients influence FS. It would help researchers to identify adequate SFRC mix combinations and quickly estimate their FS without performing trial experiments. It would also support future research on the strategic development of SFRC with innovative mechanical properties under various constraints. Examples include resource availability (cost, material, and time) and the FS requirements of different construction projects.

This study aims to identify an effective ML approach for precisely estimating the FS of SFRC. Precise prediction of concrete characteristics would support the economical, effective, and efficient design of durable structures. It would also reduce the time, resources, and cost needed to select adequate materials. Furthermore, SHAP analysis is conducted to depict the influence of the raw ingredients on SFRC FS. This analysis has not been performed in previous studies and constitutes the novelty of this work. The suggested prediction approaches would also help researchers in the civil engineering field to develop new materials.

#### **2. SHapley Additive ExPlanations (SHAP)**

Moreover, this work identifies the global feature impacts and the relations between all the input features and the FS of steel-fiber-reinforced concrete. It does so with a game-theory-based model (i.e., SHAP analysis) [41], which broadens the explainability of the suggested algorithm. In SHAP analysis, each individual prediction is explained by quantifying the contribution of each feature through Shapley values, which are derived from coalitional (cooperative) game theory. The Shapley value of a feature value is calculated by averaging its marginal contribution over all possible feature combinations. The magnitude of a Shapley value reflects the strength of the feature's influence. Global feature influence is quantified by averaging the absolute Shapley values of each feature across the database. The features are then sorted in descending order of importance to draw a plot, in which each point represents the Shapley value of one feature for one instance.

The x-axis shows the Shapley values, and the y-axis lists the features in order of importance. A higher position on the y-axis therefore indicates a greater influence on the FS of steel-fiber-reinforced concrete, and a color scale indicates the feature value (from low to high). SHAP dependence plots show the impact of each feature on the FS of steel-fiber-reinforced concrete, with coloring used to depict interactions. These plots provide more detailed information than typical partial dependence plots [40]. The importance of feature *j* for the output of model *f*, *φ<sup>j</sup>*(*f*), is a weighted sum of the feature's contributions to the model output *f*(*x<sub>i</sub>*) over all possible feature combinations [42]. *φ<sup>j</sup>*(*f*) is given by Equation (1):

$$\phi^j(f) = \sum\_{S \subseteq \{\mathbf{x}^1, \dots, \mathbf{x}^p\} \setminus \{\mathbf{x}^j\}} \frac{|S|!\,(p-|S|-1)!}{p!} \left( f\left(S \cup \left\{\mathbf{x}^j\right\} \right) - f(S) \right) \tag{1}$$

where *S* = a subset of the features excluding *x<sup>j</sup>*; *x<sup>j</sup>* = the *j*th feature; and *p* = the number of features in the model.
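A brute-force implementation of Equation (1) makes the weighting concrete. It enumerates every subset *S* of the other features and weights each marginal contribution by |*S*|!(*p* − |*S*| − 1)!/*p*!. This is a minimal sketch under stated assumptions. The coalition value *f*(*S*) is obtained by evaluating a model with the features outside *S* set to a baseline. The toy model, the all-zero baseline, and the function names are hypothetical and are not the study's model. Because the loop visits 2<sup>*p*−1</sup> subsets per feature, it is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, p):
    """Exact Shapley values via Equation (1).

    value: callable mapping a frozenset S of feature indices to f(S).
    p: number of features.
    """
    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for size in range(p):  # |S| runs from 0 to p - 1
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                S = frozenset(subset)
                # Marginal contribution of feature j when added to coalition S.
                total += weight * (value(S | {j}) - value(S))
        phi.append(total)
    return phi


# Hypothetical toy model with an interaction between features 0 and 2.
def model(x):
    return 2 * x[0] + 3 * x[1] + x[0] * x[2]


x = [1.0, 2.0, 3.0]         # instance to explain
baseline = [0.0, 0.0, 0.0]  # assumed reference values for "absent" features


def value(S):
    # f(S): features in S keep their actual values; the others take the baseline.
    return model([x[k] if k in S else baseline[k] for k in range(len(x))])


phi = shapley_values(value, p=3)
print(phi)  # approximately [3.5, 6.0, 1.5]: the interaction term is split evenly
```

The attributions sum to `model(x) - model(baseline)` (11 here). This is the additive behavior that the explanation model below relies on.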

The SHAP technique determines feature importance by quantifying the change in prediction error when a specific feature value is perturbed. The estimated error sensitivity is then used to assign weights to feature importance. SHAP is used to explain the performance of the trained ML model. It represents the model output with an interpretable explanation model, i.e., an additive model that is linear in the input factors. For example, consider a model with input factors *x<sub>i</sub>*, where *i* ranges from 1 to *p* and *p* is the number of input factors. Let *h*(*x<sub>s</sub>*) be the explanation model with the simplified input *x<sub>s</sub>*. The original model *f*(*x*) is then represented by Equation (2):

$$f(\mathbf{x}) = h(\mathbf{x}\_s) = \phi\_0 + \sum\_{i=1}^{p} \phi\_i \mathbf{x}\_s^i \tag{2}$$

where

*p* = number of input features;

*φ*<sub>0</sub> = constant (base value) with no input; and

*φ<sub>i</sub>* = attribution (SHAP value) of input feature *i*.

The mapping function *x* = *m<sub>x</sub>*(*x<sub>s</sub>*) relates the input parameters *x* and *x<sub>s</sub>*. Equation (2) was presented in the literature [43]. In that example, the prediction value *h*(·) increased owing to *φ*<sub>0</sub>, *φ*<sub>1</sub>, and *φ*<sub>3</sub> and decreased owing to *φ*<sub>4</sub>, as presented in Figure 1. Equation (2) has three desirable properties: consistency, local accuracy, and missingness. Consistency ensures that the attribution assigned to a feature does not decrease when the model changes so that the feature has a larger influence. Missingness ensures that no importance is assigned to missing features, i.e., *φ<sub>i</sub>* = 0 when *x<sub>s</sub><sup>i</sup>* = 0. Local accuracy ensures that the sum of the feature attributions equals the model output. This requires the explanation model to reproduce the outcome of *f* for the simplified input *x<sub>s</sub>*, where *x* = *m<sub>x</sub>*(*x<sub>s</sub>*).
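The mapping function and the local accuracy and missingness properties of Equation (2) can be checked numerically. This sketch rests on two assumptions: a hypothetical linear model, and a single baseline input that stands in for "absent" features (one common convention, not necessarily the one used in the study). Under those assumptions the exact Shapley values have the closed form *φ<sub>i</sub>* = *w<sub>i</sub>*(*x<sub>i</sub>* − *b<sub>i</sub>*), with *φ*<sub>0</sub> = *f*(*b*). The names `w`, `b`, `m_x`, and `h` are illustrative only.

```python
from itertools import product

import numpy as np

# Hypothetical linear model f(x) = w . x + c (illustration only).
w = np.array([2.0, -1.0, 0.5])
c = 3.0


def f(x):
    return float(w @ x + c)


x = np.array([4.0, 1.0, 2.0])  # instance being explained
b = np.array([1.0, 1.0, 0.0])  # assumed baseline used for absent features


def m_x(z):
    """Mapping x = m_x(z): z_i = 1 keeps x_i, z_i = 0 substitutes the baseline b_i."""
    return np.where(np.asarray(z) == 1, x, b)


# Closed-form Shapley values for a linear model with a single baseline.
phi0 = f(b)        # base value: model output with every feature absent
phi = w * (x - b)  # per-feature attributions


def h(z):
    """Additive explanation model of Equation (2): phi0 + sum_i phi_i * z_i."""
    return phi0 + float(phi @ np.asarray(z, dtype=float))


# Local accuracy: with all features present, h reproduces f(x).
print(h([1, 1, 1]), f(x))
# Because f is linear, h(z) equals f(m_x(z)) for every simplified input z.
# A feature with z_i = 0 contributes nothing to h (missingness).
print(all(np.isclose(h(z), f(m_x(z))) for z in product([0, 1], repeat=3)))
```

Feature 1 receives zero attribution here because its value equals the baseline. For nonlinear models such as tree ensembles, the agreement between `h(z)` and `f(m_x(z))` holds only at the full input, not for every *z*.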

**Figure 1.** Attributes of SHAP analysis [44].

#### **3. Dataset**

The dataset adopted for estimating the FS of SFRC includes 151 mix designs with nine input parameters and was compiled from the literature [34,45–60]. The input factors are cement (kg/m3), water (kg/m3), sand (kg/m3), coarse aggregate (kg/m3), superplasticizer (%), silica fume (%), Vf (%), fiber length (mm), and fiber diameter (mm). FS is the output parameter of the current study and is estimated from these inputs. The input and output variables are illustrated in Figure 2. The models were implemented in Python using the Spyder environment of the Anaconda distribution [61]. The histogram of the FS values used in this work is presented in Figure 3.
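The nine-input, one-output structure of the dataset can be fixed in code before any model is trained. The sketch below is an assumption for illustration only. The column names, the `split_features_target` helper, and the file name `sfrc_mixes.csv` in the usage note are all hypothetical, since the study's actual file layout is not given.

```python
import pandas as pd

# Hypothetical column names for the nine inputs listed above.
FEATURES = [
    "cement_kg_m3", "water_kg_m3", "sand_kg_m3", "coarse_aggregate_kg_m3",
    "superplasticizer_pct", "silica_fume_pct", "vf_pct",
    "fiber_length_mm", "fiber_diameter_mm",
]
TARGET = "fs_mpa"  # flexural strength (output parameter)


def split_features_target(df: pd.DataFrame):
    """Return the input matrix X (n x 9) and the FS vector y, checking columns first."""
    missing = [col for col in FEATURES + [TARGET] if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    X = df[FEATURES].to_numpy(dtype=float)
    y = df[TARGET].to_numpy(dtype=float)
    return X, y
```

Typical use would be `X, y = split_features_target(pd.read_csv("sfrc_mixes.csv"))`, which for this dataset should yield `X` with shape (151, 9).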

**Figure 2.** Input and output parameters.

**Figure 3.** FS distribution.

### **4. Results and Analysis**

#### *4.1. Decision Tree Adaptive Boosting*

Figure 4 compares the experimental values with the values estimated by the AdaBoost algorithm for the FS of SFRC. The AdaBoost outcomes are reasonable, with little variation in SFRC FS. The R<sup>2</sup> value of 0.90 indicates the suitability of the AdaBoost model. Figure 5 shows the distribution of errors between the experimental and AdaBoost-estimated values of SFRC FS. The error between the experimental and predicted values is plotted on the Z-axis, the experimental values on the X-axis, and the predicted values on the Y-axis. The error between the experimental and AdaBoost-estimated FS values is 3.41 MPa. Of the errors, 43% are less than 1 MPa, 39% are between 1 and 2 MPa, and 18% are more than 2 MPa.
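The percentage bands quoted above can be computed from paired experimental and predicted values with a few lines of NumPy. This is a sketch, and the function name is hypothetical. Treating errors of exactly 1 MPa and 2 MPa as part of the middle band is an assumption, because the text does not state how the boundaries are handled.

```python
import numpy as np


def error_distribution(y_exp, y_pred):
    """Percentage of absolute errors below 1 MPa, from 1 to 2 MPa, and above 2 MPa."""
    err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_exp, dtype=float))
    return {
        "<1 MPa": 100 * np.mean(err < 1.0),
        "1-2 MPa": 100 * np.mean((err >= 1.0) & (err <= 2.0)),
        ">2 MPa": 100 * np.mean(err > 2.0),
    }


# Illustrative values only (not data from the study).
print(error_distribution([5.0, 6.0, 7.0, 8.0], [5.5, 7.5, 9.5, 8.0]))
```

The three bands partition all errors, so the percentages should sum to 100 apart from rounding.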

**Figure 4.** Experimental and AdaBoost predicted results.

**Figure 5.** Estimated AdaBoost and experimental values, with errors.

#### *4.2. Decision Tree Bagging*

Figure 6 compares the experimental and bagging-predicted values of SFRC FS. The bagging R<sup>2</sup> of 0.91 indicates more precise outcomes than those of the AdaBoost model. Figure 7 illustrates the distribution of errors between the experimental and bagging-estimated values of SFRC FS. Of these errors, 43% are below 1 MPa, 43% range from 1 to 2 MPa, and 13% are higher than 2 MPa. The higher R<sup>2</sup> and lower error values of the bagging algorithm indicate higher precision than that of AdaBoost.
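The AdaBoost-versus-bagging comparison can be sketched in scikit-learn by fitting both decision-tree ensembles and scoring them by R<sup>2</sup> on held-out data. In scikit-learn, `AdaBoostRegressor` and `BaggingRegressor` both use decision-tree regressors as their default base learners. Several elements are assumptions rather than settings reported here: the synthetic data from `make_regression` (shaped like the dataset but not SFRC data), the 80/20 split, and the estimator counts.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder with the dataset's shape (151 mixes x 9 inputs); NOT SFRC data.
X, y = make_regression(n_samples=151, n_features=9, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0  # assumed 80/20 split
)

models = {
    # Both ensembles default to decision-tree regressors as base learners.
    "AdaBoost": AdaBoostRegressor(n_estimators=100, random_state=0),
    "Bagging": BaggingRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

print(scores)
```

AdaBoost fits trees sequentially on reweighted errors, whereas bagging averages trees fitted on bootstrap resamples. The ranking on any particular dataset therefore has to be checked empirically, as is done above for SFRC FS.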

**Figure 6.** Experimental and bagging predicted results.

**Figure 7.** Estimated bagging and experimental values, with errors.

#### *4.3. Gradient Boosting*

Figure 8 presents the gradient-boosting-predicted and experimental values of the SFRC output parameter. The R<sup>2</sup> of 0.92 indicates more accurate gradient boosting outcomes than those of the bagging model. Gradient boosting is the most accurate of all the considered models. Figure 9 illustrates the distribution of errors between the experimental and gradient-boosting-predicted values. Of these errors, 48% are less than 1 MPa, 50% range from 1 to 2 MPa, and 2% are above 2 MPa. Gradient boosting is also more precise than AdaBoost.

**Figure 8.** Experimental and gradient boosting predicted results.

**Figure 9.** Estimated gradient boosting and experimental values, with errors.

#### *4.4. Extreme Gradient Boosting*

Figure 10 compares the experimental values with the values estimated by extreme gradient boosting for the considered SFRC output parameter. The R<sup>2</sup> of 0.87 for extreme gradient boosting indicates lower accuracy than all the other considered algorithms. Figure 11 presents the distribution of errors between the experimental and extreme-gradient-boosting-predicted FS values. Of these errors, 50% are below 1 MPa, 43% are between 1 and 2 MPa, and the remaining 7% are above 2 MPa. The lower R<sup>2</sup> and higher error values indicate less accurate outcomes for the extreme gradient boosting algorithm than for bagging, AdaBoost, and gradient boosting. By contrast, the low errors and high R<sup>2</sup> of the bagging model are adequate and indicate accurate prediction. These findings suggest that bagging may predict outcomes more accurately than AdaBoost and extreme gradient boosting, although gradient boosting remains the most accurate of the considered models.

**Figure 10.** Experimental and estimated extreme gradient boosting outcomes.

**Figure 11.** Estimated extreme gradient boosting and experimental values, with errors.

#### *4.5. Comparison of All Models*

The validity of the models is evaluated by applying the k-fold cross-validation technique. Their performance is assessed with statistical checks [36–39]. Generally, the data are randomly split into 10 groups (folds) for k-fold cross-validation, and the process is repeated ten times to obtain satisfactory outcomes. Table 1 lists the statistical checks for all the models. The R<sup>2</sup> values are 0.92, 0.87, 0.90, and 0.91 for the gradient boosting, extreme gradient boosting, AdaBoost, and bagging models, respectively, as represented in Figure 12a–d. The MAE and RMSE are calculated using Equations (3) and (4), taken from previous studies [36–39]. Gradient boosting shows lower errors and a higher R<sup>2</sup> than all the other considered models for SFRC flexural strength.

$$\text{MAE} = \frac{1}{n} \sum\_{i=1}^{n} \left| y\_{pred,i} - y\_{ref,i} \right| \tag{3}$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} \left( y\_{pred,i} - y\_{ref,i} \right)^2} \tag{4}$$

where *n* = the total number of data points, *y<sub>ref,i</sub>* = the *i*th reference value of the data, and *y<sub>pred,i</sub>* = the corresponding value predicted by the model.
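Equations (3) and (4) translate directly into code. This plain-Python sketch implements them, together with the coefficient of determination (R<sup>2</sup> = 1 − SS<sub>res</sub>/SS<sub>tot</sub>) that is used throughout this section but not written out. The function names are hypothetical, and the three-point example is illustrative only.

```python
import math


def mae(y_ref, y_pred):
    """Mean absolute error, Equation (3)."""
    return sum(abs(p - r) for p, r in zip(y_pred, y_ref)) / len(y_ref)


def rmse(y_ref, y_pred):
    """Root-mean-square error, Equation (4)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(y_pred, y_ref)) / len(y_ref))


def r2(y_ref, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(y_ref) / len(y_ref)
    ss_res = sum((r - p) ** 2 for p, r in zip(y_pred, y_ref))
    ss_tot = sum((r - mean_ref) ** 2 for r in y_ref)
    return 1.0 - ss_res / ss_tot


y_ref = [10.0, 12.0, 14.0]   # illustrative reference values
y_pred = [11.0, 12.0, 12.0]  # illustrative predictions
print(mae(y_ref, y_pred))    # 1.0
print(rmse(y_ref, y_pred))   # sqrt(5/3), about 1.291
print(r2(y_ref, y_pred))     # 0.375
```

RMSE is never smaller than MAE. The gap between the two widens when a few predictions carry large errors, which is why both are reported.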

**Table 1.** Extreme gradient boosting, bagging, and AdaBoost model statistical checks.


In the current study, SFRC FS is estimated with ensemble ML techniques, with a focus on providing reliable and efficient outcomes. Gradient boosting, with an R<sup>2</sup> of 0.92 and the lowest MAE and RMSE, provides the most precise estimates of SFRC FS. Among the 20 sub-models used to obtain an optimized model for SFRC FS prediction (Figure 13a–d), the ensemble gradient-boosting model shows superior performance in terms of MAE (1.07) and RMSE (1.34). Therefore, among all the considered models, the ensemble gradient-boosting model provides the highest accuracy and the lowest error.
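The sub-model search and the k-fold validation can be combined in one scikit-learn sketch. It trains 20 gradient-boosting sub-models with 10, 20, ..., 200 estimators, scores each by 10-fold cross-validation (R<sup>2</sup>, MAE, RMSE), and keeps the best. Several choices here are assumptions. The step of 10 estimators follows from "twenty sub-models" spanning 10 to 200 estimators. Each fold serving once as the test set is one reading of the ten repetitions, and `RepeatedKFold` would apply if repeated reshuffling is meant. The data are synthetic, not the SFRC dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_validate

# Synthetic placeholder with the dataset's shape (151 mixes x 9 inputs); NOT SFRC data.
X, y = make_regression(n_samples=151, n_features=9, noise=10.0, random_state=0)

# 10-fold cross-validation with shuffling; each fold serves once as the test set.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scoring = ("r2", "neg_mean_absolute_error", "neg_root_mean_squared_error")

results = []
for n_estimators in range(10, 201, 10):  # 20 sub-models: 10, 20, ..., 200 estimators
    model = GradientBoostingRegressor(n_estimators=n_estimators, random_state=0)
    cv_out = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results.append({
        "n_estimators": n_estimators,
        "r2": cv_out["test_r2"].mean(),
        # scikit-learn reports errors negated so that "greater is better".
        "mae": -cv_out["test_neg_mean_absolute_error"].mean(),
        "rmse": -cv_out["test_neg_root_mean_squared_error"].mean(),
    })

best = max(results, key=lambda row: row["r2"])
print(best)
```

Selecting the sub-model on cross-validated scores, rather than on a single test split, makes the choice less sensitive to how the 151 mixes happen to be divided.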


**Figure 12.** (**a**) AdaBoost; (**b**) bagging; (**c**) gradient boosting; and (**d**) extreme gradient boosting statistical representation.


**Figure 13.** (**a**) AdaBoost; (**b**) bagging; (**c**) gradient boosting; and (**d**) extreme gradient boosting sub models' outcomes.

#### *4.6. Enhanced Explainability for Machine Learning Algorithms*

This work also presents a detailed explanation of the machine learning algorithm and of the relations among features. First, the SHAP tree explainer [62] is applied to the entire dataset to illustrate the influences of global features through SHAP explanations. This method exploits the internal structure of tree-based models by summing a set of calculations linked to the leaf nodes of the tree model, which results in low-order complexity [62]. SHAP is used to interpret the model for SFRC FS, and the relations of the different features with SFRC flexural strength are represented by SHAP values (Figure 14).
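The global ranking behind a SHAP summary plot reduces to averaging absolute SHAP values per feature and sorting in descending order. In practice, the SHAP value matrix for a single-output tree regressor can be obtained with the `shap` package as `shap.TreeExplainer(model).shap_values(X)`. The NumPy sketch below starts from such a matrix. The function name, feature names, and toy values are hypothetical and are not the study's results.

```python
import numpy as np


def global_importance(shap_values, feature_names):
    """Rank features by mean |SHAP value| over all instances, in descending order."""
    shap_values = np.asarray(shap_values, dtype=float)  # shape (n_instances, n_features)
    mean_abs = np.abs(shap_values).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]
    return [(feature_names[i], float(mean_abs[i])) for i in order]


# Toy SHAP matrix (3 instances x 3 features); illustrative values only.
toy = [[1.0, -0.5, 0.1],
       [-2.0, 0.5, 0.1],
       [1.5, -1.0, -0.1]]
print(global_importance(toy, ["f1", "f2", "f3"]))
```

Averaging absolute values keeps positive and negative contributions from cancelling. The sign of each feature's effect is read from the color pattern of the summary plot or from dependence plots.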

**Figure 14.** SHAP plot.

The volumetric content of steel fiber has the largest SHAP value for SFRC FS estimation, because metallic fibers provide a stitching effect that enhances the mechanical characteristics. Increasing the steel fiber content therefore increases SFRC FS, indicating its positive influence. Figure 14 shows that water content has the second-highest SHAP value. However, its influence is negative: increasing the amount of water decreases the flexural properties of SFRC. Because SFRC strength is based on the particle packing density theory, the water content needs to be limited. Similarly, the third-ranked feature, coarse aggregate content, also negatively influences SFRC FS. Cement, in turn, positively impacts SFRC FS: increasing the cement content increases the strength, and vice versa. Further, the SFRC properties are significantly influenced by silica fume. Sand, however, shows both positive and negative influences, depending on its content. Other features, such as steel fiber diameter, steel fiber length, and superplasticizer, also have minor but distinct influences on SFRC FS.

The interactions of the different features with SFRC FS are depicted in Figure 15. The cement feature interaction is shown in Figure 15a: the amount of cement has a major, direct impact on SFRC FS. Figure 15b illustrates the negative impact of water, with SFRC FS decreasing as the water content increases. The sand feature interaction is provided in Figure 15c. The sand content also has a negative impact on SFRC FS. However, up to 800 kg/m3, sand has little effect on the FS of SFRC, whereas contents beyond 1300 kg/m3 reduce strength. A possible reason is that more sand means a larger surface area of sand particles, which requires more cement paste for coating. This ultimately leaves less cement available for strength development. The superplasticizer content shows both positive and negative interactions, depending on the optimum content (Figure 15d). Contents up to 3% contribute to strength enhancement, whereas higher contents reduce strength. The steel fiber volumetric content has a positive influence up to 2% (Figure 15e), showing its direct relationship with SFRC FS. Similarly, steel fiber length has a positive influence and a direct relationship with SFRC FS, as is evident from Figure 15f. Longer steel fibers enhance the flexural properties of SFRC by providing a more effective bridging mechanism.

**Figure 15.** Interaction plot: (**a**) cement; (**b**) water; (**c**) sand; (**d**) superplasticizer; (**e**) Vf; and (**f**) fiber length.

This prediction is based on the database used in the current study and focuses on strength prediction. However, the interaction between fiber length and diameter is derived from a limited dataset in this study, and more accurate findings can be obtained by including more data points in the future. Expanding the number of data points, including more mixes, and considering a larger number of input factors (such as fiber length and diameter) would allow a far more accurate model of these interactions to be constructed.

To increase the number of data points and improve outcomes in future studies, experimental work, field testing, and numerical analysis using a range of approaches are suggested. The limitations of machine learning methods for estimating the strength properties of concrete have already been documented in a previous study [63].

#### **5. Conclusions**

The construction industry is increasingly focused on using artificial intelligence (AI) approaches to estimate the mechanical properties of concrete. The main focus of this research is to evaluate the accuracy of AI approaches for predicting SFRC FS. It also explores the effect of the raw components on SFRC flexural strength, which has not been studied before and constitutes a research gap. Nine input parameters are considered, and their interactions are analyzed. Based on the conducted study, the following conclusions are drawn:

The gradient-boosting model's higher R<sup>2</sup> value of 0.92 indicates a highly precise estimation of the flexural strength of SFRC relative to the actual data. The extreme gradient boosting, bagging, and AdaBoost models have R<sup>2</sup> values of 0.87, 0.91, and 0.90, respectively, which are within an acceptable range for SFRC flexural strength prediction. Twenty sub-models with 10 to 200 estimators are used to optimize the prediction of SFRC flexural strength. Gradient boosting provided the most effective and accurate forecast of SFRC flexural strength among the considered algorithms.

The k-fold cross-validation findings confirm the higher R<sup>2</sup> and lower MAE and RMSE values of gradient boosting for SFRC FS prediction. Therefore, it can be regarded as the prediction model with the highest precision for the flexural strength of SFRC.

Statistical checks, such as MAE and RMSE, are also applied to evaluate the models' performance. Here again, the higher coefficient of determination and lower error values of gradient boosting for SFRC flexural strength prediction show its superiority.

Hence, it can be concluded that gradient boosting is the best technique for predicting SFRC flexural strength.

The SHAP observations reveal that the volumetric content of steel fiber has the highest influence on SFRC flexural strength, followed by the contents of water, coarse aggregates, and cement. By contrast, superplasticizer content has a minimal impact on SFRC flexural strength.

The feature interaction plot portrays that cement content is a major and positive influencing feature on SFRC flexural strength.

**Author Contributions:** K.K.: conceptualization, funding acquisition, project administration, writing, reviewing, and editing. W.A.: conceptualization, data curation, software, methodology, investigation, validation, supervision, and writing—original draft. M.A.: methodology, investigation, supervision, writing, reviewing, and editing. A.A.: visualization, methodology, software, writing, reviewing, and editing. M.N.A.: resources, validation, writing, reviewing, and editing. A.N.: data curation, software, writing, reviewing, and editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia (Grant No. 1,321), through its KFU Research Summer Initiative.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data is available in the paper.

**Acknowledgments:** The authors acknowledge the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia (Grant No. 1,321), through its KFU Research Summer Initiative.

**Conflicts of Interest:** The authors declare no conflict of interest.
