### *3.1. Artificial Neural Network*

An artificial neural network (ANN) is a supervised learning algorithm that was originally used to simulate the neural network of the human brain [50]. Generally, a simple ANN model consists of an input layer, one or two hidden layers, and an output layer. There are several neurons in each layer of an ANN, where the numbers of neurons in the input layer and the output layer depend on the numbers of input parameters and output parameters, respectively. In this paper, the numbers of neurons in the input layer and the output layer can be seen in Table 1; they equal 6 and 1, respectively.

Except for the output layer, the neurons of each layer need to be activated by an activation function. In ANN, the sigmoid function is generally used as the activation function [51], whose expression can be written as:

$$g(z) = \frac{1}{1 + e^{-z}} \tag{2}$$

$$z = w^{\mathrm{T}} \mathbf{x} + b \tag{3}$$

where *b* is the bias unit; *w* denotes the weight vector of the sample. As described by Rumelhart et al. [52], when ANN is used for regression forecasting, the cost function of the model is as follows:

$$J(w, b) = \frac{1}{m} \sum\_{i=1}^{m} \left( y\_{\text{pred}}^{(i)} - y^{(i)} \right)^2 + \frac{C}{2m} \|w\|^2 \tag{4}$$

where *y*<sub>pred</sub> denotes the value predicted by the machine learning model; *C* is the regularization coefficient.

Since the cost function represents the error between the actual value and the predicted value, a method such as gradient descent is required to decrease the value of the cost function and thereby improve the accuracy of the predicted result. Gradient descent is expressed as:

$$\begin{cases} w\_j := w\_j - \alpha \frac{\partial}{\partial w\_j} J(w, b) \\ b := b - \alpha \frac{\partial}{\partial b} J(w, b) \end{cases} \tag{5}$$

where *α* is the learning rate.
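To make Equations (2)–(5) concrete, the following minimal sketch implements a single sigmoid unit in Python with NumPy. It is an illustration only, not the implementation used in this study; the function names are ours, and the gradient expressions come from applying the chain rule to Equation (4).

```python
import numpy as np

def sigmoid(z):
    """Activation function g(z), Equation (2)."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, y, C):
    """Regularized squared-error cost J(w, b), Equation (4)."""
    m = len(y)
    y_pred = sigmoid(X @ w + b)                  # Equations (2) and (3)
    return np.mean((y_pred - y) ** 2) + C / (2 * m) * (w @ w)

def gradient_step(w, b, X, y, C, alpha):
    """One gradient-descent update, Equation (5)."""
    m = len(y)
    a = sigmoid(X @ w + b)
    delta = 2.0 * (a - y) * a * (1.0 - a) / m    # dJ/dz for each sample
    grad_w = X.T @ delta + C / m * w             # includes the L2 term of Eq. (4)
    grad_b = delta.sum()
    return w - alpha * grad_w, b - alpha * grad_b
```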

Before starting the training procedure, the values of some hyper-parameters need to be set. For the number of neurons in the hidden layer, Shen et al. [53] mentioned a method to determine its value range, which is as follows:

$$\begin{cases} \sum\_{i=0}^{X} C\_{H}^{i} > m \\ H = \sqrt{X + O} + a \\ H = \log\_{2} X \end{cases} \tag{6}$$

where *X* is the number of neurons in the input layer; *H* is the number of neurons in the hidden layer; *O* is the number of neurons in the output layer; *a* is a constant between 1 and 10. According to Equation (6), the number of neurons in the hidden layer ranges from 7 to 13.

In order to better compare the performance of models with different numbers of neurons in the hidden layer, three indicators are introduced, namely the root mean squared error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R<sup>2</sup>), which are, respectively, defined as:

$$\text{RMSE} = \sqrt{\frac{1}{m} \sum\_{i=1}^{m} \left( y\_{\text{pred}}^{(i)} - y^{(i)} \right)^2} \tag{7}$$

$$\text{MAE} = \frac{1}{m} \sum\_{i=1}^{m} \left| y\_{\text{pred}}^{(i)} - y^{(i)} \right| \tag{8}$$

$$R^2 = 1 - \frac{\sum\_{i=1}^{m} \left( y\_{\text{pred}}^{(i)} - y^{(i)} \right)^2}{\sum\_{i=1}^{m} \left( y^{(i)} - \frac{1}{m} \sum\_{i=1}^{m} y^{(i)} \right)^2} \tag{9}$$
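For reference, these three indicators can be computed directly from their definitions; the short sketch below uses NumPy, with the array names assumed for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, Equation (7)."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error, Equation (8)."""
    return np.mean(np.abs(y_pred - y_true))

def r2(y_true, y_pred):
    """Coefficient of determination, Equation (9)."""
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```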

The performance of ANNs with different numbers of neurons in the hidden layer is shown in Table 2. According to the comparative results, the number of neurons in the hidden layer *H* is determined as 8. In addition, the values of the other hyper-parameters, confirmed by manual adjustment, are: the number of hidden layers equals 1; the number of training iterations equals 5000; the learning rate *α* equals 5 × 10<sup>−5</sup>; the regularization coefficient *C* equals 20.
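With the hyper-parameter values above, a comparable network could be configured with scikit-learn's `MLPRegressor`, as sketched below. This is an equivalent setup under our assumptions, not the exact code of this study; in particular, `alpha` is scikit-learn's L2 penalty and stands in for the regularization coefficient *C*.

```python
from sklearn.neural_network import MLPRegressor

# One hidden layer of 8 neurons, sigmoid ("logistic") activation, and
# plain SGD with the learning rate and iteration budget reported above.
ann = MLPRegressor(
    hidden_layer_sizes=(8,),
    activation="logistic",
    solver="sgd",
    learning_rate_init=5e-5,
    max_iter=5000,
    alpha=20.0,          # L2 penalty, playing the role of C in Equation (4)
)
# ann.fit(X_train, y_train)  # X_train: the 6 input features; y_train: strength (kN)
```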

**Table 2.** Performance and ranking of ANNs with different numbers of neurons in the hidden layer.


### *3.2. Support Vector Machine*

Support vector machine (SVM) originated from statistical learning theory and was first proposed by Cortes and Vapnik [54] in 1995. When SVM is used for regression analysis, it is also called support vector regression (SVR). SVR maps the data, which are generally hard to fit in a low-dimensional space, to a high-dimensional space and fits them with a hyperplane [55]. The expression of SVR reflects the relation between the input factors and the predicted value [56]; it can be written as:

$$y\_{\text{pred}} = w^{\text{T}} \phi(\mathbf{x}) + b \tag{10}$$

where *φ*(*x*) denotes the feature vector after mapping. As with ANN, SVR has a cost function that represents the performance of the model. The cost function can be written as:

$$J(w, b, \xi\_i, \xi\_i^\*) = \frac{1}{2} \|w\|^2 + C \sum\_{i=1}^{m} \left( \xi\_i + \xi\_i^\* \right) \tag{11}$$

where *ξ* and *ξ*\* are the slack variables; they represent the errors of the samples exceeding the lower and upper bounds of the hyperplane, respectively.

To minimize the value of the cost function, the Lagrange multiplier method is used to transform the original problem into its dual problem. The Lagrange function is formulated in the following form:

$$\begin{aligned} L(w, b, \lambda, \lambda^\*, \xi, \xi^\*, \mu, \mu^\*) &= \frac{1}{2} \|w\|^2 + C \sum\_{i=1}^m \left(\xi\_i + \xi\_i^\*\right) - \sum\_{i=1}^m \mu\_i \xi\_i - \sum\_{i=1}^m \mu\_i^\* \xi\_i^\* \\ &\quad + \sum\_{i=1}^m \lambda\_i \left(y\_{\text{pred}}^{(i)} - y^{(i)} - \varepsilon - \xi\_i\right) + \sum\_{i=1}^m \lambda\_i^\* \left(y^{(i)} - y\_{\text{pred}}^{(i)} - \varepsilon - \xi\_i^\*\right) \end{aligned} \tag{12}$$

where *µ*, *µ*\*, *λ*, *λ*\* denote the Lagrange multipliers; *ε* is the maximum allowable error. After the transformation, the dual problem is expressed as follows:

$$\text{Maximise } \sum\_{i=1}^{m} \left[ y^{(i)} (\lambda\_i^\* - \lambda\_i) - \varepsilon (\lambda\_i^\* + \lambda\_i) \right] - \frac{1}{2} \sum\_{i=1}^{m} \sum\_{j=1}^{m} (\lambda\_i^\* - \lambda\_i) \left(\lambda\_j^\* - \lambda\_j\right) \mathbf{x}\_i^{\mathrm{T}} \mathbf{x}\_j \tag{13}$$

$$\text{subject to } \sum\_{i=1}^{m} \left( \lambda\_i^\* - \lambda\_i \right) = 0, \quad 0 \le \lambda\_i, \lambda\_i^\* \le C.$$

The Karush-Kuhn-Tucker (KKT) conditions need to be met for Equation (13), which can be written as:

$$\begin{cases} \lambda\_i \left( y\_{\text{pred}}^{(i)} - y^{(i)} - \varepsilon - \xi\_i \right) = 0, \\ \lambda\_i^\* \left( y^{(i)} - y\_{\text{pred}}^{(i)} - \varepsilon - \xi\_i^\* \right) = 0, \\ \lambda\_i \lambda\_i^\* = 0, \; \xi\_i \xi\_i^\* = 0, \\ (C - \lambda\_i) \xi\_i = 0, \; (C - \lambda\_i^\*) \xi\_i^\* = 0. \end{cases} \tag{14}$$

According to Equations (13) and (14), the weight vector *w* and the bias unit *b* can be acquired. Finally, the expression of SVR is as follows:

$$y\_{\text{pred}} = \sum\_{i=1}^{m} (\lambda\_i^\* - \lambda\_i) \kappa(\mathbf{x}, \mathbf{x}\_i) + b \tag{15}$$

$$\kappa(\mathbf{x}\_i, \mathbf{x}\_j) = \phi(\mathbf{x}\_i)^{\mathrm{T}} \phi(\mathbf{x}\_j) \tag{16}$$

where *κ*(*x<sub>i</sub>*, *x<sub>j</sub>*) denotes the kernel function, which is used to express the inner product of the feature vectors in the high-dimensional space. There are many types of kernel function; the radial basis function (RBF) kernel is selected in this paper, which can be expressed as:

$$\kappa(\mathbf{x}\_{i}, \mathbf{x}\_{j}) \;=\; \exp\left(-\gamma \|\mathbf{x}\_{i} - \mathbf{x}\_{j}\|^{2}\right) \tag{17}$$

where *γ* > 0 is the coefficient of the RBF kernel function, which is a hyper-parameter. Besides *γ*, the values of two other hyper-parameters in SVR need to be determined: the maximum allowable error *ε* and the regularization coefficient *C*. Among the many methods that can be used for the numerical adjustment of hyper-parameters, particle swarm optimization is selected in this paper.

Particle swarm optimization (PSO) is an optimization algorithm inspired by the foraging behavior of bird flocks [57,58]. Because PSO has many advantages, such as being easy to understand and implement, it is often used for hyper-parameter adjustment in SVM [59,60]. Every particle in PSO has two parameters that determine the search result: position *p* and velocity *v*. These two parameters are updated as [58]:

$$v\_{d^\*}^{t+1} = W v\_{d^\*}^{t} + r\_1 C\_1 \left( P\_{d^\*}^{t} - p\_{d^\*}^{t} \right) + r\_2 C\_2 \left( G\_{d^\*}^{t} - p\_{d^\*}^{t} \right) \tag{18}$$

$$p\_{d^\*}^{t+1} = p\_{d^\*}^{t} + v\_{d^\*}^{t+1} \tag{19}$$

where *t* is the number of iterations; *d*\* denotes the dimension of the search space; *W* is the inertia coefficient; *C*<sub>1</sub> and *C*<sub>2</sub> are the self-learning factor and the global learning factor, respectively; *r*<sub>1</sub> and *r*<sub>2</sub> are random numbers between (0, 1); *P* is the location with the best fitness among all the visited locations of each particle; *G* is the location with the best fitness among all the visited locations of all the particles.

In this paper, the hyper-parameters in Equations (18) and (19) are set as follows: the maximum number of iterations equals 150; the dimension of the search space *d*\* equals 3; the inertia coefficient *W* equals 0.8; the total number of particles equals 40; the self-learning factor *C*<sub>1</sub> and the global learning factor *C*<sub>2</sub> both equal 2. The hybrid machine learning approach consisting of PSO and SVR is called PSO-SVR, and the best values of the SVR hyper-parameters are determined by the search: the maximum allowable error *ε* equals 10; the regularization coefficient *C* equals 4238.58; the coefficient of the RBF kernel function *γ* equals 2.88.
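A compact sketch of the PSO-SVR search is given below: the velocity and position updates follow Equations (18) and (19) with the swarm settings above, and each particle's fitness is taken as the cross-validated error of an RBF-kernel SVR. The search bounds and the fitness choice are our assumptions for illustration, not values reported in this paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Search space for (epsilon, C, gamma); these bounds are illustrative assumptions.
lower = np.array([0.1, 1.0, 0.01])
upper = np.array([20.0, 5000.0, 10.0])

n_particles, n_iter, dim = 40, 150, 3    # swarm settings reported above
W, C1, C2 = 0.8, 2.0, 2.0                # inertia and learning factors

def fitness(params, X, y):
    """Negative cross-validated RMSE of an RBF-kernel SVR (higher is better)."""
    eps, C, gamma = params
    model = SVR(kernel="rbf", epsilon=eps, C=C, gamma=gamma)
    return cross_val_score(model, X, y, cv=5,
                           scoring="neg_root_mean_squared_error").mean()

def pso_svr(X, y):
    p = rng.uniform(lower, upper, size=(n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))                         # velocities
    fit = np.array([fitness(q, X, y) for q in p])
    P, P_fit = p.copy(), fit.copy()      # per-particle best locations
    G = P[P_fit.argmax()]                # global best location
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        v = W * v + r1 * C1 * (P - p) + r2 * C2 * (G - p)    # Equation (18)
        p = np.clip(p + v, lower, upper)                     # Equation (19)
        fit = np.array([fitness(q, X, y) for q in p])
        better = fit > P_fit
        P[better], P_fit[better] = p[better], fit[better]
        G = P[P_fit.argmax()]
    return G   # best (epsilon, C, gamma) found
```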

### *3.3. Decision Tree*

Decision tree (DT) is a supervised learning algorithm proposed by Quinlan [61]. Early decision trees, such as iterative dichotomiser 3 (ID3) and C4.5, could only be used for classification forecasting. In this paper, the classification and regression tree (CART) [62] is selected for regression prediction. The splitting criterion of CART is the variance of the samples, which can be written as:

$$\text{Minimise}\left[\sum\_{\mathbf{x}\_i \in R\_1} \left( y^{(i)} - c\_1 \right)^2 + \sum\_{\mathbf{x}\_i \in R\_2} \left( y^{(i)} - c\_2 \right)^2 \right] \tag{20}$$

where *R*<sub>1</sub> and *R*<sub>2</sub> denote the sample sets after splitting; *c*<sub>1</sub> and *c*<sub>2</sub> are the means of the samples in *R*<sub>1</sub> and *R*<sub>2</sub>, respectively. As the sample set is divided, the purity of the leaf nodes gradually improves.

Pruning is a method to avoid over-fitting in DT; it comprises two basic approaches: pre-pruning and post-pruning [63,64]. In this paper, pruning is carried out by limiting the maximum depth of the tree, which equals 5.
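In scikit-learn terms, the CART model above corresponds to a `DecisionTreeRegressor` whose squared-error criterion matches Equation (20); the depth limit implements the pruning just described. The data names are placeholders, and this is a sketch rather than the study's own code.

```python
from sklearn.tree import DecisionTreeRegressor

# CART with variance-based splitting (Equation (20)) and a depth cap of 5.
dt = DecisionTreeRegressor(criterion="squared_error", max_depth=5)
# dt.fit(X_train, y_train)
```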

### *3.4. Adaptive Boosting*

Adaptive boosting (AdaBoost) is an ensemble learning algorithm that combines multiple weak learners into a strong learner [65]. Unlike the aforementioned machine learning algorithms, each sample in AdaBoost has a sampling weight that is constantly adjusted [66]. At the beginning of the training process, the sample weights are initialized [67] as:

$$w\_{ki}^{\*} = \frac{1}{m} \tag{21}$$

where *w*\*<sub>*ki*</sub> denotes the weight of the *i*-th sample in the *k*-th weak learner. The largest error of the *k*-th weak learner on the training set is:

$$E\_k = \max \left| y^{(i)} - G\_k(x\_i) \right| \tag{22}$$

where *G<sub>k</sub>*(*x<sub>i</sub>*) denotes the predicted value of the *i*-th sample given by the *k*-th weak learner. The relative error *e<sub>ki</sub>* of each sample and the error ratio *e<sub>k</sub>* of the *k*-th weak learner can be written as:

$$e\_{ki} = \frac{\left(y^{(i)} - G\_k(x\_i)\right)^2}{E\_k^2} \tag{23}$$

$$e\_k = \sum\_{i=1}^{m} w\_{ki} e\_{ki} \tag{24}$$

After that, the coefficient *β<sub>k</sub>* of the *k*-th weak learner can be calculated; its value decreases as the error ratio *e<sub>k</sub>* decreases, so the learner's final weight ln(1/*β<sub>k</sub>*) in Equation (28) increases accordingly. The coefficient *β<sub>k</sub>* can be defined as:

$$\beta\_k = \frac{e\_k}{1 - e\_k} \tag{25}$$

Afterwards, the sample weights are updated for the next weak learner, whose error ratio is then reckoned in the same way:

$$w\_{k+1,i}^{\*} = \frac{w\_{ki}^{\*}}{Z\_k} \beta\_k^{1-e\_{ki}} \tag{26}$$

$$Z\_k = \sum\_{i=1}^{m} w\_{ki}^{\*} \beta\_k^{1-e\_{ki}} \tag{27}$$

where *Z<sub>k</sub>* is the normalization factor. Finally, the expression of the strong learner can be written as:

$$f(\mathbf{x}) = \sum\_{k=1}^{K} \left( \ln \frac{1}{\beta\_k} \right) g(\mathbf{x}) \tag{28}$$

where *K* is the total number of weak learners; *g*(*x*) is the median of all the *β<sub>k</sub>G<sub>k</sub>*(*x*). The algorithm of the weak learner needs to be specified, and PSO-SVR is chosen in this paper.
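Equations (21)–(28) describe the AdaBoost.R2 procedure, which scikit-learn provides as `AdaBoostRegressor`; with `loss="square"` the relative error matches Equation (23). The sketch below pairs it with the tuned SVR weak learner. The number of weak learners is our assumption, since it is not reported above, and the `estimator` keyword assumes a recent scikit-learn version.

```python
from sklearn.ensemble import AdaBoostRegressor
from sklearn.svm import SVR

# Tuned PSO-SVR model used as the weak learner G_k.
weak_learner = SVR(kernel="rbf", epsilon=10, C=4238.58, gamma=2.88)

# loss="square" reproduces the squared relative error of Equation (23);
# n_estimators (the number K of weak learners) is assumed here.
ada = AdaBoostRegressor(estimator=weak_learner, n_estimators=50, loss="square")
# ada.fit(X_train, y_train)
```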

### *3.5. Predicted Results*

The result comparison between ANN, PSO-SVR, DT, and AdaBoost is provided in Table 3. All four machine learning models show high prediction accuracy as reflected in R<sup>2</sup>. Among the four AI models, AdaBoost has the highest prediction accuracy on the training set, and PSO-SVR on the test set. It is noted that although the forecasting performance of ANN on the test set is better than that of DT and only slightly lower than those of PSO-SVR and AdaBoost, its performance on the training set is much lower than that of the other three models.

**Table 3.** Result comparison between machine learning algorithms.


Figures 4–7 depict the scatter distribution of the predicted results for each machine learning model. In Figure 4, it can be found that the fit of ANN is not very good when the true value is larger than 1400 kN. This situation may be related to insufficient data with punching shear strength larger than 1400 kN. Therefore, if this ANN model is to be used for predicting the punching shear strength of FRP reinforced concrete slabs, the value range of the predicted object should be restricted to under 1400 kN. In addition, it can be seen from the figure that ANN focuses more on fitting the samples located in intervals with dense sample distribution. Consequently, one way to improve the overall prediction accuracy of ANN is to ensure that the samples are evenly distributed. Figure 5 demonstrates that PSO-SVR shows good prediction performance. Although some predicted values have relatively large errors with respect to the actual values, the overall prediction accuracy is higher than that of ANN and DT. Figure 6 depicts the predictions of DT on the dataset; there is no doubt that the predicted result of DT for the test set is worse than those of ANN and PSO-SVR. On the training set, the predictions of DT show almost no error when the punching shear strength is larger than 1200 kN, which is related to the prediction mechanism of DT [68]. The same mechanism can also be seen in the predictions of DT for punching shear strength less than 1200 kN, where the predicted value is identical whenever the punching shear strength lies in the same range. Meanwhile, on the test set, DT has a low prediction accuracy for samples with punching shear strength of around 1200 kN; that is, the DT model proposed in this paper has a risk of over-fitting when the punching shear strength is larger than 1200 kN. Obviously, for both ANN and DT, the prediction accuracy is closely related to the number of samples in each value range. Furthermore, it can be concluded that neither ANN nor DT has mined the true intrinsic connection in the data. Figure 7 shows the predicted results of AdaBoost for the training set and the test set; it can be seen that all of the dots are closely distributed around the best-fitting line. Since PSO-SVR is chosen as the weak learner of AdaBoost, some wildly inaccurate predicted values are improved, while the comparatively accurate predicted values are retained. Moreover, comparing Figures 4–7, it is not hard to see that the fit of AdaBoost reveals the best prediction performance.

**Figure 4.** The predictions of ANN for data set. (**a**) Training set; (**b**) Test set.

**Figure 5.** The predictions of PSO-SVR for data set. (**a**) Training set; (**b**) Test set.

### *3.6. Comparison with Traditional Empirical Models*

In order to put the accuracy of the machine learning results in context, this paper introduces design specifications and empirical models proposed by researchers. In this study, three punching shear strength models are selected from design codes and reviewed: GB 50010-2010 (2015) [14], ACI 318-19 [69], and BS 8110-97 [70]. Most design codes adopt the same function for both FRP slabs and reinforced concrete slabs, expressing the punching shear strength of slabs as a function of the concrete compressive strength. Some design codes also take the reinforcement ratio, size effect, depth of the slab, and so on into consideration. Three design specifications and three empirical models proposed by researchers are adopted in the comparative analysis. The formulas of the selected traditional empirical models are shown in Table 4.

**Figure 6.** The predictions of DT for data set. (**a**) Training set; (**b**) Test set.

**Figure 7.** The predictions of AdaBoost for data set. (**a**) Training set; (**b**) Test set.

**Table 4.** Calculation formulas of traditional empirical model.


*β*<sub>h</sub> is the sectional depth influence coefficient; *f*<sub>t</sub> is the design value of the tensile strength; *b*<sub>0,0.5*d*</sub> is the perimeter of the critical section for slabs and footings at a distance of *d*/2 from the column face; *β*<sub>s</sub> is the ratio of the long side to the short side of the sectional shape under local load or concentrated reaction force; *α*<sub>s</sub> is the influence coefficient of the column type; *f*<sub>cu</sub> is the compressive strength of the concrete cube; *b*<sub>0,1.5*d*</sub> is the perimeter of the critical section for slabs and footings at a distance of 1.5*d* from the loaded area; *E*<sub>s</sub> is the Young's modulus of the steel bar.

Table 5 summarizes the results of the machine learning models and the traditional models. Obviously, the prediction performance of AdaBoost is better than that of the other models as reflected in RMSE, MAE, and R<sup>2</sup>. Although the predicted results of ANN, DT, and PSO-SVR, the other three AI models, are not the best, they show better performance than the traditional empirical models. Among the traditional models, Formula (1) (GB 50010-2010 (2015)) has the highest prediction accuracy, with RMSE, MAE, and R<sup>2</sup> of 150.41, 98.48, and 0.73, respectively. Notably, the predictions of Formulas (1) and (2) differ only slightly; perhaps this is related to the fact that they are both derived from the eccentric shear stress model [14,69]. In addition, the forecasting performance of the model proposed by El-Ghandour et al. is better than that of the other traditional models proposed by researchers. In brief, the RMSE and MAE of the traditional empirical models are generally greater than 100 and their R<sup>2</sup> ranges from 0.6 to 0.8, whereas the RMSE and MAE of the machine learning models are less than 100 and their R<sup>2</sup> is generally greater than 0.9, even close to 1.0. Consequently, the difference in prediction accuracy between the traditional empirical models and the machine learning models is obvious.

**Table 5.** Result comparison between machine learning and traditional empirical models.


Figure 8 shows the predicted results of the traditional empirical models and the machine learning models for the whole data set. To facilitate comparison between the different types of models, the red dots are used to represent the predicted results of the national codes, while the blue and purple dots represent the predicted results of the traditional models proposed by researchers and of the machine learning models, respectively. It can be found that the prediction error of the traditional models is relatively large, whereas the machine learning models show the opposite behavior. The forecasting result of Figure 8a is similar to that of Figure 8b, and this observation is also reflected in Table 5. In addition, although the prediction accuracy of Formulas (1) and (2) is better than that of the other traditional empirical models, their predicted results are still unsafe. On the contrary, the predicted results of Formulas (3) to (6) tend to be conservative even though they do not have the best prediction performance. From the performance of the machine learning models on the complete database, it can be found that the predicted results of ANN and PSO-SVR are likely to be unsafe when the punching shear strength is smaller than 200 kN. On the other hand, when the punching shear strength is larger than 1200 kN, the forecasting result of ANN is conservative. Furthermore, except for Formula (4), the prediction accuracy of all the empirical models is better than that of ANN when the punching shear strength is larger than 1400 kN. Consequently, it can be concluded that ANN is very dependent on the samples.


**Figure 8.** The predictions of models for the whole data set. (**a**) Formula (1); (**b**) Formula (2); (**c**) Formula (3); (**d**) Formula (4); (**e**) Formula (5); (**f**) Formula (6); (**g**) ANN; (**h**) PSO-SVR; (**i**) DT; (**j**) AdaBoost.

### **4. Model Interpretations**

### *4.1. Shapley Additive Explanation*

In this study, SHapley Additive exPlanation (SHAP) is applied to explain the predicted results given by the AdaBoost model. SHAP originates from game theory and is an additive model, i.e., the output of the model is a linear addition of the contributions of the input variables [71]. The expression of SHAP can be defined as:

$$y\_{\text{pred}}^{(i)} = y\_{\text{base}} + \sum\_{j=1}^{n} f(x\_{ij}) \tag{29}$$

where *y*<sub>base</sub> is the baseline value of the model; *n* is the total number of input variables; *f*(*x<sub>ij</sub>*) is the SHAP value of *x<sub>ij</sub>*.

According to Equation (29), the predicted value of each sample depends on the SHAP value of each feature in the sample. In other words, each feature is a contributor to the final predicted value. The SHAP value can be positive or negative, depending on the influence trend of each feature on the predicted result. SHAP contains several explainers; current machine learning papers often use the tree explainer, which is only suitable for a subset of machine learning algorithms. Consequently, the kernel explainer is used in this paper to illustrate the machine learning model with the best prediction performance.
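A minimal sketch of this workflow with the `shap` package follows. The model object and data frames are carried over from the earlier sketches, and the background sample size is an illustrative assumption; the kernel explainer only needs the model's predict function.

```python
import shap

# Kernel explainer works with any model through its predict function.
# A small background sample keeps the kernel estimation tractable.
background = shap.sample(X_train, 50)
explainer = shap.KernelExplainer(ada.predict, background)
shap_values = explainer.shap_values(X_test)

# Global view: importance ranking and summary plot (cf. Figure 9).
shap.summary_plot(shap_values, X_test)

# Individual view: per-feature contributions for one sample (cf. Figure 10).
shap.force_plot(explainer.expected_value, shap_values[0, :], X_test.iloc[0, :])
```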

### *4.2. Global Interpretations*

According to the experimental results obtained above, the predicted results of AdaBoost, which has the best prediction performance, were explained. The predicted results of AdaBoost can be interpreted by SHAP in different ways. Firstly, the feature importance analysis of the input parameters is shown in Figure 9a. It can be seen that the importance of each input parameter is non-negligible; that is, each input parameter has a different degree of influence on the predicted results. This is consistent with the existing experimental conclusions [4,72]. According to the analysis results, the SHAP importance values of the parameters are: *x*<sub>1</sub> equals 53.45, *x*<sub>2</sub> equals 44.50, *x*<sub>3</sub> equals 126.36, *x*<sub>4</sub> equals 21.57, *x*<sub>5</sub> equals 17.23, and *x*<sub>6</sub> equals 52.96. Obviously, the slab's effective depth (*x*<sub>3</sub>) has the most critical influence on the punching shear capacity, more than twice that of the type of column section (*x*<sub>1</sub>). Moreover, although the Young's modulus of FRP reinforcement (*x*<sub>5</sub>) has the least importance for the predicted results, ignoring this feature would sometimes cause unnecessary errors in the predictions of the model. Consequently, using these six features as the input parameters of the machine learning algorithms is reasonable.

**Figure 9.** Global interpretations of the AdaBoost model by SHAP values. (**a**) SHAP feature importance; (**b**) SHAP summary plot.

Figure 9b is a SHAP summary plot of the features, which displays the distribution of the SHAP values of each input variable over the samples and demonstrates the overall influence trends. In the figure, the x-axis is the SHAP value of the input parameters, and the y-axis lists the input parameters, ranked by importance. The points in the figure represent the samples in the dataset, and their colors represent the feature value, with colors from blue to red representing values from small to large. From the figure, it can be seen that a few SHAP values of the samples in *x*<sub>3</sub> are very high, around 600; it can be said that in these samples the value of *x*<sub>3</sub> has a decisive effect on the predicted results. Among the input variables, the slab's effective depth (*x*<sub>3</sub>), type of column section (*x*<sub>1</sub>), reinforcement ratio (*x*<sub>6</sub>), cross-section area of column (*x*<sub>2</sub>), and Young's modulus of FRP reinforcement (*x*<sub>5</sub>) all have a positive influence on the punching shear strength. However, the influence trend of the compressive strength of concrete (*x*<sub>4</sub>) on the predicted results is complex and hard to determine.

### *4.3. Individual Interpretations*

Besides the global interpretations, SHAP can also provide an individual interpretation for each sample in the dataset. In this study, two samples are selected, and their prediction processes are illustrated in Figure 10. In the figure, the gray line represents the baseline value, which equals 403.62, and the colorful lines represent the contribution of each feature. It can be seen in Figure 10 that, starting from the baseline value, every feature shifts the value to a different degree, finally yielding the predicted values 419.27 and 501.00, respectively. In the prediction process of these two samples, the type of column section (*x*<sub>1</sub>), slab's effective depth (*x*<sub>3</sub>), and compressive strength of concrete (*x*<sub>4</sub>) have a negative influence on the predicted results, whereas the cross-section area of column (*x*<sub>2</sub>), Young's modulus of FRP reinforcement (*x*<sub>5</sub>), and reinforcement ratio (*x*<sub>6</sub>) have a positive influence. Furthermore, the reinforcement ratio (*x*<sub>6</sub>) acts as the determinant in the forecasting process of sample 1, and the cross-section area of column (*x*<sub>2</sub>) plays the essential role in sample 2. In summary, both the global interpretations and the individual interpretations give a detailed insight into the process by which each input feature affects the final predicted value.

**Figure 10.** Individual interpretations for selected samples. (**a**) Sample 1: Specimen G(1.6)30/20; (**b**) Sample 2: Specimen G(1.6)45/20-B.

### *4.4. Feature Dependency*

In addition to explaining the prediction mechanism of the model, SHAP can also reveal the interaction between a specific feature and its most closely related feature, which is shown in Figure 11. It can be seen from the figure that as the values of *x*<sub>1</sub>, *x*<sub>2</sub>, *x*<sub>3</sub>, and *x*<sub>6</sub> increase, their SHAP values also increase, but the relationships between *x*<sub>4</sub>, *x*<sub>5</sub> and their SHAP values are complicated and nonlinear. As the values of *x*<sub>4</sub> and *x*<sub>5</sub> increase, their SHAP values may either increase or decrease; thus, it is difficult to describe the concrete relationships. The feature that interacts most closely with *x*<sub>1</sub>, *x*<sub>2</sub>, *x*<sub>5</sub>, and *x*<sub>6</sub> is *x*<sub>3</sub>, while the features most closely related to *x*<sub>3</sub> and *x*<sub>4</sub> are *x*<sub>2</sub> and *x*<sub>6</sub>, respectively. Based on the results of the feature dependency analysis, further experiments can be carried out to analyze the relationship between the two variables that interact most closely and their collaborative impact on the output. This might be helpful for improving the punching shear resistance of FRP reinforced concrete slab-column connections, which the traditional empirical models cannot accomplish because of their comparatively worse prediction performance. Such dependence plots can be generated as sketched below.
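With the SHAP values from the kernel explainer sketched earlier, dependence plots like those in Figure 11 can be produced as follows; `interaction_index="auto"` lets the package pick the most closely interacting feature to color each point, and the column names are assumptions.

```python
import shap  # shap_values and X_test carried over from the explainer sketch

# One dependence plot per input variable; points are colored by the
# feature estimated to interact most strongly with the one plotted.
for feature in ["x1", "x2", "x3", "x4", "x5", "x6"]:
    shap.dependence_plot(feature, shap_values, X_test, interaction_index="auto")
```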

**Figure 11.** Feature dependence plots. (**a**) *x*<sup>1</sup> ; (**b**) *x*<sup>2</sup> ; (**c**) *x*<sup>3</sup> ; (**d**) *x*<sup>4</sup> ; (**e**) *x*<sup>5</sup> ; (**f**) *x*<sup>6</sup> .

## **5. Conclusions**

In this paper, several machine learning algorithms, ANN, SVR, DT, and AdaBoost, are adopted to predict the punching shear strength of FRP reinforced concrete slabs. A total of 121 groups of experimental tests are collected, and the test variables are divided into 6 inputs (type of column section, cross-section area of column, slab's effective depth, compressive strength of concrete, Young's modulus of FRP reinforcement, reinforcement ratio) and 1 output (punching shear strength). Random subsets of 80% and 20% of the data were used for training and testing, respectively, and 10% of the data in the training set were used for validation. In addition, 3 indicators (RMSE, MAE, R<sup>2</sup>) are introduced to assess the prediction performance. To improve the prediction performance, the empirical method and PSO are utilized to adjust the values of the hyper-parameters in ANN and SVR, respectively. After tuning, both ANN and SVR display good prediction performance. According to the experimental results, the best-performing model is AdaBoost, whose RMSE, MAE, and R<sup>2</sup> are 29.83, 23.00, and 0.99, respectively. The prediction performance of ANN, DT, and PSO-SVR is worse than that of AdaBoost, but all of them are better than the 6 traditional empirical models introduced in this paper. The predicted result of ANN for the training set shows a relatively large deviation when the punching shear strength exceeds 1400 kN, whereas DT shows almost no error there. Among the traditional models, GB 50010-2010 (2015) has the least deviation as reflected in RMSE and R<sup>2</sup>, with values of 150.41 and 0.73, respectively. Moreover, the model proposed by El-Ghandour et al. (1999) has the best forecasting performance among the three traditional models proposed by researchers.

SHAP is used to interpret the prediction process of AdaBoost, the model with the best performance in this study. From the results of the global interpretations, it can be seen that every input variable has a different degree of influence on the predicted results, which demonstrates the rationality of the input variable selection. Furthermore, the slab's effective depth has the highest impact factor on the predicted results, equaling 126.36, more than twice that of the type of column section, which equals 53.45. In addition, the Young's modulus of FRP reinforcement has the lowest impact factor, equaling 17.23. From the SHAP summary plot, it can be understood that all input variables have a positive influence on the predicted results except the compressive strength of concrete, whose influence trend is relatively fuzzy. The individual interpretations show the forecasting process for single samples, which can help researchers understand the prediction process of the model thoroughly. In addition, SHAP also provides the results of feature dependency, which helps researchers understand the relationship between a specific feature and its most closely related feature. According to the analysis results of SHAP, the true relationship between the material properties and the punching shear strength may be revealed.

**Author Contributions:** Conceptualization, Y.S. and S.L.; software, Y.S. and J.S.; validation, Y.S. and S.L.; formal analysis, S.L.; writing—original draft preparation, Y.S.; writing—review and editing, S.L.; visualization, Y.S.; supervision, S.L.; project administration, S.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Science Foundation of Zhejiang Province of China, grant number LY22E080016; National Science Foundation of China, grant number 51808499.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** This study is supported by the Engineering Research Centre of Precast Concrete of Zhejiang Province. The help of all members of the Engineering Research Centre is sincerely appreciated. We would also like to express our sincere appreciation to the anonymous referee for valuable suggestions and corrections.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A Test Data of FRP Reinforced Concrete Slabs**

The sources of the database and the specific material properties are listed in Table A1. The input variables are the type of column section *x*<sub>1</sub>, cross-section area of column *x*<sub>2</sub> (*A*/cm<sup>2</sup>), slab's effective depth *x*<sub>3</sub> (*d*/mm), compressive strength of concrete *x*<sub>4</sub> (*f'*<sub>c</sub>/MPa), Young's modulus of FRP reinforcement *x*<sub>5</sub> (*E*<sub>f</sub>/GPa), and reinforcement ratio *x*<sub>6</sub> (*ρ*<sub>f</sub>/%), respectively. The output variable is the experimental value of the punching shear strength *y* (*V*/kN). The column section has three types: square (*x*<sub>1</sub> = 1), circle (*x*<sub>1</sub> = 2), and rectangle (*x*<sub>1</sub> = 3). In order to use the experimental data for comparative analysis between models, the concrete compressive strengths obtained from different specimen types (cylinder and cube) were unified. Furthermore, to prevent errors caused by different data precision, two digits after the decimal point are retained for all input parameters except the column section type *x*<sub>1</sub>.


**Table A1.** Database of punching shear strength for FRP reinforced concrete slabs.

| Reference | Specimen | *x*<sub>1</sub> | *x*<sub>2</sub> | *x*<sub>3</sub> | *x*<sub>4</sub> | *x*<sub>5</sub> | *x*<sub>6</sub> | *y* |
|---|---|---|---|---|---|---|---|---|
| Xiao [39] (2010) | A | 1 | 225.00 | 130.00 | 22.16 | 45.60 | 0.42 | 176.40 |
| | B-2 | 1 | 225.00 | 130.00 | 32.48 | 45.60 | 0.42 | 209.40 |
| | B-3 | 1 | 225.00 | 130.00 | 32.40 | 45.60 | 0.55 | 245.30 |
| | B-4 | 1 | 225.00 | 130.00 | 32.80 | 45.60 | 0.29 | 166.60 |
| | B-6 | 1 | 225.00 | 130.00 | 33.20 | 45.60 | 0.42 | 217.20 |
| | B-7 | 1 | 225.00 | 130.00 | 28.32 | 45.60 | 0.42 | 221.50 |
| | C | 1 | 225.00 | 130.00 | 46.05 | 45.60 | 0.42 | 252.50 |
| Bouguerra et al. [9] (2011) | G-200-N | 3 | 1500.00 | 155.00 | 49.10 | 43.00 | 1.20 | 732.00 |
| | G-175-N | 3 | 1500.00 | 135.00 | 35.20 | 43.00 | 1.20 | 484.00 |
| | G-150-N | 3 | 1500.00 | 110.00 | 35.20 | 43.00 | 1.20 | 362.00 |
| | G-175-H | 3 | 1500.00 | 135.00 | 64.80 | 43.00 | 1.20 | 704.00 |
| | G-175-N-0.7 | 3 | 1500.00 | 135.00 | 53.10 | 43.00 | 0.70 | 549.00 |
| | G-175-N-0.35 | 3 | 1500.00 | 137.00 | 53.10 | 43.00 | 0.35 | 506.00 |
| | C-175-N | 3 | 1500.00 | 140.00 | 40.30 | 122.00 | 0.40 | 530.00 |
| Hassan et al. [40] (2013) | G(0.7)30/20 | 1 | 900.00 | 134.00 | 34.30 | 48.20 | 0.71 | 329.00 |
| | G(1.6)30/20 | 1 | 900.00 | 131.50 | 38.60 | 48.10 | 1.56 | 431.00 |
| | G(0.7)45/20 | 1 | 2025.00 | 134.00 | 45.40 | 48.20 | 0.71 | 400.00 |
| | G(1.6)45/20 | 1 | 2025.00 | 131.50 | 32.40 | 48.10 | 1.56 | 504.00 |
| | G(0.3)30/35 | 1 | 900.00 | 284.00 | 34.30 | 48.20 | 0.34 | 825.00 |
| | G(0.7)30/35 | 1 | 900.00 | 284.00 | 39.40 | 48.10 | 0.73 | 1071.00 |
| | G(0.3)45/35 | 1 | 2025.00 | 284.00 | 48.60 | 48.20 | 0.34 | 911.00 |
| | G(0.7)45/35 | 1 | 2025.00 | 281.50 | 29.60 | 48.10 | 0.73 | 1248.00 |
| | G(1.6)30/20-H | 1 | 900.00 | 131.00 | 75.80 | 57.40 | 1.56 | 547.00 |
| | G(1.2)30/20 | 1 | 900.00 | 131.00 | 37.50 | 64.90 | 1.21 | 438.00 |
| | G(1.6)30/35 | 1 | 900.00 | 275.00 | 38.20 | 56.70 | 1.61 | 1492.00 |
| | G(1.6)30/35-H | 1 | 900.00 | 275.00 | 75.80 | 56.70 | 1.61 | 1600.00 |
| | G(0.7)30/20-B | 1 | 900.00 | 131.00 | 38.60 | 48.20 | 0.73 | 386.00 |
| | G(1.6)45/20-B | 1 | 2025.00 | 131.00 | 39.40 | 48.10 | 1.56 | 511.00 |
| | G(0.3)30/35-B | 1 | 900.00 | 284.00 | 39.40 | 48.20 | 0.34 | 781.00 |
| | G(1.6)30/20-B | 1 | 900.00 | 131.00 | 32.40 | 48.10 | 1.56 | 451.00 |
| | G(0.3)45/35-B | 1 | 2025.00 | 284.00 | 32.40 | 48.20 | 0.34 | 1020.00 |
| | G(0.7)30/35-B-1 | 1 | 900.00 | 281.00 | 29.60 | 48.10 | 0.73 | 1027.00 |
| | G(0.7)30/35-B-2 | 1 | 900.00 | 281.00 | 46.70 | 48.10 | 0.73 | 1195.00 |
| | G(0.7)37.5/27.5-B-2 | 1 | 1406.25 | 209.00 | 32.30 | 48.20 | 0.72 | 830.00 |
| Nguyen-Minh et al. [10] (2013) | GSL-0.4 | 1 | 400.00 | 129.00 | 39.00 | 48.00 | 0.48 | 180.00 |
| | GSL-0.6 | 1 | 400.00 | 129.00 | 39.00 | 48.00 | 0.68 | 212.00 |
| | GSL-0.8 | 1 | 400.00 | 129.00 | 39.00 | 48.00 | 0.92 | 244.00 |
| Elgabbas et al. [41] (2016) | S2-B | 3 | 1500.00 | 167.00 | 48.81 | 64.80 | 0.70 | 548.30 |
| | S3-B | 3 | 1500.00 | 169.00 | 42.20 | 69.30 | 0.69 | 664.60 |
| | S4-B | 3 | 1500.00 | 167.00 | 42.20 | 64.80 | 0.70 | 565.90 |
| | S5-B | 3 | 1500.00 | 167.00 | 47.90 | 64.80 | 0.99 | 716.40 |
| | S6-B | 3 | 1500.00 | 167.00 | 47.90 | 64.80 | 0.42 | 575.80 |
| | S7-B | 3 | 1500.00 | 167.00 | 47.90 | 64.80 | 0.42 | 436.40 |
| Gouda et al. [42,43] (2016) | GN-0.65 | 1 | 900.00 | 160.00 | 42.00 | 68.00 | 0.65 | 363.00 |
| | GN-0.98 | 1 | 900.00 | 160.00 | 38.00 | 68.00 | 0.98 | 378.00 |
| | GN-1.30 | 1 | 900.00 | 160.00 | 39.00 | 68.00 | 1.13 | 425.00 |
| | GH-0.65 | 1 | 900.00 | 160.00 | 70.00 | 68.00 | 0.65 | 380.00 |
| | G-00-XX | 1 | 900.00 | 160.00 | 38.00 | 68.00 | 0.65 | 421.00 |
| | G-30-XX | 1 | 900.00 | 160.00 | 42.00 | 68.00 | 0.65 | 296.00 |
| | R-15-XX | 1 | 900.00 | 160.00 | 40.00 | 63.10 | 0.65 | 320.00 |
| Hussein et al. [44] (2018) | H-1.0-XX | 1 | 900.00 | 160.00 | 80.00 | 65.00 | 0.98 | 461.00 |
| | H-1.5-XX | 1 | 900.00 | 160.00 | 84.00 | 65.00 | 1.46 | 541.00 |