Extreme Gradient Boosting

Extreme gradient boosting (XGBoost) inherits the ensemble strategy of the gradient boosting decision tree (GBDT) and is an advanced implementation of the latter [42]. The use of two regularization coefficients and a second-order Taylor approximation of the objective improves both the generalization ability and the prediction accuracy. The complexity of each base learner can be defined as:

$$
\Omega(f\_t) = \gamma T + \frac{1}{2}\lambda' ||w'||^2 \tag{1}
$$

where *γ* and *λ'* are the regularization coefficients, penalizing the number of leaves and the L2 norm of the leaf weights, respectively; *T* is the number of leaves of the base learner; *w'* is the vector of leaf scores. By fitting the residuals of the current prediction, each new base learner further reduces the prediction error of XGBoost. The fitting objective of each base learner can be formulated as:

$$\mathcal{L}(\varphi) = \sum\_{i} l\left(y\_{\text{pred}}^{(i)}, y^{(i)}\right) + \sum\_{k} \Omega\left(f\_k\right) \tag{2}$$

where *l* is the loss function; *y*pred is the prediction value of a sample; *y* is the true value of the sample. Accordingly, the prediction value generated by XGBoost can be expressed as the sum of the predictions of all the base learners:

$$y\_{\text{pred}} = \sum\_{k=1}^{K} f\_k \tag{3}$$

where *K* is the number of base learners.
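
As a rough illustration of how these quantities map onto the XGBoost library, the following minimal Python sketch trains an `XGBRegressor` whose `gamma` and `reg_lambda` arguments play the role of *γ* and *λ'* in Equation (1), and whose `n_estimators` corresponds to *K* in Equation (3). The data and parameter values are placeholders, not those used in this study.

```python
import numpy as np
from xgboost import XGBRegressor

# Placeholder data standing in for the slab-column joint dataset.
X = np.random.rand(100, 7)
y = np.random.rand(100) * 1000.0

# gamma and reg_lambda correspond to the complexity penalties in Eq. (1);
# n_estimators is the number of base learners K in Eq. (3).
model = XGBRegressor(
    n_estimators=200,          # K: number of base learners
    max_depth=4,
    learning_rate=0.1,
    gamma=0.1,                 # penalty on the number of leaves (gamma in Eq. (1))
    reg_lambda=1.0,            # L2 penalty on the leaf weights (lambda' in Eq. (1))
    objective="reg:squarederror",
)
model.fit(X, y)

# The final prediction is the sum of the outputs of all base learners, Eq. (3).
y_pred = model.predict(X)
```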

## *3.2. Prediction Results of Machine Learning Models*

The optimal hyperparameters of each ML model are obtained through a grid search with 10-fold cross-validation [43] and are listed in Table 2. To compare the prediction performance of the different ML models, three performance measures, the root mean squared error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R²), are adopted and expressed as:

$$\text{RMSE} = \sqrt{\frac{1}{m} \sum\_{i=1}^{m} \left( y\_{\text{pred}}^{(i)} - y^{(i)} \right)^2} \tag{4}$$

$$\text{MAE} = \frac{1}{m} \sum\_{i=1}^{m} \left| y\_{\text{pred}}^{(i)} - y^{(i)} \right| \tag{5}$$

$$\mathrm{R}^2 = 1 - \frac{\sum\_{i=1}^{m} \left(y\_{\text{pred}}^{(i)} - y^{(i)}\right)^2}{\sum\_{i=1}^{m} \left(y^{(i)} - \frac{1}{m} \sum\_{i=1}^{m} y^{(i)}\right)^2} \tag{6}$$

where *m* is the number of samples.
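
For reference, a minimal scikit-learn sketch of such a grid search with 10-fold cross-validation, followed by evaluation with the three measures of Equations (4)–(6), is given below. The parameter grid and data are illustrative placeholders rather than the settings of Table 2.

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Placeholder data standing in for the collected test database.
X = np.random.rand(300, 7)
y = np.random.rand(300) * 1000.0
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Illustrative search space; the tuned values of Table 2 are not reproduced here.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 4, 5],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    XGBRegressor(objective="reg:squarederror"),
    param_grid,
    cv=10,                                    # 10-fold cross-validation
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

# Evaluate the best model with the three measures of Eqs. (4)-(6).
y_pred = search.best_estimator_.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))   # Eq. (4)
mae = mean_absolute_error(y_test, y_pred)            # Eq. (5)
r2 = r2_score(y_test, y_pred)                        # Eq. (6)
print(search.best_params_, rmse, mae, r2)
```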

**Table 2.** Optimal hyperparameters of ML models.


After the optimal hyperparameters are determined, all four ML models are established. To benchmark their prediction performance, five empirical models are introduced and listed in Table 3: two design provisions [11,12], two mechanical models [8,14], and a regression-analysis-based model [15]. The prediction results are shown in Figure 5, where gray-green and blue-pink represent the prediction results of the empirical models and the ML models on the training set and the test set, respectively. XGBoost achieves the highest prediction accuracy, indicating that it has been well trained and possesses the best generalization ability; this conclusion is in line with previous studies [17,44]. RF and DT also perform well, as their prediction strategies suit the regression analysis of the punching shear resistance of RC slab-column joints [45]. However, the prediction performance of ANN needs improvement: its nonconvex objective means that the optimal solution obtained is often local rather than global [18]. Owing to the good fitting ability of regression analysis, the model proposed by Chetchotisak et al. [15] gives the best predictions among the empirical models, but its credibility is limited by the lack of a theoretical derivation. The predictions of the mechanical models proposed by Tian et al. [8] and Wu et al. [14] deviate considerably from the true punching shear resistance; the coefficients relating the influential factors to the punching shear resistance need further modification. Furthermore, the predictions of the design provisions GB 50010-2010 [11] and ACI 318-19 [12] are conservative, and their prediction accuracy also needs improvement.


**Table 3.** Empirical models used for prediction performance comparison.

*β*<sub>h</sub> is the sectional depth influence coefficient; *f*<sub>t</sub> is the design value of the tensile strength of concrete; *b*<sub>0,0.5*d*</sub> is the perimeter of the critical section at a distance of 0.5*d* from the column; *β*<sub>s</sub> is the ratio of the long side to the short side of the column; *α*<sub>s</sub> is the influential coefficient of the column type (40 for interior columns); *c* is the column size; *b*<sub>0,2*d*</sub> is the perimeter of the critical section at a distance of 2*d* from the column; *L* is the perimeter of the column.

**Figure 5.** Scatter plots of the prediction results of the empirical models and the ML models: (**a**) GB 50010-2010; (**b**) ACI 318-19; (**c**) Tian et al.; (**d**) Wu et al.; (**e**) Chetchotisak et al.; (**f**) ANN; (**g**) DT; (**h**) RF; (**i**) XGBoost.

## *3.3. Interpretation of the ML Prediction Model*

According to the performance comparison of the ML models in Section 3.2, XGBoost is selected as the final prediction model owing to its best prediction performance. According to the feature importance ranking produced by the built-in method of XGBoost [46], shown in Figure 6, *d* has the greatest influence on punching shear resistance. However, this method only provides the importance of the influential factors; the direction of their effects remains unknown. Therefore, SHAP is introduced in this paper and utilized for model interpretation.

**Figure 6.** Importance sorting using XGBoost feature importance.
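
A built-in ranking like the one in Figure 6 can be obtained directly from the trained model. The snippet below is a sketch assuming `model` is the fitted XGBoost regressor from the earlier example; the choice of `importance_type` (here "gain") affects the ordering.

```python
import matplotlib.pyplot as plt
from xgboost import plot_importance

# Rank features by their average gain (loss reduction) across all splits.
# Other importance types ("weight", "cover") may yield a different order.
plot_importance(model, importance_type="gain", show_values=False)
plt.tight_layout()
plt.show()
```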

#### 3.3.1. Overview of Shapley Additive Explanation

SHapley Additive exPlanation (SHAP), which originates from game theory and was proposed by Lundberg et al. [47,48], can illustrate the prediction process of any ML model. Each prediction value can be formulated as the sum of the baseline value *y*base and the SHAP value *f*(*x*) of each feature:

$$y\_{\text{pred}}^{(i)} = y\_{\text{base}} + \sum\_{j=1}^{n} f\left(x\_{ij}\right) \tag{7}$$

where *n* is the number of features. The quantified contribution of feature *j* is calculated through:

$$f\left(x\_{ij}\right) = \sum\_{S \subseteq N\backslash\{j\}} \frac{|S|!\,(M-|S|-1)!}{M!} \left[f\_{x}(S \cup \{j\}) - f\_{x}(S)\right] \tag{8}$$

where *N* is the set containing all *M* features; *S* is a subset of *N* containing |*S*| features; *f*<sub>x</sub>(*S*∪{*j*}) is the prediction calculated with subset *S* together with feature *j*; *f*<sub>x</sub>(*S*) is the prediction calculated with subset *S* alone.
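
In practice, SHAP values for a tree ensemble such as XGBoost can be computed exactly with `shap.TreeExplainer`. The sketch below assumes the `model` and `X` from the earlier examples and also checks the additivity property of Equation (7).

```python
import numpy as np
import shap

# Exact SHAP values for the tree ensemble; one value per sample and feature.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)          # shape: (n_samples, n_features)

# Eq. (7): every prediction equals the baseline plus the sum of its SHAP values.
reconstructed = explainer.expected_value + shap_values.sum(axis=1)
print(np.allclose(model.predict(X), reconstructed, atol=1e-3))
```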

#### 3.3.2. Model Interpretation Using Shapley Additive Explanation

The importance ranking provided by SHAP is shown in Figure 7; it is calculated by summing the absolute SHAP values of each feature over all samples. The feature importance ranking provided by SHAP is similar to that provided by XGBoost, but the two rankings differ in the position of *s*. Figure 7b shows whether the impact of each feature on punching shear resistance is positive or negative: a feature can be regarded as a positive influential factor if the color of the dots changes from blue to red as the SHAP value increases. It can be seen that *d*, *ρ*, *A*, *f'c*, *fy*, and *s* have positive impacts on resistance, whereas *λ* has a negative impact, which is consistent with experimental studies [49–52]. Based on the importance ranking shown in Figure 7, the global impact of each influential factor is revealed, i.e., SHAP explains the global prediction process of XGBoost.

**Figure 7.** Global interpretation of punching shear resistance: (**a**) feature importance sorting; (**b**) SHAP value summary plot.
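
Plots of the kind shown in Figure 7 can be produced with `shap.summary_plot`. The sketch below assumes the `shap_values` computed above; the feature labels and their order are illustrative.

```python
import shap

# Illustrative labels for the seven input variables considered in this study.
feature_names = ["d", "rho", "A", "fc", "fy", "s", "lambda"]

# Mean-|SHAP| bar plot (cf. Figure 7a) and beeswarm summary plot (cf. Figure 7b).
shap.summary_plot(shap_values, X, feature_names=feature_names, plot_type="bar")
shap.summary_plot(shap_values, X, feature_names=feature_names)
```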

Figure 8 provides further insight into the impact of the influential factors in the form of dependence plots, where the secondary (color) axis represents the input variable that interacts most strongly with the variable on the x-axis. According to the variation range of the SHAP values, *d* and *s* have the greatest and the least impact on punching shear resistance, respectively, which is consistent with Figure 7a. Furthermore, the interactions between the input variables are too complicated to be represented by simple linear relationships.

**Figure 8.** Feature dependencies of influential factors: (**a**) *s*; (**b**) *A*; (**c**) *d*; (**d**) *f'c*; (**e**) *fy*; (**f**) *ρ*; (**g**) *λ*.
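
A dependence plot such as Figure 8c can be generated with `shap.dependence_plot`. The snippet below assumes the `shap_values` and `feature_names` from the previous sketches; with `interaction_index="auto"`, SHAP selects the most strongly interacting feature for the color axis.

```python
import shap

# Dependence plot for the slab effective depth d (cf. Figure 8c); the points are
# colored by the feature that SHAP detects as interacting most strongly with d.
shap.dependence_plot("d", shap_values, X, feature_names=feature_names,
                     interaction_index="auto")
```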
