2.2.3. XGBoost

XGBoost (XGB) is a scalable, end-to-end tree boosting system that has been widely used in classification, regression, and other machine learning tasks [43]. Building on Equation (1), XGBoost trains the model against a regularized learning objective, which consists of two parts, a training loss term and a regularization term, as given by:

$$Obj = \sum_{i=1}^{N} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k) \tag{2}$$

where $l(\hat{y}_i, y_i)$ is the loss function, which measures the deviation of the predicted value $\hat{y}_i$ from the true value $y_i$; $\Omega(f_k)$ is a regularization term that penalizes the complexity of the model and helps avoid overfitting; and $N$ is the number of samples. A minimal numeric reading of this objective is given by the sketch below.
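As an illustration of Equation (2), the following sketch evaluates the objective under a squared-error loss and the complexity term $\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_j w_j^2$ commonly used by XGBoost; the loss choice, the two toy trees, and all numeric values are assumptions made here for demonstration, not settings from this work.

```python
import numpy as np

def regularized_objective(y_true, y_pred, leaf_weights, gamma=1.0, lam=1.0):
    """Evaluate Eq. (2) for a squared-error loss, with the complexity
    term Omega(f) = gamma * T + (lambda / 2) * sum_j w_j^2."""
    train_loss = np.sum((y_true - y_pred) ** 2)              # sum_i l(y_hat_i, y_i)
    omega = sum(gamma * w.size + 0.5 * lam * np.sum(w ** 2)  # sum_k Omega(f_k)
                for w in leaf_weights)
    return train_loss + omega

# Hypothetical ensemble of two trees, described by their leaf-weight vectors
trees = [np.array([0.4, -0.2]), np.array([0.1, 0.3, -0.1])]
y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.8, 0.2, 0.9])   # ensemble predictions (illustrative)
print(regularized_objective(y_true, y_pred, trees))
```

To minimize this objective, Equation (2) is optimized additively over multiple rounds: in round $t$, a new tree $f_t$ is added to the model. The regularized learning objective of the $t$-th round can be written as follows: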

$$Obj^{(t)} = \sum_{i=1}^{N} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\right) + \Omega(f_t) \tag{3}$$

The regularized learning objective can be approximated by a second-order Taylor expansion:

$$Obj^{(t)} \cong \sum_{i=1}^{N} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(\mathbf{x}_i) + \frac{1}{2} h_i f_t^2(\mathbf{x}_i) \right] + \Omega(f_t) \tag{4}$$

where $g_i = \partial_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is the first-order gradient statistic of the loss function and $h_i = \partial^2_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is the second-order gradient statistic, as illustrated by the sketch below.
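For a concrete case, under the squared-error loss $l(y, \hat{y}) = (y - \hat{y})^2$ (an assumed loss, used here only for illustration), the statistics reduce to $g_i = 2(\hat{y}_i^{(t-1)} - y_i)$ and $h_i = 2$:

```python
import numpy as np

def grad_hess_squared_error(y_true, y_pred_prev):
    """Per-instance gradient statistics of Eq. (4) for
    l(y, y_hat) = (y - y_hat)^2, differentiated w.r.t. the
    round-(t-1) prediction y_pred_prev."""
    g = 2.0 * (y_pred_prev - y_true)   # first-order statistic g_i
    h = np.full_like(y_true, 2.0)      # second-order statistic h_i (constant here)
    return g, h

y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_prev = np.array([0.6, 0.3, 0.8, 0.2])   # hypothetical y_hat^(t-1) values
g, h = grad_hess_squared_error(y_true, y_prev)
print(g, h)
```

Substituting these statistics into Equation (4), dropping the terms that are constant at round $t$, and expanding the regularization term leaf by leaf as $\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$, the regularized learning objective of the $t$-th round becomes: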

$$Obj^{(t)} = \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2}\left(H_j + \lambda\right) w_j^2 \right] + \gamma T \tag{5}$$

where $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$ are the accumulated $g_i$ and $h_i$ over $I_j$, the set of instances assigned to the $j$-th leaf, and $T$ is the number of leaves in the tree. The optimal weight $w_j^*$ of the $j$-th leaf node can be determined as:

$$w_j^* = -\frac{G_j}{H_j + \lambda} \tag{6}$$

and the corresponding optimal value of the objective function $Obj^{(t)}$ is given by:

$$Obj^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \tag{7}$$
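Equations (5)-(7) can be exercised directly: given per-instance statistics and a leaf assignment, the sketch below accumulates $G_j$ and $H_j$, computes the optimal leaf weights of Equation (6), and scores the tree structure with Equation (7). All numeric values and the leaf assignment are hypothetical.

```python
import numpy as np

def leaf_stats(g, h, leaf_index, n_leaves):
    """Accumulate G_j = sum_{i in I_j} g_i and H_j = sum_{i in I_j} h_i."""
    G = np.array([g[leaf_index == j].sum() for j in range(n_leaves)])
    H = np.array([h[leaf_index == j].sum() for j in range(n_leaves)])
    return G, H

def optimal_weights(G, H, lam=1.0):
    """Optimal leaf weights w_j* of Eq. (6)."""
    return -G / (H + lam)

def structure_score(G, H, lam=1.0, gamma=1.0):
    """Optimal objective value of Eq. (7): a quality score for a tree structure."""
    return -0.5 * np.sum(G ** 2 / (H + lam)) + gamma * G.size

# Four instances routed to two leaves: I_0 = {0, 1}, I_1 = {2, 3}
g = np.array([-0.8, 0.6, -0.4, 1.6])   # hypothetical g_i
h = np.array([2.0, 2.0, 2.0, 2.0])     # hypothetical h_i
leaf_index = np.array([0, 0, 1, 1])
G, H = leaf_stats(g, h, leaf_index, n_leaves=2)
print(optimal_weights(G, H))   # per-leaf w_j*
print(structure_score(G, H))   # Obj^(t) at the optimum
```

Because Equation (7) depends only on the tree structure, XGBoost uses it as a scoring function when growing trees, evaluating candidate splits by the reduction in this score.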

The parameter settings of XGBoost are shown in Table 2.

**Table 2.** Parameter settings of XGBoost.
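As a sketch of how such settings are passed to the xgboost Python package's scikit-learn interface, the block below maps $\gamma$ and $\lambda$ from Equations (5)-(7) to their `gamma` and `reg_lambda` arguments; the numeric values are placeholders, not the settings of Table 2.

```python
from xgboost import XGBClassifier

# Placeholder values for illustration only; they are not the settings
# reported in Table 2.
model = XGBClassifier(
    n_estimators=100,    # number of boosting rounds (trees f_t)
    max_depth=6,         # maximum depth of each tree
    learning_rate=0.1,   # shrinkage applied to each new tree
    gamma=0.0,           # gamma of Eq. (5): per-leaf complexity penalty
    reg_lambda=1.0,      # lambda of Eqs. (5)-(7): L2 penalty on leaf weights
)
# model.fit(X_train, y_train)  # X_train, y_train: user-supplied training data
```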

