*3.2. Solving the Loss Function in the Objective Function*

In the XGBoost model, the objective function (Equation (20)) is difficult to minimize with a traditional stochastic gradient descent algorithm. Instead, an additive training (boosting) strategy is used, in which one new function is learned in each round. The learning and training process is shown below.

$$\begin{cases} \hat{y}\_{i}^{(0)} = 0 \\ \hat{y}\_{i}^{(1)} = f\_{1}(\mathbf{x}\_{i}) = \hat{y}\_{i}^{(0)} + f\_{1}(\mathbf{x}\_{i}) \\ \hat{y}\_{i}^{(2)} = f\_{1}(\mathbf{x}\_{i}) + f\_{2}(\mathbf{x}\_{i}) = \hat{y}\_{i}^{(1)} + f\_{2}(\mathbf{x}\_{i}) \\ \cdots \\ \hat{y}\_{i}^{(t)} = \sum\_{s=1}^{t} f\_{s}(\mathbf{x}\_{i}) = \hat{y}\_{i}^{(t-1)} + f\_{t}(\mathbf{x}\_{i}) \end{cases} \tag{21}$$

where $\hat{y}\_i^{(t)}$ is the predicted value of the model at the $t$-th round, $\hat{y}\_i^{(t-1)}$ is the predicted value at the $(t-1)$-th round, and $f\_t(\mathbf{x}\_i)$ is the prediction function added in the $t$-th round.
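The round-by-round accumulation in Equation (21) can be sketched in a few lines of Python. The learners below are hypothetical stand-ins for the trees $f\_1, f\_2, \dots$; the point is only the running sum $\hat{y}\_i^{(t)} = \hat{y}\_i^{(t-1)} + f\_t(\mathbf{x}\_i)$.

```python
# Sketch of the additive training scheme in Equation (21).
# Each learner f_t here is a hypothetical stand-in for a fitted tree.
def boosted_prediction(x, learners):
    """Accumulate predictions round by round: y_hat^(t) = y_hat^(t-1) + f_t(x)."""
    y_hat = 0.0  # y_hat^(0) = 0
    for f_t in learners:
        y_hat += f_t(x)  # add the t-th round's prediction function
    return y_hat

# Assumed example learners f_1, f_2, f_3 (not from the paper)
learners = [lambda x: 0.5 * x, lambda x: 0.25 * x, lambda x: 0.1]
print(boosted_prediction(2.0, learners))  # 0.5*2 + 0.25*2 + 0.1 = 1.6
```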

Substituting $\hat{y}\_i^{(t)}$ from Equation (21) into Equation (20) gives:

$$Obj^{(t)} = \sum\_{i=1}^{n} l\left(y\_i, \hat{y}\_i^{(t-1)} + f\_t(\mathbf{x}\_i)\right) + \Omega(f\_t) \tag{22}$$

For Equation (22), the goal of each iteration is to find the $f\_t(\mathbf{x}\_i)$ that minimizes the objective function.

During optimization, the XGBoost algorithm applies a second-order Taylor expansion to the objective function:

$$\begin{split} Obj^{(t)} &\simeq \sum\_{i=1}^{n} \left[ l\left(y\_i, \hat{y}\_i^{(t-1)}\right) + g\_i f\_t(\mathbf{x}\_i) + \frac{1}{2} h\_i f\_t^2(\mathbf{x}\_i) \right] + \Omega(f\_t) + c \\ &= \sum\_{i=1}^{n} \left[ g\_i f\_t(\mathbf{x}\_i) + \frac{1}{2} h\_i f\_t^2(\mathbf{x}\_i) \right] + \Omega(f\_t) + \left[ \sum\_{i=1}^{n} l\left(y\_i, \hat{y}\_i^{(t-1)}\right) + c \right] \end{split} \tag{23}$$

where $g\_i$ and $h\_i$ are the first- and second-order derivatives of the loss function with respect to $\hat{y}\_i^{(t-1)}$, defined as:

$$\begin{cases} g\_i = \partial\_{\hat{y}\_i^{(t-1)}} l\left(y\_i, \hat{y}\_i^{(t-1)}\right) \\ h\_i = \partial\_{\hat{y}\_i^{(t-1)}}^2 l\left(y\_i, \hat{y}\_i^{(t-1)}\right) \end{cases} \tag{24}$$
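As a concrete sketch of Equation (24), the derivatives can be computed in closed form for an assumed squared-error loss $l(y, \hat{y}) = (y - \hat{y})^2$ (the paper's $l$ is generic; this loss is chosen only for illustration):

```python
# g_i and h_i from Equation (24) for an assumed squared-error loss
# l(y, y_hat) = (y - y_hat)^2.
def grad_hess_squared_error(y, y_hat_prev):
    g = 2.0 * (y_hat_prev - y)  # first derivative of l w.r.t. y_hat^(t-1)
    h = 2.0                     # second derivative is constant for squared error
    return g, h

g, h = grad_hess_squared_error(y=3.0, y_hat_prev=2.5)
print(g, h)  # -1.0 2.0
```

This is the same interface XGBoost exposes for custom objectives: a function returning the per-sample gradient and Hessian given labels and the previous round's predictions.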

According to Equation (23), ignoring the constant terms, the objective function optimized at step *t* can be simplified as:

$$Obj^{(t)} \simeq \sum\_{i=1}^{n} \left[ g\_i f\_t(\mathbf{x}\_i) + \frac{1}{2} h\_i f\_t^2(\mathbf{x}\_i) \right] + \Omega(f\_t) \tag{25}$$

As can be seen from Equation (25), the optimization of the objective function depends only on $g\_i$ and $h\_i$.
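To see why $g\_i$ and $h\_i$ suffice, note that if $\Omega(f\_t)$ is ignored, each summand $g\_i f + \frac{1}{2} h\_i f^2$ in Equation (25) is a quadratic in $f$ minimized at $f = -g\_i / h\_i$, i.e. a Newton step. The sketch below illustrates this with the squared-error values from the example above (an illustrative simplification, not the paper's full tree-structure optimization):

```python
# Illustrative sketch: ignoring Omega(f_t), each term g_i*f + 0.5*h_i*f^2 in
# Equation (25) is a quadratic in f, minimized at f = -g_i / h_i.
def newton_step(g, h):
    return -g / h

# With the squared-error example (g = -1.0, h = 2.0), the optimal per-sample
# update is 0.5, which moves y_hat^(t-1) = 2.5 exactly to y = 3.0, since the
# loss itself is quadratic.
print(newton_step(-1.0, 2.0))  # 0.5
```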
