*3.3. Benchmark Prediction Models*

To evaluate the performance of our proposed model, we compared its default prediction ability with that of other models widely used in the literature. Specifically, we constructed a statistical model (logistic regression) and intelligent techniques, including a support vector machine and a neural network. In addition, two ensemble models, random forest and XGBoost [42], were constructed as benchmark models. The following subsections briefly introduce these benchmark models, except XGBoost, which was explained in Section 3.1.

#### 3.3.1. Logistic Regression

Logistic regression is one of the most popular models for credit default prediction because of its simplicity and interpretability [3]. It overcomes a limitation of the linear regression model, which assumes a continuous dependent (explained) variable with normally distributed errors, assumptions that a binary failure indicator cannot meet. In failure prediction, logistic regression estimates the probability of corporate failure from the explanatory variables. The model can be expressed as follows:

$$P(Y=1 \mid X) = \frac{e^{\beta_0 + \beta_1^{\top} X}}{1 + e^{\beta_0 + \beta_1^{\top} X}} \tag{11}$$

where *X* is the vector of explanatory variables, *Y* is the corporate failure indicator (*Y* = 1 for a failed firm), *β*<sub>1</sub> is the vector of coefficients, and *β*<sub>0</sub> is the intercept. The parameters *β*<sub>0</sub> and *β*<sub>1</sub> are estimated by maximum likelihood. A firm is then classified as failing when its estimated failure probability exceeds a chosen threshold, and each coefficient measures the effect of its variable on the log-odds of failure, which makes the variables directly interpretable. To prevent overfitting, we apply *ℓ*<sub>1</sub> and *ℓ*<sub>2</sub> regularization.
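To make Equation (11), the threshold rule, and the regularized maximum-likelihood fit concrete, the following is a minimal dependency-free sketch. The section does not state how the two penalties are combined or which solver is used, so the elastic-net-style objective (weights `lam1`, `lam2`), the proximal gradient solver, the learning rate, the 0.5 threshold, the toy data, and all function names are illustrative assumptions, not the authors' implementation. In practice a library solver would normally be used; scikit-learn's `LogisticRegression` with `penalty="elasticnet"` and `solver="saga"`, for example, fits a comparable objective.

```python
import math


def failure_probability(x, beta0, beta1):
    """Equation (11): P(Y = 1 | X = x) for one firm's feature vector x."""
    z = beta0 + sum(b * xi for b, xi in zip(beta1, x))
    # Numerically stable evaluation of e^z / (1 + e^z)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    """Proximal operator of t * |v|; this step can set coefficients exactly to zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_penalized_logit(X, y, lam1=0.01, lam2=0.01, lr=0.1, n_iter=2000):
    """Assumed objective: mean NLL + lam1 * ||beta1||_1 + (lam2 / 2) * ||beta1||_2^2,
    minimised by proximal gradient descent. The intercept beta0 is not penalised."""
    n, p = len(X), len(X[0])
    beta0, beta1 = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean negative log-likelihood: sum_i (p_i - y_i) * x_i / n
        g0, g1 = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = failure_probability(xi, beta0, beta1) - yi
            g0 += r
            for j in range(p):
                g1[j] += r * xi[j]
        beta0 -= lr * g0 / n
        for j in range(p):
            # Gradient step on the smooth part (likelihood + l2), then the l1 prox step
            step = beta1[j] - lr * (g1[j] / n + lam2 * beta1[j])
            beta1[j] = soft_threshold(step, lr * lam1)
    return beta0, beta1


def predict_failure(x, beta0, beta1, threshold=0.5):
    """Classify as failure (1) when the estimated probability reaches the threshold."""
    return int(failure_probability(x, beta0, beta1) >= threshold)


if __name__ == "__main__":
    # Hypothetical toy data: two financial ratios per firm, y = 1 marks a failed firm
    X = [[0.1, 1.0], [0.2, 0.8], [0.3, 0.9], [0.8, 0.2], [0.9, 0.1], [0.7, 0.3]]
    y = [0, 0, 0, 1, 1, 1]
    b0, b1 = fit_penalized_logit(X, y, lam1=0.001, lam2=0.001, lr=1.0, n_iter=5000)
    print("beta0 =", b0, "beta1 =", b1)
    print("predictions:", [predict_failure(x, b0, b1) for x in X])
```

The signs of the fitted `beta1` entries can be read as the direction in which each ratio moves the log-odds of failure, and raising `lam1` pushes more coefficients to exactly zero, which is the variable-selection effect of the *ℓ*<sub>1</sub> term.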
