*3.4. Performance Evaluation for Credit Default Prediction*

To assess the out-of-sample performance of the credit default prediction models, we adopted the splitting method used in previous research [5]. We randomly divided the dataset into a training set and a test set in an 80% to 20% ratio. Because the dataset is class-imbalanced, with non-default samples forming the majority, we used stratified sampling for the split to ensure that the training and test data share the same population structure.
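The stratified 80/20 split described above can be sketched with scikit-learn's `train_test_split`; the feature matrix `X` and binary default labels `y` below are synthetic placeholders, not the paper's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 1000 firms, 5 financial features,
# with roughly 10% defaults to mimic the class imbalance.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.1).astype(int)  # 1 = default, 0 = non-default

# stratify=y preserves the default/non-default ratio in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```

With `stratify=y`, the default rate in the training and test sets matches the overall rate up to rounding, which is exactly the "same population structure" requirement.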

Because credit default prediction is a binary classification problem, we can evaluate out-of-sample performance with standard classification metrics. One common metric is *overall accuracy,* defined as follows:

$$\text{Overall accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{12}$$

where *TP* (true positive) is the number of default companies correctly classified as default; *FN* (false negative) is the number of default companies wrongly classified as non-default; *TN* (true negative) is the number of non-default companies correctly classified as non-default; and *FP* (false positive) is the number of non-default companies wrongly classified as default.

Given the class-imbalance problem in credit default prediction, the prediction performance for each of the two classes needs to be evaluated separately. For this purpose, we also consider *type I accuracy* and *type II accuracy*. *Type I accuracy* (or sensitivity) is the proportion of default samples correctly predicted by the model, and *type II accuracy* (or specificity) is the proportion of non-default samples correctly predicted by the model.

$$\text{Type I accuracy} = \frac{TP}{TP + FN} \tag{13}$$

$$\text{Type II accuracy} = \frac{TN}{TN + FP} \tag{14}$$
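Equations (12)–(14) can be computed directly from a confusion matrix. A minimal sketch using scikit-learn follows; the label vectors are illustrative, not results from the paper's dataset.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative labels: 1 = default, 0 = non-default.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0])

# With labels=[0, 1], ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

overall_accuracy = (tp + tn) / (tp + fn + fp + tn)  # Eq. (12)
type_i_accuracy = tp / (tp + fn)                    # Eq. (13), sensitivity
type_ii_accuracy = tn / (tn + fp)                   # Eq. (14), specificity
```

On these toy labels, 2 of 3 defaults and 6 of 7 non-defaults are classified correctly, so type I and type II accuracy differ noticeably even though overall accuracy looks high, which is exactly why the two class-wise metrics are reported separately under class imbalance.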

Moreover, the area under the receiver operating characteristic (ROC) curve, known as the AUC, is a popular measure of a classification model's overall performance [5]. The ROC curve is a two-dimensional plot in which one axis is the true positive rate (sensitivity) and the other axis is the false positive rate (1 − specificity). As the default probability threshold varies, the curve traces the resulting pairs of true positive rate and false positive rate. Because the AUC is the area of a region within the unit square, its value always ranges from 0 to 1 [37]. In addition, the AUC should exceed 0.5 for the model to outperform random guessing, and the closer it is to 1, the better the prediction performance of the default prediction model.
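The threshold sweep and the resulting AUC can be sketched with scikit-learn's `roc_curve` and `roc_auc_score`; the predicted default probabilities below are illustrative values, not model outputs from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Illustrative true labels (1 = default) and predicted default probabilities.
y_true = np.array([1, 1, 0, 0, 0, 0])
p_default = np.array([0.9, 0.4, 0.6, 0.3, 0.2, 0.1])

# roc_curve sweeps the probability threshold and records the
# (false positive rate, true positive rate) point at each step.
fpr, tpr, thresholds = roc_curve(y_true, p_default)

# roc_auc_score integrates the curve to a single number in [0, 1].
auc = roc_auc_score(y_true, p_default)
```

The AUC equals the probability that a randomly chosen default receives a higher predicted probability than a randomly chosen non-default, so a value of 0.5 corresponds to random ranking and values near 1 to near-perfect separation.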

#### **4. Empirical Results**
