4.4.1. Parameter Setting for Cost-Sensitive Strategy

The chief aim of the cost-sensitive strategy in the CT-XGBoost model is to assign different misclassification costs to samples of different classes. The parameter of the cost-sensitive strategy is the penalty ratio *p*, which is the ratio of the misclassification cost of the default class to that of the non-default class. In Section 3.2.1, we set *p* to $N_n / N_d$, where $N_n$ and $N_d$ are the numbers of non-default and default samples in the training dataset, respectively. The results in Section 4.2 demonstrate that the cost-sensitive strategy in CT-XGBoost is helpful for class-imbalanced credit default prediction. In this section, we investigate how the penalty ratio *p* influences the prediction performance of the CT-XGBoost model. We vary *p* from 1 to 10 in increments of 1, and additionally evaluate *p* = 6.21 (the imbalance ratio of the dataset). The higher *p* is, the larger the misclassification cost assigned to default-class samples. With the parameters of the threshold method fixed to those in Section 3.3.2, the results are shown in Figure 2.
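To make the experimental setup concrete, the following minimal sketch computes the penalty ratio from the class counts, builds the candidate grid described above, and evaluates a cost-weighted cross-entropy. The exact loss of Section 3.2.1 is not reproduced here; the weighted log loss and the function names `penalty_ratio`, `candidate_ratios`, and `cost_weighted_log_loss` are illustrative assumptions, not the CT-XGBoost implementation.

```python
import math


def penalty_ratio(labels):
    """Return p = N_n / N_d for binary labels (1 = default, 0 = non-default)."""
    n_default = sum(1 for y in labels if y == 1)
    n_non_default = len(labels) - n_default
    if n_default == 0:
        raise ValueError("no default samples; the penalty ratio is undefined")
    return n_non_default / n_default


def candidate_ratios(imbalance_ratio, low=1, high=10):
    """Integer grid from low to high (step 1) plus the imbalance ratio, sorted."""
    grid = {float(v) for v in range(low, high + 1)}
    grid.add(float(imbalance_ratio))
    return sorted(grid)


def cost_weighted_log_loss(labels, probs, p, eps=1e-12):
    """Mean cross-entropy in which the default-class term is multiplied by p.

    Assumed form (not taken from the paper):
        -(p * y * log(q) + (1 - y) * log(1 - q))
    where q is the predicted default probability.
    """
    total = 0.0
    for y, q in zip(labels, probs):
        q = min(max(q, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= p * y * math.log(q) + (1 - y) * math.log(1.0 - q)
    return total / len(labels)
```

With *p* = 1 the loss reduces to the ordinary cross-entropy, and larger values of *p* penalize missed default samples more heavily. The standard XGBoost library exposes a comparable class-weighting knob, `scale_pos_weight`, although whether CT-XGBoost uses it is not stated in this section.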

The prediction performance fluctuates across values of the penalty ratio *p*. As *p* increases from 1 to 10, the curves of the four performance metrics change with similar trends: they rise at first and then decline. The results suggest that default prediction performance can improve when higher misclassification costs are assigned to default-class samples; at the same time, excessively high misclassification costs may not benefit the prediction model. This means that an appropriate penalty ratio *p* is important for default prediction. As shown by the dotted line in Figure 2, the prediction performance of the CT-XGBoost model was best when the penalty ratio *p* was set to 6.21, which is the imbalance ratio in the training dataset. Thus, it is crucial to consider the class distribution in the dataset when setting the penalty ratio *p* for the cost-sensitive method.
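A sensitivity analysis of this kind can be organized as a simple sweep over candidate values of *p*. The sketch below is a generic harness under stated assumptions: `evaluate` is a hypothetical user-supplied callable that trains the model with a given penalty ratio and returns a dictionary of metric scores, and `select_by` is whichever metric name that dictionary uses. Neither name comes from the paper.

```python
def sweep_penalty_ratio(candidates, evaluate, select_by):
    """Evaluate every candidate penalty ratio and pick the best one.

    candidates: iterable of penalty ratios p to try.
    evaluate:   hypothetical callable p -> {metric_name: score}, higher is better.
    select_by:  metric name used to rank the candidates.
    Returns (best_p, results), where results maps each p to its metric dict.
    """
    results = {}
    for p in candidates:
        results[p] = evaluate(p)
    if not results:
        raise ValueError("no candidate penalty ratios were provided")
    # Ties are resolved in favor of the candidate evaluated first.
    best_p = max(results, key=lambda p: results[p][select_by])
    return best_p, results
```

Keeping the full `results` mapping, rather than only the winner, makes it possible to plot every metric against *p* and inspect the rise-then-fall trend, instead of relying on a single best value.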

**Figure 2.** Performance of CT-XGBoost with different parameters for the cost-sensitive strategy.
