4.4.2. Parameter Setting for Threshold Method

The role of the threshold in CT-XGBoost is to divide samples into two groups according to their predicted default probabilities. A sample is classified as default when its predicted default probability is higher than the threshold, and as non-default otherwise. As mentioned in Section 3.2.2, we set the threshold to the predicted default probability of the *Nd*-th sample in the training dataset. In practice, the threshold is a useful tool for controlling credit risk: creditors, such as banks, can control the number of debtors by adjusting the threshold used to decide whether to approve a loan. A higher threshold means that more applicants are classified as non-default and approved for a loan, and the creditor consequently faces higher credit risk. It is therefore important to investigate how the threshold setting influences default prediction performance. In this study, we varied the threshold over the predicted probabilities of the samples in the training dataset. With the penalty ratio *p* fixed at its optimal value of 6.21, the results are presented in Figure 3.
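The threshold sweep described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that labels are coded as 1 for default and 0 for non-default, that *type I accuracy* is the share of true defaults classified as default, and that *type II accuracy* is the share of true non-defaults classified as non-default, which is how the text uses the terms. The function names and the toy data are hypothetical.

```python
import numpy as np

def accuracies_at_threshold(y_true, prob_default, threshold):
    """Return (type I, type II, overall) accuracy for one threshold.

    Assumed coding: y_true is 1 for default and 0 for non-default.
    A sample is predicted as default when prob_default > threshold.
    Both classes are assumed to be present in y_true.
    """
    y_true = np.asarray(y_true)
    pred = (np.asarray(prob_default, dtype=float) > threshold).astype(int)
    defaults = y_true == 1
    type_i = np.mean(pred[defaults] == 1)    # defaults correctly flagged
    type_ii = np.mean(pred[~defaults] == 0)  # non-defaults correctly cleared
    overall = np.mean(pred == y_true)
    return type_i, type_ii, overall

def sweep_thresholds(y_true, prob_default):
    """Use every distinct predicted probability as a candidate threshold.

    Returns an array with columns: threshold, type I, type II, overall.
    """
    candidates = np.unique(prob_default)  # sorted in ascending order
    rows = [(t, *accuracies_at_threshold(y_true, prob_default, t))
            for t in candidates]
    return np.array(rows)

# Toy data for illustration only (not from the paper).
y_toy = [1, 1, 0, 0, 0]
p_toy = [0.9, 0.6, 0.7, 0.2, 0.1]
table = sweep_thresholds(y_toy, p_toy)
```

Plotting the three accuracy columns of `table` against the threshold column would give curves of the kind shown in Figure 3.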

As the threshold increases from 0 to 1, the *type I accuracy* curve trends downward, while the *type II accuracy* and *overall accuracy* curves show similar upward trends. These results demonstrate that prediction performance can be significantly influenced by the threshold setting. With a lower threshold, more potential credit defaults are identified, but more true non-default cases are misclassified as default. In addition, the three curves in Figure 3 intersect at a threshold of 0.74, where default and non-default samples are identified equally accurately. When the threshold increases beyond 0.74, the *type I accuracy* decreases rapidly, while the *type II accuracy* increases only slightly. Thus, it is appropriate to set the threshold to around 0.74 in this case.
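One way to locate such an intersection point numerically is to search the candidate thresholds for the smallest gap between *type I* and *type II accuracy*. The sketch below is illustrative only and rests on assumptions: labels are coded 1 for default and 0 for non-default, both classes are present, and the function name `balanced_threshold` and the toy data are hypothetical.

```python
import numpy as np

def balanced_threshold(y_true, prob_default):
    """Return (threshold, gap) minimizing |type I - type II| accuracy.

    Candidate thresholds are the distinct predicted probabilities.
    A sample is predicted as default when prob_default > threshold.
    """
    y_true = np.asarray(y_true)
    prob_default = np.asarray(prob_default, dtype=float)
    defaults = y_true == 1
    best_t, best_gap = None, np.inf
    for t in np.unique(prob_default):
        pred_default = prob_default > t
        type_i = pred_default[defaults].mean()       # defaults caught
        type_ii = (~pred_default[~defaults]).mean()  # non-defaults cleared
        gap = abs(type_i - type_ii)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t, best_gap

# Toy data for illustration only (not from the paper).
y_toy = [1, 1, 1, 0, 0, 0]
p_toy = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
t_star, gap_star = balanced_threshold(y_toy, p_toy)
```

On real data the two curves rarely cross exactly at a candidate value, so the returned gap indicates how close the chosen threshold is to a true intersection.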

Moreover, the creditor can choose the threshold according to its credit risk tolerance. A creditor with low risk tolerance can set the threshold low to obtain a high *type I accuracy*, which means that the majority of potential default applicants are identified. However, a low threshold can also lead to a low *type II accuracy*, which means that a large number of risk-free clients would be turned away. To avoid losing substantial benefits, and assuming that the creditor can tolerate about 10% of default cases, the proper threshold is about 0.74, at which about 89% of potential default applicants are accurately identified. At the same time, the *type II accuracy* is also about 89%, which means that the creditor would lose only about 11% of risk-free clients.
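The tolerance-based choice can be sketched as a simple search: take the highest candidate threshold whose *type I accuracy* still meets a minimum set by the creditor, since a higher threshold turns away fewer risk-free clients. This is a hypothetical illustration, not the procedure used in the paper. It assumes labels coded 1 for default and 0 for non-default, with both classes present. The name `threshold_for_tolerance`, the parameter `min_type_i`, and the toy data are assumptions.

```python
import numpy as np

def threshold_for_tolerance(y_true, prob_default, min_type_i):
    """Return (threshold, type I, type II) for the highest threshold
    whose type I accuracy is at least min_type_i, or None if no
    candidate threshold qualifies.
    """
    y_true = np.asarray(y_true)
    prob_default = np.asarray(prob_default, dtype=float)
    defaults = y_true == 1
    chosen = None
    for t in np.unique(prob_default):  # candidates in ascending order
        pred_default = prob_default > t
        type_i = pred_default[defaults].mean()
        if type_i < min_type_i:
            break  # type I accuracy cannot rise again as t increases
        type_ii = (~pred_default[~defaults]).mean()
        chosen = (t, type_i, type_ii)
    return chosen

# Toy data for illustration only (not from the paper).
y_toy = [1, 1, 1, 0, 0, 0]
p_toy = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
strict = threshold_for_tolerance(y_toy, p_toy, min_type_i=1.0)
relaxed = threshold_for_tolerance(y_toy, p_toy, min_type_i=0.6)
```

In this toy example the stricter requirement selects a lower threshold and a lower *type II accuracy* than the relaxed one, which mirrors the trade-off discussed above.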
