3.2.2. Threshold Method

Because default prediction is essentially a binary classification problem, a threshold must be set to decide which class each predicted default probability falls into. Corporates with default probabilities higher than the threshold are assigned to the default class, and those with default probabilities lower than the threshold are assigned to the non-default class.

However, most previous prediction methods simply set the threshold to 0.5, which is not suitable for imbalanced data [42]. For instance, if the default probabilities generated by the prediction model are uniformly distributed on [0, 1] and the threshold is set to 0.5, half of the samples will be classified as default, so many non-default samples are misclassified when actual defaults are rare. Thus, setting a rational threshold is an important problem in default prediction.
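The effect of a fixed 0.5 threshold can be illustrated with a small simulation. The following is a minimal sketch and not part of the proposed model: the sample size, the 5% default rate, and the variable names are hypothetical choices made only for illustration, and the scores are drawn uniformly on [0, 1] to mirror the example above.

```python
import random

# Hypothetical setup: 10,000 corporates with a 5% default rate (both values
# are assumptions for illustration, not figures from the paper).
rng = random.Random(0)
n_samples = 10_000
default_rate = 0.05

# True labels: True marks a default corporate.
labels = [rng.random() < default_rate for _ in range(n_samples)]

# Predicted default probabilities drawn uniformly on [0, 1],
# mirroring the example in the text.
probs = [rng.random() for _ in range(n_samples)]

# Classify with the conventional fixed threshold of 0.5.
threshold = 0.5
predicted_default = [p > threshold for p in probs]

# Share of all samples flagged as default (close to one half).
flagged_share = sum(predicted_default) / n_samples

# Number of actual defaults versus non-default samples flagged as default.
actual_defaults = sum(labels)
false_alarms = sum(pred and not y for pred, y in zip(predicted_default, labels))

print(f"share flagged as default: {flagged_share:.3f}")
print(f"actual defaults: {actual_defaults}, false alarms: {false_alarms}")
```

Under these assumptions, roughly half of the samples are flagged as default even though only about one in twenty actually defaults, so the misclassified non-default samples far outnumber the true defaults.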

In the CT-XGBoost model, we set the threshold equal to the *Nd*-th highest default probability in the training dataset. After the default probabilities of the testing dataset are predicted, corporates with default probabilities higher than the threshold are classified as default corporates, and those with default probabilities lower than the threshold are classified as non-default corporates.
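The threshold rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the toy data are hypothetical, *Nd* is assumed here to be the number of default corporates in the training dataset, and a probability exactly equal to the threshold is assigned to the default class, a case the description leaves unspecified.

```python
def nd_th_highest_threshold(train_probs, n_d):
    """Return the n_d-th highest predicted default probability.

    Hypothetical helper: n_d is assumed to be the number of default
    corporates in the training dataset.
    """
    if not 1 <= n_d <= len(train_probs):
        raise ValueError("n_d must be between 1 and the number of training samples")
    # Sort in descending order and take the n_d-th entry (1-based).
    return sorted(train_probs, reverse=True)[n_d - 1]


def classify(test_probs, threshold):
    """Label each probability: 1 for default, 0 for non-default.

    Assumption: ties with the threshold are assigned to the default class.
    """
    return [1 if p >= threshold else 0 for p in test_probs]


# Toy training data (hypothetical): predicted probabilities and true labels.
train_probs = [0.91, 0.15, 0.40, 0.78, 0.05, 0.33, 0.62, 0.08]
train_labels = [1, 0, 0, 1, 0, 0, 0, 0]

# Assumed meaning of Nd: count of default corporates in the training set.
n_d = sum(train_labels)

threshold = nd_th_highest_threshold(train_probs, n_d)

# Apply the training-derived threshold to predicted test probabilities.
test_probs = [0.80, 0.50, 0.95, 0.10]
test_pred = classify(test_probs, threshold)

print(threshold, test_pred)
```

With this choice, the share of training samples at or above the threshold matches the assumed share of default corporates in the training dataset, so the threshold adapts to the class imbalance instead of being fixed at 0.5.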
