*Article* **A Hybrid Algorithm-Level Ensemble Model for Imbalanced Credit Default Prediction in the Energy Industry**

**Kui Wang 1, Jie Wan 2,3, Gang Li <sup>4</sup> and Hao Sun 2,3,\***


**Abstract:** Credit default prediction for the energy industry is essential to promoting the healthy development of the energy industry in China. While previous studies have constructed various credit default prediction models with brilliant performance, the class-imbalance problem in the credit default dataset cannot be ignored, where the numbers of credit default cases are usually much smaller than the number of non-default ones. To address the class-imbalance problem, we proposed a novel CT-XGBoost model, which adds to XGBoost with two algorithm-level methods for class imbalance, including the cost-sensitive strategy and threshold method. Based on the credit default dataset consisting of energy corporates in western China, which suffers from the class-imbalance problem, the CT-XGBoost model achieves better performance than the conventional models. The results indicate that the proposed model can efficiently alleviate the inherent class-imbalance problem in the credit default dataset. Moreover, we analyze how the prediction performance is influenced by different parameter settings in the cost-sensitive strategy and threshold method. This study can help market investors and regulators precisely assess the credit risk in the energy industry and provides theoretical guidance to solving the class-imbalance problem in credit default prediction.

**Keywords:** credit default prediction; energy industry; class imbalance; cost-sensitive; threshold method
