Solar photovoltaic (PV) electricity generation is growing rapidly in China. Accurate estimation of solar energy resource potential (
Rs) is crucial for siting, designing, evaluating and optimizing PV systems. Seven types of tree-based ensemble models, including classification and regression trees (CART),
[...] Read more.
Solar photovoltaic (PV) electricity generation is growing rapidly in China. Accurate estimation of solar energy resource potential (
Rs) is crucial for siting, designing, evaluating and optimizing PV systems. Seven types of tree-based ensemble models, including classification and regression trees (CART), extremely randomized trees (ET), random forest (RF), gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), gradient boosting with categorical features support (CatBoost) and light gradient boosting method (LightGBM), as well as the multi-layer perceotron (MLP) and support vector machine (SVM), were applied to estimate
Rs using a k-fold cross-validation method. The three newly developed models (CatBoost, LighGBM, XGBoost) and GBDT model generally outperformed the other five models with satisfactory accuracy (R
2 ranging from 0.893–0.916, RMSE ranging from 1.943–2.195 MJm
−2d
−1, and MAE ranging from 1.457–1.646 MJm
−2d
−1 on average) and provided acceptable model stability (increasing the percentage in testing RMSE over training RMSE from 8.3% to 31.9%) under seven input combinations. In addition, the CatBoost (12.3 s), LightGBM (13.9 s), XGBoost (20.5 s) and GBDT (16.8 s) exhibited satisfactory computational efficiency compared with the MLP (132.1 s) and SVM (256.8 s). Comprehensively considering the model accuracy, stability and computational time, the newly developed tree-based models (CatBoost, LighGBM, XGBoost) and commonly used GBDT model were recommended for modeling
Rs in contrasting climates of China and possibly similar climatic zones elsewhere around the world. This study evaluated three newly developed tree-based ensemble models of estimating
Rs in various climates of China, from model accuracy, model stability and computational efficiency, which provides a new look at indicators of evaluating machine learning methods.
Full article