Article

Forecasting the Return of Carbon Price in the Chinese Market Based on an Improved Stacking Ensemble Algorithm

1
School of Management, University of Science and Technology of China (USTC), Jinzhai Road, Hefei 230026, China
2
New Finance Research Center, International Institute of Finance, University of Science and Technology of China (USTC), Guangxi Road, Hefei 230026, China
*
Authors to whom correspondence should be addressed.
Energies 2023, 16(11), 4520; https://doi.org/10.3390/en16114520
Submission received: 9 April 2023 / Revised: 24 May 2023 / Accepted: 29 May 2023 / Published: 4 June 2023

Abstract

Recently, carbon price forecasting has become critical for financial markets and environmental protection. Carbon prices are difficult to predict because of their dynamic, nonlinear, and highly noisy characteristics. Stacking ensemble algorithms are widely used in machine learning forecasting; however, common stacking has notable limitations when applied to time series data because its cross-validation step disrupts the temporal sequentiality of the samples. We propose an improved stacking ensemble algorithm that replaces cross-validation with walk-forward validation under a double sliding window scheme, thereby avoiding overfitting while preserving temporal order. For the empirical experiment, we designed two dynamic forecasting frameworks based on the improved algorithm, incorporating forecasting models from different domains as base learners. Three popular machine learning models served as meta-models to aggregate the predictions of the base learners, further narrowing the gap between the final predictions and the observations. Using the return of carbon prices from the Shenzhen carbon market in China as the prediction target, we verified the enhanced accuracy of the modified stacking algorithm with five statistical metrics and the model confidence set (MCS). We also constructed a portfolio to examine the practical usefulness of the improved algorithm. Empirical results show that the improved stacking algorithm significantly and robustly improves prediction accuracy, and that support vector regression (SVR) aggregates results better than the other two meta-models (random forest and XGBoost). The modified algorithm performs differently under different volatility states, and we find that aggressive investment strategies can help investors achieve higher returns with carbon option assets.

1. Introduction

In recent years, with the aggravation of global warming, carbon dioxide emissions have attracted widespread attention, and governments worldwide have implemented numerous mitigation tools to address this challenge [1]. China, the world's second largest economy and largest carbon emitter, therefore has a strong need to accurately predict the trend of its carbon returns and grasp the fluctuation characteristics of its carbon markets [2]. More specifically, accurate carbon return forecasts provide a scientific basis for investors and regulators in the carbon market, reduce market risks, and effectively promote the healthy development of the carbon financial market [3]. Accurate prediction of carbon returns is therefore a priority in both academic research and practical applications, from the perspectives of guiding investment and protecting the environment.
Multi-model integration improves performance by combining the results of multiple models. With the great popularity of machine learning in the forecasting field, ensemble learning, the most frequently used multi-model integration technique in machine learning applications, has attracted increasing attention. Bagging, boosting, and stacking are the three classic ensemble learning algorithms and form the foundation for a series of derived algorithms. Bagging and boosting change the way training data are fed to the model in order to construct a better model, while stacking uses cross-validation to collect and integrate the results of different models, focusing on exploiting the diversity of models in capturing sample information. However, although the cross-validation step of the common stacking algorithm helps to avoid overfitting, it also disrupts the temporal sequentiality of the samples. Some improvements are therefore required before the stacking algorithm can be adopted for time series prediction.

1.1. Literature Review

To clearly present the progress of related work, the literature review is divided into three sub-sections. The first provides a detailed overview of current work on carbon market prediction and its limitations. The second introduces the development of ensemble learning, especially stacking algorithms. Finally, we summarize the feasibility of combining carbon price prediction with the stacking algorithm based on the reviewed literature.

1.1.1. Progress in Carbon Market Prediction

Predicting carbon returns (in this study, the carbon return refers to the log return of carbon prices, calculated as in Equation (29)) is complex and challenging. Fan et al. [4] enumerated the chaotic characteristics of the carbon market and concluded that carbon prices are dynamically nonlinear, non-stationary, and abundantly noisy. Many scholars have attempted to obtain more accurate predictions in different ways. The first approach is to elaborate predictive model frameworks that boost performance based on historical data, which is regarded as the standard modeling paradigm in carbon market prediction studies; such frameworks focus on fitting the dynamic characteristics of the carbon price itself more accurately. According to our survey, the predictive models for carbon market prediction fall into three categories: traditional statistical or econometric methods, single-model machine learning methods, and hybrid models based on decomposition and integration.
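Equation (29) is not reproduced in this excerpt, but the log return the study refers to is the standard definition $r_t = \ln(P_t/P_{t-1})$. A minimal sketch in Python (the price values are illustrative, not market data):

```python
import numpy as np

def log_returns(prices):
    """Log return r_t = ln(P_t / P_{t-1}) for a price series."""
    prices = np.asarray(prices, dtype=float)
    return np.log(prices[1:] / prices[:-1])

# Example: a short synthetic price series
r = log_returns([30.0, 31.5, 30.9])
```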
Traditional statistical and econometric methods, such as ARIMA [5], GARCH-type models [6,7], and HAR-RV [8], are simple, effective, and lightweight, but they can only capture linear dynamics and place strong requirements on the data distribution. With the increasing popularity of artificial intelligence in recent years, many scholars have applied machine learning models to estimate the variability of the carbon market, and there is growing evidence that machine learning methods outperform other models for nonlinear time series prediction [9]. Compared with traditional models, machine learning-based predictive models, such as the Support Vector Machine (SVM) [10], Artificial Neural Network (ANN) [11], Convolutional Neural Network (CNN), and Long Short-Term Memory network (LSTM) [12], achieve greater forecasting accuracy when applied to carbon market prediction. Their shortcoming is that the results depend heavily on parameter tuning, which can lead to overfitting the nonlinearity of the data. These weaknesses were long regarded as inherent and difficult to overcome, until Ji et al. [12] combined machine learning and traditional econometric methods within the same forecasting framework to promote their strengths and palliate their weaknesses.
As for hybrid models, most follow three steps: time series decomposition, separate forecasting, and result aggregation. The choice of method for each step varies across studies, but all aim to bring competitive accuracy to carbon market prediction. Advanced signal decomposition methods, such as EMD [13], VMD [14], and CEEMDAN [3], have been introduced to decompose the original time series into several independent series with simple patterns, after which different prediction methods are applied to the decomposed sequences separately. Some researchers have sought more suitable prediction models for the second step; for example, Qin et al. [15] innovatively adopted Local Polynomial Prediction (LPP), and Sun and Duan [16] used an improved Extreme Learning Machine (ELM). In the third step, the final result is obtained by integrating the predicted results through different strategies [17,18]. According to current studies, hybrid models produce the most stable and accurate predictions. However, they carry the potential risk of losing essential information or introducing overwhelming noise during decomposition, and a satisfactory solution to the overfitting problem has yet to be proposed.
Additionally, Fan et al. [4] pointed out that the carbon market is affected by the market mechanism, climate agreements, climate change, the economic situation, and other factors, showing unstable and fluctuating trends. Despite this, most of the current carbon prediction literature pays little attention to multi-variable prediction, ignoring the potential predictive power of related variables [19]. Introducing multi-variable techniques into carbon return prediction thus has great, largely unexplored potential. Current multivariate prediction of the carbon market mostly involves energy commodities [20,21]. A notable exception is Tan et al. [19], who comprehensively assessed the predictive power of 53 commodity and financial predictors for European carbon futures returns. These works consider carbon-related variables, but the prediction models they chose still leave room for improvement.
According to previous studies [22,23], returns and volatility are closely related, and a considerable amount of literature focuses on the volatility of the carbon market. Benz and Trück [24] constructed a stochastic model using Markov-switching and AR-GARCH models and performed in-sample and out-of-sample predictive analyses. Their model captures features such as skewness and excess kurtosis in carbon price volatility and, in particular, distinguishes different stages of volatility in returns. Byun and Cho [7] explored the ability of GARCH-type models, implied volatility, and the k-nearest neighbour method to predict carbon price volatility and showed empirically that GARCH-type models perform best. Segnon et al. [6] reviewed price volatility models, ranging from simple GARCH-type models to recently popular volatility models with long-term dependence and state transitions. For market investors, carbon returns better reflect the profits generated by carbon assets; thus, how to use volatility information to uncover potential carbon returns is an unstudied but attractive direction.

1.1.2. Development of Ensemble Learning

The core idea of ensemble learning is to aggregate multiple base learners into a strong learner with superior generalization performance by combining strategies. Dasarathy and Sheela [25] proposed a composite classifier system consisting of two or more component classifiers of different types, which is widely recognized as the origin of ensemble learning. Schapire [26] proposed the boosting algorithm, which converts a weak learner into a strong learner. The stacking algorithm was proposed by Wolpert [27], in which the core idea is to aggregate the results of multiple base models through a complex level-2 model. Breiman [28] proposed the bagging algorithm, which aggregates the results of various models trained by subsamples. These three classic ensemble algorithms laid a solid foundation for the development of ensemble learning in the future.
In general, there are three main differences between ensemble algorithms: the process to feed training data, the ways to generate individual learners, and the combination strategies. These three aspects also represent the directions in which ensemble learning researchers can innovate and improve the algorithm. For the stacking algorithm, the main innovations of the previous studies concentrated on the selection and generation of base learners, the optimization of combination strategies, and the extension of applications.
Impressive work on base learners for stacking ensemble algorithms includes the following: Ding and Wu [29] used an artificial bee colony algorithm to construct the base learners, and their improved stacking ensemble algorithm performed well on multiple datasets, demonstrating the value of introducing the bee colony algorithm. Bakurov et al. [30] chose four predictive models of different types as primary models to maintain diversity and improve generalization performance. Agarwal and Chowdary [31] proposed an improved stacking algorithm called A-stacking, which clusters the training set and then selects the results of the best base learner in each cluster as the input to the level-2 meta-learner.
Many researchers have introduced different combination strategies to optimize the common stacking algorithm. Varshini et al. [32] used generalized linear models, decision trees, Support Vector Machines, and Random Forests as meta-learners for the combination step. Lacy et al. [33] compared the differences between using linear and nonlinear models as the meta-learner. Menahem et al. [34] proposed an improved stacking model called “Troika”, and their main work was to add a third layer to further aggregate the results of the meta-learner. Pari et al. [35] added a middle layer to combine the results of base learners, and then used the combined results as input for the meta-learner.
Due to its excellent performance, stacking has been applied in various fields, including computer science [36], medicine [37], engineering [38], and finance [39]. However, the application of the stacking algorithm for predicting carbon market changes has not attracted much attention, even though it has a wide field of application with good prospects.

1.1.3. Literature Review Summary

Firstly, current research on carbon market forecasting is aimed at finding the best model to measure and predict the dynamic changes in the carbon market. However, as mentioned above, it is difficult to break through the inherent limitations of a specific forecast model, whether for machine learning models or statistical models. The hybrid model can, to some extent, take advantage of multiple models, but its potential risks mentioned above cannot be ignored. Stacking is a popular ensemble algorithm in machine learning, which mitigates overfitting while integrating multiple models. Compared to the hybrid model, the stacking algorithm has no decomposition step in the integration process, so there is no risk of information leakage or noise introduction. The idea of introducing stacking is a good attempt to overcome the drawbacks of hybrid models, but considering the problem of cross-validation failure on time series data prediction, some improvements are necessary.
Secondly, current innovative work on the common stacking algorithm focuses on the generation of base learners and the improvement of combination strategies, but few studies have improved the cross-validation step, which is necessary for applying stacking to time series prediction and is of great significance for extending its application area.
Finally, the multiple variables related to carbon prices and hidden information in carbon price volatility have not received much attention in carbon market forecasting, which would help a lot in building more reliable and systematic forecasts.

1.2. Objectives and Contributions

In this paper, we focus on several core issues, which are reflected in the following three questions:
  • How can the common stacking algorithm be modified to maintain the sequentiality of time series training data while improving predictive power?
  • How can the improved stacking ensemble algorithm be applied to carbon return prediction?
  • How can the improved algorithm's practical power be evaluated, and what investment guidance can be drawn from the results?
For the first question, we carefully designed a double sliding window scheme to replace the cross-validation scheme of the common stacking algorithm. The improved stacking ensemble algorithm uses a sliding window in both the base learner training phase and the meta-learner aggregation phase. This improvement ensures that the time series samples remain in chronological order while, similar to cross-validation, the dataset is divided and fed to the base learners, thus effectively avoiding overfitting.
For the second question, we designed two forecasting frameworks based on the improved stacking algorithm for the empirical experiment. One is a homogeneous ensemble framework that combines model selection methods based on factor-augmented regression with Random Forest (RF), Support Vector Regression (SVR), and eXtreme gradient boosting (XGBoost) in machine learning. The other is a heterogeneous ensemble framework, which incorporates seven forecasting models as base learners: MMA (Mallows Model Average), LASSO regression, Ridge regression, E-net regression, Random Forest, Support Vector Regression, and XGBoost. The three popular aggregation models of machine learning are used as the meta-learners of the framework. With 34 carbon price-related variables as features and the return of carbon price as the target variable, we constructed six ensemble models based on the improved stacking algorithm.
For the final question, we introduced a new perspective for examining the performance of the improved algorithm by dividing the carbon return series into turmoil and tranquil states, making the evaluation more practically valid. To this end, we adopted the SWARCH (Markov-switching GARCH) model to divide the carbon market into "high" and "low" volatility states, and the subsequent assessments were split accordingly. Five statistical accuracy metrics and the model confidence set (MCS) were used to evaluate the accuracy gains of the improved algorithm. In addition, we constructed a systematic portfolio experiment to verify the economic impact of the improved stacking algorithm under different volatility states. To the best of our knowledge, this study is the first to apply a carbon return prediction model to practical use and examine it under different volatility levels.
The main contributions of this paper can be briefly summarized as follows:
  • This study innovatively improves the common stacking algorithm for better application to time series forecasting, and the results show that the modified algorithm can significantly improve the accuracy and increase the economic gain;
  • The two ensemble forecasting frameworks we constructed are robust and accurate for predicting carbon price return;
  • We are the first to explore the predictability of carbon option returns using the stacking ensemble algorithm from both statistical and economic perspectives, and the characteristics of carbon assets we identify are enlightening for relevant practitioners and academics;
  • We linked the carbon return forecast with the volatility of the carbon market, opening up a new perspective to capture the variations of predictability of returns under different market conditions.
The rest of the paper is organized as follows. Section 2 includes the innovative work of this study and introduces the algorithm, the forecasting models involved, and the evaluation criteria. Section 3 provides a brief introduction and exploratory analysis of the data. Section 4 presents the empirical results, including accuracy metrics results, MCS analysis, and portfolio results. The research conclusions and future work are discussed in Section 5.

2. Methodology and Models

2.1. Improvement on the Stacking Ensemble Algorithm

2.1.1. The Common Stacking Algorithm

The core idea of the stacking ensemble algorithm is to divide the training process into two layers (levels). First, multiple individual learners (called base learners) are trained in the first layer (level-1) on the original dataset. Next, the outputs of the base learners are used as input features for the second-layer (level-2) learner, called the meta-learner, which performs the aggregation. The stacking algorithm uses a complex model to aggregate results instead of the simple averaging or voting strategies used in most ensemble learning algorithms, further reducing bias and variance and improving generalizability. Figure 1 shows the workflow of the common stacking ensemble algorithm.
However, there is a high risk of overfitting if all the original datasets are directly used to train the base learner to generate the input for the meta-learner. Therefore, the common stacking algorithm includes a step of k-fold cross-validation.
The stacking algorithm with cross-validation first divides the original data into a training set $D$ and a test set $D_{test}$, then generates the training set of the meta-learner on $D$ in a k-fold manner, and finally evaluates the performance of the ensemble model on $D_{test}$. Figure 2 shows the topological structure of the stacking algorithm with cross-validation.
To better understand the stacking algorithm with k-fold cross-validation, its process is described next in two steps (the pseudocode is given in Algorithm 1). Following Algorithm 1, assume the base learners consist of $N$ different models $\eta_1, \eta_2, \ldots, \eta_N$ and the meta-learner is $\eta$.
Training with k-fold cross-validation: Divide the original training set $D = \{(x_1,y_1), (x_2,y_2), \ldots, (x_m,y_m)\}$ into $k$ similarly sized, disjoint subsets $D_1, D_2, \ldots, D_k$, with $D_u \cap D_v = \varnothing$ for any $u \neq v$. Let $D_j$ and $\bar{D}_j = D \setminus D_j$ denote the test set and training set of the $j$-th fold, respectively. Next, train the base learners $\eta_1, \eta_2, \ldots, \eta_N$ on $\bar{D}_j$ and use the trained base learners to make predictions on $D_j$. For a sample $x_i$ of the test fold $D_j$, a set of $N$ predictions is generated, denoted $x_i' = (\eta_1^{(j)}(x_i), \eta_2^{(j)}(x_i), \ldots, \eta_N^{(j)}(x_i))$. As $j$ increases from 1 to $k$ (i.e., after $k$ iterations), the predictions for all samples of the training set $D$ are obtained, and, taking $x_i'$ as the feature input to the meta-learner and $y_i$ as the target variable, the new training set $D' = \{(x_i', y_i)\}_{i=1}^{m}$ of the meta-learner is generated.
Prediction on the test set: During training, as $j$ increases from 1 to $k$, each base learner $\eta_n$ is trained $k$ times on different $\bar{D}_j$; each trained base learner makes predictions on $D_j$ and, at the same time, makes out-of-sample predictions for $x^*$ on $D_{test}$, generating $k$ predictions $(\eta_n^{(1)}(x^*), \eta_n^{(2)}(x^*), \ldots, \eta_n^{(k)}(x^*))$. The average of these $k$ results is taken as the final prediction of base learner $\eta_n$ for sample $x^*$ on the test set $D_{test}$:

$$\bar{\eta}_n(x^*) = \frac{1}{k}\sum_{j=1}^{k}\eta_n^{(j)}(x^*)$$

Therefore, for the $N$ base learners, $N$ predictions are generated for sample $x^*$ on $D_{test}$, and $\bar{\eta}_1(x^*), \bar{\eta}_2(x^*), \ldots, \bar{\eta}_N(x^*)$ are used as the $N$ feature inputs to the trained meta-learner, yielding the final predictions of the algorithm for the samples on $D_{test}$.
Algorithm 1: Common stacking algorithm with k-fold cross-validation
Input: Training set $D = \{(x_1,y_1), (x_2,y_2), \ldots, (x_m,y_m)\}$, test set $D_{test} = \{(x^*, y^*)\}$
            Base learners: $\eta_1, \eta_2, \ldots, \eta_N$
            Meta-learner: $\eta$
Output: $H(x^*) = \eta'(\bar{\eta}_1(x^*), \bar{\eta}_2(x^*), \ldots, \bar{\eta}_N(x^*))$
1: level-1: Train the base learners with k-fold cross-validation
2: for $j = 1, \ldots, k$ do
3:      Divide $D$ into $D_j$ and $\bar{D}_j = D \setminus D_j$
4:      for $n = 1, \ldots, N$ do
5:         Train base learner $\eta_n$ on $\bar{D}_j$: $\eta_n^{(j)} = \eta_n(D \setminus D_j)$
6:      end for
7: end for
8: for $x_i \in D_j$ do
9:      Use the trained base learners to make predictions for $x_i$: $\eta_n^{(j)}(x_i)$
10: end for
11: for $x^* \in D_{test}$ do
12:      Use the trained base learners to make predictions for $x^*$: $\eta_n^{(j)}(x^*)$
13: end for
14: level-2: Train the meta-learner
15: for $i = 1, 2, \ldots, m$ do
16:      Generate $D' = \{(x_i', y_i)\}_{i=1}^{m}$ with $x_i' = (\eta_1^{(j)}(x_i), \eta_2^{(j)}(x_i), \ldots, \eta_N^{(j)}(x_i))$ to train the meta-learner $\eta$: $\eta' = \eta(D')$
17: end for
18: Make predictions on the test set $D_{test}$
19: for $x^* \in D_{test}$ do
20:      (1) Generate new features for $x^*$ with $\bar{\eta}_n(x^*) = \frac{1}{k}\sum_{j=1}^{k}\eta_n^{(j)}(x^*)$
21:      (2) Input $\bar{\eta}_1(x^*), \bar{\eta}_2(x^*), \ldots, \bar{\eta}_N(x^*)$ into the trained meta-model $\eta'$ to obtain the final prediction $\eta'(\bar{\eta}_1(x^*), \bar{\eta}_2(x^*), \ldots, \bar{\eta}_N(x^*))$
22: end for
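The two-level procedure of Algorithm 1 (out-of-fold predictions become the meta-learner's features; test-set predictions are averaged over the k folds) can be sketched in Python. The models, data, and window sizes below are illustrative stand-ins, not the base learners actually used in this study:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (placeholder for the real features/target)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
X_train, y_train, X_test = X[:150], y[:150], X[150:]

base_learners = [Ridge(), SVR(),
                 RandomForestRegressor(n_estimators=50, random_state=0)]
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=0)

# Level-1: out-of-fold predictions become the meta-learner's training features
Z_train = np.zeros((len(X_train), len(base_learners)))
Z_test = np.zeros((len(X_test), len(base_learners)))
for n, model in enumerate(base_learners):
    fold_test_preds = np.zeros((k, len(X_test)))
    for j, (tr, te) in enumerate(kf.split(X_train)):
        model.fit(X_train[tr], y_train[tr])
        Z_train[te, n] = model.predict(X_train[te])   # predictions on fold D_j
        fold_test_preds[j] = model.predict(X_test)    # k predictions per test sample
    Z_test[:, n] = fold_test_preds.mean(axis=0)       # average of the k predictions

# Level-2: the meta-learner aggregates the base-learner predictions
meta = Ridge()
meta.fit(Z_train, y_train)
final_pred = meta.predict(Z_test)
```

Note that `KFold` shuffles here, which is exactly what the common algorithm does and what the improved algorithm in the next section avoids for time series data.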

2.1.2. Our Improved Stacking Algorithm

Because the cross-validation process uses a k-fold scheme, the common stacking ensemble algorithm disrupts the temporal sequentiality of time series data, which leads to information loss and the failure of many time series prediction models. To solve this problem, this study proposes a double sliding window scheme that preserves the common stacking algorithm's protection against overfitting while keeping the input in chronological order.
We call the improved stacking algorithm the “stacking ensemble algorithm with walk-forward validation”. The improved algorithm still has two layers which are similar to the common algorithm, namely, the base learner training layer (level-1) and the meta-learner aggregation layer (level-2). Figure 3 shows the workflow of the improved stacking ensemble algorithm.
Training the base learners with walk-forward validation: Assume the original time series dataset is $D = \{(x_1,y_1), \ldots, (x_t,y_t), \ldots, (x_n,y_n)\}$, the base learners are $M$ different models $\eta_1, \ldots, \eta_m, \ldots, \eta_M$, and the meta-learner is $\eta$. First, set the size of the level-1 sliding window to $l$ and use the first training set $D^1 = \{(x_1,y_1), \ldots, (x_l,y_l)\} \subset D$ to train the $M$ base learners simultaneously, obtaining the next-period predictions $\hat{y}_{l+1}^{(m)} = \eta_m(x_{l+1})$, which form the prediction set $(\hat{y}_{l+1}^{(1)}, \hat{y}_{l+1}^{(2)}, \ldots, \hat{y}_{l+1}^{(M)})$. Next, slide the training window forward one period in chronological order to obtain the second training set $D^2 = \{(x_2,y_2), \ldots, (x_{l+1},y_{l+1})\} \subset D$, train the $M$ base learners on $D^2$, and obtain the predictions $\hat{y}_{l+2}$ for period $l+2$, continuing until the entire dataset $D$ is traversed. In this way, a set of base-learner predictions $\{\mathbf{\hat{y}}_t = (\hat{y}_t^{(1)}, \hat{y}_t^{(2)}, \ldots, \hat{y}_t^{(M)})\}_{t=l+1}^{n}$ for $y_t$ is obtained; pairing each vector $\mathbf{\hat{y}}_t$ with its observation $y_t$ forms a new dataset, denoted $Y = \{(\mathbf{\hat{y}}_t, y_t): t = l+1, \ldots, n\}$, which is the input to the meta-learner in the next layer.
Training the meta-model and making the final prediction: Connect the new dataset $Y$ collected in the first layer in chronological order, so that the predictions of the $M$ base learners correspond to the $M$ input features of the meta-learner. Next, set the size of the level-2 training window to $L$ and the validation window to $v$. Train the meta-learner on the first training set $Y^1 = \{(\mathbf{\hat{y}}_{l+1}, y_{l+1}), \ldots, (\mathbf{\hat{y}}_{l+L}, y_{l+L})\} \subset Y$. After training, evaluate the accuracy of the meta-learner on the interval of length $v$ (i.e., $t = l+L+1, \ldots, l+L+v$) and adjust its hyperparameters according to the validation accuracy. Once the optimal hyperparameters are found, incorporate the validation data into the training set and, with the optimal hyperparameters, retrain the meta-learner on $\{(\mathbf{\hat{y}}_{l+1}, y_{l+1}), \ldots, (\mathbf{\hat{y}}_{l+L+v}, y_{l+L+v})\}$ (the retrained meta-model is denoted $\eta'$). The retrained model makes the prediction for the next period $t = l+L+v+1$, $\hat{y}_{l+L+v+1} = \eta'(\mathbf{\hat{y}}_{l+L+v+1})$, which is the final prediction of the algorithm. Next, slide the level-2 training window forward one period in chronological order and repeat the "training-validation-retraining-prediction" process to obtain the final prediction for the following period. After the new dataset $Y$ is traversed, all the final predictions of the algorithm have been collected. Algorithm 2 shows the pseudocode of the improved stacking ensemble algorithm.
Algorithm 2: Improved stacking algorithm with walk-forward validation
Input: Dataset: $D = \{(x_1,y_1), \ldots, (x_t,y_t), \ldots, (x_n,y_n)\}$
            Base learners: $\eta_1, \ldots, \eta_m, \ldots, \eta_M$
            Meta-learner: $\eta$
Output: $H(x_t) = \eta'(\mathbf{\hat{y}}_t)$, $t = l+L+v+1, \ldots, n$
1: level-1: Train the base learners with walk-forward validation
2: for $t = l+1, \ldots, n$ do
3:      for $i = 1, \ldots, m, \ldots, M$ do
4:          $\hat{y}_t^{(i)} = \eta_i(x_t)$
5:      end for
6: end for
7: level-2: Train the meta-learner and make the final prediction
8: for $t = l+L+1, \ldots, n$ do
9:      Connect the $\hat{y}_t^{(i)}$ to obtain $\mathbf{\hat{y}}_t = (\hat{y}_t^{(1)}, \hat{y}_t^{(2)}, \ldots, \hat{y}_t^{(M)})$
10:      Train $\eta$ using $Y_t^1 = \{(\mathbf{\hat{y}}_{t-L}, y_{t-L}), \ldots, (\mathbf{\hat{y}}_{t-1}, y_{t-1})\}$ to obtain the trained meta-learner $\eta' = \eta(Y_t^1)$
11:      while $t \le \delta \le t+v-1$ do
12:          $H(x_\delta) = \eta'(\mathbf{\hat{y}}_\delta)$
13:      end while
14:      Adjust the hyperparameters of $\eta$ according to $H(x_\delta)$
15:      if the best hyperparameters of $\eta$ are found then
16:         Retrain $\eta$ using $Y_t^2 = \{(\mathbf{\hat{y}}_{t-L}, y_{t-L}), \ldots, (\mathbf{\hat{y}}_{t+v-1}, y_{t+v-1})\}$ on $(t-L, t+v-1)$ to obtain the retrained meta-learner $\eta' = \eta(Y_t^2)$
17:      end if
18:      Predict $y_{t+v}$ using the retrained meta-learner $\eta'$: $H(x_{t+v}) = \eta'(\mathbf{\hat{y}}_{t+v})$
19: end for
20: Isolate the test set on the timeline: $D_{test}$
21: $D_{test} = D \setminus D_{l+L+v}$
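The double sliding window scheme of Algorithm 2 can be sketched as follows. This is a simplified illustration with stand-in models, a toy hyperparameter grid, and small window sizes (`l`, `L`, `v` are illustrative, not the study's settings); the validation criterion here is mean squared error for brevity, whereas the study tunes on the out-of-sample R-squared:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.svm import SVR

# Synthetic time series data (placeholder for the real features/returns)
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 4))
y = X @ np.array([0.5, -0.3, 0.2, 0.1]) + 0.05 * rng.normal(size=n)

make_base_learners = lambda: [Ridge(), Lasso(alpha=0.01), SVR()]
l, L, v = 60, 100, 10   # level-1 window, level-2 training window, validation window

# Level-1: slide a window of length l, refit each base learner,
# and predict one step ahead; predictions become meta-features.
Z, y_meta = [], []
for t in range(l, n):
    preds = []
    for model in make_base_learners():
        model.fit(X[t - l:t], y[t - l:t])
        preds.append(model.predict(X[t:t + 1])[0])
    Z.append(preds)
    y_meta.append(y[t])
Z, y_meta = np.array(Z), np.array(y_meta)

# Level-2: walk-forward "train-validate-retrain-predict" with the meta-learner
final_preds, targets = [], []
for t in range(L + v, len(Z)):
    best_alpha, best_err = None, np.inf
    for alpha in (0.1, 1.0, 10.0):                   # toy hyperparameter grid
        meta = Ridge(alpha=alpha)
        meta.fit(Z[t - L - v:t - v], y_meta[t - L - v:t - v])
        err = np.mean((meta.predict(Z[t - v:t]) - y_meta[t - v:t]) ** 2)
        if err < best_err:
            best_alpha, best_err = alpha, err
    meta = Ridge(alpha=best_alpha)                   # retrain on train + validation
    meta.fit(Z[t - L - v:t], y_meta[t - L - v:t])
    final_preds.append(meta.predict(Z[t:t + 1])[0])
    targets.append(y_meta[t])
```

Because every fit only ever sees data strictly before the period being predicted, the chronological order of the samples is preserved at both levels.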

2.1.3. Forecasting Framework for Empirical Experiments

Ensemble learning is classified as homogeneous or heterogeneous according to the composition of its learners: if the base learners are all of the same type of model, the ensemble is homogeneous; if they are of different types, it is heterogeneous [40]. Stacking algorithms can be either [41]. We therefore designed two ensemble frameworks, a homogeneous one and a heterogeneous one, for predicting the return of carbon prices based on the improved stacking ensemble algorithm, and the empirical experiments in this study follow these frameworks.
The details of the homogeneous ensemble framework are as follows. First, we chose factor-augmented regression as the level-1 base learner (called the base model; all base models are factor-augmented regressions, but their input independent variables differ (recall Section 2.2)), because it was shown to perform well in predicting carbon returns by Tan et al. [19]. We further expanded their work by adding Mallows model selection to the forecasting framework. The base models were trained and made predictions under a sliding window scheme with window size $l+1$, meaning that the prediction for time $t$ was obtained by training the model on data from time $t-l$ to $t-1$. After collecting the predictions of all base models (a set of candidate models; recall Section 2.2), instead of feeding all of them directly to the meta-model, we added a model selection step that filters out underperforming base models via the Mallows model selection criterion, keeping only the best one. Since the selection criterion is based on the prediction at time $t = l+1$ and all fitted values of the previous $l$ periods, we used the prediction of the best remaining base model at time $l+2$ as the input to the meta-model to isolate the training set from the test set, while the data at time $t$ served as the validation set. After these steps, the size-fixed window slides forward one day along the time axis and the same "train-predict" process is repeated to obtain the prediction at time $l+3$, and so on. Finally, when the sliding window reaches the end of the original dataset, a full set of predictions from the best base models has been collected. However, errors remained between these predictions and the observations.
The next part of the ensemble framework used machine learning models as the level-2 aggregation meta-learner (also called the meta-model) to bridge the gap between the predictions of the base models and the observations. The predictions collected from the level-1 layer were concatenated in chronological order and combined with the real observations to form the data used to train the meta-model. In the level-2 layer, the size of the fixed window was set to $L + v$, where $L = 550$ days was the sample size of the training set and $v = 20$ days was the sample size of the validation set. After training the meta-model on the training set, we predicted the carbon return on the validation set and calculated the out-of-sample R-squared ($A$ denotes the validation set period, and $\bar{y}$ is the prediction of the historical average (HA) model, a widely used benchmark computed as $\bar{y}_{t|t-1} = \frac{1}{t-1}\sum_{\tau=1}^{t-1} y_{\tau}$), as proposed by Campbell and Thompson [42], to search for the best hyperparameters. The out-of-sample R-squared takes the following form:
$$R^2_{os} = 1 - \frac{\sum_{t \in A} \left( y_t - \hat{y}_{t|t-1} \right)^2}{\sum_{t \in A} \left( y_t - \bar{y}_{t|t-1} \right)^2}$$
The best hyperparameter settings are those that enable the meta-model to obtain the highest $R^2_{os}$ on the validation set. Once the parameters of the meta-model were fixed, we re-trained it on the new dataset formed by merging the previous training set and validation set. The final prediction at time $t = l + L + v + 1$ came from the re-trained meta-model; the size-fixed window then slid forward one day along the time axis, and the above process was repeated until all remaining final predictions were collected. The topological structure of the homogeneous ensemble framework is visualized in Figure 4.
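The level-2 step for one window position can be sketched as follows. A closed-form ridge regression and the grid `lams` are illustrative stand-ins for the SVR/RF/XGBoost meta-models and their hyperparameter grids; `r2_os` follows the definition above, with the expanding historical mean as the HA benchmark:

```python
import numpy as np

def r2_os(y_true, y_pred, y_bench):
    """Out-of-sample R^2 against the historical-average benchmark."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_bench) ** 2)

def ridge_fit_predict(X_tr, y_tr, X_new, lam):
    """Closed-form ridge regression used here as a stand-in meta-model."""
    Xb = np.column_stack([np.ones(len(X_tr)), X_tr])
    b = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y_tr)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ b

def level2_one_window(base_preds, y, L=550, v=20, lams=(0.01, 1.0, 100.0)):
    """One window of the level-2 step (sketch): choose the penalty that
    maximizes R^2_os on the v-day validation slice, refit on the full
    L + v window, and emit the prediction for the next day."""
    X_tr, y_tr = base_preds[:L], y[:L]
    X_va, y_va = base_preds[L:L + v], y[L:L + v]
    # HA benchmark: expanding historical mean up to each validation day
    bench = np.array([y[:L + i].mean() for i in range(v)])
    best = max(lams, key=lambda lam: r2_os(
        y_va, ridge_fit_predict(X_tr, y_tr, X_va, lam), bench))
    final = ridge_fit_predict(base_preds[:L + v], y[:L + v],
                              base_preds[L + v:L + v + 1], best)
    return final[0]
```

The window then slides forward one day and the search-refit-predict cycle repeats, exactly mirroring the level-1 walk-forward scheme.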
The benefit of homogeneity is a reduced risk of additional noise from mixing different models, but it ignores the fact that different models have complementary strengths in capturing information along different dimensions. We therefore fine-tuned the details of the homogeneous ensemble framework to construct the heterogeneous ensemble framework, incorporating diverse models in order to break through the accuracy limit of any single model.
The core idea of the heterogeneous ensemble framework is still the improved stacking algorithm. First, we chose seven prevailing predictive models (i.e., LASSO, Ridge regression, E-net, MMA, SVR, RF, and XGBoost; see Section 2.2 and Appendix A for descriptions of these models) from penalized regression, model averaging, and machine learning as the level-1 base learners. All base learners were trained under a sliding window of size $l = 100$ days, and the predicted carbon return at time $t = l + 1$ was obtained from the trained base models. To keep the base models on an equal footing in terms of prediction accuracy, no validation process was included to calibrate any base model individually; this isolates the accuracy difference between the base models and the ensemble model that is attributable to the improved stacking algorithm. Next, the size-fixed window slides forward one day along the time axis, the "train-predict" process is repeated, the predictions of the base models at time $t = l + 2$ are obtained, and so on. Finally, when the sliding window reaches the end of the original dataset, seven sets of predictions have been collected, one from each base model, which serve as feature inputs for the meta-learner. In the level-2 layer of the heterogeneous ensemble framework, the partition of datasets, the determination of hyperparameters, the designed window size, and the sliding scheme are essentially the same as those in the homogeneous ensemble framework. The specific details can be found in Figure 5.
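The level-1 pass of this heterogeneous framework can be sketched as follows; each base model contributes one meta-feature column, and the fit-and-predict callables are hypothetical stand-ins for the seven learners:

```python
import numpy as np

def rolling_meta_features(X, y, base_models, l=100):
    """Level-1 pass of the heterogeneous framework (sketch): each base
    model is refit on every l-day window and its one-step-ahead
    prediction is stored, yielding one meta-feature column per model
    for the level-2 meta-learner."""
    feats = {name: [] for name in base_models}
    for t in range(l, len(y)):
        X_tr, y_tr = X[t - l:t], y[t - l:t]
        for name, fit_predict in base_models.items():
            # fit on the window, predict the next observation
            feats[name].append(fit_predict(X_tr, y_tr, X[t]))
    return {name: np.array(v) for name, v in feats.items()}
```

Here `base_models` would map names such as `"lasso"` or `"svr"` to fit-and-predict callables; simple toy predictors (last value, window mean) can stand in for the seven learners when experimenting with the scheme.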
The purpose of stacking is to improve the final prediction accuracy, not to find the single best base model. We included a variety of different models so that the base predictions would be diverse. A better list of base models and meta-models may well exist, but that is not the core issue of this paper. Our choices for the meta-model were SVR, RF, and XGBoost, which are popular machine learning models for aggregation tasks in industry and academia. As mentioned above, the core issue of this study is the benefit brought by the improved stacking algorithm, and we acknowledge that more refined parameter tuning and model selection may lead to better final results.

2.2. Forecasting Models

Considering the large number of models involved in the ensemble algorithm, this subsection mainly introduces factor-augmented regression combined with Mallows model selection (FAR+MMS) and Mallows model averaging (MMA). As the remaining prediction models (LASSO, Ridge regression, E-net, support vector regression (SVR), random forest (RF), and eXtreme gradient boosting (XGBoost)) are widely used, a brief introduction to each of them is provided in Appendix A. According to the experimental design in Section 2.1.3, the models are used as follows: for the homogeneous ensemble framework, FAR was chosen as the base model and filtered by MMS (FAR+MMS); for the heterogeneous ensemble framework, MMA, LASSO, Ridge regression, E-net, SVR, RF, and XGBoost were chosen as the base models. The selection of these base models was based on relevant research on carbon market prediction for better comparison (FAR [19], SVR [43,44], RF [45,46], XGBoost [47,48]). Meanwhile, SVR, RF, and XGBoost also serve as the meta-model for the two ensemble frameworks, forming six ensemble models (homo_svr, homo_rf, homo_xgb; hete_svr, hete_rf, hete_xgb). The hyperparameters of each model are listed in Table A3 of Appendix B.
Factor-augmented regression (FAR): In the following discussion of factor-augmented regression, we follow the studies of Kim and Swanson [49] and Cheng and Hansen [50]. Factor-augmented regression extracts common factors through the dimension-reduction technique of factor decomposition and builds a concise model that retains rich information.
Let $X_{it}$ be the observation for $t = 1, \ldots, T$ and $i = 1, \ldots, N$, and let $y_{t+h}$ be the target variable to be predicted. We begin with the following factor model:
$$X_{it} = \lambda_i' F_t + e_{it},$$
where $F_t$ is an $r \times 1$ vector of common factors, $\lambda_i$ is an $r \times 1$ vector of factor loadings, and $e_{it}$ is the idiosyncratic component of $X_{it}$. Let $X$ be a $T \times N$ matrix of observations and $F = (F_1, \ldots, F_T)'$ be the $T \times r$ matrix of common factors; then, Equation (3) can be written in matrix notation:
$$X = F \Lambda' + e,$$
where $\Lambda = (\lambda_1, \ldots, \lambda_N)'$ is $N \times r$, and $e$ is a $T \times N$ error matrix. Once $F$ is extracted, we construct the following factor-augmented regression model:
$$y_{t+h} = \alpha_0 + \alpha(L)\, y_t + \beta(L)' F_t + \epsilon_{t+h},$$
where $h \ge 1$ is the forecast horizon, and $\alpha(L)$ and $\beta(L)$ are lag polynomials of orders $p$ and $q$, respectively, with $0 \le p \le p_{max}$ and $0 \le q \le q_{max}$; $p_{max}$ and $q_{max}$ are the maximum lags of $y_t$ and $F_t$, respectively.
For the empirical experiment, this paper adopts the approximate model structure proposed by Cheng and Hansen [50], which can be written as follows:
$$y_{t+h} = z_t' b + \epsilon_{t+h},$$
where $z_t = (1, y_t, \ldots, y_{t-p_{max}}, F_t', \ldots, F_{t-q_{max}}')'$, and $b$ collects all coefficients from Equation (5).
Sequentially nested subsets of $z_t$, taken in order from smallest to largest, are used to construct $M$ candidate models. The $M$ approximating models are indexed by $m = 1, \ldots, M$, where each approximating model $m$ specifies a subset $z_t(m)$ of the regressors $z_t$. Thus, the first model sets $z_t(1) = 1$, the second model sets $z_t(2) = (1, y_t)'$, and so on, expanding to a total of $M = (1 + p_{max})(1 + \tilde{r})$ sequentially nested models, where $\tilde{r}$ is the number of common factors retained. The approximate form of the $m$th candidate model is thus written as follows:
$$y_{t+h} = z_t(m)'\, b(m) + \epsilon_{t+h}.$$
Let $\tilde{z}_t(m)$ denote $z_t(m)$ with the factors $F_t$ replaced by their estimates $\tilde{F}_t$, and set $\tilde{Z}(m) = (\tilde{z}_1(m), \ldots, \tilde{z}_{T-h}(m))'$. The least squares estimate of $b(m)$ is then $\hat{b}(m) = (\tilde{Z}(m)'\tilde{Z}(m))^{-1}\tilde{Z}(m)'y$, with residuals $\hat{\epsilon}_{t+h}(m) = y_{t+h} - \tilde{z}_t(m)'\hat{b}(m)$. The prediction of the $m$th candidate model at time $T$ is expressed as follows:
$$\hat{y}_{T+h|T}(m) = \tilde{z}_T(m)'\, \hat{b}(m).$$
Mallows Model Selection (MMS): When factor-augmented regression is used as the base model of the homogeneous ensemble algorithm, we need to select the best model from M candidate models. We refer to the model selection criteria of factor-augmented regression derived by Cheng and Hansen [50] under the Mallows [51] criterion.
The Mallows criterion is an unbiased estimate of the expected squared fit under the assumptions of independent observations and homoskedastic regression errors. We directly present the model selection criterion below:
$$S_T(m) = \frac{1}{T} \sum_{t=1}^{T} \hat{\epsilon}_t(m)^2 + \frac{2 \hat{\sigma}_T^2}{T}\, k(m),$$
where $k(m) = \dim(z_t(m))$ denotes the number of regressors in the $m$th model, and $\hat{\sigma}_T^2 = (T - k(M))^{-1} \sum_{t=1}^{T} \hat{\epsilon}_t(M)^2$ denotes the preliminary estimate of $\sigma^2$ based on the largest model $M$.
The best model under the Mallows criterion is the $\hat{m}$ satisfying $\hat{m} = \arg\min_{1 \le m \le M} S_T(m)$. To sum up, there are three steps: estimate the parameters of each model $m$, calculate $S_T(m)$ for each model, and select the prediction of the model with the minimum $S_T(m)$.
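The three steps can be sketched for a nested family of OLS candidate models (a simplified stand-in for the factor-augmented candidates; `sizes` gives the number of leading regressors each nested model uses, with the last entry the largest model used for the preliminary variance estimate):

```python
import numpy as np

def mallows_select(Z, y, sizes):
    """Mallows model selection over nested OLS candidates (sketch).
    `sizes` lists the number of leading columns of the full regressor
    matrix Z used by each candidate m; the last entry is the largest
    model, whose residuals give the preliminary variance estimate.
    Returns the index minimizing S_T(m) and all criterion values."""
    T = len(y)
    resid = []
    for ksub in sizes:
        Zm = Z[:, :ksub]
        b, *_ = np.linalg.lstsq(Zm, y, rcond=None)
        resid.append(y - Zm @ b)
    sigma2 = np.sum(resid[-1] ** 2) / (T - sizes[-1])
    scores = [np.mean(r ** 2) + 2.0 * sigma2 * ksub / T
              for r, ksub in zip(resid, sizes)]
    return int(np.argmin(scores)), scores
```

The penalty term $2\hat{\sigma}_T^2 k(m)/T$ is what keeps the criterion from always favoring the largest model, trading fit against the number of regressors.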
Mallows Model Averaging (MMA): Cheng and Hansen [50] obtained the model averaging form suitable for factor-augmented regression by minimizing the Mallows criterion [51], extending the MMA (Mallows model averaging) estimator proposed by Hansen [52]. In this paper, we follow Cheng and Hansen [50] in using factor-augmented regressions with nested subsets of regressors as candidate models; the final forecast combination of the candidate models is as follows:
$$\hat{y}_{T+h|T}(w) = \sum_{m=1}^{M} w(m)\, \hat{y}_{T+h|T}(m),$$
where $\hat{y}_{T+h|T}(m)$ is the prediction of the $m$th candidate model, and $w(m)$ represents its weight, which minimizes the following objective function:
$$\min_{w}\ \frac{1}{T} \sum_{t=1}^{T} \left( \sum_{m=1}^{M} w(m)\, \hat{\epsilon}_t(m) \right)^2 + \frac{2 \hat{\sigma}_T^2}{T} \sum_{m=1}^{M} w(m)\, k(m),$$
where $k(m) = \dim(z_t(m))$ denotes the number of regressors in the $m$th model, and $\hat{\sigma}_T^2 = (T - k(M))^{-1} \sum_{t=1}^{T} \hat{\epsilon}_t(M)^2$ denotes the preliminary estimate of $\sigma^2$.
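The weight optimization is a small quadratic program. As a sketch, a simple Frank-Wolfe loop over the unit simplex (the standard constraint set for MMA weights) can substitute for the dedicated QP solver usually employed:

```python
import numpy as np

def mma_weights(resid, k, sigma2, iters=2000):
    """Mallows model averaging weights (sketch): minimize
    (1/T)*||resid @ w||^2 + (2*sigma2/T) * (k @ w) over the unit simplex.
    `resid` is a T x M matrix (column m = residuals of candidate m),
    `k` the regressor counts k(m), `sigma2` the preliminary estimate."""
    T, M = resid.shape
    A = resid.T @ resid / T                    # quadratic part of the criterion
    c = 2.0 * sigma2 * np.asarray(k, float) / T
    w = np.full(M, 1.0 / M)
    for it in range(iters):                    # Frank-Wolfe iterations
        grad = 2.0 * A @ w + c
        s = np.zeros(M)
        s[np.argmin(grad)] = 1.0               # best vertex of the simplex
        w += 2.0 / (it + 2.0) * (s - w)
    return w
```

Because every update is a convex combination, the weights stay non-negative and sum to one throughout, which is exactly the feasible set of the MMA problem.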

2.3. Statistical and Economic Evaluation

2.3.1. Judgment on Different Volatile Intervals

Hamilton and Susmel [53] combined Markov switching (MS) with ARCH models to construct a new model named SWARCH (switching ARCH), which aims to distinguish the volatility state between tranquil and turmoil periods. The SWARCH model has proven robust in differentiating the volatility states of time series dynamics in subsequent empirical studies. Liu and Lee [54] adopted the SWARCH model to reveal the pattern of regime switching in the INE crude oil futures market and to explore how external shocks turn the crude oil futures market from stable to volatile. Wang et al. [55] classified stock market crises based on SWARCH filtering probabilities of the high-volatility regime. Shi et al. [56] used the SWARCH model to measure the significance of firm-specific news sentiment in quantifying intraday volatility persistence in the calm (low-volatility) state and the turbulent (high-volatility) state. We needed to determine whether the proposed ensemble algorithm works effectively across different volatility states, and the typical AR(p)-SWARCH(K,q) model was a natural choice to divide the carbon return series into different volatility states:
$$y_t = u + \theta_1 y_{t-1} + \theta_2 y_{t-2} + \cdots + \theta_p y_{t-p} + \epsilon_t, \quad \epsilon_t \mid I_{t-1} \sim N(0, h_t);$$
$$\frac{h_t^2}{\gamma_{s_t}} = \alpha_0 + \alpha_1 \frac{\epsilon_{t-1}^2}{\gamma_{s_{t-1}}} + \cdots + \alpha_q \frac{\epsilon_{t-q}^2}{\gamma_{s_{t-q}}}, \quad s_t \in \{1, \ldots, K\},$$
where $u$ is a constant; $\epsilon_t$ is a normally distributed residual with zero mean and variance $h_t$; $\theta_1, \ldots, \theta_p$ and $\alpha_1, \ldots, \alpha_q$ are parameters to be estimated; and $\gamma$ denotes a set of scaling parameters tied to the latent state variable $s_t$, a Markov chain with $K$ regimes. In our study, we set both lags $p$ and $q$ to 1, and the input of the SWARCH model was the actual observations of the carbon return.
In this study, R package MSGARCH [57] is applied to calculate the filtering probability according to the following formulation:
$$P(s_t = j \mid Y_t; \theta_t), \quad j = 1, \ldots, K,$$
where $Y_t$ denotes the historical observations of the price log return up to time $t$, and $\theta_t$ denotes the parameter vector. In the experiment, we set $K$ to 3 to increase the differentiation between high and low volatility states; thus, $s_t = 3$ means that the carbon market is in a state of high volatility, and the other values of $s_t$ represent a low volatility state. The criterion is written as follows:
$$C_t = \begin{cases} 1, & P(s_t = 3 \mid Y_t; \theta_t) \ge 0.5, \\ 0, & \text{otherwise}. \end{cases}$$
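Once the filtering probabilities have been computed (with the MSGARCH package in the paper), the criterion reduces to a thresholding step; a sketch in Python, assuming the probabilities are available as a T x K array:

```python
import numpy as np

def high_vol_indicator(filter_probs, high_state=2, thresh=0.5):
    """Binary state indicator C_t (sketch): 1 when the filtering
    probability of the high-volatility regime is at least `thresh`.
    With K = 3 regimes, the high-volatility regime is the last column
    (index 2) of the T x K probability matrix."""
    return (filter_probs[:, high_state] >= thresh).astype(int)
```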

2.3.2. Prediction Accuracy Metrics

In order to evaluate the performance of the ensemble algorithm in prediction accuracy, we explored five statistical metrics from different perspectives: root mean square error (RMSE), symmetric mean absolute percentage error (SMAPE), mean absolute error (MAE), Theil U statistic 1 (U1), and out-of-sample R 2 ( R o s 2 ) [42]. RMSE reflects the deviation between the prediction and the true value. SMAPE measures the relative error in the sense of ratio. MAE intuitively represents the absolute value of error. U1 considers both the prediction and the observation as the measurement basis, and it evaluates the prediction power of the model. Lastly, R o s 2 evaluates the superiority of the model by comparing it with the historical average benchmark model (HA).
Table 1 lists the details of the above evaluation metrics. For the four measures RMSE, SMAPE, MAE, and U1, the smaller the value, the higher the prediction accuracy of the model. R o s 2 represents the proportion of improvement in forecasting accuracy of the measured model compared to the historical average benchmark model. Campbell and Thompson [42] indicated that even very small positive R o s 2 can signal superior predictive accuracy relative to the benchmark.
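The five metrics can be computed directly from their definitions; a sketch follows (using one common SMAPE convention, and with `ybench` the HA predictions required by $R^2_{os}$):

```python
import numpy as np

def accuracy_metrics(y, yhat, ybench):
    """The five evaluation metrics of Table 1 (sketch; formulas follow
    their usual definitions, and `ybench` holds the historical-average
    benchmark predictions used by R^2_os)."""
    err = y - yhat
    rmse = np.sqrt(np.mean(err ** 2))
    # SMAPE here uses the symmetric 2|e| / (|y| + |yhat|) convention
    smape = np.mean(2.0 * np.abs(err) / (np.abs(y) + np.abs(yhat)))
    mae = np.mean(np.abs(err))
    u1 = rmse / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(yhat ** 2)))
    r2os = 1.0 - np.sum(err ** 2) / np.sum((y - ybench) ** 2)
    return {"RMSE": rmse, "SMAPE": smape, "MAE": mae, "U1": u1, "R2_os": r2os}
```

For the first four metrics smaller is better, while a positive `R2_os` signals a gain over the HA benchmark, matching the reading rules stated above.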

2.3.3. Model Confidence Set

Using only the statistical metrics in Table 1 as the criteria for evaluating model accuracy, the results are easily influenced by the sample, so such an evaluation is not robust: a small number of outliers can significantly distort the computation of the model loss function, leading to an abnormal increase in the loss value and ultimately invalidating the accuracy evaluation. The model confidence set (MCS) proposed by Hansen et al. [58] is designed to overcome this problem, so this paper uses the MCS to verify, from the perspective of hypothesis testing, the usefulness of the improved stacking algorithm in improving model accuracy.
The process of the MCS is as follows: Firstly, suppose there are $m_0$ prediction models, denoted as $M_0 = \{1, 2, 3, \ldots, m_0\}$, and there are $M$ real observations $y_t$ on the test set. Each prediction model in $M_0$ then generates $M$ corresponding predictions $\hat{y}_t$ $(t = 1, 2, \ldots, M)$. For each $\hat{y}_t$, the corresponding loss value $L_{t,j}$, $j = 1, 2, \ldots, m_0$, is calculated according to the chosen loss function. Next, the difference between the loss values of two prediction models $u, v$ $(u, v \in M_0)$ is calculated, denoted $d_{t,uv}$ and computed as follows:
$$d_{t,uv} = L_{t,u} - L_{t,v}.$$
Secondly, define the set of best models M * :
$$M^* \equiv \{\, u \in M_0 : E(d_{t,uv}) \le 0 \ \text{for all}\ v \in M_0 \,\}.$$
The models with poor accuracy in the set $M_0$ are filtered out by successive significance tests, and the models with the best prediction accuracy remain at the end. In each significance test, the null hypothesis states that the two models have the same prediction accuracy, expressed as follows:
$$H_{0,\mathcal{M}}: E(d_{t,uv}) = 0 \quad \text{for all}\ u, v \in \mathcal{M} \subseteq M_0.$$
The equivalence test is used to test the null hypothesis, and the elimination rule filters out models that reject it. The specific process comprises three steps: (1) let $\mathcal{M} = M_0$; (2) test the null hypothesis at significance level $\alpha$ using the equivalence test; (3) if the null hypothesis is not rejected, set $M^*_{1-\alpha} = \mathcal{M}$; otherwise, remove from $\mathcal{M}$ the model for which the null hypothesis is rejected. Testing and filtering continue until the null hypothesis is no longer rejected, and the models finally retained in $M^*_{1-\alpha}$ are those with the best accuracy at the confidence level $1 - \alpha$. If the $p$ value of a prediction model is greater than the given significance level $\alpha$, it belongs to the highest-accuracy model set, and a larger $p$ value represents higher prediction accuracy.
In this study, we chose three loss functions for the MCS: mean squared error (MSE), mean absolute error (MAE), and Huber loss (we set $\delta = 1$), which are calculated as follows:
$$\mathrm{MSE}: \ \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2,$$
$$\mathrm{MAE}: \ \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|,$$
$$\mathrm{Huber\ Loss}: \ \begin{cases} \frac{1}{2} (y_t - \hat{y}_t)^2, & |y_t - \hat{y}_t| \le \delta, \\ \delta |y_t - \hat{y}_t| - \frac{1}{2} \delta^2, & |y_t - \hat{y}_t| > \delta. \end{cases}$$
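The three per-observation loss series feeding the MCS can be sketched as:

```python
import numpy as np

def huber(err, delta=1.0):
    """Huber loss with threshold delta (the paper sets delta = 1)."""
    a = np.abs(err)
    return np.where(a <= delta, 0.5 * err ** 2, delta * a - 0.5 * delta ** 2)

def mcs_losses(y, yhat, delta=1.0):
    """Per-observation loss series L_{t,j} used by the MCS (sketch):
    squared error, absolute error, and Huber loss."""
    err = y - yhat
    return {"MSE": err ** 2, "MAE": np.abs(err), "Huber": huber(err, delta)}
```

The Huber loss behaves quadratically for small errors and linearly for large ones, which is why it dampens the influence of the outliers that motivated the MCS in the first place.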
The test statistic is as follows:
$$T_R = \max_{u, v \in \mathcal{M}} \frac{\left| \bar{d}_{uv} \right|}{\sqrt{\widehat{\mathrm{var}}(\bar{d}_{uv})}},$$
$$T_{MAX} = \max_{u \in \mathcal{M}} \frac{\bar{d}_{u\cdot}}{\sqrt{\widehat{\mathrm{var}}(\bar{d}_{u\cdot})}},$$
where $\bar{d}_{uv} = \frac{1}{M} \sum_{t=1}^{M} d_{t,uv}$ and $\bar{d}_{u\cdot} = \frac{1}{m_0} \sum_{v \in \mathcal{M}} \bar{d}_{uv}$. The null hypothesis is rejected when the statistic exceeds the critical value. Since the distributions of $T_R$ and $T_{MAX}$ are very complex, this study uses the block bootstrap to obtain the $p$-value of the test (we set the block size of the bootstrap to 2 and the number of simulations to 10,000; see Hansen et al. [58] for the specific procedure).

2.3.4. Investment Portfolio

Additionally, we measured the gains generated by the developed ensemble framework through a portfolio strategy, from the perspective of economic value. More specifically, following Campbell and Thompson [42], Rapach et al. [59], and Zhao and Cheng [60], we calculated realized utility gains for a mean-variance investor who allocates his portfolio daily between a carbon emission option and risk-free bills with weights $\omega_t$ and $1 - \omega_t$, based on the prediction of the carbon return. At time $t$, the investor determines the amount of funds allocated to the two assets in the next period $(t+1)$ according to $\omega_t$, which is given by the following formula:
$$\omega_t = \frac{1}{\gamma} \cdot \frac{\hat{y}_{t+1}}{\hat{\sigma}^2_{t+1}},$$
where $\hat{\sigma}^2_{t+1}$ is the sample variance of the carbon return over a rolling window of 50 days (Zhao and Cheng [60] used monthly data in their empirical experiment and estimated $\hat{\sigma}^2_{t+1}$ as the sample variance of quarterly returns within a fixed ten-year rolling window; by analogy, we set the window size to 50 for the daily data used in the portfolio), $\hat{y}_{t+1}$ is the prediction of the carbon return at time $t+1$, and $\gamma$ is a relative risk aversion parameter describing, to some extent, the trading style of the investor: the lower the value of $\gamma$, the more aggressive the investor. $\omega_t$ is constrained to lie between $-1.5$ and $1.5$ (if $\omega_t \ge 1.5$, it is set to $1.5$; if $\omega_t \le -1.5$, it is set to $-1.5$). (According to Campbell and Thompson [42] and Rapach et al. [59], $\omega_t$ is often restricted to between 0 and 1.5 to preclude short sales and prevent more than 50% leverage; some studies allow short selling [61]. Considering the actual trading situation of carbon options, we chose the latter range of $\omega_t$.) We explored the average utility level of the ensemble models under the fixed threshold bounds of $\omega_t$ with different $\gamma$. The realized return is calculated as follows:
$$R_{t+1} = \omega_t\, y_{t+1} + (1 - \omega_t)\, r_{t+1},$$
in which $y_{t+1}$ is the observed carbon return, and $r_{t+1}$ is the risk-free rate (we chose the 1-year China government bond yield). We then calculated the widely used certainty equivalent return (CER):
$$CER = \hat{\mu}_R - \frac{1}{2} \gamma \hat{\sigma}_R^2,$$
where $\hat{\mu}_R$ and $\hat{\sigma}_R^2$ are the sample mean and variance of the series $R_t$. The difference between the $CER$ of a forecasting model and that of the historical average model (HA) is called the utility gain; multiplying it by 400 expresses it as an annualized percentage return:
$$UG_{model} = 400\, (CER_{model} - CER_{HA}).$$
Utility gain ($UG_{model}$) can be interpreted as the portfolio management fee that investors are willing to pay to obtain the additional information provided by the forecasting model, compared to using only the historical average model (HA).
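The portfolio exercise for one model can be sketched as follows; `gamma = 3` and the input values are illustrative rather than the paper's settings, and `bound = 1.5` implements the clipping of $\omega_t$ described above:

```python
import numpy as np

def portfolio_eval(yhat, y, rf, sigma2, gamma=3.0, bound=1.5):
    """Mean-variance portfolio between the carbon asset and the
    risk-free rate (sketch): weights omega_t clipped to [-bound, bound],
    realized returns, and the certainty equivalent return (CER)."""
    w = np.clip(yhat / (gamma * sigma2), -bound, bound)   # omega_t
    R = w * y + (1.0 - w) * rf                            # realized return
    cer = R.mean() - 0.5 * gamma * R.var()
    return w, R, cer

def utility_gain(cer_model, cer_ha):
    """Annualized utility gain over the HA benchmark (scaled by 400)."""
    return 400.0 * (cer_model - cer_ha)
```

A lower `gamma` makes the raw weight larger before clipping, which is the mechanism behind the more aggressive strategies discussed in the empirical results.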
We employed another popular criterion, the Sharpe ratio (SR), to evaluate the performance of the above portfolio. It is constructed from the portfolio excess returns:
$$SR = \frac{\mu_p}{\sigma_p},$$
where $\mu_p$ and $\sigma_p$ are, respectively, the mean and standard deviation of the portfolio excess returns. The Sharpe ratio is an economic indicator that evaluates portfolio returns against the corresponding volatility risk.

3. Data Description

In this section, we introduce our data, including the composition and sources of the variables, the partition of the dataset, and the data-cleaning approach. We collected 895 daily settlement prices of carbon emission options in the Shenzhen carbon emission trading market, China, from 27 March 2018 to 6 July 2022. The last 224 samples, from 10 May 2021 to 6 July 2022, were used as the test set. The main reason for choosing Shenzhen's carbon market is that it was the first carbon market in China, with a large scale, relatively complete data, and a strong influence on carbon prices in other markets. We employed the log return of the carbon price (SZA), computed as in Equation (29), as the prediction target, as it is widely used as the continuously compounded return of carbon.
In terms of the predictors, we followed Tan et al. [19] to consider three categories of variables:
  • Commodity variables, including energy and non-energy commodity futures, based on settlement prices and a price index from the Chinese market. For energy commodities, the selections were as follows: (1) China Liquefied Natural Gas Price Index (LNG); (2) thermal coal (SPcoa); (3) INE crude oil (SPcru). For non-energy commodities, the variables were subdivided into non-ferrous metals and agricultural products. Non-ferrous metals were (1) aluminum (SPalu); (2) zinc (SPzin); (3) lead (SPlea); (4) nickel (SPnic); (5) tin (SPtin); (6) silver (SPsil); (7) gold (SPgol); and (8) cathode copper (SPcop). Agricultural products were (1) yellow corn (SPcor); (2) egg (SPegg); (3) cotton (SPcot); and (4) high-quality strong gluten wheat (SPwhe) (except for LNG, other variables were option settlement price (active contract));
  • Stock and bond market variables, including some composite indexes and rate variables. For the stock market, the predictors were (1) SSE: Average P/E ratio (SSEPE); (2) SSE Composite Index (SSECI); (3) CSI 300 Index (CSI300); (4) SSE 180 Index (SSE180); (5) SZSE Composite Index (SZSECI); (6) CSI 100 Index (CSI100); and (7) CSI 500 Index (CSI500). For the bond market, the predictors were (1) SSE Government Bond (SSEGBI); (2) SSE Corporate Bond Index (SSECBI); (3) SSE Enterprise Bond Index (SSEEBI); (4) CCDC government bond yield: 3-months (Gb3M); (5) CCDC government bond yield: 10-years (Gb10Y); (6) CCDC corporate bond yield (AAA): 3-months (Cb3M); (7) CCDC corporate bond yield (AAA): 10-years (Cb10M); (8) CCDC coal industry bond yield (AAA): 3-months (coalb3M); and (9) CCDC coal industry bond yield (AAA): 5-years (coalb5Y);
  • Economic and industry composite variables, including (1) Financial Conditions Index (FCI); (2) China Securities Industry Index: energy (CSIIene); and (3) Wind Industry Index: energy (Windene). These three indexes were used to depict the financial sector and the energy sector as a whole, sectors which are closely related to the carbon option return.
We transformed the original data in two ways: (i) the logarithmic difference method (LD) (Equation (29)); (ii) the first-order difference method (FD) (Equation (30)). These transformations alleviate the heteroscedasticity of the time series data and make the data stationary.
$$y_t = \ln P_t - \ln P_{t-1}$$
$$y_t = P_t - P_{t-1}$$
Table A1 summarizes the statistical descriptions of all variables. According to the results of the ADF test (augmented Dickey-Fuller test) and the Jarque-Bera test, the transformed data are stationary and exhibit non-normal distributions. Table A2 (see Appendix B) provides an explanation of all variables involved. All variables are available from the WIND database (a popular financial database in mainland China, containing both micro- and macro-economic variable data for researchers and practitioners). Since some data were missing at different time points in the original samples, we filled each missing value with the mean of the previous and next observations. In addition, we calculated the correlation of each variable and drew a heat map (shown in Figure A2 in Appendix B).
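The two transformations and the gap-filling rule can be sketched as follows (the `fill_gaps` helper is hypothetical and assumes isolated, interior missing values, which matches the "mean of the previous and next points" rule):

```python
import numpy as np

def log_return(prices):
    """Logarithmic difference (Equation (29)): y_t = ln P_t - ln P_{t-1}."""
    p = np.asarray(prices, float)
    return np.diff(np.log(p))

def first_diff(prices):
    """First-order difference (Equation (30)): y_t = P_t - P_{t-1}."""
    return np.diff(np.asarray(prices, float))

def fill_gaps(series):
    """Fill each missing value with the mean of its two neighbors, as
    done for the WIND data (sketch; assumes isolated, interior NaNs)."""
    x = np.asarray(series, float).copy()
    for i in np.flatnonzero(np.isnan(x)):
        x[i] = 0.5 * (x[i - 1] + x[i + 1])
    return x
```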

4. Empirical Results

In this section, we compare the performance difference of the forecasting model before and after using the improved stacking algorithm from the perspective of accuracy through statistical metrics and model confidence set, and evaluate the increase in economic gains brought by the improved algorithms through a portfolio. Also, we systematically explore the performance of the ensemble algorithm in high and low volatility states so that potential investors can make more rational use of the improved stacking ensemble algorithm to invest in carbon assets profitably.

4.1. Out-of-Sample Accuracy Performance

As mentioned in the data description section, the test data to compare model performance covered 224 samples from 10 May 2021 to 6 July 2022. The accuracy of the forecasting model was measured by five statistical metrics, and the models included base models before the ensemble process and six ensemble models using different frameworks. More specifically, we explored in detail the positive effect of our improved stacking algorithm on carbon return prediction from two dimensions: the whole mixing interval and the intervals with high or low volatility. In addition, to further analyze the robustness of the ensemble algorithm, we compare the performance of different ensemble models at different sliding window sizes in the third part of this section.

4.1.1. Accuracy for the Whole Test Set

The accuracy performance of the ensemble models on the whole test set covering 224 samples is presented in Table 2, organized by the ensemble framework applied. For the homogeneous ensemble, the table clearly shows that, after adding the model selection process, the model achieved better performance on all five accuracy evaluation metrics than using factor-augmented regression alone.
We introduced an index PI to quantitatively measure the percentage improvement in each evaluation index. For the homogeneous ensemble, 'base_model' denotes FAR, while for the heterogeneous ensemble it denotes the base model with the best performance; 'index' represents one of the five accuracy evaluation indexes.
$$PI^{\,ensemble\_model}_{Index} = \frac{Index_{base\_model} - Index_{ensemble\_model}}{Index_{base\_model}} \times 100\%,$$
where ‘ P I I n d e x e n s e m b l e _ m o d e l ’ is interpreted as the rate of improvement generated by the ’ensemble model’ in terms of ‘index’.
Among the three ensemble models under the homogeneous ensemble framework, homo_svr achieved the best performance on the five statistical error indicators, with $PI^{homo\_svr}_{RMSE} = 10.04\%$, $PI^{homo\_svr}_{SMAPE} = 11.41\%$, $PI^{homo\_svr}_{MAE} = 11.95\%$, $PI^{homo\_svr}_{U1} = 15.13\%$, and $PI^{homo\_svr}_{R^2_{os}} = 84.52\%$. For the heterogeneous ensemble, more forecasting models joined the comparison; however, even the best-performing base model could not beat the ensemble models on the accuracy metrics. The best $PI$ was again obtained by hete_svr, with $PI^{hete\_svr}_{RMSE} = 5.98\%$, $PI^{hete\_svr}_{SMAPE} = 3.85\%$, $PI^{hete\_svr}_{MAE} = 4.85\%$, $PI^{hete\_svr}_{U1} = 4.19\%$, and $PI^{hete\_svr}_{R^2_{os}} = 42.80\%$.
In addition, the positive $PI$ values indicate that, compared with the base models, the accuracy of each of the three ensemble models improved after the integration process. Considering the results of both the homogeneous and heterogeneous ensembles, SVR stands out as the level-2 aggregation model for improving accuracy, and all the ensemble models play a positive role in improving prediction accuracy. Figure 6 provides more intuitive support for this view: the accuracy errors of the ensemble models shrink to the narrowest range compared with the base models. (To unify the measurement criteria and represent the accuracy comparison more intuitively, $R^2_{os}$ was converted to $1 - R^2_{os}$; the smaller the $1 - R^2_{os}$ shown in Figure 6, the higher the accuracy.)

4.1.2. Accuracy for the Different Volatility Intervals

To validate in more detail the effectiveness of the improved stacking algorithm in improving prediction accuracy, we used the MSGARCH model to divide the timeline of the entire dataset into a 'high volatility state' and a 'low volatility state'. The volatility state of the whole time series is shown in Figure 7. According to the division results, the first 112 samples of the test set, from 10 May 2021 to 22 November 2021, were in a high volatility state, and the remaining half were in a low volatility state. The carbon return curves predicted by the six ensemble models are shown in Figure 8. Visually, these ensemble models captured the trend of the carbon return well, but the precision differences between the models under different volatility states, as well as before and after the ensemble process, need to be verified by quantitative calculation.
Table 3 and Table 4 show the five statistical accuracy metrics of the models under the two ensemble frameworks in the high and low volatility states, respectively. Compared to the results for the whole test set, a similar conclusion can be drawn: the stacking algorithm always improves the precision of the model, whether in the high or the low volatility state. However, with different ensemble frameworks and different level-2 meta-models, the ensemble models show some specific patterns across periods with different volatility.
As shown in Table 3, homo_svr is still the most competitive model in both high and low volatility states compared to the other two. However, based on the results in Table 4, hete_rf achieves the best performance on three indicators (RMSE, U1, $R^2_{os}$) in the low volatility state, while in the high volatility state, hete_svr is superior on these three indicators and hete_xgb is optimal on the other two (SMAPE, MAE). These results illustrate that RF and XGBoost can also be considered as the level-2 model, depending on the ensemble mode and the volatility state. Moreover, we calculated the average percentage of precision improvement for RMSE, SMAPE, MAE, and U1 (for the homogeneous ensemble [Table 3], the result was based on the mean of the three ensemble models over the value of FAR; for the heterogeneous ensemble [Table 4], on the mean of the three ensemble models over the mean of all base models; we did not calculate the percentage increase in $R^2_{os}$, because $R^2_{os}$ can be negative), which is listed at the bottom of Table 3 and Table 4. An interesting generalization emerges: for the homogeneous ensemble, the ensemble process improves precision more significantly in the low volatility period than in the high volatility period, while the opposite holds for the heterogeneous ensemble. These results indicate that, to improve the accuracy of the forecasting model, the homogeneous ensemble framework is more advantageous when the carbon return is in a low volatility period, while the heterogeneous ensemble framework is the better choice when the carbon price is in a high volatility period.
In addition, to verify the effectiveness of the ensemble models, other popular methods for carbon market prediction, especially deep learning models (LSTM [62], GRU [63], EMD+LSTM [48]), were considered. Table 5 compares the accuracy of the ensemble models with these models. The results show that our proposed ensemble models outperform the other models.
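The five accuracy metrics used throughout this comparison can be computed as in the sketch below. The formulas follow their standard textbook forms rather than the paper's own equation definitions (which are not reproduced in this excerpt); in particular, the benchmark convention in the out-of-sample R2 is our assumption.

```python
import numpy as np

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def smape(y, yhat):
    # symmetric MAPE; the zero-denominator guard is ours
    denom = (np.abs(y) + np.abs(yhat)) / 2
    return float(np.mean(np.abs(y - yhat) / np.where(denom == 0, 1, denom)))

def theil_u1(y, yhat):
    # Theil's U1: RMSE scaled by the root-mean-square levels of both series
    num = np.sqrt(np.mean((y - yhat) ** 2))
    den = np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(yhat ** 2))
    return float(num / den)

def r2_oos(y, yhat, y_bench):
    # out-of-sample R^2 against a benchmark forecast; can be negative
    return float(1 - np.sum((y - yhat) ** 2) / np.sum((y - y_bench) ** 2))
```

A perfect forecast gives RMSE = MAE = U1 = 0 and R2_os = 1 against any non-trivial benchmark, which is a quick sanity check for the implementations.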

4.1.3. Robustness Analysis for Different Rolling Window Sizes

Arbitrary choices of window size affect how the sample is split into in-sample and out-of-sample portions, which may lead to different empirical results in practice [64]. The results of the above two sections were obtained with a window size of 100; we also expanded the window size to 200 and 300 to check the robustness of the experimental results. Table A4 (see Appendix B) lists the accuracy of the models under the two ensemble frameworks at these window sizes. As seen in Table A4, the forecasting accuracy of the ensemble models remains better than that of the base models after the window is expanded, which further confirms the central conclusion that the ensemble algorithm we developed robustly improves the predictive power of the forecasting models to an impressive extent. We also note that, as the window size grows, the forecasting accuracy of the ensemble models decreases, and the precision gain brought by the ensemble algorithm shrinks. Thus, setting the window size to 100 lets the ensemble algorithm exert its full power. Of course, the selection of window size is not the focus of this paper; accordingly, the following portfolio research was also carried out with a window size of 100.
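The fixed-size rolling evaluation described above can be sketched as follows. This is a minimal illustration of the sliding-window split logic (not the paper's full double sliding window scheme): the training window slides forward one step at a time, and each step produces a one-step-ahead test point, preserving temporal order.

```python
def walk_forward_splits(n_obs, window):
    """Yield (train_indices, test_index) pairs for one-step-ahead
    forecasting with a fixed-size sliding window (no shuffling)."""
    for t in range(window, n_obs):
        yield list(range(t - window, t)), t

# e.g. with 5 observations and a window of 3:
splits = list(walk_forward_splits(5, 3))
# first split trains on indices [0, 1, 2] and predicts index 3
```

Enlarging `window` (100 → 200 → 300, as in Table A4) trades a longer training history for fewer, later out-of-sample points, which is exactly why the split choice can move the empirical results.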

4.2. Analysis of MCS

The p-values of the ensemble models constructed by the improved stacking algorithm under the MCS are shown in Table 6, with the significance level α set to 0.05. According to the results in Table 6, model accuracy increases significantly after applying the improved stacking algorithm (as seen in the larger p-values), which means the improved stacking ensemble algorithm passes the hypothesis test for accuracy improvement; the improvement is proven to be robust. Moreover, homo_svr and hete_svr achieved the highest p-values in the homogeneous and heterogeneous ensembles, respectively, which indicates the significant advantage of support vector regression (SVR) as the aggregation meta-model for the ensemble algorithm.

4.3. Investment Gains from a Portfolio Perspective

Thus far, the advantages of the improved stacking ensemble algorithms in improving prediction accuracy have been comprehensively demonstrated. However, translating these advantages into investment gains in practical business scenarios is of greater concern to market participants. We calculated the annualized utility gain (recall Equation (27)) obtained by a mean-variance investor who allocates between the carbon option and a risk-free asset based on the predictions of the ensemble models. Following Tan et al. [19] and Zhao and Cheng [60], we chose the 1-year China government bond as the alternative allocation asset, substituting its yield into Equation (25) to represent the risk-free rate.
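The portfolio calculation can be sketched as below. The exact forms of the paper's Equations (24)-(27) are not reproduced in this excerpt, so the textbook mean-variance weight, certainty equivalent return (CER), and utility-gain difference used here are our assumptions about those equations' shapes.

```python
import numpy as np

def mv_weight(r_hat, sigma2_hat, gamma):
    # assumed Eq. (24) form: risky-asset weight of a mean-variance investor;
    # a lower gamma assigns a higher weight to the carbon option
    return r_hat / (gamma * sigma2_hat)

def cer(portfolio_returns, gamma):
    # assumed Eq. (26) form: mean return penalized by variance, so a more
    # volatile realized return lowers the certainty equivalent
    r = np.asarray(portfolio_returns, dtype=float)
    return float(r.mean() - 0.5 * gamma * r.var())

def utility_gain(cer_model, cer_benchmark):
    # assumed Eq. (27) form: gain of the model strategy over the benchmark
    return cer_model - cer_benchmark
```

The inverse dependence of `mv_weight` on `gamma` is what drives the patterns in Figures 9 and 10: risk-tolerant investors (small γ) hold more of the carbon option and therefore amplify both gains and losses.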

4.3.1. Influence of Risk Preference on Portfolio Construction

As mentioned in the section about economic evaluation (see Section 4.2), the risk aversion parameter γ measures the extent of the investor's aversion to risky assets and aggressive investment. The lower the value of γ, the greater the investor's tolerance for risk. It can be inferred from the formula for the weight ω (recall Equation (24)) that a lower γ leads to a higher weight assigned to the carbon option in the portfolio. We therefore conducted a robustness test on the annualized utility gain obtained by investors with different risk preferences. The results were calculated from the predictions over three periods: the whole test set, the high volatility interval, and the low volatility interval, as shown in Figure 9 and Figure 10.
Figure 9 shows that, when γ is extremely small, hete_rf obtains the maximum utility gain; when γ increases beyond 0.5, homo_svr takes the lead; and when γ exceeds 0.9, the annualized utility gain (UG) of all ensemble models gradually turns negative. Moreover, over the whole prediction interval with mixed volatility, the annualized utility gain of all ensemble models decreases as the investor risk aversion parameter γ increases.
Figure 10a depicts the utility gain of the various ensemble models in the high volatility interval. Under this condition, the dominance of the models based on the homogeneous ensemble framework is very clear. With extremely small γ (e.g., γ = 0.1), homo_xgb produces the largest annualized utility gain; for γ = 0.2–0.4, homo_rf slightly leads the others; and once γ exceeds 0.4, homo_svr always performs the best.
As for Figure 10b, the excellence of hete_rf in the low volatility interval is outstanding in two respects. Firstly, within the range of γ plotted in (b), the annualized utility gain obtained by hete_rf is much higher than that of the other ensemble models. Secondly, as γ increases, the annualized utility gain of the other ensemble models rapidly drops below 0, while that of hete_rf stays positive until γ exceeds 1.9 (specific values can be found in Table A5 and Table A6 of Appendix B). hete_rf thus delivers significantly better performance and better robustness in terms of economic returns in the low volatility interval of the test set.
To summarize, for the carbon option, risk-preferring investors are more likely to obtain high economic gains, and under the portfolio strategy in this paper, the homogeneous ensemble is more conducive to achieving high economic returns when the market is in a high volatility state, while hete_rf has significant advantages over the other ensemble models during the low volatility period. Considering the difference in magnitude of the annualized utility gain between the high and low volatility states, we find that the improved stacking algorithm exerts a significantly greater advantage in the high volatility state.

4.3.2. Functionality of Ensemble Strategy in Improving Economic Gains

In the previous subsection, the performance of the ensemble models was shown through portfolios constructed by investors with different risk appetites. In this section, we quantitatively explore how the proposed stacking algorithms improve the economic gains of the models. Table 7 compares the annualized utility gain and Sharpe ratio (with γ = 0.3) of the base models and the ensemble models to show the impact of the ensemble.
As detailed in Table 7, hete_rf achieves the best annualized utility gain and Sharpe ratio over both the whole test set and the low volatility interval. In the high volatility interval, homo_rf is the best ensemble model, with UG = 88.2185 and a Sharpe ratio of 0.3774. Moreover, whether the stacking ensemble is homogeneous or heterogeneous, the ensemble process always improves the portfolio return relative to the base models. In other words, the ensemble strategy plays a positive role in improving the economic gains of prediction.
We use PI (Equation (31)) to quantify the improvement brought by the ensemble strategy on the two indicators. The best values are PI_UG = 16% (hete_rf) and PI_SR = 9.96% (homo_rf) for the whole test set; PI_UG = 14.33% and PI_SR = 9.55% (both homo_rf) for the high volatility interval; and PI_UG = 58.70% and PI_SR = 23.99% (both homo_xgb) for the low volatility interval. In terms of the absolute values of annualized utility gain and Sharpe ratio (SR), the heterogeneous ensemble framework performs better, while the homogeneous ensemble framework is superior in the degree of improvement over the base models. An additional finding is that, compared to FAR, FAR+MMS in the homogeneous ensemble yields a significant improvement in economic gains, which means our innovation of adding model selection to the homogeneous ensemble is successful. To further confirm the usefulness of the modified stacking algorithms, the bottom of Table 7 also lists the Sharpe ratio of the buy-and-hold strategy (buy on the first day and sell on the last). The Sharpe ratio of investing in the carbon option asset with buy-and-hold is much lower than that of every prediction model in the table, which means the additional information provided by the models is very valuable.
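The two comparison indicators can be sketched as follows. The paper's exact Sharpe ratio convention and the form of PI in Equation (31) are not reproduced in this excerpt, so the annualization-free Sharpe ratio and the relative-improvement form of PI below are our assumptions.

```python
import numpy as np

def sharpe_ratio(returns, rf=0.0):
    # mean excess return over its standard deviation
    # (annualization factor omitted for brevity)
    r = np.asarray(returns, dtype=float) - rf
    return float(r.mean() / r.std())

def pct_improvement(ensemble_value, base_value):
    # assumed form of the paper's PI (Eq. (31)):
    # relative gain of the ensemble model over the base model
    return (ensemble_value - base_value) / abs(base_value)
```

For example, an ensemble Sharpe ratio of 1.16 against a base of 1.00 gives a PI of 16% under this convention.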
A counter-intuitive phenomenon occurs: homo_svr and hete_svr outperform the other ensemble models in accuracy, yet homo_rf and hete_rf generate more economic gains in the portfolio. Considering the portfolio strategy we adopted, this is not difficult to explain. The premise of our portfolio is that investors are mean-variance types, and from the calculation of the certainty equivalent return (CER) we can infer that a more volatile prediction of the carbon return produces a realized return with higher volatility, which means the CER is penalized more by the variance term in Equation (26). Table 8 lists the sample variance of the real carbon return and of the predictions made by each ensemble model. As can be seen from Table 8, the real carbon return has a relatively large variance, so the predictions of hete_svr, being more accurate, tend to fluctuate more. (Indeed, the prediction of hete_svr does have a relatively larger variance than the other ensemble models.) The variance of the prediction made by hete_rf is the smallest in all intervals, which helps achieve a more robust CER in the portfolio. This reasonable explanation of the result indicates that, for mean-variance investors, a forecasting model with stable predictions is more conducive to obtaining high returns when investing in carbon assets.
Since the historical average (HA) model reflects historical information through the average, we added another baseline built on recent information: we took the carbon return on day t as the prediction for the next day (day t + 1) and calculated the annualized utility gains against this strategy (i.e., we computed the CER of this strategy and substituted it for CER_HA in Equation (27)), as shown in Table 9. The results show that, even against this benchmark, the improvement brought by the modified stacking algorithm remains significant; moreover, for assets with sharp price fluctuations such as carbon options, relying only on the previous day's information is risky (the UG values in Table 9 are much greater than those in Table 7).

5. Conclusions and Future Works

Through comprehensive empirical experiments, we proved that the improved stacking ensemble algorithm can effectively improve the accuracy of models in predicting carbon returns. Support vector regression has advantages as the meta-model for the improved stacking ensemble algorithm in improving prediction accuracy. When the carbon market is in a low volatility state, the improvement from the homogeneous ensemble is greater, while in a high volatility state, the heterogeneous ensemble is a better choice. Based on the results of detailed portfolio experiments, we find interesting generalizations about forecasts using stacking algorithms and the characteristics of carbon assets. Not only do we provide supportive evidence for the existing research on carbon return prediction, but we also supply supplementary research on the predictive models' practical significance in investment. Firstly, the improved stacking ensemble algorithm significantly improves the economic benefits of carbon assets in portfolios. Secondly, when investing in carbon assets, a risk-prone investor is more likely to receive higher returns. Meanwhile, the empirical results demonstrate that different stacking ensemble frameworks perform diversely during turmoil and tranquil periods. We recommend the ensemble technique in practice since it brings stable forecasting performance and attractive investing gains, even when the volatility situation varies from low to high. Last but not least, the details of our innovation on the stacking algorithm provide a valuable reference for researchers who study time series prediction.
However, there are still some limitations requiring further discussion in future work. We do not attempt to construct the optimal portfolio strategy in this study since the main focus, as mentioned, is the economic impact of ensemble strategy on the forecasting model and the optimal ensemble models in different market situations. In future works, we plan to test whether the improved stacking algorithm can continue to play a constructive role as investment portfolios vary across multiple financial assets. Additionally, we intend to set our sights on the broader carbon market to test the robustness of this ensemble algorithm.

Author Contributions

P.Y.: Conceptualization, data curation, data analysis, methodology, funding, writing (original draft), and writing (review & editing); A.B.S.: Validation, visualization, and writing (review & editing); Y.L.: Supervision, validation, funding, and writing (review & editing). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available from the corresponding authors upon reasonable request.

Acknowledgments

The researchers would like to express their gratitude to the anonymous reviewers for their efforts to improve the quality of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Supplementary Introduction of Models

Appendix A.1. Support Vector Regression (SVR)

Support Vector Regression (SVR), proposed by Drucker et al. [65], is an important branch of Support Vector Machine (SVM). The basic idea of the SVR algorithm is to find a regression plane that makes the data in the whole set have the shortest distance from it, which can be described as follows.
Consider a set of observations $\{x_i, y_i\}_{i=1}^{T}$, where $x_i \in \mathbb{R}^n$ is an $n$-dimensional input vector, $y_i \in \mathbb{R}$ is the corresponding target output, and $T$ is the sample size. The regression function used to formulate the nonlinear relationship between input and output is called the SVR function, expressed as follows:
$$f(x; \omega, b) = \omega^{T}\phi(x) + b, \tag{A1}$$
where $\phi$ is the nonlinear transfer function that maps the input vector into a high-dimensional feature space, $\omega$ denotes the weight vector, and $b$ is the threshold coefficient. $\omega$ and $b$ are estimated by minimizing the following regularized risk function [66]:
$$R(C) = C\sum_{i=1}^{T} L_{\varepsilon}\left(y_i, f(x_i)\right) + \frac{1}{2}\|\omega\|^2, \tag{A2}$$
where $L_{\varepsilon}\left(y_i, f(x_i)\right)$ is called the $\varepsilon$-insensitive loss function, defined as follows:
$$L_{\varepsilon}(y, f(x)) = \begin{cases} |y - f(x)| - \varepsilon, & |y - f(x)| \geq \varepsilon \\ 0, & \text{otherwise.} \end{cases} \tag{A3}$$
$C$ and $\varepsilon$ are prescribed parameters chosen beforehand by the user. SVR is then transformed into an optimization problem with the following objective function:
$$\min_{\omega, b, \xi, \xi^{*}} \; \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{T}\left(\xi_i + \xi_i^{*}\right), \tag{A4}$$
which is subject to the constraints:
$$\begin{cases} \omega^{T}\phi(x_i) + b - y_i \leq \varepsilon + \xi_i, \\ y_i - \omega^{T}\phi(x_i) - b \leq \varepsilon + \xi_i^{*}, \\ \xi_i, \xi_i^{*} \geq 0. \end{cases} \tag{A5}$$
In Equation (A4), $\xi_i$ and $\xi_i^{*}$ are positive slack variables that denote the distance from the actual values to the corresponding boundary values of the $\varepsilon$-tube. Solving the optimization problem yields the SVR regression function:
$$f(x) = \sum_{i=1}^{T}\left(\alpha_i - \alpha_i^{*}\right) K(x_i, x) + b, \tag{A6}$$
where $\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers and $K(x_i, x_j)$ is called the kernel function, which yields the inner product of $\phi(x_i)$ and $\phi(x_j)$ in the feature space. In this study, we used the Gaussian radial basis function (RBF) as the kernel function, which is not only easy to implement but also advantageous in dealing with nonlinear problems [67]. Its expression is given by the following:
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right). \tag{A7}$$
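A minimal implementation of the Gaussian RBF kernel above is sketched below. Note that, as an aside, sklearn's SVR parameterizes the same kernel as exp(−γ‖x_i − x_j‖²), i.e., γ = 1/(2σ²); the mapping between the two conventions is our remark, not the paper's.

```python
import numpy as np

def rbf_kernel(x_i, x_j, sigma=1.0):
    """Gaussian RBF kernel K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*sigma^2))."""
    d2 = np.sum((np.asarray(x_i, float) - np.asarray(x_j, float)) ** 2)
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))
```

By construction the kernel equals 1 for identical inputs and decays toward 0 as the inputs move apart, which is what makes it well suited to local, nonlinear structure.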

Appendix A.2. Random Forest(RF)

Random Forest (RF), which combines bootstrapping and random feature selection, is an ensemble machine learning algorithm built on classification and regression trees (CART). Ho [68] proposed the original random decision forest algorithm, and Breiman [69] further extended and developed the random forest algorithm. The core idea of RF is to combine the predicted values of multiple decision trees to achieve more diverse and robust results.
More specifically, the first step is to extract multiple samples by the bootstrap resampling method from the original sample, which improves generalization capacity and avoids overfitting. Then, several decision trees are constructed for the extracted samples. The tree nodes continue to split until the tree reaches its maximum depth, and these trees will not be pruned. The prediction results of the decision trees are collected, and the simple average strategy is adopted to calculate the final predicted value. The process of the random forest algorithm is shown in Figure A1.
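The resample-and-average step described above can be illustrated minimally as follows. To isolate the bagging logic, each "tree" is replaced here by the simplest possible regressor (the mean of its bootstrap sample); this is a didactic sketch, not the CART-based forest the paper uses.

```python
import random

def bootstrap_bagging_predict(y_train, n_estimators=50, seed=0):
    """Draw n_estimators bootstrap samples (sampling with replacement),
    fit a trivial 'learner' (the sample mean) on each, and average the
    resulting predictions — the simple average strategy of RF."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_estimators):
        sample = [y_train[rng.randrange(len(y_train))] for _ in y_train]
        preds.append(sum(sample) / len(sample))
    return sum(preds) / len(preds)
```

Averaging over many bootstrap samples reduces the variance of the individual learners, which is the mechanism behind RF's robustness; the real algorithm additionally randomizes the features considered at each tree split.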

Appendix A.3. eXtreme Gradient Boosting(XGBoost)

XGBoost (eXtreme Gradient Boosting) is an efficient decision tree algorithm that builds on the original gradient boosting decision tree (GBDT) and greatly improves model performance. As a forward-additive model, its core idea is to combine several weak learners into a strong one through boosting. XGBoost is composed of multiple classification and regression trees (CART), so it can be used for both regression and classification.
Figure A1. Flowchart of the random forest method.
The formula and derivation of the XGBoost algorithm are briefly introduced as follows. Firstly, the additive model takes the following form:
$$\hat{y}_i = \varphi(x_i) = \sum_{k=1}^{K} f_k(x_i), \tag{A8}$$
where $f_k(x_i)$ represents the prediction of a weak learner and $K$ is the number of candidate weak learners. The algorithm then searches for the optimal parameters by minimizing the objective function shown in Equation (A9):
$$L(\varphi) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K}\Omega\left(f_k\right), \tag{A9}$$
where $\Omega(f_k) = \gamma T + \frac{1}{2}\lambda\|\omega\|^2$ denotes the complexity of the $k$th model, $\gamma$ and $\lambda$ are configurable parameters controlling the degree of penalty and regularization, $T$ represents the number of leaf nodes of the decision trees, and $l(\cdot)$ is the original convex loss function measuring the difference between the observation and the predicted value.
To reduce the difficulty of this optimization problem, an additive manner is used: at step $t$ the new learner $f_t(x_i)$ is added, and a second-order Taylor expansion is applied to approximate the exact solution. The transformed objective functions are shown in Equations (A10) and (A11):
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega\left(f_t\right), \tag{A10}$$
$$L^{(t)} \simeq \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i)\right] + \Omega\left(f_t\right), \tag{A11}$$
where $g_i$ and $h_i$ represent the first- and second-order derivatives of the loss function with respect to the prediction, respectively.
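For a concrete instance of Equation (A11), take the squared loss l = (y − ŷ)². The sketch below computes g_i and h_i for that loss and the closed-form optimal leaf weight w* = −Σg/(Σh + λ) that follows from minimizing the quadratic approximation; the leaf-weight formula comes from the standard XGBoost derivation rather than from this paper.

```python
def grad_hess_squared_loss(y, y_pred):
    # first- and second-order derivatives of l = (y - y_hat)^2
    # with respect to the prediction y_hat
    g = 2.0 * (y_pred - y)
    h = 2.0
    return g, h

def optimal_leaf_weight(gs, hs, lam):
    # leaf score minimizing the second-order approximation over one leaf:
    # w* = -sum(g) / (sum(h) + lambda); lam is the L2 regularizer lambda
    return -sum(gs) / (sum(hs) + lam)
```

With λ = 0 and squared loss, w* reduces to the mean residual in the leaf; a positive λ shrinks the leaf scores toward zero, which is how the regularization term in Ω tempers each tree.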

Appendix A.4. Three Penalty Regression

Ridge regression: Ridge regression is a well-known modification of linear regression proposed by Hoerl and Kennard [70], which addresses the overfitting of ordinary least squares by imposing a penalty on the size of the coefficients. Mathematically, ordinary least squares solves a problem of the form:
$$\min_{w}\ \|Xw - y\|_2^2, \tag{A12}$$
while the ridge coefficients minimize a penalized residual sum of squares:
$$\min_{w}\ \|Xw - y\|_2^2 + \alpha\|w\|_2^2. \tag{A13}$$
In our model, we assigned α the default value (α = 1) given by sklearn. (sklearn, i.e., scikit-learn, is a well-known and powerful machine learning library in Python covering almost all fields of machine learning, such as data preprocessing, model validation, feature selection, classification, regression, clustering, and dimensionality reduction.)
Lasso regression: Tibshirani [71] proposed the famous Lasso, which has swept through the field of high-dimensional statistics. Lasso is very useful in some contexts for effectively reducing the number of features. Lasso regression adds an $L_1$ regularization term to the linear regression objective function:
$$\min_{w}\ \frac{1}{2n}\|Xw - y\|_2^2 + \alpha\|w\|_1, \tag{A14}$$
where $n$ is the sample size, $\alpha$ is a constant, and $\|w\|_1$ is the $L_1$-norm of the coefficient vector $w$. We used the cross-validation class `LassoCV` provided by sklearn to determine the parameter $\alpha$.
E-net regression: E-net (Elastic-net) regression, proposed by Zou and Hastie [72], is regarded as a combination of Ridge regression and Lasso regression, retaining some advantages of both. The objective of E-net regression is to minimize the following:
$$\min_{w}\ \frac{1}{2n}\|Xw - y\|_2^2 + \alpha\rho\|w\|_1 + \frac{\alpha(1-\rho)}{2}\|w\|_2^2, \tag{A15}$$
where $\|w\|_1$ and $\|w\|_2$ represent the $L_1$-norm and $L_2$-norm of the coefficient vector, respectively. In determining the constants $\alpha$ and $\rho$, we used the cross-validation class `ElasticNetCV` from sklearn.
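As a check on the penalized residual sum of squares above, ridge regression admits the closed-form solution w = (XᵀX + αI)⁻¹Xᵀy, which the following sketch implements directly. This is an illustration of the objective rather than sklearn's implementation (intercept handling is omitted here).

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge solution w = (X^T X + alpha*I)^{-1} X^T y,
    minimizing ||Xw - y||_2^2 + alpha*||w||_2^2 (no intercept term)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ np.asarray(y, dtype=float))
```

Setting alpha = 0 recovers ordinary least squares, while larger alpha shrinks the coefficients toward zero; Lasso and E-net have no such closed form, which is why their alpha (and rho) are tuned via `LassoCV` and `ElasticNetCV`.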

Appendix B

Table A1. Descriptive statistics.

Variables   Mean       Std. Dev.  Skew.   Kurt.   ADF Test     Jarque–Bera
SZA         0.00034    0.18722    0.28    11.27   −18.85 ***   4857.24 ***
LNG         0.00068    0.00047    0.34    33.09   −9.76 ***    41764.82 ***
SPcoa       0.00042    0.00050    −2.14   24.06   −8.86 ***    22770.72 ***
SPcru       0.00044    0.00052    −0.33   10.50   −25.58 ***   4217.38 ***
SPcor       0.00046    0.00005    0.68    4.42    −26.42 ***   815.51 ***
SPegg       0.00020    0.00045    4.68    63.46   −8.15 ***    156892.57 ***
SPcot       −0.00013   0.00018    0.06    8.94    −25.61 ***   3045.48 ***
SPwhe       0.00031    0.00015    3.16    43.93   −8.06 ***    75088.17 ***
SPalu       0.00032    0.00013    −0.58   17.67   −7.01 ***    11956.52 ***
SPzin       −0.00001   0.00017    0.60    9.79    −26.96 ***   3712.28 ***
SPlea       −0.00021   0.00010    −0.31   4.56    −16.78 ***   806.90 ***
SPnic       0.00063    0.00028    0.14    5.36    −12.91 ***   1096.74 ***
SPtin       0.00035    0.00020    0.16    12.29   −6.73 ***    5760.94 ***
SPsil       0.00020    0.00023    −0.98   11.20   −15.31 ***   4924.58 ***
SPgol       0.00038    0.00006    −0.82   10.13   −9.97 ***    4013.54 ***
SPcop       0.00020    0.00013    2.12    49.51   −27.06 ***   94123.69 ***
SSEPE       −0.00032   0.00024    −4.40   49.17   −9.80 ***    95125.33 ***
SSECI       0.00003    0.00015    −0.39   3.76    −30.72 ***   563.26 ***
CSI300      0.00008    0.00018    −0.41   3.33    −11.46 ***   448.50 ***
SSE180      0.00005    0.00017    −0.25   3.18    −11.65 ***   394.53 ***
SZSECI      0.00021    0.00023    −0.62   3.52    −10.88 ***   532.07 ***
CSI100      0.00003    0.00018    −0.35   2.87    −10.28 ***   331.63 ***
CSI500      0.00007    0.00022    −0.59   3.79    −29.92 ***   599.16 ***
SSEGBI      0.00021    0.00000    4.96    57.39   −5.88 ***    129320.39 ***
SSECBI      0.00023    0.00000    3.96    27.77   −5.29 ***    31781.76 ***
SSEEBI      0.00021    0.00000    2.68    20.55   −5.10 ***    17187.30 ***
Gb3M        −0.00081   0.00075    −1.71   17.74   −10.91 ***   12447.21 ***
Gb10Y       −0.00034   0.00008    −1.76   18.40   −7.98 ***    13377.39 ***
Cb3M        −0.00108   0.00031    −1.36   16.33   −15.98 ***   10454.93 ***
Cb10M       −0.00044   0.00002    −2.11   27.65   −6.90 ***    29825.27 ***
coalb3M     −0.00108   0.00033    −1.17   15.50   −16.51 ***   9368.78 ***
coalb5Y     −0.00060   0.00006    −0.81   10.93   −8.55 ***    4653.14 ***
FCI         −0.00450   0.01904    −0.81   17.78   −8.76 ***    12151.94 ***
CSIIene     0.00018    0.00034    0.20    5.04    −31.51 ***   974.00 ***
Windene     0.00025    0.00032    0.05    4.83    −31.17 ***   888.71 ***
Note: ADF test tests the null hypothesis that the series has the unit root, which means that the series is nonstationary. Jarque-Bera test tests the null hypothesis that the series follows a normal distribution. *** indicates that the null hypothesis is rejected at the statistical significance of 1%.
Table A2. Explanation of variables.

Label      Variable                                                     Transform.
SZA        Option Settlement Price: Carbon Emission Right (Shenzhen)    LD
Panel A: Energy and non-energy commodities
LNG        China Liquified Natural Gas Price Index                      LD
SPcoa      Futures settlement price (active contract): Coal             LD
SPcru      Futures settlement price (active contract): Crude oil        LD
SPcor      Futures settlement price (active contract): Corn             LD
SPegg      Futures settlement price (active contract): Egg              LD
SPcot      Futures settlement price (active contract): Cotton           LD
SPwhe      Futures settlement price (active contract): Wheat            LD
SPalu      Futures settlement price (active contract): Aluminium        LD
SPzin      Futures settlement price (active contract): Zinc             LD
SPlea      Futures settlement price (active contract): Lead             LD
SPnic      Futures settlement price (active contract): Nickel           LD
SPtin      Futures settlement price (active contract): Tin              LD
SPsil      Futures settlement price (active contract): Silver           LD
SPgol      Futures settlement price (active contract): Gold             LD
SPcop      Futures settlement price (active contract): Copper           LD
Panel B: Financial variables
SSEPE      SSE Average P/E ratio                                        LD
SSECI      SSE Composite Index                                          LD
CSI300     CSI 300 Index                                                LD
SSE180     SSE 180 Index                                                LD
SZSECI     SZSE Composite Index                                         LD
CSI100     CSI 100 Index                                                LD
CSI500     CSI 500 Index                                                LD
SSEGBI     SSE Government Bond Index                                    LD
SSECBI     SSE Corporate Bond Index                                     LD
SSEEBI     SSE Enterprise Bond Index                                    LD
Gb3M       CCDC government bond yield: 3-months                         LD
Gb10Y      CCDC government bond yield: 10-years                         LD
Cb3M       CCDC corporate bond yield (AAA): 3-months                    LD
Cb10M      CCDC corporate bond yield (AAA): 10-years                    LD
coalb3M    CCDC coal industry bond yield (AAA): 3-months                LD
coalb5Y    CCDC coal industry bond yield (AAA): 5-years                 LD
Panel C: Economic and industry index
FCI        Financial Conditions Index                                   FD
CSIIene    China Securities Industry Index: Energy                      LD
Windene    WIND Industry Index: Energy                                  LD
Note: SSE = Shanghai Stock Exchange; CSI = China Securities Index; SZSE = Shenzhen Stock Exchange; CCDC = China Central Depository & Clearing Co., Ltd., Beijing, China.
Table A3. List of hyperparameters.

Model Name   Explanations                                       Hyperparameters
Base Model
FAR          Factor-augmented regression                        p_max = q_max = 4, r = 7
FAR+MMS      Mallows Model Selection                            –
MMA          Mallows Model Averaging                            –
E-net        Elastic-net                                        alpha = 0.168, l1_ratio = 0.1
lasso        Lasso regression                                   –
ridge        Ridge regression                                   –
svr          Support Vector Regression                          kernel: rbf, gamma: auto, C: [0.5, 1]
XGBoost      eXtreme Gradient Boosting                          max_depth: 3, learning_rate: 0.04, subsample: 0.3, colsample_bytree: 0.8, reg_alpha: 0.05, reg_lambda: 0.05, n_estimators: 50
rf           Random Forest                                      n_estimators: 50, max_features: sqrt, max_depth: 4, min_samples_split: 2, min_samples_leaf: 4
Ensemble Model
homo_rf      RF as meta-model for homogeneous ensemble          n_estimators: 180, max_features: sqrt, max_depth: 2, min_samples_split: 2, min_samples_leaf: 4
homo_svr     SVR as meta-model for homogeneous ensemble         C = 0.5
homo_xgb     XGBoost as meta-model for homogeneous ensemble     max_depth: 2, learning_rate: 0.1, subsample: 0.95, colsample_bytree: 0.7, reg_alpha: 0.2, reg_lambda: 0.05, n_estimators: 50
hete_rf      RF as meta-model for heterogeneous ensemble        n_estimators: 70, max_features: sqrt, max_depth: 3, min_samples_split: 2, min_samples_leaf: 2
hete_svr     SVR as meta-model for heterogeneous ensemble       C = 8
hete_xgb     XGBoost as meta-model for heterogeneous ensemble   max_depth: 2, learning_rate: 0.06, subsample: 0.6, colsample_bytree: 0.6, reg_alpha: 0.2, reg_lambda: 0.06, n_estimators: 80
Table A4. Comparison of model accuracy at different window sizes. Under each metric, the three columns are Whole / High / Low volatility intervals.

w = 200

Model      RMSE                    SMAPE                   MAE                     U1                      R2_os
Base Model
FAR        0.4369 0.5620 0.2568    0.2595 0.3481 0.1708    0.2749 0.3758 0.1740    0.1630 0.2569 0.0935    0.2942 0.3363 −0.0142
FAR+MMS    0.4362 0.5604 0.2578    0.2704 0.3601 0.1807    0.2851 0.3866 0.1836    0.1632 0.2595 0.0942    0.2965 0.3400 −0.0222
MMA        0.4378 0.5641 0.2549    0.2713 0.3634 0.1792    0.2862 0.3903 0.1821    0.1631 0.2601 0.0926    0.2913 0.3311 0.0003
E-net      0.4658 0.6127 0.2419    0.2552 0.3551 0.1553    0.2742 0.3903 0.1582    0.1641 0.2681 0.0793    0.1977 0.2111 0.0996
lasso      0.4756 0.6280 0.2410    0.2573 0.3578 0.1569    0.2776 0.3956 0.1596    0.1690 0.2767 0.0784    0.1634 0.1711 0.1067
ridge      0.4692 0.6157 0.2473    0.2598 0.3571 0.1626    0.2791 0.3926 0.1656    0.1662 0.2704 0.0843    0.1859 0.2032 0.0590
SVR        0.4507 0.5916 0.2372    0.2477 0.3430 0.1524    0.2652 0.3754 0.1551    0.1564 0.2523 0.0771    0.2488 0.2643 0.1346
XGBoost    0.4759 0.6242 0.2518    0.2763 0.3786 0.1740    0.2955 0.4144 0.1767    0.1647 0.2611 0.0916    0.1623 0.1811 0.0248
RF         0.4884 0.6469 0.2420    0.2645 0.3670 0.1621    0.2861 0.4075 0.1647    0.1620 0.2615 0.0795    0.1180 0.1206 0.0990
Ensemble Model
homo_rf    0.4268 0.5528 0.2426    0.2608 0.3509 0.1707    0.2751 0.3770 0.1732    0.1530 0.2433 0.0841    0.3262 0.3578 0.0950
homo_svr   0.4059 0.5231 0.2365    0.2402 0.3188 0.1616    0.2533 0.3426 0.1640    0.1372 0.2109 0.0802    0.3906 0.4250 0.1394
homo_xgb   0.4210 0.5472 0.2347    0.2508 0.3395 0.1622    0.2650 0.3654 0.1646    0.1456 0.2316 0.0788    0.3445 0.3707 0.1529
hete_rf    0.4311 0.5668 0.2248    0.2540 0.3468 0.1613    0.2693 0.3752 0.1633    0.1536 0.2449 0.0778    0.3126 0.3249 0.2227
hete_svr   0.4169 0.5422 0.2317    0.2413 0.3260 0.1565    0.2555 0.3520 0.1590    0.1442 0.2267 0.0788    0.3572 0.3821 0.1744
hete_xgb   0.4196 0.5500 0.2226    0.2511 0.3446 0.1576    0.2653 0.3708 0.1597    0.1477 0.2333 0.0788    0.3489 0.3641 0.2375
Percentage of improvement (%)
homogeneous    4.3389 3.7271 7.3429    3.4248 3.3781 3.5201    3.7968 3.7642 3.8672    10.8676 11.0036 13.2982    – – –
heterogeneous  9.3613 9.6233 7.6670    4.9431 5.8698 2.8976    6.1368 7.3688 3.2041    9.2448 11.1046 5.7464      – – –

w = 300

Model      RMSE                    SMAPE                   MAE                     U1                      R2_os
Base Model
FAR        0.4432 0.5723 0.2555    0.2631 0.3554 0.1708    0.2792 0.3844 0.1740    0.1614 0.2574 0.0900    0.2737 0.3117 −0.0043
FAR+MMS    0.4363 0.5592 0.2609    0.2723 0.3573 0.1872    0.2870 0.3837 0.1902    0.1636 0.2572 0.0964    0.2959 0.3428 −0.0473
MMA        0.4378 0.5609 0.2620    0.2721 0.3557 0.1884    0.2869 0.3825 0.1914    0.1637 0.2568 0.0958    0.2912 0.3387 −0.0561
E-net      0.4657 0.6120 0.2431    0.2549 0.3537 0.1561    0.2740 0.3889 0.1591    0.1645 0.2671 0.0804    0.1981 0.2127 0.0907
lasso      0.4681 0.6163 0.2415    0.2544 0.3531 0.1557    0.2737 0.3890 0.1585    0.1654 0.2695 0.0792    0.1898 0.2016 0.1031
ridge      0.4685 0.6158 0.2447    0.2573 0.3553 0.1592    0.2766 0.3909 0.1622    0.1661 0.2696 0.0825    0.1881 0.2031 0.0788
SVR        0.4473 0.5868 0.2363    0.2471 0.3420 0.1523    0.2643 0.3737 0.1549    0.1559 0.2514 0.0774    0.2601 0.2764 0.1410
XGBoost    0.4838 0.6417 0.2376    0.2681 0.3808 0.1554    0.2886 0.4193 0.1580    0.1715 0.2752 0.0837    0.1343 0.1346 0.1316
RF         0.4916 0.6485 0.2504    0.2742 0.3710 0.1775    0.2960 0.4119 0.1802    0.1791 0.2781 0.0967    0.1064 0.1161 0.0355
Ensemble Model
homo_rf    0.4299 0.5564 0.2450    0.2588 0.3432 0.1744    0.2736 0.3702 0.1769    0.1527 0.2381 0.0867    0.3166 0.3494 0.0766
homo_svr   0.4234 0.5492 0.2387    0.2464 0.3278 0.1651    0.2612 0.3547 0.1676    0.1445 0.2230 0.0813    0.3369 0.3660 0.1235
homo_xgb   0.4255 0.5496 0.2448    0.2543 0.3393 0.1694    0.2688 0.3656 0.1721    0.1477 0.2316 0.0831    0.3305 0.3650 0.0780
hete_rf    0.4343 0.5667 0.2367    0.2569 0.3381 0.1757    0.2725 0.3670 0.1779    0.1546 0.2377 0.0860    0.3026 0.3250 0.1384
hete_svr   0.4169 0.5417 0.2328    0.2408 0.3262 0.1553    0.2549 0.3521 0.1578    0.1440 0.2221 0.0786    0.3572 0.3833 0.1664
hete_xgb   0.4276 0.5586 0.2319    0.2526 0.3439 0.1614    0.2675 0.3713 0.1637    0.1494 0.2334 0.0813    0.3237 0.3443 0.1731
Percentage of improvement (%)
homogeneous    3.8109 3.5846 4.9584    3.7821 5.2603 0.7060     4.0542 5.4235 1.0293     8.1069 10.2943 6.9572     – – –
heterogeneous  8.5464 9.1647 4.6173    4.2287 6.3325 −0.3874    5.3811 7.6877 −0.0795    10.3723 13.3881 3.6899    – – –
Table A5. The annualized utility gain in the high volatility state with different γ values.

γ      homo_rf     homo_svr    homo_xgb    hete_rf     hete_svr    hete_xgb
0.1    127.7197    123.8112    130.1952    124.2137    125.9954    123.8034
0.2    106.8630    101.8535    105.7327    103.7007    105.0748    105.3883
0.3    88.2185     85.6709     87.1185     83.8264     86.4316     85.9962
0.4    70.2675     68.8990     70.2145     65.6354     68.6525     67.9815
0.5    52.7011     59.8317     54.7938     47.4687     49.0159     50.8217
0.6    38.3261     49.1146     45.4186     33.7303     30.2269     37.5260
0.7    29.1157     37.5849     34.7880     23.8302     20.6855     28.0542
0.8    19.5876     26.3511     24.5970     14.0808     11.0149     18.0043
0.9    10.2042     15.6767     14.6974     3.9732      1.5389      7.7412
1.0    0.8623      5.4678      4.9580      −6.3192     −7.4124     −1.6894
1.1    −8.2981     −4.2715     −4.6567     −15.5063    −16.6081    −10.8347
1.2    −17.4042    −13.3336    −14.0189    −24.3991    −25.6488    −19.2512
1.3    −26.3873    −22.1643    −23.2896    −33.1668    −34.0418    −27.5019
1.4    −35.2305    −30.8520    −32.3527    −41.2748    −42.2087    −35.9446
1.5    −43.9335    −39.4210    −41.2813    −48.5382    −49.5585    −44.2988
1.6    −52.3764    −47.8206    −49.8819    −55.5433    −56.5896    −52.5452
1.7    −59.5221    −56.0732    −57.9944    −62.5579    −63.4440    −60.4958
1.8    −66.6184    −63.7761    −65.4391    −62.2680    −67.9305    −68.0212
1.9    −67.0725    −71.0538    −70.1967    −61.9225    −72.4949    −71.0244
2.0    −66.7368    −78.2603    −74.9988    −61.8104    −77.0745    −67.9159
Table A6. The annualized utility gain in the low volatility state with different γ values.

γ     homo_rf   homo_svr  homo_xgb  hete_rf   hete_svr  hete_xgb
0.1   10.5833   11.0648   9.6583    21.7402   14.0093   6.2470
0.2   9.2922    8.0546    9.5888    19.7474   9.3324    6.7526
0.3   7.1480    5.2682    8.1578    16.5444   5.6646    6.6055
0.4   4.1519    3.3513    5.8763    12.7961   2.8451    5.2168
0.5   1.6063    2.2058    3.1016    9.3113    1.1212    3.6444
0.6   −0.2334   1.1163    0.3678    7.7594    −0.2119   2.3696
0.7   −0.2521   −0.1230   −1.8165   7.5543    −1.7435   1.4899
0.8   −0.2480   −1.4857   −2.2543   7.3756    −3.3473   0.5143
0.9   −0.2202   −2.8608   −2.3471   6.8545    −3.4884   −0.1160
1.0   −0.4539   −4.2558   −2.5065   6.2759    −3.6066   −0.6956
1.1   −0.8144   −5.6323   −2.5351   5.6874    −3.4685   −0.8434
1.2   −1.2150   −6.6567   −2.6040   5.0897    −3.4482   −0.9459
1.3   −1.6539   −6.5332   −2.9180   4.4465    −3.5440   −1.1259
1.4   −2.1931   −6.5012   −3.3178   3.7551    −3.7335   −1.3668
1.5   −2.6683   −6.5235   −3.7421   3.0814    −3.9281   −1.6564
1.6   −3.1688   −6.6074   −4.2389   2.4251    −4.1701   −1.9855
1.7   −3.7310   −6.7433   −4.7852   1.7832    −4.4465   −2.3472
1.8   −4.2997   −6.9287   −5.3504   1.1336    −4.7509   −2.7454
1.9   −4.8519   −7.1635   −5.8327   0.4947    −5.0776   −3.1775
2.0   −5.3919   −7.4461   −6.3234   −0.0607   −5.3202   −3.6242
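Tables A5 and A6 report annualized utility gains for a mean-variance investor with risk aversion γ. For intuition, the following is a minimal sketch of how such a gain can be computed, assuming the standard allocation rule w_t = r̂_t/(γσ̂²_t) and a certainty-equivalent-return (CER) comparison common in this literature (e.g., Campbell and Thompson [42]); the paper's exact weight bounds, variance estimator, and annualization convention may differ, and the function names are ours:

```python
import numpy as np

def certainty_equivalent_return(portfolio_returns, gamma, periods_per_year=252):
    # Annualized certainty-equivalent return (CER) of a mean-variance investor
    mu = np.mean(portfolio_returns) * periods_per_year
    var = np.var(portfolio_returns) * periods_per_year
    return mu - 0.5 * gamma * var

def utility_gain(r_actual, r_forecast, r_benchmark, var_forecast, gamma,
                 periods_per_year=252):
    """Sketch: allocate w_t = r_hat_t / (gamma * sigma2_hat_t) to the risky
    asset; the utility gain is the CER difference between the model-based
    and benchmark-based allocations over the same realized returns."""
    w_model = r_forecast / (gamma * var_forecast)
    w_bench = r_benchmark / (gamma * var_forecast)
    cer_model = certainty_equivalent_return(w_model * r_actual, gamma, periods_per_year)
    cer_bench = certainty_equivalent_return(w_bench * r_actual, gamma, periods_per_year)
    return cer_model - cer_bench
```

With this construction, a perfect forecast against a zero-return benchmark yields a strictly positive gain, while larger γ shrinks the risky weight, which is why the gains in Tables A5 and A6 decline (and eventually turn negative) as γ grows.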
Figure A2. Correlation heat map.

References

1. Weng, Q.; Xu, H. A review of China’s carbon trading market. Renew. Sustain. Energy Rev. 2018, 91, 613–619.
2. Qi, S.; Cheng, S.; Tan, X.; Feng, S.; Zhou, Q. Predicting China’s carbon price based on a multi-scale integrated model. Appl. Energy 2022, 324, 119784.
3. Lu, H.; Ma, X.; Huang, K.; Azimi, M. Carbon trading volume and price forecasting in China using multiple machine learning models. J. Clean. Prod. 2020, 249, 119386.
4. Fan, X.; Li, S.; Tian, L. Chaotic characteristic identification for carbon price and an multi-layer perceptron network prediction model. Expert Syst. Appl. 2015, 42, 3945–3952.
5. Zhu, B.; Wei, Y. Carbon price forecasting with a novel hybrid ARIMA and least squares support vector machines methodology. Omega 2013, 41, 517–524.
6. Segnon, M.; Lux, T.; Gupta, R. Modeling and forecasting the volatility of carbon dioxide emission allowance prices: A review and comparison of modern volatility models. Renew. Sustain. Energy Rev. 2017, 69, 692–704.
7. Byun, S.J.; Cho, H. Forecasting carbon futures volatility using GARCH models with energy volatilities. Energy Econ. 2013, 40, 207–221.
8. Chevallier, J.; Sévi, B. On the realized volatility of the ECX CO2 emissions 2008 futures contract: Distribution, dynamics and forecasting. Ann. Stat. 2009, 32, 407–499.
9. Zhang, X.; Wang, J.; Zhang, K. Short-term electric load forecasting based on singular spectrum analysis and support vector machine optimized by Cuckoo search algorithm. Electr. Power Syst. Res. 2017, 146, 270–285.
10. Jiang, L.; Wu, P. International carbon market price forecasting using an integration model based on SVR. In Proceedings of the 2015 International Conference on Engineering Management, Engineering Education and Information Technology, Guangzhou, China, 24–25 October 2015; Atlantis Press: Amsterdam, The Netherlands, 2015; pp. 303–308.
11. Atsalakis, G.S. Using computational intelligence to forecast carbon prices. Appl. Soft Comput. 2016, 43, 107–116.
12. Ji, L.; Zou, Y.; He, K.; Zhu, B. Carbon futures price forecasting based with ARIMA-CNN-LSTM model. Procedia Comput. Sci. 2019, 162, 33–38.
13. Zhu, B.; Ye, S.; Wang, P.; He, K.; Zhang, T.; Wei, Y.M. A novel multiscale nonlinear ensemble leaning paradigm for carbon price forecasting. Energy Econ. 2018, 70, 143–157.
14. Xiong, S.; Wang, C.; Fang, Z.; Ma, D. Multi-step-ahead carbon price forecasting based on variational mode decomposition and fast multi-output relevance vector regression optimized by the multi-objective whale optimization algorithm. Energies 2019, 12, 147.
15. Qin, Q.; He, H.; Li, L.; He, L.Y. A novel decomposition-ensemble based carbon price forecasting model integrated with local polynomial prediction. Comput. Econ. 2020, 55, 1249–1273.
16. Sun, W.; Duan, M. Analysis and forecasting of the carbon price in China’s regional carbon markets based on fast ensemble empirical mode decomposition, phase space reconstruction, and an improved extreme learning machine. Energies 2019, 12, 277.
17. Yang, Y.; Guo, H.; Jin, Y.; Song, A. An ensemble prediction system based on artificial neural networks and deep learning methods for deterministic and probabilistic carbon price forecasting. Front. Environ. Sci. 2021, 9, 740093.
18. Zhou, J.; Yu, X.; Yuan, X. Predicting the carbon price sequence in the Shenzhen emissions exchange using a multiscale ensemble forecasting model based on ensemble empirical mode decomposition. Energies 2018, 11, 1907.
19. Tan, X.; Sirichand, K.; Vivian, A.; Wang, X. Forecasting European carbon returns using dimension reduction techniques: Commodity versus financial fundamentals. Int. J. Forecast. 2022, 38, 944–969.
20. Adekoya, O.B. Predicting carbon allowance prices with energy prices: A new approach. J. Clean. Prod. 2021, 282, 124519.
21. Zhao, X.; Han, M.; Ding, L.; Kang, W. Usefulness of economic and energy data at different frequencies for carbon price forecasting in the EU ETS. Appl. Energy 2018, 216, 132–141.
22. French, K.R.; Schwert, G.W.; Stambaugh, R.F. Expected stock returns and volatility. J. Financ. Econ. 1987, 19, 3–29.
23. Nelson, D.B. Conditional heteroskedasticity in asset returns: A new approach. Model. Stock. Mark. Volatility 1991, 59, 347–370.
24. Benz, E.; Trück, S. Modeling the price dynamics of CO2 emission allowances. Energy Econ. 2009, 31, 4–15.
25. Dasarathy, B.V.; Sheela, B.V. A composite classifier system design: Concepts and methodology. Proc. IEEE 1979, 67, 708–713.
26. Schapire, R.E. The strength of weak learnability. Mach. Learn. 1990, 5, 197–227.
27. Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259.
28. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
29. Ding, W.; Wu, S. ABC-based stacking method for multilabel classification. Turk. J. Electr. Eng. Comput. Sci. 2019, 27, 4231–4245.
30. Bakurov, I.; Castelli, M.; Gau, O.; Fontanella, F.; Vanneschi, L. Genetic programming for stacked generalization. Swarm Evol. Comput. 2021, 65.
31. Agarwal, S.; Chowdary, C.R. A-Stacking and A-Bagging: Adaptive versions of ensemble learning algorithms for spoof fingerprint detection. Expert Syst. Appl. 2020, 146.
32. Varshini, P.A.G.; Kumari, A.K.; Varadarajan, V. Estimating software development efforts using a random forest-based stacked ensemble approach. Electronics 2021, 10, 1195.
33. Lacy, S.E.; Lones, M.A.; Smith, S.L. A comparison of evolved linear and non-linear ensemble vote aggregators. In Proceedings of the 2015 IEEE Congress on Evolutionary Computation (CEC), Sendai, Japan, 25–28 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 758–763.
34. Menahem, E.; Rokach, L.; Elovici, Y. Troika—An improved stacking schema for classification tasks. Inf. Sci. 2009, 179, 4097–4122.
35. Pari, R.; Sandhya, M.; Sankar, S. A multitier stacked ensemble algorithm for improving classification accuracy. Comput. Sci. Eng. 2020, 22, 74–85.
36. Adyapady R, R.; Annappa, B. An ensemble approach using a frequency-based and stacking classifiers for effective facial expression recognition. Multimed. Tools Appl. 2022, 82, 14689–14712.
37. Yoon, T.; Kang, D. Multi-model stacking ensemble for the diagnosis of cardiovascular diseases. J. Pers. Med. 2023, 13, 373.
38. Dumancas, G.; Adrianto, I. A stacked regression ensemble approach for the quantitative determination of biomass feedstock compositions using near infrared spectroscopy. Spectrochim. Acta Part Mol. Biomol. Spectrosc. 2022, 276, 121231.
39. Zhang, Z.; Ma, Y.; Hua, Y. Financial fraud identification based on stacking ensemble learning algorithm: Introducing MD&A text information. Comput. Intell. Neurosci. 2022, 2022, 1780834.
40. Yang, Y.; Liu, X. A robust semi-supervised learning approach via mixture of label information. Pattern Recognit. Lett. 2015, 68, 15–21.
41. Breiman, L. Stacked regressions. Mach. Learn. 1996, 24, 49–64.
42. Campbell, J.Y.; Thompson, S.B. Predicting excess stock returns out of sample: Can anything beat the historical average? Rev. Financ. Stud. 2007, 21, 1509–1531.
43. Zhu, B.; Han, D.; Wang, P.; Wu, Z.; Zhang, T.; Wei, Y.M. Forecasting carbon price using empirical mode decomposition and evolutionary least squares support vector regression. Appl. Energy 2017, 191, 521–530.
44. Zhang, L.; Zhang, J.; Xiong, T.; Su, C. Interval forecasting of carbon futures prices using a novel hybrid approach with exogenous variables. Discret. Dyn. Nat. Soc. 2017, 2017, 5730295.
45. Yahsi, M.; Canakoglu, E.; Agrali, S. Carbon price forecasting models based on big data analytics. Carbon Manag. 2019, 10, 175–187.
46. Wang, J.; Sun, X.; Cheng, Q.; Cui, Q. An innovative random forest-based nonlinear ensemble paradigm of improved feature extraction and deep learning for carbon price forecasting. Sci. Total Environ. 2021, 762, 143099.
47. Zhang, C.; Zhao, Y.; Zhao, H. A novel hybrid price prediction model for multimodal carbon emission trading market based on CEEMDAN algorithm and window-based XGBoost approach. Mathematics 2022, 10, 72.
48. Jaramillo-Moran, M.A.; Fernandez-Martinez, D.; Garcia-Garcia, A.; Carmona-Fernandez, D. Improving artificial intelligence forecasting models performance with data preprocessing: European Union Allowance prices case study. Energies 2021, 14, 845.
49. Kim, H.H.; Swanson, N.R. Forecasting financial and macroeconomic variables using data reduction methods: New empirical evidence. J. Econom. 2014, 178, 352–367.
50. Cheng, X.; Hansen, B.E. Forecasting with factor-augmented regression: A frequentist model averaging approach. J. Econom. 2015, 186, 280–293.
51. Mallows, C.L. Some comments on Cp. Technometrics 2000, 42, 87–94.
52. Hansen, B.E. Least squares model averaging. Econometrica 2007, 75, 1175–1189.
53. Hamilton, J.D.; Susmel, R. Autoregressive conditional heteroskedasticity and changes in regime. J. Econom. 1994, 64, 307–333.
54. Liu, M.; Lee, C.C. Capturing the dynamics of the China crude oil futures: Markov switching, co-movement, and volatility forecasting. Energy Econ. 2021, 103, 105622.
55. Wang, P.; Zong, L.; Ma, Y. An integrated early warning system for stock market turbulence. Expert Syst. Appl. 2020, 153, 113463.
56. Shi, Y.; Ho, K.Y.; Liu, W.M. Public information arrival and stock return volatility: Evidence from news sentiment and Markov regime-switching approach. Int. Rev. Econ. Financ. 2016, 42, 291–312.
57. Ardia, D.; Bluteau, K.; Boudt, K.; Catania, L.; Trottier, D.A. Markov-switching GARCH models in R: The MSGARCH package. J. Stat. Softw. 2019, 91, 1–38.
58. Hansen, P.R.; Lunde, A.; Nason, J.M. The model confidence set. Econometrica 2011, 79, 453–497.
59. Rapach, D.E.; Strauss, J.K.; Zhou, G. Out-of-sample equity premium prediction: Combination forecasts and links to the real economy. Rev. Financ. Stud. 2009, 23, 821–862.
60. Zhao, A.B.; Cheng, T. Stock return prediction: Stacking a variety of models. J. Empir. Financ. 2022, 67, 288–317.
61. Liu, J.; Ma, F.; Tang, Y.; Zhang, Y. Geopolitical risk and oil volatility: A new insight. Energy Econ. 2019, 84, 104548.
62. Zhang, F.; Xia, Y. Carbon price prediction models based on online news information analytics. Financ. Res. Lett. 2022, 46, 102809.
63. Yun, P.; Zhang, C.; Wu, Y.; Yang, Y. Forecasting carbon dioxide price using a time-varying high-order moment hybrid model of NAGARCHSK and gated recurrent unit network. Int. J. Environ. Res. Public Health 2022, 19, 899.
64. Rossi, B.; Inoue, A. Out-of-sample forecast tests robust to the choice of window size. J. Bus. Econ. Stat. 2012, 30, 432–453.
65. Drucker, H.; Burges, C.J.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. Adv. Neural Inf. Process. Syst. 1996, 9, 155–161.
66. Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1999.
67. Wang, X.; Wang, Y. A hybrid model of EMD and PSO-SVR for short-term load forecasting in residential quarters. Math. Probl. Eng. 2016, 2016, 9895639.
68. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; IEEE: Piscataway, NJ, USA, 1995; Volume 1, pp. 278–282.
69. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
70. Hoerl, A.; Kennard, R. Ridge regression—Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
71. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B-Methodol. 1996, 58, 267–288.
72. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B-Stat. Methodol. 2005, 67, 301–320.
Figure 1. Workflow of common stacking.
Figure 2. Topological structure of the common stacking with 5-fold cross-validation.
Figure 3. Workflow of the improved stacking.
Figure 4. Homogeneous ensemble framework.
Figure 5. Heterogeneous ensemble framework.
Figure 6. Radar chart of accuracy performance on the whole test data.
Figure 7. Division of high and low volatility states.
Figure 8. Predicted carbon return curve on the out-of-sample test set.
Figure 9. Impact of risk aversion parameter γ on the annualized utility gain over the whole test set period.
Figure 10. Impact of risk aversion parameter γ on the annualized utility gain in the periods with different volatility states.
Table 1. Statistical evaluation indexes.

RMSE (root mean square error): $\sqrt{\frac{1}{n}\sum_{t=1}^{n}(\hat{y}_t - y_t)^2}$
SMAPE (symmetric mean absolute percentage error): $\frac{1}{n}\sum_{t=1}^{n}\frac{|\hat{y}_t - y_t|}{(|\hat{y}_t| + |y_t|)/2}$
MAE (mean absolute error): $\frac{1}{n}\sum_{t=1}^{n}|\hat{y}_t - y_t|$
U1 (Theil U statistic 1): $\sqrt{\frac{1}{n}\sum_{t=1}^{n}(\hat{y}_t - y_t)^2} \Big/ \left(\sqrt{\frac{1}{n}\sum_{t=1}^{n}y_t^2} + \sqrt{\frac{1}{n}\sum_{t=1}^{n}\hat{y}_t^2}\right)$
R²os (out-of-sample R² statistic): $1 - \sum_{t=1}^{n}(y_t - \hat{y}_{t|t-1})^2 \Big/ \sum_{t=1}^{n}(y_t - \bar{y}_{t|t-1})^2$
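The five indexes above can be computed directly from a forecast series; a minimal NumPy sketch (the function names are ours; `y_bench` in `r2_os` is the benchmark forecast, e.g., the historical average return):

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean square error
    return np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

def smape(y_true, y_pred):
    # Symmetric mean absolute percentage error
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_pred - y_true) / ((np.abs(y_pred) + np.abs(y_true)) / 2))

def mae(y_true, y_pred):
    # Mean absolute error
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))

def theil_u1(y_true, y_pred):
    # Theil U statistic 1: RMSE scaled by the root mean squares of both series
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    num = np.sqrt(np.mean((y_pred - y_true) ** 2))
    den = np.sqrt(np.mean(y_true ** 2)) + np.sqrt(np.mean(y_pred ** 2))
    return num / den

def r2_os(y_true, y_pred, y_bench):
    # Out-of-sample R^2: improvement of the forecast over a benchmark forecast
    y_true, y_pred, y_bench = map(np.asarray, (y_true, y_pred, y_bench))
    return 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_bench) ** 2)
```

Lower is better for the first four indexes, while a positive R²os indicates the model beats the benchmark forecast.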
Table 2. Comparison of model accuracy for the whole test data.

          RMSE    SMAPE   MAE     U1      R²os
Homogeneous ensemble
Base Model
FAR       0.4701  0.2774  0.2959  0.1738  0.1828
FAR+MMS   0.4513  0.2706  0.2872  0.1671  0.2469
Ensemble Model
homo_rf   0.4317  0.2544  0.2698  0.1530  0.3107
homo_svr  0.4229  0.2458  0.2605  0.1475  0.3386
homo_xgb  0.4270  0.2510  0.2660  0.1512  0.3257
Heterogeneous ensemble
Base Model
MMA       0.4616  0.2733  0.2910  0.1712  0.2120
E-net     0.4832  0.2616  0.2827  0.1679  0.1364
lasso     0.4868  0.2683  0.2895  0.1662  0.1238
ridge     0.4825  0.2647  0.2855  0.1677  0.1390
SVR       0.4668  0.2542  0.2734  0.1602  0.1943
XGBoost   0.5022  0.2863  0.3087  0.1713  0.0671
RF        0.4992  0.2708  0.2936  0.1629  0.0783
Ensemble Model
hete_rf   0.4387  0.2541  0.2702  0.1544  0.2884
hete_svr  0.4340  0.2445  0.2602  0.1535  0.3034
hete_xgb  0.4393  0.2446  0.2611  0.1536  0.2862
Note: Explanation of ensemble model name abbreviations: ‘homo’ stands for homogeneous, and the suffix attached to ‘homo’ gives the meta-model used in the stacking algorithm; for example, ‘homo_rf’ denotes the ensemble model using the homogeneous ensemble framework with random forest as the meta-model. ‘hete’ stands for heterogeneous and follows the same abbreviation rules. FAR+MMS denotes the best factor-augmented regression selected by the Mallows model selection criterion.
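The ensemble rows above are produced by the improved stacking scheme, which replaces cross-validation with walk-forward validation so that the base-model predictions feeding the meta-model remain strictly out-of-sample and in temporal order. The following is a simplified sketch of that idea; the OLS stand-ins for the base and meta learners, the window size, and the feature subsets are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def fit_ols(X, y):
    # Ordinary least squares with intercept; returns the coefficient vector
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def walk_forward_stacking(X, y, feature_subsets, train_size):
    """Sketch of stacking with walk-forward validation: at each step t, each
    base learner (here an OLS on a feature subset) is refit on the preceding
    `train_size` observations and predicts observation t; the meta-model
    (another OLS) is then fit on these strictly out-of-sample base
    predictions, preserving temporal order (no cross-validation shuffle)."""
    n = len(y)
    meta_X, meta_y = [], []
    for t in range(train_size, n):
        X_tr, y_tr = X[t - train_size:t], y[t - train_size:t]
        row = []
        for cols in feature_subsets:
            beta = fit_ols(X_tr[:, cols], y_tr)
            row.append(predict_ols(beta, X[t:t + 1, cols])[0])
        meta_X.append(row)
        meta_y.append(y[t])
    meta_X, meta_y = np.array(meta_X), np.array(meta_y)
    meta_beta = fit_ols(meta_X, meta_y)  # meta-model aggregates base forecasts
    return meta_beta, meta_X, meta_y
```

In the paper's frameworks the base learners are the forecasting models listed in the tables and the meta-model is SVR, random forest, or XGBoost; the sliding windows roll forward one observation at a time in the same fashion.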
Table 3. Comparison of model accuracy for the period with different volatility using the homogeneous ensemble.

          RMSE            SMAPE           MAE             U1              R²os
          High    Low     High    Low     High    Low     High    Low     High    Low
Base Model
FAR       0.6041  0.2775  0.3691  0.1858  0.4020  0.1897  0.2757  0.0945  0.2330  −0.1846
FAR+MMS   0.5864  0.2518  0.3657  0.1755  0.3960  0.1784  0.2732  0.0862  0.2773  0.0251
Ensemble Model
homo_rf   0.5617  0.2392  0.3408  0.1680  0.3692  0.1704  0.2451  0.0797  0.3368  0.1199
homo_svr  0.5482  0.2390  0.3285  0.1630  0.3554  0.1656  0.2311  0.0790  0.3683  0.1213
homo_xgb  0.5542  0.2399  0.3373  0.1648  0.3646  0.1673  0.2416  0.0799  0.3545  0.1148
Percentage of improvement (%)
          8.1735  13.7447  9.0900  11.0440  9.6952  11.5317  13.2126  15.8577  -  -
Table 4. Comparison of model accuracy for the period with different volatility using the heterogeneous ensemble.

          RMSE            SMAPE           MAE             U1              R²os
          High    Low     High    Low     High    Low     High    Low     High    Low
Base Model
MMA       0.6056  0.2437  0.3753  0.1713  0.4081  0.1740  0.2815  0.0855  0.2292  0.0863
E-net     0.6376  0.2460  0.3654  0.1579  0.4045  0.1609  0.2750  0.0807  0.1456  0.0688
lasso     0.6384  0.2575  0.3683  0.1683  0.4075  0.1715  0.2685  0.0857  0.1434  −0.0197
ridge     0.6345  0.2511  0.3642  0.1652  0.4026  0.1683  0.2737  0.0845  0.1539  0.0298
SVR       0.6154  0.2388  0.3530  0.1555  0.3887  0.1581  0.2592  0.0758  0.2040  0.1231
XGBoost   0.6573  0.2692  0.3900  0.1827  0.4313  0.1860  0.2724  0.0910  0.0919  −0.1143
RF        0.6643  0.2390  0.3775  0.1642  0.4207  0.1666  0.2605  0.0800  0.0724  0.1214
Ensemble Model
hete_rf   0.5769  0.2283  0.3458  0.1625  0.3759  0.1646  0.2470  0.0772  0.3006  0.1986
hete_svr  0.5686  0.2311  0.3362  0.1527  0.3652  0.1551  0.2418  0.0776  0.3205  0.1784
hete_xgb  0.5765  0.2318  0.3323  0.1569  0.3628  0.1593  0.2430  0.0786  0.3016  0.1734
Percentage of improvement (%)
          9.7761  7.5941  8.7461  5.4426  10.0423  5.7167  9.7016  6.6027  -  -
Table 5. Comparison of accuracy between integrated models and deep learning models.

           RMSE                     SMAPE                    MAE                      U1                       R²os
           Whole   High    Low      Whole   High    Low      Whole   High    Low      Whole   High    Low      Whole   High    Low
Ensemble Model
homo_rf    0.4317  0.5617  0.2392   0.2544  0.3408  0.1680   0.2698  0.3692  0.1704   0.1530  0.2451  0.0797   0.3107  0.3368  0.1199
homo_svr   0.4229  0.5482  0.2390   0.2458  0.3285  0.1630   0.2605  0.3554  0.1656   0.1475  0.2311  0.0790   0.3386  0.3683  0.1213
homo_xgb   0.4270  0.5542  0.2399   0.2510  0.3373  0.1648   0.2660  0.3646  0.1673   0.1512  0.2416  0.0799   0.3257  0.3545  0.1148
hete_rf    0.4387  0.5769  0.2283   0.2541  0.3458  0.1625   0.2702  0.3759  0.1646   0.1544  0.2470  0.0772   0.2884  0.3006  0.1986
hete_svr   0.4340  0.5686  0.2311   0.2445  0.3362  0.1527   0.2602  0.3652  0.1551   0.1535  0.2418  0.0776   0.3034  0.3205  0.1784
hete_xgb   0.4393  0.5765  0.2318   0.2446  0.3323  0.1569   0.2611  0.3628  0.1593   0.1536  0.2430  0.0786   0.2862  0.3016  0.1734
Deep Learning Model
LSTM       0.4415  0.5280  0.3334   0.2943  0.3373  0.2513   0.3086  0.3607  0.2566   0.1972  0.2155  0.1861   0.2790  0.4141  −0.7100
GRU        0.4696  0.5471  0.3763   0.3401  0.3765  0.3038   0.3552  0.4000  0.3104   0.2059  0.2218  0.1964   0.1846  0.3709  −1.1785
EMD+LSTM   0.4560  0.5358  0.3588   0.3123  0.3436  0.2809   0.3270  0.3672  0.2868   0.1631  0.1879  0.1467   0.2311  0.3966  −0.9798
Table 6. The p-value of MCS.

           MSE              MAE              Huber Loss
           T_R     T_max    T_R     T_max    T_R     T_max
Homogeneous ensemble
FAR        0.1075  0.0969   0.0416  0.0801   0.0862  0.0541
FAR+MMS    0.2132  0.2082   0.0752  0.0801   0.2302  0.1569
homo_rf    0.6814  0.6707   0.2739  0.163    0.8326  0.8531
homo_xgb   0.6814  0.6707   0.3961  0.3961   0.8326  0.8531
homo_svr   1.0000  1.0000   1.0000  1.0000   1.0000  1.0000
Heterogeneous ensemble
MMA        0.3683  0.3922   0.0876  0.3015   0.2629  0.3739
E-net      0.3683  0.3922   0.2946  0.3015   0.2629  0.3739
lasso      0.3683  0.3776   0.1881  0.3015   0.2509  0.2619
ridge      0.3683  0.3776   0.2921  0.3015   0.2629  0.3158
SVR        0.3244  0.3922   0.3658  0.3676   0.2509  0.3739
XGBoost    0.0778  0.0536   0.0104  0.0184   0.0443  0.0316
RF         0.3683  0.3550   0.0926  0.3015   0.1578  0.2619
hete_rf    0.8630  0.7634   0.2921  0.3676   0.8959  0.7860
hete_xgb   0.8630  0.6902   0.8846  0.8846   0.8959  0.7514
hete_svr   1.0000  1.0000   1.0000  1.0000   1.0000  1.0000
Note: The value in bold indicates that the corresponding model has the best prediction accuracy.
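Table 6 evaluates the models under three loss functions (MSE, MAE, and Huber loss). Below is a small sketch of the per-period loss series that the MCS procedure of Hansen et al. [58] compares across models; the MCS test itself (bootstrap equivalence testing with iterative elimination, available e.g. in the Python `arch` package) is not reproduced here, and `loss_series` is our name:

```python
import numpy as np

def loss_series(y_true, y_pred, kind="mse", delta=1.0):
    """Per-period forecast losses that the MCS procedure compares across models."""
    e = np.asarray(y_true) - np.asarray(y_pred)
    if kind == "mse":
        return e ** 2
    if kind == "mae":
        return np.abs(e)
    if kind == "huber":
        # Huber loss: quadratic for small errors, linear beyond the threshold delta
        return np.where(np.abs(e) <= delta,
                        0.5 * e ** 2,
                        delta * (np.abs(e) - 0.5 * delta))
    raise ValueError(f"unknown loss kind: {kind}")
```

The Huber loss dampens the influence of the occasional large forecast errors typical of high-volatility days, which is why it is reported alongside MSE and MAE.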
Table 7. Annualized utility gains and Sharpe ratio at γ = 0.3.

           UG                             SR
           Whole     High      Low       Whole   High    Low
Base Model
FAR        40.0720   77.1260   5.1235    0.2445  0.3445  0.0876
FAR+MMS    45.7422   88.4737   5.6559    0.2650  0.3759  0.0899
MMA        42.4660   79.2613   7.9127    0.2526  0.3485  0.1066
E-net      42.3854   80.5625   6.3027    0.2551  0.3581  0.0947
lasso      41.8589   80.7737   4.9107    0.2572  0.3668  0.0837
ridge      40.2578   81.0734   1.8021    0.2466  0.3582  0.0615
svr        41.7817   71.9543   13.1367   0.2549  0.3344  0.1537
XGBoost    9.6912    19.3971   0.3235    0.1320  0.1810  0.0520
rf         27.1812   41.1122   13.6153   0.2007  0.2448  0.1538
Ensemble Model
homo_rf    46.4293   88.2185   7.1480    0.2689  0.3774  0.1010
homo_svr   44.2339   85.6709   5.2682    0.2608  0.3702  0.0867
homo_xgb   46.4102   87.1185   8.1578    0.2681  0.3730  0.1086
hete_rf    49.2886   83.8264   16.5444   0.2810  0.3678  0.1690
hete_svr   44.8457   86.4316   5.6646    0.2638  0.3742  0.0899
hete_xgb   45.1008   85.9962   6.6055    0.2649  0.3725  0.0968
buy&hold   -         -         -         0.0119  0.0050  0.0187
Note: The values in bold represent the best model for the corresponding metric.
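The Sharpe ratios in Table 7 can be computed from the realized portfolio return series; a minimal sketch (the annualization with 252 trading days and the zero risk-free default are assumptions, and the paper may use a different convention):

```python
import numpy as np

def annualized_sharpe(returns, risk_free=0.0, periods_per_year=252):
    # Annualized Sharpe ratio: mean excess return over its sample standard deviation
    excess = np.asarray(returns) - risk_free
    return np.sqrt(periods_per_year) * np.mean(excess) / np.std(excess, ddof=1)
```

A buy-and-hold position serves as the natural baseline: any forecast-driven strategy whose Sharpe ratio exceeds the buy&hold row adds risk-adjusted value.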
Table 8. Variance in real observation and predicted value by ensemble models.

        homo_rf  homo_svr  homo_xgb  hete_rf  hete_svr  hete_xgb  Observation
Whole   0.0715   0.0656    0.0775    0.0547   0.0884    0.0624    0.2715
High    0.1166   0.1041    0.1264    0.0970   0.1508    0.1048    0.4800
Low     0.0258   0.0256    0.0278    0.0108   0.0246    0.0171    0.0655
Table 9. Annualized utility gains using another baseline.

           UG
           Whole      High       Low
Base Model
FAR        144.9971   263.4732   25.9002
FAR+MMS    150.6673   268.8209   26.4326
MMA        147.3911   265.6085   28.6894
E-net      147.3105   266.9096   27.0794
lasso      146.7839   267.1209   25.6873
ridge      145.1829   267.4206   22.5788
svr        146.7068   258.3015   33.9134
XGBoost    114.6163   205.7443   21.1002
rf         132.1063   227.4593   34.3919
Ensemble Model
homo_rf    151.3544   274.5656   27.9247
homo_svr   149.1590   272.0181   26.0448
homo_xgb   151.3353   273.4657   28.9345
hete_rf    154.2137   270.1735   37.3211
hete_svr   149.7708   272.7787   26.4413
hete_xgb   150.0259   272.3434   27.3822
Share and Cite

Ye, P.; Li, Y.; Siddik, A.B. Forecasting the Return of Carbon Price in the Chinese Market Based on an Improved Stacking Ensemble Algorithm. Energies 2023, 16, 4520. https://doi.org/10.3390/en16114520