**1. Introduction**

Dramatic increase in greenhouse gas emissions directly leads to the aggravation of negative environmental externality. The emission of pollutants such as carbon dioxide is a serious threat to human health, and it is unacceptable that the pollution of this harmful gas will continue in the foreseeable future. According to the report of the Word Bank, the global carbon emissions in 2018 were 2.1 times the 1960 level, the per capita emissions increased by nearly 67%, and the per capita energy consumption increased by nearly 60% (data from author's calculation based on the Wind database). Economic growth, CO<sup>2</sup> emissions and energy consumption are complementary [1,2]. Based on the scientific report released by the National Oceanic and Atmospheric Administration (NOAA) of the United States in 2020, the global CO<sup>2</sup> concentration was 280 ppm at the beginning of the first industrial revolution, that is, the CO<sup>2</sup> quality accounted for 2.8% of the global atmospheric quality. In 2017, this data increased to 400 ppm, and in 2019, it increased to 415 ppm. The continuous growth of carbon dioxide emissions not only leads to the rise of global temperature, triggers sea-level rise, aggravates glacier melting and other severe environmental problems, but also threatens human health, and ultimately affects the sustainability of global economy and human civilization. Therefore, it has become an urgent task for human society to effectively curb the global climate problems and reduce greenhouse gas emissions.

**Citation:** Yun, P.; Zhang, C.; Wu, Y.; Yang, Y. Forecasting Carbon Dioxide Price Using a Time-Varying High-Order Moment Hybrid Model of NAGARCHSK and Gated Recurrent Unit Network. *Int. J. Environ. Res. Public Health* **2022**, *19*, 899. https://doi.org/10.3390/ijerph 19020899

Academic Editors: Taoyuan Wei and Qin Zhu

Received: 14 December 2021 Accepted: 12 January 2022 Published: 14 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

The establishment of carbon market is a market-oriented means for the international community to solve climate problems and reduce pollution emissions. Based on the "Kyoto Protocol" signed in 2005 and the "Paris Agreement" passed in 2015, the carbon allowance assets traded in the carbon market have commodity and financial attributes, and there exists three exclusive characteristics that cannot to be ignored compared with other capital markets. The first is the asymmetric distribution of market returns, the tail distribution has the characteristics of left deviation, and the skewness is negative [3–5]. The second is the high sensitivity to policy events or external events [6]. For example, the policy implementation of banning interterm storage of carbon quotas led to a serious decline in European carbon price at the end of 2007; the fall in carbon price caused by the expiration of the second phase of emission reduction in Europe at the end of 2012; the outbreak of COVID-19 virus led to global economic downturn and triggered a sharp drop in carbon price. The third is the time-varying characteristics of carbon price volatility [7–9]. Therefore, the research into carbon price prediction and pricing models need to reflect the above three indispensable characteristics. The price mechanism is the core of the carbon market to promote emission reduction of the whole society. Consequently, studying the pricing mechanism of the carbon market in this paper can better serve the emission reduction practice of entity enterprises and create a healthier social environment.

The structure of this paper is as follows: the second part is the literature review; the third section analyzes the econometric model; the fourth section is the empirical analysis and discussion; the last part summarizes the conclusion and the prospects.

#### **2. Literature Review**

Existing research methods on carbon price forecasting mainly focus on two aspects: one is the volatility modeling technology and the other is artificial intelligence-integrated technology.

#### *2.1. Volatility Modeling Technology*

As for volatility modeling technology, Byun and Cho [10] pointed out that the GARCH family model could better fit the carbon future returns than other volatility models. Conducting the asymmetric threshold GARCH model, Chevallier [11] concluded that the stock and bond market variables could effectively explain the asymmetric volatility of carbon future returns. Based on the autoregressive, comprehensive, moving average model, Dhamija et al. [12] found that the asymmetric ARIMA-GARCH model can fit the conditional return and variance of European carbon price. Using the multi-GARCH model, Oberndorfer [13] stated that the EUA (European Union Allowance, EUA) price was positively correlated with the electricity stock return, and the stock market return did not cause EUA market volatility. Based on the ARCH regression model, the crude oil, natural gas and coal returns have a significant effect on carbon price [14,15]. Testing the EGARCH model, Chevallier [16] maintained that the abnormal events, policy factors, compliance events and uncertainty after the Kyoto Protocol are evidence of the instability of carbon price. The time-varying GARCH model with generalized nonlinear parameters can effectively fit the carbon price for prediction [16]. Designing bilaterally modified variables, Ren et al. [17] point out that the AR-GARCH model can reveal the impacts of regulatory update events on the Chinese carbon market. Employing the dynamic nonlinear (DMA) model, Koop et al. [18] found that the pricing precision of the DMA model is superior to the time-varying parameter regression model (TVP). The European carbon price is characterized by heterogeneous volatility, the prediction performance of the GARCH model based on Markov regime switching is better than other GARCH models [19].

#### *2.2. Artificial Intelligence-Integrated Technology*

The volatility modeling technology represented by the GARCH family model usually requires the carbon price being subject to strict parameter assumptions and tail distribution, which means the application of the model has great limitations [18]. Artificial

intelligence-integrated technology with the advantages of mapping nonlinear relations and without considering the tail distribution has been widely used in carbon price forecasting research. The BP neural network model with high-frequency data has a more accurate prediction performance on the CER (Certified Emission Reduction, CER) price than the GARCH family model [20]. Tiwari et al. [21] found that the time-varying Markov switching copula model can provide evidence of a time-varying tail-dependence structure, and AI (artificial intelligence) is an effective means to capture carbon price. The finite distributed lag (FDL) model based on a genetic algorithm (GA) has better performance on predicting carbon price than other GARCH models [22]. Based on the idea of ensemble learning, the EMD model (Empirical Mode Decomposition, EMD) is used to extract the intrinsic mode function (IMF) that represents the different coexisting oscillation modes of carbon series [23–25], and then a hybrid carbon price forecasting model integrating the variational mode decomposition (VMD) and optimal combination forecasting model (CFM) is constructed, the results suggesting the superiority of the proposed hybrid model for carbon price forecasting [26,27]. Conducting the EMD method, Wang et al. [28] proposed a new random forest-based nonlinear ensemble paradigm for carbon price prediction and proved the model's superiority in European carbon price forecasting. Furthermore, the hybrid carbon price forecasting model, for example the multiobjective grasshopper optimization algorithm model proposed by Hao et al. [29] and the wavelet least-square support vector machine (WLSSVM) model carried by Sun et al. [30] have been proven to have strong superiority and accuracy in carbon price prediction. Different from the prediction of EMDtype models, the LSTM (Long and Short-Term Memory network) model does not need to decompose the data frequency, and shows advantages in capturing the long-term lag return characteristics based on the special gate structure of forget gate, input gate and output gate [31,32]. The application of the LSTM model in predicting stock market index has stronger accuracy and robustness than other exponential smooth models and the ARIMA model [33,34]. Employing the models of ARIMA, CNN, GARCH and LSTM to extract the linear characteristics, hierarchical data structure, long memory characteristics and volatility characteristics of carbon return, respectively, the conclusion suggests that the hybrid model of ARIMA–CNN–LSTM and GARCH-LSTM contribute a lower prediction error [35,36]. Based on similar modeling ideas, the integrated models of EMD–LSTM and that composed of total average EMD with LSTM (MEEMD–LSTM) have also proven to have significant superiority in carbon price prediction [37,38].

The above literature provides valuable references for this paper. However, the most obvious defect is that the existing literature ignores the time-varying impact of market asymmetric information and extreme shock factors on carbon premium from the perspective of high-order moment (skewness and kurtosis). More importantly, the time-varying highorder moment characteristics have been ignored. In fact, studies have proven that the financial assets of low-order moment information cannot fully reflect the actual financial return distribution [39]. The innovation and contribution of this study is to construct a new hybrid carbon pricing model, NAGARCHSK-GRU, that reveals the time-varying high-moment volatility characteristics of carbon price. The proposed NAGARCHSK-GRU price-forecasting model combines the advantages of the NAGARCHSK model in parameter estimation of the time-varying, high-order moment characteristics and the superiority of GRU (Gated Recurrent Unit, GRU) network in nonlinear fitting and forecasting. The purpose for integrating the models of NAGARCHSK and GRU network is to improve the robustness and generalization ability of the proposed pricing model, and then to provide certain technical support for market participants to capture price information and predict carbon price.

#### **3. Econometric Modeling**

Based on the classical GARCH models, this paper first constructs constant and timevarying, high-order moment carbon price volatility methods to estimate the parameters of the proposed pricing model. Secondly, the multilayer GRU network model is designed to

realize the nonlinear prediction based on the time-varying, high-order moment parameters estimated by the NAGARCHSK model.

#### *3.1. High-Order Moment Volatility Model*

#### 3.1.1. Constant High-Order Moment Model

The constant high-order moment model assumes that the third-order moment skewness and the fourth-order moment kurtosis have no impact on the first-order moment return of carbon price, but assumes that they are constant. The common constant high-moment model is the GARCHSK (*q*1,*p*1;0,0;0,0) model with the constant high-order moment term. During modeling of the carbon price, we use the AR (R) model to describe the autocorrelation process of carbon price series, assuming the return series follows a first-order lag AR ®process:

$$\mathcal{R}\_t = \rho \mathcal{R}\_{t-1} + h\_t^{1/2} \mathfrak{F}\_t \tag{1}$$

where, *h* 1/2 *t* is the conditional variance of carbon return; *ξ<sup>t</sup>* means the conditional return item; *ρ* indicates the autocorrelation coefficient, and *ξ<sup>t</sup>* ∼ *N*(0, 1).

For modeling the conditional variance *h<sup>t</sup>* process that with the characteristics of volatility clustering and asymmetric distribution, this paper uses the GARCH model and its derivative models to estimate the parameters of the proposed carbon pricing model. As the first-order GARCH model can simulate the financial return volatility, we introduce the following common forms of conditional variance based on the GARCH (1,1) model.

Conditional variance of GARCH (1,1) is:

$$h\_t = \beta\_0 + \beta\_1 \varepsilon\_{t-1}^2 + \beta\_2 h\_{t-1} \tag{2}$$

Conditional variance of TGARCH (1,1) is:

$$h\_t^{1/2} = \beta\_0 + \beta\_1|\varepsilon\_{t-1}| + \beta\_2 h\_{t-1} + \beta\_3 v\_{t-1}|\varepsilon\_{t-1}|\tag{3}$$

Conditional variance of NAGARCH (1,1) is:

$$h\_t = \beta\_0 + \beta\_1 \left(\varepsilon\_{t-1} + \beta\_3 h\_{t-1}^{1/2}\right)^2 + \beta\_2 h\_{t-1} \tag{4}$$

where, *β*<sup>0</sup> represents the constant term of the variance equation, skewness equation and kurtosis equation; *β*<sup>1</sup> and *β*<sup>2</sup> denote the ARCH and GARCH term coefficients of the highorder moment equation, respectively; *ε* represents the residual term; *β*<sup>3</sup> means the leverage coefficient, reflecting the impact of asymmetric information on the carbon returns; *υt*−<sup>1</sup> is a dummy variable that controls the impact direction of asymmetric information, when *εt*−<sup>1</sup> < 0, *υt*−<sup>1</sup> = 1;*εt*−<sup>1</sup> > 0, *υt*−<sup>1</sup> = 0.

#### 3.1.2. Time-Varying High-Order Moment Model

The constant high-order moment model regards the third-order moment skewness and fourth-order moment kurtosis as fixed constants and ignores the financial asset distribution characterization of leptokurtosis and fat-tail caused by the market asymmetric information and extreme factors, which make it difficult to meet the real asset volatility-modeling requirements. Therefore, this paper considers the third-order skewness and fourth-order moment kurtosis attributes with the exclusive features of time-varying volatility, so as to describe the shock of market asymmetric information and policy factors on carbon price. The specific form of the GARCHSK (*q*1,*p*1;*q*2,*p*2;*q*3,*p*3) model, considering the volatility of

time-varying conditional variance, conditional skewness and conditional kurtosis, is as follows:

$$\begin{cases} \begin{aligned} \mathcal{R}\_{t} &= \rho E\_{t-1}(\mathcal{R}\_{t}) + \varepsilon\_{t} = \mu\_{t} + h\_{t}^{1/2} \mathfrak{F}\_{t}; \mathfrak{f}\_{t} | I\_{t-1} \sim F\_{n}(0, 1, s\_{t}, k\_{t}) \\\ h\_{t} &= \beta\_{0} + \sum\_{i=1}^{q\_{1}} \beta\_{1,i} \mathfrak{z}\_{t-i}^{2} + \sum\_{j=1}^{p\_{1}} \beta\_{2,j} h\_{t-j} \\\ s\_{t} &= \gamma\_{0} + \sum\_{i=1}^{q\_{2}} \gamma\_{1,i} \mathfrak{z}\_{t-i}^{3} + \sum\_{j=1}^{p\_{2}} \gamma\_{2,j} s\_{t-j} \\\ k\_{t} &= \delta\_{0} + + \sum\_{i=1}^{q\_{3}} \delta\_{1,i} \mathfrak{z}\_{t-i}^{4} + \sum\_{j=1}^{p\_{3}} \delta\_{2,j} k\_{t-j} \end{aligned} \tag{5}$$

The specific form of the NAGARCHSK (*q*1,*p*1;*q*2,*p*2;*q*3,*p*3) model with leverage effect that considers the volatility of the time-varying, high-order moment, is as follows:

$$\begin{cases} \begin{aligned} \mathcal{R}\_{t} &= \rho \mathcal{E}\_{t-1}(\mathcal{R}\_{l}) + \varepsilon\_{t} = \mu\_{t} + h\_{l}^{1/2} \mathcal{Z}\_{t}; \mathcal{Z}\_{t}|\_{l-1} \sim \mathcal{F}\_{\text{n}}(0, 1, s\_{t}, k\_{l}) \\\ h\_{t} &= \beta\_{0} + \sum\_{i=1}^{q\_{1}} \beta\_{1,i} (\varepsilon\_{t-1} + \beta\_{3,i} h\_{t-i}^{1/2})^{2} + \sum\_{j=1}^{p\_{1}} \beta\_{2,j} h\_{t-j} \\\ s\_{t} &= \gamma\_{0} + \sum\_{i=1}^{q\_{2}} \gamma\_{1,i} \mathcal{Z}\_{t-i}^{3} + \sum\_{j=1}^{p\_{2}} \gamma\_{2,j} s\_{t-j} \\\ k\_{l} &= \delta\_{0} + \sum\_{i=1}^{q\_{3}} \delta\_{1,i} \mathcal{Z}\_{t-i}^{4} + \sum\_{j=1}^{p\_{3}} \delta\_{2,j} k\_{t-j} \end{aligned} \tag{6}$$

where, *It*−<sup>1</sup> represents the information set when the carbon return volatility reaches the time of *t* − 1; *Et*−1(*Rt*) is the corresponding conditional expected return that can be obtained steadily without risk impact under the certain *It*−<sup>1</sup> information set. The form AR(1) is used to depict the autoregressive carbon return process. *Ft*(0, 1,*s<sup>t</sup>* , *kt*) represents the fourth-order moment distribution type of the carbon return series based on the classical GARCH(1,1) model, and we can obtain *Et*−1(*ξt*) = 0, *Et*−1(*ξ* 2 *t* )= 1, *Et*−1(*ξ* 3 *t* ) = *s<sup>t</sup>* , *Et*−1(*ξ* 4 *t* ) = *k<sup>t</sup>* ; *s<sup>t</sup>* and *k<sup>t</sup>* represents the skewness and kurtosis corresponding to standardized residual *ξ<sup>t</sup>* = *h* −1/2 *t εt* . *β*0, *β*1, *β*2, *β*<sup>3</sup> denotes the coefficient of the conditional variance equation; *γ*0, *γ*1, *γ*<sup>2</sup> represents the coefficient of the conditional skewness equation; *δ*0, *δ*1, *δ*<sup>2</sup> means the coefficient of the conditional kurtosis equation. (*q*1,*p*1);(*q*2,*p*2);(*q*3,*p*3) represents the lag order of the conditional variance, conditional skewness and conditional kurtosis equations for capturing the relationship between carbon return and its time-varying, conditional, high-order moment term.

For estimating the parameters of the time-varying, high-order moment model (NA-GARCHSK), the Gram–Charlier expansion of normal density function is used and truncates it in the fourth moment. Then, the conditional probability density of the standard error can be obtained under the information set *It*−1:

$$\begin{split} f(\boldsymbol{\xi}\_{t}|\boldsymbol{I}\_{t-1}) &= g(\boldsymbol{\xi}\_{t}) \lambda(\boldsymbol{\xi}\_{t}) / \boldsymbol{\Gamma}\_{t} \\ &\frac{1}{\sqrt{2\pi}} e^{-\boldsymbol{\xi}\_{t}^{2}/2} \left( 1 + \frac{s\_{t}^{\*}}{3!} (\boldsymbol{\xi}\_{t}^{3} - 3\boldsymbol{\xi}\_{t}) + \frac{k\_{t}^{\*} - 3}{4!} (\boldsymbol{\xi}\_{t}^{4} - 6\boldsymbol{\xi}\_{t}^{2} + 3) \right) / \left( 1 + \frac{s\_{t}^{\*}}{3!} + \frac{k\_{t}^{\*} - 3}{4!} \right) \end{split} \tag{7}$$

where Γ*<sup>t</sup>* = 1 + *s* ∗ *t* 3! + *k* ∗ *<sup>t</sup>* −3 4! .

Furthermore, the conditional distribution of *ε<sup>t</sup>* is expressed as *h* −1/2 *t f*( *ξt*|*It*−1), and the log likelihood function is expressed as:

$$LF(\varepsilon\_t | I\_{t-1}, \theta) = -\frac{1}{2} \ln(2\pi) - \frac{1}{2} \ln h\_t - \frac{1}{2} \mathfrak{J}\_t^2 + \ln(\lambda^2(\mathfrak{J}\_t)) - \ln(\Gamma\_t) \tag{8}$$

By maximizing the likelihood function above, the consistency estimation of the parameter vector can be obtained, and the parameter estimation results of the conditional mean equation, conditional variance, conditional skewness and conditional kurtosis equations can also be obtained simultaneously. Where *θ*= [*β*, *γ*, *δ*] <sup>0</sup> = [*β*0, *β*1, *β*2, *β*3; *γ*0, *γ*1, *γ*2; *δ*0, *δ*1, *δ*2] 0

is the parameter vector, representing the parameter to be estimated in the time-varying, high-order moment, carbon price volatility model. parameter to be estimated in the time-varying, high-order moment, carbon price volatility model.

δδδ

ε

 ξ

1 11 ( )=- ln(2 ) ln ln( ( )) ln( ) 2 22 *LF I h tt t t t t*

By maximizing the likelihood function above, the consistency estimation of the parameter vector can be obtained, and the parameter estimation results of the conditional

is expressed as

is the parameter vector, representing the

2 2

<sup>−</sup> − − + −Γ (8)

 λξ

1/2 <sup>1</sup> ( ) *t tt hf I* ξ<sup>−</sup>

<sup>−</sup> ,

*Int. J. Environ. Res. Public Health* **2022**, *19*, x FOR PEER REVIEW 6 of 20

Furthermore, the conditional distribution of *<sup>t</sup>*

 π

.

1,

=[ , , ]'=[ , , , ; , , ; , , ]'

ββββγ

01 2 3012 012

 γγ

 θ

and the log likelihood function is expressed as:

ε

<sup>3</sup> =1 3! 4! *t t*

*s k* ∗ ∗ <sup>−</sup> Γ ++

#### *3.2. GRU Model 3.2. GRU Model*

β γ δ

θ

where

*t*

For mapping the nonlinear, time-varying, high-order moment shock of market asymmetric information and extreme events on carbon price, this paper constructs a multilayer GRU (Gated Recurrent Unit, GRU) model to predict and fit the carbon price with the characteristic of the time-varying, high-order moment. Different from the special input gate, forget gate and output gate structure of the LSTM (Long and Short-Term Memory network, LSTM), another feedforward network structure similar to the GRU network, the GRU model is constructed based on the gate structure of the LSTM and composed of update gate and reset gate [40]. The GRU training structure can be showed in Figure 1. For mapping the nonlinear, time-varying, high-order moment shock of market asymmetric information and extreme events on carbon price, this paper constructs a multilayer GRU (Gated Recurrent Unit, GRU) model to predict and fit the carbon price with the characteristic of the time-varying, high-order moment. Different from the special input gate, forget gate and output gate structure of the LSTM (Long and Short-Term Memory network, LSTM), another feedforward network structure similar to the GRU network, the GRU model is constructed based on the gate structure of the LSTM and composed of update gate and reset gate [40]. The GRU training structure can be showed in Figure 1.

**Figure 1.** The model training structure of the GRU network. **Figure 1.** The model training structure of the GRU network.

Specifically, the update gate of GRU is combined of the input gate and forget gate of the LSTM network, and this function is used to determine the information to be discarded and the new information to be added. The reset gate determines the forgotten information in the past time series, which can help to capture the short-term dependency of the finance series. Unlike the LSTM model, which relies on the cell units to obtain the long-term information, the GRU network gets rid of the cell state instead of the hidden state, in order to transmit the previous information and obtain the long-term dependency. Although the debate about the model superiority of GRU and LSTM network continues, it is generally accepted that as an effective variant of LSTM network, the structure of the GRU network is simpler and requires fewer parameters and training samples. Therefore, some studies suggest that GRU is more effective than the LSTM model in solving the long dependency problem of RNN networks [41]. According to the above GRU model diagram, the forward propagation process of the GRU network is as follows: Specifically, the update gate of GRU is combined of the input gate and forget gate of the LSTM network, and this function is used to determine the information to be discarded and the new information to be added. The reset gate determines the forgotten information in the past time series, which can help to capture the short-term dependency of the finance series. Unlike the LSTM model, which relies on the cell units to obtain the long-term information, the GRU network gets rid of the cell state instead of the hidden state, in order to transmit the previous information and obtain the long-term dependency. Although the debate about the model superiority of GRU and LSTM network continues, it is generally accepted that as an effective variant of LSTM network, the structure of the GRU network is simpler and requires fewer parameters and training samples. Therefore, some studies suggest that GRU is more effective than the LSTM model in solving the long dependency problem of RNN networks [41]. According to the above GRU model diagram, the forward propagation process of the GRU network is as follows:

Firstly, the state *ht*−<sup>1</sup> transmitted from the previous network is combined with the input *x<sup>t</sup>* of the current node to obtain the gate structure of the GRU network, that is, the reset gate r and update gate *z*. Where *σ* means the activation function, which converts the input data to a value in the range of 0–1 to act as a gating signal.

$$r\_t = \sigma(\mathcal{W}\_r \mathbf{x}\_t + \mathcal{U}\_r h\_{t-1}) \tag{9}$$

$$z\_t = \sigma(\mathcal{W}\_z \mathbf{x}\_t + \mathcal{U}\_z h\_{t-1}) \tag{10}$$

Secondly, after obtaining the gating signal, the reset gate is used to obtain the data after "reset". If the element value *r<sup>t</sup>* in the reset gate is close to 0, it means the hidden state element related to the reset gate should be set to 0, that is, the hidden state information of the last time should be discarded. Further, the result of element multiplication is linked to the input of current time step, and the candidate hidden state *h*e*<sup>t</sup>* is calculated by the activation function tanh, and the element value ranges from −1 to 1. The calculation for candidate hidden state is: activation function tanh, and the element value ranges from −1 to 1. The calculation for candidate hidden state is: <sup>1</sup> ( ) *t t tt h tanh Wx rUh* = + <sup>−</sup> (11)

Firstly, the state *h*t-1 transmitted from the previous network is combined with the input *x*t of the current node to obtain the gate structure of the GRU network, that is, the reset

<sup>1</sup> ( ) *t rt rt r Wx Uh* = +

<sup>1</sup> ( ) *t zt zt z Wx Uh* = +

Secondly, after obtaining the gating signal, the reset gate is used to obtain the data after "reset". If the element value *r*t in the reset gate is close to 0, it means the hidden state element related to the reset gate should be set to 0, that is, the hidden state information of

σ

σ

*Int. J. Environ. Res. Public Health* **2022**, *19*, x FOR PEER REVIEW 7 of 20

σ

input data to a value in the range of 0–1 to act as a gating signal.

gate r and update gate *z*. Where

$$h\_t = \tanh(\mathcal{W}\mathbf{x}\_t + r\_t \mathcal{U}h\_{t-1})\tag{11}$$

means the activation function, which converts the

<sup>−</sup> (9)

<sup>−</sup> (10)

*<sup>t</sup> <sup>h</sup>* is calculated by the

Finally, the most critical process of training the GRU model is the update memory stage. The update gate *z<sup>t</sup>* controls the forgotten information of the hidden layer *ht*−<sup>1</sup> at the previous moment, and the new hidden layer information *h*e*<sup>t</sup>* needs to be added at the current moment. The update gate *z<sup>t</sup>* is expressed as: Finally, the most critical process of training the GRU model is the update memory stage. The update gate zt controls the forgotten information of the hidden layer ht-1 at the previous moment, and the new hidden layer information *<sup>t</sup> <sup>h</sup>* needs to be added at the current moment. The update gate zt is expressed as:

$$h\_t = (1 - z\_t)h\_{t-1} + z\_t \tilde{h}\_t \tag{12}$$

It is worth noting that the value of updated gating *z<sup>t</sup>* is in a range from 0 to 1. The closer the gating value is to 1, the more data there is to be remembered, while the closer it is to 0, the more information is forgotten. The GRU model can realize data forgetting and memory at the same time by using update gating *z<sup>t</sup>* , unlike the LSTM model that requires multiple gating. It is worth noting that the value of updated gating zt is in a range from 0 to 1. The closer the gating value is to 1, the more data there is to be remembered, while the closer it is to 0, the more information is forgotten. The GRU model can realize data forgetting and memory at the same time by using update gating zt, unlike the LSTM model that requires multiple gating.

#### *3.3. NAGARCHSK-GRU Model 3.3. NAGARCHSK-GRU Model*

The proposed hybrid carbon price forecasting model combines the advantages of NAGARCHSK and GRU neural networks. Firstly, the NAGARCHSK model is better than the constant high-order moment models and other time-varying, high-order moment models in fitting the carbon price series with time-varying, high-order moment volatility characteristics. Therefore, we select the NAGARCHSK model to estimate the time-varying parameters of carbon price. The proposed hybrid carbon price forecasting model combines the advantages of NAGARCHSK and GRU neural networks. Firstly, the NAGARCHSK model is better than the constant high-order moment models and other time-varying, high-order moment models in fitting the carbon price series with time-varying, high-order moment volatility characteristics. Therefore, we select the NAGARCHSK model to estimate the time-varying parameters of carbon price.

Secondly, we use these estimated parameters as network inputs, and use the GRU neural network to train the time-varying, high-order moment volatility characteristics of carbon price for improving the prediction accuracy. The basic idea of constructing the proposed NAGARCHSK-GRU carbon-forecasting model shown in Figure 2. Secondly, we use these estimated parameters as network inputs, and use the GRU neural network to train the time-varying, high-order moment volatility characteristics of carbon price for improving the prediction accuracy. The basic idea of constructing the proposed NAGARCHSK-GRU carbon-forecasting model shown in Figure 2.

**Figure 2.** The structuring idea of the carbon price-forecasting hybrid model of NAGARCHSK–GRU. **Figure 2.** The structuring idea of the carbon price-forecasting hybrid model of NAGARCHSK–GRU.

#### *3.4. Evaluation Criteria and the Benchmark Model*

For evaluating the prediction performance of the proposed time-varying, high-order moment carbon-pricing model, this paper adopts the following criteria to measure the model performance.

$$RMSE = \sqrt{\frac{\sum\_{i=1}^{T} \left(y\_i - \hat{y}\_i\right)^2}{T}} \tag{13}$$

$$MAE = \frac{1}{T} \sum\_{i=1}^{T} |y\_i - \mathcal{Y}\_i| \tag{14}$$

$$MAPE = \frac{1}{T} \sum\_{i=1}^{T} \left| \frac{y\_i - \mathcal{Y}\_i}{y\_i} \right| \tag{15}$$

$$DA = \frac{1}{T} \sum\_{i=1}^{T-1} a\_i \text{ where } a\_i = \begin{cases} 1, if(y\_{i+1} - y\_i) \times (\hat{y}\_{i+1} - y\_i) > 0\\ 0, otherwise \end{cases} \tag{16}$$

where *<sup>Y</sup>* <sup>=</sup> {*y*1, *<sup>y</sup>*2, · · · , *<sup>y</sup>T*} represents the carbon return series; *<sup>Y</sup>*<sup>ˆ</sup> <sup>=</sup> {*y*ˆ1, *<sup>y</sup>*ˆ2, · · · , *<sup>y</sup>*ˆ*T*} represents the prediction return. *T* is a time series variable.

The values of root-mean-square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) range from 0 to 1, and a larger value means the deviation between predicted return and real return is greater, and the model performance is worse. The correct investment direction prediction can help investors make more valuable decisions. This paper uses DA (direction accuracy) index to measure the consistency probability of market trend and investors' prediction direction. The larger DA value means the predicted value of carbon return is closer to investors' psychological expectation.

For assessing the performance of the proposed pricing model, this paper also selects the following aggressors as the comparison benchmarks. The first one is the BP (back propagation network) model with the advantage of nonlinear mapping. The second is GBR (gradient boosting regression) model, which is a kind of integrated learning method. The third is MLP (multilayer perceptron); the parameter optimization of the MLP can improve the nonlinear mapping and carbon price-prediction accuracy. The fourth is the RNN (recurrent neural network) model, which is an artificial neural network with a tree structure, that has significant advantages in forecasting carbon price. The fifth is the LSTM (long and short-term memory network) model, which is another improved structure of the RNN model that shows superiority for solving the problems of gradient explosion.

#### **4. Empirical Analysis and Discussion**

#### *4.1. The Data*

This article selects the continuous futures contract of EUAf (European Union Allowance future, EUAf) from the European Energy Exchange as the representative variable of carbon assets. The data range from 22 June 2012 to 7 May 2021, with a total of 2274 data samples. The sample selection rule refers to the experience of Wen et al. [42], that is, continuous futures contracts with different maturity dates are connected according to the time sequence. Based on this, the sample of this article integrates the daily settlement price of the four futures contracts, DEC12, DEC16, DEC18 and DEC20. The reason for choosing EUAf is that the EUAf is the largest emission reduction quota in the world. The carbon futures trading of the European Energy Exchange accounts for about 70% of the global futures trading, and the EUAf trading volume is larger than EUAs (European Union Allowance spot, EUAs), the price discovery function is also relatively mature. It uses *R<sup>t</sup>* to represent the carbon assets return:

$$R\_t = 100 \times \left( lnP\_t - lnP\_{t-1} \right) \tag{17}$$

where *P<sup>t</sup>* represents the carbon asset price, that is, the daily settlement price of EUAf continuous futures contracts.

#### *4.2. Time-Varying High-Order Moment Characteristics Estimate*

The study findings shown in Table 1 concluded that the ARCH and GARCH terms of all constant and time-varying high-order moment models are significant, indicating that the carbon return has obvious volatility clustering, which is not only caused by the variance shock, but also conditional skewness and conditional kurtosis, representing the impacts of asymmetric information and extreme factors on carbon return. All the volatility leverage coefficients *β*<sup>3</sup> are negative and significant, denoting that variance volatility has obvious asymmetry shock on carbon return, and the degree of negative impact is greater than the positive impact. This conclusion is completely consistent with the pioneering research results of Engle and Manganelli [43] that the negative VAR impact of the stock market is more significant. This finding indirectly proves that carbon assets have general financial attributes and common volatility characteristics.

The variance impact coefficient *β*<sup>2</sup> is smaller than the coefficient of the constant model, that is, with the addition of the conditional skewness and conditional kurtosis equations, the volatility clustering effect from the shock of variance term gradually decreases, for example, the *β*<sup>2</sup> coefficient of the GARCH, TGARCH and AGARCH models are 0.8824, 0.8875, 0.8904, respectively. When the time-varying conditional skewness and conditional kurtosis are added, the volatility clustering coefficients of AGARCH-K, NAGARCH-K and NAGARCHSK models are reduced to 0.8358, 0.8685, 0.7873, respectively. This conclusion is completely consistent with Harvey's [39] research.

This phenomenon shows that when the time-varying, high-order moment models no longer assume the skewness and kurtosis are constants, they can effectively identify the volatility clustering effect caused by asymmetric information and extreme shocks through the time-varying skewness and kurtosis equation. We can say that the impact of carbon return from the time-varying skewness and kurtosis is becoming more obvious, resulting in the variance impact coefficient decreasing as the skewness and kurtosis coefficient increases. This similar reason can be used to explain why the extreme impact coefficient *δ*<sup>2</sup> of the NAGARCHSK model is smaller than that of the AGARCH-K and the NAGARCH-K model.

**Table 1.** Parameter estimation of the high-order moment volatility model for carbon return.


Note: The bold indicates the model with the minimum maximum likelihood value and the best parameter estimation performance; the data in brackets indicate the t-statistic of parameter estimation of each model. Likelihood 5159.771 5070.972 5067.591 6469 6473 **4682**  Note: The bold indicates the model with the minimum maximum likelihood value and the best

parameter estimation performance; the data in brackets indicate the t-statistic of parameter estima-

Compared with the constant model and other time-varying, high-order moment models, it should be noted that the maximum likelihood value of the NAGARCHSK model is the lowest, as shown in Table 1, therefore, this paper chooses the NAGARCHSK model to estimate the model parameters of time-varying conditional variance, conditional skewness and conditional kurtosis equations of the carbon return. Figure 3 shows that the conditional high-moment series of carbon assets have obvious volatility persistence effects, the risk of variance, skewness and kurtosis are large, and the high-moment volatility series also shows time-varying characteristics. tion of each model. Compared with the constant model and other time-varying, high-order moment models, it should be noted that the maximum likelihood value of the NAGARCHSK model is the lowest, as shown in Table 1, therefore, this paper chooses the NAGARCHSK model to estimate the model parameters of time-varying conditional variance, conditional skewness and conditional kurtosis equations of the carbon return. Figure 3 shows that the conditional high-moment series of carbon assets have obvious volatility persistence effects, the risk of variance, skewness and kurtosis are large, and the high-moment volatility series also shows time-varying characteristics.

**Figure 3.** Time-varying, high-moment fluctuation of the carbon price identified by the NA-GARCHSK model. **Figure 3.** Time-varying, high-moment fluctuation of the carbon price identified by the NAGARCHSK model.

We use the NAGARCHSK model to estimate the time-varying, high-order moment

Input unit, output unit, number of hidden layers and hidden layer neurons are the basic structure of a deep-learning network. The input of the carbon price forecasting NA-GARCHSK-GRU model is the time-varying conditional lagging mean, conditional variance, conditional skewness and conditional kurtosis of carbon returns estimated by the NAGARCHSK model, and the output is the carbon return series we need to predict. The hidden layer is a network structure for parameter optimization and feature learning. Fewer hidden layers may limit the learning ability of the forecasting model, which makes it difficult to reach the optimal solution. Research has found that a neural network with two hidden layers can already solve most problems [44]. Similarly, the designing of hidden layer neurons is to capture and map the input data. Although more neurons can improve the learning and generalization ability of the network, it may also consume more

For determining the appropriate the GRU network structure, based on the experimental method, this paper measures the forecasting performance when the hidden layers

the carbon returns based on the obtained high-order moment series. The first 70% of samples of carbon return series are selected for model training, and the last 30% of samples

*4.3. Predicting Results Analysis* 

for testing the prediction performance.

4.3.1. GRU Structure Construction

training time and lead to overfitting.

#### *4.3. Predicting Results Analysis*

We use the NAGARCHSK model to estimate the time-varying, high-order moment parameter characteristics that represent the shock from the asymmetric information and extreme external impact. Then, a multilayer GRU model is constructed to map and predict the carbon returns based on the obtained high-order moment series. The first 70% of samples of carbon return series are selected for model training, and the last 30% of samples for testing the prediction performance.

#### 4.3.1. GRU Structure Construction

Input unit, output unit, number of hidden layers and hidden layer neurons are the basic structure of a deep-learning network. The input of the carbon price forecasting NAGARCHSK-GRU model is the time-varying conditional lagging mean, conditional variance, conditional skewness and conditional kurtosis of carbon returns estimated by the NAGARCHSK model, and the output is the carbon return series we need to predict. The hidden layer is a network structure for parameter optimization and feature learning. Fewer hidden layers may limit the learning ability of the forecasting model, which makes it difficult to reach the optimal solution. Research has found that a neural network with two hidden layers can already solve most problems [44]. Similarly, the designing of hidden layer neurons is to capture and map the input data. Although more neurons can improve the learning and generalization ability of the network, it may also consume more training time and lead to overfitting.

For determining the appropriate the GRU network structure, based on the experimental method, this paper measures the forecasting performance when the hidden layers number is 1, 2, 3, 4, 5, 6 and the hidden layers neuron nodes are 4, 8, 16, 32, 64, 128, respectively (as showed in Table 2). It is found that when there are two hidden layers in the NAGARCHSK-GRU model, and the neuron nodes in both layers are 16-16, the model's error criteria MSE, RMSE, and MAE values are 0.0006284, 0.0250681, and 0.1399925, respectively, which are the lowest of the whole experimental sample. Therefore, the network structure of the proposed NAGARCHSK-GRU forecasting model is designed as 4-16-16-1 for training the time-varying, high-order moment carbon series.


**Table 2.** Performance of the proposed NAGARCHSK-GRU: hidden layers and hidden nodes.

Note: Bold numbers are the minimum MSE, RMSE and MAE, respectively.

NAGARCHSK-GRU

NAGARCHSK-

#### 4.3.2. Performance of the NAGARCHSK-GRU Model

NAGARCHSK-

For testing the prediction performance of the proposed NAGARCHSK-GRU model, this paper compares the prediction results of the proposed model and benchmark evaluation models. The results are shown in Table 3. *Int. J. Environ. Res. Public Health* **2022**, *19*, x FOR PEER REVIEW 12 of 20

For high-order moment pricing models that consider time-varying conditional variance, conditional skewness and conditional kurtosis in Panel A, the NAGARCHSK-GRU model has significant advantages over other benchmark models in all the error evaluation criteria and market expected criteria. That is, the NAGARCHSK-GRU model has better prediction ability than other benchmark models (as shown in Figure 4). **Table 3.** Performance of the proposed and benchmark model for forecasting the carbon price. **Proposed Model Benchmark Model**  Panel A: Pricing model considering the features of conditional variance, conditional skewness and conditional kurtosis

NAGARCHSK-

NAGARCHSK-

NAGARCHSK-

**Table 3.** Performance of the proposed and benchmark model for forecasting the carbon price. LSTM RNN MLP GRB BP


Note: Bold numbers are the minimum MSE, RMSE, MAE, respectively, and the maximum of DA. Note: Bold numbers are the minimum MSE, RMSE, MAE, respectively, and the maximum of DA.

**Figure 4.** The forecasting performance of the proposed and benchmark model considering the features of time-varying conditional variance, conditional skewness and conditional kurtosis. **Figure 4.** The forecasting performance of the proposed and benchmark model considering the features of time-varying conditional variance, conditional skewness and conditional kurtosis.

bility for fitting carbon price series with time-varying, high-order moment characteristics. For the market-expected criteria, the DA of the NAGARCHSK-GRU model was 0.984211,

Specifically, as for the error evaluation criteria, the RMSE, MAE, and MAPE values of the NAGARCHSK-GRU model are 0.509902, 0.172333, and 0.594527, respectively, which are lower than those of benchmark models such as NAGARCHSK-LSTM, NA-GARCHSK-RNN, NAGARCHSK-MLP, NAGARCHSK-GBR, and NAGARCHSK-BP.

Specifically, as for the error evaluation criteria, the RMSE, MAE, and MAPE values of the NAGARCHSK-GRU model are 0.509902, 0.172333, and 0.594527, respectively, which are lower than those of benchmark models such as NAGARCHSK-LSTM, NAGARCHSK-RNN, NAGARCHSK-MLP, NAGARCHSK-GBR, and NAGARCHSK-BP. This result concludes that the NAGARCHSK-GRU model has better robustness and stability for fitting carbon price series with time-varying, high-order moment characteristics. For the market-expected criteria, the DA of the NAGARCHSK-GRU model was 0.984211, which is higher than that of the benchmark models NAGARCHSK-LSTM (0.978947), NAGARCHSK-GRB (0.877551), NAGARCHSK-MLP (0.875), NAGARCHSK-RNN (0.742475) and NAGARCHSK-BP (0.729114). This indicates that the NAGARCHSK-GRU model is in line with investors' psychological expectations for predicting carbon return, and the predicted returns are strongly consistent with the real return. As a result, the pricing model can provide technical support for investors to judge market conditions and formulate investment strategies. *Int. J. Environ. Res. Public Health* **2022**, *19*, x FOR PEER REVIEW 13 of 20 which is higher than that of the benchmark models NAGARCHSK-LSTM (0.978947), NA-GARCHSK-GRB (0.877551), NAGARCHSK-MLP (0.875), NAGARCHSK-RNN (0.742475) and NAGARCHSK-BP (0.729114). This indicates that the NAGARCHSK-GRU model is in line with investors' psychological expectations for predicting carbon return, and the predicted returns are strongly consistent with the real return. As a result, the pricing model can provide technical support for investors to judge market conditions and formulate investment strategies. In contrast, the error-evaluation criteria and market-expected criteria of the NA-GARCHSK-RNN model shows the worst prediction effect of all models, that is, the RMSE, MAE and MAPE values are, respectively, 3.348075, 2.413839 and 7.415169, and the DA value is 0.742475. We can conclude that using the NAGARCHSK-RNN model it is difficult to map the carbon price series with the time-varying, high-order moment feature, and its

In contrast, the error-evaluation criteria and market-expected criteria of the NAGARCHSK-RNN model shows the worst prediction effect of all models, that is, the RMSE, MAE and MAPE values are, respectively, 3.348075, 2.413839 and 7.415169, and the DA value is 0.742475. We can conclude that using the NAGARCHSK-RNN model it is difficult to map the carbon price series with the time-varying, high-order moment feature, and its predictive ability cannot meet investors' expectations. predictive ability cannot meet investors' expectations. For the high-order moment forecasting models considering time-varying conditional variance and conditional kurtosis, as shown in Panel B, the NAGARCHSK-GRU model still has obvious forecasting advantages in error evaluation criteria and is relatively better in market-expected criteria compared with other benchmark models. Specifically, the error indexes RMSE, MAE and MAPE of the NAGARCHSK-GRU model are 2.034801,

For the high-order moment forecasting models considering time-varying conditional variance and conditional kurtosis, as shown in Panel B, the NAGARCHSK-GRU model still has obvious forecasting advantages in error evaluation criteria and is relatively better in market-expected criteria compared with other benchmark models. Specifically, the error indexes RMSE, MAE and MAPE of the NAGARCHSK-GRU model are 2.034801, 1.473751 and 0.835218, respectively, which are lower than other benchmark criteria, the market-expected criteria DA is 0.731621, which is second only to the 0.843725 of the NAGARCHSK-MLP model. It is worth noting that the NAGARCHSK-LSTM model, which has the advantage in fitting financial time series, has the worst performance in carbon prediction among all models. The error criteria RMSE, MAE and MAPE are 4.616283, 2.380641 and 3.237167, respectively, and the market-expected criteria DA is 0.505615. This shows that the NAGARCHSK-LSTM model's prediction ability and generalization ability are declining. The conclusion that the gap between the fitting curve and the real value is extremely obvious is also shown in Figure 5, and the correlation is poor. 1.473751 and 0.835218, respectively, which are lower than other benchmark criteria, the market-expected criteria DA is 0.731621, which is second only to the 0.843725 of the NA-GARCHSK-MLP model. It is worth noting that the NAGARCHSK-LSTM model, which has the advantage in fitting financial time series, has the worst performance in carbon prediction among all models. The error criteria RMSE, MAE and MAPE are 4.616283, 2.380641 and 3.237167, respectively, and the market-expected criteria DA is 0.505615. This shows that the NAGARCHSK-LSTM model's prediction ability and generalization ability are declining. The conclusion that the gap between the fitting curve and the real value is extremely obvious is also shown in Figure 5, and the correlation is poor. For the carbon pricing model without considering the shock of time-varying, highorder moment, the GRU network model has the smallest error evaluation criteria (as shown in Panel C), which denotes that the GRU model still has strong prediction accuracy and robustness even without considering the characteristics of time-varying, high-order moment. However, the market-expected criteria DA is 0.74368, which is only higher than the 0.536842 of the LSTM model. We can conclude that the prediction performance of the model makes it difficult to satisfy the investors' psychological expectations.

**Figure 5.** The forecasting performance of the proposed and benchmark models consider the feature of time-varying conditional variance and conditional kurtosis. **Figure 5.** The forecasting performance of the proposed and benchmark models consider the feature of time-varying conditional variance and conditional kurtosis.

For the carbon pricing model without considering the shock of time-varying, highorder moment, the GRU network model has the smallest error evaluation criteria (as shown in Panel C), which denotes that the GRU model still has strong prediction accuracy and robustness even without considering the characteristics of time-varying, high-order moment. However, the market-expected criteria DA is 0.74368, which is only higher than the 0.536842 of the LSTM model. We can conclude that the prediction performance of the model makes it difficult to satisfy the investors' psychological expectations. *Int. J. Environ. Res. Public Health* **2022**, *19*, x FOR PEER REVIEW 14 of 20

> Although the error criteria of other benchmark models are lower than those of the GRU model, the difference is not significant (as shown in Figure 6), particularly the RMSE and MAE of all models are basically close, and the deviation is small. More obviously, the market-expected criteria of the GBR and MLP models are significantly higher than those of other models, with DA values of 0.875318 and 0.872774, respectively, indicating the carbon-prediction performance of those two models is relatively stable and has a certain robustness. The market prediction performance suggests a reliable reference for investors making investment decisions. Although the error criteria of other benchmark models are lower than those of the GRU model, the difference is not significant (as shown in Figure 6), particularly the RMSE and MAE of all models are basically close, and the deviation is small. More obviously, the market-expected criteria of the GBR and MLP models are significantly higher than those of other models, with DA values of 0.875318 and 0.872774, respectively, indicating the carbon-prediction performance of those two models is relatively stable and has a certain robustness. The market prediction performance suggests a reliable reference for investors making investment decisions.

**Figure 6.** The forecasting performance of the proposed and benchmark models without considering the feature of time-varying, high-order moment. **Figure 6.** The forecasting performance of the proposed and benchmark models without considering the feature of time-varying, high-order moment.

Comparing the prediction results of all the pricing models of Panel A, Panel B and Panel C in Table 3, firstly, the carbon price-forecasting performance of the NAGARCHSK-GRU model is the best among all pricing models in Panel A, Panel B and Panel C. Secondly, the error criteria RMSE, MAE, and MAPE of the carbon pricing model in Panel A are significantly smaller than those of the error criteria in Panel B and Panel C, while the market-expected indicator DA is significantly higher than other models. Furthermore, the error criteria RMSE, MAE, and MAPE of the pricing model in Panel C are relatively high, while the market-expected index DA is relatively low. Comparing the prediction results of all the pricing models of Panel A, Panel B and Panel C in Table 3, firstly, the carbon price-forecasting performance of the NAGARCHSK-GRU model is the best among all pricing models in Panel A, Panel B and Panel C. Secondly, the error criteria RMSE, MAE, and MAPE of the carbon pricing model in Panel A are significantly smaller than those of the error criteria in Panel B and Panel C, while the market-expected indicator DA is significantly higher than other models. Furthermore, the error criteria RMSE, MAE, and MAPE of the pricing model in Panel C are relatively high, while the market-expected index DA is relatively low.

The empirical results shown in Table 3 conclude that the deep-learning, carbon price forecasting model that considers the time-varying, high-order moment characteristics can provide more confident carbon premium evidence. This conclusion further proves that the carbon return is not only affected by the low-order moment attribute pricing factor, but also that the time-varying, high-order moment attribute that reflects the market asymmetric information and extreme shock is also an important explanatory factor for carbon The empirical results shown in Table 3 conclude that the deep-learning, carbon price forecasting model that considers the time-varying, high-order moment characteristics can provide more confident carbon premium evidence. This conclusion further proves that the carbon return is not only affected by the low-order moment attribute pricing factor, but also that the time-varying, high-order moment attribute that reflects the market asymmetric information and extreme shock is also an important explanatory factor for carbon

return. The research results of this article can provide valuable reference for investors, commercial banks, and emission-reduction companies to judge market conditions and

The significant advantage of the GRU model is that the parameter training structure of the long memory function can fit the finance time series, especially the time series over a long period time. Therefore, to prove the robustness of the NAGARCHSK-GRU model

4.3.3. Robustness of the NAGARCHSK-GRU Model

predict market trends.

return. The research results of this article can provide valuable reference for investors, commercial banks, and emission-reduction companies to judge market conditions and predict market trends.

#### 4.3.3. Robustness of the NAGARCHSK-GRU Model

The significant advantage of the GRU model is that the parameter training structure of the long memory function can fit the finance time series, especially the time series over a long period time. Therefore, to prove the robustness of the NAGARCHSK-GRU model in different prediction period, this paper analyzes the performance of the proposed pricing model in the short-term, medium-term and long-term, respectively. Among them, the last 4 months, 10 months, and 15 months of the carbon returns are used as the prediction set in the short-term, medium-term, and long-term, respectively, and the rest of the data are used as the training set. The pricing model structure adopts the optimal network structure decided in the previous section. This part mainly describes the prediction performance of the carbon price-forecasting model that considers the time-varying, high-order moment characteristics, furthermore, the RMSE, MAE, MAPE error criteria are used to evaluate the model's pricing accuracy and stability.

For carbon price-forecasting performance in different periods (as shown in Table 4), the NAGARCHSK-GRU pricing model has significant superiority in the short-term, mediumterm and long-term for all the error criteria, that is, the values of RMAE, MAE and MAPE are significantly lower than other benchmark models, and the proposed model has satisfactory robustness over all periods. The error distribution of the proposed and benchmark pricing models can be seen in Figures 7–9.


**Table 4.** Prediction performance of the pricing model considering the feature of time-varying, highorder moment.

Note: Bold numbers are the minimum MSE, RMSE, MAE, respectively. The network structure of the proposed and benchmark models adopts the optimal structure decided experimentally in the previous section, that is, the structure of 4-16-16-1.

Based on the estimation errors of the pricing models over different periods, it is found that as the forecasting period gradually increases from short-term to long-term, the forecasting errors of all pricing model gradually decrease, resulting the improvement of model accuracy and stability.

In particular, the NAGARCHSK-GRU model has the smallest prediction error and the best prediction performance, that is, the long-term prediction error RMSE, MAE and MAPE are 0.752385, 0.218883 and 0.354984, respectively, the medium-term prediction error RMSE, MAE and MAPE are 0.825573, 0.246388 and 0.476214, respectively, and the short-term prediction error RMSE, MAE and MAPE are 1.109585, 0.408066 and 0.264886, respectively. This evidence shows that the accuracy and stability of the NAGARCHSK-GRU model are gradually optimized with the extension of forecasting time, and it is significantly better than other benchmark models for forecasting the 15 month lagged returns. Since the advantage

**NAGARCHSK-GRU** 

of the GRU model is fitting the longer finance time series, the findings of this article provide further evidence for this convinced conclusion and also the robust performance of the proposed model for different prediction periods. MAPE 0.264886 0.873019 7.400078 0.794943 0.807474 2.576337 Note: Bold numbers are the minimum MSE, RMSE, MAE, respectively. The network structure of the proposed and benchmark models adopts the optimal structure decided experimentally in the previous section, that is, the structure of 4-16-16-1.

**Table 4.** Prediction performance of the pricing model considering the feature of time-varying, high-

**NAGARCHSK-MLP** 

**NAGARCHSK-GRB** 

**NAGARCHSK-BP** 

*Int. J. Environ. Res. Public Health* **2022**, *19*, x FOR PEER REVIEW 16 of 20

Panel A: Long-term prediction performance (15 months) RMSE **0.752385** 1.157871 2.996408 1.981738 1.985327 3.001473 MAE **0.218883** 0.412011 2.352625 1.559398 1.560740 2.393809 MAPE **0.354984** 0.299397 0.765891 0.692497 0.689473 6.234842 Panel B: Medium-term prediction performance (10 months) RMSE 0.825573 0.877745 3.699027 2.388462 2.388496 3.686389 MAE 0.246388 0.345985 2.629891 1.698282 1.698174 2.590053 MAPE 0.476214 1.997452 10.473157 0.809171 0.838451 11.833577

RMSE 1.109585 0.761311 3.348173 2.160763 2.162539 3.209178 MAE 0.408066 0.258531 2.414090 1.560627 1.562554 2.334026

**NAGARCHSK-RNN** 

order moment.

**NAGARCHSK-LSTM** 

**Proposed Model Benchmark Model** 

**Figure 7.** Error scatter distribution of the proposed and benchmark forecasting models considering the feature of time varying, high-order moment in the long term. **Figure 7.** Error scatter distribution of the proposed and benchmark forecasting models considering the feature of time varying, high-order moment in the long term.

*Int. J. Environ. Res. Public Health* **2022**, *19*, x FOR PEER REVIEW 17 of 20

**Figure 8.** Error scatter distribution of the proposed and benchmark forecasting models considering the feature of time-varying, high-order moment in the medium term. **Figure 8.** Error scatter distribution of the proposed and benchmark forecasting models considering the feature of time-varying, high-order moment in the medium term.

**Figure 9.** Error scatter distribution of the proposed and benchmark forecasting models considering

As a market-oriented mechanism for innovation to curb global climate issues, the carbon market is recognized as the most effective means to reducing the global carbon dioxide emissions and realize the sustainability of human society and economic growth. Compared with other financial markets, the carbon market has obvious market asymmetry, is sensitive to policy shocks and has time-varying volatility. However, the existing carbon price-forecasting research mainly focuses on the price information transmission and risk volatility spillover from the perspective of low-order moment of return, and ignores the time-varying impact of asymmetric information and extreme policies on carbon assets from the perspective of high-order moment attributes (market skewness and kur-

tosis). The explanation for carbon premium lacks sufficient evidential support.

the feature of time-varying, high-order moment in the short term.

**5. Conclusions and Prospects** 

*5.1. Conclusions* 

the feature of time-varying, high-order moment in the medium term.

**Figure 9.** Error scatter distribution of the proposed and benchmark forecasting models considering the feature of time-varying, high-order moment in the short term. **Figure 9.** Error scatter distribution of the proposed and benchmark forecasting models considering the feature of time-varying, high-order moment in the short term.

**Figure 8.** Error scatter distribution of the proposed and benchmark forecasting models considering

#### **5. Conclusions and Prospects 5. Conclusions and Prospects**

#### *5.1. Conclusions 5.1. Conclusions*

As a market-oriented mechanism for innovation to curb global climate issues, the carbon market is recognized as the most effective means to reducing the global carbon dioxide emissions and realize the sustainability of human society and economic growth. Compared with other financial markets, the carbon market has obvious market asymmetry, is sensitive to policy shocks and has time-varying volatility. However, the existing carbon price-forecasting research mainly focuses on the price information transmission and risk volatility spillover from the perspective of low-order moment of return, and ignores the time-varying impact of asymmetric information and extreme policies on carbon assets from the perspective of high-order moment attributes (market skewness and kurtosis). The explanation for carbon premium lacks sufficient evidential support. As a market-oriented mechanism for innovation to curb global climate issues, the carbon market is recognized as the most effective means to reducing the global carbon dioxide emissions and realize the sustainability of human society and economic growth. Compared with other financial markets, the carbon market has obvious market asymmetry, is sensitive to policy shocks and has time-varying volatility. However, the existing carbon price-forecasting research mainly focuses on the price information transmission and risk volatility spillover from the perspective of low-order moment of return, and ignores the time-varying impact of asymmetric information and extreme policies on carbon assets from the perspective of high-order moment attributes (market skewness and kurtosis). The explanation for carbon premium lacks sufficient evidential support.

The innovation and contribution of this article are constructing an integrated carbon price-forecasting model, NAGARCHSK-GRU, based on the special characteristics of carbon assets such as market asymmetry, strong policy-shock sensitivity, and time-varying volatility. The proposed forecasting model considers the time-varying impact of market asymmetric information and extreme factors on carbon prices from the perspective of highorder moment attributes, so as to provide new evidence to explain the carbon premiums. The main work and research conclusions of this paper are as follows:

Firstly, carbon assets have obvious time-varying, high-order moment volatility characteristics. Compared with constant high-order moment volatility models, the time-varying, high-order moment volatility NAGARCHSK model can reveal the time-varying impact of systemic risk, asymmetric information and extreme factors on carbon premium by the function of time-varying variance, time-varying skewness and time-varying kurtosis equations. Moreover, the time-varying, high-order moment characteristics estimated by the NAGARCHSK model can explain the volatility clustering and premium mechanism of carbon price.

Secondly, the proposed machine-learning pricing model has more accuracy and stability in predicting carbon price with time-varying, high-order moment volatility characteristics. The time-varying impact of asymmetric information and extreme factors on carbon price is also important evidence for explaining carbon premium. This conclusion shows that the carbon pricing model proposed in this paper can fit and forecast carbon return effectively, specifically, the NAGARCHSK-GRU model is significantly better than other

deep-network models. Further research shows that the NAGARCHSK-GRU model has reliable advantages in long-term, medium-term and short-term carbon price fitting and forecasting. In particular, the long-term carbon price-forecasting ability is outstanding, that is, it has perfect stability and accuracy for 15 months of prices forecasting. This conclusion not only confirms the advantages of the NAGARCHSK-GRU model in fitting long-term financial data, but also proves that the carbon pricing model considering time-varying, high-order moment volatility can provide a strong explanation for carbon price.

The theoretical and practical implications of this paper are: firstly, as for the theoretical innovation, the findings show the rationality and effectiveness of incorporating the timevarying impact characteristics of asymmetric market information and extreme factors into the carbon price-forecasting model. The accuracy of carbon price forecasting can suggest a stronger reference for investors to judge market conditions, formulate investment strategies and may serve the implementation of carbon emission reduction. Secondly, as for the practical function, the maturity of carbon pricing mechanisms provide a decision-making basis for the government to speed up the construction of carbon market mechanisms and enhance the ability of the financial system to manage climate change. The conclusion of this paper also provides technical support for investors, emission-reduction entities and other market participants to capture price information and predict price changes.

#### *5.2. Prospects*

The focus of this paper is carbon premium explanation from the perspective of highorder moments. In the proposed high-order-moment pricing framework, each pricing term is a statistically structured factor, and the actual meaning behind the statistical indicators of high-order-moment attributes is not clear. Based on this, employing text-mining technology to obtain unstructured carbon pricing data that represent investor sentiment, policy impacts and other pricing factors, rather than the statistical moment attributes, is a valuable avenue for continuing relevant research in the future.

**Author Contributions:** Conceptualization, P.Y. and Y.Y.; methodology, P.Y. and Y.W.; model code analysis, P.Y. and C.Z.; formal analysis, Y.W. and C.Z.; writing—original draft preparation, P.Y. and Y.W.; writing—review and editing, P.Y. and Y.Y.; supervision, C.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the following projects: the Youth Fund Project for Humanities and Social Sciences of the Ministry of education of China, the grant number is 21YJC790152; the University Humanities and Social Sciences Research Project of Anhui Province of China, the grant number is SK2021A0574; the Social Sciences innovation and development project of Anhui Province of China, the grant number is 2021CX028; the National Natural Science Foundation of China, the grant number are 72101006 and 71971071.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data in this paper are from the trading information of Environmental Markets in European Energy Exchange. The data can be obtained free of charge on the websites of Environmental Markets (eex.com). Of cause, the data are also available from the corresponding author upon reasonable request.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

