Article

CGAOA-AttBiGRU: A Novel Deep Learning Framework for Forecasting CO2 Emissions

1 School of Emergency Management, Institute of Disaster Prevention, Langfang 065201, China
2 College of General Education, Hainan Vocational University, Haikou 570216, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(18), 2956; https://doi.org/10.3390/math12182956
Submission received: 1 August 2024 / Revised: 17 September 2024 / Accepted: 20 September 2024 / Published: 23 September 2024

Abstract

Accurately predicting carbon dioxide (CO2) emissions is crucial for environmental protection. Currently, there are two main issues with predicting CO2 emissions: (1) existing CO2 emission prediction models mainly rely on Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models, which can only model unidirectional temporal features, resulting in insufficient accuracy; (2) existing research on CO2 emissions mainly focuses on designing predictive models, without paying attention to model optimization, so the models cannot achieve their optimal performance. To address these issues, this paper proposes a framework for predicting CO2 emissions, called CGAOA-AttBiGRU. In this framework, the Attentional-Bidirectional Gated Recurrent Unit (AttBiGRU) is the prediction model: it uses BiGRU units to extract bidirectional temporal features from the data and adopts an attention mechanism to adaptively weight these features, thereby improving prediction accuracy. CGAOA is an improved Arithmetic Optimization Algorithm (AOA) used to optimize five key hyperparameters of AttBiGRU. We first validated the optimization performance of the improved CGAOA algorithm on 24 benchmark functions. Then, CGAOA was used to optimize AttBiGRU and compared with 12 other optimization algorithms. The results indicate that the AttBiGRU optimized by CGAOA has the best predictive performance.

1. Introduction

In the past few decades, rapid economic growth in China coupled with heavy reliance on high-carbon energy sources such as coal has led to a significant increase in CO2 emissions [1]. According to data from the International Energy Agency (IEA) and other relevant organizations, China’s CO2 emissions have surpassed those of the United States since 2006, making China the world’s largest emitter of CO2. The substantial emissions of CO2 into the atmosphere have significant adverse effects on humanity. On the one hand, excessive CO2 emissions can enhance the greenhouse effect in the Earth’s atmosphere, leading to global warming, rising sea levels, extreme droughts, increased floods and storms, and even ecosystem collapse, species extinction, and loss of human habitats [2,3]. On the other hand, excessive CO2 emissions can lead to human respiratory alkalosis, causing rapid breathing, elevated partial pressure of CO2, mental confusion, drowsiness, semi-consciousness to unconsciousness, and even seizures [4]. Moreover, the increase in CO2 contributes to ocean acidification, which damages the ability of marine organisms such as mollusks, crustaceans, and corals to form calcium carbonate skeletons and shells, thus affecting the structure and function of the entire marine ecosystem. The threat posed by CO2 has been widely recognized as one of the greatest environmental challenges facing the Earth [5]. In response to the pressing environmental situation, China has taken a series of actions and implemented numerous policies to reduce carbon emissions. For example, China launched a nationwide carbon market pilot in 2017 to establish a nationwide carbon trading system, driving enterprises to reduce carbon emissions and stimulating low-carbon development through carbon market mechanisms [6]. In addition, China pledged at the 21st United Nations Climate Conference that its CO2 emissions will no longer increase from 2030 onwards [7]. To achieve this goal, it is crucial to accurately predict future CO2 emissions so as to develop corresponding emission reduction policies [8].
To predict CO2 emissions, two key issues need to be addressed: the first is to design a prediction model, and the second is to optimize the hyperparameters of the model to achieve optimal performance.
At present, the prediction models for CO2 emissions are mainly based on deep learning models, including the Recurrent Neural Network (RNN), the Long Short-Term Memory Network (LSTM), and the Gated Recurrent Unit (GRU) [9,10,11]. However, RNN is unable to handle long time series prediction tasks because of the vanishing gradient problem. LSTM addresses this issue by employing memory cells and gate mechanisms to effectively retain information over long sequences, thereby resolving the gradient vanishing and long-term dependency problems of RNN [12]. However, the three gate mechanisms in LSTM make the model relatively complex, leading to more model parameters and higher computational cost. GRU simplifies LSTM by utilizing two gate mechanisms—the update gate and the reset gate—to control the selective forgetting and retention of historical information. Due to its simplified structure, GRU has fewer parameters and lower computational costs [13]. However, the traditional GRU can only extract unidirectional temporal features, resulting in an incomplete understanding of certain patterns in the sequence and limited feature extraction for complex sequence patterns. To address this issue, researchers have proposed the Bidirectional GRU (BiGRU). A BiGRU unit comprises a forward and a backward GRU unit, which extract forward and backward temporal features, respectively, enabling a more comprehensive capture of patterns and trends in the sequence [14,15]. Studies have shown that, compared to the original GRU and LSTM, BiGRU significantly improves prediction accuracy [16]. BiGRU has been widely applied in time series prediction, but there have been no reports of its application to CO2 emission prediction.
Due to the simplicity of the BiGRU structure and its ability to capture bidirectional patterns and trends in sequences, this study chooses BiGRU as the prediction model for CO2 emission prediction. Furthermore, to emphasize the most important bidirectional temporal features and thereby improve prediction performance, we add an attention mechanism to BiGRU. The attention mechanism adaptively calculates weights for the features, assigning greater weights to important features so that they play a larger role in prediction [17,18]. Recent studies have shown that adding attention mechanisms to models can significantly improve performance [19,20,21]. In this study, by adding an attention mechanism to BiGRU, the bidirectional temporal features extracted by BiGRU are utilized more effectively in prediction, thereby improving the model’s prediction performance. The final model is named AttBiGRU.
There are some important hyperparameters in the proposed AttBiGRU that have a significant impact on its performance. Research has shown that hyperparameters have a considerable impact on the performance of deep learning models, sometimes surpassing the impact of model selection itself [22,23,24]. Finding the optimal combination of hyperparameters for a deep learning model—known as hyperparameter optimization—first requires defining a search space, including the hyperparameters to be optimized and their corresponding ranges. Then, an optimization algorithm is used to find the best solution within this search space. Because there are many hyperparameters and each has a wide range, the search space contains an enormous number of hyperparameter combinations, all of which would need to be evaluated to find the best one [25,26,27]. Hyperparameter optimization has therefore always been a challenging task in deep learning applications.
When using AttBiGRU for predicting CO2 emissions, there are two types of hyperparameters: (1) hyperparameters related to the structure of AttBiGRU, such as the number of units in each BiGRU layer, the number of layers stacked in BiGRU, the dimension of attention in the attention layer, etc. (2) Hyperparameters related to AttBiGRU training, such as the step size of parameter updates during model training (i.e., learning rate), the proportion of neurons randomly dropped during training (denoted by dropout rate), the maximum number of iterations allowed during training (epochs), optimizer used in error backpropagation, loss function, etc. [28,29]. Due to the large number of hyperparameters and the large search space for each hyperparameter, optimizing all hyperparameters requires a huge amount of computation. This paper selects five of the most critical hyperparameters that have the greatest impact on the performance of AttBiGRU for optimization. The detailed descriptions of these selected hyperparameters can be found in Table 10.
The hyperparameter optimization of deep learning models is a non-convex problem that traditional gradient-based algorithms cannot solve. Recently, swarm intelligence optimization algorithms have been proven to be effective in hyperparameter optimization [30]. Swarm intelligence optimization algorithms are a class of optimization algorithms based on collective collaboration and heuristic search, which mimic collective behavior in nature to find the optimal or a near-optimal solution to a problem [31]. At present, the mainstream swarm intelligence optimization algorithms include GWO (Grey Wolf Optimizer), BWO (Beluga Whale Optimization), SCA (Sine Cosine Algorithm), SSA (Sparrow Search Algorithm), SOA (Seagull Optimization Algorithm), DE (Differential Evolution), MFO (Moth Flame Optimization), FPA (Flower Pollination Algorithm), and PSO (Particle Swarm Optimization) [32,33,34,35,36,37,38,39,40], among others. Although a large number of swarm intelligence optimization algorithms have been proposed, according to the “No Free Lunch” theorem, no optimization algorithm is omnipotent in solving all problems [41,42]: an algorithm that performs well on some problems will not perform as well on others, and no single algorithm performs excellently on all problems [43]. Therefore, it is still very important to improve existing algorithms to enhance their ability to optimize a certain type of problem.
AOA was proposed by Abualigah et al. in 2021 [44]. It has the advantages of strong global search capability, fast convergence speed, and good adaptability and robustness. Experiments on benchmark functions have shown that its performance surpasses GWO, BWO, SCA, SSA, SOA, DE, MFO, FPA, and PSO. However, the original AOA suffers from two issues: (1) an imbalance between the exploration and exploitation phases; (2) over-reliance on the historical optimal position during position updates in the exploration and exploitation phases, which makes the algorithm easily get stuck in local optima. To address these two issues, we propose an improved version of AOA, the chaotic mapping and Gaussian mutation Arithmetic Optimization Algorithm (CGAOA), which adds chaotic mapping and Gaussian mutation to the original algorithm.
CGAOA is used to optimize the five key hyperparameters of the carbon dioxide emission prediction model AttBiGRU (the optimized hyperparameters are described in detail in Table 10).
The contributions of this paper are as follows:
(1)
This paper proposes a CGAOA-AttBiGRU framework for CO2 emission prediction, in which AttBiGRU is a deep learning model for CO2 emission prediction and CGAOA is an improved AOA algorithm for hyperparameter optimization of AttBiGRU.
(2)
The proposed CGAOA-AttBiGRU was compared with ARMA, ARIMA, SVM, ANN, GRU, LSTM, and BiGRU in predicting CO2 emissions from four sectors in China. The results show that our proposed model is significantly superior to the comparison models.
This paper is structured as follows: Section 2 provides a review of CO2 emission prediction models and hyperparameter optimization methods. Section 3 introduces the CO2 emission prediction model AttBiGRU and the improved optimization algorithm CGAOA. Section 4 presents the comparative experiments of CGAOA, AOA, and nine other swarm intelligence optimization algorithms on 24 benchmark functions, and then describes the application of CGAOA and AttBiGRU to predicting CO2 emissions in China. Finally, Section 5 concludes the paper.

2. Literature Review

As mentioned earlier, predicting CO2 emissions involves two key steps: (1) establishing a prediction model and (2) optimizing the model. This section reviews current research on CO2 emission prediction models and on optimization methods separately.

2.1. Research on CO2 Emissions Prediction Models

There are various methods for predicting CO2 emissions. For example, Kavoosi et al. forecasted global CO2 emissions using linear and nonlinear equation models [45]. Lotfalipour et al. predicted Iran’s CO2 emissions for the following 10 years using the Autoregressive Integrated Moving Average (ARIMA) model [46]. ARIMA is a linear model, so its accuracy is insufficient when modeling nonlinear CO2 emission series. Sun et al. utilized support vector machines (SVM) to forecast CO2 emissions from three major industries in China [47]. Zhao et al. proposed a hybrid data sampling regression model, MIDAS, and combined it with BP neural networks to predict the CO2 emissions of the United States [48]. SVM and BP neural networks can be used for nonlinear modeling, but they do not consider the temporal variation of CO2 emissions. Wang et al. used RNN to predict CO2 emissions [49]. RNN is a time series prediction model that can describe the temporal variation patterns in a series. However, RNN suffers from a data-forgetting phenomenon in long-term series prediction. LSTM solves the data-forgetting problem of RNN in long-term time series prediction through its gate structure, and has also been introduced into CO2 emission prediction [50,51]. Subsequently, GRU—with a simpler structure than LSTM—was proposed and applied to predict CO2 emissions [52]. In recent years, researchers have combined attention techniques with LSTM and GRU to enhance their ability to focus on important features, thereby improving prediction accuracy. For example, Cao et al. proposed the TA-GRU model by adding an attention mechanism to the GRU model to predict CO2 emissions for 27 provincial-level administrative regions in China [53]. Research has shown that LSTM and GRU models with attention have better predictive abilities. The related works on CO2 emission prediction are summarized in Table 1.

2.2. Research on Models’ Hyperparameter Optimization

From Table 1, it can be seen that in the field of CO2 emission prediction, research has mainly focused on designing different prediction models, with little attention to model hyperparameter optimization. The hyperparameters in existing prediction models are set manually by researchers. Manual setting of hyperparameters is easily influenced by personal subjective opinions, as it is based on the experience and intuition of researchers. In particular, many hyperparameters are continuous, and manually chosen values are almost never the optimal ones [54]. That is to say, a model with manually set hyperparameters cannot achieve its optimal performance.
In other applications of deep learning models, hyperparameter optimization methods include grid search, random search, Bayesian optimization, and swarm intelligence optimization [55,56,57,58].
Grid search is an automatic hyperparameter optimization algorithm that is widely used in hyperparameter optimization for deep learning [59,60,61]. In the grid search algorithm, each hyperparameter is first discretized to generate a discrete search space. Then, an exhaustive search is performed within this space to determine the optimal combination of hyperparameters [62]. Grid search automates hyperparameter optimization and thus avoids the dependence of manual methods on expert experience. It is effective for low-dimensional optimization but becomes computationally expensive for high-dimensional optimization problems (when there are many hyperparameters). Additionally, in grid search, continuous-valued hyperparameters must be discretized, so not all possible hyperparameter values are included in the search space. Therefore, grid search can only find suboptimal solutions [63,64].
Random search is a commonly used optimization algorithm [65]. It randomly samples the hyperparameter space and evaluates each sampling point, repeating the process to find the best combination of hyperparameters. Its time complexity is lower than grid search, but due to random selection, it is almost impossible to find the optimal combination of hyperparameters. Furthermore, the random search algorithm cannot learn from historical iterations, resulting in unstable results [66,67].
The Bayesian optimization method uses probability models to “learn” from previous iterations and guides the search towards the optimal hyperparameters in the search space [68]. Compared with memory-less grid search and random search methods, Bayesian optimization can find better parameters in fewer iterations. However, when the search space of hyperparameters is complex, Bayesian optimization tends to focus on the most promising region during the exploration process, which may lead to being trapped in local optima [69,70].
In addition to the above optimization algorithms, there is a class of population-based stochastic optimization algorithms called swarm intelligence (SI) optimization algorithms. These population-based optimization algorithms simulate the optimization process of various animal or object populations to find the optimal solution of the model [71]. These SI algorithms are simple, flexible, efficient, and have a low dependence on the prior knowledge of the problem, and are widely used in the optimization of deep learning models in other application fields, such as wind speed forecasting [72,73,74], image recognition [75,76], medical diagnosis [77,78], water resources management [79,80], stock price forecasting [81,82], solar radiation forecasting [83,84], and fault diagnosis [85]. However, no application of swarm intelligence optimization algorithms has been reported in the field of CO2 emissions forecasting.

3. Methods

This paper proposes a framework for CO2 emission prediction, named CGAOA-AttBiGRU. In this framework, AttBiGRU is a deep learning model used for predicting CO2 emissions, and CGAOA is a swarm intelligence optimization algorithm used to optimize five important hyperparameters of AttBiGRU. This section introduces the prediction model and the optimization algorithm in detail.

3.1. CO2 Emissions Prediction Model: AttBiGRU

The main network structure of the proposed AttBiGRU adopts BiGRU, which is a variant of GRU. A GRU unit includes two gate structures: a reset gate and an update gate. Compared with LSTM, GRU has a simpler structure and runs faster, so it can be used to build larger-scale prediction models. Figure 1 shows the internal structure of a GRU unit.
The calculation of GRU is shown in Formulas (1)–(4).
$z_t = \sigma(W_z [h_{t-1}, x_t] + b_z)$ (1)
$r_t = \sigma(W_r [h_{t-1}, x_t] + b_r)$ (2)
$\tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}, x_t] + b_h)$ (3)
$h_t = (1 - z_t) \odot \tilde{h}_t + z_t \odot h_{t-1}$ (4)
where $z_t$ is the update gate, $r_t$ is the reset gate, $\tilde{h}_t$ is the candidate hidden state, and $h_t$ is the hidden state. $W_z$, $W_r$, and $W_h$ are learnable weight matrices; $b_z$, $b_r$, and $b_h$ are learnable biases.
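For concreteness, a minimal NumPy sketch of one GRU forward step following Equations (1)–(4) is given below; the function name, weight shapes, and concatenation layout are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU step following Eqs. (1)-(4); each weight matrix acts on [h_{t-1}, x_t]."""
    concat = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    z_t = sigmoid(Wz @ concat + bz)                  # update gate, Eq. (1)
    r_t = sigmoid(Wr @ concat + br)                  # reset gate, Eq. (2)
    concat_r = np.concatenate([r_t * h_prev, x_t])   # reset applied to the history
    h_tilde = np.tanh(Wh @ concat_r + bh)            # candidate hidden state, Eq. (3)
    h_t = (1.0 - z_t) * h_tilde + z_t * h_prev       # new hidden state, Eq. (4)
    return h_t
```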
A BiGRU unit contains a forward GRU unit and a backward GRU unit, which are, respectively, used to extract forward and backward temporal features from sequences. BiGRU has stronger sequence modeling ability than basic GRU; therefore, this paper chooses it as the basic unit of the predicting model. Furthermore, we have added an attention mechanism to the model to adaptively weight the bidirectional temporal features extracted by BiGRU, so as to improve its performance. The specific structure of our proposed model, called AttBiGRU, is shown in Figure 2.
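As a rough illustration of this architecture (not the authors' exact configuration), a single-layer BiGRU followed by a simple attention-weighted pooling can be sketched in Keras as follows; the layer sizes, the additive attention formulation, and all names are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_attbigru(timesteps=12, features=1, units=64):
    """Sketch of a BiGRU model whose bidirectional temporal features are attention-weighted."""
    inputs = layers.Input(shape=(timesteps, features))
    # BiGRU returns the full sequence of forward/backward temporal features.
    x = layers.Bidirectional(layers.GRU(units, return_sequences=True))(inputs)
    # Simple additive attention: score each time step, normalize, take the weighted sum.
    scores = layers.Dense(1, activation="tanh")(x)            # (batch, timesteps, 1)
    weights = layers.Softmax(axis=1)(scores)                  # attention weights over time
    context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])
    outputs = layers.Dense(1)(context)                        # next-day CO2 emission
    return models.Model(inputs, outputs)

model = build_attbigru()
model.compile(optimizer="adagrad", loss="mse")
```

A stacked variant would simply add further Bidirectional GRU layers (with return_sequences=True) before the attention block.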

3.2. CGAOA

The proposed AttBiGRU model contains many hyperparameters that can affect its performance. Therefore, an optimization algorithm is needed to find the optimal hyperparameters. In this paper, we improved the arithmetic optimization algorithm (AOA) by adding chaotic mapping and Gaussian mutation and proposed CGAOA. CGAOA is then used to optimize AttBiGRU. In this section, the original AOA and our improved CGAOA will be introduced separately.

3.2.1. The Original AOA

The arithmetic optimization algorithm (AOA) was proposed by Abualigah et al. in 2021 [44]. It has the advantages of strong global search ability, fast convergence speed, good adaptability, and robustness. Experiments on benchmark functions show that its performance exceeds that of many optimization algorithms [40]. AOA is a population-based metaheuristic that explores and exploits the search space using the four basic arithmetic operators—addition (+), subtraction (−), multiplication (×), and division (÷)—and can solve optimization problems without calculating derivatives. The algorithm primarily consists of three stages: initialization, exploration, and exploitation. These stages are elaborated upon as follows:
(1)
Initialization phase
In AOA, the first phase is to generate a set of candidate solutions (X) randomly, as is shown in Equation (5).
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix} \quad (5)$$
where $d$ represents the number of hyperparameters to be optimized, $n$ is the number of solutions, and $x_{i,j}$ indicates the position of the $i$-th solution in the $j$-th dimension.
Then, exploration and exploitation are determined by the math optimizer accelerated (MOA) function, which is shown in Equation (6).
$MOA(t) = Min + t \times \dfrac{Max - Min}{T}$ (6)
where $MOA(t)$ denotes the function value at the $t$-th iteration, $T$ is the maximum number of iterations, and $Min$ and $Max$ represent the minimum and maximum values of the MOA function, respectively.
In each iteration, a random number $r_1$ is generated first. If $r_1$ is larger than $MOA(t)$, the position is updated using the exploration-phase formula; otherwise, it is updated using the exploitation-phase formula.
To balance exploration and exploitation, AOA adopts a parameter called the math optimizer probability (MOP), which is defined in Equation (7).
$MOP(t) = 1 - \dfrac{t^{1/\alpha}}{T^{1/\alpha}}$ (7)
where $\alpha$ is a sensitive parameter that determines the level of exploitative search during the optimization. In the original work [44], $\alpha = 5$ was found to perform best.
(2)
Exploration phase
In the exploration phase, AOA provides two methods for updating positions: one using the division (D) operator and the other using the multiplication (M) operator. The position update formulas for this phase are shown in Equation (8).
$$x_{i,j}(t+1) = \begin{cases} best(x_j) \div (MOP(t) + \epsilon) \times ((UB_j - LB_j) \times \mu + LB_j), & r_2 < 0.5 \\ best(x_j) \times MOP(t) \times ((UB_j - LB_j) \times \mu + LB_j), & \text{otherwise} \end{cases} \quad (8)$$
where $r_2$ is a random number in (0, 1), $x_{i,j}(t+1)$ represents the $i$-th solution in the $j$-th dimension at the $(t+1)$-th iteration, and $best(x_j)$ denotes the $j$-th dimension of the best solution obtained so far. $\epsilon$ is a small number used to avoid division by zero. $UB_j$ and $LB_j$ represent the upper and lower boundaries of the $j$-th hyperparameter to be optimized, respectively. $\mu$ is a control parameter used to adjust the search process and is typically set to 0.499 [44].
(3)
Exploitation phase
In the Exploitation phase, AOA uses addition (A) and subtraction (S) to update positions, as shown in Equation (9).
$$x_{i,j}(t+1) = \begin{cases} best(x_j) - MOP(t) \times ((UB_j - LB_j) \times \mu + LB_j), & r_3 < 0.5 \\ best(x_j) + MOP(t) \times ((UB_j - LB_j) \times \mu + LB_j), & \text{otherwise} \end{cases} \quad (9)$$
where $r_3$ is a random number in (0, 1).
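To make the interplay of Equations (5)–(9) concrete, the sketch below implements one AOA iteration over a candidate population in NumPy; the function signature, the default values of Min, Max, α, and μ, and the final boundary clipping are illustrative assumptions.

```python
import numpy as np

def aoa_step(X, best, t, T, lb, ub, Min=0.2, Max=1.0, alpha=5, mu=0.499, eps=1e-12):
    """One AOA iteration: Eq. (6) MOA selects the phase, Eq. (7) MOP scales the step."""
    n, d = X.shape
    moa = Min + t * (Max - Min) / T                            # Eq. (6)
    mop = 1.0 - (t ** (1.0 / alpha)) / (T ** (1.0 / alpha))    # Eq. (7)
    X_new = X.copy()
    for i in range(n):
        for j in range(d):
            r1, r2, r3 = np.random.rand(3)
            step = (ub[j] - lb[j]) * mu + lb[j]
            if r1 > moa:                       # exploration: division or multiplication, Eq. (8)
                if r2 < 0.5:
                    X_new[i, j] = best[j] / (mop + eps) * step
                else:
                    X_new[i, j] = best[j] * mop * step
            else:                              # exploitation: subtraction or addition, Eq. (9)
                if r3 < 0.5:
                    X_new[i, j] = best[j] - mop * step
                else:
                    X_new[i, j] = best[j] + mop * step
    return np.clip(X_new, lb, ub)              # keep solutions inside the search space
```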

3.2.2. Proposed CGAOA

The original AOA has advantages such as a simple structure, few parameters, and low computational cost, but it also has shortcomings: an imbalance between the exploration and exploitation phases, and a tendency to fall into local optima. To address these two issues, we add two mechanisms to the basic AOA. Our improvements are detailed below.
(1)
Chaotic mapping mechanism
In the original AOA, whether the algorithm enters the exploration phase or the exploitation phase is determined by the value of MOA(t) and a random number $r_1$ that varies between 0 and 1. When $r_1$ is greater than MOA(t), the algorithm enters the exploration phase; otherwise, it enters the exploitation phase. However, MOA is a linearly increasing function of the iteration number t, so its value is small in the early stages of iteration and large in the later stages. Thus, the original AOA is likely to enter the exploration phase in the early stages of iteration and the exploitation phase in the later stages, which leads to an imbalance between exploration and exploitation across different iteration stages. In addition, the MOP in the original AOA is a monotonically decreasing function of the iteration number t, which limits the diversity of position updates in the algorithm. To mitigate these two drawbacks, we use a chaotic mapping mechanism in MOA and MOP.
Chaos is a deterministic stochastic dynamic system, and chaotic mappings can be considered as a source of randomness. Adding chaotic mapping can effectively introduce diversity into the search process by blending determinism and randomness [86]. Some researchers have integrated chaotic mapping mechanisms into various optimization algorithms to enhance random diversity, thereby improving the ability to search for optimal or near-optimal solutions in complex multimodal scenarios.
In this paper, we add the logistic chaotic mapping mechanism into MOA and MOP to balance the exploration and exploitation phase, and to introduce uncertainty and fluctuation into the position updates.
The calculation formula of logistic chaotic mapping used in this paper is shown in Equation (10):
$p_{t+1} = a \cdot p_t (1 - p_t)$ (10)
where $t$ represents the iteration number, $p_t$ denotes the $t$-th chaotic number, and $p_0$ is a random number in [0, 1]. $a$ is set to 4.
After adding the logistic chaotic mapping, the new formulas of MOA and MOP are shown in Equations (11) and (12).
$MOA(t) = a \cdot p_t (1 - p_t) \times \left(\dfrac{t}{T}\right)^{1/8}$ (11)
$MOP(t) = a \cdot p_t (1 - p_t) \times \left(1 - \dfrac{t}{T}\right)^{1/2}$ (12)
where the value of a is 4, t represents the current iteration number, and T represents the maximum number of iterations. The power of 1/8 in Formula (11) is determined experimentally. We conducted multiple experiments and found that the algorithm performs best when the power in MOA is 1/8.
Figure 3 compares MOA and MOP with and without the logistic chaotic mapping mechanism. It can be seen that the MOA and MOP curves without logistic chaotic mapping exhibit fixed and monotonic changes. In comparison, the MOA and MOP curves with logistic chaotic mapping exhibit good randomness, which increases the diversity of position updates and gives the algorithm better exploration and exploitation capabilities.
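The chaos-driven schedules of Equations (10)–(12) can be sketched as follows; the function name and the random seeding of $p_0$ are assumptions.

```python
import numpy as np

def chaotic_moa_mop(T, a=4.0, seed=None):
    """Generate MOA/MOP sequences driven by the logistic map, Eqs. (10)-(12)."""
    rng = np.random.default_rng(seed)
    p = rng.random()                                  # p_0 drawn from [0, 1]
    moa, mop = np.empty(T), np.empty(T)
    for t in range(1, T + 1):
        chaos = a * p * (1.0 - p)                     # logistic chaotic term, Eq. (10)
        moa[t - 1] = chaos * (t / T) ** (1.0 / 8.0)          # Eq. (11)
        mop[t - 1] = chaos * (1.0 - t / T) ** (1.0 / 2.0)    # Eq. (12)
        p = chaos                                     # carry the chaotic state forward
    return moa, mop
```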
(2)
Gaussian mutation mechanism
The basic AOA algorithm also has a problem of over-reliance on the historical best position during the position update process. This will lead to the algorithm hovering around the local optimum during the optimization process, rather than exploring the entire search space to find the global optimum. To address this issue, Çelik E [87] introduced the Gaussian mutation mechanism into the AOA algorithm. Inspired by his work, we introduced a Gaussian mutation mechanism after the exploration and exploitation phase of AOA. The newly added position update formula is as follows in Equation (13).
$$x_{i,j}(t+1) = \begin{cases} 2 \times MOP \times f(u) \times x_{i,j}(t), & r_4 \le 0.5 \\ 2 \times MOP \times f(u) + x_{i,j}(t), & r_4 > 0.5 \end{cases} \quad (13)$$
where $r_4$ is a newly generated random number in the range 0 to 1, $f(u)$ represents the probability density function of a Gaussian distribution, and $u$ is a random variable that follows a standard Gaussian distribution with mean 0 and standard deviation 1. $f(u)$ is calculated as shown in Equation (14).
$f(u) = \dfrac{1}{\sqrt{2\pi}} e^{-\frac{u^2}{2}}$ (14)
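A brief sketch of how the Gaussian mutation of Equations (13) and (14) might be applied to a solution vector; the vectorized form and the final boundary clipping are our assumptions.

```python
import numpy as np

def gaussian_mutation(x, mop, lb, ub):
    """Gaussian mutation of one solution x, following Eqs. (13)-(14)."""
    u = np.random.standard_normal(x.shape)                  # u ~ N(0, 1)
    f_u = np.exp(-u ** 2 / 2.0) / np.sqrt(2.0 * np.pi)      # Gaussian density, Eq. (14)
    r4 = np.random.rand(*x.shape)
    mutated = np.where(r4 <= 0.5,
                       2.0 * mop * f_u * x,                 # multiplicative branch of Eq. (13)
                       2.0 * mop * f_u + x)                 # additive branch of Eq. (13)
    return np.clip(mutated, lb, ub)                         # keep the solution in bounds
```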
The flowchart of the final proposed CGAOA is shown in Figure 4, where the purple sections represent our improvements.

3.2.3. CGAOA Computational Complexity

Analyzing the computational complexity of an optimization algorithm is an essential part of evaluating it, because improved performance should not come at an excessive cost in time and memory. The computational complexity mainly depends on three quantities: the problem dimension d, the population size n, and the maximum number of iterations T.
Firstly, the time complexity of initializing n search agents in a search space of dimension d is O(2 × n × d), and the time complexity of updating the vectors of n search agents over T iterations is O(T × 2 × n × d). Secondly, the computational complexity of MOA and MOP after introducing the chaotic mapping mechanism is O((n − 1)²). In the exploration stage of CGAOA, the time complexity of all individual position updates and boundary control strategies is O(2 × n × d); in the exploitation stage, it is also O(2 × n × d). In summary, the overall complexity of CGAOA is O(T × 2 × n × d) + O((n − 1)²) + O(2 × n × d) + O(2 × n × d).

4. Results and Discussion

4.1. CGAOA Performance Verification Experiments

In this section, experiments are conducted to test the optimization ability of the proposed CGAOA, including its exploitation ability, exploration ability, and local-optimum avoidance ability. These experiments include ablation experiments and comparative experiments with nine state-of-the-art metaheuristic algorithms. All experiments are conducted on a computer equipped with an Intel(R) Xeon(R) CPU E5-2686 v4 12-core processor and an NVIDIA GeForce RTX 3060 Ti graphics card with 8 GB VRAM. To make the comparisons fair, all algorithms share the same experimental settings: the maximum number of iterations (T) is set to 1000 and the population size (n) is set to 50.

4.1.1. Benchmark Functions

To evaluate the performance of CGAOA, we conducted comparative experiments on 24 different benchmark functions. These 24 test functions include nine unimodal benchmark functions (F1–F9, as shown in Table 2) used to assess the algorithm’s exploitation capability, nine multimodal benchmark functions (F10–F18, as shown in Table 3) used to evaluate the algorithm’s exploration capability and six composition functions from CEC2017 (F19–F24, as shown in Table 4) used to assess the local optimum avoidance capability. Range represents the boundary of variables, and F(min) represents the optimal value.
In this section, each algorithm is executed 30 times on each benchmark function to mitigate the influence of random factors. In this paper, the Friedman test method was used to sort and evaluate the results of all algorithms on the benchmark functions.
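As an illustration only (the data layout and the algorithm count are placeholders), the Friedman average ranks reported later could be computed with SciPy as follows.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# results[f, a]: mean fitness of algorithm a on benchmark function f (lower is better).
results = np.random.rand(24, 11)             # placeholder: 24 functions x 11 algorithms

# Rank the algorithms within each function (rank 1 = best), then average over functions.
ranks = np.vstack([rankdata(row) for row in results])
avg_rank = ranks.mean(axis=0)                # the "Avg" value used for the final ranking

# Friedman test of whether the algorithms differ significantly across the functions.
stat, p_value = friedmanchisquare(*[results[:, a] for a in range(results.shape[1])])
```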

4.1.2. Ablation Analysis of the CGAOA

This paper adds a logistic chaotic mapping mechanism and a Gaussian mutation mechanism to the basic AOA to obtain CGAOA. This section presents ablation experiments to verify how much each added mechanism improves the performance of the basic AOA. Table 5 lists the AOA variants with one or both mechanisms, in which “1” means the mechanism is added and “0” indicates the opposite. These AOA variants were compared on the 24 benchmark functions described earlier.
The experimental results are given in Table 6, where Avg indicates an algorithm’s average Friedman-test rank over the 24 benchmark functions; a lower Avg value indicates better performance. “+/−/=” summarizes whether CGAOA performs better than, worse than, or equally to the other AOA variants, and Rank represents the final ranking of each algorithm.
From Table 6, it can be seen that the performance of the basic AOA is the worst. Adding either the logistic chaotic mapping or the Gaussian mutation mechanism improves the performance of AOA. CGAOA ranks first, which means that adding both the logistic chaotic mapping and the Gaussian mutation mechanisms to AOA effectively improves its optimization ability.

4.1.3. Comparison with Other Algorithms

To test the performance of the proposed CGAOA, it was compared with nine other well-known algorithms. The comparative algorithms used in this paper are as follows:
  • Grey wolf optimizer (GWO) [30]
  • Beluga whale optimization (BWO) [31]
  • Sine cosine algorithm (SCA) [32]
  • Sparrow search algorithm (SSA) [33]
  • Seagull optimization algorithm (SOA) [34]
  • Differential evolution (DE) [35]
  • Moth flame optimization (MFO) [36]
  • Flower pollination algorithm (FPA) [37]
  • Particle swarm optimization (PSO) [38]
The parameter settings of the above comparative algorithms are shown in Table 7.
All algorithms are tested in the same environment, with a maximum of 1000 iterations and a population size of 50. To reduce the impact of randomness on the experimental results, each algorithm was run independently 30 times on each benchmark function, and the mean (Aver) and standard deviation (Std) of the 30 runs on each test function were calculated (presented in Table 8, with the best results in bold); a smaller Aver indicates better performance.
According to the Aver values in Table 8, CGAOA ranks first on 21 of the 24 functions. Based on the Aver values in Table 8, the Friedman test is then used to rank the fitness of all algorithms on the benchmark functions. The results of the Friedman test are presented in Figure 5, with the smallest average ranking indicating the best performance. CGAOA has the lowest average ranking on the unimodal, multimodal, and composition test functions, indicating that its optimization performance outperforms the competing algorithms on all three function types.
Meanwhile, the Wilcoxon rank-sum test is used to assess the significance of the differences between CGAOA and the competing algorithms. Table 9 reports the p-values from the Wilcoxon rank-sum test, where a p-value below 0.05 indicates a statistically significant advantage of CGAOA over its competitor; the non-significant results (p-value > 0.05) are highlighted in bold. The majority of the p-values are lower than 0.05, indicating that the superiority of CGAOA on the benchmark functions is statistically significant.
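For reference, such a pairwise comparison over the 30 independent runs could be performed with SciPy as sketched below; the array names are placeholders.

```python
import numpy as np
from scipy.stats import ranksums

cgaoa_runs = np.random.rand(30)       # placeholder: 30 final fitness values of CGAOA
rival_runs = np.random.rand(30)       # placeholder: 30 final fitness values of a competitor

stat, p_value = ranksums(cgaoa_runs, rival_runs)   # Wilcoxon rank-sum test
significant = p_value < 0.05                       # significant difference at the 5% level
```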
Figure 6, Figure 7 and Figure 8 show the fitness curves of each algorithm on each benchmark function during the iteration process. In the vast majority of benchmark functions, the fitness of the proposed CGAOA decreases faster and reaches the lowest final value, indicating that, compared with the competing algorithms, CGAOA converges faster and finds better solutions.
In summary, the results on 24 benchmark functions indicate that the proposed CGAOA algorithm is better than the comparison algorithms.

4.2. CGAOA-AttBiGRU Framework for CO2 Emission Forecasting

The previous experiments were conducted on benchmark functions. In this section, we apply the proposed CGAOA to a practical application: a framework for CO2 emission forecasting named CGAOA-AttBiGRU, in which AttBiGRU is a deep learning model for CO2 emission forecasting and CGAOA is used to optimize five hyperparameters of AttBiGRU.
For hyperparameter optimization of deep learning models, the optimization space is asymmetric and non-convex, and existing grid search, Bayesian optimization, and random search methods have limitations when dealing with such problems. The proposed CGAOA algorithm is well suited to optimizing deep learning models, mainly because the chaotic mapping increases the diversity of the initial hyperparameter values and of the hyperparameter updates, and the Gaussian mutation increases the amplitude of the hyperparameter updates. Combined with these two strategies, the AOA algorithm can converge quickly and find a near-globally-optimal solution.

4.2.1. CO2 Emission Data

The CO2 emission data in this section come from four sectors in China: the power, industry, transport, and residential sectors. The data cover the period from 1 January 2019 to 31 May 2023, and each sector contains 1612 sampling points over this period. Figure 9 shows all the data used in this section; the x-axis represents the date order starting from 1 January 2019, and the y-axis represents CO2 emissions measured in millions of metric tons.

4.2.2. Sample Making

In this paper, we use 12 consecutive days of data to predict the CO2 emission of the next day. The data samples are constructed using a sliding-window approach, where each sample consists of 12 data points as input and 1 data point as output (as illustrated in Figure 10, where the green data points represent the input and the red data point represents the output). With this method, a total of 1599 samples are obtained, of which the first 60% are used as the training set, the next 20% as the validation set, and the remaining 20% as the test set.
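A minimal sketch of this sliding-window sample construction and the 60/20/20 split; the series is a placeholder and the helper name is ours.

```python
import numpy as np

def make_samples(series, window=12):
    """Slide a 12-day window over the series; each window predicts the following day."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])          # 12 input days
        y.append(series[i + window])            # 1 output day
    return np.array(X), np.array(y)

series = np.random.rand(1612)                   # placeholder for one sector's daily emissions
X, y = make_samples(series)                     # yields len(series) - 12 samples
n_train, n_val = int(0.6 * len(X)), int(0.2 * len(X))
X_train, y_train = X[:n_train], y[:n_train]
X_val, y_val = X[n_train:n_train + n_val], y[n_train:n_train + n_val]
X_test, y_test = X[n_train + n_val:], y[n_train + n_val:]
```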

4.2.3. CGAOA-AttBiGRU Flowchart

When using AttBiGRU for CO2 emission forecasting, five hyperparameters have the greatest impact on the predictive performance of the AttBiGRU, including: (1) the number of units in each BiGRU layer (unit); (2) the proportion of neurons randomly dropped during training (dropout_rate); (3) the number of samples used in each training iteration (batch_size); (4) the step size of parameter updates during model training (learning_rate); and (5) the hyperparameters of the Attention layer (attention_column).
The improved CGAOA is used to optimize these five hyperparameters. Firstly, the upper and lower boundaries of these five hyperparameters should be given to form the search space. The search space for these five hyperparameters is shown in Table 10.
Secondly, CGAOA is initialized with a maximum number of iterations T = 200, a dimension d = 5, and a population size n = 30. Then, the loss of AttBiGRU is used as the fitness function (in this paper, the loss function is RMSE), and the solver of AttBiGRU is set to AdaGrad. Finally, the CGAOA algorithm searches for the optimal hyperparameters of AttBiGRU. Figure 11 shows the flowchart of the entire CGAOA-AttBiGRU framework.
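The coupling between CGAOA and AttBiGRU can be sketched as a fitness function that decodes one candidate solution, trains the model, and returns its RMSE; the hyperparameter decoding, the use of the validation split for the fitness value, the epoch count, and the build_model interface are simplified assumptions.

```python
import numpy as np

def fitness(hparams, build_model, X_train, y_train, X_val, y_val):
    """Decode one CGAOA solution into hyperparameters, train AttBiGRU, return RMSE."""
    units, dropout_rate, batch_size, learning_rate, attention_column = hparams
    model = build_model(units=int(round(units)),
                        dropout_rate=float(dropout_rate),
                        learning_rate=float(learning_rate),
                        attention_column=int(round(attention_column)))
    model.fit(X_train, y_train,
              batch_size=int(round(batch_size)),
              epochs=50, verbose=0,
              validation_data=(X_val, y_val))
    pred = model.predict(X_val, verbose=0).ravel()
    return float(np.sqrt(np.mean((y_val - pred) ** 2)))   # RMSE used as the fitness value
```

CGAOA then iterates its position updates over the five-dimensional search space of Table 10, calling this fitness function for every candidate solution and retaining the best one.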

4.2.4. Evaluation Metrics

In this section, three indicators are used for model evaluation: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). The calculation formulas for these indicators are as follows:
$MAE = \dfrac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
$RMSE = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
$MAPE = \dfrac{1}{n} \sum_{i=1}^{n} \left| \dfrac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$
where $n$ is the number of test samples, and $y_i$ and $\hat{y}_i$ are the true and predicted values of sample $i$, respectively.
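The three metrics can be computed directly from the test-set predictions, as in the brief sketch below.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, RMSE, and MAPE as defined above (y_true must be nonzero for MAPE)."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y_true)) * 100.0
    return mae, rmse, mape
```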

4.2.5. Comparison of AttBiGRU Optimized by Various Algorithms

In this section, in addition to CGAOA, we also used 12 other optimization algorithms to optimize the proposed CO2 emission prediction model AttBiGRU and compared the optimization performance of the different algorithms. The comparative algorithms include three traditional optimization algorithms (Grid Search, Random Search, and Bayesian optimization) and nine state-of-the-art swarm intelligence optimization algorithms (GWO, BWO, SCA, SSA, SOA, DE, MFO, FPA, and PSO). To minimize the error arising from experimental contingency, each comparative experiment was repeated 30 times.
The experimental dataset consists of the CO2 emission data of the four sectors in China. The results are shown in Table 11, Table 12, Table 13 and Table 14, where the best results are in bold, and unit, dr, bs, lr and ac are the optimal hyperparameters found by each algorithm, corresponding to unit, dropout_rate, batch_size, learning_rate, and attention_column described in Table 10, respectively.
Based on the results from Table 11, Table 12, Table 13 and Table 14, it is evident that the performances of traditional optimization algorithms (i.e., grid search, random search, Bayesian) are significantly lower than those of swarm intelligence optimization algorithms. Among the 10 swarm intelligence optimization algorithms, our improved CGAOA is the best.
It can also be seen that using different optimization algorithms to optimize the same prediction model, AttBiGRU, leads to significantly different performance. This indicates that in deep learning applications, model optimization is important, and may even be more important than model selection.

4.2.6. Comparison with Other Models

In the previous section, we compared the performance of AttBiGRU optimized by different algorithms and found that the model optimized by the CGAOA algorithm performed the best. In this section, we compare the optimal AttBiGRU with other machine learning and deep learning models, including SVM, ANN, GRU, LSTM, and BiGRU, as well as with two statistical models, ARMA and ARIMA. For a fair comparison, all models were optimized using the proposed CGAOA, and each experiment was repeated 30 times. The evaluation metrics in Table 15 are the averages over the 30 runs. The results of the various models are shown in Table 15 (the best results are in bold), where Rank indicates the final rank of a model. It can be seen that AttBiGRU is significantly better than the comparison models in all four sectors.
Figure 12, Figure 13, Figure 14 and Figure 15 show the comparison between the predicted value and the actual value of each model in the test set in four sectors. Among them, the x-axis represents time, the y-axis represents CO2 emission, the red area represents that the predicted value is greater than the actual value, and the blue area represents that the predicted value is less than the actual value. From these figures, we can see that our proposed AttBiGRU has the best performance, indicating that our model has the smallest prediction error.

5. Conclusions

This paper proposed a deep learning framework, CGAOA-AttBiGRU, for CO2 emission prediction, in which AttBiGRU is a deep learning model used for CO2 emission prediction and CGAOA is an optimization algorithm used to optimize AttBiGRU. We first conducted experiments on 24 benchmark functions to verify the performance of the proposed CGAOA, including ablation experiments and comparisons with nine other popular metaheuristic algorithms. The results show that the improved CGAOA is superior to the comparison algorithms on the 24 benchmark functions. Then, AttBiGRU was used to predict the CO2 emissions of four sectors in China. During this process, CGAOA, nine other metaheuristic optimization algorithms, and three traditional optimization algorithms were each used to optimize AttBiGRU. The results indicate that the AttBiGRU optimized by CGAOA has the best performance. Finally, the optimized AttBiGRU model was compared with seven statistical, machine learning, and deep learning models: ARMA, ARIMA, ANN, SVM, GRU, LSTM, and BiGRU. The results indicate that the proposed AttBiGRU has the best predictive performance in all four sectors.

Author Contributions

Conceptualization, H.L. and Y.W.; methodology, D.T. and H.W.; writing—original draft: Y.C.; formal analysis, Y.W.; software: H.W.; writing—review and editing: H.L., Y.W., D.T., Y.C. and H.W.; supervision, D.T.; funding acquisition, H.L. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science Research Project of Hebei Education Department (ZC2024028), Natural Science Foundation of Hebei Province (D2023512004), Langfang City science and Technology support plan Project (2023011105), Science and Technology Innovation Program for Postgraduate students in IDP subsidized by Fundamental Research Funds for the Central Universities (ZY20240339), Langfang City science and Technology support plan Project (23011064).

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Aichele, R.; Felbermayr, G. Kyoto and carbon leakage: An empirical analysis of the carbon content of bilateral trade. Rev. Econ. Stat. 2015, 97, 104–115. [Google Scholar] [CrossRef]
  2. Chiroma, H.; Abdul-Kareem, S.; Khan, A.; Nawi, N.M.; Gital, A.Y.U.; Shuib, L.; Abubakar, A.I.; Rahman, M.Z.; Herawan, T. Global warming: Predicting OPEC carbon dioxide emissions from petroleum consumption using neural network and hybrid cuckoo search algorithm. PLoS ONE 2015, 10, e0136140. [Google Scholar] [CrossRef] [PubMed]
  3. Chen, W. The costs of mitigating carbon emissions in China: Findings from China MARKAL-MACRO modeling. Energy Policy 2005, 33, 885–896. [Google Scholar] [CrossRef]
  4. Deschênes, O.; Greenstone, M. The economic impacts of climate change: Evidence from agricultural output and random fluctuations in weather. Am. Econ. Rev. 2007, 97, 354–385. [Google Scholar] [CrossRef]
  5. van der Gaast, W.; Sikkema, R.; Vohrer, M. The contribution of forest carbon credit projects to addressing the climate change challenge. Clim. Policy 2018, 18, 42–48. [Google Scholar] [CrossRef]
  6. Anjos, M.F.; Feijoo, F.; Sankaranarayanan, S. A multinational carbon-credit market integrating distinct national carbon allowance strategies. Appl. Energy 2022, 319, 119181. [Google Scholar] [CrossRef]
  7. Liu, L.; Chen, C.; Zhao, Y.; Zhao, E. China’s carbon-emissions trading: Overview, challenges and future. Renew. Sustain. Energy Rev. 2015, 49, 254–266. [Google Scholar] [CrossRef]
  8. Zhang, Y.J. The impact of financial development on carbon emissions: An empirical analysis in China. Energy Policy 2011, 39, 2197–2203. [Google Scholar] [CrossRef]
  9. Koutnik, J.; Greff, K.; Gomez, F.; Schmidhuber, J. A clockwork RNN. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 22–24 June 2014; pp. 1863–1871. [Google Scholar]
  10. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 3, 1235–1270. [Google Scholar] [CrossRef]
  11. Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600. [Google Scholar]
  12. Li, Y.; Zhu, Z.; Kong, D.; Han, H.; Zhao, Y. EA-LSTM: Evolutionary attention-based LSTM for time series prediction. Knowl.-Based Syst. 2019, 181, 104785. [Google Scholar] [CrossRef]
  13. Hussain, B.; Afzal, M.K.; Ahmad, S.; Mostafa, A.M. Intelligent traffic flow prediction using optimized GRU model. IEEE Access 2021, 9, 100736–100746. [Google Scholar] [CrossRef]
  14. Yan, J.; Liu, J.; Yu, Y.; Xu, H. Water quality prediction in the luan river based on 1-drcnn and bigru hybrid neural network model. Water 2021, 13, 1273. [Google Scholar] [CrossRef]
  15. Zhao, X.; Kang, H.; Feng, T.; Meng, C.; Nie, Z. A hybrid model based on LFM and BiGRU toward research paper recommendation. IEEE Access 2020, 8, 188628–188640. [Google Scholar] [CrossRef]
  16. Zhi, Y.; Bao, Z.; Zhang, S.; He, R. BiGRU based online multi-modal driving maneuvers and trajectory prediction. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2021, 235, 3431–3441. [Google Scholar] [CrossRef]
  17. Liu, J.; Yang, Y.; Lv, S.; Wang, J.; Chen, H. Attention-based BiGRU-CNN for Chinese question classification. J. Ambient Intell. Humaniz. Comput. 2019, 1–12. [Google Scholar] [CrossRef]
  18. Bao, K.; Bi, J.; Ma, R.; Sun, Y.; Zhang, W.; Wang, Y. A Spatial-Reduction Attention-Based BiGRU Network for Water Level Prediction. Water 2023, 15, 1306. [Google Scholar] [CrossRef]
  19. Zhu, Q.; Jiang, X.; Ye, R. Sentiment analysis of review text based on BiGRU-attention and hybrid CNN. IEEE Access 2021, 9, 149077–149088. [Google Scholar] [CrossRef]
  20. Chen, J.; Zhang, J.; Chen, H.; Zhao, Y.; Wang, H. A TDV attention-based BiGRU network for AIS-based vessel trajectory prediction. iScience 2023, 26, 106383. [Google Scholar] [CrossRef]
  21. Chi, D.; Yang, C. Wind power prediction based on WT-BiGRU-attention-TCN model. Front. Energy Res. 2023, 11, 1156007. [Google Scholar] [CrossRef]
  22. Khalid, R.; Javaid, N. A survey on hyperparameters optimization algorithms of forecasting models in smart grid. Sustain. Cities Soc. 2020, 61, 102275. [Google Scholar] [CrossRef]
  23. Bergstra, J.; Yamins, D.; Cox, D. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the International Conference on Machine Learning, PMLR, Atlanta, GA, USA, 17–19 June 2013; pp. 115–123. [Google Scholar]
  24. Probst, P.; Boulesteix, A.L.; Bischl, B. Tunability: Importance of hyperparameters of machine learning algorithms. J. Mach. Learn. Res. 2019, 20, 1934–1965. [Google Scholar]
  25. Bergstra, J.; Komer, B.; Eliasmith, C.; Yamins, D.; Cox, D.D. Hyperopt: A python library for model selection and hyperparameter optimization. Comput. Sci. Discov. 2015, 8, 014008. [Google Scholar] [CrossRef]
  26. MacKay, D.J.C. Comparison of approximate methods for handling hyperparameters. Neural Comput. 1999, 11, 1035–1068. [Google Scholar] [CrossRef]
  27. Zhang, B.; Rajan, R.; Pineda, L.; Lambert, N.; Biedenkapp, A.; Chua, K.; Hutter, F.; Calandra, R. On the importance of hyperparameter optimization for model-based reinforcement learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual, 13–15 April 2021; pp. 4015–4023. [Google Scholar]
  28. Mai, L.; Koliousis, A.; Li, G.; Brabete, A.O.; Pietzuch, P. Taming hyper-parameters in deep learning systems. ACM SIGOPS Oper. Syst. Rev. 2019, 53, 52–58. [Google Scholar] [CrossRef]
  29. Kaur, S.; Aggarwal, H.; Rani, R. Hyper-parameter optimization of deep learning model for prediction of Parkinson’s disease. Mach. Vis. Appl. 2020, 31, 32. [Google Scholar] [CrossRef]
  30. Yeh, W.C.; Lin, Y.P.; Liang, Y.C.; Lai, C.M.; Huang, C.L. Simplified swarm optimization for hyperparameters of convolutional neural networks. Comput. Ind. Eng. 2023, 177, 109076. [Google Scholar] [CrossRef]
  31. Zhang, R.; Qiu, Z. Optimizing hyper-parameters of neural networks with swarm intelligence: A novel framework for credit scoring. PLoS ONE 2020, 15, e0234254. [Google Scholar] [CrossRef]
  32. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61. [Google Scholar] [CrossRef]
  33. Zhong, C.; Li, G.; Meng, Z. Beluga whale optimization: A novel nature-inspired metaheuristic algorithm. Knowl.-Based Syst. 2022, 251, 109215. [Google Scholar] [CrossRef]
  34. Mirjalili, S. SCA: A sine cosine algorithm for solving optimization problems. Knowl.-Based Syst. 2016, 96, 120–133. [Google Scholar] [CrossRef]
  35. Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34. [Google Scholar] [CrossRef]
  36. Dhiman, G.; Kumar, V. Seagull optimization algorithm: Theory and its applications for large-scale industrial engineering problems. Knowl.-Based Syst. 2019, 165, 169–196. [Google Scholar] [CrossRef]
  37. Cuevas, E.; Zaldivar, D.; Pérez-Cisneros, M. A novel multi-threshold segmentation approach based on differential evolution optimization. Expert Syst. Appl. 2010, 37, 5265–5271. [Google Scholar] [CrossRef]
  38. Mirjalili, S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl.-Based Syst. 2015, 89, 228–249. [Google Scholar] [CrossRef]
  39. Yang, X.S. Flower pollination algorithm for global optimization. In Proceedings of the International Conference on Unconventional Computing and Natural Computation, Orléans, France, 3–7 September 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 240–249. [Google Scholar]
  40. Li, W.; Wang, G.G.; Gandomi, A.H. A survey of learning-based intelligent optimization algorithms. Arch. Comput. Methods Eng. 2021, 28, 3781–3799. [Google Scholar] [CrossRef]
  41. Gad, A.G. Particle swarm optimization algorithm and its applications: A systematic review. Arch. Comput. Methods Eng. 2022, 29, 2531–2561. [Google Scholar] [CrossRef]
  42. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN′95-International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar]
  43. Kaveh, A.; Hamedani, K.B. Improved arithmetic optimization algorithm and its application to discrete structural optimization. Structures 2022, 35, 748–764. [Google Scholar] [CrossRef]
  44. Abualigah, L.; Diabat, A.; Mirjalili, S.; Abd Elaziz, M.; Gandomi, A.H. The arithmetic optimization algorithm. Comput. Methods Appl. Mech. Eng. 2021, 376, 113609. [Google Scholar] [CrossRef]
  45. Kavoosi, H.; Saidi, M.H.; Kavoosi, M.; Bohrng, M. Forecast global carbon dioxide emission by use of genetic algorithm (GA). Int. J. Comput. Sci. Issues 2012, 9, 418. [Google Scholar]
  46. Lotfalipour, M.R.; Falahi, M.A.; Bastam, M. Prediction of CO2 emissions in Iran using grey and ARIMA models. Int. J. Energy Econ. Policy 2013, 3, 229–237. [Google Scholar]
  47. Sun, W.; Liu, M. Prediction and analysis of the three major industries and residential consumption CO2 emissions based on least squares support vector machine in China. J. Clean. Prod. 2016, 122, 144–153. [Google Scholar] [CrossRef]
  48. Zhao, X.; Han, M.; Ding, L.; Calin, A.C. Forecasting carbon dioxide emissions based on a hybrid of mixed data sampling regression model and back propagation neural network in the USA. Environ. Sci. Pollut. Res. 2018, 25, 2899–2910. [Google Scholar] [CrossRef] [PubMed]
  49. Wang, H.; Liang, W.; Liang, S.; Chen, B. Research on Carbon Dioxide Concentration Prediction Based on RNN Model in Deep Learning. Highlights Sci. Eng. Technol. 2023, 48, 281–287. [Google Scholar] [CrossRef]
  50. Zuo, Z.; Guo, H.; Cheng, J. An LSTM-STRIPAT model analysis of China’s 2030 CO2 emissions peak. Carbon Manag. 2020, 11, 577–592. [Google Scholar] [CrossRef]
  51. Kumari, S.; Singh, S.K. Machine learning-based time series models for effective CO2 emission prediction in India. Environ. Sci. Pollut. Res. 2023, 30, 116601–116616. [Google Scholar] [CrossRef]
  52. Yang, F.; Liu, D.; Zeng, Q.; Chen, Z.; Ye, Y.; Yang, T.; He, Y.; Zhou, S.; Zheng, L. Prediction of Mianyang Carbon Emission Trend Based on Adaptive GRU Neural Network. In Proceedings of the 2022 4th International Conference on Frontiers Technology of Information and Computer (ICFTIC), Qingdao, China, 2–4 December 2022; IEEE: New York, NY, USA, 2022; pp. 747–750. [Google Scholar]
  53. Cao, L.; Han, Y.; Feng, M.; Geng, Z.; Lu, Y.; Chen, L.; Ping, W.; Xia, T.; Li, S. Economy and carbon emissions optimization of different provinces or regions in China using an improved temporal attention mechanism based on gate recurrent unit. J. Clean. Prod. 2024, 434, 139827. [Google Scholar] [CrossRef]
  54. Claesen, M.; De Moor, B. Hyperparameter search in machine learning. arXiv 2015, arXiv:1502.02127. [Google Scholar]
  55. Belete, D.M.; Huchaiah, M.D. Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results. Int. J. Comput. Appl. 2022, 44, 875–886. [Google Scholar] [CrossRef]
  56. Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
  57. Wu, J.; Chen, X.Y.; Zhang, H.; Xiong, L.D.; Lei, H.; Deng, S.H. Hyperparameter optimization for machine learning models based on Bayesian optimization. J. Electron. Sci. Technol. 2019, 17, 26–40. [Google Scholar]
  58. Bacanin, N.; Bezdan, T.; Tuba, E.; Strumberger, I.; Tuba, M. Optimizing convolutional neural network hyperparameters by enhanced swarm intelligence metaheuristics. Algorithms 2020, 13, 67. [Google Scholar] [CrossRef]
  59. Chang, Z.H.; Yuan, W.; Huang, K. Remaining useful life prediction for rolling bearings using multi-layer grid search and LSTM. Comput. Electr. Eng. 2022, 101, 108083. [Google Scholar] [CrossRef]
  60. Priyadarshini, I.; Cotton, C. A novel LSTM-CNN-grid search-based deep neural network for sentiment analysis. J. Supercomput. 2021, 77, 13911–13932. [Google Scholar] [CrossRef]
  61. Huang, Q.; Mao, J.; Liu, Y. An improved grid search algorithm of SVR parameters optimization. In Proceedings of the 2012 IEEE 14th International Conference on Communication Technology, Chengdu, China, 9–11 November 2012; pp. 1022–1026. [Google Scholar]
  62. Fayed, H.A.; Atiya, A.F. Speed up grid-search for parameter selection of support vector machines. Appl. Soft Comput. 2019, 80, 202–210. [Google Scholar] [CrossRef]
  63. Tkachenko, A. Grid search in stellar parameters: A software for spectrum analysis of single stars and binary systems. Astron. Astrophys. 2015, 581, A129. [Google Scholar] [CrossRef]
  64. Zhao, Y.; Zhang, W.; Liu, X. Grid search with a weighted error function: Hyper-parameter optimization for financial time series forecasting. Appl. Soft Comput. 2024, 154, 111362. [Google Scholar] [CrossRef]
  65. Zabinsky, Z.B. Random Search Algorithms; Department of Industrial and Systems Engineering, University of Washington: Washington, DC, USA, 2009. [Google Scholar]
  66. Andonie, R.; Florea, A.C. Weighted random search for CNN hyperparameter optimization. arXiv 2020, arXiv:2003.13300. [Google Scholar] [CrossRef]
  67. Ragab, M.G.; Abdulkadir, S.J.; Aziz, N. Random search one dimensional CNN for human activity recognition. In Proceedings of the 2020 International Conference on Computational Intelligence (ICCI), Bandar Seri Iskandar, Malaysia, 8–9 October 2020; pp. 86–91. [Google Scholar]
  68. Frazier, P.I. A tutorial on Bayesian optimization. arXiv 2018, arXiv:1807.02811. [Google Scholar]
  69. Cho, H.; Kim, Y.; Lee, E.; Choi, D.; Lee, Y.; Rhee, W. Basic enhancement strategies when using Bayesian optimization for hyperparameter tuning of deep neural networks. IEEE Access 2020, 8, 52588–52608. [Google Scholar] [CrossRef]
  70. Han, S.; Eom, H.; Kim, J.; Park, C. Optimal DNN architecture search using Bayesian Optimization Hyperband for arrhythmia detection. In Proceedings of the 2020 IEEE Wireless Power Transfer Conference (WPTC), Seoul, Republic of Korea, 15–19 November 2020; pp. 357–360. [Google Scholar]
  71. Mavrovouniotis, M.; Li, C.; Yang, S. A survey of swarm intelligence for dynamic optimization: Algorithms and applications. Swarm Evol. Comput. 2017, 33, 1–17. [Google Scholar] [CrossRef]
  72. Zhang, C.; Ma, H.; Hua, L.; Sun, W.; Nazir, M.S.; Peng, T. An evolutionary deep learning model based on TVFEMD, improved sine cosine algorithm, CNN and BiLSTM for wind speed prediction. Energy 2022, 254, 124250. [Google Scholar] [CrossRef]
  73. Chen, X.; Li, Y.; Zhang, Y.; Ye, X.; Xiong, X.; Zhang, F. A novel hybrid model based on an improved seagull optimization algorithm for short-term wind speed forecasting. Processes 2021, 9, 387. [Google Scholar] [CrossRef]
  74. Wang, J.; Yang, W.; Du, P.; Niu, T. A novel hybrid forecasting system of wind speed based on a newly developed multi-objective sine cosine algorithm. Energy Convers. Manag. 2018, 163, 134–150. [Google Scholar] [CrossRef]
  75. Singh, P.; Chaudhury, S.; Panigrahi, B.K. Hybrid MPSO-CNN: Multi-level particle swarm optimized hyperparameters of convolutional neural network. Swarm Evol. Comput. 2021, 63, 100863. [Google Scholar] [CrossRef]
  76. Huang, C.L.; Dun, J.F. A distributed PSO-SVM hybrid system with feature selection and parameter optimization. Appl. Soft Comput. 2008, 8, 1381–1391. [Google Scholar] [CrossRef]
  77. Xia, J.; Wang, Z.; Yang, D.; Li, R.; Liang, G.; Chen, H.; Heidari, A.A.; Turabieh, H.; Mafarja, M.; Pan, Z. Performance optimization of support vector machine with oppositional grasshopper optimization for acute appendicitis diagnosis. Comput. Biol. Med. 2022, 143, 105206. [Google Scholar] [CrossRef]
  78. Hu, J.; Heidari, A.A.; Shou, Y.; Ye, H.; Wang, L.; Huang, X.; Chen, H.; Chen, Y.; Wu, P. Detection of COVID-19 severity using blood gas analysis parameters and Harris hawks optimized extreme learning machine. Comput. Biol. Med. 2022, 142, 105166. [Google Scholar] [CrossRef]
  79. Tikhamarine, Y.; Souag-Gamane, D.; Ahmed, A.N.; Kisi, O.; El-Shafie, A. Improving artificial intelligence models accuracy for monthly streamflow forecasting using grey Wolf optimization (GWO) algorithm. J. Hydrol. 2020, 582, 124435. [Google Scholar] [CrossRef]
  80. Rahmanshahi, M.; Jafari-Asl, J.; Shafai Bejestan, M.; Mirjalili, S. A hybrid model for predicting the energy dissipation on the block ramp hydraulic structures. Water Resour. Manag. 2023, 37, 3187–3209. [Google Scholar] [CrossRef]
  81. Xiao, J.; Zhu, X.; Huang, C.; Yang, X.; Wen, F.; Zhong, M. A new approach for stock price analysis and prediction based on SSA and SVM. Int. J. Inf. Technol. Decis. Mak. 2019, 18, 287–310. [Google Scholar] [CrossRef]
  82. Jovanovic, L.; Milutinovic, N.; Gajevic, M.; Krstovic, J.; Rashid, T.A.; Petrovic, A. Sine cosine algorithm for simple recurrent neural network tuning for stock market prediction. In Proceedings of the 2022 30th Telecommunications Forum (TELFOR), Belgrade, Serbia, 15–16 November 2022; pp. 1–4. [Google Scholar]
  83. El-Kenawy, E.S.M.; Mirjalili, S.; Ghoneim, S.S.; Eid, M.M.; El-Said, M.; Khan, Z.S.; Ibrahim, A. Advanced ensemble model for solar radiation forecasting using sine cosine algorithm and newton’s laws. IEEE Access 2021, 9, 115750–115765. [Google Scholar] [CrossRef]
  84. Peng, T.; Zhang, C.; Zhou, J.; Nazir, M.S. An integrated framework of Bi-directional long-short term memory (BiLSTM) based on sine cosine algorithm for hourly solar radiation forecasting. Energy 2021, 221, 119887. [Google Scholar] [CrossRef]
  85. Wu, Y.; Sun, X.; Zhang, Y.; Zhong, X.; Cheng, L. A power transformer fault diagnosis method-based hybrid improved seagull optimization algorithm and support vector machine. IEEE Access 2021, 10, 17268–17286. [Google Scholar] [CrossRef]
  86. Li, X.D.; Wang, J.S.; Hao, W.K.; Zhang, M.; Wang, M. Chaotic arithmetic optimization algorithm. Appl. Intell. 2022, 52, 16718–16757. [Google Scholar] [CrossRef]
  87. Mehmood, K.; Chaudhary, N.I.; Khan, Z.A.; Cheema, K.M.; Raja, M.A.Z.; Shu, C.M. Novel knacks of chaotic maps with Archimedes optimization paradigm for nonlinear ARX model identification with key term separation. Chaos Solitons Fractals 2023, 175, 114028. [Google Scholar] [CrossRef]
Figure 1. Internal structure of a GRU unit.
Figure 2. Specific structure of our proposed AttBiGRU.
Figure 3. Comparison of effects with and without logistic chaotic mapping.
Figure 4. Flowchart of CGAOA.
Figure 5. Friedman ranking of different algorithms on 24 benchmark functions.
Figure 6. Comparison of the fitness curves of various algorithms during iteration (unimodal functions).
Figure 7. Comparison of the fitness curves of various algorithms during iteration (multimodal functions).
Figure 8. Comparison of the fitness curves of various algorithms during iteration (composition functions).
Figure 9. CO2 emissions from different sectors.
Figure 10. Schematic diagram of a sample making method using a sliding window.
Figure 11. CGAOA-AttBiGRU framework.
Figure 12. Error curves of different models in the power sector.
Figure 13. Error curves of different models in the industry sector.
Figure 14. Error curves of different models in the transport sector.
Figure 15. Error curves of different models in the resident sector.
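Figure 10 above refers to the sliding-window method used to turn the emission time series into supervised samples. The snippet below is a minimal Python sketch of that idea; the window length and the toy series are illustrative assumptions, not values from the paper.

```python
import numpy as np

def make_sliding_window_samples(series, window_len):
    """Cut a 1-D time series into (past window, next value) training pairs."""
    X, y = [], []
    for start in range(len(series) - window_len):
        X.append(series[start:start + window_len])  # window_len consecutive past values
        y.append(series[start + window_len])        # the value right after the window
    return np.array(X), np.array(y)

# Toy emission series (made up) and a window of 3 steps.
emissions = np.array([2.1, 2.3, 2.2, 2.5, 2.7, 2.6, 2.9])
X, y = make_sliding_window_samples(emissions, window_len=3)
print(X.shape, y.shape)  # (4, 3) (4,)
```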
Table 1. Research related to CO2 prediction models.
Research Model | Hyperparameter Optimization Method | Research Content | Prediction Results
Linear and non-linear forms of equations [45] | Manual | Linear and nonlinear equations are used to predict CO2 emissions from 2004 to 2010; the prediction results are better than those of the exponential model. | Relative error: 1.06%
ARIMA [46] | Manual | ARIMA is used to predict CO2 emissions from 2011 to 2020; the prediction results show that ARIMA outperforms the GM, AR, ARMA, and other models. | RMSE: 14.482; MAE: 10.84; MAPE: 6.768%
SVM [47] | Manual | CO2 emissions from 2008 to 2012 are forecast with the least squares SVM model; the prediction results are better than those of the logistic model, BP neural network, and GM model. | RMSE: 0.003; MAPE: 0.328%
BP Network [48] | Manual | Global CO2 emissions over a 15-quarter period are forecast with the BP network; the prediction results are better than those of the OLS, PDL, ADL, and ARMA models. | RMSE: 16.87
RNN [49] | Manual | The RNN model is used to predict CO2 emissions from 2012 to 2022; the prediction results are better than those of the statistical model (Holt–Winters) and the machine learning model (linear regression). | MSE: 0.36; MAE: 0.23
LSTM-STRIPAT [50] | Manual | The LSTM-STRIPAT model is used to predict China's CO2 emissions from 2022 to 2040; the prediction results are better than those of the BPNN and GM models. | MAPE: 2.6%
LSTM [51] | Manual | The LSTM model is used to predict India's CO2 emissions from 2019 to 2029; the prediction results are better than those of the ARIMA, SARIMAX, and Holt–Winters models. | MAPE: 3.101%; RMSE: 60.635; MedAE: 28.298
GRU [52] | Manual | The GRU model is used to predict the CO2 emissions of Mianyang City from 2020 to 2030; the prediction results are better than those of the RNN and BP networks. | MAPE: 1.87%
Table 2. Unimodal benchmark functions.
Name | Description | Range | F(min)
F1 | $F_{1}(x)=x_{1}^{2}+10^{6}\sum_{i=2}^{n}x_{i}^{2}$ | [−500, 500] | 0
F2 | $F_{2}(x)=\sum_{i=1}^{n}x_{i}^{2}+\left(\frac{1}{2}\sum_{i=1}^{n}i x_{i}^{2}\right)^{2}$ | [−500, 500] | 0
F3 | $F_{3}(x)=\sum_{i=1}^{n}\left|x_{i}\sin(x_{i})+0.1x_{i}\right|$ | [−500, 500] | 0
F4 | $F_{4}(x)=\sum_{i=1}^{n}\left[(x_{i-3}+10x_{i-2})^{2}+(5x_{i-1}-x_{i})^{2}\right]$ | [−100, 100] | 0
F5 | $F_{5}(x)=\frac{1}{2}\sum_{i=1}^{n}\left(x_{i}^{4}-16x_{i}^{2}+5x_{i}\right)$ | [−500, 500] | 0
F6 | $F_{6}(x)=\sum_{i=1}^{n-1}\left[100\left(x_{i+1}-x_{i}^{2}\right)^{2}+\left(1-x_{i}\right)^{2}\right]$ | [−30, 30] | 0
F7 | $F_{7}(x)=\sum_{i=1}^{n}x_{i}^{2}$ | [0, 500] | 0
F8 | $F_{8}(x)=\sum_{i=1}^{n}\sin(x_{i})\sin^{2}\left(\frac{i x_{i}^{2}}{\pi}\right)$ | [−512, 512] | 0
F9 | $F_{9}(x)=\sum_{i=1}^{n}\cos^{2}(x_{i})-0.1\sum_{i=1}^{n}\exp(2x_{i})$ | [−512, 512] | 0
Table 3. Multimodal benchmark functions.
Name | Description | Range | F(min)
F10 | $F_{10}(x)=\sum_{i=1}^{n-1}\left(x_{i}^{2}+2x_{i+1}^{2}-0.3\cos(3\pi x_{i})\right)$ | [−512, 512] | 0
F11 | $F_{11}(x)=\sum_{i=1}^{n}\left(x_{i}^{2}-10\cos(2\pi x_{i})+10\right)$ | [−5.12, 5.12] | 0
F12 | $F_{12}(x)=-20\exp\left(-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}}\right)-\exp\left(\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_{i})\right)+20+e$ | [−32, 32] | 0
F13 | $F_{13}(x)=1+\frac{1}{4000}\sum_{i=1}^{n}x_{i}^{2}-\prod_{i=1}^{n}\cos\left(\frac{x_{i}}{\sqrt{i}}\right)$ | [−600, 600] | 0
F14 | $F_{14}(x)=\sum_{i=1}^{n}\left[a_{i}-\frac{x_{1}(b_{i}^{2}+b_{i}x_{2})}{b_{i}^{2}+b_{i}x_{3}+x_{4}}\right]^{2}$ | [−50, 50] | 0
F15 | $F_{15}(x)=4x_{1}^{2}-2.1x_{1}^{4}+\frac{1}{3}x_{1}^{6}+x_{1}x_{2}-4x_{2}^{2}+4x_{2}^{4}$ | [−50, 50] | 0
F16 | $F_{16}(x)=\left(x_{2}-\frac{5.1}{4\pi^{2}}x_{1}^{2}+\frac{5}{\pi}x_{1}-6\right)^{2}+10\left(1-\frac{1}{8\pi}\right)\cos(x_{1})+10$ | [−5, 5] | 0.398
F17 | $F_{17}(x)=10n+\sum_{i=1}^{n}\left(x_{i}^{2}-10\cos(2\pi x_{i})\right)$ | [−512, 512] | 0
F18 | $F_{18}(x)=\sum_{i=1}^{n}\left(x_{i}^{2}\right)^{0.25}+0.5\sum_{i=1}^{n}x_{i}+0.5n$ | [−500, 500] | 0
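As a concrete example of how the benchmark functions above are evaluated when testing an optimizer, the sketch below implements F11 (the Rastrigin function) from Table 3; the 30-dimensional test point is an illustrative assumption.

```python
import numpy as np

def f11_rastrigin(x):
    """F11 in Table 3: sum(x_i^2 - 10*cos(2*pi*x_i) + 10); global minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0)

print(f11_rastrigin(np.zeros(30)))                          # 0.0 at the optimum
print(f11_rastrigin(np.random.uniform(-5.12, 5.12, 30)))    # a positive value elsewhere
```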
Table 4. Composition benchmark functions.
Name | Description | F(min)
F19 | Composition Function 1 (N = 3) | 2100
F20 | Composition Function 2 (N = 3) | 2200
F21 | Composition Function 3 (N = 4) | 2300
F22 | Composition Function 4 (N = 4) | 2400
F23 | Composition Function 5 (N = 5) | 2500
F24 | Composition Function 6 (N = 5) | 2600
Table 5. Various AOA variants with two mechanisms.
Model | Chaotic Mapping | Gaussian Variation
CGAOA | 1 | 1
CAOA | 1 | 0
GAOA | 0 | 1
AOA | 0 | 0
Table 6. The results of the ablation experiment.
Algorithm | Rank | +/−/= | Avg.
CGAOA | 1 | ~ | 1.3069
CAOA | 2 | 15/3/6 | 2.0965
GAOA | 3 | 17/2/5 | 2.9181
AOA | 4 | 20/1/3 | 3.8011
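Table 5 names the two mechanisms that separate the AOA variants compared in Table 6: logistic chaotic mapping (see Figure 3) and Gaussian variation. The sketch below is a rough, self-contained illustration of both ideas; the map parameter r = 4, the noise scale, and the bounds are assumptions for demonstration, and the paper's CGAOA embeds these mechanisms inside the full AOA update rules rather than using them in isolation.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_chaotic_init(pop_size, dim, lb, ub, r=4.0):
    """Population initialization with the logistic map x_{k+1} = r * x_k * (1 - x_k):
    a chaotic sequence in (0, 1) is generated per dimension and scaled into [lb, ub]."""
    x = rng.uniform(0.05, 0.95, size=dim)   # chaotic seed for each dimension
    pop = np.empty((pop_size, dim))
    for i in range(pop_size):
        x = r * x * (1.0 - x)               # one logistic-map iteration
        pop[i] = lb + x * (ub - lb)         # scale chaotic values into the search bounds
    return pop

def gaussian_variation(candidate, lb, ub, sigma=0.1):
    """Gaussian variation: perturb a candidate with zero-mean noise, then clip to the bounds."""
    noise = rng.normal(0.0, sigma * (ub - lb), size=candidate.shape)
    return np.clip(candidate + noise, lb, ub)

pop = logistic_chaotic_init(pop_size=30, dim=10, lb=-500.0, ub=500.0)
perturbed = gaussian_variation(pop[0], lb=-500.0, ub=500.0)
```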
Table 7. Parameter settings of comparison algorithms.
Algorithm | Parameter | Value
GWO | Area vector a, random vectors r1, r2 | A ∈ [0, 2], r1 ∈ [0, 1], r2 ∈ [0, 1]
BWO | The probability of reduced wf between regions | wf ∈ [0.05, 0.1]
SCA | Convergence parameter (spiral factor) a | a = 2
SSA | Leadership position update probability c3 | c3 = 0.5
SOA | Control parameters A, fc | A ∈ [0, 2], fc = 2
DE | Scaling factor c, crossover probability p | c = 0.5, p = 0.5
MFO | Convergence parameter (spiral factor) c, special parameter b | c ∈ [−2, −1], b = 1
FPA | Probability switch p | p = 0.8
PSO | Acceleration constants c1 and c2, inertia weight w | c1 = c2 = 2, w ∈ [0.2, 0.9]
Table 8. Experimental results on 24 benchmark functions (the best are in bold).
Fun | F1 (Aver, Std) | F2 (Aver, Std) | F3 (Aver, Std)
GWO2.9997E+04.6547E-13.3143E+02.0422E-13.1791E+03.5212E-1
BWO3.0770E+01.8606E-13.0340E+01.5253E-13.3282E+01.5460E-1
SCA3.3571E+02.7111E-13.3237E+03.8633E-13.2291E+02.1695E-1
SSA3.1942E+01.4920E-13.4142E+02.2133E-13.2309E+01.7968E-1
SOA3.1887E+01.5777E-13.3139E+02.4405E-13.2282E+03.8577E-1
DE2.9895E+02.9881E-13.5137E+01.5967E-13.1292E+01.9160E-1
MFO2.9662E+02.1549E-13.2237E+02.7874E-13.3277E+02.5340E-1
FPA2.9604E+01.6161E-13.3437E+02.4016E-13.2293E+01.2904E-1
PSO3.0900E+02.7604E-13.1144E+02.6725E-13.3239E+04.6655E-1
CGAOA2.9326E+01.3288E-12.8933E+02.5175E-12.0310E+01.5649E-1
Fun | F4 (Aver, Std) | F5 (Aver, Std) | F6 (Aver, Std)
GWO1.0719E+14.6951E+02.1570E+26.5842E-11.0986E+18.5764E-0
BWO1.0824E+15.1139E+02.2272E+22.9650E-11.0767E+13.8749E-0
SCA1.0378E+14.2987E+02.3019E+22.2765E-11.1298E+16.9866E-0
SSA1.0681E+11.1846E+12.3671E+26.7768E-11.0721E+15.7972E-0
SOA1.0647E+13.9037E+02.2250E+21.9946E-11.0633E+16.8975E-0
DE1.0214E+14.8932E+02.2497E+27.9160E-11.0578E+11.9244E-0
MFO1.0623E+15.4174E+02.2712E+22.3956E-11.0726E+12.5291E-0
FPA1.0631E+14.4114E+02.2110E+22.7796E-11.0625E+17.2794E-0
PSO1.0734E+11.0935E+12.3799E+23.6425E-11.0742E+15.8692E-0
CGAOA1.0151E+14.2956E+01.9259E+21.5649E-11.0346E+17.5649E-0
Fun | F7 (Aver, Std) | F8 (Aver, Std) | F9 (Aver, Std)
GWO3.1670E+01.3070E-13.1987E+03.1116E-11.2579E+03.8597E-2
BWO2.9973E+03.3041E-13.2957E+01.8463E-11.2433E+06.9653E-2
SCA3.1508E+02.6584E-13.1406E+02.1921E-11.2172E+03.8924E-2
SSA3.1374E+01.5546E-13.1979E+01.7608E-11.2141E+04.1477E-2
SOA3.0854E+01.4907E-13.2099E+02.4198E-11.2239E+05.3566E-2
DE3.0971E+02.8891E-13.1824E+03.2782E-11.2209E+03.2203E-2
MFO3.2373E+03.0831E-13.2330E+01.3248E-11.2462E+03.6615E-2
FPA3.1890E+03.6180E-13.2270E+02.5491E-11.2016E+02.5762E-2
PSO3.1522E+01.6059E-13.2700E+02.3791E-11.2455E+03.3193E-2
CGAOA2.9512E+03.3340E-13.0196E+07.3299E-21.1807E+05.6026E-2
Fun | F10 (Aver, Std) | F11 (Aver, Std) | F12 (Aver, Std)
GWO1.2653E+01.0201E-12.1454E+15.8218E-21.0155E+14.9109E-1
BWO1.2338E+06.6893E-22.1436E+11.1449E-11.0202E+16.8390E-1
SCA1.2536E+04.7131E-22.1456E+13.9038E-21.0631E+14.6110E-1
SSA1.2738E+07.5439E-22.1409E+11.2943E-11.0854E+13.4829E-1
SOA1.2371E+01.1242E-22.1433E+18.8435E-21.0284E+15.9602E-1
DE1.2070E+02.8333E-22.1404E+14.3957E-21.0527E+19.2007E-1
MFO1.2174E+04.8090E-22.1411E+13.7122E-21.1215E+15.1018E-1
FPA1.2331E+05.6104E-22.1534E+15.7014E-21.0685E+16.1983E-1
PSO1.2314E+01.9357E-22.1465E+16.9590E-21.0861E+14.7621E-1
CGAOA1.1577E+04.6213E-22.1389E+11.2943E-19.8487E+04.1400E-1
Fun | F13 (Aver, Std) | F14 (Aver, Std) | F15 (Aver, Std)
GWO2.1476E+17.9116E-21.0612E+12.3848E-11.0758E+14.6697E-1
BWO2.1393E+11.2265E-11.0153E+14.3953E-11.0477E+14.6273E-1
SCA2.1424E+14.1515E-21.0650E+12.9989E-11.0634E+14.9492E-1
SSA2.1436E+16.7325E-21.0681E+11.8724E-11.0849E+16.5998E-1
SOA2.1444E+16.8851E-21.0494E+15.8187E-11.0417E+14.3076E-1
DE2.1431E+18.8886E-21.0802E+12.5304E-11.0474E+11.0579E+0
MFO2.1409E+11.0024E-11.0562E+13.4618E-11.0535E+17.2999E-1
FPA2.1417E+17.7235E-21.0632E+13.4969E-11.1088E+11.4657E-1
PSO2.1455E+18.6014E-21.0936E+12.7408E-11.0793E+15.7992E-1
CGAOA2.1364E+16.1035E-21.0396E+12.3615E-19.7259E+04.6295E-1
Fun | F16 (Aver, Std) | F17 (Aver, Std) | F18 (Aver, Std)
GWO1.2473E+01.9568E-13.2095E+05.6922E-11.2802E+05.4862E-2
BWO1.2560E+06.9572E-23.3195E+03.1956E-11.2633E+07.5922E-2
SCA1.2523E+04.1935E-23.2871E+01.1905E-11.2459E+02.6752E-2
SSA1.2221E+03.0825E-13.1075E+03.0785E-11.2796E+02.6280E-2
SOA1.2406E+02.9865E-23.3964E+02.0895E-11.2659E+03.8863E-2
DE1.2106E+02.2553E-23.2076E+06.4925E-21.2706E+02.1102E-2
MFO1.2395E+06.7865E-23.2210E+02.4783E-11.2895E+02.3206E-2
FPA1.2122E+03.6956E-23.1552E+02.2935E-11.2469E+02.7853E-2
PSO1.2259E+01.6715E-13.2106E+06.2532E-21.2693E+05.7932E-2
CGAOA1.1826E+06.6054E-22.9722E+05.9782E-21.1807E+05.0823E-2
Fun | F19 (Aver, Std) | F20 (Aver, Std) | F21 (Aver, Std)
GWO3.0724E+38.3062E+13.1702E+37.7017E+13.0738E+32.6927E+1
BWO3.0826E+33.2694E+13.2408E+33.1103E+13.0567E+32.9714E+1
SCA3.0492E+38.1778E+13.1258E+37.3933E+13.0841E+31.8063E+1
SSA3.0401E+39.0293E+13.1529E+32.2837E+13.0821E+32.6106E+1
SOA3.0644E+31.6985E+23.1685E+34.1144E+13.0769E+35.5635E+0
DE3.1023E+33.8974E+13.1924E+34.4718E+13.0772E+36.3956E+0
MFO2.9619E+31.3894E+23.1458E+32.5557E+13.0710E+31.4702E+1
FPA3.0406E+31.1531E+23.1826E+35.7103E+13.0785E+32.3036E+1
PSO3.0241E+31.2808E+23.1883E+33.3575E+13.0714E+32.3316E+1
CGAOA2.9965E+37.4005E+13.0849E+35.4413E+13.0697E+33.2925E+1
Fun | F22 (Aver, Std) | F23 (Aver, Std) | F24 (Aver, Std)
GWO3.0777E+32.5562E+13.0735E+31.0757E+13.0896E+32.0343E+1
BWO3.0471E+35.2708E+13.0673E+32.4820E+13.0901E+32.6887E+1
SCA3.0831E+32.7704E+13.0979E+32.0277E+13.0772E+32.3202E+1
SSA3.0741E+31.5444E+13.0807E+39.2561E+03.0760E+31.1781E+1
SOA3.0403E+37.1725E+13.0816E+32.4380E+13.0885E+32.7669E+1
DE3.0519E+38.1391E+13.0709E+32.4522E+13.0881E+33.6343E+0
MFO3.0445E+33.6526E+13.0760E+31.2796E+13.0857E+31.7441E+1
FPA3.0788E+32.2727E+13.0687E+34.9937E+13.0640E+33.5676E+1
PSO3.0622E+32.0521E+13.0902E+32.1660E+13.0794E+32.0797E+1
CGAOA3.0219E+31.4764E+13.0436E+34.0715E+13.0484E+34.6827E+1
Table 9. p-values obtained from the Wilcoxon rank-sum test (F1–F24).
Fun | CGAOA vs. GWO | CGAOA vs. BWO | CGAOA vs. SCA | CGAOA vs. SSA | CGAOA vs. SOA | CGAOA vs. DE | CGAOA vs. MFO | CGAOA vs. FPA | CGAOA vs. PSO
F15.02E-067.98E-083.98E-056.79E-109.63E-034.91E-087.92E-068.38E-078.50E-09
F21.07E-068.25E-036.18E-029.22E-116.16E-078.37E-077.27E-052.99E-044.82E-08
F34.08E-034.22E-046.81E-021.69E-104.36E-082.12E-082.72E-108.72E-037.91E-06
F45.30E-091.73E-042.16E-071.28E-034.55E-068.50E-093.11E-033.98E-056.79E-10
F55.30E-091.73E-042.16E-071.28E-034.55E-068.50E-093.11E-031.73E-042.16E-07
F67.27E-052.99E-044.82E-089.13E-0163.36E-083.98E-056.79E-108.42E-131.07E-07
F79.32E-036.37E-092.03E-078.22E-061.03E-053.02E-131.78E-038.42E-131.07E-07
F86.16E-025.18E-067.83E-028.04E-095.31E-039.09E-029.22E-118.22E-061.03E-05
F97.27E-052.99E-044.82E-089.13E-0163.36E-083.98E-056.79E-101.33E-053.06E-04
F108.13E-048.31E-049.11E-101.33E-053.06E-048.33E-071.07E-0118.25E-036.13E-04
F119.32E-036.37E-092.03E-078.22E-061.03E-053.02E-131.78E-038.99E-062.05E-01
F128.38E-043.13E-069.31E-071.03E-099.25E-065.32E-084.08E-034.22E-045.90E-05
F137.27E-052.99E-044.82E-089.13E-0163.36E-081.05E-035.65E-052.88E-036.63E-08
F145.05E-096.28E-093.00E-044.20E-056.98E-049.95E-062.21E-067.37E-091.34E-03
F157.98E-083.98E-056.79E-109.63E-034.91E-085.08E-092.48E-076.85E-062.93E-01
F168.38E-078.50E-093.11E-034.50E-057.22E-016.30E-058.65E-018.08E-035.01E-06
F175.16E-029.01E-057.11E-018.91E-045.24E-076.21E-099.32E-036.37E-092.03E-07
F188.13E-048.31E-049.11E-101.33E-053.06E-048.33E-071.07E-0118.25E-036.13E-04
F196.01E-081.90E-046.60E-052.08E-088.30E-102.98E-072.02E-088.10E-049.19E-08
F204.08E-081.52E-061.08E-065.30E-048.93E-087.66E-121.23E-075.54E-108.92E-13
F217.12E-046.32E-045.15E-016.72E-016.15E-056.52E-077.24E-087.98E-083.98E-05
F221.07E-0118.25E-039.09E-029.22E-116.16E-078.37E-077.27E-052.99E-044.82E-08
F238.38E-078.50E-093.11E-034.50E-059.05E-066.30E-045.05E-096.28E-093.00E-04
F247.98E-083.98E-056.79E-109.63E-034.91E-085.08E-092.48E-076.85E-062.93E-01
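The p-values in Table 9 come from pairwise Wilcoxon rank-sum tests between CGAOA and each competitor over repeated independent runs. A minimal SciPy sketch of how such a p-value is obtained is shown below; the two result arrays are made-up stand-ins for per-run best fitness values on a single benchmark function.

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(42)
# Hypothetical best-fitness values from 30 independent runs of two algorithms.
cgaoa_runs = rng.normal(loc=2.93, scale=0.13, size=30)
gwo_runs = rng.normal(loc=3.00, scale=0.46, size=30)

stat, p_value = ranksums(cgaoa_runs, gwo_runs)
print(f"rank-sum statistic = {stat:.3f}, p-value = {p_value:.3e}")
# A p-value below 0.05 is usually read as a significant difference between the two samples.
```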
Table 10. The hyperparameters to be optimized and their search space boundaries.
Hyperparameter Name | Description | Lower Bound | Upper Bound
Unit | The number of units in each BiGRU layer | 32 | 128
dropout_rate | The proportion of neurons randomly dropped during training | 0.2 | 0.5
batch_size | The number of samples used in each training iteration | 32 | 128
learning_rate | The step size of parameter updates during model training | 0.01 | 0.1
attention_column | The hyperparameter of the Attention layer | 1 | 12
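To make the search space in Table 10 concrete, the sketch below decodes one candidate hyperparameter setting into a bidirectional GRU model with a simple attention weighting over time steps, using Keras. This is an illustrative stand-in rather than the paper's exact AttBiGRU: the dot-product-style attention shown here is an assumption, batch_size would be applied at fit() time, and attention_column is specific to the paper's attention layer and is omitted here.

```python
import tensorflow as tf

BOUNDS = {                      # lower/upper bounds taken from Table 10
    "unit": (32, 128),
    "dropout_rate": (0.2, 0.5),
    "batch_size": (32, 128),
    "learning_rate": (0.01, 0.1),
    "attention_column": (1, 12),
}

def build_attbigru(unit, dropout_rate, learning_rate, window_len, n_features=1):
    """Decode one candidate hyperparameter setting into a BiGRU with a simple
    attention weighting over the time steps (illustrative sketch only)."""
    inputs = tf.keras.Input(shape=(window_len, n_features))
    h = tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(int(unit), return_sequences=True))(inputs)
    h = tf.keras.layers.Dropout(dropout_rate)(h)
    # Attention: score each time step, softmax over time, weighted sum of features.
    scores = tf.keras.layers.Dense(1, activation="tanh")(h)
    weights = tf.keras.layers.Softmax(axis=1)(scores)
    context = tf.keras.layers.Lambda(
        lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, weights])
    outputs = tf.keras.layers.Dense(1)(context)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss="mse")
    return model

model = build_attbigru(unit=64, dropout_rate=0.3, learning_rate=0.05, window_len=12)
```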
Table 11. Comparison of various models in the power sector.
Optimization Algorithm | Optimal solution from the optimization algorithm (Unit, dr = dropout_rate, bs = batch_size, lr = learning_rate, ac = attention_column) | MAE (MM·T−1) | RMSE (MM·T−1) | MAPE
Grid Search810.38440.06270.13200.19760.866%
Random Search330.46820.03270.16770.20810.933%
Bayesian 650.28280.06170.09260.15100.720%
GWO620.33720.04160.03620.05330.298%
BWO760.26820.07570.04520.06190.352%
CSA560.44660.02270.03080.05280.277%
SSA540.32680.04860.02910.05060.270%
SOA890.45560.03660.03390.05120.280%
DE610.22620.05590.05190.06920.389%
MFO390.37860.07160.03040.04910.298%
FPA870.41620.03370.02910.04260.251%
PSO380.32600.07170.06360.07100.409%
CGAOA660.33640.04570.02420.03970.205%
Table 12. Comparison of various models in the industry sector.
Optimization Algorithm | Optimal solution from the optimization algorithm (Unit, dr, bs, lr, ac) | MAE (MM·T−1) | RMSE (MM·T−1) | MAPE
Grid search760.49560.06270.11280.17900.830%
Random search380.46320.05350.13270.19440.862%
Bayesian 700.36360.06260.08900.14710.769%
GWO600.30720.04060.03390.05020.270%
BWO760.26820.07570.04130.05880.310%
CSA560.44660.02360.02930.05030.259%
SSA560.33680.05070.02770.04930.249%
SOA860.41500.03360.03100.05010.265%
DE600.22620.05480.04980.06300.301%
MFO380.33800.06670.02980.04030.277%
FPA820.36600.02970.02730.04010.232%
PSO360.30600.07070.05910.06420.397%
CGAOA630.31580.03970.01330.02260.198%
Table 13. Comparison of various models in the transport sector.
Optimization Algorithm | Optimal solution from the optimization algorithm (Unit, dr, bs, lr, ac) | MAE (MM·T−1) | RMSE (MM·T−1) | MAPE
Grid search640.23680.04260.11960.18210.860%
Random search460.28380.05180.14010.20660.896%
Bayesian 680.26440.06280.08380.15660.733%
GWO620.28620.03670.03420.05140.270%
BWO700.21800.06680.03960.05020.287%
CSA620.38580.04460.03290.05860.274%
SSA480.34640.07170.02620.04730.272%
SOA800.33660.02170.02880.04760.252%
DE560.26640.05870.06110.07220.420%
MFO380.29660.06070.01700.03200.239%
FPA640.41500.02670.03220.04570.250%
PSO520.32520.08180.06030.06730.406%
CGAOA660.35640.04470.01350.01690.195%
Table 14. Comparison of various models in the resident sector.
Optimization Algorithm | Optimal solution from the optimization algorithm (Unit, dr, bs, lr, ac) | MAE (MM·T−1) | RMSE (MM·T−1) | MAPE
Grid search740.30400.06580.10130.19880.875%
Random search300.55760.05560.16850.20770.913%
Bayesian 680.39260.05560.08900.15330.713%
GWO600.38660.04370.03110.05080.294%
BWO780.28800.07070.04120.06670.357%
CSA500.41660.05080.03110.04400.292%
SSA650.38660.05260.02890.05110.264%
SOA800.39500.04160.03080.05060.290%
DE550.20660.05160.05200.06610.320%
MFO420.41800.05080.03110.05220.303%
FPA860.41600.03170.02460.05080.257%
PSO580.43600.07860.06330.07060.401%
CGAOA650.31600.04770.02390.03880.230%
Table 15. Comparison of different models in four sectors (metric values are averages over 30 runs).
Sector | Model | MAE (MM·T−1) | RMSE (MM·T−1) | MAPE | Rank
Power | ARMA | 0.2875 | 0.2850 | 0.821% | 8
Power | ARIMA | 0.1892 | 0.2542 | 0.833% | 7
Power | SVM | 0.1556 | 0.1960 | 0.733% | 6
Power | ANN | 0.1106 | 0.1297 | 0.663% | 5
Power | GRU | 0.0728 | 0.1106 | 0.520% | 3
Power | LSTM | 0.0772 | 0.1201 | 0.568% | 4
Power | BiGRU | 0.0582 | 0.0881 | 0.413% | 2
Power | AttBiGRU | 0.0240 | 0.0391 | 0.213% | 1
Industry | ARMA | 0.3413 | 0.3891 | 0.966% | 8
Industry | ARIMA | 0.3030 | 0.3231 | 0.901% | 7
Industry | SVM | 0.2344 | 0.2626 | 0.813% | 6
Industry | ANN | 0.2088 | 0.2292 | 0.793% | 5
Industry | GRU | 0.1633 | 0.1799 | 0.632% | 3
Industry | LSTM | 0.1611 | 0.1862 | 0.628% | 4
Industry | BiGRU | 0.1359 | 0.1082 | 0.539% | 2
Industry | AttBiGRU | 0.0138 | 0.0229 | 0.198% | 1
Transport | ARMA | 0.2981 | 0.3192 | 0.947% | 8
Transport | ARIMA | 0.2651 | 0.2662 | 0.933% | 7
Transport | SVM | 0.2313 | 0.2335 | 0.832% | 6
Transport | ANN | 0.1801 | 0.1995 | 0.723% | 5
Transport | GRU | 0.1492 | 0.1803 | 0.697% | 3
Transport | LSTM | 0.1452 | 0.1703 | 0.632% | 4
Transport | BiGRU | 0.1182 | 0.1591 | 0.583% | 2
Transport | AttBiGRU | 0.0129 | 0.0163 | 0.188% | 1
Resident | ARMA | 0.3142 | 0.2992 | 0.981% | 8
Resident | ARIMA | 0.2630 | 0.2856 | 0.951% | 7
Resident | SVM | 0.2238 | 0.2331 | 0.891% | 6
Resident | ANN | 0.1762 | 0.2059 | 0.703% | 5
Resident | GRU | 0.1462 | 0.1691 | 0.613% | 3
Resident | LSTM | 0.1432 | 0.1673 | 0.603% | 4
Resident | BiGRU | 0.1138 | 0.1585 | 0.532% | 2
Resident | AttBiGRU | 0.0251 | 0.0391 | 0.236% | 1
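The MAE, RMSE, and MAPE values reported in Tables 11–15 follow their standard definitions; a small NumPy sketch is given below, with toy arrays standing in for the observed and predicted emission series.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

y_true = np.array([10.2, 11.0, 9.8, 10.5])   # toy observed values
y_pred = np.array([10.0, 11.3, 9.9, 10.4])   # toy predicted values
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```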
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
