Article

CGAOA-AttBiGRU: A Novel Deep Learning Framework for Forecasting CO2 Emissions

1 School of Emergency Management, Institute of Disaster Prevention, Langfang 065201, China
2 College of General Education, Hainan Vocational University, Haikou 570216, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(18), 2956; https://doi.org/10.3390/math12182956
Submission received: 1 August 2024 / Revised: 17 September 2024 / Accepted: 20 September 2024 / Published: 23 September 2024

Abstract

Accurately predicting carbon dioxide (CO2) emissions is crucial for environmental protection. Currently, there are two main issues with predicting CO2 emissions: (1) existing CO2 emission prediction models mainly rely on Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models, which can only model unidirectional temporal features, resulting in insufficient accuracy; (2) existing research on CO2 emissions mainly focuses on designing predictive models, without paying attention to model optimization, so the models cannot achieve their optimal performance. To address these issues, this paper proposes a framework for predicting CO2 emissions, called CGAOA-AttBiGRU. In this framework, the Attentional-Bidirectional Gated Recurrent Unit (AttBiGRU) is the prediction model: it uses BiGRU units to extract bidirectional temporal features from the data and adopts an attention mechanism to adaptively weight these features, thereby improving prediction accuracy. CGAOA is an improved Arithmetic Optimization Algorithm (AOA) used to optimize five key hyperparameters of AttBiGRU. We first validated the optimization performance of the improved CGAOA algorithm on 24 benchmark functions. Then, CGAOA was used to optimize AttBiGRU and compared with 12 other optimization algorithms. The results indicate that the AttBiGRU optimized by CGAOA has the best predictive performance.

1. Introduction

In the past few decades, rapid economic growth in China coupled with heavy reliance on high-carbon energy sources such as coal has led to a significant increase in CO2 emissions [1]. According to data from the International Energy Agency (IEA) and other relevant organizations, China’s CO2 emissions have surpassed those of the United States since 2006, making China the world’s largest emitter of CO2. The substantial emissions of CO2 into the atmosphere have significant adverse effects on humanity. On the one hand, excessive CO2 emissions can enhance the greenhouse effect in the Earth’s atmosphere, leading to global warming, rising sea levels, extreme droughts, increased floods and storms, and even ecosystem collapse, species extinction, and loss of human habitats [2,3]. On the other hand, excessive CO2 emissions can lead to human respiratory alkalosis, causing rapid breathing, elevated partial pressure of CO2, mental confusion, drowsiness, semi-consciousness to unconsciousness, and even seizures [4]. Moreover, the increase in CO2 contributes to ocean acidification, which damages the ability of marine organisms such as mollusks, crustaceans, and corals to form calcium carbonate skeletons and shells, thus affecting the structure and function of the entire marine ecosystem. The threat posed by CO2 has been widely recognized as one of the greatest environmental challenges facing the Earth [5]. In response to the pressing environmental situation, China has taken a series of actions and implemented numerous policies to reduce carbon emissions. For example, China launched a nationwide carbon market pilot in 2017 to establish a nationwide carbon trading system, driving enterprises to reduce carbon emissions and stimulating low-carbon development through carbon market mechanisms [6]. In addition, China pledged at the 21st United Nations Climate Conference that its CO2 emissions will no longer increase from 2030 onwards [7]. To achieve this goal, it is crucial to accurately predict future CO2 emissions so as to develop corresponding emission reduction policies [8].
To predict CO2 emissions, two key issues need to be addressed: the first is to design a prediction model, and the second is to optimize the hyperparameters of the model to achieve optimal performance.
At present, the prediction models for CO2 emissions are mainly based on deep learning models, including the Recurrent Neural Network (RNN), the Long Short-Term Memory Network (LSTM), and the Gated Recurrent Unit (GRU) [9,10,11]. However, RNN is unable to handle long time series prediction tasks because of the vanishing gradient problem. LSTM addresses this issue by employing memory cells and gate mechanisms to effectively retain information over long sequences, thereby resolving the gradient vanishing and long-term dependency problems of RNN [12]. However, the three gate mechanisms in LSTM make the model relatively complex, leading to more model parameters and higher computational cost. GRU simplifies LSTM by utilizing two gate mechanisms—the update gate and the reset gate—to control the selective forgetting and retention of historical information. Due to its simplified structure, GRU has fewer parameters and lower computational costs [13]. However, the traditional GRU can only extract unidirectional temporal features, resulting in an incomplete understanding of certain patterns in the sequence and limited feature extraction for complex sequence patterns. To address this issue, researchers have proposed the Bidirectional GRU (BiGRU). A BiGRU unit comprises a forward and a backward GRU unit, which extract forward and backward temporal features, respectively, enabling a more comprehensive capture of patterns and trends in the sequence [14,15]. Studies have shown that, compared to the original GRU and LSTM, BiGRU significantly improves prediction accuracy [16]. BiGRU has been widely applied in time series prediction, but there have been no reports of its application to CO2 emission prediction.
Due to the simplicity of the BiGRU structure and its ability to capture bidirectional patterns and trends in sequences, this study chooses BiGRU as the prediction model for CO2 emission prediction. Furthermore, to emphasize the most important bidirectional temporal features and thereby improve prediction performance, we add an attention mechanism to BiGRU. The attention mechanism adaptively calculates weights for the features, assigning greater weights to important features so that they play a larger role in prediction [17,18]. Recent studies have shown that adding attention mechanisms to models can significantly improve performance [19,20,21]. In this study, by adding an attention mechanism to BiGRU, the bidirectional temporal features extracted by BiGRU are utilized more effectively in prediction, thereby improving the model’s prediction performance. The final model is named AttBiGRU.
There are some important hyperparameters in the proposed AttBiGRU that have a significant impact on its performance. Research has shown that hyperparameters have a considerable impact on the performance of deep learning models, sometimes surpassing the impact of model selection itself [22,23,24]. Finding the optimal combination of hyperparameters for a deep learning model—known as hyperparameter optimization—first requires defining a search space, including the hyperparameters to be optimized and their corresponding ranges. Then, an optimization algorithm is used to find the best solution within this search space. Because there are many hyperparameters and each has a wide range, the search space contains an enormous number of hyperparameter combinations, all of which would need to be evaluated to find the best one [25,26,27]. Hyperparameter optimization has therefore always been a challenging task in deep learning applications.
When using AttBiGRU for predicting CO2 emissions, there are two types of hyperparameters: (1) hyperparameters related to the structure of AttBiGRU, such as the number of units in each BiGRU layer, the number of layers stacked in BiGRU, the dimension of attention in the attention layer, etc. (2) Hyperparameters related to AttBiGRU training, such as the step size of parameter updates during model training (i.e., learning rate), the proportion of neurons randomly dropped during training (denoted by dropout rate), the maximum number of iterations allowed during training (epochs), optimizer used in error backpropagation, loss function, etc. [28,29]. Due to the large number of hyperparameters and the large search space for each hyperparameter, optimizing all hyperparameters requires a huge amount of computation. This paper selects five of the most critical hyperparameters that have the greatest impact on the performance of AttBiGRU for optimization. The detailed descriptions of these selected hyperparameters can be found in Table 10.
The hyperparameter optimization of deep learning models is a non-convex problem that traditional gradient-based algorithms cannot solve. Recently, swarm intelligence optimization algorithms have been proven to be effective in hyperparameter optimization [30]. Swarm intelligence optimization algorithms are a class of optimization algorithms based on collective collaboration and heuristic search, which mimic collective behavior in nature to find the optimal or a near-optimal solution to a problem [31]. At present, the mainstream swarm intelligence optimization algorithms include GWO (Grey Wolf Optimizer), BWO (Beluga Whale Optimization), SCA (Sine Cosine Algorithm), SSA (Sparrow Search Algorithm), SOA (Seagull Optimization Algorithm), DE (Differential Evolution), MFO (Moth Flame Optimization), FPA (Flower Pollination Algorithm), and PSO (Particle Swarm Optimization) [32,33,34,35,36,37,38,39,40], among others. Although a large number of swarm intelligence optimization algorithms have been proposed, according to the “No Free Lunch” theorem, no optimization algorithm is omnipotent in solving all problems [41,42]: an algorithm that performs well on some problems will not perform as well on others, and no single algorithm performs excellently on all problems [43]. Therefore, it is still very important to improve existing algorithms to enhance their ability to optimize a certain type of problem.
AOA was proposed by Abualigah et al. in 2021 [44]. It has the advantages of strong global search capability, fast convergence speed, and good adaptability and robustness. Experiments on benchmark functions have shown that its performance surpasses GWO, BWO, SCA, SSA, SOA, DE, MFO, FPA, and PSO. However, the original AOA suffers from two issues: (1) an imbalance between the exploration and exploitation phases; (2) over-reliance on the historical optimal position during position updates in the exploration and exploitation phases, which makes the algorithm easily get stuck in local optima. To address these two issues, we propose an improved version of AOA, the chaotic mapping and Gaussian mutation Arithmetic Optimization Algorithm (CGAOA), which adds chaotic mapping and Gaussian mutation to the original algorithm.
CGAOA is used to optimize the five key hyperparameters of the carbon dioxide emission prediction model AttBiGRU (the optimized hyperparameters are described in detail in Table 10).
The contributions of this paper are as follows:
(1)
This paper proposes a CGAOA-AttBiGRU framework for CO2 emission prediction, in which AttBiGRU is a deep learning model for CO2 emission prediction and CGAOA is an improved AOA algorithm for hyperparameter optimization of AttBiGRU.
(2)
The proposed CGAOA-AttBiGRU was compared with ARMA, ARIMA, SVM, ANN, GRU, LSTM, and BiGRU in predicting CO2 emissions from four sectors in China. The results show that our proposed model is significantly superior to the comparison models.
This paper is structured as follows: Section 2 provides a review of CO2 emission prediction models and hyperparameter optimization methods. Section 3 introduces the CO2 emission prediction model AttBiGRU and the improved optimization algorithm CGAOA. Section 4 presents the comparative experiments of CGAOA, AOA, and nine other swarm intelligence optimization algorithms on 24 benchmark functions, and then describes the application of CGAOA and AttBiGRU to predicting CO2 emissions in China. Finally, Section 5 concludes the paper.

2. Literature Review

As mentioned earlier, predicting CO2 emissions involves two key steps: (1) establishing a prediction model and (2) optimizing the model. This section reviews current research on CO2 emission prediction models and on optimization methods separately.

2.1. Research on CO2 Emissions Prediction Models

There are various methods for predicting CO2 emissions. For example, Kavoosi et al. forecasted global CO2 emissions using linear and nonlinear equation models [45]. Lotfalipour et al. predicted Iran’s CO2 emissions for the following 10 years using the Autoregressive Integrated Moving Average (ARIMA) model [46]. ARIMA is a linear model, so its accuracy is insufficient when modeling nonlinear CO2 emission series. Sun et al. utilized support vector machines (SVM) to forecast CO2 emissions from three major industries in China [47]. Zhao et al. proposed a hybrid data sampling regression model, MIDAS, and combined it with BP neural networks to predict the CO2 emissions of the United States [48]. SVM and BP neural networks can be used for nonlinear modeling, but they do not consider the temporal variation of CO2 emissions. Wang et al. used RNN to predict CO2 emissions [49]. RNN is a time series prediction model that can describe the temporal variation patterns in a series. However, RNN suffers from a data-forgetting phenomenon in long-term series prediction. LSTM solves the data-forgetting problem of RNN in long-term time series prediction through its gate structure, and has also been introduced into CO2 emission prediction [50,51]. Subsequently, GRU—with a simpler structure than LSTM—was proposed and applied to predict CO2 emissions [52]. In recent years, researchers have combined attention techniques with LSTM and GRU to enhance their ability to focus on important features, thereby improving prediction accuracy. For example, Cao et al. proposed the TA-GRU model by adding an attention mechanism to the GRU model to predict CO2 emissions for 27 provincial-level administrative regions in China [53]. Research has shown that LSTM and GRU models with attention have better predictive abilities. The related works on CO2 emission prediction are summarized in Table 1.

2.2. Research on Models’ Hyperparameter Optimization

From Table 1, it can be seen that in the field of CO2 emission prediction, research has mainly focused on designing different prediction models, with little attention to model hyperparameter optimization. The hyperparameters in existing prediction models are set manually by researchers. Manual setting of hyperparameters is easily influenced by personal subjective opinions, as it is based on the experience and intuition of researchers. In particular, many hyperparameters are continuous, and manually chosen values are almost never the optimal ones [54]. That is to say, a model with manually set hyperparameters cannot achieve its optimal performance.
In other applications of deep learning models, hyperparameter optimization methods include grid search, random search, Bayesian optimization, and swarm intelligence optimization [55,56,57,58].
Grid search is an automatic hyperparameter optimization algorithm that is widely used in hyperparameter optimization for deep learning [59,60,61]. In the grid search algorithm, each hyperparameter is first discretized to generate a discrete search space. Then, an exhaustive search is performed within this space to determine the optimal combination of hyperparameters [62]. Grid search automates hyperparameter optimization and thus avoids the dependence of manual methods on expert experience. It is effective for low-dimensional optimization but becomes computationally expensive for high-dimensional optimization problems (when there are many hyperparameters). Additionally, in grid search, continuous-valued hyperparameters must be discretized, so not all possible hyperparameter values are included in the search space. Therefore, grid search can only find suboptimal solutions [63,64].
Random search is a commonly used optimization algorithm [65]. It randomly samples the hyperparameter space and evaluates each sampling point, repeating the process to find the best combination of hyperparameters. Its time complexity is lower than grid search, but due to random selection, it is almost impossible to find the optimal combination of hyperparameters. Furthermore, the random search algorithm cannot learn from historical iterations, resulting in unstable results [66,67].
The Bayesian optimization method uses probability models to “learn” from previous iterations and guides the search towards the optimal hyperparameters in the search space [68]. Compared with memory-less grid search and random search methods, Bayesian optimization can find better parameters in fewer iterations. However, when the search space of hyperparameters is complex, Bayesian optimization tends to focus on the most promising region during the exploration process, which may lead to being trapped in local optima [69,70].
In addition to the above optimization algorithms, there is a class of population-based stochastic optimization algorithms called swarm intelligence (SI) optimization algorithms. These population-based optimization algorithms simulate the optimization process of various animal or object populations to find the optimal solution of the model [71]. These SI algorithms are simple, flexible, efficient, and have a low dependence on the prior knowledge of the problem, and are widely used in the optimization of deep learning models in other application fields, such as wind speed forecasting [72,73,74], image recognition [75,76], medical diagnosis [77,78], water resources management [79,80], stock price forecasting [81,82], solar radiation forecasting [83,84], and fault diagnosis [85]. However, no application of swarm intelligence optimization algorithms has been reported in the field of CO2 emissions forecasting.

3. Methods

This paper proposes a framework for CO2 emission prediction, named CGAOA-AttBiGRU. In this framework, AttBiGRU is a deep learning model used for predicting CO2 emissions, and CGAOA is a swarm intelligence optimization algorithm used to optimize five important hyperparameters of AttBiGRU. This section introduces the prediction model and the optimization algorithm in detail.

3.1. CO2 Emissions Prediction Model: AttBiGRU

The main network structure of the proposed AttBiGRU adopts BiGRU, which is a variant of GRU. A GRU unit includes two gate structures: a reset gate and an update gate. Compared with LSTM, GRU has a simpler structure and runs faster, so it can be used to build larger-scale prediction models. Figure 1 shows the internal structure of a GRU unit.
The calculation of GRU is shown in Formulas (1)–(4).
$z_t = \sigma(W_z [h_{t-1}, x_t] + b_z)$ (1)
$r_t = \sigma(W_r [h_{t-1}, x_t] + b_r)$ (2)
$\tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}, x_t] + b_h)$ (3)
$h_t = (1 - z_t) \odot \tilde{h}_t + z_t \odot h_{t-1}$ (4)
where $z_t$ is the update gate, $r_t$ is the reset gate, $\tilde{h}_t$ is the candidate hidden state, and $h_t$ is the hidden state. $W_z$, $W_r$, and $W_h$ are learnable weight matrices; $b_z$, $b_r$, and $b_h$ are learnable biases.
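For concreteness, a minimal NumPy sketch of one GRU forward step following Equations (1)–(4) is given below; the function name, weight shapes, and concatenation layout are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU step following Eqs. (1)-(4); each weight matrix acts on [h_{t-1}, x_t]."""
    concat = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    z_t = sigmoid(Wz @ concat + bz)                  # update gate, Eq. (1)
    r_t = sigmoid(Wr @ concat + br)                  # reset gate, Eq. (2)
    concat_r = np.concatenate([r_t * h_prev, x_t])   # reset applied to the history
    h_tilde = np.tanh(Wh @ concat_r + bh)            # candidate hidden state, Eq. (3)
    h_t = (1.0 - z_t) * h_tilde + z_t * h_prev       # new hidden state, Eq. (4)
    return h_t
```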
A BiGRU unit contains a forward GRU unit and a backward GRU unit, which are, respectively, used to extract forward and backward temporal features from sequences. BiGRU has stronger sequence modeling ability than basic GRU; therefore, this paper chooses it as the basic unit of the predicting model. Furthermore, we have added an attention mechanism to the model to adaptively weight the bidirectional temporal features extracted by BiGRU, so as to improve its performance. The specific structure of our proposed model, called AttBiGRU, is shown in Figure 2.
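As a rough illustration of this architecture (not the authors' exact configuration), a single-layer BiGRU followed by a simple attention-weighted pooling can be sketched in Keras as follows; the layer sizes, the additive attention formulation, and all names are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_attbigru(timesteps=12, features=1, units=64):
    """Sketch of a BiGRU model whose bidirectional temporal features are attention-weighted."""
    inputs = layers.Input(shape=(timesteps, features))
    # BiGRU returns the full sequence of forward/backward temporal features.
    x = layers.Bidirectional(layers.GRU(units, return_sequences=True))(inputs)
    # Simple additive attention: score each time step, normalize, take the weighted sum.
    scores = layers.Dense(1, activation="tanh")(x)            # (batch, timesteps, 1)
    weights = layers.Softmax(axis=1)(scores)                  # attention weights over time
    context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])
    outputs = layers.Dense(1)(context)                        # next-day CO2 emission
    return models.Model(inputs, outputs)

model = build_attbigru()
model.compile(optimizer="adagrad", loss="mse")
```

A stacked variant would simply add further Bidirectional GRU layers (with return_sequences=True) before the attention block.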

3.2. CGAOA

The proposed AttBiGRU model contains many hyperparameters that can affect its performance. Therefore, an optimization algorithm is needed to find the optimal hyperparameters. In this paper, we improved the arithmetic optimization algorithm (AOA) by adding chaotic mapping and Gaussian mutation and proposed CGAOA. CGAOA is then used to optimize AttBiGRU. In this section, the original AOA and our improved CGAOA will be introduced separately.

3.2.1. The Original AOA

The arithmetic optimization algorithm (AOA) was proposed by Abualigah et al. in 2021 [44]. It has the advantages of strong global search ability, fast convergence speed, good adaptability, and robustness. Experiments on benchmark functions show that its performance exceeds that of many optimization algorithms [40]. AOA is a population-based metaheuristic that explores and exploits the search space using the four basic arithmetic operators—addition (+), subtraction (−), multiplication (×), and division (÷)—and can solve optimization problems without calculating derivatives. The algorithm primarily consists of three stages: initialization, exploration, and exploitation. These stages are elaborated upon as follows:
(1)
Initialization phase
In AOA, the first phase is to generate a set of candidate solutions (X) randomly, as is shown in Equation (5).
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix} \quad (5)$$
where $d$ represents the number of hyperparameters to be optimized, $n$ is the number of solutions, and $x_{i,j}$ indicates the position of the $i$-th solution in the $j$-th dimension.
Then, exploration and exploitation are determined by the math optimizer accelerated (MOA) function, which is shown in Equation (6).
$MOA(t) = Min + t \times \dfrac{Max - Min}{T}$ (6)
where $MOA(t)$ denotes the function value at the $t$-th iteration, $T$ is the maximum number of iterations, and $Min$ and $Max$ represent the minimum and maximum values of the MOA function, respectively.
In each iteration, a random number $r_1$ is generated first. If $r_1$ is larger than $MOA(t)$, the position is updated using the exploration-phase formula; otherwise, it is updated using the exploitation-phase formula.
To balance exploration and exploitation, AOA adopts a parameter called the math optimizer probability (MOP), which is defined in Equation (7).
$MOP(t) = 1 - \dfrac{t^{1/\alpha}}{T^{1/\alpha}}$ (7)
where $\alpha$ is a sensitive parameter that determines the level of exploitative search during the optimization. In the original work [44], $\alpha = 5$ was found to perform best.
(2)
Exploration phase
In the exploration phase, AOA provides two methods for updating positions: one using the division (D) operator and the other using the multiplication (M) operator. The position update formulas for this phase are shown in Equation (8).
$$x_{i,j}(t+1) = \begin{cases} best(x_j) \div (MOP(t) + \epsilon) \times ((UB_j - LB_j) \times \mu + LB_j), & r_2 < 0.5 \\ best(x_j) \times MOP(t) \times ((UB_j - LB_j) \times \mu + LB_j), & \text{otherwise} \end{cases} \quad (8)$$
where $r_2$ is a random number in (0, 1), $x_{i,j}(t+1)$ represents the $i$-th solution in the $j$-th dimension at the $(t+1)$-th iteration, and $best(x_j)$ denotes the $j$-th dimension of the best solution obtained so far. $\epsilon$ is a small number used to avoid division by zero. $UB_j$ and $LB_j$ represent the upper and lower boundaries of the $j$-th hyperparameter to be optimized, respectively. $\mu$ is a control parameter used to adjust the search process and is typically set to 0.499 [44].
(3)
Exploitation phase
In the Exploitation phase, AOA uses addition (A) and subtraction (S) to update positions, as shown in Equation (9).
$$x_{i,j}(t+1) = \begin{cases} best(x_j) - MOP(t) \times ((UB_j - LB_j) \times \mu + LB_j), & r_3 < 0.5 \\ best(x_j) + MOP(t) \times ((UB_j - LB_j) \times \mu + LB_j), & \text{otherwise} \end{cases} \quad (9)$$
where $r_3$ is a random number in (0, 1).
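To make the interplay of Equations (5)–(9) concrete, the sketch below implements one AOA iteration over a candidate population in NumPy; the function signature, the default values of Min, Max, α, and μ, and the final boundary clipping are illustrative assumptions.

```python
import numpy as np

def aoa_step(X, best, t, T, lb, ub, Min=0.2, Max=1.0, alpha=5, mu=0.499, eps=1e-12):
    """One AOA iteration: Eq. (6) MOA selects the phase, Eq. (7) MOP scales the step."""
    n, d = X.shape
    moa = Min + t * (Max - Min) / T                            # Eq. (6)
    mop = 1.0 - (t ** (1.0 / alpha)) / (T ** (1.0 / alpha))    # Eq. (7)
    X_new = X.copy()
    for i in range(n):
        for j in range(d):
            r1, r2, r3 = np.random.rand(3)
            step = (ub[j] - lb[j]) * mu + lb[j]
            if r1 > moa:                       # exploration: division or multiplication, Eq. (8)
                if r2 < 0.5:
                    X_new[i, j] = best[j] / (mop + eps) * step
                else:
                    X_new[i, j] = best[j] * mop * step
            else:                              # exploitation: subtraction or addition, Eq. (9)
                if r3 < 0.5:
                    X_new[i, j] = best[j] - mop * step
                else:
                    X_new[i, j] = best[j] + mop * step
    return np.clip(X_new, lb, ub)              # keep solutions inside the search space
```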

3.2.2. Proposed CGAOA

The original AOA has advantages such as a simple structure, few parameters, and low computational cost, but it also has shortcomings: an imbalance between the exploration and exploitation phases, and a tendency to fall into local optima. To address these two issues, we add two mechanisms to the basic AOA. Our improvements are detailed below.
(1)
Chaotic mapping mechanism
In the original AOA, whether the algorithm enters the exploration phase or the exploitation phase is determined by the value of MOA(t) and a random number $r_1$ that varies between 0 and 1. When $r_1$ is greater than MOA(t), the algorithm enters the exploration phase; otherwise, it enters the exploitation phase. However, MOA is a linearly increasing function of the iteration number t, so its value is small in the early stages of iteration and large in the later stages. Thus, the original AOA is likely to enter the exploration phase in the early stages of iteration and the exploitation phase in the later stages, which leads to an imbalance between exploration and exploitation across different iteration stages. In addition, the MOP in the original AOA is a monotonically decreasing function of the iteration number t, which limits the diversity of position updates in the algorithm. To mitigate these two drawbacks, we use a chaotic mapping mechanism in MOA and MOP.
Chaos is a deterministic stochastic dynamic system, and chaotic mappings can be considered as a source of randomness. Adding chaotic mapping can effectively introduce diversity into the search process by blending determinism and randomness [86]. Some researchers have integrated chaotic mapping mechanisms into various optimization algorithms to enhance random diversity, thereby improving the ability to search for optimal or near-optimal solutions in complex multimodal scenarios.
In this paper, we add the logistic chaotic mapping mechanism into MOA and MOP to balance the exploration and exploitation phase, and to introduce uncertainty and fluctuation into the position updates.
The calculation formula of logistic chaotic mapping used in this paper is shown in Equation (10):
$p_{t+1} = a \cdot p_t (1 - p_t)$ (10)
where $t$ represents the iteration number, $p_t$ denotes the $t$-th chaotic number, and $p_0$ is a random number in [0, 1]. $a$ is set to 4.
After adding the logistic chaotic mapping, the new formulas of MOA and MOP are shown in Equations (11) and (12).
$MOA(t) = a \cdot p_t (1 - p_t) \times \left(\dfrac{t}{T}\right)^{1/8}$ (11)
$MOP(t) = a \cdot p_t (1 - p_t) \times \left(1 - \dfrac{t}{T}\right)^{1/2}$ (12)
where the value of a is 4, t represents the current iteration number, and T represents the maximum number of iterations. The power of 1/8 in Formula (11) is determined experimentally. We conducted multiple experiments and found that the algorithm performs best when the power in MOA is 1/8.
Figure 3 compares MOA and MOP with and without the logistic chaotic mapping mechanism. It can be seen that the MOA and MOP curves without logistic chaotic mapping exhibit fixed and monotonic changes. In comparison, the MOA and MOP curves with logistic chaotic mapping exhibit good randomness, which increases the diversity of position updates and gives the algorithm better exploration and exploitation capabilities.
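The chaos-driven schedules of Equations (10)–(12) can be sketched as follows; the function name and the random seeding of $p_0$ are assumptions.

```python
import numpy as np

def chaotic_moa_mop(T, a=4.0, seed=None):
    """Generate MOA/MOP sequences driven by the logistic map, Eqs. (10)-(12)."""
    rng = np.random.default_rng(seed)
    p = rng.random()                                  # p_0 drawn from [0, 1]
    moa, mop = np.empty(T), np.empty(T)
    for t in range(1, T + 1):
        chaos = a * p * (1.0 - p)                     # logistic chaotic term, Eq. (10)
        moa[t - 1] = chaos * (t / T) ** (1.0 / 8.0)          # Eq. (11)
        mop[t - 1] = chaos * (1.0 - t / T) ** (1.0 / 2.0)    # Eq. (12)
        p = chaos                                     # carry the chaotic state forward
    return moa, mop
```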
(2)
Gaussian mutation mechanism
The basic AOA algorithm also has a problem of over-reliance on the historical best position during the position update process. This will lead to the algorithm hovering around the local optimum during the optimization process, rather than exploring the entire search space to find the global optimum. To address this issue, Çelik E [87] introduced the Gaussian mutation mechanism into the AOA algorithm. Inspired by his work, we introduced a Gaussian mutation mechanism after the exploration and exploitation phase of AOA. The newly added position update formula is as follows in Equation (13).
$$x_{i,j}(t+1) = \begin{cases} 2 \times MOP \times f(u) \times x_{i,j}(t), & r_4 \le 0.5 \\ 2 \times MOP \times f(u) + x_{i,j}(t), & r_4 > 0.5 \end{cases} \quad (13)$$
where $r_4$ is a newly generated random number in the range 0 to 1, $f(u)$ represents the probability density function of a Gaussian distribution, and $u$ is a random variable that follows a standard Gaussian distribution with mean 0 and standard deviation 1. $f(u)$ is calculated as shown in Equation (14).
$f(u) = \dfrac{1}{\sqrt{2\pi}} e^{-\frac{u^2}{2}}$ (14)
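A brief sketch of how the Gaussian mutation of Equations (13) and (14) might be applied to a solution vector; the vectorized form and the final boundary clipping are our assumptions.

```python
import numpy as np

def gaussian_mutation(x, mop, lb, ub):
    """Gaussian mutation of one solution x, following Eqs. (13)-(14)."""
    u = np.random.standard_normal(x.shape)                  # u ~ N(0, 1)
    f_u = np.exp(-u ** 2 / 2.0) / np.sqrt(2.0 * np.pi)      # Gaussian density, Eq. (14)
    r4 = np.random.rand(*x.shape)
    mutated = np.where(r4 <= 0.5,
                       2.0 * mop * f_u * x,                 # multiplicative branch of Eq. (13)
                       2.0 * mop * f_u + x)                 # additive branch of Eq. (13)
    return np.clip(mutated, lb, ub)                         # keep the solution in bounds
```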
The flowchart of the final proposed CGAOA is shown in Figure 4, where the purple sections represent our improvements.

3.2.3. CGAOA Computational Complexity

Analyzing the computational complexity of an optimization algorithm is an essential part of evaluating it, because improved performance should not come at an excessive cost in time and memory. The computational complexity mainly depends on three quantities: the problem dimension d, the population size n, and the maximum number of iterations T.
Firstly, the time complexity of initializing n search agents in a search space of dimension d is O(2 × n × d), and the time complexity of updating the vectors of n search agents over T iterations is O(T × 2 × n × d). Secondly, the computational complexity of MOA and MOP after introducing the chaotic mapping mechanism is O((n − 1)²). In the exploration stage of CGAOA, the time complexity of all individual position updates and boundary control strategies is O(2 × n × d); in the exploitation stage, it is also O(2 × n × d). In summary, the overall complexity of CGAOA is O(T × 2 × n × d) + O((n − 1)²) + O(2 × n × d) + O(2 × n × d).

4. Results and Discussion

4.1. CGAOA Performance Verification Experiments

In this section, experiments are conducted to test the optimization ability of the proposed CGAOA, including its exploitation ability, exploration ability, and local-optimum avoidance ability. These experiments include ablation experiments and comparative experiments with nine state-of-the-art metaheuristic algorithms. All experiments are conducted on a computer equipped with an Intel(R) Xeon(R) CPU E5-2686 v4 12-core processor and an NVIDIA GeForce RTX 3060 Ti graphics card with 8 GB VRAM. To make the comparisons fair, all algorithms share the same experimental settings: the maximum number of iterations (T) is set to 1000 and the population size (n) is set to 50.

4.1.1. Benchmark Functions

To evaluate the performance of CGAOA, we conducted comparative experiments on 24 different benchmark functions. These 24 test functions include nine unimodal benchmark functions (F1–F9, as shown in Table 2) used to assess the algorithm’s exploitation capability, nine multimodal benchmark functions (F10–F18, as shown in Table 3) used to evaluate the algorithm’s exploration capability and six composition functions from CEC2017 (F19–F24, as shown in Table 4) used to assess the local optimum avoidance capability. Range represents the boundary of variables, and F(min) represents the optimal value.
In this section, each algorithm is executed 30 times on each benchmark function to mitigate the influence of random factors. In this paper, the Friedman test method was used to sort and evaluate the results of all algorithms on the benchmark functions.
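As an illustration only (the data layout and the algorithm count are placeholders), the Friedman average ranks reported later could be computed with SciPy as follows.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# results[f, a]: mean fitness of algorithm a on benchmark function f (lower is better).
results = np.random.rand(24, 11)             # placeholder: 24 functions x 11 algorithms

# Rank the algorithms within each function (rank 1 = best), then average over functions.
ranks = np.vstack([rankdata(row) for row in results])
avg_rank = ranks.mean(axis=0)                # the "Avg" value used for the final ranking

# Friedman test of whether the algorithms differ significantly across the functions.
stat, p_value = friedmanchisquare(*[results[:, a] for a in range(results.shape[1])])
```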

4.1.2. Ablation Analysis of the CGAOA

This paper adds a logistic chaotic mapping mechanism and a Gaussian mutation mechanism to the basic AOA to obtain CGAOA. This section presents ablation experiments to verify how much each added mechanism improves the performance of the basic AOA. Table 5 lists the AOA variants with one or both mechanisms, in which “1” means the mechanism is added and “0” indicates the opposite. These AOA variants were compared on the 24 benchmark functions described earlier.
The experimental results are given in Table 6, where Avg indicates an algorithm’s average Friedman-test rank over the 24 benchmark functions; a lower Avg value indicates better performance. “+/−/=” summarizes whether CGAOA performs better than, worse than, or equally to the other AOA variants, and Rank represents the final ranking of each algorithm.
From Table 6, it can be seen that the performance of the basic AOA is the worst. Adding either the logistic chaotic mapping or the Gaussian mutation mechanism improves the performance of AOA. CGAOA ranks first, which means that adding both the logistic chaotic mapping and the Gaussian mutation mechanisms to AOA effectively improves its optimization ability.

4.1.3. Comparison with Other Algorithms

To test the performance of the proposed CGAOA, it was compared with nine other well-known algorithms. The comparative algorithms used in this paper are as follows:
  • Grey wolf optimizer (GWO) [30]
  • Beluga whale optimization (BWO) [31]
  • Sine cosine algorithm (SCA) [32]
  • Sparrow search algorithm (SSA) [33]
  • Seagull optimization algorithm (SOA) [34]
  • Differential evolution (DE) [35]
  • Moth flame optimization (MFO) [36]
  • Flower pollination algorithm (FPA) [37]
  • Particle swarm optimization (PSO) [38]
The parameter settings of the above comparative algorithms are shown in Table 7.
All algorithms are tested in the same environment, with a maximum of 1000 iterations and a population size of 50. To reduce the impact of randomness on the experimental results, each algorithm was run independently 30 times on each benchmark function, and the mean (Aver) and standard deviation (Std) of the 30 runs on each test function were calculated (presented in Table 8, with the best results in bold); a smaller Aver indicates better performance.
According to the Aver values in Table 8, CGAOA ranks first on 21 of the 24 functions. Based on the Aver values in Table 8, the Friedman test is then used to rank the fitness of all algorithms on the benchmark functions. The results of the Friedman test are presented in Figure 5, with the smallest average ranking indicating the best performance. CGAOA has the lowest average ranking on the unimodal, multimodal, and composition test functions, indicating that its optimization performance outperforms the competing algorithms on all three function types.
Meanwhile, the Wilcoxon rank-sum test is used to assess the significance of the differences between CGAOA and the competing algorithms. Table 9 reports the p-values from the Wilcoxon rank-sum test, where a p-value below 0.05 indicates a statistically significant advantage of CGAOA over its competitor; the non-significant results (p-value > 0.05) are highlighted in bold. The majority of the p-values are lower than 0.05, indicating that the superiority of CGAOA on the benchmark functions is statistically significant.
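For reference, such a pairwise comparison over the 30 independent runs could be performed with SciPy as sketched below; the array names are placeholders.

```python
import numpy as np
from scipy.stats import ranksums

cgaoa_runs = np.random.rand(30)       # placeholder: 30 final fitness values of CGAOA
rival_runs = np.random.rand(30)       # placeholder: 30 final fitness values of a competitor

stat, p_value = ranksums(cgaoa_runs, rival_runs)   # Wilcoxon rank-sum test
significant = p_value < 0.05                       # significant difference at the 5% level
```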
Figure 6, Figure 7 and Figure 8 show the fitness curves of each algorithm on each benchmark function during the iteration process. In the vast majority of benchmark functions, the fitness of the proposed CGAOA decreases faster and reaches the lowest final value, indicating that, compared with the competing algorithms, CGAOA converges faster and finds better solutions.
In summary, the results on 24 benchmark functions indicate that the proposed CGAOA algorithm is better than the comparison algorithms.

4.2. CGAOA-AttBiGRU Framework for CO2 Emission Forecasting

The previous experiments were conducted on benchmark functions. In this section, we apply the proposed CGAOA to a practical application: a framework for CO2 emission forecasting named CGAOA-AttBiGRU, in which AttBiGRU is a deep learning model for CO2 emission forecasting and CGAOA is used to optimize five hyperparameters of AttBiGRU.
For hyperparameter optimization of deep learning models, the optimization space is asymmetric and non-convex, and existing grid search, Bayesian optimization, and random search methods have limitations when dealing with such problems. The proposed CGAOA algorithm is well suited to optimizing deep learning models, mainly because the chaotic mapping increases the diversity of the initial hyperparameter values and of the hyperparameter updates, and the Gaussian mutation increases the amplitude of the hyperparameter updates. Combined with these two strategies, the AOA algorithm can converge quickly and find a near-globally-optimal solution.

4.2.1. CO2 Emission Data

The CO2 emission data in this section come from four sectors in China: the power, industry, transport, and residential sectors. The data cover the period from 1 January 2019 to 31 May 2023, and each sector contains 1612 sampling points over this period. Figure 9 shows all the data used in this section; the x-axis represents the date order starting from 1 January 2019, and the y-axis represents CO2 emissions measured in millions of metric tons.

4.2.2. Sample Making

In this paper, we use 12 consecutive days of data to predict the CO2 emission of the next day. The data samples are constructed using a sliding-window approach, where each sample consists of 12 data points as input and 1 data point as output (as illustrated in Figure 10, where the green data points represent the input and the red data point represents the output). With this method, a total of 1599 samples are obtained, of which the first 60% are used as the training set, the next 20% as the validation set, and the remaining 20% as the test set.
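A minimal sketch of this sliding-window sample construction and the 60/20/20 split; the series is a placeholder and the helper name is ours.

```python
import numpy as np

def make_samples(series, window=12):
    """Slide a 12-day window over the series; each window predicts the following day."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])          # 12 input days
        y.append(series[i + window])            # 1 output day
    return np.array(X), np.array(y)

series = np.random.rand(1612)                   # placeholder for one sector's daily emissions
X, y = make_samples(series)                     # yields len(series) - 12 samples
n_train, n_val = int(0.6 * len(X)), int(0.2 * len(X))
X_train, y_train = X[:n_train], y[:n_train]
X_val, y_val = X[n_train:n_train + n_val], y[n_train:n_train + n_val]
X_test, y_test = X[n_train + n_val:], y[n_train + n_val:]
```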

4.2.3. CGAOA-AttBiGRU Flowchart

When using AttBiGRU for CO2 emission forecasting, five hyperparameters have the greatest impact on the predictive performance of the AttBiGRU, including: (1) the number of units in each BiGRU layer (unit); (2) the proportion of neurons randomly dropped during training (dropout_rate); (3) the number of samples used in each training iteration (batch_size); (4) the step size of parameter updates during model training (learning_rate); and (5) the hyperparameters of the Attention layer (attention_column).
The improved CGAOA is used to optimize these five hyperparameters. Firstly, the upper and lower boundaries of these five hyperparameters should be given to form the search space. The search space for these five hyperparameters is shown in Table 10.
Secondly, CGAOA is initialized with a maximum number of iterations T = 200, a dimension d = 5, and a population size n = 30. Then, the loss of AttBiGRU is used as the fitness function (in this paper, the loss function is RMSE), and the solver of AttBiGRU is set to AdaGrad. Finally, the CGAOA algorithm searches for the optimal hyperparameters of AttBiGRU. Figure 11 shows the flowchart of the entire CGAOA-AttBiGRU framework.
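The coupling between CGAOA and AttBiGRU can be sketched as a fitness function that decodes one candidate solution, trains the model, and returns its RMSE; the hyperparameter decoding, the use of the validation split for the fitness value, the epoch count, and the build_model interface are simplified assumptions.

```python
import numpy as np

def fitness(hparams, build_model, X_train, y_train, X_val, y_val):
    """Decode one CGAOA solution into hyperparameters, train AttBiGRU, return RMSE."""
    units, dropout_rate, batch_size, learning_rate, attention_column = hparams
    model = build_model(units=int(round(units)),
                        dropout_rate=float(dropout_rate),
                        learning_rate=float(learning_rate),
                        attention_column=int(round(attention_column)))
    model.fit(X_train, y_train,
              batch_size=int(round(batch_size)),
              epochs=50, verbose=0,
              validation_data=(X_val, y_val))
    pred = model.predict(X_val, verbose=0).ravel()
    return float(np.sqrt(np.mean((y_val - pred) ** 2)))   # RMSE used as the fitness value
```

CGAOA then iterates its position updates over the five-dimensional search space of Table 10, calling this fitness function for every candidate solution and retaining the best one.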

4.2.4. Evaluation Metrics

In this section, three indicators are used for model evaluation: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). The calculation formulas for these indicators are as follows:
$MAE = \dfrac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
$RMSE = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
$MAPE = \dfrac{1}{n} \sum_{i=1}^{n} \left| \dfrac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$
where $n$ is the number of test samples, and $y_i$ and $\hat{y}_i$ are the true and predicted values of sample $i$, respectively.
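The three metrics can be computed directly from the test-set predictions, as in the brief sketch below.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, RMSE, and MAPE as defined above (y_true must be nonzero for MAPE)."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y_true)) * 100.0
    return mae, rmse, mape
```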

4.2.5. Comparison of AttBiGRU Optimized by Various Algorithms

In this section, in addition to CGAOA, we also used 12 other optimization algorithms to optimize the proposed CO2 emission prediction model AttBiGRU and compared the optimization performance of the different algorithms. The comparative algorithms include three traditional optimization algorithms (Grid Search, Random Search, and Bayesian optimization) and nine state-of-the-art swarm intelligence optimization algorithms (GWO, BWO, SCA, SSA, SOA, DE, MFO, FPA, and PSO). To minimize the error arising from experimental contingency, each comparative experiment was repeated 30 times.
The experimental dataset consists of the CO2 emission data of the four sectors in China. The results are shown in Table 11, Table 12, Table 13 and Table 14, where the best results are in bold, and unit, dr, bs, lr and ac are the optimal hyperparameters found by each algorithm, corresponding to unit, dropout_rate, batch_size, learning_rate, and attention_column described in Table 10, respectively.
Based on the results from Table 11, Table 12, Table 13 and Table 14, it is evident that the performances of traditional optimization algorithms (i.e., grid search, random search, Bayesian) are significantly lower than those of swarm intelligence optimization algorithms. Among the 10 swarm intelligence optimization algorithms, our improved CGAOA is the best.
It can also be seen that using different optimization algorithms to optimize the same prediction model, AttBiGRU, leads to significantly different performance. This indicates that in deep learning applications, model optimization is important, and may even be more important than model selection.

4.2.6. Comparison with Other Models

In the previous section, we compared the performance of AttBiGRU optimized by different algorithms and found that the model optimized by the CGAOA algorithm performed the best. In this section, we compare the optimal AttBiGRU with other machine learning and deep learning models, including SVM, ANN, GRU, LSTM, and BiGRU, as well as with two statistical models, ARMA and ARIMA. For a fair comparison, all models were optimized using the proposed CGAOA, and each experiment was repeated 30 times. The evaluation metrics in Table 15 are the averages over the 30 runs. The results of the various models are shown in Table 15 (the best results are in bold), where Rank indicates the final rank of a model. It can be seen that AttBiGRU is significantly better than the comparison models in all four sectors.
Figure 12, Figure 13, Figure 14 and Figure 15 show the comparison between the predicted value and the actual value of each model in the test set in four sectors. Among them, the x-axis represents time, the y-axis represents CO2 emission, the red area represents that the predicted value is greater than the actual value, and the blue area represents that the predicted value is less than the actual value. From these figures, we can see that our proposed AttBiGRU has the best performance, indicating that our model has the smallest prediction error.

5. Conclusions

This paper proposed a deep learning framework, CGAOA-AttBiGRU, for CO2 emission prediction, in which AttBiGRU is a deep learning model used for CO2 emission prediction and CGAOA is an optimization algorithm used to optimize AttBiGRU. We first conducted experiments on 24 benchmark functions to verify the performance of the proposed CGAOA, including ablation experiments and comparisons with nine other popular metaheuristic algorithms. The results show that the improved CGAOA is superior to the comparison algorithms on the 24 benchmark functions. Then, AttBiGRU was used to predict the CO2 emissions of four sectors in China. During this process, CGAOA, nine other metaheuristic optimization algorithms, and three traditional optimization algorithms were each used to optimize AttBiGRU. The results indicate that the AttBiGRU optimized by CGAOA has the best performance. Finally, the optimized AttBiGRU model was compared with seven statistical, machine learning, and deep learning models: ARMA, ARIMA, ANN, SVM, GRU, LSTM, and BiGRU. The results indicate that the proposed AttBiGRU has the best predictive performance in all four sectors.

Author Contributions

Conceptualization, H.L. and Y.W.; methodology, D.T. and H.W.; writing—original draft: Y.C.; formal analysis, Y.W.; software: H.W.; writing—review and editing: H.L., Y.W., D.T., Y.C. and H.W.; supervision, D.T.; funding acquisition, H.L. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science Research Project of Hebei Education Department (ZC2024028), Natural Science Foundation of Hebei Province (D2023512004), Langfang City science and Technology support plan Project (2023011105), Science and Technology Innovation Program for Postgraduate students in IDP subsidized by Fundamental Research Funds for the Central Universities (ZY20240339), Langfang City science and Technology support plan Project (23011064).

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Aichele, R.; Felbermayr, G. Kyoto and carbon leakage: An empirical analysis of the carbon content of bilateral trade. Rev. Econ. Stat. 2015, 97, 104–115. [Google Scholar] [CrossRef]
  2. Chiroma, H.; Abdul-Kareem, S.; Khan, A.; Nawi, N.M.; Gital, A.Y.U.; Shuib, L.; Abubakar, A.I.; Rahman, M.Z.; Herawan, T. Global warming: Predicting OPEC carbon dioxide emissions from petroleum consumption using neural network and hybrid cuckoo search algorithm. PLoS ONE 2015, 10, e0136140. [Google Scholar] [CrossRef] [PubMed]
  3. Chen, W. The costs of mitigating carbon emissions in China: Findings from China MARKAL-MACRO modeling. Energy Policy 2005, 33, 885–896. [Google Scholar] [CrossRef]
  4. Deschênes, O.; Greenstone, M. The economic impacts of climate change: Evidence from agricultural output and random fluctuations in weather. Am. Econ. Rev. 2007, 97, 354–385. [Google Scholar] [CrossRef]
  5. van der Gaast, W.; Sikkema, R.; Vohrer, M. The contribution of forest carbon credit projects to addressing the climate change challenge. Clim. Policy 2018, 18, 42–48. [Google Scholar] [CrossRef]
  6. Anjos, M.F.; Feijoo, F.; Sankaranarayanan, S. A multinational carbon-credit market integrating distinct national carbon allowance strategies. Appl. Energy 2022, 319, 119181. [Google Scholar] [CrossRef]
  7. Liu, L.; Chen, C.; Zhao, Y.; Zhao, E. China’s carbon-emissions trading: Overview, challenges and future. Renew. Sustain. Energy Rev. 2015, 49, 254–266. [Google Scholar] [CrossRef]
  8. Zhang, Y.J. The impact of financial development on carbon emissions: An empirical analysis in China. Energy Policy 2011, 39, 2197–2203. [Google Scholar] [CrossRef]
  9. Koutnik, J.; Greff, K.; Gomez, F.; Schmidhuber, J. A clockwork RNN. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 22–24 June 2014; pp. 1863–1871. [Google Scholar]
  10. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 3, 1235–1270. [Google Scholar] [CrossRef]
  11. Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600. [Google Scholar]
  12. Li, Y.; Zhu, Z.; Kong, D.; Han, H.; Zhao, Y. EA-LSTM: Evolutionary attention-based LSTM for time series prediction. Knowl.-Based Syst. 2019, 181, 104785. [Google Scholar] [CrossRef]
  13. Hussain, B.; Afzal, M.K.; Ahmad, S.; Mostafa, A.M. Intelligent traffic flow prediction using optimized GRU model. IEEE Access 2021, 9, 100736–100746. [Google Scholar] [CrossRef]
  14. Yan, J.; Liu, J.; Yu, Y.; Xu, H. Water quality prediction in the luan river based on 1-drcnn and bigru hybrid neural network model. Water 2021, 13, 1273. [Google Scholar] [CrossRef]
  15. Zhao, X.; Kang, H.; Feng, T.; Meng, C.; Nie, Z. A hybrid model based on LFM and BiGRU toward research paper recommendation. IEEE Access 2020, 8, 188628–188640. [Google Scholar] [CrossRef]
  16. Zhi, Y.; Bao, Z.; Zhang, S.; He, R. BiGRU based online multi-modal driving maneuvers and trajectory prediction. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2021, 235, 3431–3441. [Google Scholar] [CrossRef]
  17. Liu, J.; Yang, Y.; Lv, S.; Wang, J.; Chen, H. Attention-based BiGRU-CNN for Chinese question classification. J. Ambient Intell. Humaniz. Comput. 2019, 1–12. [Google Scholar] [CrossRef]
  18. Bao, K.; Bi, J.; Ma, R.; Sun, Y.; Zhang, W.; Wang, Y. A Spatial-Reduction Attention-Based BiGRU Network for Water Level Prediction. Water 2023, 15, 1306. [Google Scholar] [CrossRef]
  19. Zhu, Q.; Jiang, X.; Ye, R. Sentiment analysis of review text based on BiGRU-attention and hybrid CNN. IEEE Access 2021, 9, 149077–149088. [Google Scholar] [CrossRef]
  20. Chen, J.; Zhang, J.; Chen, H.; Zhao, Y.; Wang, H. A TDV attention-based BiGRU network for AIS-based vessel trajectory prediction. iScience 2023, 26, 106383. [Google Scholar] [CrossRef]
  21. Chi, D.; Yang, C. Wind power prediction based on WT-BiGRU-attention-TCN model. Front. Energy Res. 2023, 11, 1156007. [Google Scholar] [CrossRef]
  22. Khalid, R.; Javaid, N. A survey on hyperparameters optimization algorithms of forecasting models in smart grid. Sustain. Cities Soc. 2020, 61, 102275. [Google Scholar] [CrossRef]
  23. Bergstra, J.; Yamins, D.; Cox, D. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the International Conference on Machine Learning, PMLR, Atlanta, GA, USA, 17–19 June 2013; pp. 115–123. [Google Scholar]
  24. Probst, P.; Boulesteix, A.L.; Bischl, B. Tunability: Importance of hyperparameters of machine learning algorithms. J. Mach. Learn. Res. 2019, 20, 1934–1965. [Google Scholar]
  25. Bergstra, J.; Komer, B.; Eliasmith, C.; Yamins, D.; Cox, D.D. Hyperopt: A python library for model selection and hyperparameter optimization. Comput. Sci. Discov. 2015, 8, 014008. [Google Scholar] [CrossRef]
  26. MacKay, D.J.C. Comparison of approximate methods for handling hyperparameters. Neural Comput. 1999, 11, 1035–1068. [Google Scholar] [CrossRef]
  27. Zhang, B.; Rajan, R.; Pineda, L.; Lambert, N.; Biedenkapp, A.; Chua, K.; Hutter, F.; Calandra, R. On the importance of hyperparameter optimization for model-based reinforcement learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual, 13–15 April 2021; pp. 4015–4023. [Google Scholar]
  28. Mai, L.; Koliousis, A.; Li, G.; Brabete, A.O.; Pietzuch, P. Taming hyper-parameters in deep learning systems. ACM SIGOPS Oper. Syst. Rev. 2019, 53, 52–58. [Google Scholar] [CrossRef]
  29. Kaur, S.; Aggarwal, H.; Rani, R. Hyper-parameter optimization of deep learning model for prediction of Parkinson’s disease. Mach. Vis. Appl. 2020, 31, 32. [Google Scholar] [CrossRef]
  30. Yeh, W.C.; Lin, Y.P.; Liang, Y.C.; Lai, C.M.; Huang, C.L. Simplified swarm optimization for hyperparameters of convolutional neural networks. Comput. Ind. Eng. 2023, 177, 109076. [Google Scholar] [CrossRef]
  31. Zhang, R.; Qiu, Z. Optimizing hyper-parameters of neural networks with swarm intelligence: A novel framework for credit scoring. PLoS ONE 2020, 15, e0234254. [Google Scholar] [CrossRef]
  32. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61. [Google Scholar] [CrossRef]
  33. Zhong, C.; Li, G.; Meng, Z. Beluga whale optimization: A novel nature-inspired metaheuristic algorithm. Knowl.-Based Syst. 2022, 251, 109215. [Google Scholar] [CrossRef]
  34. Mirjalili, S. SCA: A sine cosine algorithm for solving optimization problems. Knowl.-Based Syst. 2016, 96, 120–133. [Google Scholar] [CrossRef]
  35. Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34. [Google Scholar] [CrossRef]
  36. Dhiman, G.; Kumar, V. Seagull optimization algorithm: Theory and its applications for large-scale industrial engineering problems. Knowl.-Based Syst. 2019, 165, 169–196. [Google Scholar] [CrossRef]
  37. Cuevas, E.; Zaldivar, D.; Pérez-Cisneros, M. A novel multi-threshold segmentation approach based on differential evolution optimization. Expert Syst. Appl. 2010, 37, 5265–5271. [Google Scholar] [CrossRef]
  38. Mirjalili, S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl.-Based Syst. 2015, 89, 228–249. [Google Scholar] [CrossRef]
  39. Yang, X.S. Flower pollination algorithm for global optimization. In Proceedings of the International Conference on Unconventional Computing and Natural Computation, Orléans, France, 3–7 September 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 240–249. [Google Scholar]
  40. Li, W.; Wang, G.G.; Gandomi, A.H. A survey of learning-based intelligent optimization algorithms. Arch. Comput. Methods Eng. 2021, 28, 3781–3799. [Google Scholar] [CrossRef]
  41. Gad, A.G. Particle swarm optimization algorithm and its applications: A systematic review. Arch. Comput. Methods Eng. 2022, 29, 2531–2561. [Google Scholar] [CrossRef]
  42. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN′95-International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar]
  43. Kaveh, A.; Hamedani, K.B. Improved arithmetic optimization algorithm and its application to discrete structural optimization. Structures 2022, 35, 748–764. [Google Scholar] [CrossRef]
  44. Abualigah, L.; Diabat, A.; Mirjalili, S.; Abd Elaziz, M.; Gandomi, A.H. The arithmetic optimization algorithm. Comput. Methods Appl. Mech. Eng. 2021, 376, 113609. [Google Scholar] [CrossRef]
  45. Kavoosi, H.; Saidi, M.H.; Kavoosi, M.; Bohrng, M. Forecast global carbon dioxide emission by use of genetic algorithm (GA). Int. J. Comput. Sci. Issues 2012, 9, 418. [Google Scholar]
  46. Lotfalipour, M.R.; Falahi, M.A.; Bastam, M. Prediction of CO2 emissions in Iran using grey and ARIMA models. Int. J. Energy Econ. Policy 2013, 3, 229–237. [Google Scholar]
  47. Sun, W.; Liu, M. Prediction and analysis of the three major industries and residential consumption CO2 emissions based on least squares support vector machine in China. J. Clean. Prod. 2016, 122, 144–153. [Google Scholar] [CrossRef]
  48. Zhao, X.; Han, M.; Ding, L.; Calin, A.C. Forecasting carbon dioxide emissions based on a hybrid of mixed data sampling regression model and back propagation neural network in the USA. Environ. Sci. Pollut. Res. 2018, 25, 2899–2910. [Google Scholar] [CrossRef] [PubMed]
  49. Wang, H.; Liang, W.; Liang, S.; Chen, B. Research on Carbon Dioxide Concentration Prediction Based on RNN Model in Deep Learning. Highlights Sci. Eng. Technol. 2023, 48, 281–287. [Google Scholar] [CrossRef]
  50. Zuo, Z.; Guo, H.; Cheng, J. An LSTM-STRIPAT model analysis of China’s 2030 CO2 emissions peak. Carbon Manag. 2020, 11, 577–592. [Google Scholar] [CrossRef]
  51. Kumari, S.; Singh, S.K. Machine learning-based time series models for effective CO2 emission prediction in India. Environ. Sci. Pollut. Res. 2023, 30, 116601–116616. [Google Scholar] [CrossRef]
  52. Yang, F.; Liu, D.; Zeng, Q.; Chen, Z.; Ye, Y.; Yang, T.; He, Y.; Zhou, S.; Zheng, L. Prediction of Mianyang Carbon Emission Trend Based on Adaptive GRU Neural Network. In Proceedings of the 2022 4th International Conference on Frontiers Technology of Information and Computer (ICFTIC), Qingdao, China, 2–4 December 2022; IEEE: New York, NY, USA, 2022; pp. 747–750. [Google Scholar]
  53. Cao, L.; Han, Y.; Feng, M.; Geng, Z.; Lu, Y.; Chen, L.; Ping, W.; Xia, T.; Li, S. Economy and carbon emissions optimization of different provinces or regions in China using an improved temporal attention mechanism based on gate recurrent unit. J. Clean. Prod. 2024, 434, 139827. [Google Scholar] [CrossRef]
  54. Claesen, M.; De Moor, B. Hyperparameter search in machine learning. arXiv 2015, arXiv:1502.02127. [Google Scholar]
  55. Belete, D.M.; Huchaiah, M.D. Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results. Int. J. Comput. Appl. 2022, 44, 875–886. [Google Scholar] [CrossRef]
  56. Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
  57. Wu, J.; Chen, X.Y.; Zhang, H.; Xiong, L.D.; Lei, H.; Deng, S.H. Hyperparameter optimization for machine learning models based on Bayesian optimization. J. Electron. Sci. Technol. 2019, 17, 26–40. [Google Scholar]
  58. Bacanin, N.; Bezdan, T.; Tuba, E.; Strumberger, I.; Tuba, M. Optimizing convolutional neural network hyperparameters by enhanced swarm intelligence metaheuristics. Algorithms 2020, 13, 67. [Google Scholar] [CrossRef]
  59. Chang, Z.H.; Yuan, W.; Huang, K. Remaining useful life prediction for rolling bearings using multi-layer grid search and LSTM. Comput. Electr. Eng. 2022, 101, 108083. [Google Scholar] [CrossRef]
  60. Priyadarshini, I.; Cotton, C. A novel LSTM-CNN-grid search-based deep neural network for sentiment analysis. J. Supercomput. 2021, 77, 13911–13932. [Google Scholar] [CrossRef]
  61. Huang, Q.; Mao, J.; Liu, Y. An improved grid search algorithm of SVR parameters optimization. In Proceedings of the 2012 IEEE 14th International Conference on Communication Technology, Chengdu, China, 9–11 November 2012; pp. 1022–1026. [Google Scholar]
  62. Fayed, H.A.; Atiya, A.F. Speed up grid-search for parameter selection of support vector machines. Appl. Soft Comput. 2019, 80, 202–210. [Google Scholar] [CrossRef]
  63. Tkachenko, A. Grid search in stellar parameters: A software for spectrum analysis of single stars and binary systems. Astron. Astrophys. 2015, 581, A129. [Google Scholar] [CrossRef]
  64. Zhao, Y.; Zhang, W.; Liu, X. Grid search with a weighted error function: Hyper-parameter optimization for financial time series forecasting. Appl. Soft Comput. 2024, 154, 111362. [Google Scholar] [CrossRef]
  65. Zabinsky, Z.B. Random Search Algorithms; Department of Industrial and Systems Engineering, University of Washington: Washington, DC, USA, 2009. [Google Scholar]
  66. Andonie, R.; Florea, A.C. Weighted random search for CNN hyperparameter optimization. arXiv 2020, arXiv:2003.13300. [Google Scholar] [CrossRef]
  67. Ragab, M.G.; Abdulkadir, S.J.; Aziz, N. Random search one dimensional CNN for human activity recognition. In Proceedings of the 2020 International Conference on Computational Intelligence (ICCI), Bandar Seri Iskandar, Malaysia, 8–9 October 2020; pp. 86–91. [Google Scholar]
  68. Frazier, P.I. A tutorial on Bayesian optimization. arXiv 2018, arXiv:1807.02811. [Google Scholar]
  69. Cho, H.; Kim, Y.; Lee, E.; Choi, D.; Lee, Y.; Rhee, W. Basic enhancement strategies when using Bayesian optimization for hyperparameter tuning of deep neural networks. IEEE Access 2020, 8, 52588–52608. [Google Scholar] [CrossRef]
  70. Han, S.; Eom, H.; Kim, J.; Park, C. Optimal DNN architecture search using Bayesian Optimization Hyperband for arrhythmia detection. In Proceedings of the 2020 IEEE Wireless Power Transfer Conference (WPTC), Seoul, Republic of Korea, 15–19 November 2020; pp. 357–360. [Google Scholar]
  71. Mavrovouniotis, M.; Li, C.; Yang, S. A survey of swarm intelligence for dynamic optimization: Algorithms and applications. Swarm Evol. Comput. 2017, 33, 1–17. [Google Scholar] [CrossRef]
  72. Zhang, C.; Ma, H.; Hua, L.; Sun, W.; Nazir, M.S.; Peng, T. An evolutionary deep learning model based on TVFEMD, improved sine cosine algorithm, CNN and BiLSTM for wind speed prediction. Energy 2022, 254, 124250. [Google Scholar] [CrossRef]
  73. Chen, X.; Li, Y.; Zhang, Y.; Ye, X.; Xiong, X.; Zhang, F. A novel hybrid model based on an improved seagull optimization algorithm for short-term wind speed forecasting. Processes 2021, 9, 387. [Google Scholar] [CrossRef]
  74. Wang, J.; Yang, W.; Du, P.; Niu, T. A novel hybrid forecasting system of wind speed based on a newly developed multi-objective sine cosine algorithm. Energy Convers. Manag. 2018, 163, 134–150. [Google Scholar] [CrossRef]
  75. Singh, P.; Chaudhury, S.; Panigrahi, B.K. Hybrid MPSO-CNN: Multi-level particle swarm optimized hyperparameters of convolutional neural network. Swarm Evol. Comput. 2021, 63, 100863. [Google Scholar] [CrossRef]
  76. Huang, C.L.; Dun, J.F. A distributed PSO-SVM hybrid system with feature selection and parameter optimization. Appl. Soft Comput. 2008, 8, 1381–1391. [Google Scholar] [CrossRef]
  77. Xia, J.; Wang, Z.; Yang, D.; Li, R.; Liang, G.; Chen, H.; Heidari, A.A.; Turabieh, H.; Mafarja, M.; Pan, Z. Performance optimization of support vector machine with oppositional grasshopper optimization for acute appendicitis diagnosis. Comput. Biol. Med. 2022, 143, 105206. [Google Scholar] [CrossRef]
  78. Hu, J.; Heidari, A.A.; Shou, Y.; Ye, H.; Wang, L.; Huang, X.; Chen, H.; Chen, Y.; Wu, P. Detection of COVID-19 severity using blood gas analysis parameters and Harris hawks optimized extreme learning machine. Comput. Biol. Med. 2022, 142, 105166. [Google Scholar] [CrossRef]
  79. Tikhamarine, Y.; Souag-Gamane, D.; Ahmed, A.N.; Kisi, O.; El-Shafie, A. Improving artificial intelligence models accuracy for monthly streamflow forecasting using grey Wolf optimization (GWO) algorithm. J. Hydrol. 2020, 582, 124435. [Google Scholar] [CrossRef]
  80. Rahmanshahi, M.; Jafari-Asl, J.; Shafai Bejestan, M.; Mirjalili, S. A hybrid model for predicting the energy dissipation on the block ramp hydraulic structures. Water Resour. Manag. 2023, 37, 3187–3209. [Google Scholar] [CrossRef]
  81. Xiao, J.; Zhu, X.; Huang, C.; Yang, X.; Wen, F.; Zhong, M. A new approach for stock price analysis and prediction based on SSA and SVM. Int. J. Inf. Technol. Decis. Mak. 2019, 18, 287–310. [Google Scholar] [CrossRef]
  82. Jovanovic, L.; Milutinovic, N.; Gajevic, M.; Krstovic, J.; Rashid, T.A.; Petrovic, A. Sine cosine algorithm for simple recurrent neural network tuning for stock market prediction. In Proceedings of the 2022 30th Telecommunications Forum (TELFOR), Belgrade, Serbia, 15–16 November 2022; pp. 1–4. [Google Scholar]
  83. El-Kenawy, E.S.M.; Mirjalili, S.; Ghoneim, S.S.; Eid, M.M.; El-Said, M.; Khan, Z.S.; Ibrahim, A. Advanced ensemble model for solar radiation forecasting using sine cosine algorithm and newton’s laws. IEEE Access 2021, 9, 115750–115765. [Google Scholar] [CrossRef]
  84. Peng, T.; Zhang, C.; Zhou, J.; Nazir, M.S. An integrated framework of Bi-directional long-short term memory (BiLSTM) based on sine cosine algorithm for hourly solar radiation forecasting. Energy 2021, 221, 119887. [Google Scholar] [CrossRef]
  85. Wu, Y.; Sun, X.; Zhang, Y.; Zhong, X.; Cheng, L. A power transformer fault diagnosis method-based hybrid improved seagull optimization algorithm and support vector machine. IEEE Access 2021, 10, 17268–17286. [Google Scholar] [CrossRef]
  86. Li, X.D.; Wang, J.S.; Hao, W.K.; Zhang, M.; Wang, M. Chaotic arithmetic optimization algorithm. Appl. Intell. 2022, 52, 16718–16757. [Google Scholar] [CrossRef]
  87. Mehmood, K.; Chaudhary, N.I.; Khan, Z.A.; Cheema, K.M.; Raja, M.A.Z.; Shu, C.M. Novel knacks of chaotic maps with Archimedes optimization paradigm for nonlinear ARX model identification with key term separation. Chaos Solitons Fractals 2023, 175, 114028. [Google Scholar] [CrossRef]
Figure 1. Internal structure of a GRU unit.
Figure 2. Specific structure of our proposed AttBiGRU.
Figure 3. Comparison of effects with and without logistic chaotic mapping.
Figure 4. Flowchart of CGAOA.
Figure 5. Friedman ranking of different algorithms on 24 benchmark functions.
Figure 6. Comparison of the fitness curves of various algorithms during iteration (unimodal functions).
Figure 7. Comparison of the fitness curves of various algorithms during iteration (multimodal functions).
Figure 8. Comparison of the fitness curves of various algorithms during iteration (composition functions).
Figure 9. CO2 emissions from different sectors.
Figure 10. Schematic diagram of a sample making method using a sliding window.
Figure 11. CGAOA-AttBiGRU framework.
Figure 12. Error curves of different models in the power sector.
Figure 13. Error curves of different models in the industry sector.
Figure 14. Error curves of different models in the transport sector.
Figure 15. Error curves of different models in the resident sector.
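Figure 10 above refers to the sliding-window method used to turn the emission time series into supervised samples. The snippet below is a minimal Python sketch of that idea; the window length and the toy series are illustrative assumptions, not values from the paper.

```python
import numpy as np

def make_sliding_window_samples(series, window_len):
    """Cut a 1-D time series into (past window, next value) training pairs."""
    X, y = [], []
    for start in range(len(series) - window_len):
        X.append(series[start:start + window_len])  # window_len consecutive past values
        y.append(series[start + window_len])        # the value right after the window
    return np.array(X), np.array(y)

# Toy emission series (made up) and a window of 3 steps.
emissions = np.array([2.1, 2.3, 2.2, 2.5, 2.7, 2.6, 2.9])
X, y = make_sliding_window_samples(emissions, window_len=3)
print(X.shape, y.shape)  # (4, 3) (4,)
```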
Table 1. Research related to CO2 prediction models.
Research Model | Hyperparameter Optimization Method | Research Content | Prediction Results
Linear and non-linear forms of equations [45] | Manual | Linear and nonlinear equations are used to predict CO2 emissions from 2004 to 2010; the prediction results are better than those of the exponential model. | Relative error: 1.06%
ARIMA [46] | Manual | ARIMA is used to predict CO2 emissions from 2011 to 2020; the prediction results show that ARIMA outperforms the GM, AR, ARMA, and other models. | RMSE: 14.482; MAE: 10.84; MAPE: 6.768%
SVM [47] | Manual | CO2 emissions from 2008 to 2012 are forecast with the least squares SVM model; the prediction results are better than those of the logistic model, BP neural network, and GM model. | RMSE: 0.003; MAPE: 0.328%
BP Network [48] | Manual | Global CO2 emissions over a 15-quarter period are forecast with the BP network; the prediction results are better than those of the OLS, PDL, ADL, and ARMA models. | RMSE: 16.87
RNN [49] | Manual | The RNN model is used to predict CO2 emissions from 2012 to 2022; the prediction results are better than those of the statistical model (Holt–Winters) and the machine learning model (linear regression). | MSE: 0.36; MAE: 0.23
LSTM-STRIPAT [50] | Manual | The LSTM-STRIPAT model is used to predict China's CO2 emissions from 2022 to 2040; the prediction results are better than those of the BPNN and GM models. | MAPE: 2.6%
LSTM [51] | Manual | The LSTM model is used to predict India's CO2 emissions from 2019 to 2029; the prediction results are better than those of the ARIMA, SARIMAX, and Holt–Winters models. | MAPE: 3.101%; RMSE: 60.635; MedAE: 28.298
GRU [52] | Manual | The GRU model is used to predict the CO2 emissions of Mianyang City from 2020 to 2030; the prediction results are better than those of the RNN and BP networks. | MAPE: 1.87%
Table 2. Unimodal benchmark functions.
Name | Description | Range | F(min)
F1 | $F_{1}(x)=x_{1}^{2}+10^{6}\sum_{i=2}^{n}x_{i}^{2}$ | [−500, 500] | 0
F2 | $F_{2}(x)=\sum_{i=1}^{n}x_{i}^{2}+\left(\frac{1}{2}\sum_{i=1}^{n}i x_{i}^{2}\right)^{2}$ | [−500, 500] | 0
F3 | $F_{3}(x)=\sum_{i=1}^{n}\left|x_{i}\sin(x_{i})+0.1x_{i}\right|$ | [−500, 500] | 0
F4 | $F_{4}(x)=\sum_{i=1}^{n}\left[(x_{i-3}+10x_{i-2})^{2}+(5x_{i-1}-x_{i})^{2}\right]$ | [−100, 100] | 0
F5 | $F_{5}(x)=\frac{1}{2}\sum_{i=1}^{n}\left(x_{i}^{4}-16x_{i}^{2}+5x_{i}\right)$ | [−500, 500] | 0
F6 | $F_{6}(x)=\sum_{i=1}^{n-1}\left[100\left(x_{i+1}-x_{i}^{2}\right)^{2}+\left(1-x_{i}\right)^{2}\right]$ | [−30, 30] | 0
F7 | $F_{7}(x)=\sum_{i=1}^{n}x_{i}^{2}$ | [0, 500] | 0
F8 | $F_{8}(x)=\sum_{i=1}^{n}\sin(x_{i})\sin^{2}\left(\frac{i x_{i}^{2}}{\pi}\right)$ | [−512, 512] | 0
F9 | $F_{9}(x)=\sum_{i=1}^{n}\cos^{2}(x_{i})-0.1\sum_{i=1}^{n}\exp(2x_{i})$ | [−512, 512] | 0
Table 3. Multimodal benchmark functions.
Name | Description | Range | F(min)
F10 | $F_{10}(x)=\sum_{i=1}^{n-1}\left(x_{i}^{2}+2x_{i+1}^{2}-0.3\cos(3\pi x_{i})\right)$ | [−512, 512] | 0
F11 | $F_{11}(x)=\sum_{i=1}^{n}\left(x_{i}^{2}-10\cos(2\pi x_{i})+10\right)$ | [−5.12, 5.12] | 0
F12 | $F_{12}(x)=-20\exp\left(-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}}\right)-\exp\left(\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_{i})\right)+20+e$ | [−32, 32] | 0
F13 | $F_{13}(x)=1+\frac{1}{4000}\sum_{i=1}^{n}x_{i}^{2}-\prod_{i=1}^{n}\cos\left(\frac{x_{i}}{\sqrt{i}}\right)$ | [−600, 600] | 0
F14 | $F_{14}(x)=\sum_{i=1}^{n}\left[a_{i}-\frac{x_{1}(b_{i}^{2}+b_{i}x_{2})}{b_{i}^{2}+b_{i}x_{3}+x_{4}}\right]^{2}$ | [−50, 50] | 0
F15 | $F_{15}(x)=4x_{1}^{2}-2.1x_{1}^{4}+\frac{1}{3}x_{1}^{6}+x_{1}x_{2}-4x_{2}^{2}+4x_{2}^{4}$ | [−50, 50] | 0
F16 | $F_{16}(x)=\left(x_{2}-\frac{5.1}{4\pi^{2}}x_{1}^{2}+\frac{5}{\pi}x_{1}-6\right)^{2}+10\left(1-\frac{1}{8\pi}\right)\cos(x_{1})+10$ | [−5, 5] | 0.398
F17 | $F_{17}(x)=10n+\sum_{i=1}^{n}\left(x_{i}^{2}-10\cos(2\pi x_{i})\right)$ | [−512, 512] | 0
F18 | $F_{18}(x)=\sum_{i=1}^{n}\left(x_{i}^{2}\right)^{0.25}+0.5\sum_{i=1}^{n}x_{i}+0.5n$ | [−500, 500] | 0
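As a concrete example of how the benchmark functions above are evaluated when testing an optimizer, the sketch below implements F11 (the Rastrigin function) from Table 3; the 30-dimensional test point is an illustrative assumption.

```python
import numpy as np

def f11_rastrigin(x):
    """F11 in Table 3: sum(x_i^2 - 10*cos(2*pi*x_i) + 10); global minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0)

print(f11_rastrigin(np.zeros(30)))                          # 0.0 at the optimum
print(f11_rastrigin(np.random.uniform(-5.12, 5.12, 30)))    # a positive value elsewhere
```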
Table 4. Composition benchmark functions.
Name | Description | F(min)
F19 | Composition Function 1 (N = 3) | 2100
F20 | Composition Function 2 (N = 3) | 2200
F21 | Composition Function 3 (N = 4) | 2300
F22 | Composition Function 4 (N = 4) | 2400
F23 | Composition Function 5 (N = 5) | 2500
F24 | Composition Function 6 (N = 5) | 2600
Table 5. Various AOA variants with two mechanisms.
Model | Chaotic Mapping | Gaussian Variation
CGAOA | 1 | 1
CAOA | 1 | 0
GAOA | 0 | 1
AOA | 0 | 0
Table 6. The results of the ablation experiment.
Algorithm | Rank | +/−/= | Avg.
CGAOA | 1 | ~ | 1.3069
CAOA | 2 | 15/3/6 | 2.0965
GAOA | 3 | 17/2/5 | 2.9181
AOA | 4 | 20/1/3 | 3.8011
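Table 5 names the two mechanisms that separate the AOA variants compared in Table 6: logistic chaotic mapping (see Figure 3) and Gaussian variation. The sketch below is a rough, self-contained illustration of both ideas; the map parameter r = 4, the noise scale, and the bounds are assumptions for demonstration, and the paper's CGAOA embeds these mechanisms inside the full AOA update rules rather than using them in isolation.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_chaotic_init(pop_size, dim, lb, ub, r=4.0):
    """Population initialization with the logistic map x_{k+1} = r * x_k * (1 - x_k):
    a chaotic sequence in (0, 1) is generated per dimension and scaled into [lb, ub]."""
    x = rng.uniform(0.05, 0.95, size=dim)   # chaotic seed for each dimension
    pop = np.empty((pop_size, dim))
    for i in range(pop_size):
        x = r * x * (1.0 - x)               # one logistic-map iteration
        pop[i] = lb + x * (ub - lb)         # scale chaotic values into the search bounds
    return pop

def gaussian_variation(candidate, lb, ub, sigma=0.1):
    """Gaussian variation: perturb a candidate with zero-mean noise, then clip to the bounds."""
    noise = rng.normal(0.0, sigma * (ub - lb), size=candidate.shape)
    return np.clip(candidate + noise, lb, ub)

pop = logistic_chaotic_init(pop_size=30, dim=10, lb=-500.0, ub=500.0)
perturbed = gaussian_variation(pop[0], lb=-500.0, ub=500.0)
```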
Table 7. Parameter settings of comparison algorithms.
Algorithm | Parameter | Value
GWO | Area vector a, random vectors r1, r2 | A ∈ [0, 2], r1 ∈ [0, 1], r2 ∈ [0, 1]
BWO | The probability of reduced wf between regions | wf ∈ [0.05, 0.1]
SCA | Convergence parameter (spiral factor) a | a = 2
SSA | Leadership position update probability c3 | c3 = 0.5
SOA | Control parameters A, fc | A ∈ [0, 2], fc = 2
DE | Scaling factor c, crossover probability p | c = 0.5, p = 0.5
MFO | Convergence parameter (spiral factor) c, special parameter b | c ∈ [−2, −1], b = 1
FPA | Probability switch p | p = 0.8
PSO | Acceleration constants c1 and c2, inertia weight w | c1 = c2 = 2, w ∈ [0.2, 0.9]
Table 8. Experimental results on 24 benchmark functions (the best are in bold).
Fun | F1 (Aver, Std) | F2 (Aver, Std) | F3 (Aver, Std)
GWO2.9997E+04.6547E-13.3143E+02.0422E-13.1791E+03.5212E-1
BWO3.0770E+01.8606E-13.0340E+01.5253E-13.3282E+01.5460E-1
SCA3.3571E+02.7111E-13.3237E+03.8633E-13.2291E+02.1695E-1
SSA3.1942E+01.4920E-13.4142E+02.2133E-13.2309E+01.7968E-1
SOA3.1887E+01.5777E-13.3139E+02.4405E-13.2282E+03.8577E-1
DE2.9895E+02.9881E-13.5137E+01.5967E-13.1292E+01.9160E-1
MFO2.9662E+02.1549E-13.2237E+02.7874E-13.3277E+02.5340E-1
FPA2.9604E+01.6161E-13.3437E+02.4016E-13.2293E+01.2904E-1
PSO3.0900E+02.7604E-13.1144E+02.6725E-13.3239E+04.6655E-1
CGAOA2.9326E+01.3288E-12.8933E+02.5175E-12.0310E+01.5649E-1
Fun | F4 (Aver, Std) | F5 (Aver, Std) | F6 (Aver, Std)
GWO1.0719E+14.6951E+02.1570E+26.5842E-11.0986E+18.5764E-0
BWO1.0824E+15.1139E+02.2272E+22.9650E-11.0767E+13.8749E-0
SCA1.0378E+14.2987E+02.3019E+22.2765E-11.1298E+16.9866E-0
SSA1.0681E+11.1846E+12.3671E+26.7768E-11.0721E+15.7972E-0
SOA1.0647E+13.9037E+02.2250E+21.9946E-11.0633E+16.8975E-0
DE1.0214E+14.8932E+02.2497E+27.9160E-11.0578E+11.9244E-0
MFO1.0623E+15.4174E+02.2712E+22.3956E-11.0726E+12.5291E-0
FPA1.0631E+14.4114E+02.2110E+22.7796E-11.0625E+17.2794E-0
PSO1.0734E+11.0935E+12.3799E+23.6425E-11.0742E+15.8692E-0
CGAOA1.0151E+14.2956E+01.9259E+21.5649E-11.0346E+17.5649E-0
Fun | F7 (Aver, Std) | F8 (Aver, Std) | F9 (Aver, Std)
GWO3.1670E+01.3070E-13.1987E+03.1116E-11.2579E+03.8597E-2
BWO2.9973E+03.3041E-13.2957E+01.8463E-11.2433E+06.9653E-2
SCA3.1508E+02.6584E-13.1406E+02.1921E-11.2172E+03.8924E-2
SSA3.1374E+01.5546E-13.1979E+01.7608E-11.2141E+04.1477E-2
SOA3.0854E+01.4907E-13.2099E+02.4198E-11.2239E+05.3566E-2
DE3.0971E+02.8891E-13.1824E+03.2782E-11.2209E+03.2203E-2
MFO3.2373E+03.0831E-13.2330E+01.3248E-11.2462E+03.6615E-2
FPA3.1890E+03.6180E-13.2270E+02.5491E-11.2016E+02.5762E-2
PSO3.1522E+01.6059E-13.2700E+02.3791E-11.2455E+03.3193E-2
CGAOA2.9512E+03.3340E-13.0196E+07.3299E-21.1807E+05.6026E-2
Fun | F10 (Aver, Std) | F11 (Aver, Std) | F12 (Aver, Std)
GWO1.2653E+01.0201E-12.1454E+15.8218E-21.0155E+14.9109E-1
BWO1.2338E+06.6893E-22.1436E+11.1449E-11.0202E+16.8390E-1
SCA1.2536E+04.7131E-22.1456E+13.9038E-21.0631E+14.6110E-1
SSA1.2738E+07.5439E-22.1409E+11.2943E-11.0854E+13.4829E-1
SOA1.2371E+01.1242E-22.1433E+18.8435E-21.0284E+15.9602E-1
DE1.2070E+02.8333E-22.1404E+14.3957E-21.0527E+19.2007E-1
MFO1.2174E+04.8090E-22.1411E+13.7122E-21.1215E+15.1018E-1
FPA1.2331E+05.6104E-22.1534E+15.7014E-21.0685E+16.1983E-1
PSO1.2314E+01.9357E-22.1465E+16.9590E-21.0861E+14.7621E-1
CGAOA1.1577E+04.6213E-22.1389E+11.2943E-19.8487E+04.1400E-1
Fun | F13 (Aver, Std) | F14 (Aver, Std) | F15 (Aver, Std)
GWO2.1476E+17.9116E-21.0612E+12.3848E-11.0758E+14.6697E-1
BWO2.1393E+11.2265E-11.0153E+14.3953E-11.0477E+14.6273E-1
SCA2.1424E+14.1515E-21.0650E+12.9989E-11.0634E+14.9492E-1
SSA2.1436E+16.7325E-21.0681E+11.8724E-11.0849E+16.5998E-1
SOA2.1444E+16.8851E-21.0494E+15.8187E-11.0417E+14.3076E-1
DE2.1431E+18.8886E-21.0802E+12.5304E-11.0474E+11.0579E+0
MFO2.1409E+11.0024E-11.0562E+13.4618E-11.0535E+17.2999E-1
FPA2.1417E+17.7235E-21.0632E+13.4969E-11.1088E+11.4657E-1
PSO2.1455E+18.6014E-21.0936E+12.7408E-11.0793E+15.7992E-1
CGAOA2.1364E+16.1035E-21.0396E+12.3615E-19.7259E+04.6295E-1
Fun | F16 (Aver, Std) | F17 (Aver, Std) | F18 (Aver, Std)
GWO1.2473E+01.9568E-13.2095E+05.6922E-11.2802E+05.4862E-2
BWO1.2560E+06.9572E-23.3195E+03.1956E-11.2633E+07.5922E-2
SCA1.2523E+04.1935E-23.2871E+01.1905E-11.2459E+02.6752E-2
SSA1.2221E+03.0825E-13.1075E+03.0785E-11.2796E+02.6280E-2
SOA1.2406E+02.9865E-23.3964E+02.0895E-11.2659E+03.8863E-2
DE1.2106E+02.2553E-23.2076E+06.4925E-21.2706E+02.1102E-2
MFO1.2395E+06.7865E-23.2210E+02.4783E-11.2895E+02.3206E-2
FPA1.2122E+03.6956E-23.1552E+02.2935E-11.2469E+02.7853E-2
PSO1.2259E+01.6715E-13.2106E+06.2532E-21.2693E+05.7932E-2
CGAOA1.1826E+06.6054E-22.9722E+05.9782E-21.1807E+05.0823E-2
Fun | F19 (Aver, Std) | F20 (Aver, Std) | F21 (Aver, Std)
GWO3.0724E+38.3062E+13.1702E+37.7017E+13.0738E+32.6927E+1
BWO3.0826E+33.2694E+13.2408E+33.1103E+13.0567E+32.9714E+1
SCA3.0492E+38.1778E+13.1258E+37.3933E+13.0841E+31.8063E+1
SSA3.0401E+39.0293E+13.1529E+32.2837E+13.0821E+32.6106E+1
SOA3.0644E+31.6985E+23.1685E+34.1144E+13.0769E+35.5635E+0
DE3.1023E+33.8974E+13.1924E+34.4718E+13.0772E+36.3956E+0
MFO2.9619E+31.3894E+23.1458E+32.5557E+13.0710E+31.4702E+1
FPA3.0406E+31.1531E+23.1826E+35.7103E+13.0785E+32.3036E+1
PSO3.0241E+31.2808E+23.1883E+33.3575E+13.0714E+32.3316E+1
CGAOA2.9965E+37.4005E+13.0849E+35.4413E+13.0697E+33.2925E+1
Fun | F22 (Aver, Std) | F23 (Aver, Std) | F24 (Aver, Std)
GWO3.0777E+32.5562E+13.0735E+31.0757E+13.0896E+32.0343E+1
BWO3.0471E+35.2708E+13.0673E+32.4820E+13.0901E+32.6887E+1
SCA3.0831E+32.7704E+13.0979E+32.0277E+13.0772E+32.3202E+1
SSA3.0741E+31.5444E+13.0807E+39.2561E+03.0760E+31.1781E+1
SOA3.0403E+37.1725E+13.0816E+32.4380E+13.0885E+32.7669E+1
DE3.0519E+38.1391E+13.0709E+32.4522E+13.0881E+33.6343E+0
MFO3.0445E+33.6526E+13.0760E+31.2796E+13.0857E+31.7441E+1
FPA3.0788E+32.2727E+13.0687E+34.9937E+13.0640E+33.5676E+1
PSO3.0622E+32.0521E+13.0902E+32.1660E+13.0794E+32.0797E+1
CGAOA3.0219E+31.4764E+13.0436E+34.0715E+13.0484E+34.6827E+1
Table 9. p-values obtained from the Wilcoxon rank-sum test (F1–F24).
Fun | CGAOA vs. GWO | CGAOA vs. BWO | CGAOA vs. SCA | CGAOA vs. SSA | CGAOA vs. SOA | CGAOA vs. DE | CGAOA vs. MFO | CGAOA vs. FPA | CGAOA vs. PSO
F15.02E-067.98E-083.98E-056.79E-109.63E-034.91E-087.92E-068.38E-078.50E-09
F21.07E-068.25E-036.18E-029.22E-116.16E-078.37E-077.27E-052.99E-044.82E-08
F34.08E-034.22E-046.81E-021.69E-104.36E-082.12E-082.72E-108.72E-037.91E-06
F45.30E-091.73E-042.16E-071.28E-034.55E-068.50E-093.11E-033.98E-056.79E-10
F55.30E-091.73E-042.16E-071.28E-034.55E-068.50E-093.11E-031.73E-042.16E-07
F67.27E-052.99E-044.82E-089.13E-0163.36E-083.98E-056.79E-108.42E-131.07E-07
F79.32E-036.37E-092.03E-078.22E-061.03E-053.02E-131.78E-038.42E-131.07E-07
F86.16E-025.18E-067.83E-028.04E-095.31E-039.09E-029.22E-118.22E-061.03E-05
F97.27E-052.99E-044.82E-089.13E-0163.36E-083.98E-056.79E-101.33E-053.06E-04
F108.13E-048.31E-049.11E-101.33E-053.06E-048.33E-071.07E-0118.25E-036.13E-04
F119.32E-036.37E-092.03E-078.22E-061.03E-053.02E-131.78E-038.99E-062.05E-01
F128.38E-043.13E-069.31E-071.03E-099.25E-065.32E-084.08E-034.22E-045.90E-05
F137.27E-052.99E-044.82E-089.13E-0163.36E-081.05E-035.65E-052.88E-036.63E-08
F145.05E-096.28E-093.00E-044.20E-056.98E-049.95E-062.21E-067.37E-091.34E-03
F157.98E-083.98E-056.79E-109.63E-034.91E-085.08E-092.48E-076.85E-062.93E-01
F168.38E-078.50E-093.11E-034.50E-057.22E-016.30E-058.65E-018.08E-035.01E-06
F175.16E-029.01E-057.11E-018.91E-045.24E-076.21E-099.32E-036.37E-092.03E-07
F188.13E-048.31E-049.11E-101.33E-053.06E-048.33E-071.07E-0118.25E-036.13E-04
F196.01E-081.90E-046.60E-052.08E-088.30E-102.98E-072.02E-088.10E-049.19E-08
F204.08E-081.52E-061.08E-065.30E-048.93E-087.66E-121.23E-075.54E-108.92E-13
F217.12E-046.32E-045.15E-016.72E-016.15E-056.52E-077.24E-087.98E-083.98E-05
F221.07E-0118.25E-039.09E-029.22E-116.16E-078.37E-077.27E-052.99E-044.82E-08
F238.38E-078.50E-093.11E-034.50E-059.05E-066.30E-045.05E-096.28E-093.00E-04
F247.98E-083.98E-056.79E-109.63E-034.91E-085.08E-092.48E-076.85E-062.93E-01
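The p-values in Table 9 come from pairwise Wilcoxon rank-sum tests between CGAOA and each competitor over repeated independent runs. A minimal SciPy sketch of how such a p-value is obtained is shown below; the two result arrays are made-up stand-ins for per-run best fitness values on a single benchmark function.

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(42)
# Hypothetical best-fitness values from 30 independent runs of two algorithms.
cgaoa_runs = rng.normal(loc=2.93, scale=0.13, size=30)
gwo_runs = rng.normal(loc=3.00, scale=0.46, size=30)

stat, p_value = ranksums(cgaoa_runs, gwo_runs)
print(f"rank-sum statistic = {stat:.3f}, p-value = {p_value:.3e}")
# A p-value below 0.05 is usually read as a significant difference between the two samples.
```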
Table 10. The hyperparameters to be optimized and their search space boundaries.
Hyperparameter Name | Description | Lower Bound | Upper Bound
Unit | The number of units in each BiGRU layer | 32 | 128
dropout_rate | The proportion of neurons randomly dropped during training | 0.2 | 0.5
batch_size | The number of samples used in each training iteration | 32 | 128
learning_rate | The step size of parameter updates during model training | 0.01 | 0.1
attention_column | The hyperparameter of the Attention layer | 1 | 12
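To make the search space in Table 10 concrete, the sketch below decodes one candidate hyperparameter setting into a bidirectional GRU model with a simple attention weighting over time steps, using Keras. This is an illustrative stand-in rather than the paper's exact AttBiGRU: the dot-product-style attention shown here is an assumption, batch_size would be applied at fit() time, and attention_column is specific to the paper's attention layer and is omitted here.

```python
import tensorflow as tf

BOUNDS = {                      # lower/upper bounds taken from Table 10
    "unit": (32, 128),
    "dropout_rate": (0.2, 0.5),
    "batch_size": (32, 128),
    "learning_rate": (0.01, 0.1),
    "attention_column": (1, 12),
}

def build_attbigru(unit, dropout_rate, learning_rate, window_len, n_features=1):
    """Decode one candidate hyperparameter setting into a BiGRU with a simple
    attention weighting over the time steps (illustrative sketch only)."""
    inputs = tf.keras.Input(shape=(window_len, n_features))
    h = tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(int(unit), return_sequences=True))(inputs)
    h = tf.keras.layers.Dropout(dropout_rate)(h)
    # Attention: score each time step, softmax over time, weighted sum of features.
    scores = tf.keras.layers.Dense(1, activation="tanh")(h)
    weights = tf.keras.layers.Softmax(axis=1)(scores)
    context = tf.keras.layers.Lambda(
        lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, weights])
    outputs = tf.keras.layers.Dense(1)(context)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss="mse")
    return model

model = build_attbigru(unit=64, dropout_rate=0.3, learning_rate=0.05, window_len=12)
```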
Table 11. Comparison of various models in the power sector.
Optimization Algorithm | Optimal solution from the optimization algorithm (Unit, dr = dropout_rate, bs = batch_size, lr = learning_rate, ac = attention_column) | MAE (MM·T−1) | RMSE (MM·T−1) | MAPE
Grid Search810.38440.06270.13200.19760.866%
Random Search330.46820.03270.16770.20810.933%
Bayesian 650.28280.06170.09260.15100.720%
GWO620.33720.04160.03620.05330.298%
BWO760.26820.07570.04520.06190.352%
CSA560.44660.02270.03080.05280.277%
SSA540.32680.04860.02910.05060.270%
SOA890.45560.03660.03390.05120.280%
DE610.22620.05590.05190.06920.389%
MFO390.37860.07160.03040.04910.298%
FPA870.41620.03370.02910.04260.251%
PSO380.32600.07170.06360.07100.409%
CGAOA660.33640.04570.02420.03970.205%
Table 12. Comparison of various models in the industry sector.
Optimization Algorithm | Optimal solution from the optimization algorithm (Unit, dr, bs, lr, ac) | MAE (MM·T−1) | RMSE (MM·T−1) | MAPE
Grid search760.49560.06270.11280.17900.830%
Random search380.46320.05350.13270.19440.862%
Bayesian 700.36360.06260.08900.14710.769%
GWO600.30720.04060.03390.05020.270%
BWO760.26820.07570.04130.05880.310%
CSA560.44660.02360.02930.05030.259%
SSA560.33680.05070.02770.04930.249%
SOA860.41500.03360.03100.05010.265%
DE600.22620.05480.04980.06300.301%
MFO380.33800.06670.02980.04030.277%
FPA820.36600.02970.02730.04010.232%
PSO360.30600.07070.05910.06420.397%
CGAOA630.31580.03970.01330.02260.198%
Table 13. Comparison of various models in the transport sector.
Optimization Algorithm | Optimal solution from the optimization algorithm (Unit, dr, bs, lr, ac) | MAE (MM·T−1) | RMSE (MM·T−1) | MAPE
Grid search640.23680.04260.11960.18210.860%
Random search460.28380.05180.14010.20660.896%
Bayesian 680.26440.06280.08380.15660.733%
GWO620.28620.03670.03420.05140.270%
BWO700.21800.06680.03960.05020.287%
CSA620.38580.04460.03290.05860.274%
SSA480.34640.07170.02620.04730.272%
SOA800.33660.02170.02880.04760.252%
DE560.26640.05870.06110.07220.420%
MFO380.29660.06070.01700.03200.239%
FPA640.41500.02670.03220.04570.250%
PSO520.32520.08180.06030.06730.406%
CGAOA660.35640.04470.01350.01690.195%
Table 14. Comparison of various models in the resident sector.
Optimization Algorithm | Optimal solution from the optimization algorithm (Unit, dr, bs, lr, ac) | MAE (MM·T−1) | RMSE (MM·T−1) | MAPE
Grid search740.30400.06580.10130.19880.875%
Random search300.55760.05560.16850.20770.913%
Bayesian 680.39260.05560.08900.15330.713%
GWO600.38660.04370.03110.05080.294%
BWO780.28800.07070.04120.06670.357%
CSA500.41660.05080.03110.04400.292%
SSA650.38660.05260.02890.05110.264%
SOA800.39500.04160.03080.05060.290%
DE550.20660.05160.05200.06610.320%
MFO420.41800.05080.03110.05220.303%
FPA860.41600.03170.02460.05080.257%
PSO580.43600.07860.06330.07060.401%
CGAOA650.31600.04770.02390.03880.230%
Table 15. Comparison of different models in four sectors (metric values are averages over 30 runs).
Sector | Model | MAE (MM·T−1) | RMSE (MM·T−1) | MAPE | Rank
Power | ARMA | 0.2875 | 0.2850 | 0.821% | 8
Power | ARIMA | 0.1892 | 0.2542 | 0.833% | 7
Power | SVM | 0.1556 | 0.1960 | 0.733% | 6
Power | ANN | 0.1106 | 0.1297 | 0.663% | 5
Power | GRU | 0.0728 | 0.1106 | 0.520% | 3
Power | LSTM | 0.0772 | 0.1201 | 0.568% | 4
Power | BiGRU | 0.0582 | 0.0881 | 0.413% | 2
Power | AttBiGRU | 0.0240 | 0.0391 | 0.213% | 1
Industry | ARMA | 0.3413 | 0.3891 | 0.966% | 8
Industry | ARIMA | 0.3030 | 0.3231 | 0.901% | 7
Industry | SVM | 0.2344 | 0.2626 | 0.813% | 6
Industry | ANN | 0.2088 | 0.2292 | 0.793% | 5
Industry | GRU | 0.1633 | 0.1799 | 0.632% | 3
Industry | LSTM | 0.1611 | 0.1862 | 0.628% | 4
Industry | BiGRU | 0.1359 | 0.1082 | 0.539% | 2
Industry | AttBiGRU | 0.0138 | 0.0229 | 0.198% | 1
Transport | ARMA | 0.2981 | 0.3192 | 0.947% | 8
Transport | ARIMA | 0.2651 | 0.2662 | 0.933% | 7
Transport | SVM | 0.2313 | 0.2335 | 0.832% | 6
Transport | ANN | 0.1801 | 0.1995 | 0.723% | 5
Transport | GRU | 0.1492 | 0.1803 | 0.697% | 3
Transport | LSTM | 0.1452 | 0.1703 | 0.632% | 4
Transport | BiGRU | 0.1182 | 0.1591 | 0.583% | 2
Transport | AttBiGRU | 0.0129 | 0.0163 | 0.188% | 1
Resident | ARMA | 0.3142 | 0.2992 | 0.981% | 8
Resident | ARIMA | 0.2630 | 0.2856 | 0.951% | 7
Resident | SVM | 0.2238 | 0.2331 | 0.891% | 6
Resident | ANN | 0.1762 | 0.2059 | 0.703% | 5
Resident | GRU | 0.1462 | 0.1691 | 0.613% | 3
Resident | LSTM | 0.1432 | 0.1673 | 0.603% | 4
Resident | BiGRU | 0.1138 | 0.1585 | 0.532% | 2
Resident | AttBiGRU | 0.0251 | 0.0391 | 0.236% | 1
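The MAE, RMSE, and MAPE values reported in Tables 11–15 follow their standard definitions; a small NumPy sketch is given below, with toy arrays standing in for the observed and predicted emission series.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

y_true = np.array([10.2, 11.0, 9.8, 10.5])   # toy observed values
y_pred = np.array([10.0, 11.3, 9.9, 10.4])   # toy predicted values
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```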
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
