Article

Multi-Scale Temporal Integration for Enhanced Greenhouse Gas Forecasting: Advancing Climate Sustainability

1 School of Business, Soochow University, Suzhou 215021, China
2 School of Computer Science & Technology, Soochow University, Suzhou 215021, China
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(8), 3436; https://doi.org/10.3390/su17083436
Submission received: 26 February 2025 / Revised: 7 April 2025 / Accepted: 9 April 2025 / Published: 12 April 2025
(This article belongs to the Collection Air Pollution Control and Sustainable Development)

Abstract

Greenhouse gases (GHGs) significantly shape global climate systems by driving temperature rises, disrupting weather patterns, and intensifying environmental imbalances, with direct consequences for human life, including rising sea levels, extreme weather, and threats to food security. Accurate forecasting of GHG concentrations is crucial for crafting effective climate policies, curbing carbon emissions, and fostering sustainable development. However, current models often struggle to capture multi-scale temporal patterns and demand substantial computational resources, limiting their practicality. This study presents MST-GHF (Multi-Scale Temporal Greenhouse Gas Forecasting), an innovative framework that integrates daily and monthly CO2 data through a multi-encoder architecture to address these challenges. It leverages an Input Attention encoder to manage short-term daily fluctuations, an Autoformer encoder to capture long-term monthly trends, and a Temporal Attention mechanism to ensure stability across scales. Evaluated on a fifty-year NOAA dataset from Mauna Loa, Barrow, American Samoa, and Antarctica, MST-GHF surpasses 14 baseline models, achieving a Test_R2 of 0.9627 and a Test_MAPE of 1.47%, with notable stability in long-term forecasting. By providing precise GHG predictions, MST-GHF empowers policymakers with reliable data for crafting targeted climate policies and conducting scenario simulations, enabling proactive adjustments to emission reduction strategies and enhancing sustainability by aligning interventions with long-term environmental goals. Its optimized computational efficiency, reducing resource demands compared to Transformer-based models, further strengthens sustainability in climate modeling, making it deployable in resource-limited settings. Ultimately, MST-GHF serves as a robust tool to mitigate GHG impacts on climate and human life, advancing sustainability across environmental and societal domains.

1. Introduction

Climate change is a global challenge impacting all of humanity and sustainability, and greenhouse gas emissions data play a crucial role in assessing the effectiveness of environmental policies and understanding the dynamic changes in the global climate system [1,2,3]. As a key parameter for quantifying climate system states, greenhouse gas concentration reflects the imbalance in the Earth’s carbon cycle, far surpassing the significance of regional or national emissions data [4,5,6]. By integrating diverse data sources—atmospheric monitoring stations, satellite remote sensing, and ground-based observations—GHG concentration data provides accurate insights into human activity, ecosystem carbon sink capacity, and the effectiveness of climate policies [7,8,9]. Notably, CO2, as a primary GHG, plays a dual role in sustainability—it is both a key driver of climate change and a crucial target for mitigation efforts. At the same time, it can be effectively harnessed through carbon capture and utilization technologies, transforming emissions into valuable resources and promoting sustainable development [10,11]. Consequently, accurately forecasting GHG concentrations, particularly CO2, is essential for predicting global climate trends, supporting international climate negotiations like the Paris Agreement, and aiding governments in optimizing emission reduction strategies, identifying critical points in the carbon peak process, and enhancing climate governance [12,13].
Accurately capturing the balance between carbon sources (e.g., industrial emissions, deforestation) and sinks (e.g., forests, oceans) is essential for reliable GHG concentration predictions. Multi-dimensional modeling approaches are employed to assess natural systems, human activities, policy interventions, and technological innovations, offering a holistic view of these interactions [14,15]. Historical data analysis, including trend decomposition and anomaly detection, unveils the mechanisms driving carbon cycle dynamics, particularly near critical thresholds such as permafrost thawing or ocean acidification [16,17]. However, policy analysis and scenario simulations, while vital for evaluating international agreements, carbon pricing reforms, and clean energy promotion, are fraught with uncertainty due to their dependence on political will, economic transitions, and unpredictable factors like technology diffusion [18,19]. In contrast, data-driven analysis, rooted in observational data and scientific principles, provides a more stable and repeatable basis for prediction [20].
Traditional climate models, such as Global Circulation Models (GCMs) and Regional Climate Models (RCMs), have been instrumental in simulating atmospheric dynamics. Smith et al., for example, developed a GCM that improved decadal predictions, marking a notable advancement [21]. However, GCMs have several limitations. They often underestimate the periods and amplitudes of major atmospheric oscillations—for instance, the SINTEX coupled model simulates ENSO cycles that are shorter than observed (3–4 years vs. 4–7 years) [22]. Additionally, limited resolution and parameterization deficiencies lead to an underestimation of AMO variability, affecting long-term climate sensitivity projections [23]. Furthermore, GCMs frequently decouple the climate system from the carbon cycle, excluding critical land and ocean carbon uptake feedbacks and thereby underestimating the impacts of carbon–climate interactions [24]. These shortcomings highlight the need for more adaptable and precise modeling approaches. Consequently, these traditional methods struggle to tackle today’s sustainability challenges, as they fail to fully connect greenhouse gas effects on climate and carbon emissions with human life and ecosystem dynamics, underscoring the demand for innovative solutions to achieve effective sustainable development.
As the demand for precise climate predictions grows, data-driven methods have gained traction, leveraging vast datasets to uncover patterns traditional models overlook. Machine learning techniques like Random Forests [25] and Gradient Boosting Decision Trees [26] excel at handling complex high-dimensional data, identifying nonlinear relationships, and pinpointing regional emission hotspots. For instance, Zhang Jianxun et al. applied XGBoost to predict carbon emissions in expanding megacities, demonstrating its utility for urban sustainability planning [27]. However, the complexity of climate systems arises from the nonlinear interactions among thousands of variables, making it difficult for traditional machine learning models to capture multi-scale coupling features [28]. Additionally, purely data-driven models, such as MLP and SVR, may violate fundamental physical principles like energy conservation, leading to systematic errors. For instance, some models misclassify wet months as dry when simulating seasonal precipitation cycles, underscoring the limitations of purely statistical approaches [29]. Consequently, these methods fall short in addressing current sustainability challenges because they cannot deliver multi-scale, long-term precision in forecasting greenhouse gas effects on climate and carbon emissions critical for human life and ecosystems, necessitating more advanced frameworks to ensure sustainable progress [30,31,32].
Deep learning has further transformed GHG forecasting by capturing spatio-temporal dependencies. Long Short-Term Memory (LSTM) networks [33] and Spatio-Temporal Graph Convolutional Networks (STGCNs) have proven effective at modeling atmospheric transport and regional carbon dynamics. Zhang Lei et al. combined CNNs with LSTM to forecast soil organic carbon, showcasing regional applicability [34], while Panja et al.’s E-STGCN model achieved robust air quality predictions in Delhi across seasons [35]. Despite their strengths, these models falter in data-scarce regions and may produce physically inconsistent extrapolations, limiting their generalizability. The Transformer architecture [36], with its self-attention mechanism, has opened new avenues for modeling global atmospheric processes. Wu Xingping et al. developed a temporal graph Transformer-based neural network, achieving an 89.5% accuracy rate in carbon emission predictions [37]. Yet, despite their high precision, Transformers suffer from significant drawbacks: their vast parameter sizes and computational demands make them challenging to deploy in resource-constrained settings, while their lack of multi-scale temporal modeling hinders their ability to capture both short- and long-term GHG dynamics effectively [38,39]. Consequently, these limitations render them inadequate for tackling current sustainability challenges, as they struggle to provide the comprehensive, multi-scale forecasts of greenhouse gas impacts on climate and carbon emissions needed to support human life and ecosystems, highlighting the need for more practical and versatile solutions to advance sustainability. Emerging models like Temporal Convolutional Networks (TCN) [40], DA-RNN [41], Autoformer [42], and Informer [43] have achieved notable success in certain time series forecasting tasks. However, their limitations are evident: they lack multi-scale capabilities, restricting their ability to integrate short- and long-term dynamics, and they have not been applied to address climate sustainability development, hindering their effectiveness for comprehensive GHG forecasting.
To address these gaps, this study introduces the MST-GHF framework, a novel multi-encoder fusion approach for cross-scale greenhouse gas (GHG) concentration forecasting. The true innovation lies in its multi-time-resolution design, which integrates daily and monthly CO2 data, and its hybrid use of specialized encoders, akin to ensemble machine learning where diverse models combine their strengths. Optimized Input Attention and Autoformer Encoders, paired with LSTM units, extract features across temporal scales: the Input Attention Encoder excels at capturing short-term fluctuations (e.g., daily weather impacts), while the Autoformer Encoder models long-term patterns (e.g., seasonal or policy-driven trends). A Temporal Attention mechanism drives multi-step forecasting, ensuring robust accuracy and stability across datasets. By harmonizing high- and low-frequency dynamics, MST-GHF surpasses the limitations of single-resolution models like TCN, DA-RNN, and Autoformer. This hybrid architecture boosts predictive accuracy—outperforming baselines such as LSTM and Transformers—while enhancing computational efficiency for real-time use, balancing interpretability and power through its temporal context integration.
Through the MST-GHF framework, this study seeks to improve the precision of long-term greenhouse gas (GHG) forecasting while reducing the training and inference costs associated with high-precision deep learning models applied to long-term GHG prediction tasks, with the aim of promoting climate sustainability development and thereby advancing overall sustainability. Experimental results indicate that this framework not only enhances prediction accuracy—surpassing all baseline models in accuracy tests with a substantial margin—but also improves computational efficiency.
This research delivers more accurate predictive data to climate policymakers, providing a reliable basis for precise decision-making as GHG emissions near critical thresholds, while advancing sustainability science research. By facilitating the evaluation of climate interventions—such as carbon taxes and ecosystem restoration—it enhances our understanding of sustainable practices. Consequently, the MST-GHF framework optimizes global carbon management, accelerates the transition to a low-carbon economy, and drives progress toward climate sustainability.

2. Materials and Methods

2.1. Data Collection

The dataset consists of daily averages of Carbon Dioxide (CO2) in situ measurements (Atmospheric Carbon Dioxide Dry Air Mole Fraction) from the Barrow Atmospheric Baseline Observatory in the United States, collected by the National Oceanic and Atmospheric Administration (NOAA) [44]. The time span of the dataset is from 24 July 1973 to 30 April 2024, comprising 16,597 daily data points.
This dataset, part of NOAA’s Global Greenhouse Gas Reference Network, is crucial for analyzing spatiotemporal patterns in greenhouse gas emissions and removals, supporting carbon management, and providing early warnings of climatic anomalies [44]. Its high temporal resolution and long duration make it essential for validating models addressing both regional phenomena and global trends, as well as for evaluating multi-temporal hybrid models.

2.2. Data Processing

This study uses a sliding window approach to create time-series datasets for multi-step forecasting. Each window consists of 100 consecutive historical data points (input data points), spanning from time t to t + 99, where t ranges from 1 to 16,488. The model predicts the next 10 time steps (output prediction points). With a total of 16,597 data points, this method generates 16,488 samples, resulting in an input matrix of size 16,488 × 100 and a target matrix of 16,488 × 10.
To capture both short-term and long-term temporal patterns, a monthly dataset is created by computing 30-day moving averages from the daily data. For each daily input sequence, a corresponding monthly sequence is generated by averaging every 30 consecutive daily values. This enables the model to effectively capture multi-scale temporal patterns.
The model incorporates an input attention mechanism that dynamically adjusts the importance of historical data within each time window. Initially, all input variables ($x_t$ to $x_{t+99}$, $t \in [1, 16488]$) are uniformly weighted. During training, these weights are iteratively optimized through backpropagation to prioritize temporally significant features. The network architecture directly maps the 100-dimensional input to a 10-dimensional output, enabling end-to-end prediction of the future 10-step sequences.
The dataset is split into 13,190 training samples (80%) and 3298 testing samples (20%) for evaluation. For each sample, the model’s 10-step output is divided into 10 separate prediction sequences (Step 1 to Step 10). Each sequence (13,190 × 1 for training and 3298 × 1 for testing) is evaluated using R2, MSE, MAE, and MAPE to assess the model’s performance across the forecast horizon. Further details on the model architecture and computational process are provided below.
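As an illustration of this preprocessing, the sketch below builds the sliding-window samples, the 30-day moving-average (monthly) view, and the 80/20 split described above. It is NumPy-based; the data file name and the loading step are hypothetical placeholders, while the 100/10/30 window sizes follow the text.

```python
import numpy as np

def build_samples(series, input_len=100, horizon=10, monthly_kernel=30):
    """Slide a window over the daily CO2 series and return daily inputs,
    30-day moving-average (monthly) inputs, and 10-step targets."""
    X_daily, X_monthly, Y = [], [], []
    for t in range(len(series) - input_len - horizon + 1):
        daily = series[t:t + input_len]                         # 100 daily points
        # 30-day moving average over the same window (coarse, "monthly" view)
        monthly = np.convolve(daily, np.ones(monthly_kernel) / monthly_kernel,
                              mode="valid")
        Y.append(series[t + input_len:t + input_len + horizon])  # next 10 steps
        X_daily.append(daily)
        X_monthly.append(monthly)
    return np.stack(X_daily), np.stack(X_monthly), np.stack(Y)

co2 = np.loadtxt("co2_daily_barrow.txt")    # hypothetical file with 16,597 values
Xd, Xm, Y = build_samples(co2)              # Xd: (16488, 100), Y: (16488, 10)
split = int(0.8 * len(Xd))                  # 13,190 training / 3,298 testing samples
Xd_train, Xd_test = Xd[:split], Xd[split:]
```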

2.3. Methods

The proposed model employs a hierarchical encoder–decoder architecture for multi-scale time series forecasting. Raw data are preprocessed into daily and monthly datasets (see Section 2.2). During encoding, daily data are processed using an Input Attention mechanism to capture short-term dependencies, while monthly data are handled by an Autoformer encoder to extract long-term periodic patterns. The outputs from both encoders are concatenated to form a unified latent representation. In the decoding phase, a Temporal Attention mechanism assigns dynamic weights to this representation, emphasizing key temporal features, which are then projected to the target dimension to predict the next 10 time steps. This architecture effectively combines the sensitivity of daily data with the periodicity of monthly data for multi-scale temporal modeling. Further details are provided in the following sections, with an overview in Figure 1.

2.3.1. Input Attention Mechanism

The input attention mechanism (input-attn) is particularly effective for handling fine-grained data, such as daily time series, due to its ability to adaptively select and weight input features at each time step. Given the frequent and complex variations in fine-grained data, this mechanism dynamically adjusts weights to highlight the most relevant inputs [41]. It also manages multiple driving series within the time window (set to 100 in this study) and suppresses noise, thereby enhancing model robustness [41]. By recalculating attention weights at each step, the mechanism adapts to local feature fluctuations and improves interpretability, making it well-suited for fine-grained time series forecasting.
Next, we introduce how the Input Attention Mechanism is used to process the daily input sequence $X_D = \{x_{D_1}, x_{D_2}, \ldots, x_{D_{T_d}}\}$. This mechanism combines an Input Attention Unit, which selects and weights relevant inputs, with LSTM units that generate the corresponding hidden state sequence.
Specifically, for each time step $t$, we first compute the basic hidden state $h_t$ through the LSTM unit:
$$h_t = f_1(h_{t-1}, x_t)$$
where $h_t$ is the hidden state at time $t$, $h_{t-1}$ is the previous hidden state, $x_t$ is the input at time $t$ (daily data), and $f_1$ represents the LSTM unit.
Next, we apply the input attention mechanism to adaptively weight features of the driving sequence, enhancing focus on relevant inputs and suppressing noise. For each feature $k$ (where $k = 1, \ldots, n$ and $n$ is the number of features), we calculate an attention score as follows:
$$e_t^k = v_e^\top \tanh\left(W_e [h_{t-1}; s_{t-1}] + U_e x_t^k\right)$$
where $e_t^k$ is the attention score for feature $k$ at time $t$, $v_e^\top$ is the attention weight vector, $W_e$ is the weight matrix for the hidden and cell states, $[h_{t-1}; s_{t-1}]$ concatenates the previous hidden state $h_{t-1}$ and cell state $s_{t-1}$ from the LSTM, $U_e$ is the weight matrix for the input feature $x_t^k$ (the $k$-th component of $x_t$), and $\tanh$ is the activation function. In this experiment, we predict only concentration values, thus $n = 1$ and $k$ indexes a single feature. The attention weights are then normalized as follows:
$$\alpha_t^k = \frac{\exp(e_t^k)}{\sum_{i=1}^{n} \exp(e_t^i)}$$
where $\alpha_t^k$ is the attention weight for feature $k$, computed via softmax over all $n$ features.
Using these weights, we construct a new weighted driving sequence as follows:
$$\tilde{x}_t = \left(\alpha_t^1 x_t^1, \alpha_t^2 x_t^2, \ldots, \alpha_t^n x_t^n\right)^\top$$
where $\tilde{x}_t$ is the updated input vector, and $x_t^k$ is the $k$-th feature of the original input $x_t$, scaled by its corresponding weight $\alpha_t^k$.
Using the new driving sequence, we update the hidden state $h_t$ as follows:
$$h_t = f_1(h_{t-1}, \tilde{x}_t)$$
Thus, we obtain the hidden state sequence for daily data as follows:
$$H_D = \{h_{D_1}, h_{D_2}, \ldots, h_{D_{T_d}}\}$$
where $H_D$ is the daily hidden sequence, $h_{D_i}$ is the hidden state at daily step $i$, and $T_d$ is the length of the daily time window, set to 100 in this study.
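A minimal PyTorch-style sketch of this encoder is given below. It mirrors the equations above (per-feature scoring, softmax weighting, LSTM update), but the layer names, dimensions, and initialization are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class InputAttentionEncoder(nn.Module):
    """Sketch of Section 2.3.1: score each driving feature from the previous
    LSTM hidden/cell state, re-weight the input, then update the LSTM."""
    def __init__(self, n_features=1, hidden_size=64, attn_dim=32):
        super().__init__()
        self.lstm = nn.LSTMCell(n_features, hidden_size)
        self.W_e = nn.Linear(2 * hidden_size, attn_dim, bias=False)
        self.U_e = nn.Linear(1, attn_dim, bias=False)
        self.v_e = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, x):                       # x: (batch, T_d, n_features)
        batch, T, n = x.shape
        h = x.new_zeros(batch, self.lstm.hidden_size)
        s = x.new_zeros(batch, self.lstm.hidden_size)
        hidden_seq = []
        for t in range(T):
            state = torch.cat([h, s], dim=1).unsqueeze(1).expand(batch, n, -1)
            # e_t^k = v_e^T tanh(W_e [h_{t-1}; s_{t-1}] + U_e x_t^k)
            e = self.v_e(torch.tanh(self.W_e(state) +
                                    self.U_e(x[:, t].unsqueeze(-1))))
            alpha = torch.softmax(e.squeeze(-1), dim=1)   # feature attention weights
            h, s = self.lstm(alpha * x[:, t], (h, s))     # weighted driving series
            hidden_seq.append(h)
        return torch.stack(hidden_seq, dim=1)   # H_D: (batch, T_d, hidden_size)
```

Note that with $n = 1$, as in this experiment, the softmax weight over a single feature is trivially one; the per-feature weighting matters when additional driving series are supplied.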

2.3.2. Autoformer Encoder

The Autoformer Encoder is well-suited for coarse-grained data, such as monthly time series, due to its decomposition-based architecture and autocorrelation mechanism. By integrating time series decomposition, Autoformer progressively extracts long-term trends and seasonal components—crucial for modeling extended temporal dependencies [42]. Its autocorrelation module captures cycle-based dependencies by computing autocorrelation across subsequences, effectively identifying periodic patterns. Moreover, with a computational complexity of $O(L \log L)$, substantially lower than the $O(L^2)$ of traditional self-attention, it efficiently handles long sequences [42]. These features make Autoformer particularly effective for modeling complex and long-term temporal patterns in coarse-grained data.
Next, we describe how the Autoformer processes the monthly input sequence $X_M = \{x_{M_1}, x_{M_2}, \ldots, x_{M_{T_m}}\}$. The Autoformer encoder, composed of a series decomposition block (Series Decomp), an auto-correlation mechanism (Auto-Correlation), and a feed-forward neural network (Feed Forward), extracts long-term trends and seasonal patterns from the input, producing the hidden state sequence for monthly data.
At each time step $t$, the input sequence is first passed through a series decomposition block. Since future sequences cannot be directly decomposed, Autoformer introduces this block as an internal operation to progressively extract long-term trends from predicted intermediate hidden states. The decomposition smooths out periodic fluctuations using moving averages, thereby emphasizing stable trends. For an input sequence $X_M \in \mathbb{R}^{L \times d}$, where $L$ is the sequence length and $d$ the feature dimension, the computation proceeds as follows:
$$X_t = \mathrm{AvgPool}(\mathrm{Padding}(X_M))$$
$$X_s = X_M - X_t$$
where $X_s, X_t \in \mathbb{R}^{L \times d}$ denote the seasonal and trend components, respectively. This is denoted as $X_s, X_t = \mathrm{SeriesDecomp}(X_M)$, where AvgPool applies a moving average with a fixed kernel size, and Padding ensures boundary consistency during the averaging process.
Next, the auto-correlation mechanism replaces traditional self-attention by capturing cycle-based dependencies via time-delay aggregation. For a discrete-time process $X_t$, the autocorrelation at lag $\tau$ is computed as follows:
$$\mathcal{R}_{XX}(\tau) = \lim_{L \to \infty} \frac{1}{L} \sum_{t=1}^{L} X_t X_{t-\tau}$$
This is efficiently implemented using the Fast Fourier Transform (FFT) based on the Wiener–Khinchin theorem [42], and the top $k = \lfloor c \times \log L \rfloor$ lags $\tau_1, \ldots, \tau_k$ are selected, where $c$ is a hyperparameter.
The Roll operation shifts the value matrix across time delays $\tau_i$, aligning repeating patterns across periods. These shifted values are weighted and aggregated using softmax-normalized autocorrelation scores:
$$\tau_1, \ldots, \tau_k = \underset{\tau \in \{1, \ldots, L\}}{\arg\mathrm{Topk}} \, \mathcal{R}_{Q,K}(\tau)$$
$$\hat{\mathcal{R}}_{Q,K}(\tau_1), \ldots, \hat{\mathcal{R}}_{Q,K}(\tau_k) = \mathrm{softmax}\left(\mathcal{R}_{Q,K}(\tau_1), \ldots, \mathcal{R}_{Q,K}(\tau_k)\right)$$
$$\mathrm{AutoCorrelation}(Q, K, V) = \sum_{i=1}^{k} \mathrm{Roll}(V, \tau_i) \, \hat{\mathcal{R}}_{Q,K}(\tau_i)$$
where $\mathcal{R}_{Q,K}(\tau)$ is the autocorrelation between $Q$ and $K$, $\arg\mathrm{Topk}$ selects the $k$ highest-scoring lags $\tau$, $\hat{\mathcal{R}}$ are the normalized weights, and $\mathrm{Roll}(V, \tau_i)$ shifts $V$ by $\tau_i$, reintroducing overflow elements at the end.
For the multi-head mechanism with hidden dimension $d_{model}$ and $h$ heads, each head processes a subspace:
$$\mathrm{head}_i = \mathrm{AutoCorrelation}(Q_i, K_i, V_i), \quad Q_i, K_i, V_i \in \mathbb{R}^{L \times \frac{d_{model}}{h}}, \; i \in \{1, \ldots, h\}$$
$$\mathrm{MultiHead}(Q, K, V) = W^O \, \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)$$
where $W^O \in \mathbb{R}^{d_{model} \times d}$ is the output projection matrix, and $\mathrm{Concat}$ merges the head outputs.
The result, denoted $h_t^{attn}$, is transformed via a feed-forward network, mirroring the Transformer design:
$$\mathrm{FeedForward}(x) = \mathrm{ReLU}(x W_1 + b_1) W_2 + b_2$$
$$h_t^{ffn} = \mathrm{FeedForward}(h_t^{attn})$$
where $W_1, W_2$ are weight matrices, $b_1, b_2$ are biases, and $\mathrm{ReLU}$ is the activation function.
Residual connections and layer normalization refine the output as follows:
$$h_t = \mathrm{LayerNorm}(h_t^{attn} + h_t^{ffn})$$
Finally, the hidden state sequence for monthly data is given by:
$$H_M = \{h_{M_1}, h_{M_2}, \ldots, h_{M_{T_m}}\}$$
where $H_M$ is the monthly hidden sequence, $h_{M_i}$ is the hidden state at monthly step $i$, and $T_m$ is the total number of monthly time steps.
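For concreteness, a condensed sketch of the two core Autoformer operations used here, series decomposition and FFT-based auto-correlation with time-delay aggregation, is shown below. It works on single 1-D sequences with a single head; it is an illustration of the mechanism under these simplifying assumptions, not the original implementation.

```python
import torch
import torch.nn.functional as F

def series_decomp(x, kernel_size=25):
    """X_t = AvgPool(Padding(X)), X_s = X - X_t, for a 1-D series of length L."""
    pad_left = (kernel_size - 1) // 2
    pad_right = kernel_size - 1 - pad_left
    x3 = x.view(1, 1, -1)                                # (N, C, L) for pooling
    x_pad = F.pad(x3, (pad_left, pad_right), mode="replicate")
    trend = F.avg_pool1d(x_pad, kernel_size, stride=1).view(-1)
    return x - trend, trend                              # seasonal, trend components

def auto_correlation(q, k, v, c=1.0):
    """Select the top c*log(L) lags via FFT (Wiener-Khinchin) and aggregate
    rolled copies of v weighted by softmax-normalized autocorrelations."""
    L = q.size(-1)
    corr = torch.fft.irfft(torch.fft.rfft(q) * torch.conj(torch.fft.rfft(k)), n=L)
    top_k = max(1, int(c * torch.log(torch.tensor(float(L))).item()))
    weights, lags = torch.topk(corr, top_k)
    weights = torch.softmax(weights, dim=-1)
    out = torch.zeros_like(v)
    for w, tau in zip(weights, lags):
        out = out + w * torch.roll(v, shifts=-int(tau))  # Roll(V, tau_i)
    return out

x = torch.randn(128)                 # e.g., one hidden channel of the monthly sequence
seasonal, trend = series_decomp(x)
out = auto_correlation(seasonal, seasonal, seasonal)
```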

2.3.3. Concatenation Phase

After encoding, the hidden state sequences $H_D$ and $H_M$ from the two encoders are concatenated along the feature dimension to form the final hidden sequence $H = \{h_1, h_2, \ldots, h_T\}$, which integrates multi-time-resolution features. This fused sequence is then passed to the decoder for final prediction:
$$H = \mathrm{Concat}(H_D, H_M)$$

2.3.4. Temporal Attention Mechanism

The Temporal Attention mechanism is well-suited for decoders, as it effectively addresses long-term dependency challenges often encountered in traditional encoder–decoder architectures. Unlike RNNs, which rely heavily on the encoder’s final hidden state and struggle to retain earlier time-step information, the Temporal Attention mechanism enables the decoder to dynamically attend to hidden states across the entire input sequence [41]. This flexibility allows the decoder to focus on the most relevant time steps for each prediction, regardless of their temporal distance [41].
Next, we outline the use of the Temporal Attention mechanism for decoding the fused hidden representation and generating predictions. The final hidden sequence $H = \{h_1, h_2, \ldots, h_T\}$ is passed into the Temporal Attention mechanism to decode the final prediction sequence. This mechanism comprises a Temporal Attention Unit and an LSTM unit: the Temporal Attention Unit selects among the encoders' hidden states, and the LSTM unit updates the decoder's hidden state and outputs the final prediction $\hat{y}_T$. The formulation of Temporal Attention is fundamentally consistent with the Input Attention Mechanism in Section 2.3.1; for detailed parameter meanings, please refer to that section.
Specifically, for each time step $t$, we first compute the temporal attention scores as follows:
$$l_t^i = v_d^\top \tanh\left(W_d [d_{t-1}; s_{t-1}] + U_d h_i\right)$$
where $l_t^i$ is the attention score gauging the relevance of encoder hidden state $h_i$ to the current decoding step.
Then, the attention weights are normalized as follows:
$$\beta_t^i = \frac{\exp(l_t^i)}{\sum_{j=1}^{T} \exp(l_t^j)}$$
where $\beta_t^i$ determines the contribution of $h_i$ to the context.
Next, the context vector is computed as follows:
$$c_t = \sum_{i=1}^{T} \beta_t^i h_i$$
where $c_t$ aggregates encoder information for decoding at time $t$.
The decoder input is combined with the context vector to generate the new input as follows:
$$\tilde{y}_{t-1} = \tilde{w}^\top [y_{t-1}; c_{t-1}] + \tilde{b}$$
The decoder’s hidden state is updated as follows:
$$d_t = f_1(d_{t-1}, \tilde{y}_{t-1})$$
where $d_t$ evolves the decoder's memory, using the same LSTM unit $f_1$ as in Section 2.3.1.
Finally, we output the prediction as follows:
$$\hat{y}_T = v_y^\top \left(W_y [d_t; c_t] + b_w\right) + b_v$$
where $\hat{y}_T$ is the final output at time $T$, derived from the combined decoder state and context. Notably, the projection vector $v_y^\top$ and scalar bias $b_v$ adjust the dimensions of the transformed vector $W_y [d_t; c_t] + b_w$ to produce the final prediction $\hat{y}_T$.
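The decoding loop can be sketched as follows (PyTorch-style). This is a simplified illustration in which the same attention-plus-LSTM step is unrolled for the 10 forecast steps; layer names, sizes, and the autoregressive unrolling are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class TemporalAttentionDecoder(nn.Module):
    """Sketch of Section 2.3.4: at each decoding step the decoder state attends
    over the fused encoder sequence H, builds a context vector c_t, and an
    LSTM cell advances the decoder state before the output projection."""
    def __init__(self, enc_dim, dec_hidden=64, attn_dim=32):
        super().__init__()
        self.W_d = nn.Linear(2 * dec_hidden, attn_dim, bias=False)
        self.U_d = nn.Linear(enc_dim, attn_dim, bias=False)
        self.v_d = nn.Linear(attn_dim, 1, bias=False)
        self.in_proj = nn.Linear(1 + enc_dim, 1)        # produces y~_{t-1}
        self.lstm = nn.LSTMCell(1, dec_hidden)
        self.out = nn.Linear(dec_hidden + enc_dim, 1)   # final projection

    def forward(self, H, y_prev, steps=10):             # H: (batch, T, enc_dim), y_prev: (batch, 1)
        batch = H.size(0)
        d = H.new_zeros(batch, self.lstm.hidden_size)
        s = H.new_zeros(batch, self.lstm.hidden_size)
        preds = []
        for _ in range(steps):
            # l_t^i = v_d^T tanh(W_d [d_{t-1}; s_{t-1}] + U_d h_i), over all encoder positions
            scores = self.v_d(torch.tanh(
                self.W_d(torch.cat([d, s], dim=1)).unsqueeze(1) + self.U_d(H)))
            beta = torch.softmax(scores, dim=1)          # (batch, T, 1)
            c = (beta * H).sum(dim=1)                    # context vector c_t
            y_tilde = self.in_proj(torch.cat([y_prev, c], dim=1))
            d, s = self.lstm(y_tilde, (d, s))
            y_prev = self.out(torch.cat([d, c], dim=1))  # \hat{y}
            preds.append(y_prev)
        return torch.cat(preds, dim=1)                   # (batch, steps)
```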

2.4. Metric

The following are the formulas for calculating the goodness-of-fit of the model:
$R^2$, Coefficient of Determination:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
MSE, Mean Squared Error:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
MAE, Mean Absolute Error:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
MAPE, Mean Absolute Percentage Error:
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
In the above equations, $y_i$ represents the true value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the true values, and $n$ the sample size.
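For reference, the four metrics can be computed with a few lines of NumPy; this is a straightforward sketch of the formulas above.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute R2, MSE, MAE, and MAPE for one prediction sequence."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    mse = np.mean(resid ** 2)
    mae = np.mean(np.abs(resid))
    mape = 100.0 * np.mean(np.abs(resid / y_true))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"R2": r2, "MSE": mse, "MAE": mae, "MAPE": mape}
```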

2.5. Experiment Conditions

In this experiment, we used the Optuna framework to efficiently tune key hyperparameters such as learning rate and batch size for the MST-GHF model and 14 baseline models, ensuring fair comparison under optimal conditions. To enhance reproducibility, we fixed random seeds and GPU settings, conducting only one trial per model to eliminate variability [45]. The experiment ran for 500 iterations on an NVIDIA 3060 GPU. Optuna’s Bayesian optimization and controlled settings minimized bias and randomness, enabling consistent results that can be reliably replicated by other researchers [46].
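A minimal sketch of this tuning setup is shown below, assuming a hypothetical `train_and_evaluate` helper that trains one model configuration and returns its validation MSE; the search space and trial budget are illustrative, not the exact settings used in the experiment.

```python
import optuna
import torch

def objective(trial):
    # illustrative search space for the shared hyperparameters mentioned above
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [32, 64, 128])
    return train_and_evaluate(lr=lr, batch_size=batch_size)  # hypothetical helper

torch.manual_seed(42)                           # fixed seed for reproducibility
sampler = optuna.samplers.TPESampler(seed=42)   # seeded Bayesian (TPE) sampler
study = optuna.create_study(direction="minimize", sampler=sampler)
study.optimize(objective, n_trials=50)          # illustrative trial budget
print(study.best_params)
```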

3. Results

3.1. Overall Performances

The MST-GHF framework demonstrates exceptional predictive accuracy in GHG forecasting in support of sustainable development, surpassing 14 baseline models across key evaluation metrics. MST-GHF achieves a Test_R2 of 0.9627, exceeding DARNN (0.9519) by 1.1%, Autoformer (0.9480) by 1.5%, and TCN (0.9408) by 2.2%, illustrating its superior ability to model greenhouse gas (GHG) concentration trends. Additionally, its Test_MSE of 0.0002878 is 22.7% lower than DARNN’s 0.0003716 and 37.1% lower than TCN’s 0.0004571, highlighting its significantly reduced forecasting error. MST-GHF also achieves the lowest Test_MAPE at 0.0147 (1.47%), outperforming Autoformer’s 0.0175 (1.75%) by 15.7% and GRU’s 0.0367 (3.67%) by nearly 60%, demonstrating its exceptional precision in minimizing relative errors. This superior accuracy results from MST-GHF’s hybrid multi-encoder architecture, which effectively integrates both daily and monthly data to capture short-term fluctuations and long-term trends. The Input Attention Encoder reduces high-frequency noise in daily data, allowing the model to prioritize the most critical short-term features, while the Autoformer Encoder leverages decomposition and autocorrelation mechanisms to extract periodic patterns and long-term trends.
The cross-scale fusion of these encoders enhances MST-GHF’s ability to outperform single-resolution models like TCN and traditional recurrent models like LSTM, which struggle to maintain stable long-term predictions. This allows MST-GHF to ensure tighter alignment with actual CO2 values, as visualized in Figure 2 across ten steps, further underscoring its robustness and precision in forecasting. All the experimental data can be found in Appendix A Table A1.
From a sustainability perspective, this enhanced predictive accuracy has significant implications. Reliable GHG forecasts provide policymakers with a robust scientific basis for evaluating climate policies, refining carbon reduction strategies, and designing proactive mitigation measures. Furthermore, minimizing forecasting errors helps prevent misallocation of resources in climate action initiatives and ensures that emission targets are based on accurate trend assessments. Improved forecasting precision also supports corporate sustainability planning, aiding businesses in optimizing energy consumption, implementing carbon offset programs, and achieving net-zero emissions more effectively.

3.2. Multi-Step Forecasting

MST-GHF exhibits remarkable stability in multi-step GHG forecasting, with minimal accuracy degradation over extended prediction horizons. Its Test_R2 decreases only 3.3% from 0.9788 at Step 1 to 0.9470 at Step 10, outperforming DARNN, which declines by 3.9% (0.9710 → 0.9334), Autoformer by 4.2% (0.9693 → 0.9284), and TCN by 4.6% (0.9651 → 0.9193). This controlled error growth underscores MST-GHF’s advantage in long-term stability, driven by its Temporal Attention mechanism, which dynamically adjusts feature weighting across different time scales to prevent the accumulation of forecasting errors.
A detailed breakdown of forecasting error metrics further highlights MST-GHF’s robustness. Its Test_MSE increases from 0.000163 at Step 1 to 0.000410 at Step 10, reflecting a 2.5 × increase, whereas Informer’s MSE rises by 2.2 × (0.000416 → 0.000926) and LSTM’s by 1.5 × (0.000678 → 0.000992), indicating that MST-GHF maintains a more controlled error propagation. Similarly, Test_MAPE increases from 1.07% at Step 1 to 1.83% at Step 10, remaining consistently lower than DARNN’s 2.03% and Autoformer’s 2.11% at the same step. Figure 3 illustrates this stability, showing that MST-GHF’s predictions remain closely aligned with actual values even in extended forecasts, unlike BiTransformer_LSTM and Informer, which exhibit greater deviation at later time steps. All the experimental data can be found in Appendix A Table A2, Table A3, Table A4 and Table A5.
This stability is crucial from a sustainability perspective. Reliable multi-step forecasting ensures that long-term climate projections remain actionable, allowing policymakers to formulate effective intervention strategies with minimal uncertainty. High accuracy over extended time horizons also enables precise planning for carbon credit markets, emissions trading schemes, and large-scale reforestation efforts. Furthermore, minimizing forecasting deviations enhances the credibility of climate impact models used to assess future environmental risks, ultimately leading to better-informed decisions on climate adaptation and mitigation policies.

3.3. Ablation Experiment

To systematically assess the contributions of different model components, we conducted ablation experiments by removing individual modules and analyzing the resulting performance changes. The tested configurations included: (1) the complete MST-GHF model, (2) MST-GHF without the Input-Attn Mechanism, (3) MST-GHF with the Autoformer Encoder replaced by a standard LSTM, and (4) MST-GHF without the Temporal-Attn Mechanism, replacing it with a simple linear layer for output mapping. To ensure comparability, all experiments maintained consistent training settings, including hyperparameters, dataset splits, and initialization seeds. The specific data from these experiments can be found in Appendix A Table A6, Table A7, Table A8, Table A9 and Table A10.
The results underscore the critical role of each module in enhancing prediction accuracy and stability. Removing the Input Attention Mechanism led to a sharp decline in performance, with Test_R2 dropping from 0.9627 to 0.9165—a 4.8% reduction—while Test_MSE surged by 123.9% (0.000288 → 0.000645). This highlights the mechanism’s effectiveness in filtering noise and selecting the most relevant short-term features from daily data.
The absence of the Autoformer Encoder resulted in a 1.5% drop in Test_R2 (0.9627 → 0.9480) and a 39.2% increase in Test_MSE, indicating its essential role in capturing long-term periodic trends. Multi-step forecasting further revealed the Autoformer’s advantages: accuracy at Step 5 declined to 0.9267 (compared to 0.9656 in the full model), and at Step 10, it fell to 0.9092 (compared to 0.9470). This sharp degradation suggests that the Autoformer’s decomposition and autocorrelation mechanisms are crucial for identifying hidden periodic patterns in CO2 variations. Without it, the model struggles to maintain stability over extended forecasting horizons, as conventional LSTMs lack the ability to extract long-range dependencies effectively.
The most severe performance deterioration occurred when removing the Temporal Attention Mechanism. Test_R2 plummeted to 0.9077 (a 5.7% decline), and Test_MSE increased by 147.6% (0.000288 → 0.000713). Multi-step evaluation showed that R2 at Step 10 dropped to 0.8883—significantly lower than the full model’s 0.9470—confirming that the Temporal Attention Mechanism is essential for maintaining prediction stability by dynamically adjusting feature importance over different time scales. Its absence leads to a loss of focus in long-term forecasting, causing greater cumulative errors.
From a sustainability perspective, these findings reinforce the necessity of advanced forecasting models in climate policy planning. The ability to accurately model both short- and long-term GHG trends provides a strong scientific foundation for emission reduction strategies, allowing policymakers to make informed decisions based on reliable, data-driven insights. By leveraging a multi-scale forecasting approach, MST-GHF enables governments and businesses to develop adaptive strategies for mitigating greenhouse gas emissions, optimizing green energy investments, and enhancing climate resilience. Additionally, its optimized computational efficiency significantly reduces the energy consumption associated with large-scale climate modeling, making it a more sustainable and resource-efficient solution for environmental decision-making and long-term climate governance.

4. Discussions

The accurate and stable prediction of greenhouse gas (GHG) concentrations is fundamental to effective climate action. Policy decisions on emissions reductions, carbon taxation, and energy transitions rely on precise forecasts to anticipate environmental trends and mitigate risks before they become unmanageable [47,48,49]. Without reliable models, short-term fluctuations in CO2 levels caused by weather variations or human activities could lead to reactive rather than proactive policymaking, while long-term uncertainties in emission trends could hinder strategic planning for sustainable development [50,51]. MST-GHF, with its advanced multi-scale forecasting capability, ensures both short-term precision and long-term trend stability, providing decision-makers with high-fidelity data for scientifically grounded climate policies.
Beyond policymaking, accurate and stable forecasts facilitate critical sustainability research. MST-GHF’s dual-encoder architecture distinguishes between high-frequency daily variations and long-term seasonal or anthropogenic trends, making it particularly valuable for scenario simulations. Researchers can use this capability to test the impact of policy interventions—such as carbon pricing or industrial decarbonization—by modifying historical data and projecting future CO2 levels under different regulatory conditions. Additionally, it supports environmental impact assessments by providing baseline CO2 concentration trajectories that integrate into broader ecological studies, such as evaluating how atmospheric carbon trends correlate with permafrost thawing or ocean acidification. By offering a scientifically robust framework for emissions forecasting, MST-GHF strengthens the analytical foundation for sustainability initiatives, aligning policy, environmental research, and climate mitigation efforts.
One of MST-GHF’s defining advantages is its computational efficiency, which significantly reduces the energy and infrastructure requirements traditionally associated with high-accuracy climate models. Unlike Transformer-based models with O(L2) complexity, MST-GHF employs an Input Attention mechanism with linear complexity and an Autoformer encoder with O(Llog L) efficiency, drastically lowering processing demands. This streamlined design allows the model to be trained within a reasonable timeframe using mid-range hardware and deployed in resource-limited environments such as remote monitoring stations or developing regions with scarce computational power. The reduction in energy consumption not only makes MST-GHF a cost-effective solution but also aligns with sustainability principles by lowering the carbon footprint of climate modeling itself. As a result, high-quality emissions forecasting becomes accessible beyond well-funded research institutions, enabling more regions to integrate data-driven decision-making into their climate strategies.
Despite its strengths, MST-GHF faces challenges, particularly in data-sparse regions where inconsistent monitoring can reduce forecast reliability. While its computational efficiency lowers deployment barriers, some level of hardware infrastructure (e.g., a GPU) is still required for training. Future research could focus on expanding its applicability by incorporating additional greenhouse gases like methane (CH4) and nitrogen oxides (NOₓ) and improving adaptability to incomplete datasets through imputation techniques or transfer learning. Further optimization for low-power edge computing could enhance its real-world usability, ensuring that even the most underserved regions benefit from advanced GHG forecasting to support sustainable climate policies.

5. Conclusions

This study introduces the MST-GHF framework, a multi-scale greenhouse gas (GHG) forecasting model that integrates daily and monthly CO2 data through a dual-encoder architecture. By leveraging an Input Attention mechanism to capture short-term variability, an Autoformer encoder to model long-term trends, and a Temporal Attention mechanism to ensure cross-scale stability, MST-GHF significantly enhances forecasting accuracy and robustness. Compared to baseline models, MST-GHF improves Test_R2 by up to 2.2% and reduces Test_MSE by 37.1%, demonstrating superior predictive performance. Its ability to deliver reliable short- and long-term GHG forecasts provides a strong data foundation for climate policy planning and environmental sustainability. Accurate and stable predictions enable policymakers to anticipate emission trends, refine carbon mitigation strategies, and proactively adjust environmental regulations. Additionally, MST-GHF facilitates scenario simulations, aiding in the evaluation of various climate interventions.
Beyond policy applications, MST-GHF contributes to sustainability science by advancing multi-scale forecasting methodologies that bridge the gap between short-term atmospheric variability and long-term climate trends. Its computational efficiency, achieved through optimized attention mechanisms, reduces resource consumption compared to traditional forecasting models, making it more accessible for deployment in resource-limited settings such as remote monitoring stations. This not only democratizes high-precision climate modeling but also minimizes the environmental footprint of data-intensive computations. Despite its reliance on high-quality continuous datasets, future work can extend the framework to include additional greenhouse gases (e.g., CH₄, NOₓ) and improve adaptability to incomplete data, further supporting global climate governance and sustainable development initiatives.

Author Contributions

H.W. contributed to conceptualization, data curation, software, methodology, visualization, writing—original draft preparation; Y.M. contributed to conceptualization, resources, writing—review and editing; J.R. contributed to conceptualization, data curation, and software; X.Z. contributed to resources and project administration; Z.Q. contributed to project administration and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the first author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. All test set data for 15 models.

Model | Test_MSE | Test_MAE | Test_R2 | Test_MAPE
MST-GHF | 0.0002878 | 0.0116377 | 0.9627339 | 0.0147310
DARNN | 0.0003716 | 0.0131211 | 0.9518857 | 0.0167605
Autoformer | 0.0004013 | 0.0136400 | 0.9480379 | 0.0174719
TCN | 0.0004571 | 0.0146228 | 0.9408147 | 0.0187982
BiTransformer_LSTM | 0.0005962 | 0.0184426 | 0.9227868 | 0.0231243
Informer | 0.0006874 | 0.0182407 | 0.9109833 | 0.0236269
LSTM | 0.0008636 | 0.0243913 | 0.8881612 | 0.0295327
Bi_GRU | 0.0012059 | 0.0257306 | 0.8438375 | 0.0333236
Bi_LSTM | 0.0012679 | 0.0260454 | 0.8358032 | 0.0338971
GRU | 0.0012723 | 0.0292467 | 0.8352157 | 0.0367302
RNN | 0.0013303 | 0.0306885 | 0.8276975 | 0.0373433
CNN1D | 0.0018290 | 0.0351233 | 0.7631485 | 0.0422021
Bi_RNN | 0.0025301 | 0.0413129 | 0.6723005 | 0.0485256
CNN1D_LSTM | 0.0028931 | 0.0447957 | 0.6253179 | 0.0527006
ANN | 0.0043728 | 0.0612601 | 0.4336226 | 0.0744804
Table A2. 10-step Test_R2 data of 15 models participating in the experiment.

Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 | Step 7 | Step 8 | Step 9 | Step 10
MST-GHF | 0.97878 | 0.97443 | 0.97074 | 0.96684 | 0.96564 | 0.96125 | 0.95805 | 0.95458 | 0.95005 | 0.94698
DARNN | 0.97102 | 0.96757 | 0.96023 | 0.95383 | 0.95822 | 0.94851 | 0.94829 | 0.94355 | 0.93422 | 0.93341
Autoformer | 0.96930 | 0.96559 | 0.95608 | 0.94813 | 0.95578 | 0.94324 | 0.94548 | 0.93990 | 0.92843 | 0.92844
TCN | 0.96514 | 0.96146 | 0.94898 | 0.93800 | 0.95049 | 0.93385 | 0.93957 | 0.93361 | 0.91780 | 0.91926
BiTransformer_LSTM | 0.94296 | 0.92876 | 0.93205 | 0.92333 | 0.91395 | 0.94672 | 0.90693 | 0.91436 | 0.91156 | 0.90723
Informer | 0.94596 | 0.94410 | 0.92072 | 0.89425 | 0.92664 | 0.89500 | 0.91480 | 0.90963 | 0.87846 | 0.88027
LSTM | 0.91190 | 0.90502 | 0.90029 | 0.88391 | 0.89488 | 0.90956 | 0.88113 | 0.85724 | 0.86588 | 0.87179
Bi_GRU | 0.88297 | 0.83717 | 0.90740 | 0.88788 | 0.84569 | 0.76714 | 0.91339 | 0.78506 | 0.74577 | 0.86590
Bi_LSTM | 0.85418 | 0.92533 | 0.88067 | 0.78546 | 0.82317 | 0.80695 | 0.76901 | 0.85771 | 0.80601 | 0.84954
GRU | 0.85939 | 0.84737 | 0.86361 | 0.88095 | 0.79092 | 0.82703 | 0.83580 | 0.79108 | 0.85289 | 0.80312
RNN | 0.85730 | 0.84250 | 0.84315 | 0.82455 | 0.82357 | 0.82105 | 0.80826 | 0.81793 | 0.82042 | 0.81823
CNN1D | 0.79477 | 0.79974 | 0.86495 | 0.76666 | 0.71630 | 0.78686 | 0.73372 | 0.79676 | 0.71642 | 0.65530
Bi_RNN | 0.72101 | 0.70851 | 0.72330 | 0.66788 | 0.61899 | 0.65601 | 0.62097 | 0.69373 | 0.66151 | 0.65108
CNN1D_LSTM | 0.70959 | 0.63854 | 0.65612 | 0.62404 | 0.70758 | 0.59694 | 0.67335 | 0.53487 | 0.60110 | 0.51105
ANN | 0.51734 | 0.44780 | 0.37191 | 0.43568 | 0.46613 | 0.43010 | 0.44904 | 0.45915 | 0.47785 | 0.28123
Table A3. 10-step Test_MSE data of 15 models participating in the experiment.

Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 | Step 7 | Step 8 | Step 9 | Step 10
MST-GHF | 0.000163 | 0.000197 | 0.000226 | 0.000256 | 0.000265 | 0.000299 | 0.000324 | 0.000351 | 0.000386 | 0.000410
DARNN | 0.000223 | 0.000250 | 0.000307 | 0.000356 | 0.000322 | 0.000398 | 0.000400 | 0.000436 | 0.000509 | 0.000515
Autoformer | 0.000236 | 0.000265 | 0.000339 | 0.000400 | 0.000341 | 0.000438 | 0.000421 | 0.000465 | 0.000553 | 0.000554
TCN | 0.000268 | 0.000297 | 0.000393 | 0.000478 | 0.000382 | 0.000511 | 0.000467 | 0.000513 | 0.000636 | 0.000625
BiTransformer_LSTM | 0.000439 | 0.000549 | 0.000524 | 0.000591 | 0.000664 | 0.000411 | 0.000719 | 0.000662 | 0.000684 | 0.000718
Informer | 0.000416 | 0.000431 | 0.000611 | 0.000816 | 0.000566 | 0.000811 | 0.000658 | 0.000699 | 0.000940 | 0.000926
LSTM | 0.000678 | 0.000732 | 0.000769 | 0.000896 | 0.000811 | 0.000698 | 0.000919 | 0.001104 | 0.001037 | 0.000992
Bi_GRU | 0.000901 | 0.001255 | 0.000714 | 0.000865 | 0.001191 | 0.001798 | 0.000669 | 0.001662 | 0.001966 | 0.001037
Bi_LSTM | 0.001123 | 0.000575 | 0.000920 | 0.001655 | 0.001365 | 0.001491 | 0.001785 | 0.001100 | 0.001500 | 0.001164
GRU | 0.001083 | 0.001176 | 0.001052 | 0.000918 | 0.001614 | 0.001336 | 0.001269 | 0.001615 | 0.001138 | 0.001523
RNN | 0.001099 | 0.001214 | 0.001209 | 0.001354 | 0.001362 | 0.001382 | 0.001482 | 0.001407 | 0.001389 | 0.001406
CNN1D | 0.001581 | 0.001543 | 0.001041 | 0.001800 | 0.002190 | 0.001646 | 0.002058 | 0.001571 | 0.002193 | 0.002667
Bi_RNN | 0.002149 | 0.002246 | 0.002133 | 0.002562 | 0.002941 | 0.002657 | 0.002929 | 0.002368 | 0.002618 | 0.002699
CNN1D_LSTM | 0.002237 | 0.002785 | 0.002651 | 0.002900 | 0.002257 | 0.003113 | 0.002524 | 0.003596 | 0.003085 | 0.003783
ANN | 0.003717 | 0.004255 | 0.004843 | 0.004354 | 0.004121 | 0.004401 | 0.004257 | 0.004181 | 0.004038 | 0.005561
Table A4. 10-step Test_MAE data of 15 models participating in the experiment.

Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 | Step 7 | Step 8 | Step 9 | Step 10
MST-GHF | 0.008435 | 0.009374 | 0.010210 | 0.010994 | 0.011186 | 0.012021 | 0.012577 | 0.013206 | 0.013918 | 0.014457
DARNN | 0.010061 | 0.010630 | 0.011907 | 0.012869 | 0.012302 | 0.013680 | 0.013829 | 0.014469 | 0.015657 | 0.015807
Autoformer | 0.010274 | 0.010891 | 0.012558 | 0.013691 | 0.012639 | 0.014409 | 0.014199 | 0.014928 | 0.016397 | 0.016415
TCN | 0.010958 | 0.011553 | 0.013643 | 0.015087 | 0.013372 | 0.015661 | 0.014940 | 0.015786 | 0.017707 | 0.017521
Informer | 0.013944 | 0.014224 | 0.017385 | 0.020133 | 0.016555 | 0.020105 | 0.018042 | 0.018612 | 0.021806 | 0.021601
BiTransformer_LSTM | 0.016098 | 0.018339 | 0.017876 | 0.018638 | 0.020110 | 0.013880 | 0.020849 | 0.019472 | 0.019489 | 0.019675
LSTM | 0.021328 | 0.022347 | 0.022861 | 0.024917 | 0.023618 | 0.021431 | 0.025366 | 0.028250 | 0.027284 | 0.026510
Bi_GRU | 0.022423 | 0.026983 | 0.021235 | 0.022717 | 0.026333 | 0.032263 | 0.019261 | 0.030710 | 0.032259 | 0.023122
Bi_LSTM | 0.024449 | 0.017924 | 0.022438 | 0.029404 | 0.027164 | 0.028833 | 0.031595 | 0.024125 | 0.027837 | 0.026685
GRU | 0.027698 | 0.027475 | 0.026573 | 0.025132 | 0.033664 | 0.030206 | 0.029865 | 0.032262 | 0.028023 | 0.031569
RNN | 0.027605 | 0.029178 | 0.029061 | 0.030809 | 0.031127 | 0.031362 | 0.032624 | 0.031747 | 0.031580 | 0.031791
CNN1D | 0.032972 | 0.032569 | 0.026367 | 0.035236 | 0.038554 | 0.033392 | 0.037443 | 0.033172 | 0.038725 | 0.042803
Bi_RNN | 0.037570 | 0.038244 | 0.037586 | 0.041467 | 0.044881 | 0.042510 | 0.044928 | 0.040119 | 0.042524 | 0.043299
CNN1D_LSTM | 0.038234 | 0.043631 | 0.042643 | 0.044788 | 0.038907 | 0.046889 | 0.041832 | 0.051089 | 0.047142 | 0.052802
ANN | 0.057103 | 0.060965 | 0.065537 | 0.061544 | 0.059535 | 0.061573 | 0.060271 | 0.059627 | 0.057873 | 0.068573
Table A5. 10-step Test_MAPE data of 15 models participating in the experiment.

Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 | Step 7 | Step 8 | Step 9 | Step 10
MST-GHF | 0.010664 | 0.011856 | 0.012908 | 0.013898 | 0.014153 | 0.015210 | 0.015928 | 0.016728 | 0.017639 | 0.018326
DARNN | 0.012742 | 0.013492 | 0.015195 | 0.016444 | 0.015651 | 0.017497 | 0.017647 | 0.018523 | 0.020128 | 0.020286
Autoformer | 0.013061 | 0.013875 | 0.016077 | 0.017557 | 0.016118 | 0.018489 | 0.018148 | 0.019151 | 0.021127 | 0.021117
TCN | 0.013990 | 0.014770 | 0.017531 | 0.019441 | 0.017128 | 0.020177 | 0.019164 | 0.020293 | 0.022875 | 0.022614
BiTransformer_LSTM | 0.019778 | 0.022580 | 0.022051 | 0.023331 | 0.025092 | 0.017770 | 0.026130 | 0.024649 | 0.024767 | 0.025097
Informer | 0.017983 | 0.018345 | 0.022511 | 0.026143 | 0.021399 | 0.026094 | 0.023321 | 0.024085 | 0.028329 | 0.028060
LSTM | 0.025587 | 0.026832 | 0.027559 | 0.030017 | 0.028598 | 0.026056 | 0.030931 | 0.034351 | 0.033116 | 0.032278
Bi_GRU | 0.028922 | 0.034812 | 0.027270 | 0.028817 | 0.033980 | 0.042075 | 0.024814 | 0.040054 | 0.042387 | 0.030106
Bi_LSTM | 0.032075 | 0.022765 | 0.029086 | 0.038758 | 0.035504 | 0.037625 | 0.041396 | 0.031411 | 0.036513 | 0.033838
GRU | 0.034505 | 0.034609 | 0.033364 | 0.031164 | 0.042297 | 0.037924 | 0.037472 | 0.041016 | 0.035105 | 0.039846
RNN | 0.033020 | 0.035437 | 0.035304 | 0.037536 | 0.037904 | 0.038326 | 0.039871 | 0.038640 | 0.038453 | 0.038941
CNN1D | 0.039474 | 0.038810 | 0.032060 | 0.042228 | 0.046183 | 0.040666 | 0.045155 | 0.039777 | 0.046488 | 0.051180
Bi_RNN | 0.043991 | 0.044804 | 0.044306 | 0.048538 | 0.052553 | 0.049851 | 0.052712 | 0.047334 | 0.050156 | 0.051010
CNN1D_LSTM | 0.044628 | 0.051063 | 0.050001 | 0.052544 | 0.045748 | 0.055173 | 0.049362 | 0.060271 | 0.055763 | 0.062453
ANN | 0.069636 | 0.074062 | 0.079720 | 0.074864 | 0.072489 | 0.074870 | 0.073280 | 0.072609 | 0.070289 | 0.082984
Table A6. Results of Ablation Experiment.

Model | Test_MSE | Test_MAE | Test_R2 | Test_MAPE
MST-GHF | 0.000288 | 0.011638 | 0.962734 | 0.014731
Without Input-Attn | 0.000645 | 0.019629 | 0.916457 | 0.024515
Without Autoformer Encoder | 0.000401 | 0.013640 | 0.948038 | 0.017472
Without Temporal-Attn | 0.000713 | 0.020487 | 0.907653 | 0.025760
Table A7. 10-step Test_R2 data of 4 models participating in the Ablation Experiment.

Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 | Step 7 | Step 8 | Step 9 | Step 10
MST-GHF | 0.9788 | 0.9744 | 0.9707 | 0.9668 | 0.9656 | 0.9613 | 0.9581 | 0.9546 | 0.9501 | 0.9470
Without Input-Attn | 0.9396 | 0.9351 | 0.9270 | 0.9194 | 0.9242 | 0.9217 | 0.9027 | 0.8935 | 0.9017 | 0.8997
Without Autoformer Encoder | 0.9533 | 0.9528 | 0.9592 | 0.9564 | 0.9267 | 0.9332 | 0.9383 | 0.9351 | 0.9394 | 0.9092
Without Temporal-Attn | 0.9405 | 0.9226 | 0.9168 | 0.9143 | 0.9159 | 0.9088 | 0.8903 | 0.8877 | 0.8913 | 0.8883
Table A8. 10-step Test_MSE data of 4 models participating in the Ablation Experiment.

Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 | Step 7 | Step 8 | Step 9 | Step 10
MST-GHF | 0.000163 | 0.000197 | 0.000226 | 0.000256 | 0.000265 | 0.000299 | 0.000324 | 0.000351 | 0.000386 | 0.000410
Without Input-Attn | 0.000465 | 0.000500 | 0.000563 | 0.000622 | 0.000585 | 0.000605 | 0.000752 | 0.000824 | 0.000760 | 0.000776
Without Autoformer Encoder | 0.000360 | 0.000363 | 0.000314 | 0.000336 | 0.000566 | 0.000516 | 0.000477 | 0.000501 | 0.000468 | 0.000702
Without Temporal-Attn | 0.000458 | 0.000597 | 0.000641 | 0.000662 | 0.000649 | 0.000704 | 0.000847 | 0.000868 | 0.000841 | 0.000864
Table A9. 10-step Test_MAE data of 4 models participating in the Ablation Experiment.

Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 | Step 7 | Step 8 | Step 9 | Step 10
MST-GHF | 0.008435 | 0.009374 | 0.010210 | 0.010994 | 0.011186 | 0.012021 | 0.012577 | 0.013206 | 0.013918 | 0.014457
Without Input-Attn | 0.014397 | 0.014028 | 0.012550 | 0.012403 | 0.017957 | 0.017335 | 0.015426 | 0.015473 | 0.014923 | 0.019603
Without Autoformer Encoder | 0.016912 | 0.017078 | 0.018338 | 0.019633 | 0.018462 | 0.018914 | 0.021147 | 0.022735 | 0.021612 | 0.021463
Without Temporal-Attn | 0.016471 | 0.018758 | 0.019669 | 0.020032 | 0.019470 | 0.020607 | 0.022347 | 0.022622 | 0.022380 | 0.022511
Table A10. 10-step Test_MAPE data of 4 models participating in the Ablation Experiment.

Model | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 | Step 7 | Step 8 | Step 9 | Step 10
MST-GHF | 0.010664 | 0.011856 | 0.012908 | 0.013898 | 0.014153 | 0.015210 | 0.015928 | 0.016728 | 0.017639 | 0.018326
Without Input-Attn | 0.020704 | 0.021273 | 0.022807 | 0.024334 | 0.023051 | 0.023641 | 0.026623 | 0.028500 | 0.027168 | 0.027045
Without Autoformer Encoder | 0.017960 | 0.017658 | 0.015864 | 0.015856 | 0.022439 | 0.021565 | 0.019613 | 0.019965 | 0.019018 | 0.024743
Without Temporal-Attn | 0.020429 | 0.023467 | 0.024578 | 0.025037 | 0.024404 | 0.025819 | 0.028282 | 0.028697 | 0.028360 | 0.028526

References

  1. Barnett, J. Security and Climate Change. Glob. Environ. Change 2003, 13, 7–17. [Google Scholar] [CrossRef]
  2. Adedeji, O.; Reuben, O.; Olatoye, O. Global Climate Change. J. Geosci. Environ. Prot. 2014, 2, 114–122. [Google Scholar] [CrossRef]
  3. Jeffry, L.; Ong, M.Y.; Nomanbhay, S.; Mofijur, M.; Mubashir, M.; Show, P.L. Greenhouse Gases Utilization: A Review. Fuel 2021, 301, 121017. [Google Scholar] [CrossRef]
  4. El Kenawy, A.M.; Al-Awadhi, T.; Abdullah, M.; Jawarneh, R.; Abulibdeh, A. A Preliminary Assessment of Global CO2: Spatial Patterns, Temporal Trends, and Policy Implications. Glob. Chall. 2023, 7, 2300184. [Google Scholar] [CrossRef]
  5. Le, P.V.V.; Randerson, J.T.; Willett, R.; Wright, S.; Smyth, P.; Guilloteau, C.; Mamalakis, A.; Foufoula-Georgiou, E. Climate-Driven Changes in the Predictability of Seasonal Precipitation. Nat. Commun. 2023, 14, 3822. [Google Scholar] [CrossRef] [PubMed]
  6. Singh, S. Forest Fire Emissions: A Contribution to Global Climate Change. Front. For. Glob. Change 2022, 5, 925840. [Google Scholar] [CrossRef]
  7. Ledley, T.S.; Sundquist, E.T.; Schwartz, S.E.; Hall, D.K.; Fellows, J.D.; Killeen, T.L. Climate Change and Greenhouse Gases. Eos Trans. Am. Geophys. Union 1999, 80, 453–458. [Google Scholar] [CrossRef]
  8. Li, X.; Jiang, S.; Wang, X.; Wang, T.; Zhang, S.; Guo, J.; Jiao, D. XCO2 Super-Resolution Reconstruction Based on Spatial Extreme Random Trees. Atmosphere 2024, 15, 440. [Google Scholar] [CrossRef]
  9. Marvin, D.C.; Sleeter, B.M.; Cameron, D.R.; Nelson, E.; Plantinga, A.J. Natural Climate Solutions Provide Robust Carbon Mitigation Capacity under Future Climate Change Scenarios. Sci. Rep. 2023, 13, 19008. [Google Scholar] [CrossRef]
  10. Li, Q.; Li, Q.; Wu, J.; Li, X.; Li, H.; Cheng, Y. Wellhead Stability During Development Process of Hydrate Reservoir in the Northern South China Sea: Evolution and Mechanism. Processes 2025, 13, 40. [Google Scholar] [CrossRef]
  11. Li, Q.; Li, Q.; Cao, H.; Wu, J.; Wang, F.; Wang, Y. The Crack Propagation Behaviour of CO2 Fracturing Fluid in Unconventional Low Permeability Reservoirs: Factor Analysis and Mechanism Revelation. Processes 2025, 13, 159. [Google Scholar] [CrossRef]
  12. Selin, H.; VanDeveer, S.D. Political Science and Prediction: What’s Next for U.S. Climate Change Policy? Rev. Policy Research 2007, 24, 1–27. [Google Scholar] [CrossRef]
  13. Patnaik, S. A Cross-Country Study of Collective Political Strategy: Greenhouse Gas Regulations in the European Union. J. Int. Bus. Stud. 2019, 50, 1130–1155. [Google Scholar] [CrossRef]
  14. Wang, M.; Hu, Z.; Wang, X.; Li, X.; Wang, Y.; Liu, H.; Han, C.; Cai, J.; Zhao, W. Spatio-Temporal Variation of Carbon Sources and Sinks in the Loess Plateau under Different Climatic Conditions and Land Use Types. Forests 2023, 14, 1640. [Google Scholar] [CrossRef]
  15. Wen, H.; Li, Y.; Li, Z.; Cai, X.; Wang, F. Spatial Differentiation of Carbon Budgets and Carbon Balance Zoning in China Based on the Land Use Perspective. Sustainability 2022, 14, 12962. [Google Scholar] [CrossRef]
  16. Maberly, S.C.; Stott, A.W.; Gontero, B. The Differential Ability of Two Species of Seagrass to Use Carbon Dioxide and Bicarbonate and Their Modelled Response to Rising Concentrations of Inorganic Carbon. Front. Plant Sci. 2022, 13, 936716. [Google Scholar] [CrossRef]
  17. Mauclet, E.; Villani, M.; Monhonval, A.; Hirst, C.; Schuur, E.A.G.; Opfergelt, S. Quantifying Exchangeable Base Cations in Permafrost: A Reserve of Nutrients about to Thaw. Earth Syst. Sci. Data 2023, 15, 3891–3904. [Google Scholar] [CrossRef]
  18. van den Bergh, J.; van Beers, C.; King, L.C. Prioritize Carbon Pricing over Fossil-Fuel Subsidy Reform. iScience 2024, 27, 108584. [Google Scholar] [CrossRef]
  19. Wei, X.; Xu, Y. Research on Carbon Emission Prediction and Economic Policy Based on TCN-LSTM Combined with Attention Mechanism. Front. Ecol. Evol. 2023, 11, 1270248. [Google Scholar] [CrossRef]
  20. Nagarajan, G.; Babu, L.D.D. Predictive Analytics On Big Data—An Overview. Informatica 2019, 43, 4. [Google Scholar] [CrossRef]
  21. Smith, D.M.; Cusack, S.; Colman, A.W.; Folland, C.K.; Harris, G.R.; Murphy, J.M. Improved Surface Temperature Prediction for the Coming Decade from a Global Climate Model. Science 2007, 317, 796–799. [Google Scholar] [CrossRef] [PubMed]
  22. Gualdi, S.; Navarra, A.; Guilyardi, E.; Delecluse, P. Assessment of the Tropical Indo-Pacific Climate in the SINTEX CGCM. Ann. Geophys. 2003, 46, 1–26. [Google Scholar] [CrossRef]
  23. Eiselt, K.-U.; Graversen, R.; Fredriksen, H.-B. Time dependence of climate sensitivity. In Proceedings of the EGU General Assembly 2020, EGU2020-12950. Online, 4–8 May 2020. [Google Scholar] [CrossRef]
  24. Bonsor, H.; MacDonald, A.; Calow, R. Potential Impact of Climate Change on Improved and Unimproved Water Supplies in Africa. RSC Issues Environ Sci. Technol 2010, 31, 25–50. [Google Scholar] [CrossRef]
  25. Sun, H.; Liang, L.; Wang, C.; Wu, Y.; Yang, F.; Rong, M. Prediction of the Electrical Strength and Boiling Temperature of the Substitutes for Greenhouse Gas SF6 Using Neural Network and Random Forest. IEEE Access 2020, 8, 124204–124216. [Google Scholar] [CrossRef]
  26. Cai, W.; Wei, R.; Xu, L.; Ding, X. A Method for Modelling Greenhouse Temperature Using Gradient Boost Decision Tree. Inf. Process. Agric. 2022, 9, 343–354. [Google Scholar] [CrossRef]
27. Zhang, J.; Zhang, H.; Wang, R.; Zhang, M.; Huang, Y.; Hu, J.; Peng, J. Measuring the Critical Influence Factors for Predicting Carbon Dioxide Emissions of Expanding Megacities by XGBoost. Atmosphere 2022, 13, 599. [Google Scholar] [CrossRef]
  28. Rasp, S.; Pritchard, M.S.; Gentine, P. Deep Learning to Represent Subgrid Processes in Climate Models. Proc. Natl. Acad. Sci. USA 2018, 115, 9684–9689. [Google Scholar] [CrossRef]
  29. Isphording, R.N.; Alexander, L.V.; Bador, M.; Green, D.; Evans, J.P.; Wales, S. A Standardized Benchmarking Framework to Assess Downscaled Precipitation Simulations. J. Climate 2024, 37, 1089–1110. [Google Scholar] [CrossRef]
  30. Fabbri, S.; Hauschild, M.Z.; Lenton, T.M.; Owsianiak, M. Multiple Climate Tipping Points Metrics for Improved Sustainability Assessment of Products and Services. Environ. Sci. Technol. 2021, 55, 2800–2810. [Google Scholar] [CrossRef]
  31. Ye, S. Analysis of Influencing Factors of Carbon Emissions from China’s Marine Fishery Energy Consumption under Different Development Scenarios. Front. Mar. Sci. 2024, 11, 1377215. [Google Scholar] [CrossRef]
  32. Tudor, C.; Sova, R. Benchmarking GHG Emissions Forecasting Models for Global Climate Policy. Electronics 2021, 10, 3149. [Google Scholar] [CrossRef]
33. Sak, H.; Senior, A.; Beaufays, F. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling. In Proceedings of the Interspeech 2014, Singapore, 14–18 September 2014; ISCA: Pune, India, 2014; pp. 338–342. [Google Scholar]
  34. Zhang, L.; Cai, Y.; Huang, H.; Li, A.; Yang, L.; Zhou, C. A CNN-LSTM Model for Soil Organic Carbon Content Prediction with Long Time Series of MODIS-Based Phenological Variables. Remote Sens. 2022, 14, 4441. [Google Scholar] [CrossRef]
  35. Panja, M.; Chakraborty, T.; Biswas, A.; Deb, S. E-STGCN: Extreme Spatiotemporal Graph Convolutional Networks for Air Quality Forecasting. arXiv 2024, arXiv:2411.12258. [Google Scholar] [CrossRef]
  36. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: New York, NY, USA, 2017; Volume 30. [Google Scholar]
37. Wu, X.; Yuan, Q.; Zhou, C.; Chen, X.; Xuan, D.; Song, J. Carbon Emissions Forecasting Based on Temporal Graph Transformer-Based Attentional Neural Network. J. Comput. Methods Sci. Eng. 2024, 24, 1405–1421. [Google Scholar] [CrossRef]
  38. Li, S.; Yuan, G.; Chen, J.; Tan, C.; Zhou, H. Self-Supervised Learning for Solar Radio Spectrum Classification. Universe 2022, 8, 656. [Google Scholar] [CrossRef]
  39. Kim, S.-J.; Chung, Y.-J. Multi-Scale Features for Transformer Model to Improve the Performance of Sound Event Detection. Appl. Sci. 2022, 12, 2626. [Google Scholar] [CrossRef]
  40. Hewage, P.; Behera, A.; Trovati, M.; Pereira, E.; Ghahremani, M.; Palmieri, F.; Liu, Y. Temporal Convolutional Neural (TCN) Network for an Effective Weather Forecasting Using Time-Series Data from the Local Weather Station. Soft Comput. 2020, 24, 16453–16482. [Google Scholar] [CrossRef]
  41. Qin, Y.; Song, D.; Chen, H.; Cheng, W.; Jiang, G.; Cottrell, G. A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 2627–2633. [Google Scholar] [CrossRef]
  42. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–14 December 2021; Curran Associates, Inc.: New York, NY, USA, 2021; Volume 34, pp. 22419–22430. Available online: https://proceedings.neurips.cc/paper/2021/hash/bcc0d400288793e8bdcd7c19a8ac0c2b-Abstract.html (accessed on 14 November 2024).
  43. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115. [Google Scholar] [CrossRef]
  44. Thoning, K.W.; Crotwell, A.M.; Mund, J.W. Atmospheric Carbon Dioxide Dry Air Mole Fractions from Continuous Measurements at Mauna Loa, Hawaii, Barrow, Alaska, American Samoa and South Pole, 1973-Present; Version 2024-08-15, National Oceanic and Atmospheric Administration (NOAA); Global Monitoring Laboratory (GML): Boulder, CO, USA, 2024. [Google Scholar] [CrossRef]
  45. Gu, Z.; Zhu, X.; Ye, H.; Zhang, L.; Wang, J.; Zhu, Y.; Jiang, S.; Xiong, Z.; Li, Z.; Wu, W.; et al. Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation. Proc. AAAI Conf. Artif. Intell. 2024, 38, 18099–18107. [Google Scholar] [CrossRef]
  46. Zhou, Y.; Dong, Z.; Bao, X. A Ship Trajectory Prediction Method Based on an Optuna–BILSTM Model. Appl. Sci. 2024, 14, 3719. [Google Scholar] [CrossRef]
  47. Farnsworth, A.; Lo, Y.T.E.; Valdes, P.J.; Buzan, J.R.; Mills, B.J.W.; Merdith, A.S.; Scotese, C.R.; Wakeford, H.R. Climate Extremes Likely to Drive Land Mammal Extinction during next Supercontinent Assembly. Nat. Geosci. 2023, 16, 901–908. [Google Scholar] [CrossRef]
  48. Rahman, M.M.; Shafiullah, M.; Alam, M.S.; Rahman, M.S.; Alsanad, M.A.; Islam, M.M.; Islam, M.K.; Rahman, S.M. Decision Tree-Based Ensemble Model for Predicting National Greenhouse Gas Emissions in Saudi Arabia. Appl. Sci. 2023, 13, 3832. [Google Scholar] [CrossRef]
49. Kumari, S.; Singh, S.K. Machine Learning-Based Time Series Models for Effective CO2 Emission Prediction in India. Environ. Sci. Pollut. Res. 2023, 30, 116601–116616. [Google Scholar] [CrossRef]
  50. McNorton, J.; Bousserez, N.; Agustí-Panareda, A.; Balsamo, G.; Cantarello, L.; Engelen, R.; Huijnen, V.; Inness, A.; Kipling, Z.; Parrington, M.; et al. Quantification of Methane Emissions from Hotspots and during COVID-19 Using a Global Atmospheric Inversion. Atmos. Chem. Phys. 2022, 22, 5961–5981. [Google Scholar] [CrossRef]
  51. Ou, Y.; Roney, C.; Alsalam, J.; Calvin, K.; Creason, J.; Edmonds, J.; Fawcett, A.A.; Kyle, P.; Narayan, K.; O’Rourke, P.; et al. Deep Mitigation of CO2 and Non-CO2 Greenhouse Gases toward 1.5 °C and 2 °C Futures. Nat. Commun. 2021, 12, 6245. [Google Scholar] [CrossRef]
Figure 1. MST-GHF encoding and decoding mechanism.
Figure 2. Ten-step MST-GHF predicted values and true values on the dataset.
Figure 3. The predicted values and true values of the nine best models.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
