Article

STFM: Accurate Spatio-Temporal Fusion Model for Weather Forecasting

Jun Liu, Li Wu, Tao Zhang, Jianqiang Huang, Xiaoying Wang and Fang Tian
1 State Key Laboratory of Plateau Ecology and Agriculture, Department of Computer Technology and Applications, Qinghai University, Xining 810016, China
2 Brookhaven National Laboratory, Upton, NY 11973, USA
* Author to whom correspondence should be addressed.
Atmosphere 2024, 15(10), 1176; https://doi.org/10.3390/atmos15101176
Submission received: 25 August 2024 / Revised: 15 September 2024 / Accepted: 29 September 2024 / Published: 30 September 2024
(This article belongs to the Section Meteorology)

Abstract

Meteorological prediction is crucial for various sectors, including agriculture, navigation, daily life, disaster prevention, and scientific research. However, traditional numerical weather prediction (NWP) models are constrained by their high computational resource requirements, while the accuracy of deep learning models remains suboptimal. In response to these challenges, we propose a novel deep learning-based model, the Spatiotemporal Fusion Model (STFM), designed to enhance the accuracy of meteorological predictions. Our model leverages Fifth-Generation ECMWF Reanalysis (ERA5) data and introduces two key components: a spatiotemporal encoder module and a spatiotemporal fusion module. The spatiotemporal encoder integrates the strengths of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), effectively capturing both spatial and temporal dependencies. Meanwhile, the spatiotemporal fusion module employs a dual attention mechanism, decomposing spatial attention into global static attention and channel dynamic attention. This approach ensures comprehensive extraction of spatial features from meteorological data. The combination of these modules significantly improves prediction performance. Experimental results demonstrate that STFM excels in extracting spatiotemporal features from reanalysis data, yielding predictions that closely align with observed values. In comparative studies, STFM outperformed other models, achieving a 7% improvement in surface and high-altitude temperature predictions, a 5% improvement in the prediction of the u/v components of 10 m wind speed, and increases in the accuracy of geopotential height and relative humidity predictions of 3% and 1%, respectively. This enhanced performance highlights STFM’s potential to advance the accuracy and reliability of meteorological forecasting.

1. Introduction

Meteorological forecasting [1] is a vital and long-standing discipline aimed at accurately predicting future weather changes. It helps researchers and decision-makers make informed decisions that reduce the economic losses and casualties caused by natural disasters. In the past, meteorologists often predicted weather conditions by observing meteorological changes and summarizing relevant patterns, relying heavily on personal experience and knowledge. With the advancement of technology, modern weather forecasting mainly uses numerical weather prediction (NWP) models [2,3,4] and machine learning models [5,6,7], which reveal meteorological patterns from data and make accurate predictions, and which have achieved remarkable results in agriculture, transportation, aerospace, and other fields.
Numerical weather prediction models are based on atmospheric dynamics theory and numerical calculations [8], which can analyze the atmospheric system [9] and provide rich forecast results of meteorological parameters, such as surface temperature [10], humidity, and precipitation. However, NWP models have high requirements for the accuracy of initial and boundary conditions and may not perform well in special situations such as complex terrain and oceans. Additionally, they require substantial computing resources and observational data support, making them computationally expensive. Deep learning models, on the other hand, are machine learning methods based on artificial neural networks. By training on large amounts of historical meteorological data, they can extract features from the data to achieve prediction. Deep learning models possess strong data processing and feature extraction capabilities [11], and have certain adaptive capabilities, allowing them to adapt to different meteorological phenomena and regional characteristics.
Since weather forecasting tasks are similar to spatiotemporal sequence prediction tasks [12,13,14], most existing research approaches use deep learning methods to perform weather forecasting [15,16]. For example, ConvLSTM [17], proposed by Shi et al., can be applied to radar echo extrapolation maps [18] for precipitation forecasting [19]; E3D-LSTM [20], proposed by Wang et al., has shown promising results in video prediction [21] and action recognition [22]; and recent advanced spatiotemporal prediction models, such as UniFormer [23], PredRNN-v2 [24], and SimVP-v2 [25], have performed well on spatiotemporal prediction datasets such as the traffic prediction dataset (TaxiBJ) [26], the human behavior recognition dataset (KTH) [27], and the moving digit prediction dataset (Moving-MNIST) [28]. However, when we applied these spatiotemporal prediction models to weather forecasting, we found that they performed poorly on common meteorological variables (such as temperature, wind speed, and relative humidity) or performed well on only a few of them. Investigating this issue, we found that general spatiotemporal sequence data (such as Moving-MNIST and KTH) are usually two-dimensional image sequences: although they exhibit spatiotemporal changes, they contain only simple features such as pixel values or color channels, and their dynamics are relatively simple. In contrast, meteorological data usually include variables in multiple dimensions, such as temperature, pressure, humidity, and wind speed, which are highly variable over time and involve complex physical interactions. Therefore, considering the multidimensionality and complexity of meteorological data, we should pay further attention to its internal spatiotemporal characteristics.
To address the above issues, we propose a novel spatiotemporal fusion meteorological forecasting model. In the general framework of spatiotemporal prediction models, the encoder module predominantly employs convolutional neural networks (CNN) [29] to extract primary spatial features of data, as seen in models like ConvLSTM and its enhanced variants. However, these models are not well-suited for capturing the complexities of meteorological data, which exhibit multi-dimensional and highly correlated temporal patterns. Therefore, we incorporate temporal feature learning methods into the encoder to construct a new spatiotemporal encoding module that better captures the long-term dependencies and time series patterns in meteorological data. Additionally, to further exploit the high-dimensional spatiotemporal features of meteorological data, we propose a novel spatiotemporal attention module. This module enhances the model’s ability to focus on data correlations and dependencies across different times and locations, enabling the model to better adapt to various meteorological variables and improve its generalization ability.
In general, the main contributions of our work include the following: (1) Proposing a new model: We developed an innovative spatiotemporal fusion meteorological forecasting model that outperforms existing spatiotemporal forecasting models. This model excels at accurately predicting common meteorological variables within the fifth-generation ECMWF reanalysis (ERA5) dataset, including temperature, the u/v components of 10 m wind speed, and geopotential height. (2) Designing a novel spatiotemporal encoder: Our approach involves the creation of an advanced spatiotemporal encoder that effectively captures the spatial distribution and structure within meteorological data via a specialized spatial feature extractor. Additionally, it encodes the temporal dynamics of the data through a dedicated time series processing module. This integration of spatial and temporal information results in a high-dimensional feature representation, thereby significantly enhancing the accuracy of model predictions. (3) Designing an effective spatiotemporal fusion module: We developed a highly efficient spatiotemporal fusion module. For spatial attention, we take into account not only the significant spatial features at each time step but also the spatial attention weights between different time steps, integrating these to capture spatial variations across the entire time axis. The data processed by this spatial attention mechanism are then fed into a time series convolution, creating a seamless blend of spatial attention and temporal pattern learning. This enables the model to extract spatiotemporal features with greater precision.

2. Related Work

2.1. Spatiotemporal Weather Prediction

Using deep learning methods for weather forecasting involves applying deep learning technology to analyze atmospheric data and predict future meteorological elements, including temperature, humidity, and wind speed. Due to the dynamic changes in the atmosphere and climate system, meteorological data often comprise complex time series data and spatial information. Employing spatiotemporal prediction methods is an effective way to capture these spatiotemporal characteristics.
Spatiotemporal prediction learning can be seen as an extension and development of time series prediction learning. After the recurrent neural network (RNN) [30] was proposed, neural networks began to process sequence data. Hochreiter and Schmidhuber developed the LSTM [31] network by introducing a gating mechanism into the RNN, enabling the network to better capture long-term dependencies. Shi et al. proposed the ConvLSTM model, which effectively integrated temporal and spatial information by combining the structure of the long short-term memory network (LSTM) with the convolutional neural network (CNN), demonstrating its effectiveness on spatiotemporal sequence data such as radar echo extrapolation images. PredRNN [32] introduced a new spatiotemporal memory (ST-LSTM) unit capable of modeling both temporal and spatial information. PredRNN++ [33] employed an LSTM with a cascaded dual memory structure (Causal LSTM) to address the gradient vanishing problem caused by recursive depth in PredRNN, effectively utilizing long-term information. E3D-LSTM employed 3D convolution to enhance LSTM’s ability to process three-dimensional data, adding additional memory units to store context information, thereby strengthening long-term memory and improving prediction accuracy. MIM [34] introduced two cascaded recursive modules to handle the non-stationary and steady-state components of spatiotemporal dynamics and improved the forget gate structure in LSTM to enhance the learning of non-stationary features. MAU [35] proposed a prediction unit that uses spatial information to supervise temporal information. PredRNN-v2 improved the spatiotemporal LSTM unit (ST-LSTM) in PredRNN, allowing the cell memory state to pass through the recurrent unit in both vertical and horizontal directions. While these models have made significant efforts to extract spatiotemporal features and capture spatiotemporal dependencies, their prediction results on meteorological variables are still not satisfactory due to the inherent complexity of meteorological data. Therefore, we propose a spatiotemporal fusion model based on the attention mechanism [36] to more effectively extract spatiotemporal features of meteorological data, aiming to achieve more accurate meteorological forecasts.

2.2. Spatiotemporal Dependency Modeling

Spatiotemporal dependency modeling aims to effectively integrate information in the spatial and temporal dimensions [37] to improve the overall performance of the model. In traditional methods, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are the main approaches for spatial feature extraction and temporal dependency modeling. The benefit of combining these two methods lies in their complementary strengths: CNNs excel at extracting spatial features from data, such as local patterns in images or grid data, while RNNs are adept at processing sequential data and capturing temporal dependencies. By organically integrating these two methods, a single model can effectively handle complex spatial and temporal dependencies, thereby significantly improving the accuracy and robustness of predictions. For example, Li et al. used a diffusion convolutional recurrent neural network (DCRNN) [38], which adjusts the convolution kernel size through diffusion convolution to capture spatial features at different scales and employs an encoder–decoder structure with a scheduled sampling strategy to capture temporal dependencies. Yao et al. proposed a flow gating mechanism (FGM) to learn dynamic similarity in local regions and designed a periodic shifted attention mechanism (PSAM) [39] to handle the periodic constraints of time series data. Furthermore, the spatiotemporal attention mechanism is another critical technology. This mechanism allows the model to dynamically focus on important spatiotemporal locations within the input data, thereby further enhancing the efficiency of spatiotemporal information utilization. For instance, Lin et al. used a dynamic attention network (DSAN) [40] that separates spatiotemporal attention into different subspaces by adopting a multi-space attention approach, which improved the model’s ability to capture spatiotemporal context information. While spatiotemporal attention has gained widespread attention in fields such as video classification and action recognition, it is still underutilized in meteorological forecasting. Given that meteorological data contain extremely rich spatiotemporal information, using spatiotemporal attention to fuse the features of meteorological data is highly appropriate.

3. Data and Methods

3.1. ERA5 Data of Qinghai Province, China

Qinghai Province (Figure 1), situated in the northeastern part of the Qinghai–Tibet Plateau, exhibits distinct climatic characteristics shaped by the interplay of its high-altitude topography and large-scale atmospheric circulation. Owing to its elevation, the province experiences relatively low average temperatures, with marked seasonal variations. During winter, under the influence of continental high-pressure systems, the region is cold and dry, characterized by low temperatures. In contrast, summer is dominated by the Asian monsoon, bringing increased humidity and concentrated precipitation, particularly between July and September, forming a brief but well-defined wet season. A significant feature of Qinghai’s climate is the frequent occurrence of extreme weather events, especially during transitional seasons such as spring and autumn. Rapid temperature fluctuations and sudden strong winds are common during these periods. The region’s complex plateau topography contributes to the intricate spatial structures of meteorological systems. Climate conditions vary substantially across different altitudes, with localized air circulations often occurring in the mountainous and canyon regions along the plateau’s edges. These local circulations play a crucial role in shaping the spatial and temporal distribution of temperature, humidity, and precipitation across the province. Regarding atmospheric circulation, Qinghai lies at the intersection of the westerlies and the Asian monsoon, resulting in a high degree of variability and uncertainty in its weather patterns. As a critical region within the broader Asian climate system, the Qinghai–Tibet Plateau exerts considerable influence not only on regional weather forecasts across China but also on global atmospheric circulation. The plateau’s thermal and dynamic effects intensify its impact on surrounding areas, particularly in interactions with neighboring regions, such as the development of warm and moist air masses. These interactions frequently amplify the occurrence of extreme meteorological events. Precipitation in Qinghai is unevenly distributed, with the eastern regions being relatively humid, while the western and northern areas remain arid. The barrier effect of the Qinghai–Tibet Plateau hinders the penetration of the southwest monsoon into the plateau’s interior, leading to greater precipitation in the southeastern part of the province and sparse rainfall in the drier western regions. Additionally, the presence of glaciers and snowmelt in certain parts of Qinghai plays a crucial role in supporting local ecosystems and sustaining the hydrological cycle.
Traditional numerical weather prediction (NWP) models struggle to obtain sufficient high-resolution data for accurate predictions under such conditions, so advanced spatiotemporal fusion models are needed to improve prediction accuracy. To address these difficulties, we aim to achieve more accurate weather forecasts under the complex meteorological conditions of the plateau through our proposed spatiotemporal fusion model. Specifically, we selected ERA5 data from 2005 to 2022 for Qinghai Province (longitudes 88° to 103.75°, latitudes 27° to 42.75°) for single-variable meteorological forecasting. The surface variables include the 2 m temperature (t2m), the 10 m wind speed u-component (u10), and the 10 m wind speed v-component (v10); the high-altitude variables are the 850 hPa temperature (t850), the 500 hPa geopotential height (Geopotential_500), and the 500 hPa relative humidity (Relative_humidity_500). The data have a temporal resolution of 1 h and a spatial resolution of 0.25 degrees, yielding a 64 × 64 input grid for the model. Data from 2005 to 2019 were employed for model training, data from 2020 were used for validation during each training epoch, and data from 2021 to 2022 were reserved for final performance testing of the model. We feed data to the model using a sliding window of 24 frames: the first 12 frames represent the past 12 h of data, and the last 12 frames represent the true values for the next 12 h. The window slides with a stride of two time steps, and the model predicts the 12 h that follow the input window.
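To make the data pipeline concrete, the sketch below shows one way to build the 24-frame sliding windows described above (12 input hours, 12 target hours, stride of two). It is a minimal illustration rather than the authors' released code; the class name, the array layout (hourly frames of shape 64 × 64), and the absence of normalization are assumptions of the sketch.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class ERA5WindowDataset(Dataset):
    """Sliding-window samples over a single ERA5 variable.

    `field` is assumed to be an array of hourly frames with shape
    (num_hours, 64, 64); the name and layout are illustrative only.
    """

    def __init__(self, field, in_len=12, out_len=12, stride=2):
        self.field = np.asarray(field, dtype=np.float32)
        self.in_len, self.out_len = in_len, out_len
        window = in_len + out_len                              # 24 frames per sample
        self.starts = range(0, len(self.field) - window + 1, stride)

    def __len__(self):
        return len(self.starts)

    def __getitem__(self, idx):
        s = self.starts[idx]
        x = self.field[s:s + self.in_len]                      # past 12 h
        y = self.field[s + self.in_len:s + self.in_len + self.out_len]  # next 12 h
        # add a channel axis so each sample is (T, C=1, H, W)
        return torch.from_numpy(x)[:, None], torch.from_numpy(y)[:, None]
```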

3.2. Methodology

3.2.1. Problem Definition

One approach to weather forecasting involves utilizing historical weather data to project future conditions. In this study, we employ ERA5 reanalysis data to model and predict the future values of a single meteorological variable. Specifically, we divide a square area based on the spatial resolution and the longitude and latitude range. The data in this area also have a certain temporal resolution. They are input into the model as a spatiotemporal series, and the sequence data for future time periods are used as the model’s output. This process can be strictly defined as follows: assuming the current time is $t$, we aim to predict the data sequence $\tilde{S}_{t+1}, \tilde{S}_{t+2}, \ldots, \tilde{S}_{t+q}$ of future length $q$ based on the ERA5 data sequence $S_{t-p+1}, S_{t-p+2}, \ldots, S_{t}$ of past length $p$. The objective of our model is to minimize the gap between the model’s predicted sequence $\tilde{S}_{t+1}, \tilde{S}_{t+2}, \ldots, \tilde{S}_{t+q}$ and the true sequence $S_{t+1}, S_{t+2}, \ldots, S_{t+q}$ of future time steps. This can be expressed as

$$\tilde{S}^{\,var}_{t+1,\,lon,\,lat},\ \tilde{S}^{\,var}_{t+2,\,lon,\,lat},\ \ldots,\ \tilde{S}^{\,var}_{t+q,\,lon,\,lat} = f_{\theta}\!\left(S^{\,var}_{t-p+1,\,lon,\,lat},\ S^{\,var}_{t-p+2,\,lon,\,lat},\ \ldots,\ S^{\,var}_{t,\,lon,\,lat}\right)$$

where $S^{\,var}_{t+i,\,lon,\,lat}$ represents the actual value of the variable $var$ at time $t+i$ over longitudes 88° to 103.75° and latitudes 27° to 42.75°, $\tilde{S}^{\,var}_{t+i,\,lon,\,lat}$ represents the predicted value at that time and place, $var$ can be the 2 m temperature, 850 hPa temperature, 10 m wind speed u-component, 10 m wind speed v-component, 500 hPa geopotential height, or 500 hPa relative humidity, $f$ is the meteorological forecasting model, and $\theta$ represents the model parameters to be optimized.
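As a concrete reading of this objective, the following minimal sketch measures the gap between the predicted and true future sequences with an MSE loss; the function name and the choice of MSE as the training criterion are assumptions of the sketch, and `model` stands in for $f_{\theta}$.

```python
import torch.nn.functional as F

def sequence_loss(model, past, future):
    """Gap between predicted and true future sequences, measured with MSE.

    `past` has shape (B, p, C, H, W) and `future` has shape (B, q, C, H, W);
    `model` plays the role of f_theta and is assumed to return (B, q, C, H, W).
    """
    pred = model(past)
    return F.mse_loss(pred, future)
```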

3.2.2. Overall Framework

As an example, we take the 2 m temperature data from the ERA5 dataset as the input and illustrate the overall framework of the model in Figure 2. First, the temperature data sequence is fed into the encoder, which encodes the shallow spatiotemporal relationships. The abstracted feature representation is then fed into the spatiotemporal fusion module (STFU) to further integrate the high-dimensional spatiotemporal features. Finally, the decoder recovers the spatial information. To further enhance the utilization of input information and maintain the spatiotemporal correlation between input and output, we introduce residual connections before decoding, because they fuse [41] the original features of the input into the decoding process, thereby avoiding information loss and enhancing the learning ability of the model.
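The sketch below mirrors this encoder → fusion → decoder pipeline with a residual connection before decoding. It is a structural illustration only: the sub-modules are placeholders for the components described next, and the exact placement of the residual connection is an assumption.

```python
import torch.nn as nn

class STFMPipeline(nn.Module):
    """Structural sketch of Figure 2: encoder -> fusion (STFU) -> decoder,
    with a residual connection before decoding (placement assumed)."""

    def __init__(self, encoder, fusion, decoder):
        super().__init__()
        self.encoder = encoder   # shallow spatiotemporal encoding
        self.fusion = fusion     # spatiotemporal fusion module (STFU)
        self.decoder = decoder   # recovers the spatial fields

    def forward(self, x):        # x: (B, T, C, H, W)
        feat = self.encoder(x)
        fused = self.fusion(feat) + feat   # residual reuse of encoded features
        return self.decoder(fused)
```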
  • Spatiotemporal Encoder. The encoder plays a crucial role in transforming raw data into meaningful feature representations, rich in spatiotemporal features and context modeling information essential for deep networks to comprehend dynamic data changes. Traditional encoder architectures [42,43,44] often rely solely on convolutional neural networks (CNNs) to capture spatiotemporal features. However, CNNs have an inherent limitation: although they can recognize local patterns, they cannot remember the state of previous inputs or propagate state information through the sequence, so traditional encoders may inadequately capture temporal dependencies within data sequences. To address this limitation, we propose an encoder structure that places a bidirectional long short-term memory network (Bi-LSTM) before the traditional two-dimensional convolution. The improvements in the encoder structure are particularly important for meteorological forecasting in regions with complex climate characteristics. In Qinghai Province, the monsoon system introduces significant seasonal variability, resulting in intricate spatial patterns of precipitation and temperature distribution. Our model employs a Bi-LSTM to capture the cumulative effects of these seasonal changes over time, followed by convolutional layers to extract spatial features. This approach improves on the traditional CNN-only encoder by better capturing the spatiotemporal correlations within the climate system, ultimately improving forecast accuracy. We first reshape the input data from the shape B × T × C × H × W to (B × H × W) × T × C and feed it into a Bi-LSTM [45] for temporal encoding, where B represents the batch size of the input data, T the sequence length, C the number of channels, and H and W the height and width of the image. This step allows us to capture temporal patterns and trends at each position in the data. The Bi-LSTM’s output is reshaped to (B × T) × C × H × W and fed into two-dimensional convolution layers to extract spatial features, resulting in a more comprehensive spatiotemporal representation (a minimal code sketch of this encoder is given after these module descriptions). The fundamental concept underlying this structure is the fusion of the Bi-LSTM’s temporal feature extraction with the two-dimensional convolution’s spatial feature extraction, thereby enhancing the encoder’s spatiotemporal modeling capability. The above process can be expressed as
$$\begin{aligned} R &= \sigma\big(\phi(\mathrm{BiLSTM}(R_{in}))\big) \\ R &= \sigma\big(\phi(\mathrm{Conv}_{3\times 3}(R))\big) \\ \sigma(x) &= x \cdot \mathrm{sig}(x) \end{aligned}$$
where $R_{in} \in \mathbb{R}^{B \times T \times C \times H \times W}$ represents the input data, $\mathrm{BiLSTM}$ represents a bidirectional LSTM network, $\phi$ represents group normalization, $\sigma$ represents the SiLU activation function, $\mathrm{sig}$ represents the Sigmoid activation function, and $\mathrm{Conv}_{3\times 3}$ represents a standard two-dimensional convolution with a kernel size of 3 (Figure 2).
  • Spatiotemporal Fusion Unit. To further improve the accuracy of meteorological forecasts, we propose a spatiotemporal fusion module. As shown in Figure 3, our spatiotemporal fusion module consists of a spatial attention unit and a temporal convolution unit. The spatial attention unit captures the spatial correlations of meteorological data in both a global and a local manner, and the subsequent one-dimensional convolution then captures temporal continuity and patterns of change, while giving the temporal convolution a degree of spatial awareness. The organic combination of the two allows the model to comprehensively model the spatiotemporal relationships in meteorological data, thereby improving its prediction performance.
  • Spatial Attention Unit. We represent the encoded meteorological data as a multi-dimensional tensor of shape B × T × C × H × W, where B denotes the batch size. Within the spatial attention unit (Figure 3), we compress the temporal and channel dimensions of the feature, reshaping it into B × (T × C) × H × W. This transformation facilitates the decomposition of spatial attention into static attention (Global Static Attention, GSA), focusing on global information, and dynamic attention (Channel Dynamic Attention, CDA), emphasizing channel interactions (Figure 3). Inspired by the spatial attention mechanism and the gradual changes in meteorological elements, we employ global average pooling and global minimum pooling operations to capture spatial features. Global average pooling aggregates data across the entire time axis to preserve trend information, while global minimum pooling extracts local minima, which are crucial for detecting subtle changes without losing significant details. This dual pooling strategy yields a comprehensive feature representation. However, static attention alone insufficiently captures the temporal dynamics of spatial information. Therefore, we augment it with dynamic spatial attention, which interacts with and integrates spatial information across different time steps using squeeze-and-excitation methods [46]. This modification enhances the model’s ability to learn dynamic spatial patterns over time. In summary, our spatial attention unit (SAU) combines global and dynamic spatial attention mechanisms to more accurately model the dynamic evolution of meteorological data (a minimal code sketch of the SAU is given after these module descriptions). Complex climate systems often exhibit pronounced spatiotemporal dynamics, including large-scale precipitation variations due to monsoons, localized effects of mountainous terrain on wind patterns, and interactions among multiple climate systems; our model addresses these challenges through the use of global static attention (GSA) and channel dynamic attention (CDA). GSA effectively captures broad-scale climate trends, such as long-term seasonal changes and large-scale airflow patterns, while CDA identifies key dynamic climate events within time series data, such as frontal activities and cyclone trajectories. Furthermore, our dual-pooling strategy retains comprehensive climate trend information through global average pooling, while global minimum pooling enables the model to detect local extreme events and subtle changes, which are critical for identifying phenomena like heavy rainfall and drought. Dynamic spatial attention further enhances the model’s ability to adapt its focus to spatial features over time, thereby capturing the spatiotemporal evolution of meteorological elements, particularly in regions with complex terrain such as Qinghai Province. By integrating global and dynamic attention mechanisms, our model provides a more thorough simulation of meteorological data across spatiotemporal scales. This approach improves prediction accuracy, especially for complex climate phenomena, and significantly enhances the model’s generalization capability. We formalize the proposed spatial attention unit (SAU) as follows:
    $$\mathrm{GSA}(R) = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(R);\ \mathrm{MinPool}(R)])\big)$$
    $$\mathrm{CDA}(R) = \mathrm{SE}(\mathrm{AvgPool}(R))$$
    $$R = \mathrm{GSA}(R) \otimes \mathrm{CDA}(R) \odot R$$
    where $R$ is the processed representation of shape $B \times T \times C \times H \times W$; $\mathrm{AvgPool}(\cdot)$ and $\mathrm{MinPool}(\cdot)$ represent average pooling and minimum pooling; $f^{7\times 7}$ represents the fusion of the average-pooled and minimum-pooled maps with a $7 \times 7$ convolution kernel; $\sigma$ represents the Sigmoid activation function; $\mathrm{SE}$ represents the squeeze-and-excitation module composed of fully connected layers; $\mathrm{GSA}$ and $\mathrm{CDA}$ represent global static attention and channel dynamic attention, respectively; $\otimes$ represents the Kronecker product; and $\odot$ represents the Hadamard product.
  • Temporal Convolution Unit. The temporal convolution unit utilizes a one-dimensional convolution operation. The convolution kernel performs local perception by sliding across different time steps to capture short-term changes in meteorological data. To comprehensively capture long-term dependencies, we implement a two-layer convolution structure: first, a standard one-dimensional convolution layer, and second, a dilated convolution layer, which expands the receptive field. This two-layer structure effectively encodes temporal information and enriches the representation of temporal features through residual connections. To fully account for the spatiotemporal relationships in the data, we adopt a data reshaping strategy: we reshape the data tensor from B × (T × C) × H × W to (B × H × W) × T × C, merging the spatial positions into the batch dimension. This operation helps the model capture spatiotemporal dynamics, enabling the temporal convolution unit to fuse spatial and temporal information more effectively. After processing by the temporal convolutional network (TCN) [47], we reshape the data back into the format B × (T × C) × H × W for subsequent processing in the decoder module. This reshaping ensures smooth data transfer between modules and facilitates the collaborative functioning of the entire model (a minimal code sketch of this unit is given below). In summary, the organic combination of the temporal convolution unit and the spatial attention unit enables our spatiotemporal fusion module to comprehensively consider the spatiotemporal relationships in meteorological data, thereby significantly improving forecasting accuracy. We have argued above how this combination enriches the model’s understanding of meteorological data; in the next section, we experimentally verify the improvement in forecasting performance achieved by these components.
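The following is a minimal PyTorch sketch of the spatiotemporal encoder described above: a Bi-LSTM applied along the time axis at every grid point, followed by a 3 × 3 convolution over space, each wrapped in group normalization and SiLU as in the encoder equations. The hidden width, the number of normalization groups, and the single-layer Bi-LSTM are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SpatioTemporalEncoder(nn.Module):
    """Bi-LSTM temporal encoding followed by 2D spatial convolution
    (hidden sizes and layer counts are assumptions of this sketch)."""

    def __init__(self, channels=1, hidden=32, groups=4):
        super().__init__()
        self.bilstm = nn.LSTM(channels, hidden, batch_first=True,
                              bidirectional=True)              # output width 2*hidden
        self.norm_t = nn.GroupNorm(groups, 2 * hidden)
        self.conv = nn.Conv2d(2 * hidden, 2 * hidden, kernel_size=3, padding=1)
        self.norm_s = nn.GroupNorm(groups, 2 * hidden)
        self.act = nn.SiLU()                                   # sigma(x) = x * sigmoid(x)

    def forward(self, x):                                      # x: (B, T, C, H, W)
        B, T, C, H, W = x.shape
        # temporal encoding at each grid point: (B*H*W, T, C)
        xt = x.permute(0, 3, 4, 1, 2).reshape(B * H * W, T, C)
        xt, _ = self.bilstm(xt)                                # (B*H*W, T, 2*hidden)
        # back to image layout for spatial encoding: (B*T, 2*hidden, H, W)
        xs = xt.reshape(B, H, W, T, -1).permute(0, 3, 4, 1, 2)
        xs = xs.reshape(B * T, -1, H, W)
        xs = self.act(self.norm_t(xs))                         # sigma(phi(BiLSTM(.)))
        xs = self.act(self.norm_s(self.conv(xs)))              # sigma(phi(Conv3x3(.)))
        return xs.reshape(B, T, -1, H, W)
```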
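Next is a minimal sketch of the spatial attention unit (SAU) operating on features reshaped to B × (T × C) × H × W: GSA builds a per-pixel attention map from average- and minimum-pooled statistics over the merged time–channel axis, fused by a 7 × 7 convolution, and CDA is a squeeze-and-excitation block over the same axis. The reduction ratio and the exact pooling axes are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class SpatialAttentionUnit(nn.Module):
    """GSA (per-pixel static attention) combined with CDA (SE-style
    channel-dynamic attention) for input of shape (B, T*C, H, W)."""

    def __init__(self, tc, reduction=4):
        super().__init__()
        self.fuse = nn.Conv2d(2, 1, kernel_size=7, padding=3)   # f^{7x7}
        self.se = nn.Sequential(                                # squeeze-and-excitation
            nn.Linear(tc, tc // reduction), nn.ReLU(inplace=True),
            nn.Linear(tc // reduction, tc), nn.Sigmoid())

    def forward(self, r):                                       # r: (B, T*C, H, W)
        # GSA: statistics over the merged time/channel axis, fused spatially
        avg = r.mean(dim=1, keepdim=True)                       # AvgPool -> (B, 1, H, W)
        mn = r.min(dim=1, keepdim=True).values                  # MinPool -> (B, 1, H, W)
        gsa = torch.sigmoid(self.fuse(torch.cat([avg, mn], dim=1)))
        # CDA: SE weights from the spatially averaged descriptor
        cda = self.se(r.mean(dim=(2, 3)))[:, :, None, None]     # (B, T*C, 1, 1)
        # broadcasted product of the two attention maps, applied to r
        return gsa * cda * r
```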
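Finally, a minimal sketch of the temporal convolution unit: the feature tensor is reshaped so that every grid point becomes an independent sequence, passed through a standard 1D convolution and a dilated 1D convolution with a residual connection, and reshaped back. The kernel size, dilation rate, and activation function are assumptions of the sketch.

```python
import torch.nn as nn

class TemporalConvUnit(nn.Module):
    """Two-layer temporal convolution (standard + dilated) with a residual
    connection, applied along time at every spatial position."""

    def __init__(self, channels, kernel_size=3, dilation=2):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size,
                               padding=kernel_size // 2)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size,
                               padding=dilation * (kernel_size // 2),
                               dilation=dilation)               # enlarged receptive field
        self.act = nn.SiLU()

    def forward(self, x):                                       # x: (B, T, C, H, W)
        B, T, C, H, W = x.shape
        # merge spatial positions into the batch and convolve along time
        xt = x.permute(0, 3, 4, 2, 1).reshape(B * H * W, C, T)
        out = self.act(self.conv2(self.act(self.conv1(xt))))
        out = out + xt                                          # residual connection
        return out.reshape(B, H, W, C, T).permute(0, 4, 3, 1, 2)
```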

4. Results

We selected the fifth generation of European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis (ERA5) data as the dataset to evaluate our model. This dataset provides meteorological variable data over a long period, allowing for a comprehensive assessment of our model’s performance and its generalization ability across different meteorological variables. We designed experiments using various upper air and ground variables and verified the accuracy and effectiveness of our model by comparing it with the results of the most advanced spatiotemporal prediction models. The purpose of our study is to gain a deeper understanding of the performance of our proposed model under various meteorological conditions and provide a reliable reference for further advancements in the field of meteorological prediction.

4.1. Experiment Setup

We implemented all models in the PyTorch framework within an environment built using Anaconda (version 4.10.3) and conducted experiments on two NVIDIA A100 GPUs. The specific experimental parameters are as follows: we trained for 100 epochs with a batch size of 64 and used the Adam optimizer. Additionally, to improve training speed and stability, we applied a scheduled sampling strategy to the models based on recurrent architectures (such as ConvLSTM, PredRNN, etc.): during training, the proportion of ground-truth frames versus the model’s own predictions used as input is adjusted dynamically.
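As an illustration of this scheduled sampling strategy, the sketch below draws a per-step mask deciding whether the ground-truth frame or the model's own prediction is fed back at each future step, with a linearly decaying ground-truth probability. The linear schedule and the function name are assumptions; only the idea of a dynamically adjusted mixing ratio comes from the text.

```python
import torch

def scheduled_sampling_mask(batch_size, future_steps, epoch, total_epochs=100):
    """Boolean mask: True means 'feed the ground-truth frame' at that step.
    The ground-truth probability decays linearly from 1 to 0 over training
    (the decay schedule is an assumption of this sketch)."""
    p_truth = max(0.0, 1.0 - epoch / total_epochs)
    return torch.rand(batch_size, future_steps) < p_truth
```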

4.2. Evaluation Metrics

In the following experiments, we utilize the mean square error (MSE), mean absolute error (MAE), and anomaly correlation coefficient (ACC) to evaluate the prediction performance of the model. These metrics represent the average values over the prediction period. Lower values of MSE and MAE, and higher values of ACC, indicate higher prediction accuracy and stability.
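For reference, minimal implementations of the three metrics are sketched below. The anomaly correlation coefficient is computed against a reference climatology field; how that climatology is defined (for example, a training-period mean) is not specified in the text and is an assumption here.

```python
import numpy as np

def mse(pred, truth):
    return float(np.mean((pred - truth) ** 2))

def mae(pred, truth):
    return float(np.mean(np.abs(pred - truth)))

def acc(pred, truth, climatology):
    """Anomaly correlation coefficient between forecast and observed
    anomalies, both taken relative to `climatology`."""
    fa = pred - climatology
    oa = truth - climatology
    return float(np.sum(fa * oa) / np.sqrt(np.sum(fa ** 2) * np.sum(oa ** 2)))
```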

4.3. Analysis and Evaluation of Comparative Experimental Results

For meteorological data, we conducted extensive experiments on different surface and high-altitude variables. The comparison results of our model with other methods are shown in Table 1 and Table 2. For surface variables, the STFM outperforms the previous best models, reducing the MSE index by approximately 5% to 7%. For higher-altitude variables, such as the 850 hPa temperature (Temperature_850), the STFM reduces the MSE by 8.5% compared to the previous best model. For the 500 hPa geopotential height and 500 hPa relative humidity variables, the MSE index is reduced by 3% and 1%, respectively. Overall, the STFM shows great potential as an effective and widely applicable meteorological prediction model.
To clearly demonstrate the prediction performance of the STFM, we present qualitative results in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9. In each figure, the first row shows the input frames, the second row shows the ground truth frames, the third row shows the prediction results from the STFM, the fourth row shows the best prediction results among the comparison models, and the last two rows display the prediction error graphs. The error is calculated as the absolute value of the difference between the predicted frame and the ground truth frame. The fifth row shows the prediction error graph of the STFM, and the last row shows the error graph with the best prediction result among the comparison models. Next, we will analyze the prediction results in detail.
  • Temperature. In Table 1 and Table 2, we present the quantitative comparison of the STFM and existing models for the 850 hPa temperature (t850) and 2 m temperature (t2m) variables. For these variables, the STFM reduced the MSE by 8.5% and 7.5%, respectively, compared to the best-performing TAU. Figure 4 and Figure 5 show the qualitative results for the two temperature variables. It is evident that after the sixth frame (six hours into the future), the predicted values of the TAU begin to deviate significantly from the true values, whereas the STFM maintains a relatively small deviation. Between the tenth and twelfth frames (ten to twelve hours into the future), the predicted values of the TAU in some areas show serious inconsistencies with the true values, while the STFM significantly reduces the area of such inconsistencies. This shows that the STFM demonstrates superior performance in capturing seasonal variations compared to the TAU. Specifically, when forecasting temperature over the next six hours and the subsequent ten to twelve hours, the STFM exhibits a smaller prediction bias, indicating a more accurate response to seasonal fluctuations. Additionally, in longer-range forecasts, the STFM effectively maintains prediction consistency, reflecting its enhanced adaptability to the dynamic changes within the climate system. This advantage is particularly notable in the short- to medium-term forecasts, from the next six hours to the next ten to twelve hours. The STFM’s efficacy in capturing long-term dependencies and spatiotemporal dynamics is attributed to its dilated convolutional layers and dynamic spatial attention mechanism. These features contribute to a lower mean squared error (MSE) and reduced prediction bias, especially in the context of complex climate conditions.
  • Wind speed components. In Table 1, we present the quantitative comparison of the STFM and existing models for the 10 m wind speed u-component (u10) and the 10 m wind speed v-component (v10). For the u10 and v10 variables, the STFM shows about a 5.7% reduction in MSE compared to the best-performing TAU. Figure 6 and Figure 7 display the qualitative results for the u10 and v10 variables. From the error graphs, we can see that in the prediction of the wind speed variables over the next 12 h, the STFM does not exhibit significant prediction deviations as the lead time increases. In contrast, the TAU begins to show serious prediction deviations in multiple areas after the eighth frame (eight hours into the future). The observed performance indicates that the STFM adapts well to the dynamic changes within the climate system. The TAU’s significant deviation beyond the eighth frame may be attributed to its inadequate capture of variations in wind speed and local wind patterns, leading to reduced prediction accuracy. The STFM’s stability and accuracy in the short- to medium-term forecasts underscore its effectiveness in adapting to fluctuations in wind speed and complex climate conditions. This capability not only enhances the precision of wind speed predictions but also improves responsiveness to changing conditions, underscoring its substantial practical application value.
  • Geopotential and Relative Humidity. In Table 2, we present the quantitative comparison of the STFM and existing models for the 500 hPa geopotential height (Geopotential_500) and 500 hPa relative humidity (Relative_humidity_500) variables. For Geopotential_500, the STFM reduces the MSE by about 3% compared to the best-performing UniFormer. For Relative_humidity_500, the STFM achieves a 1% reduction in MSE compared to the best-performing SimVP-v2. As shown in Figure 8 and Figure 9, although the improvement of the STFM is modest, the error maps of the STFM are noticeably sparser than those of the UniFormer and SimVP-v2. The STFM’s advantage in predicting the 500 hPa geopotential height indicates its superior ability to capture atmospheric circulation characteristics. Additionally, the mean squared error (MSE) for relative humidity is reduced by 1%, reflecting an improvement in the model’s performance for predicting the humidity distribution. While the enhancement may seem modest, the increased stability and accuracy of the STFM contribute to a better understanding of mid-tropospheric characteristics.

4.4. Analysis and Evaluation of Ablation Experiment Results

To intuitively demonstrate the effectiveness of the different components of the model, we conducted comprehensive ablation experiments on all predicted meteorological variables to verify how our work enhances the prediction accuracy of the model. For the STFM, the comparison models include STFM-NTE, which excludes temporal encoding, and STFM-NSTFU, which omits the spatiotemporal fusion module. We further performed ablation experiments on the individual components of the spatiotemporal fusion module, namely the spatial attention unit and the temporal convolution unit. Consequently, the comparison models for the STFM also include STFM-NTE-NSAU, which lacks both temporal encoding and the spatial attention unit, and STFM-NTE-NTCN, which omits temporal encoding and the temporal convolution unit. These ablation experiments comprehensively verify the contributions of the different modules we proposed. The results of the ablation experiments are presented in Table 3 and Table 4.
  • Temporal Encoder. The results indicate that for the three variables Geopotential_500, Relative humidity_500, and u_10, the MSE of STFM-NTE increased by 9.7%, 1.8%, and 0.77%, respectively, compared to STFM. Since the encoder module is crucial for feature extraction in the entire model, and the encoder of STFM-NTE only uses convolution operations for spatial feature extraction, it is less effective than the spatiotemporal encoder used in STFM. Therefore, adding temporal encoding to the encoder significantly enhances the overall prediction accuracy of the model.
  • Spatiotemporal Fusion Unit. The results show that STFM-NSTFU significantly increases the MSE for each variable compared to the STFM. For instance, the MSE for the Geopotential_500 variable increased by 12%, and those for the Relative_humidity_500 and v10 variables increased by 5%. Since STFM-NSTFU lacks the spatiotemporal fusion unit, it cannot exploit the guidance of high-dimensional spatiotemporal information, leading to a considerable drop in prediction accuracy.
  • Temporal Convolutional Unit. Compared to the STFM, the prediction loss of STFM-NTE-NSAU, which lacks both temporal encoding and the spatial attention unit, increased significantly on every meteorological variable. This demonstrates the crucial role of the temporal convolution module.

5. Conclusions and Discussion

This paper introduces a Spatiotemporal Fusion Model (STFM) to address the limitations of deep learning spatiotemporal prediction models in meteorological forecasting. By integrating the spatiotemporal encoding module and the spatiotemporal fusion module, we significantly improve the prediction accuracy of common meteorological variables for the next 12 h. Experimental results based on ERA5 data from Qinghai Province show that the STFM outperforms other deep learning spatiotemporal weather forecasting models on the prediction metrics for 2 m temperature, 850 hPa temperature, the 10 m wind speed u-component, the 10 m wind speed v-component, 500 hPa relative humidity, and 500 hPa geopotential height. This study mainly presents the development and preliminary validation of the proposed model, demonstrating its potential. Despite the promising results, the STFM has certain limitations that warrant further exploration. One such limitation is its dependency on high-quality historical data, particularly in regions with sparse observational data, which may affect its predictive accuracy. Moreover, while the model demonstrates improved performance in complex terrain, its computational complexity is higher than that of simpler models, potentially requiring more resources during training and inference. Additionally, for certain variables, such as the 500 hPa geopotential height, the performance gain over previous models remains marginal, indicating room for improvement. In addition, the ERA5 reanalysis data are not completely equivalent to actual operational data; operational data usually contain more noise and observation errors, which may affect the performance of the model. Future research will focus on a more comprehensive analysis of the applicability of the model, as well as further validation with larger operational datasets, different regions, and longer time periods to thoroughly evaluate its practicality.

Author Contributions

Conceptualization, J.L. and L.W.; methodology, J.L.; software, J.L.; validation, J.L. and L.W.; formal analysis, J.H. and X.W.; investigation, L.W.; resources, L.W. and J.H.; data curation, L.W.; writing—original draft preparation, J.L.; writing—review and editing, L.W., T.Z., J.H., X.W. and F.T.; visualization, J.L.; supervision, L.W., T.Z., J.H., X.W. and F.T.; project administration, L.W., T.Z. and J.H.; funding acquisition, T.Z. and X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 42265010 and No. 62162053) and the Natural Science Foundation of Qinghai Province (2023-ZJ-906M). Research activity at Brookhaven National Laboratory (BNL) was carried out under contract DE-SC0012704.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The fifth-generation ECMWF reanalysis data used in this study are publicly available on the ERA5 website https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels?tab=form, accessed on 30 March 2024.

Acknowledgments

We thank the Institute of High Performance of Qinghai University for providing the computing resources.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Joslyn, S.; Savelli, S. Communicating forecast uncertainty: Public perception of weather forecast uncertainty. Meteorol. Appl. 2010, 17, 180–195. [Google Scholar] [CrossRef]
  2. Scher, S.; Messori, G. Predicting weather forecast uncertainty with machine learning. Q. J. R. Meteorol. Soc. 2018, 144, 2830–2841. [Google Scholar] [CrossRef]
  3. Lorenc, A.C. Analysis methods for numerical weather prediction. Q. J. R. Meteorol. Soc. 1986, 112, 1177–1194. [Google Scholar] [CrossRef]
  4. Bauer, P.; Thorpe, A.; Brunet, G. The quiet revolution of numerical weather prediction. Nature 2015, 525, 47–55. [Google Scholar] [CrossRef]
  5. Schultz, M.G.; Betancourt, C.; Gong, B.; Kleinert, F.; Langguth, M.; Leufen, L.H.; Mozaffari, A.; Stadtler, S. Can deep learning beat numerical weather prediction? Philos. Trans. R. Soc. A 2021, 379, 20200097. [Google Scholar] [CrossRef]
  6. Salman, A.G.; Kanigoro, B.; Heryadi, Y. Weather forecasting using deep learning techniques. In Proceedings of the 2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Depok, Indonesia, 10–11 October 2015; pp. 281–285. [Google Scholar]
  7. Ren, X.; Li, X.; Ren, K.; Song, J.; Xu, Z.; Deng, K.; Wang, X. Deep learning-based weather prediction: A survey. Big Data Res. 2021, 23, 100178. [Google Scholar] [CrossRef]
  8. Buizza, R.; Tribbia, J.; Molteni, F.; Palmer, T. Computation of optimal unstable structures for a numerical weather prediction model. Tellus A 1993, 45, 388–407. [Google Scholar] [CrossRef]
  9. Rodwell, M.; Palmer, T. Using numerical weather prediction to assess climate models. Q. J. R. Meteorol. Soc. A J. Atmos. Sci. Appl. Meteorol. Phys. Oceanogr. 2007, 133, 129–146. [Google Scholar] [CrossRef]
  10. Onyango, A.O.; Ongoma, V. Estimation of mean monthly global solar radiation using sunshine hours for Nairobi City, Kenya. J. Renew. Sustain. Energy 2015, 7, 053105. [Google Scholar] [CrossRef]
  11. Liang, H.; Sun, X.; Sun, Y.; Gao, Y. Text feature extraction based on deep learning: A review. EURASIP J. Wirel. Commun. Netw. 2017, 2017, 211. [Google Scholar] [CrossRef]
  12. Shi, X.; Yeung, D.Y. Machine learning for spatiotemporal sequence forecasting: A survey. arXiv 2018, arXiv:1808.06865. [Google Scholar]
  13. Cao, S.; Wu, L.; Wu, J.; Wu, D.; Li, Q. A spatio-temporal sequence-to-sequence network for traffic flow prediction. Inf. Sci. 2022, 610, 185–203. [Google Scholar] [CrossRef]
  14. Kim, T.; Yue, Y.; Taylor, S.; Matthews, I. A decision tree framework for spatiotemporal sequence prediction. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; pp. 577–586. [Google Scholar]
  15. Rodrigues, E.R.; Oliveira, I.; Cunha, R.; Netto, M. DeepDownscale: A deep learning strategy for high-resolution weather forecast. In Proceedings of the 2018 IEEE 14th International Conference on e-Science (e-Science), Amsterdam, The Netherlands, 29 October–1 November 2018; pp. 415–422. [Google Scholar]
  16. Hewage, P.; Trovati, M.; Pereira, E.; Behera, A. Deep learning-based effective fine-grained weather forecasting model. Pattern Anal. Appl. 2021, 24, 343–366. [Google Scholar] [CrossRef]
  17. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.c. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar]
  18. Shi, E.; Li, Q.; Gu, D.; Zhao, Z. A method of weather radar echo extrapolation based on convolutional neural networks. In Proceedings of the MultiMedia Modeling: 24th International Conference, MMM 2018, Bangkok, Thailand, 5–7 February 2018; Proceedings, Part I 24. Springer: Berlin/Heidelberg, Germany, 2018; pp. 16–28. [Google Scholar]
  19. Akbari Asanjan, A.; Yang, T.; Hsu, K.; Sorooshian, S.; Lin, J.; Peng, Q. Short-term precipitation forecast based on the PERSIANN system and LSTM recurrent neural networks. J. Geophys. Res. Atmos. 2018, 123, 12–543. [Google Scholar] [CrossRef]
  20. Wang, Y.; Jiang, L.; Yang, M.H.; Li, L.J.; Long, M.; Fei-Fei, L. Eidetic 3D LSTM: A model for video prediction and beyond. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  21. Oprea, S.; Martinez-Gonzalez, P.; Garcia-Garcia, A.; Castro-Vargas, J.A.; Orts-Escolano, S.; Garcia-Rodriguez, J.; Argyros, A. A review on deep learning techniques for video prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 2806–2826. [Google Scholar] [CrossRef]
  22. Jhuang, H.; Gall, J.; Zuffi, S.; Schmid, C.; Black, M.J. Towards understanding action recognition. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 3192–3199. [Google Scholar]
  23. Li, K.; Wang, Y.; Zhang, J.; Gao, P.; Song, G.; Liu, Y.; Li, H.; Qiao, Y. Uniformer: Unifying convolution and self-attention for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12581–12600. [Google Scholar] [CrossRef]
  24. Wang, Y.; Wu, H.; Zhang, J.; Gao, Z.; Wang, J.; Philip, S.Y.; Long, M. Predrnn: A recurrent neural network for spatiotemporal predictive learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2208–2225. [Google Scholar] [CrossRef]
  25. Tan, C.; Gao, Z.; Li, S.; Li, S.Z. Simvp: Towards simple yet powerful spatiotemporal predictive learning. arXiv 2022, arXiv:2211.12509. [Google Scholar]
  26. Zhang, J.; Zheng, Y.; Qi, D. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
  27. Kang, S.M.; Wildes, R.P. Review of action recognition and detection methods. arXiv 2016, arXiv:1610.06906. [Google Scholar]
  28. Lee, J.; Lee, J.; Lee, S.; Yoon, S. Mutual suppression network for video prediction using disentangled features. arXiv 2018, arXiv:1804.04810. [Google Scholar]
  29. Gehring, J.; Auli, M.; Grangier, D.; Dauphin, Y.N. A convolutional encoder model for neural machine translation. arXiv 2016, arXiv:1611.02344. [Google Scholar]
  30. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 1982, 79, 2554–2558. [Google Scholar] [CrossRef]
  31. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  32. Wang, Y.; Long, M.; Wang, J.; Gao, Z.; Yu, P.S. Predrnn: Recurrent neural networks for predictive learning using spatiotemporal lstms. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  33. Wang, Y.; Gao, Z.; Long, M.; Wang, J.; Philip, S.Y. Predrnn++: Towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning. In Proceedings of the International Conference on Machine Learning. PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 5123–5132. [Google Scholar]
  34. Wang, Y.; Zhang, J.; Zhu, H.; Long, M.; Wang, J.; Yu, P.S. Memory in memory: A predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9154–9162. [Google Scholar]
  35. Chang, Z.; Zhang, X.; Wang, S.; Ma, S.; Ye, Y.; Xinguang, X.; Gao, W. Mau: A motion-aware unit for video prediction and beyond. Adv. Neural Inf. Process. Syst. 2021, 34, 26950–26962. [Google Scholar]
  36. Wang, Q.; Atkinson, P.M. Spatio-temporal fusion for daily Sentinel-2 images. Remote Sens. Environ. 2018, 204, 31–42. [Google Scholar] [CrossRef]
  37. Kuettel, D.; Breitenstein, M.D.; Van Gool, L.; Ferrari, V. What’s going on? Discovering spatio-temporal dependencies in dynamic scenes. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 1951–1958. [Google Scholar]
  38. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926. [Google Scholar]
  39. Yao, H.; Tang, X.; Wei, H.; Zheng, G.; Li, Z. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5668–5675. [Google Scholar]
  40. Lin, H.; Bai, R.; Jia, W.; Yang, X.; You, Y. Preserving dynamic attention for long-term spatial-temporal prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 36–46. [Google Scholar]
  41. Jwaid, T.; Meyer, H.D.; Ismail, A.H.; Baets, B.D. Curved splicing of copulas. Inf. Sci. 2021, 556, 95–110. [Google Scholar] [CrossRef]
  42. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; proceedings, part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  43. Zou, X.; Li, K.; Xing, J.; Tao, P.; Cui, Y. PMAA: A progressive multi-scale attention autoencoder model for high-performance cloud removal from multi-temporal satellite imagery. arXiv 2023, arXiv:2303.16565. [Google Scholar]
  44. Li, K.; Yang, R.; Hu, X. An efficient encoder-decoder architecture with top-down attention for speech separation. arXiv 2022, arXiv:2209.15200. [Google Scholar]
  45. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [Google Scholar] [CrossRef]
  46. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  47. Lea, C.; Flynn, M.D.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks for action segmentation and detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 156–165. [Google Scholar]
Figure 1. Overview of the study area, including elevation changes, rivers, and urban distribution in the area.
Figure 2. The overall structure of STFM and the data flow in the encoder. STFM consists of a spatiotemporal encoder, a spatiotemporal fusion module (STFU), and a decoder.
Figure 3. The overall structure of the spatiotemporal fusion unit (STFU) and spatiotemporal attention unit. STFU consists of a spatiotemporal attention unit and a temporal convolution.
Figure 4. Visualization results of our STFM and TAU on the 850 hPa temperature dataset.
Figure 5. Visualization results of our STFM and TAU on the 2 m temperature dataset.
Figure 6. Visualization results of our STFM and TAU on the 10 m wind speed u-component dataset.
Figure 7. Visualization results of our STFM and TAU on the 10 m wind speed v-component dataset.
Figure 8. Visualization results of our STFM and UniFormer on the 500 hPa geopotential dataset.
Figure 9. Visualization results of our STFM and SimVP v2 on the 500 hPa relative humidity dataset.
Table 1. Comparison of prediction performance of our STFM with existing methods on land surface variable datasets.

| Method | t2m (MSE / MAE / ACC) | u10 (MSE / MAE / ACC) | v10 (MSE / MAE / ACC) |
| --- | --- | --- | --- |
| ConvLSTM | 0.8518 / 0.6078 / 0.9929 | 1.052 / 0.6725 / 0.9 | 0.8844 / 0.6354 / 0.8603 |
| E3D-LSTM | 0.8451 / 0.5969 / 0.993 | 1.002 / 0.6544 / 0.9044 | 0.8995 / 0.6372 / 0.8579 |
| PredRNN v2 | 0.8169 / 0.588 / 0.9933 | 1.213 / 0.711 / 0.883 | 0.9234 / 0.6422 / 0.8536 |
| UniFormer | 0.669 / 0.5294 / 0.9944 | 0.9465 / 0.6309 / 0.9096 | 0.8261 / 0.6046 / 0.8692 |
| SimVP v2 | 1.005 / 0.678 / 0.9916 | 1.028 / 0.6846 / 0.903 | 0.8583 / 0.6313 / 0.8635 |
| TAU | 0.5685 / 0.4775 / 0.9953 | 0.8781 / 0.5975 / 0.9167 | 0.7563 / 0.5682 / 0.8816 |
| STFM (Ours) | 0.5255 / 0.4572 / 0.9956 | 0.8276 / 0.5833 / 0.9217 | 0.7126 / 0.5539 / 0.888 |
Table 2. Comparison of prediction performance of our STFM with existing methods on the high-altitude variable dataset.

| Method | t850 (MSE / MAE / ACC) | Geopotential_500 (MSE / MAE / ACC) | Relative_humidity_500 (MSE / MAE / ACC) |
| --- | --- | --- | --- |
| ConvLSTM | 0.8499 / 0.6069 / 0.9929 | 4973 / 51.26 / 0.9968 | 158.2 / 8.72 / 0.8874 |
| E3D-LSTM | 0.8277 / 0.5932 / 0.9932 | 6101 / 57.49 / 0.996 | 138.9 / 8.22 / 0.9014 |
| PredRNN v2 | 0.9106 / 0.6152 / 0.9925 | 4224 / 47.39 / 0.9972 | 164.8 / 8.94 / 0.8825 |
| UniFormer | 0.6752 / 0.5317 / 0.9943 | 4164 / 47.88 / 0.9973 | 141.2 / 8.1 / 0.8995 |
| SimVP v2 | 0.9336 / 0.6509 / 0.9922 | 4881 / 52.16 / 0.9968 | 117.9 / 7.44 / 0.9163 |
| TAU | 0.5737 / 0.4781 / 0.9953 | 4658 / 50.37 / 0.9969 | 125.1 / 7.59 / 0.9111 |
| STFM (Ours) | 0.5244 / 0.4546 / 0.9956 | 4027 / 46.9 / 0.9974 | 116.5 / 7.37 / 0.9177 |
Table 3. Ablation experiments on surface meteorological variable datasets.

| Method | t2m (MSE / MAE / ACC) | u10 (MSE / MAE / ACC) | v10 (MSE / MAE / ACC) |
| --- | --- | --- | --- |
| STFM | 0.5255 / 0.4572 / 0.9956 | 0.8276 / 0.5833 / 0.9217 | 0.7126 / 0.5539 / 0.888 |
| STFM-NTE | 0.5169 / 0.4588 / 0.9957 | 0.8341 / 0.5805 / 0.9208 | 0.7126 / 0.5598 / 0.8886 |
| STFM-NSTFU | 0.5458 / 0.4624 / 0.9954 | 0.8636 / 0.5939 / 0.9179 | 0.7488 / 0.567 / 0.8822 |
| STFM-NTE-NSAU | 0.5616 / 0.4701 / 0.9953 | 0.8566 / 0.5877 / 0.9185 | 0.7424 / 0.5666 / 0.8839 |
| STFM-NTE-NTCN | 0.5339 / 0.4683 / 0.9955 | 0.8402 / 0.5878 / 0.92 | 0.7229 / 0.5692 / 0.8865 |
Table 4. Ablation experiments on high-altitude meteorological variable datasets.

| Method | t850 (MSE / MAE / ACC) | Geopotential_500 (MSE / MAE / ACC) | Relative_humidity_500 (MSE / MAE / ACC) |
| --- | --- | --- | --- |
| STFM | 0.5244 / 0.4546 / 0.9956 | 4027 / 46.9 / 0.9974 | 116.5 / 7.37 / 0.9177 |
| STFM-NTE | 0.5176 / 0.4567 / 0.9956 | 4463 / 49.58 / 0.997 | 118.7 / 7.43 / 0.9156 |
| STFM-NSTFU | 0.5428 / 0.4622 / 0.9955 | 4524 / 49.99 / 0.997 | 122.4 / 7.58 / 0.913 |
| STFM-NTE-NSAU | 0.5534 / 0.4685 / 0.9954 | 4745 / 50.67 / 0.9968 | 124 / 7.53 / 0.9124 |
| STFM-NTE-NTCN | 0.5339 / 0.4667 / 0.9955 | 4226 / 48.04 / 0.9972 | 119.5 / 7.4 / 0.9154 |

