Article

A Lightweight Transformer-Based Spatiotemporal Analysis Prediction Algorithm for High-Dimensional Meteorological Data

1 School of Statistics and Data Science, KLMDASR, LEBPS, and LPMC, Nankai University, Tianjin 300071, China
2 Information Fusion Institute, Naval Aeronautical University, Yantai 264001, China
3 SDU-ANU Joint Science College, Shandong University, Jinan 250100, China
4 School of Computer Science and Technology, Tiangong University, Tianjin 300387, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(23), 4545; https://doi.org/10.3390/rs16234545
Submission received: 12 September 2024 / Revised: 15 October 2024 / Accepted: 25 November 2024 / Published: 4 December 2024
(This article belongs to the Special Issue Remote Sensing Cross-Modal Research: Algorithms and Practices)

Abstract

High-dimensional meteorological data offer a comprehensive overview of meteorological conditions. Nevertheless, predicting regional high-dimensional meteorological data poses challenges due to the vast scale and rapid changes. Apart from slow conventional numerical weather prediction methods, recently developed deep learning methods often fail to fully integrate spatial information of the high-dimensional data and require a significant amount of computational resources. This paper presents the spatiotemporal analysis fitting prediction algorithm (SA-Fit), an approximation algorithm for regional high-dimensional meteorological data prediction. SA-Fit proposes two key designs to achieve efficient prediction of the high-dimensional data. SA-Fit introduces a lightweight Transformer-based spatiotemporal analysis network to encode spatiotemporal information, which can integrate the interaction information between different coordinates in the data. Furthermore, SA-Fit introduces explicit functions with a lasso penalty to fit variations in high-dimensional meteorological data, achieving the prediction of a large amount of data with minimal prediction values. We performed experiments using the ERA5 dataset from the Shanghai and Xi’an regions. The experimental results show that SA-Fit is comparable to other advanced deep learning prediction methods in overall prediction performance. SA-Fit shortens training time and significantly reduces model parameters while using the Transformer structure to ensure prediction accuracy.

1. Introduction

Meteorological prediction is extremely important in contemporary society, offering timely and accurate weather predictions that contribute significantly to diverse societal advancements. In the field of natural energy, the implementation of an effective prediction system can reduce energy costs and enhance energy utilization efficiency [1,2]. Furthermore, meteorological prediction also influences construction planning [3], water resource management [4], and numerous other domains [5]. Meteorological prediction is becoming increasingly crucial in agriculture, particularly with the ongoing challenges of climate change. These predictions are vital for optimizing planting strategies and resource use, leading to enhanced crop yields and profitability. In the southeastern U.S. and Argentina, ENSO-based predictions have improved corn planting decisions, boosting incomes for farmers [6]. In England, Wales, and the Hetao district in China, weather forecasts aid in efficient nitrogen use and inform practices that positively impact both soil health and crop yields [7,8]. These examples highlight the critical role of meteorological prediction in production practices in modern society.
Nevertheless, meteorological prediction encounters various difficulties and challenges due to complex meteorological dynamics. Currently, meteorological prediction methods fall into two main categories [9]: numerical weather prediction (NWP) methods and deep learning methods. The NWP methods focus on simulating various physical processes using a series of partial differential equations (PDEs) and solving them through numerical simulations [10,11]. However, NWP methods necessitate significant computational resources and are time-consuming when it comes to solving PDEs [12]. Moreover, the formulas employed in the NWP methods are often inadequate and unavoidably introduce approximation and calculation errors [13,14]. Deep learning methods harness the learning capacity of neural networks to acquire knowledge from historical meteorological data and make predictions very quickly [15,16]. There has been considerable progress in using deep learning for weather prediction research. FourCastNet [17] established the first high-resolution, data-driven prediction model for global weather forecasting. Although there is a certain gap in prediction accuracy compared to the ECMWF Integrated Forecasting System (IFS) based on NWP, its prediction speed is one order of magnitude faster than IFS. Pangu-Weather [18] was the first to use a Transformer-based network in global weather forecasting, achieving prediction accuracy higher than IFS. GraphCast [19] utilizes graph neural networks to process meteorological information of different resolutions, surpassing Pangu-Weather in prediction accuracy. Because data-driven deep learning models learn directly from historical data, they can skip the step of building complex and refined physical models, thus avoiding limitations that exist in NWP models, such as biases in convergence parameterization schemes that strongly affect forecasts.
However, such models become black boxes without the support of meteorological theory, often only able to obtain predictive results but lacking interpretability.
Traditional RNN and CNN networks have been widely used in regional meteorological forecasting, but their shortcomings are obvious in dealing with high-dimensional data. Shi et al. [20,21], Yu et al. [22], and Wang et al. [23] improved the RNN network structure, making it perform better in spatiotemporal prediction tasks and capable of handling meteorological predictions at single pressure levels. Gao et al. [24] developed a fully CNN-based model to achieve comprehensive prediction performance. However, networks based on RNN and CNN can only predict meteorological data at a single pressure level and cannot capture the interaction information between data at different heights. Furthermore, the structures of RNN and CNN make it difficult to meet the accuracy requirements for prediction, so a more reliable network structure needs to be introduced.
Although the attention mechanism of the Transformer has shown excellent performance in many tasks, the original Transformer structure cannot process the high-dimensional meteorological data. In time series prediction tasks [25,26], the Transformer demonstrates an absolute accuracy advantage when compared to traditional RNN-based LSTM and regression-based ARIMA. Similarly, in image understanding methods, the Transformer has clearly emerged as a successor to the previous mainstream CNN methods [27,28]. In the case of 5D meteorological data, the dataset encompasses five dimensions: longitude, latitude, pressure level, multiple selected meteorological variables, and time. The existing Transformer-based methods can only predict meteorological data point by point for each coordinate in space. As the data scale increases, it demands more predictive models, consequently increasing the demand for computation resources. More importantly, point-by-point prediction ignores the interaction of meteorological data in different coordinates and cannot effectively utilize spatial information in high-dimensional data. As far as we know, SA-Fit is the first method to use the Transformer structure for regional multi-pressure level meteorological prediction.
We drew inspiration from the curve fitting algorithm in machine learning and innovatively designed the Transformer-based spatiotemporal prediction network to achieve prediction. The curve fitting algorithm is a method of constructing explicit function curves that accurately capture the patterns of data points [29,30]. Curve fitting algorithms have found extensive applications in earth science [31,32] as well as diverse domains like biology [33] and economics [34]. Our objective is to derive tailored fitting functions that capture the variations in regional meteorological data. To effectively apply the curve fitting algorithm to high-dimensional meteorological data, we divide the intricate meteorological data into distinct segments and continuously encode them through a spatiotemporal Transformer network while reducing their dimensions. Subsequently, we employ multiple fitting functions to accommodate the unique characteristics of each segment, with the coefficients being our prediction targets.
In this paper, we propose the spatiotemporal analysis fitting prediction algorithm (SA-Fit), an innovative integration of a lightweight Transformer-based network and curve fitting algorithm. Based on the inherent spatiotemporal coherence and high-dimensionality of meteorological data, SA-Fit adopts two key strategies. The first strategy is to improve the Transformer-based network to process spatial and temporal information of high-dimensional data in a step-by-step manner. The second strategy introduces a novel prediction approach that incorporates fitting functions with a lasso penalty to capture variations in meteorological data. As SA-Fit can concurrently predict meteorological data across multiple pressure levels, it augments the capability of the model to handle high-dimensional data, while reducing the demand for computation resources. The main contributions of this work can be summarized as follows:
(1) We propose an innovative algorithm that combines a lightweight Transformer-based network with the curve fitting algorithm to achieve efficient prediction of high-dimensional meteorological data.
(2) We improve the Transformer-based network structure for step-by-step processing of spatiotemporal information, which can fully learn the interaction information of different coordinate points in high-dimensional data.
(3) Our algorithm greatly reduces the model parameters compared to other Transformer-based prediction models, achieving efficient prediction and reducing the demand for computation resources.

2. Methodology

2.1. Overall Structure

For clarity and convenience, we utilize the symbols listed in Table 1 to represent the variables in five-dimensional meteorological data. We append a tilde ($\sim$) to the corresponding symbol to denote the predicted values. The objective of our study is to utilize known regional meteorological data from past time points to predict the unknown data of the same region at future time points. Assuming the current time point is $T_c$, we sample $K + 1$ previous data points at a fixed time interval $\Delta t$ and employ the algorithm to predict $J$ future data points from $T_c$, also with a time interval of $\Delta t$. The entire prediction process can be summarized as follows:
$$\tilde{\chi}^{\cdot,\,T_c+\Delta t}_{\cdot,\cdot,\cdot},\ \cdots,\ \tilde{\chi}^{\cdot,\,T_c+J\Delta t}_{\cdot,\cdot,\cdot} = F_\theta\left(\chi^{\cdot,\,T_c-K\Delta t}_{\cdot,\cdot,\cdot},\ \cdots,\ \chi^{\cdot,\,T_c}_{\cdot,\cdot,\cdot}\right),$$
where $F_\theta$ represents the prediction algorithm and $\theta$ denotes the parameters.
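The sampling scheme can be sketched in a few lines. This is only an illustration, assuming the data are held as a time-ordered sequence of snapshots spaced $\Delta t$ apart; the function name `make_windows` and the list-based storage are hypothetical and do not come from the paper.

```python
def make_windows(snapshots, K, J):
    """Split a time-ordered sequence into (history, future) pairs.

    history holds the K + 1 snapshots at T_c - K*dt, ..., T_c;
    future holds the J snapshots at T_c + dt, ..., T_c + J*dt.
    """
    pairs = []
    # c is the index of the current time point T_c.
    for c in range(K, len(snapshots) - J):
        history = snapshots[c - K : c + 1]
        future = snapshots[c + 1 : c + J + 1]
        pairs.append((history, future))
    return pairs


# Toy example: integers stand in for full 4D meteorological snapshots.
series = list(range(10))
pairs = make_windows(series, K=3, J=2)
first_history, first_future = pairs[0]
print(first_history, first_future, len(pairs))
```

Each pair corresponds to one choice of $T_c$: the history is the input to $F_\theta$ and the future is the target.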
Since the input of our model consists of five dimensions (longitude, latitude, pressure level, the selected meteorological variables, and time), and is higher in dimensionality compared to typical network inputs, it presents challenges in effectively integrating the information from all five dimensions. We use the powerful and effective Transformer-based network structure [35] as the core of our network design. The attention mechanism serves as the primary component of our network structure, enabling the extraction and integration of information from five-dimensional data. Specifically, as depicted in Figure 1, we employ a spatial encoder to extract spatial information from 4D data (longitude, latitude, pressure level, and the selected meteorological variables) at various time points, resulting in one-dimensional spatial features. Subsequently, we employ a temporal encoder to encode spatial information from different time points. The effectiveness of this structure, which processes spatial information first and then temporal information, has been confirmed by Arnab et al. [36].
To explicitly represent our prediction results, we draw inspiration from curve fitting algorithms and employ multiple explicit fitting functions to capture the variations between the future meteorological data and the data at the current time ($T_c$). For each combination of meteorological variables and feature time points denoted by $(v, t)$, we postulate the existence of a function $f^{v,t}(x, y, p)$ to comprehensively depict the disparity between $\chi^{v,t}_{\cdot,\cdot,\cdot}$ and $\chi^{v,T_c}_{\cdot,\cdot,\cdot}$. For example, the true value of the meteorological variable $v$ at time $T_c + j\Delta t$, longitude $x$, latitude $y$, and pressure level $p$ can be written as $f^{v,T_c+j\Delta t}(x, y, p) + \chi^{v,T_c}_{x,y,p}$. Our objective is to find an explicit function $\tilde{f}^{v,t}(x, y, p, \alpha^{v,t})$ as an approximation of $f^{v,t}(x, y, p)$, with $\alpha^{v,t}$ representing the unknown parameters that require prediction. The final outputs of our network are the estimated parameters $\tilde{\alpha}^{v,t}$ of the functions $\tilde{f}^{v,t}(x, y, p, \alpha^{v,t})$ for every combination of meteorological variables and feature time points. The prediction process of SA-Fit for the combination $(v, t)$ at longitude $x$, latitude $y$, and pressure level $p$ can be summarized as follows:
$$\tilde{\alpha}^{v,t} = net_\theta\left(\chi^{\cdot,\,T_c-K\Delta t}_{\cdot,\cdot,\cdot},\ \cdots,\ \chi^{\cdot,\,T_c}_{\cdot,\cdot,\cdot}\right)[v,t],$$
$$\tilde{\chi}^{v,t}_{x,y,p} = \chi^{v,T_c}_{x,y,p} + \tilde{f}^{v,t}(x, y, p, \tilde{\alpha}^{v,t}),$$
where $net_\theta$ represents our network with parameters $\theta$, and $[v,t]$ denotes the partial outputs of the network corresponding to the combination $(v, t)$.

2.2. Spatiotemporal Analysis Network

As shown in Figure 2, the spatial encoder begins by employing a token embedding layer to embed the input 4D meteorological data (without time dimension) into C channels. It then alternates between Video Swin Transformer (VST) blocks and patch merging blocks to continually integrate information and downsample the 4D data. After the multi-head self-attention (MSA) block, the resulting four-dimensional tensor is reshaped into a one-dimensional spatial feature.
The VST block, originally proposed by Liu et al. [37], is employed in our model to extract information. Leveraging the structural similarity between video data and meteorological data, we can readily utilize this architecture to encode meteorological information. The VST block comprises two components: 3D window multi-head self-attention (3DW-MSA) and 3D shifted window multi-head self-attention (3DSW-MSA). Consider the input tensor $\chi^{\cdot,\,time}_{\cdot,\cdot,\cdot} \in \mathbb{R}^{H \times W \times P \times C}$, where $H$, $W$, $P$, and $C$ represent the selected longitude, latitude, pressure level, and variable numbers, respectively. Moreover, 3DW-MSA partitions the tensor into $\frac{H}{h} \times \frac{W}{w} \times \frac{P}{p}$ windows with a size of $h \times w \times p \times C$ in a non-overlapping manner. Subsequently, multi-head attention is applied to the tokens within these formed windows. To enable information exchange among separate windows in 3DW-MSA, 3DSW-MSA shifts the windows derived from 3DW-MSA along the longitude, latitude, and pressure level by $\frac{h}{2} \times \frac{w}{2} \times \frac{p}{2}$ tokens, thereby generating new windows. Attention operations are then performed within these reconstructed windows. The VST block with a depth of 1 is computed as follows:
$$X' = \text{3DW-MSA}(\mathrm{LN}(X)) + X,$$
$$Y = \mathrm{MLP}(\mathrm{LN}(X')) + X',$$
$$Y' = \text{3DSW-MSA}(\mathrm{LN}(Y)) + Y,$$
$$D = \mathrm{MLP}(\mathrm{LN}(Y')) + Y',$$
where $X$ and $D$ denote the input and output of the VST block; $X'$, $Y$, and $Y'$ are intermediate results; MLP denotes a two-layer multilayer perceptron; LN denotes layer normalization. The VST block repeats the aforementioned process $L$ times when it has a depth of $L$.
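To illustrate why the shifted windows let information cross window borders, the sketch below assigns tokens to windows with and without a shift. It is a simplified illustration rather than the authors' implementation: the grid and window sizes are made up, the helper `window_index` is hypothetical, and the half-window shift is realised as a cyclic shift, which is how shifted windows are commonly implemented but is an assumption here.

```python
def window_index(token, window, grid, shifted=False):
    """Return the index of the window that contains a token.

    token  = (i, j, k): position along longitude, latitude, pressure level
    window = (h, w, p): window size
    grid   = (H, W, P): size of the token grid
    With shifted=True the grid is cyclically shifted by half a window
    along every axis before it is partitioned (assumed implementation).
    """
    idx = []
    for pos, size, extent in zip(token, window, grid):
        if shifted:
            pos = (pos - size // 2) % extent
        idx.append(pos // size)
    return tuple(idx)


grid = (12, 12, 4)   # hypothetical token grid (H, W, P)
window = (6, 6, 2)   # hypothetical window size (h, w, p)

# Number of non-overlapping windows: H/h * W/w * P/p.
n_windows = (grid[0] // window[0]) * (grid[1] // window[1]) * (grid[2] // window[2])

# Two neighbouring tokens that sit on opposite sides of a window border.
a, b = (5, 5, 1), (6, 6, 2)
regular = (window_index(a, window, grid), window_index(b, window, grid))
shifted = (window_index(a, window, grid, True), window_index(b, window, grid, True))
print(n_windows, regular, shifted)
```

In the regular partition the two tokens fall into different windows and cannot attend to each other; after the shift they share a window, so alternating the two partitions lets information propagate across the whole grid.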
The patch merging block connects adjacent $a \times b \times c$ tokens in small patches, where $a$, $b$, and $c$ represent the length, width, and height of the patches, respectively. It then applies a linear layer to reduce the dimension of the connected tokens to one quarter. The patch merging block can reduce the number of tokens to $\frac{H}{a} \times \frac{W}{b} \times \frac{P}{c}$, where the specific values $a$, $b$, and $c$ are determined based on the data sizes.
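The bookkeeping of a patch merging step can be checked with a short calculation. This is a sketch under assumptions: the helper `patch_merge_shape` and the example sizes are hypothetical, and the grid is assumed to be evenly divisible by the patch size (the paper does not describe how remainders are handled).

```python
def patch_merge_shape(grid, channels, patch):
    """Token grid and channel sizes before and after one patch merging step.

    grid     = (H, W, P): token grid before merging
    channels = C: channels per token before merging
    patch    = (a, b, c): number of adjacent tokens merged along each axis
    """
    H, W, P = grid
    a, b, c = patch
    merged_grid = (H // a, W // b, P // c)   # H/a x W/b x P/c tokens remain
    concat_channels = a * b * c * channels   # channels after concatenation
    out_channels = concat_channels // 4      # linear layer keeps one quarter
    return merged_grid, concat_channels, out_channels


merged_grid, concat_channels, out_channels = patch_merge_shape(
    grid=(12, 12, 4), channels=32, patch=(2, 2, 2)
)
print(merged_grid, concat_channels, out_channels)
```

With a $2 \times 2 \times 2$ patch, eight tokens are concatenated and then reduced to one quarter, so the token count drops by a factor of eight while the channel width doubles.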
Finally, the tokens are processed through the MSA block, and then the vectors formed by connecting all tokens are passed through a linear layer to obtain spatial features. The MSA block, with a depth of 1 and an input of X, is defined as follows:
$$X' = \mathrm{LN}(\mathrm{MSA}(X) + X),$$
$$Y = \mathrm{LN}(\mathrm{MLP}(X') + X'),$$
where MSA denotes multi-head self-attention.
The temporal encoder is constructed by an MSA block with a depth of $L_T$. Since the attention mechanism is a position- and time-agnostic set operation, we incorporate the spatial features extracted from the spatial encoder into their respective time and position embeddings. For the target time points, we substitute their spatial features with learnable vectors. We feed a sequence of feature vectors into the temporal encoder to extract temporal information. The $K$ tokens corresponding to the feature time points are then individually fed into $K$ distinct linear layers, yielding predictions of function parameters. Predictions of the parameters of the fitting functions for all variables at a future time point are generated by the corresponding spatial feature. Specifically, if there are five variables (Z, Q, T, U, and V) that need to be predicted, the spatial feature at time point $T_c + j\Delta t$ is processed by the temporal encoder to simultaneously generate function parameters $\tilde{\alpha}^{Z,\,T_c+j\Delta t}$, $\tilde{\alpha}^{Q,\,T_c+j\Delta t}$, $\tilde{\alpha}^{T,\,T_c+j\Delta t}$, $\tilde{\alpha}^{U,\,T_c+j\Delta t}$, and $\tilde{\alpha}^{V,\,T_c+j\Delta t}$.

2.3. Fitting Functions with a Lasso Penalty

The fundamental part of SA-Fit lies in introducing explicit functions to capture variations in meteorological data. Rather than directly predicting the meteorological variable values, we predict the parameters of the fitting functions. To obtain a predicted value, the corresponding fitting function is first derived based on the meteorological variables and the predicted time point. Then, the corresponding longitude, latitude, and pressure level are used as inputs to the fitting function, yielding the predicted variation of the meteorological variable. The final predicted value is obtained by adding the current value of meteorological variables to the predicted variation. Figure 3 shows the process of obtaining the predicted value of meteorological variable $v$ at longitude $x_1$, latitude $y_1$, pressure level $p_1$, and time point $T_c + j\Delta t$ through the fitting function.
All prediction results rely on generating the variation of meteorological variables through fitting functions. Therefore, the selection of functions directly impacts the prediction accuracy of the algorithm. In this paper, we choose the multivariate polynomial function as the fitting function to describe the variation of high-dimensional data. The form of multivariate polynomial functions with interaction terms is given by the following:
$$\tilde{f}^{v,t}(x, y, p, \alpha) = \sum_{k_1,k_2,k_3} a_{k_1,k_2,k_3}\, x^{k_1} y^{k_2} p^{k_3},$$
where $a_{k_1,k_2,k_3}$ are the function parameters. The highest degree of non-zero terms in multivariate polynomials is referred to as the polynomial degree, denoted as $\mathrm{deg}_f$. For simplicity, we uniformly assign an identical $\mathrm{deg}_f$ to all variables and time points.
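The polynomial form and the way a predicted value is assembled from it can be made concrete with a small sketch. Several choices here are assumptions: the degree of a term is taken to be the total degree $k_1 + k_2 + k_3$, a constant term is included, the coordinates are assumed to be rescaled to a small range, and the helper names and toy coefficients are invented for illustration.

```python
from itertools import product


def exponent_triples(deg):
    """All (k1, k2, k3) with k1 + k2 + k3 <= deg (total-degree convention, assumed)."""
    return [k for k in product(range(deg + 1), repeat=3) if sum(k) <= deg]


def fit_value(x, y, p, coeffs):
    """Evaluate sum of a[k1, k2, k3] * x^k1 * y^k2 * p^k3 over a dict of coefficients."""
    return sum(a * x**k1 * y**k2 * p**k3 for (k1, k2, k3), a in coeffs.items())


def predict(current_value, x, y, p, coeffs):
    """Predicted value = value at T_c plus the fitted variation."""
    return current_value + fit_value(x, y, p, coeffs)


# Number of coefficients per (v, t) fitting function at degree 10.
n_terms = len(exponent_triples(10))

# Toy coefficients for f(x, y, p) = 0.5 + 2*x*y - p^2 (made-up numbers).
coeffs = {(0, 0, 0): 0.5, (1, 1, 0): 2.0, (0, 0, 2): -1.0}
pred = predict(280.0, x=0.5, y=0.25, p=1.0, coeffs=coeffs)
print(n_terms, pred)
```

Only three of the possible coefficients are non-zero in the toy example, which is the kind of sparse function a lasso penalty is meant to encourage.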
When using polynomial functions with a high $\mathrm{deg}_f$ for fitting, the resulting fitting function contains a multitude of terms, making it difficult to identify the dominant ones. Therefore, we introduce a lasso penalty to the coefficients of the polynomial fitting functions. The lasso penalty for a combination of $v$ and $t$ is defined as follows:
$$L_{Lasso}^{v,t} = \sum_{k_1,k_2,k_3} |a_{k_1,k_2,k_3}|.$$
Our objective is to minimize the absolute values of the fitting function coefficients, effectively conducting a process of variable selection. This process leads to certain coefficients being reduced to zero, retaining only the coefficients of dominant terms. To reduce the prediction error for a combination of $v$ and $t$, our loss function is the mean squared error (MSE), as follows:
$$L_{MSE}^{v,t} = \sum_{x,y,p} \left(\chi^{v,t}_{x,y,p} - \tilde{\chi}^{v,t}_{x,y,p}\right)^2.$$
Our final training error is given by the following:
$$L = \sum_{v,t} L_{MSE}^{v,t} + \lambda \sum_{v,t} L_{Lasso}^{v,t},$$
where $\lambda$ is a hyperparameter.
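The combined objective can be written out directly. The sketch below follows the formulas as printed, so the error term is a plain sum of squared errors without division by the number of points; the function names, the data layout, and the toy numbers are hypothetical.

```python
def lasso_penalty(coeffs):
    """Sum of absolute values of the fitting-function coefficients."""
    return sum(abs(a) for a in coeffs)


def squared_error(true_vals, pred_vals):
    """Sum of squared differences over all (x, y, p) points."""
    return sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals))


def total_loss(per_combo, lam):
    """Training loss summed over all (v, t) combinations.

    per_combo: list of (true_vals, pred_vals, coeffs), one entry per (v, t).
    lam: weight of the lasso penalty (the hyperparameter lambda).
    """
    error = sum(squared_error(t, p) for t, p, _ in per_combo)
    penalty = sum(lasso_penalty(c) for _, _, c in per_combo)
    return error + lam * penalty


# Two toy (v, t) combinations with made-up values.
per_combo = [
    ([1.0, 2.0], [1.5, 2.0], [0.5, -0.25, 0.0]),
    ([0.0], [1.0], [2.0]),
]
loss = total_loss(per_combo, lam=0.1)
print(loss)
```

Raising `lam` puts more weight on small coefficients and less on the fit, which is the trade-off $\lambda$ controls.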

3. Experiment

3.1. Data and Research Area

We use the ERA5 reanalysis dataset [38,39,40,41] as the research data for SA-Fit. The reanalysis dataset [42] is a globally continuous and seamless meteorological dataset that integrates historical meteorological observation data and output data from meteorological models through recalculations. The ERA5 dataset encompasses meteorological data at the surface and 37 pressure levels, with a spatial resolution of 0.25° × 0.25° in latitude and longitude. It is widely regarded as the most comprehensive and accurate reanalysis dataset globally, and it is widely used in various studies [43,44].
To comprehensively showcase the predictive capability of SA-Fit, we selected two regions as our research areas. The first region we selected was Shanghai, situated in Southeast China (longitude range: 120°E to 123°E, latitude range: 29°N to 32°N). The second region was Xi’an, located in Northwest China (longitude range: 107°E to 110°E, latitude range: 32.5°N to 35.5°N). Our study focuses on 13 specific pressure levels: 50 hPa, 100 hPa, 150 hPa, 200 hPa, 250 hPa, 300 hPa, 400 hPa, 500 hPa, 600 hPa, 700 hPa, 850 hPa, 925 hPa, and 1000 hPa. The meteorological variables chosen for prediction include geopotential (Z), specific humidity (Q), air temperature (T), u-component (U), and v-component (V) of the wind.

3.2. Experiment Setup

We use ERA5 data from 2012 to 2018 as the training set, 2019 data as the validation set, and 2020 and 2021 data as the test set. A time interval ($\Delta t$) of 6 h is set, and the data of the initial three days ($K = 12$) are utilized to predict the data on the next day ($J = 4$).
Regarding the parameter settings of the network, the initial embedding dimension ($C$) is set to 32. The first VST block has a depth ($L_1$) of 1, with a window size ($h \times w \times p$) of $6 \times 6 \times 2$. The second VST block has a depth ($L_2$) of 2, with a window size ($h \times w \times p$) of $4 \times 4 \times 2$. The MSA block has a depth ($L_3$) of 2. The patch merging block has a patch size ($a \times b \times c$) of $2 \times 2 \times 2$. The temporal encoder has a depth ($L_T$) of 6, and the spatial encoder outputs spatial features with a dimension of 1024.
During training, we set the batch size to 32. All fitting functions adopted are polynomial functions with a degree ($\mathrm{deg}_f$) of 10. The hyperparameter $\lambda$ is set to $1 \times 10^{-5}$. We employ an Adam optimizer with an initial learning rate of $1 \times 10^{-5}$, along with an exponential decay learning rate scheduler with a gamma of 0.95. The drop rate is set to 0.3. We terminate the training when the loss on the validation set increases twice in comparison to the previous epoch.
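A rough count shows how predicting coefficients instead of grid values shrinks the network output. The numbers below rest on two assumptions that the paper does not state: that a 3° × 3° region at 0.25° resolution contains 13 × 13 grid points (both edges included), and that a degree-10 polynomial in three variables contains every term with $k_1 + k_2 + k_3 \le 10$.

```python
from math import comb

# Assumption: 3 deg x 3 deg at 0.25 deg resolution, both edges included.
n_lon, n_lat = 13, 13
# From the setup: 13 pressure levels, 5 variables, J = 4 future time points.
n_levels, n_vars, n_steps = 13, 5, 4
degree = 10

# Values a direct method would have to output for one forecast.
grid_values = n_lon * n_lat * n_levels * n_vars * n_steps

# Assumption: total-degree convention, giving C(degree + 3, 3) terms per function.
coeffs_per_function = comb(degree + 3, 3)
predicted_coeffs = coeffs_per_function * n_vars * n_steps

print(grid_values, predicted_coeffs)
```

Under these assumptions the network outputs several times fewer numbers than there are grid values, and the coefficient count does not depend on the horizontal or vertical extent of the region.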
The benchmark methods for our experiment include ConvLSTM [20], TrajGRU [21], PredRNN [45], PredRNN++ [46], E3D-LSTM [23], MIM [47], CrevNet [22], and SimVP [24]. Due to the inability of the above comparison methods to predict all pressure levels simultaneously, we generate predictions for each pressure level individually. All experiments were conducted on a single NVIDIA RTX3090 GPU (NVIDIA Corporation, Santa Clara, CA, USA).
We use root mean square error (RMSE) and mean absolute error (MAE) as the criteria for experimental results, which are defined as follows:
$$RMSE(m) = \sqrt{\sum_{i=1}^{n_m} (m_i - \hat{m}_i)^2 / n_m},$$
$$MAE(m) = \sum_{i=1}^{n_m} |m_i - \hat{m}_i| / n_m,$$
where $m_i$ is the true value, $\hat{m}_i$ is the predicted value, and $n_m$ is the number of variable values. The meteorological variables, namely geopotential (Z), specific humidity (Q), air temperature (T), u-component (U), and v-component (V) of the wind, are measured in units of m²/s², g/kg, K, m/s, and m/s, respectively.
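Both metrics are easy to compute directly from their definitions. The sketch below uses made-up values and hypothetical function names; it is meant only to show how the two metrics respond to a single large error.

```python
import math


def rmse(true_vals, pred_vals):
    """Root mean square error: square root of the mean squared difference."""
    n = len(true_vals)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals)) / n)


def mae(true_vals, pred_vals):
    """Mean absolute error: mean of the absolute differences."""
    n = len(true_vals)
    return sum(abs(t - p) for t, p in zip(true_vals, pred_vals)) / n


# Toy data: three exact predictions and one that is off by 4.
truth = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 8.0]
print(rmse(truth, pred), mae(truth, pred))
```

Because RMSE squares the differences before averaging, a single large miss raises it more than it raises MAE, which is why the two metrics are reported side by side.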

3.3. Experiment Results

Table 2, Table 3, Table 4 and Table 5 display the RMSE and MAE values of SA-Fit and comparison methods for different prediction time intervals. Despite being an approximate prediction algorithm, SA-Fit achieves comparable prediction accuracy to other direct prediction deep learning methods.
The experimental results demonstrate that the prediction accuracy of SA-Fit varies across different meteorological variables, which can be attributed to the suitability of the polynomial fitting functions for each variable. SA-Fit exhibits significantly higher prediction accuracy than other methods in the 6-h prediction of Z in the Shanghai region, achieving an RMSE of 114.334 and an MAE of 85.805. These values represent a reduction of 37.0% and 35.0%, respectively, compared to the second-best results. In the Xi’an region, SA-Fit achieves reductions of 9.7% in RMSE and 8.4% in MAE for the 18-h prediction of Z compared to the second-best results. However, the prediction accuracy of SA-Fit for Q is relatively unsatisfactory. In the 12-h prediction of Q in the Shanghai region, the RMSE of SA-Fit is 1.107, which is increased by 14.2% compared to the best results, while the MAE is 0.642, which is increased by 25.1%. In the 24-h prediction for the Xi’an region, these two increase ratios are 13.0% and 24.5%, respectively. As for the prediction of the other three variables, SA-Fit and other methods exhibit comparable levels of accuracy.
The experimental results indicate that SA-Fit demonstrates superior performance in short-term prediction. SA-Fit achieved the most accurate predictions in the 6-h T prediction, with an RMSE of 1.228 and an MAE of 0.900 in the Shanghai region, and an RMSE of 1.280 and an MAE of 0.958 in the Xi’an region. But in the 24-h prediction, the RMSE and MAE of SA-Fit increased by 5.0% and 6.7%, respectively, compared to the best results in the Shanghai region, and by 4.0% and 4.6% in Xi’an. SA-Fit also demonstrated the best performance in the 6-h U prediction for the Shanghai region, yielding an RMSE of 3.683 and an MAE of 2.735. However, in the 24-h prediction, SA-Fit had increased rates of 4.5% and 6.7% in RMSE and MAE, respectively, compared to the best results.
Based on the above analysis, the experiment results may imply the need to select different fitting functions for different variables and different prediction times to achieve the best prediction results.

3.4. Prediction at Different Pressure Levels

Figure 4 shows the variation in RMSE for the meteorological variables across prediction time intervals and pressure levels. The errors follow complex, non-linear trends along the pressure axis, which we describe moving from the lowest pressure level (50 hPa) toward the highest (1000 hPa). For variable Z, the RMSE initially increases sharply, peaking at approximately 200 hPa, which indicates a significant deviation in predictions at these upper levels. After this peak, the RMSE gradually declines and stabilizes between 600 hPa and 850 hPa, representing more accurate predictions in this mid-level pressure range. However, a slight increase is observed again toward 1000 hPa. The RMSE of variable Q also starts with a moderate rise. After reaching its maximum between 600 hPa and 850 hPa, it declines clearly, so the error is smaller at both the low-pressure and the high-pressure ends of the range. In the case of variable T, the RMSE exhibits a relatively consistent upward trend as pressure increases, with a noticeable local maximum occurring between 250 hPa and 400 hPa. This behavior suggests that the prediction error for T increases more steadily than that of the other variables. Variables U and V exhibit very similar patterns in their RMSE curves, characterized by an initial increase followed by a steady decline. Both variables show their highest RMSE values between 150 hPa and 400 hPa, at the upper levels. However, their predictive accuracy improves significantly at higher pressures, with minimum RMSE values observed between 850 hPa and 1000 hPa. Overall, the figure highlights how different meteorological variables exhibit unique relationships between prediction errors and pressure levels, with the mid- to high-pressure regions showing more variability in RMSE patterns.
Figure 5 illustrates the variations in MAE of the SA-Fit prediction results with respect to pressure levels, for different variables and prediction time points. The variation patterns of MAE are largely consistent with those of RMSE. The prediction error differs across pressure levels, and the relationship between error and pressure level differs from variable to variable. For variable Z, the MAE initially increases, reaching a peak around 200 hPa, followed by a decrease until reaching a pressure level between 600 hPa and 850 hPa, and subsequently increasing again. Variable Q exhibits an initial increase in MAE, followed by a decrease, and reaches its maximum at pressure levels between 600 hPa and 850 hPa. For variable T, the MAE typically increases with increasing pressure levels, with a local maximum value between 250 hPa and 400 hPa. Variables U and V exhibit similar patterns of MAE changes, characterized by an initial increase and a subsequent decrease. The maximum MAE for both variables is observed between 150 hPa and 400 hPa, while the minimum MAE is observed between 850 hPa and 1000 hPa.
Comparing the RMSE and MAE of the SA-Fit and SimVP prediction results in Shanghai, it can be found that the prediction performance of SA-Fit is generally better than that of SimVP. In short-term prediction, SA-Fit significantly outperforms SimVP and demonstrates outstanding performance in predicting variable Z. However, in long-term prediction, SA-Fit exhibits slightly inferior predictive performance compared to SimVP. It is worth noting that, because the predictive performance of SA-Fit is more stable and less volatile than that of SimVP across pressure levels, SA-Fit is more robust in dealing with sudden changes in pressure levels.

3.5. Comparison of the Number of Parameters

Due to the use of the spatiotemporal analysis network, SA-Fit can process and predict high-dimensional meteorological data simultaneously through the improved curve fitting algorithm. The number of parameters in SA-Fit is not affected by the scale of the data to be predicted. The previous Transformer-based time series prediction methods could only predict time series with a single spatial coordinate. When dealing with multiple time series in high-dimensional space, multiple prediction models need to be constructed. As the data scale continues to increase, the computational resources required to construct models become unacceptable. Table 6 shows the model size required by different algorithms, and it is evident that SA-Fit has significantly fewer parameters when predicting meteorological data in the Shanghai region. We have visually demonstrated the advantage of SA-Fit in terms of model size in Figure 6. In fact, using methods like FEDformer to simultaneously predict data for all spatial coordinate points in a given area on a typical consumer-grade graphics card is not feasible, as the GPU memory does not support such operations.

3.6. Reduced Training Time

Table 7 presents the training times for various comparative methods, alongside the results from the SA-Fit experiments conducted in the Shanghai region. The training time is an important metric used in assessing the efficiency of predictive models, especially when scaling up to handle more complex tasks such as predicting meteorological variables across multiple pressure levels.
SA-Fit stands out due to its remarkably efficient training time, especially when predicting across 13 different pressure levels simultaneously. Unlike traditional methods, where the prediction of each pressure level is treated as a separate task resulting in a linear increase in training time, SA-Fit can predict all pressure levels simultaneously. This ability significantly reduces the overall time needed for training, as it does not scale linearly with the number of pressure levels. While some comparison methods, such as ConvLSTM, SimVP, and PredRNN, also exhibit relatively short training times that are comparable to SA-Fit, these models handle predictions differently. They can become inefficient as the complexity of the task increases, particularly when dealing with high-dimensional data with more pressure levels.
As the number of pressure levels grows, the training efficiency advantage of SA-Fit becomes even more pronounced. This scalability is crucial for operational models that require quick training across various atmospheric conditions, ensuring that SA-Fit remains competitive in terms of both training speed and accuracy. Its constant training time, regardless of the number of pressure levels, highlights its suitability for large-scale meteorological forecasting tasks, offering a distinct advantage over other models.
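The increase rates listed in Table 7 can be checked with a few lines of arithmetic. The sketch below assumes the rate is defined as the extra training time relative to SA-Fit, (t − t_SA-Fit) / t_SA-Fit; this definition is not stated explicitly in the text, but it reproduces the listed values for the three methods shown. The helper names are illustrative only.

```python
def minutes(hours, mins):
    """Convert a duration given as hours and minutes into minutes."""
    return 60 * hours + mins


def increase_rate(t_method, t_reference):
    """Relative increase of t_method over t_reference (assumed definition)."""
    return (t_method - t_reference) / t_reference


# Training times taken from Table 7 (a subset, for illustration).
sa_fit = minutes(4, 24)
table7 = {
    "ConvLSTM": minutes(5, 32),
    "TrajGRU": minutes(12, 18),
    "SimVP": minutes(4, 56),
}

rates = {name: increase_rate(t, sa_fit) for name, t in table7.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

Under this assumed definition, the printed rates are 25.8% for ConvLSTM, 179.5% for TrajGRU, and 12.1% for SimVP, matching the table.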

3.7. Visualization

To enhance the clarity of SA-Fit prediction results, we randomly selected two locations and plotted the predicted values of two variables alongside their corresponding true values. These plots are presented in Figure 7 and Figure 8. Figure 7 displays the predictions and true values for variables T and U at a pressure level of 500 hPa in the Shanghai region, specifically located at longitude 121.25° and latitude 30.75°. Figure 8 illustrates the predictions and true values for variables Z and V at a pressure level of 800 hPa in the Xi’an region, situated at longitude 108.25° and latitude 34.25°.
The figures show that the trend of SA-Fit predictions aligns closely with the trend of the real data. Furthermore, SA-Fit demonstrates a strong predictive capacity, particularly in capturing significant variations within the real data. The prediction results of SA-Fit exhibit a slight delay compared to the real data as the prediction time interval increases. However, SA-Fit does not merely replicate historical data; instead, it effectively learns valuable information from the historical data to inform its predictions.

4. Discussion

SA-Fit aims to address the challenge of predicting regional high-dimensional meteorological data with existing deep learning methods. To achieve this, SA-Fit introduces a novel approach for predicting high-dimensional meteorological data. SA-Fit proposes a lightweight Transformer-based spatiotemporal analysis network to process spatiotemporal information and leverages polynomials to fit variations in meteorological data. The experimental results demonstrate that, despite being an approximation algorithm, SA-Fit achieves comparable results to state-of-the-art algorithms developed in recent years.
Some prior prediction networks employ RNN structures and an iterative multi-step prediction strategy, which autoregressively predicts future data; this inevitably leads to cumulative errors when predicting over an extended time horizon. SA-Fit embraces a Transformer-based architecture and employs a direct multi-step prediction strategy that optimizes the multi-step prediction objective directly in a single step. This prediction strategy effectively mitigates the issue of cumulative errors when making predictions over an extended future time horizon. Zeng et al. [50] discussed the benefits of employing Transformers within the context of this strategy. However, in our experiments, we did not observe SA-Fit outperforming RNN-based methods. In previous studies that used RNN- and Transformer-based networks for prediction, the recurrent prediction of an RNN differs from the Transformer, which acts directly on all tokens through the attention mechanism, so RNN results exhibit obvious cumulative errors when longer prediction results are output. However, this advantage of the Transformer structure (i.e., not accumulating errors) only becomes apparent when there are dozens or even hundreds of prediction time points. We posit that one potential explanation is that the number of predicted future time points in our experiments is insufficient to generate substantial cumulative errors.
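The difference between the two strategies can be illustrated with a toy example. The sketch below is not the SA-Fit network or any of the compared models; the "models" are hypothetical functions that each make a fixed error of 0.1 per call, chosen only to show how an iterative rollout compounds errors while a direct multi-step output does not.

```python
def iterative_forecast(one_step, x0, horizon):
    """Roll a one-step model forward, feeding each prediction back in."""
    preds, x = [], x0
    for _ in range(horizon):
        x = one_step(x)
        preds.append(x)
    return preds


def truth(x0, horizon):
    """Toy ground truth: the series grows by exactly 1.0 per step."""
    return [x0 + (h + 1) * 1.0 for h in range(horizon)]


def biased_one_step(x):
    """Hypothetical one-step model with a fixed bias of 0.1 per call."""
    return x + 1.0 + 0.1


def biased_direct(x0, horizon):
    """Hypothetical direct model: all horizons at once, bias of 0.1 each."""
    return [y + 0.1 for y in truth(x0, horizon)]


horizon = 4
target = truth(0.0, horizon)
iterative_errors = [
    abs(p - t)
    for p, t in zip(iterative_forecast(biased_one_step, 0.0, horizon), target)
]
direct_errors = [abs(p - t) for p, t in zip(biased_direct(0.0, horizon), target)]

print("iterative:", [round(e, 3) for e in iterative_errors])
print("direct:   ", [round(e, 3) for e in direct_errors])
```

In this toy setting the iterative error grows with the step index while the direct error stays flat, yet with only four predicted steps the gap remains small, which is in line with the explanation offered above.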
The structure of the Transformer-based network enables SA-Fit to simultaneously encode data across multiple pressure levels within high-dimensional meteorological data, in contrast to the previous approach of separate predictions for each pressure level or point. This enhanced approach allows SA-Fit to maximize the utilization of structural data information, thus enhancing the support for accurate predictions.
SA-Fit can significantly reduce the number of predicted values, and the reduction grows as the range of predicted longitudes, latitudes, and pressure levels expands. In our experiment on the meteorological data of Shanghai, a direct prediction requires a total of 37,440 values, calculated as 4 × 12 × 12 × 13 × 5. In contrast, when using SA-Fit ($\mathrm{deg}_f = 10$), we only need to predict 5720 function coefficients, calculated as 4 × 5 × 286, which is only 15.3% of the original amount.
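The counts in this paragraph can be reproduced with a short calculation. The sketch below rests on two assumptions that are not spelled out in the text: that the factors 4 and 5 are the number of lead times and the number of variables, and that the fitting function is a polynomial of total degree at most 10 in three coordinates (longitude, latitude, pressure level), which yields exactly 286 coefficients.

```python
from itertools import product
from math import comb


def monomial_exponents(n_vars, max_degree):
    """All exponent tuples whose entries sum to at most max_degree."""
    return [
        e
        for e in product(range(max_degree + 1), repeat=n_vars)
        if sum(e) <= max_degree
    ]


# Assumed reading of the factors: 4 lead times, a 12 x 12 grid,
# 13 pressure levels, and 5 variables.
n_times, n_lon, n_lat, n_levels, n_variables = 4, 12, 12, 13, 5
direct_values = n_times * n_lon * n_lat * n_levels * n_variables

# Assumed fitting function: polynomial of total degree <= 10 in (x, y, p).
n_coeffs = len(monomial_exponents(3, 10))
assert n_coeffs == comb(10 + 3, 3)  # closed form for the same count

fitted_values = n_times * n_variables * n_coeffs
ratio = fitted_values / direct_values

print(direct_values, n_coeffs, fitted_values, f"{ratio:.1%}")
```

The coefficient count depends only on the polynomial degree and the number of coordinates, not on the grid size, which is why the saving grows as the predicted region expands.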
SA-Fit has the flexibility to choose suitable fitting functions for prediction, which makes it adaptable to the prediction demands of diverse datasets. Furthermore, distinct fitting functions can be selected for different variables within the dataset. When prior knowledge of the variables to be predicted is available, we can leverage it to choose fitting functions that improve prediction accuracy and interpretability.
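As a rough illustration of this flexibility, the sketch below evaluates a predicted value as a weighted sum of basis functions and assigns a different basis to each variable. Everything here is hypothetical: the function names, the choice of a degree-1 polynomial for T and a one-harmonic Fourier basis for U, and the coefficient values are illustrative assumptions, not the configuration used in the experiments.

```python
import math
from itertools import product


def polynomial_basis(max_degree):
    """Monomials x^a * y^b * p^c with a + b + c <= max_degree."""
    exps = [
        e for e in product(range(max_degree + 1), repeat=3) if sum(e) <= max_degree
    ]
    return [lambda x, y, p, e=e: x ** e[0] * y ** e[1] * p ** e[2] for e in exps]


def fourier_basis(n_harmonics):
    """Constant term plus sine/cosine harmonics in x only (illustrative)."""
    basis = [lambda x, y, p: 1.0]
    for k in range(1, n_harmonics + 1):
        basis.append(lambda x, y, p, k=k: math.sin(2 * math.pi * k * x))
        basis.append(lambda x, y, p, k=k: math.cos(2 * math.pi * k * x))
    return basis


def evaluate(coeffs, basis, x, y, p):
    """Predicted value = sum over k of coeffs[k] * basis[k](x, y, p)."""
    if len(coeffs) != len(basis):
        raise ValueError("one coefficient per basis function is required")
    return sum(c * f(x, y, p) for c, f in zip(coeffs, basis))


# Hypothetical per-variable choice of fitting function.
fitting_functions = {"T": polynomial_basis(1), "U": fourier_basis(1)}

# polynomial_basis(1) orders its terms as 1, p, y, x.
t_value = evaluate([2.0, 0.5, 0.0, 1.0], fitting_functions["T"], x=0.25, y=0.5, p=1.0)
# fourier_basis(1) orders its terms as 1, sin(2*pi*x), cos(2*pi*x).
u_value = evaluate([1.0, 3.0, 0.0], fitting_functions["U"], x=0.25, y=0.0, p=0.0)

print(t_value, u_value)
```

Because the network only has to output the coefficient vector, swapping the basis changes how the coefficients are interpreted without changing how predicted values are read off the fitted function.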

5. Conclusions

In this article, we propose SA-Fit, an algorithm using a lightweight Transformer-based network aimed at addressing the challenge of predicting regional high-dimensional meteorological data. SA-Fit proposes an enhanced Transformer-based spatiotemporal analysis network to encode spatiotemporal information in high-dimensional meteorological data and introduces explicit functions to fit variations in meteorological data, offering a novel predictive methodology.
The experimental results show that SA-Fit is comparable to other advanced deep learning algorithms and requires fewer computation resources. When using multivariate polynomial functions for fitting, SA-Fit exhibits favorable performance in certain prediction tasks, such as geopotential prediction and short-term prediction. Meanwhile, in the experiment, SA-Fit effectively reduces training time and greatly reduces the model parameters compared to other Transformer-based prediction models.

Author Contributions

Conceptualization, J.W., S.S. and X.X.; Methodology, Y.T. and B.P.; Investigation, Y.L. and S.S.; Writing—original draft, Y.T. and J.W.; Writing—review & editing, Y.L., X.X. and B.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key Research and Development Program of China under grant no. 2022YFA1003800, the Fundamental Research Funds for the Central Universities under grant no. 63243074, and the Beijing-Tianjin-Hebei Basic Research Cooperation Project under grant no. F2021203109.

Data Availability Statement

The data presented in this study are publicly available at https://cds.climate.copernicus.eu/ (accessed on 24 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Qing, X.; Niu, Y. Hourly day-ahead solar irradiance prediction using weather forecasts by LSTM. Energy 2018, 148, 461–468.
  2. Yildiz, C.; Acikgoz, H.; Korkmaz, D.; Budak, U. An improved residual-based convolutional neural network for very short-term wind power forecasting. Energy Convers. Manag. 2021, 228, 113731.
  3. Ziolkowska, J.R. Economic value of environmental and weather information for agricultural decisions—A case study for Oklahoma Mesonet. Agric. Ecosyst. Environ. 2018, 265, 503–512.
  4. Sigler, W.A.; Ewing, S.A.; Jones, C.A.; Payn, R.A.; Miller, P.; Maneta, M. Water and nitrate loss from dryland agricultural soils is controlled by management, soils, and weather. Agric. Ecosyst. Environ. 2020, 304, 107158.
  5. Regnier, E. Doing something about the weather. Omega 2008, 36, 22–32.
  6. Jones, J.W.; Hansen, J.W.; Royce, F.S.; Messina, C.D. Potential benefits of climate forecasting to agriculture. Agric. Ecosyst. Environ. 2000, 82, 169–184.
  7. Dailey, A.; Smith, J.U.; Whitmore, A. How far might medium-term weather forecasts improve nitrogen fertiliser use and benefit arable farming in the England and Wales? Agric. Ecosyst. Environ. 2006, 117, 22–28.
  8. Quan, H.; Wang, B.; Wu, L.; Feng, H.; Wu, L.; Wu, L.; Li Liu, D.; Siddique, K.H. Impact of plastic mulching and residue return on maize yield and soil organic carbon storage in irrigated dryland areas under climate change. Agric. Ecosyst. Environ. 2024, 362, 108838.
  9. Ren, X.; Li, X.; Ren, K.; Song, J.; Xu, Z.; Deng, K.; Wang, X. Deep learning-based weather prediction: A survey. Big Data Res. 2021, 23, 100178.
  10. Ritchie, H.; Temperton, C.; Simmons, A.; Hortal, M.; Davies, T.; Dent, D.; Hamrud, M. Implementation of the semi-Lagrangian method in a high-resolution version of the ECMWF forecast model. Mon. Weather Rev. 1995, 123, 489–514.
  11. Skamarock, W.C.; Klemp, J.B.; Dudhia, J.; Gill, D.O.; Barker, D.M.; Wang, W.; Powers, J.G. A Description of the Advanced Research WRF Version 2; Technical Report; National Center For Atmospheric Research Boulder Co Mesoscale and Microscale: Boulder, CO, USA, 2005.
  12. Bauer, P.; Quintino, T.; Wedi, N.; Bonanni, A.; Chrust, M.; Deconinck, W.; Diamantakis, M.; Düben, P.; English, S.; Flemming, J.; et al. The ECMWF Scalability Programme: Progress and Plans; European Centre for Medium Range Weather Forecasts: Reading, UK, 2020.
  13. Palmer, T.; Shutts, G.; Hagedorn, R.; Doblas-Reyes, F.; Jung, T.; Leutbecher, M. Representing model uncertainty in weather and climate prediction. Annu. Rev. Earth Planet. Sci. 2005, 33, 163–193.
  14. Olafsson, H.; Bao, J.W. Uncertainties in Numerical Weather Prediction; Elsevier: Amsterdam, The Netherlands, 2020.
  15. Weyn, J.A.; Durran, D.R.; Caruana, R. Can machines learn to predict weather? Using deep learning to predict gridded 500-hPa geopotential height from historical weather data. J. Adv. Model. Earth Syst. 2019, 11, 2680–2693.
  16. Schultz, M.; Betancourt, C.; Gong, B.; Kleinert, F.; Langguth, M.; Leufen, L.; Mozaffari, A.; Stadtler, S. Can deep learning beat numerical weather prediction? Philos. Trans. R. Soc. 2021, 379, 20200097.
  17. Pathak, J.; Subramanian, S.; Harrington, P.; Raja, S.; Chattopadhyay, A.; Mardani, M.; Kurth, T.; Hall, D.; Li, Z.; Azizzadenesheli, K.; et al. Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. arXiv 2022, arXiv:2202.11214.
  18. Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Pangu-weather: A 3d high-resolution model for fast and accurate global weather forecast. arXiv 2022, arXiv:2211.02556.
  19. Lam, R.; Sanchez-Gonzalez, A.; Willson, M.; Wirnsberger, P.; Fortunato, M.; Alet, F.; Ravuri, S.; Ewalds, T.; Eaton-Rosen, Z.; Hu, W.; et al. GraphCast: Learning skillful medium-range global weather forecasting. arXiv 2022, arXiv:2212.12794.
  20. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 802–810.
  21. Shi, X.; Gao, Z.; Lausen, L.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Deep learning for precipitation nowcasting: A benchmark and a new model. Adv. Neural Inf. Process. Syst. 2017, 30, 5617–5627.
  22. Yu, W.; Lu, Y.; Easterbrook, S.; Fidler, S. Efficient and Information-Preserving Future Frame Prediction and Beyond. International Conference on Learning Representations. 2020. Available online: https://openreview.net/forum?id=B1eY_pVYvB (accessed on 24 November 2024).
  23. Wang, Y.; Jiang, L.; Yang, M.H.; Li, L.J.; Long, M.; Fei-Fei, L. Eidetic 3d lstm: A model for video prediction and beyond. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
  24. Gao, Z.; Tan, C.; Wu, L.; Li, S.Z. Simvp: Simpler yet better video prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 3170–3180.
  25. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 11106–11115.
  26. Zhang, Y.; Yan, J. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In Proceedings of the Eleventh International Conference on Learning Representations, Virtual Event, 25–29 April 2022.
  27. Wang, W.; Chen, W.; Qiu, Q.; Chen, L.; Wu, B.; Lin, B.; He, X.; Liu, W. Crossformer++: A versatile vision transformer hinging on cross-scale attention. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 3123–3136.
  28. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 568–578.
  29. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009; Volume 2.
  30. Guest, P.G. Numerical Methods of Curve Fitting; Cambridge University Press: Cambridge, UK, 2012.
  31. Bradley, B.A.; Jacob, R.W.; Hermance, J.F.; Mustard, J.F. A curve fitting procedure to derive inter-annual phenologies from time series of noisy satellite NDVI data. Remote Sens. Environ. 2007, 106, 137–145.
  32. Brooks, E.B.; Thomas, V.A.; Wynne, R.H.; Coulston, J.W. Fitting the multitemporal curve: A Fourier series approach to the missing data problem in remote sensing analysis. IEEE Trans. Geosci. Remote Sens. 2012, 50, 3340–3353.
  33. Motulsky, H.; Christopoulos, A. Fitting Models to Biological Data Using Linear and Nonlinear Regression: A Practical Guide to Curve Fitting; Oxford University Press: Oxford, UK, 2004.
  34. Verbeek, M. A Guide to Modern Econometrics; John Wiley & Sons: Hoboken, NJ, USA, 2008.
  35. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
  36. Arnab, A.; Dehghani, M.; Heigold, G.; Sun, C.; Lučić, M.; Schmid, C. Vivit: A video vision transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 6836–6846.
  37. Liu, Z.; Ning, J.; Cao, Y.; Wei, Y.; Zhang, Z.; Lin, S.; Hu, H. Video swin transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 3202–3211.
  38. Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horányi, A.; Muñoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D.; et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 2020, 146, 1999–2049.
  39. Boulahia, A.K.; García-García, D.; Trottini, M.; Sayol, J.M.; Vigo, M.I. Hydrological Cycle in the Arabian Sea Region from GRACE/GRACE-FO Missions and ERA5 Data. Remote Sens. 2024, 16, 3577.
  40. Nelli, N.; Francis, D.; Alkatheeri, A.; Fonseca, R. Evaluation of Reanalysis and Satellite Products against Ground-Based Observations in a Desert Environment. Remote Sens. 2024, 16, 3593.
  41. Liu, H.L.; Duan, M.Z.; Zhou, X.Q.; Zhang, S.L.; Deng, X.B.; Zhang, M.L. Neural Network-Based Estimation of Near-Surface Air Temperature in All-Weather Conditions Using FY-4A AGRI Data over China. Remote Sens. 2024, 16, 3612.
  42. Kalnay, E.; Kanamitsu, M.; Kistler, R.; Collins, W.; Deaven, D.; Gandin, L.; Iredell, M.; Saha, S.; White, G.; Woollen, J.; et al. The NCEP/NCAR 40-year reanalysis project. Bull. Am. Meteorol. Soc. 1996, 77, 437–472.
  43. Zhang, Z.; Lou, Y.; Zhang, W.; Wang, H.; Zhou, Y.; Bai, J. Assessment of ERA-Interim and ERA5 reanalysis data on atmospheric corrections for InSAR. Int. J. Appl. Earth Obs. Geoinf. 2022, 111, 102822.
  44. Xu, S.; Wang, D.; Liang, S.; Liu, Y.; Jia, A. Assessment of gridded datasets of various near surface temperature variables over Heihe River Basin: Uncertainties, spatial heterogeneity and clear-sky bias. Int. J. Appl. Earth Obs. Geoinf. 2023, 120, 103347.
  45. Wang, Y.; Long, M.; Wang, J.; Gao, Z.; Yu, P.S. Predrnn: Recurrent neural networks for predictive learning using spatiotemporal lstms. Adv. Neural Inf. Process. Syst. 2017, 30, 879–888.
  46. Wang, Y.; Gao, Z.; Long, M.; Wang, J.; Yu, P.S. Predrnn++: Towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 5123–5132.
  47. Wang, Y.; Zhang, J.; Zhu, H.; Long, M.; Wang, J.; Yu, P.S. Memory in memory: A predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9154–9162.
  48. Chen, M.; Peng, H.; Fu, J.; Ling, H. Autoformer: Searching transformers for visual recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 12270–12280.
  49. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286.
  50. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Montreal, QC, Canada, 8–10 August 2023; Volume 37, pp. 11121–11128.
Figure 1. Overall structure of the network. We first encode the spatial information using a spatial encoder, and then encode the temporal information using a temporal encoder. The temporal encoder will output the estimated parameters $\tilde{\alpha}$ of fitting functions of all variables based on the predicted time.
Figure 2. The structure of spatial encoder. We use VST blocks based on the Video Swin Transformer for encoding and the 3D patch merging block for dimensionality reduction.
Figure 3. Obtaining the predicted value through fitting functions.
Figure 4. RMSE of SA-Fit and SimVP predictions of different meteorological variables at different pressure levels.
Figure 5. MAE of SA-Fit and SimVP predictions of different meteorological variables at different pressure levels.
Figure 6. Visualization of the model size.
Figure 7. Predicted values for different hours and true values of T and U from 2020-02-20 00:00 to 2020-03-11 19:00 at a longitude of 121.25°, latitude of 30.75°, and pressure level of 500 hPa in the Shanghai region.
Figure 8. Predicted values for different hours and true values of Z and V from 2021-10-06 08:00 00:00 to 2021-10-27 04:00 at a longitude of 108.25°, latitude of 34.25°, and pressure level of 800 hPa in the Xi’an region.
Table 1. Symbols used to represent variables in the data.

| Symbol | Definition in This Paper |
|---|---|
| $\chi_{x,y,p}^{v,t}$ | The values of variable v at longitude x, latitude y and pressure level p at time t |
| $\chi_{x,y,p}^{\cdot,t}$ | The values of all variables at longitude x, latitude y and pressure level p at time t |
| $\chi_{\cdot,\cdot,\cdot}^{v,t}$ | The values of variable v at all longitudes, latitudes, and pressure levels at time t |
| $\chi_{\cdot,\cdot,\cdot}^{\cdot,t}$ | The values of all variables at all longitudes, latitudes, and pressure levels at time t |
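Table 2. The 6-h prediction results. The best results are in bold.

| Region | Method | Z RMSE | Z MAE | Q RMSE | Q MAE | T RMSE | T MAE | U RMSE | U MAE | V RMSE | V MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Shanghai | ConvLSTM | 239.598 | 174.922 | 0.805 | 0.434 | 1.465 | 1.113 | 4.225 | 3.078 | 4.017 | 2.910 |
| Shanghai | TrajGRU | 197.397 | 146.622 | 0.797 | 0.416 | 1.337 | 1.007 | 3.769 | 2.758 | 3.721 | 2.715 |
| Shanghai | PredRNN | 264.463 | 193.880 | 0.904 | 0.489 | 1.650 | 1.247 | 4.707 | 3.446 | 4.462 | 3.262 |
| Shanghai | PredRNN++ | 181.460 | 131.942 | 0.760 | 0.404 | 1.298 | 0.977 | 3.805 | 2.764 | 3.770 | 2.731 |
| Shanghai | E3D-LSTM | 226.261 | 166.275 | 0.841 | 0.451 | 1.469 | 1.108 | 4.278 | 3.121 | 4.283 | 3.110 |
| Shanghai | MIM | 255.067 | 186.315 | 0.882 | 0.474 | 1.596 | 1.202 | 4.528 | 3.305 | 4.382 | 3.178 |
| Shanghai | CrevNet | 219.392 | 155.315 | 0.801 | 0.412 | 1.272 | 0.943 | 3.809 | 2.736 | 3.872 | 2.801 |
| Shanghai | SimVP | 204.688 | 146.189 | **0.757** | **0.396** | 1.325 | 1.002 | 3.841 | 2.801 | **3.609** | **2.626** |
| Shanghai | SA-Fit | **114.334** | **85.805** | 0.806 | 0.427 | **1.228** | **0.900** | **3.683** | **2.735** | 4.098 | 2.983 |
| Xi’an | ConvLSTM | 286.047 | 196.895 | 0.628 | 0.335 | 1.604 | 1.222 | 4.045 | 2.818 | 3.936 | 2.752 |
| Xi’an | TrajGRU | 218.312 | 157.190 | **0.579** | **0.319** | 1.383 | 1.053 | 3.683 | 2.637 | 3.569 | 2.502 |
| Xi’an | PredRNN | 301.610 | 215.992 | 0.674 | 0.374 | 1.747 | 1.333 | 4.236 | 2.962 | 4.263 | 2.986 |
| Xi’an | PredRNN++ | 227.105 | 157.503 | 0.588 | **0.319** | 1.405 | 1.061 | 3.647 | 2.542 | 3.626 | 2.534 |
| Xi’an | E3D-LSTM | 258.260 | 182.895 | 0.614 | 0.335 | 1.573 | 1.192 | 3.961 | 2.753 | 4.094 | 2.851 |
| Xi’an | MIM | 298.691 | 213.873 | 0.674 | 0.358 | 1.719 | 1.313 | 4.229 | 2.949 | 4.234 | 2.965 |
| Xi’an | CrevNet | 265.988 | 171.626 | 0.638 | 0.327 | 1.449 | 1.072 | 3.648 | 2.558 | 3.780 | 2.641 |
| Xi’an | SimVP | 243.694 | 172.570 | 0.609 | 0.327 | 1.468 | 1.123 | 3.578 | 2.669 | **3.418** | **2.402** |
| Xi’an | SA-Fit | **121.925** | **88.896** | 0.601 | 0.336 | **1.280** | **0.958** | **3.558** | **2.537** | 3.945 | 2.883 |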
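Table 3. The 12-h prediction results. The best results are in bold.

| Region | Method | Z RMSE | Z MAE | Q RMSE | Q MAE | T RMSE | T MAE | U RMSE | U MAE | V RMSE | V MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Shanghai | ConvLSTM | 271.822 | 199.605 | 1.015 | 0.546 | 1.804 | 1.348 | 5.252 | 3.848 | 5.211 | 3.801 |
| Shanghai | TrajGRU | 230.104 | 169.937 | 0.978 | 0.521 | 1.709 | 1.267 | **4.882** | **3.567** | 5.031 | 3.668 |
| Shanghai | PredRNN | 306.955 | 226.253 | 1.079 | 0.582 | 1.953 | 1.461 | 5.695 | 4.195 | 5.536 | 4.057 |
| Shanghai | PredRNN++ | 224.766 | 163.966 | 0.971 | **0.513** | 1.669 | 1.232 | 4.901 | 3.583 | 5.018 | 3.644 |
| Shanghai | E3D-LSTM | 283.997 | 208.082 | 1.077 | 0.567 | 1.888 | 1.408 | 5.566 | 4.068 | 5.501 | 4.001 |
| Shanghai | MIM | 300.856 | 219.848 | 1.066 | 0.567 | 1.922 | 1.429 | 5.570 | 4.085 | 5.481 | 3.985 |
| Shanghai | CrevNet | 250.417 | 178.751 | 1.011 | 0.528 | 1.673 | 1.240 | 4.950 | 3.612 | 5.092 | 3.697 |
| Shanghai | SimVP | 252.168 | 182.303 | **0.969** | **0.513** | 1.702 | 1.265 | 4.957 | 3.627 | **4.916** | **3.572** |
| Shanghai | SA-Fit | **196.295** | **145.446** | 1.107 | 0.642 | **1.655** | **1.218** | 5.169 | 3.973 | 5.306 | 3.970 |
| Xi’an | ConvLSTM | 314.616 | 223.165 | 0.761 | 0.407 | 1.915 | 1.447 | 5.036 | 3.504 | 5.082 | 3.554 |
| Xi’an | TrajGRU | 252.683 | 183.742 | **0.707** | **0.374** | 1.857 | 1.398 | **4.670** | **3.241** | 4.887 | 3.418 |
| Xi’an | PredRNN | 347.476 | 249.623 | 0.785 | 0.420 | 2.046 | 1.554 | 5.246 | 3.661 | 5.345 | 3.744 |
| Xi’an | PredRNN++ | 267.144 | 189.686 | 0.721 | 0.389 | **1.752** | **1.310** | 4.766 | 3.302 | 4.830 | 3.369 |
| Xi’an | E3D-LSTM | 323.363 | 227.774 | 0.764 | 0.413 | 1.960 | 1.478 | 5.131 | 3.552 | 5.268 | 3.673 |
| Xi’an | MIM | 345.887 | 249.429 | 0.791 | 0.428 | 2.032 | 1.546 | 5.262 | 3.661 | 5.284 | 3.696 |
| Xi’an | CrevNet | 298.087 | 199.469 | 0.782 | 0.405 | 1.849 | 1.385 | 4.731 | 3.275 | 4.888 | 3.415 |
| Xi’an | SimVP | 280.173 | 202.299 | 0.744 | 0.399 | 1.799 | 1.365 | 4.714 | 3.276 | **4.704** | **3.288** |
| Xi’an | SA-Fit | **205.377** | **151.191** | 0.797 | 0.437 | 1.809 | 1.364 | 4.921 | 3.584 | 5.216 | 3.630 |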
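Table 4. The 18-h prediction results. The best results are in bold.

| Region | Method | Z RMSE | Z MAE | Q RMSE | Q MAE | T RMSE | T MAE | U RMSE | U MAE | V RMSE | V MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Shanghai | ConvLSTM | 322.295 | 238.540 | 1.180 | 0.633 | 2.100 | 1.544 | 6.010 | 4.409 | 5.951 | 4.343 |
| Shanghai | TrajGRU | 295.707 | 219.600 | 1.139 | 0.609 | 2.047 | 1.490 | 5.764 | 4.212 | 5.901 | 4.293 |
| Shanghai | PredRNN | 362.988 | 268.862 | 1.224 | 0.660 | 2.236 | 1.655 | 6.466 | 4.763 | 6.241 | 4.569 |
| Shanghai | PredRNN++ | 285.391 | 209.059 | 1.133 | **0.606** | 2.185 | 1.541 | **5.731** | **4.198** | 5.833 | 4.239 |
| Shanghai | E3D-LSTM | 340.533 | 252.118 | 1.229 | 0.660 | 2.190 | 1.613 | 6.344 | 4.633 | 6.219 | 4.525 |
| Shanghai | MIM | 359.017 | 263.838 | 1.217 | 0.652 | 2.221 | 1.631 | 6.388 | 4.690 | 6.222 | 4.530 |
| Shanghai | CrevNet | 313.374 | 231.884 | 1.166 | 0.629 | **1.996** | **1.464** | 5.773 | 4.235 | 5.887 | 4.283 |
| Shanghai | SimVP | 302.438 | 219.466 | **1.124** | **0.606** | 2.016 | 1.474 | 5.785 | 4.224 | **5.772** | **4.195** |
| Shanghai | SA-Fit | **279.721** | **208.347** | 1.278 | 0.758 | 2.044 | 1.502 | 6.031 | 4.571 | 6.201 | 4.446 |
| Xi’an | ConvLSTM | 359.793 | 260.466 | 0.880 | 0.471 | 2.195 | 1.650 | 5.760 | 3.988 | 5.804 | 4.059 |
| Xi’an | TrajGRU | 310.726 | 228.500 | **0.829** | **0.436** | 2.074 | 1.550 | 5.551 | 3.822 | 5.765 | 4.026 |
| Xi’an | PredRNN | 402.277 | 291.406 | 0.891 | 0.482 | 2.341 | 1.771 | 5.970 | 4.153 | 6.010 | 4.214 |
| Xi’an | PredRNN++ | 321.046 | 232.425 | 0.831 | 0.444 | **2.055** | **1.531** | 5.535 | 3.824 | 5.588 | 3.900 |
| Xi’an | E3D-LSTM | 374.401 | 269.230 | 0.875 | 0.459 | 2.243 | 1.687 | 5.855 | 4.043 | 5.972 | 4.172 |
| Xi’an | MIM | 402.754 | 293.159 | 0.903 | 0.482 | 2.329 | 1.767 | 6.013 | 4.171 | 5.988 | 4.189 |
| Xi’an | CrevNet | 350.778 | 248.031 | 0.890 | 0.459 | 2.132 | 1.603 | 5.494 | **3.798** | 5.641 | 3.944 |
| Xi’an | SimVP | 323.506 | 236.572 | 0.845 | 0.455 | 2.082 | 1.565 | **5.488** | 3.803 | **5.554** | **3.876** |
| Xi’an | SA-Fit | **280.580** | **209.274** | 0.964 | 0.579 | 2.156 | 1.615 | 5.742 | 4.165 | 6.091 | 4.267 |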
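Table 5. The 24-h prediction results. The best results are in bold.

| Region | Method | Z RMSE | Z MAE | Q RMSE | Q MAE | T RMSE | T MAE | U RMSE | U MAE | V RMSE | V MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Shanghai | ConvLSTM | 362.694 | 268.303 | 1.316 | 0.705 | 2.354 | 1.712 | 6.590 | 4.836 | 6.449 | 4.710 |
| Shanghai | TrajGRU | 341.161 | 251.014 | 1.274 | 0.684 | 2.331 | 1.681 | 6.451 | 4.710 | 6.527 | 4.739 |
| Shanghai | PredRNN | 414.667 | 306.244 | 1.351 | 0.729 | 2.504 | 1.840 | 7.114 | 5.251 | 6.809 | 4.981 |
| Shanghai | PredRNN++ | 337.354 | 247.032 | 1.268 | **0.683** | **2.269** | **1.637** | **6.410** | 4.702 | 6.439 | 4.685 |
| Shanghai | E3D-LSTM | 391.664 | 288.228 | 1.371 | 0.729 | 2.469 | 1.807 | 7.067 | 5.158 | 6.813 | 4.964 |
| Shanghai | MIM | 410.352 | 300.361 | 1.354 | 0.722 | 2.498 | 1.822 | 7.054 | 5.185 | 6.795 | 4.959 |
| Shanghai | CrevNet | 350.626 | 257.149 | 1.296 | 0.698 | 2.277 | 1.653 | 6.413 | 4.717 | 6.495 | 4.725 |
| Shanghai | SimVP | 353.030 | 257.555 | **1.258** | **0.683** | 2.299 | 1.670 | 6.426 | **4.697** | **6.386** | **4.644** |
| Shanghai | SA-Fit | **332.517** | **241.032** | 1.435 | 0.875 | 2.382 | 1.746 | 6.699 | 5.010 | 6.971 | 5.262 |
| Xi’an | ConvLSTM | 397.135 | 289.232 | 0.977 | 0.523 | 2.439 | 1.827 | 6.277 | 4.330 | 6.236 | 4.354 |
| Xi’an | TrajGRU | 356.187 | 261.564 | 0.936 | **0.506** | 2.358 | 1.758 | 6.211 | 4.251 | 6.322 | 4.410 |
| Xi’an | PredRNN | 453.390 | 328.289 | 0.989 | 0.536 | 2.617 | 1.976 | 6.550 | 4.551 | 6.512 | 4.564 |
| Xi’an | PredRNN++ | 368.918 | 267.927 | **0.933** | **0.506** | **2.331** | **1.734** | 6.143 | 4.239 | **6.114** | **4.268** |
| Xi’an | E3D-LSTM | 428.263 | 307.376 | 0.982 | 0.521 | 2.529 | 1.901 | 6.490 | 4.464 | 6.525 | 4.556 |
| Xi’an | MIM | 455.964 | 332.152 | 1.004 | 0.544 | 2.602 | 1.971 | 6.613 | 4.581 | 6.523 | 4.552 |
| Xi’an | CrevNet | 386.604 | 274.918 | 0.980 | 0.521 | 2.377 | 1.785 | **6.060** | **4.183** | 6.193 | 4.334 |
| Xi’an | SimVP | 369.037 | 270.603 | 0.945 | 0.510 | 2.342 | 1.760 | 6.061 | 4.205 | 6.180 | 4.319 |
| Xi’an | SA-Fit | **344.424** | **256.320** | 1.054 | 0.630 | 2.424 | 1.814 | 6.321 | 4.580 | 6.526 | 4.671 |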
Table 6. The model size required for different models for predicting a total of 1728 time series in the Shanghai region.

| Method | Model Size (MB) |
|---|---|
| Informer [25] | 108,812 |
| Autoformer [48] | 103,714 |
| FEDformer [49] | 155,554 |
| Pyraformer [49] | 90,461 |
| SA-Fit | 258 |
Table 7. The training times of different methods and the rate of increase in training time compared to SA-Fit.

| Method | Training Time | Increase Rate |
|---|---|---|
| ConvLSTM | 5 h 32 min | 25.8% |
| TrajGRU | 12 h 18 min | 179.5% |
| PredRNN | 6 h 16 min | 42.4% |
| PredRNN++ | 8 h 10 min | 85.6% |
| E3D-LSTM | 7 h 52 min | 78.8% |
| MIM | 8 h 44 min | 95.5% |
| CrevNet | 8 h 38 min | 96.2% |
| SimVP | 4 h 56 min | 12.1% |
| SA-Fit | 4 h 24 min | |

Tan, Y.; Wu, J.; Liu, Y.; Shen, S.; Xu, X.; Pan, B. A Lightweight Transformer-Based Spatiotemporal Analysis Prediction Algorithm for High-Dimensional Meteorological Data. Remote Sens. 2024, 16, 4545. https://doi.org/10.3390/rs16234545
