Article

Post Constraint and Correction: A Plug-and-Play Module for Boosting the Performance of Deep Learning Based Weather Multivariate Time Series Forecasting

School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430078, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3935; https://doi.org/10.3390/app15073935
Submission received: 23 January 2025 / Revised: 15 March 2025 / Accepted: 27 March 2025 / Published: 3 April 2025

Abstract

Weather forecasting is essential for various applications such as agriculture and transportation, and relies heavily on meteorological sequential data such as multivariate time series collected from weather stations. Traditional numerical weather prediction (NWP) methods applied to multivariate time series forecasting are grounded in statistical principles such as the Autoregressive Integrated Moving Average (ARIMA); however, they often struggle to capture complex nonlinear patterns among meteorological variables and temporal variations. Existing deep learning approaches such as Recurrent Neural Networks (RNNs) and transformers offer remarkable performance in handling complex patterns among meteorological multivariate time series, yet frequently fail to maintain weather-specific physical properties such as strict value constraints, while also incurring the significant computational costs of large parameter scales. In this paper, we present a novel deep learning plug-and-play framework named Post Constraint and Correction (PCC) to address these challenges by incorporating additional constraints and corrections based on weather-specific properties, such as multivariate correlations and physics-based strict value constraints, into the prediction process. Our method demonstrates notable computational efficiency, delivering significant improvements over existing deep learning time series models and helping them achieve better performance with far fewer parameters. Extensive experiments demonstrate the effectiveness, efficiency, and robustness of our method, highlighting its potential for real-world applications.

1. Introduction

Weather forecasting plays a crucial role in modern society, impacting various sectors from agriculture and transportation to daily life planning [1]. Meteorological multivariate time series collected from weather stations serve as one of the primary data sources for weather forecasting tasks, containing multiple variables such as temperature, humidity, pressure, wind velocity, and precipitation which can be used to describe the weather state at a specific location over a period of time.
Traditional numerical weather prediction (NWP) methods applied to multivariate time series forecasting tasks primarily consist of statistical and mathematical models, including Autoregressive Integrated Moving Average (ARIMA) [2], Kalman filtering [3], and Singular Spectrum Analysis (SSA) [4]. However, these methods often struggle with nonlinear relationships and complex patterns among meteorological series [5]. Recently, the advancement of deep learning has transformed weather forecasting capabilities, driven by three key factors: algorithmic innovations, the unprecedented scale of available weather data, and advances in parallel computing hardware [6]. Deep learning time series forecasting models excel at automatically extracting complex temporal patterns and dependencies from time series data, achieving state-of-the-art accuracy [7,8].
Meteorological variables possess distinct physical characteristics that significantly influence their temporal dynamics and interrelationships. Temperature exhibits daily and seasonal cyclical patterns with gradual transitions, while precipitation demonstrates intermittent and often non-Gaussian distributions with sudden onsets and varying intensities [9]. Atmospheric pressure typically displays smoother temporal transitions, but can undergo abrupt changes during weather front passages. These variables are further governed by fundamental physical principles such as energy conservation and thermodynamic laws that impose constraints on their possible values and rates of change [10].
However, deep learning models often fail to properly account for meteorology-specific patterns of different meteorological variables such as the strict physical constraints and boundary conditions inherent in weather systems, potentially leading to imprecise or unreliable predictions. Meanwhile, due to the complexity and dynamic nature of these physical characteristics, which exhibit subtle or significant variations across different time periods and geographical regions [11], predefining these relationships in a static way would inevitably affect prediction accuracy and generalization capability. Furthermore, many advanced deep learning models employ complex architectures with massive parameter counts, resulting in high computational resource requirements. These issues limit their practical deployment in real-world timely weather forecasting systems where both accuracy and computational efficiency are critical [12].
To address these challenges, we propose a novel deep learning plug-and-play framework named Post Constraint and Correction (PCC). PCC explicitly incorporates data characteristics that emerge from complex physical processes into the normal prediction process, including numerical constraints and variable relationships, thereby contributing to prediction reliability. Our approach introduces two key sub-modules: (1) the Multi-variants Correlation Constraint (MCC) module, which captures and maintains the complex interdependencies between different meteorological variables, and (2) the State Correction (SC) module, which ensures the reasonability of predictions by incorporating additional correction terms into the final prediction. By introducing network modules that can adaptively learn the empirical relationships between meteorological variables, deep learning models can effectively capture the underlying dynamics without requiring precise definitions of basic laws.
Notably, our method adopts a linear architectural design that achieves significantly efficient computational complexity and parameter scale while delivering notable accuracy improvements, making it suitable for practical weather forecasting. The code implementation is available at a GitHub repository: https://github.com/Fubukipara/PCCforWeatherTS/ (accessed on 26 March 2025).
The main contributions of our work are threefold:
1.
We develop a plug-and-play deep learning module named PCC that incorporates variable relationships and maintains state reasonability for weather multivariate time series forecasting, enabling seamless integration with various forecasting models without requiring architectural modifications or additional preprocessing.
2.
We design a computationally efficient architecture that significantly improves backbone model performance on weather time series forecasting tasks with minimal additional computational overhead, allowing the enhanced model to achieve superior performance with a significantly reduced parameter count.
3.
We conduct comprehensive experiments, ablation studies, and visualizations to demonstrate and analyze the superiority of our method.

2. Related Work

2.1. Weather Multivariate Time Series Forecasting

Multivariate time series data collected from weather stations serve as a fundamental cornerstone in modern weather forecasting systems. These data are characterized by their high dimensionality and complex temporal dependencies [13]. Traditional weather forecasting methods based on multivariate time series data primarily rely on various statistical approaches, including Autoregressive Integrated Moving Average (ARIMA) models, which assume linear relationships between variables, along with extensions such as SARIMA [14] for seasonal data. Other techniques have also been widely used, such as exponential smoothing [15] and Kalman filtering. Although these methods are interpretable and easy to implement, they often struggle to capture the complex and nonlinear relationships inherent in meteorological data, which may lead to unreliable predictions.
Driven by improvements in deep learning algorithms along with increases in the scale of available weather data and parallel computational hardware resources, recent advances in deep learning have revolutionized the field of weather forecasting and enabled the training of larger-scale models [16]. These new models are capable of effectively capturing complex and nonlinear patterns and dependencies within meteorological data in an end-to-end manner while eliminating the need for time-consuming manual feature engineering [17]. Thus, interest in deep learning models for NWP tasks has been growing rapidly. For example, multiple deep learning methods such as Pangu-Weather [18] and FengWu [19] based on global-scale meteorological data in mesh space have shown the ability to outperform traditional physical models in terms of both prediction performance and efficiency.
For multivariate time series point data collected from weather stations, Recurrent Neural Networks (RNNs) and variants such as the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks have demonstrated remarkable success in prediction and analysis tasks [20]. The stacked LSTM (stackLSTM) architecture stands out for its exceptional capacity to capture complex temporal dynamics at various time scales in weather time series by leveraging multiple LSTM layers [21,22,23]. Recently, several studies have explored the potential of transformers in the context of weather multivariate time series [24], leveraging their strength in capturing complex or long-term dependencies in sequential data to offer a promising alternative to traditional RNN-based approaches.

2.2. Deep Learning-Based Time Series Forecasting

Time series forecasting encompasses a broad spectrum of applications, ranging from financial markets [25] and traffic [26] to weather prediction. While traditional statistical methods such as ARIMA and VAR remain valuable for their interpretability, deep learning approaches have emerged as the dominant paradigm, offering superior performance in handling complex temporal and spatial patterns [27].
The evolution of deep learning models for time series analysis has been marked by significant architectural innovations. RNNs have been the primary choice for handling sequential data for the past several years, including in NLP [28] and time series analysis [29]. Improved RNN architectures such as GRU [30] and LSTM [31] have addressed the fundamental limitations of vanilla RNNs in capturing long-range dependencies. These architectures have demonstrated remarkable successes [32], including in weather-related time series applications. However, the autoregressive structure of RNNs incurs significant computational time, and they also suffer from the vanishing gradient problem. To address these issues, SegRNN introduces a parallel prediction mechanism [33], enabling the model to predict multiple time steps in parallel. This approach improves both forecasting accuracy and computational efficiency.
Nowadays, the transformer architecture has gained significant attention in long sequence modeling [34] thanks to its multi-head self-attention mechanism, which has demonstrated remarkable ability to capture long-term dependencies and avoid the gradient vanishing problem that arises in RNNs. Consequently, transformers have shown great promise in sequential tasks such as time series forecasting. Although the attention mechanism demonstrates superior performance in long-term temporal modeling, it has drawbacks in terms of model complexity and computational cost [35]. Indeed, numerous methods have been proposed to address these issues, such as treating split patches or whole sequences as tokens in PatchTST [36] and iTransformer [37]. Another research line focuses on utilizing intricate temporal patterns within time series by leveraging techniques such as seasonal trend decomposition [38] and nonstationary compensation [39] to improve forecasting accuracy.
Recent developments in time series forecasting have also explored alternative architectures. MLP-based models such as DLinear [40] offer simple yet effective solutions for time series prediction, achieving comparable performance while requiring significantly fewer computational resources.
However, existing deep learning-based time series forecasting models still lack the ability to capture the specific physical features of weather time series, including the strict value constraints of meteorological variables and their interdependencies, which may lead to suboptimal performance on weather time series prediction tasks. Therefore, in this paper we propose a generic deep learning plug-and-play module that helps deep learning-based time series forecasting models address these challenges effectively and efficiently.

3. Method

To deal with the specific features of weather time series that make deep learning-based time series forecasting models less effective, we propose a novel deep learning plug-and-play module called Post Constraint and Correction (PCC). The PCC module is designed to help the forecasting model capture more appropriate features of weather time series, resulting in more reliable predictions. As we demonstrate in Figure 1, the PCC module consists of two parts: a Multi-variants Correlation Constraint (MCC) module and a State Correction (SC) module. We introduce each part in detail in the following sections. We define the observed weather time series as $X \in \mathbb{R}^{O \times N}$, where $O$ is the length of the observed series and $N$ is the number of variables. The ground truth and the future prediction are denoted as $\bar{Y} \in \mathbb{R}^{P \times N}$ and $Y \in \mathbb{R}^{P \times N}$, respectively, where $P$ is the prediction horizon (the number of future time steps predicted by the model).

3.1. Initial Prediction

We first feed the observed weather time series $X$ to the backbone model (the underlying time series forecasting model, such as an LSTM, that serves as the foundation for forecasting). The backbone model then generates the initial prediction $Y_i$:

$$Y_i = \mathrm{Backbone}(X)$$

where $Y_i \in \mathbb{R}^{P \times N}$ is the initial prediction of the future series. The backbone model can be any general time series forecasting model, such as an RNN, transformer, or MLP model. Our PCC module is designed to be plug-and-play, which means that there is no need to change the backbone model's architecture, data normalization, training trajectory, etc. It is only necessary to deploy the PCC module on the forecasting series' channel dimension after the backbone model.
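To make the plug-and-play deployment concrete, the following is a minimal PyTorch sketch (not the released implementation; the class and argument names are ours) of how the PCC module attaches to an arbitrary backbone on the channel dimension of the forecast:

```python
import torch
import torch.nn as nn


class PCCWrapper(nn.Module):
    """Hypothetical wrapper: the backbone produces Y_i, then MCC and SC refine it."""

    def __init__(self, backbone: nn.Module, mcc: nn.Module, sc: nn.Module):
        super().__init__()
        self.backbone = backbone  # any forecaster mapping [batch, O, N] -> [batch, P, N]
        self.mcc = mcc            # Multi-variants Correlation Constraint module
        self.sc = sc              # State Correction module

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: observed series of shape [batch, O, N]
        y_init = self.backbone(x)        # initial prediction Y_i
        x_ref = x[:, -1:, :]             # reference point X_O (last observed step)
        y_mc = self.mcc(y_init, x_ref)   # correlation-constrained prediction Y_mc
        return self.sc(y_mc)             # state-corrected final prediction Y
```

Because the wrapper only consumes the backbone's output and the last observed step, the backbone's own normalization and training procedure remain untouched.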

3.2. Multi-Variants Correlation Constraint

There are distinct correlations among different meteorological variables (e.g., temperature and precipitation) that play important roles in climate change [41]. Thus, we consider that offering additional insights into these relationships may help the model to make more accurate predictions.
However, unlike some other multivariate time series data, such as traffic flow and electricity demand, the correlation patterns among meteorological variables are primarily and directly grounded in physical properties [42], which are strict, specific, stable, and largely invariant over time. Meanwhile, some methods focus on whole sequences or patches of different channels; for instance, iTransformer treats every channel's sequence as a token, which may introduce irrelevant temporal variances and lead to less effective capture of these temporally independent physical correlation patterns.
To address the above issues, we design the Multi-variants Correlation Constraint (MCC) module, which is deployed after the backbone model on the channel dimension of the forecasting series. The detailed architecture of the MCC’s network is shown in Figure 2.
After the backbone generates the initial prediction $Y_i$, we first define the last observed time step $X_O$ as the reference point and representation of the historical states, then calculate the bias between the initial prediction $Y_i$ and $X_O$:

$$B = Y_i - X_O$$

where $B \in \mathbb{R}^{P \times N}$ is the bias matrix. The MCC module leverages the bias matrix $B$ as a physically meaningful representation of how each variable evolves from the reference state. By using the change relative to a reference point ($X_O$) rather than absolute values, the module is able to effectively capture the temporal dynamics of weather evolution.
Then, we deploy a fully connected layer $f_{mp}$ and a nonlinear activation function $\sigma$ on the channel dimension of $B$ to project the bias at every time step into the hidden space:

$$h_{mc} = \sigma(f_{mp}(B))$$

where $h_{mc} \in \mathbb{R}^{P \times H}$ is the hidden representation of the variable correlation and $H$ is the predefined dimension of the hidden space. We deploy a dropout layer after the hidden representation $h_{mc}$ to avoid overfitting [43]:

$$h_{mc} = \mathrm{Dropout}(h_{mc}).$$

The fully connected layer $f_{mp}$ learns the complex interdependencies between changes in different meteorological variables through its learnable weights. For example, it can model how changes in temperature relate to corresponding changes in humidity, air pressure, and other variables. The nonlinear activation function $\sigma$ enables modeling of the nonlinear patterns inherent in complex and dynamic meteorological systems.
Then, another fully connected layer $f_{mc}$ is introduced to generate the constraint term $C_{mc}$:

$$C_{mc} = f_{mc}(h_{mc}).$$

Next, we add the constraint term $C_{mc}$ to the initial prediction's bias $B$ to generate the constrained bias $B_{mc}$:

$$B_{mc} = B + C_{mc}.$$

Finally, we convert the constrained bias $B_{mc}$ back to a time series by adding the reference point $X_O$:

$$Y_{mc} = B_{mc} + X_O.$$

During the process above, the variable correlation constraint term $C_{mc}$ is inferred by the MLP through the bias matrix $B$ separately and independently at every time step, thereby avoiding irrelevant temporal variances and leading to more precise capture of the variable correlation patterns. Unlike fixed correlation rules, the MCC module can dynamically adjust the relationships between meteorological variables through the learnable weights of the linear layers, thereby enhancing the accuracy and reliability of the model's predictions.
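As an illustration of the computation above, a simplified MCC sketch in PyTorch (with our own layer and variable names; hyperparameter defaults taken from Section 4.1.3) might look as follows:

```python
import torch
import torch.nn as nn


class MCC(nn.Module):
    """Sketch of the Multi-variants Correlation Constraint module."""

    def __init__(self, n_vars: int, hidden: int = 128, dropout: float = 0.3):
        super().__init__()
        self.f_mp = nn.Linear(n_vars, hidden)  # channel-wise projection f_mp
        self.act = nn.Tanh()                   # nonlinear activation sigma
        self.drop = nn.Dropout(dropout)
        self.f_mc = nn.Linear(hidden, n_vars)  # constraint generator f_mc

    def forward(self, y_init: torch.Tensor, x_ref: torch.Tensor) -> torch.Tensor:
        # y_init: [batch, P, N]; x_ref: [batch, 1, N] (reference point X_O)
        b = y_init - x_ref                        # bias matrix B
        h_mc = self.drop(self.act(self.f_mp(b)))  # hidden representation h_mc
        c_mc = self.f_mc(h_mc)                    # constraint term C_mc
        b_mc = b + c_mc                           # constrained bias B_mc
        return b_mc + x_ref                       # constrained prediction Y_mc
```

Note that the linear layers act on the channel dimension only, so each prediction time step is constrained independently of the others.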
During the training stage, the MCC module can effectively learn the mapping between the bias matrix and the constraint term through a loss-based gradient propagation mechanism:

$$\theta_{MCC} \leftarrow \theta_{MCC} - \eta \frac{\partial L}{\partial \theta_{MCC}}$$

where $\eta$ is the learning rate, $L$ is the loss function, and $\theta_{MCC}$ represents the parameters of the MCC module. The gradient calculation follows the chain rule:

$$\frac{\partial L}{\partial \theta_{MCC}} = \frac{\partial L}{\partial Y} \cdot \frac{\partial Y}{\partial Y_{mc}} \cdot \frac{\partial Y_{mc}}{\partial C_{mc}} \cdot \frac{\partial C_{mc}}{\partial \theta_{MCC}}.$$

This update forces the MCC module to learn parameters that generate a $C_{mc}$ term which minimizes the overall prediction error when applied to the initial prediction and bias matrix. Because meteorological variables exhibit intrinsic physical relationships in the training data, the module naturally learns these relationships through the optimization process without requiring explicit formulation of physics equations.
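In practice this update requires no dedicated machinery; a standard training step, sketched below under the assumption of the PCC-wrapped model from Section 3.1, propagates the loss gradient to the MCC (and later SC) parameters together with the backbone's:

```python
import torch
import torch.nn as nn


def train_step(model: nn.Module, x: torch.Tensor, y_true: torch.Tensor,
               optimizer: torch.optim.Optimizer, criterion: nn.Module) -> float:
    """One optimization step for the PCC-wrapped backbone."""
    optimizer.zero_grad()
    y_pred = model(x)                 # forward pass: backbone -> MCC -> SC
    loss = criterion(y_pred, y_true)  # e.g. nn.L1Loss() (MAE) or nn.MSELoss()
    loss.backward()                   # chain rule carries dL/dtheta_MCC and dL/dtheta_SC
    optimizer.step()                  # theta <- theta - eta * dL/dtheta
    return loss.item()
```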

3.3. State Correction

The range of each variable is strictly limited based on the meteorological variables’ physical characteristics; for instance, the humidity should be within [0,100]. These ranges are also influenced by the location of the observation station; for example, the temperature at the North Pole should be lower than at the equator. Furthermore, certain variables are interdependent; when precipitation occurs, the humidity is typically high, and high wind velocities often occur together with lower temperatures.
These characteristics lead to strict constraints on the ranges of variable values, and result in a high risk of unreasonable states occurring during training and forecasting due to the randomness of model parameter optimizations, such as high precipitation and low humidity at the same time. Therefore, we propose that providing extra monitoring of the predicted states and correcting the unreasonable ones can lead to more reliable prediction results and more robust models.
After obtaining the constrained prediction $Y_{mc}$, we project $Y_{mc}$ from the temporal domain to the hidden space by a fully connected layer $f_{sp}$ and a nonlinear activation function $\sigma$:

$$h_s = \sigma(f_{sp}(Y_{mc}))$$

where $h_s \in \mathbb{R}^{H}$ is the representation of each state and $H$ is the predefined dimension of the hidden space. To avoid overfitting and make the model more robust, we introduce a dropout layer after the hidden representation $h_s$, as previously deployed in the MCC module network:

$$h_s = \mathrm{Dropout}(h_s).$$
The SC module focuses on ensuring that each predicted state adheres to the physical constraints of the meteorological variables. Unlike the MCC module, which operates based on changes in the variables, the SC module directly examines each predicted state at each time step independently. This direct projection of each time step’s state into the hidden space allows the model to assess whether that state is reasonable irrespective of how it was reached.
To enhance the MLP's ability to capture subtle state features across the large space of meteorological states, we introduce a differential operation on the representation $h_s$ inspired by the differential transformer [44], as demonstrated in Figure 3. This differential operation $\mathrm{diff}$ can be calculated as follows:

$$\mathrm{diff}(h_s) = \exp(\lambda_1) \odot h_s - \exp(\lambda_2) \odot h_s$$

where $\lambda_1, \lambda_2 \in \mathbb{R}^{P \times H}$ are learnable matrices and $\odot$ denotes element-wise multiplication.

Similar to a differential amplifier [45], this differential operation acts as a feature enhancement mechanism, helping to distinguish important patterns from noise in the hidden representation. By applying different learnable weights ($\exp(\lambda_1)$ and $\exp(\lambda_2)$) to the same hidden representation $h_s$, the operation can selectively amplify subtle but important features while suppressing common noise patterns.
Then, we deploy another fully connected layer $f_{sc}$ to judge the state and generate a correction term $C_{sc}$:

$$C_{sc} = f_{sc}(h_s).$$

Finally, we add the correction term $C_{sc}$ to the constrained prediction $Y_{mc}$ in order to adjust the states at each time step and generate the final prediction $Y$:

$$Y = Y_{mc} + C_{sc}.$$
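A corresponding sketch of the SC module (again in PyTorch, with our own naming and the default hyperparameters from Section 4.1.3) illustrates the state projection, the differential operation, and the final correction:

```python
import torch
import torch.nn as nn


class SC(nn.Module):
    """Sketch of the State Correction module with the differential operation."""

    def __init__(self, n_vars: int, horizon: int, hidden: int = 128, dropout: float = 0.3):
        super().__init__()
        self.f_sp = nn.Linear(n_vars, hidden)  # state projection f_sp
        self.act = nn.Tanh()                   # nonlinear activation sigma
        self.drop = nn.Dropout(dropout)
        # learnable differential matrices lambda_1 and lambda_2, initialized to 1
        self.lambda1 = nn.Parameter(torch.ones(horizon, hidden))
        self.lambda2 = nn.Parameter(torch.ones(horizon, hidden))
        self.f_sc = nn.Linear(hidden, n_vars)  # correction generator f_sc

    def forward(self, y_mc: torch.Tensor) -> torch.Tensor:
        # y_mc: constrained prediction of shape [batch, P, N]
        h_s = self.drop(self.act(self.f_sp(y_mc)))  # per-step state representation h_s
        h_s = torch.exp(self.lambda1) * h_s - torch.exp(self.lambda2) * h_s  # diff(h_s)
        c_sc = self.f_sc(h_s)                       # correction term C_sc
        return y_mc + c_sc                          # final prediction Y
```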
Similar to the MCC module, the SC module learns through the gradient propagation mechanism during training. The gradient flow for the SC module is as follows:

$$\theta_{SC} \leftarrow \theta_{SC} - \eta \frac{\partial L}{\partial \theta_{SC}}$$

where $\theta_{SC}$ represents the learnable parameters in the SC module, including the differential operation parameters $\lambda_1$ and $\lambda_2$. The gradient of the loss with respect to these parameters is

$$\frac{\partial L}{\partial \theta_{SC}} = \frac{\partial L}{\partial Y} \cdot \frac{\partial Y}{\partial C_{sc}} \cdot \frac{\partial C_{sc}}{\partial \theta_{SC}}.$$

From Equation (14), we have $\frac{\partial Y}{\partial C_{sc}} = 1$, which provides a direct gradient path from the loss to the SC module parameters. This direct gradient flow allows the SC module to efficiently learn which aspects of individual meteorological states require correction.
Therefore, the PCC module's dual-component design addresses the different challenges posed by physical characteristics in weather time series forecasting. The MCC component ensures that the relationships between variables stay reasonable during the prediction process. It operates on the correlation patterns, ensuring that when one variable changes, the related variables change in consistent ways. Meanwhile, the SC component ensures that the individual states at each time step remain physically plausible regardless of how they evolved, acting as a correction mechanism that identifies and adjusts unrealistic states.
Together, these components complement each other, enabling the model to learn the variable constraints from the data without requiring explicit physical equations or laws. This makes the resulting model more adaptable and robust across varying scenarios, such as different climate regimes.

4. Experiments

4.1. Experiment Materials and Setup

4.1.1. Dataset

We conducted experiments on the weather time series dataset collected by the Max Planck Institute for Biogeochemistry, Jena, Germany, in tabular format. The dataset contains 21 different meteorological variables, including temperature, humidity, and wind speed, as listed in Appendix A. The variables were recorded every 10 min over the whole year of 2020. Using a ratio of 7:1:2, we split the dataset into a training set for training the model, a validation set for tuning the hyperparameters, and a test set for evaluating the model's performance after training.
As demonstrated in Figure 4, some of the meteorological variables exhibit unstable and nonstationary properties with noise and fluctuations, making it challenging for general forecasting models to capture useful features precisely and resulting in less effective performance on forecasting tasks.

4.1.2. Backbone Models

For our experimental evaluation, we selected three different types of time series models as the backbone models for our PCC method to serve as baselines for the comparison. The selected backbone models have shown state-of-the-art performance in their categories on multivariate time series forecasting tasks. The chosen models included an RNN-based model (SegRNN), two transformer-based models (iTransformer and PatchTST), and an MLP-based model (DLinear). In addition to these advanced general models, we also selected stackLSTM as a backbone model, as it has been extensively utilized in numerous weather time series studies to model temporal variations and has demonstrated versatile performance in this regard. The detailed settings of the backbone models are listed in Appendix D.

4.1.3. Training and Evaluation Setup

During the training process, we set the default length of the observed series $O$ to 96 (equivalent to 16 h) and the length of the future series $P$ to {96, 192, 336, 720}, which corresponds to predicting the weather for the next 16, 32, 56, and 120 h.
Although our PCC module is simple and lightweight compared to the backbone models, it introduces additional complexity as a supplementary component, which may add a risk of unstable training. We carefully selected the hyperparameters of the PCC module as follows:
1.
The hidden dimension for the two submodule networks was set as 128, balancing performance and computational complexity.
2.
Tanh was chosen as the activation function σ due to its symmetry around zero and smooth gradient properties. These characteristics ensure training stability, which is particularly important because weather variables often contain extreme values and rapid fluctuations.
3.
The dropout rate was set to 0.3, representing an optimal tradeoff between regularization to prevent overfitting and maintaining sufficient network capacity.
4.
The learnable matrices $\lambda_1$ and $\lambda_2$ were both initialized to 1, ensuring that the differential terms start with equivalent contribution weights. This helps to prevent initial bias that may cause unstable parameter updates during the early training stages.
We evaluated PCC’s sensitivity to the settings of the hidden dimension, dropout rate, and initialization of the differential matrix in Appendix C.
Our PCC module functions as a plug-and-play component that can be integrated into backbone models by simply connecting it at the feature dimension. This integration requires no modifications to the backbone architecture itself nor additional data preprocessing steps such as normalization. Therefore, in our experiments we followed the same settings for each model as in their original publications. In addition, we used the official code implementations, including data processing methods, normalization methods, hyperparameters, learning rate, etc., ensuring fairness of the comparison and reproducibility of the results. The detailed training configurations of all the backbone models are provided in Appendix D.
We evaluated the models’ performance on the test set using the metrics of mean absolute error (MAE) and mean squared error (MSE). To ensure reliability, each numerical result was the average of five runs with different random seeds.
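For clarity, the evaluation protocol can be summarized by the following sketch (array shapes and helper names are ours):

```python
from typing import Tuple

import numpy as np


def mae_mse(preds: np.ndarray, trues: np.ndarray) -> Tuple[float, float]:
    """MAE and MSE averaged over all test windows, time steps, and variables."""
    mae = float(np.mean(np.abs(preds - trues)))
    mse = float(np.mean((preds - trues) ** 2))
    return mae, mse


# Each reported number is the mean over five runs with different random seeds, e.g.:
# final_mse = np.mean([mae_mse(p, t)[1] for (p, t) in per_seed_results])
```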
The experiments were conducted on a server with an Intel Xeon Platinum 8474C CPU, 80 GB memory, and an NVIDIA RTX 4090D GPU with 24 GB memory. The PCC module and backbone models were implemented with PyTorch 2.0.0 and CUDA 11.8 with Python 3.8 on Ubuntu 20.04.

4.2. Main Results and Discussion

4.2.1. Main Forecasting Results

As shown in Table 1, the PCC module can effectively improve the performance of all backbone models on weather time series forecasting tasks across different prediction horizons. The models augmented with the PCC module all achieve significantly lower MAE and MSE than the original models. Notably, our PCC respectively reduces the MSE of the DLinear and SegRNN models by 17.34% and 11.02% on average. The improvement is particularly pronounced for shorter prediction horizons, with a 26.57% reduction in MSE for DLinear and 11.11% reduction for SegRNN with a prediction horizon of 96. The detailed error bars, which include the standard deviation and the confidence interval, are listed in Appendix B.

4.2.2. Different Length of Observed Time Series

For a more comprehensive evaluation, Table 2 lists the performance of the PCC module with different lengths of the observed time series. As the table shows, the PCC module improves the performance of the different backbone models across varying observation lengths. The improvements are more significant in shorter observed time series. This indicates that PCC can help to reduce the model’s dependence on the length of the observed series. For instance, the DLinear model with PCC achieves better performance based on 8 h of observed time series than the original model based on 32 h of observation. Thus, our PCC module can help the model to capture the underlying patterns more effectively in the data scarcity scenario, leading to more robust and reliable results on the weather forecasting task with less data storage.

4.2.3. Key Variables Analysis

We also conducted a key variables analysis to demonstrate the effectiveness of the PCC module on different crucial meteorological variables, including air temperature, specific humidity, wind velocity, and precipitation. These variables are widely utilized in weather forecasting tasks, have shown significant impacts on climate change, and correlate closely with real-world applications such as agriculture and transportation. Importantly, they also exhibit significant correlation patterns with each other and have specific ranges of values. As Table 3 demonstrates, the PCC module can improve the performance of different backbone models with respect to these key meteorological variables.

4.2.4. Impact of Strongly Correlated Variables

One potential concern with the dataset used in our experiments is the presence of strongly correlated variables, particularly those related to humidity measurements (e.g., Tdew, rh, VPmax) and some temperature-related variables (e.g., T, Tlog). These variables describe similar physical properties but do so in different mathematical representations, which might introduce unwanted biases or redundancies in the model.
To investigate this concern, we conducted additional experiments using a reduced variable set that excluded these potentially redundant features. Specifically, we removed the following variables: Tdew (dew point temperature), VPmax (saturation water vapor pressure), VPact (actual water vapor pressure), VPdef (water vapor pressure deficit), sh (specific humidity), H2OC (water vapor concentration), Tpot (potential temperature), and Tlog (temperature in log).
Table 4 presents a comparison of model performance with the full variable set and the reduced variable set. Contrary to what might be expected, the models trained on the full variable set consistently outperformed those trained on the reduced set regardless of whether PCC was applied. Moreover, PCC maintained its performance improvement ability even with the reduced variable set.
These results suggest that while these particular variables are mathematically related or describe similar weather conditions, they also contribute unique and valuable information that is relevant to the forecasting task. The relationships between these variables appear to provide useful signals rather than misleading correlations. Furthermore, our PCC module effectively leverages these relationships rather than being hindered by them, as evidenced by its consistent performance improvement across both variable sets.
These findings highlight the robustness and generalization capability of our approach even when processing highly correlated input features, which is a common characteristic in meteorological datasets.

4.3. Method Analysis Results and Discussion

4.3.1. Ablation Study

To verify the effectiveness of each design choice in the PCC module, we conducted an ablation study in which we removed or replaced each part of the PCC module, including removing each of the two submodules, removing the differential representation inside the state correction network, and inverting the order of the two submodules. The experiments used SegRNN as the backbone model with a look-back window of 96 and a prediction horizon of 96. As demonstrated in Table 5, the PCC module with its default settings achieves the best performance. Removing the State Correction module leads to the most significant performance degradation, particularly for longer prediction horizons. This indicates that the SC module plays a crucial role in stabilizing predictions and achieving improvement, given the distinct unstable fluctuations and strict physical value constraints in weather time series.

4.3.2. Scalability Analysis

To evaluate the scalability of our approach, we conducted experiments comparing the parameter efficiency of the original backbone model and our PCC-enhanced version. We used SegRNN as the backbone model and varied its parameter scale through different hidden dimensions: {64, 128, 256, 512, 1024} for the original design and {16, 32, 64} for the PCC-enhanced design, both based on the same observation and prediction length of 96. According to the fitted curves of the results in Figure 5, the PCC-enhanced SegRNN achieves better accuracy with fewer than about 1% of the original SegRNN's learnable parameters. This indicates that our PCC helps the original forecasting models to more effectively capture crucial meteorological patterns and achieve higher accuracy with fewer parameters. This reduction in model size directly translates to lower computational demands and deployment costs on hardware devices.

4.3.3. Training Cost and Stability

To more practically and comprehensively validate our method’s efficiency and resource utilization, we present the training time and memory consumption of our method in Table 6, using SegRNN as backbone model with the same look-back and forecasting horizon of 96. The results show that the performance improvement requires only modest and acceptable increases in computational overhead. The shorter training time with the 720 horizon indicates that despite PCC introducing additional computational complexity, it can help the model to achieve more stable training and faster convergence when the forecasting horizon is longer and more likely to be unstable.
Notably, the memory consumption remains nearly identical (less than 0.5% difference) when the prediction horizon is 720. Counter-intuitively, the maximum memory consumption with PCC is slightly lower than the original SegRNN when the prediction horizon is 96. We attribute this to the additional influence of the PCC module on the GPU memory allocation optimization strategies of PyTorch, as the additional memory consumption introduced by PCC is too small to be a significant factor in the total memory consumption.
To evaluate the training process, Figure 6 illustrates the temporal evolution of the training and validation losses for the SegRNN backbone with and without PCC integration. The experiments were conducted with a fixed look-back window of 96 and forecasting lengths of 96 and 720. The training loss trajectories (depicted by blue lines) for both model variants exhibit stability and convergence in the latter stages of training, demonstrating that incorporation of the PCC module maintains training stability while enhancing predictive performance.

4.3.4. Comparison with Complex Mechanisms

The computational complexity of our PCC method primarily stems from the simple fully connected layers and element-wise tensor manipulations in the differential operations. Both of these can be accelerated by parallel computing on GPUs, resulting in lower computational overhead compared to more complex structures such as self-attention mechanisms. We conducted a comparison of performance and computational cost between the proposed PCC module and some complex mechanisms, again using SegRNN as the backbone model. Table 7 shows the results. The methods for comparison included the following:
  • SAMP + SC: Replaces the MCC module with SAMP [46], a self-attention based method for capturing variable correlation in time series.
  • LIFT + SC: Replaces the MCC module with LIFT [47], a channel dependence correction method for time series forecasting using linear structures with complex calculations such as Fourier transforms.
  • saPCC: Replaces the MLP structure in both PCC submodules with a multihead self-attention mechanism.
The implementation details of these three methods are described in Appendix E.
The results show that PCC achieves superior performance with its simple linear structure while maintaining significantly lower computational complexity, making it more suitable for resource-sensitive practical scenarios. We attribute this to the temporal independence in the design of the MCC module, which avoids the negative effects of temporal variance, making it more effective in capturing physical correlation patterns.

4.3.5. Robustness Analysis

As deep learning models are data-driven methods, it is well known that they are sensitive to data quality. Noisy data may introduce negative impacts on model training and lead to poor performance [48]. Unfortunately, time series data collected from weather stations are sometimes noisy [49]. In this section, we evaluate the robustness of our PCC module under various noise conditions to validate its effectiveness in practical forecasting scenarios.
First, we introduce Additive Gaussian Noise (AGN), then scale the AGN by a predefined weight and add it to the dataset. We evaluated two training cases, first adding noise only to the input sequence and then adding it to the complete sequence. We used the SegRNN model as the backbone, with the same look-back window and prediction horizon of 96. The scale weights of the AGN were 0.1 and 0.3. The results listed in Table 8 show the robustness of our PCC module against noise and demonstrate its ability to effectively capture the underlying patterns in noisy scenarios while improving the backbone model’s performance.
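For reference, the noise injection used above can be sketched in a few lines (tensor shapes and the function name are ours):

```python
import torch


def add_agn(series: torch.Tensor, scale: float) -> torch.Tensor:
    """Additive Gaussian Noise scaled by a predefined weight (0.1 or 0.3 in our tests)."""
    return series + scale * torch.randn_like(series)


x = torch.randn(64, 96, 21)  # dummy inputs: [batch, look-back window, variables]
y = torch.randn(64, 96, 21)  # dummy targets: [batch, prediction horizon, variables]

x_noisy = add_agn(x, 0.1)                            # case 1: noise on the input sequence only
x_noisy, y_noisy = add_agn(x, 0.3), add_agn(y, 0.3)  # case 2: noise on the complete sequence
```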

4.3.6. Visualization of the Two Submodules in PCC

Figure 7 visualizes the weights and behavior of the Multi-variants Correlation Constraint (MCC) and State Correction (SC) submodules using heatmaps. We used the DLinear model as the backbone, with the same look-back and prediction length of 96.
To visualize the submodules’ operational behavior while mitigating the negative impact of extreme or zero values on the visualization, we calculated the mean squared roots of the ratios between the submodules’ absolute output and input values instead of using the ratios directly.
The visualizations of the weights and ratio metrics demonstrate that the two submodules have different preferences for the time steps and variables. The MCC module is more active in the early prediction stages, which are closer to the reference point, as well as for those variables with strong correlation patterns, such as humidity, wind velocity, and precipitation. The SC module places more emphasis on the later stages, following the regular pattern that the more distant points are more likely to be unstable during model prediction. In addition, it focuses more on variables which are more likely to be unstable or have more complex and significant variance patterns, such as temperature, water vapor pressure, wind velocity and direction, and photosynthetically active radiation. After the initial period, the activity of both modules becomes stable and time-insensitive.

4.3.7. Visualization of Differential Operation

To provide more intuitive insight into the SC module's effectiveness, we directly visualize the differential operation's learnable matrices $\lambda_1$ and $\lambda_2$ in the State Correction module. First, we reparameterize these two matrices as

$$\lambda_{diff} = \exp(\lambda_1) - \exp(\lambda_2),$$

where $\lambda_{diff}$ is the reparameterized differential operation matrix. Then, we visualize $\lambda_{diff}$ in Figure 8, demonstrating that the differential operation prefers to enhance those features that are farther away from the observation series.
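The reparameterization and heatmap can be reproduced with a short sketch; the randomly perturbed matrices below are placeholders for the learned $\lambda_1$ and $\lambda_2$, which in practice are read from the trained SC module:

```python
import torch
import matplotlib.pyplot as plt

# Placeholders for the learned matrices of a trained SC module (shape [P, H]).
P, H = 96, 128
lambda1 = torch.ones(P, H) + 0.05 * torch.randn(P, H)
lambda2 = torch.ones(P, H) + 0.05 * torch.randn(P, H)

lambda_diff = torch.exp(lambda1) - torch.exp(lambda2)  # reparameterized differential matrix

plt.imshow(lambda_diff.numpy(), aspect="auto", cmap="coolwarm")
plt.xlabel("hidden dimension")
plt.ylabel("prediction time step")
plt.colorbar(label="lambda_diff")
plt.tight_layout()
plt.show()
```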

5. Conclusions and Future Work

In this paper, we have presented a generic plug-and-play approach named Post Constraint and Correction (PCC) for improving deep learning-based multivariate weather time series forecasting by explicitly incorporating domain knowledge such as the physical characteristics of meteorological variables. Our method addresses two key challenges in weather time series prediction: (1) missing or improper alignment of the variables’ physical interdependencies, and (2) imprecise or unreasonable predictions caused by neural network randomness and the strict physical constraints of meteorological variables.
Extensive experimental results across different cases clearly demonstrate the effectiveness of our PCC approach in terms of improved accuracy and cost-efficiency. For instance, our method reduces the prediction MSE by up to 11% on average with the SegRNN backbone while requiring less than 0.5% additional GPU memory consumption. Our analysis of PCC’s scalability demonstrates that the proposed module enables backbone models to achieve better performance with a smaller number of parameters, making it particularly suitable for resource-constrained environments where computational efficiency is as critical as forecasting accuracy.
Additionally, our ablation studies validate the importance of each component in the proposed method along with the reasonableness of our design. Visualization analysis further reveals the interpretability of the module’s behavior, showing how it effectively learns and leverages the specific properties of meteorological time series to achieve improved model performance.
While our PCC module demonstrates significant performance improvements, we acknowledge interpretability limitations common to deep learning approaches in physical systems. Unlike traditional physics-based models, neural networks often function as ‘black boxes’, making it difficult to fully explain how they incorporate domain knowledge [50]. In our opinion, a promising way to address this challenge is to develop hybrid architectures that more explicitly incorporate physical relationships. For example, implementing gating mechanisms that selectively apply predefined physical constraints to the model outputs may offer an optimal balance between purely data-driven approaches and physics-constrained models. Such hybrid approaches could not only enhance interpretability but might further reduce model complexity while maintaining or improving prediction accuracy. We consider this a promising direction for our ongoing research, and plan to thoroughly investigate such mechanisms in subsequent work.

Author Contributions

Conceptualization, Z.W., Z.L. and Y.L.; methodology, Z.W.; software, Z.W.; validation, Z.W.; formal analysis, Z.W.; investigation, Z.W. and Z.Y.; resources, Z.W. and Z.L.; data curation, Z.W. and Y.L.; writing—original draft preparation, Z.W.; writing—review and editing, Z.W., Z.L. and Z.Y.; visualization, Z.W. and Z.Y.; supervision, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset we used in our experiment was obtained from the open time series library at https://github.com/thuml/Time-Series-Library (accessed on 29 May 2024). This dataset is preprocessed in a tabular format. The raw CSV files for different years are available at https://www.bgc-jena.mpg.de/wetter/weather_data.html (accessed on 29 May 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Dataset Variables

In this appendix, we provide a detailed list of all variables used in the Max-Planck-Institut Weather Time Series dataset. The dataset contains 21 meteorological parameters. The following table presents the symbol, unit, and detailed description of each variable.
Table A1. Variables and descriptions of the Max-Planck-Institut Weather Time Series dataset.
Symbol | Unit | Description
P | mbar | air pressure
T | °C | air temperature
Tpot | K | potential temperature
Tdew | °C | dew point temperature
rh | % | relative humidity
VPmax | mbar | saturation water vapor pressure
VPact | mbar | actual water vapor pressure
VPdef | mbar | water vapor pressure deficit
sh | g/kg | specific humidity
H2OC | mmol/mol | water vapor concentration
rho | g/m3 | air density
wv | m/s | wind velocity
max. wv | m/s | maximum wind velocity
wd | ° | wind direction
rain | mm | precipitation
raining | s | precipitation duration
SWDR | W/m2 | shortwave downward radiation
PAR | W/m2 | photosynthetically active radiation
max. PAR | W/m2 | maximum photosynthetically active radiation
Tlog | °C | temperature in log
CO2 | ppm | carbon dioxide concentration of ambient air

Appendix B. Error Bars

To validate our method’s robustness against initialization and random factors, Table A2 presents the standard deviations and confidence intervals of the five backbone models with and without PCC. All results are calculated across five runs with different random seeds.
Table A2. Standard deviations and confidence intervals of the five backbone models with and without PCC.
Model: stackLSTM
Horizon | +PCC MSE | +PCC MAE | Original MSE | Original MAE | Confidence Interval
96 | 0.221 ± 0.016 | 0.266 ± 0.011 | 0.298 ± 0.026 | 0.318 ± 0.017 | 99%
192 | 0.242 ± 0.009 | 0.289 ± 0.008 | 0.387 ± 0.014 | 0.373 ± 0.012 | 99%
336 | 0.285 ± 0.009 | 0.314 ± 0.006 | 0.501 ± 0.048 | 0.459 ± 0.038 | 99%
720 | 0.369 ± 0.016 | 0.376 ± 0.012 | 0.548 ± 0.004 | 0.495 ± 0.004 | 99%
Model: DLinear
Horizon | +PCC MSE | +PCC MAE | Original MSE | Original MAE | Confidence Interval
96 | 0.152 ± 0.000 | 0.197 ± 0.001 | 0.207 ± 0.000 | 0.233 ± 0.000 | 99%
192 | 0.196 ± 0.000 | 0.239 ± 0.000 | 0.244 ± 0.000 | 0.269 ± 0.000 | 99%
336 | 0.243 ± 0.000 | 0.276 ± 0.001 | 0.286 ± 0.000 | 0.306 ± 0.000 | 99%
720 | 0.304 ± 0.001 | 0.318 ± 0.001 | 0.345 ± 0.000 | 0.354 ± 0.001 | 99%
Model: iTransformer
Horizon | +PCC MSE | +PCC MAE | Original MSE | Original MAE | Confidence Interval
96 | 0.159 ± 0.001 | 0.204 ± 0.001 | 0.175 ± 0.001 | 0.215 ± 0.001 | 99%
192 | 0.208 ± 0.001 | 0.248 ± 0.001 | 0.225 ± 0.001 | 0.258 ± 0.001 | 99%
336 | 0.267 ± 0.001 | 0.291 ± 0.001 | 0.281 ± 0.001 | 0.299 ± 0.001 | 99%
720 | 0.347 ± 0.001 | 0.344 ± 0.001 | 0.358 ± 0.001 | 0.350 ± 0.001 | 99%
Model: PatchTST
Horizon | +PCC MSE | +PCC MAE | Original MSE | Original MAE | Confidence Interval
96 | 0.157 ± 0.001 | 0.205 ± 0.001 | 0.175 ± 0.001 | 0.216 ± 0.001 | 99%
192 | 0.207 ± 0.001 | 0.249 ± 0.001 | 0.221 ± 0.001 | 0.257 ± 0.002 | 99%
336 | 0.270 ± 0.001 | 0.293 ± 0.001 | 0.280 ± 0.001 | 0.298 ± 0.001 | 99%
720 | 0.345 ± 0.001 | 0.344 ± 0.001 | 0.352 ± 0.001 | 0.347 ± 0.001 | 99%
Model: SegRNN
Horizon | +PCC MSE | +PCC MAE | Original MSE | Original MAE | Confidence Interval
96 | 0.144 ± 0.000 | 0.189 ± 0.000 | 0.162 ± 0.000 | 0.200 ± 0.000 | 99%
192 | 0.189 ± 0.001 | 0.233 ± 0.001 | 0.208 ± 0.000 | 0.243 ± 0.000 | 99%
336 | 0.238 ± 0.001 | 0.273 ± 0.001 | 0.264 ± 0.000 | 0.285 ± 0.000 | 99%
720 | 0.299 ± 0.001 | 0.317 ± 0.001 | 0.347 ± 0.000 | 0.340 ± 0.000 | 99%

Appendix C. Hyperparameter Sensitivity

We evaluated the sensitivity of our method to key hyperparameters, including the hidden dimensions and dropout rate of the two submodules' networks; the results are shown in Figure A1. The results for different initializations of the SC module's differential matrices are shown in Table A3. The backbone model is SegRNN with a look-back length of 96. These results demonstrate that our method's performance is insensitive and robust across different hyperparameter settings, suggesting low hyperparameter tuning costs during training.
Figure A1. Performance of our method with different hyperparameter settings, including the hidden dimensions and dropout rates of the submodule networks. SegRNN is the backbone model, with a look-back length of 96 and prediction horizon of 96.
Table A3. MSE of forecasting results for SegRNN + PCC with different initialization of the SC submodule’s differential matrices across different lengths of forecasting series ( { 96 , 192 , 336 , 720 } ) with a consistent observation length of 96.
Setting | MSE (96) | MSE (192) | MSE (336) | MSE (720)
One | 0.144 ± 0.000 | 0.189 ± 0.001 | 0.238 ± 0.001 | 0.299 ± 0.001
Zero | 0.144 ± 0.000 | 0.190 ± 0.000 | 0.239 ± 0.001 | 0.299 ± 0.001
Zero-One | 0.145 ± 0.001 | 0.190 ± 0.001 | 0.240 ± 0.001 | 0.300 ± 0.001
One-Zero | 0.145 ± 0.001 | 0.190 ± 0.001 | 0.240 ± 0.001 | 0.302 ± 0.002

Appendix D. Detailed Settings of Backbone Models

We chose five different models as backbone models for comparison: SegRNN, iTransformer, PatchTST, DLinear, and stackLSTM. The detailed settings used for the backbone models are listed in Table A4, Table A5, Table A6, Table A7 and Table A8.
Table A4. Detailed hyperparameter settings of SegRNN.
Parameter | Value | Parameter | Value
Segment Length | 48 | Learning rate | 0.0001
RNN type | GRU | Batch size | 64
RNN layers | 1 | Training epochs | 30
Hidden size | 512 | Training patience | 10
Dropout | 0.5 | Training loss | MAE
Table A5. Detailed hyperparameter settings of iTransformer.
Parameter | Value | Parameter | Value
Encoder layers | 3 | Learning rate | 0.0001
Heads | 8 | Batch size | 32
Hidden dimensions | 512 | Training epochs | 10
Dropout rate | 0.1 | Training patience | 3
– | – | Training loss | MSE
Table A6. Detailed hyperparameter settings of PatchTST.
Parameter | Value | Parameter | Value
Patch size | 16 | Learning rate | 0.0001
Stride | 8 | Batch size (96, 192) | 32
Head (96, 336, 720) | 4 | Batch size (336, 720) | 128
Head (192) | 16 | Training epochs | 3
Hidden dimensions | 512 | Training patience | 3
Encoder layers | 2 | Training loss | MSE
Table A7. Detailed hyperparameter settings of DLinear.
Parameter | Value | Parameter | Value
Kernel size | 25 | Learning rate | 0.0001
Training epochs | 30 | Batch size | 64
Training patience | 10 | Training loss | MAE
Table A8. Detailed hyperparameter settings of stackLSTM.
Parameter | Value | Parameter | Value
Hidden size | 512 | Learning rate | 0.0001
LSTM layers | 4 | Batch size | 64
Training epochs | 30 | Training loss | MAE
Training patience | 10 | – | –

Appendix E. Implementation Details of Complex Mechanisms

In Section 4.3.4 of this manuscript, we chose three different complex mechanisms for comparison with our PCC module: SAMP, saPCC, and LIFT. Of these, SAMP and saPCC are both based on multi-head self-attention, while LIFT is based on a linear structure. For the attention-based methods, we set the number of attention heads in SAMP to 1 and that in saPCC to 8. For LIFT, we set the number of leaders to 4, the number of states to 8, and the temperature to 1.0.

References

  1. Graham, A.; Mishra, E.P. Time series analysis model to forecast rainfall for Allahabad region. J. Pharmacogn. Phytochem. 2017, 6, 1418–1421. [Google Scholar]
  2. Shivhare, N.; Rahul, A.K.; Dwivedi, S.B.; Dikshit, P.K.S. ARIMA based daily weather forecasting tool: A case study for Varanasi. Mausam 2019, 70, 133–140. [Google Scholar]
  3. Poterjoy, J. Implications of multivariate non-Gaussian data assimilation for multiscale weather prediction. Mon. Weather. Rev. 2022, 150, 1475–1493. [Google Scholar]
  4. Moreno, S.R.; dos Santos Coelho, L. Wind speed forecasting approach based on singular spectrum analysis and adaptive neuro fuzzy inference system. Renew. Energy 2018, 126, 736–754. [Google Scholar] [CrossRef]
  5. Yano, J.I.; Ziemiański, M.Z.; Cullen, M.; Termonia, P.; Onvlee, J.; Bengtsson, L.; Carrassi, A.; Davy, R.; Deluca, A.; Gray, S.L.; et al. Scientific challenges of convective-scale numerical weather prediction. Bull. Am. Meteorol. Soc. 2018, 99, 699–710. [Google Scholar]
  6. Schultz, M.G.; Betancourt, C.; Gong, B.; Kleinert, F.; Langguth, M.; Leufen, L.H.; Mozaffari, A.; Stadtler, S. Can deep learning beat numerical weather prediction? Philos. Trans. R. Soc. 2021, 379, 20200097. [Google Scholar]
  7. Wang, Y.; Wu, H.; Dong, J.; Liu, Y.; Long, M.; Wang, J. Deep time series models: A comprehensive survey and benchmark. arXiv 2024, arXiv:2407.13278. [Google Scholar]
  8. Lim, B.; Zohren, S. Time-series forecasting with deep learning: A survey. Philos. Trans. R. Soc. A 2021, 379, 20200209. [Google Scholar]
  9. Wilby, R.L.; Troni, J.; Biot, Y.; Tedd, L.; Hewitson, B.C.; Smith, D.M.; Sutton, R.T. A review of climate risk information for adaptation and development planning. Int. J. Climatol. J. R. Meteorol. Soc. 2009, 29, 1193–1215. [Google Scholar]
  10. Bauer, P.; Thorpe, A.; Brunet, G. The quiet revolution of numerical weather prediction. Nature 2015, 525, 47–55. [Google Scholar]
  11. Shen, C. A transdisciplinary review of deep learning research and its relevance for water resources scientists. Water Resour. Res. 2018, 54, 8558–8593. [Google Scholar]
  12. Kurth, T.; Subramanian, S.; Harrington, P.; Pathak, J.; Mardani, M.; Hall, D.; Miele, A.; Kashinath, K.; Anandkumar, A. Fourcastnet: Accelerating global high-resolution weather forecasting using adaptive fourier neural operators. In Proceedings of the Platform for Advanced Scientific Computing Conference, Davos, Switzerland, 26–28 June 2023; pp. 1–11. [Google Scholar]
  13. Zhu, X.; Xiong, Y.; Wu, M.; Nie, G.; Zhang, B.; Yang, Z. Weather2k: A multivariate spatio-temporal benchmark dataset for meteorological forecasting based on real-time observation data from ground weather stations. arXiv 2023, arXiv:2302.10493. [Google Scholar]
  14. Dubey, A.K.; Kumar, A.; García-Díaz, V.; Sharma, A.K.; Kanhaiya, K. Study and analysis of SARIMA and LSTM in forecasting time series data. Sustain. Energy Technol. Assessments 2021, 47, 101474. [Google Scholar] [CrossRef]
  15. Ray, S.; Das, S.S.; Mishra, P.; Al Khatib, A.M.G. Time series SARIMA modelling and forecasting of monthly rainfall and temperature in the South Asian countries. Earth Syst. Environ. 2021, 5, 531–546. [Google Scholar] [CrossRef]
  16. Hewage, P.; Behera, A.; Trovati, M.; Pereira, E.; Ghahremani, M.; Palmieri, F.; Liu, Y. Temporal convolutional neural (TCN) network for an effective weather forecasting using time-series data from the local weather station. Soft Comput. 2020, 24, 16453–16482. [Google Scholar] [CrossRef]
  17. Verdonck, T.; Baesens, B.; Óskarsdóttir, M.; vanden Broucke, S. Special issue on feature engineering editorial. Mach. Learn. 2024, 113, 3917–3928. [Google Scholar] [CrossRef]
  18. Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Accurate medium-range global weather forecasting with 3D neural networks. Nature 2023, 619, 533–538. [Google Scholar] [CrossRef]
  19. Chen, K.; Han, T.; Gong, J.; Bai, L.; Ling, F.; Luo, J.J.; Chen, X.; Ma, L.; Zhang, T.; Su, R.; et al. Fengwu: Pushing the skillful global medium-range weather forecast beyond 10 days lead. arXiv 2023, arXiv:2304.02948. [Google Scholar]
  20. Karevan, Z.; Suykens, J.A. Transductive LSTM for time-series prediction: An application to weather forecasting. Neural Netw. 2020, 125, 1–9. [Google Scholar] [CrossRef]
  21. Al Sadeque, Z.; Bui, F.M. A deep learning approach to predict weather data using cascaded LSTM network. In Proceedings of the 2020 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), London, ON, Canada, 30 August–2 September 2020; IEEE: New York, NY, USA, 2020; pp. 1–5. [Google Scholar]
  22. Dikshit, A.; Pradhan, B.; Alamri, A.M. Long lead time drought forecasting using lagged climate variables and a stacked long short-term memory model. Sci. Total Environ. 2021, 755, 142638. [Google Scholar]
  23. Yan, Z.; Lu, X.; Wu, L. Exploring the Effect of Meteorological Factors on Predicting Hourly Water Levels Based on CEEMDAN and LSTM. Water 2023, 15, 3190. [Google Scholar] [CrossRef]
  24. Wang, H. Weather temperature prediction based on LSTM and transformer. In Proceedings of the International Conference on Electronics, Electrical and Information Engineering (ICEEIE 2024), Bangkok, Thailand, 16–18 August 2024; SPIE: Bellingham, WA, USA, 2024; Volume 13445, pp. 206–214. [Google Scholar]
  25. Sezer, O.B.; Gudelek, M.U.; Ozbayoglu, A.M. Financial time series forecasting with deep learning: A systematic literature review: 2005–2019. Appl. Soft Comput. 2020, 90, 106181. [Google Scholar]
  26. Zheng, J.; Huang, M. Traffic flow forecast through time series analysis based on deep learning. IEEE Access 2020, 8, 82562–82570. [Google Scholar]
  27. Jaseena, K.; Kovoor, B.C. Deterministic weather forecasting models based on intelligent predictors: A survey. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 3393–3412. [Google Scholar]
  28. Xiao, J.; Zhou, Z. Research progress of RNN language model. In Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 27–29 June 2020; IEEE: New York, NY, USA, 2020; pp. 1285–1288. [Google Scholar]
  29. Hewamalage, H.; Bergmeir, C.; Bandara, K. Recurrent neural networks for time series forecasting: Current status and future directions. Int. J. Forecast. 2021, 37, 388–427. [Google Scholar]
  30. Saini, U.; Kumar, R.; Jain, V.; Krishnajith, M. Univariant Time Series forecasting of Agriculture load by using LSTM and GRU RNNs. In Proceedings of the 2020 IEEE Students Conference on Engineering & Systems (SCES), Prayagraj, India, 10–12 July 2020; IEEE: New York, NY, USA, 2020; pp. 1–6. [Google Scholar]
  31. Casado-Vara, R.; Martin del Rey, A.; Pérez-Palau, D.; de-la Fuente-Valentín, L.; Corchado, J.M. Web traffic time series forecasting using LSTM neural networks with distributed asynchronous training. Mathematics 2021, 9, 421. [Google Scholar] [CrossRef]
  32. Amalou, I.; Mouhni, N.; Abdali, A. Multivariate time series prediction by RNN architectures for energy consumption forecasting. Energy Rep. 2022, 8, 1084–1091. [Google Scholar]
  33. Lin, S.; Lin, W.; Wu, W.; Zhao, F.; Mo, R.; Zhang, H. SegRNN: Segment recurrent neural network for long-term time series forecasting. arXiv 2023, arXiv:2308.11200. [Google Scholar]
  34. Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; Sun, L. Transformers in time series: A survey. arXiv 2022, arXiv:2202.07125. [Google Scholar]
  35. Ren, H.; Dai, H.; Dai, Z.; Yang, M.; Leskovec, J.; Schuurmans, D.; Dai, B. Combiner: Full attention transformer with sparse computation cost. Adv. Neural Inf. Process. Syst. 2021, 34, 22470–22482. [Google Scholar]
  36. Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. arXiv 2022, arXiv:2211.14730. [Google Scholar]
  37. Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted transformers are effective for time series forecasting. arXiv 2023, arXiv:2310.06625. [Google Scholar]
  38. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430. [Google Scholar]
  39. Liu, Y.; Wu, H.; Wang, J.; Long, M. Non-stationary transformers: Exploring the stationarity in time series forecasting. Adv. Neural Inf. Process. Syst. 2022, 35, 9881–9893. [Google Scholar]
  40. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 11121–11128. [Google Scholar]
  41. Yun, K.S.; Lee, J.Y.; Timmermann, A.; Stein, K.; Stuecker, M.F.; Fyfe, J.C.; Chung, E.S. Increasing ENSO–rainfall variability due to changes in future tropical temperature–rainfall relationship. Commun. Earth Environ. 2021, 2, 43. [Google Scholar]
  42. Wilby, R.L.; Wigley, T. Precipitation predictors for downscaling: Observed and general circulation model relationships. Int. J. Climatol. J. R. Meteorol. Soc. 2000, 20, 641–661. [Google Scholar]
  43. Park, S.; Kwak, N. Analysis on the dropout effect in convolutional neural networks. In Proceedings of the Computer Vision–ACCV 2016: 13th Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; Revised Selected Papers, Part II 13. Springer: Cham, Switzerland, 2017; pp. 189–204. [Google Scholar]
  44. Ye, T.; Dong, L.; Xia, Y.; Sun, Y.; Zhu, Y.; Huang, G.; Wei, F. Differential transformer. arXiv 2024, arXiv:2410.05258. [Google Scholar]
  45. Laplante, P.A.; Cravey, R.; Dunleavy, L.P.; Antonakos, J.L.; LeRoy, R.; East, J.; Buris, N.E.; Conant, C.J.; Fryda, L.; Boyd, R.W.; et al. Comprehensive Dictionary of Electrical Engineering; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
  46. Wang, H.; Wang, Z.; Niu, Y.; Liu, Z.; Li, H.; Liao, Y.; Huang, Y.; Liu, X. An Accurate and interpretable framework for trustworthy process monitoring. IEEE Trans. Artif. Intell. 2023, 5, 2241–2252. [Google Scholar]
  47. Zhao, L.; Shen, Y. Rethinking Channel Dependence for Multivariate Time Series Forecasting: Learning from Leading Indicators. arXiv 2024, arXiv:2401.17548. [Google Scholar]
  48. Whang, S.E.; Lee, J.G. Data collection and quality challenges for deep learning. Proc. VLDB Endow. 2020, 13, 3429–3432. [Google Scholar]
  49. Zhong, R.; Jun, S.; Xu, P. Analysis and de-noise of time series data from automatic weather station using chaos-based adaptive B-spine method. In Proceedings of the 2011 International Conference on Remote Sensing, Environment and Transportation Engineering, Nanjing, China, 24–26 June 2011; IEEE: New York, NY, USA, 2011; pp. 4765–4769. [Google Scholar]
  50. Yang, R.; Hu, J.; Li, Z.; Mu, J.; Yu, T.; Xia, J.; Li, X.; Dasgupta, A.; Xiong, H. Interpretable machine learning for weather and climate prediction: A review. Atmos. Environ. 2024, 338, 120797. [Google Scholar]
Figure 1. The architecture of our PCC module, which consists of two main submodules: Multi-variants Correlation Constraint (MCC) and State Correction (SC). The Backbone denotes the original time series forecasting model that we aim to enhance, while S2B indicates conversion of the time series to the bias matrix and B2S denotes conversion of the bias matrix back to a time series. MLP denotes the Multi-Layer Perceptron, including the fully connected layer, activation function, and dropout layer, while DMLP denotes the MLP with differential representation.
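To make the data flow in Figure 1 concrete, the following PyTorch-style sketch shows how a post constraint-and-correction module of this kind could wrap an arbitrary backbone forecaster. The S2B/B2S conversions (here, subtracting and re-adding the last observed state) and the internals of the MCC and SC submodules are simplifying assumptions made for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    """Fully connected layer, activation, and dropout, as listed in the Figure 1 caption."""

    def __init__(self, dim: int, hidden: int, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class PCCWrapper(nn.Module):
    """Plug-and-play post constraint and correction around an arbitrary backbone (sketch).

    Assumptions (illustrative only): S2B subtracts the last observed value of each
    variable, B2S adds it back, MCC mixes the variable dimension of the bias matrix,
    and SC refines each predicted state; residual connections keep the module small
    and easy to attach to any (batch, look_back, vars) -> (batch, pred_len, vars) model.
    """

    def __init__(self, backbone: nn.Module, n_vars: int, hidden: int = 64):
        super().__init__()
        self.backbone = backbone
        self.mcc = MLP(n_vars, hidden)  # Multi-variants Correlation Constraint
        self.sc = MLP(n_vars, hidden)   # State Correction (a DMLP in the paper)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.backbone(x)            # raw forecast: (batch, pred_len, n_vars)
        anchor = x[:, -1:, :]           # last observed state (assumed S2B/B2S anchor)
        bias = y - anchor               # S2B: series -> bias matrix
        bias = bias + self.mcc(bias)    # constrain cross-variable correlations
        y = bias + anchor               # B2S: bias matrix -> series
        return y + self.sc(y)           # correct each predicted state
```

Because the wrapper only post-processes the backbone's output, an existing DLinear or SegRNN model can be reused without modification, and only the two small submodules add trainable parameters.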
Figure 2. Architecture of the MCC constraint module deployed in an MLP.
Figure 3. Architecture of the MLP with differential internal representations deployed in the state correction module. Here, λ1 and λ2 are the learnable matrices of the differential operation on the representation h_s. We introduce the differential operation diff(·) as a differential amplifier that enhances subtle features while suppressing irrelevant noise, which helps the MLP learn the subtle features of individual states more effectively among the large number of meteorological states.
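A minimal sketch of an MLP with a differential internal representation is shown below. The exact form of diff(·) is defined in the main text, so the two-projection construction with learnable weights λ1 and λ2 used here is only one plausible reading, offered for illustration.

```python
import torch
import torch.nn as nn


class DiffMLP(nn.Module):
    """MLP with a differential internal representation (illustrative sketch).

    Assumption: the hidden state h_s is projected twice and the projections are
    combined as lambda1 * h1 - lambda2 * h2, acting like a differential amplifier
    that keeps the features the two views disagree on and cancels common noise.
    """

    def __init__(self, dim: int, hidden: int, dropout: float = 0.1):
        super().__init__()
        self.inp = nn.Linear(dim, hidden)
        self.proj1 = nn.Linear(hidden, hidden)
        self.proj2 = nn.Linear(hidden, hidden)
        self.lambda1 = nn.Parameter(torch.ones(hidden))        # learnable diff weights
        self.lambda2 = nn.Parameter(0.5 * torch.ones(hidden))  # learnable diff weights
        self.act = nn.GELU()
        self.drop = nn.Dropout(dropout)
        self.out = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h_s = self.act(self.inp(x))   # hidden representation h_s
        h_diff = self.lambda1 * self.proj1(h_s) - self.lambda2 * self.proj2(h_s)
        return self.out(self.drop(h_diff))
```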
Figure 4. Trends of four meteorological variables from the Max-Planck-Institut dataset for the same time period. The trends of air temperature and specific humidity show obvious similarity, while the trend of wind velocity carries dense noise and fluctuations and is relatively independent. There are also some time slots with zero precipitation.
Figure 5. Fitting curves showing the performance of the SegRNN backbone model with different parameter scales with and without our PCC module. The results indicate that the proposed PCC module can help the backbone model to achieve better performance with a significantly smaller number of parameters.
Figure 6. Evolution of the training and validation losses of SegRNN with and without PCC for forecasting lengths of 96 and 720 and with the look-back length fixed at 96.
Figure 7. Heatmap visualization of the weights and input–output ratios for the MCC and SC modules within PCC. Lighter color indicates more activity. Subfigures (a,b) respectively visualize the learned weights of the MCC and SC submodules, while (c,d) show their corresponding input–output ratios. This figure demonstrates that the two submodules have different preferences for the different time steps and variables.
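The input–output ratios in panels (c,d) could, for example, be computed per time step and variable as the ratio of mean absolute output to mean absolute input of a submodule over a held-out set. The snippet below is a hedged sketch of that idea and may differ from the exact statistic used for Figure 7.

```python
import torch


@torch.no_grad()
def io_ratio(module: torch.nn.Module, inputs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Per-(time step, variable) activity map for a PCC submodule (sketch).

    `inputs` is a batch shaped (batch, steps, variables) taken from a held-out set;
    the returned (steps, variables) map is the ratio of mean |output| to mean |input|,
    one plausible reading of the 'input-output ratio' shown in Figure 7.
    """
    outputs = module(inputs)
    num = outputs.abs().mean(dim=0)   # (steps, variables)
    den = inputs.abs().mean(dim=0) + eps
    return (num / den).cpu()          # larger value = more activity at that cell
```

The resulting map can then be rendered as a heatmap (e.g., with matplotlib's imshow) to obtain a visualization in the style of Figure 7.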
Figure 8. Visualization of the differential operation matrix acting on the hidden representation of the state correction module.
Table 1. MSE and MAE of forecasting results for different backbones with and without PCC across forecasting horizons of {96, 192, 336, 720}, with a consistent look-back length of 96. Lower MSE and MAE values indicate better forecasting performance; the better results are shown in bold.
| Model | Design | 96 (MSE / MAE) | 192 (MSE / MAE) | 336 (MSE / MAE) | 720 (MSE / MAE) | Average (MSE / MAE) |
|---|---|---|---|---|---|---|
| stackLSTM | Original | 0.298 / 0.318 | 0.387 / 0.373 | 0.501 / 0.459 | 0.548 / 0.495 | 0.434 / 0.411 |
| stackLSTM | +PCC | 0.221 / 0.266 | 0.242 / 0.289 | 0.285 / 0.314 | 0.369 / 0.376 | 0.279 / 0.311 |
| DLinear | Original | 0.207 / 0.233 | 0.244 / 0.269 | 0.286 / 0.306 | 0.345 / 0.354 | 0.271 / 0.290 |
| DLinear | +PCC | 0.152 / 0.197 | 0.196 / 0.239 | 0.243 / 0.276 | 0.304 / 0.318 | 0.224 / 0.258 |
| iTransformer | Original | 0.175 / 0.215 | 0.225 / 0.258 | 0.281 / 0.299 | 0.358 / 0.350 | 0.260 / 0.281 |
| iTransformer | +PCC | 0.159 / 0.204 | 0.208 / 0.248 | 0.267 / 0.291 | 0.347 / 0.344 | 0.245 / 0.272 |
| PatchTST | Original | 0.175 / 0.216 | 0.221 / 0.257 | 0.280 / 0.298 | 0.352 / 0.347 | 0.257 / 0.280 |
| PatchTST | +PCC | 0.157 / 0.205 | 0.207 / 0.249 | 0.270 / 0.293 | 0.345 / 0.344 | 0.245 / 0.273 |
| SegRNN | Original | 0.162 / 0.200 | 0.208 / 0.243 | 0.264 / 0.285 | 0.347 / 0.340 | 0.245 / 0.267 |
| SegRNN | +PCC | 0.144 / 0.189 | 0.189 / 0.233 | 0.238 / 0.273 | 0.299 / 0.317 | 0.218 / 0.253 |
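For reference, the MSE and MAE reported in Tables 1–4 follow the standard definitions, averaged over all forecast steps and variables of the (standardized) test set; a minimal helper is sketched below.

```python
import torch


def mse_mae(pred: torch.Tensor, target: torch.Tensor) -> tuple[float, float]:
    """Standard MSE and MAE over all forecast steps and variables, as used in the tables."""
    mse = torch.mean((pred - target) ** 2).item()
    mae = torch.mean(torch.abs(pred - target)).item()
    return mse, mae
```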
Table 2. MSE of forecasting results for different backbones with and without PCC across different lengths of observed time series ({48,96,192,336,720}) with a consistent prediction horizon of 96.
| Backbone | Design | Obs. 48 | Obs. 96 | Obs. 192 | Obs. 336 | Obs. 720 |
|---|---|---|---|---|---|---|
| stackLSTM | Original | 0.326 | 0.298 | 0.326 | 0.322 | 0.349 |
| stackLSTM | +PCC | 0.236 | 0.221 | 0.213 | 0.195 | 0.228 |
| DLinear | Original | 0.230 | 0.207 | 0.190 | 0.177 | 0.168 |
| DLinear | +PCC | 0.188 | 0.152 | 0.145 | 0.143 | 0.142 |
| iTransformer | Original | 0.202 | 0.175 | 0.170 | 0.162 | 0.175 |
| iTransformer | +PCC | 0.181 | 0.159 | 0.154 | 0.151 | 0.159 |
| PatchTST | Original | 0.211 | 0.175 | 0.159 | 0.150 | 0.147 |
| PatchTST | +PCC | 0.188 | 0.157 | 0.150 | 0.146 | 0.145 |
| SegRNN | Original | 0.203 | 0.162 | 0.150 | 0.146 | 0.142 |
| SegRNN | +PCC | 0.169 | 0.144 | 0.139 | 0.138 | 0.138 |
Bold values indicate better results.
Table 3. Prediction results of key variables using DLinear as the backbone. Results are reported as MSE and MAE for predictions with and without PCC. The forecasting horizon and look-back length are both 96.
| Design | Air Temperature (MSE / MAE) | Specific Humidity (MSE / MAE) | Wind Velocity (MSE / MAE) | Precipitation (MSE / MAE) |
|---|---|---|---|---|
| Original | 0.099 / 0.223 | 0.097 / 0.212 | 0.001 / 0.021 | 0.067 / 0.055 |
| +PCC | 0.082 / 0.218 | 0.062 / 0.173 | 0.001 / 0.018 | 0.052 / 0.035 |
Bold values indicate better results.
Table 4. Performance comparison (MSE) of models with the full variable set versus the reduced variable set. The forecasting horizon and look-back length are both 96.
| Backbone | Design | Reduced Variable Set (MSE / MAE) | Full Variable Set (MSE / MAE) |
|---|---|---|---|
| SegRNN | Original | 0.219 / 0.208 | 0.162 / 0.200 |
| SegRNN | +PCC | 0.196 / 0.195 | 0.144 / 0.189 |
| DLinear | Original | 0.284 / 0.253 | 0.207 / 0.233 |
| DLinear | +PCC | 0.207 / 0.207 | 0.152 / 0.197 |
Bold values indicate the best results.
Table 5. Ablation results (MSE) of different PCC designs. The ablation experiments used SegRNN as the backbone model, with a look-back window of 96 and prediction horizon of 96. Here, ✓ denotes that the design is used and unchanged, w/o means that it is removed, and ‘Invert’ means that the order of the two modules is inverted. ‘FC-only’ means that the design was replaced with one fully connected layer, while ‘Diff-Rep’ denotes differential representation.
| Design | Operation | 96 | 192 | 336 | 720 |
|---|---|---|---|---|---|
| MCC | w/o | 0.149 | 0.193 | 0.241 | 0.302 |
| MCC | FC-only | 0.145 | 0.190 | 0.239 | 0.301 |
| SC | w/o | 0.146 | 0.196 | 0.255 | 0.332 |
| SC | FC-only | 0.148 | 0.192 | 0.245 | 0.312 |
| Diff-Rep | w/o | 0.146 | 0.192 | 0.243 | 0.307 |
| PCC | w/o | 0.162 | 0.208 | 0.264 | 0.340 |
| PCC | Invert | 0.147 | 0.192 | 0.244 | 0.309 |
| PCC | ✓ | 0.144 | 0.189 | 0.238 | 0.299 |
Bold values indicate the best results.
Table 6. Comparison of training time and maximum memory consumption of the SegRNN backbone with and without PCC. The look-back length is 96, while the prediction horizons are 96 and 720. We set the number of training epochs to 30 and the patience to 10.
| Design | MSE (96) | MSE (720) | Time, s (96) | Time, s (720) | Memory, MB (96) | Memory, MB (720) |
|---|---|---|---|---|---|---|
| Original | 0.162 | 0.347 | 176 | 227 | 204.55 | 865.39 |
| +PCC | 0.144 | 0.299 | 214 | 219 | 204.47 | 868.00 |
Table 7. Performance comparison between PCC and more complex mechanisms in terms of MSE, inference time, and number of parameters. The backbone is SegRNN with a look-back length of 96 and prediction horizons of 96 and 720.
| Design | MSE (96) | MSE (720) | Time, ms (96) | Time, ms (720) | Parameters, k (96) | Parameters, k (720) |
|---|---|---|---|---|---|---|
| SAMP + SC | 0.152 | 0.304 | 0.325 | 0.401 | 86.5 | 246.3 |
| LIFT + SC | 0.149 | 0.303 | 0.348 | 0.455 | 1694.7 | 2738.2 |
| saPCC | 0.147 | 0.301 | 0.311 | 0.411 | 112.5 | 3173.9 |
| PCC | 0.144 | 0.299 | 0.297 | 0.375 | 35.6 | 195.4 |
Bold values indicate the best performance.
Table 8. Results of the PCC module under different noise conditions. The backbone is SegRNN with a look-back length of 96 and prediction horizons of 96 and 720.
Noisy Input:
| Design | 0.1 (MSE / MAE) | 0.3 (MSE / MAE) |
|---|---|---|
| SegRNN | 0.167 / 0.211 | 0.176 / 0.228 |
| SegRNN + PCC | 0.149 / 0.198 | 0.159 / 0.210 |

Noisy Sequence:
| Design | 0.1 (MSE / MAE) | 0.3 (MSE / MAE) |
|---|---|---|
| SegRNN | 0.178 / 0.240 | 0.267 / 0.357 |
| SegRNN + PCC | 0.161 / 0.228 | 0.250 / 0.346 |
Bold values indicate the better results.
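The noise conditions of Table 8 can be reproduced in spirit by perturbing either the look-back window ("Noisy Input") or the entire series ("Noisy Sequence"). The sketch below assumes zero-mean Gaussian noise whose standard deviation equals the reported noise level (0.1 or 0.3) on standardized data; the paper's exact protocol may differ.

```python
import torch


def add_gaussian_noise(series: torch.Tensor, level: float, where: str = "input",
                       look_back: int = 96) -> torch.Tensor:
    """Perturbation used to probe robustness (illustrative sketch).

    Assumption: `level` is the std of zero-mean Gaussian noise added to standardized
    data shaped (batch, time, variables); where='input' perturbs only the look-back
    window, while any other value perturbs the whole series.
    """
    noisy = series.clone()
    if where == "input":
        noisy[:, :look_back, :] += level * torch.randn_like(noisy[:, :look_back, :])
    else:
        noisy += level * torch.randn_like(noisy)
    return noisy
```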
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
