Article

Enhanced Precipitation Nowcasting via Temporal Correlation Attention Mechanism and Innovative Jump Connection Strategy

by Wenbin Yu 1,2,3, Daoyong Fu 1, Chengjun Zhang 2,3,4,*, Yadang Chen 4, Alex X. Liu 5 and Jingjing An 6

1 School of Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
2 Nanjing University of Information Science and Technology, Wuxi Institute of Technology, Wuxi 214000, China
3 Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China
4 School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
5 The Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
6 Huaihe River Basin Meteorological Center, Hefei 230031, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(20), 3757; https://doi.org/10.3390/rs16203757
Submission received: 25 August 2024 / Revised: 7 October 2024 / Accepted: 7 October 2024 / Published: 10 October 2024

Abstract: This study advances the precision and efficiency of precipitation nowcasting, particularly under extreme weather conditions. Traditional forecasting methods struggle with precision, spatial feature generalization, and recognizing long-range spatial correlations, challenges that intensify during extreme weather events. The Enhanced Temporal Correlation Jump Prediction Network (ETCJ-PredNet) introduces a novel attention mechanism that optimally leverages spatiotemporal data correlations. This model scrutinizes and encodes information from previous frames, enhancing predictions of high-intensity radar echoes. Additionally, ETCJ-PredNet addresses the issue of gradient vanishing through an innovative jump connection strategy. Comparative experiments on the Moving Modified National Institute of Standards and Technology (Moving-MNIST) and Hong Kong Observatory Dataset Number 7 (HKO-7) validate that ETCJ-PredNet outperforms existing models, particularly under extreme precipitation conditions. Detailed evaluations using Critical Success Index (CSI), Heidke Skill Score (HSS), Probability of Detection (POD), and False Alarm Ratio (FAR) across various rainfall intensities further underscore its superior predictive capabilities, especially as rainfall intensity exceeds 30 dBZ, 40 dBZ, and 50 dBZ. These results confirm ETCJ-PredNet’s robustness and utility in real-time extreme weather forecasting.

1. Introduction

Precipitation nowcasting, which predicts imminent weather changes within a few hours, is a crucial field in meteorological research. These short-term forecasts are immensely significant in various fields such as disaster management, agriculture, and urban planning. As the world faces increasing extreme weather events, accurate nowcasts are crucial for essential disaster prevention strategies, helping to mitigate the socioeconomic impacts. The field has seen significant progress in recent years, driven by advances in computational power and the emergence of big data. The shift towards data-driven numerical weather forecasting and machine learning has greatly improved the accuracy and utility of these forecasts.
Although many current forecasting systems rely on numerical models, their short-term predictive capabilities are often limited by the spin-up delay [1,2]. Doppler radar plays a crucial role in forecasting imminent precipitation, with systems such as the McGill Algorithm for Precipitation Nowcasting by Lagrangian Extrapolation (MAPLE) [3] and the Short-range Warning of Intense Rainstorms in Localized Systems (SWIRLS) [4] undergoing rapid development. Popular techniques involve centroid tracking for storms and the Tracking Radar Echoes by Correlation (TREC) [5], which determines the direction of movement by comparing echo time correlation coefficients across a region [6]. Constrained Tracking Radar Echoes by Correlation (COTREC) and the Difference Image-based Tracking Radar Echo by Correlations (DITREC) [7,8,9] derive from TREC. However, these approaches depend on short-term data and assume linear echo evolution, which does not fully utilize historical radar data, leading to limitations in timeliness. With advances in deep learning, focusing on precipitation nowcasting using these technologies has become a priority. Originally, optical flow methods were used to predict motion in radar sequences, but they struggled to extract features from echo images, particularly during high-intensity events. Recently, recurrent neural networks (RNNs) [10], such as Long Short-Term Memory networks (LSTMs) and convolutional LSTM networks (ConvLSTMs) [11,12,13,14], have demonstrated outstanding performance. ConvLSTM [15,16] excels in capturing spatiotemporal features by merging convolutions with LSTM gating mechanisms. Trajectory Gated Recurrent Unit (TrajGRU) [17] uses recurrent connections to handle positional changes, enhancing its ability to detect spatiotemporal correlations. Predictive Recurrent Neural Network (PredRNN++) [18,19] tackles the issue of vanishing gradients in deep recurrent networks, thus improving the accuracy of spatiotemporal sequence predictions. Despite advancements, existing deep learning models still encounter challenges when processing complex radar image sequences. These models often struggle to capture global spatial dependencies and motion characteristics in radar images, especially when predicting moderate to heavy rainfall events. Additionally, current models have limitations in effectively modeling spatial features of radar images. Radar image sequences contain substantial redundant information, and treating all input features with equal importance adds strain during model training, leading to poor generalization of spatial features. These challenges limit the effectiveness of deep learning models in precipitation nowcasting, particularly in predicting rare moderate to heavy rainfall events.
Recent advances include PredRNN-v2 [20], which features memory cells with independent transitions that counteract gradient vanishing in spatiotemporal predictive learning. This model uses cross-layer zigzag memory flow and decouples memory loss to minimize redundancy and enhance long-term dynamic capture. A new training approach [21] utilizes dual loss and regularization, adjusting model parameters by comparing real and simulated radar data to improve predictive accuracy. The Channel Aligned Robust Dual (CARD) model [22] overcomes the intrinsic limitations of Transformers in time series analysis by using a dual structure to detect temporal and dynamic interdependencies in multivariate data, integrating a strong uncertainty-weighted loss function. Crossformer [23] excels at identifying cross-dimensional dependencies in multivariate time series forecasts. Studies on Transformer attention mechanisms [24,25,26,27,28,29] have improved temporal forecasting. A notable addition to the field is the Adversarial Error Correction Generative Adversarial Network (AEC-GAN): Adversarial Error Correction GANs for Auto-regressive Long Time-Series Generation [30], which introduces an adversarial error correction mechanism to enhance the generation of long time-series data. This model addresses the propagation of cumulative errors in auto-regressive systems and uses adversarial data augmentation to improve the robustness and quality of generated sequences, setting a new standard in the generation of complex time-series data. Many models based on Generative Adversarial Networks (GANs) [31,32,33,34,35,36], such as NowcastNet [37], uniquely combine deterministic and generative approaches to predict extreme precipitation with high resolution and localized specificity. These models utilize the generative capabilities of GANs to simulate realistic precipitation scenarios, even in conditions of data scarcity or high uncertainty, thereby enhancing the accuracy of extreme weather event predictions. Moreover, by integrating physical models [38,39] with generative adversarial networks, NowcastNet improves its capability to capture complex meteorological phenomena. PanGu [40] specializes in global medium- to long-term forecasts using a 3D Earth-specific Transformer architecture and hierarchical time aggregation, enhancing accuracy while lowering computational demands, surpassing previous methods.
Existing deep learning models, such as PredRNN-v2 and NowcastNet, have advanced precipitation forecasting but often face challenges in capturing short-term fluctuations in local weather dynamics, especially when predicting decay and growth patterns essential for accurate nowcasting. To overcome these limitations, this study introduces an innovative deep learning model named ETCJ-PredNet for precipitation nowcasting. Unlike existing models, ETCJ-PredNet incorporates a novel time-correlation attention mechanism and a jump connection strategy, jointly enhancing both temporal and spatial forecasting capabilities. The time-correlation attention mechanism utilizes historical data more effectively than conventional approaches in models like PredRNN-v2, which focus primarily on mitigating gradient vanishing without explicitly modeling the complexities of meteorological changes. Additionally, while the AEC-GAN model’s approach to correcting errors in time-series generation is innovative, it does not adequately address the specific challenges of predicting high-intensity, short-duration precipitation events as effectively as our model. The jump connection strategy in ETCJ-PredNet specifically addresses the gradient vanishing problem common in deep sequential models and ensures consistent learning across longer sequences, representing a significant improvement over the dual structure of the CARD model, which emphasizes multivariate interdependencies but does not focus on the specific challenges of precipitation nowcasting. This strategic integration enables ETCJ-PredNet to excel in forecasting high-intensity radar echoes, making it particularly effective for addressing the unique challenges of extreme precipitation forecasting. Building on these advances, ETCJ-PredNet combines the strengths of advanced deep learning architectures to deliver unparalleled accuracy and reliability in nowcasting severe weather events, particularly in scenarios involving intense radar echoes.

2. Related Work

2.1. PredRNN Network

Extensive research into deep learning technologies has propelled the PredRNN series to significant success in short-term forecasting. Initially, Wang et al. developed the PredRNN, an end-to-end recurrent neural network that models both spatial and temporal dynamics, showing excellent results in meteorological forecasts. Later versions like PredRNN++ and PredRNN-v2 have enhanced features aimed at solving key challenges in spatiotemporal prediction, especially the issues related to deep temporal structures and gradient vanishing. PredRNN-v2 introduces the spatiotemporal LSTM (ST-LSTM) module, a novel feature that interacts with the unidirectional memory states of the original LSTM. ST-LSTM is designed to detect changes in both short-term spatial details and long-term dynamics. Its spatiotemporal memory units help the network learn complex transitions in consecutive frames and maintain long-term coherence with rapid responses to short-term dynamics, thanks to the LSTM’s temporal memory units. To address gradient vanishing and enhance long-term dependency capture, PredRNN-v2 features a dual-flow memory mechanism that merges original and new memory units, defining the ST-LSTM architecture. ST-LSTM includes two memory types: one that transitions within each unit from one time step to the next, and a spatiotemporal memory that moves vertically to the next unit at the same time step. Distinct gating mechanisms control these memory states, with the final hidden state resulting from their combination. The dual-memory setup in ST-LSTM units supports complex short-term dynamic modeling and offers a shorter gradient path, aiding in learning long-term dependencies.
By leveraging its unique ST-LSTM design and dual-memory mechanism, PredRNN effectively addresses the challenges of temporal and spatial variations in spatiotemporal sequence prediction. This architecture enhances the model’s ability to capture rapid spatial details in the short term while tracking long-term dynamics, thereby significantly improving accuracy and responsiveness in predicting complex spatiotemporal data sequences. Additionally, the built-in spatiotemporal memory flow vertically transmits memory states from bottom to top layers at each timestep, optimized by distinct gating mechanisms, thereby enhancing predictive performance. The model’s architecture is depicted in Figure 1. This advancement showcases the substantial potential of deep learning in analyzing and predicting complex data patterns, particularly demonstrating outstanding performance in tasks such as complex weather forecasting and video analysis.
While PredRNN-v2 showcases innovative and efficient spatiotemporal sequence prediction, it also faces significant challenges, notably gradient vanishing and difficulties in capturing long-term dependencies. Gradient Vanishing Issue: Despite its ST-LSTM structure designed to counter gradient vanishing, PredRNN-v2’s deep temporal architecture may still experience gradient vanishing or explosion. This common problem in deep recurrent networks can lead to training instability and optimization challenges. Although deep architectures offer benefits, they also add complexity and uncertainty to the training process. Difficulty in Capturing Long-term Dependencies: Despite its dual-memory mechanism, PredRNN-v2 struggles to ensure that these two memory units work effectively together to capture long-term dependencies in practical settings. In this study, the integration of jump connections and a novel temporal encoding attention mechanism has not only mitigated PredRNN-v2’s issues with long-term dependencies and gradients but also greatly improved precipitation nowcasting accuracy and efficiency. These enhancements allow the model to perform more robustly and stably in complex spatiotemporal sequence prediction scenarios.

2.2. Scaled Dot-Product Attention (SDPA)

The SDPA mechanism is a crucial building block, first introduced as a core component of the Transformer model in the landmark paper “Attention Is All You Need”. This mechanism calculates attention weights that assign relative importance or focus to each element in a sequence. Central to the SDPA mechanism is the computation of dot products between the query and all keys, with a scaling factor applied to adjust the size of these dot products. The scaling factor, calculated as the square root of the key vectors’ dimension $d_k$, helps prevent excessively large dot products in higher dimensions. After scaling the dot products, a softmax function performs a nonlinear transformation to produce a weight distribution. This distribution indicates the relative importance of each key’s value in the output. These weights are then used to compute the output by weighting the corresponding values. The mathematical expression for this process is outlined as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
Here, $Q$, $K$, and $V$ represent the query, key, and value matrices, respectively, and $d_k$ indicates the dimension of the key vectors. The SDPA mechanism excels at processing sequential data, especially in contexts that require grasping long-range dependencies. Its main advantage is its ability to access global information at each point in the sequence, enhancing the overall understanding of the contextual relationships within the data.
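For concreteness, the following is a minimal NumPy sketch of the SDPA computation in the equation above; the function name and the assumed two-dimensional shapes of $Q$, $K$, and $V$ are illustrative rather than tied to any particular implementation.

```python
# Minimal sketch of scaled dot-product attention.
# Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    # Dot products between queries and keys, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis gives the attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values
    return weights @ V
```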

2.3. Jump Connection Strategy

The Jump Connection Strategy is a key architectural feature in neural networks, first popularized by Deep Residual Networks (ResNet). This strategy aims to solve the persistent issues of gradient vanishing and explosion during deep neural network training. Fundamentally, the jump connection strategy uses ‘shortcut’ connections that allow information to bypass one or more layers directly within the network. With these jump connections, a layer’s output depends not only on the previous layer but also gains from direct links to earlier layers. This design improves gradient flow during backpropagation, helping to mitigate the common problem of gradient vanishing in deep networks. A crucial aspect of jump connections is their ability to allow the network to learn identity mappings; this lets the network preserve the input of some layers unchanged, avoiding complex transformations at each layer. This approach is effective at maintaining information flow and reducing the computational complexity involved in training.
In ResNet, jump connections usually involve adding the input directly to a layer’s output. For example, in a basic residual block, the output is given by F(x) + x, where x is the input and F(x) is the output from the block’s other layers. This design helps prevent performance degradation as the network depth increases and can even improve performance. The jump connection strategy is effective for building deeper, yet trainable neural networks, excelling at processing complex spatiotemporal data and enhancing long-term memory capabilities.
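To make the F(x) + x formulation concrete, the sketch below shows a minimal convolutional residual block in PyTorch; the layer sizes, kernel choices, and activation are illustrative assumptions rather than details taken from ResNet or from this paper.

```python
# Illustrative residual ("jump") block: the input bypasses the convolutional
# layers and is added back to their output, shortening the gradient path.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Output is F(x) + x, followed by a nonlinearity
        return torch.relu(self.body(x) + x)
```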

3. Materials and Methods

3.1. Application of Jump Connection Strategy in the PredRNN Model

PredRNN is a deep learning model designed specifically for spatiotemporal sequence data. It captures complex spatiotemporal dependencies by using multiple layers of ST-LSTM units. Each ST-LSTM unit processes spatial and temporal information at its layer level. This architecture makes PredRNN ideal for tasks requiring long-term memory and temporal sensitivity, like weather forecasting. The ST-LSTM unit, an enhancement over traditional LSTM, is central to PredRNN and optimizes the processing of spatial and temporal information. Each ST-LSTM unit includes two distinct memory streams: spatial memory flow and temporal memory flow, each capturing dynamic changes in space and time, respectively. The process includes the following steps: (1) Cyclic Unit Update: At each time step t and for each layer l, the cyclic unit’s state update is described as follows:
$$H_t^l = f\!\left(W_x^l \cdot X_t + W_h^l \cdot H_{t-1}^l + b^l\right)$$
where $H_t^l$ is the hidden state at layer $l$ and time $t$, $X_t$ is the input, $W_x^l$ and $W_h^l$ are the weight matrices, $b^l$ is the bias, and $f$ represents the activation function. (2) Capturing Temporal Dependencies: Temporal dependencies are captured as the model transmits hidden states from one time step to the next (i.e., from $H_{t-1}^l$ to $H_t^l$). This dual-stream approach enhances the model’s ability to thoroughly analyze and forecast changes in spatiotemporal sequences.
Figure 2 illustrates the application and impact of various jump connection strategies in the ST-LSTM model. The figure presents states with jump lengths of 1, 2, 3, and 4. The leftmost diagram depicts the ST-LSTM structure with a jump length of 1, where information is passed sequentially between layers. Jump connections with single, double, and triple jumps from lower ST-LSTM units to higher ones are introduced. The final diagram shows the structure with the maximum jump length, where information is transmitted directly from the lowest to the highest layer. This structure is more efficient at handling complex spatiotemporal dependencies. We will further explore the impact of these jump connection strategies on model performance and compare the results across varying jump lengths.
While PredRNN is adept at capturing spatiotemporal dependencies, its design introduces several inherent challenges. Its deep structure and complex inter-layer relationships can lead to overly long memory chains. This complexity can complicate gradient propagation and increase the risk of gradient vanishing, especially in tasks involving long sequences. Additionally, the complex ST-LSTM design might cause information loss between layers, impacting the model’s long-term dependency capabilities. To address these issues, we have integrated the jump connection strategy into PredRNN. Jump connections create direct links between ST-LSTM layers, allowing information to bypass intermediate layers. This architectural modification provides several benefits: Shortening Memory Chains: Jump connections decrease the number of layers information must traverse, effectively shortening the memory chain. Enhancing Long-term Dependency Capture: The model can better retain key historical data for long sequences, enhancing prediction accuracy. Alleviating Gradient Vanishing: Jump connections provide shortcuts that help reduce gradient vanishing, thus stabilizing training. Improving Information Flow Efficiency: Jump connections improve how information flows within the network, reducing loss and aiding in capturing long-term dependencies. Enhancing Model Performance: Improved information flow and reduced gradient vanishing from jump connections boost PredRNN’s performance in long-term prediction tasks. The revised formula for the cyclic unit incorporating jump connections is as follows:
$$H_t^l = f\!\left(W_x^l \cdot X_t + W_h^l \cdot H_{t-1}^l + W_s^l \cdot H_{t-2}^{l-2} + b^l\right)$$
In this formula, $W_s^l \cdot H_{t-2}^{l-2}$ denotes the jump connection from layer $l-2$, two layers below, directly to layer $l$, with $W_s^l$ as the corresponding weight matrix.
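As one concrete reading of the two update equations above (an illustrative sketch, not the authors’ implementation), the PyTorch cell below sums an input term, a previous-hidden-state term, and an optional jump term arriving from an earlier layer; convolutions stand in for the weight matrices $W_x^l$, $W_h^l$, and $W_s^l$.

```python
import torch
import torch.nn as nn

class JumpRecurrentCell(nn.Module):
    def __init__(self, in_channels, hidden_channels):
        super().__init__()
        self.w_x = nn.Conv2d(in_channels, hidden_channels, 3, padding=1)
        self.w_h = nn.Conv2d(hidden_channels, hidden_channels, 3, padding=1)
        self.w_s = nn.Conv2d(hidden_channels, hidden_channels, 3, padding=1)

    def forward(self, x_t, h_prev, h_jump=None):
        # Standard recurrent term: current input plus previous hidden state
        pre_activation = self.w_x(x_t) + self.w_h(h_prev)
        # Jump term: a hidden state delivered directly from an earlier layer,
        # shortening the memory chain (skipped when no jump source exists)
        if h_jump is not None:
            pre_activation = pre_activation + self.w_s(h_jump)
        return torch.tanh(pre_activation)
```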
Figure 3 illustrates the structural differences between the original PredRNN architecture and the enhanced version featuring jump connections. The left diagram shows the standard time memory flow without jump connections, where information is passed sequentially from one ST-LSTM unit to the next. The right diagram demonstrates the improved architecture, where jump connections are introduced between different ST-LSTM layers. These connections shorten the information transmission path, mitigate the risk of gradient vanishing, enhance information flow, and improve the model’s capacity to capture long-term dependencies. This structural enhancement not only increases the model’s stability and efficiency but also boosts PredRNN’s overall performance in complex spatiotemporal sequence prediction tasks.

3.2. Temporal Correlation Attention

To overcome current models’ limitations in spatial feature generalization and extreme weather prediction, we propose a new research approach supported by a time-correlated attention mechanism. This method integrates information from previous and following frames to fully account for temporal and spatial dependencies, enhancing short-term meteorological prediction. This approach helps the model generate more precise predictions of future images from recent frames, particularly during sudden changes. Although the SDPA mechanism by Vaswani et al., from “Attention Is All You Need”, was innovative, it has struggled in meteorological research due to complex spatiotemporal traits of weather data. In response, we introduce a new research method using a time-correlated attention mechanism that merges data from previous and following frames, deeply analyzing both temporal and spatial dependencies. This method allows the model to use recent frames for predicting future images during high-intensity radar echoes, improving its predictive accuracy for such challenging weather conditions. The fundamental formulation is as follows:
$$K_{\mathrm{temporal}} = W_{\mathrm{temporal}} \cdot H_{\mathrm{past}}$$
$$V_{\mathrm{temporal}} = W_{\mathrm{temporal}} \cdot H_{\mathrm{past}}$$
$$\mathrm{Attention}_{\mathrm{temporal}}\!\left(Q, K_{\mathrm{temporal}}, V_{\mathrm{temporal}}\right) = \mathrm{softmax}\!\left(\frac{\left(W_q Q + W_q K_{\mathrm{temporal}}[:, 1:]\right) K_{\mathrm{temporal}}^{T}}{\sqrt{d_k}}\right) V_{\mathrm{temporal}}$$
$W_{\mathrm{temporal}}$ is the weight matrix for temporal encoding, and $H_{\mathrm{past}}$ represents data from past frames. Temporal encoding involves applying the linear transformation $W_{\mathrm{temporal}}$ to $H_{\mathrm{past}}$, incorporating historical temporal dependencies into the model. $W_q$ represents the linear transformation applied to both the query $Q$ and the historical key $K_{\mathrm{temporal}}$. This step increases the query’s sensitivity to both current and historical data. The basic architecture is shown in the diagram below:
Figure 4a shows the enhanced architecture that incorporates the Temporal Correlation Attention mechanism. This architecture builds on the original PredRNN model by adding a Temporal Correlation Attention module. It analyzes and encodes information from past and adjacent frames, improving its ability to accurately predict short-term meteorological changes.
Figure 4b compares the traditional SDPA architecture with the improved Temporal Correlation Attention architecture. The traditional SDPA architecture has difficulty managing the complex spatiotemporal traits of meteorological data. Conversely, the improved Temporal Correlation Attention mechanism effectively addresses both temporal and spatial dependencies. With features like temporal encoding and query enhancement, it better integrates past and present data, enabling more precise predictions during rapidly changing weather conditions.
This architecture demonstrates how the Temporal Correlation Attention mechanism innovatively improves extreme weather forecasting accuracy.
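As a hedged sketch of how the temporal correlation attention formulation in this section could be realized, the PyTorch module below follows our reading of the equations rather than the released model: all features are assumed to share one dimension d, and because the slice of K_temporal used to enhance the query is ambiguous in the formulation above, the most recent temporal key is used here as an assumption.

```python
import torch.nn as nn
import torch.nn.functional as F

class TemporalCorrelationAttention(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w_temporal = nn.Linear(d, d, bias=False)  # temporal encoding of past frames
        self.w_q = nn.Linear(d, d, bias=False)         # shared query/key enhancement
        self.d = d

    def forward(self, query, h_past):
        # query:  (batch, 1, d)      features of the current step
        # h_past: (batch, n_past, d) hidden states of previous frames
        k = self.w_temporal(h_past)                    # temporal keys
        v = self.w_temporal(h_past)                    # temporal values share the encoding
        # Enhance the query with the most recent temporal key (indexing assumption)
        q = self.w_q(query) + self.w_q(k[:, -1:, :])
        scores = q @ k.transpose(-2, -1) / self.d ** 0.5
        weights = F.softmax(scores, dim=-1)
        return weights @ v                             # attended temporal context
```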

4. Experiments

4.1. Dataset

This study employs the Moving-MNIST and HKO-7 datasets. The Moving-MNIST dataset is extensively used in computer vision and machine learning, especially for video prediction and sequence generation tasks. Originating from the MNIST handwritten digit dataset, this dataset includes digits moving randomly across sequential frames. Typically, each sequence contains 20 frames, with every frame featuring a 64 × 64 pixel image of two moving handwritten digits. Although the positions and directions of the digits vary randomly, they follow physical principles like uniform linear motion. This dataset tests a model’s ability to handle time-series data and predict motion patterns in videos or dynamic images. A major challenge with the Moving-MNIST dataset is predicting digit movement in future frames, necessitating an understanding of the digits’ trajectory and motion trends. As a result, this dataset is widely used in video sequence prediction, dynamic image analysis, and similar fields. The Moving-MNIST dataset acts as a valuable benchmark for simulating real-world scenarios like radar echoes, traffic flow, and human motion. Additionally, the Moving-MNIST dataset requires no extra labeling or processing for use in training and testing spatiotemporal models.
The HKO-7 dataset, compiled by the Hong Kong Observatory, includes Doppler radar echo data from 2009 to 2015, making it a crucial resource for short-term precipitation forecast research. It consists of radar images with a 480 × 480 pixel resolution from 2 km high, covering a 512 km × 512 km area around Hong Kong. Radar data are collected every 6 min, amounting to 240 image frames daily. To improve quality and usability, radar reflectivity is converted to pixel values and noise reduction techniques are used to minimize interference from ground and sea clutter, as well as anomalous propagation. Additionally, the data are converted from three-dimensional polar to Cartesian coordinates, which simplifies model training. With its high temporal and spatial resolution, the HKO-7 dataset is ideal for developing and refining short-term forecast models, improving the accuracy and efficiency of precipitation predictions.
To ensure robust training and evaluation, the dataset was divided into 70% for training, 10% for validation, and 20% for testing. This setup allows the model to learn from a substantial portion of data while allocating smaller portions for validation and testing. The training set helps in fitting model parameters, the validation set is used for tuning hyperparameters and preventing overfitting, and the testing set assesses final performance. The data splitting ratios are crucial. A 70% training set provides ample data for learning, 10% for validation aids in tuning and monitoring, and 20% for testing offers an unbiased performance evaluation. Changing these ratios—such as increasing the training data—can enhance learning but may reduce the amount of validation and testing data, thus increasing the risk of overfitting. Conversely, allocating more data to validation or testing can better gauge generalization but might limit the amount of training data. Evaluating the impact of different data splits helps maintain the model’s effectiveness and adaptability across various scenarios. This underscores the importance of proper data partitioning in building reliable predictive models.
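For illustration, a minimal sketch of the 70/10/20 split described above is given below; the function name, the shuffling step, and the fixed random seed are our own assumptions rather than details of the authors’ data pipeline.

```python
import random

def split_sequences(sequences, train_frac=0.7, val_frac=0.1, seed=0):
    # Shuffle indices reproducibly, then cut into train / validation / test parts
    idx = list(range(len(sequences)))
    random.Random(seed).shuffle(idx)
    n_train = int(train_frac * len(idx))
    n_val = int(val_frac * len(idx))
    train_set = [sequences[i] for i in idx[:n_train]]
    val_set = [sequences[i] for i in idx[n_train:n_train + n_val]]
    test_set = [sequences[i] for i in idx[n_train + n_val:]]  # remaining ~20%
    return train_set, val_set, test_set
```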

4.2. Evaluation Methodology

In this study, eight evaluation metrics were employed to validate our experimental results, namely Mean Squared Error (MSE), Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), Peak Signal-to-Noise Ratio (PSNR), CSI, HSS, POD, and FAR. These metrics collectively assess the similarity and differences between the predicted and original images across various dimensions, as well as the model’s forecasting accuracy and reliability.
  • MSE: MSE measures the average of the squared differences between actual and predicted values. Its calculation formula is:
    $$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$
    where $n$ is the total number of pixels, and $Y_i$ and $\hat{Y}_i$ are the values of the $i$th pixel in the observed and predicted images, respectively. A lower MSE value indicates a smaller overall prediction error.
  • SSIM: SSIM evaluates errors by comparing the structural similarity between predicted and observed results. Its formula is:
    $$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
    where $\mu_x$ and $\mu_y$ are the mean values of images $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are their variances, and $\sigma_{xy}$ is their covariance. The value of SSIM ranges from −1 to 1, with values closer to 1 indicating greater similarity between two images.
  • PSNR: PSNR is a widely used metric for assessing image quality. It describes the ratio between the maximum possible power of a signal and the power of the destructive noise that affects its quality. The formula for PSNR is:
    $$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right)$$
    where $\mathrm{MAX}_I$ is the maximum possible pixel value of the image. A higher PSNR value indicates better image quality.
  • LPIPS: LPIPS assesses the perceptual similarity between the predicted and the actual images. It measures the perceptual differences between image patches as perceived by pretrained deep networks; because it is computed from learned network features, it has no simple closed-form expression.
  • CSI: CSI measures the proportion of correct predictions, excluding the correct negatives. It is calculated using the formula:
    $$\mathrm{CSI} = \frac{TP}{TP + FP + FN}$$
    Here, $TP$ denotes true positives, $FP$ is false positives, and $FN$ represents false negatives.
  • HSS: HSS assesses the accuracy of predictions beyond what is expected by chance. It is expressed as:
    $$\mathrm{HSS} = \frac{2 \times (TP \times TN - FP \times FN)}{(TP + FN)(FN + TN) + (TP + FP)(FP + TN)}$$
    where $TN$ stands for true negatives.
  • POD: POD focuses on the accuracy of detecting positive events and is defined as:
    $$\mathrm{POD} = \frac{TP}{TP + FN}$$
    This metric emphasizes the model’s sensitivity to detecting events correctly.
  • FAR: FAR indicates the proportion of false positives out of all positive forecasts and is calculated as:
    $$\mathrm{FAR} = \frac{FP}{TP + FP}$$
    A lower FAR value is preferable as it indicates fewer false alarms, thus enhancing the reliability of the model’s predictions.
These assessment metrics comprehensively evaluate the predictive performance of models from various aspects, including error magnitude, structural similarity, perceptual quality, statistical accuracy, and reliability in forecasting tasks. This holistic approach ensures a thorough validation of the model’s performance in practical applications.
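To illustrate how the categorical scores above are obtained in practice, the sketch below thresholds a predicted and an observed radar frame at a chosen reflectivity (30 dBZ here as an example), counts hits, false alarms, misses, and correct negatives, and evaluates CSI, HSS, POD, and FAR; the array names and threshold are placeholders, not the paper’s evaluation code.

```python
import numpy as np

def categorical_scores(pred_dbz, obs_dbz, threshold=30.0):
    pred_event = pred_dbz >= threshold
    obs_event = obs_dbz >= threshold
    tp = np.sum(pred_event & obs_event)      # hits (true positives)
    fp = np.sum(pred_event & ~obs_event)     # false alarms
    fn = np.sum(~pred_event & obs_event)     # misses
    tn = np.sum(~pred_event & ~obs_event)    # correct negatives
    eps = 1e-9                               # guards against division by zero
    csi = tp / (tp + fp + fn + eps)
    pod = tp / (tp + fn + eps)
    far = fp / (tp + fp + eps)
    hss = 2.0 * (tp * tn - fp * fn) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn) + eps
    )
    return {"CSI": csi, "HSS": hss, "POD": pod, "FAR": far}
```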

4.3. Result Comparison and Analysis

4.3.1. Comparative Experiment

This comparative experiment aimed to assess the effectiveness of the proposed jump connection strategy in improving temporal prediction models. We compared widely used models including ConvLSTM, TrajGRU, PredRNN, and PredRNN-v2, specifically integrating the jump connection strategy into PredRNN-v2, henceforth known as the Jump Connection PredRNN Network (JC-PredNet). The experiment used the first 10 frames of an image sequence from the Moving-MNIST dataset to predict the next 10 frames. We allocated 70% of the image sequences for training, 10% for validation, and 20% for testing.
We employed a range of evaluation metrics to thoroughly assess the predictive performance of different models, including MSE, LPIPS, SSIM, and PSNR. Table 1 presents the comparative results across these metrics, focusing on average predictions for the latter ten frames of the sequence. In the table, a lower MSE and LPIPS value (↓) indicates that the predicted image more closely aligns with the actual image, reflecting higher accuracy. Conversely, a higher SSIM and PSNR value (↑) denotes greater fidelity to the original image, signifying better image quality and structural similarity. Our results showcased that JC-PredNet exhibited the most favorable performance across the board. Notably, JC-PredNet achieved an MSE of 48.7, which is 2.7 points lower than that of PredRNN-v2, suggesting a substantial enhancement in prediction accuracy. Similarly, JC-PredNet recorded the highest SSIM at 0.895, surpassing PredRNN-v2 by 0.005 points, and an LPIPS of 0.060, which is marginally better than PredRNN-v2’s 0.066. These improvements highlight the efficacy of the jump connection mechanism in enhancing spatio-temporal prediction accuracy, especially in handling complex dynamics within the sequences. These detailed comparisons provide clear evidence of JC-PredNet’s superior capability in modeling and predicting sequences with intricate motion patterns and varying intensities, positioning it as a significant advancement over existing models like ConvLSTM, TrajGRU, and PredRNN variants.
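As a usage note, the frame-wise MSE, SSIM, and PSNR reported here can be computed with standard tooling, as sketched below under the assumption that scikit-image is available; LPIPS relies on a pretrained perceptual network and is therefore omitted from this snippet.

```python
from skimage.metrics import (
    mean_squared_error,
    peak_signal_noise_ratio,
    structural_similarity,
)

def frame_scores(pred, truth, data_range=255.0):
    # Compare one predicted frame against its ground truth
    return {
        "MSE": mean_squared_error(truth, pred),
        "SSIM": structural_similarity(truth, pred, data_range=data_range),
        "PSNR": peak_signal_noise_ratio(truth, pred, data_range=data_range),
    }
```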
Figure 5 displays two randomly selected examples from the test set to demonstrate their long-term prediction performance. The image from the PredRNN model, enhanced with the jump connection strategy, appears clearer. This improvement suggests that enhancing long-term memory and spatiotemporal modeling allows the model to more accurately predict future frame changes. The new model, JC-PredNet, which includes the jump connection strategy, shows enhanced prediction quality. This is particularly noticeable in the clarity of long-term predictions and trajectory accuracy.
To investigate how different jump lengths affect model performance, we studied the impact of various jump strategies. Figure 6 compares the effects of jump lengths of 2, 3, and 4. According to the figure, the optimal performance occurs with a jump length of 2.
This experiment sought to validate the effectiveness of the temporal correlation attention mechanism in boosting model performance. Building on JC-PredNet to better capture short-term abrupt changes, we analyzed the temporal correlation attention mechanism’s performance by comparing its loss-curve convergence against that of PredRNN-v2, as illustrated in Figure 7.
Figure 7 shows that ETCJ-PredNet, which includes the time-correlated attention mechanism, converges more quickly and with a steeper gradient than the original model.
We performed quantitative and qualitative analyses of the network’s performance using the real radar echo HKO-7 dataset. ETCJ-PredNet was compared against the ConvLSTM, TrajGRU, PredRNN, and PredRNN-v2 models. In this experiment, we input the first 10 image frames to predict the subsequent 20 frames, using radar echo images from the past hour to forecast the next two hours. The dataset distribution was 70% for training, 10% for validation, and 20% for testing, using the HKO-7 dataset. We set the model’s learning rate to 0.0001. To thoroughly assess the model’s performance, we used two popular evaluation metrics: SSIM and LPIPS. These metrics provided diverse perspectives on the model’s performance.
Figure 8 displays the performance change curves for each model over time steps. The SSIM and LPIPS comparisons show that ETCJ-PredNet consistently outperforms the other three models in radar image sequence prediction, with its superiority increasing over time.
To evaluate the accuracy of short-term precipitation forecasts, we utilized four metrics: CSI, HSS, POD, and FAR. The models evaluated included our proposed ETCJ-PredNet alongside widely used models like ConvLSTM, TrajGRU, PredRNN, and PredRNN-v2. Expanded assessments are now included across varying rainfall intensity thresholds, revealing a decline in performance with increasing thresholds (dBZ ≥ 30, 40, and 50), which can be attributed to the reduction in strong echo zones in radar images.
We conducted a detailed comparative analysis for each model at these thresholds. Our analysis shows a general trend where performance metrics such as CSI and HSS decrease as the dBZ threshold increases, highlighting the inherent challenges in accurately predicting more intense rainfall events, which are more unpredictable and less frequent. At dBZ ≥ 30, ETCJ-PredNet shows superior performance, achieving the highest CSI of 0.725 and HSS of 0.699, indicating more accurate and reliable predictions of rainfall occurrences.
Further, at higher rainfall intensities, dBZ ≥ 40 and dBZ ≥ 50, ETCJ-PredNet consistently outperforms the comparison models. For instance, at dBZ ≥ 50, ETCJ-PredNet records a CSI of 0.331 and an HSS of 0.353, significantly exceeding the metrics of PredRNN-v2 and other models. This superior performance is particularly evident in moderate to heavy rainfall conditions where the model’s innovative architecture—incorporating a temporal correlation attention mechanism and jump connection strategy—proves most beneficial.
The extended results from Table 2, Table 3 and Table 4 illustrate that ETCJ-PredNet not only improves overall predictive performance but also enhances accuracy in forecasting complex meteorological phenomena and precise short-term predictions of severe weather events. This comprehensive evaluation underscores the significant advantages of ETCJ-PredNet’s unique architecture, affirming its effectiveness and utility in real-time precipitation nowcasting. These findings highlight ETCJ-PredNet as a robust solution for meteorological applications, particularly valuable in scenarios requiring precise short-term predictions of severe weather events.
As shown in Figure 9 and Figure 10, we selected a set of predicted images from the test set. To better reflect changes in rainfall intensity, we used the radar echo color scale to display the prediction results and conducted a detailed qualitative comparison of the prediction performance between ETCJ-PredNet and other models, including ConvLSTM, TrajGRU, PredRNN, and PredRNN-V2. Figure 9 focuses on predictions during the precipitation growth phase, while Figure 10 illustrates the dynamic changes during the precipitation decay phase. “Input” refers to the 10 radar echo frames received by the models, while “Ground Truth” and “Prediction” correspond to the actual and predicted radar echo images for the next 20 frames. Each row from t = 12 to t = 30 shows the predicted results from different models compared to the actual radar images at various time steps.
These comparisons clearly show that as the prediction time (t) progresses, the predicted images from each model gradually lose detail and clarity, particularly in long-term prediction scenarios. For instance, ConvLSTM and TrajGRU perform adequately in short-term predictions, but their predictions become increasingly blurry over time, with significant loss of detail. This issue is particularly evident in areas of strong radar echoes, indicating that these models struggle to capture the fine structures of intense precipitation accurately.
In contrast, ETCJ-PredNet consistently maintains higher prediction accuracy across different time steps, with predicted results that are more closely aligned with the actual radar images. Especially in high-intensity precipitation areas, ETCJ-PredNet preserves more details, not only accurately predicting the contours of the radar echoes but also precisely capturing their motion trajectories. The advantage of ETCJ-PredNet becomes more apparent in the representative cases of “precipitation growth and decay”, where its prediction of strong echoes is particularly effective, with more clearly defined precipitation boundaries.
This performance advantage is largely attributed to the architectural design of ETCJ-PredNet. Its Temporal Correlation Attention Mechanism and Jump Connection Strategy effectively mitigate the accumulation of prediction errors over time that occurs in traditional models. As a result, ETCJ-PredNet demonstrates greater stability and reliability in capturing radar echo trends. Additionally, ETCJ-PredNet consistently maintains higher image clarity, providing richer details and textures compared to other models. In strong radar echo areas, in particular, ETCJ-PredNet’s predictions are more accurate, enabling better definition of precipitation boundaries.
Through this comparative analysis, it is clear that ETCJ-PredNet excels not only in short-term precipitation forecasting but also in handling complex precipitation process changes. Whether in the growth or decay phases of precipitation, ETCJ-PredNet showcases its superior predictive capabilities, making it a more reliable tool for real-time precipitation forecasting.

4.3.2. Ablation Experiment

To validate the necessity and effectiveness of each component within the ETCJ-PredNet model, we performed a thorough ablation study, specifically examining the effects of the jump connection strategy and the temporal correlation attention mechanism.
The PredRNN model effectively handles spatiotemporal sequence data but often struggles with gradient vanishing in deep network layers. To counter this, we introduced a jump connection strategy into the PredRNN framework, creating a new model variant: JC-PredRNN. This strategy allows for direct information transfer between layers, enhancing the model’s capacity to capture long-term dependencies and improving the stability and efficiency of deep network training.
Table 5 displays the ablation results for the jump connection strategy, evaluating performance metrics like CSI, HSS, POD, and FAR, with a precipitation intensity threshold of dBZ ≥ 20. The notation “w” indicates the inclusion of the jump connection strategy, while “w/o” denotes its absence. Here, “↓” signifies that lower values indicate better predictions, while “↑” implies that higher values represent better accuracy. The results show that the JC-PredRNN model, which includes the jump connection strategy, consistently outperforms the model without it across all metrics. The jump connections effectively mitigate gradient vanishing issues in deep sequence processing, thereby enhancing predictive accuracy and demonstrating superior performance in handling complex spatiotemporal data.
In our study, we compared the standard PredRNN-v2 architecture with a version enhanced by the temporal correlation attention mechanism. Table 6 presents the ablation study results, using the same precipitation intensity threshold of dBZ ≥ 20. The notation “w” indicates the inclusion of the temporal correlation attention mechanism, while “w/o” denotes its absence. The comparison clearly shows that the PredRNN-v2 model with the temporal correlation attention mechanism significantly outperforms the standard configuration.
These enhancements enable ETCJ-PredNet to perform robustly in extreme precipitation forecasting tasks, underscoring the critical roles of the jump connection strategy and temporal correlation attention mechanism in advancing spatiotemporal predictive capabilities.

5. Discussion

With the Transformer architecture gaining prominence across various fields, many weather-related studies using this technology, like CARD, Crossformer, and PanGu, have shown promising results. However, these models primarily input sequential data into the Transformer from various angles, which is not ideally suited for processing time-series data. The inherent attention mechanism of the Transformer is designed mainly for spatial data and struggles to effectively capture dynamic time series characteristics.
This paper introduces a novel model, ETCJ-PredNet, designed to enhance nowcasting accuracy for extreme precipitation events, including heavy rainfall. Compared to other Transformer-based models for weather nowcasting, our ETCJ-PredNet model offers several significant advantages:
First, the specially designed time-correlated attention mechanism in ETCJ-PredNet effectively captures the temporal dynamics of meteorological data, particularly short-term fluctuations. This mechanism considers not only current time step features but also encodes details from preceding and following frames, effectively utilizing historical information. Transformer models, which focus mainly on spatial information, often produce overly smooth nowcasts that underestimate the intensity and variability of extreme precipitation events. Our time-correlated attention mechanism innovatively addresses this critical issue. By encoding details from preceding and following frames, it captures short-term fluctuations, improving nowcasting accuracy for events like heavy rainfall. This capability allows the model to precisely capture short-term fluctuations in high-intensity radar echoes, enhancing nowcasting precision. The time-correlated attention mechanism is a targeted optimization of the SDPA, endowing ETCJ-PredNet with the ability to comprehensively model time-series data in both temporal and spatial dimensions. The model can simultaneously capture global patterns and local details in meteorological data, exhibiting dual advantages, and addressing the shortcomings of existing Transformer architectures in time-series data modeling.
Additionally, we introduced a jump connection strategy that creates feature propagation channels across layers, enhancing information flow. This strategy helps the model capture features at various scales and semantic levels, improving its capability to model complex time series patterns. ETCJ-PredNet maintains stable gradient flow in deep sequential modeling, preventing gradient vanishing or explosion. This ensures efficient training convergence and helps the model capture multi-scale spatiotemporal features, reducing training complexity and boosting generalization. The application of this strategy gives ETCJ-PredNet significant advantages in developing deep networks and analyzing complex time-series patterns, laying a solid foundation for precise extreme weather nowcasting.

6. Conclusions

This study utilized an innovative approach that combines the temporal correlation attention mechanism and jump connection strategy to enhance the accuracy and efficiency of precipitation nowcasting, particularly for real-time extreme weather event predictions. This method effectively uses time-series data by deeply analyzing and encoding historical frames and correlating previous and subsequent frames to accurately identify and predict key meteorological changes in dynamic weather conditions. Additionally, incorporating the jump connection strategy significantly boosts the model’s long-term memory and spatiotemporal modeling capabilities, addressing the extended time memory chain issues in PredRNN-v2 and enhancing prediction accuracy. Applying the model to the Moving-MNIST and HKO-7 datasets showcased its substantial performance advantages. Specifically, the model achieved optimal results in key metrics including MSE, SSIM, and LPIPS. The model demonstrated strong real-time responsiveness and precise predictive capabilities in extreme weather predictions, particularly effective at handling high-intensity precipitation events. Despite its achievements in predicting extreme weather, the model still has room for improvement in handling anomalies and noise. Future work will focus on further optimizing the model’s structure and training strategies, and incorporating additional environmental factors, which are vital for advancing this line of research. This will enhance the model’s adaptability and generalization across different meteorological scenarios, potentially increasing the accuracy and practicality of predictions and broadening its application in short-term forecasting.

Author Contributions

Conceptualization, C.Z. and W.Y.; methodology, C.Z.; software, D.F.; validation, W.Y., C.Z. and Y.C.; formal analysis, D.F.; investigation, Y.C.; resources, W.Y.; data curation, D.F.; writing—original draft preparation, C.Z.; writing—review and editing, W.Y. and J.A.; visualization, Y.C.; supervision, C.Z.; project administration, C.Z.; funding acquisition, C.Z., W.Y. and A.X.L.; additional contributions, A.X.L. and J.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of China under Grant 62071240, and the Natural Science Foundation of Jiangsu Province under Grant BK20231142.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Acknowledgments

We acknowledge the support from the Natural Science Foundation of China and the Natural Science Foundation of Jiangsu Province. We also thank the administrative and technical support provided by Nanjing University of Information Science and Technology.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Charney, J.G. Progress in dynamic meteorology. Bull. Am. Meteorol. Soc. 1950, 31, 231–236. [Google Scholar] [CrossRef]
  2. Tolstykh, M.A. Vorticity-divergence semi-Lagrangian shallow-water model of the sphere based on compact finite differences. J. Comput. Phys. 2002, 179, 180–200. [Google Scholar] [CrossRef]
  3. Turner, B.J.; Zawadzki, I.; Germann, U. Predictability of Precipitation from Continental Radar Images. Part III: Operational Nowcasting Implementation (MAPLE). J. Appl. Meteorol. 2004, 43, 231–248. [Google Scholar] [CrossRef]
  4. Li, L.; Schmid, W.; Joss, J. Nowcasting of motion and growth of precipitation with radar over a complex orography. J. Appl. Meteorol. 1995, 34, 1286–1300. [Google Scholar] [CrossRef]
  5. Rinehart, R.E.; Garvey, E.T. Three-dimensional storm motion detection by conventional weather radar. Nature 1978, 273, 287–289. [Google Scholar] [CrossRef]
  6. Li, Y.J.; Han, L. Storm tracking algorithm development based on the three-dimensional radar image data. J. Comput. Appl. 2008, 28, 1078–1080. [Google Scholar]
  7. Liang, Q.Q.; Feng, Y.R.; Deng, W.J. A composite approach of radar echo extrapolation based on TREC vectors in combination with model-predicted winds. Adv. Atmos. Sci. 2010, 27, 1119–1130. [Google Scholar] [CrossRef]
  8. Fletcher, T.D.; Andrieu, H.; Hamel, P. Understanding, management and modelling of urban hydrology and its consequences for receiving waters: A state of the art. Adv. Water Resour. 2013, 51, 261–279. [Google Scholar] [CrossRef]
  9. Zhang, Y.P.; Cheng, M.H.; Xia, W.M. Estimation of weather radar echo motion field and its application to precipitation nowcasting. Acta Meteor Sin. 2006, 64, 631–646. [Google Scholar]
  10. Medsker, L.R.; Jain, L.C. Recurrent neural networks. Des. Appl. 2001, 5, 2. [Google Scholar]
  11. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar]
  12. Zeyer, A.; Doetsch, P.; Voigtlaender, P.; Schlüter, R.; Ney, H. A comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017. [Google Scholar]
  13. Sundermeyer, M.; Schlüter, R.; Ney, H. LSTM neural networks for language modeling. In Proceedings of the Thirteenth Annual Conference of the International Speech Communication Association, Portland, OR, USA, 9–13 September 2012. [Google Scholar]
  14. Ma, S.; Han, Y. Describing images by feeding LSTM with structural words. In Proceedings of the 2016 IEEE International Conference on Multimedia and Expo (ICME), Seattle, WA, USA, 11–15 July 2016. [Google Scholar]
  15. Liu, Y.; Zheng, H.; Feng, X.; Chen, Z. Short-term traffic flow prediction with Conv-LSTM. In Proceedings of the 2017 9th International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 11–13 October 2017. [Google Scholar]
  16. Zheng, H.; Lin, F.; Feng, X.; Chen, Y. A hybrid deep learning model with attention-based conv-LSTM networks for short-term traffic flow prediction. IEEE Trans. Intell. Transp. Syst. 2020, 22, 6910–6920. [Google Scholar] [CrossRef]
  17. Shi, X.; Gao, Z.; Lausen, L.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Deep learning for precipitation nowcasting: A benchmark and a new model. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  18. Wang, Y.; Long, M.; Wang, J.; Gao, Z.; Yu, P.S. PredRNN: Recurrent neural networks for predictive learning using spatiotemporal LSTMs. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  19. Wang, Y.; Gao, Z.; Long, M. PredRNN++: Towards a Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018. [Google Scholar]
  20. Wang, Y.; Wu, H.; Zhang, J.; Gao, Z.; Wang, J.; Philip, S.Y.; Long, M. Predrnn: A recurrent neural network for spatiotemporal predictive learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2208–2225. [Google Scholar] [CrossRef] [PubMed]
  21. Ravuri, S.; Lenc, K.; Willson, M.; Kangin, D.; Lam, R.; Mirowski, P.; Arribas, A.; Clancy, E.; Robinson, N.; Mohamed, S.; et al. Skilful precipitation nowcasting using deep generative models of radar. Nature 2021, 597, 672–677. [Google Scholar] [CrossRef]
  22. Xue, W.; Zhou, T.; Wen, Q.; Gao, J.; Ding, B.; Jin, R. Make Transformer Great again for Time Series Forecasting: Channel Aligned Robust Dual Transformer. arXiv 2023, arXiv:2305.12095. [Google Scholar]
  23. Zhang, Y.; Yan, J. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In Proceedings of the Eleventh International Conference on Learning Representations, Virtual, 25 April 2022. [Google Scholar]
  24. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tao, D.; Xu, Y.; Xu, C.; Yang, Z.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110. [Google Scholar] [CrossRef]
  25. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  26. Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar]
  27. Gao, Z.; Shi, X.; Wang, H.; Zhu, Y.; Wang, Y.B.; Li, M.; Yeung, D.Y. Earthformer: Exploring space-time transformers for earth system forecasting. Adv. Neural Inf. Process. Syst. 2022, 35, 25390–25403. [Google Scholar]
  28. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430. [Google Scholar]
  29. Liang, Y.; Xia, Y.; Ke, S.; Wang, Y.; Wen, Q.; Zhang, J.; Zheng, Y.; Zimmermann, R. Airformer: Predicting nationwide air quality in China with transformers. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37. [Google Scholar]
  30. Wang, L.; Zeng, L.; Li, J. AEC-GAN: Adversarial Error Correction GANs for Auto-regressive Long Time-series Generation. Proc. Aaai Conf. Artif. Intell. 2023, 37, 10140–10148. [Google Scholar] [CrossRef]
  31. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Bengio, Y.; Courville, A. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27. [Google Scholar]
  32. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
  33. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Smolley, S.P. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2794–2802. [Google Scholar]
  34. Esteban, C.; Hyland, S.L.; Rätsch, G. Real-valued (medical) time series generation with recurrent conditional GANs. arXiv 2017, arXiv:1706.02633. [Google Scholar]
  35. Yoon, J.; Jarrett, D.; Van der Schaar, M. Time-series generative adversarial networks. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  36. Liao, S.; Ni, H.; Sabate-Vidales, M.; Szpruch, L.; Wiese, M.; Xiao, B. Sig-Wasserstein GANs for conditional time series generation. Math. Financ. 2024, 34, 622–670. [Google Scholar] [CrossRef]
  37. Zhang, Y.; Long, M.; Chen, K.; Xing, L.; Jin, R.; Jordan, M.I.; Wang, J. Skilful nowcasting of extreme precipitation with NowcastNet. Nature 2023, 619, 526–532. [Google Scholar] [CrossRef]
  38. Xie, P.; Li, X.; Ji, X.; Chen, X.; Chen, Y.; Liu, J.; Ye, Y. An energy-based generative adversarial forecaster for radar echo map extrapolation. IEEE Geosci. Remote Sens. Lett. 2020, 19, 3500505. [Google Scholar] [CrossRef]
  39. Jarrett, D.; Bica, I.; van der Schaar, M. Time-series generation by contrastive imitation. Adv. Neural Inf. Process. Syst. 2021, 34, 28968–28982. [Google Scholar]
  40. Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Accurate medium-range global weather forecasting with 3D neural networks. Nature 2023, 619, 533–538. [Google Scholar] [CrossRef]
Figure 1. Advanced schematic diagram of the PredRNN-V2 architecture [20].
Figure 2. Architectural framework diagram with jump connection strategy.
Figure 3. Left: schematic of the temporal memory flow architecture in the PredRNN model. Right: the improved temporal memory flow architecture with jump connections introduced.
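To make the jump-connection idea in Figure 3 concrete, the minimal sketch below assumes the temporal memory of a recurrent cell receives an extra, gated path from a memory state several steps back in addition to the usual one-step path. The function name, the single mixing gate, and the choice of earlier step are purely illustrative assumptions, not the exact formulation used by ETCJ-PredNet.

```python
import torch

def temporal_memory_step(c_prev, c_jump, x_t, gate_conv):
    """Illustrative temporal-memory update with an assumed jump connection.

    c_prev    : memory state from time step t-1 (standard recurrent path)
    c_jump    : memory state from an earlier step, e.g., t-2 (assumed jump path)
    x_t       : current input feature map, shape [batch, channels, H, W]
    gate_conv : a learnable layer (e.g., nn.Conv2d) producing a mixing gate
    """
    # Gate how much of the earlier memory is reinjected; the shortcut gives
    # gradients a shorter path back in time, which is the usual motivation
    # for jump connections against vanishing gradients.
    g = torch.sigmoid(gate_conv(torch.cat([x_t, c_prev], dim=1)))
    # A real spatiotemporal cell would additionally apply input/forget gates
    # as in ConvLSTM/PredRNN; only the skip path is sketched here.
    return c_prev + g * c_jump
```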
Figure 4. (a) Improved architecture incorporating temporal correlation attention mechanism. (b) Traditional SDPA architecture and proposed temporal correlation attention architecture.
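For readers comparing the two designs in Figure 4b, the baseline scaled dot-product attention (SDPA) can be summarized as below; the temporal-correlation weighting that ETCJ-PredNet adds on top of SDPA is not reproduced in this sketch.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Standard SDPA: softmax(Q K^T / sqrt(d_k)) V.

    q, k, v: tensors of shape [batch, seq_len, d_k].
    """
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5  # pairwise similarities
    weights = F.softmax(scores, dim=-1)                          # attention weights per query
    return torch.matmul(weights, v)                              # weighted sum of values
```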
Figure 5. Display of prediction results on the Moving-MNIST test dataset.
Figure 6. Comparison of MSE, LPIPS, SSIM, and PSNR metrics under different jump connection strategies.
Figure 7. Left: original training loss curves for the different models. Right: the same curves after smoothing.
Figure 8. Performance trends of various models over time steps on the HKO-7 dataset.
Figure 9. Examples of predictions on the radar echo test set, generating 20 future frames from 10 past observations, demonstrating the prediction of echo growth.
Figure 10. Examples of predictions on the radar echo test set, generating 20 future frames from 10 past observations, demonstrating the prediction of precipitation decay.
Table 1. Performance comparison on the Moving-MNIST test set.

Model              MSE ↓    SSIM ↑   LPIPS ↓   FLOPS (G)
ConvLSTM [11]      103.3    0.707    0.156     80.7
TrajGRU [17]       100.1    0.762    0.110     -
PredRNN [18]       62.9     0.878    0.063     -
PredRNN-V2 [20]    51.4     0.890    0.066     -
JC-PredNet         48.7     0.895    0.060     -
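For context on the image-quality metrics reported in Table 1 and Figure 6, the sketch below computes per-frame MSE, SSIM, and PSNR with NumPy and scikit-image; LPIPS is a learned perceptual metric that typically requires a separate pretrained network and is omitted here. Note that reported MSE values depend on convention (per-pixel mean versus per-frame sum), so absolute numbers are not directly comparable across papers.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def frame_metrics(pred, target):
    """Per-frame MSE, SSIM, and PSNR for 2D grayscale arrays scaled to [0, 1]."""
    mse = float(np.mean((pred - target) ** 2))                     # per-pixel mean squared error
    ssim = structural_similarity(target, pred, data_range=1.0)      # structural similarity
    psnr = peak_signal_noise_ratio(target, pred, data_range=1.0)    # peak signal-to-noise ratio
    return mse, ssim, psnr
```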
Table 2. Comparison of CSI, HSS, POD, and FAR metrics across five networks (dBZ ≥ 30).

Model                  CSI ↑    HSS ↑    POD ↑    FAR ↓
ConvLSTM               0.594    0.534    0.785    0.290
TrajGRU                0.603    0.547    0.790    0.281
PredRNN                0.651    0.608    0.830    0.249
PredRNN-v2             0.691    0.658    0.850    0.213
ETCJ-PredNet (ours)    0.725    0.699    0.870    0.187
Table 3. Comparison of CSI, HSS, POD, and FAR metrics across five networks (dBZ ≥ 40).

Model                  CSI ↑    HSS ↑    POD ↑    FAR ↓
ConvLSTM               0.431    0.321    0.601    0.396
TrajGRU                0.484    0.394    0.648    0.344
PredRNN                0.504    0.416    0.672    0.331
PredRNN-v2             0.543    0.464    0.695    0.287
ETCJ-PredNet (ours)    0.577    0.506    0.717    0.252
Table 4. Comparison of CSI, HSS, POD, and FAR metrics across five networks (dBZ ≥ 50).

Model                  CSI ↑    HSS ↑    POD ↑    FAR ↓
ConvLSTM               0.211    0.255    0.373    0.672
TrajGRU                0.234    0.264    0.394    0.634
PredRNN                0.248    0.279    0.416    0.618
PredRNN-v2             0.271    0.296    0.441    0.587
ETCJ-PredNet (ours)    0.331    0.353    0.514    0.518
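The skill scores in Tables 2–4 follow the standard 2 × 2 contingency-table definitions evaluated after thresholding radar reflectivity. A minimal sketch, assuming predicted and observed dBZ fields of the same shape and at least one event above the threshold, is given below.

```python
import numpy as np

def skill_scores(pred_dbz, obs_dbz, threshold):
    """CSI, HSS, POD, and FAR from a 2x2 contingency table at a dBZ threshold."""
    p = pred_dbz >= threshold
    o = obs_dbz >= threshold
    tp = np.sum(p & o)      # hits
    fp = np.sum(p & ~o)     # false alarms
    fn = np.sum(~p & o)     # misses
    tn = np.sum(~p & ~o)    # correct negatives
    csi = tp / (tp + fn + fp)
    pod = tp / (tp + fn)
    far = fp / (tp + fp)
    hss = 2 * (tp * tn - fn * fp) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))
    return csi, hss, pod, far
```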
Table 5. Ablation study results for the jump connection strategy on CSI, HSS, POD, and FAR metrics (dBZ ≥ 30).

Model                            CSI ↑    HSS ↑    POD ↑    FAR ↓
w/o Jump Connection Strategy     0.691    0.658    0.850    0.213
w/ Jump Connection Strategy      0.707    0.678    0.861    0.201
Table 6. Ablation study results for the temporal correlation attention mechanism on CSI, HSS, POD, and FAR metrics (dBZ ≥ 30).

Model                       CSI ↑    HSS ↑    POD ↑    FAR ↓
w/o Temporal Correlation    0.691    0.658    0.850    0.213
w/ Temporal Correlation     0.714    0.686    0.863    0.195
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
