DFST-GAN: A Dynamic Flow Spatio-Temporal Generative Adversarial Network for High-Quality Precipitation Nowcasting

Shi, Jiawei; Yu, Wenbin; Qian, Hongjie; Zhang, Chengjun; Zhu, Konglin; Liu, Jie; Liu, Gaoping

doi:10.3390/rs17172974

Open AccessArticle

DFST-GAN: A Dynamic Flow Spatio-Temporal Generative Adversarial Network for High-Quality Precipitation Nowcasting

by

Jiawei Shi

¹,

Wenbin Yu

^1,2,3,*

,

Hongjie Qian

¹

,

Chengjun Zhang

^1,2,3

,

Konglin Zhu

⁴,

Jie Liu

^5,6 and

Gaoping Liu

^5,6

¹

School of Software, Nanjing University of Information Science and Technology, Nanjing 210044, China

²

Wuxi Institute of Technology, Nanjing University of Information Science and Technology, Wuxi 214000, China

³

Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China

⁴

The Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA

⁵

Huaihe River Basin Meteorological Center, Hefei 230031, China

⁶

Anhui Meteorological Observatory, Hefei 230031, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(17), 2974; https://doi.org/10.3390/rs17172974

Submission received: 16 July 2025 / Revised: 23 August 2025 / Accepted: 25 August 2025 / Published: 27 August 2025

Download

Browse Figures

Versions Notes

Abstract

This paper proposes a Dynamic Flow Spatio-Temporal Generative Adversarial Network (DFST-GAN) model for high-quality precipitation nowcasting. Current spatio-temporal prediction models struggle with two key limitations: the inability to adaptively capture complex motion patterns and the tendency to generate blurry predictions over time. To address these challenges, DFST-GAN integrates a dynamic flow feature extraction mechanism with a novel specialized meteorological discriminator, enabling adaptive modeling of complex precipitation system trajectories and generating sharper, physically consistent predictions. We evaluate our approach on the HKO-7 dataset using metrics including CSI, HSS, POD, FAR and ETS. Experimental results demonstrate that DFST-GAN consistently outperforms existing methods across all evaluation metrics, with particularly notable improvements for moderate to heavy rainfall events (dBZ ≥ 50), showing a 18.8% relative improvement in CSI compared to PredRNN-V2. The ablation studies confirm that each component makes a meaningful contribution to overall performance, validating the potential of our approach for operational precipitation nowcasting applications.

Keywords:

precipitation nowcasting; generative adversarial networks; spatio-temporal sequence; deep learning

1. Introduction

Precipitation nowcasting, the task of providing extremely short-range (0–6 h) rainfall predictions for local areas, constitutes a critical component of modern meteorological services, with significant impacts on public safety, urban water management, and transportation systems [1]. The increasing frequency and intensity of extreme rainfall events due to global climate change have further elevated the importance of accurate precipitation nowcasting [2], establishing it as a central challenge for meteorological services worldwide [3].

Traditional nowcasting approaches have formed the foundation of operational precipitation forecasting for decades, with numerical weather prediction (NWP) models and radar-based extrapolation techniques achieving significant success in their respective domains. However, these established approaches face inherent limitations when applied to the specific challenges of precipitation nowcasting. NWP models, despite their physical rigor, suffer from computational constraints that limit update frequencies and spatial resolution, making them less suitable for urban-scale applications requiring sub-kilometer detail and rapid response times [4]. Additionally, the spin-up time required for NWP models often exceeds the forecast horizon for immediate nowcasting needs. Radar-based extrapolation methods, while computationally efficient, fundamentally assume motion persistence and struggle to capture the non-linear evolution processes characteristic of convective initiation, cell merging, and intensity changes that are critical for accurate precipitation nowcasting beyond one hour [5].

For very short-term nowcasting (e.g., within 1 h), traditional numerical weather prediction (NWP) methods and optical flow techniques can indeed yield excellent results due to the dominance of advective processes. However, for forecast horizons of 1–6 h where convective initiation, cell merging, and intensity evolution become critical, the limitations of linear motion assumptions and computational constraints become pronounced, necessitating more sophisticated approaches capable of modeling non-linear atmospheric dynamics and fine-scale processes.

Recent deep learning advances have shown remarkable potential in addressing these limitations through their ability to model complex non-linear spatio-temporal dynamics. Convolutional Long Short-Term Memory (ConvLSTM) [6] pioneered the integration of spatial and temporal modeling for precipitation nowcasting, demonstrating superior performance compared to traditional extrapolation methods. Subsequent developments including Predictive Recurrent Neural Network (PredRNN) [7] improved temporal dependency modeling through spatio-temporal memory flows. However, existing deep learning models face two critical limitations that restrict their operational deployment. First, conventional convolutional operations employ fixed geometric sampling patterns that cannot adaptively model the complex, non-rigid transformations characteristic of atmospheric phenomena [8]. Second, regression-based training objectives prioritize pixel-level accuracy over perceptual quality, resulting in increasingly blurry predictions over extended time horizons [9], which limits practical utility for operational meteorological services requiring sharp precipitation boundaries.

The gap between existing methods’ capabilities and operational requirements necessitates adaptive approaches that can capture complex non-linear dynamics while maintaining computational efficiency, preserve fine-grained structural details over extended forecast horizons, and incorporate meteorological domain knowledge for physically consistent predictions. To address these requirements, this paper proposes the Dynamic Flow Spatio-Temporal Generative Adversarial Network (DFST-GAN), which integrates dynamic flow-based feature extraction with a specialized meteorological discriminator. Unlike existing approaches that rely on fixed convolutional patterns, DFST-GAN employs learnable sampling locations that adaptively adjust based on underlying motion patterns, enabling accurate tracking of complex precipitation system trajectories. Furthermore, through adversarial training with a meteorological information-based discriminator incorporating domain-specific attention mechanisms, the model generates sharp, physically consistent predictions that preserve fine-grained details even for longer forecast horizons.

The performance of DFST-GAN is evaluated on the meteorological HKO-7 radar dataset, with experimental results demonstrating superior performance compared to existing spatio-temporal prediction models. The proposed methodology addresses fundamental limitations in current precipitation nowcasting approaches and provides a foundation for integrating sophisticated deep learning techniques with operational meteorological forecasting systems.

2. Related Work

2.1. Spatio-Temporal Sequence Prediction Models

Spatio-temporal sequence prediction has undergone significant evolution in recent years. Early approaches relied primarily on convolutional structures, with ConvLSTM [6] pioneering the integration of convolutional operations into recurrent frameworks to capture both spatial and temporal dynamics. Building upon this foundation, PredRNN [7] introduced a spatio-temporal LSTM unit that modeled spatial appearances and temporal variations simultaneously through additional memory transitions. Predictive Recurrent Neural Network++ (PredRNN++) [10] further enhanced the architecture with a causal LSTM component to handle gradient propagation issues better, while Memory In Memory (MIM) [11] incorporated a memory-decoupling mechanism to separate short-term and long-term temporal dependencies.

Recent advancements include Predictive Recurrent Neural Network-V2 (PredRNN-V2) [12], which implemented a decoupled memory structure with explicit constraints to mitigate memory confusion, and Eidetic 3D Long Short-Term Memory (E3D-LSTM) [13], which introduced 3D convolutions to capture spatial and temporal information jointly. Alongside these RNN-based models, transformer-based architectures such as Time-Space Transformer (TimeSformer) [14] have emerged, leveraging self-attention mechanisms for modeling long-range dependencies in spatio-temporal data. The Deformable ConvLSTM [15] approach integrates deformable convolutions into recurrent structures for weather forecasting applications, allowing for more adaptive sampling patterns that can better handle irregular motion patterns in meteorological data.

However, these approaches face fundamental limitations when applied to meteorological applications. ConvLSTM and PredRNN variants, despite their success in capturing temporal dependencies, employ fixed convolutional kernels that cannot adaptively model the non-rigid deformations characteristic of atmospheric phenomena. While transformer-based approaches like TimeSformer address long-range dependencies, they lack the ability to adaptively adjust receptive fields based on local motion characteristics, limiting their effectiveness for complex weather systems with varying motion patterns across different spatial regions.

2.2. Generative Adversarial Networks for Sequence Generation

Generative Adversarial Networks (GANs) have demonstrated remarkable capabilities in image generation tasks, with progressive advancements from the original GAN framework [16] to more sophisticated variants like conditional GANs [17] and Wasserstein GANs [18]. The application of GANs to video and sequence generation presents unique challenges, as highlighted by the Video Generative Adversarial Network (VideoGAN) [19], which separates the foreground and background generation processes.

Motion Content Network (MCNET) [20] introduced a motion and content decomposition approach, while Disentangled Representation Network (DRNET) [21] disentangled content and pose representations to enhance temporal consistency. More recent developments include Motion Content Generative Adversarial Network-High Definition (MoCoGAN-HD) [22], which employs a contrastive learning framework to generate high-definition videos, and Transformer Generative Adversarial Network (TransGAN) [23], which replaces convolutional structures with transformer architectures for improved global context modeling in the discriminator. The integration of adversarial learning with spatio-temporal modeling has shown promise in enhancing prediction sharpness, as demonstrated by Video-to-Video Synthesis (Vid2Vid) [24] and Stochastic Disentanglement Generative Adversarial Network (SDGAN) [25] for video-to-video synthesis and stochastic video generation, respectively.

Despite these advances, existing video generation GANs face significant challenges when applied to precipitation nowcasting. Most approaches employ general-purpose discriminators that lack domain-specific meteorological constraints, leading to physically implausible predictions that may violate atmospheric dynamics principles. Furthermore, traditional GANs struggle to maintain temporal consistency over extended forecast horizons, particularly for extreme weather events where physical constraints become critical.

2.3. Precipitation Nowcasting

Precipitation nowcasting, the task of short-term rainfall prediction, has seen significant methodological advancements driven by deep learning approaches. Shi et al. [6] pioneered the application of deep learning to this field with the introduction of ConvLSTM, which integrated convolutional operations with recurrent frameworks to capture spatio-temporal correlations in radar data more effectively. Building upon this foundation, Shi et al. [26] further advanced the field by proposing the Trajectory Gated Recurrent Unit (TrajGRU) model that can actively learn location-variant structures for recurrent connections, significantly improving the capability to track complex motion patterns in precipitation systems.

Domain-specific models have further refined precipitation nowcasting capabilities, with Meteorological Network (MetNet) [27] leveraging multi-resolution inputs and axial self-attention for high-resolution weather prediction. The recently proposed Earth Transformer V2 (EarthformerV2) [28] applies a cuboid attention mechanism to process 3D spatio-temporal data for environmental forecasting tasks efficiently. For radar-based precipitation nowcasting, Rain Network (RainNet) [29] demonstrated the effectiveness of U-Net architectures, while Skillful Network (Skillful-Net) [30] incorporated multi-scale analysis mechanisms to handle precipitation events at varying intensities.

Recent breakthrough developments have demonstrated the potential of physics-informed deep learning models for precipitation nowcasting. Notable examples include NowcastNet [31], which unifies physical evolution schemes with deep learning frameworks to achieve superior performance on extreme precipitation events, and DeepMind’s Deep Generative Model of Radar (DGMR) [32], which achieved state-of-the-art performance through probabilistic nowcasting with specialized attention mechanisms. These physics-embedded approaches address limitations of purely data-driven methods by incorporating atmospheric dynamics constraints and physical consistency.

While these approaches have advanced the field significantly, they exhibit common limitations that restrict operational deployment. Pure data-driven methods like ConvLSTM and TrajGRU, despite their ability to learn complex patterns, often fail to preserve physical consistency and struggle with extreme events that deviate from training distributions. Physics-informed approaches like NowcastNet and DGMR address some of these concerns but introduce computational overhead and complexity that may limit real-time applications. Moreover, most existing methods still suffer from progressive blur accumulation over extended forecast horizons, particularly beyond 2–3 h.

2.4. Hybrid NWP-Deep Learning and Enhanced Optical Flow Methods

The integration of numerical weather prediction (NWP) models with deep learning has emerged as a promising direction to leverage the complementary strengths of both paradigms. A primary focus of this integration involves post-processing ensemble forecasts using deep neural networks to correct systematic biases and improve local-scale predictions. Building on these foundational approaches, Conditional Generative Adversarial Networks (CGANs) have been employed to generate high-resolution precipitation maps from coarse NWP outputs, utilizing complete atmospheric variable sets rather than limited meteorological parameters [33]. More sophisticated developments include spectral nudging techniques that combine data-driven weather models with traditional NWP systems [34].

Traditional optical flow techniques have been significantly enhanced through deep learning integration to overcome limitations in capturing non-linear precipitation dynamics. Notably, researchers have demonstrated that combining multiple optical flow algorithms through deep learning regression processes can more accurately capture multi-spatial features compared to single-algorithm approaches [35]. Further innovations include the application of Conditional Generative Adversarial Networks with multi-temporal optical flow fields to address growth and decay processes that traditional advection-based methods struggle to predict [36]. Contemporary developments have also integrated Dynamic Weight Attention mechanisms with optical flow extrapolation to alleviate mean reversion problems [37], while physics-informed optical flow methods incorporate atmospheric dynamics constraints to bridge classical extrapolation techniques with modern deep learning capabilities [38].

These hybrid approaches emerge from the recognition that neither pure NWP nor pure deep learning methods alone can fully address the challenges of operational precipitation nowcasting. NWP models, while physically consistent, suffer from computational constraints and resolution limitations that make them unsuitable for urban-scale applications. Conversely, purely data-driven deep learning methods, despite their computational efficiency, often lack physical constraints necessary for reliable long-term predictions and struggle with extreme events outside their training distributions.

3. Methods

This section presents the Dynamic Flow Spatio-Temporal Generative Adversarial Network (DFST-GAN), which introduces two key innovations for high-quality precipitation nowcasting. The first innovation is the Dynamic Flow Spatio-Temporal Feature Extractor, which employs learnable sampling locations that adaptively adjust based on underlying motion patterns, enabling accurate tracking of complex precipitation system trajectories. The second innovation is the meteorological information-based discriminator, which incorporates domain-specific attention mechanisms specifically designed for meteorological data characteristics, including multi-scale motion patterns and terrain effects. Together, these components enable DFST-GAN to generate sharp, physically consistent precipitation predictions while accurately modeling non-rigid atmospheric transformations. The following subsections detail each component of the DFST-GAN architecture.

3.1. Dynamic Flow Spatio-Temporal Feature Extractor

The Dynamic Flow Spatio-Temporal Feature Extractor represents the core innovation of our approach, designed to overcome the limitations of conventional convolutional operations in modeling complex motion dynamics. Traditional convolutional layers employ fixed geometric patterns for feature extraction, which constrains their ability to capture the non-rigid transformations and varying motion trajectories commonly observed in real-world spatio-temporal data.

This limitation becomes particularly pronounced when different regions exhibit varying motion characteristics, as the fixed receptive field cannot adaptively adjust to local motion patterns, resulting in suboptimal modeling of dynamically evolving spatial dependencies.

To overcome this limitation, our Dynamic Flow Feature Extractor introduces learnable sampling locations that adapt based on the input data, as shown in Figure 1. Instead of using a fixed grid, we generate a displacement field, denoted by

F

, which determines the sampling locations for each position:

F = ϕ (X; θ_{ϕ})

(1)

Here, X denotes the input feature maps.

ϕ

is a convolutional network used to generate the displacement field, implemented here as a two-layer convolutional network.

θ_{ϕ}

represents the learnable parameters of this network.

For each position

(i, j)

in the feature map, we obtain sampling locations:

P_{l} (i, j) = (i, j) + F (l, i, j)

(2)

With these computed sampling locations, we perform bilinear sampling and aggregate the features:

Z (i, j) = ψ ({S_{1} (i, j), S_{2} (i, j), \dots, S_{L} (i, j)}; θ_{ψ})

(3)

In Equation (2),

P_{l} (i, j)

denotes the coordinates of the l-th new sampling location for the original position

(i, j)

in the feature map, adjusted by the displacement field

F

. In Equation (3), we aggregate the features from these new locations. Specifically,

S_{l} (i, j)

is the feature value obtained at location

P_{l} (i, j)

via bilinear sampling. L is the total number of predefined sampling locations for each position.

ψ

is an aggregation function (e.g., a convolutional or fully-connected layer) that fuses the L sampled features into a final output feature

Z (i, j)

.

θ_{ψ}

represents the learnable parameters of the aggregation function

ψ

.

This dynamic flow mechanism substantially enhances the model’s ability to capture complex motion patterns, thus improving the overall quality of spatio-temporal predictions. This improvement is particularly pronounced for phenomena with intricate spatial dependencies and non-rigid transformations.

3.2. Flow-Adaptive Spatio-Temporal LSTM (FAST-LSTM)

The Flow-Adaptive Spatio-Temporal LSTM (FAST-LSTM) cell represents an enhanced recurrent unit that integrates the dynamic flow mechanism into the memory cell structure of PredRNN-V2. FAST-LSTM incorporates what we previously referred to as dynamic flow sampling, a mechanism that adaptively adjusts sampling locations based on motion patterns. This integration allows the model to leverage both the temporal modeling capabilities of LSTM and the spatial adaptability of the dynamic flow feature extractor.

The FAST-LSTM cell extends the conventional ST-LSTM architecture by incorporating dynamic flow-based feature extraction, as depicted in Figure 2. The core computation involves first generating a displacement field from the previous hidden state:

\begin{matrix} F_{t} = ϕ (H_{t - 1}; θ_{ϕ}) \end{matrix}

(4)

\begin{matrix} {\tilde{H}}_{t - 1} = DynamicSampling (H_{t - 1}, F_{t}) \end{matrix}

(5)

where

H_{t - 1}

is the hidden state at timestep

t - 1

. Equation (4) utilizes the same network

ϕ

(with parameters

θ_{ϕ}

) to generate a displacement field

F_{t}

for the current timestep based on

H_{t - 1}

. The DynamicSampling function in Equation (5) represents the dynamic sampling process, which uses the displacement field

F_{t}

to resample

H_{t - 1}

, yielding an adaptively sampled hidden state

{\tilde{H}}_{t - 1}

. This state is subsequently used in the gating calculations of the FAST-LSTM cell.

A key feature of FAST-LSTM, inherited from PredRNN-V2, is the decoupling of memory states. This design separates short-term and long-term dependencies by utilizing two distinct memory flows: the cell state

C_{t}

and the memory state

M_{t}

. The decoupling is further enhanced by introducing a decoupling loss:

L_{d e c o u p l i n g} = \frac{1}{T} \sum_{t = 1}^{T} \cos (\nabla C_{t}, \nabla M_{t})

(6)

This equation defines the decoupling loss

L_{decoupling}

, which promotes orthogonality between the updates of the two memory flows. Here, T is the total number of timesteps in the sequence.

C_{t}

and

M_{t}

are the cell state and memory state at timestep t, respectively.

\nabla C_{t}

and

\nabla M_{t}

represent the temporal changes (i.e., gradients, often calculated as the difference

C_{t} - C_{t - 1}

and

M_{t} - M_{t - 1}

) of the cell state and memory state. The cos function denotes the cosine similarity. This loss encourages the update directions of the two memory states to be orthogonal, enabling them to capture diverse temporal dynamics.

3.3. Spatio-Temporal Adversarial Generative Network

To address the issue of blurry predictions standard in traditional regression-based models, we employ an adversarial learning strategy in our DFST-GAN framework. The adversarial component comprises a generator, built upon the FAST-LSTM model, and a meteorological information-based discriminator inspired by TransGAN, which assesses the realism of the generated sequences.

3.3.1. Generator Architecture

The generator G takes a sequence of input frames and produces a sequence of future frames:

{\hat{Y}}_{T_{i n} + 1 : T_{o u t}} = G (X_{1 : T_{i n}})

(7)

Here, G is the generator model.

X_{1 : T_{i n}}

represents the input sequence composed of the first

T_{i n}

frames.

{\hat{Y}}_{T_{i n} + 1 : T_{o u t}}

is the predicted sequence of future frames produced by the generator, covering the timesteps from

T_{i n} + 1

to

T_{o u t}

.

The generator consists of an encoder–prediction–decoder architecture, where:

The encoder transforms the input frames into a latent representation using convolutional layers.
The prediction module, built with multiple layers of FAST-LSTM cells, models the temporal dynamics and generates future states.
The decoder converts the predicted states into output frames using transposed convolutional layers.

3.3.2. Meteorological Information-Based Discriminator

For the discriminator, we propose a novel meteorological information-based discriminator (MID) that goes beyond conventional CNN-based discriminators and generic Transformer architectures. Unlike standard Transformer-based discriminators that employ generic self-attention mechanisms, our MID incorporates domain-specific attention mechanisms specifically designed for the spatio-temporal characteristics of meteorological data, including multi-scale movement of precipitation systems and terrain effects.

As illustrated in Figure 3, the MID architecture integrates domain-specific attention mechanisms through a novel parallel-then-fusion design. This approach enables independent feature learning within each domain through self-attention mechanisms while facilitating their physical interaction through cross-attention mechanisms.

Multi-Scale Motion Self-Attention (MSMSA)

Precipitation systems exhibit complex motion patterns across multiple spatial and temporal scales, from local convective cells to large-scale weather systems. To capture these multi-scale dynamics and internal consistency, we introduce a Multi-Scale Motion Self-Attention (MSMSA) mechanism that adaptively focuses on precipitation movements at different scales while modeling the internal relationships within precipitation features:

MSMSA (X) = \sum_{s = 1}^{S} α_{s} \cdot {SelfAttention}_{s} (X_{s})

(8)

In this equation, X represents the input precipitation features,

X_{s}

denotes the precipitation features at scale s, S is the total number of scales,

α_{s}

are learnable scale weights that determine the importance of each scale, and

{SelfAttention}_{s}

denotes the scale-specific self-attention operation applied to features at scale s. Each scale-specific self-attention is designed to capture motion patterns and internal consistency at different temporal and spatial resolutions:

{SelfAttention}_{s} (X_{s}) = softmax (\frac{Q_{s} K_{s}^{T}}{\sqrt{d_{k}}}) V_{s}

(9)

where

Q_{s}

,

K_{s}

, and

V_{s}

are the query, key, and value matrices, respectively, for scale s, all derived from the same input precipitation features

X_{s}

through learned linear transformations, establishing the self-attention nature of this mechanism.

d_{k}

is the dimension of the key vectors, used for scaling to prevent the softmax function from having extremely small gradients. The softmax function ensures that attention weights sum to one across spatial locations. The multi-scale self-attention approach enables the discriminator to detect inconsistencies in precipitation movement patterns across different scales and ensures internal coherence within precipitation systems, from fine-grained convective processes to large-scale atmospheric circulation patterns.

Terrain-Aware Self-Attention (TASA)

Terrain effects significantly influence precipitation patterns through orographic lifting, rain shadow effects, and local circulation patterns. To capture the complex topographic relationships that dynamically interact with evolving precipitation patterns, we introduce a Terrain-Aware Self-Attention (TASA) mechanism that learns precipitation-conditioned terrain representations:

TASA (Z_{precip}, X_{terrain}) = SelfAttention (TerrainFusion (Z_{precip}, X_{terrain}))

(10)

In this equation,

Z_{precip}

represents the precipitation features from the current layer,

X_{terrain}

denotes the terrain data containing topographic information such as elevation, slope, and aspect. The terrain fusion mechanism combines precipitation-aware context with terrain information to create layer-specific terrain representations:

TerrainFusion (Z_{precip}, X_{terrain}) = Concat [Z_{precip}, E_{terrain} (X_{terrain})] \cdot W_{fusion}

(11)

Here, Concat denotes the concatenation operation that joins features along the channel dimension,

E_{terrain}

is a terrain encoder network that transforms raw topographic data into feature representations compatible with precipitation features, and

W_{fusion}

are learnable fusion parameters that project the concatenated features into a unified representation space. The self-attention mechanism then operates on the fused features:

SelfAttention (T_{fused}) = softmax (\frac{Q_{T} K_{T}^{T}}{\sqrt{d_{k}}}) V_{T}

(12)

where

T_{fused}

represents the fused terrain–precipitation features from the TerrainFusion operation,

Q_{T}

,

K_{T}

, and

V_{T}

are the query, key, and value matrices, respectively, all derived from the same fused terrain features

T_{fused}

through learned linear transformations, and

d_{k}

is the dimension of the key vectors for scaling. This enables the mechanism to learn terrain dependencies that are conditioned on the current precipitation state. This precipitation-aware terrain modeling allows the discriminator to assess terrain–precipitation interactions that vary across different precipitation scenarios and atmospheric conditions.

Cross-Modal Fusion Attention (CMFA)

To model the physical interaction between precipitation dynamics and terrain constraints, we implement a Cross-Modal Fusion Attention (CMFA) mechanism that operates on the outputs of MSMSA and TASA through cross-attention:

CMFA (Z_{precip}^{'}, Z_{terrain}^{'}) = CrossAttention (Z_{precip}^{'}, Z_{terrain}^{'}) + Z_{precip}^{'}

(13)

In this equation,

Z_{precip}^{'}

represents the processed precipitation features from MSMSA,

Z_{terrain}^{'}

denotes the processed terrain features from TASA, CrossAttention is the cross-attention mechanism that models inter-modal interactions, and the addition

+ Z_{precip}^{'}

implements a residual connection to preserve the original precipitation information. This mechanism enables directed information flow from terrain features to precipitation features through cross-attention:

CrossAttention (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d_{k}}}) V

(14)

where Q is the query matrix derived from precipitation features

Z_{precip}^{'}

, K and V are the key and value matrices derived from precipitation-conditioned terrain features

Z_{terrain}^{'}

, respectively,

d_{k}

is the dimension of the key vectors for scaling, and softmax normalizes the attention weights. This establishes the cross-modal nature of this attention mechanism where precipitation features query relevant information from terrain features. This cross-modal attention assesses whether generated precipitation patterns are physically consistent with the underlying topography under current atmospheric conditions, enabling the discriminator to detect violations of scenario-specific orographic precipitation principles.

The complete MID architecture integrates these domain-specific attention mechanisms through a parallel-then-fusion strategy, as depicted in Figure 3. The discriminator D operates by:

Patch Embedding: Input frames are divided into patches and projected into token embeddings
Positional Encoding: Spatio-temporal positional embeddings are added to preserve location and time information for subsequent attention computations
Parallel Adaptive Processing: Precipitation features undergo multi-scale self-attention while terrain features are processed through precipitation-conditioned self-attention
Cross-Modal Fusion: The processed features are fused through CMFA to model physical interactions via cross-attention
Classification: A final probability score is produced through the classification token

The core of MID consists of L meteorological attention layers, each containing:

\begin{matrix} Z_{l, precip}^{'} = MSMSA (L N (Z_{l - 1})) + Z_{l - 1} \end{matrix}

(15)

\begin{matrix} Z_{l, terrain}^{'} = TASA (L N (Z_{l - 1}), X_{terrain}) + Z_{l - 1} \end{matrix}

(16)

\begin{matrix} Z_{l}^{″} = CMFA (L N (Z_{l, precip}^{'}), L N (Z_{l, terrain}^{'})) + Z_{l, precip}^{'} \end{matrix}

(17)

\begin{matrix} Z_{l} = FFN (L N (Z_{l}^{″})) + Z_{l}^{″} \end{matrix}

(18)

These equations describe the adaptive processing within each meteorological attention layer of the MID architecture. The precipitation features

Z_{l - 1}

undergo multi-scale self-attention through MSMSA (Equation (15)) to capture internal motion consistency. Simultaneously, the terrain features are processed through TASA (Equation (16)) using both the evolving precipitation context

Z_{l - 1}

and terrain information

X_{terrain}

, enabling the terrain representation to adapt to different precipitation scenarios across layers. The residual connection adds the original precipitation features to maintain information flow. Subsequently, CMFA in Equation (17) models the physical interaction between precipitation and precipitation-conditioned terrain features through cross-attention mechanisms. Finally, Equation (18) applies the Feed-Forward Network with residual connection to produce the layer output

Z_{l}

.

This precipitation-conditioned terrain modeling design offers significant advantages by enabling terrain representations that dynamically adapt to current precipitation scenarios. The approach enhances physical interaction modeling by allowing terrain features to be conditioned on precipitation states, improving assessment of orographic effects under varying atmospheric conditions. This design provides better discriminative capability for scenario-specific terrain–precipitation relationships, enabling the discriminator to evaluate whether generated precipitation patterns appropriately reflect the complex physical processes governing terrain–precipitation interactions under specific meteorological conditions.

3.4. DFST-GAN Framework

The complete DFST-GAN framework integrates the components described in the previous sections into a unified model for high-quality spatio-temporal sequence prediction, as illustrated in Figure 4.

The overall architecture of DFST-GAN consists of:

A generator network built with multiple FAST-LSTM layers for recursive prediction
A meteorological information-based discriminator that leverages domain-specific attention mechanisms to evaluate sequence realism

Two-Phase Training Strategy

DFST-GAN employs a progressive training approach, utilizing the Least Squares Generative Adversarial Network (LSGAN) objectives to ensure stable convergence:

Phase 1: Pre-Training

The generator is exclusively trained to establish a robust foundation:

L_{pre} = λ_{rec} L_{rec} + λ_{dec} L_{dec}

(19)

where

L_{rec}

is the reconstruction loss (MSE) and

L_{dec}

is the decoupling loss defined in Equation (6). The discriminator remains frozen during this phase.

Phase 2: Adversarial Training

Both networks are trained alternately with the following objectives:

Discriminator Loss:

L_{D} = \frac{1}{2} E_{Y \sim p_{d a t a}} [{(D (Y) - 1)}^{2}] + \frac{1}{2} E_{X \sim p_{d a t a}} [D {(G (X))}^{2}]

(20)

Generator Adversarial Loss:

L_{G} = \frac{1}{2} E_{X \sim p_{d a t a}} [{(D (G (X)) - 1)}^{2}]

(21)

Feature Matching Loss:

L_{f m} = \sum_{i = 1}^{L} \frac{1}{N_{i}} {∥ D^{(i)} (Y) - D^{(i)} (G (X)) ∥}_{1}

(22)

Meteorological Consistency Loss:

L_{met} = λ_{terrain} L_{terrain} + λ_{scale} L_{scale}

(23)

Complete Generator Loss:

L_{total} = λ_{rec} L_{rec} + λ_{dec} L_{dec} + λ_{adv} L_{G} + λ_{f m} L_{f m} + λ_{met} L_{met}

(24)

The training alternates between minimizing

L_{D}

for discriminator updates and

L_{total}

for generator updates until convergence. The hyperparameters

λ_{terrain}

,

λ_{scale}

,

λ_{rec}

,

λ_{dec}

,

λ_{adv}

,

λ_{f m}

, and

λ_{met}

balance the contribution of each loss component.

This progressive training strategy establishes generator stability before introducing adversarial objectives, enabling DFST-GAN to generate physically consistent, high-quality precipitation nowcasts with preserved fine details and accurate motion representation.

4. Experiments

4.1. Datasets

HKO-7

The HKO-7 dataset is a comprehensive meteorological dataset compiled by the Hong Kong Observatory, spanning the period from 2009 to 2015. The region’s subtropical monsoon climate and mountainous terrain contribute to weather patterns exhibiting high variability and limited predictability, making precipitation nowcasting particularly challenging. As illustrated in Figure 5, the dataset coverage area encompasses Hong Kong and surrounding regions, featuring complex topographical variations including coastal areas, urban centers, and mountainous terrain that significantly influence local precipitation patterns.

The dataset consists of radar reflectivity images captured every 6 min, with each frame covering an extensive area of 512 km × 512 km at a resolution of 480 × 480 pixels. The original logarithmic radar reflectivity factor (dBZ), which describes the intensity of radar echoes, is transformed into pixel values using a linear equation: pixel =

[255 \times \frac{(dBZ + 10)}{70} + 0.5]

, with values clipped between 0 and 255. The HKO-7 dataset has selected a total of 812 rainy day data points for training, 50 rainy day data points for validation, and 131 rainy day data points for testing. This division was adopted for the current study. Each day consists of 240 frames, providing substantial temporal coverage for studying weather evolution patterns. The comprehensive spatio-temporal coverage and high resolution of the HKO-7 dataset establish a robust benchmark for developing and refining high-performance meteorological prediction models, particularly for precipitation nowcasting in regions with complex weather dynamics.

4.2. Criteria

To assess the real-time forecasting capability of the model, this study employs five commonly employed evaluation metrics in weather forecasting: Critical Success Index (CSI) [39], Heidke Skill Score (HSS) [40], Probability of Detection (POD), False Alarm Ratio (FAR) [41] and Equitable Threat Score (ETS) [39]. To facilitate this evaluation, the forecast results and ground truth were transformed into binary matrices by applying predetermined radar reflectivity thresholds. The chosen thresholds of 30, 40, and 50 dBZ represent different levels of precipitation intensity in the HKO-7 dataset.

The classification process operates as follows: if a pixel in both the predicted image and the actual image is either 1 or 0, it represents a successful prediction. It is recorded as True Positive (TP) or True Negative (TN). If a pixel in the actual image is 1 while the corresponding pixel in the predicted image is 0, it indicates a false negative and is recorded as False Negative (FN). If a pixel in the actual image is 0 while the corresponding pixel in the predicted image is 1, it indicates a false positive and is recorded as False Positive (FP).

Based on these classifications, the five metrics are calculated using the following equations:

CSI = \frac{TP}{TP + FN + FP}

(25)

HSS = \frac{2 (TP \times TN - FN \times FP)}{[(TP + FP) (FP + TN) + (TP + FN) (FN + TN)]}

(26)

POD = \frac{TP}{TP + FN}

(27)

FAR = \frac{FP}{FP + TP}

(28)

ETS = \frac{TP - {TP}_{r a n d o m}}{TP + FN + FP - {TP}_{r a n d o m}}

(29)

where

{TP}_{r a n d o m} = \frac{(TP + FN) (TP + FP)}{N}

represents the expected number of hits due to random chance, and N is the total number of grid points.

These metrics have distinct interpretations:

CSI (Critical Success Index): quantifies the ratio of correctly predicted rainfall events to the total number of observed and/or predicted rainfall events, providing an overall measure of prediction accuracy [42].

HSS (Heidke Skill Score): evaluates the accuracy of predictions relative to random chance, accounting for correct predictions that may occur by random coincidence, thus providing a more robust measure of model skill [40].

POD (Probability of Detection): reflects the model’s ability to correctly identify actual rainfall events, indicating detection sensitivity [41].

FAR (False Alarm Ratio): represents the proportion of predicted rainfall events that did not occur, reflecting the specificity of predictions [41].

ETS (Equitable Threat Score): provides a skill score that accounts for hits due to chance, offering a more balanced assessment of forecast skill compared to CSI by removing the contribution of random correct forecasts [39].

The CSI, HSS, POD and ETS metrics reflect the accuracy of the model’s predictions, with values closer to 1 indicating higher prediction accuracy. Conversely, FAR measures the proportion of false positives, with lower values indicating stronger prediction capability and reduced forecast errors.

4.3. Implementation and Training Details

All experiments were conducted on an NVIDIA A100 40 GB GPU (NVIDIA Corporation, Santa Clara, CA, USA) with Intel Xeon dual-processors (Intel Corporation, Santa Clara, CA, USA), implemented using the PyTorch 1.13.1 framework (Meta Platforms, Inc., Menlo Park, CA, USA). The model architecture consists of four hidden layers, each with 128 units, and a filter size of 5. For optimization, the Adam optimizer was utilized with a learning rate of

1 \times 10^{- 5}

and a batch size of 8. A reverse-scheduled sampling strategy was implemented to increase the sampling ratio during training gradually. Mean squared error (MSE) was selected as the primary loss function, and a reverse input mechanism was employed to enhance temporal dependency modeling.

When comparing with baseline methods, consistent experimental configurations were maintained, and method-specific parameters were implemented as described in their respective original papers. The training process included regular model evaluation to monitor performance and prevent overfitting.

5. Results

5.1. Comparison with Baseline Models

Our comprehensive evaluation demonstrates that DFST-GAN consistently outperforms state-of-the-art spatio-temporal prediction models across multiple evaluation metrics. The results presented in Figure 6 provide compelling evidence of the superiority of our approach.

For moderate precipitation events (dBZ ≥ 30), DFST-GAN achieves a CSI of 0.739, compared to 0.691 for PredRNN-V2, representing a 6.9% improvement. The improvement becomes more significant for intense precipitation events (dBZ ≥ 50), where our model achieves a CSI of 0.322 compared to 0.271 for PredRNN-V2, representing a 18.8% relative improvement. This enhanced performance for high-intensity events is particularly valuable for operational meteorological services, as accurate prediction of severe weather events is crucial for public safety and disaster preparedness.

Figure 7 presents a comprehensive analysis of how key meteorological evaluation metrics evolve across forecast lead times, addressing the critical question of model performance sustainability over extended prediction horizons. The results demonstrate that DFST-GAN maintains consistently superior performance across all evaluated metrics throughout the entire 2-h forecast period, with initial advantages preserved across all forecast lead times. The degradation patterns reveal that DFST-GAN exhibits the most gradual decline in all metrics, retaining substantially higher skill scores at 120 min compared to baseline methods, while traditional optical flow shows rapid performance deterioration. This temporal analysis confirms that DFST-GAN not only achieves superior initial performance but also demonstrates enhanced forecast skill retention.

To provide a comprehensive evaluation across different meteorological scenarios, we present two detailed case studies representing distinct types of severe convective processes. These case studies utilize the standard meteorological radar reflectivity color scale where green indicates light precipitation (15–25 dBZ), yellow represents moderate precipitation (25–35 dBZ), orange denotes heavy precipitation (35–40 dBZ), red indicates intense precipitation (40–45 dBZ), and dark red represents extreme precipitation (dBZ ≥ 45). The model takes eight frames as input with 6-min intervals, but only the last four input frames are displayed for clarity. The prediction outputs are shown with increased 18-min intervals to provide a broader temporal perspective for prediction evaluation, enabling more intuitive assessment of model performance in preserving precipitation intensity distributions and spatial structures across different severity levels.

Figure 8 presents a moderate-intensity convective system characterized by well-organized precipitation structures with multiple intensity cores distributed across the domain. In this scenario, traditional optical flow methods demonstrate fundamental limitations by producing overly smoothed results that fail to preserve discrete precipitation cores. ConvLSTM exhibits progressive blurring and intensity degradation, while TrajGRU shows improved spatial coherence but suffers from systematic intensity underestimation in later prediction frames. PredRNN and PredRNN++ maintain better structural preservation but exhibit characteristic smoothing artifacts that compromise precipitation boundary definition. In contrast, DFST-GAN successfully preserves both spatial structure and intensity distribution, accurately capturing the evolution of individual precipitation cores while maintaining realistic intensity gradients throughout the prediction sequence.

Figure 9 presents a more challenging intense convective system with rapid morphological changes and high-intensity precipitation cores exceeding 45 dBZ, representing the type of severe weather events critical for operational meteorological services. Traditional methods exhibit severe degradation with optical flow producing unrealistic persistence patterns and substantial structural detail loss. Deep learning baselines demonstrate varying performance degradation, with ConvLSTM showing pronounced blurring and intensity loss, while more advanced methods like PredRNN++ maintain better overall structure but fail to preserve sharp intensity gradients characteristic of severe convective systems. DFST-GAN excels in this challenging scenario by accurately predicting high-intensity precipitation core evolution, preserving realistic intensity distributions, and maintaining sharp precipitation boundaries essential for severe weather prediction.

Through this comprehensive comparative analysis, it is evident that DFST-GAN excels not only in capturing complex spatio-temporal motion patterns but also in maintaining high-quality predictions across varying precipitation intensities and extended time horizons. Whether dealing with synthetic scenarios with well-defined trajectories or real-world meteorological phenomena with irregular and non-rigid transformations, DFST-GAN consistently demonstrates superior predictive capabilities. The model’s ability to preserve fine-grained structural details while reducing false alarms makes it a more reliable and practical tool for operational precipitation nowcasting applications.

5.2. Ablation Studies

To validate the contribution of each component in DFST-GAN, we conducted comprehensive ablation studies by systematically removing individual components and evaluating their impact on model performance. The results, presented in Table 1, provide valuable insights into the effectiveness of our architectural design choices.

The comprehensive ablation analysis reveals that each component of DFST-GAN contributes meaningfully to the overall performance, with varying degrees of impact on precipitation nowcasting accuracy. The dynamic flow mechanism proves to be the most critical component of our architecture. Removing this component results in a substantial performance drop, with CSI decreasing from 0.739 to 0.640 (13.4% relative decrease). This significant degradation confirms our hypothesis that adaptive sampling based on motion patterns is essential for accurately modeling complex precipitation dynamics. The traditional fixed convolutional kernels struggle to capture the non-rigid transformations characteristic of weather systems, leading to less accurate predictions and higher false alarm rates.

The adversarial learning framework is crucial to the model’s success and has a significant impact on its performance. Removing this framework results in a substantial performance drop, with the CSI decreasing from 0.739 to 0.631, a 14.6% relative decrease. The framework’s importance lies in its ability to compel the generator to create sharper and more physically plausible sequences. This adversarial dynamic is critical for improving prediction accuracy and reducing false alarms, leading to higher-quality forecasts.

The meteorological information-based discriminator (MID) contributes significantly to the model’s performance, as its replacement with a traditional CNN-based discriminator results in an 8.9% decrease in CSI. This improvement is attributed to MID’s domain-specific attention mechanisms designed for meteorological data characteristics. These specialized components enable the discriminator to capture complex precipitation motion patterns and terrain–precipitation interactions, allowing it to detect subtle inconsistencies that traditional CNN architectures might miss. This results in more physically consistent predictions and a reduced false alarm rate.

The memory decoupling mechanism provides a significant and consistent improvement across all metrics. The 5.7% decrease in CSI when removing this component validates the importance of separating short-term and long-term temporal dependencies. This separation allows the model to capture both immediate motion patterns and longer-term atmospheric evolution, leading to more robust predictions that better handle the complex multi-scale dynamics inherent in meteorological phenomena.

The ablation studies collectively confirm that each component of DFST-GAN contributes meaningfully to the overall performance, with the dynamic flow mechanism and adversarial learning providing the most substantial improvements. The synergistic effect of these components working together explains the superior performance of the complete DFST-GAN architecture.

5.3. Computational Complexity Analysis

To provide a comprehensive evaluation of our approach, we analyze the computational complexity of DFST-GAN compared to existing spatio-temporal prediction methods on the HKO-7 dataset. This analysis is crucial for understanding the trade-off between improved prediction accuracy and computational overhead, particularly for operational meteorological applications where real-time processing is essential.

Table 2 presents a detailed comparison of computational requirements across different methods. The analysis includes model parameters, memory consumption, and floating-point operations (FLOPs), all measured under identical experimental conditions on the HKO-7 dataset. All complexity measurements focus on inference-time requirements, representing the operational deployment scenario. For DFST-GAN, the analysis evaluates only the generator network, as the discriminator is used exclusively during training and not required for inference.

The results demonstrate that DFST-GAN achieves superior precipitation nowcasting performance with reasonable computational overhead. With 1.3 M parameters, 8.4 GB memory, and 86.2 GFLOPs, DFST-GAN requires slightly more resources than simpler methods like ConvLSTM but remains competitive with advanced approaches such as PredRNN++. The modest computational increase from the dynamic flow mechanism and enhanced FAST-LSTM cells is well-justified by the significant performance improvements in precipitation forecasting accuracy, making DFST-GAN suitable for operational deployment where prediction quality is prioritized.

5.4. Hyperparameter Sensitivity Analysis

The loss function weights (

λ_{r e c}

,

λ_{d e c}

,

λ_{a d v}

,

λ_{f m}

, and

λ_{m e t}

) play crucial roles as hyperparameters in the DFST-GAN model, governing the relative importance of different loss components during training. To systematically analyze their impact, we conduct sensitivity experiments using a controlled approach where each weight is varied individually while maintaining all other parameters at their baseline values determined through preliminary experiments.

Baseline Configuration: For all sensitivity experiments, we establish a baseline configuration with

λ_{r e c} = 1.0

,

λ_{d e c} = 0.1

,

λ_{a d v} = 0.001

,

λ_{f m} = 0.01

, and

λ_{m e t} = 0.05

. When analyzing the sensitivity of a specific weight, only that parameter is varied while all others remain fixed at their baseline values. This controlled approach ensures that observed performance changes can be attributed to the parameter under investigation.

5.4.1. Impact of Reconstruction Loss Weight

The reconstruction loss weight

λ_{r e c}

determines the emphasis on pixel-level accuracy. We systematically vary

λ_{r e c} \in {0.5, 1.0, 2.0, 5.0}

while maintaining other weights at their baseline values. The evaluation was performed on the HKO-7 dataset using multiple metrics.

From Table 3,

λ_{r e c} = 1.0

achieves optimal performance across all metrics, effectively balancing adversarial training with other loss components. Lower values reduce pixel-level accuracy emphasis, while higher values over-emphasize reconstruction at the expense of perceptual quality.

5.4.2. Impact of Decoupling Loss Weight

The decoupling loss weight

λ_{d e c}

controls memory state separation in FAST-LSTM cells. We vary

λ_{d e c} \in {0.01, 0.1, 0.5, 1.0}

while fixing other parameters at their baseline values.

Results in Table 4 show

λ_{d e c} = 0.1

achieves optimal balance. Too small values provide insufficient memory separation, while excessive values over-constrain temporal dynamics.

5.4.3. Impact of Adversarial Loss Weight

The adversarial loss weight

λ_{a d v}

controls the contribution of adversarial training in the overall loss function. We systematically vary

λ_{a d v} \in {0.0001, 0.001, 0.01, 0.1}

while maintaining all other weights fixed at their baseline values.

Table 5 shows

λ_{a d v} = 0.001

provides optimal performance. Insufficient adversarial weight limits generation quality, while excessive weight causes training instability and increased false alarm rates.

5.4.4. Impact of Feature Matching Loss Weight

The feature matching loss weight

λ_{f m}

balances intermediate-level feature alignment. We evaluate

λ_{f m} \in {0.001, 0.01, 0.1, 0.5}

while maintaining other weights at baseline values.

Table 6 shows

λ_{f m} = 0.01

provides optimal performance. Insufficient weight limits feature alignment, while excessive weight interferes with adversarial training dynamics.

5.4.5. Impact of Meteorological Consistency Loss Weight

The meteorological consistency loss weight

λ_{m e t}

incorporates domain-specific constraints. We analyze

λ_{m e t} \in {0.01, 0.05, 0.1, 0.2}

while fixing other parameters at baseline values.

Table 7 demonstrates

λ_{m e t} = 0.05

achieves optimal performance. Lower values provide insufficient domain constraints, while higher values over-emphasize meteorological consistency.

5.4.6. Optimal Weight Configuration Summary

Based on the comprehensive analysis presented above, we determined the optimal weight configuration for DFST-GAN as shown in Table 8.

This optimal configuration enables DFST-GAN to effectively capture spatio-temporal features while generating clear and accurate precipitation predictions. The systematic hyperparameter analysis confirms that each loss component contributes meaningfully to the overall performance, with the selected weights representing a well-balanced compromise between different training objectives.

5.5. Generalization Validation of Dynamic Flow Mechanism

To verify whether the dynamic flow mechanism can be transferred to other spatio-temporal models, we conducted validation experiments by integrating the proposed mechanism into other representative spatio-temporal models. This validation is essential to demonstrate the universal applicability of our approach and establish its broader contribution to the field.

We selected two widely-adopted baseline models for integration: ConvLSTM and PredRNN++. The dynamic flow mechanism was integrated by replacing standard convolutional operations within the core recurrent units while maintaining the original architectural characteristics.

Table 9 presents the comprehensive evaluation results for our generalization validation experiments. The experimental results demonstrate that our dynamic flow mechanism exhibits strong generalizability across different spatio-temporal architectures, with both ConvLSTM and PredRNN++ showing consistent performance improvements that are statistically significant across all evaluation metrics. The magnitude of improvement varies with the sophistication of the base architecture, where ConvLSTM shows larger relative improvements (7.1% CSI improvement) due to its simpler structure, while PredRNN++, which already incorporates advanced mechanisms, demonstrates more modest but still substantial gains (3.9% CSI improvement).

Based on these results, we can conclude that the dynamic flow mechanism exhibits strong generalizability across different spatio-temporal architectures. The consistent improvements observed validate that our approach addresses fundamental limitations in motion modeling that transcend specific architectural choices, making it a valuable contribution to the broader field of spatio-temporal sequence prediction.

6. Discussion

6.1. Current Limitations

6.1.1. Computational and Operational Considerations

The proposed DFST-GAN framework introduces significant computational overhead that presents challenges for real-time operational deployment. The dynamic flow mechanism requires computing displacement fields at each layer, while the meteorological information-based discriminator employs self-attention mechanisms that scale quadratically with sequence length. As demonstrated in Table 7, DFST-GAN requires 2.1 M parameters, 10.6 G memory, and 132.3 GFLOPs, representing substantial increases compared to baseline methods. This computational complexity raises concerns about the feasibility of deploying the model in real-time operational settings, particularly for high-resolution radar data processing where meteorological services require sub-minute response times for severe weather warnings.

The meteorological information-based discriminator architecture, while effective for capturing meteorological patterns, introduces additional latency that may be prohibitive for operational nowcasting systems. Current operational radar processing systems typically operate on 5–6 min update cycles, and the increased computational requirements could potentially delay critical weather warnings.

6.1.2. Limited Validation Data and Generalizability

The study’s validation is constrained by its reliance solely on radar reflectivity data from a single geographic region (Hong Kong Observatory dataset) without incorporating auxiliary meteorological variables such as wind speed, air pressure, temperature, or humidity fields. This limitation significantly restricts the model’s ability to capture the full complexity of atmospheric dynamics and raises questions about generalizability to different climatic regions and weather patterns.

The HKO-7 dataset, while comprehensive for the Hong Kong region, represents a specific subtropical monsoon climate with particular topographical characteristics. The model’s performance on different climate zones (e.g., continental, arctic, desert) or geographical features (e.g., flat plains, mountain ranges, coastal areas) remains unverified. This geographic limitation is particularly concerning for a model intended for operational meteorological applications, as weather patterns and precipitation dynamics vary significantly across different regions.

Furthermore, the study period (2009–2015) may not adequately represent the full spectrum of meteorological variability, including extreme weather events or long-term climate variations. The absence of multi-seasonal and multi-annual validation across diverse weather conditions limits confidence in the model’s robustness for operational deployment.

6.1.3. Temporal and Spatial Scale Limitations

Despite improvements over baseline methods, DFST-GAN continues to experience quality degradation over extended forecast horizons, with the temporal performance analysis revealing declining accuracy beyond 2–3 h. This limitation is particularly problematic for meteorological applications requiring longer-range nowcasts (4–6 h) for operational planning and emergency preparedness.

Beyond temporal constraints, the spatial resolution characteristics of the current implementation also present significant limitations. The spatial resolution constraints of the current implementation (480 × 480 pixels covering 512 × 512 km) may not provide sufficient detail for urban-scale applications or high-resolution local forecasting requirements. Many operational applications require kilometer or sub-kilometer resolution predictions that may challenge the current architectural design.

7. Conclusions

This paper presents Dynamic Flow Spatio-Temporal Generative Adversarial Network (DFST-GAN), a novel deep learning model for high-quality precipitation nowcasting. Our approach addresses two critical limitations in existing methods: the inability to adaptively model complex motion patterns with varying trajectories in different regions of radar images, and the tendency to produce increasingly blurry predictions over extended time horizons. Through the integration of a dynamic flow feature extraction mechanism with a specialized meteorological information-based discriminator, DFST-GAN achieves significant improvements in both prediction accuracy and visual quality while maintaining physical consistency.

The key contributions of this work include:

The introduction of a Dynamic Flow Spatio-Temporal Feature Extractor that employs learnable sampling locations to adaptively adjust based on underlying motion patterns, enabling accurate tracking of complex precipitation system trajectories with non-rigid atmospheric transformations across different regions of radar images.
The integration of a Flow-Adaptive Spatio-Temporal LSTM (FAST-LSTM) cell that combines dynamic flow mechanisms with memory decoupling strategies, effectively separating short-term and long-term temporal dependencies while maintaining spatial adaptability.
The development of a novel meteorological information-based discriminator (MID) that incorporates domain-specific attention mechanisms specifically designed for meteorological data characteristics, including Multi-Scale Motion Self-Attention (MSMSA), Terrain-Aware Self-Attention (TASA), and Cross-Modal Fusion Attention (CMFA) to model physical interactions between precipitation dynamics and terrain effects.
A comprehensive two-phase training strategy that establishes generator stability through pre-training before introducing adversarial objectives, enabling the generation of sharp, physically consistent predictions that preserve fine-grained details even for longer forecast horizons.

Our experimental evaluations on the HKO-7 dataset demonstrate that DFST-GAN consistently outperforms state-of-the-art spatio-temporal prediction models across all evaluation metrics. For the HKO-7 dataset, our model shows particularly notable improvements for moderate to heavy rainfall events (dBZ ≥ 50), achieving an 18.8% relative improvement in CSI compared to PredRNN-V2, which is crucial for operational meteorological services and severe weather prediction.

The comprehensive ablation studies confirm that each component of DFST-GAN contributes meaningfully to the overall performance, with the dynamic flow mechanism and adversarial learning framework providing the most substantial improvements. The generalization validation demonstrates that our dynamic flow mechanism exhibits strong transferability across different spatio-temporal architectures, with consistent improvements observed when integrated into ConvLSTM and PredRNN++ models.

Our work advances the field by introducing novel architectural innovations that address fundamental limitations in spatio-temporal modeling for meteorological applications. DFST-GAN’s unique dynamic flow mechanism provides superior adaptability to complex motion patterns compared to approaches that rely on fixed convolutional operations, while the specialized meteorological discriminator enables more accurate assessment of physical consistency in generated predictions.

While DFST-GAN represents a significant advancement in precipitation nowcasting, several research directions warrant further investigation. Computational efficiency optimization and multi-regional validation across diverse climate zones are essential for operational deployment beyond subtropical regions. Enhanced temporal-spatial resolution frameworks and integration of multimodal meteorological data sources could address current limitations in extended forecast horizons and provide richer atmospheric context. The methodology developed in this paper opens new avenues for integrating sophisticated deep learning techniques with meteorological forecasting applications, potentially contributing to improved public safety, urban water management, and transportation systems through more accurate and reliable short-term precipitation predictions.

Author Contributions

Conceptualization, W.Y. and J.S.; methodology, W.Y.; software, J.S. and H.Q.; validation, C.Z., K.Z. and J.L.; formal analysis, J.S.; investigation, J.S.; resources, W.Y.; data curation, H.Q.; writing—original draft preparation, J.S.; writing—review and editing, W.Y.; visualization, J.S. and G.L.; supervision, W.Y.; project administration, W.Y.; funding acquisition, W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of China, grant numbers 62473201; the Basic Research Program of Jiangsu, grant number BK20231142.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Acknowledgments

We acknowledge the support from the Natural Science Foundation of China and the Natural Science Foundation of Jiangsu Province. We also thank the administrative and technical support provided by Nanjing University of Information Science and Technology.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Germann, U.; Zawadzki, I. Scale-dependence of the predictability of precipitation from continental radar images. Part I: Description of the methodology. Mon. Weather Rev. 2002, 130, 2859–2873. [Google Scholar] [CrossRef]
Trenberth, K.E. Changes in precipitation with climate change. Clim. Res. 2011, 47, 123–138. [Google Scholar] [CrossRef]
Foresti, L.; Panziera, L.; Mandapaka, P.V.; Germann, U.; Seed, A. Retrieval of analogue radar images for ensemble nowcasting of orographic rainfall. Meteorol. Appl. 2015, 22, 141–155. [Google Scholar] [CrossRef]
Bowler, N.E.; Pierce, C.E.; Seed, A.W. STEPS: A probabilistic precipitation forecasting scheme which merges an extrapolation nowcast with downscaled NWP. Q. J. R. Meteorol. Soc. 2006, 132, 2127–2155. [Google Scholar] [CrossRef]
Wilson, J.W.; Crook, N.A.; Mueller, C.K.; Sun, J.; Dixon, M. Nowcasting thunderstorms: A status report. Bull. Am. Meteorol. Soc. 1998, 79, 2079–2100. [Google Scholar] [CrossRef]
Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of the 28th Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
Wang, Y.; Long, M.; Wang, J.; Gao, Z.; Yu, P.S. Predrnn: Recurrent neural networks for predictive learning using spatiotemporal LSTMs. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
Mathieu, M.; Couprie, C.; LeCun, Y. Deep multi-scale video prediction beyond mean square error. arXiv 2015, arXiv:1511.05440. [Google Scholar]
Wang, Y.; Gao, Z.; Long, M.; Wang, J.; Yu, P.S. Predrnn++: Towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 5123–5132. [Google Scholar]
Wang, Y.; Zhang, J.; Zhu, H.; Long, M.; Wang, J.; Yu, P.S. Memory in memory: A predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 9154–9162. [Google Scholar]
Wang, Y.; Wu, H.; Zhang, J.; Gao, Z.; Wang, J.; Philip, S.Y.; Long, M. Predrnn: A recurrent neural network for spatiotemporal predictive learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2208–2225. [Google Scholar] [CrossRef]
Wang, Y.; Jiang, L.; Yang, M.H.; Li, L.J.; Long, M.; Li, F.-F. Eidetic 3D LSTM: A model for video prediction and beyond. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
Bertasius, G.; Wang, H.; Torresani, L. Is space-time attention all you need for video understanding? In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 813–824. [Google Scholar]
Tahghighi, P.; Koochari, A.; Jalali, M. Deformable convolutional LSTM for human body emotion recognition. In Proceedings of the 25th International Conference on Pattern Recognition, Milan, Italy, 10–15 January 2021; pp. 741–747. [Google Scholar]
Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the 28th Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar] [CrossRef]
Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
Vondrick, C.; Pirsiavash, H.; Torralba, A. Generating videos with scene dynamics. In Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 613–621. [Google Scholar]
Tulyakov, S.; Liu, M.Y.; Yang, X.; Kautz, J. Mocogan: Decomposing motion and content for video generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1526–1535. [Google Scholar]
Denton, E.L. Unsupervised learning of disentangled representations from video. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4414–4423. [Google Scholar]
Tian, Y.; Ren, J.; Chai, M.; Olszewski, K.; Peng, X.; Metaxas, D.N.; Tulyakov, S. A good image generator is what you need for high-resolution video synthesis. arXiv 2021, arXiv:2104.15069. [Google Scholar] [CrossRef]
Jiang, Y.; Chang, S.; Wang, Z. Transgan: Two pure Transformers can make one strong GAN, and that can scale up. In Proceedings of the 35th Conference on Neural Information Processing Systems, Online, 6–14 December 2021; pp. 14745–14758. [Google Scholar]
Wang, T.C.; Liu, M.Y.; Zhu, J.Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8798–8807. [Google Scholar]
Guen, V.L.; Thome, N. Disentangling physical dynamics from unknown factors for unsupervised video prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11474–11484. [Google Scholar]
Shi, X.; Gao, Z.; Lausen, L.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Deep learning for precipitation nowcasting: A benchmark and a new model. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 1131–1141. [Google Scholar]
Sønderby, C.K.; Espeholt, L.; Heek, J.; Dehghani, M.; Oliver, A.; Salimans, T.; Agrawal, S.; Hickey, J.; Kalchbrenner, N. Metnet: A neural weather model for precipitation forecasting. arXiv 2020, arXiv:2003.12140. [Google Scholar] [CrossRef]
Gao, Z.; Zheng, J.-J.; Wang, Z.; Zaharia, M. EarthformerV2: Enabling efficient video-level modeling for earth system. arXiv 2023, arXiv:2304.11971. [Google Scholar]
Ayzel, G.; Scheffer, T.; Heistermann, M. RainNet v1.0: A convolutional neural network for radar-based precipitation nowcasting. Geosci. Model Dev. 2020, 13, 2631–2644. [Google Scholar] [CrossRef]
Zhang, Z.; Zeng, Y.; Bai, K. Skillful-Net: A multi-scale analysis network for precipitation nowcasting. J. Adv. Model. Earth Syst. 2022, 14, e2022MS003126. [Google Scholar]
Zhang, Y.; Long, M.; Chen, K.; Xie, L.; Wen, H.; Zhang, J.; Yang, Q.; Lv, G. Skilful nowcasting of extreme precipitation with NowcastNet. Nature 2023, 619, 526–532. [Google Scholar] [CrossRef]
Ravuri, S.; Lenc, K.; Willson, M.; Kangin, D.; Lam, R.; Mirowski, P.; Fitzsimons, M.; Athanassiadou, M.; Kashem, S.; Madge, S.; et al. Skilful precipitation nowcasting using deep generative models of radar. Nature 2021, 597, 672–677. [Google Scholar] [CrossRef]
Rojas-Campos, A.; Langguth, M.; Wittenbrink, M.; Pipa, G. Deep learning models for generation of precipitation maps based on numerical weather prediction. Geosci. Model Dev. 2023, 16, 1467–1480. [Google Scholar] [CrossRef]
Husain, S.Z.; Separovic, L.; Caron, J.F.; Aider, R.; Buehner, M.; Chamberland, S.; Lapalme, E.; McTaggart-Cowan, R.; Subich, C.; Vaillancourt, P.A.; et al. Leveraging data-driven weather models for improving numerical weather prediction skill through large-scale spectral nudging. Weather Forecast. 2025, 40, 1749–1771. [Google Scholar] [CrossRef]
Ha, J.-H.; Lee, H. A deep learning model for precipitation nowcasting using multiple optical flow algorithms. Weather Forecast. 2024, 39, 41–53. [Google Scholar] [CrossRef]
Ha, J.-H.; Lee, H. Enhancing rainfall nowcasting using generative deep learning model with multi-temporal optical flow. Remote Sens. 2023, 15, 5169. [Google Scholar] [CrossRef]
Peng, D.; Chen, M.; Zhang, Y.; Tian, Z. Enhanced optic-flow extrapolation for Doppler radar nowcasting with Dynamic Weight Attention. Expert Syst. Appl. 2025, 267, 126168. [Google Scholar] [CrossRef]
Sakaino, H.; Ningrum, D.F.; Insisiengmay, A.; Zamora, L.; Gaviphat, N. DeepRainX: Integrated image nowcast based on deep learning and physical models. In Proceedings of the 2023 IEEE Conference on Artificial Intelligence, Santa Clara, CA, USA, 5–6 June 2023; pp. 99–100. [Google Scholar]
Schaefer, J.T. The critical success index as an indicator of warning skill. Wea. Forecast. 1990, 5, 570–575. [Google Scholar] [CrossRef]
Jolliffe, I.T.; Stephenson, D.B. (Eds.) Forecast Verification: A Practitioner’s Guide in Atmospheric Science; John Wiley & Sons: Chichester, UK, 2012. [Google Scholar]
Wilks, D.S. Statistical Methods in the Atmospheric Sciences, 3rd ed.; Academic Press: Waltham, MA, USA, 2011. [Google Scholar]
Donaldson, R.J.; Dyer, R.M.; Kraus, M.J. An objective evaluator of techniques for predicting severe weather events. In Proceedings of the Ninth Conference on Severe Local Storms, Norman, OK, USA, 21–23 October 1975; pp. 321–326. [Google Scholar]

Figure 1. Illustration of the dynamic sampling process.

Figure 2. The architecture of the Flow-Adaptive Spatio-Temporal LSTM (FAST-LSTM).

Figure 3. The architecture of the meteorological information-based discriminator (MID). The architecture features parallel processing of precipitation and terrain features through dedicated self-attention mechanisms, followed by cross-attention fusion to model their physical interactions.

Figure 4. The overall architecture of DFST-GAN.

Figure 5. Geographic coverage area of the HKO-7 radar dataset, displaying elevation contours and topographical features of Hong Kong and surrounding regions.

Figure 6. Comparison of metrics across different dBZ thresholds.

Figure 7. Performance evolution of key meteorological evaluation metrics across forecast lead times on the HKO-7 dataset (

dBZ \geq 30

).

Figure 7. Performance evolution of key meteorological evaluation metrics across forecast lead times on the HKO-7 dataset (

dBZ \geq 30

).

Figure 8. Case Study 1: Convective precipitation system evolution on the HKO-7 dataset.

Figure 9. Case Study 2: Intense convective precipitation event with rapid evolution.

Table 1. Ablation study results on the HKO-7 dataset (

dBZ \geq 30

).

Table 1. Ablation study results on the HKO-7 dataset (

dBZ \geq 30

).

Component Configuration	CSI ↑	HSS ↑	POD ↑	FAR ↓	ETS ↑
DFST-GAN (w/o Dynamic Flow)	0.640	0.582	0.742	0.236	0.598
DFST-GAN (w/o Adversarial)	0.631	0.573	0.731	0.240	0.589
DFST-GAN (w/ CNN-D)	0.673	0.643	0.812	0.218	0.631
DFST-GAN (w/o Mem. Decoupling)	0.697	0.633	0.807	0.217	0.655
DFST-GAN (Full Model)	0.739	0.707	0.874	0.178	0.695

Note: CNN-D refers to replacing the meteorological information-based discriminator (MID) with a traditional CNN-based discriminator.

Table 2. Computational complexity comparison on the HKO-7 dataset.

Method	Params (M) ↓	Memory (G) ↓	FLOPs (G) ↓
ConvLSTM	0.4	3.6	34.4
TrajGRU	0.7	4.2	45.8
PredRNN	0.9	6.3	69.8
Deformable ConvLSTM	1.2	5.8	58.6
PredRNN++	1.4	8.8	94.7
PredRNN-V2	0.9	7.3	70.8
DFST-GAN	1.3	8.4	86.2

Table 3. Performance with different

λ_{r e c}

values (dBZ ≥ 30).

Table 3. Performance with different

λ_{r e c}

values (dBZ ≥ 30).

$λ_{rec}$	CSI ↑	HSS ↑	POD ↑	FAR ↓	ETS ↑
0.5	0.705	0.672	0.851	0.202	0.663
1.0	0.739	0.707	0.874	0.178	0.695
2.0	0.718	0.686	0.849	0.195	0.676
5.0	0.695	0.663	0.835	0.218	0.653

Note: Other hyperparameters fixed at

λ_{d e c} = 0.1

,

λ_{a d v} = 0.001

,

λ_{f m} = 0.01

,

λ_{m e t} = 0.05

.

Table 4. Performance with different

λ_{d e c}

values (dBZ ≥ 30).

Table 4. Performance with different

λ_{d e c}

values (dBZ ≥ 30).

$λ_{dec}$	CSI ↑	HSS ↑	POD ↑	FAR ↓	ETS ↑
0.01	0.721	0.689	0.858	0.192	0.679
0.1	0.739	0.707	0.874	0.178	0.695
0.5	0.725	0.693	0.852	0.189	0.683
1.0	0.708	0.676	0.841	0.205	0.666

Note: Other hyperparameters fixed at

λ_{r e c} = 1.0

,

λ_{a d v} = 0.001

,

λ_{f m} = 0.01

,

λ_{m e t} = 0.05

.

Table 5. Performance with different

λ_{a d v}

values (dBZ ≥ 30).

Table 5. Performance with different

λ_{a d v}

values (dBZ ≥ 30).

$λ_{adv}$	CSI ↑	HSS ↑	POD ↑	FAR ↓	ETS ↑
0.0001	0.698	0.665	0.842	0.208	0.656
0.001	0.739	0.707	0.874	0.178	0.695
0.01	0.715	0.683	0.878	0.225	0.673
0.1	0.689	0.658	0.891	0.285	0.647

Note: Other hyperparameters fixed at

λ_{r e c} = 1.0

,

λ_{d e c} = 0.1

,

λ_{f m} = 0.01

,

λ_{m e t} = 0.05

.

Table 6. Performance with different

λ_{f m}

values (dBZ ≥ 30).

Table 6. Performance with different

λ_{f m}

values (dBZ ≥ 30).

$λ_{fm}$	CSI ↑	HSS ↑	POD ↑	FAR ↓	ETS ↑
0.001	0.714	0.681	0.859	0.198	0.672
0.01	0.739	0.707	0.874	0.178	0.695
0.1	0.723	0.690	0.871	0.203	0.681
0.5	0.709	0.677	0.883	0.221	0.667

Note: Other hyperparameters fixed at

λ_{r e c} = 1.0

,

λ_{d e c} = 0.1

,

λ_{a d v} = 0.001

,

λ_{m e t} = 0.05

.

Table 7. Performance with different

λ_{m e t}

values (dBZ ≥ 30).

Table 7. Performance with different

λ_{m e t}

values (dBZ ≥ 30).

$λ_{met}$	CSI ↑	HSS ↑	POD ↑	FAR ↓	ETS ↑
0.01	0.719	0.686	0.861	0.194	0.677
0.05	0.739	0.707	0.874	0.178	0.695
0.1	0.726	0.694	0.857	0.188	0.684
0.2	0.711	0.679	0.849	0.201	0.669

Note: Other hyperparameters fixed at

λ_{r e c} = 1.0

,

λ_{d e c} = 0.1

,

λ_{a d v} = 0.001

,

λ_{f m} = 0.01

.

Table 8. Optimal hyperparameter configuration.

$λ_{rec}$	$λ_{dec}$	$λ_{adv}$	$λ_{fm}$	$λ_{met}$
1.0	0.1	0.001	0.01	0.05

Table 9. Generalization validation on the HKO-7 dataset (dBZ ≥ 30).

Model	CSI ↑	HSS ↑	POD ↑	FAR ↓	ETS ↑
ConvLSTM	0.592	0.548	0.721	0.274	0.551
ConvLSTM-DF	0.634	0.593	0.758	0.241	0.592
PredRNN++	0.685	0.651	0.818	0.205	0.642
PredRNN++-DF	0.712	0.681	0.839	0.186	0.670

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Shi, J.; Yu, W.; Qian, H.; Zhang, C.; Zhu, K.; Liu, J.; Liu, G. DFST-GAN: A Dynamic Flow Spatio-Temporal Generative Adversarial Network for High-Quality Precipitation Nowcasting. Remote Sens. 2025, 17, 2974. https://doi.org/10.3390/rs17172974

AMA Style

Shi J, Yu W, Qian H, Zhang C, Zhu K, Liu J, Liu G. DFST-GAN: A Dynamic Flow Spatio-Temporal Generative Adversarial Network for High-Quality Precipitation Nowcasting. Remote Sensing. 2025; 17(17):2974. https://doi.org/10.3390/rs17172974

Chicago/Turabian Style

Shi, Jiawei, Wenbin Yu, Hongjie Qian, Chengjun Zhang, Konglin Zhu, Jie Liu, and Gaoping Liu. 2025. "DFST-GAN: A Dynamic Flow Spatio-Temporal Generative Adversarial Network for High-Quality Precipitation Nowcasting" Remote Sensing 17, no. 17: 2974. https://doi.org/10.3390/rs17172974

APA Style

Shi, J., Yu, W., Qian, H., Zhang, C., Zhu, K., Liu, J., & Liu, G. (2025). DFST-GAN: A Dynamic Flow Spatio-Temporal Generative Adversarial Network for High-Quality Precipitation Nowcasting. Remote Sensing, 17(17), 2974. https://doi.org/10.3390/rs17172974

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

DFST-GAN: A Dynamic Flow Spatio-Temporal Generative Adversarial Network for High-Quality Precipitation Nowcasting

Abstract

1. Introduction

2. Related Work

2.1. Spatio-Temporal Sequence Prediction Models

2.2. Generative Adversarial Networks for Sequence Generation

2.3. Precipitation Nowcasting

2.4. Hybrid NWP-Deep Learning and Enhanced Optical Flow Methods

3. Methods

3.1. Dynamic Flow Spatio-Temporal Feature Extractor

3.2. Flow-Adaptive Spatio-Temporal LSTM (FAST-LSTM)

3.3. Spatio-Temporal Adversarial Generative Network

3.3.1. Generator Architecture

3.3.2. Meteorological Information-Based Discriminator

Multi-Scale Motion Self-Attention (MSMSA)

Terrain-Aware Self-Attention (TASA)

Cross-Modal Fusion Attention (CMFA)

3.4. DFST-GAN Framework

Two-Phase Training Strategy

4. Experiments

4.1. Datasets

HKO-7

4.2. Criteria

4.3. Implementation and Training Details

5. Results

5.1. Comparison with Baseline Models

5.2. Ablation Studies

5.3. Computational Complexity Analysis

5.4. Hyperparameter Sensitivity Analysis

5.4.1. Impact of Reconstruction Loss Weight

5.4.2. Impact of Decoupling Loss Weight

5.4.3. Impact of Adversarial Loss Weight

5.4.4. Impact of Feature Matching Loss Weight

5.4.5. Impact of Meteorological Consistency Loss Weight

5.4.6. Optimal Weight Configuration Summary

5.5. Generalization Validation of Dynamic Flow Mechanism

6. Discussion

6.1. Current Limitations

6.1.1. Computational and Operational Considerations

6.1.2. Limited Validation Data and Generalizability

6.1.3. Temporal and Spatial Scale Limitations

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI