Article

Improving Nowcasting of Intense Convective Precipitation by Incorporating Dual-Polarization Radar Variables into Generative Adversarial Networks

1 School of Mathematics and Computer Science, Guangdong Ocean University, Zhanjiang 524088, China
2 School of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China
* Author to whom correspondence should be addressed.
Sensors 2024, 24(15), 4895; https://doi.org/10.3390/s24154895
Submission received: 10 June 2024 / Revised: 14 July 2024 / Accepted: 26 July 2024 / Published: 28 July 2024
(This article belongs to the Section Radar Sensors)

Abstract

The nowcasting of strong convective precipitation is in high demand yet presents significant challenges: it offers meteorological services to diverse socio-economic sectors to prevent the catastrophic weather that accompanies strong convective precipitation from causing substantial economic losses and human casualties. With the accumulation of dual-polarization radar data, data-driven deep learning models have been widely applied to precipitation nowcasting. These models exhibit certain limitations. The evolutionary method is prone to accumulating errors throughout its iterative process (multiple autoregressive models generate future motion fields and intensity residuals and then iterate implicitly to yield predictions), and the "regression to average" issue of autoregressive models leads to blurring. The evolutionary method's generator is a two-stage model: in the first stage, the generator employs the evolutionary method to produce provisional forecasted data; in the second stage, it reprocesses the provisional forecasted data. Although this generator is a generative adversarial network, its adversarial strategy ignores the significance of the provisional forecasted data. This study therefore proposes an Adversarial Autoregressive Network (AANet). First, the forecasted data are generated by two-stage generators (FURENet directly produces the provisional forecasted data, and the Semantic Synthesis Model reprocesses them); next, structural similarity loss (SSIM Loss) is used to mitigate the influence of the "regression to average" issue; finally, a two-stage adversarial (Tadv) strategy helps the two-stage generators produce more realistic and more similar data. Experiments verify that AANet outperforms NowcastNet in nowcasting the next 1 h, with a reduction of 0.0763 in normalized error (NE), 0.377 in root mean square error (RMSE), and 4.2% in false alarm ratio (FAR), as well as an improvement of 1.45 in peak signal-to-noise ratio (PSNR), 0.0208 in SSIM, 5.78% in critical success index (CSI), 6.25% in probability of detection (POD), and 5.7% in F1.

1. Introduction

Strong convective precipitation is typically associated with disastrous weather phenomena, such as rainstorms and hurricanes, which frequently result in considerable economic losses and casualties. Strong convective weather is characterized by small spatial scales, short life cycles, and pronounced suddenness. Consequently, providing accurate nowcasting [1] poses a considerable challenge.
In recent years, the integration of polarimetric radar variables and numerical weather prediction (NWP) [2] has been intensively explored for nowcasting applications. However, NWP run on supercomputers [3,4] can only provide weather forecasts at medium and small spatial scales for the upcoming few hours. Based on the continuity equation, pySTEPS [4] and DARTS [5] solve for future motion fields and intensity residuals and generate future precipitation data iteratively. This evolution-based approach [4,5] has since been incorporated into deep learning. NowcastNet [6] employs multiple autoregressive models to fit the motion field and the intensity residuals separately, generates provisional forecasted data through an iterative evolution operator, and reprocesses these provisional data into the final forecasted data via an encoder-decoder network. Compared with DGMR [3], which is built on recurrent neural networks, NowcastNet reduces the computational cost; however, its simple iterative scheme accumulates more errors. In adversarial training, NowcastNet uses only the final forecasted data and the observed data and neglects the provisional forecasted data. Hence, the limitations of NowcastNet are as follows: the autoregressive models are subject to the "regression to average" problem; the iterative evolution process inevitably accumulates errors; and the adversarial strategy disregards the significance of the provisional forecasted data while regarding the final forecasted data as actual data.
The above-mentioned approaches [3,4,5,6] use precipitation data as input, and these precipitation data are obtained from polarimetric radar variables through a numerical extrapolation method [7]. For different meteorological systems and geographical regions, this numerical extrapolation method [7] takes different numerical forms, and its accuracy is positively correlated with the volume of data. The key to precipitation forecasting is predicting the precipitation intensity in the future period, and the horizontal reflectivity ($Z_H$) among the polarimetric radar variables reflects the precipitation intensity. With the upgrade of weather radar [8] to dual-polarization radar [9,10,11], six polarimetric variables become available. The differential reflectivity ($Z_{DR}$) and the specific differential phase shift ($K_{DP}$) [12] provide additional precipitation information [11,13] and help ensure the reliability of the horizontal reflectivity ($Z_H$). Shi et al. [14] proposed the convolutional long short-term memory (ConvLSTM) network based on convolutional and recurrent neural networks; however, ConvLSTM takes only a single input and does not exploit multiple radar variables. Building on multimodal learning methods [15,16], Pan et al. [17] proposed using multiple radar variables as input, adopted the UNet [18] structure, which easily accommodates multiple inputs, and proposed FURENet. In this study, FURENet [17] generates the provisional forecasted data. However, FURENet is afflicted by the "regression to average" issue (meaning that the data generated by the autoregressive model are overly blurred). Hence, the structural similarity loss (SSIM Loss) [19,20] is employed to mitigate this issue.
Based on NowcastNet, this study proposes an Adversarial Autoregressive Network (AANet). The generator of AANet is partitioned into FURENet and the Semantic Synthesis Model (SSM), while the discriminator of AANet is an Inception network [21]. FURENet directly generates the provisional forecasted data, and SSIM Loss alleviates the "regression to average" issue. The Semantic Synthesis Model [22] introduces the multi-head self-attention mechanism [23], applying global attention to the past observed data and the provisional forecasted data so that SSM does not gradually disregard the past observed data during spatial normalization. In the adversarial stage, the discriminator classifies both the provisional and the final forecasted data as false and the observed data as true. The objective is to let FURENet and SSM improve concurrently during adversarial training, impelling both stages to generate more realistic and more similar data and thereby improving the two-stage nowcasting performance.

2. Methods

The Adversarial Autoregressive Network (AANet) consists of two parts: a generator and a discriminator. The generator produces forecasted data in two stages: in the first stage, FURENet [17] directly generates the provisional forecasted data; in the second stage, the Semantic Synthesis Model (SSM) reprocesses the provisional forecasted data and generates the final forecasted data. The discriminator takes the provisional forecasted data, the final forecasted data, and the observed data as input. The adversarial training strategy pushes the two-stage generator to produce more realistic and more similar data: the first-stage generator (FURENet) generates more realistic and more similar provisional data, and the second-stage generator (SSM) uses those provisional data to generate more realistic and more similar final data.
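To make the two-stage pipeline concrete, the following minimal PyTorch sketch shows how the generator stages and the discriminator fit together; the module names, constructor arguments, and the detached hand-off between stages are illustrative assumptions rather than the exact released implementation.

```python
# Minimal sketch of AANet's two-stage generation pipeline (illustrative only).
from torch import nn

class AANetSketch(nn.Module):
    def __init__(self, furenet: nn.Module, ssm: nn.Module, discriminator: nn.Module):
        super().__init__()
        self.furenet = furenet            # stage 1: produces the provisional forecast
        self.ssm = ssm                    # stage 2: reprocesses the provisional forecast
        self.discriminator = discriminator

    def forward(self, zh, zdr, kdp):
        # Stage 1: FURENet maps the three polarimetric inputs to a provisional forecast.
        x_provisional = self.furenet(zh, zdr, kdp)
        # Stage 2: SSM refines the provisional forecast, conditioned on past Z_H frames.
        # The tensor is detached because the two stages are gradient-separated
        # (see Section 2.3).
        x_final = self.ssm(zh, x_provisional.detach())
        # The discriminator later scores observations, provisional, and final forecasts.
        return x_provisional, x_final
```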

2.1. Method for Provisional Forecasted Data Generation

This subsection uses mathematical expressions solely to model the process of generating the provisional forecasted data in NowcastNet [6] and in AANet, with the aim of comparing the errors introduced during provisional forecast generation by the two approaches. For descriptive convenience, the approach by which FURENet [17] (the first-stage generator of AANet) generates forecasted data is termed the autoregressive method (A), while the method by which NowcastNet generates the provisional forecasted data is termed the evolutionary method (E).

2.1.1. Autoregressive Method

The autoregressive method [17] directly generates forecasted data, as depicted in Equation (1):
$$X_{1:T} = C + \sum_{i=0}^{T_0} \varphi_i X_{-T_0:0} + \varepsilon \tag{1}$$
X denotes the precipitation data; C and $\varphi_i$ are constant tensors; ε is the random error.
The error of the autoregressive method merely originates from the “regression to average” issue, rather than being accumulated through the iterative evolution operator [6].

2.1.2. Evolutionary Method

The evolutionary method [6] generates the motion field and intensity residuals using two autoregressive models (as presented in Equation (1)) and subsequently produces the forecasted data through the evolutionary operator. As depicted in Equation (2),
$$X_i = V_i \circ X_{i-1} + S_i + \varepsilon_i, \quad i \in [1, T] \tag{2}$$
X denotes the precipitation data; ε is the random error; V and S represent the motion field and the intensity residuals, respectively; and $\circ$ denotes the transfer (evolution) operator.
The forecasted data $X_i$ are acquired via the evolution operator [6]. By iterating Equation (2) T times, the forecasted data $X_{1:T}$ are obtained.
The evolutionary method is influenced by two sources of error: the ambiguity error resulting from the “regression to average” issue of the autoregressive model and the positional error that accumulates progressively in the iterative evolutionary process.
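The contrast between the two error behaviors can be illustrated with a toy sketch: the autoregressive formulation of Equation (1) emits all future frames in one pass, whereas the evolutionary formulation of Equation (2) warps the previous prediction step by step, so any positional error is carried forward. The `warp` function and the `phi`/`V`/`S` arguments below are illustrative stand-ins, not the learned operators of the actual models.

```python
# Toy contrast between Equations (1) and (2); all operators are illustrative.
import torch

def autoregressive_forecast(x_past, phi, c):
    # Equation (1): all future frames come from a single pass over the past
    # frames, so the only error source is the "regression to average" blur.
    # x_past: sequence of past frames; phi: matching weight tensors that
    # broadcast over the T output frames; c: constant tensor of the output shape.
    return c + sum(p * x for p, x in zip(phi, x_past))

def evolutionary_forecast(x0, V, S, warp):
    # Equation (2): each frame is advected from the previous prediction by the
    # motion field V[i] and corrected by the residual S[i], so positional
    # errors accumulate across the iterations.
    frames, x = [], x0
    for v_i, s_i in zip(V, S):
        x = warp(x, v_i) + s_i
        frames.append(x)
    return torch.stack(frames)
```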

2.2. Adversarial Autoregressive Network

The core objective of this study is to optimize the nowcasting approach of convective precipitation through deep learning, thereby offering efficient meteorological services to various socio-economic sectors. The Adversarial Autoregressive Network (AANet) is a nowcasting model for convective precipitation. Based on NowcastNet, AANet integrates and enhances the generative adversarial network framework, refines the generation method of the provisional forecasted data, incorporates the multi-head attention mechanism and SSIM loss, and proposes a two-stage adversarial strategy.
Figure 1 depicts the process and schematic diagram of the Adversarial Autoregressive Network (AANet). Utilizing multiple polarized radar variables as the input, FURENet generates the provisional forecasted data, the Semantic Synthesis Model (SSM) generates the final forecasted data, and the discriminator is employed to distinguish observed data from the final forecasted data (as well as the provisional forecasted data).

2.2.1. FURENet

FURENet adopts UNet [18] as its backbone network and uses the Squeeze-and-Excitation (SE) module [24] to delay the fusion of the features of the multiple polarization variables. The delayed fusion strategy [17,25,26,27] is usually applied to learning the complex relationships among different kinds of information: when intricate relationships exist among the input variables, an input layer that forcibly combines them through linear operations causes an information entanglement effect, which the delayed fusion strategy effectively avoids. Let the tensor $\chi^i \in \mathbb{R}^{N \times H \times W}$ represent one type of polarimetric radar observation (where i = 1, 2, 3 corresponds to $Z_H$, $Z_{DR}$, and $K_{DP}$, respectively; N is the time length, H the length, and W the width), and let $f_{encoder1}$ and $f_{decoder1}$ denote the encoder and decoder of FURENet, respectively. The specific process of FURENet is shown in Equations (3) and (4).
$$\left\{F_n^i\right\}_{n=0,1,\dots,4} = f_{encoder1}\left(\chi^i_{1-T:0}\right), \quad i = 1, 2, 3 \tag{3}$$
$$\tilde{\chi}^1_{1:T} = f_{decoder1}\left(\left\{F_n^i\right\}_{i=1,2,3;\; n=0,1,\dots,4}\right) \tag{4}$$
$\chi^i_0$ denotes the data of the given polarization variable at the present moment; $F_n^i$ represents the feature of the n-th semantic layer of that polarization variable.
Equation (3): the encoder learns the features of consecutive frames of each polarization variable and downsamples them layer by layer; it outputs the semantic feature $F_n^i$ at each semantic layer and ultimately outputs the semantic feature set $\left\{F_n^i\right\}_{n=0,1,\dots,4}$. Equation (4): the decoder learns the top-level semantic features ($F_n^i$, $i = 1, 2, 3$, $n = 4$); by combining same-level semantic features ($F_n^i$, $i = 1, 2, 3$, $n = k$, $k < 4$), it upsamples the semantic features and, after several layers of operations, obtains the provisional forecasted data ($\tilde{\chi}^1_{1:T}$).
FURENet comprises three encoders and one decoder, and its structure is presented in Figure 2. The encoder consists of four encoding blocks and a convolutional layer (5 × 5 convolution kernel). The encoder blocks include a residual block [28], a bilateral downsampling, and two residual blocks. The bilateral downsampling includes a convolutional layer (with a 5 × 5 convolution kernel) and Maxpooling. The decoder is composed of the SE block, four decoder1 blocks, a bilateral upsampling, and a convolutional layer (5 × 5 convolution kernel). The bilateral upsampling includes Convtranspose [29] and bilinear interpolation [30]. The SE block comprises a global pooling layer, a linear layer, a Tanh layer, a linear layer, and a Sigmoid layer [31]. The decoder1 block is composed of a bilateral upsampling, a convolutional layer (3 × 3 convolutional kernel), and two residual blocks. The normalization functions and activation functions of the convolutional layer and the residual block are group normalization [32] and Tanh [33], respectively, except for the output layer.
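The following sketch illustrates the delayed fusion idea described above: one small encoder per polarimetric variable and an SE-style gate that re-weights channels only after the per-variable features have been extracted. Layer sizes and the exact block composition are simplified assumptions and do not reproduce FURENet's full architecture.

```python
# Sketch of delayed fusion with an SE-style gate (illustrative layer sizes).
import torch
import torch.nn as nn

class SEGate(nn.Module):
    """Squeeze-and-Excitation: channel-wise re-weighting before fusion."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.Tanh(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # global pooling -> (B, C)
        return x * w[:, :, None, None]

class DelayedFusionEncoder(nn.Module):
    def __init__(self, in_frames, feat=32):
        super().__init__()
        # one small encoder per polarization variable (Z_H, Z_DR, K_DP)
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Conv2d(in_frames, feat, 5, padding=2), nn.Tanh(),
                          nn.MaxPool2d(2))
            for _ in range(3))
        self.gate = SEGate(3 * feat)

    def forward(self, zh, zdr, kdp):
        feats = [enc(x) for enc, x in zip(self.encoders, (zh, zdr, kdp))]
        return self.gate(torch.cat(feats, dim=1))   # fuse only after encoding
```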

2.2.2. Semantic Synthesis Model

The Semantic Synthesis Model (SSM) consists of three parts, namely the encoder, the decoder, and the noise generator. The encoder encodes the past observed data together with the provisional forecasted data to obtain the data distribution feature ($F_{n=4}^{in}$). The decoder decodes this feature and applies a global self-attention mechanism and a normalized spatial affine transformation. Under the influence of SPADE [22], the decoder tends to focus on the provisional forecasted data and ignore the past observed data. The noise generator encodes random noise z into a learnable noise distribution feature ($F_Z$) that is concatenated with the data distribution feature ($F_{n=4}^{in}$), deepening the complex relationship between the provisional forecasted data and the past observed data. The noise generator uses VGGNet [34] as its backbone network. Let the tensor $\chi^1 \in \mathbb{R}^{N \times H \times W}$ represent the observed data (where N is the time length, H the length, and W the width), let $z \in \mathbb{R}^{N \times H/4 \times W/4}$ represent the random noise, and let $f_{encoder2}$, $f_{VGGNet}$, and $f_{decoder2}$ represent the encoder, noise generator, and decoder of the SSM, respectively. The specific process is shown in Equations (5)–(7):
$$F_{n=4}^{in} = f_{encoder2}\left(\chi^1_{1-T:0},\; \tilde{\chi}^1_{1:T}\right) \tag{5}$$
$$F_Z = f_{VGGNet}(z), \quad z \sim N(0, 1) \tag{6}$$
$$\hat{\chi}^1_{1:T} = f_{decoder2}\left(\left[F_{n=4}^{in},\, F_Z\right] \,\middle|\, \tilde{\chi}^1_{1:T}\right) \tag{7}$$
Equation (5): the encoder encodes and downsamples the past observed data ($\chi^1_{1-T:0}$) and the provisional forecasted data ($\tilde{\chi}^1_{1:T}$), ultimately obtaining the data distribution feature ($F_{n=4}^{in}$). Equation (6): the noise generator encodes and downsamples Gaussian noise (with mean 0 and variance 1), ultimately obtaining the noise distribution feature ($F_Z$). Equation (7): the data distribution feature and the noise distribution feature are concatenated; with the intervention of SPADE (where the feature normalization undergoes a spatial affine transformation driven by the provisional forecasted data), the decoder decodes and upsamples the features and ultimately obtains the final forecasted data.
SSM consists of three parts, namely the encoder, the noise generator, and the decoder; its specific structure is shown in Figure 3. The encoder consists of four encoder blocks and a convolutional layer (5 × 5 convolution kernel). The encoder block is composed of a residual block [28], a bilateral downsampling, and two residual blocks [28]; the normalization function is group normalization, and the activation function is Tanh. The decoder is composed of a convolutional layer (3 × 3 convolution kernel), five decoder2 blocks, and a convolutional layer (3 × 3 convolution kernel). The decoder2 block consists of a bilateral upsampling, a convolutional layer (3 × 3 convolution kernel), a dual-head self-attention mechanism, a convolutional layer (3 × 3 convolution kernel), and a SPADE ResBlk [22]. The dual-head self-attention mechanism is composed of two cascaded self-attention modules [23]. The SPADE ResBlk consists of two SPADE layers and a residual skip connection. The principles of self-attention and spatially adaptive (de)normalization (SPADE [22]) are presented below.
The self-attention mechanism [23] lets the model reassign feature weights so that the feature map carries a global attention mechanism. Assume the tensor $X \in \mathbb{R}^{C \times H \times W}$ represents the variable (where C is the number of channels, H the length, and W the width) and the tensor $\omega \in \mathbb{R}^{C \times H_\omega \times W_\omega}$ represents the convolution weight (C channels, length $H_\omega$, width $W_\omega$). The principle is depicted in Equations (8)–(13).
$$Q_{c,h,w} = \sum_{i=0}^{C-1}\sum_{j=0}^{H_\omega - 1}\sum_{k=0}^{W_\omega - 1} \omega^Q_{i,j,k}\, X_{in}\!\left(i,\; h - H_\omega/2 + j,\; w - W_\omega/2 + k\right) \tag{8}$$
$$K_{c,h,w} = \sum_{i=0}^{C-1}\sum_{j=0}^{H_\omega - 1}\sum_{k=0}^{W_\omega - 1} \omega^K_{i,j,k}\, X_{in}\!\left(i,\; h - H_\omega/2 + j,\; w - W_\omega/2 + k\right) \tag{9}$$
$$V_{c,h,w} = \sum_{i=0}^{C-1}\sum_{j=0}^{H_\omega - 1}\sum_{k=0}^{W_\omega - 1} \omega^V_{i,j,k}\, X_{in}\!\left(i,\; h - H_\omega/2 + j,\; w - W_\omega/2 + k\right) \tag{10}$$
$$\alpha = \frac{QK^T}{\sqrt{C}}, \qquad \alpha(c_1, c_2) = \frac{\sum_{j=0}^{H-1}\sum_{k=0}^{W-1} Q_{c_1,j,k}\, K_{c_2,j,k}}{\sqrt{C}} \tag{11}$$
$$\hat{\alpha} = \mathrm{softmax}(\alpha), \qquad \hat{\alpha}(c_1, c_2) = \frac{e^{\alpha(c_1, c_2)}}{\sum_{i=0}^{C-1} e^{\alpha(c_1, i)}} \tag{12}$$
$$X_{out} = \hat{\alpha} V, \qquad X_{out}(c, h, w) = \sum_{i=0}^{C-1} \hat{\alpha}(c, i)\, V_{i,h,w} \tag{13}$$
The length H and the width W are equal; $X_{i,j,k}$ denotes an element of the tensor X, and (i, j, k) are its coordinates.
The principle of the self-attention mechanism [23] is $f(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d}}\right)V$, where d is the number of channels. Equations (8)–(10): the weights $\omega^Q$, $\omega^K$, and $\omega^V$ are convolved with the input to obtain the query ($Q \in \mathbb{R}^{C \times H \times W}$), key ($K \in \mathbb{R}^{C \times H \times W}$), and value ($V \in \mathbb{R}^{C \times H \times W}$) tensors. Equation (11): the tensors Q and K are reshaped into matrices ($\mathbb{R}^{C \times H \times W} \rightarrow \mathbb{R}^{C \times (H \times W)}$); a matrix product of Q and $K^T$, scaled by $\frac{1}{\sqrt{C}}$, yields $\alpha \in \mathbb{R}^{C \times C}$. Equation (12): the softmax function is applied to α to obtain $\hat{\alpha} \in \mathbb{R}^{C \times C}$. Equation (13): the tensor V is reshaped into a matrix ($\mathbb{R}^{C \times H \times W} \rightarrow \mathbb{R}^{C \times (H \times W)}$); the matrix product of $\hat{\alpha}$ and V yields $X_{out} \in \mathbb{R}^{C \times (H \times W)}$, which is reshaped back to $\mathbb{R}^{C \times H \times W}$.
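A compact sketch of the channel-wise self-attention described by Equations (8)–(13) is given below: Q, K, and V come from convolutions, the C × C attention matrix is normalized with a softmax as in Equation (12), and the channels of V are re-mixed. It is written for a single sample with an assumed kernel size, so it illustrates the principle rather than the model's exact attention block.

```python
# Channel-wise self-attention sketch following Equations (8)-(13).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelSelfAttention(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.to_q = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.to_k = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.to_v = nn.Conv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, x):                        # x: (C, H, W)
        c, h, w = x.shape
        q = self.to_q(x[None]).flatten(2)[0]     # Eq. (8):  (C, H*W)
        k = self.to_k(x[None]).flatten(2)[0]     # Eq. (9)
        v = self.to_v(x[None]).flatten(2)[0]     # Eq. (10)
        alpha = q @ k.t() / c ** 0.5             # Eq. (11): (C, C)
        alpha = F.softmax(alpha, dim=-1)         # Eq. (12): normalize rows
        out = alpha @ v                          # Eq. (13): (C, H*W)
        return out.view(c, h, w)
```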
The principle of SPADE [22] is to apply a spatial affine transformation to the normalization under the intervention of semantic conditions. SPADE reprocesses instance normalization [35]: the normalization is modulated by a spatial affine transformation that is adaptively adjusted according to the semantic condition. Suppose the tensor $X \in \mathbb{R}^{C \times H \times W}$ represents the variable (where C is the number of channels, H the length, and W the width) and the tensor $\omega \in \mathbb{R}^{C \times H_\omega \times W_\omega}$ represents the convolution weight (C channels, length $H_\omega$, width $W_\omega$). The principle is detailed in Equations (14)–(18).
$$\mu_c = \frac{1}{H \times W}\sum_{i=0}^{H-1}\sum_{j=0}^{W-1} X_{in}(c, i, j) \tag{14}$$
$$\sigma_c^2 = \frac{1}{H \times W}\sum_{i=0}^{H-1}\sum_{j=0}^{W-1} \left(X_{in}(c, i, j) - \mu_c\right)^2 \tag{15}$$
$$\gamma_{c,h,w} = \sum_{i=0}^{C-1}\sum_{j=0}^{H_\omega - 1}\sum_{k=0}^{W_\omega - 1} \omega^\gamma_{i,j,k}\, X_{sc}\!\left(i,\; h - H_\omega/2 + j,\; w - W_\omega/2 + k\right) \tag{16}$$
$$\beta_{c,h,w} = \sum_{i=0}^{C-1}\sum_{j=0}^{H_\omega - 1}\sum_{k=0}^{W_\omega - 1} \omega^\beta_{i,j,k}\, X_{sc}\!\left(i,\; h - H_\omega/2 + j,\; w - W_\omega/2 + k\right) \tag{17}$$
$$X_{out}(c, i, j) = \gamma_{c,i,j}\,\frac{X_{in}(c, i, j) - \mu_c}{\sigma_c} + \beta_{c,i,j} \tag{18}$$
The length H and the width W are equal; $X_{i,j,k}$ denotes an element of the tensor X, and (i, j, k) are its coordinates; $X_{sc}$ denotes the semantic condition; μ denotes the mean, and σ the standard deviation.
Equations (14) and (15): the mean and standard deviation of the feature map are computed per channel. Equations (16) and (17): the semantic condition is encoded by convolution to obtain the weight tensor γ and the bias tensor β. Equation (18): the normalized feature map is modulated by the spatial affine transformation.
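The sketch below mirrors Equations (14)–(18): the feature map is instance-normalized and then modulated by spatial γ and β maps predicted from the semantic condition (here, the provisional forecast). The hidden width and kernel sizes are assumptions for illustration.

```python
# SPADE-style spatially adaptive normalization sketch (Equations (14)-(18)).
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    def __init__(self, feat_channels, cond_channels, hidden=64):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)      # Eqs. (14)-(15)
        self.shared = nn.Sequential(
            nn.Conv2d(cond_channels, hidden, 3, padding=1), nn.Tanh())
        self.to_gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)  # Eq. (16)
        self.to_beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)   # Eq. (17)

    def forward(self, x, condition):
        # resize the semantic condition to the feature map's spatial size
        condition = F.interpolate(condition, size=x.shape[-2:], mode='nearest')
        h = self.shared(condition)
        gamma, beta = self.to_gamma(h), self.to_beta(h)
        return gamma * self.norm(x) + beta                              # Eq. (18)
```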

2.3. Training Principles

AANet is a generative adversarial network [36]: the generator generates data, and the discriminator distinguishes true data from false data. The training principle of a generative adversarial network is a zero-sum game between the generator and the discriminator: as the data generated by the generator become more vivid, the discriminator must enhance its ability to distinguish true from false; as the discriminator becomes stronger, the generator has to improve its counterfeiting level.
The generator of AANet comprises two components. FURENet is responsible for generating the provisional forecasted data, while the Semantic Synthesis Model (SSM) generates the final forecasted data with the aid of the provisional forecasted data. The provisional forecasted data help enhance the authenticity and similarity of the final forecasted data, so a FURENet with better prediction performance indirectly improves the ability of SSM to generate forecasted data (note that FURENet and SSM are gradient-separated). The discriminator therefore deems the forecasted data (both provisional and final) to be false data and the observed data to be true data.
For ease of description, let the tensor $Y \in \mathbb{R}^{B \times T \times H \times W}$ represent the observed data (B is the batch size, T the number of time steps, H the length, and W the width), and let the tensor $X \in \mathbb{R}^{B \times T \times H \times W}$ represent the forecasted data ($X^1$ denotes the provisional forecasted data and $X^2$ the final forecasted data).

2.3.1. Mean Square Error Loss

The mean square error (MSE) loss measures the error distribution between the generated data and the observed data. Since the generator is a two-stage model, the error of the data generated in both stages must be minimized, as in Equation (19):
$$J_{MSE} = \frac{1}{T \times H \times W}\sum_{i=0}^{B-1}\sum_{j=0}^{T-1}\sum_{k=0}^{H-1}\sum_{l=0}^{W-1}\left[\left(Y_{i,j,k,l} - X^1_{i,j,k,l}\right)^2 + \left(Y_{i,j,k,l} - X^2_{i,j,k,l}\right)^2\right] \tag{19}$$
$X_{i,j,k,l}$ denotes an element of the tensor X, and (i, j, k, l) are its coordinates.
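A direct translation of Equation (19) into code is straightforward; the sketch below assumes the tensors are laid out as (B, T, H, W) as defined above.

```python
# Two-stage mean square error of Equation (19) (sketch).
import torch

def two_stage_mse(y: torch.Tensor, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """y, x1, x2: observed, provisional, and final forecasts of shape (B, T, H, W)."""
    _, t, h, w = y.shape
    # sum squared errors of both stages over all elements, divide by T*H*W
    return (((y - x1) ** 2).sum() + ((y - x2) ** 2).sum()) / (t * h * w)
```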

2.3.2. Structural Similarity Loss

The autoregressive model suffers from the “regression to average” issue, namely, the generated data exhibits a propensity to converge towards the average value. SSIM Loss effectively alleviates the influence resulting from the “regression to average” issue and significantly enhances the similarity between the generated data and the observed data. The principle of SSIM [19,20] is shown in Equation (20) as follows:
$$\mathrm{SSIM}(x, y) = \frac{\left(2\mu_x\mu_y + c_1\right)\left(2\sigma_{xy} + c_2\right)}{\left(\mu_x^2 + \mu_y^2 + c_1\right)\left(\sigma_x^2 + \sigma_y^2 + c_2\right)} \tag{20}$$
$\mu_x$ and $\mu_y$ are the means of x and y, respectively; $\sigma_x$ and $\sigma_y$ are their standard deviations; $\sigma_{xy}$ is their covariance; the constants are $K_1 = 0.01$ and $K_2 = 0.03$, L is the size of the value range, and $c_1 = (K_1 L)^2$, $c_2 = (K_2 L)^2$.
SSIM Loss is applied locally rather than globally. The local mean, standard deviation, and covariance are computed by a sliding convolution with a window of size T × 11 × 11. The sliding window is a circularly symmetric Gaussian-weighted tensor $w \in \mathbb{R}^{T \times 11 \times 11}$ (the standard deviation of w is 1.5, and its weights sum to 1, i.e., $\sum_{j=0}^{T-1}\sum_{k=0}^{10}\sum_{l=0}^{10} w_{j,k,l} = 1$). Let $\mu \in \mathbb{R}^{B \times R \times C}$ represent the mean (B is the batch size, R the number of rows, and C the number of columns) and σ the standard deviation (or covariance), as shown below:
$$\mu_F(b, r, c) = \sum_{t=0}^{T-1}\sum_{i=0}^{10}\sum_{j=0}^{10} w_{t,i,j}\, F(b, t, 11r + i, 11c + j), \quad F \in \{X, Y\} \tag{21}$$
$$\sigma_F^2(b, r, c) = \sum_{t=0}^{T-1}\sum_{i=0}^{10}\sum_{j=0}^{10} w_{t,i,j}\left(F(b, t, 11r + i, 11c + j) - \mu_F(b, r, c)\right)^2, \quad F \in \{X, Y\} \tag{22}$$
$$\sigma_{XY}(b, r, c) = \sum_{t=0}^{T-1}\sum_{i=0}^{10}\sum_{j=0}^{10} w_{t,i,j}\left(X(b, t, 11r + i, 11c + j) - \mu_X(b, r, c)\right)\left(Y(b, t, 11r + i, 11c + j) - \mu_Y(b, r, c)\right) \tag{23}$$
$$L_{SSIM}(X, Y) = \sum_{b=0}^{B-1}\sum_{r=0}^{R-1}\sum_{c=0}^{C-1}\left[1 - \frac{\left(2\mu_X(b,r,c)\,\mu_Y(b,r,c) + c_1\right)\left(2\sigma_{XY}(b,r,c) + c_2\right)}{\left(\mu_X(b,r,c)^2 + \mu_Y(b,r,c)^2 + c_1\right)\left(\sigma_X(b,r,c)^2 + \sigma_Y(b,r,c)^2 + c_2\right)}\right] \tag{24}$$
$$J_{SSIM} = \frac{1}{R \times C}\left[L_{SSIM}(X^1, Y) + L_{SSIM}(X^2, Y)\right] \tag{25}$$
$X_{i,j,k,l}$ denotes an element of the tensor X, and (i, j, k, l) are its coordinates.
Equations (21)–(23): local feature maps calculate the mean tensor, standard deviation tensor, and covariance tensor through circular-symmetric Gaussian weighted tensors. Equations (24) and (25): firstly, the mean tensor, standard deviation tensor, and covariance tensor are employed to compute the local SSIM value; subsequently, the local SSIM value is subtracted from 1 to obtain the local SSIM Loss; eventually, the average is computed based on all the local SSIM Loss.
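The windowed computation of Equations (21)–(25) can be sketched with a strided 3D convolution against the Gaussian window: the window spans all T frames and non-overlapping 11 × 11 patches, its weights sum to 1, and the local statistics feed the 1 − SSIM term. The default constants assume a value range of L = 65 (borrowed from the PSNR formula later in the paper), which is an assumption rather than a stated choice.

```python
# Windowed SSIM loss sketch following Equations (21)-(25).
import torch
import torch.nn.functional as F

def gaussian_window(t, size=11, sigma=1.5):
    # circularly symmetric Gaussian, replicated over the T frames, summing to 1
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    w = torch.outer(g, g).expand(t, size, size).clone()
    return (w / w.sum())[None, None]                  # (1, 1, T, 11, 11)

def ssim_loss(x, y, c1=(0.01 * 65) ** 2, c2=(0.03 * 65) ** 2):
    """x, y: (B, T, H, W); returns the mean of 1 - local SSIM."""
    _, t, _, _ = x.shape
    w = gaussian_window(t).to(x)
    x5, y5 = x[:, None], y[:, None]                   # (B, 1, T, H, W)
    conv = lambda z: F.conv3d(z, w, stride=(1, 11, 11))[:, 0, 0]   # -> (B, R, C)
    mu_x, mu_y = conv(x5), conv(y5)                   # Eq. (21)
    var_x = conv(x5 ** 2) - mu_x ** 2                 # Eq. (22)
    var_y = conv(y5 ** 2) - mu_y ** 2
    cov = conv(x5 * y5) - mu_x * mu_y                 # Eq. (23)
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return (1 - ssim).mean()                          # Eqs. (24)-(25), averaged
```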

2.3.3. Two-Stage Adversarial Loss

The discriminator distinguishes true data from false data. It is composed of a convolutional layer (1 × 1 convolution kernel), four D blocks, a convolutional layer (1 × 1 convolution kernel), a global pooling layer, and two linear layers, as detailed in Figure 4. The D block has a convolutional layer (5 × 5 convolution kernel), an inception block [21], and a residual block [28]. The inception block has four branches: a skip connection, spatial features, spatiotemporal features, and pooling features. The skip connection is a 2D convolutional layer (1 × 1 kernel); the spatial branch is a 2D convolutional layer (1 × 1 kernel) followed by a 2D convolutional layer (3 × 3 kernel); the spatiotemporal branch is a 3D convolutional layer (1 × 1 × 1 kernel) followed by two 3D convolutional layers (3 × 5 × 5 kernels); the pooling branch is a 2D convolutional layer (1 × 1 kernel) and max pooling. The normalization and activation functions are group normalization and Tanh, respectively. Observations, provisional forecasts, and final forecasts are used as discriminator inputs, with observations labeled 1 and both kinds of forecasts labeled 0. In the input data, the ratio of observed data to forecasted data (provisional plus final) is one to one.
Suppose $G_1$ represents the first-stage generator of AANet (FURENet), $G_2$ represents the second-stage generator (SSM), D represents the discriminator, $p_{data}$ represents the radar data distribution ($X_{1-T:0}$ denotes the past observations and $X_{1:T}$ the future observations), and $p_{noise}(z)$ represents Gaussian noise. The two-stage adversarial strategy is presented in Equation (26):
$$\begin{aligned}
\min_{G_1}\min_{G_2}\max_{D} V(D, G_1, G_2) ={} & \mathbb{E}_{X_{1:T} \sim p_{data}(X_{1-T:T})}\big[\log D\left(X_{1-T:T}\right)\big] \\
& + \mathbb{E}_{X_{1-T:0} \sim p_{data}(X_{1-T:T})}\big[\log\big(1 - D\big(G_1\left(X_{1-T:0}\right),\, X_{1-T:T}\big)\big)\big] \\
& + \mathbb{E}_{X_{1-T:0} \sim p_{data}(X_{1-T:T}),\; z \sim p_{noise}(z)}\big[\log\big(1 - D\big(G_2\big(G_1\left(X_{1-T:0}\right), z\big),\, X_{1-T:T}\big)\big)\big]
\end{aligned} \tag{26}$$
Equation (26): the optimal discriminator D maximizes the likelihood of distinguishing true data from false data, while the optimal generators $G_1$ and $G_2$ minimize the likelihood that their generated data are recognized by the discriminator. In this zero-sum game, the generators try to deceive the discriminator: as generator $G_1$ improves its faking ability, it produces more realistic provisional data, so generator $G_2$ receives better faking material and in turn generates more realistic final data. The practical application differs slightly from the above theory, as described in Algorithm 1.
Algorithm 1. The detailed procedure of the two-stage adversarial strategy.
Input: past observations $Y^1$, future observations $Y^2$, provisional forecasts $X^1$, and final forecasts $X^2$, each of shape $\mathbb{R}^{B \times T \times H \times W}$.
1: Four random cropping operations ($\mathbb{R}^{2B \times T \times H \times W} \rightarrow \mathbb{R}^{2B \times T \times \frac{H}{2} \times \frac{W}{2}}$) are performed on $[Y^1, Y^2]$; each crop is assigned the label 1 and concatenated along the first dimension:
$$p_y = \left\{\left(y_{0,0}, 1\right), \dots, \left(y_{0,3}, 1\right), \dots, \left(y_{B-1,3}, 1\right)\right\}, \quad y_{\cdot,\cdot} \in \mathbb{R}^{2T \times \frac{H}{2} \times \frac{W}{2}}$$
2: Two random cropping operations ($\mathbb{R}^{2B \times T \times H \times W} \rightarrow \mathbb{R}^{2B \times T \times \frac{H}{2} \times \frac{W}{2}}$) are performed on $[Y^1, X^1]$; each crop is assigned the label 0 and concatenated along the first dimension:
$$p_{x^1} = \left\{\left(x^1_{0,0}, 0\right), \left(x^1_{0,1}, 0\right), \dots, \left(x^1_{B-1,1}, 0\right)\right\}, \quad x^1_{\cdot,\cdot} \in \mathbb{R}^{2T \times \frac{H}{2} \times \frac{W}{2}}$$
3: Two random cropping operations ($\mathbb{R}^{2B \times T \times H \times W} \rightarrow \mathbb{R}^{2B \times T \times \frac{H}{2} \times \frac{W}{2}}$) are performed on $[Y^1, X^2]$; each crop is assigned the label 0 and concatenated along the first dimension:
$$p_{x^2} = \left\{\left(x^2_{0,0}, 0\right), \left(x^2_{0,1}, 0\right), \dots, \left(x^2_{B-1,1}, 0\right)\right\}, \quad x^2_{\cdot,\cdot} \in \mathbb{R}^{2T \times \frac{H}{2} \times \frac{W}{2}}$$
4: Pool $p_y$, $p_{x^1}$, and $p_{x^2}$ and randomly sample from them 4B times:
$$(s, k) \in p_y \cup p_{x^1} \cup p_{x^2}, \quad \text{sample } 4B \text{ times}$$
5: Calculate the cross-entropy loss:
$$J_{Tadv} = -\sum_{i=0}^{4B-1}\left[k \log_{10} D(s) + (1 - k)\log_{10}\left(1 - D(s)\right)\right]$$
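The essence of Algorithm 1 is that real sequences (past plus future observations) carry label 1 while sequences completed with either the provisional or the final forecast carry label 0, and one cross-entropy objective is evaluated over the pooled samples. The sketch below simplifies the random-cropping and 4:2:2 sampling details and assumes the discriminator returns one logit per sample.

```python
# Simplified two-stage adversarial objective (cropping/sampling details omitted).
import torch
import torch.nn.functional as F

def two_stage_adversarial_loss(discriminator, y_past, y_future, x_provisional, x_final):
    """All inputs: (B, T, H, W); forecasts should be detached when training D."""
    real = torch.cat([y_past, y_future], dim=1)          # label 1
    fake1 = torch.cat([y_past, x_provisional], dim=1)    # label 0 (stage 1 output)
    fake2 = torch.cat([y_past, x_final], dim=1)          # label 0 (stage 2 output)
    samples = torch.cat([real, fake1, fake2], dim=0)
    labels = torch.cat([torch.ones(real.size(0)),
                        torch.zeros(fake1.size(0) + fake2.size(0))]).to(samples)
    logits = discriminator(samples).view(-1)             # assumed: one score per sample
    return F.binary_cross_entropy_with_logits(logits, labels)
```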

2.3.4. Total Loss

The generator and the discriminator engage in a performance race: the total loss function lets the two models develop adversarially, with both becoming progressively stronger.
$$L = \alpha J_{MSE} + \beta J_{SSIM} + J_{Tadv} \tag{27}$$
The parameters α and β are constants. In the experiments, α and β are both set to 0.5; however, setting α to 1 and β to 4 produces better results.
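Combining the terms of Equation (27) is then a one-liner; the default weights below follow the stated α = β = 0.5.

```python
# Total generator objective of Equation (27) (sketch).
def total_generator_loss(j_mse, j_ssim, j_tadv, alpha=0.5, beta=0.5):
    return alpha * j_mse + beta * j_ssim + j_tadv
```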

3. Results

3.1. Implementation Details

The "NJU_CPOL_update2308" dataset is the data for the 20th "Huawei Cup" China Graduate Student Mathematical Modeling Contest; specifically, it is the test data of Problem F, "Nowcasting of Severe Convective Precipitation". A continuous precipitation process is represented by three sequences of continuous multi-frame grid data ($Z_H$, $Z_{DR}$, and $K_{DP}$). The time interval between two adjacent frames is 6 min, which is also the scanning interval of the dual-polarization radar. Each grid has a shape of 256 × 256, corresponding to a plane area of 256 km × 256 km. The NJU_CPOL_update2308 dataset contains a considerable quantity of invalid data; hence, manual screening is required, and radar data quality control [37] (encompassing data alignment, outlier elimination, removal of non-meteorological echoes, and data filtering) is executed. The initial size of the dataset was approximately 89 GB; after manual screening and quality control, it was reduced to 25.5 GB.
The deep learning framework is PyTorch, the optimizer is AdamW, and the learning rate schedule is CosineAnnealingLR. The learning rate is set to $4 \times 10^{-4}$ for the first-stage generator, $4 \times 10^{-5}$ for the second-stage generator, and $4 \times 10^{-4}$ for the discriminator. The graphics processor is an NVIDIA GeForce RTX 3090 with 24 GB of video memory. The comparison models also use the AdamW optimizer and the CosineAnnealingLR schedule, with learning rates consistent with those of this study. Starting from pre-trained models, the number of training iterations is 100.
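The training setup described above can be reproduced with standard PyTorch components; the sketch below uses the stated AdamW optimizer, CosineAnnealingLR schedule, and learning rates, with `t_max` as an assumed placeholder for the schedule length.

```python
# Optimizer and learning rate schedule setup (sketch of the stated configuration).
import torch

def build_optimizers(furenet, ssm, discriminator, t_max=100):
    opt_g1 = torch.optim.AdamW(furenet.parameters(), lr=4e-4)        # first-stage generator
    opt_g2 = torch.optim.AdamW(ssm.parameters(), lr=4e-5)            # second-stage generator
    opt_d = torch.optim.AdamW(discriminator.parameters(), lr=4e-4)   # discriminator
    schedulers = [torch.optim.lr_scheduler.CosineAnnealingLR(o, T_max=t_max)
                  for o in (opt_g1, opt_g2, opt_d)]
    return (opt_g1, opt_g2, opt_d), schedulers
```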

3.2. Forecast Measurement Indicators

In this study, eight forecast measurement indicators are used to measure model performance: normalized error (NE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), critical success index (CSI), probability of detection (POD), F1, and false alarm ratio (FAR). NE and RMSE describe the error of the forecasted data with respect to the observed data. PSNR represents the quality of the forecasted data. SSIM describes the degree of similarity between the forecasted data and the observed data, with 1 indicating identical data and −1 indicating completely dissimilar data. CSI, POD, and F1 delineate the probability of forecast success, and FAR describes the probability of false alarms. Assuming that the tensor $Y \in \mathbb{R}^{T \times H \times W}$ represents the observed data (T is the number of time steps, H the length, and W the width) and the tensor $X \in \mathbb{R}^{T \times H \times W}$ represents the forecasted data, the formulas for NE, RMSE, PSNR, SSIM, CSI, POD, F1, and FAR are given as follows (the detailed SSIM formulas are given in Equations (20)–(23), with the batch size set to 1):
$$NE = \frac{\sum_{t=0}^{T-1}\sum_{h=0}^{H-1}\sum_{w=0}^{W-1}\left|Y_{t,h,w} - X_{t,h,w}\right|}{\sum_{t=0}^{T-1}\sum_{h=0}^{H-1}\sum_{w=0}^{W-1} Y_{t,h,w}}$$
$$RMSE = \sqrt{\frac{1}{T \times H \times W}\sum_{t=0}^{T-1}\sum_{h=0}^{H-1}\sum_{w=0}^{W-1}\left(Y_{t,h,w} - X_{t,h,w}\right)^2}$$
$$PSNR = 20\log_{10}\frac{65}{RMSE}$$
$$\mathrm{SSIM}(X, Y) = \frac{\left(2\mu_X\mu_Y + c_1\right)\left(2\sigma_{XY} + c_2\right)}{\left(\mu_X^2 + \mu_Y^2 + c_1\right)\left(\sigma_X^2 + \sigma_Y^2 + c_2\right)}$$
$$CSI = \frac{TP}{TP + FP + FN}$$
$$POD = \frac{TP}{TP + FN}$$
$$F1 = \frac{2TP}{2TP + FP + FN}$$
$$FAR = \frac{FP}{TP + FP}$$
$X_{i,j,k}$ denotes an element of the tensor X, and (i, j, k) are its coordinates; TP denotes hits (events both forecast and observed), FP denotes false alarms (events forecast but not observed), and FN denotes misses (events observed but not forecast).
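The categorical scores (CSI, POD, F1, and FAR) are computed from the contingency counts after thresholding the reflectivity fields; the sketch below uses an illustrative 35 dBZ threshold, which is an assumption, since the threshold is not restated in this section.

```python
# Categorical skill scores from a thresholded contingency table (sketch).
import torch

def skill_scores(forecast, observed, threshold=35.0):
    """forecast, observed: (T, H, W) reflectivity fields; threshold is illustrative."""
    f, o = forecast >= threshold, observed >= threshold
    tp = (f & o).sum().float()          # hits
    fp = (f & ~o).sum().float()         # false alarms
    fn = (~f & o).sum().float()         # misses
    csi = tp / (tp + fp + fn)
    pod = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    far = fp / (tp + fp)
    return csi, pod, f1, far
```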

3.3. Ablation Experiments

To validate the performance of the model, AANet is compared with other models. The evolutionary method (E) of NowcastNet [6] generates the provisional forecasted data in two steps: it first produces the motion field and intensity residuals and then applies the iterative evolution operator. The autoregressive method (A) of FURENet [17] generates the provisional forecasted data directly. To compare the two methods, both were implemented in FURENet and in UNet; models using the evolutionary method are prefixed with "E", and models using the autoregressive method are prefixed with "A". The evolutionary method (E) applies the objective function of NowcastNet [6] (except for the adversarial loss), and the autoregressive method (A) applies the objective function of this paper (except for the adversarial loss). To verify whether SSIM Loss can suppress the "regression to average" issue, comparative experiments with and without SSIM Loss are conducted on FURENet and UNet. AANet and NowcastNet are then compared directly. To verify the effectiveness of the two-stage adversarial strategy, AANet, AANet-adv (using the traditional adversarial loss), and AANet-Noadv (not using the adversarial loss) are compared. Finally, ablation experiments are conducted on AANet at each stage.

3.3.1. Autoregressive Method and Evolutionary Method

The autoregressive method (A) and the evolutionary method (E) were applied to FURENet and UNet, respectively, to compare the two forecasting methods. A-FURENet is the first-stage generator of AANet; E-UNet is the first-stage generator of NowcastNet. The results are shown in Figure 5, Figure 6 and Figure 7 and Table 1.
As can be seen in Figure 5, the autoregressive method (A) has better forecast performance and still captures more detail at longer lead times. At short lead times, the performance of the evolutionary method (E) and the autoregressive method (A) is comparable; as the time steps increase, the forecasting ability of the evolutionary method (E) decreases significantly, while the autoregressive method (A) retains a certain level of forecasting ability.
In the NE, RMSE, and FAR indicators, the autoregressive method (A) suppresses error better than the evolutionary method (E). In FURENet, the median line and variance of A-FURENet are smaller than those of E-FURENet, and A-FURENet's broken line lies below E-FURENet's, as shown in Figure 6 and Figure 7; in UNet, the median line and variance of A-UNet are smaller than those of E-UNet, and A-UNet's broken line also lies below E-UNet's (Figure 6 and Figure 7). An increase in NE indicates that the model tends toward overfitting or underfitting; an increase in RMSE indicates a larger deviation of the forecasted values from the true values; an increase in FAR indicates a higher probability of incorrect predictions. Compared with the evolutionary method, the autoregressive method suppresses overfitting or underfitting, reduces the deviation between forecasted and true data, and reduces the probability of false forecasts. The autoregressive method (A) lacks the evolution operator of the evolutionary method (E), which introduces the positional error accumulated by the iterative evolution process, so the autoregressive method (A) only carries the error caused by the "regression to average" issue.
In the PSNR and SSIM indicators, the autoregressive method (A) is better than the evolutionary method (E). In FURENet, the median line of A-FURENet is higher than that of E-FURENet, its variance is smaller, and its broken line lies above E-FURENet's (Figure 6 and Figure 7); the same holds for A-UNet versus E-UNet. A smaller PSNR indicates a more ambiguous prediction; a larger SSIM indicates that the forecasted data are more similar to the real data. The autoregressive method (A) thus provides more accurate and more similar forecasts than the evolutionary method (E), because its error stems only from the "regression to average" issue, with no error accumulated through iterative evolution.
In the CSI, POD, and F1 indicators, the autoregressive method (A) is better than the evolutionary method (E). In FURENet, the median line of A-FURENet is higher than that of E-FURENet, its variance is smaller, and its broken line lies above E-FURENet's (Figure 6 and Figure 7); in UNet, the same holds for A-UNet versus E-UNet (except for the POD broken line). The larger the CSI, the higher the model's prediction success rate (considering hits, false alarms, and misses comprehensively); the larger the POD, the higher the probability of a prediction hit (considering hits and misses); the larger the F1, the larger the POD and the smaller the FAR. The CSI and F1 of the autoregressive method (A) exceed those of the evolutionary method (E), but their POD values are comparable, because the evolutionary method (E) worsens the "regression to average" issue, which makes the prediction more ambiguous and inflates the POD.
As shown in Table 1, A-FURENet outperforms E-FURENet, with NE reduced by 0.1154, RMSE reduced by 0.557, PSNR improved by 2.05, SSIM improved by 0.0265, CSI improved by 7.16%, POD improved by 4.6%, F1 improved by 6.68%, and FAR reduced by 6.96%; A-UNet outperforms E-UNet, with NE reduced by 0.142, RMSE reduced by 0.563, PSNR improved by 2.06, SSIM improved by 0.0293, CSI improved by 6.83%, POD improved by 3.46%, F1 improved by 6.25%, and FAR reduced by 6.52%. In FURENet and UNet, the autoregressive method (A) provides similar performance gains for NE, RMSE, PSNR, SSIM, CSI, F1, and FAR, but the gain for POD differs considerably, because the autoregressive method (A) suffers from the "regression to average" issue and the evolutionary method (E) exacerbates it, leading to inflated POD values.
A-FURENet and E-UNet are the first-stage generators of AANet and NowcastNet, respectively, and A-FURENet performs better, with a 0.1454 reduction in NE, a 0.647 reduction in RMSE, a 2.27 improvement in PSNR, a 0.0322 improvement in SSIM, an 8.44% improvement in CSI, a 6.7% improvement in POD, a 7.95% improvement in F1, and a 5.89% reduction in FAR. Thus, the first-stage model of AANet's generator outperforms that of NowcastNet's generator. In summary, the autoregressive method (A) accumulates less error during forecasting and has better forecast performance.

3.3.2. Comparative Experiments of Models with and without SSIM Loss

To verify whether SSIM Loss can suppress the "regression to average" issue, comparative experiments with and without SSIM Loss are conducted on A-FURENet and A-UNet. The results are shown in Figure 8, Figure 9 and Figure 10 and Table 2.
As can be seen in Figure 8, SSIM Loss gives the model the ability to capture more detail, effectively reducing the impact of the "regression to average" issue. At longer lead times, A-FURENet displays more detail, whereas A-FURENet_NS (no SSIM Loss) covers the detail with the mean. Likewise, A-UNet_NS covers a large amount of detail with the mean, while A-UNet effectively suppresses the tendency toward "regression to average".
As shown in the box plots of Figure 9, SSIM Loss has little effect on the model's RMSE, PSNR, CSI, POD, F1, and FAR (excluding UNet's FAR, owing to a large number of outliers), but it significantly improves NE and SSIM. Thus, SSIM Loss suppresses overfitting or underfitting and also helps the model generate more realistic and more similar predictions. As shown in the broken-line charts of Figure 10, SSIM Loss is only significant for the model's FAR and has no marked effect on the other curves; therefore, SSIM Loss reduces the probability of false forecasts. Based on the above observations, SSIM Loss suppresses the "regression to average" issue, suppresses overfitting or underfitting, and reduces the ambiguity of the predictions, thereby reducing the probability of false predictions.
As shown in Table 2, SSIM Loss acts positively on the model. For A-FURENet, NE decreased by 0.0107, RMSE decreased by 0.005, PSNR increased by 0.15, SSIM increased by 0.0044, CSI increased by 0.53%, POD increased by 0.15%, F1 increased by 0.57%, and FAR decreased by 0.97%. For A-UNet, NE decreased by 0.0162, RMSE decreased by 0.008, PSNR increased by 0.2, SSIM increased by 0.0044, CSI increased by 0.47%, POD decreased by 0.67%, F1 increased by 0.53%, and FAR decreased by 2.1%. In summary, SSIM Loss has a notable effect on the model's FAR: it suppresses the "regression to average" issue, which reduces the ambiguity of the forecast and ultimately reduces the probability of false forecasts.

3.3.3. AANet and NowcastNet

The forecast performance of AANet and NowcastNet is compared; the results are shown in Figure 11, Figure 12 and Figure 13 and Table 3.
As shown in Figure 11, the forecasts of AANet are better than those of NowcastNet. As the number of time steps increases, NowcastNet's forecasts gradually converge to the mean (covering all detail with the mean), whereas AANet provides more accurate forecasts with more detail.
Figure 12 shows box plots of the models' forecast performance. Comparing AANet with NowcastNet: for NE, RMSE, and FAR, the median line and variance of AANet are smaller than those of NowcastNet; for PSNR, SSIM, CSI, POD, and F1, the median line of AANet is higher than that of NowcastNet and its variance is smaller. As shown in Figure 13, the forecast performance of AANet is better than that of NowcastNet at all future moments: for NE, RMSE, and FAR, AANet is smaller than NowcastNet throughout, and for PSNR, SSIM, CSI, POD, and F1, AANet is greater than NowcastNet throughout. This is because AANet uses the autoregressive method (A), SSIM Loss, and the two-stage adversarial loss: the autoregressive method (A) only suffers from the "regression to average" issue, SSIM Loss suppresses that issue, and the two-stage adversarial loss lets the generator produce more realistic and more similar provisional and final forecasted data. NowcastNet, in contrast, uses the evolutionary method (E) and an imperfect adversarial strategy (the final generated data and the observed data are both defined as true data, ignoring the significance of the provisional generated data), so it accumulates excessive errors and mistakenly treats generated data as true data.
As shown in Table 3, AANet is superior to NowcastNet: NE is reduced by 0.0763, RMSE by 0.377, and FAR by 4.2%, while PSNR is improved by 1.45, SSIM by 0.0208, CSI by 5.78%, POD by 6.25%, and F1 by 5.7%. Because the autoregressive method (A) and SSIM Loss reduce the excessive errors, the NE, RMSE, and FAR of AANet are smaller than those of NowcastNet. Because the two-stage adversarial loss drives AANet's generator to produce more realistic and more similar provisional and final forecasted data, and the provisional forecasted data in turn help the second-stage generator produce more similar and more realistic final forecasted data, AANet's PSNR, SSIM, CSI, POD, and F1 are greater than NowcastNet's. In summary, the forecast performance of AANet is better than that of NowcastNet.

3.3.4. AANet with or without Adversarial Strategies

To verify whether the two-stage adversarial strategy works positively, AANet is compared with AANet-adv (the traditional generative adversarial strategy, which does not use the provisional forecasted data in the adversarial stage) and AANet-Noadv (no adversarial strategy). The results are shown in Figure 14, Figure 15 and Figure 16 and Table 4.
As shown in Figure 14, the adversarial strategy acts positively on AANet. As the number of future moments increases, the adversarial strategy gradually takes effect, enhancing the model's ability to produce convincing fake data. AANet with the two-stage adversarial strategy is better than AANet with the traditional adversarial strategy and provides more realistic pseudo-data.
Figure 15 shows box plots of the models' forecast performance. Although the adversarial strategy worsens the model's FAR, it improves the other metrics. For NE and RMSE, the adversarial strategy effectively reduces the errors in the forecasted data, and the two-stage adversarial strategy outperforms the traditional one; for PSNR, SSIM, and POD, the adversarial strategy effectively improves the accuracy of the forecasted data, and the two-stage adversarial strategy is significantly better than the traditional one; for CSI and F1, the adversarial strategy effectively improves accuracy, although here the traditional adversarial strategy outperforms the two-stage strategy.
As shown in Figure 16, the adversarial strategy plays a positive role in AANet's forecast performance. As the future moments increase, the broken lines of AANet and AANet-adv gradually separate, and the two-stage adversarial strategy slows the decay of forecast performance more effectively than the traditional adversarial strategy, suppressing the weakening of the forecast performance.
As shown in Table 4, the adversarial strategy plays a positive role in forecast performance, with AANet performing best, AANet-adv second, and AANet-Noadv worst. AANet with the traditional adversarial strategy (AANet-adv) outperforms AANet without an adversarial strategy (AANet-Noadv): NE is reduced by 0.0223, RMSE by 0.112, and FAR by 1.36, while PSNR is improved by 0.49, SSIM by 0.0076, CSI by 2.48, POD by 3.19, and F1 by 2.54. AANet outperforms AANet-adv: NE is reduced by 0.0157, RMSE by 0.083, and FAR by 0.13, while PSNR is improved by 0.29, SSIM by 0.0042, CSI by 1.47, POD by 2.66, and F1 by 1.52. The traditional adversarial strategy pushes the generator to produce more realistic and more similar final forecasted data, whereas the two-stage adversarial strategy does so through more realistic and more similar provisional forecasted data.
In summary, both the two-stage adversarial strategy and the traditional adversarial strategy play a positive role in the model’s forecasted performance, but the two-stage adversarial strategy is the most effective.

3.3.5. Ablation Experiments of AANet

As shown in Table 5, ablation experiments are performed on AANet. FURENet is the first-stage generative model of AANet, and the Semantic Synthesis Model (SSM) is the second-stage generative model. SSIM Loss, Tadv Loss, and adv Loss are loss functions (Tadv Loss is the two-stage adversarial loss, and adv Loss is the traditional adversarial loss).
FURENet suffers from the "regression to average" issue and can therefore only generate ambiguous forecasted data; SSIM Loss suppresses this issue and thus has a positive effect on FURENet. The main task of SSM is to reprocess the data generated by FURENet (NowcastNet likewise uses a second-stage model to process the output of UNet), but SSM alone acts negatively on FURENet because multi-stage models easily accumulate errors. SSIM Loss reduces the errors that the multi-stage model accumulates, but the resulting forecast is still not as good as FURENet combined with SSIM Loss alone. Both the traditional adversarial strategy and the two-stage adversarial strategy work positively on the multi-stage model, but the two-stage strategy is superior: the traditional adversarial strategy uses only the final generated data and ignores the importance of the provisional generated data (the first-stage model is gradient-separated from the second-stage model), whereas under the two-stage adversarial strategy the first-stage model generates more similar and more realistic provisional forecasted data, and the second-stage model uses them to generate more similar and more realistic final forecasted data. If the parameters α and β of Equation (27) are set to 1 and 4, AANet achieves even better results, with NE = 0.2772, RMSE = 1.893, PSNR = 31.91, SSIM = 0.9177, CSI = 72.70%, POD = 79.97%, F1 = 80.16%, and FAR = 17.36%.

4. Conclusions

Previous deep learning nowcasting models suffer from the "regression to average" issue and do not provide a complete generative adversarial strategy. In this study, the Adversarial Autoregressive Network (AANet) is proposed, which uses the structural similarity loss (SSIM Loss) and proposes the two-stage adversarial loss (Tadv Loss); the dataset is "NJU_CPOL_update2308". To verify the effects of SSIM Loss and Tadv Loss, AANet is compared with current nowcasting models in comparative experiments. The results show that SSIM Loss reduces the impact of the "regression to average" issue, while Tadv Loss reduces the errors accumulated by the multi-stage model and promotes the generation of more realistic fake data in both stages. SSIM Loss and Tadv Loss help our model outperform the baseline models in forecast effectiveness. The main contributions of this study are as follows:
  • Previous models employed multiple convolutional neural networks to generate future fields and iterated them implicitly to produce forecasted data. However, convolutional neural networks are afflicted by the "regression to average" issue, and simple iterative evolution is susceptible to errors. Therefore, FURENet directly generates the forecasted data, and SSIM Loss is employed to mitigate the impact of the "regression to average" issue.
  • Previous models adopted multi-stage generation (first producing provisional forecasted data and then the final forecasted data) and calibrated the final forecasted data with the help of a discriminator. However, they ignored the importance of the provisional forecasted data and failed to reduce the errors accumulated by the multi-stage model. Therefore, Tadv Loss is used to reduce these accumulated errors and to enhance the forecast performance of both stages.
This study therefore provides a feasible nowcasting solution based on generative adversarial networks and dual-polarization radar observations.
Limitations of this work: the model does not provide forecasts for the next 2–3 h, and the dataset lacks radar observations of longer duration.
Future work: based on deep learning methods and spatial affine transformations, we will investigate a model that can provide longer-term (2–3 h) forecasts. We will also work with the local weather bureau to obtain a larger sample of data from the same geographic area and apply AANet to temperature prediction for forecasting future temperatures.

Author Contributions

Conceptualization, P.C. and T.L.; methodology, P.C. and T.L.; software, P.C.; validation, P.C. and T.L.; formal analysis, P.C.; writing—original draft preparation, P.C. and T.L.; writing—review and editing, P.C. and T.L.; supervision, T.L. and H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded in part by the Quality Engineering Project of Guangdong Ocean University: Industry-Education Integration Practice Teaching Base for the Software Engineering Major of Guangdong Ocean University and Chinasoft International (Project Number: PX-129223525), and in part by the key special project of the National Key Research and Development Program under the “135” Plan: “Research on the Technology System of Full-process Monitoring and Traceability of Cold Chain Logistics of Aquatic Products Based on Internet of Things Technology” (2019YFD0901605).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. The process and schematic diagram of AANet.
Figure 2. The illustration of the FURENet structure.
Figure 3. Schematic illustration of the SSM structure.
Figure 4. Schematic diagram of the structure of the discriminator.
Figure 5. Visualization of the forecasted effect; the contour lines (light blue < white < black) are gradient lines of precipitation intensity.
Figure 6. Box plot of model forecasted effect, among which (a), (b), (c), and (d) represent E-FURENet, A-FURENet, E-UNet, and A-UNet, respectively. The orange line represents the median line, and the black circle symbolizes the outliers.
Figure 7. The forecasted effect at future moments; (a), (b), (c), and (d) represent E-FURENet, A-FURENet, E-UNet, and A-UNet, respectively.
Figure 8. Visualization of the forecasted effect; the contour lines (light blue < white < black) are gradient lines of precipitation intensity.
Figure 9. Box plot of model forecasted effect, among which (a), (b), (c), and (d) represent A-FURENet, A-FURENet_NS, A-UNet, and A-UNet_NS, respectively. The orange line represents the median line, and the black circle symbolizes the outliers.
Figure 10. The forecasted effect at future moments; (a), (b), (c), and (d) represent A-FURENet, A-FURENet_NS, A-UNet, and A-UNet_NS, respectively.
Figure 11. Visualization of the forecasted effect; the contour lines (light blue < white < black) are gradient lines of precipitation intensity.
Figure 12. Box plot of model forecasted effect, among which (a) and (b) represent AANet and NowcastNet, respectively. The orange line represents the median line, and the black circle symbolizes the outliers.
Figure 13. The forecast effect at future moments; (a) and (b) represent AANet and NowcastNet, respectively.
Figure 14. Visualization of the forecast effect; the contour lines (light blue < white < black) are gradient lines of precipitation intensity.
Figure 15. Box plot of model forecasted effect, among which (a), (b), and (c) represent AANet, AANet-adv, and AANet-Noadv, respectively. The orange line represents the median line, and the black circle symbolizes the outliers.
Figure 16. The forecast effect at future moments; (a), (b), and (c) represent AANet, AANet-adv, and AANet-Noadv, respectively.
Table 1. Average values of the evaluation metrics for the forecasting methods over lead times from 6 min to 60 min.

Model        NE      RMSE   PSNR   SSIM    CSI (%)  POD (%)  F1 (%)  FAR (%)
E-FURENet    0.4606  2.780  28.78  0.8740  61.31    70.79    69.45   27.30
A-FURENet    0.3452  2.223  30.83  0.9005  68.47    75.39    76.13   20.34
E-UNet       0.4906  2.870  28.56  0.8683  60.03    68.69    68.18   26.23
A-UNet       0.3486  2.307  30.62  0.8976  66.86    72.15    74.43   19.71
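For reference, the categorical scores reported in these tables (CSI, POD, F1, and FAR) are all derived from a contingency table of hits, misses, and false alarms obtained by thresholding the forecast and observed fields. The sketch below shows the usual way these scores are computed; the 35 dBZ threshold is an illustrative choice, not necessarily the value used here.

import numpy as np

def categorical_scores(pred, obs, threshold=35.0, eps=1e-9):
    # CSI, POD, F1, and FAR from thresholded radar fields.
    # pred, obs: arrays of reflectivity (or rain rate); the threshold
    # is an illustrative assumption.
    p = pred >= threshold
    o = obs >= threshold
    hits = np.sum(p & o)
    misses = np.sum(~p & o)
    false_alarms = np.sum(p & ~o)

    csi = hits / (hits + misses + false_alarms + eps)   # critical success index
    pod = hits / (hits + misses + eps)                  # probability of detection
    far = false_alarms / (hits + false_alarms + eps)    # false alarm rate
    precision = 1.0 - far
    f1 = 2 * precision * pod / (precision + pod + eps)  # harmonic mean of precision and POD
    return csi, pod, far, f1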
Table 2. Average values of the evaluation metrics for the forecasting methods over lead times from 6 min to 60 min.

Model          NE      RMSE   PSNR   SSIM    CSI (%)  POD (%)  F1 (%)  FAR (%)
A-FURENet      0.3452  2.223  30.83  0.9005  68.47    75.39    76.13   20.34
A-FURENet_NS   0.3559  2.228  30.68  0.8961  67.94    75.24    75.56   21.31
A-UNet         0.3486  2.307  30.62  0.8976  66.86    72.15    74.43   19.71
A-UNet_NS      0.3648  2.315  30.42  0.8932  66.39    72.82    73.90   21.81
Table 3. Average values of the evaluation metrics for the forecasting methods over lead times from 6 min to 60 min.

Model        NE      RMSE   PSNR   SSIM    CSI (%)  POD (%)  F1 (%)  FAR (%)
AANet        0.3216  2.088  31.24  0.9075  70.63    78.92    78.31   19.96
NowcastNet   0.3979  2.465  29.79  0.8867  64.85    72.67    72.61   24.16
Table 4. Average values of the evaluation metrics for the forecasting methods over lead times from 6 min to 60 min.

Model         NE      RMSE   PSNR   SSIM    CSI (%)  POD (%)  F1 (%)  FAR (%)
AANet         0.3216  2.088  31.24  0.9075  70.63    78.92    78.31   19.96
AANet-adv     0.3373  2.171  30.95  0.9033  69.16    76.26    76.79   20.09
AANet-Noadv   0.3596  2.283  30.46  0.8957  66.68    73.07    74.25   21.45
Table 5. Ablation study: average values of the evaluation metrics over lead times from 6 min to 60 min for different combinations of FURENet, SSM, SSIM Loss, Tadv Loss, and the traditional adversarial loss (adv Loss); ✓ marks an enabled component.

FURENet  SSM  SSIM Loss  Tadv Loss  adv Loss  NE      RMSE   PSNR   SSIM    CSI (%)  POD (%)  F1 (%)  FAR (%)
✓        –    –          –          –         0.3559  2.228  30.68  0.8961  67.94    75.24    75.56   21.31
✓        –    ✓          –          –         0.3452  2.223  30.83  0.9005  68.47    75.39    76.13   20.34
✓        ✓    –          –          –         0.3724  2.308  30.14  0.8936  66.12    72.91    73.76   22.53
✓        ✓    ✓          –          –         0.3596  2.283  30.46  0.8957  66.68    73.07    74.25   21.45
✓        ✓    ✓          –          ✓         0.3373  2.171  30.95  0.9033  69.16    76.26    76.79   20.09
✓        ✓    ✓          ✓          –         0.3216  2.088  31.24  0.9075  70.63    78.92    78.31   19.96