1. Introduction
Quantitative precipitation estimation (QPE) is the task of estimating the intensity of precipitation from available meteorological data. It plays a crucial role in meteorological forecasting, hydrological observation, and various other domains. The primary objective of QPE is to provide pixel-level estimates of precipitation intensity within the forecast area [1,2,3]. Accurate QPE is key to precipitation forecasting operations and is of great importance for mitigating the impact of hazardous intense convective rainfall and minimizing societal and economic losses.
Radar observation data offer high spatiotemporal resolution, rich information content, and broad coverage [4]. Moreover, the strength of radar reflectivity is closely linked to precipitation intensity [5], so radar observations play a crucial role in quantitative precipitation estimation [6]. Currently, methods for estimating precipitation intensity at the pixel level from radar reflectivity data primarily include the traditional Z–R relationship and deep learning methods based on neural networks. The Z–R relationship characterizes the relationship between radar reflectivity intensity (Z) and precipitation intensity (R) using a simple power-law equation [7,8,9]:
Z = aR^b. The coefficients a and b need to be determined from local geographical conditions; however, the sensitivity of these parameters to different seasons is often ignored. Additionally, the Z–R relationship considers only a one-way connection between reflectivity intensity at a single height level and precipitation [10]; it typically performs well for stratiform precipitation but struggles to forecast convective heavy rainfall [11,12,13]. This limitation arises because convective rainfall, unlike stratiform rainfall, involves not only horizontal airflow but also intense vertical convective motion [14], which underscores the disparity between the Z–R relationships of convective and stratiform precipitation [15]. These shortcomings illustrate the difficulty of deriving a simple, universally applicable Z–R relationship that holds for any location and time, given the diverse and dynamic nature of precipitation systems.
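As an illustration, the Z–R inversion can be sketched in a few lines of Python. The default coefficients a = 200 and b = 1.6 are the classic Marshall–Palmer values for stratiform rain, used here only as an example; operational values must be fitted to local climatology.

```python
import numpy as np

def zr_rain_rate(dbz, a=200.0, b=1.6):
    """Invert the power law Z = a * R**b to obtain rain rate R (mm/h).

    dbz : radar reflectivity in dBZ; the linear reflectivity is
          Z = 10**(dbz / 10) in mm^6 m^-3.
    a, b: empirical coefficients (Marshall-Palmer defaults, illustrative only).
    """
    z_linear = 10.0 ** (np.asarray(dbz, dtype=float) / 10.0)
    return (z_linear / a) ** (1.0 / b)
```

With these defaults, a 23 dBZ echo maps to roughly 1 mm/h, and the estimated rain rate grows monotonically with reflectivity, which is exactly why a single fixed (a, b) pair cannot serve both stratiform and convective regimes.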
Many studies have attempted to improve the accuracy of quantitative precipitation estimation based on the traditional Z–R relationship. For example, Yoon et al. [16] introduced a large amount of historical data and employed least squares regression to minimize point-wise errors, obtaining optimal fitting parameters for the radar reflectivity–precipitation intensity relationship. Bringi et al. [17] introduced differential reflectivity and differential phase shift from dual-polarization radar into the Z–R relationship, providing additional information about particle morphology and motion characteristics. While these approaches have enriched the Z–R relationship to some extent, they still do not address the problem of its applicability to different precipitation types. Chumchean et al. [18] and Ramli et al. [19] determined distinct Z–R relationships for different precipitation types. However, selecting the appropriate Z–R relationship based on precipitation type is challenging in practice because there are no clear boundaries between precipitation types; the choice often relies on subjective decisions made by radar operators or meteorologists, resulting in a lack of objectivity in the precipitation estimates. In summary, the Z–R relationship has significant limitations for accurate precipitation estimation.
In recent years, deep learning models based on neural networks have provided new research directions for QPE due to their powerful nonlinear representation capabilities [20,21,22,23]. Compared with the traditional Z–R relationship, deep learning models can learn the complex mapping between radar data and precipitation intensity from extensive historical data, significantly improving the accuracy of QPE. For example, Tan et al. [24] proposed a deep neural network that combines radar and surface precipitation observations for QPE. Chen et al. [25] designed a neural network that fuses dual-polarization radar data and satellite observations to jointly estimate precipitation intensity. Wang et al. [26] employed a convolutional neural network with a fused attention mechanism to guide the model’s focus toward regions most likely to experience precipitation. Compared to the traditional Z–R relationship, deep learning methods have shown superior performance in quantitative precipitation estimation.
So far, existing research has mainly used 2D images from the 2 km constant-altitude plan position indicator (CAPPI) of radar data as model inputs for estimating precipitation intensity. However, 2D images have limitations in capturing the spatiotemporal morphological characteristics and motion information of intense convective systems. For instance, single-height horizontal reflectivity images cannot capture the vertical structure of strong convective cells. When new cells appear in a sequence of images at a single height level, they might be newly formed cells within the convective system, or they could have moved vertically from other levels to their current positions. This ambiguity can impair the model’s ability to learn features of the entire convective system accurately.
Currently, 3D radar data has begun to be applied to radar echo extrapolation tasks. Otsuka [27] used 3D volumetric scan data from phased-array weather radar for echo correlation tracking and forecasting. Tran et al. [28] treated the height levels as channels and used PredRNN for extrapolation prediction. Building on Tran’s work, Sun et al. [29] treated the height dimension as an independent dimension and employed 3D convolutions to extract height information. Compared to 2D extrapolation from the 2 km altitude, 3D extrapolation provides richer forecast information and more accurate predictions at that particular height level. This demonstrates the irreplaceable role of 3D inputs in depicting convective systems.
Three-dimensional radar echo extrapolation provides a solid foundation for quantitative precipitation estimation. The 3D echo images can reflect the spatial structure of precipitation and indicate the potential locations of future precipitation cells. However, few QPE models have been specifically designed for 3D radar inputs. To address this gap, this study proposes a deep neural network-based QPE model with 3D radar inputs. The model takes a sequence of continuous multi-level radar reflectivity CAPPI images as input and employs an encoding module based on the ConvLSTM model [30] to extract spatiotemporal information from the input images. The convolutional block attention module (CBAM) [31] is introduced to guide the model to focus on crucial regions where precipitation is likely to occur; CBAM exploits the symmetry of channel and spatial attention and attends to precipitation events using both channel and spatial attention modules. The model also incorporates a skip-connection structure inspired by the UNet model [32] to achieve multi-level spatial feature matching. This architecture allows the model to calculate the precipitation intensity at each coordinate point from both temporal and spatial perspectives, aligning better with meteorological interpretations. In addition, to make the model concentrate more on heavy precipitation events, an asymmetric loss function for different precipitation events is used to further improve performance.
In summary, the contributions of this paper are as follows:
We use 3D radar echo data for quantitative precipitation estimation, which aims to capture the complex vertical motions within convective systems;
We introduce the convolutional block attention module to guide the model to focus on crucial regions, together with an asymmetric loss function for different precipitation events, to further improve performance;
We conduct an empirical exploration of our proposed model and show its superior performance compared to existing representative methods.
The organization of this paper is as follows:
Section 2 introduces the data used in this study.
Section 3 presents the proposed method.
Section 4 shows the test results and analysis of different cases.
Section 5 provides a summary and discussion.
3. Method
3.1. ConvLSTM Cell
The convolutional long short-term memory (ConvLSTM) cell was introduced by Shi et al. [30]. As a classic recurrent neural network (RNN) structure, the ConvLSTM cell performs well in capturing spatiotemporal dependencies within sequential data: it extracts spatial and temporal information from the sequence simultaneously, while effectively addressing the gradient vanishing and exploding problems often encountered in traditional RNN models.
The ConvLSTM network consists of a stack of ConvLSTM cells; its structure is shown in Figure 3. Each ConvLSTM cell takes three inputs: the input at the current time step x_t, the long-term memory cell state from the previous time step c_{t−1}, and the previous time step’s hidden state h_{t−1}. It produces two outputs: the updated long-term memory cell state c_t and the hidden state at the current time step h_t. The inputs x_t and h_{t−1} are stacked along the channel dimension and convolved. The convolution outputs are then fed separately into the forget gate f, the update gate i, the activation gate a, and the output gate o. These gates output feature maps, which are combined with the memory cell state by element-wise multiplication (Hadamard product) to update the memory cell. The operations of the four gates and the resulting c_t and h_t are described by Equations (1)–(6) as follows:
f_t = σ(w_f * [x_t, h_{t−1}] + b_f)  (1)
i_t = σ(w_i * [x_t, h_{t−1}] + b_i)  (2)
a_t = tanh(w_a * [x_t, h_{t−1}] + b_a)  (3)
o_t = σ(w_o * [x_t, h_{t−1}] + b_o)  (4)
c_t = f_t ∘ c_{t−1} + i_t ∘ a_t  (5)
h_t = o_t ∘ tanh(c_t)  (6)
Here is how each symbol corresponds to an operation: * represents the convolution operation, [·, ·] denotes stacking along the channel dimension, w and b represent the corresponding convolutional kernel weights and biases, σ is the sigmoid function, and ∘ represents the Hadamard product (element-wise multiplication).
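The cell update described above can be sketched as a minimal PyTorch module. This is an illustrative 2D version (the paper replaces each 2D convolution with a 3D one), and the channel sizes are arbitrary; one convolution over the stacked [x_t, h_{t−1}] produces the pre-activations of all four gates at once.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal 2D ConvLSTM cell following Equations (1)-(6): x_t and h_{t-1}
    are stacked along the channel axis, a single convolution yields the four
    gate pre-activations (f, i, a, o), and the memory cell c_t is updated
    with Hadamard products."""

    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        self.hid_ch = hid_ch
        # one conv produces 4 * hid_ch channels: forget, input, activation, output
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch,
                              kernel_size=kernel, padding=kernel // 2)

    def forward(self, x, state):
        h_prev, c_prev = state
        gates = self.conv(torch.cat([x, h_prev], dim=1))
        f, i, a, o = torch.chunk(gates, 4, dim=1)
        f, i, o = torch.sigmoid(f), torch.sigmoid(i), torch.sigmoid(o)
        a = torch.tanh(a)              # candidate memory, Equation (3)
        c = f * c_prev + i * a         # memory update, Equation (5)
        h = o * torch.tanh(c)          # hidden state, Equation (6)
        return h, c
```

Stacking several such cells and stepping them over the ten input frames reproduces the encoder structure of Figure 3.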
3.2. Convolutional Block Attention Module (CBAM)
The attention mechanism is a crucial processing mechanism in human vision; it guides the human brain to focus on the most informative and important local regions within visual input while selectively ignoring less relevant areas. In computer vision, the attention mechanism dynamically weighs the importance of different regions within input images, allocating significant computational resources to the crucial portions of an image while discarding irrelevant areas. This helps models extract input information from a global perspective. Attention mechanisms can be categorized into four types based on the dimensions they act on: channel attention, spatial attention, temporal attention, and branch attention.
The convolutional block attention module (CBAM) [31] is a lightweight module that symmetrically integrates both channel and spatial attention mechanisms; its structure is shown in Figure 4. The CBAM consists of two sub-modules connected in series: the channel attention module and the spatial attention module. Each attention module generates weight matrices that are multiplied element-wise with the input features, effectively retaining valuable information and discarding less relevant information.
The channel attention module operates on the spatial dimensions by employing both max pooling and average pooling to abstractly aggregate spatial information. The results of these two types of pooling are then separately passed through a shared fully connected network with the same set of weights. The outputs of this network are then added and processed through a sigmoid activation function. The resulting one-dimensional vector from this module has a length equal to the number of channels in the input feature maps. Each element in this vector represents the weight assigned to the corresponding channel of the input feature map.
The spatial attention mechanism is similar to the channel attention mechanism, except that it applies max pooling and average pooling along the channel dimension. The results of these two pooling operations are stacked along the channel dimension and fed into a convolutional layer for fusion. The outcome is passed through a sigmoid activation function to generate a spatial attention weight map. The output of this module is a single-channel 2D map whose size matches that of the input feature map, where each element signifies the weight assigned to that spatial location.
Since the input and output of the convolutional block attention module have the same size, and the pooling operations greatly reduce the computational cost of the attention mechanism, the module can be inserted into a neural network as a plug-and-play lightweight unit. It addresses a shortcoming of traditional convolutional neural networks, whose convolutional kernels have small receptive fields and have difficulty capturing large-scale regional semantic information; computing resources are consequently tilted toward regions with higher reflectivity and more complex convective motion, effectively improving model performance.
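The two sub-modules described above can be sketched compactly in PyTorch. The channel counts, reduction ratio, and kernel size below are illustrative defaults, not the configuration used in this paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: global max/avg pooling over space, a shared
    two-layer MLP, then sigmoid -> one weight per channel."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, ch))

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # max-pooled descriptor
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    """Spatial attention: max/avg pooling along the channel axis, a single
    convolution fuses the two maps, sigmoid -> one weight per pixel."""
    def __init__(self, kernel=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class CBAM(nn.Module):
    """Channel then spatial attention in series; input/output shapes match,
    so the module drops into a network as a plug-and-play unit."""
    def __init__(self, ch):
        super().__init__()
        self.ca, self.sa = ChannelAttention(ch), SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```

Because the output shape equals the input shape, the module can be inserted between any two layers without changing the surrounding architecture.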
3.3. Network Architecture
The task of our model is to take ten consecutive 3D radar reflectivity images spanning one hour as input and output a gridded map of the accumulated precipitation for that hour. Temporal information is crucial for computing accumulated precipitation; therefore, we propose a quantitative precipitation estimation model based on ConvLSTM and 3D convolution, whose architecture is shown in Figure 5.
This model uses the encoding part of the ConvLSTM model for feature extraction from the 3D radar echo sequences. The ConvLSTM unit produces two outputs: the hidden state h and the memory cell c. The hidden state h only influences the next layer and the next time step of the ConvLSTM unit, whereas c is updated throughout the entire time sequence. Because the task is to estimate the accumulated precipitation within the hour, we discard the hidden state h from the last time step (t10) and retain the memory cell c as the carrier of precipitation information. In this paper, each 2D convolution within the ConvLSTM unit is replaced with a 3D convolution, facilitating better processing of volumetric data. Between each layer of ConvLSTM units, the feature maps undergo spatial downsampling, channel adjustment, and attention filtering through a combination of a 3D convolution and a CBAM module.
Upon encoding through the ConvLSTM network, the model produces three layers of feature maps. Each layer of feature maps has a different spatial size and number of channels, with deeper layers representing higher-level semantic information. Inspired by the skip-connection structure of the UNet network, this paper performs feature fusion and decoding on feature maps from different levels. Each layer of feature maps is dimensionally matched in the spatial domain with the previous layer’s feature maps using 3D convolutions and upsampling operations. After stacking them along the channel dimension, they are fused with feature maps from lower layers. Finally, a 2D convolution operation is applied to fuse the various feature maps, obtaining the ultimate output of precipitation estimation.
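One decoder step of the skip-connection fusion described above can be sketched as follows. This is a simplified PyTorch illustration with arbitrary channel counts and nearest-neighbor upsampling, not the paper's exact layer configuration.

```python
import torch
import torch.nn as nn

class FuseUp(nn.Module):
    """One UNet-style decoder step: the deeper feature map is upsampled to
    the shallower map's spatial size, the two are stacked along the channel
    axis, and a 3D convolution fuses them."""
    def __init__(self, deep_ch, shallow_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        self.fuse = nn.Conv3d(deep_ch + shallow_ch, out_ch, 3, padding=1)

    def forward(self, deep, shallow):
        deep = self.up(deep)  # match the shallower map's spatial dimensions
        return self.fuse(torch.cat([deep, shallow], dim=1))
```

Chaining such steps from the deepest encoder level back to the input resolution, followed by a final 2D convolution over the fused maps, yields the precipitation estimate.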
3.4. Loss Function
In the context of quantitative precipitation estimation, the goal is to accurately predict the occurrence of heavy precipitation events. However, grid points with heavy precipitation make up only a small fraction of the total grid points in the training dataset, so during training the model may struggle to focus on heavy precipitation regions and tends to prioritize accuracy in regions with light precipitation. To address this, we employ the weighted mean squared error (WMSE) loss function [33] for training, an asymmetric loss function across precipitation events: heavy precipitation regions are assigned higher weights, directing the model’s attention toward improving accuracy for heavy precipitation events. The WMSE is expressed as Equation (7), with the weights calculated as in Equation (8):

WMSE = (1/N) Σ_{x,y} mask_{x,y} · w_{x,y} · (P_{x,y} − G_{x,y})²  (7)
In the equations, G_{x,y} represents the precipitation intensity at position (x, y) in the observed image, P_{x,y} represents the predicted precipitation intensity at position (x, y), and w_{x,y} is the weight assigned to the corresponding point. mask is the land mask, so precipitation estimates over the sea are not included in the total loss.
4. Experimental Results and Discussions
4.1. Experimental Setup and Evaluation Metrics
The model in this paper takes ten consecutive 3D radar reflectivity images as input and predicts the precipitation intensity grid field at the same resolution for that hour. The model is trained in the PyTorch framework using the Adam optimizer with a learning rate of 0.0001 and decay rates of (0.9, 0.999); the batch size is set to four. The final dataset introduced in Section 2 consists of 4811 training samples, 1169 validation samples, and 1209 test samples. The model is trained for 50 epochs, and the model with the lowest WMSE on the validation set is selected as the final model for testing.
This paper employs a threshold-based pixel-wise evaluation method to assess the accuracy of precipitation event prediction. For each precipitation threshold, points in the precipitation grid field at or above the threshold are labeled “yes”, whereas those below the threshold are labeled “no”. The predicted image and the ground-truth image are then used to compute the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN), as shown in Table 1. Subsequently, the critical success index (CSI), Heidke skill score (HSS), and root mean square error (RMSE) for quantitative precipitation estimation can be calculated using Equations (9)–(11) as follows:

CSI = TP / (TP + FP + FN)  (9)
HSS = 2(TP × TN − FP × FN) / [(TP + FN)(FN + TN) + (TP + FP)(FP + TN)]  (10)
RMSE = sqrt((1/N) Σ_i (O_i − P_i)²)  (11)

In Equation (11), O_i represents the actual grid-point precipitation intensity, P_i represents the predicted grid-point precipitation intensity, and N represents the total number of grid points in the precipitation grid field.
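The three metrics can be computed from a pair of precipitation grids with a straightforward NumPy sketch of Equations (9)–(11):

```python
import numpy as np

def qpe_scores(pred, obs, threshold):
    """Threshold-based contingency scores plus RMSE.

    pred, obs : precipitation grids (mm); threshold turns them into
    yes/no events for CSI and HSS, while RMSE uses the raw values.
    """
    p, o = pred >= threshold, obs >= threshold
    tp = np.sum(p & o)
    fp = np.sum(p & ~o)
    fn = np.sum(~p & o)
    tn = np.sum(~p & ~o)
    csi = tp / max(tp + fp + fn, 1)                       # Equation (9)
    hss_den = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    hss = 2.0 * (tp * tn - fp * fn) / max(hss_den, 1)     # Equation (10)
    rmse = float(np.sqrt(np.mean((pred - obs) ** 2)))     # Equation (11)
    return csi, hss, rmse
```

A perfect prediction yields CSI = HSS = 1 and RMSE = 0; a prediction with no overlap with the observed events yields CSI = 0.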
4.2. Test Results and Analysis
In this section, we compare our proposed model with the Z–R relationship (Z–R), a two-dimensional convolution-based quantitative precipitation estimation model (2D-Conv), and a quantitative precipitation estimation model incorporating a self-attention mechanism (Attention) [26] as baselines. Additionally, ablation experiments are conducted with the proposed model without the CBAM module (3D-QPE) and with the proposed model using two-dimensional radar echo images as input (2D-QPE), to elucidate the roles of the different components of our proposed model (3D-QPE-CBAM). The test results are presented in Table 2 and Figure 6.
- (1)
Overall, with the increase in precipitation threshold, the accuracy of predicting heavy precipitation decreases significantly for all methods. This is because heavy precipitation is often generated by convective systems, characterized by sudden intensity, rapid movement, and irregular intensity changes. Moreover, heavy precipitation events constitute a small proportion of all the precipitation events. During the learning process, models find it challenging to adequately capture the characteristic patterns of heavy precipitation events from a limited number of instances, resulting in lower predictability for such events.
- (2)
Combining the precipitation estimation scores under different precipitation thresholds, the model accuracies ranked from highest to lowest are as follows: 3D-QPE-CBAM, 3D-QPE, 2D-QPE, Attention, 2D-Conv, and Z–R. From the Z–R relationship to the proposed 3D-QPE-CBAM, there are two significant leaps in the CSI scores and two noticeable drops in the precipitation estimation error (RMSE). The first leap occurs in the transition from the Z–R relationship to deep learning models, and the second when shifting from 2D to 3D input for the deep learning model. According to Table 2, the relative increments in CSI scores during these two leaps are 39.3% and 17.4%, whereas the relative decrements in precipitation estimation error are 33.4% and 17.8%. The first leap indicates the significant advantage of deep learning models over the Z–R relationship in capturing the complex nonlinear mapping between reflectivity and precipitation. The second suggests that building a precipitation estimation model on three-dimensional observational data is reasonable and effective, affirming the feasibility of the 3D modeling approach proposed in this study.
- (3)
When comparing the proposed model’s 2D-QPE (using 2D data) with the 2D-Conv baseline model, which also uses 2D data, the average relative improvement in CSI scores is 30.1%, and the average relative reduction in precipitation estimation errors is 12.3%. This demonstrates that the temporal data utilization approach of the proposed model effectively enhances the performance of the precipitation estimation model.
- (4)
Precipitation exceeding 20 mm is generally considered to be generated by intense convective systems. Under high precipitation threshold conditions, the accuracy improvement of the proposed model compared to the baseline models is even more significant. For hourly precipitation amounts exceeding 20 mm and 30 mm, the 3D-QPE-CBAM model proposed in this study demonstrates an improvement of approximately 10 percentage points in precipitation accuracy compared to the best-performing baseline model, Attention, among the three comparison models. This indicates that the proposed model exhibits a more pronounced advantage in predicting intense convective precipitation.
4.3. Case Studies
To visually demonstrate the advantages of the proposed model in quantitative precipitation estimation, we have selected two instances.
Figure 7 illustrates the radar observation sequence (displaying composite reflectivity) from the Cangzhou station between 5:00 a.m. and 6:00 a.m. on 3 August 2015, along with a visual comparison of the precipitation amounts predicted by the various models and the actual precipitation field.
In the radar reflectivity image, a squall line is moving from west to east in the upper left corner, and two single-cell thunderstorms are gradually developing in the lower left corner. All three areas experienced heavy rainfall within that hour: the upper left area corresponds to a long and intense rain band, and the larger single cell corresponds to a high-intensity, wide-ranging precipitation area. A rain band extends diagonally toward the upper right between the areas corresponding to the larger and smaller single-cell thunderstorms. Based on the radar data and actual precipitation observations: (1) Z–R Relationship Prediction: The predicted precipitation areas are fragmented, and the predicted rain band associated with 4–8 mm/h precipitation and the precipitation area corresponding to the larger single cell are shifted to the right. (2) 2D-Conv Model Prediction: The model captures the band-shaped heavy precipitation area of the squall line region and the intense precipitation corresponding to the larger single cell, and the predicted precipitation area greater than 1 mm is more complete than that of the Z–R relationship. However, the band-shaped heavy precipitation area is notably shorter than in the observed results. (3) Attention Model Prediction: Similar to the 2D-Conv model, it predicts the band-shaped heavy precipitation region of the squall line and captures the intensity of the smaller single-cell area, but it fails to predict the rain band between the two single-cell regions. (4) 2D-QPE and 3D-QPE (without Convolutional Attention Module) Prediction: These models successfully predict the rain band of the squall line, the intense precipitation in both single-cell areas, and the strip-shaped rain band between them, although the intensity of the squall line’s rain band is slightly lower than the actual results.
(5) 3D-QPE-CBAM Model (3D-QPE-CBAM) Prediction: Building upon the previous model, it enhances the intensity of the squall line’s rain band and further improves the precipitation estimation accuracy.
Figure 8 illustrates the radar observation sequence (displaying composite reflectivity) from the Bengbu station between 10:00 p.m. and 11:00 p.m. on 14 May 2015, along with the visual comparison of the predicted precipitation amounts from various models and the actual precipitation field.
In Case Study 2, precipitation exceeding 12 mm mainly occurs in the upper right and lower left regions of the image, which roughly correspond to the areas where radar reflectivity exceeds 40 dBZ. There are two localized precipitation peaks, in the upper central area and the lower right area. Based on the radar data and actual precipitation observations: (1) Z–R Relationship Prediction: The predicted results for precipitation exceeding 4 mm/h are generally underestimated in both spatial coverage and intensity; the model predicts relatively strong precipitation in the upper right area but misses the intense precipitation event in the lower left region. (2) 2D-Conv Model Prediction: The precipitation areas predicted by the 2D-Conv model are more continuous, but it likewise misses the intense precipitation in the lower left area, and the predicted intense precipitation area is smaller than the observed results. (3) Attention Model Prediction: In regions where precipitation exceeds the 1 mm threshold, the Attention model matches the actual results better than the previous two models. However, the boundaries between intense precipitation areas remain unclear, and it still fails to predict the intense precipitation in the lower left area. (4) 2D-QPE and 3D-QPE (without Convolutional Attention Module) Prediction: These models enhance the precipitation intensity in the lower region compared to the Attention model and predict a concentrated precipitation area in the middle of the image; intense precipitation areas under three-dimensional input are more distinctly delineated than under two-dimensional input.
(5) 3D-QPE-CBAM Model (3D-QPE-CBAM) Prediction: With the introduction of the convolutional attention module, the proposed model successfully identifies the gradually intensifying single cell in the lower left region of the radar image and accurately predicts the occurrence of heavy precipitation, closely matching the actual observed results.
4.4. Discussions
Compared with traditional Z–R models and the deep learning-based self-attention model (QPE-Attention), the proposed model demonstrates significant advantages: substantial improvements in RMSE, CSI, and other indicators. The two visualization case studies also highlight the model’s superior performance in predicting intense convective precipitation.
However, the proposed model still has some limitations that are also typical of the comparison models. First, since the model is trained on data from North China and Eastern China, it cannot be applied directly to high-altitude areas. Second, since the labels are built on rain gauge observations, the model cannot distinguish between liquid and solid precipitation. Third, due to the elevation-angle limitations of radar scanning modes, the three-dimensional data used in this model still have blind spots and limitations in describing convective systems within certain regions. Additionally, precipitation intensity is influenced not only by water vapor density but also by rainfall type and various environmental factors such as temperature, air pressure, and wind direction; relying solely on reflectivity data may not provide sufficient physical information for precipitation prediction. Thus, combining multiple data sources for quantitative precipitation estimation is crucial to further enhance accuracy.
5. Conclusions
This paper presents a quantitative precipitation estimation model based on three-dimensional radar reflectivity image sequences. By inputting ten consecutive time frames of three-dimensional radar echo data, the model performs precipitation intensity estimation within that time range. The main contributions are as follows:
- (1)
Transition from 2D to 3D Modeling: The conventional two-dimensional radar reflectivity data is transformed into three-dimensional modeling, incorporating an additional vertical dimension. This enriches the model with physically meaningful quantities. The three-dimensional reflectivity information in space provides constraints and useful information for ground precipitation.
- (2)
Integration of ConvLSTM and UNet: The proposed architecture combines ConvLSTM with the UNet network for effective information encoding and decoding. This structure enables more efficient extraction and utilization of temporal information, thereby enhancing the model’s ability to predict precipitation intensity.
- (3)
Temporal-Spatial Convolutional Attention Mechanism: The introduced cascaded spatiotemporal convolutional attention mechanism directs the model’s focus toward regions and time frames in the three-dimensional radar echo data that are most likely to lead to intense precipitation events. This enhances the model’s accuracy in predicting such events.
Compared with comparison models, the proposed model shows superior performance in RMSE, CSI, and HSS. Two case studies also visualize the model’s ability to predict intense convective precipitation.
Research has shown that incorporating dual-polarization radar data, such as differential reflectivity and differential phase, can significantly improve the accuracy of quantitative precipitation estimation [34]. In the future, we hope to integrate three-dimensional reflectivity, three-dimensional differential reflectivity, and three-dimensional differential phase, providing comprehensive physical information for precipitation estimation from a three-dimensional perspective and thereby boosting estimation accuracy.