Article

Quantitative Precipitation Estimates Using Machine Learning Approaches with Operational Dual-Polarization Radar Data

1 Department of Astronomy and Atmospheric Sciences, Center for Atmospheric REmote sensing (CARE), Kyungpook National University, Daegu 41566, Korea
2 Department of Statistical Science, Baylor University, Waco, TX 76798-7140, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2021, 13(4), 694; https://doi.org/10.3390/rs13040694
Submission received: 26 December 2020 / Revised: 8 February 2021 / Accepted: 10 February 2021 / Published: 14 February 2021
(This article belongs to the Special Issue Precipitation and Water Cycle Measurements Using Remote Sensing)

Abstract

Traditional radar-based rainfall estimation typically relies on known functional relationships between the rainfall intensity (R) and radar measurables, such as R–Zh and R–(Zh, ZDR). One of the biggest advantages of machine learning algorithms is that they can capture a non-linear relationship between a dependent variable and independent variables without any predefined functional form. We explored the potential use of two supervised machine learning methods (regression tree and random forest) in rainfall estimation using dual-polarization radar variables. The regression tree does not require normalization and scaling of data; however, it is quite unstable since each split depends on the parent split. Since the random forest is an ensemble of regression trees, it has less variability in prediction than a single regression tree, but it consumes more computing resources. We considered several configurations of the machine learning algorithms with different sets of dependent and independent variables. The random forest model was appropriately tuned. In the test of variable importance, the specific differential phase was the most important variable for predicting the rainfall rate, whereas the differential reflectivity was the most important variable for predicting the residual, that is, the difference between the true rainfall rate and the one estimated from the R–Z relationship. The models were evaluated by 10-fold cross-validation. The best model was the random forest model using the residual with the non-classified training set. The results indicated that the machine learning algorithms outperformed the traditional R–Z relationship. We then applied the best machine learning model to an S-band dual-polarization radar (Mt. Myeonbong) and validated the result with ground rain gauges. The application to radar data showed that the estimated residuals had spatial variability. The stratiform and weak rain areas had positive residuals while convective areas had negative residuals, indicating that the spatial error structure driven by the R–Z relationship was well captured by the model. The rainfall rates of all pixels over the study area were adjusted with the estimated residuals. The adjusted rainfall rates showed excellent agreement with the rain gauges, especially at high rainfall rates.

1. Introduction

Quantitative precipitation estimates (QPE) are a major area of interest within the field of dual-polarization radar. With the advent of polarimetric radar, QPE algorithms using dual-polarization radar variables have been developed in recent decades [1,2,3,4,5]. The dual-polarization radar observes the differential reflectivity (ZDR, dB), specific differential phase (KDP, ° km−1), and cross-correlation coefficient (ρHV), as well as the reflectivity (Z, mm^6 m^−3 or dBZ). Polarimetric variables help to overcome several issues in QPE, such as miscalibration of the radar transmitter or receiver, attenuation in precipitation, and partial beam blockage. The use of these polarimetric variables can provide improved QPE [6]. In addition, since the horizontal and vertical polarization information provides microphysical information, such as the shape, size, and number concentration of raindrops, QPE using dual-polarization variables is more accurate than QPE using only the reflectivity factor in horizontal polarization [7,8,9].
A simple form of radar-based QPE uses an empirical relation between Z and the rainfall rate (R). Marshall and Palmer [10] introduced the R–Z relationship (Z = 200 R^1.6), which describes the empirical relationship between Z and R; however, the R–Z relationship is sensitive to the variability of the drop size distribution (DSD, N(D) in m−3 mm−1), which introduces uncertainty into QPE based on the R–Z relationship [11,12]. To deal with this uncertainty, rainfall estimation based on dual-polarization radar variables, which provide additional information on raindrops, was proposed [13,14,15].
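As a minimal illustration of how an R–Z relationship is applied in practice, the sketch below inverts Z = a R^b for R. The function names are hypothetical, and the default coefficients are the Marshall and Palmer values quoted above; reflectivity is assumed to be supplied in dBZ and converted to linear units first.

```python
def dbz_to_linear(dbz):
    """Convert reflectivity from dBZ to linear units (mm^6 m^-3)."""
    return 10.0 ** (dbz / 10.0)


def rain_rate_from_z(z_linear, a=200.0, b=1.6):
    """Invert Z = a * R**b to obtain the rainfall rate R in mm/h."""
    return (z_linear / a) ** (1.0 / b)


if __name__ == "__main__":
    # Print the rainfall rate implied by a few reflectivity values.
    for dbz in (20.0, 35.0, 50.0):
        print(dbz, rain_rate_from_z(dbz_to_linear(dbz)))
```

Because the exponent b is greater than 1, a fixed error in Z (for example from miscalibration) translates into a rainfall error that depends on the coefficients, which is one reason DSD variability matters for this approach.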
ZDR is a good measure of the median volume diameter, and KDP is nearly linearly related to the rainfall rate because it depends on a lower moment of the DSD than Z. As a result, rainfall estimation using these variables is more robust with respect to the variability of N(D) [13,16]. These variables can significantly reduce the over- and underestimation that the R–Z relationship produces at rainfall rates of 10 mm h−1 or more [17]. Nevertheless, these empirical relationships have limitations in explaining the complex nonlinearity between R and radar variables, which leads to errors in rainfall estimation.
Recently, researchers have shown an increased interest in QPE based on machine learning using remote sensing data [18,19,20]. Machine learning (ML) methods, such as decision tree (DT), random forest (RF), and artificial neural networks (ANNs), are techniques for discovering the relationship between independent variables and a dependent variable based on sample data without any preliminary assumptions, including linearity. DT is divided into classification tree (CT) for a discrete dependent variable and regression tree (RT) for a continuous dependent variable [21]. DT can take into account interactions and nonlinearity between variables.
RF is an ensemble method that consists of a number of DTs and shows more accurate prediction performance than a single DT [22]. RF can reduce the variance by lowering the correlation between DTs with randomly selected independent variables. RF can be fitted in parallel, as several DTs are independently generated. Ouallouche et al. [20] performed rainfall estimation based on RF using data retrieved from the satellite, such as the cloud top height, cloud top temperature, cloud phase, and textural features.
As a result, the rain rates estimated by RF agreed well with those measured by rain gauges. Kusiak et al. [23] applied five data-mining approaches, including RF and DT, to estimate the rainfall using rain gauge data and Z measured from the Doppler WSR-88D radar of the National Weather Service’s Next-Generation Radar (NEXRAD) system. They compared the statistics across the methods, but did not compare them with the empirical relationships. The neural network model performed best, with the lowest mean absolute error (MAE), followed in order of increasing MAE by the support vector machine (0.19), k-nearest neighbor (0.22), CT and RT (0.26), and RF (0.27).
ANNs are ML algorithms that have been inspired by the human neuron-synaptic neural network structure. ANNs are actively applied to atmospheric remote sensing data, as this is effective in extracting characteristics and trends of complex data structures and is suitable for modeling non-linear relationships [24]. Chiang et al. [25] estimated rainfall with a recurrent neural network (RNN) using Z measured from the C-band radar for typhoon periods. The RNN produced better hourly rainfall estimates than those from R–Z relationships in terms of the root mean square error (RMSE).
In a comparison of 48-h rainfall accumulations, the rainfall estimates obtained from the R–Z relationship were underestimated with a relative bias larger than –45%, while those from the RNN had a relative bias within ±5%. Chen et al. [26] proposed a deep neural network (DNN) approach to rainfall estimation using simulated dual-polarization radar variables based on the N(D) measured from disdrometers. The rainfall rates based on the DNN model were largely consistent with the rainfall rates computed directly from the N(D).
They compared the hourly rainfall estimates based on the proposed algorithm using Colorado State University-Chicago Illinois (CSU-CHILL) radar data with 11 rain gauges and showed excellent agreement between the estimates and the measurements from the gauges. Although ANNs have been popularly used in a variety of applications due to the advantage of describing nonlinearity between variables, the main shortcoming we encounter with these methods is the black box problem, which makes interpretation of the process difficult. For instance, these methods provide little insight into how the independent variables influence the learning and prediction processes. On the other hand, tree-based ML methods (DT and RF) provide ease of interpretation with determination of the variable importance.
Little research has been done on ML-based rainfall estimation using dual-polarization radar. Using the dual-polarization variables allows us to incorporate microphysics information, such as the shape and the number concentration of raindrops into the rainfall estimation. The objective of this study is to improve the accuracy of rainfall estimation based on polarimetric radar parameters using machine learning methods—specifically, tree-based methods (DT and RF).
We used observed drop size distributions, N(D), measured using a two-dimensional video disdrometer (2DVD) to simulate rainfall intensity and radar variables. The ML models were trained with these simulated R and radar variables and cross-validated to check the degree of fitting. The best ML model was then applied independently to Mt. Myeonbong (MYN) S-band dual-polarization radar data for rainfall estimation. The estimated rainfall rate was verified using the rain gauge data of automatic weather stations (AWSs) within the radar observation range.

2. Data

2.1. Training Dataset: 2DVD Data

In this study, 2DVD data were used to train ML models. The 2DVD is an optical instrument that uses two orthogonal cameras to detect the shadows of precipitation particles falling through the observation area. Microphysical information, such as the diameter of particles (D, mm), fall velocity (Vf, m s−1), and the axis ratios, can be obtained by measuring the shadows of precipitation particles [27]. This information can be contaminated by particles that fall into the observation area after striking the disdrometer and breaking up, or by the mismatching of particles in the image processing.
These outliers are removed by comparison with the empirical relationship between D and Vf [28]. In addition, an N(D) that has one or more channels with a zero number concentration is considered discontinuous and is eliminated [29]. A total of 41 diameter channels of N(D) from 0 mm to 10.25 mm at 0.25 mm intervals were used. A 1-min rain rate (R, mm h−1) was calculated from the quality-controlled 1-min N(D) using Equation (1):
$$R_{2DVD} = \frac{\pi}{6} \int_{0}^{D_{max}} N(D)\, V_f(D)\, D^{3}\, dD \tag{1}$$
where dD is the diameter interval at each diameter bin.
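The discrete form of Equation (1) can be sketched as a sum over diameter bins. This is only an illustration: the function name is hypothetical, and the factor 6π × 10^−4 is not stated in the text but follows from assuming N(D) in m−3 mm−1, D and dD in mm, and Vf in m s−1, with the result converted to mm h−1 (π/6 × 10^−9 m3 per mm3 × 3.6 × 10^6 mm h−1 per m s−1).

```python
import math


def rain_rate_mm_per_h(n_d, v_f, d_mid, d_width):
    """Rain rate (mm/h) from a binned drop size distribution.

    n_d     : number concentration per bin, m^-3 mm^-1
    v_f     : fall velocity per bin, m/s
    d_mid   : bin-centre diameter, mm
    d_width : bin width dD, mm
    """
    total = sum(n * v * d ** 3 * dd
                for n, v, d, dd in zip(n_d, v_f, d_mid, d_width))
    # (pi/6) * D^3 is the drop volume in mm^3; 1e-9 converts mm^3 to m^3
    # and 3.6e6 converts m/s to mm/h, giving the combined factor 6e-4 * pi.
    return 6.0e-4 * math.pi * total


if __name__ == "__main__":
    # One illustrative bin: 1000 drops m^-3 mm^-1 of 1 mm diameter falling at 4 m/s.
    print(rain_rate_mm_per_h([1000.0], [4.0], [1.0], [0.25]))
```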
Table 1 shows the observation locations, observation periods, the number of 1-min N(D), and the maximum 1-min rain rate from the 2DVD data. The total number of training data was 51,302, measured in Oklahoma (OKL, USA), Daegu (DAE, ROK), Boseong (BOS, ROK), and Jincheon (JIN, ROK) to secure the diversity of microphysical processes. We used data observed in spring (April 2019) and autumn (October 2018) as well as summer data (May to September) for seasonal variety. The 2DVD data in Oklahoma were obtained by the National Severe Storms Laboratory (NSSL), National Oceanic and Atmospheric Administration (NOAA), and the data in Jincheon were provided by the Weather Radar Center (WRC), Korea Meteorological Administration (KMA). The other data were collected by the Center for Atmospheric REmote sensing (CARE), Kyungpook National University (KNU).
We discarded time steps in which the radar reflectivity was greater than 55 dBZ in order to exclude hail particles from the analysis [6]. We also removed time steps in which the rainfall rate was less than 0.1 mm h−1 because the disdrometer typically underestimates the rainfall rate when small drops (diameter < 0.7 mm) are dominant [30]. The maximum rainfall rate was larger in OKL than at the sites in ROK. The median rainfall rate (radar reflectivity) varied from 0.99 mm h−1 (22.92 dBZ) to 1.78 mm h−1 (27.72 dBZ). Table 1 shows the clearly different statistical characteristics of rainfall in different climates (OKL vs. ROK, different regions in ROK) [29]. Unlike the maximum rainfall rate, the maximum reflectivity showed less discrepancy. These characteristics have an impact on the ML models and will be discussed later.
Dual-polarization radar variables were obtained by T-matrix scattering simulation [31], and 5-min time average values of Zh, ZDR, and KDP were additionally used to consider the movement of the precipitation system (Zh 5min, ZDR 5min, and KDP 5min). The T-matrix method used to calculate the polarimetric radar variables is one of the most widely used tools for computing light scattering by non-spherical particles based on directly solving Maxwell’s equations. This approach can simulate theoretical radar measurements for homogeneous and rotationally symmetric non-spherical particles. Backward scattered fields yield ZH, ZDR, and ρHV, while forward scattered fields produce KDP [17]. The control conditions and the values used in this study are shown in Table 2. The radar wavelength was 11.01 cm (S-band), and the elevation angle of the radar was set at 0°. The raindrop shape formula suggested by Thurai et al. [32] was used.

2.2. Operational MYN S-Band Dual-Polarization Radar Data

The weather radar used in this study was the MYN S-band dual-polarization radar operated by the Korea Meteorological Administration (KMA). Table 3 shows the detailed specifications of the MYN radar. A volume scan of nine elevation angles (0°, 0.39°, 0.83°, 2°, 2.88°, 4.06°, 5.67°, 7.88°, and 10.94°) was performed every 10 min with a beam width of 0.92°. The measured parameters were ZH, ZDR, KDP, ρHV, etc. ZH and ZDR were calibrated through post-processing. The averaged ZH calibration bias was calculated by the self-consistency principle of ZH and KDP, and the averaged ZDR calibration bias was estimated by comparing the ZDR and ZH radar measurements with the theoretical relationship between the same parameters simulated from the 2DVD data [33,34]. The averaged calibration bias of ZH was −5.0 dBZ, and the averaged calibration bias of ZDR was 0.03 dB.
Rain gauge data from a total of 192 rain gauges in KMA AWSs within the MYN radar observation range (150 km) were used to verify the radar QPE (Figure 1). Each AWS was equipped with two sizes of tipping-bucket rain gauge, 0.1 and 0.5 mm, which measure the 1-min R. In this study, the 10-min average rain rate was used to match the time resolution with the radar data observed at 10-min intervals, and missing values were excluded when calculating the 10-min average rain rate. We analyzed six rainfall cases for QPE and verification. The rainfall events included stratiform rain (Cases 1, 2, 3, and 6) and convective rain (Cases 4 and 5) from 2017 to 2018 (Table 4).

3. Methods

3.1. Machine Learning

In this study, RT and RF were used for rainfall estimation. Figure 2 shows a schematic diagram of the RT. We defined N to be the number of RTs, and M to be the number of independent variables. Each node was split into the two sub-nodes that gave the lowest variance [21]. This recursive binary partition was repeated for each node until a stop condition was met. The most important independent variable was placed at the top of the tree as the root node. A node split from the root node is called an intermediate node, and a node that is not split further is called a terminal node.
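The variance-based splitting rule can be illustrated for a single independent variable. The sketch below is a simplified stand-in, not the implementation used in the study: the function names are hypothetical, and the split is chosen by minimizing the summed squared error of the two sub-nodes, which is equivalent to minimizing their pooled variance.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, cost) of the best binary split of y on x.

    The threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(x, y))
    best_threshold, best_cost = None, sse(list(y))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # identical x values cannot be separated
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


if __name__ == "__main__":
    kdp = [0.1, 0.2, 0.3, 1.0, 1.1, 1.2]    # illustrative values only
    rain = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]
    print(best_split(kdp, rain))
```

A full regression tree would apply the same search over every independent variable at every node and recurse until the stop condition is met.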
The RF was based on N RTs with N bootstrap samples (see Figure 3). The bootstrap samples were generated by sampling with replacement. Each RT was grown with the splitting rules using the different independent variables selected randomly. The final prediction was given by the average of the predictions from all RTs. In the RF, the importance of independent variables was measured through the increase of the node purity. The independent variable with the highest increase of node purity played a major role in the prediction.
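The bootstrap-and-average idea can be sketched in a few lines. To keep the example short, each "tree" here is a one-split stump rather than a fully grown RT, and the random choice of mtry variables is made once per tree rather than at every node; both are simplifications introduced for illustration, and all function names are hypothetical.

```python
import random


def fit_stump(rows, targets, feature):
    """Best one-split regression stump on one feature, or None if unsplittable."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][feature])
    best = None
    for k in range(1, len(order)):
        lo, hi = rows[order[k - 1]][feature], rows[order[k]][feature]
        if lo == hi:
            continue
        left = [targets[i] for i in order[:k]]
        right = [targets[i] for i in order[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        cost = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or cost < best[0]:
            best = (cost, feature, 0.5 * (lo + hi), ml, mr)
    return best


def fit_forest(rows, targets, n_trees, mtry, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and mtry random features."""
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        b_rows = [rows[i] for i in idx]
        b_targets = [targets[i] for i in idx]
        stumps = [fit_stump(b_rows, b_targets, f) for f in rng.sample(range(m), mtry)]
        stumps = [s for s in stumps if s is not None]
        if stumps:
            forest.append(min(stumps))  # lowest-cost stump among the sampled features
        else:
            mean = sum(b_targets) / n   # degenerate sample: predict its mean
            forest.append((0.0, 0, float("inf"), mean, mean))
    return forest


def predict(forest, row):
    """Average the predictions of all stumps."""
    preds = [ml if row[f] <= thr else mr for _, f, thr, ml, mr in forest]
    return sum(preds) / len(preds)
```

Averaging over trees fitted to different bootstrap samples is what reduces the prediction variance relative to a single tree.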
RF can be optimized by tuning two parameters when generating RTs. One is the number of RTs (ntree = N), and the other is the number of independent variables that are randomly sampled (mtry < M). Liaw and Wiener [35] suggested an ntree of 500 and an mtry of the square root of M for classification, as opposed to a third of M for regression. Kühnlein et al. [19] compared out-of-bag (OOB) errors while changing ntree and mtry to improve the predictability of RF; the OOB error was used to select important independent variables and to obtain an unbiased estimate of the error [22]. To determine the optimal values with the lowest OOB error in this study, we considered a range of values for each tuning parameter, from 400 to 700 for ntree and from 1 to the number of independent variables for mtry. The optimal ntree and mtry were applied to each model.

3.2. Rainfall Estimation

3.2.1. R–Z Relationship

For validation of the ML models used in this study, we derived the empirical rainfall estimation relationships based on dual-polarization radar variables. Several R–Z relationships were considered with different thresholds of KDP and ZDR. The R–Z relationship calculated using all training data is shown as Equation (2), and Equation (3) was retrieved with the data below the thresholds of KDP and ZDR (KDP < 0.04° km−1 and ZDR < 0.3 dB). Equation (4) was constructed with the data above the thresholds of KDP and ZDR (KDP ≥ 0.04° km−1 or ZDR ≥ 0.3 dB). The equations are assumed to follow a power law (Y = aX^b), and the parameters a and b were estimated using the weighted total least squares method [36].
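$$R(Z_h) = 0.030\, Z_h^{0.667} \quad (Z_h = 197\, R^{1.50}) \tag{2}$$
$$R(Z_h) = 0.017\, Z_h^{0.806} \quad (Z_h = 151\, R^{1.24}) \quad \text{when } K_{DP} < 0.04^{\circ}\ \text{km}^{-1} \text{ and } Z_{DR} < 0.3\ \text{dB} \tag{3}$$
$$R(Z_h) = 0.012\, Z_h^{0.769} \quad (Z_h = 318\, R^{1.30}) \quad \text{when } K_{DP} \geq 0.04^{\circ}\ \text{km}^{-1} \text{ or } Z_{DR} \geq 0.3\ \text{dB} \tag{4}$$
$$R(K_{DP}) = 42.6\, K_{DP}^{0.720} \tag{5}$$
$$R(Z_h, Z_{DR}) = 0.003\, Z_h^{0.913}\, Z_{DR}^{-0.647} \tag{6}$$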
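The threshold-dependent use of Equations (2) to (5) can be written out as follows. This is a sketch under stated assumptions: the function names are hypothetical, Zh is assumed to be in linear units (mm^6 m^−3), ZDR in dB, and KDP in ° km−1, and returning zero for non-positive KDP is an illustrative choice that the text does not specify. Equation (6) is omitted because the text does not state whether ZDR enters it in linear or logarithmic units.

```python
KDP_THRESHOLD = 0.04  # deg/km, threshold quoted in the text
ZDR_THRESHOLD = 0.3   # dB, threshold quoted in the text


def r_zh_all(zh):
    """Equation (2): fitted to all training data."""
    return 0.030 * zh ** 0.667


def r_zh_below(zh):
    """Equation (3): KDP and ZDR both below their thresholds."""
    return 0.017 * zh ** 0.806


def r_zh_above(zh):
    """Equation (4): KDP or ZDR at or above its threshold."""
    return 0.012 * zh ** 0.769


def r_kdp(kdp):
    """Equation (5); non-positive KDP returns 0.0 (assumption, not from the source)."""
    return 42.6 * kdp ** 0.720 if kdp > 0.0 else 0.0


def r_zh_thresholded(zh, kdp, zdr_db):
    """Choose between Equations (3) and (4) from the KDP and ZDR thresholds."""
    if kdp < KDP_THRESHOLD and zdr_db < ZDR_THRESHOLD:
        return r_zh_below(zh)
    return r_zh_above(zh)


if __name__ == "__main__":
    zh = 10.0 ** (30.0 / 10.0)  # 30 dBZ in linear units
    print(r_zh_all(zh), r_zh_thresholded(zh, 0.01, 0.1), r_zh_thresholded(zh, 0.5, 1.0))
```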

3.2.2. ML-Based Estimation

The 2DVD data above the thresholds of KDP and ZDR (KDP ≥ 0.04° km−1 or ZDR ≥ 0.3 dB) were used as training data. The dual-polarization radar parameters simulated by the T-matrix and the 5-min time average values of the parameters were used as the independent variables. In this study, we investigated the impacts of three factors on the estimation accuracy. First, three types of dependent variable were considered: R2DVD calculated from N(D) (M1); the residual, $\varepsilon = R(Z_h) - R_{2DVD}$, where R(Zh) was computed from Equation (4) (M2); and the normalized residual, $\bar{\varepsilon} = \varepsilon / \overline{R(Z_h)}$ (M3).
Second, two different groups of independent variables were used, which differed in whether KDP was included. We denote the two groups by KY for the group with KDP and KN for the group without KDP. Lastly, data binning was implemented to group individual observations into specific bins determined by reflectivity (Table 5), allowing us to train the models locally. The local training with the binned training data and the global training with the entire training data are denoted by CY and CN, respectively. In the local training (CY) with RF, the ntree and mtry with the lowest OOB error for each model were utilized.
KDP and ZDR provide more accurate rainfall estimation than R(Zh) for heavier rainfall, whereas the improvement is often not significant for lighter rainfall due to the noise in KDP and ZDR [13,37,38,39]. Silvestro et al. [40] proposed a rainfall estimation algorithm that makes use of the best empirical relationships depending on the thresholds of KDP and ZDR, which outperformed R(Zh) for real-time applications. In this study, rainfall estimation based on this algorithm with different thresholds of KDP and ZDR was performed, as shown in Figure 4. When KDP was greater than 0.04° km−1, KY was utilized for rainfall estimation. KN was used if KDP was less than 0.04° km−1 and ZDR was 0.3 dB or more. The rainfall was estimated by the R–Z relationship (Equation (3)) when both KDP and ZDR were less than the thresholds. The models used in this study are summarized in Table 6.
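The branching logic described above, together with the residual definition used for M2, can be sketched as follows. The function names and return labels are hypothetical, and the use of "≥" at the KDP threshold follows the training-data condition (KDP ≥ 0.04° km−1) rather than the "greater than" wording of the preceding paragraph.

```python
def select_estimator(kdp, zdr_db, kdp_thr=0.04, zdr_thr=0.3):
    """Return which estimator applies to a pixel.

    "KY": ML model trained with KDP
    "KN": ML model trained without KDP
    "RZ": R-Z relationship for data below both thresholds (Equation (3))
    """
    if kdp >= kdp_thr:
        return "KY"
    if zdr_db >= zdr_thr:
        return "KN"
    return "RZ"


def apply_residual(r_zh, predicted_residual):
    """Adjust R(Zh) with a predicted residual.

    The residual is defined as epsilon = R(Zh) - R_2DVD, so the adjusted
    rainfall rate is R(Zh) - epsilon.
    """
    return r_zh - predicted_residual


if __name__ == "__main__":
    print(select_estimator(0.10, 0.1))  # KDP above threshold
    print(select_estimator(0.01, 0.5))  # only ZDR above threshold
    print(select_estimator(0.01, 0.1))  # both below threshold
    print(apply_residual(10.0, -2.0))   # negative residual raises the estimate
```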

3.2.3. Validation

The trained models were verified by 10-fold cross-validation. Six statistics were used to assess the performance of the ML models: the root mean square error (RMSE), mean absolute error (MAE), bias, correlation coefficient (CORR), coefficient of efficiency (COE) [41], and normalized error (1-NE):
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( R_{est,i} - R_{obs,i} \right)^{2}}$$
$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| R_{est,i} - R_{obs,i} \right|$$
$$Bias = \frac{1}{N} \sum_{i=1}^{N} \left( R_{est,i} - R_{obs,i} \right)$$
$$CORR = \frac{\sum_{i=1}^{N} \left( R_{est,i} - \overline{R_{est}} \right) \left( R_{obs,i} - \overline{R_{obs}} \right)}{\sqrt{\sum_{i=1}^{N} \left( R_{est,i} - \overline{R_{est}} \right)^{2}} \sqrt{\sum_{i=1}^{N} \left( R_{obs,i} - \overline{R_{obs}} \right)^{2}}}$$
$$COE = 1 - \frac{\sum_{i=1}^{N} \left( R_{obs,i} - R_{est,i} \right)^{2}}{\sum_{i=1}^{N} \left( R_{obs,i} - \overline{R_{obs}} \right)^{2}}$$
$$1\text{-}NE = 1 - \frac{\frac{1}{N} \sum_{i=1}^{N} \left| R_{est,i} - R_{obs,i} \right|}{\overline{R_{obs}}}$$
where N is the number of observations in the test data, $R_{est}$ represents the estimated rainfall rate, $R_{obs}$ is the observed rainfall rate, and an overbar denotes the mean over the N observations.
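The six statistics can be computed directly from paired estimates and observations. The sketch below is one possible transcription of the definitions above; the function name and the dictionary keys are hypothetical, and the inputs are assumed to be equal-length sequences with a non-constant, non-zero-mean observation series.

```python
import math


def verification_stats(est, obs):
    """Return RMSE, MAE, bias, CORR, COE, and 1-NE for paired rainfall rates."""
    n = len(obs)
    mean_est = sum(est) / n
    mean_obs = sum(obs) / n
    diffs = [e - o for e, o in zip(est, obs)]
    sq_err = sum(d * d for d in diffs)
    mae = sum(abs(d) for d in diffs) / n
    cov = sum((e - mean_est) * (o - mean_obs) for e, o in zip(est, obs))
    ss_est = sum((e - mean_est) ** 2 for e in est)
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    return {
        "rmse": math.sqrt(sq_err / n),
        "mae": mae,
        "bias": sum(diffs) / n,
        "corr": cov / math.sqrt(ss_est * ss_obs),
        "coe": 1.0 - sq_err / ss_obs,
        "one_minus_ne": 1.0 - mae / mean_obs,
    }


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]    # illustrative values only
    estimated = [2.0, 3.0, 4.0, 5.0]   # constant +1 mm/h offset
    print(verification_stats(estimated, observed))
```

A constant offset such as the one in the usage example leaves CORR at 1 while degrading the bias, COE, and 1-NE, which is why the statistics are reported together.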

3.3. Application to Operational Radar Data

The ML model with the highest accuracy in the 10-fold cross-validation using 2DVD data was applied to rainfall estimation using the operational dual-polarization radar data. The operational radar data were processed with the Hybrid Surface Rainfall (HSR) method applied to the MYN S-band dual-polarization radar data. The HSR is a technique for generating a rainfall field using the data of the lowest elevation angle that is not affected by ground clutter, beam blockage, or non-meteorological echoes. It is applied to the radar data in polar coordinates; we selected the rain field using thresholds and corrected for the radar calibration bias (ZH and ZDR).
When determining the rain field, that is, when eliminating non-meteorological echoes and removing artifacts, the following thresholds were used [42,43]: 0.95 for ρHV, 0.1 for δ(ρHV), 4.0 dB for δ(ZH), 4.0 dB for δ(ZDR), and 15.0° for δ(ΦDP). Here, δ(x) is the radial texture of variable x with a window size of 10. The HSR data in polar coordinates were then converted to Cartesian coordinates [42].
The ML-based rainfall estimation was performed for each grid point in the Cartesian coordinates. Similar to the training data, the 5 × 5 km spatial average of Zh, ZDR, and KDP were considered as the independent variables by taking into account the movement of the precipitation system. The estimated rainfall rate was verified by the rain gauges in the AWSs within the radar observation range (150 km). The rainfall rate estimated from the radar grid point closest to the AWS was compared with the average rainfall rate measured by the rain gauges.
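Matching each gauge to the closest radar grid point can be sketched as a nearest-neighbour lookup. The function name, the coordinate convention (Cartesian kilometres), and the field layout (rows indexed by y, columns by x) are assumptions made for illustration; the text does not describe how the matching was implemented.

```python
import math


def nearest_grid_value(field, x_coords, y_coords, gauge_x, gauge_y):
    """Return the field value at the grid point closest to a gauge.

    field    : 2-D list indexed as field[row][col], rows following y_coords
    x_coords : x coordinate of each column, km
    y_coords : y coordinate of each row, km
    """
    best_dist, best_value = None, None
    for j, y in enumerate(y_coords):
        for i, x in enumerate(x_coords):
            dist = math.hypot(x - gauge_x, y - gauge_y)
            if best_dist is None or dist < best_dist:
                best_dist, best_value = dist, field[j][i]
    return best_value


if __name__ == "__main__":
    rain = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]   # illustrative rainfall rates, mm/h
    xs = [0.0, 1.0, 2.0]
    ys = [0.0, 1.0]
    print(nearest_grid_value(rain, xs, ys, gauge_x=1.8, gauge_y=0.9))
```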

4. Results

4.1. Rainfall Estimation from Simulated Dual-Polarization Variables

Figure 5 shows the fitted tree models for M1KYCN and M1KNCN (with and without KDP). See Table 6 for the symbols such as M1, KY, CN, etc. We found that KDP plays an important role in rainfall estimation with M1KYCN because it is used as a splitting criterion at the root node. On the other hand, KDP is not shown in the tree based on M1KNCN, while Zh is the most crucial for the model.
The increase of node purity in RF for CN is shown in Table 7. To account for potential overfitting in the random forest, we optimized the tuning parameters: the number of RTs (ntree) and the number of independent variables randomly sampled (mtry). The values are listed for each model with and without KDP. For M1KY, the node purity rose the most (713,059) when KDP was used as the split criterion, followed in decreasing order by Zh (426,349), KDP 5min (311,490), and Zh 5min (147,971). In the case of M1KN, the node purity increased the most when the node was divided by Zh. This indicates that Zh played the largest role in the estimation of the rainfall rate.
Unlike in M1KY, the increase in the node purity was highest for ZDR in M2KY. This indicates that the errors that cannot be explained by the R–Z relationship were most closely related to ZDR. The second most important variable in M2KY was ZDR 5min. Similar to KY, ZDR was the most important variable in M2KN, followed by Zh and ZDR 5min. The tendency of increasing node purity in M3 was similar to that in M2 regardless of whether KDP was included or not. As expected, ρHV was mostly shown to be less important, as it fluctuates very little in rainfall cases.
Table 8 presents the increase in the node purity for CY. In M1KY, KDP is the most important variable in all reflectivity intervals, as in CN (Table 7), and Zh is the second most important variable. Similar to CN, M2KY and M2KN always show ZDR as the most important variable. The result of M3 is omitted since it shows the same tendency as that of M2.
The scatterplots of the 10-fold cross-validation for the RT are displayed in Figure 6. The discontinuity of the estimated values is a prominent problem of the RT. For weak R2DVD (R2DVD < 20 mm h−1), the estimates are continuous because rainfall is often estimated using the R–Z relationship when R2DVD is weak and ZDR and KDP are small. In M1CY (Figure 6b), since the training data are divided by reflectivity interval, the discontinuity problem is rectified, and the 1-NE value increased by 7.08% compared to M1CN. M2 (Figure 6c,d) and M3 (Figure 6e,f) give similar results and more continuous values than M1, with higher CORR values (CORR > 0.98). Additionally, CY has lower RMSE, MAE, and bias values than CN, and higher CORR, COE, and 1-NE values.
Figure 7 presents the results of the 10-fold cross-validation of RF. Overall, RF-based models show improved accuracy compared to the RT-based models in Figure 6. In M1CN (Figure 7a), the CORR value and the COE value are higher than 0.98, even though it presented the worst statistics among the RF models. Compared to M1CN, the underestimation in the strong R2DVD (R2DVD > 80 mm h−1) was corrected in M2CN and M3CN (Figure 7c,e), and the RMSE reduced by 4.93%. The RMSE, MAE, and bias decreased in M1CY (Figure 7b) compared with in M1CN, whereas the RMSE, MAE, and bias increased in M2CY and M3CY (Figure 7d,f).
M2CN outperformed the other models with an RMSE value of 0.598, MAE value of 0.255, CORR value of 0.995, and 1-NE value of 90.99%. M3CN shows a performance similar to that of M2CN. There was no significant improvement from CN (left panels) to CY (right panels). Therefore, the ensemble-like effect of CY had less influence on the RF-based models than on the RT-based models. This is because RF itself is an ensemble method. Interestingly, CN in RF even showed a slightly better score than CY. This could be explained by the fact that CN uses more training data and is free from possible local features in CY.
The scatterplots of R estimated by the empirical R–Z relationship are shown in Figure 8. The rainfall rate estimated by R(Zh) calculated from the entire training data (Figure 8a) presents a large dispersion overall and a tendency to underestimate at a rainfall rate of 60 mm h−1 or higher. When the rainfall rate was estimated based on Equations (3) and (4) according to the threshold of KDP and ZDR (Figure 8b), the RMSE and MAE increased compared to the case of using one R(Zh) due to overestimating at 40–100 mm h−1; however, the underestimation of the high rainfall rate (R > 100 mm h−1) was resolved.
Estimation using KDP (Figure 8c) had the lowest RMSE value (1.509) and the highest CORR value (0.974) and COE value (0.939) among the estimations using empirical relationships; however, there was still underestimation overall. On the other hand, in the case of the relationship using Zh and ZDR (Figure 8d), the highest 1-NE value (85.24%) and the lowest MAE value (0.418) are presented, but most of the R2DVD values were overestimated. Among the ML models, the model that showed the lowest performance was M1CN-RT, with an RMSE value of 1.700 and CORR value of 0.961 (see Figure 6). This indicates that all ML-based models outperformed these empirical relationship-based approaches.

4.2. Rainfall Estimation from Operational Radar

In the results of the 10-fold cross-validation of the ML models, the M2CN-RF and M3CN-RF showed the best and similar performance in terms of the RMSE and CORR. In this section, we chose M2CN-RF for rainfall estimation with MYN S-band dual-polarization radar data. The HSR images of the rainfall rate of Case 1 and Case 2 are displayed in Figure 9 and Figure 10. R(Zh) was estimated by Equation (3) when KDP and ZDR were below the threshold value and by Equation (4) when they were above the threshold value (Figure 9a and Figure 10a).
The M2CN-RF (Figure 9c and Figure 10c) adjusted the rainfall rate from R(Zh) by applying the residuals. In the HSR rainfall image, the gray region (ε = 0) in Figure 9c and Figure 10c corresponds to the area in which no residual was applied because KDP and ZDR were below the threshold values and Equation (3) was used. A positive ε value indicates that R(Zh) was overestimated, while a negative ε value indicates that R(Zh) was underestimated. For reference, the HSR fields of the radar-observed variables are shown in Figure 9d–f and Figure 10d–f.
In Case 1 (1000 LST on 14 August 2017), which is a stratiform case, the area of ε = 0 is wider due to lower values of KDP and ZDR. Although the R(Zh) field is highly correlated with the Zh field, the ε field is less correlated with the Zh field. Larger (smaller) values of ZDR are associated with larger positive (negative) ε. The ε is less correlated with KDP in this stratiform case since most KDP values are small. In Case 2 (0730 LST on 11 September 2017), which is a stratiform rain event with embedded convection, the area of ε = 0 is smaller than that in Case 1. Similar to Case 1, the positive ε area is highly correlated with higher values of ZDR. Interestingly, the region of higher KDP with Zh > 50 dBZ to the south had the largest negative values of ε.
In both cases, ε shows a significant spatial structure, indicating that the error generated by the R–Z relationship is spatially structured. This result is not surprising because inhomogeneous microphysical processes cause natural spatial DSD variation and result in a spatial structure of the residual from R(Z) [11,12]. The ML model (M2CN-RF) was trained on simulated dual-polarization variables, which are free of the instrumental noise of the radar, such as sampling error, beam broadening, beam blockage, and miscalibration [12].
As a result, we can expect the spatial structure of the residual to be masked if the magnitude of the instrumental noise is comparable to that of the natural variation of DSDs. The random variability of instrumental noise can be reduced by averaging samples. Since the M2CN-RF is an ensemble-based model, the spatial structure of the residual can become clearer as the instrumental noise is diminished by averaging the predictions of the individual RTs.
Figure 11 and Figure 12 show the verification of the period-averaged rainfall rate (12 h for Case 1, and 11 h for Case 2) estimated with the empirical relationships and ML. In Case 1, the RMSE of ML (R(Zh) − εRF) is 1.207 and the CORR is 0.773 (Figure 11a), which is an improvement over the empirical relationships. When the rainfall rate is calculated by Equation (2) (Figure 11b), it tends to be underestimated when R > 5 mm h−1. This underestimation is slightly improved by applying Equations (3) and (4) based on the threshold values of KDP and ZDR, and the CORR value increases (Figure 11c).
Underestimation still appears at heavier rainfall rates. This underestimation by R(Zh) is improved by ML in Figure 11a, which adjusts the rainfall rate with εRF. The estimation based on Equation (5) results in substantial underestimation with a low CORR value and the worst performance (Figure 11d). The results estimated by Equation (6) show a severe overestimation for strong rainfall (RAWS > 5 mm h−1) (Figure 11e).
In Case 2, a trend similar to that in Case 1 appears. R(Zh) − εRF also outperforms the empirical relationships in all statistics. The underestimations of R(Zh) that appear for RAWS stronger than 15 mm h−1 (Figure 12b,c) are corrected by εRF (Figure 12a), decreasing the RMSE from 2.59 and 1.89, respectively, to 1.74. Analogous to Case 1, Figure 12d presents an overall underestimation and the largest RMSE value. R(Zh, ZDR) overestimated the strong rainfall rate (RAWS > 25 mm h−1) (Figure 12e).
A total of six rainfall events from 2017 to 2018 were used to verify the five different models (Figure 13 and Table 9). The scatter plots of the event-averaged rainfall rate are displayed, and the same color indicates the same event (Figure 13). The statistics are shown for rainfall types and models (Table 9). In general, the ML model (R(Zh) − εRF) outperformed all the empirical relationships, with an RMSE value of 1.039, MAE value of 0.593, CORR value of 0.959, COE value of 0.912, and 1-NE value of 75.24%. The estimation of the ML model was the most consistent with the one-to-one line.
On the other hand, R(KDP) tended to underestimate (Figure 13d), and R(Zh, ZDR) overestimated in weak rain with RAWS < 17.5 mm h−1. According to the rainfall types, R(Zh) − εRF had a higher CORR (0.956), COE (0.905), and 1-NE (77.18%) in the stratiform rain, whereas it had a lower RMSE (0.612), MAE (0.330), and bias (1.041) in the convective rain. The 1-NE of the stratiform (convective) rain varied from 60% (10%) to 77% (56%). The poorer performance in the convective rain was due to the smaller size of rain cells: most of the convective rain consisted of small, short-lived precipitation cells.
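The parenthesized improvement percentages reported with Table 9 can be checked with a short calculation. The sketch below assumes that the improvement is the relative change of the ML score with respect to the best empirical score, an interpretation that is consistent with the tabulated numbers but is not spelled out in the text. The function name is hypothetical.

```python
def improvement_percent(best_empirical, ml, lower_is_better):
    """Relative improvement (%) of the ML score over the best empirical score."""
    gain = best_empirical - ml if lower_is_better else ml - best_empirical
    return 100.0 * gain / abs(best_empirical)


# (best empirical value, ML value, lower_is_better) for the "Total" rows of Table 9.
total = {
    "RMSE": (1.130, 1.039, True),
    "MAE": (0.649, 0.593, True),
    "CORR": (0.954, 0.959, False),
    "COE": (0.895, 0.912, False),
    "1-NE": (72.90, 75.24, False),
}
for name, (empirical, ml, lower) in total.items():
    print(name, round(improvement_percent(empirical, ml, lower), 2))
```

For the "Total" rows, this reproduces the tabulated 8.05, 8.63, 0.52, 1.90, and 3.21.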

5. Conclusions

The ML-based rainfall estimation using dual-polarization radar variables was explored with simulated and observed variables. The ML methods, RT and RF, used in this study allowed us to model the nonlinear relationship between the dependent and the independent variables and to identify important independent variables. The ML methods were first trained with the DSDs observed from 2DVD. In this study, we also considered three types of dependent variables (R: M1, the residual ε = R(Zh) − R2DVD: M2, and the normalized residual $\bar{\varepsilon} = \varepsilon / \overline{R(Z_h)}$: M3), two groups of independent variables (with KDP: KY and without KDP: KN), and two types of training data (categorized with intervals of Zh: CY, and overall data without categorization: CN).
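The naming scheme for the model configurations can be enumerated mechanically. The sketch below builds the twelve labels of Table 6 from the three choices described above. The dictionary names and descriptions are illustrative only.

```python
from itertools import product

# Labels follow Table 6; the descriptions are paraphrased.
DEPENDENT = {"M1": "rain rate R2DVD", "M2": "residual", "M3": "normalized residual"}
INDEPENDENT = {"KY": "with KDP", "KN": "without KDP"}
TRAINING = {"CN": "not classified", "CY": "classified by Zh interval"}


def model_labels():
    """All dependent x independent x training-set combinations, e.g. 'M2KYCN'."""
    return [m + k + c for m, k, c in product(DEPENDENT, INDEPENDENT, TRAINING)]


labels = model_labels()
print(len(labels))
print(labels)
```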
In the CY models of RF, the number of RTs and the number of independent variables used were optimized. With ML trained on DSDs from 2DVD, KDP was identified as the most important variable for rainfall estimation in both the M1KY-RT and M1KY-RF, while Zh was the most important variable in the M1KN. This follows from the fact that KDP is approximately proportional to the 4.2–5.6th moment of the DSD, which is closer to the moment associated with R than the sixth moment associated with Zh [12,14]. In M2 and M3, ZDR was the most crucial variable for explaining the error (or residual) of the R–Z relationships.
The ML methods were compared with the empirical relationships through 10-fold cross-validation. In the cross-validation, R(Zh), KN, and KY were first determined with the threshold values of KDP and ZDR because of the noise in KDP and ZDR for light rain (Figure 4), and the other models were then applied. Since the RT estimate is the average of a terminal node, the estimates were often discontinuous, with underestimation (overestimation) of the strong (weak) rainfall rates within a node; however, modeling the residual ε corrects these problems. Similarly, CY also improves the estimates, with a lower RMSE value. Since RF takes an ensemble of RTs, the discontinuity problem of RT disappears; however, underestimation was still found above 100 mm h−1 in the M1CN-RF. Adjusting the rainfall rate with the estimated ε resolved the underestimation issue, and the estimates were close to the true values.
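The within-node behavior of a regression tree can be seen in a small worked example. The rain rates below are hypothetical values assumed to fall into a single terminal node, and the function name is illustrative.

```python
from statistics import fmean


def leaf_prediction(targets):
    """A regression-tree terminal node predicts the mean of its training targets."""
    return fmean(targets)


# Hypothetical rain rates (mm/h) that fell into one terminal node.
node_rates = [12.0, 15.0, 20.0, 33.0]
prediction = leaf_prediction(node_rates)
errors = [prediction - rate for rate in node_rates]

print(prediction)  # one constant value for every sample in the node
print(errors)      # positive for the weak rates, negative for the strong rate
```

Every sample in the node receives the same value, so the weakest rates are overestimated and the strongest is underestimated, and the prediction jumps at the node boundary. One reading of why the residual models help is that R(Zh) carries the continuous dependence on Zh, leaving the tree to supply only a correction.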
All the ML models showed improved evaluation statistics compared with the R–Z relationships. Even the worst ML model (M1CN-RT) showed a meaningful improvement in the RMSE value (1.700) compared with the R–Z relationships (RMSE of 2.125 and 3.009). The M2CN-RF outperformed all the empirical relationships and presented a higher CORR value and lower RMSE value. The M2CN-RF, the model with the best performance among the ML models trained with 2DVD data, was applied to the MYN S-band dual-polarization radar data.
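The 10-fold cross-validation behind these comparisons can be sketched as follows. This is a generic index splitter and an RMSE helper written with the standard library only. The function names, the shuffling seed, and the way folds are formed are assumptions and do not describe the software used in the study.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def rmse(estimates, truths):
    """Root mean square error between estimated and reference rain rates."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return (sum(squared) / len(squared)) ** 0.5


splits = list(k_fold_indices(25, k=10))
print(len(splits))
print(rmse([1.0, 2.0], [1.0, 4.0]))
```

In each round, a model would be fitted on `train_idx` and scored on `test_idx`, so every reported statistic is computed on data the model has not seen.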
In the stratiform case (Case 1), most of the ε values were zero in the weak rainfall rate region, while the ε values were positive in the region of larger ZDR. The negative ε values were large in the convection region, in particular, in the region of larger KDP values in Case 2. In addition, the R–Z relationship significantly underestimated the rainfall rate in heavier rainfall regions, and this underestimation was corrected by the ε estimated by the ML models. Furthermore, ε showed significant spatial structure and was positively correlated with ZDR and negatively correlated with KDP. The evaluation with six rainfall events indicated that the ML model outperformed the empirical relationships regardless of the rainfall type (i.e., stratiform or convective). The statistics according to the rainfall type show that the ML-based QPE for stratiform cases had a higher CORR, COE, and 1-NE compared with the convective cases.
The estimation accuracy depended on the training data set. When we trained the RF model with DSD data from the DAE site, which was nearest to the MYN radar, the higher rainfall rates were systematically underestimated, likely due to the limited range of rainfall intensity. A recent study showed a significant discrepancy of DSD characteristics due to different climate and main forcings [29]. As a result, we added DSD data from OKL, BOS, and JIN to the training data set to broaden the variability of DSDs. The addition of the OKL data significantly extended the rainfall prediction range, particularly in higher ranges, and resolved the underestimation of higher rainfall. This improved the accuracy in the overall statistics.
Through this study, we investigated the potential application of rainfall estimation based on ML using polarimetric radar data. We envision that more accurate rainfall estimation can be achieved by applying the RF model considering ε trained with 2DVD data to a large number of operational radars. In particular, the ML model improved the estimates in the heavy rain region, which were underestimated by the empirical relationships. This approach would be useful in the analysis and forecasting of severe weather. Future studies could include extending ML models to various radars and rainfall cases in different weather conditions.
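The link between a limited training range and underestimated heavy rain can be illustrated with a one-split regression tree. Because each leaf predicts an average of training targets, no prediction can exceed the largest rain rate seen in training. The data, the threshold, and the function name below are hypothetical.

```python
from statistics import fmean


def fit_stump(zh, rate, threshold):
    """One-split regression tree on Zh: each leaf predicts its mean rain rate."""
    low = [r for z, r in zip(zh, rate) if z < threshold]
    high = [r for z, r in zip(zh, rate) if z >= threshold]
    low_mean, high_mean = fmean(low), fmean(high)
    return lambda z: low_mean if z < threshold else high_mean


# Hypothetical training pairs (dBZ, mm/h); the largest rate is 60 mm/h.
train_zh = [20.0, 30.0, 40.0, 45.0, 50.0]
train_rate = [0.7, 3.0, 12.0, 25.0, 60.0]
predict = fit_stump(train_zh, train_rate, threshold=42.0)

print(predict(55.0))                     # input beyond the training range
print(predict(55.0) <= max(train_rate))  # the prediction stays within that range
```

The same bound holds for an RF, since it averages such leaf means, which is one way to understand why adding sites with heavier rain extended the usable prediction range.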

Author Contributions

This work was made possible by significant contribution from all authors. Conceptualization, G.L. and K.S.; methodology, K.S., J.J.S., W.B., and G.L.; software, K.S. and J.J.S.; validation, K.S. and G.L.; formal analysis, K.S., W.B., and G.L.; investigation, K.S. and G.L.; resources, K.S. and W.B.; writing—original draft preparation, K.S. and W.B.; writing—review and editing, J.J.S. and G.L.; visualization, K.S.; supervision, G.L.; funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Korea Meteorological Administration Research and Development Program under Grant KMI2020-00910 and by the Korea Environmental Industry & Technology Institute (KEITI) of the Korea Ministry of Environment (MOE) as “Advanced Water Management Research Program” (79615).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

This paper is based on Kyuhee Shin’s thesis. We thank the Korea Meteorological Administration for acquiring the radar and AWS data, and Alexander V. Ryzhkov and Terry Schuur from the National Severe Storms Laboratory for providing their valuable 2DVD data. We also greatly appreciate the students and researchers in CARE, KNU for constructive discussions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ryzhkov, A.V.; Zrnić, D.S. Comparison of Dual-Polarization Radar Estimators of Rain. J. Atmos. Ocean. Technol. 1995, 12, 249–256.
  2. Brandes, E.A.; Zhang, G.; Vivekanandan, J. Experiments in Rainfall Estimation with a Polarimetric Radar in a Subtropical Environment. J. Appl. Meteorol. 2002, 41, 674–685.
  3. Matrosov, S.Y.; Cifelli, R.; Kennedy, P.C.; Nesbitt, S.W.; Rutledge, S.A.; Bringi, V.N.; Martner, B.E. A comparative study of rainfall retrievals based on specific differential phase shifts at X- and S-band radar frequencies. J. Atmos. Ocean. Technol. 2006, 23, 952–963.
  4. Bringi, V.N.; Rico-Ramirez, M.A.; Thurai, M. Rainfall Estimation with an Operational Polarimetric C-Band Radar in the United Kingdom: Comparison with a Gauge Network and Error Analysis. J. Hydrometeorol. 2011, 12, 935–954.
  5. Thompson, E.J.; Rutledge, S.A.; Dolan, B.; Thurai, M.; Chandrasekar, V. Dual-polarization radar rainfall estimation over tropical oceans. J. Appl. Meteorol. Climatol. 2018, 57, 755–775.
  6. Bringi, V.N.; Chandrasekar, V. Polarimetric Doppler Weather Radar: Principles and Applications; Cambridge University Press: Cambridge, UK, 2001; ISBN 9780521623841.
  7. Zrnić, D.S.; Ryzhkov, A. Advantages of Rain Measurements Using Specific Differential Phase. J. Atmos. Ocean. Technol. 1996, 13, 454–464.
  8. Friedrich, K.; Germann, U.; Gourley, J.J.; Tabary, P. Effects of radar beam shielding on rainfall estimation for the polarimetric C-band radar. J. Atmos. Ocean. Technol. 2007, 24, 1839–1859.
  9. Kumjian, M.R. Principles and applications of dual-polarization weather radar. Part II: Warm and cold season applications. J. Oper. Meteorol. 2013, 1, 243–264.
  10. Marshall, J.S.; Palmer, W.M.K. The distribution of raindrops with size. J. Meteorol. 1948, 5, 165–166.
  11. Maki, M.; Park, S.G.; Bringi, V.N. Effect of natural variations in rain drop size distributions on rain rate estimators of 3 cm wavelength polarimetric radar. J. Meteorol. Soc. Jpn. 2005, 83, 871–893.
  12. Lee, G. Sources of errors in rainfall measurements by polarimetric radar: Variability of drop size distributions, observational noise, and variation of relationships between R and polarimetric parameters. J. Atmos. Ocean. Technol. 2006, 23, 1005–1028.
  13. Sachidananda, M.; Zrnić, D.S. Rain rate estimates from differential polarization measurements. J. Atmos. Ocean. Technol. 1987, 4, 588–598.
  14. Ryzhkov, A.V.; Giangrande, S.E.; Schuur, T.J. Rainfall Estimation with a Polarimetric Prototype of WSR-88D. J. Appl. Meteorol. 2005, 44, 502–515.
  15. Cifelli, R.; Chandrasekar, V.; Lim, S.; Kennedy, P.C.; Wang, Y.; Rutledge, S.A. A new dual-polarization radar rainfall algorithm: Application in Colorado precipitation events. J. Atmos. Ocean. Technol. 2011, 28, 352–364.
  16. Seliga, T.A.; Bringi, V.N.; Al-Khatib, H.H. A Preliminary Study of Comparative Measurements of Rainfall Rate Using the Differential Reflectivity Radar Technique and a Raingage Network. J. Appl. Meteorol. 1981, 20, 1362–1368.
  17. Ryzhkov, A.V.; Zrnic, D.S. Radar Polarimetry for Weather Observations; Springer Atmospheric Sciences; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; ISBN 9783030050931.
  18. Teschl, R.; Randeu, W.L.; Teschl, F. Improving weather radar estimates of rainfall using feed-forward neural networks. Neural Netw. 2007, 20, 519–527.
  19. Kühnlein, M.; Appelhans, T.; Thies, B.; Nauss, T. Improving the accuracy of rainfall rates from optical satellite sensors with machine learning—A random forests-based approach applied to MSG SEVIRI. Remote Sens. Environ. 2014, 141, 129–143.
  20. Ouallouche, F.; Lazri, M.; Ameur, S. Improvement of rainfall estimation from MSG data using Random Forests classification and regression. Atmos. Res. 2018, 211, 62–72.
  21. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; Routledge: New York, NY, USA, 1984; ISBN 9780412048418.
  22. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  23. Kusiak, A.; Wei, X.; Verma, A.P.; Roz, E. Modeling and prediction of rainfall using radar reflectivity data: A data-mining approach. IEEE Trans. Geosci. Remote Sens. 2013, 51, 2337–2342.
  24. Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J.; et al. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sens. Environ. 2020, 241, 111716.
  25. Chiang, Y.M.; Chang, F.J.; Jou, B.J.D.; Lin, P.F. Dynamic ANN for precipitation estimation and forecasting from radar observations. J. Hydrol. 2007, 334, 250–261.
  26. Chen, H.; Chandrasekar, V.; Cifelli, R. A Deep Learning Approach to Dual-Polarization Radar Rainfall Estimation. In Proceedings of the 2019 URSI Asia-Pacific Radio Science Conference (AP-RASC), New Delhi, India, 9–15 March 2019; pp. 1–2.
  27. Kruger, A.; Krajewski, W.F. Two-Dimensional Video Disdrometer: A Description. J. Atmos. Ocean. Technol. 2002, 19, 602–617.
  28. Atlas, D.; Srivastava, R.C.; Sekhon, R.S. Doppler radar characteristics of precipitation at vertical incidence. Rev. Geophys. 1973, 11, 1–35.
  29. Bang, W.; Lee, G.; Ryzhkov, A.; Schuur, T.; Lim, K.-S.S. Comparison of microphysical characteristics between southern Korea and Oklahoma using two-dimensional video disdrometer data. J. Hydrometeorol. 2020, 1–61.
  30. Thurai, M.; Gatlin, P.; Bringi, V.N.; Petersen, W.; Kennedy, P.; Notaroš, B.; Carey, L. Toward completing the raindrop size spectrum: Case studies involving 2D-video disdrometer, droplet spectrometer, and polarimetric radar measurements. J. Appl. Meteorol. Climatol. 2017, 56, 877–896.
  31. Mishchenko, M.I.; Travis, L.D.; Mackowski, D.W. T-matrix computations of light scattering by nonspherical particles: A review. J. Quant. Spectrosc. Radiat. Transf. 1996, 55, 535–575.
  32. Thurai, M.; Huang, G.J.; Bringi, V.N.; Randeu, W.L.; Schönhuber, M. Drop shapes, model comparisons, and calculations of polarimetric radar parameters in rain. J. Atmos. Ocean. Technol. 2007, 24, 1019–1032.
  33. Lee, G.; Zawadzki, I. Radar calibration by gage, disdrometer, and polarimetry: Theoretical limit caused by the variability of drop size distribution and application to fast scanning operational radar data. J. Hydrol. 2006, 328, 83–97.
  34. Kwon, S.; Lee, G.; Kim, G. Rainfall Estimation from an Operational S-Band Dual-Polarization Radar: Effect of Radar Calibration. J. Meteorol. Soc. Jpn. Ser. II 2015, 93, 65–79.
  35. Liaw, A.; Wiener, M. Classification and regression by random forest. R News 2002, 2, 18–22.
  36. Amemiya, Y. Generalization of the TLS approach in the errors-in-variables problem. In Recent Advances in Total Least Squares Techniques and Errors-in-Variables Modeling; Van Huffel, S., Ed.; SIAM: Philadelphia, PA, USA, 1997; pp. 77–86.
  37. Chandrasekar, V.; Bringi, V.N. Error Structure of Multiparameter Radar and Surface Measurements of Rainfall Part I: Differential Reflectivity. J. Atmos. Ocean. Technol. 1988, 5, 783–795.
  38. Chandrasekar, V.; Bringi, V.N.; Balakrishnan, N.; Zrnić, D.S. Error Structure of Multiparameter Radar and Surface Measurements of Rainfall. Part III: Specific Differential Phase. J. Atmos. Ocean. Technol. 1990, 7, 621–629.
  39. Chandrasekar, V.; Gorgucci, E.; Scarchilli, G. Optimization of Multiparameter Radar Estimates of Rainfall. J. Appl. Meteorol. 1993, 32, 1288–1293.
  40. Silvestro, F.; Rebora, N.; Ferraris, L. An algorithm for real-time rainfall rate estimation by using polarimetric radar: RIME. J. Hydrometeorol. 2009, 10, 227–240.
  41. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290.
  42. Kwon, S.; Jung, S.-H.; Lee, G. Inter-comparison of radar rainfall rate using Constant Altitude Plan Position Indicator and hybrid surface rainfall maps. J. Hydrol. 2015, 531, 234–247.
  43. Ye, B.Y.; Lee, G.W.; Park, H.M. Identification and removal of non-meteorological echoes in dual-polarization radar data based on a fuzzy logic algorithm. Adv. Atmos. Sci. 2015, 32, 1217–1230.
Figure 1. The locations of 192 automatic weather stations (dots) within the radar observation range of the MYN radar (red filled triangle). Black rings denote radar range rings with a 50 km interval.
Figure 2. Schematic diagram of the regression tree. X_i is an independent variable, and c_i denotes the optimal splitting criterion value for the corresponding independent variable. Ȳ_j denotes the average value of the data belonging to the jth terminal node.
Figure 3. Schematic diagram of the random forest (RF) model.
Figure 4. Flowchart of the rainfall estimation in this study.
Figure 5. Results of the classification by the (a) M1KYCN and (b) M1KNCN regression trees. n is the number of observations. X, c, and Ȳ are the same as in Figure 2.
Figure 6. Scatter plots of R2DVD and the rainfall rate estimated using a regression tree (RT) with 10-fold cross-validation. (a) M1CN, (b) M1CY, (c) M2CN, (d) M2CY, (e) M3CN, and (f) M3CY.
Figure 7. Scatter plots of R2DVD and the rainfall rate estimated by RF using 10-fold cross-validation. (a) M1CN, (b) M1CY, (c) M2CN, (d) M2CY, (e) M3CN, and (f) M3CY.
Figure 8. Scatter plots of the rainfall rate from empirical relationships. (a) R(Zh), (b) adjusted R(Zh), (c) R(KDP), and (d) R(Zh, ZDR).
Figure 9. Hybrid Surface Rainfall method (HSR) images of the (a) estimated rain rate by Z–R relationship (R(Zh)), (b) adjusted rain rate (R(Zh) − εRF), (c) estimated residual by RF (εRF), (d) ZH, (e) ZDR, and (f) KDP for Case 1 (1000 LST on 14 August 2017).
Figure 10. HSR images of the (a) estimated rain rate by Z–R relationship (R(Zh)), (b) adjusted rain rate (R(Zh) − εRF), (c) estimated residual by RF (εRF), (d) ZH, (e) ZDR, and (f) KDP for Case 2 (0730 LST on 11 September 2017).
Figure 11. Scatter plots of (a) R(Zh) − εRF, (b) R(Zh), (c) adjusted R(Zh), (d) R(KDP), and (e) R(Zh, ZDR) for Case 1 (14 August 2017). RAWS is the rainfall rate from the ground rain gauge. Values in parentheses (in (a)) represent the improvement percentage from the best performance of the empirical relationship.
Figure 12. Scatter plots of (a) R(Zh) − εRF, (b) R(Zh), (c) adjusted R(Zh), (d) R(KDP), and (e) R(Zh, ZDR) for Case 2 (11 September 2017). Values in parentheses (in (a)) represent the improvement percentage from the best performance of the empirical relationship.
Figure 13. Scatter plots of (a) R(Zh) − εRF, (b) R(Zh), (c) adjusted R(Zh), (d) R(KDP), and (e) R(Zh, ZDR), with different cases represented by different colors. Values in parentheses (in (a)) represent the improvement percentage from the best performance of the empirical relationship.
Table 1. The two-dimensional video disdrometer (2DVD) data used in this study.

| Area | Period [Year] | Number of 1-min Data | Median of 1-min Rain Rate [mm h−1] | Median of 1-min Reflectivity [dBZ] | Maximum of 1-min Rain Rate [mm h−1] | Maximum of 1-min Reflectivity [dBZ] |
| --- | --- | --- | --- | --- | --- | --- |
| Oklahoma, USA (OKL) | 1996–2006 (May to September) | 7944 | 1.78 | 27.72 | 133.39 | 54.88 |
| Daegu (DAE) | 2011–2012 (May to September) | 7516 | 1.36 | 25.09 | 99.24 | 52.50 |
| Boseong (BOS) | 2013–2015, 2018 (May to September) | 12,083 | 0.99 | 22.92 | 93.39 | 53.95 |
| | 2018 (October) | 713 | | | | |
| Jincheon (JIN) | 2013–2015, 2018 (May to September) | 22,731 | 1.06 | 23.55 | 76.46 | 54.11 |
| | 2018 (October) | 315 | | | | |
| | 2019 (April) | 545 | | | | |
| Total | | 51,302 | - | - | - | - |
Table 2. The control conditions and their corresponding values used in the T-matrix scattering simulation.

| Characteristics | Values |
| --- | --- |
| Radar wavelength | 11.01 cm (S-band) |
| Radar elevation angle | 0° |
| Environment temperature | 23 °C |
| Drop shape formula | Taken from Thurai et al. (2007) |
Table 3. Specifications of the Mt. Myeongbong (MYN) S-band dual-polarization radar.

| Parameter | Value |
| --- | --- |
| Frequency (wavelength) | 2272 MHz (10 cm, S-band) |
| Location | 36°10′45″N, 128°59′50″E |
| Height | 1136 m |
| Beam width | 0.92° |
| Elevation angles | 0°, 0.39°, 0.83°, 2°, 2.88°, 4.06°, 5.67°, 7.88°, and 10.94° |
| Maximum range | 285 km |
Table 4. List of precipitation periods and rain types of the rainfall events.

| Case No. | Period (LST) | Rain Type |
| --- | --- | --- |
| 1 | 0000–1200 14 August 2017 | Stratiform |
| 2 | 0200–1100 11 September 2017 | Stratiform |
| 3 | 0200–0700 1 July 2018 | Stratiform |
| 4 | 1700 27 August–0600 28 August 2018 | Convective |
| 5 | 1000–1600 3 September 2018 | Convective |
| 6 | 0500–1000 7 September 2018 | Stratiform |
Table 5. The reflectivity (ZH) interval and the number of training data.

| Class No. | Interval [dBZ] | Number of Observations |
| --- | --- | --- |
| 1 | 0 ≤ ZH < 20 | 599 |
| 2 | 20 ≤ ZH < 30 | 11,023 |
| 3 | 30 ≤ ZH < 40 | 9369 |
| 4 | 40 ≤ ZH | 1639 |
Table 6. Summary of the models used in this study.

| Independent Variables | Training Set | Dependent variable: Rain rate (R2DVD) (M1) | Dependent variable: Residual (ε = R(Zh) − R2DVD) (M2) | Dependent variable: Normalized residual ($\bar{\varepsilon} = \varepsilon / \overline{R(Z_h)}$) (M3) |
| --- | --- | --- | --- | --- |
| Zh, ZDR, KDP, ρHV, Zh 5min, ZDR 5min, KDP 5min (KY) | Not classified training set (CN) | M1KYCN | M2KYCN | M3KYCN |
| | Classified by reflectivity interval (CY) | M1KYCY | M2KYCY | M3KYCY |
| Zh, ZDR, ρHV, Zh 5min, ZDR 5min (KN) | Not classified training set (CN) | M1KNCN | M2KNCN | M3KNCN |
| | Classified by reflectivity interval (CY) | M1KNCY | M2KNCY | M3KNCY |
Table 7. Increase of the node purity in RFs for CN. The highest increases of node purity are highlighted in bold.

| Group | Variable | M1 | M2 | M3 |
| --- | --- | --- | --- | --- |
| KY | Zh | 426,349 | 75,422 | 2734 |
| | ZDR | 36,434 | 207,260 | 7153 |
| | KDP | 713,053 | 29,997 | 1019 |
| | ρHV | 3883 | 22,859 | 641 |
| | Zh 5min | 147,971 | 19,286 | 635 |
| | ZDR 5min | 36,618 | 87,982 | 2993 |
| | KDP 5min | 311,490 | 11,326 | 385 |
| KN | Zh | 832,321 | 101,335 | 3347 |
| | ZDR | 106,485 | 200,627 | 6948 |
| | ρHV | 10,998 | 22,430 | 712 |
| | Zh 5min | 565,867 | 32,965 | 1192 |
| | ZDR 5min | 147,220 | 97,432 | 3308 |
Table 8. Increase of the node purity in RFs for CY. The highest increases of node purity are highlighted in bold.
M1M2
0 Z H < 20 20 Z H < 30 30 Z H < 40 40 Z H 0 Z H < 20 20 Z H < 30 30 Z H < 40 40 Z H
KYZh0.803135121,85097,1150.080103306451,941
ZDR0.24035710,56024,1840.35548416,522166,202
KDP1.197186135,962255,7370.112155430317,446
ρ HV 0.04130262524130.03562485626,171
Zh 5min0.202579772729,8020.0817712258406
ZDR 5min0.192265862415,2660.19425711,10358,468
KDP 5min0.27387515,24888,5290.08913021676925
KNZh1.771298751,455261,5520.10722462,09662,096
ZDR0.36349015,22746,5100.418545153,789153,789
ρ HV 0.061634318895380.0515533,45733,457
Zh 5min0.443144320,805148,6840.13018114,20814,208
ZDR 5min0.29635011,76043,2500.23026167,74367,743
Table 9. Accuracy of the rainfall estimation with the different models for stratiform, convective, and all cases. The highest values of statistics are highlighted in bold, and the values in parentheses represent the improvement percentage from the best performance of the empirical relationship. Root mean square error (RMSE), mean absolute error (MAE), bias, correlation coefficient (CORR), coefficient of efficiency (COE) [41], and normalized error (1-NE).

| Type | Method | RMSE | MAE | Bias | CORR | COE | 1-NE [%] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Stratiform | R(Zh) − εRF | 1.237 (8.37) | 0.770 (8.66) | 1.294 | 0.956 (0.52) | 0.905 (2.03) | 77.18 (2.88) |
| | R(Zh) | 1.676 | 0.843 | 1.190 | 0.936 | 0.825 | 70.79 |
| | Adjusted R(Zh) | 1.350 | 0.985 | 1.210 | 0.951 | 0.887 | 75.02 |
| | R(KDP) | 2.217 | 1.346 | 1.129 | 0.896 | 0.695 | 60.09 |
| | R(Zh, ZDR) | 1.469 | 0.916 | 1.307 | 0.941 | 0.866 | 72.83 |
| Convective | R(Zh) − εRF | 0.612 (3.92) | 0.330 (5.98) | 1.041 | 0.873 (2.22) | 0.731 (3.10) | 55.79 (5.36) |
| | R(Zh) | 0.716 | 0.382 | 0.971 | 0.833 | 0.632 | 48.82 |
| | Adjusted R(Zh) | 0.637 | 0.351 | 1.040 | 0.854 | 0.709 | 52.95 |
| | R(KDP) | 1.204 | 0.671 | 1.539 | 0.463 | −0.040 | 10.07 |
| | R(Zh, ZDR) | 0.803 | 0.418 | 0.762 | 0.763 | 0.537 | 40.98 |
| Total | R(Zh) − εRF | 1.039 (8.05) | 0.593 (8.63) | 1.209 | 0.959 (0.52) | 0.912 (1.90) | 75.24 (3.21) |
| | R(Zh) | 1.389 | 0.752 | 1.136 | 0.939 | 0.842 | 68.60 |
| | Adjusted R(Zh) | 1.130 | 0.649 | 1.159 | 0.954 | 0.895 | 72.90 |
| | R(KDP) | 1.883 | 1.072 | 1.261 | 0.884 | 0.709 | 55.23 |
| | R(Zh, ZDR) | 1.252 | 0.716 | 1.138 | 0.943 | 0.871 | 70.12 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Shin, K.; Song, J.J.; Bang, W.; Lee, G. Quantitative Precipitation Estimates Using Machine Learning Approaches with Operational Dual-Polarization Radar Data. Remote Sens. 2021, 13, 694. https://doi.org/10.3390/rs13040694
