Quantitative Precipitation Estimation Using Weather Radar Data and Machine Learning Algorithms for the Southern Region of Brazil

Verdelho, Fernanda F.; Beneti, Cesar; Pavam, Luis G.; Calvetti, Leonardo; Oliveira, Luiz E. S.; Zanata Alves, Marco A.

doi:10.3390/rs16111971


by Fernanda F. Verdelho ^1,2,*,†, Cesar Beneti ^1,†, Luis G. Pavam, Jr. ^1,2, Leonardo Calvetti ^3, Luiz E. S. Oliveira ^2 and Marco A. Zanata Alves ^2,†

^1 Parana Environmental Technology and Monitoring System (SIMEPAR), Curitiba 81530-900, Brazil
^2 Department of Computer Science, Polytechnic Center, Federal University of Paraná (UFPR), Curitiba 81530-000, Brazil
^3 Department of Meteorology, Federal University of Pelotas (UFPEL), Pelotas 96010-610, Brazil
^* Author to whom correspondence should be addressed.
^† These authors contributed equally to this work.

Remote Sens. 2024, 16(11), 1971; https://doi.org/10.3390/rs16111971

Submission received: 22 February 2024 / Revised: 20 May 2024 / Accepted: 24 May 2024 / Published: 30 May 2024

(This article belongs to the Special Issue Advance of Radar Meteorology and Hydrology II)


Abstract:

In addressing the challenges of quantitative precipitation estimation (QPE) using weather radar, the importance of enhancing the rainfall estimates for applications such as flash flood forecasting and hydropower generation management is recognized. This study employed dual-polarization weather radar data to refine the traditional Z–R relationship, which is often not accurate enough in areas with complex meteorological phenomena. Utilizing tree-based machine learning algorithms, such as random forest and gradient boosting, this research analyzed polarimetric variables to capture the intricate patterns within the Z–R relationship. The results highlight machine learning’s potential to improve the precision of precipitation estimation, especially under challenging weather conditions. Integrating meteorological insights with advanced machine learning techniques is a step toward a more precise and adaptable precipitation estimation method.

Keywords:

machine learning; quantitative precipitation estimation; precipitation estimation; meteorological radar; random forest; gradient boosting

1. Introduction

The quantitative precipitation estimation (QPE) method is a foundational pillar in hydro-meteorological sciences, with far-reaching implications for energy generation, agricultural planning, and environmental conservation. This study’s motivation arose from Brazil’s hydroelectric heart, Western Paraná, underpinned by the Itaipu Binational Dam, a global leader in hydroelectric energy output [1]. The complexities of weather forecasting in this region not only have academic and economic significance but also are vital for the strategic operation of hydroelectric reservoirs and for the protection of communities against the unpredictable forces of nature [2,3].

The Z–R relationship [4], a cornerstone in radar meteorology for the conversion of radar reflectivity (Z) into rainfall rates (R), has long been recognized for its broad applicability, but it is also subject to various limitations [5,6]. Factors such as variability in the raindrop size distribution, the presence of mixed-phase precipitation, radar signal attenuation, calibration challenges, and physical obstructions contribute to these uncertainties, which are particularly pronounced in the complex landscape of Western Paraná, Brazil.

These issues are common in various geographical settings, necessitating a novel methodological approach to enhance the accuracy of rainfall estimation [4]. The Z–R relationship is based on empirical correlations that vary geographically and temporally, influenced by the local climatic conditions, which underscores the need for adaptive approaches that can adjust to specific meteorological conditions.

This research aimed to evaluate the applications of the Z–R relationship in a particular environmental context and explore the potential of machine learning (ML) to improve QPE. This study focused on tree-based machine learning models—random forest [7] and gradient boosting [8]—to leverage their ability to model the nonlinear complexities of precipitation data. While previous research has tested machine learning methods for QPE in various parts of the world, including the Southern Andes of Ecuador [9], South Korea [10], and Switzerland [11], none have yet employed a hybrid approach integrating multiple machine learning models to enhance rainfall estimation.

In our previous work, we have made extensive use of machine learning to advance quantitative precipitation estimation (QPE). Previous studies have suggested that machine learning can be effective in this field, but its use is still relatively new, particularly in the development of hybrid machine learning models. Rollenbeck et al. [12] showed that machine learning can outperform empirical approaches in calibrating X-band radar for extreme weather events in a region of complex precipitation in North Peru, highlighting the potential of advanced algorithms in such scenarios.

This research explored the application of machine learning in meteorology to obtain more accurate precipitation estimates in a region where the weather patterns are closely linked with hydroelectric power generation. Our study analyzed the performances of two models, random forest and gradient boosting, in both classification and regression scenarios. We aimed to create a more resilient and accurate meteorological practice, which could have implications beyond Western Paraná.

This article is structured to guide the reader through the research process. The analysis begins with an in-depth examination of the dataset, which forms the basis of the hybrid machine learning approach for QPE (Section 2). This is followed by an explanation of the selected machine learning models, the data transformation procedures, and the benchmarks for performance evaluation (Section 3). An assessment of the current QPE methods establishes the context for a detailed analysis of the proposed hybrid model’s effectiveness, calibration, and configuration (Section 3.1). This article then discusses the practical application of the model, its adaptability to operational demands, and its validation against real-world precipitation events (Section 3.2). Finally, this article concludes by synthesizing the findings, discussing their implications, and considering the potential of the hybrid approach within the broader context of QPE advancements (Section 4).

2. Materials and Methods

The research methodology can be categorized into five phases: data collection, data preprocessing, feature engineering, model development, and model evaluation.

2.1. Data Collection

Paraná is one of the five most developed states in Brazil, with a strong economy that is centered around agriculture and industry. In addition, it has the second-highest energy potential in the country.

In the Paraná region, which encompasses six micro-regions and their hydroelectric plants, there is a need for increased monitoring due to various critical factors. These factors include the region’s significant climate variability, the potential impact on hydroelectric plant operations, and the importance of accurate weather forecasts for water resource management and natural disaster prevention.

To address these challenges, a primary dataset was derived from a dual-polarization weather radar system located in Cascavel, Paraná, Brazil. Installed in 2014, this radar system captures atmospheric data with high granularity, providing comprehensive representations of precipitation events over several years.

Figure 1 illustrates the study area and the network of 36 rain gauges, which complements the radar data. The rain gauge network features automatic tipping bucket mechanisms with a sensitivity greater than 0.1 mm. The EEC S-band CAS radar magnetron is equipped with an 8.5 m parabolic antenna, offering an over 45.0 dB gain and a half-power beam width of 0.95 degrees. It supports both linear horizontal and vertical polarization, with an angular positioning accuracy of 0.05 degrees. The radar’s scanning speed can reach up to 10 rpm, and it includes a magnetron transmitter with peak power of 850 kW. It uses a single receiver, with a typical minimum discernible signal of −114 dBm and a linear dynamic range of up to 105 dB.

Table 1 presents the spatial and temporal resolutions of the radar and rain gauge data, both in their original and transformed forms. Initially, the radar data feature a spatial resolution of 1° in azimuth by 250 m in range at a single elevation, with a temporal resolution of 5 min. These data are then reprocessed in the database to a different format by redefining the range resolution as a ‘bin gate’ to better understand the precipitation patterns over different distances. This transformation also adjusts the temporal resolution to 15 min for consistency in analysis.

Similarly, the rain gauge data, which originally use the latitude and longitude for spatial resolution and have a 15 min temporal resolution, are converted into a format compatible with the radar data, using azimuth and range coordinates, while maintaining the exact temporal resolution. The binning process of the radar data allows for a more nuanced interpretation of the spatial variability in the precipitation, aligning them with the rain gauge data for a comprehensive analysis.
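The conversion of gauge positions into radar coordinates can be illustrated with a minimal sketch. It assumes a spherical Earth, ignores beam height, and uses hypothetical coordinates; the function names `gauge_to_polar` and `to_bin` are illustrative and are not taken from the study's processing chain. Only the 1° by 250 m bin size comes from the text.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation, an assumption)
GATE_LENGTH_M = 250.0         # range resolution stated in the text


def gauge_to_polar(radar_lat, radar_lon, gauge_lat, gauge_lon):
    """Return (azimuth_deg, range_m) of a gauge as seen from the radar."""
    phi1, phi2 = math.radians(radar_lat), math.radians(gauge_lat)
    dlon = math.radians(gauge_lon - radar_lon)
    dphi = phi2 - phi1

    # Haversine great-circle distance (ground range; beam height is ignored).
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2
    range_m = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    # Initial bearing, measured clockwise from north and wrapped to [0, 360).
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    azimuth_deg = math.degrees(math.atan2(y, x)) % 360.0
    return azimuth_deg, range_m


def to_bin(azimuth_deg, range_m):
    """Map polar coordinates to (azimuth index, gate index) for 1 deg x 250 m bins."""
    return int(azimuth_deg) % 360, int(range_m // GATE_LENGTH_M)


# Hypothetical coordinates for illustration: a gauge one degree due north of the radar.
az, rng = gauge_to_polar(-25.0, -53.5, -24.0, -53.5)
az_idx, gate_idx = to_bin(az, rng)
```

With both datasets indexed by azimuth and gate, each 15 min gauge record can be paired with the radar bin that lies above it.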

This research examined the Z–R relationships used in the operational environment in Paraná, Brazil (Figure 2). The relationships chosen for comparison were derived from the methodologies of Marshall and Palmer [4], Calheiros [13], and NEXRAD [14], which are operationally viable within the regional context. While other coefficients, such as those proposed by Vulpiani et al. [15], are available, this study prioritized the operational applicability of the chosen Z–R relationships. These relationships were specifically selected because they have been extensively tested and are commonly used in operational settings and in research, ensuring a practical and targeted analysis. This selection allows for a coherent evaluation of the Z–R relationship’s performance in real-world weather radar applications.
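For readers unfamiliar with how a Z–R relationship is applied, the following sketch inverts the power law $Z = aR^{b}$ for a reflectivity given in dBZ. The coefficients shown are the commonly quoted Marshall–Palmer (a = 200, b = 1.6) and default NEXRAD (a = 300, b = 1.4) values; they are assumptions here, since the coefficients used operationally in the study, including those of the Calheiros relation, are not listed in this section.

```python
def dbz_to_linear(dbz):
    """Convert reflectivity from dBZ to linear units (mm^6 m^-3)."""
    return 10.0 ** (dbz / 10.0)


def rain_rate(dbz, a, b):
    """Invert Z = a * R**b to obtain the rain rate R in mm/h."""
    return (dbz_to_linear(dbz) / a) ** (1.0 / b)


# Commonly quoted coefficients (assumed); the operational values may differ.
ZR_COEFFS = {
    "marshall_palmer": (200.0, 1.6),
    "nexrad_default": (300.0, 1.4),
}

# Rain rate implied by each relation for a 40 dBZ echo.
rates_40dbz = {name: rain_rate(40.0, a, b) for name, (a, b) in ZR_COEFFS.items()}
```

Because the relations differ only in a and b, the same echo yields different rain rates, which is the source of the spread that the comparison in this study evaluates. A constant rate of R mm/h corresponds to R/4 mm over a 15 min interval.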

2.2. Data Preprocessing

In this study, we considered the distribution of the precipitation over the Western Paraná region from 2018 to 2022. Our dataset predominantly comprised events with no precipitation, with almost 94% of the data showing precipitation levels below 0.1 mm. Due to this imbalance, we chose to focus on precipitation events exceeding 0.2 mm per 15 min, which aligns with the calibration settings of the rain gauges used. These gauges were optimized to accurately detect minimal yet significant precipitation events, ensuring an effective analysis.

We created a distribution graph of the precipitation data between 2018 and 2022 that focuses on rain events exceeding 0.2 mm (Figure 3). This graph displays the amount of precipitation in millimeters on a logarithmic scale. The analysis shows that the majority of the data cluster between 0.2 and 10 mm per 15 min, with a significant peak in the range of 5 to 10 mm. There is also a noticeable decrease in the frequency of data for rainfall volumes exceeding 10 mm. Only a tiny fraction, approximately 0.02%, of the data correspond to precipitation events that exceed 30 mm. This refined focus on specific precipitation ranges allowed for a more targeted and accurate analysis of the data, which is essential in understanding and predicting rainfall patterns in the context of weather forecasting, climate studies, and urban planning.

We ensured the integrity and reliability of the meteorological data by comprehensively addressing several critical aspects during the data preprocessing phase. This phase included the following key steps:

Distance Filtering: We refined the dataset for an accurate analysis by applying filters to remove data from the radar’s blind spot and areas beyond its reliable range.
Handling Missing Data: We implemented techniques to address missing values in key variables, such as reflectivity, ensuring the dataset’s completeness.
Polarimetric Variable Selection: The meticulous selection and filtering of crucial polarimetric variables were conducted to enhance the quality of the data. Variables such as $Z_{DR}$, $K_{dp}$, and $ρ_{hv}$ were carefully chosen based on their importance in distinguishing meteorological phenomena, as described in [17]. The thresholds defined for these variables were as follows:
– For $ρ_{hv}$, a threshold of ≥0.5 identifies cloud data, whereas values up to $0.99$ indicate high linear polarization associated with precipitation.
– The threshold for $Z_{DR}$ was set at $-8.0$, crucial for selecting rain data due to high horizontal diffraction.
– Similarly, a threshold of $-8.0$ for $K_{dp}$ identifies rain data related to differential polarization diffraction.
– For snow data selection, a $K_{dp}$ threshold of $-15.0$ was used, owing to the low differential polarization diffraction.
These thresholds are instrumental in ensuring precise discrimination between different meteorological conditions.
Consistency Check Between Reflectivity and Precipitation: A thorough validation ensured consistency between the radar reflectivity measurements and rain gauge data.
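The filtering steps listed above can be sketched as a set of boolean masks. This is an illustration under stated assumptions, not the study's implementation: the range limits are hypothetical because the text does not give them, the thresholds for $ρ_{hv}$, $Z_{DR}$, and $K_{dp}$ are treated as lower bounds for rain data, and the function name `quality_mask` is invented for the example.

```python
import numpy as np

# Hypothetical range limits in km; the text does not state the actual values.
MIN_RANGE_KM, MAX_RANGE_KM = 5.0, 200.0


def quality_mask(range_km, dbz, zdr, kdp, rhohv):
    """Boolean mask of radar samples that pass the preprocessing filters."""
    # Distance filtering: drop the blind zone and gates beyond the reliable range.
    in_range = (range_km >= MIN_RANGE_KM) & (range_km <= MAX_RANGE_KM)
    # Missing data: drop gates without a reflectivity value.
    has_data = ~np.isnan(dbz)
    # Polarimetric thresholds from the text, assumed here to be lower bounds.
    pol_ok = (rhohv >= 0.5) & (zdr >= -8.0) & (kdp >= -8.0)
    return in_range & has_data & pol_ok


# Five synthetic samples, each failing a different filter except the second.
range_km = np.array([2.0, 50.0, 120.0, 250.0, 80.0])
dbz = np.array([30.0, 35.0, np.nan, 40.0, 25.0])
zdr = np.array([0.5, 1.0, 0.8, 1.2, -9.0])
kdp = np.array([0.1, 0.5, 0.2, 0.3, 0.1])
rhohv = np.array([0.98, 0.97, 0.99, 0.96, 0.95])
mask = quality_mask(range_km, dbz, zdr, kdp, rhohv)
```

Combining the criteria into a single mask keeps the radar and gauge records aligned, since the same mask can be applied to every column of the collocated dataset.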

These preprocessing steps were pivotal in preparing the radar and rain gauge data for subsequent machine learning model training and validation. The cleaned dataset significantly improved the accuracy of our precipitation estimation models by facilitating more reliable and detailed meteorological analyses.

2.3. Feature Engineering

Feature engineering plays a critical role in enhancing the performance of machine learning models in meteorological applications, particularly in precipitation estimation using radar data. The process involves the meticulous selection and preprocessing of the variables to ensure their relevance and effectiveness in the predictive models.

In the training and evaluation of our models, we carefully selected the variables based on their relevance to the collected data and the objectives of our study. The primary variables included the horizontal reflectivity ($DBZ_{h}$), differential reflectivity ($Z_{DR}$), specific differential phase ($K_{dp}$), and co-polar correlation coefficient ($Rho_{HV}$), all of which are measured by the radar system. Additionally, we incorporated the altitude and distance from the radar as critical features in our machine learning models. These additions were crucial in enhancing the accuracy of our quantitative precipitation estimates, allowing our models to account for variations in elevation and radar beam dispersal over different distances.

Table 2 presents a clear and visual description of the variables selected for the model. These variables were chosen because they notably impacted the model’s accuracy in predicting precipitation events. To make these variables comparable and improve the precision of the predictions, they were preprocessed. This required converting them from their original units, such as decibels (dBZ) for reflectivity expressed in $mm^{6}\,m^{-3}$, to more relevant units, such as millimeters of precipitation over a 15 min interval.

For a complete understanding, Table 3 shows the input variables utilized for the precipitation estimation model from the radar and station data.

Data from 2018 to 2021 were split into training (70%) and validation (30%) sets, and the data from 2022 were reserved for testing. The feature engineering process, including variable selection and preprocessing, proved crucial in enhancing the model’s precision and reliability in predicting the precipitation patterns.
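A year-based split of this kind can be sketched as follows. The example assumes that the 70/30 division of the earlier years is a random shuffle, which the text does not specify; the function name `split_by_year` and the synthetic `years` array are illustrative only.

```python
import numpy as np


def split_by_year(years, test_year=2022, train_fraction=0.7, seed=0):
    """Return index arrays (train, validation, test) for a year-based split."""
    years = np.asarray(years)
    test_idx = np.flatnonzero(years == test_year)   # held-out year
    dev_idx = np.flatnonzero(years < test_year)     # earlier years

    # Random 70/30 division of the earlier years (an assumption).
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(dev_idx)
    n_train = int(round(train_fraction * shuffled.size))
    return shuffled[:n_train], shuffled[n_train:], test_idx


# Synthetic example: 20 samples per year from 2018 to 2022.
years = np.repeat([2018, 2019, 2020, 2021, 2022], 20)
train_idx, val_idx, test_idx = split_by_year(years)
```

Holding out a complete year for testing means the test set contains weather events that the models never saw during training or validation.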

2.4. Model Development

As part of our research, we designed a model to improve quantitative precipitation estimation (QPE) using machine learning techniques, specifically the random forest (RF) and gradient boosting (GB) methods. These techniques were chosen for their exceptional performance in classification and regression tasks, which are crucial for accurate precipitation prediction. The RF technique combines the predictions of multiple decision trees (denoted as $T_{i}(y)$) by either voting or averaging:

$$f(y) = \frac{1}{N} \sum_{i=1}^{N} T_{i}(y) \tag{1}$$

where $f(y)$ denotes the random forest outcome, and N represents the number of trees.

Gradient boosting is an ensemble technique that builds a series of models in a sequential manner, with each subsequent model aiming to correct the errors made by its predecessors. Specifically, GB refines the predictions by focusing on the errors of prior iterations:

$$f(y) = \sum_{i=1}^{n} α_{i} h_{i}(y) \tag{2}$$

where $f(y)$ is the composite model’s prediction for input y, n is the iteration count, $α_{i}$ is the weight for iteration i’s model, and $h_{i}(y)$ is the prediction of the weak learner at iteration i.

Our methodology comprised two primary stages: classification and regression. Initially, we distinguished between ‘rain’ and ‘no rain’ events using thresholds based on the precipitation values. Subsequently, we employed regression to estimate the precipitation intensity in the ‘rain’ data, using the same machine learning techniques. This dual approach, which combined RF’s ensemble method and GB’s error-minimizing capability, ensured an all-inclusive and accurate QPE model, effectively leveraging the strengths of both techniques in handling complex meteorological data.

The classification stage in our methodology employs the RF and GB techniques to effectively distinguish between ‘rain’ and ‘no rain’ events. This distinction is crucial, as it allows our model to identify operational patterns that signal rain events above the 0.2 mm/15 min threshold, ensuring that only data representing quantifiable rain enter the regression stage. This process not only enhances the accuracy of our rainfall estimation but also optimizes the computational efficiency by focusing on relevant events. These methodological steps are depicted in Figure 4, which illustrates the model application workflow.
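The two-stage workflow can be illustrated with a short scikit-learn sketch. It is a simplified stand-in under several assumptions: the data are synthetic rather than radar measurements, only the RF classifier and RF regressor pairing is shown, and the helper names `fit_hybrid` and `predict_hybrid` are invented for the example. The 0.2 mm per 15 min threshold and the 100 trees follow the values reported in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

RAIN_THRESHOLD_MM = 0.2  # per 15 min, as stated in the text


def fit_hybrid(X, rain_mm, seed=0):
    """Fit a rain/no-rain classifier, then a regressor on the rainy samples only."""
    is_rain = rain_mm > RAIN_THRESHOLD_MM
    clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, is_rain)
    reg = RandomForestRegressor(n_estimators=100, random_state=seed).fit(
        X[is_rain], rain_mm[is_rain]
    )
    return clf, reg


def predict_hybrid(clf, reg, X):
    """Return 0 mm where the classifier predicts no rain, else the regressor output."""
    rain_pred = clf.predict(X).astype(bool)
    out = np.zeros(X.shape[0])
    if rain_pred.any():
        out[rain_pred] = reg.predict(X[rain_pred])
    return out


# Synthetic stand-in data with six feature columns (not real radar measurements).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
rain_mm = np.where(X[:, 0] > 0.5, 2.0 + 3.0 * X[:, 0] ** 2, 0.0)
clf, reg = fit_hybrid(X, rain_mm)
estimates = predict_hybrid(clf, reg, X)
```

Replacing either estimator with `GradientBoostingClassifier` or `GradientBoostingRegressor` from the same module gives the other classifier and regressor combinations without changing the surrounding logic.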

We selected key variables for model training and evaluation, including radar-measured factors such as reflectivity ($DBZ_{h}$), differential reflectivity ($Z_{DR}$), the specific differential phase ($K_{dp}$), and the co-polar correlation coefficient ($Rho_{HV}$).

2.5. Model Evaluation

To evaluate the model’s performance comprehensively, we compared its estimated precipitation rates with the actual values from the test dataset. Additionally, we compared the model’s predictions with the theoretical Z–R model.

During the validation process, ground-based meteorological station data were used to compare the model’s forecasts with actual observations. This enabled a detailed analysis to identify any discrepancies. To determine the accuracy of the regression model, statistical metrics such as the root mean square error (RMSE), mean absolute error (MAE), and Kling–Gupta efficiency (KGE) were used [18,19]. The mean squared error (MSE), which is defined in Equation (3), measures the average of the squared differences between the forecast and actual values and provides a measure of the residual variance:

$$MSE = \frac{1}{N} \sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2} \tag{3}$$

where $y_{i}$ denotes the observed value, $\hat{y}_{i}$ is the corresponding forecast value, and N is the number of data points.

The RMSE is a metric that consolidates the forecast errors into a single predictive power score. It is calculated by taking the square root of the MSE (Equation (4)). Because the errors are squared before averaging, the RMSE is particularly sensitive to larger-magnitude errors, such as potential outliers that may result from extrapolating the precipitation estimates across the Paraná grid.

$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}} \tag{4}$$

The mean absolute error (MAE) is a commonly used metric in data analysis that calculates the average of the absolute differences between the predicted and actual values in a dataset. The metric is widely utilized due to its simple interpretation and compatibility with the prediction target’s scale. Equation (5) is used to calculate the MAE:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} | y_{i} - \hat{y}_{i} | \tag{5}$$

where $y_{i}$ is the observed value, $\hat{y}_{i}$ is the forecast value, and n represents the total number of data points.
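As a worked illustration of how the two error metrics behave, the sketch below computes the RMSE and MAE for paired observed and forecast values. The numbers are invented for the example, and the function names are illustrative.

```python
import numpy as np


def rmse(obs, pred):
    """Root mean square error between paired observed and forecast values."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def mae(obs, pred):
    """Mean absolute error between paired observed and forecast values."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(obs - pred)))


# Invented 15 min accumulations in mm; the last forecast misses a heavy event.
obs = [0.4, 1.2, 3.0, 10.0]
pred = [0.5, 1.0, 2.5, 6.0]
rmse_value = rmse(obs, pred)
mae_value = mae(obs, pred)
```

In this example the single 4 mm miss dominates the RMSE, which comes out well above the MAE of 1.2 mm. This is the sensitivity to large errors described in the text.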

The last metric used to evaluate the performance of the regression model was the Kling–Gupta efficiency (KGE) metric [18,19,20]. This metric is obtained by Equation (6), with its parameters r, β, and γ calculated through Equations (7)–(9), respectively. This metric quantifies the degree of overlap between the observed and forecast time series by examining their correlations, mean values, and standard deviations, thereby providing a comprehensive analysis of the regression model’s performance. One key advantage of using the KGE metric over other metrics is its global applicability and effectiveness in diverse hydrological contexts.

$$KGE' = 1 - \sqrt{(r - 1)^{2} + (β - 1)^{2} + (γ - 1)^{2}} \tag{6}$$

$$r = \frac{\sum_{i=1}^{n} (y_{i} - \bar{y}) (\hat{y}_{i} - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}} \sqrt{\sum_{i=1}^{n} (\hat{y}_{i} - \bar{\hat{y}})^{2}}} \tag{7}$$

$$β = \frac{μ_{\hat{y}}}{μ_{y}} \tag{8}$$

$$γ = \frac{CV_{\hat{y}}}{CV_{y}} = \frac{σ_{\hat{y}} / μ_{\hat{y}}}{σ_{y} / μ_{y}} \tag{9}$$

Here, $y_{i}$ and $\hat{y}_{i}$ are the observed and forecast values, μ and σ denote the mean and standard deviation of each series, and CV is the coefficient of variation.
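The three components of the KGE can be computed directly from paired series, as in the following sketch. It assumes that the unhatted series is the observation and the hatted series is the forecast, and it uses the population standard deviation; the function name `kge_prime` is illustrative.

```python
import numpy as np


def kge_prime(obs, sim):
    """Kling-Gupta efficiency built from correlation, bias ratio, and variability ratio."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]                           # linear correlation
    beta = sim.mean() / obs.mean()                            # bias ratio
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())  # ratio of CVs
    return float(1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2))


obs = np.array([0.4, 1.2, 3.0, 10.0])
kge_perfect = kge_prime(obs, obs)        # identical series
kge_doubled = kge_prime(obs, 2.0 * obs)  # perfectly correlated but biased
```

A forecast that is exactly twice the observation keeps r and γ at 1 but doubles β, so the score drops from 1 to 0. The example shows that a high correlation alone does not yield a high KGE.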

With regard to measuring the performance of regression models in predicting precipitation rates, it is important to use certain metrics to ensure an accurate evaluation. By taking a holistic approach and adopting these metrics, we can assess the model’s accuracy against the observed data with precision.

To evaluate the performance of classification models, it is customary to employ confusion matrices and metrics such as accuracy, recall, and precision. In our hybrid methodology, which amalgamates elements of both classification and regression, it is imperative to apply these performance metrics to the classifier component to ensure the overall efficacy of the model. A confusion matrix (shown in Table 4) is a useful tool for displaying classification results in matrix format. The rows represent the actual classes, while the columns represent the predicted classes. The matrix contains true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values.

Accuracy (Equation (10)) is a general metric indicating the total percentage of correct predictions (both positive and negative) of the model, calculated using the following formula:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

Although widely used, the accuracy can be misleading for imbalanced classes, because it tends to favor the dominant class.

Other important metrics include the recall and precision, defined, respectively, as follows:

$$Recall = \frac{TP}{TP + FN} \tag{11}$$

$$Precision = \frac{TP}{TP + FP} \tag{12}$$

Recall, also known as sensitivity, is a metric that measures how well a model can correctly identify positive cases among all actual positive cases. In simpler terms, it shows the proficiency of the model in detecting positive cases and reducing the number of false negatives. This metric is particularly important when failing to identify a positive case can lead to severe consequences, such as when diagnosing a disease. For instance, in a test for a severe illness, it is crucial to have high recall to ensure that most patients with the disease are detected. Equation (11) represents the recall.

Equation (12) calculates the precision of a model. The precision measures the proportion of correct positive predictions. It is an essential metric used to evaluate the quality of a model’s positive predictions. If a model has high precision, it means that a significant number of its positive classifications are correct. This is crucial in cases where a false positive can have severe consequences, such as spam filtering. For example, high precision in an email filtering system is desirable to prevent legitimate emails from being mistakenly marked as spam.
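The three classification metrics can be computed from the four confusion matrix counts, as sketched below. The counts are hypothetical and chosen to mimic an imbalanced rain/no-rain sample; they are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, recall, and precision from confusion matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


# Hypothetical counts: 50 actual rain cases among 1000 samples.
metrics = classification_metrics(tp=40, fp=10, tn=940, fn=10)
```

With these counts the accuracy is 0.98, while the recall and precision are both 0.80. The dominant no-rain class inflates the accuracy, which is why the recall and precision are examined separately.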

It is essential to balance the precision and recall in certain situations. In cases where false positives can have serious consequences, the precision cannot be ignored entirely. Therefore, the decision to use relevant metrics will depend on the context of the application and the impact of classification errors.

In this work, it was crucial to prioritize the recall. In the classification that we adopted, it is essential to identify as many positive cases as possible, even if this leads to an increase in false positives. The consequences of failing to recognize truly positive cases can be severe. Failure to alert an at-risk population may lead to injuries or even deaths during natural disasters.

3. Results and Discussion

The ML models developed in this research focused on the task of precipitation estimation. Thus, for the evaluation of the models’ capabilities, we compared their estimations against observed data obtained from a collocated gauge–radar dataset. This approach aimed to address the challenge of categorizing meteorological events, such as rain and no rain, while concurrently estimating the intensity of the precipitation when present.

The classification step helps to categorize meteorological events as “rain” or “no rain”. Once the data have been classified as “rain”, the regression step is used to accurately estimate the precipitation amount. This is essential for various applications, such as weather forecasting, climate analysis, water resource management, and urban planning. Combining classification and regression methods provides a more in-depth and comprehensive meteorological analysis, offering valuable insights into the precipitation and its intensity.

Tuning the hyperparameters in machine learning models, such as setting ‘n_estimators’ to 100 for RF and 500 for GB, optimizes the performance, and it enabled us to achieve more accurate results in this study. These values were selected based on evaluations using cross-validation to balance the model’s complexity and generalization. This section will explore how the combination of these methods can lead to significant outcomes in understanding and predicting precipitation conditions.

Analyzing the results in Table 5, we can observe the performances of the RF and GB classifiers. Both algorithms demonstrated remarkable performances in the task of classifying meteorological events. During the validation phase, they achieved high accuracy, with values of 0.90 for both, indicating the ability of these models to classify conditions such as “rain” or “no rain” accurately.

The models’ effectiveness was evaluated using the recall metric, which measured their ability to correctly identify rain events. The classifiers performed exceptionally well during the validation phase, with recall values approaching 1.0. This result indicates the low occurrence of false negatives, meaning that very few instances of rain were erroneously classified as “no rain”. This outcome emphasizes the accuracy of the classifiers in correctly identifying and categorizing rain events.

The precision, which measured the proportion of true positives relative to the total predicted positives, also showed significant results, with values of 0.85 for RF and 0.84 for GB in the validation phase, suggesting that most of the models’ rain predictions were correct.

The results in the test phase, although slightly lower than during validation, were still satisfactory, with both classifiers maintaining good performance in terms of accuracy, recall, and precision. This consistency between the validation and test phases suggests that the models can generalize well to new data.

In Table 6, we examine the RF and GB regressors’ performances in estimating the precipitation intensity. The results highlight the efficacy of these models in quantitative estimation. The RMSE and MAE are standard metrics for the evaluation of the quality of predictions. In both metrics, the regressors achieved good performances.

During the validation phase, RF registered an RMSE of 1.58 mm and an MAE of 1.07 mm, while GB recorded an RMSE of 1.56 mm and an MAE of 0.73 mm. These values indicate that the regressors’ predictions were very close to the actual values, with relatively low average errors. The performances in the test phase were also excellent, with both regressors maintaining low values for the RMSE and MAE.

The results confirm that the hybrid approach effectively classifies meteorological events and estimates the precipitation intensity. The coordinated combination of these two steps provides a comprehensive and precise view of the weather conditions, essential for various practical applications. The consistency of the results between the validation and test phases demonstrates that these models can generalize well to new data, further emphasizing their usefulness in real-world scenarios.

In Table 7, we present a comparative analysis of the performance of the hybrid model in relation to theoretical Z–R relations and the three Z–R relations of the meteorological radar (DSD [13], Marshall–Palmer [4], and Nexrad [14]). The table displays the RMSE, MAE, and KGE metrics for the validation and test phases.

We observed remarkable results when comparing the hybrid model with the estimated theoretical Z–R relations. For both phases, validation and testing, the random forest–random forest (RFRF) model achieved RMSEs of only 1.00 mm and 0.82 mm, respectively, indicating that the quantitative predictions of the model were very close to the actual values. The MAEs were also low, with values of 0.41 mm and 0.34 mm in the validation and test phases. Moreover, the KGE coefficient demonstrated good agreement between the predictions and observations, with values of 0.62 and 0.80 for the validation and test phases.

The results for the gradient boosting–gradient boosting (GBGB) model are also consistent, with RMSEs of 1.30 mm and 0.70 mm, MAEs of 0.35 mm and 0.23 mm, and KGE values of 0.80 and 0.90 for the validation and test, respectively. These results confirm the ability of the hybrid model methodology to accurately estimate the precipitation intensity.

However, when evaluating different combinations of the classifiers and regressors, such as random forest–gradient boosting (RFGB) and gradient boosting–random forest (GBRF), we observed variations in performance, highlighting the importance of selecting the algorithms appropriately. We also compared the performances of these models with the Z–R relations of the meteorological radar—DSD [13], Marshall–Palmer [4], and Nexrad [14]—noting that the hybrid approach surpassed these in terms of accuracy.

We also highlight the performance of the Oracle (OC) method, which combines the three Z–R methods and chooses the closest to the observed value. The Oracle method served as a benchmark in our analysis, providing an idealized scenario where the best-performing Z–R relation was selected for each event. This approach allowed us to gauge the potential upper limit of accuracy achievable by dynamically adapting the Z–R relations based on real-time observations. This model achieved good results, with RMSEs of 1.10 mm and 1.20 mm, MAEs of 0.53 mm and 0.56 mm, and KGE values of 0.60 and 0.78 for the validation and test, respectively.
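The selection rule behind the Oracle benchmark can be sketched in a few lines. The example assumes that each Z–R relation has already produced an estimate for every sample; the values and the function name `oracle_estimate` are invented for illustration.

```python
import numpy as np


def oracle_estimate(observed, candidates):
    """Pick, for each sample, the candidate estimate closest to the observation.

    observed has shape (n,); candidates has shape (k, n), one row per Z-R relation.
    """
    observed = np.asarray(observed, float)
    candidates = np.asarray(candidates, float)
    best = np.argmin(np.abs(candidates - observed), axis=0)
    return candidates[best, np.arange(observed.size)], best


# Invented values in mm: three samples and three candidate Z-R estimates each.
observed = np.array([1.0, 5.0, 0.4])
candidates = np.array([
    [0.8, 7.0, 1.0],
    [1.5, 5.5, 0.5],
    [3.0, 2.0, 0.1],
])
chosen, which = oracle_estimate(observed, candidates)
```

Because the rule needs the observed value to make its choice, it cannot be run operationally; it serves only as an upper bound on what switching among the three Z–R relations could achieve.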

Figure 5 presents graphs of the test dataset, providing a clearer visualization of the performances of the different linear regression models and the coefficient of determination Equation (13), which represents the proportion of data variation explained by the model. In this analysis, we focused on the models that are highlighted in Table 7: RFRF, GBGB, and Oracle.

R^{2} = 1 - \frac{\sum {(y - \hat{y})}^{2}}{\sum {(y - \bar{y})}^{2}} \qquad (13)
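Equation (13) translates directly into code. The sketch below is a plain-Python reading of it, with y as the observations, ŷ as the estimates, and ȳ as the mean of the observations; the sample values are illustrative only.

```python
def r_squared(obs, est):
    """Coefficient of determination as written in Equation (13)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(obs, est))  # residual sum of squares
    ss_tot = sum((y - mean_obs) ** 2 for y in obs)                # total sum of squares
    return 1 - ss_res / ss_tot


# Illustrative 15 min accumulations in mm (not data from the study).
observed = [0.4, 1.2, 3.0, 6.5]
mean_only = [sum(observed) / len(observed)] * len(observed)
```

An estimate that always returns the mean of the observations scores 0, and a perfect estimate scores 1; an estimate that is worse than the mean yields a negative value.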

When observing the linear regression figures, it is evident that these three models demonstrate remarkable performances compared to the others. The regression lines fitted for RFRF, GBGB, and Oracle are very close to the data points, suggesting a considerable ability to estimate the precipitation intensity.

However, to determine which of the three models can be identified as the best, it is essential to consider the evaluation metrics presented in Table 7, where the RMSE indicates how close the predictions are to the actual values, with lower values indicating the better fit of the model to the data.

Figure 6 illustrates the correlation between the observed data and the predictions generated by the models. The graph displays points (a) to (h) that identify cases where the precipitation was notably underestimated. The color variations in the graph indicate critical areas: red identifies where filters should be applied to exclude anomalous data or outliers, and blue shows where the results are considered reliable without the need for additional filtering.

The data suggest that 80.79% of the values with low correlations require filtering, indicating that filters are an effective tool for improving the accuracy of predictions. Of particular interest is the 29.6 mm observation indicated by points (a) and (e), which shows a significant discrepancy from the values estimated by the ML models. It underlines the importance of examining points with lower correlations to identify possible failures or limitations in the models used.

Table 8 compares the precipitation estimates of the RFRF and GBGB models with the actual measurements. The analysis focuses on the underestimated events, as illustrated in Figure 6. Notably, points (a) and (e), corresponding to the Pinhao station, reveal a significant disparity between the estimates of 0.65 mm (RFRF) and 0.78 mm (GBGB) against the observed value of 29.6 mm.

This case illustrates the complexity of precipitation estimation and highlights the influence of factors such as the distance between the meteorological station and the radar. The distance of 193 km between the Pinhao station and the radar affects the accuracy of the reflectivity reading. For this point, the $DBZ_{h}$ value of 28.63 dBZ suggests possible distortions caused by radar beam scattering and atmospheric variations at great distances. Furthermore, interpreting this value of 28.63 dBZ using different methods of calculating the precipitation rate (different Z–R relationships) reveals a significant discrepancy compared to the observed value. The calculated rates are as follows:

Using Marshall–Palmer [4]: 0.56 mm/15 min;
Using Nexrad [14]: 0.47 mm/15 min;
Using DSD [13]: 0.53 mm/15 min.
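The first two values in the list above can be reproduced by inverting a power-law relation Z = a·R^b. The sketch below assumes the commonly quoted coefficients a = 200, b = 1.6 for Marshall–Palmer and a = 300, b = 1.4 for the NEXRAD default; these are not stated in this section and are assumptions here. The DSD [13] coefficients are not given in this section, so that case is omitted.

```python
def rain_mm_per_15min(dbz, a, b):
    """Invert Z = a * R**b and convert the rate from mm/h to mm per 15 min."""
    z_linear = 10 ** (dbz / 10)           # dBZ -> linear reflectivity (mm^6 m^-3)
    rate_mm_h = (z_linear / a) ** (1 / b)
    return rate_mm_h / 4                  # 15 min is a quarter of an hour


# Assumed standard coefficients (not taken from this section).
marshall_palmer = rain_mm_per_15min(28.63, a=200, b=1.6)  # about 0.56
nexrad = rain_mm_per_15min(28.63, a=300, b=1.4)           # about 0.47
```

Under these assumptions, both relations give roughly half a millimeter per 15 min for 28.63 dBZ, in line with the values listed above and far below the 29.6 mm recorded at the station.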

These lower values suggest that the observed precipitation rate may be overestimated, indicating a possible error at the Pinhao station. The high co-polar correlation coefficient (RHOHV = 0.97), together with the specific differential phase (KDP = 0.20° km⁻¹) and the differential reflectivity (ZDR = 0.37 dB), points to complex meteorological conditions that the models may not have adequately interpreted. The complex nature of these polarimetric parameters, as discussed regarding the selection of the polarimetric variables, suggests that specific aspects of the meteorological phenomena were beyond the estimation capacity of the machine learning models.

Table 8 presents a comparative analysis between the recorded data and the model estimates. Taking the case of Pinhao as an example, we observe a notable discrepancy between the observed data and the estimates. In the hour before the event, a precipitation volume of 0.0 mm was recorded, corroborated by the model estimate of 0.00 mm. However, a considerable precipitation event was observed in the subsequent hour, reaching 29.6 mm. This amount contrasts significantly with the model estimate of only 0.65 mm.

This discrepancy could be attributed to potential reading errors at the meteorological station, possibly caused by technical failures. The fact that no amount of rain was measured immediately before a significant rainfall event is a possible indicator of an instrument malfunction. Alternatively, it may indicate an inherent limitation of the model in accurately predicting intense rain events, particularly in scenarios where previous readings suggest low levels or the absence of precipitation. This observation highlights the need to refine the estimation models, seeking the better representation and capture of the temporal dynamics of intense rain events.

3.1. Rain Intensity Analysis

We next consider a model analysis regarding different rain types, from the lightest rain to the heaviest rain, to evaluate the accuracy of the algorithms in differentiating between various rainfall intensities.

Table 9 presents the accuracy percentages for the RFRF and GBGB algorithms across different rainfall intensities, including no rain, light rain, moderate rain, and heavy rain scenarios. The results from both the validation and testing phases are represented as precision values.

Under moderate rain conditions (intensity greater than 2 mm/15 min and up to 5 mm/15 min), both algorithms exhibited high accuracy: RFRF reached 97% precision in the testing phase and GBGB reached 96% (Table 9). This indicates the effectiveness of these models in correctly identifying and classifying moderate rain events, a crucial aspect of accurate meteorological nowcasting and water resource management.

However, for heavy rain events (intensity greater than 5 mm/15 min), a significant variation in performance was observed. GB maintained high precision (76% in validation and 93% in testing; Table 9), indicating its robustness and adaptability under extreme precipitation conditions, whereas RF dropped to 53% and 75%, respectively. This difference underscores the importance of selecting the appropriate algorithm for the modeling of intense rain events, highlighting GB as a valuable tool for practical applications where the accurate identification of heavy rainfall is essential.

These results highlight the sensitivity of the RFRF and GBGB algorithms to the rainfall intensity, with a particular focus on moderate and heavy rain. The superior performance of GB under these conditions suggests its applicability in scenarios where distinguishing between different precipitation levels is crucial for informed decision making in meteorology and water resource management. This analysis reinforces the need for tailored algorithm selection and data preprocessing strategies to enhance the estimation and classification of intense precipitation events.
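One way to obtain a per-intensity score of this kind is sketched below. The class boundaries follow Table 9, but the scoring rule (the fraction of observed events whose estimate falls in the same class) is an assumption; the exact definition used for the percentages in Table 9 may differ. The sample values are illustrative only.

```python
def intensity_class(mm_15min):
    """Map a 15 min accumulation (mm) to the intensity classes of Table 9."""
    if mm_15min <= 0.2:
        return "no rain"
    if mm_15min <= 2.0:
        return "light"
    if mm_15min <= 5.0:
        return "moderate"
    return "heavy"


def per_class_hit_rate(obs, est):
    """Fraction of observed events, per class, whose estimate lands in the same class."""
    totals, hits = {}, {}
    for o, e in zip(obs, est):
        cls = intensity_class(o)
        totals[cls] = totals.get(cls, 0) + 1
        if intensity_class(e) == cls:
            hits[cls] = hits.get(cls, 0) + 1
    return {cls: hits.get(cls, 0) / n for cls, n in totals.items()}


# Illustrative values in mm/15 min (not data from the study).
observed = [0.0, 1.0, 3.0, 6.0, 7.0]
estimated = [0.1, 2.5, 3.5, 5.5, 4.0]
rates = per_class_hit_rate(observed, estimated)
```

Scoring each class separately keeps the rare heavy-rain events from being hidden by the much more frequent no-rain and light-rain samples.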

3.2. Comparison of Images from Different Precipitation Estimation Methodologies

This section presents a comparative analysis of the model outputs and theoretical Z–R relationships, emphasizing the application of Radial Basis Function Interpolation (RBF) [21] for data transformation. A series of images are showcased to illustrate the models’ performance in replicating the radar reflectivity and estimating the precipitation rates.

Figure 7 displays images representing the model outputs for 11 October 2022, at 11:00 UTC. On this date, significant rainfall was experienced in the western region of Paraná, which was covered by the Cascavel radar. The figure comprises several sub-figures, with the first depicting the radar reflectivity, while the subsequent images show the precipitation rates over 15 min intervals.

Notably, the RFRF (Figure 7b) and GBGB (Figure 7c) models exhibit alignment with the radar reflectivity field, indicating their precision in capturing the observed precipitation system’s characteristics. The visual representations in Figure 7 are invaluable in evaluating the models’ capability to reproduce the actual weather conditions, particularly under heavy rainfall scenarios.

The transformation of the station data to the radar grid was achieved using RBF interpolation for 117 stations. It is crucial to highlight that the employed RBF interpolation produced a smooth and efficient surface (Figure 7), although the parameter selection and overfitting potential are critical considerations in ensuring the method’s accuracy.
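The idea of RBF interpolation can be illustrated with a small, dependency-free sketch. It assumes a Gaussian kernel with a hypothetical shape parameter `epsilon` and four made-up station positions; the kernel, parameter values, and solver used in the study are not specified in this section, so this is not the authors' implementation.

```python
import math


def gaussian_kernel(r, epsilon):
    """Gaussian radial basis function of the distance r."""
    return math.exp(-((epsilon * r) ** 2))


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_rbf(points, values, epsilon=1.0):
    """Return a function that interpolates `values` given at `points`."""
    matrix = [[gaussian_kernel(math.dist(p, q), epsilon) for q in points]
              for p in points]
    weights = solve_linear(matrix, values)

    def interpolate(x):
        return sum(w * gaussian_kernel(math.dist(x, p), epsilon)
                   for w, p in zip(weights, points))
    return interpolate


# Hypothetical station coordinates (arbitrary units) and rainfall in mm/15 min.
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rainfall = [0.0, 2.4, 4.8, 1.2]
surface = fit_rbf(stations, rainfall)
center_value = surface((0.5, 0.5))  # estimate at a grid node between stations
```

The interpolant passes exactly through the station values, while the result between stations depends strongly on `epsilon`, which is why the parameter selection mentioned above matters.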

The comparison between the station-obtained data values and those derived from the radar images (as illustrated in Figure 8) reveals a tendency for overestimation within both metrics. Specifically, the machine learning models RFRF (Figure 8a) and GBGB (Figure 8b) exhibit this overestimation, closely followed by the OC_Sub (Oracle) model (Figure 8c).

Our employed methodology relies on point-based interpolation to generate a 500 m resolution grid. In the northwest region, for example, where interpolated rain gauge data are unavailable, all estimation methods demonstrate overestimation. Thus, it is expected that an increase in the number of rain gauges in the region would reduce this overestimation, by providing more data points for the interpolation method.

Additionally, this overestimation is more pronounced for regions distant from the radar. As the radar beam travels away from the radar, its height above the Earth’s surface increases due to the Earth’s curvature, which in turn affects the radar measurements. Adjustments are made for the radar beam height, which are most relevant for regions farther from the radar, and further analysis focused on improving these adjustments will be conducted.
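The growth of the beam height with range can be illustrated with the standard 4/3 effective Earth radius approximation. This is a textbook model used here for illustration and is not necessarily the adjustment applied in this study; the antenna height of zero is an assumption, while the 0.5° elevation and the 193 km range are the values listed in Table 3 and for the Pinhao station.

```python
import math

EARTH_RADIUS_KM = 6371.0


def beam_height_km(range_km, elevation_deg, antenna_height_km=0.0, k=4 / 3):
    """Beam center height above the antenna level under standard refraction.

    Uses the effective Earth radius model with factor k (4/3 by default).
    """
    effective_radius = k * EARTH_RADIUS_KM
    theta = math.radians(elevation_deg)
    return (
        math.sqrt(
            range_km ** 2
            + effective_radius ** 2
            + 2 * range_km * effective_radius * math.sin(theta)
        )
        - effective_radius
        + antenna_height_km
    )


near = beam_height_km(50.0, 0.5)    # well below 1 km
pinhao = beam_height_km(193.0, 0.5)  # roughly 3.9 km above the antenna
```

Under these assumptions, the lowest beam is already several kilometers above the ground at the Pinhao range, so the radar samples the cloud well above the level at which the gauge measures rain.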

To improve future investigations, it would be useful to incorporate additional predictive variables such as the temperature, humidity, and atmospheric pressure into the models. This extension would enable a more detailed and comprehensive analysis of the weather conditions, resulting in more precise precipitation estimates. Additionally, it is crucial to train the models with more data points around the radar station.

Figure 9 and Figure 10 display the algorithm outcomes and theoretical Z–R relationships for the weather events of 12 July 2023, at 22:30 UTC and 2 September 2023, at 13:00 UTC, respectively. These instances confirm the earlier discussed overestimation trend seen for the machine learning data for the event on 11 October 2022, at 11:00 UTC.

4. Conclusions

This research aimed to enhance precipitation estimation by integrating radar and rain gauge data through machine learning. The results provide a compelling affirmative response to the research questions, demonstrating significant improvements in the estimation accuracy.

A robust model was developed, amalgamating data from multiple sources and employing a hybrid approach combining classification and regression techniques. This methodology proved particularly effective, yielding robust quantitative estimates. It performed well even on datasets with a high proportion of missing values.

The success of the models was highly dependent on the data quality and the volume of the data used. Gradient boosting was more robust than random forest when tested with various precipitation scenarios. The machine learning models outperformed the traditional meteorological standards, indicating their promising potential in the meteorological domain.

A sensitivity analysis was conducted across different rainfall intensities, which provided valuable insights into the models’ adaptability to diverse meteorological conditions and pointed out promising areas for further research and enhancement.

Evaluating the models’ performance across different geographic regions could yield insights into their adaptability and generalizability in diverse scenarios. Furthermore, the deployment of advanced neural network models such as convolutional neural networks (CNNs) [22,23] could be explored to capture more complex spatial and temporal patterns in the data, offering a deeper, more integrated approach to precipitation estimation.

This study highlights the potential of machine learning models in estimating precipitation. It also paves the way for future research into more advanced techniques and diverse scenarios. The outcomes of this work carry tangible implications for both academia and practical applications, as it advances the accuracy of meteorological nowcasting. The practical applications of these models could exert a significant impact on society, particularly in improving the precision of meteorological products. Such progress is crucial for effective natural disaster management and water resource optimization, thereby contributing to the well-being and safety of communities.

Author Contributions

Conceptualization, F.F.V., C.B., L.E.S.O. and M.A.Z.A.; methodology, F.F.V., C.B., L.E.S.O. and M.A.Z.A.; software, F.F.V.; validation, F.F.V., L.E.S.O. and M.A.Z.A.; formal analysis, F.F.V., C.B., L.E.S.O. and M.A.Z.A.; investigation, F.F.V. and C.B.; resources, F.F.V.; data curation, F.F.V. and C.B.; writing—original draft preparation, F.F.V.; writing—review and editing, C.B., L.G.P.J., L.C., L.E.S.O. and M.A.Z.A.; visualization, F.F.V.; supervision, M.A.Z.A. and C.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets employed in this research are proprietary to SIMEPAR and are subject to privacy and commercial use limitations. However, they can be made available to the academic and research community under certain conditions. Interested researchers are invited to submit a request to the Meteorology Operational Department via opera@simepar.br.

Conflicts of Interest

The authors declare no conflicts of interest.

Sample Availability

Samples of the compounds are available from the authors.

Abbreviations

CNNs	Convolutional Neural Networks
GB	Gradient Boosting
GBGB	Gradient Boosting–Gradient Boosting
GBRF	Gradient Boosting–Random Forest
KGE	Kling–Gupta Efficiency
ML	Machine Learning
MAE	Mean Absolute Error
OC	Oracle
QPE	Quantitative Precipitation Estimation
RBF	Radial Basis Function Interpolation
RF	Random Forest
RFRF	Random Forest–Random Forest
RFGB	Random Forest–Gradient Boosting
RMSE	Root Mean Square Error
Z–R	Z–R Relationship (Z—Reflectivity; R—Rainfall Rate)

References

ANEEL. Agência Nacional de Energia Elétrica—Geração. 2021. Available online: https://www.aneel.gov.br/ (accessed on 23 February 2022).
IDR-Paraná. Classificação Climática—Instituto de Desenvolvimento Rural do Paraná. 2021. Available online: https://www.idrparana.pr.gov.br/Pagina/Atlas-Climatico (accessed on 23 February 2022).
IPARDES-Paraná. Paraná Perspectiva—Instituto Paranaense de Desenvolvimento Econômico e Social. 2021. Available online: https://www.ipardes.pr.gov.br/Noticia/IPARDES-lanca-estudo-sobre-o-Parana (accessed on 23 February 2022).
Marshall, J.S.; Palmer, W.M.K. The distribution of raindrops with size. J. Atmos. Sci. 1948, 5, 165–166.
Hong, Y.; Gourley, J.J. Radar Hydrology: Principles, Models, and Applications; CRC Press: Boca Raton, FL, USA, 2014.
Chandrasekar, V.; Beauchamp, R.M.; Bechini, R. Introduction to Dual Polarization Weather Radar: Fundamentals, Applications, and Networks; Cambridge University Press: Cambridge, UK, 2023.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
Orellana-Alvear, J.; Célleri, R.; Rollenbeck, R.; Bendix, J. Optimization of x-band radar rainfall retrieval in the southern Andes of Ecuador using a random forest model. Remote Sens. 2019, 11, 1659.
Shin, J.-Y.; Ro, Y.; Cha, J.-W.; Kim, K.-R.; Ha, J.-C. Assessing the applicability of random forest, stochastic gradient boosted model, and extreme learning machine methods to the quantitative precipitation estimation of the radar data: A case study to Gwangdeoksan radar, South Korea, in 2018. Adv. Meteorol. 2019, 2019, 6542410.
Wolfensberger, D.; Gabella, M.; Boscacci, M.; Germann, U.; Berne, A. Rainforest: A random forest algorithm for quantitative precipitation estimation over Switzerland. Atmos. Meas. Tech. 2021, 14, 3169–3193.
Rollenbeck, R.; Orellana-Alvear, J.; Rodriguez, R.; Macalupu, S.; Nolasco, P. Calibration of x-band radar for extreme events in a spatially complex precipitation region in north Peru: Machine learning vs. empirical approach. Atmosphere 2021, 12, 1561.
Calheiros, R.V.; Beneti, C.; Oliveira, C.; Calvetti, L. Distrometric Drop Size Distribution in South Brazil: Derived Z-R Relationships and Comparisons with Radar Measurements. In Proceedings of the 38th Conference on Radar Meteorology, Chicago, IL, USA, 28 August–1 September 2017.
Heiss, W.H.; McGrew, D.L.; Sirmans, D. NEXRAD: Next generation weather radar (WSR-88D). Microw. J. 1990, 33, 79–89.
Vulpiani, G.; Montopoli, M.; Passeri, L.D.; Gioia, A.G.; Giordano, P.; Marzano, F.S. On the Use of Dual-Polarized C-Band Radar for Operational Rainfall Retrieval in Mountainous Areas. J. Appl. Meteorol. Climatol. 2012, 51, 405–425.
Battan, L.J. Radar observation of the atmosphere. Q. J. R. Meteorol. Soc. 1973, 99, 793.
Lakshmanan, V.; Karstens, C.; Krause, J.; Elmore, K.; Ryzhkov, A.; Berkseth, S. Which Polarimetric Variables Are Important for Weather/No-Weather Discrimination? J. Atmos. Ocean. Technol. 2015, 32, 1209–1223.
Kling, H.; Fuchs, M.; Paulin, M. Runoff conditions in the upper Danube basin under an ensemble of climate change scenarios. J. Hydrol. 2012, 424–425, 264–277.
Gupta, H.V.; Kling, H.; Yilmaz, K.K.; Martinez, G.F. Decomposition of the Mean Squared Error and NSE Performance Criteria: Implications for Improving Hydrological Modelling. J. Hydrol. 2009, 377, 80–91.
Baez-Villanueva, O.M.; Zambrano-Bigiarini, M.; Beck, H.E.; McNamara, I.; Ribbe, L.; Nauditt, A.; Birkel, C.; Verbist, K.; Giraldo-Osorio, J.D.; Thinh, N.X. RF-MEP: A Novel Random Forest Method for Merging Gridded Precipitation Products and Ground-Based Measurements. Remote Sens. Environ. 2020, 239, 111606.
Rippa, S. An Algorithm for Selecting a Good Value for the Parameter c in Radial Basis Function Interpolation. Adv. Comput. Math. 1999, 11, 193–210.
Ayzel, G.; Scheffer, T.; Heistermann, M. RainNet v1.0: A convolutional neural network for radar-based precipitation nowcasting. Geosci. Model Dev. 2020, 13, 2631–2644.
Zhang, J.; Howard, K.; Gourley, J.J. Constructing Three-Dimensional Multiple-Radar Reflectivity Mosaics: Examples of Convective Storms and Stratiform Rain Echoes. J. Atmos. Ocean. Technol. 2005, 22, 30–42.

Figure 1. Map of South America with an inset presenting our study site. In the study site inset, the red circle defines the location of the weather radar, and the gray circle, its surveillance area. Also depicted in the study site are the rain gauges, presented as blue triangles.

Figure 2. Z–R relationships used for operational purposes in the context of Paraná state, depicting various methodologies. The dashed blue line represents the stratiform precipitation relationship by [4], the red line illustrates the convective precipitation relationship [14], and the green line depicts the convective approach by [13]. The x-axis represents the precipitation rate in millimeters per hour (mm/h), and the y-axis represents the reflectivity factor in decibels relative to Z (dBZ). An increase in precipitation rate entails an increase in the reflectivity factor. Adapted from [16].

Figure 3. The distribution of the number of events of precipitation data accumulated in 15 min intervals after data cleaning in the period of 2018–2022. Values below 0.2 mm/15 min were not considered. Based on [11].

Figure 4. A diagram of the machine learning model’s workflow. It presents the data flow and the hybrid model. The classifier is used to verify whether there is rain. If so, the regressor is used to estimate the amount. Finally, the model’s performance is measured through metrics such as the mean absolute error (MAE) and root mean square error (RMSE).

Figure 5. Scatter plots comparing the observed and predicted (estimated) rainfall from various models on the test dataset, expressed in mm per 15 min. Each panel represents a different model, denoted by codes such as RFRF or GBGB, among others, at the top of each plot. The red lines indicate the line of perfect agreement (y = x), while the black dots represent the actual data points. The metrics include the coefficient of determination (R²), mean absolute error (MAE), root mean square error (RMSE), and Kling–Gupta efficiency (KGE), providing a quantitative view of each model’s accuracy.

Figure 6. Analysis of the prediction accuracy for rainfall events above 5 mm observed per 15 min, using two machine learning models: Random Forest–Random Forest and Gradient Boosting–Gradient Boosting. Panels (a,c) show data points accepted by the filters, where the predictions closely aligned with the observations are depicted in red. Panels (b,d) depict data points rejected by the filters, where predictions that significantly diverged from the observations are shown in blue. The solid black line represents the accuracy of the prediction, and the dashed lines show the extent of deviation that is acceptable. Each letter from a to h corresponds to specific outlier points detailed in the text, highlighting the necessity of applying filters to improve the model performance.

Figure 7. A comparison of the radar reflectivity and estimated precipitation rates across multiple algorithms and theoretical Z–R relationships on 11 October 2022, at 11:00 UTC. Panel (a) displays the radar reflectivity map with values ranging from −20 to 60 dBZ, indicating the intensity of the radar echo. Panels (b–h) show the precipitation rates predicted by various models (RFRF, GBGB, RFGB, GBRF, DSD Calheiros, MP, NEXRAD) across the same geographic region, measured in mm per 15 min. These maps provide insights into how the different algorithms interpret reflectivity data to estimate precipitation.

Figure 8. A comparison and subtractive analysis of the precipitation rate estimates from different stations and models on 22 October 2022, at 11:30 UTC. Panels (a–f) illustrate the differences in precipitation rate (mm per 15 min) as estimated by algorithms RFRF_sub, GBGB_sub, OC_sub, DSD_sub, MP_sub, and NEXRAD_sub, respectively. These maps highlight areas of significant discrepancy (in yellow and green) against the background of minimal or no discrepancy (in purple), showing how each model’s prediction varies from the observed station data. The color scale represents the magnitude of the discrepancy, providing a visual comparison of the model accuracy across the region.

Figure 9. A comparison of the radar reflectivity and estimated precipitation rates across multiple algorithms and theoretical Z–R relationships on 12 July 2023, at 22:30 UTC. Panel (a) displays the radar reflectivity map with values ranging from −20 to 60 dBZ, indicating the intensity of the radar echo. Panels (b–h) show the precipitation rates predicted by various models (RFRF, GBGB, RFGB, GBRF, DSD Calheiros, MP, NEXRAD) across the same geographic region, measured in mm per 15 min. These maps provide insights into how the different algorithms interpret reflectivity data to estimate precipitation.

Figure 10. A comparison and subtractive analysis of the precipitation rate estimates from different stations and models on 12 July 2023, at 22:30 UTC. Panels (a–f) illustrate the differences in the precipitation rate (mm per 15 min) as estimated by algorithms RFRF_sub, GBGB_sub, OC_sub, DSD_sub, MP_sub, and NEXRAD_sub, respectively. These maps highlight areas of significant discrepancy (in yellow and green) against the background of minimal or no discrepancy (in purple), showing how each model’s prediction varies from the observed station data. The color scale represents the magnitude of the discrepancy, providing a visual comparison of the model accuracy across the region.

Table 1. Native and transformed spatial and temporal resolutions of the products included in the gauge–radar database.

	Original Resolution		Database Resolution
	Spatial	Temporal	Spatial	Temporal
Radar	$1^{\circ} \times 250 m \times 1$ elevation	$5 \min$	Azimuth, Range	$15 \min$
Rain Gauge	Lat, Lon	$15 \min$	Azimuth, Range	$15 \min$

Table 2. The features used in the machine learning model for radar rainfall retrieval.

Feature Name	Description	Units
Alt	Beam height from radar	$meters$
Distance	Distance of rain gauge from radar	$km$
$D B Z_{h}$	Reflectivity factor at horizontal	${mm}^{6} m^{- 3}$
ZDR	Differential reflectivity	$dB$
$K_{dp}$	Specific differential phase	$° {km}^{- 1}$
$R h o_{HV}$	Co-polar correlation coefficient	-

Table 3. The input variables for quantitative precipitation estimation (QPE) from the radar and station data.

Azimuth	Range	Time	Elevation	Sweep	DBZH	DBZV	KDP	…
52	144,750	1 January 2018 10:15:00	0.5	0.0	1.22	1.47	0.05	…
161	83,000	1 January 2018 10:15:00	0.5	0.0	4.11	1.01	0.5	…
…	…	…	…	…	…	…	…	…

Table 4. The confusion matrix.

Predicted Value
	Positive	Negative
Positive	TP	FP
Negative	FN	TN

Table 5. The results obtained in the classification task for the validation and test datasets.

Classifier
Algorithm	Accuracy		Recall		Precision
	Validation	Test	Validation	Test	Validation	Test
Random Forest	0.90	0.82	0.98	0.99	0.85	0.74
Gradient Boosting	0.90	0.83	0.97	0.98	0.84	0.76

Table 6. The results obtained in the regression task for the validation and test datasets (mm/15 min).

Regressor
Algorithm	RMSE		MAE
	Validation	Test	Validation	Test
Random Forest	1.58 mm	0.94 mm	1.07 mm	0.44 mm
Gradient Boosting	1.56 mm	0.71 mm	0.73 mm	0.48 mm

Table 7. The results obtained using the two algorithms for the validation and test datasets (mm/15 min).

Classifier + Regressor/Comparison with Z-R Relation
	RMSE		MAE		KGE
	Validation	Test	Validation	Test	Validation	Test
RFRF	1.00 mm	0.82 mm	0.41 mm	0.34 mm	0.62	0.80
GBGB	1.30 mm	0.70 mm	0.35 mm	0.23 mm	0.80	0.90
RFGB	1.40 mm	1.30 mm	0.62 mm	0.69 mm	0.22	0.39
GBRF	1.00 mm	0.88 mm	0.41 mm	0.42 mm	0.62	0.79
DSD	1.40 mm	1.60 mm	0.69 mm	0.72 mm	0.50	0.75
Marshall–Palmer	1.50 mm	1.70 mm	0.74 mm	0.77 mm	0.49	0.73
Nexrad	1.60 mm	1.60 mm	0.74 mm	0.71 mm	0.50	0.75
Oracle	1.10 mm	1.20 mm	0.53 mm	0.56 mm	0.60	0.78

RF—Random Forest; GB—Gradient Boosting; DSD—Disdrometer.

Table 8. High values not estimated by the machine learning model.

Random Forest–Random Forest
	Observed mm		Estimated mm		Station	Distance	DBZH	RHOHV	ZDR	KDP
	$Δ t - 1$	$Δ t$	$Δ t - 1$	$Δ t$
a	0.0	29.6	0.00	0.65	Pinhao	193	28.63	0.97	0.37	0.20
b	4.8	24.0	3.00	0.00	Salto_Caxias	71	13.67	0.96	0.70	0.00
c	2.4	20.6	2.58	1.25	Laranjeiras	138	29.84	0.98	0.86	0.37
d	0.0	20.0	0.58	1.06	Loanda	219	30.23	0.94	1.32	0.13
Gradient Boosting–Gradient Boosting
	Observed mm		Estimated mm		Station	Distance	DBZH	RHOHV	ZDR	KDP
	$Δ t - 1$	$Δ t$	$Δ t - 1$	$Δ t$
e	0.0	29.6	0.00	0.78	Pinhao	193	28.63	0.97	0.37	0.20
f	4.8	24.0	4.00	0.00	Salto_Caxias	71	13.67	0.96	0.70	0.00
g	0.0	20.0	0.58	0.52	Loanda	219	30.23	0.94	1.32	0.13
h	0.0	19.2	0.00	0.64	Umuarama	126	28.98	0.99	0.10	0.22

Table 9. The results regarding the percentages of light and heavy rain for the Random Forest–Random Forest (RFRF) and Gradient Boosting–Gradient Boosting (GBGB) algorithms.

Rain Intensity Analysis
	RFRF		GBGB
	Validation	Test	Validation	Test
No Rain (0 to 0.2 mm/15 min.)	83%	64%	84%	70%
Light Rain (>0.2 to 2 mm/15 min.)	25%	44%	78%	93%
Moderate Rain (>2 to 5 mm/15 min.)	91%	97%	88%	96%
Heavy Rain (>5 mm/15 min.)	53%	75%	76%	93%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
