Cost-Sensitive Rainfall Intensity Prediction with High-Noise Commercial Microwave Link Data

Zheng, Liankai; Lin, Jiaxiang; Huang, Zhixin; Lin, Yu; Zheng, Qin; Chen, Qianqian; Lin, Lizheng; Chen, Jianyun

doi:10.3390/su16188067

by Liankai Zheng ^1,2, Jiaxiang Lin ^1,2,*, Zhixin Huang ^1,2, Yu Lin ^1,2, Qin Zheng ^1,2, Qianqian Chen ^1,2, Lizheng Lin ^3 and Jianyun Chen ^4

^1 Key Laboratory of Smart Agriculture and Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China

^2 College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China

^3 Fujian Provincial Meteorological Bureau, Fujian Provincial Atmospheric Detection Technology Support Center, Fuzhou 350028, China

^4 Meteorological Bureau of Fuzhou, Fuzhou 350008, China

^* Author to whom correspondence should be addressed.

Sustainability 2024, 16(18), 8067; https://doi.org/10.3390/su16188067

Submission received: 6 August 2024 / Revised: 11 September 2024 / Accepted: 12 September 2024 / Published: 15 September 2024

Abstract

Rainfall intensity prediction based on commercial microwave link (CML) data has received significant attention in recent years because it offers higher spatial resolution and lower energy consumption. However, its predictive performance is inferior to that of models based on meteorological data because of the high noise in CML data, a problem further exacerbated by the imbalance in the number of samples across rainfall intensities. Hence, a cost-sensitive rainfall intensity prediction model (CSRFP) is proposed to achieve better predictive performance on high-noise CML data. First, the spatiotemporal scene information is encoded and its weights are trained to provide the model with correlations between signal data from different stations, which helps the model capture latent patterns in the data and thus reduces the effect of noise. Next, a rainfall cross-entropy loss based on the rainfall distribution supplies the probability of each rainfall intensity occurring and back-calculates the signal attenuation at a given rainfall intensity. Weighting samples according to this attenuation makes the model cost-sensitive and addresses the class imbalance problem. Extensive experiments are carried out on high-noise communication data and imbalanced rainfall data from Fuzhou. Compared with typical prediction methods such as RNN applied to rainfall and communication data, CSRFP improves Recall, Precision, AUC_ROC, AUC_PR, F₁, and Accuracy by approximately 19%, 37%, 8%, 22%, 30%, and 17%, respectively. Notably, the model’s prediction accuracy for heavy rain, the class with the fewest samples, improves by about 13%.

Keywords:

time series prediction; class imbalance; cost sensitive; commercial microwave links

1. Introduction

Rainfall intensity prediction is crucial for preventing landslides, riverbank breaches, and secondary disasters from heavy rainfall [1,2]. Research has shown that effective rainfall prediction can significantly reduce the risk of landslides and improve the accuracy of susceptibility mapping, especially when using modern machine learning methods [3,4]. Mainstream rainfall intensity prediction schemes rely on dedicated signal monitoring equipment, incurring additional construction and operational costs. For example, a modern weather radar’s monthly power consumption ranges from 1 to 2.5 tce (tonnes of coal equivalent), with China’s annual average weather radar energy consumption reaching 6600 tons of standard coal [5]. The energy consumption of rainfall intensity prediction has thus reached a level comparable to that of energy-intensive industries, and both energy consumption and cost restrict improvements in the spatial resolution of rainfall intensity prediction.

Traditionally, statistical and mathematical models based on a collection of extreme precipitation data have defined the predictive models of precipitation intensity. Recent studies have highlighted the importance of understanding these distributions to effectively model extreme rainfall events and detect climate change [6]. The input data for these mathematical models mostly comes from radar or weather stations limited by costs and other factors, making it difficult to deploy them densely. In contrast, communication signal stations are already widely distributed in many areas. Some researchers have proposed using commercial microwave links (CMLs) data as a substitute for specific meteorological microwave data from weather radars [7,8]. This approach aims to enhance resource reutilization, improve spatial resolution, save energy, and reduce emissions. Research on rainfall intensity prediction using CMLs has already been initiated and has achieved certain results [9,10]. However, previous studies [11,12,13,14] only considered a few signal features, such as received signal level (RSL), neglecting spatiotemporal scene information. To improve prediction accuracy, high-precision monitoring equipment was still required to collect low-noise CML data, contradicting the original purpose. Moreover, the class imbalance problem was ignored, which cannot reflect the real performance of the model.

Class imbalance refers to a significant skew in the proportion of classes within the data. It leads the model to ignore the minority class in prediction, and the impact becomes more serious as the imbalance grows or when the noise in the data is relatively high. CMLs carry more noise than dedicated meteorological microwave links because of their complex application scenarios and the devices’ interference-resistant design [15]. Previous rainfall intensity prediction experiments based on CMLs did not address class imbalance and still used average prediction accuracy as the main evaluation metric, leading to unreliable results [16]. Similar problems have been studied in many application scenarios [17,18], where various approaches have been explored, including data-level methods [19], cost-sensitive methods [20], and ensemble methods [21], all of which have produced significant effects in practical applications. However, data-level methods destroy time dependence and may amplify the noise in the data, and ensemble methods demand substantial computational cost. Therefore, cost-sensitive methods are employed here to address the large-scale, high-noise, imbalanced time series prediction problem that CML-based rainfall intensity prediction poses.

In summary, to achieve low-carbon and high spatial resolution rainfall intensity prediction using CMLs, it is urgent to develop a cost-sensitive model that can suppress the effects of high noise and class imbalance.

2. Related Work

In rainfall intensity prediction, the meteorological method has high accuracy but low spatial resolution and high deployment cost [22]. The satellite method can cover the entire world, but it has high construction costs, low spatiotemporal accuracy, significant cloud interference, and difficult inversion [23]. Weather radar link-based prediction has wide coverage, but near-ground measurement is difficult and operational energy consumption is high. In contrast, CML-based prediction offers high spatial resolution and no additional construction cost, but obtaining and processing data with complicated features is challenging [24,25]. Many researchers have studied CML-based rainfall prediction. Messer [7] proposed the use of CMLs for detecting meteorological conditions. Djibo [26] argued that CML data offer an alternative, innovative, and cost-effective solution for rainfall quantification, proposed a data system for collecting real-time power from CMLs, and demonstrated the correlation between rainfall and microwave attenuation using data provided by the National Meteorological Agency of Burkina Faso. These studies demonstrated the feasibility of environmental monitoring using wireless communication networks. Machine learning techniques have been successfully applied to prediction and evolution tracking, providing enhanced predictive capabilities in various regions [27,28]. Lian and Kumar [9,29] found that clustering methods and decision trees showed strong capabilities for processing meteorological data, while neural networks showed superior performance for microwave data.

Brito [14] studied low-frequency microwave links below 10 GHz, using machine learning algorithms like support vector machines and decision trees based on the global system for mobile communications and the global positioning system signal strengths to classify rain intensities as no rain, light rain, and heavy rain and to analyze performance. Avanzato [13] highlighted the correlation between a set of wireless channel quality monitoring parameters and relative rainfall intensity levels, proposing a novel method for rainfall classification using LTE wireless channel parameters based on a cell selection mechanism. Qiu [30] proposed a comprehensive prediction method combining spatial information of sites, designed a multi-task convolutional neural network, and made use of 8 types of weather variables as features to predict short-term rainfall for 3 h. Pudashine [31] applied an algorithm capable of reducing signal noise unrelated to rainfall amounts [32], constructing a long short-term memory (LSTM) network for simulations and real-field predictions at airports. Poornima [33] redesigned a recurrent neural network (RNN) model, optimizing the structure’s output gate, and compared it with models like the autoregressive integrated moving average model, validating the superiority of the recurrent neural network structure in rainfall prediction.

The studies mentioned above explore, from multiple perspectives, the feasibility of using artificial intelligence methods for short-term rainfall intensity prediction with CMLs. However, the existing results still rely on specialized signal monitoring equipment and do not fully leverage the spatial density advantage of CMLs. Additionally, the tendency of rainfall intensity prediction to focus overly on the majority class has not received sufficient attention. To address these gaps, this study explores how to handle high noise in CMLs and class imbalance in rainfall data, with the aim of improving model performance on high-noise, imbalanced microwave link data.

3. Cost-Sensitive Rainfall Intensity Prediction

3.1. Principle

To address the high noise and class imbalance problems in CML-based rainfall intensity prediction, the cost-sensitive rainfall intensity prediction model (CSRFP) adds an attention-embedding layer (AEL) that carries spatiotemporal scene information and is trained with a rainfall cross-entropy loss (RF-CEL). CSRFP operates in two phases, training and inference; its structure is shown in Figure 1.

The input time series may contain both numerical data, such as signal strength, and character data, such as spatiotemporal scene information. In the AEL, spatiotemporal scene information is embedded into vectors of appropriate dimension, which are assigned different weights by the model and then merged with the signal features. Normalization follows the AEL to unify the scales of all features, enhancing model stability. During training, the normalized training set data enters the recurrent and linear layers, and the RF-CEL computes the loss that drives each iteration. RF-CEL weights the training results according to the distribution probability of the different rainfall intensities, making the model sensitive to costs. In inference, the normalized validation or test set data passes through the recurrent and linear layers, and the SoftMax layer directly outputs the probability of each rainfall intensity. This process lets the model account for varying rainfall intensity distributions, improving adaptability and stability across different scenarios.

3.2. Attention-Embedding Layer

The spatiotemporal scene information is a type of low-noise data that previous studies have overlooked. By encoding and embedding spatiotemporal scenes, AEL provides additional correlations and hidden information, such as the density of people or buildings near the stations, which helps the model detect noise and complex patterns in the signal data.

In the AEL, the embedding process encodes spatiotemporal scene information into vectors of appropriate dimensions. Attention, implemented as a simple two-layer perceptron, is responsible for training a set of vectors that assign weights to each feature. The concat module integrates these spatiotemporal scene features with signal features. The structure of AEL, as depicted in Figure 2, demonstrates this process where the input layer feeds into linear layers activated by ReLU functions. The weighted combinations of scene and signal data are then utilized in subsequent layers, ensuring that each feature is appropriately weighted and integrated, thereby enhancing the model’s performance and stability.
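The data flow described above (embedding, a two-layer perceptron that produces per-feature weights, then concatenation with the signal features) can be sketched in plain Python. This is only an illustrative stand-in, not the authors' implementation: the class name `AttentionEmbeddingSketch`, the scene vocabulary, the layer sizes, the random initial weights, and the sigmoid gate on the second layer are all assumptions, and a real model would learn these weights during training.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


class AttentionEmbeddingSketch:
    """Hypothetical stand-in for the AEL; weights are random, not trained."""

    def __init__(self, scene_vocab, embed_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.index = {name: i for i, name in enumerate(scene_vocab)}
        self.table = init(len(scene_vocab), embed_dim)  # embedding lookup table
        self.w1, self.b1 = init(hidden_dim, embed_dim), [0.0] * hidden_dim
        self.w2, self.b2 = init(embed_dim, hidden_dim), [0.0] * embed_dim

    def forward(self, signal_features, scene):
        embedded = self.table[self.index[scene]]              # embed the scene label
        hidden = relu(linear(embedded, self.w1, self.b1))     # perceptron layer 1 (ReLU)
        weights = sigmoid(linear(hidden, self.w2, self.b2))   # layer 2; sigmoid gate is an assumption
        weighted = [w * e for w, e in zip(weights, embedded)]  # apply attention weights
        return list(signal_features) + weighted               # concat with signal features


# Made-up scene names and made-up RSSI/RSRQ readings, for illustration only.
layer = AttentionEmbeddingSketch(["urban", "suburban", "rural"])
features = layer.forward([-85.0, -11.5], "urban")
```

The output vector keeps the signal features unchanged and appends the weighted scene embedding, which is the shape the subsequent normalization and recurrent layers would consume under these assumptions.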

3.3. Rainfall Cross-Entropy Loss

RF-CEL evolved from cross-entropy loss (CEL, denoted as L_CE) and Weibull distribution of rainfall over a specific time period. CEL is a commonly used loss function in classification tasks. However, for datasets with class imbalances, relying solely on CEL often does not yield satisfactory results. In the context of multi-class tasks, the definition of CEL is as follows:

L_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} \ln(p_{ij})

(1)

where N stands for the number of samples; C represents the number of classes; y_{ij} is a binary indicator, which has a value of one when sample i belongs to class j, and zero otherwise; and p_{ij} is the probability as predicted by the model that sample i belongs to class j. When one class greatly outnumbers the others in the sample set, CEL mainly reflects the classification accuracy of that dominant class. In cases with complex feature relationships or weaker data-label correlations, relying solely on CEL can cause the model to neglect minority classes.
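A small numerical sketch may make the effect of the dominant class concrete. The function below evaluates Equation (1) directly; the two sets of predicted probabilities are invented for illustration and do not come from the paper's data.

```python
import math


def cross_entropy(labels, probs):
    """Equation (1): average of -ln(p_ij) taken at each sample's true class j."""
    return sum(-math.log(p[y]) for y, p in zip(labels, probs)) / len(labels)


# 99 "no rain" samples (class 0) and a single "heavy rain" sample (class 3).
labels = [0] * 99 + [3]

# Model A ignores the minority class: it predicts "no rain" for every sample.
probs_a = [[0.90, 0.05, 0.03, 0.02]] * 100

# Model B recognises the heavy-rain sample but is slightly less sure about "no rain".
probs_b = [[0.80, 0.10, 0.05, 0.05]] * 99 + [[0.03, 0.03, 0.04, 0.90]]

loss_a = cross_entropy(labels, probs_a)
loss_b = cross_entropy(labels, probs_b)
```

With these made-up numbers, `loss_a` is lower than `loss_b`, so plain CEL would prefer the model that never predicts heavy rain.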

To counteract this issue, a weighting factor α is introduced in the balanced cross-entropy loss (BCEL). This adaptation is designed to improve upon the limitations encountered when using CEL in scenarios with intricate feature relationships or subdued data-label connections. The definition of BCEL is as follows:

L_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} α_{ij} y_{ij} \ln(p_{ij})

(2)

In the BCEL, the set of class weights α_{j} \in \{α_{1}, α_{2}, \dots, α_{C}\}, with α_{ij} = α_{j} for every sample i, allows the loss contribution of each class to be adjusted by tuning the size of α. However, α is fixed in advance and does not adjust dynamically during the computation of the loss. Different rainfall datasets may therefore require separately designed weights, which limits the scope of its application.
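As a minimal sketch of Equation (2), the weights below are set inversely proportional to the frequency of each label, which is how the experiments in Section 4.2 describe the RNN_BCE baseline. The scaling factor N / (C · count), the function names, and the toy label list are assumptions made for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """alpha_j proportional to 1 / count_j, scaled so a balanced set gives 1.0."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[j]) if counts[j] else 0.0
            for j in range(num_classes)]


def balanced_cross_entropy(labels, probs, alpha):
    """Equation (2) with alpha_ij = alpha_j for the true class j of sample i."""
    return sum(-alpha[y] * math.log(p[y]) for y, p in zip(labels, probs)) / len(labels)


# Toy label set: 6 no rain, 2 light rain, 1 moderate rain, 1 heavy rain.
labels = [0] * 6 + [1] * 2 + [2] + [3]
alpha = inverse_frequency_weights(labels, num_classes=4)

# Same predicted distribution for every sample, purely for illustration.
probs = [[0.7, 0.1, 0.1, 0.1]] * len(labels)
loss = balanced_cross_entropy(labels, probs, alpha)
```

Because the weights depend only on the label counts, they have to be recomputed whenever the class distribution of the dataset changes, which is the limitation noted above.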

Specifically, in the context of rainfall intensity prediction, a novel adjustment factor z can be introduced to construct a cost-sensitive loss function. This factor can be derived by integrating the distribution of rainfall with the rain attenuation effect, thereby tailoring the loss computation to more accurately reflect the nuances of rainfall data. This approach allows for a more effective and cost-sensitive handling of class imbalances in rainfall intensity prediction models.

The Weibull distribution is one of the most common distributions for rainfall [34]. Let RF denote the random variable representing rainfall rate. The probability density function of rainfall following the Weibull distribution is as follows:

f_{RF}(r) = a b r^{b-1} e^{-a r^{b}}

(3)

In this equation, r represents the rainfall amount, and a and b are the distribution parameters, determined by the geographic conditions. By integrating f_{RF}(r) from r to infinity, the probability that the rainfall is at least r (the complementary cumulative distribution of RF) can be obtained as follows:

F_{RF}(r) = \int_{r}^{\infty} f_{RF}(t) \, dt = P\{RF \geq r\} = e^{-a r^{b}}

(4)
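The closed form in Equation (4) can be checked numerically against the density in Equation (3). The sketch below uses a simple trapezoidal rule; the parameter values a = 0.5 and b = 0.8, the upper integration limit, and the step count are placeholders chosen for illustration, not values fitted to any rainfall record.

```python
import math


def weibull_pdf(r, a, b):
    """Equation (3): f_RF(r) = a * b * r^(b-1) * exp(-a * r^b)."""
    return a * b * r ** (b - 1) * math.exp(-a * r ** b)


def exceedance(r, a, b):
    """Equation (4): P{RF >= r} = exp(-a * r^b)."""
    return math.exp(-a * r ** b)


def integrate_tail(r, a, b, upper=200.0, steps=200_000):
    """Trapezoidal approximation of the integral of the pdf from r to `upper`."""
    h = (upper - r) / steps
    total = 0.5 * (weibull_pdf(r, a, b) + weibull_pdf(upper, a, b))
    for i in range(1, steps):
        total += weibull_pdf(r + i * h, a, b)
    return total * h


a, b = 0.5, 0.8  # placeholder parameters, not fitted to any dataset
numeric = integrate_tail(2.0, a, b)
closed_form = exceedance(2.0, a, b)
```

The two values should agree to within the integration error, since the density beyond the chosen upper limit is negligible for these parameters.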

According to reference [35,36], the Z–R relationship in rainfall intensity prediction can be approximated as follows:

Z = q r^{k}

(5)

In this expression, Z is known as the radar reflectivity or unit rain attenuation, r is the rainfall amount, and q and k are empirical coefficients. Inverting Equation (4) gives

F_{RF}^{-1}(p) = r = \left( \frac{-\ln(p)}{a} \right)^{\frac{1}{b}}

and substituting it into Equation (5), the following relationship can be derived:

Z = q \left( \frac{-\ln(p)}{a} \right)^{\frac{k}{b}}

(6)

Assume that the rainfall factor z has the same form as the radar reflectivity Z, which gives z a certain degree of practical significance. The expression for RF-CEL can then be formulated as follows:

L_{RF} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} z_{ij} \ln(p_{ij})

(7)

z_{ij} = q \left( \frac{-\ln(p_{ij})}{a} \right)^{\frac{k}{b}}

(8)

From Equation (5), as the radar reflectivity Z decreases, the rainfall amount r also decreases, so the two are positively correlated. From Equation (4), as the rainfall amount r decreases, the probability p increases, so they are negatively correlated. Because z has the same form as Z, z and p are negatively correlated. As the probability p_{ij} nears one, the corresponding coefficient z_{ij} approaches zero. Conversely, as p_{ij} decreases toward zero, z_{ij} tends towards infinity. When no rainfall is expected, a case that usually coincides with the majority class, the probability p_{ij} tends to be high, which leads to smaller penalties for prediction errors. For samples of the minority class, the opposite holds. Introducing the coefficient z thus enables cost-sensitive learning in rainfall intensity prediction, modulating the model’s sensitivity to each class according to its probability.
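The behaviour of the rainfall factor can be illustrated with a short sketch of Equation (7). Here z is computed as q(−ln p / a)^(k/b), the form obtained by inverting Equation (4) as written and substituting the result into Equation (5). The parameter values, the clipping constant `eps`, and the two example samples are placeholders, not the values the authors fitted for Fuzhou.

```python
import math


def rainfall_factor(p, q, k, a, b, eps=1e-12):
    """z = q * (-ln(p) / a)^(k / b): Z-R relation applied to the inverted Weibull tail."""
    p = min(max(p, eps), 1.0)  # keep ln(p) finite
    return q * (-math.log(p) / a) ** (k / b)


def rf_cross_entropy(labels, probs, q, k, a, b):
    """Equation (7): cross-entropy with each true-class term scaled by z_ij."""
    total = 0.0
    for y, p in zip(labels, probs):
        z = rainfall_factor(p[y], q, k, a, b)
        total += -z * math.log(max(p[y], 1e-12))
    return total / len(labels)


params = dict(q=1.0, k=1.6, a=0.5, b=0.8)  # placeholder values only

z_confident = rainfall_factor(0.95, **params)  # true class predicted with high probability
z_uncertain = rainfall_factor(0.10, **params)  # true class predicted with low probability

# One well-predicted "no rain" sample and one poorly predicted "heavy rain" sample.
labels = [0, 3]
probs = [[0.95, 0.03, 0.01, 0.01], [0.60, 0.20, 0.10, 0.10]]
loss = rf_cross_entropy(labels, probs, **params)
```

Under these assumptions the poorly predicted heavy-rain sample receives a far larger factor than the confident no-rain sample and therefore dominates the loss, which is the cost-sensitive behaviour described above.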

3.4. Evaluation Metrics

In classification tasks, accuracy, defined as the proportion of correctly predicted samples to the total number of samples, is commonly used as a metric for evaluating model performance. However, in scenarios with class imbalance, accuracy often fails to reflect the model’s true performance, and its value as a metric diminishes further as the imbalance becomes more pronounced. For imbalanced multi-class problems, recall, precision, the area under the curve (AUC), and the F-measure are more reliable metrics of performance. The confusion matrix for binary classification problems is shown in Table 1.

In the macro measurements, Recall, Precision, and F₁ are defined as follows:

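R_{macro} = \frac{1}{C} \sum_{i=1}^{C} \frac{TP_{i}}{TP_{i} + FN_{i}}

(9)

P_{macro} = \frac{1}{C} \sum_{i=1}^{C} \frac{TP_{i}}{TP_{i} + FP_{i}}

(10)

F_{1} = \frac{2 P_{macro} R_{macro}}{P_{macro} + R_{macro}}

(11)

where TP_{i}, FN_{i}, and FP_{i} are the true positives, false negatives, and false positives obtained when class i is treated as the positive class and all other classes as negative.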
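Equations (9) to (11) can be computed from a multi-class confusion matrix as sketched below. The layout (rows are true classes, columns are predicted classes), the convention of returning 0.0 when a denominator is zero, and the 3×3 example matrix are assumptions for illustration.

```python
def macro_metrics(confusion):
    """confusion[t][p] counts samples of true class t predicted as class p."""
    num_classes = len(confusion)
    recalls, precisions = [], []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                                  # row total minus diagonal
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp  # column total minus diagonal
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
    r_macro = sum(recalls) / num_classes        # Equation (9)
    p_macro = sum(precisions) / num_classes     # Equation (10)
    denom = p_macro + r_macro
    f1 = 2 * p_macro * r_macro / denom if denom else 0.0  # Equation (11)
    return r_macro, p_macro, f1


# Made-up three-class confusion matrix.
confusion = [
    [6, 2, 0],
    [1, 3, 0],
    [1, 0, 2],
]
r_macro, p_macro, f1 = macro_metrics(confusion)
```

Note that F₁ here is the harmonic mean of the macro-averaged precision and recall, as in Equation (11), rather than the average of per-class F₁ scores.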

The receiver operating characteristic (ROC) curve is a widely used tool for observing model performance. The curve’s vertical axis represents the true positive rate (TPR), and the horizontal axis represents the false positive rate (FPR). Here, TPR is equivalent to recall, while FPR is defined as FPR = \frac{FP}{FP + TN}. ROC curves are suitable for assessing the overall performance of classifiers. However, in imbalanced classification tasks, if the combined sample size of all other classes is much greater than the size of one class, an increase in FP has a relatively small impact on FPR. This can lead to an overestimation of the model’s performance in ROC analysis. The precision–recall (P-R) curve addresses this issue by plotting recall on the horizontal axis and precision on the vertical axis, thereby removing the influence of the large number of negative samples. The closer the P-R curve approaches the upper right corner, or the ROC curve approaches the upper left corner, the better the model’s classification performance. However, different curves may intersect, and the AUC provides a more intuitive means of comparing the performance of different classifiers. The AUC value ranges from zero to one. In multi-class tasks, for a particular class, an AUC greater than 0.5 indicates that the classifier can distinguish that class; otherwise, it lacks such ability.
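The different sensitivity of FPR and precision to false positives under imbalance can be seen with a small worked example. The counts below are hypothetical one-vs-rest figures (10 positive hours among 1000) and are not taken from the experiments.

```python
def rates(tp, fp, fn, tn):
    """Return (TPR, i.e. recall; FPR; precision) from binary confusion-matrix counts."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    return tpr, fpr, precision


# Same recall in both cases; only the number of false positives changes.
few_fp = rates(tp=8, fp=2, fn=2, tn=988)
many_fp = rates(tp=8, fp=20, fn=2, tn=970)

fpr_shift = many_fp[1] - few_fp[1]          # movement along the ROC horizontal axis
precision_drop = few_fp[2] - many_fp[2]     # movement along the P-R vertical axis
```

A tenfold increase in false positives shifts FPR by less than 0.02 here, while precision falls by more than 0.5, which is why the P-R curve is the more informative view for the rare class.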

The Recall, Precision, AUC_ROC, AUC_PR, and F₁ metrics introduced above reflect the prediction performance of the model from multiple dimensions. In rainfall prediction based on CMLs, considering these metrics together makes the results considerably more reliable than relying on accuracy alone.

4. Experimentation

4.1. Summary of Data

The experimental data, provided by communications companies and the weather service, consists of routine records that did not require additional collection. The signal data from a communication link is sourced from over 70,000 stations in Fuzhou, where user reception conditions are statistically analyzed. Each station consists of 1 to 5 sub-stations, with a data collection frequency of one hour. The signal types include received signal strength indicator (RSSI) and reference signal received quality (RSRQ). Compared to the transmitter’s signal conditions, the user side experiences more complex interference, albeit with a broader coverage range. Additionally, the data encompasses coverage scenarios, administrative regions, grid areas, device manufacturer names, cell names, operating frequency bands, and other spatiotemporal features. An overview of some of the key features is provided in Table 2.

By integrating these two datasets chronologically, we created a rainfall communication dataset (RCD). This dataset uses the subsequent hour’s rainfall amount as the label. Given the large volume of data, the experimental dataset was selected from representative time periods. The number of samples of each rainfall intensity used in the experiment is shown in Table 3.
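The labelling rule (features of one hour paired with the rainfall of the following hour) can be sketched as follows. The function name, the assumption that both inputs are hourly lists aligned on the same timestamps, the millimetre unit, and the sample values are all illustrative and not taken from the RCD.

```python
def build_samples(signal_by_hour, rainfall_by_hour):
    """Pair the features of hour t with the rainfall amount of hour t + 1."""
    samples = []
    for t in range(len(signal_by_hour) - 1):
        samples.append((signal_by_hour[t], rainfall_by_hour[t + 1]))
    return samples


signals = [[-85.0, -11.0], [-88.0, -12.5], [-93.0, -15.0]]  # made-up RSSI/RSRQ rows
rainfall = [0.0, 1.2, 7.5]                                   # made-up hourly totals (mm assumed)
samples = build_samples(signals, rainfall)
```

The last hour of signal data has no following rainfall record and is therefore dropped, so n aligned hours yield n − 1 samples.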

4.2. Procedures

These data also include numerical features such as signal strength, which poses challenges for the direct application of non-AI methods. Additionally, according to the research by Lian and Kumar [9,29], traditional machine learning methods are not suitable for this dataset, and the large volume of data makes such approaches inefficient. Therefore, this experiment primarily compares CSRFP with several deep learning models used in past studies and with current state-of-the-art time series forecasting models to validate CSRFP’s superior performance in handling imbalanced rainfall sequences. Rainfall intensity is categorized into four levels based on hourly rainfall amounts: no rain, light rain, moderate rain, and heavy rain, labeled as 0, 1, 2, and 3, respectively. Six metrics are calculated under macro measurement: Recall, Precision, AUC_ROC, AUC_PR, F₁, and Accuracy. The three main parts of the experiment are as follows:

I. To compare the prediction accuracy of MLP, RNN, and LSTM and to explain why CSRFP is built on RNN.
II. To test the influence of RF-CEL and spatiotemporal scene information encoding on the performance of the RNN structure and to verify the ability of CSRFP to resist noise and deal with class imbalance.
III. To use the ROC and PR curves to observe the performance of RNN, RNN_BCE (RNN using BCEL), and CSRFP in predicting different classes, comparing the ability of CEL, BCEL, and RF-CEL to deal with class imbalance.
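The four-level labelling used throughout these experiments (0 for no rain up to 3 for heavy rain) amounts to bucketing the hourly rainfall amount by three cut points. The text does not state the cut points, so the sketch below takes them as an argument; the values (0.0, 2.5, 8.0) in the usage lines are hypothetical placeholders only.

```python
from bisect import bisect_left

LEVELS = ["no rain", "light rain", "moderate rain", "heavy rain"]


def intensity_label(rain_amount, thresholds):
    """Return 0..3: the number of cut points strictly below the hourly amount."""
    if len(thresholds) != len(LEVELS) - 1:
        raise ValueError("expected three ascending thresholds for four levels")
    return bisect_left(thresholds, rain_amount)


# Hypothetical cut points; the paper does not give the values it used.
placeholder_thresholds = (0.0, 2.5, 8.0)
labels = [intensity_label(x, placeholder_thresholds) for x in (0.0, 0.4, 5.0, 12.0)]
names = [LEVELS[i] for i in labels]
```

With these placeholder cut points an amount of exactly zero maps to class 0, and each upper cut point belongs to the lower class.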

In an initial comparison using CEL as the loss function, the performance of native MLP, RNN, and LSTM models was evaluated. The MLP model failed to converge during training, with the loss and accuracy on the validation set increasing over time. Figure 3 illustrates the accuracy rates on the test set for MLP, RNN, and LSTM, with yellow representing predicted values, blue indicating actual values, and green denoting true positives. It is evident that the MLP model struggles with the class imbalance present in the dataset, predominantly predicting all samples as no rain. In contrast, RNN and LSTM exhibit better predictive performance for no rain, light rain, and moderate rain compared to MLP. However, both RNN and LSTM still face challenges in accurately identifying heavy rain scenarios. This highlights the limitations of these models in handling highly imbalanced data distributions, particularly for rare but critical events such as heavy rainfall.

In further ablation experiments, the model adding only spatiotemporal scene information is named RNN_zone, and the model using only RF-CEL is named RNN_RF. By comparing the standard RNN model with RNN_zone and RNN_RF, insights can be gained into how much RF-CEL and encoding of spatiotemporal scene information can influence the predictive capabilities of the model. The performance metrics for these models are presented in Table 4.

For RNN_BCE, the weight α_ij was set inversely proportional to the frequency of each label in the sample. In the CSRFP, z_ij was set based on reference [35,37] and adjusted to the specific situation of Fuzhou. In the final comparison among RNN, RNN_BCE, and CSRFP, their ROC and PR curves are displayed in Figure 4, ordered from top to bottom as RNN, RNN_BCE, and CSRFP, respectively.

The macro-measured AUC_ROC values for these models are 0.72, 0.79, and 0.86, respectively, while the AUC_PR values are 0.42, 0.54, and 0.66, respectively. In terms of recognizing heavy rainfall, given the smaller number of heavy rain labels, the PR curve provides a more reliable measure of performance compared to the ROC curve. The AUC_PR values for the heavy rain class for the three models are 0.07, 0.43, and 0.62, respectively. CSRFP improves the prediction performance for all classes of rainfall, with the PR curve of heavy rain prediction showing a clear improvement compared to RNN_BCE. This is illustrated in Figure 4, where the ROC and precision–recall curves for different classes are shown, highlighting the superior performance of CSRFP in distinguishing between different rainfall intensities, especially for the critical heavy rain scenarios.

4.3. Result and Discussion

The performance metrics for MLP, LSTM, Informer, RNN, RNN_BCE, and CSRFP specifically for the heavy rain class, along with their overall performance metrics under Macro measurement and test accuracy, are summarized in Table 5 and Table 6, respectively.

Table 5 and Table 6 show that the three typical models that performed well in previous studies, where rainfall was predicted from CML signal data collected with professional signal monitoring equipment, as well as Informer, one of the most advanced series prediction models, are affected by the high noise when forecasting on RCD, and their prediction ability is reduced. Moreover, the class imbalance in RCD seriously reduces the prediction accuracy of the four models for the minority class. Therefore, the standard models cannot be used directly for CML-based rainfall prediction.

RNN differs little from Informer and LSTM on the various metrics, and their overall prediction ability is similar. This indicates that in short series prediction tasks such as short-term rainfall prediction, advanced models designed for long series, such as Informer and LSTM, have no significant performance advantage. To avoid higher training costs, building the improved model on the lower-complexity RNN is the more reasonable choice.

Compared with RNN, RNN_BCE, which balances the weights of each class, improves Recall, Precision, AUC_ROC, AUC_PR, and F₁ to varying degrees, but its overall Accuracy is reduced. The MLP, which completely lacks predictive ability for minority classes, outperforms RNN, LSTM, and Informer in overall accuracy. These results confirm that overall accuracy cannot reflect a model’s performance on imbalanced data.

Table 4 and Table 6 show that encoding the spatiotemporal scene information improves the performance metrics of rainfall intensity prediction to varying degrees, which indicates that it can suppress the influence of high noise in RCD to some extent. RNN_RF is higher than RNN_BCE in the mean value of each metric, indicating that RF-CEL deals with the class imbalance problem better than BCEL.

CSRFP achieves 0.68, 0.62, 0.86, 0.66, 0.65, and 0.75 in Recall, Precision, AUC_ROC, AUC_PR, F₁, and Accuracy, respectively, which is about 19%, 37%, 8%, 22%, 30%, and 17% higher than the best values of the other models in the experiment. In the prediction of different rainfall intensities, CSRFP achieves accuracy rates of 0.80, 0.65, 0.59, and 0.68 for no rain, light rain, moderate rain, and heavy rain, respectively, improvements of 4%, 16%, 3%, and 13% over the best values of the other models. CSRFP thus suppresses the effects of high noise and class imbalance and produces better results than previous models on highly noisy, class-imbalanced datasets.
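As a quick consistency check, the reported F₁ can be recomputed from the reported macro Recall and Precision using Equation (11). The function name is illustrative; the two input values are the CSRFP figures quoted in the text.

```python
def f1_from_macro(precision, recall):
    """Equation (11): harmonic mean of macro precision and macro recall."""
    return 2 * precision * recall / (precision + recall)


reported_recall, reported_precision = 0.68, 0.62  # CSRFP values quoted in the text
f1 = f1_from_macro(reported_precision, reported_recall)
f1_rounded = round(f1, 2)
```

Rounded to two decimals, the result matches the F₁ of 0.65 reported for CSRFP.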

5. Conclusions and Prospect

A CMLs-based rainfall intensity prediction model, called CSRFP, is proposed to solve the high noise and class imbalance problems in CMLs-based rainfall intensity prediction. The encoding of spatiotemporal scene information provides additional correlations and hidden information, helping to detect complex patterns in the data and reduce the impact of noise. RF-CEL addresses the class imbalance problem effectively by back-calculating the signal attenuation based on the probability of different rainfall intensities from the Weibull distribution, thereby assigning loss weights to different rainfall intensity classes. Extensive experiments are carried out on RCD with high noise and class imbalance. Multiple evaluation metrics enhance the reliability of the prediction result. In terms of prediction result, MLP performs poorly in no rain, light rain, moderate rain, and heavy rain. RNN and LSTM show better performance in no rain, light rain, and moderate rain compared to MLP, but their performance in heavy rain remains subpar. Notably, CSRFP demonstrates strong performance in no rain, light rain, moderate rain, and heavy rain.

Compared to previous models, the proposed CSRFP represents a significant advancement in the development of rainfall regression prediction based on CMLs, showcasing improved predictive performance. However, it currently only achieves a four-class rainfall classification, which highlights the gap towards the ultimate goal of accurately predicting rainfall amounts sought by many researchers. Ongoing research aims to enhance CSRFP’s capabilities, paving the way for future low-carbon, high spatial resolution rainfall regression predictions.

Smart cities are a current hot concept, and the rainfall prediction based on CMLs with high-resolution characteristics is an important part of smart city design. However, the concentration time in urban subcatchments is usually short because there are many impermeable surfaces in cities, causing rainfall to quickly gather and form surface runoff. In practical cases, many urban flood events are caused by heavy rainfall occurring within a short period (e.g., within 30 min). A one-hour rainfall duration may exceed the concentration time of most urban subcatchments, causing an imbalance between rainfall and runoff. Therefore, using a one-hour rainfall duration may underestimate the risk of urban flooding. In future research, we will analyze shorter-duration rainfall events to facilitate early measures against flood risks in urban areas. Additionally, when weather phenomena such as snowfall and haze occur, microwave signals are similarly affected as they are during rainfall. Therefore, related research is expected to extend to more meteorological phenomena. With advancements in related technologies, CSRFP is anticipated to play a significant role in the development of smart cities.

Author Contributions

Conceptualization, L.Z.; Methodology, L.Z.; Software, L.Z.; Validation, L.Z.; Resources, J.L., Z.H., L.L. and J.C.; Writing – original draft, L.Z.; Writing – review & editing, L.Z., J.L., Y.L., Q.Z. and Q.C.; Supervision, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the Natural Science Foundation of Fujian Province, China (2021J01124, 2021J01461).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors express gratitude to the Fuzhou Meteorological Bureau and China Mobile Communications Group for the rainfall data and communication data used in the study.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Model structure.

Figure 2. AEL structure.

Figure 3. Native model classification accuracy.

Figure 4. Receiver operating characteristic curve and precision–recall curve.

Table 1. Binary confusion matrix.

Label \ Prediction	Positive	Negative
Positive	True Positive (TP)	False Negative (FN)
Negative	False Positive (FP)	True Negative (TN)
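The four cells of Table 1 determine the threshold-based metrics reported in Tables 4–6. The sketch below uses the standard definitions of recall, precision, F₁, and accuracy. The function names and the example counts are hypothetical, and returning 0 when a denominator is zero is a convention assumed here, not one stated in the article.

```python
def binary_metrics(tp, fn, fp, tn):
    """Recall, precision, F1 and accuracy from the four cells of Table 1."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "accuracy": accuracy}

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Hypothetical counts, chosen only for illustration.
example = binary_metrics(tp=68, fn=32, fp=58, tn=842)
print(example)

# Consistency check against the reported CSRFP values (Tables 5 and 6).
print(round(f1_from(0.54, 0.68), 2), round(f1_from(0.62, 0.68), 2))
```

The second print reproduces the F₁ values reported for CSRFP from its reported precision and recall, a quick way to confirm that the tables are internally consistent.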

Table 2. Introduction to the characteristics of the link section.

Feature	Number of Subfeatures	Data Range
Sampling time	1	0~23
District	1	/
Longitude	1	[W180, E180]
Latitude	1	[S90, N90]
E_ID	1	/
CI	1	/
Reference Signal Received Power	48	[−∞, +∞]
Reference Signal Received Quality	18	[−∞, +∞]
Uplink signal-to-noise ratio	37	[−∞, +∞]
Time Led Time	45	[0, 4096]
Tracking area code	1	/
Coverage scenario	1	/
Maximum transmit power	1	/
Attribution area	1	/
Attribution grid	1	/
Operating frequency band	1	/

Note: The values of signal data represent the number of people within specific intervals. For example, the value of “Reference Signal Received Power_00 [−∞, −120)” is 44, indicating that the reference signal received power for 44 people is less than −120.

Table 3. Sample distribution.

Rainfall Intensity	Training (h)	Validation (h)	Test (h)	Total
No rain (<0.3 mm/h)	462,941	59,759	143,017	665,717
Light (0.3–2.5 mm/h)	174,899	5916	25,564	206,379
Moderate (2.5–6 mm/h)	107,714	3811	10,231	121,756
Heavy (≥6 mm/h)	38,703	1696	5117	45,516
Total	784,257	71,182	183,929	1,039,368

Note: The data are aggregated in three-hour time steps, and the total number of samples is related to the number of stations.
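The degree of imbalance in Table 3 can be quantified directly from the training counts. The sketch below computes each class's share and the majority-to-minority ratio. It also derives generic inverse-frequency weights as a baseline for comparison. These are an assumption for illustration and are not the RF-CEL weights, which the article derives from the Weibull distribution and signal attenuation.

```python
# Training-set sample counts copied from Table 3.
train_counts = {
    "no_rain": 462_941,
    "light": 174_899,
    "moderate": 107_714,
    "heavy": 38_703,
}

total = sum(train_counts.values())
shares = {name: count / total for name, count in train_counts.items()}
imbalance_ratio = max(train_counts.values()) / min(train_counts.values())

# Generic inverse-frequency weights (NOT the RF-CEL weighting), scaled so that
# the weighted sample count equals the unweighted total.
n_classes = len(train_counts)
inv_freq_weights = {name: total / (n_classes * count) for name, count in train_counts.items()}

for name in train_counts:
    print(f"{name:9s} share={shares[name]:.3f} weight={inv_freq_weights[name]:.2f}")
print(f"majority/minority ratio: {imbalance_ratio:.2f}")
```

Heavy rain makes up roughly 5% of the training samples and no rain roughly 59%, which is why an unweighted loss tends to neglect the heavy-rain class.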

Table 4. Ablation experiments.

Metric	RNN	RNN_zone	RNN_RF
Recall	0.39	0.57	0.63
Precision	0.42	0.48	0.58
AUC_ROC	0.71	0.80	0.82
AUC_PR	0.42	0.50	0.60
F₁	0.41	0.52	0.61
Accuracy	0.64	0.64	0.71

Table 5. Heavy rain recognition ability.

Metric	MLP	LSTM	Informer	RNN	RNN_BCE	CSRFP
Recall	0.00	0.23	0.27	0.04	0.60	0.68
Precision	0.00	0.25	0.24	0.02	0.22	0.54
AUC_ROC	0.22	0.68	0.77	0.71	0.86	0.90
AUC_PR	0.02	0.12	0.10	0.07	0.43	0.62
F₁	0.00	0.24	0.25	0.03	0.32	0.60
Accuracy	0.00	0.05	0.05	0.03	0.60	0.68

Table 6. Results summary.

Metric	MLP	LSTM	Informer	RNN	RNN_BCE	CSRFP
Recall	0.25	0.44	0.40	0.39	0.57	0.68
Precision	0.17	0.42	0.45	0.42	0.45	0.62
AUC_ROC	0.33	0.69	0.66	0.71	0.79	0.86
AUC_PR	0.23	0.39	0.39	0.42	0.54	0.66
F₁	0.20	0.43	0.42	0.41	0.50	0.65
Accuracy	0.67	0.59	0.66	0.64	0.54	0.75

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
