Event-Based Bias Correction of the GPM IMERG V06 Product by Random Forest Method over Mainland China

Liu, Zhenyu; Hou, Haowen; Zhang, Lanhui; Hu, Bin

doi:10.3390/rs14163859


by Zhenyu Liu ¹, Haowen Hou ², Lanhui Zhang ³,*,† and Bin Hu ¹,†

¹ Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China

² School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China

³ Key Laboratory of West China’s Environmental System (Ministry of Education), College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Remote Sens. 2022, 14(16), 3859; https://doi.org/10.3390/rs14163859

Submission received: 27 April 2022 / Revised: 13 July 2022 / Accepted: 5 August 2022 / Published: 9 August 2022


Abstract

The Global Precipitation Measurement (GPM) IMERG V06 product shows excellent performance in detecting precipitation, but still has room for improvement. This study proposes an event-based bias correction strategy using the random forest (RF) method to improve the accuracy of the GPM IMERG V06 product over mainland China. Results show that, over mainland China, biases caused by ‘hits’ events are most responsible for the errors of the GPM product, while ‘falseAlarms’ events are least responsible because the GPM values for ‘falseAlarms’ events are small. Compared with the raw GPM product, the bias-corrected GPM product performed better both in fitting observed precipitation values and in detecting precipitation events; thus, the event-based bias correction strategy in this study is credible. After bias correction, the performance of the GPM product improved significantly for ‘hits’ events; for ‘falseAlarms’ events, it showed slight deterioration in RMSE and MAE but significant improvements in FAR and CSI. This is because the established RF classification model tends to learn the characteristics of the event type with the larger proportion, and therefore performs better in correctly identifying that event type in each subregion.

Keywords:

GPM IMERG V06 product; event-based bias correction; random forest method; mainland China

1. Introduction

Precipitation is a fundamental component of the water cycle, and precipitation estimates with a high spatiotemporal resolution are crucial for hydrological, ecohydrological and meteorological studies [1,2,3]. Although gauge observations are the typical means of directly measuring precipitation at the Earth’s surface, it is difficult to derive areal precipitation from them because of the low density and uneven spatial distribution of weather stations [1,2,3]. In recent decades, satellite-based precipitation products (SPPs) have become an important source of precipitation data at large scales, including the Naval Research Laboratory Global Blended-Statistical Precipitation Analysis (NRLgeo) [4], the Climate Prediction Center morphing method (CMORPH) [5,6,7], Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks (PERSIANN) [6,7,8], and the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) [6,7,9].

As a prototype of the Global Earth Observation System of Systems (GEOSS) identified by the Committee on Earth Observation Satellites (CEOS), the Global Precipitation Measurement (GPM) mission was deployed on 28 February 2014 by the National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA) [1,10]. The GPM carries the first spaceborne Dual-Frequency Precipitation Radar (DPR), which consists of a Ka-band precipitation radar (KaPR) operating at 35.5 GHz and a Ku-band precipitation radar (KuPR) operating at 13.6 GHz. It ensures temporal sampling of 3 h or less [11,12]. The current operational satellite precipitation products of GPM mainly include the GPM-GSMaP of JAXA and the IMERG (Integrated Multi-satellitE Retrievals for Global precipitation Measurement) product of NASA [13]. The IMERG product applies algorithms that integrate the advantages of previous multi-satellite precipitation retrieval algorithms, including TMPA, CMORPH, and PERSIANN [14,15]. Given these improvements, the IMERG products tend to perform better than the other SPPs in many regions across the world, including Iran [16], East Asia [15], Malaysia [17], China [18], and the United States and Mexico [19], and have thus received increasing attention in related analyses.

The latest IMERG version (V06) was released in March 2019 with modifications to the satellite intercalibrations, the inclusion of additional sensors and refinements to the Kalman filter process [20,21]. Recently, several studies have evaluated the IMERG V06 products around the world [22,23,24]. They pointed out that the IMERG V06 product includes significant improvements in the algorithm used to estimate precipitation and provides estimates of precipitation phase, but that the product still has room for improvement [20,24]. Bias correction of SPPs is the most popular way to improve their accuracy [25,26,27,28]. However, most previous studies focused on evaluating the GPM product, and little work has been carried out on correcting it [22,23,24,29].

Generally, the total bias (T) in SPPs can be divided into three independent components: the bias of ‘hits’ events (H), the bias of ‘falseAlarms’ events (F) and the bias of ‘misses’ events (M) [23,30,31]. Here, ‘hits’ events are those in which both the SPP and the gauge observation detect precipitation, ‘falseAlarms’ events are those in which the SPP identifies precipitation but the gauge observation does not, and ‘misses’ events are those in which the gauge observation records precipitation but the SPP does not [28]. The sources of the different bias components differ in nature [30]; thus, it is necessary to bias correct SPPs by component, that is, separately for ‘hits’, ‘falseAlarms’, and ‘misses’ events. To date, numerous methods have been applied in previous research to bias correct SPPs, and they can be classified into three groups: interpolation methods, physical models, and statistical methods [29]. However, the methods in all three groups only focused on bias correction of ‘hits’ events because they cannot distinguish different bias components [31].
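The event definitions above can be written as a simple rule on paired satellite and gauge values. The following is an illustrative sketch only: the function names, the 0 mm/d rain/no-rain threshold, and the sign convention (bias taken as satellite minus gauge, so the ‘misses’ component is negative) are assumptions made here, not details taken from the cited studies.

```python
def classify_event(spp, gauge, threshold=0.0):
    """Label one satellite/gauge pair; threshold (mm/d) is an assumed rain/no-rain cutoff."""
    spp_wet = spp > threshold
    gauge_wet = gauge > threshold
    if spp_wet and gauge_wet:
        return "hits"
    if spp_wet:
        return "falseAlarms"
    if gauge_wet:
        return "misses"
    return "correctNegatives"  # neither source reports precipitation


def decompose_bias(pairs, threshold=0.0):
    """Sum (satellite - gauge) separately for each of the three event types."""
    parts = {"hits": 0.0, "falseAlarms": 0.0, "misses": 0.0}
    for spp, gauge in pairs:
        event = classify_event(spp, gauge, threshold)
        if event in parts:
            parts[event] += spp - gauge
    return parts


# Hypothetical daily pairs (satellite, gauge) in mm/d.
pairs = [(5.0, 3.0), (1.0, 0.0), (0.0, 2.0), (0.0, 0.0)]
parts = decompose_bias(pairs)
total = sum(spp - gauge for spp, gauge in pairs)
```

Under this convention the three components add up to the total bias, because pairs in which neither source reports precipitation contribute zero.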

To bias correct SPPs by bias component, an effective solution is to first distinguish the different bias components and then correct them separately. Distinguishing the bias components is a typical classification problem in machine learning, while correcting the bias of each event type is a regression problem. To solve both problems in one unified framework, random forest (RF) is employed, since it is a flexible and powerful ensemble method based on decision trees for both classification and regression, and it has been widely used in hydrological applications [32,33,34]. Moreover, as an ensemble method, RF can be easily parallelized to reduce the time required for large data sets [32,35]. The RF method makes no assumptions about the data distribution; thus, it is suitable for datasets with different distributions [36]. In addition, the RF method can provide the relative importance of different environmental factors, which quantifies their impact on the performance of SPPs [32]. For these reasons, the RF method was applied for bias correction in this study.

Therefore, this study aims to bias correct the GPM IMERG V06 product based on different events over mainland China. The gauge observations were collected at 13,691 stations from 25 August 2015 to 31 May 2019. The GPM IMERG V06 product was first evaluated in eight climatic and topographic subregions over mainland China; event-based bias correction of the GPM product was then conducted through a bias correction strategy implemented with the random forest method; finally, the suitability and uncertainty of the bias correction strategy were discussed. This study provides a basis for applications of the GPM product and insights for future improvements.

2. Bias Correction Strategy

The GPM product was bias corrected based on different events in this study. As all the GPM data are 0 for ‘misses’ events, only ‘hits’ events and ‘falseAlarms’ events (GPM data higher than 0) were bias corrected. As shown in Figure 1, bias correction strategy in this study includes two steps, (1) event identification and (2) error correction for identified ‘hits’ events. Due to their significant impacts on performance of the GPM product, digital elevation model (DEM) data, air temperature, and topographical and climatic sub-regions at each grid were selected as environmental factors for bias correction in this study [13,18,23,37,38,39,40].

Both steps were conducted with the RF method. All the data were randomly divided into a training set and a validation set at a ratio of 4:1, that is, 80% of the data were used for training and the other 20% for validation, which balances effectiveness and time consumption. In the first step, an RF classification model was established to identify ‘hits’ events and ‘falseAlarms’ events based on the raw (original) GPM data and environmental factors. The input was a vector combining the raw GPM data, air temperature, subregion, DEM, and date. The output was a binary value: 0 for ‘hits’ events and 1 for ‘falseAlarms’ events. Grid search and 5-fold cross-validation were then applied to the training set to select the hyperparameters that make the model perform best. K-fold cross-validation is a popular validation method in machine learning for large data sets. K is often set to 5 or 10, which means that all data are randomly divided into 5 or 10 equal parts. For example, in 10-fold cross-validation, 9 parts are used for training and 1 part for testing in a single testing loop, and each part takes a turn as the testing data exactly once. K-fold cross-validation reduces the fluctuation caused by the chance composition of any single split. Hyperparameters of a random forest include the number of decision trees, the maximum depth of each tree, the split criterion, and so on; together they define a specific random forest model for a given application. According to the outcome of the 5-fold cross-validation, the random forest in the first step has 70 base estimators, each with a maximum depth of 25, and each leaf must contain at least 70 samples. When a node is split, two features are considered when looking for the best split, and the Gini index is used to measure the quality of the split. The trained random forest model was then used to distinguish ‘hits’ and ‘falseAlarms’ events in the validation set, which was kept out of training to avoid data leakage. All ‘hits’ event samples were given weights for better feature learning. Finally, through the established RF classification model, each event with raw GPM data higher than 0 was identified as either a ‘falseAlarms’ event or a ‘hits’ event. The bias-corrected GPM data of all identified ‘falseAlarms’ events were then set to 0, and the GPM data of identified ‘hits’ events were further bias corrected in the second step.
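The first step can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the authors’ code: the use of scikit-learn, the synthetic feature values, the day-of-year encoding of the date, and the small search grid are all assumptions. Only the 4:1 split, the 5-fold cross-validation, and the reported hyperparameter values (70 trees, depth 25, at least 70 samples per leaf, two features per split, Gini criterion) come from the text. The weighting of ‘hits’ samples is not shown because the weights are not reported.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-ins for the five inputs: raw GPM, air temperature, subregion, DEM, date.
gpm = rng.exponential(5.0, n) + 0.01          # only events with raw GPM > 0
temp = rng.uniform(-10.0, 30.0, n)
subregion = rng.integers(1, 9, n)             # eight subregions coded 1..8
dem = rng.uniform(0.0, 5000.0, n)
day = rng.integers(1, 366, n)                 # date as day of year (assumption)
X = np.column_stack([gpm, temp, subregion, dem, day])
# Synthetic labels: 0 = 'hits', 1 = 'falseAlarms' (made more likely when GPM is small).
y = (rng.random(n) < np.exp(-gpm / 2.0)).astype(int)

# 4:1 split into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

base = RandomForestClassifier(min_samples_leaf=70, max_features=2,
                              criterion="gini", random_state=0)
grid = {"n_estimators": [30, 70], "max_depth": [10, 25]}  # illustrative grid only
search = GridSearchCV(base, grid, cv=5)       # 5-fold CV on the training set only
search.fit(X_train, y_train)

pred = search.best_estimator_.predict(X_val)
# Identified 'falseAlarms' are set to 0; identified 'hits' keep their raw value for step two.
after_step1 = np.where(pred == 1, 0.0, X_val[:, 0])
```

Running the search on the training set alone keeps the validation set untouched until the final prediction, which is the separation the paragraph above describes.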

In the second step, an RF regression model was established for ‘hits’ events only, using the same input variables as in the first step. The output was the bias of the raw GPM data, defined as the difference between the raw GPM data and the corresponding gauge observations. The model training and hyperparameter optimization procedure is the same as in the first step. According to the outcome of the 5-fold cross-validation, the random forest in the second step has 70 base estimators, each with a maximum depth of 78, and each leaf must contain at least 87 samples. When a node is split, two features are considered when looking for the best split, and the Gini index is used to measure the quality of the split. Through the established RF regression model, the biases of the raw GPM data for all identified ‘hits’ events were determined, and the bias-corrected GPM data were then obtained by subtracting the bias values from the raw GPM data.
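The arithmetic of the two steps can be summarized in a few lines. In this sketch the function names and the numeric values are hypothetical; the definitions (bias as raw GPM minus gauge, identified ‘falseAlarms’ set to 0, corrected value as raw GPM minus predicted bias) follow the description above. The text does not say whether negative corrected values are clipped, so no clipping is applied here.

```python
def bias(raw_gpm, gauge):
    """Training target of the regression step: raw GPM minus gauge observation."""
    return raw_gpm - gauge


def correct_event(raw_gpm, is_false_alarm, predicted_bias):
    """Apply both steps to one event with raw GPM > 0."""
    if is_false_alarm:
        return 0.0                      # step one: identified 'falseAlarms' are set to 0
    return raw_gpm - predicted_bias     # step two: subtract the predicted bias


# Hypothetical values: raw GPM 8.0 mm/d, gauge 6.5 mm/d, model predicts a bias of 1.2 mm/d.
target = bias(8.0, 6.5)
corrected_hit = correct_event(8.0, False, 1.2)
corrected_false_alarm = correct_event(0.4, True, 0.1)
```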

3. Materials and Methods

3.1. Study Area

Mainland China is located between 73°~135°E and 18°~53°N and has complex topography and strong precipitation variability (Figure 2). It can be divided into eight subregions according to topography and climatic conditions, a division that has been widely applied in previous studies to comprehensively investigate the error characteristics of GPM IMERG products [23,37,38,39,41]. The eight subregions are (1) inland Xinjiang (XJ), (2) the Qinghai-Tibetan Plateau (TP), (3) Northwestern China (NW), (4) Northeastern China (NE), (5) Northern China (NC), (6) the Changjiang (CJ) River Plain, (7) Southwestern China (SW), and (8) Southeastern China (SE) (Figure 2a).

In detail, the XJ region, located in central Asia, is dominated by an arid or semiarid climate with little precipitation throughout the year. There are three mountain ranges (the Altay, Tianshan, and Kunlun Mountains) and two basins (the Junggar and Tarim Basins) in this area [42]. Known as the “Third Pole”, the TP region is dominated by a plateau mountain climate, with high altitude, complex topography and significant precipitation variability [43]. The NW region is dominated by a dry climate and bounded by the 400 mm annual precipitation isohyet. The NE region is located north of the Yan Mountains at high latitudes. The NC region is located north of the Qinling Mountains and the Huai River. The climate of NW, NE and NC is primarily temperate continental, with a hot and wet summer and a cold and dry winter [37]. The CJ region is located in the middle and lower reaches of the Yangtze River and is dominated by hills, low mountains, and plains [44]. The SW region comprises the Sichuan Basin and the Yungui Plateau, bounded by the Dabashan Mountains to the north and the Wulingshan Mountains to the east. The SE region is bounded by the Nanling Mountains to the north and the Wuyishan Mountains to the northwest. The most distinctive climatic feature of this subregion is the East Asian monsoon [45]. These eight subregions were also applied in this study. Their details are shown in Table 1 and Table 2.

3.2. Gauge Observations

Generally, in situ gauge precipitation observations are considered the standard data for validation of satellite-based precipitation products [46]. In this study, we obtained daily precipitation data from 13,691 stations archived by the Chinese Meteorological Administration (CMA) from 25 August 2015 to 31 May 2019 over mainland China (Figure 2b), which were compiled by the National Meteorological Information Center (NMIC) of CMA. All of the in situ observations have been quality controlled by CMA, including examining extreme values, conducting internal consistency checks, and removing questionable data [47,48,49].

3.3. GPM Product

The GPM IMERG product provides three kinds of products: the near-real-time ‘early’ and ‘late’ run products, and the post-real-time ‘final’ run product. Compared to the ‘early’ and ‘late’ runs, the ‘final’ run of IMERG V06 shows improved performance in precipitation estimation [24]. Therefore, the GPM IMERG V06 ‘final’ run product (hereafter ‘the GPM product’) was used for bias correction in this study. The product has a daily interval and a spatial resolution of 0.1° × 0.1°, and covers 25 August 2015 to 31 May 2019 over mainland China.

3.4. Evaluation Indices

Interpolation methods are affected by their calculation principles and by the study area, which introduces uncertainty and errors into interpolation results [50]. Therefore, to preserve the accuracy of the gauge data, the GPM product was compared directly with gauge observations rather than with interpolated fields in this study. Specifically, for each grid cell of the GPM data, the observations of all gauge stations located in that cell were averaged and then compared with the GPM data.
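The cell-averaging step described above could look like the following sketch. The function names and station values are hypothetical, and the assumption that cell edges fall on multiples of 0.1° is made here for illustration rather than stated in the text.

```python
import math
from collections import defaultdict


def cell_index(lat, lon, res=0.1):
    """Row/column of the grid cell containing a point (cell edges assumed at multiples of res)."""
    return (math.floor(lat / res), math.floor(lon / res))


def average_gauges_per_cell(stations, res=0.1):
    """stations: iterable of (lat, lon, precip_mm). Returns {cell: mean precipitation}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, precip in stations:
        cell = cell_index(lat, lon, res)
        sums[cell] += precip
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Two hypothetical stations share one 0.1-degree cell; a third lies in the neighbouring cell.
stations = [(35.34, 103.81, 2.0), (35.37, 103.86, 4.0), (35.42, 103.81, 1.0)]
cell_means = average_gauges_per_cell(stations)
```

Each cell mean is then paired with the GPM value of the same cell and day, so no gauge value is ever interpolated to a location without a station.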

Three statistical indices were used to measure the accuracy of both the raw and bias-corrected GPM products in fitting the gauge observations at a daily step [28,51,52]: the correlation coefficient (R), the root-mean-square error (RMSE), and the mean absolute error (MAE). The higher the R value and the lower the RMSE and MAE values, the better the GPM estimates fit the observations.
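For reference, the usual definitions of these three indices are given below. The notation is introduced here and is not taken from the text: S_i is the GPM estimate and G_i the gauge observation on day i, n is the number of days, and bars denote means. R is assumed to be the Pearson correlation coefficient.

```latex
R = \frac{\sum_{i=1}^{n}(G_i-\bar{G})(S_i-\bar{S})}
         {\sqrt{\sum_{i=1}^{n}(G_i-\bar{G})^{2}}\,\sqrt{\sum_{i=1}^{n}(S_i-\bar{S})^{2}}},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(S_i-G_i)^{2}},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert S_i-G_i\rvert
```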

The different bias components of both the raw and bias-corrected GPM products were evaluated by three contingency indices, probability of detection (POD), false alarm ratio (FAR) and critical success index (CSI), which are commonly used in related studies [28,53,54]. POD measures the fraction of observed events that were correctly diagnosed and is also called the ‘hits’ rate. FAR provides the fraction of diagnosed events that were actually ‘falseAlarms’. CSI gives the overall fraction of events correctly diagnosed by each dataset [28]. All three contingency statistics range from 0 to 1, with perfection represented by a POD of 1, a FAR of 0 and a CSI of 1.
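A short sketch of the three contingency indices computed from event counts is given below. The function name and the counts are hypothetical; the formulas are the standard definitions and are assumed to match those used in the study, which gives only the POD ratio explicitly.

```python
def contingency_indices(hits, false_alarms, misses):
    """Return (POD, FAR, CSI) from event counts; assumes the denominators are non-zero."""
    pod = hits / (hits + misses)                  # observed events that were detected
    far = false_alarms / (hits + false_alarms)    # detected events that were not observed
    csi = hits / (hits + false_alarms + misses)   # correct detections among all events
    return pod, far, csi


# Hypothetical counts for one station.
pod, far, csi = contingency_indices(hits=60, false_alarms=30, misses=10)
```

Setting identified ‘falseAlarms’ to 0 lowers the false alarm count, which reduces FAR and raises CSI, while any ‘hits’ wrongly set to 0 become ‘misses’ and lower POD.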

4. Results

4.1. Evaluation of the Raw GPM IMERG V06 Product

As shown in Figure 3, the R values ranged in −0.002~0.903 and were concentrated in 0.200~0.800; they were high in eastern China (NE, NC, CJ, and SE) and low in western China (XJ and TP). The RMSE values ranged in 0.298~18.822 mm/d and were concentrated in 0.298~12.000 mm/d, while the MAE values ranged in 0.057~5.790 mm/d and were concentrated in 0.057~5.000 mm/d. Both the RMSE and MAE values were high in southeast China (CJ and SE) and low in northwest China (XJ, TP, and NW). Overall, evaluated by the statistical indices, the GPM product performed best in the NE and NC subregions, with the highest R values and lower RMSE and MAE values, followed by the NW and SW subregions. It performed worst in the XJ and TP subregions, which had the lowest R values together with the lowest RMSE and MAE values, and in the CJ and SE subregions, which had the highest R values together with the highest RMSE and MAE values.

The POD values ranged in 0.348~1.000, were concentrated in 0.500~1.000, and were high except in the SW subregion and at some stations in the XJ and TP subregions (Figure 3d). As POD is the ratio of ‘hits’ to ‘hits’ + ‘misses’, the high POD values indicate few ‘misses’ events in the GPM product. The FAR values ranged in 0.056~0.990, with high values in northern China and low values in southern China (Figure 3e), indicating that many ‘falseAlarms’ events occurred in northern China. The CSI values ranged in 0.010~0.764 and were concentrated in 0~0.700, with low values in northern China and high values in southern China (Figure 3f), indicating that many ‘hits’ events occurred in southern China. Evaluated by the contingency indices, the GPM product performed best in the SW, CJ, and SE subregions, with the lowest FAR values and highest CSI values, then in the NW, NE, TP, and NC subregions, with lower FAR values and higher CSI values, and worst in the XJ subregion, with the highest FAR values and lowest CSI values.

The total error was broken down into three components of ‘hits’, ‘falseAlarms’, and ‘misses’ errors, and error decomposition over different regions can offer a more detailed analysis [23,30]. Thus, the errors for three events were evaluated in eight subregions. As shown in Figure 4a, ‘falseAlarms’ event numbers are significantly higher than ‘hits’ event numbers in XJ, TP, NW, NE, and NC subregions, while ‘hits’ event numbers are significantly higher than ‘falseAlarms’ event numbers in CJ, SW, and SE subregions. Both ‘falseAlarms’ and ‘hits’ event numbers are significantly higher than ‘misses’ event numbers in all the eight subregions. The RMSE and MAE values are in a descending order of ‘hits’ events, ‘misses’ events and ‘falseAlarms’ events in all the eight subregions over mainland China (Figure 4b,c). Therefore, the bias caused by ‘hits’ events is most responsible for the errors of the GPM product, and ‘falseAlarms’ events took the least responsibility over mainland China although with high event numbers in northern China.

4.2. Bias Correction of the GPM by Random Forest

4.2.1. Performance Improvement by the First Step of Bias Correction Strategy

To evaluate the first step of bias correction strategy, the differences of each index between data after bias correction of the first step and the raw GPM data are shown in Figure 5. The first step identified ‘hits’ and ‘falseAlarms’ events, and then set the GPM data as 0 for the identified ‘falseAlarms’ events. In the training period, the R values increased with differences ranging in −0.022~0.170 in XJ subregion, increased at several stations in NW, NE, and NC subregions, and showed no-variation at the other stations. The RMSE values decreased with differences ranging in −0.287~0.007 mm/d in XJ subregion, decreased at several stations in NW, NE, and NC subregions, and showed no-variation at the other stations. The differences of MAE values ranged in −0.317~0 mm/d and concentrated in −0.040~0 mm/d over mainland China. The POD values significantly decreased with differences ranging in −0.197~0 in XJ subregion, decreased at several stations in NW, NE, and NC subregions, and showed no-variation at the other stations. However, the FAR values showed significant decreases with differences ranging in −0.320~0.002 and concentrating in −0.200~0, and the CSI values showed significant increases with differences ranging in −0.027~0.281 and concentrating in 0~0.150.

As shown in Figure 5, in the validation period, the R values increased with differences ranging in −0.018~0.184 in XJ subregion, increased at several stations in NW, NE, and NC subregions, and showed no-variation at the other stations. The RMSE decreased with differences ranging in −0.595~0.044 mm/d in XJ subregion, decreased at several stations in NW, NE, and NC subregions, and showed no-variation at the other stations. The differences of MAE values ranged in −0.358~0.005 mm/d and concentrated in −0.040~0 mm/d over mainland China. The POD values significantly decreased with differences ranging in −0.500~0 in XJ subregion, decreased at several stations in NW, NE, and NC subregions, and showed no-variation at the other stations. However, the FAR values showed significant decreases with differences ranging in −0.407~0.167 and concentrating in −0.200~0, and the CSI values showed significant increases with differences ranging in −0.054~0.366 and concentrating in 0~0.150.

In summary, evaluated by statistics indices, the first step of bias correction strategy only significantly improved the performance of the GPM product in XJ subregion in both training and validation periods. However, with significant decrease of FAR values and significant increase of CSI values, the first step of bias correction strategy significantly improved the ability of the GPM product in detecting precipitation events.

4.2.2. Performance Improvement by the Second Step of Bias Correction Strategy

To evaluate the second step of the bias correction strategy, the differences of each index between the data after the second step and the data after the first step are shown in Figure 6. Since the second step only corrects the biases of identified ‘hits’ events, the contingency indices do not change; thus, only the statistical indices were used in the comparison. In the training period, the R values generally increased with differences ranging in −0.201~0.620 and concentrating in −0.050~0.200. The RMSE values generally decreased with differences ranging in −4.003~1.380 mm/d and concentrating in −3.000~0.500 mm/d. The differences of MAE values ranged in −0.904~1.194 mm/d and concentrated in −0.600~0.600 mm/d. In the validation period, the R values generally increased with differences ranging in −0.456~0.481 and concentrating in −0.100~0.200. The RMSE values generally decreased with differences ranging in −5.880~2.859 mm/d and concentrating in −3.000~1.000 mm/d. The differences of MAE values ranged in −1.277~1.608 mm/d and concentrated in −0.600~0.800 mm/d.

Overall, with increased R values and decreased RMSE values in both training and validation periods, the bias-corrected GPM product after the second step showed better performance in fitting observed precipitation values than the bias-corrected GPM product after the first step. Moreover, there are no obvious regional differences of the improvements in statistics indices by the second step.

4.2.3. Overall Performance Improvement after Bias Correction

The differences of each index between the GPM data after overall bias correction and the raw GPM data are shown in Figure 7. The R values increased with differences ranging in −0.340~0.620 and concentrating in 0~0.200. The RMSE values decreased with differences ranging in −3.838~1.425 mm/d and concentrating in −2.000~0 mm/d. The differences of MAE values ranged in −0.817~1.171 mm/d and concentrated in −0.600~0.600 mm/d. The POD values showed slight decreases with differences ranging in −0.200~0 and concentrating in −0.040~0. The FAR values showed significant decreases with differences ranging in −0.308~0 and concentrating in −0.250~0, and the CSI values showed significant increases with differences ranging in −0.030~0.271 and concentrating in 0~0.150. Overall, with increased R values and decreased RMSE values, the bias-corrected GPM product fitted observed precipitation values better than the raw GPM product in this study. Meanwhile, with slightly decreased POD values, significantly decreased FAR values and significantly increased CSI values, the bias-corrected GPM product also performed better in detecting precipitation events than the raw GPM product.

Since the bias correction strategy in this study corrected the raw GPM product based on ‘hits’ events and ‘falseAlarms’ events, the performance improvements were evaluated separately for these two event types by the differences of RMSE and MAE values between the GPM data after overall bias correction and the raw GPM data. As shown in Figure 8, both the RMSE and MAE values significantly decreased after bias correction for ‘hits’ events, but slightly increased after bias correction for ‘falseAlarms’ events. Moreover, in southern China (CJ, SW, and SE), the decreases of RMSE and MAE values for ‘hits’ events are larger and the increases of the two indices for ‘falseAlarms’ events are smaller than in the other regions.

5. Discussion

5.1. Reliability of the Evaluation in This Study

Evaluated by statistics indices, the GPM IMERG V06 product performed best in NE and NC subregions, then followed by NW subregion, SW subregion, while worst in XJ and TP subregions with lowest R, RMSE and MAE values and in CJ and SE subregions with highest R, RMSE and MAE values. It is consistent with conclusions in previous studies that both R and RMSE values were high in southeastern China and low in northwestern China [55]. Evaluated by contingency indices, the GPM IMERG V06 product performed best in SW, CJ, and SE subregions, then in NW, NE, TP, and NC subregions, and finally in XJ subregion, which is also consistent with the spatial patterns in previous studies [38,55].

In the previous evaluations against gauge observations at daily scale, the GPM IMERG product showed R values of 0.50~0.98, RMSE values of 0.41~9.5 mm/d, POD values of 0.5~0.9, FAR values of 0.2~1, and CSI values of 0~0.51 [37,38,39,40,50], which showed smaller ranges than the evaluation results in this study. In addition to different GPM versions in the evaluations, the differences are mainly because the evaluation in this study was conducted at each station instead of evaluations by all the data over mainland China or in subregions in most previous studies. Moreover, there were observations from 13,691 stations used in this study, which is much larger than previous studies. Thus, the evaluation results in this study are more comprehensive and reliable.

5.2. The Impacts of Errors in Detecting Light Precipitation of the GPM Product

As shown in Figure 9, the GPM data for ‘falseAlarms’ events are significantly lower than those for ‘hits’ events, and also lower than those for ‘misses’ events. Moreover, for ‘falseAlarms’ events, 96.4% of the GPM data fall in 0~5 mm/d and 88.4% fall in 0~2 mm/d over mainland China, which is extremely light precipitation. Because the GPM data for ‘falseAlarms’ events are so low, the errors of ‘hits’ events are most responsible for the errors of the GPM product, and ‘falseAlarms’ events are least responsible over mainland China despite their high numbers in northern China. The low GPM data for ‘falseAlarms’ events also explain why, after the first step, the bias-corrected GPM data showed significant improvement in detecting precipitation events as measured by FAR and CSI, but significant improvement in the statistical indices only in the XJ subregion. Light precipitation is critical to the Earth’s ecosystems because of its high occurrence rate and vital role in maintaining soil moisture [23,56,57]. Therefore, bias correction of ‘falseAlarms’ events in the first step of this study is important for related research, even though it does not significantly improve the statistical indices.

Additionally, Li et al. also pointed out the limited capability of the GPM IMERG V06 product to detect light precipitation under 5 mm/d, and significant deviations between the GPM product and gauge observations for light precipitation under 2 mm/d [23]. As their study used observations from 2350 gauge stations, they attributed the errors in detecting light precipitation events to the scale mismatch between satellites and gauges in the evaluation, in addition to topography and climate. However, the same evaluation results were drawn from observations at 13,691 stations in this study; thus, apart from topography and climate, scale mismatch is not the main reason for the significant errors in detecting light precipitation. The GPM product is retrieved from the water vapor in the atmospheric column, whereas the gauge observation is the precipitation that actually reaches the ground; this difference would lead to errors in the GPM data, especially for light precipitation. Therefore, it is necessary to analyze the error sources of the GPM product in the future, especially for light precipitation.

5.3. Suitability and Uncertainty of the Bias Correction Strategy

Compared with the raw GPM product, the bias-corrected GPM product fitted observed precipitation values better, with increased R values and decreased RMSE values, and performed better in detecting precipitation events, with slightly decreased POD values, significantly decreased FAR values and significantly increased CSI values. Therefore, the bias correction strategy in this study is credible. As shown in Figure 10, for the identification of ‘falseAlarms’ and ‘hits’ events, the raw GPM data are the most influential factor, followed by date, air temperature, subregion and finally DEM over mainland China. Meanwhile, for the bias of the ‘hits’ events, the raw GPM data are the most influential factor, followed by date, air temperature, DEM and finally subregion. Thus, the raw GPM data, date, and air temperature are important factors for both event identification and bias correction of ‘hits’ events, and should be given attention in future bias corrections. As the GPM product provides both rainfall and snowfall values [13,18], the strong influence of date and air temperature arises mainly because they are auxiliary indicators for distinguishing rainfall from snowfall. This indicates that precipitation type should be considered in bias corrections of the GPM product.
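A factor ranking of this kind can be obtained from a fitted random forest. The sketch below uses scikit-learn on synthetic data and is an assumption throughout: the text does not say how the importances in Figure 10 were computed, and the impurity-based `feature_importances_` attribute is only one common choice. The feature names and the synthetic target are hypothetical. Because scikit-learn’s `RandomForestRegressor` does not offer a Gini criterion, its default split criterion is used here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 3000
names = ["raw_gpm", "date", "air_temperature", "dem", "subregion"]
gpm = rng.exponential(5.0, n)
day = rng.integers(1, 366, n).astype(float)       # date as day of year (assumption)
temp = rng.uniform(-10.0, 30.0, n)
dem = rng.uniform(0.0, 5000.0, n)
subregion = rng.integers(1, 9, n).astype(float)
X = np.column_stack([gpm, day, temp, dem, subregion])
# Synthetic bias that depends mostly on raw GPM and weakly on temperature.
y = 0.6 * gpm + 0.05 * temp + rng.normal(0.0, 0.3, n)

# Hyperparameter values reported for the second step; the criterion is left at the default.
model = RandomForestRegressor(n_estimators=70, max_depth=78, min_samples_leaf=87,
                              max_features=2, random_state=0)
model.fit(X, y)

importances = model.feature_importances_          # impurity-based, normalized to sum to 1
ranking = [names[i] for i in np.argsort(importances)[::-1]]
```

In this synthetic setup the raw GPM value dominates the ranking by construction; with real data the order reflects how much each factor reduces prediction error across the trees.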

The ability of the bias-corrected GPM product was significantly improved for ‘hits’ events after bias correction (Figure 8). However, despite significant improvements in FAR and CSI, the performance of the GPM product showed slight deterioration in RMSE and MAE for ‘falseAlarms’ events after overall bias correction (Figure 8). As shown in Figure 11, both the RMSE and MAE values decreased for correctly identified ‘falseAlarms’ events, but increased for falsely identified ‘falseAlarms’ events, which would be bias corrected as ‘hits’ events in the second step. Therefore, the slight deterioration in RMSE and MAE for ‘falseAlarms’ events is caused by the limited accuracy of the RF classification model in identifying ‘falseAlarms’ events in the first step of the bias correction strategy.
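The mechanism described above can be reproduced with a few synthetic numbers. The sketch below is a minimal illustration, not the models used in this study: it assumes that step 1 sets an estimate to zero when the classifier flags it as a ‘falseAlarms’ event, and it replaces the step-2 RF regression with a hypothetical stand-in (`hypothetical_hits_regressor`, with an invented factor of 1.5). All values and names are assumptions.

```python
import math


def rmse(estimates, observations):
    """Root mean square error of paired values."""
    n = len(estimates)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimates, observations)) / n)


def mae(estimates, observations):
    """Mean absolute error of paired values."""
    return sum(abs(e - o) for e, o in zip(estimates, observations)) / len(estimates)


def hypothetical_hits_regressor(gpm_value):
    """Stand-in for the step-2 regression model; the factor 1.5 is invented."""
    return 1.5 * gpm_value


def two_step_correct(raw_values, flagged_as_false_alarm, regressor):
    """Assumed scheme: step 1 zeroes flagged estimates, step 2 adjusts the rest as 'hits'."""
    return [0.0 if flagged else regressor(value)
            for value, flagged in zip(raw_values, flagged_as_false_alarm)]


def subset(values, keep):
    """Select the values whose mask entry is True."""
    return [v for v, k in zip(values, keep) if k]


# Five true 'falseAlarms' events: the gauge recorded no precipitation.
gauge = [0.0] * 5
raw_gpm = [0.3, 0.5, 0.2, 2.0, 1.0]          # mm/d, synthetic
flagged = [True, True, True, False, False]   # hypothetical step-1 classifier output
missed = [not f for f in flagged]            # false alarms passed on as 'hits'

corrected = two_step_correct(raw_gpm, flagged, hypothetical_hits_regressor)

results = {}
for name, mask in [("correctly identified", flagged),
                   ("falsely identified", missed),
                   ("all falseAlarms", [True] * 5)]:
    obs = subset(gauge, mask)
    results[name] = {
        "rmse_raw": rmse(subset(raw_gpm, mask), obs),
        "rmse_corrected": rmse(subset(corrected, mask), obs),
        "mae_raw": mae(subset(raw_gpm, mask), obs),
        "mae_corrected": mae(subset(corrected, mask), obs),
    }
    print(name, results[name])
```

With these invented inputs, the errors vanish for the correctly identified events but grow for the events passed on to step 2, and the second effect dominates the overall RMSE and MAE for ‘falseAlarms’ events.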

Moreover, as shown in Figure 12, the maximum proportions of falsely identified ‘hits’ events in the first step are 32.8%, 11.4%, and 23.1% in the XJ, TP, and NW subregions, respectively, which leads to a slight decrease in POD values in these areas. Due to the higher proportions of correctly identified ‘hits’ events and lower proportions of correctly identified ‘falseAlarms’ events in the first step in southern China (CJ, SW, and SE) (Figure 12), the decreases of RMSE and MAE values for ‘hits’ events are higher and the increases of the two indices for ‘falseAlarms’ events are lower in southern China than in the other regions. The proposed classification model tends to learn the characteristics of the event type with the larger proportion. Due to the data imbalance between ‘hits’ and ‘falseAlarms’ events in different subregions, the RF classification model performed better in correctly identifying ‘hits’ events in southern China (CJ, SW, and SE), where ‘hits’ events have larger proportions (Figure 4a), and performed better in correctly identifying ‘falseAlarms’ events in the other subregions, where ‘falseAlarms’ events have larger proportions (Figure 4a).
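The proportions of correctly identified events referred to above are per-class shares: for each true event type, the fraction of events that the classifier labels correctly. A minimal sketch of that calculation follows; the function name, the two example "subregions", and every label are hypothetical and serve only to show how a majority-leaning classifier produces the asymmetric pattern described.

```python
from collections import Counter


def correct_proportions(true_labels, predicted_labels):
    """Per-class share of events whose predicted label matches the true label."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical subregion dominated by 'hits' events (8 hits, 2 falseAlarms).
south_true = ["hits"] * 8 + ["falseAlarms"] * 2
south_pred = ["hits"] * 8 + ["falseAlarms", "hits"]

# Hypothetical subregion dominated by 'falseAlarms' events (3 hits, 7 falseAlarms).
north_true = ["hits"] * 3 + ["falseAlarms"] * 7
north_pred = ["hits", "hits", "falseAlarms"] + ["falseAlarms"] * 7

south = correct_proportions(south_true, south_pred)
north = correct_proportions(north_true, north_pred)
print("hits-dominated subregion:       ", south)
print("falseAlarms-dominated subregion:", north)
```

In both invented cases the majority class is identified more reliably than the minority class, which is why reporting the two proportions separately, rather than a single overall accuracy, makes the effect of data imbalance visible.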

6. Conclusions

This study aims to bias correct the GPM IMERG V06 product based on ‘hits’, ‘falseAlarms’, and ‘misses’ events over mainland China. The gauge observations were collected at 13,691 stations from 25 August 2015 to 31 May 2019. The GPM IMERG V06 product was first evaluated in eight climatic and topographic subregions over mainland China, and then an event-based bias correction strategy for the GPM product was implemented with the random forest method. The main conclusions are as follows.

Evaluated at each station over mainland China, the R values of the GPM IMERG V06 product ranged within −0.002~0.903 and were concentrated within 0.200~0.800, the RMSE values ranged within 0.298~18.822 mm/d and were concentrated within 0.298~12.000 mm/d, and the MAE values ranged within 0.057~5.790 mm/d and were concentrated within 0.057~5.000 mm/d. Evaluated by these three statistical indices, the GPM IMERG V06 product performed best in the NE and NC subregions, followed by the NW and SW subregions, and worst in the XJ and TP subregions.

The POD values ranged within 0.348~1 and were concentrated within 0.5~1, the FAR values ranged within 0.056~0.990, and the CSI values ranged within 0.010~0.764 and were concentrated within 0~0.700. Evaluated by the contingency indices, the GPM IMERG V06 product performed best in the SW, CJ, and SE subregions, then in the NW, NE, TP, and NC subregions, and finally in the XJ subregion.

Biases caused by ‘hits’ events are most responsible for the errors of the GPM product. Over mainland China, 96.4% of the GPM data for ‘falseAlarms’ events fall within 0~5 mm/d and 88.4% fall within 0~2 mm/d. Because of these low GPM values, ‘falseAlarms’ events took the least responsibility for the errors of the GPM product over mainland China, despite their high event numbers in northern China. The low GPM values for ‘falseAlarms’ events also explain why the bias-corrected GPM data after the first step showed a significantly improved ability to detect precipitation events, as evaluated by the FAR and CSI values, but showed significant improvements in the statistical indices only in the XJ subregion. The GPM product is retrieved by inversion from water vapor in the air column, whereas the gauge observation is the precipitation that actually falls on the ground; this difference would lead to errors in the GPM data and thus to the high number of ‘falseAlarms’ events in detecting light precipitation. It is necessary to analyze the error sources of the GPM product in future investigations, especially for light precipitation.

Compared with the raw GPM product, the bias-corrected GPM product fitted observed precipitation values better, with increased R values and decreased RMSE values, and detected precipitation events better, with slightly decreased POD values but significantly decreased FAR values and significantly increased CSI values. Therefore, the bias correction strategy in this study is credible. The raw GPM data, date, and air temperature are important factors for both event identification and bias correction of ‘hits’ events, and they deserve attention in future bias corrections.

After overall bias correction, the ability of the bias-corrected GPM product was significantly improved for ‘hits’ events, but showed slight deterioration in RMSE and MAE for ‘falseAlarms’ events, despite significant improvements in FAR and CSI. The slight deterioration is mainly caused by the limited accuracy of the RF classification model in identifying ‘falseAlarms’ events. Due to the data imbalance between ‘hits’ and ‘falseAlarms’ events in different subregions, the established RF classification model tends to learn the characteristics of the event type with the larger proportion, and therefore performs better in correctly identifying that event type in each subregion. Consequently, the RF classification model performed better in correctly identifying ‘hits’ events in southern China (CJ, SW, and SE), and performed better in correctly identifying ‘falseAlarms’ events in the XJ, TP, NW, NE, and NC subregions. It is necessary to improve the handling of imbalanced data by the RF method in the future.

The above results provide an essential basis for applications of the GPM product and insights for its future improvement.

Author Contributions

Conceptualization, Z.L., L.Z. and B.H.; data processing and analysis, Z.L. and H.H.; writing—original draft preparation, Z.L. and L.Z.; review, editing, and finalization, L.Z. and B.H.; project administration, L.Z. and B.H. All authors have read and agreed to the published version of the manuscript.

Funding

The project is partially funded by the grants from National Natural Science Foundation of China (41877148), the National Key Research and Development Program of China (2019YFA0706200), the National Natural Science Foundation of China (61632014, 61627808, 61802159, 61802158), and Fundamental Research Funds for Central Universities (lzujbky-2019-26, lzujbky-2021-kb26).

Data Availability Statement

The Global Precipitation Measurement (GPM) IMERG V06 data used in this study are available at https://gpm.nasa.gov/directory (accessed on 26 April 2022). The gauge observations were provided by the Chinese Meteorological Administration. Direct requests for the material may be made to the provider.

Acknowledgments

We are deeply grateful to the anonymous reviewers for their constructive comments and responsible work.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Michaelides, S.; Levizzani, V.; Anagnostou, E.; Bauer, P.; Kasparis, T.; Lane, J.E. Precipitation: Measurement, remote sensing, climatology and modeling. Atmos. Res. 2009, 94, 512–533.
2. Kidd, C.; Huffman, G. Global precipitation measurement. Meteorol. Appl. 2011, 18, 334–353.
3. Sun, Q.; Miao, C.; Duan, Q.; Ashouri, H.; Sorooshian, S.; Hsu, K.-L. A review of global precipitation data sets: Data sources, estimation, and intercomparisons. Rev. Geophys. 2018, 56, 79–107.
4. Turk, F.J.; Rohaly, G.D.; Hawkins, J.; Smith, E.A.; Marzano, F.S.; Mugnai, A.; Levizzani, V. Meteorological applications of precipitation estimation from combined SSM/I, TRMM and infrared geostationary satellite data. In Microwave Radiometry and Remote Sensing of the Earth’s Surface and Atmosphere; Pampaloni, P.P., Paloscia, S., Eds.; VSP International Science Publishers: Leiden, The Netherlands, 2000; pp. 353–363.
5. Joyce, R.J.; Janowiak, J.E.; Arkin, P.A.; Xie, P. CMORPH: A method that produces global precipitation estimates from passive microwave and infrared data at high spatial and temporal resolution. J. Hydrometeorol. 2004, 5, 487–503.
6. Zhang, T.; Yang, Y.; Dong, Z.; Gui, S. A Multiscale Assessment of Three Satellite Precipitation Products (TRMM, CMORPH, and PERSIANN) in the Three Gorges Reservoir Area in China. Adv. Meteorol. 2021, 2021, 9979216.
7. Lei, H.; Zhao, H.; Ao, T. Ground validation and error decomposition for six state-of-the-art satellite precipitation products over mainland China. Atmos. Res. 2022, 269, 106017.
8. Sorooshian, S.; Hsu, K.-L.; Gao, X.; Gupta, H.V.; Imam, B.; Braithwaite, D. Evaluation of PERSIANN System Satellite–Based Estimates of Tropical Rainfall. Bull. Am. Meteorol. Soc. 2000, 81, 2035–2046.
9. Huffman, G.J.; Bolvin, D.T.; Nelkin, E.J.; Wolff, D.B.; Adler, R.F.; Gu, G.; Hong, Y.; Bowman, K.P.; Stocker, E.F. The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-Global, Multiyear, Combined-Sensor Precipitation Estimates at Fine Scales. J. Hydrometeorol. 2007, 8, 38–55.
10. Hou, A.Y.; Kakar, R.K.; Neeck, S.; Azarbarzin, A.A.; Kummerow, C.D.; Kojima, M.; Oki, R.; Nakamura, K.; Iguchi, T. The global precipitation measurement mission. Bull. Am. Meteorol. Soc. 2014, 95, 701–722.
11. Tapiador, F.J.; Turk, F.J.; Petersen, W.; Hou, A.Y.; García-Ortega, E.; Machado, L.A.T.; Angelis, C.F.; Salio, P.; Kidd, C.; Huffman, G.J.; et al. Global precipitation measurement: Methods, datasets and applications. Atmos. Res. 2012, 104–105, 70–97.
12. Yong, B.; Liu, D.; Gourley, J.J.; Tian, Y.; Huffman, G.; Ren, L.; Hong, Y. Global View of Real-Time TRMM Multisatellite Precipitation Analysis: Implications For Its Successor Global Precipitation Measurement Mission. Bull. Am. Meteorol. Soc. 2015, 96, 283–296.
13. Chen, H.; Yong, B.; Gourley, J.J.; Liu, J.; Ren, L.; Wang, W.; Hong, Y.; Zhang, J. Impact of the crucial geographic and climatic factors on the input source errors of GPM-based global satellite precipitation estimates. J. Hydrol. 2019, 575, 1–16.
14. Huffman, G.J.; Bolvin, D.T.; Braithwaite, D.; Hsu, K.; Joyce, R.; Kidd, C.; Nelkin, E.J.; Xie, P. NASA Global Precipitation Measurement (GPM) Integrated Multi-Satellite Retrievals for GPM (IMERG). In Algorithm Theoretical Basis Document (ATBD) Version 4.5; NASA/GSFC: Greenbelt, MD, USA, 2017.
15. Lee, J.; Lee, E.-H.; Seol, K.-H. Validation of Integrated MultisatellitE Retrievals for GPM (IMERG) by using gauge-based analysis products of daily precipitation over East Asia. Theor. Appl. Climatol. 2019, 137, 2497–2512.
16. Sharifi, E.; Steinacker, R.; Saghafian, B. Assessment of GPM-IMERG and Other Precipitation Products against Gauge Data under Different Topographic and Climatic Conditions in Iran: Preliminary Results. Remote Sens. 2016, 8, 135.
17. Tan, M.L.; Santo, H. Comparison of GPM IMERG, TMPA 3B42 and PERSIANN-CDR satellite precipitation products over Malaysia. Atmos. Res. 2018, 202, 63–76.
18. Tang, G.; Clark, M.P.; Papalexiou, S.M.; Ma, Z.; Hong, Y. Have satellite precipitation products improved over last two decades? A comprehensive comparison of GPM IMERG with nine satellite and reanalysis datasets. Remote Sens. Environ. 2020, 240, 111697.
19. Yuan, S.; Zhu, L.; Quiring, S.M. Comparison of Two Multisatellite Algorithms for Estimation of Tropical Cyclone Precipitation in the United States and Mexico: TMPA and IMERG. J. Hydrometeorol. 2021, 22, 923–939.
20. Huffman, G.J.; Bolvin, D.T.; Braithwaite, D.; Hsu, K.; Joyce, R.; Kidd, C.; Nelkin, E.J.; Sorooshian, S.; Tan, J.; Xie, P. NASA Global Precipitation Measurement (GPM) Integrated Multi-satellitE Retrievals for GPM (IMERG). In Algorithm Theoretical Basis Document (ATBD) Version 06; NASA/GSFC: Greenbelt, MD, USA, 2019.
21. Tan, J.; Huffman, G.J.; Bolvin, D.T.; Nelkin, E.J. IMERG V06: Changes to the Morphing Algorithm. J. Atmos. Ocean. Technol. 2019, 36, 2471–2482.
22. Hosseini-Moghari, S.-M.; Tang, Q. Validation of GPM IMERG V05 and V06 Precipitation Products over Iran. J. Hydrometeorol. 2020, 21, 1011–1037.
23. Li, X.; Sungmin, O.; Wang, N.; Liu, L.; Huang, Y. Evaluation of the GPM IMERG V06 products for light rain over Mainland China. Atmos. Res. 2021, 253, 105510.
24. Yu, L.; Leng, G.; Python, A.; Peng, J. A Comprehensive Evaluation of Latest GPM IMERG V06 Early, Late and Final Precipitation Products across China. Remote Sens. 2021, 13, 1208.
25. Essou, G.R.C.; Sabarly, F.; Lucas-Picher, P.; Brissette, F.; Poulin, A. Can Precipitation and Temperature from Meteorological Reanalyses Be Used for Hydrological Modeling? J. Hydrometeorol. 2016, 17, 1929–1950.
26. Beck, H.E.; Vergopolan, N.; Pan, M.; Levizzani, V.; van Dijk, A.I.J.M.; Weedon, G.P.; Brocca, L.; Pappenberger, F.; Huffman, G.J.; Wood, E.F. Global-scale evaluation of 22 precipitation datasets using gauge observations and hydrological modeling. Hydrol. Earth Syst. Sci. 2017, 21, 6201–6217.
27. Chen, J.; Li, C.; Brissette, F.P.; Chen, H.; Wang, M.; Essou, G.R. Impacts of correcting the inter-variable correlation of climate model outputs on hydrological modeling. J. Hydrol. 2018, 560, 326–341.
28. Zhang, L.; He, C.; Tian, W.; Zhu, Y. Evaluation of Precipitation Datasets from TRMM Satellite and Down-scaled Reanalysis Products with Bias-correction in Middle Qilian Mountain, China. Chin. Geogr. Sci. 2021, 31, 474–490.
29. Sun, W.; Sun, Y.; Li, X.; Wang, T.; Wang, Y.; Qiu, Q.; Deng, Z. Evaluation and Correction of GPM IMERG Precipitation Products over the Capital Circle in Northeast China at Multiple Spatiotemporal Scales. Adv. Meteorol. 2018, 2018, 4714173.
30. Tian, Y.; Peters-Lidard, C.D.; Eylander, J.B.; Joyce, R.J.; Huffman, G.J.; Adler, R.F.; Hsu, K.; Turk, F.J.; Garcia, M.; Zeng, J. Component analysis of errors in satellite-based precipitation estimates. J. Geophys. Res. Atmos. 2009, 114, D24101.
31. Chaudhary, S.; Dhanya, C. An improved error decomposition scheme for satellite-based precipitation products. J. Hydrol. 2021, 598, 126434.
32. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
33. Grömping, U. Variable Importance Assessment in Regression: Linear Regression versus Random Forest. Am. Stat. 2009, 63, 308–319.
34. Rahmati, O.; Pourghasemi, H.R.; Melesse, A.M. Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: A case study at Mehran Region, Iran. Catena 2016, 137, 360–372.
35. Briem, G.; Benediktsson, J.; Sveinsson, J. Multiple classifiers applied to multisource remote sensing data. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2291–2299.
36. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
37. Wang, C.; Tang, G.; Han, Z.; Guo, X.; Hong, Y. Global intercomparison and regional evaluation of GPM IMERG Version-03, Version-04 and its latest Version-05 precipitation products: Similarity, difference and improvements. J. Hydrol. 2018, 564, 342–356.
38. Wei, G.; Lü, H.; Crow, W.T.; Zhu, Y.; Wang, J.; Su, J. Comprehensive Evaluation of GPM-IMERG, CMORPH, and TMPA Precipitation Products with Gauged Rainfall over Mainland China. Adv. Meteorol. 2018, 2018, 3024190.
39. Zhao, H.; Yang, B.; Yang, S.; Huang, Y.; Dong, G.; Bai, J.; Wang, Z. Systematical estimation of GPM-based global satellite mapping of precipitation products over China. Atmos. Res. 2018, 201, 206–217.
40. Jiang, L.; Bauer-Gottwein, P. How do GPM IMERG precipitation estimates perform as hydrological model forcing? Evaluation for 300 catchments across Mainland China. J. Hydrol. 2019, 572, 486–500.
41. Chen, F.; Li, X. Evaluation of IMERG and TRMM 3B43 Monthly Precipitation Products over Mainland China. Remote Sens. 2016, 8, 472.
42. Lu, X.; Wei, M.; Tang, G.; Zhang, Y. Evaluation and correction of the TRMM 3B43V7 and GPM 3IMERGM satellite precipitation products by use of ground-based data over Xinjiang, China. Environ. Earth Sci. 2018, 77, 209.
43. Yao, Y.Y.; Zheng, C.; Andrews, C.; Zheng, Y.; Zhang, A.; Liu, J. What controls the partitioning between baseflow and mountain block recharge in the Qinghai-Tibet Plateau? Geophys. Res. Lett. 2017, 44, 8352–8358.
44. Su, X.L.; Shum, C.K.; Luo, Z. Evaluating IMERG V04 Final Run for Monitoring Three Heavy Rain Events Over Mainland China in 2016. IEEE Geosci. Remote Sens. Lett. 2018, 15, 444–448.
45. Chen, W.L.; Jiang, Z.H.; Li, L.; Yiou, P. Simulation of regional climate change under the IPCC A2 scenario in Southeast China. Clim. Dyn. 2011, 36, 491–507.
46. Guo, H.; Chen, S.; Bao, A.; Behrangi, A.; Hong, Y.; Ndayisaba, F.; Hu, J.; Stepanian, P.M. Early assessment of Integrated Multi-satellite Retrievals for Global Precipitation Measurement over China. Atmos. Res. 2016, 176–177, 121–133.
47. Zhai, P.; Zhang, X.; Wan, H.; Pan, X. Trends in Total Precipitation and Frequency of Daily Precipitation Extremes over China. J. Clim. 2005, 18, 1096–1108.
48. Shen, Y.; Xiong, A.; Wang, Y.; Xie, P. Performance of high-resolution satellite precipitation products over China. J. Geophys. Res. Earth Surf. 2010, 115, D02114.
49. Ling, X.; Huang, Y.; Guo, W.; Wang, Y.; Chen, C.; Qiu, B.; Ge, J.; Qin, K.; Xue, Y.; Peng, J. Comprehensive evaluation of satellite-based and reanalysis soil moisture products using in situ observations over China. Hydrol. Earth Syst. Sci. 2021, 25, 4209–4229.
50. Yu, C.; Hu, D.; Liu, M.; Wang, S.; Di, Y. Spatio-temporal accuracy evaluation of three high-resolution satellite precipitation products in China area. Atmos. Res. 2020, 241, 104952.
51. Behrangi, A.; Khakbaz, B.; Jaw, T.C.; AghaKouchak, A.; Hsu, K.; Sorooshian, S. Hydrologic evaluation of satellite precipitation products over a mid-size basin. J. Hydrol. 2011, 397, 225–237.
52. Meng, J.; Li, L.; Hao, Z.; Wang, J.; Shao, Q. Suitability of TRMM satellite rainfall in driving a distributed hydrological model in the source region of Yellow River. J. Hydrol. 2014, 509, 320–332.
53. Ghajarnia, N.; Liaghat, A.; Arasteh, P.D. Comparison and evaluation of high resolution precipitation estimation products in Urmia Basin-Iran. Atmos. Res. 2015, 158–159, 50–65.
54. Tang, G.; Long, D.; Hong, Y. Systematic Anomalies over Inland Water Bodies of High Mountain Asia in TRMM Precipitation Estimates: No Longer a Problem for the GPM Era? IEEE Geosci. Remote Sens. Lett. 2016, 13, 1762–1766.
55. Zhou, Z.; Guo, B.; Xing, W.; Zhou, J.; Xu, F.; Xu, Y. Comprehensive evaluation of latest GPM era IMERG and GSMaP precipitation products over mainland China. Atmos. Res. 2020, 246, 105132.
56. Chen, H.; Chandrasekar, V. Estimation of Light Rainfall Using Ku-Band Dual-Polarization Radar. IEEE Trans. Geosci. Remote Sens. 2015, 53, 5197–5208.
57. Zhou, J.; Zhi, R.; Li, Y.; Zhao, J.; Xiang, B.; Wu, Y.; Feng, G. Possible causes of the significant decrease in the number of summer days with light rain in the east of southwestern China. Atmos. Res. 2020, 236, 104804.

Figure 1. Bias correction strategy in this study.

Figure 2. The study area. (a) Eight subregions of interest: inland Xinjiang (XJ), the Qinghai-Tibetan plateau (TP), Northwestern China (NW), Northeastern China (NE), Northern China (NC), the Changjiang (CJ) River Plain, the southwestern China (SW), and southeastern China (SE). (b) gauge stations.

Figure 3. Evaluation of raw GPM product against gauge observations over mainland China. (a) R, (b) RMSE, (c) MAE, (d) POD, (e) FAR, (f) CSI.

Figure 4. Statistics of different events of the raw GPM in eight subregions. (a) Proportions of event numbers, (b) RMSE for different events of the raw GPM, (c) MAE for different events of the raw GPM.

Figure 5. Differences of evaluation indices between bias-corrected GPM product and the raw GPM product for the first step of bias correction strategy: training period (left), and validation period (right).

Figure 6. Differences of evaluation indices between bias-corrected GPM product for the second step and bias-corrected GPM product for the first step: training period (left), and validation period (right).

Figure 7. Differences of evaluation indices between overall bias-corrected GPM product and the raw GPM product. (a) R, (b) RMSE, (c) MAE, (d) POD, (e) FAR, (f) CSI.

Figure 8. Performance improvement of overall bias-corrected GPM product for ‘hits’ and ‘falseAlarms’ events. (a) differences of RMSE, (b) differences of MAE.

Figure 9. Precipitation values of the raw GPM product and gauge observations for different events.

Figure 10. Relative importance of environmental factors over mainland China.

Figure 11. Differences of RMSE values between bias-corrected GPM data and raw GPM data for ‘falseAlarms’ events.

Figure 12. The proportions of correctly identified events in the first step of the bias correction strategy.

Table 1. Information of eight subregions in mainland China.

Subregions	DEM (m asl)	Area	Gauge Numbers
Inland Xinjiang (XJ)	−188~7573	146.78	1446
The Qinghai-Tibetan plateau (TP)	1083~8756	232.90	947
Northwestern China (NW)	283~5275	156.17	4247
Northeastern China (NE)	−274~2667	95.66	482
Northern China (NC)	−115~2993	68.16	849
The Changjiang River Plain (CJ)	−140~3093	80.72	838
The southwestern China (SW)	20~7258	99.67	4591
Southeastern China (SE)	−57~3916	70.07	291

Table 2. Precipitation characteristics of the training and validation periods.

Subregions	Training Period Mean (mm)	Training Period CV (mm)	Validation Period Mean (mm)	Validation Period CV (mm)
Inland Xinjiang (XJ)	0.469	5.445	0.459	5.190
The Qinghai-Tibetan plateau (TP)	1.160	3.218	1.166	3.247
Northwestern China (NW)	1.012	4.326	1.022	4.367
Northeastern China (NE)	1.495	4.102	1.518	4.072
Northern China (NC)	1.728	4.177	1.733	4.196
The Changjiang River Plain (CJ)	3.513	2.847	3.467	2.858
The southwestern China (SW)	2.810	2.853	2.783	2.865

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, Z.; Hou, H.; Zhang, L.; Hu, B. Event-Based Bias Correction of the GPM IMERG V06 Product by Random Forest Method over Mainland China. Remote Sens. 2022, 14, 3859. https://doi.org/10.3390/rs14163859

