Autonomous Quality Control of High Spatiotemporal Resolution Automatic Weather Station Precipitation Data

Ouyang, Hongxiang; Qin, Zhengkun; Xu, Xingsheng; Xu, Yuan; Huangfu, Jiang; Li, Xiaomin; Hu, Jiahui; Zhan, Zixuan; Yu, Junjie

doi:10.3390/rs17030404

Open AccessArticle

Autonomous Quality Control of High Spatiotemporal Resolution Automatic Weather Station Precipitation Data

by

Hongxiang Ouyang

^1,2,

Zhengkun Qin

^3,*

,

Xingsheng Xu

¹,

Yuan Xu

¹

,

Jiang Huangfu

¹,

Xiaomin Li

⁴,

Jiahui Hu

⁵,

Zixuan Zhan

⁶ and

Junjie Yu

⁷

¹

Jiangxi Provincial Meteorological Observatory, Nanchang 330046, China

²

Key Laboratory of Ecological Meteorology in Ji’an City, Ji’an Meteorological Bureau, Ji’an 343000, China

³

Centre of Data Assimilation for Research and Application, Nanjing University of Information Science and Technology, Nanjing 210044, China

⁴

Yongxin County Meteorological Bureau, Ji’an 343400, China

⁵

Climate Center of Xinjiang Uygur Autonomous Region, Urumqi 830002, China

⁶

Anfu County Meteorological Bureau, Ji’an 343200, China

⁷

Leping Meteorological Bureau, Jingdezhen 333300, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(3), 404; https://doi.org/10.3390/rs17030404

Submission received: 6 November 2024 / Revised: 19 January 2025 / Accepted: 22 January 2025 / Published: 24 January 2025

Download

Browse Figures

Versions Notes

Abstract

:

How to prevent the influence of precipitation’s localized and sudden characteristics is the most formidable challenge in the quality control (QC) of precipitation observations. However, with sufficiently high spatiotemporal resolution in observational data, nuanced information can aid us in accurately distinguishing between intense, localized precipitation events, and anomalies in precipitation data. China has deployed over 70,000 automatic weather stations (AWSs) that provide high spatiotemporal resolution surface meteorological observations. This study developed a new method for performing QC of precipitation data based on the high spatiotemporal resolution characteristics of observations from surface AWSs in China. The proposed QC algorithm uses the cumulative average method to standardize the probability distribution characteristics of precipitation data and further uses the empirical orthogonal function (EOF) decomposition method to effectively identify the small-scale spatial structure of precipitation data. Leveraging the spatial correlation characteristics of precipitation, partitioned EOF detection with a 0.5° spatial coverage effectively minimizes the influence of local precipitation on quality control. Analysis of precipitation probability distribution reveals that reconstruction based on the first three EOF modes can accurately capture the organized structural features of precipitation within the detection area. Thereby, based on the randomness characteristics of the residuals, when the residual of a certain observation is greater than 2.5 times the standard deviation calculated from all residuals in the region, it can be determined that the data are erroneous. Although the quality control is primarily aimed at accumulated precipitation, the randomness of erroneous data indicates that 84 continuous instances of error data in accumulated precipitation can effectively trace back to erroneous hourly precipitation observations. This ultimately enables the QC of hourly precipitation data from surface AWSs. Analysis of the QC of precipitation data from 2530 AWSs in Jiangxi Province (China) revealed that the new method can effectively identify incorrect precipitation data under the conditions of extreme weather and complex terrain, with an average rejection rate of about 5%. The EOF-based QC method can accurately detect strong precipitation events resulting from small-scale weather disturbances, thereby preventing local heavy rainfall from being incorrectly classified as erroneous data. Comparison with the quality control results in the Tianqing System, an operational QC system of the China Meteorological Administration, revealed that the proposed method has advantages in handling extreme and scattered outliers, and that the precipitation observation data, following quality control procedures, exhibits enhanced similarity with the CMAPS merged precipitation data. The novel quality control approach not only elevates the average spatial correlation coefficient between the two datasets by 0.01 but also diminishes the root mean square error by 1 mm.

Keywords:

EOF; quality control; precipitation; automatic weather station; CMPAS data

1. Introduction

As the cornerstone of meteorological and hydrological research, precipitation data have become particularly critical in terms of their accuracy and reliability in the context of climate change and increasingly extreme weather events. Acquisition of precipitation data relies principally on technical means such as surface observation stations, radar, and satellite remote sensing, each of which has its own specific advantages and limitations [1,2]. Although radar and satellite remote sensing technologies have a wide coverage area, the accuracy and resolution of their retrieved data are affected by factors such as cloud cover and terrain [3,4,5]. Consequently, surface precipitation observation remains the primary source of precipitation data used by local meteorological departments for their operations and precipitation research. In recent years, China has fully automated its acquisition of surface meteorological observations, and a large and dense network of approximately 65,000 automatic weather stations (AWSs) has been established nationwide. Previous studies have proven that the assimilation of observation data from AWSs can effectively improve the accuracy of high-resolution regional numerical weather forecasting [6]. However, owing to the wide and dense distribution of AWSs, the substantial differences in terrain, and the characteristics of high-resolution real-time data, there has long been a challenge regarding the use of AWS precipitation data with unstable quality in practical forecasting operations and assimilation research [7,8].

The spatial distribution of precipitation has strong nonuniformity, which results in considerable uncertainty in estimating precipitation, even for reasonably small areas. The processing of surface observations of precipitation involves multiple aspects such as data quality control (QC), interpolation, and scale conversion. Any small errors in these links could be amplified in the final estimation of precipitation, leading to notable biases [9], and the currently increasing occurrence of extreme precipitation events further increases the uncertainty of the quality of precipitation data [10].

High-quality precipitation observations are the foundation of precipitation research, and strict QC is the main approach adopted to improve the quality of such observational data. The methods adopted for QC of precipitation data have also been the research focus of previous studies [11]. For example, Boulanger et al. [12] used a decision tree algorithm to perform QC on the daily observational data of the Argentine National Meteorological Agency for the period 1959–2005. They detected a large number of erroneous precipitation and temperature data and verified the applicability of their method in other countries. Hamada et al. [13] developed an automatic QC system for detecting erroneous information in daily rainfall observations, which can automatically and objectively identify erroneous data. Dandrifosse et al. [14] developed a rapid QC method for meteorological parameters, such as temperature, pressure, humidity, and wind observed, via meteorological stations deployed in farmland, which can perform real-time QC of the data and has a low misjudgment rate. In addition to multiple-station collaborative QC based on the spatial continuity of atmospheric variables, some studies also introduced statistical analysis methods such as spatial regression [15], inverse distance weighting [16], and interpolation [17] into collaborative QC research. In recognition of the needs of data assimilation for research and commercial purposes, QC approaches based on model background fields have also been developed. For example, when the temperature difference between the observed data and the model background field exceeds a given threshold, the data are deemed unusable. This can effectively prevent the overall assimilation effect from being compromised by observational data that deviate too far from the background field [18,19].

Although research on the QC of precipitation data has developed to a certain extent, the prominent spatiotemporal characteristics of localized precipitation make it difficult to use traditional threshold determination methods such as boundary value checks and climate extreme value ratios. Methods such as internal consistency verification, temporal continuity assessment, and spatial consistency analysis all rely on the spatiotemporal continuity of variables [20], and the randomness of precipitation limits the effectiveness of these methods in performing QC. Especially in summer, when severe convective weather occurs frequently, this limitation becomes even more prominent. In addition to statistical based QC methods, radar-based quantitative precipitation estimation has been introduced for QC discrimination of cumulative precipitation at AWSs [21,22]. However, such an approach cannot meet the real-time requirements of commercial activities; therefore, further exploration is needed regarding methods for performing QC of AWS precipitation data.

Although much progress has been achieved in research on QC methods adopted for meteorological data, current studies mainly focus on observational data of continuous atmospheric variables, and the variation patterns of the meteorological variables are generally applied based on the physical laws of large-scale weather systems. With the continued deployment of surface AWSs in China, the rapid increase in the density of their distribution has provided a valuable opportunity to capture small-scale surface weather information with high spatiotemporal resolution. Therefore, determining how best to utilize the high spatiotemporal resolution characteristics of surface AWS data, overcome the limitations of the prominent spatiotemporal characteristics of localized precipitation [23], and avoid the influence of small-scale weather systems on the QC process are recognized as important issues in developing QC methods suitable for high spatiotemporal resolution surface AWS precipitation data.

Since the observational data contain a great deal of meso- and micro-scale weather information, these pieces of information often lead to observed values in the observational data that exceed the threshold range of conventional quality control methods, and thus misjudgments occur. The EOF method can decompose three-dimensional variables into a linear combination of different spatial modes and their corresponding time coefficients. Furthermore, based on the scale differences of the characteristics represented by different modes, the three-dimensional variables can be decomposed into a primary term composed of the first n modes that exhibit relatively large-scale structural features, and a residual term predominantly characterized by small-scale random variations. Based on this principle, a quality control method based on EOF decomposition was proposed [24]. This method aims to reduce the adverse impact of the weather system on the quality of observed data, thus ensuring the accuracy and reliability of the quality control results [25,26].

Since the introduction of the EOF quality control method, numerous meteorologists applied and improved it. Shao et al. [27] used ERA5 reanalysis data as the background field and established a surface temperature data quality control method suitable for high spatiotemporal density based on the EOF-based QC method. The method was tested in the central and eastern regions of China and yielded satisfactory results. Shen et al. [28] leveraging the EOF method for identifying anomalous observational data, proposed a data repair technique based on iterative EOF analysis. This technique effectively restores erroneous surface automatic station temperature observations. Building upon an analysis of the spatial scale and error distribution characteristics of surface temperature, Shang et al. [29] have established an EOF-based QC methodology for surface temperature observations that is entirely dependent on observational data. This approach is capable of accurately identifying erroneous observational data while effectively circumventing the impacts of background field errors, topographical influences, and local weather changes. Through the research and application of meteorologists, it has been proven that the EOF-based QC method can effectively avoid the influence of weather systems on precipitation data, providing new ideas for quality control of precipitation data.

Despite recent progress, further research is required to ascertain how best to apply EOF-based QC methods to precipitation observation data. The prominent spatiotemporal characteristics of localized precipitation have long represented a problem in controlling the quality of precipitation data. As a statistical method, EOF analysis focuses on variables with reasonable spatiotemporal continuity, but the suddenness of precipitation can seriously undermine the temporal continuity of precipitation as a variable. Gebremichael et al. [30] found that precipitation data can be made to follow a normal distribution through temporal averaging. Therefore, precipitation accumulation conversion can be used to construct precipitation QC data with reasonable temporal continuity. High spatial resolution surface AWS precipitation data allow for the accurate identification of observed precipitation extremes caused by extreme weather conditions and complex terrain. However, EOF analysis is often used to identify the large-scale spatial structure of meteorological variables. Therefore, the problem of how best to use EOF analysis to extract small-scale precipitation information also urgently needs further research.

This study focused on the high spatiotemporal resolution characteristics of AWS data. Based on the cumulative conversion of hourly precipitation observation data, a method based on EOF analysis was developed to perform QC using only observational data. On the basis of in-depth analysis of the spatial correlation scale and probability distribution characteristics of precipitation data derived from AWSs, the regional scope and relevant threshold of EOF-based QC were determined objectively. Finally, a QC method suitable for high spatiotemporal resolution surface AWS precipitation observation data was successfully constructed. This research result can be integrated into both early warning systems to avoid false alarms caused by erroneous data and assimilation systems for real-time QC of precipitation data, offering strong commercial application prospects.

The remainder of the paper is structured as follows: Section 2 introduces the data and the preprocessing method used in the study. Section 3 describes the spatiotemporal and probability distribution characteristics of cumulative precipitation. Section 4 discusses how to determine QC methods. The method to determine the occurrence time of incorrect hourly precipitation is described in Section 5. Section 6 proves the superiority of the proposed method. Finally, the conclusions are presented in Section 7.

2. Data Sources and Preprocessing

This study used high-density AWS precipitation data provided by the Jiangxi Provincial Meteorological Bureau. Overall, the study considered data from 2530 AWSs distributed over the region 24.5–30°N, 113.5–118.5°E. The minimum distance between AWSs is approximately 1 km. The study period extended from 09:00 on 20 February 2023 to 08:00 on 31 May 2023 (unless specified otherwise, all times are UTC), i.e., a total of 2400 h. To verify the QC results, this study incorporated precipitation data from both the Tianqing Business System QC for the same period [31] and the real-time China Meteorological Administration Multisource-merged Precipitation Analysis System (CMPAS) hourly dataset, developed using surface precipitation observational data, radar quantitative estimation precipitation data, and satellite inversion precipitation data and key technologies such as deviation correction and fusion analysis [32]. Additionally, this study also used radar reflectance data.

The Tianqing System is a meteorological big data cloud platform developed by the China Meteorological Administration, which is the basic platform supporting meteorological operations. It has massive storage capacity and powerful data output capability, covering the lifecycle of data transmission and collection, processing and handling, storage and service, analysis and monitoring. The precipitation quality control methods in the Tianqing system mainly include boundary values, range values, spatial consistency, internal consistency, time-varying, continuous inspection, and comprehensive inspection.

CMPAS is a real-time precipitation fusion analysis product produced by the National Meteorological Information Center. It is divided into two source fusion products and three source fusion products. This article uses the three source fusion product, which uses probability density function matching method to correct the systematic bias of radar estimation and satellite inversion precipitation products. Then, the Bayesian model averaging method is used to combine the radar and satellite precipitation products to form a background field covering China, and finally the optimal interpolation method is used to integrate the ground observation data. The surface observation data used in CMPAS is provided by Tianqing.

In this study, we first pre-processed the precipitation data by calculating the cumulative precipitation. Specifically, for the hourly precipitation data, we carried out a moving average within a 120 h window (both before and after each hour), thus creating a cumulative precipitation sequence. The cumulative precipitation data for the entire Jiangxi region is a 2530 × 2160 matrix. We performed EOF decomposition on the cumulative precipitation data. For a 0.5 × 0.5-deg region, assuming there are 240 h observational data from M stations, the data matrix of the precipitation observation data X can be expressed as

X_{M \times 240}

, and the covariance matrix can be expressed as

A_{M \times M} = {XX}^{T}

.

3. Spatiotemporal and Probability Distribution Characteristics of Cumulative Precipitation

While hourly precipitation can exhibit prominent abruptness, cumulative precipitation data have better continuity [33], especially as the time of accumulation increases, with the distribution of the precipitation data becoming closer to a normal distribution. Figure 1 shows the cumulative precipitation change curve for different averaging periods. It is evident that with increase in the averaging period, data continuity also increases, verifying the findings of previous research [34].

In order to better clarify the characteristics of precipitation observation data, the spatial distribution of the maximum, median, and variance of hourly precipitation observation data and hourly cumulative precipitation data used is presented here (Figure 2). The northern part of Jiangxi is predominantly flat, while the central and southern regions are characterized by hilly terrain. Influenced by the topography, stations with hourly precipitation exceeding 50 mm are primarily concentrated in the central area. The regions with high standard deviation values are situated in the central-eastern and southern parts, where extreme precipitation is more likely to occur. The median values across all stations in the province are predominantly clustered between 0.5 and 0.6 mm.

The high-value regions for ten-day accumulated precipitation extremes are predominantly found in the central-eastern and southern areas. The median values in the northern and central parts are generally above 60 mm, whereas those in the southern region are clustered between 30 and 60 mm. The areas of high standard deviation largely overlap with the regions of high extremes. A comparison between hourly and accumulated precipitation shows that the ratio of the maximum to median values for hourly precipitation is considerably greater than that for accumulated precipitation. Additionally, the high values of accumulated precipitation are concentrated, while precipitation in other areas is more evenly distributed. This suggests that the spatiotemporal continuity of accumulated precipitation is superior to that of hourly precipitation.

Probability distribution characteristics can be used to quantitatively evaluate the continuity of data. Figure S1 shows the precipitation probability distribution corresponding to each curve illustrated in Figure 1. It is evident that hourly precipitation observational data do not follow a normal distribution. However, as the cumulative period increases, the precipitation probability distribution gradually approaches a normal distribution. When the averaging period reaches 10 days, the precipitation probability distribution broadly follows a normal distribution (dashed line in the figure). The skewness coefficient and the kurtosis coefficient of the cumulative precipitation probability distribution for different averaging periods were also calculated (Figure 3). As the cumulative period increases, the skewness coefficient and the kurtosis coefficient both gradually diminish. At 10 days, the skewness coefficient and the kurtosis coefficient tend to stabilize, with a skewness coefficient of approximately 1 and a kurtosis coefficient of approximately 2. As the cumulative duration increases, the timeliness of the data gradually decreases; therefore, the cumulative period considered in this study was set to 10 days.

To accurately determine the optimal duration for calculating accumulated precipitation, the skewness and kurtosis coefficients of the data’s probability density function were calculated as they vary with the accumulation time. Precipitation data were processed with a rolling accumulation time length ranging from 1 to 11 days, and the curves illustrating the changes in the kurtosis and skewness coefficients of the probability distribution function were generated. It was observed that after the accumulation time reaches 9 days, both the kurtosis and skewness coefficients tend to stabilize. For the convenience of subsequent data processing, 10 days were empirically chosen for the statistical analysis.

4. Quality Control Method Based on Partition EOF

Precipitation has prominent spatiotemporal variations attributable to interactions between mesoscale disturbances and local conditions such as topography, which produce strong localized characteristics [35]. In conventional QC methods, spatial consistency checks often determine the correctness of data based on the difference between precipitation observed at a specific station and that recorded at surrounding stations. However, AWS data with high-density characteristics can reflect the notable changes in precipitation in the horizontal plane associated with small-scale weather disturbances. This is especially the case at the edge of weather systems where adjacent stations are prone to large differences. Therefore, it is not possible to directly identify erroneous data based on the difference between observations at adjacent stations.

Accurate extraction of the small- and medium-scale variational characteristics of high-resolution precipitation data is fundamental to ensuring the accuracy of QC. Previous studies clearly indicated that the spatial scale distinguished via EOF analysis is positively correlated with the spatial range it covers [36], which means that by narrowing the regional scope of EOF analysis, its ability to capture small- and medium-scale precipitation information can be effectively improved. To define the optimal spatial range for EOF analysis scientifically and reasonably, the primary task is to clarify the spatial scale characteristics of the weather systems that can be distinguished by AWS precipitation data. Given that the correlation of precipitation between stations is inevitably regulated by weather conditions, using the curve of the correlation coefficient with distance as an analysis tool can accurately reflect the spatial scale characteristics of the observational data. The specific quantification method used for this feature adopted the following steps.

1.: Calculate the temporal correlation coefficient between each station and the surrounding stations as follows:

ρ = \frac{cov (X, Y)}{std (X) std (Y)}

(1)

2.: Calculate the distance between each station and the surrounding stations as follows:

dis = 2 r \cdot \arcsin (\sqrt{\sin^{2} (\frac{Y_{2} - Y_{1}}{2}) + \cos (Y_{1}) \cos (Y_{2}) \sin^{2} (\frac{X_{2} - X_{1}}{2})})

(2)

where dis is the distance between two stations; r is the radius of the Earth;

Y_{1}

and

Y_{2}

represent the latitude (in radians) of the first and second station, respectively; and

X_{1}

and

X_{2}

represent the longitudes (in radians) of the first and second station, respectively.

3.: Calculate the maximum correlation coefficient between stations within different distances.

The correlation between precipitation amounts at two stations is often influenced by various factors such as terrain, vegetation, and water bodies. In most cases, these interfering factors tend to reduce the correlation coefficients of precipitation amounts. Therefore, in order to avoid the influence of the above factors on the statistical results, the maximum value of the correlation coefficient of all stations at each distance is selected to draw the curve of the correlation coefficient with distance. Figure 4 shows the maximum correlation coefficient between the precipitation sequences at two stations corresponding to different station distances. It is evident that the trend of the maximum correlation coefficient between stations declines with increasing distance, which is attributable to the spatial continuity of precipitation. The correlation coefficient exhibits a reasonably stable characteristic at distances of 30–50 km, with magnitude of approximately 0.997. It means that relatively stable spatial features of the corresponding scale exist in the precipitation data. Therefore, the single analysis area adopted in this study was set at 0.5° × 0.5°.

Higher-order EOF modes represent the small-scale characteristics of the data. This is because the observed precipitation at a station generally consists of two components. First, there is the amount of precipitation similar to that of surrounding stations, which is often caused by a large-scale weather system. In this case, the precipitation in a region exhibits spatial structural characteristics. Second, there is the precipitation caused by local small-scale weather, which is independent of the surrounding stations. The precipitation component with regular spatial structural characteristics definitely does not conform to the characteristics of a random distribution. However, by using the first few modes of EOF, we can effectively extract the precipitation information with spatial structural characteristics. Therefore, the remaining precipitation information composed of higher-order EOF modes often shows characteristics of being uncorrelated among stations. This also means that the precipitation sequence composed of the remainders of all stations exhibits statistical characteristics of a random distribution. Therefore, in determining whether the EOF QC method can be applied to cumulative precipitation data, it must first be clarified whether the reconstruction results of its high-order modes satisfy a normal distribution. Consequently, before performing QC, it is necessary to define those high-order EOF modes that are suitable for EOF-based QC of precipitation data.

The skewness coefficient and the kurtosis coefficient are important metrics in statistics because they describe the symmetry and steepness or smoothness of a data distribution. As a special form of distribution, the normal distribution has specific values of skewness and kurtosis. Therefore, this study used these two statistical metrics to quantitatively analyze the frequency distribution of the high-order EOF modal reconstruction field. In recognition of the need to conduct multiple experiments to obtain the general pattern of the frequency distribution, in addition to the research period (10:00 on 9 March 2023 to 10:00 on 19 March 2023), two further periods were randomly selected for analysis: 10:00 on 4 March 2023 to 10:00 on 14 March 2023 and 10:00 on 14 March 2023 to 10:00 on 24 March 2023.

To determine those high-order EOF modes that are suitable for QC of precipitation data, Figure S2 shows the frequency distribution of the reconstructed field of high-order EOF modes after gradual extraction of the first four modes, together with the skewness and kurtosis coefficients of the reconstructed field. The black dashed line shown in the figure represents the closest standard normal distribution function curve, which is defined as the standard normal distribution function for which the standard deviation is consistent with the observed data. It is evident from Figure S2 that when the EOF mode increases to the first three modes, the residual field broadly follows a normal distribution. The residual field skewness coefficient of the first mode is 0.22, and the skewness coefficient of the second mode is increased to 0.66. When the mode is increased to the third and fourth modes, the skewness coefficient tends to become stable at −0.199 and −0.15. When the mode is increased to the third mode, the residual field has reasonable symmetry. The kurtosis coefficient of the first mode is 14.95, and the kurtosis coefficient of the second mode is increased to 45.87. When the mode is increased to the third and fourth modes, the kurtosis coefficient tends to stabilize at 16.62 and 18.06. When the mode increases to the third mode, the distribution of the residual field is more uniform. To further clarify the rationality of setting the modal threshold, the variation curves of the kurtosis and skewness of the residual fields after extracting the first six modes, respectively, are presented here (Figure 5). It can be observed that the absolute value of skewness reaches its maximum after the first two modes and then gradually decreases as the number of modes increases. For kurtosis, it reaches its minimum after the first three modes and then shows an increasing trend as the number of modes continues to rise. This is mainly because as more modes are extracted, a large number of precipitation values close to 0 are likely to occur, which causes the kurtosis to increase with the increase in the number of modes. Therefore, choosing an appropriate modal threshold has a crucial impact on the quality control effect. By combining the skewness coefficient and the kurtosis coefficient, it can be established that when the mode is increased to the third mode, the residual field is closer to a normal distribution under the premise of reasonable symmetry. The results in Figure 6 further demonstrate that choosing the first three modes as the threshold in this study is highly reasonable.

In order to clarify whether the use of the first three modes can meet the quality control requirements, the spatial distribution maps of the reconstructed field and residual field at 01:00 on 14 March 2023 are provided, and the explanatory variance and cumulative explanatory variance are calculated (Figure 6). By comparing the reconstructed field, residual field, and observational data, it was found that EOF can effectively extract meso- and small-scale weather information from the observational data, with anomalies manifesting as large values in the residual field, thus allowing for the effective identification of erroneous stations. From the perspective of explained variance and cumulative explained variance, when the third mode is included, the cumulative explained variance reaches 99.24%. Therefore, extracting the first three modes can effectively capture the weather system information contained in the data.

As described above, the analysis area was defined by calculating the variation curve of the maximum correlation coefficient between stations with distance. The analysis area can be divided into subregions starting from the bottom left corner at 0.5° × 0.5° and then gradually moved incrementally to the right or upward by 0.25° to form new sub areas during QC. Rolling QC experiments can be conducted for each subregion separately.

The specific steps of the experimental plan were as follows.

Use the EOF analysis method to decompose the 3D data Rain into two parts in each subregion, i.e., the first n modal reconstruction parts and the remaining modal reconstruction results; then, the observation can be expressed as follows:

${Rain}^{OBS} = {Rain}_{{EOF}_{1 - n}}^{OBS} + {Rain}_{{EOF}_{res}}^{OBS}$

(3)

where ${Rain}^{OBS}$ represents the observed precipitation at the observation station, ${Rain}_{{EOF}_{1 - n}}^{OBS}$ represents the reconstructed field of the first n modes extracted from the observational data following EOF analysis, and ${Rain}_{{EOF}_{res}}^{OBS}$ represents the residual field of the first n modes extracted from the observational data following EOF analysis, where n is taken as 3.
After obtaining the reconstructed field and the residual field, calculate the standard deviation ${df}_{res}^{OBS}$ of ${Rain}_{{EOF}_{res}}^{OBS}$ for all stations at the same time each day, compare ${Rain}_{{EOF}_{res}}^{OBS}$ for each time with the corresponding ${df}_{res}^{OBS}$ , and define outliers within the subregion when $|{Rain}_{{EOF}_{res}}^{OBS}| \geq 2.5 {df}_{res}^{OBS}$ .
After obtaining the distribution of outliers in each subregion at each time, for cases where overlap exists between subregions, if a station in the overlapping region is adjudged an outlier in the two subregions at a certain time, then that station is considered an outlier at that time. Finally, the distribution of outliers at each time in the studied area is obtained.

5. Determination of QC Methods

Owing to the lack of true values, to further clarify the accuracy of the new proposed QC method, an ideal experimental method was first used to evaluate the new QC method. This so-called ideal experiment refers to the addition of different levels of artificial error information to the observational data to test the capability of the new QC method in recognizing different levels of error. In conducting the ideal experiment, three stations were selected at random: station A (27.89°N, 116.26°E), station B (27.64°N, 114.01°E), and station C (27.35°N, 116.34°E). At 00:00 on 14 March 2023, artificial incorrect data were added to the hourly precipitation data of these three stations. Owing to the cumulative precipitation of 25–45 mm over the 10 days at that time, precipitation of 5–60 mm was added at equal intervals in the hourly precipitation data of these three stations and compared with the QC results of the original data. The precipitation trends and exclusion results of the three stations were similar; consequently, the results for station B were selected for display here. Comparison of Figure 7a,b reveals that the additional artificial erroneous disturbances can affect the 10-day cumulative precipitation data for 120 h before and after each time. It is evident from Figure 7a that at 00:00 on 14 March 2023 (coordinate 230 in the figure), when the disturbance precipitation increased to 15 mm, there was a continuous exclusion phenomenon during the period of coordinates 230–290 in the figure. However, at 25 mm, the 10-day cumulative precipitation data showed a continuous exclusion phenomenon during the period of coordinates 200–300 in the figure, indicating that the exclusion results tended to stabilize from 25 mm onward. However, in the original observation data (Figure 7b), there was no phenomenon of exclusion around that time, indicating that the data began to exhibit exclusion when the disturbance precipitation approached the original 10-day cumulative precipitation data. When the disturbance reached or exceeded the original data time, the exclusion began to stabilize.

It should be clarified that the above analyses were based on cumulative precipitation data for QC. In practical application, it is necessary to clarify those observational data that are incorrect at specific times. From the actual QC results of accumulated precipitation, it is evident that there will be continuous exclusions in the QC results. Therefore, the stability of the QC results can be used as a basis for judging the correctness of precipitation data at a single moment.

To establish a suitable threshold, this study statistically analyzed the distribution of excluded precipitation data within the study area at different thresholds (Figure 8a). It is evident that when the threshold is small, the excluded precipitation data exhibit a greater degree of extremism, and that the extreme value of precipitation gradually reduces as the threshold increases. The median value of the excluded precipitation data exhibits little change with the threshold, but from the distribution of the data, it is apparent that the excluded data tend to have similar characteristics as the threshold increases. Thus, it is evident that an ideal exclusion result can be obtained by setting a threshold, while also ensuring a stable exclusion rate. Therefore, the total frequency of data exclusion and the standard deviation of the rate of data exclusion over time for all stations within the study area with different thresholds set within 240 h were calculated (Figure 8b). It is evident that the overall trends of data exclusion frequency and the standard deviation decrease with the increase in the threshold, indicating that the rate of data exclusion will gradually decrease and become more stable over time as the threshold increases. The standard deviation of the frequency and rejection rate over time decreases rapidly in the threshold range of 24–72, and it slows markedly in the threshold range of 72–192. Owing to the stable and high rejection rate of the standard deviation of the frequency and rejection rate over time in the three ranges of 48–72, 72–96, and 96–120, the median of the two ranges is taken to find the threshold in the threshold range of 60–108.

To visualize the data exclusion situation more clearly at different thresholds, the time variation curves of the rate of data exclusion in the study area at each hour under the set equidistant threshold are shown in Figure 9. It is evident that as the threshold increases, the rate of exclusion gradually diminishes, and the fluctuation amplitude of the curve gradually decreases. When the threshold increases to 84, the rate of exclusion tends to stabilize and remain at approximately 1%. Heavy precipitation is usually generated by small- and medium-sized weather systems that have a life cycle of 1–3 days. Therefore, when the threshold continues to increase, the proportion of exclusion periods in the statistical period shows a trend of decline, leading to a trend of increase in the fluctuation of the exclusion rate. Consequently, the threshold was set to 84.

6. Determine the Occurrence Time of Incorrect Hourly Precipitation

Given that incorrect precipitation data at a single point in time can sometimes lead to notable deviations in the cumulative precipitation for up to 84 h or even 10 days, relying solely on threshold-based determination of erroneous data might inadvertently mistake correct data at adjacent time points as erroneous, thereby affecting the overall accuracy of the data. To overcome this problem, it is particularly important to introduce auxiliary criteria to accurately identify the specific moment at which erroneous precipitation occurs.

Using EOF analysis, it is possible to effectively extract the main features from complex weather information, while the residual field tends to follow a normal distribution pattern. In this context, any extreme values that deviate from the norm will appear as statistically significant extrema in the residual field. Based on this characteristic, extreme points in the residual field can be used as indicators to identify and locate the specific points in time of erroneous precipitation records.

To visually demonstrate the effectiveness of this method, the distribution of the residual field during the period of concentrated removal of erroneous data in the ideal experiment is shown in Figure 10. It is evident that the residual field of the excluded data exhibits sudden changes in its temporal variation curve, such as at time 280 on the right-hand side of Figure 10a and at time 180 on the left-hand side of Figure 10c. Combined with the precipitation variation curve, it is apparent that these sudden changes occur when precipitation undergoes marked change in a short period. Moreover, as clearly shown in the figure, at the moment of introducing erroneous precipitation data, the residual field reached the extreme value level within the exclusion period. This reflects the strong basis of the proposed method: based on setting a threshold for judgment, combined with analysis of the time at which the residual field extremum appears, it is possible to more accurately identify the specific time when erroneous precipitation occurs.

After determining the entire set of quality control methods, a flow chart of the quality control methods is presented here (Figure 11).

The detailed quality control process is shown in the following figure.

7. Comparative Analysis of EOF Quality Control Effect with Operational Data

To confirm the effectiveness of the threshold, the 10-day cumulative precipitation data of three consecutive time nodes (i.e., 23:00 on 13 March 2023, 00:00 on 14 March 2023, and 01:00 on 14 March 2023) in the same region were randomly selected for analysis. Figure 12 visually displays the precipitation of each time period and their exclusion status before and after the application of the threshold. At 23:00 on March 13^th, there were significant differences in precipitation between the removed stations and the surrounding stations (such as an extreme value of 85.9 mm or an abnormal extreme value of 0 mm), and these data were continuously removed at 00:00 on March 14^th and 01:00 on March 14^th. By comparing the rejection results at 01:00 on March 14^th with Figure 6b, it is found that the locations of the rejected stations exactly coincide with the regions of large absolute values In Figure 6b. This further confirms that the erroneous precipitation information remains in the residual field. It can also be found that there may be a risk of misjudgment if the threshold is not added. For example, at 23:00 on March 13^th, a station with a precipitation of 35.0 mm was identified as an abnormal station despite no significant anomalies in precipitation compared to neighboring stations, and this misjudgment continued to be removed. After adding a threshold, this phenomenon did not occur. Observing the number of times each station was removed during the quality control cycle, it was found that sites with significant differences from surrounding stations had significantly more removal times. Therefore, it can be concluded that setting a threshold based on this Is reasonable and effective.

After determining the threshold and auxiliary judgment criteria, a new QC algorithm is immediately added to the QC program to control the hourly precipitation observation data. To analyze whether the addition of the new algorithm can improve the QC effect, the spatial correlation coefficient and the root mean square error change curve of AWS data and CMPAS data before and after QC are presented in Figure 13. It is evident that the spatial correlation coefficient generally increases after QC, with a maximum increase of approximately 0.01. Additionally, the root mean square error between the QC data and the CMPAS data generally decreases, with a maximum reduction of approximately 1.

After clarifying that the addition of the new algorithm can effectively improve the QC effect on the data, we further explored the specific impact of adding the new algorithm on the QC effect. Here, the data before and after EOF-based QC were compared with the data after Tianqing QC and the CMPAS precipitation data to evaluate the advantages and disadvantages of EOF-based QC compared with traditional QC methods (Figure 14). At time 338, it is evident that a single point of heavy precipitation appeared in the left-hand part of the figure in the original data, but not in the CMPAS precipitation data. The EOF-based QC method accurately identified the station with abnormal heavy precipitation, whereas the Tianqing QC method failed to identify that station. At time 404t, no precipitation occurred in the CMPAS precipitation data, but false weak precipitation appeared in the original data. The EOF-based QC method successfully identified and controlled the precipitation, while the Tianqing QC method failed to recognize these two stations. At time 506, in the original data, there were two stations with zero precipitation in the middle and on the right-hand side of the figure that were surrounded by stations with light to moderate rain. However, in the CMPAS precipitation data, both stations had light rain. The EOF-based QC method accurately identified these two stations with abnormal zero values, while they were not identified via the Tianqing QC method. Comparison shows that the EOF-based QC method has better capability for recognizing stations with abnormal heavy precipitation, false weak precipitation, and erroneous zero precipitation compared with traditional QC methods.

To verify whether this method can effectively retain accurate local heavy precipitation data, the hourly precipitation data at 18:00 on 21 May 2023 and the radar echoes covering short-term heavy precipitation generated during this period are shown in Figure 15. It is evident that strong echoes of >55 dBZ existed in the area 26.3–26.5°N, 116.3–116.5°E, and that these echoes were stable and nearly stationary, resulting in short-term heavy precipitation with localized rainfall intensity of over 80 mm/h. When the Tianqing QC system detected this short-term heavy rainfall, it was adjudged incorrect data, and then manually verified and reassessed as correct. However, the EOF-based QC method effectively extracts information from small- and medium-sized weather systems, resulting in the retention of such data.

To further verify the effectiveness of this QC method in different regions and seasons, this study also conducted QC experiments on hourly precipitation observation data from 1–25 August 2024 in Hunan Province. The research found that under the existing thresholds, the data can still be effectively quality controlled, and this method can also yield good results in other provinces and seasons. The following is the quality control result (Figure 16) at 10:00 am on 11 August 2024 (Beijing time). From the radar reflectivity, it can be seen that from 9:00 to 10:00, the radar reflectivity above the station was between 45–50 dBz. Through communication with the local meteorological department, we learned that there was an overestimation in the precipitation observation data at the station. Figure 16c shows that the operational Tianqin quality control system failed to identify the erroneous strong precipitation data at that moment, but the EOF-based quality control method established in this study effectively identified the overestimation of precipitation at the station (Figure 16b). The incorrectly observed precipitation at the station, marked in red in the figure, reached 85.9 mm, which is far beyond the range of heavy rain and does not match the radar echo results. Therefore, there is reason to believe in the correctness of the results from the new quality control method.

8. Conclusions and Discussion

China has achieved comprehensive automation of its surface weather observations, and the derived AWS data with high spatiotemporal resolution can better display the multiscale variational characteristics of surface meteorological parameters. However, the application of surface AWS data has long been constrained by the instability of the quality of the automatic observational data. To address this issue, this paper has developed an independent quality control method based on EOF, specifically targeting the strong local characteristics of precipitation. The new QC method has effectively solved the impact of the strong local characteristics of precipitation data on QC, by calculating the cumulative precipitation method. On the basis of obtaining continuous precipitation data, the QC method based on EOF (empirical orthogonal function) analysis has been further introduced. On the basis of extracting the relatively large-scale continuous spatial structure of precipitation data, QC is carried out on the residual field, thus effectively avoiding the impact of weather processes on the QC results.

This method can effectively control hourly precipitation observation data. If this method is integrated into the data assimilation system, it can effectively improve the utilization of precipitation observation data, thereby improving the accuracy of forecasting and obtaining more accurate precipitation reanalysis data. Meanwhile, the precipitation observation data obtained through quality control using this method can provide effective support for the generation of multi-source fusion precipitation observation data. However, it should be pointed out that in the process of returning the quality control results of cumulative precipitation to the hourly precipitation data, the new method requires the observational data of 5 days before and after each piece of data. Therefore, it is still very difficult for the new method to achieve real-time quality control of hourly precipitation observational data, which is also the direction of efforts for follow-up research.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/rs17030404/s1: Figure S1: Probability distribution function of precipitation data for (a) hourly, (b) 5-day, (c) 10-day, and (d) 15-day running accumulation precipitation data in Jiangxi Province. Dashed lines show the closest normal distribution function curve; Figure S2: Probability distribution function of the residual data after removal of (a) the first, (b) the second, (c) the third, and (d) the fourth EOF modes. Dashed lines show the closest normal distribution function curve.

Author Contributions

Conceptualization, H.O., Z.Q., J.H. (Jiahui Hu) and X.X.; methodology, H.O., Z.Q. and X.X.; software, H.O. and Z.Q.; validation, X.X.; analysis, H.O. and Z.Q.; investigation, Y.X., X.L. and Z.Z.; resources, X.X.; data curation, X.X. and Z.Z.; writing—original draft preparation, H.O. and Z.Q.; writing—review and editing, H.O., Z.Q. and X.X.; visualization, H.O. and J.H. (Jiang Huangfu); supervision, Z.Q. and X.X.; project administration, J.Y.; funding acquisition, Z.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 42375004); the Open Project Fund of Guangdong Provincial Key Laboratory of Regional Numerical Weather Prediction, CMA (No. J201801); the Key Research Projects of Jiangxi Provincial Bureau (No. JX2022Z10); and the Geological Disaster Prevention Project in Jiangxi Province (No. B360000010073).

Data Availability Statement

The data presented in this study are available upon request from the corresponding author due to restrictions, e.g., privacy or ethics.

Acknowledgments

The author would like to thank the Jiangxi Provincial Meteorological Bureau for providing the original observation data.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Islam, M.N.; Terao, T.; Uyeda, H.; Hayashi, T.; Kikuchi, K. Spatial and temporal variations of precipitation in and around Bangladesh. J. Meteor. Soc. Jpn. Ser. II 2005, 83, 21–39. [Google Scholar] [CrossRef]
Zhang, H.; Zhai, P. Temporal and spatial characteristics of extreme hourly precipitation over eastern China in the warm season. Adv. Atmos. Sci. 2011, 28, 1177–1183. [Google Scholar] [CrossRef]
Trenberth, K.E. Atmospheric moisture residence times and cycling: Implications for rainfall rates and climate change. Clim. Chang. 1998, 39, 667–694. [Google Scholar] [CrossRef]
Niu, Z.Y.; Zhang, L.; Han, Y.; Dong, P.M.; Huang, W. Performances between the FY-4A/GIIRS and FY-4B/GIIRS long-wave infrared (LWIR) channels under clear-sky and all-sky conditions. Q. J. R. Meteorol. Soc. 2023, 149, 1612–1628. [Google Scholar] [CrossRef]
Niu, Z.Y.; Tang, F.; Wang, L.W. All-sky assimilation of FY-4A AGRI water vapor channels: An observing system experiment study for south Asian monsoon prediction. Q. J. R. Meteorol. Soc. 2024, 150, 1–14. [Google Scholar] [CrossRef]
Hou, T.; Kong, F.; Chen, X.; Lei, H.; Hu, Z. Evaluation of radar and automatic weather station data assimilation for a heavy rainfall event in southern China. Adv. Atmos. Sci. 2015, 32, 967–978. [Google Scholar] [CrossRef]
Lagouvardos, K.; Kotroni, V.; Bezes, A.; Koletsis, I.; Kopania, T.; Lykoudis, S.; Mazarakis, N.; Papagiannaki, K.; Vougioukas, S. The automatic weather stations NOANN network of the National Observatory of Athens: Operation and database. Geosci. Data J. 2017, 4, 4–16. [Google Scholar] [CrossRef]
Durre, I.; Menne, M.J.; Gleason, B.E.; Houston, T.G.; Vose, R.S. Comprehensive automated quality assurance of daily surface observations. J. App. Meteor. Climatol. 2010, 49, 1615–1633. [Google Scholar] [CrossRef]
Dai, A. Global precipitation and thunderstorm frequencies. Part I: Seasonal and interannual variations. J. Clim. 2001, 14, 1092–1111. [Google Scholar] [CrossRef]
Alexander, L.V.; Zhang, X.; Peterson, T.C.; Caesar, J.; Gleason, B.; Klein Tank, A.M.H.; Haylock, M.; Collins, D.; Trewin, B.; Rahimzadeh, F.; et al. Global observed changes in daily climate extremes of temperature and precipitation. J. Geophys. Res. Atmos. 2006, 111, D05109. [Google Scholar] [CrossRef]
Zahumenský, I. Guidelines on Quality Control Procedures for Data from Automatic Weather Stations; World Meteorological Organization: Geneva, Switzerland, 2004; Volume 955, pp. 2–6. [Google Scholar]
Boulanger, J.P.; Aizpuru, J.; Leggieri, L.; Marino, M. A procedure for automated quality control and homogenization of historical daily temperature and precipitation data (APACH): Part 1: Quality control and application to the Argentine weather service stations. Clim. Chang. 2010, 98, 471–491. [Google Scholar] [CrossRef]
Hamada, A.; Arakawa, O.; Yatagai, A. An automated quality control method for daily rain-gauge data. Global Environ. Res. 2011, 15, 183–192. [Google Scholar]
Dandrifosse, S.; Jago, A.; Huart, J.P.; Michaud, V.; Planchon, V.; Rosillon, D. Automatic quality control of weather data for timely decisions in agriculture. Smart Agric. Technol. 2024, 8, 100445. [Google Scholar] [CrossRef]
Hubbard, K.G.; You, J. Sensitivity analysis of quality assurance using the spatial regression approach—A case study of the maximum/minimum air temperature. J. Atmos. Ocean. Tech. 2005, 22, 1520–1530. [Google Scholar] [CrossRef]
Barnes, S.L. A technique for maximizing details in numerical weather map analysis. J. Appl. Meteor. Climatol. 1964, 3, 396–409. [Google Scholar] [CrossRef]
Dodson, R.; Marks, D. Daily air temperature interpolated at high spatial resolution over a large mountainous region. Clim. Res. 1997, 8, 1–20. [Google Scholar] [CrossRef]
Ruggiero, F.H.; Sashegyi, K.D.; Madala, R.V.; Raman, S. The use of surface observations in four-dimensional data assimilation using a mesoscale model. Mon. Wea. Rev. 1996, 124, 1018–1033. [Google Scholar] [CrossRef]
Guo, Y.R.; Shin, D.H.; Lee, J.H.; Xiao, Q.N.; Barker, D.M.; Kuo, Y.H. Application of the MM5 3DVAR system for a heavy rain case over the Korean Peninsula. In Proceedings of the Twelfth PSU/NCAR Mesoscale Model Users’ Workshop NCAR, Boulder, CO, USA, 24–27 June 2002. [Google Scholar]
Ren, Z.H.; Zhao, P.; Zhang, Q.; Zhang, Z.; Cao, L.; Yang, Y.; Zou, F.; Zhao, Y.; Zhao, H.; Chen, Z. Quality control procedures for hourly precipitation data from automatic weather stations in China. Meteor. Mon. 2010, 36, 123–132. (In Chinese) [Google Scholar]
Zhong, L.; Zhang, Z.; Chen, L.; Yang, J.; Zou, F. Application of the Doppler weather radar in real-time quality control of hourly gauge precipitation in eastern China. Atmos. Res. 2016, 172, 109–118. [Google Scholar] [CrossRef]
Sahlaoui, Z.; Mordane, S. Radar rainfall estimation in Morocco: Quality control and gauge adjustment. Hydrology 2019, 6, 41. [Google Scholar] [CrossRef]
Liu, W.; Zhang, Q.; Fu, Z.; Chen, X.; Li, H. Analysis and estimation of geographical and topographic influencing factors for precipitation distribution over complex terrains: A case of the northeast slope of the Qinghai–Tibet Plateau. Atmosphere 2018, 9, 349. [Google Scholar] [CrossRef]
Qin, Z.K.; Zou, X.; Li, G.; Ma, X.L. Quality control of surface station temperature data with non-Gaussian observation-minus-background distributions. J. Geophys. Res. Atmos. 2010, 115, D1631. [Google Scholar] [CrossRef]
Wang, Y.; Xu, Z.F.; Fan, G.Z. Study of EOF quality control method of 2m temperature. Plateau Meteor. 2013, 32, 2564–2574. (In Chinese) [Google Scholar]
Zhao, H.; Qin, Z.K.; Wang, J.C.; Liu, Y. Case studies and applications of the Empirical Orthogonal Function quality control in variational data assimilation systems for surface observation data. Acta Meteor. Sin. 2015, 73, 749–765. (In Chinese) [Google Scholar]
Shao, Y.H.; Qin, Z.K.; Li, X. Quality control based on EOF for surface temperature observations from high temporal-spatial resolution automatic weather stations. Trans. Atmos. Sci. 2022, 45, 603–615. (In Chinese) [Google Scholar]
Shen, W.B.; Li, X.; Qin, Z.K. Restoration Method for Automatic Station Temperature Observation Data Based on EOF lteration. Chin. J. Atmos. Sci. 2022, 46, 406–418. (In Chinese) [Google Scholar]
Shang, Y.Y.; Zhang, B.; Qin, Z.K. Independent Quality Control of High Spatiotempora Resolution Surface Temperature Observations from Automatic Stations. Plateau Meteorol. 2024, 43, 967–981. (In Chinese) [Google Scholar]
Gebremichael, M.; Krajewski, W.F. Modeling distribution of temporal sampling errors in area-time-averaged rainfall estimates. Atmos. Res. 2005, 73, 243–259. [Google Scholar] [CrossRef]
Huo, Q.; He, C.W.; He, L.; Gao, F.; Chen, S.; Xu, Y. Design and application of algorithm intensive environment for CMA big data and cloud platform. J. Appl. Meteor. Sci. 2024, 35, 80–89. (In Chinese) [Google Scholar]
Shi, C.X.; Pan, Y.; Gu, J.X.; Xu, B.; Han, S.; Zhu, Z.; Zhang, L.; Sun, S.; Jiang, Z. A review of multi-source meteorological data fusion products. Acta Meteor. Sin. 2019, 77, 774–783. (In Chinese) [Google Scholar]
Merino, A.; Fernández-González, S.; García-Ortega, E.; Sánchez, J.L.; López, L.; Gascón, E. Temporal continuity of extreme precipitation events using sub-daily precipitation: Application to floods in the Ebro basin, northeastern Spain. Int. J. Climatol. 2018, 38, 1877–1892. [Google Scholar] [CrossRef]
Tang, G.; Clark, M.P.; Papalexiou, S.M. The use of serially complete station data to improve the temporal continuity of gridded precipitation and temperature estimates. J. Hydrometeor. 2021, 22, 1553–1568. [Google Scholar] [CrossRef]
Yoshikane, T.; Yoshimura, K. A bias correction method for precipitation through recognizing mesoscale precipitation systems corresponding to weather conditions. PLoS Water 2022, 1, e0000016. [Google Scholar] [CrossRef]
Navarra, A.; Simoncini, V. A Guide to Empirical Orthogonal Functions for Climate Data Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2010. [Google Scholar]

Figure 1. Temporal variation curves of regional average hourly and cumulative precipitation in Jiangxi Province. Black line represents hourly precipitation (mm); red, blue, and green lines represent 5-, 10-, and 15-day running average precipitation (mm), respectively.

Figure 2. Spatial distribution of the maximum, median, and standard deviation of precipitation (mm) from 09:00 on 20 February 2024 to 08:00 on 31 February 2024 (UTC): maximum (First column), median (Second column), and standard deviation (Third column) of hourly precipitation (First line) and accumulated precipitation in ten days (Second line).

Figure 3. The variation curves of kurtosis coefficient and skewness coefficient for days 1–11.

Figure 4. Variation in the maximum correlation coefficient with distance between two stations. The colored scatter points represent the correlation coefficients of each station, black line represents the original correlation coefficient variation curve, red line represents the correlation coefficient variation curve after five-point smoothing.

Figure 5. Curves of changes in skewness and kurtosis coefficient after extraction of the first six modes.

Figure 6. Spatial distributions of the reconstructed field (a) and the residual field (b) after EOF extraction at 01:00 on 14 March 2023, along with the explainable variance and cumulative explainable variance in the area shown in the figure (c).

Figure 7. Time series of precipitation variation at each station in the idealized experiment and the actual QC, together with the removal of data points: (a) the idealized experiment and (b) the original data. Solid lines indicate the 10-day accumulated precipitation (mm), the dashed line shows the hourly precipitation (mm), and the hollow points represent the moments when data were rejected. In (a), the colored solid points indicate artificially added hourly precipitation values, and the solid lines show the 10-day accumulated precipitation curve with increments ranging from 5–60 mm.

Figure 8. (a) Box plot of precipitation probability density function after QC under different thresholds, and (b) the number of rejected data (red bars) and the standard deviation of rejected station numbers (blue bars) in different threshold intervals.

Figure 9. Variation of rejection rate with time (hour) for different thresholds represented by different colors.

Figure 10. Variation of precipitation with observation time (hour), with different colors representing different amounts of artificial errors added at (a) station A (27.89°N, 116.26°E), (b) station B (27.64°N, 114.01°E), and (c) station C (27.35°N, 116.34°E).

Figure 11. Flow chart of EOF-based QC methods.

Figure 12. The exclusion of 10-day cumulative precipitation data before and after adding the threshold in the quality control cycle. The red color represents the removed data, (a,b) are before and after adding the threshold at 23:00 on 13 March 2023, (c,d) are before and after adding the threshold at 00:00 on 14 March, (e,f) are before and after adding the threshold at 01:00 on 14 March, and (g) is the exclusion frequency.

Figure 13. Variation in (a) the spatial correlation coefficient and (b) the root mean square error (RMSE) with observation time between the CMPAS data and the observations before (red) and after (blue) QC.

Figure 14. Spatial distribution of hourly precipitation (mm) before QC (First column) and after QC via the Tianqing QC system (Second column), via the new QC method (Third column), and via CMPAS data (Fourth column) at 18:00 BJT on 11 March 2023 (First line), 12:00 BJT on 14 March 2023 (Second line), and 18:00 BJT on 18 March 2023 (Third line). The false weak precipitation is marked with blue numbers, and the rejected data are marked with red numbers.

Figure 15. Spatial distribution of (a) the reflectivity of the Ganzhou radar, and precipitation data (mm) at 18:00 on 21 May 2023 after QC via (b) the new QC method and (c) the Tianqing QC method. Red numbers represent excluded data.

Figure 16. Spatial distribution of (a) the reflectivity of the Shaoyang radar, and precipitation data (mm) at 10:00 on 11 August 2024 after QC via (b) the new QC method and (c) the Tianqing QC method. Red numbers represent excluded data.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ouyang, H.; Qin, Z.; Xu, X.; Xu, Y.; Huangfu, J.; Li, X.; Hu, J.; Zhan, Z.; Yu, J. Autonomous Quality Control of High Spatiotemporal Resolution Automatic Weather Station Precipitation Data. Remote Sens. 2025, 17, 404. https://doi.org/10.3390/rs17030404

AMA Style

Ouyang H, Qin Z, Xu X, Xu Y, Huangfu J, Li X, Hu J, Zhan Z, Yu J. Autonomous Quality Control of High Spatiotemporal Resolution Automatic Weather Station Precipitation Data. Remote Sensing. 2025; 17(3):404. https://doi.org/10.3390/rs17030404

Chicago/Turabian Style

Ouyang, Hongxiang, Zhengkun Qin, Xingsheng Xu, Yuan Xu, Jiang Huangfu, Xiaomin Li, Jiahui Hu, Zixuan Zhan, and Junjie Yu. 2025. "Autonomous Quality Control of High Spatiotemporal Resolution Automatic Weather Station Precipitation Data" Remote Sensing 17, no. 3: 404. https://doi.org/10.3390/rs17030404

APA Style

Ouyang, H., Qin, Z., Xu, X., Xu, Y., Huangfu, J., Li, X., Hu, J., Zhan, Z., & Yu, J. (2025). Autonomous Quality Control of High Spatiotemporal Resolution Automatic Weather Station Precipitation Data. Remote Sensing, 17(3), 404. https://doi.org/10.3390/rs17030404

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Autonomous Quality Control of High Spatiotemporal Resolution Automatic Weather Station Precipitation Data

Abstract

1. Introduction

2. Data Sources and Preprocessing

3. Spatiotemporal and Probability Distribution Characteristics of Cumulative Precipitation

4. Quality Control Method Based on Partition EOF

5. Determination of QC Methods

6. Determine the Occurrence Time of Incorrect Hourly Precipitation

7. Comparative Analysis of EOF Quality Control Effect with Operational Data

8. Conclusions and Discussion

Supplementary Materials

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI