**2. Methodology**

To explore the data assimilation frequency of the short-range precipitation forecasting, the numerical model WRF is chosen for this study [7] and its assimilating extension WRF-3DAVR is used. WRF version 3.7 is used for all experiments. The WRF model is a fully compressible, non-hydrostatic, mesoscale NWP, and atmospheric simulation system. In addition, WRF-3DAVR can further influence the initial state of the WRF model by assimilating different high-resolution observations. A brief overview of the basic model settings and the system description can be found in the following sections.

#### *2.1. A Brief Description of WRF-3DVAR*

The fundamental objective of the 3DVAR system is to seek an optimal estimate of initialization at parsing time through an iterative solution of a pre-determined function, including observations, background forecast from the NWP system, etc. [34].

$$J(\mathbf{x}) = \mathbf{J}^b + \mathbf{J}^o = \frac{1}{2} \left( \mathbf{x} - \mathbf{x}^b \right)^T \mathbf{B}^{-1} \left( \mathbf{x} - \mathbf{x}^b \right) + \frac{1}{2} (\mathbf{y} - \mathbf{y}^o)^T (\mathbf{E} + \mathbf{F})^{-1} (\mathbf{y} - \mathbf{y}^o) \tag{1}$$

Iterative solutions to Equation (1) can summarize the Dimensional Variational Assimilation problem, to seek the analysis state variable *x* to minimize *J*(*x*). In the formula, *x* is the variable that represents the surface and atmospheric surface state, *xb* is the first guess (or background) acquired from the previous forecast, and *y*0 is the assimilated observation. *Y* = *H*(*x*) is the model-derived observation space, which is transformed from gridded analysis *x* by the observation operator *H* for comparison against *y*0. The individual data points are fitted by the weight of its error estimate: *B*, *E,* and *F* are the background error covariance matrix, observation error covariance matrix, and representativity error covariance matrices, respectively. This solution represents a minimum variance estimate of the true state of the atmosphere given the two sources of a priori data: the first guess *xb* and the observation *y*0 [35].

The WRF-3DVar is a variational data assimilation system designed and built in the WRF model, and is used to assimilate radar reflectivity and GTS in this study [6,36]. The background error covariance CV3 created by the National Meteorological Center (NMC) was employed, which has the advantage of wide applicability [37].

#### *2.2. WRF Model Configuration*

The primary focus of this study is Fuping and Zijingguan catchments covered by the innermost domain of a three nested domain. For the study area, the center of the domain is at 39◦2600N and 114◦4600E, and from the outermost to the innermost the nested domain sizes are 1260 × 1260, 450 × 360, and 145 × 115 km2. Considering the diversity in assimilation variables, the horizontal grid spacing of the outermost domain is set at 9 km and the grid size of the innermost layer is set to be 1 km, where the downscaling radio is set to be 1:3 [38,39]. The locations of the nested domain and radar coverage are shown in Figure 1. Forty vertical pressure levels are considered for the three nested domains with a model top at 50 hPa [40,41].

**Figure 1.** The study area with nested configuration of WRF domains at 9, 3, and 1 km resolution.

The 1◦×1◦ spatial resolution of Global Forecast System (GFS) forecast data provide the initial and lateral boundary conditions for the simulations. Considering the resolution of the GFS data, it would be more reasonable to add another domain with 27 km horizontal grid spacing as the outermost one. Unfortunately, experimental runs are carried out and the effects of data assimilation are found to be similar with the current domain settings. Since rainstorm forecasting has a high requirement of effectiveness for a given period of time, in practical application it is beneficial to obtain the forecasting information as soon as possible. Therefore, the 27 km grid is not applied in this study. The integration time-step of the WRF model and WRF-3DAVR system is 6 s and the time interval for the output is set to be 1 h. The GFS data, which is updated every six hours, is a real-time product provided by National Centers for Environmental Prediction (NCEP). Because GFS has not been processed by assimilation analysis, it has been used in many studies to forecast the storm events in WRF model and WRF-3DAVR [3,5,31].

The performance of the model is highly dependent on the parameterized scheme, and the scheme selected in the study may be suitable for one storm event in one region, but not necessarily for others [42]. Since it is difficult to judge which scheme is most suitable for future storm events, parameterized schemes are usually determined in advance for practical application [43]. Based on relevant experimental research on the selection of sensitive parameterization schemes for ensemble rainfall forecasting [44,45], the physical parameters with the best applicability in the study area were adopted in this study. Details of the parameterizations that have significant impacts on the generation of rainfall used in the assimilation experiments are shown in Table 1. It is worth noting that the cumulus scheme is switched off for the 3 and 1 km domain.

**Table 1.** Details of the parameterizations used in the assimilation experiments.


#### **3. Case Study and Data**

#### *3.1. Study Area and Storm Events*

The main focus of this study is the Fuping and Zijingguan watershed covered by Domain 2 and Domain 3 of the WRF model. According to the temporal and spatial distribution characteristics of rainfall and the representatives of rainfall-runoff generation characteristics, the case used in this study is four 24-h rainstorm events, which occurred in Fuping and Zijingguan watershed. Fuping catchment is located at 39◦22N~38◦47N and 113◦40E~114◦18E with a drainage area of 2210 km<sup>2</sup> and Zijingguan catchment is located at 39◦13~39◦40N and 114◦28~115◦11E with a drainage area of 1760 km2. They belong to the south and the north branch of the Daqinghe basin respectively, located in northern China, having a warm temperate continental monsoon climate. Terrain elevation of the study area varies from 2286 m in the northwest part to 200 m in the southeast mountains. The average annual rainfall in the study catchments is approximately 600 mm, the short episodes of intensive precipitation are more frequent in late May and early September. The steep terrain leads to a short confluence time of the flood, which together with high-intensity, short-duration precipitation is prone to cause severe flood disasters. They are concentrated in areas with thin soil layer and low vegetation coverage especially. In particular, on 21 July 2012, the 24 h rainstorm event occurred in the Beijing-Tianjin-Hebei region, which has been widely concerned because of the heavy rainfall intensity and large losses. The duration of the four storm events and the accumulative rainfall amounts are shown in Table 2. Among the four storms, event 4 is the heaviest one trigged by a severe convective system,

concentrated in a small area of the watershed and with high rainfall intensity in a short time period. The other three storms are typical stratiform rains with moderate intensity but various distributions in space and time. The measured rainfall data are rainfall stations and the time interval is 1 h. The 24 h rainfall accumulation is computed by Thiessen polygon method, which averages the observations from the rain gauges. Figure 2 demonstrates a detailed map of the study area, such as the rain gauges, watershed boundary, catchment outlets topography, and major rivers.



**Figure 2.** Location map and two study sites in the Daqinghe catchment.

In reality, the precipitation in northern China is not absolutely even in time and space, which is different from that in southern China. To study the spatial and temporal evenness of the precipitation in study catchments, both spatial and temporal variation coefficient (*Cv*) of four storm events from 1985 to 2015 are calculated. A threshold of 5% was employed to separate even and uneven rainfall events. The evenness of the rainstorm events is quantitatively evaluated in the spatial and temporal dimensions by variation coefficient *Cv* in this study. The lower *Cv* is, the more even the rainfall is. Based on the thresholds selected, we found two critical values: 0.4 for spatial *Cv* and 0.6 for temporal *Cv*. When the rainfall *Cv* values were less than the threshold *Cv* values, storm events have a relatively even distribution of rainfall over the respective catchment. Table 3 shows the *Cv* values of the four storm events in the both spatial and temporal dimensions. It can be found that the rainfall is more uneven in time than in space, which is helpful to analyze the spatiotemporal distribution of rainfall forecast results. It can be seen from Table 3 that the

four rainstorms have different spatial and temporal evenness, and the selection of rainfall events can provide reference for different types of rainfall in northern China.

$$Cv = \sqrt{\frac{\sum\_{j=1}^{N} \left(\frac{x\_j}{N} - 1\right)^2}{N}} \tag{2}$$

In the spatial dimension, *xj* is the accumulated 24 h rainfall at the rain gauge *j*, and *x* is the average of *xj*. *N* is the number of rain gauges. In the temporal dimension, *xj* is average hourly rainfall from all the rain gauges at time *j*, and *N* is the number of hours. The higher *Cv* is, the more uneven the rainfall distribution is.


**Table 3.** Rainfall evenness of the four selected 24-h storm events in space and time.

#### *3.2. Data Assimilation Experiments*

In this study the radar reflectivity and GTS data observations forecasting are assimilated into WRF and WRF-3DVAR system for 24 h typical storm events.

#### 3.2.1. Weather Radar Data

The Doppler radar used in this study is the S-band radar located at Shijiazhuang city, which is provided by the National Meteorological Administration and covers a radius of 250 km and can completely cover the two study areas. The radar completed a volume sweep every 6 min, with 9 beam angles. By using a data format conversion program, the binary base data file of Shijiazhuang Doppler radar is directly written into "ob.radar" file as the observation field input file of wrfvar.exe. The format of the ob.radar file can be found in the WRFDA user guide. The ob.radar file includes the latitude and longitude of the pixel center and the height of the radar beam above that pixel. The technique details of the radar data are provided in Table 4.

**Table 4.** Basic parameters of S-band radar in Shijiazhuang.


The radar reflectivity is conducted through the quality control chain to improve the quality of the measurements before importing the assimilation process [52]. Therefore, before being assimilated by WRF-3DVAR, radar data are supported by China Integrated Meteorological Information Service System (CIMISS) of China Meteorological Administration to remove artefacts such as ground clutter, radial interference echo, speckles through quality control. For convenience, *dBZ* is often used to represent radar reflectivity:

$$
\hbar \text{dBZ} = 10 \lg^Z / \text{Z}\_0 \tag{3}
$$

where, *Z*0 = 1 mm6/ m3, is a constant, and *Z* is reflectivity.

According to Sun and Crook [12], the radar data are assimilated through an observational operator that describes the relationship between radar reflectivity and rainwater mixing ratio by

$$Z = 43.1 + 17.5 \log \left( \rho q\_r \right) \tag{4}$$

where *Z* is the reflectivity, *ρ* is the air density in kg/m3, and *qr* is the rainwater mixing ratio. This relation is derived analytically by assuming the Marshall–Palmer distribution of raindrop size. At the same time, the pixel-based radar reflectivity is assimilated directly in WRF-3DVar, by stating the latitude and longitude of the pixel center and the height of the radar beam above that pixel.

## 3.2.2. GTS Data

The downloaded GTS data is a collection of all kinds of meteorological observation data, including surface weather station, ship, buoy, pilot balloon, sonde, aircraft, and satellite observations, and its format is easily recognized by WRF-3DVAR. The decoded data is converted into a suitable Little\_R format by shell script, which is used for data assimilation of WRF-3DVAR. Five GTS datasets, including Sound, Synop, Pilot, AIREP, and Metar, were incorporated into the WRF model at 6 h assimilation time interval in this study, the number of observations were 2718, 4217, 733, 201, 612, respectively. The assimilated GTS data was directly interpolated into the background field of the model, and the background field was corrected by a certain algorithm.

GTS data has the characteristics of wide coverage and small spatial density, which is suitable for assimilation on a large scale. On the other hand, radar data coverage is relatively restricted, but the data spatial density is intense, and therefore is more suitable for assimilation in small scale. Since the scanning radius of the radar is 250 km, thus the coverage range is similar to that of Domain 2, which is much smaller than that of the outer domain and larger than the innermost domain. Therefore, in this study the radar data is assimilated only in Domain 2, whereas the GTS data is assimilated in the outer domain (Domain 1). Since the GTS data is released every six hours, in the hourly assimilation scheme, GTS is only assimilated at the 6th, 12th, 18th, and 24th hour from the start of the storm.

Four experiments were conducted at different horizontal resolutions while keeping all physical settings the same. In scheme "NA\_1km", Global Forecast System (GFS) forecast data is interpolated to the model grid as the initial conditions for the 24 h forecast without data assimilation in all the three nested domains. In scheme "DA\_1h\_1km", 1 h data assimilation is carried out with rainfall forecasts output from the 1 km domain (Domain 3). The settings of Scheme "DA\_1h\_3km" are the same as DA\_1h\_1km, except that rainfall forecasts are output from the 3 km domain (Domain 2). In the scheme "DA\_6h\_3km", the 1 km innermost domain is removed with only Domain 1 and Domain 2 left. Radar and GTS data are assimilated every 6 h and rainfall forecasts are output from Domain 2. Detailed explanations of the experimental design and the settings of the four schemes are shown in Table 5. Before the data assimilation cycle starts, the WRF model spins up for 30 h in all schemes with the initial condition from GFS. For cycling data assimilation, the prediction of the previous assimilation run serves as the background for the next run.

The root mean square error (*RMSE*), the mean bias error (*MBE*), and the critical success index (*CSI*) are used to evaluate the simulated precipitation of the WRF and WRF-3DAVR model. After the analysis of indices, *CSI/RMSE* is used as the comprehensive evaluation index to explore the more intuitive response of different rainfall types to forecast errors in temporal and spatial scales.


**Table 5.** Experiment description.

*CSI* [53] denotes the percentage of correct simulation between the forecast and observations, and the perfect score is 1. According to Equation (6), the calculation of *CSI* depends on whether it rains or not. Since the essence of WRF model simulation is to solve equations, it is inevitable that the rainfall calculation result is close to 0. In order to avoid the light rain in the prediction being included in *H* or *R*, hourly rain rate less than 0.01 mm is considered as no rain, and *CSI* is based on the method in Table 6 to classify the rainfall simulation results.

**Table 6.** Rain/no rain contingency table for the WRF simulation against observation.


Classified variables *H*, *R* and *S* represent whether the predicted and observed values in a certain observation period or observation position are greater than 0.01. If both predicted and observed values are greater than 0.01, that is, rainfall is captured in model, *H*+1; If the predicted value is greater than 0.01 and the observed value is less than or equal to 0.01, that is, the model misreports rainfall, then *R*+1; If the predicted value is less than or equal to 0.01 and the observed value is greater than 0.01, that is, the rainfall is missed in model, then *S*+1. If both the predicted value and the observed value are equal to 0.01, that is, the model accurately predicts the scenario without rainfall.

$$RMSE = \sqrt{\frac{1}{M} \sum\_{i=1}^{M} \left(Q\_i^{'} - Q\_i\right)^2} \tag{5}$$

$$CSI = \frac{1}{M} \sum\_{i=1}^{M} \frac{H\_i}{H\_i + S\_i + R\_i} \tag{6}$$

$$MBE = \frac{1}{M} \sum\_{i=1}^{M} \left( Q\_i' - Q\_i \right) \tag{7}$$

For the spatial dimension, *Qi* and *Qi* denote the observation and prediction of 24 h rainfall accumulations at each rain gauge *i*. *M* is the number of the rain gauges, which is 8

for the Fuping catchment and 11 for the Zijingguan catchment. For the temporal dimension, *Qi* and *Qi* are the average areal rainfall of observation and prediction at each time step *i*. This time *M* is 24, which represents the number of the time steps. The specific meaning can be seen in Table 7.


**Table 7.** The meaning of letter in formulas, including *Qi*, *Qi*, *i*, and *M*.

The calculation of *CSI* is based on the rain or no rain contingency table (Table 7). For the spatial dimension, the predicted rainfall was compared with observations at rain gauge locations to calculate the indices of *H*, *R*, and *S* at the time *i*, and then the values of the indices at all times are averaged to obtain *CSI* according to Equation (6). *M* is the number of total times. In this study, the time *i* is consistent with output frequency of the model, which is 1 h. That is, *CSI* is the average value of the *H*/(*H* + *R* + *S*) for each hour of the 24 h rainfall duration. Similarly, for the temporal dimension, the indices in Table 7 are calculated based on the time series data obtained for the simulated and observed areal rainfall at the rain gauge *i*. The values of the indices at all rain gauges are then averaged to produce the final *CSI* value based on Equation (6). In this case, *M* refers to the total number of rain gauges rather than the simulation time.
