*2.1. Wind Observation Data*

To estimate the improvement that satellite data assimilation brings to wind speed simulation, we used wind speed observations from seven wind observation towers. These towers observe wind at three heights (10, 50, and 70 m) and measure the instantaneous wind speed and direction every 10 min. The wind tower data were provided by China Huaneng Group Co., Ltd. (CHNG), and all of the towers are located in a CHNG wind farm.

Table 1 shows the geographical locations and altitudes of the seven wind towers. The towers are geographically close, and all of them are located near the coast in Yangjiang, Guangdong Province. Table 2 lists the wind-sensor type, model number, and hardware and software versions of the wind towers. Figure 1 shows the towers' locations in the inner domain of the model.


**Table 1.** Latitude, longitude, and terrain height of wind observation towers.



The observation period of the wind towers is the whole year of 2012. Quality control (QC) was first performed on the original data; another wind resource assessment study [16] used a similar type of wind observations, so we used the same QC method as that study. The QC method was as follows: (1) If the wind speed value does not change for more than 30 min, these data are regarded as invalid. (2) If there is a large difference between the observed wind speeds at different heights, the data with the smaller value at that time are also considered invalid. The comparison criteria were as follows: |*V*<sub>70</sub> − *V*<sub>50</sub>| > 4.0 m/s, |*V*<sub>70</sub> − *V*<sub>10</sub>| > 8.0 m/s, or |*V*<sub>50</sub> − *V*<sub>10</sub>| > 8.0 m/s, where *V*<sub>70</sub>, *V*<sub>50</sub>, and *V*<sub>10</sub> are the wind speeds at 70, 50, and 10 m.
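The two QC rules above can be sketched in code. This is an illustrative implementation only (the paper provides no code; the function name, the NumPy dependency, and the choice to flag the whole record under rule 2 rather than only the smaller value are assumptions of this sketch):

```python
import numpy as np

def quality_control(v10, v50, v70):
    """Flag invalid 10-min wind records using the two QC rules from the text.

    v10, v50, v70: wind speed arrays (m/s) at 10, 50, and 70 m.
    Returns a boolean mask; True = valid record. "More than 30 min" at a
    10-min resolution means more than 3 consecutive identical values.
    """
    v10, v50, v70 = map(np.asarray, (v10, v50, v70))
    valid = np.ones(v10.shape, dtype=bool)

    # Rule 1: wind speed unchanged for more than 30 min -> invalid.
    for v in (v10, v50, v70):
        run = 1
        for i in range(1, len(v)):
            run = run + 1 if v[i] == v[i - 1] else 1
            if run > 3:                       # run longer than 30 min
                valid[i - run + 1:i + 1] = False

    # Rule 2: inconsistent differences between heights -> invalid
    # (this sketch drops the whole record, not only the smaller value).
    bad = (np.abs(v70 - v50) > 4.0) | (np.abs(v70 - v10) > 8.0) | (np.abs(v50 - v10) > 8.0)
    valid &= ~bad
    return valid
```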

Table 3 shows the number of observations from each wind tower in the different months of 2012, and Table 4 shows the amount of observation data remaining after quality control. Some towers have missing data in autumn and winter. The data of Tower 5 at heights of 50 and 70 m were considered invalid because of poor quality.

**Table 3.** The number of wind speed observations from each wind tower in the different months of 2012.



**Figure 1.** Terrain height of inner domain and the distribution of the seven wind observation towers (red dots). "T1" represents Tower 1, "T2" represents Tower 2, and so forth.


#### *2.2. Numerical Model and Data Assimilation*

The model we used was the WRF model (version 3.8.1), and its data assimilation system WRFDA [4] was used for satellite and conventional data assimilation. Figure 2 shows the three nested domains of our simulation tests. The inner domain was mainly located in the coastal area of Yangjiang, Guangdong; Figure 1 shows the distribution of wind towers (red dots) in the inner domain. The physical configuration comprised the following schemes: the Morrison double-moment scheme [17] for microphysics; RRTMG [18] for longwave and shortwave radiation; Noah [19] for the land surface; Kain–Fritsch [20] for cumulus convection; and YSU [21] for the PBL. Table 5 shows the model configuration of the simulations; since domain 03's grid resolution was less than 5 km, we did not set a cumulus convection scheme for domain 03.

We set up three numerical simulation tests to measure the improvement in wind speed simulation from satellite data assimilation. The first test (Test 1) used only cold-start initial conditions from NCEP's final analysis data to simulate wind speed. The second test (Test 2) used the analysis field generated by the data assimilation system as the initial conditions, and the model field was updated four times by the data assimilation system during each simulation run. To compare the improvement from satellite data assimilation with that from conventional observation data, we set up a third test (Test 3), which used the same data assimilation configuration as Test 2, except that conventional surface and upper-air observation data were the data assimilation input.

All of the test cases (Test 1, Test 2, and Test 3) used the same WRF model configuration, including parameter settings and model domains. The ETA values of the near-surface layers were 1.0000, 0.9960, 0.9920, 0.9900, 0.9851, and so on; the average altitudes of the near-surface layers in domain 03 were 16.31, 48.97, 73.52, and 101.85 m.




**Table 4.** The number of wind speed observations after quality control.



**Figure 2.** Terrain height of the model's simulation domains. There are three nested domains: domain 01 (d01), domain 02 (d02), and domain 03 (d03).

**Table 5.** Domain configuration and parameter settings of the Weather Research and Forecast (WRF) model.


| Scheme | Domain 01 | Domain 02 | Domain 03 |
| --- | --- | --- | --- |
| Longwave radiation | RRTMG | RRTMG | RRTMG |
| Shortwave radiation | RRTMG | RRTMG | RRTMG |
| Land-surface | Noah | Noah | Noah |
| Cumulus convection | Kain–Fritsch | Kain–Fritsch | Not set |
| PBL | YSU | YSU | YSU |

The data used to generate the initial condition and boundary forcing of the model were the Final Operational Global Analysis Data (FNL) [22], provided by the National Centre for Environmental Prediction (NCEP). The spatial resolution of the FNL data was 1 degree (in both latitude and longitude), and the temporal resolution was 6 h.

The background error covariance matrix used in the 3DVar data assimilation was generated by the NMC method [23]. To calculate the background error, a one-month simulation was made from 1 Jan. 2012 to 1 Feb. 2012. The simulation had the same model settings as Tests 1, 2, and 3, and it contained 12-h and 24-h forecast results at both 00:00 and 12:00 UTC. The 62 pairs of results from the 31 days were then used to calculate the background error covariance.
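The NMC forecast pairing can be made concrete with a short sketch (illustrative only; the function name and the use of Python's standard `datetime` module are assumptions): at each 00:00 and 12:00 UTC valid time, the 24-h forecast is differenced against the 12-h forecast verifying at the same time, which over 31 days gives 31 × 2 = 62 pairs.

```python
from datetime import datetime, timedelta

def nmc_pairs(start, days):
    """Enumerate (valid_time, init_of_12h_fcst, init_of_24h_fcst) triples
    for the NMC method: at each 00 and 12 UTC valid time, the 24-h and
    12-h forecasts verifying at that time are differenced."""
    pairs = []
    for d in range(days):
        for hour in (0, 12):
            valid = start + timedelta(days=d, hours=hour)
            pairs.append((valid,
                          valid - timedelta(hours=12),   # init of 12-h forecast
                          valid - timedelta(hours=24)))  # init of 24-h forecast
    return pairs

pairs = nmc_pairs(datetime(2012, 1, 1), 31)
print(len(pairs))  # 62 forecast pairs, as in the text
```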

The NCEP GDAS Satellite Data [24] were used as the input for satellite data assimilation. The satellite data sensors included AMSUA, HIRS, MHS, and AIRS; Table 6 shows the types of platforms and sensors. To process the satellite data before assimilation, the Community Radiative Transfer Model (CRTM) was used as the radiative transfer model. CRTM looks up the cloud, surface emissivity, and aerosol coefficients and eliminates the satellite data bias caused by cloud, land, and aerosol. Table 7 shows the resolution of the satellite data. Since some sensors (MHS, HIRS, and AIRS) have higher resolutions than domain 01 (27 km), we applied data thinning to domain 01 in the data assimilation.


**Table 6.** Satellite data used for data assimilation, covering several sensors (AMSUA, HIRS, MHS, and AIRS) on the EOS, METOP, and NOAA platforms.

**Table 7.** Resolutions of satellite data.


The conventional observations used in Test 3 include the NCEP ADP Global Surface Observational Weather Data [25] and the NCEP ADP Global Upper Air Observational Weather Data [26]. Most of these conventional observations are synoptic observations. Figure 3 shows the locations of the synoptic observation stations.

Figure 4 shows how the WRF model was run in the three tests. Because we needed wind speed values every 10 min rather than only long-term variability, we restarted the model every day of 2012, and each run produced a one-day simulation. A long continuous run, constrained only by boundary forcing, can capture long-term variability but cannot accurately simulate the wind at every moment.

In Test 1, we started the WRF model at 18:00 UTC every day and took 6 h from 18:00 to 00:00 UTC as the spin-up time. Then, from 00:00 to 00:00 UTC the next day, the model output the simulated wind speed as the results of Test 1. Since the time interval of the observation data was 10 min, the time interval of the wind speed output of the model was also set to 10 min.

In Test 2 and Test 3, we used the WPS (WRF Preprocessing System) output at 18:00 UTC as the first guess field, assimilated the observation data (satellite data in Test 2, conventional data in Test 3) using 3DVar, and used the 3DVar output as the initial field of the WRF model. From 18:00 to 00:00 UTC, the model was run as a spin-up process. At 00:00 UTC (+1 day), 06:00 UTC (+1 day), 12:00 UTC (+1 day), and 18:00 UTC (+1 day), the data assimilation system was run four times, each time using the WRF output as the first guess field and the satellite (Test 2) or conventional (Test 3) data as the observation input.


Remote Sens. 2020, 12, 973 8 of 30

| Sensor | Resolution |
| --- | --- |
| AMSUA | ~50 km |
| MHS | ~17 km |
| AIRS | ~13.5 km |
| HIRS | ~10 km |

**Figure 4.** Model running process for Tests 1, 2, and 3. Test 1 (left) started the model at 18:00 UTC every day, with a 6 h spin-up and a 24 h model run. Tests 2 and 3 (right) started the model at 18:00 UTC every day and assimilated satellite/conventional data at 18:00 UTC, 00:00 UTC (+1 day), 06:00 UTC (+1 day), 12:00 UTC (+1 day), and 18:00 UTC (+1 day).

Unlike the long-time cycling run of WRF-3Dvar, our model was cold-started daily based on FNL data. In Tests 1, 2, and 3, we used FNL data to generate the initial field, ran the WRF model for 30 h, and then began the next day's run. The only difference between Test 1 and Tests 2 and 3 was that Tests 2 and 3 applied data assimilation five times in each WRF run. Another similar wind resource evaluation work [27] explained that this approach avoids model divergence and the accumulation of truncation errors; the WRF simulations used in that research were 2-day-restart runs. We also calculated the 3Dvar mean absolute difference (MAD) of U and V in domain 03 in Test 2, as follows:

$$\text{3Dvar } MAD = \frac{1}{n} \sum\_{i=1}^{n} \left| S\_{3Dvar,i} - S\_{fg,i} \right| \tag{1}$$

Here, *n* is the total number of grid points in domain 03, *S*<sub>3Dvar,*i*</sub> is the U or V wind speed after 3Dvar at grid point *i*, and *S*<sub>fg,*i*</sub> is the U or V wind speed of the first guess field.
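Equation (1) amounts to a single mean of absolute differences over grid points; a minimal sketch (illustrative only, assuming NumPy):

```python
import numpy as np

def mad_3dvar(s_3dvar, s_fg):
    """Mean absolute difference (Eq. 1) between the analysed wind component
    after 3Dvar and the first-guess field, averaged over all grid points."""
    s_3dvar = np.asarray(s_3dvar, dtype=float)
    s_fg = np.asarray(s_fg, dtype=float)
    return np.abs(s_3dvar - s_fg).mean()
```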

We calculated the MAD of U and V at the four 3Dvar times in the simulation: 00, 06, 12, and 18 UTC. The MAD at different heights and times is shown in Figure 5. The MAD is stable across 00, 06, 12, and 18 UTC, so there is virtually no difference between a typical operational cycling run and our daily cold-start run.

**Figure 5.** Mean absolute difference (MAD) of U (m/s) and V (m/s) at different heights (10, 50, and 70 m) in Test 2.

After obtaining the model results, we first interpolated the three-dimensional wind field to heights of 10, 50, and 70 m, using linear interpolation. Then, we interpolated the wind field to each wind tower's latitude and longitude, using bilinear interpolation as the horizontal interpolation method. We used the results of the interpolation to compare with the observed wind speed.

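The two-step interpolation (linear in the vertical, bilinear in the horizontal) can be sketched as follows. This is an illustrative reconstruction, not the paper's code; the fractional-grid-index convention for the tower location is an assumption of the sketch.

```python
import numpy as np

def vertical_interp(u_levels, z_levels, z_target):
    """Linear interpolation of wind speed to a target height (m),
    given 1-D arrays of wind speed and layer height ordered bottom-up."""
    return np.interp(z_target, z_levels, u_levels)

def bilinear_interp(field, x, y):
    """Bilinear interpolation on a regular grid; (x, y) are fractional
    grid indices (column, row) of the tower location."""
    i, j = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - i, y - j
    return ((1 - fx) * (1 - fy) * field[j, i] + fx * (1 - fy) * field[j, i + 1]
            + (1 - fx) * fy * field[j + 1, i] + fx * fy * field[j + 1, i + 1])

# Example: interpolate to 70 m using the domain-03 layer mid-heights
# quoted in the text (16.31, 48.97, 73.52, 101.85 m); wind values are made up.
u70 = vertical_interp([3.0, 4.0, 5.0, 6.0], [16.31, 48.97, 73.52, 101.85], 70.0)
```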

#### *2.3. Results Measurements*


In order to evaluate the results of the different tests, the following evaluation indices were calculated to measure the errors and correlations between the model results and the observation data.

#### 2.3.1. Root-Mean-Square Error (RMSE)

The root-mean-square error (RMSE) is widely used in NWP to evaluate the error of wind speed and other meteorological variables. Since the observation data have a 10-minute time resolution, we used the 10-minute model output and calculated the RMSE of the 10-minute model wind speed output.

$$RMSE = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} (M\_i - O\_i)^2} \tag{2}$$

Here, *n* is the number of wind speed observations in each month, *M*<sub>*i*</sub> is the value of the model result, and *O*<sub>*i*</sub> is the observation value.
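Equation (2) transcribes directly into code. The sketch below is illustrative (it assumes NumPy and adds a NaN guard for records removed by quality control, which the paper does not spell out):

```python
import numpy as np

def rmse(model, obs):
    """Root-mean-square error (Eq. 2) of the model wind speed against
    observations; NaN entries (e.g. QC-rejected records) are skipped."""
    model, obs = np.asarray(model, dtype=float), np.asarray(obs, dtype=float)
    ok = ~(np.isnan(model) | np.isnan(obs))
    return np.sqrt(np.mean((model[ok] - obs[ok]) ** 2))
```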


#### 2.3.2. Index of Agreement (IA)

The Index of Agreement is a standardized measure of the degree of model prediction error [28–30]. It can be calculated as follows:

$$IA = 1 - \frac{\sum\_{i=1}^{n} \left(M\_i - O\_i\right)^2}{\sum\_{i=1}^{n} \left(\left|M\_i - \overline{O}\right| + \left|O\_i - \overline{O}\right|\right)^2}, \quad 0 \le IA \le 1 \tag{3}$$

where $\overline{O}$ is the average value of the total observation data.

The Index of Agreement varies between 0 and 1, where a value of IA close to 1 indicates well-matched results, and 0 indicates no agreement at all.

The index of agreement can detect additive and proportional differences in the observed and simulated means and variances [31]. We calculated the IA of the 10-minute model wind speed output to investigate how well the model output agrees with the wind speed observations.
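Equation (3) can be transcribed directly (an illustrative sketch assuming NumPy; the function name is mine):

```python
import numpy as np

def index_of_agreement(model, obs):
    """Index of Agreement (Eq. 3); 1 = perfectly matched, 0 = no agreement."""
    model, obs = np.asarray(model, dtype=float), np.asarray(obs, dtype=float)
    obar = obs.mean()
    num = np.sum((model - obs) ** 2)
    den = np.sum((np.abs(model - obar) + np.abs(obs - obar)) ** 2)
    return 1.0 - num / den
```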

#### 2.3.3. Pearson Correlation Coefficient (R)

The Pearson Correlation Coefficient (R) is also widely used to evaluate the performance of wind speed simulation in NWP. It reflects the correlation between the simulated and observed wind speed series. If the model output has a high R, a large part of the error can be corrected by post-processing algorithms.

$$R = \frac{\text{Cov}(M, O)}{\sqrt{Var[M]Var[O]}} \tag{4}$$

Here, *Cov*(*M*, *O*) represents the covariance of the model results and observation wind speed, and *Var*[*M*] and *Var*[*O*] represent the variance of the model results and observation wind speed. These variables can be calculated as follows:

$$\text{Cov}(M, O) = \sum\_{i=1}^{n} \left( M\_i - \overline{M} \right) \left( O\_i - \overline{O} \right) \tag{5}$$

$$Var[M] = \sum\_{i=1}^{n} \left( M\_i - \overline{M} \right)^2 \tag{6}$$

$$Var[O] = \sum\_{i=1}^{n} \left( O\_i - \overline{O} \right)^2 \tag{7}$$

In our tests, we calculated the R between the 10-minute model wind speed output and the 10-minute wind speed observations. The R results were also compared to evaluate the simulation results and the improvement from data assimilation.
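Equations (4)–(7) combine into a few lines (an illustrative sketch assuming NumPy; the constant 1/*n* factors omitted in Eqs. (5)–(7) cancel in the ratio):

```python
import numpy as np

def pearson_r(model, obs):
    """Pearson correlation coefficient (Eqs. 4-7) between model and
    observed wind speed series."""
    model, obs = np.asarray(model, dtype=float), np.asarray(obs, dtype=float)
    dm, do = model - model.mean(), obs - obs.mean()
    # Cov(M, O) / sqrt(Var[M] * Var[O]); the 1/n factors cancel.
    return np.sum(dm * do) / np.sqrt(np.sum(dm ** 2) * np.sum(do ** 2))
```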

#### 2.3.4. Weibull Distribution of Wind Speed

In general, the distribution of near-surface wind speed can be fitted by Weibull distribution [32]. The probability density function of Weibull distribution is as follows:

$$f(x; \lambda, k) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k} \tag{8}$$

where *x* is the wind speed, *k* > 0 is the shape parameter, and λ > 0 is the scale parameter of the Weibull distribution.

The Weibull distribution has been widely used in wind resource assessment because, before wind farm construction, the wind speed distribution must be evaluated in order to calculate the amount of electric power the wind farm can generate. The two parameters can be used to determine whether the distribution of the wind speed simulation results is similar to that of the observations. If the parameters of the simulation results are close to those of the observations, the distribution of the model results reflects the true wind distribution well, and the model results can be used in wind resource assessment.
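Equation (8) is easy to check numerically. The sketch below is illustrative (the parameter values λ = 6 and k = 2 are arbitrary examples, not fitted values from the paper) and verifies that the density integrates to approximately 1 over the range of plausible wind speeds:

```python
import numpy as np

def weibull_pdf(x, lam, k):
    """Weibull probability density (Eq. 8) with scale lam > 0 and shape k > 0."""
    x = np.asarray(x, dtype=float)
    return (k / lam) * (x / lam) ** (k - 1) * np.exp(-((x / lam) ** k))

# Sanity check: integrate the density with a simple Riemann sum.
x = np.linspace(1e-6, 60.0, 200_000)
dx = x[1] - x[0]
area = float(np.sum(weibull_pdf(x, lam=6.0, k=2.0)) * dx)
```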

## **3. Results**
