Estimation of Volume Variation Time Series

Figure 21 shows the resulting time series of volume variations with error estimates for Palestine Lake. The time series shows volume variations of up to 0.25 km<sup>3</sup> over the last four decades. For validation, the volume time series provided by USGS/TWDB is used. It contains volumes that were calculated using three different hypsometric curves. Additionally, the time series contains extrapolated volumes above the extrapolation limit (dashed black), where no bathymetry information is available because of the measurement technique itself. Because of the different hypsometric curves, the volume time series used for the validation was split into three phases, to each of which a constant offset is applied. The latest period, since 1 August 2012, is used as the reference, and no offset is applied to it. For the period between 2 January 1999 and 1 August 2012, an offset of −0.015 km<sup>3</sup> is applied. For the volumes until 2 January 1999, an offset of −0.055 km<sup>3</sup> is applied. It can be clearly seen that the periods of applied offset do not match the periods of the hypsometric curves from TWDB. Furthermore, this discrepancy was cross-checked by taking water levels from in-situ data into account.
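The three-phase homogenization of the validation volumes can be illustrated with a minimal Python sketch. The dates and offsets are those stated above; the function name `homogenize`, the additive sign convention, and the treatment of the exact boundary dates are assumptions made for illustration only.

```python
from datetime import date

# Phase boundaries and offsets (km^3) as stated in the text.
# Assumption: a volume dated exactly on a boundary belongs to the later phase.
BREAK_1 = date(1999, 1, 2)
BREAK_2 = date(2012, 8, 1)


def homogenize(day, volume_km3):
    """Return the validation volume with the phase-dependent offset added."""
    if day < BREAK_1:
        offset = -0.055
    elif day < BREAK_2:
        offset = -0.015
    else:
        offset = 0.0  # reference period, no offset applied
    return volume_km3 + offset
```

For example, under these assumptions `homogenize(date(2005, 6, 1), 0.500)` shifts a TWDB volume of 0.500 km<sup>3</sup> to 0.485 km<sup>3</sup>.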

**Figure 21.** Volume time series of Palestine Lake.

The validation of the resulting time series of volume variations against absolute volumes from TWDB yields an RMSE of 0.017 km<sup>3</sup> and a correlation coefficient of 0.82 using 520 data points. The differences between in-situ volumes and volume variations range between −0.06 km<sup>3</sup> and 0.06 km<sup>3</sup>, considering an offset of 0.264 km<sup>3</sup>. The relative error with respect to the volume variations is 13.0%, which is quite high because the fluctuations in the volume time series are small, except in the years 1996, 2005/2006 and 2011, when a volume decrease occurred. The error with respect to the full volume is 3.3%. This accuracy is sufficient for most hydrologic applications. The example of Palestine Lake shows that, despite the small RMSE (0.13 m) of the satellite-derived water levels used, the resulting quality of the volume variations is rather poor, indicating that the number of points and their distribution between minimum and maximum water level have a strong influence on the quality of the hypsometry and the resulting volume variations. Overall, this example shows that reliable time series of volume variations can be estimated based on remote sensing data, even if the input data sets are of minor quality with short periods of altimeter data.

#### *5.2. Quality Assessment and Discussion*

In this paper, 28 water bodies shown in Table 1 have been investigated in order to demonstrate and validate the new approach introduced in Section 4 for estimating time series of volume variations of lakes and reservoirs. After a detailed discussion of three selected reservoirs (Ray Roberts Lake, Hubbard Creek Lake, Palestine Lake), a general quality assessment of all 28 water bodies is performed in this Section.

All water bodies are located in Texas where detailed validation data is provided by TWDB. The water bodies have different characteristics regarding water level, surface area and volume. The lake sizes cover a range between less than 2.4 km<sup>2</sup> for Medina Lake and about 782 km<sup>2</sup> for Toledo Bend Lake. The long-term changes in water level vary between 2.77 m for Cedar Creek Lake and 30.20 m for Medina Lake. The surface areas vary between 2.62 km<sup>2</sup> for Bardwell Lake and 272.03 km<sup>2</sup> for Texoma Lake. Based on water levels and surface areas, the resulting volumes vary between 0.062 km<sup>3</sup> for Bardwell Lake and 6.041 km<sup>3</sup> for Toledo Bend Lake.

Table 2 summarizes the detailed quality assessment of all 28 water bodies. In order to perform a reliable quality assessment of our results, the time series of surface areas and volumes provided by TWDB have to be homogenized in advance. For consistency, we correct offsets in the in-situ data where necessary; such offsets can be caused by changes in the applied hypsometry models. Since surface areas and volumes are mostly not available for higher water levels, the TWDB extrapolates the hypsometric curves and the resulting values. Therefore, extrapolated values in the validation data are truncated if unreliable. The quality assessment is performed for the water levels and surface areas used as input, but also for the resulting hypsometry and volume variations. For this purpose, we estimate the RMSE for all data sets, Pearson correlation coefficients (water levels, surface areas, volume variations) and the Spearman correlation coefficient for the hypsometry. Additionally, a relative error with respect to the variations is estimated:

$$Error_{Relative} = \frac{RMSE}{P_{95}(v) - P_{5}(v)} \cdot 100\ [\%] \tag{3}$$

To estimate the relative error in Equation (3), the RMSE is divided by the difference of the 95% percentile and the 5% percentile of the values *v*. In this study, *v* are the resulting water levels, surface areas and volume variations. In addition, an error with respect to the full volume is estimated based on the absolute volumes provided by TWDB for validation. This allows us also to compute the water volume below the lowest remote sensing observations (i.e., the offset of our volume change time series).
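The relative error of Equation (3) can be sketched in a few lines of Python. The function names are hypothetical, and the percentile definition (linear interpolation between order statistics) is an assumption, since the text does not state which percentile estimator is used.

```python
import math


def rmse(estimates, reference):
    """Root-mean-square error between two equally long sequences."""
    n = len(estimates)
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimates, reference)) / n)


def percentile(values, q):
    """Percentile q (0-100), linearly interpolated between order statistics.

    Assumption: the paper does not specify the estimator; this is one common choice.
    """
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def relative_error(estimates, reference):
    """Equation (3): RMSE divided by the 5%-95% percentile spread of v, in percent.

    Here v is taken to be the estimated values (water levels, surface areas
    or volume variations).
    """
    spread = percentile(estimates, 95) - percentile(estimates, 5)
    return rmse(estimates, reference) / spread * 100.0
```

As a plausibility check against the numbers reported for Palestine Lake, an RMSE of 0.017 km<sup>3</sup> together with a relative error of 13.0% implies a percentile spread of roughly 0.13 km<sup>3</sup>.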

For the quality assessment of the 28 water bodies, the water levels and surface areas used for the volume estimation are analyzed first. For each water body, the number of available water levels from satellite altimetry and surface areas from optical imagery is given. The water levels from satellite altimetry are validated with in-situ data provided by TWDB/USGS. The resulting correlation coefficients vary between 0.81 and 1.00 (average: 0.94), whereas the RMSE values vary between 0.13 m and 0.78 m (average: 0.25 m). The relative error for the water levels varies between 1.7% and 16.3% (average: 7.2%). The quality assessment of the surface areas used as input data results in correlation coefficients that vary between 0.80 and 1.00 (average: 0.95), whereas the RMSE values vary between 0.13 km<sup>2</sup> and 21.75 km<sup>2</sup> (average: 2.74 km<sup>2</sup>). The relative error for the surface areas varies between 2.2% and 23.5% (average: 9.2%).

The quality of the computed hypsometry model shown in Table 2 varies strongly, depending on the accuracy of the input data used but mainly on the number of pairs used and their distribution. The correlation coefficients of the hypsometry compared to the data used for fitting vary between 0.64 and 0.99 (average: 0.89). It can be clearly seen that there is no dependency between the correlation coefficients derived from the input data and those of the hypsometry model. The resulting RMSE values vary between 0.15 m and 0.53 m (average: 0.32 m).

Finally, the resulting volume variations are validated with absolute volume time series provided by TWDB. The correlation coefficients vary between 0.80 and 0.99 (average: 0.94) and the RMSE between 0.002 km<sup>3</sup> and 0.166 km<sup>3</sup> (average: 0.025 km<sup>3</sup>). For comparison, the relative errors are provided, which vary between 2.8% and 14.9% (average: 8.3%). However, the relative errors depend on the volume variations; therefore, we additionally provide errors with respect to the full volume. These errors vary between 1.5% and 6.4% (average: 3.1%). For the sake of completeness, the missed volumes below the lowest observation (volume offsets) are also given for the 28 investigated water bodies.


**Table 2.** Quality assessment for 28 selected water bodies. For each water body, the validation results with in-situ data are shown for the used water levels from altimetry and surface areas from optical images (number of data sets, Pearson correlation, RMSE, relative error). Additionally, the resulting quality of the estimated hypsometry model is given (number of data sets, Spearman correlation, RMSE). Finally, the validation results for the volume variations are presented (number of data sets, Pearson correlation, RMSE, relative error, absolute error, offset).





<sup>1</sup> Extrapolated validation data truncated; <sup>2</sup> Offset resulting from different hypsometry models applied.

Even if the RMSE of most volume variation time series is very good, we can see a spread in the quality of all targets under investigation. In the following, the most important criteria defining this quality are investigated. The most important factor is the quality of the input data, which turned out to be crucial for the estimation of volume changes. Figure 22a shows the impact of the correlation coefficients of water levels and surface areas on the resulting correlation coefficients of volume variations. It can be clearly seen that higher correlation coefficients of the input data result in higher correlation coefficients for the volume variations. For example, if both input data sets have a correlation larger than 0.90, the resulting correlation is almost always larger than 0.90 as well. Since the surface areas are available over the entire period, it is more important to have accurate water levels that, in the best case, cover the entire range of surface areas. Figure 22b shows the effects of the relative errors of water levels and surface areas on the resulting relative errors of volume variations. Here, too, the strong dependency is clearly visible: the smaller the relative errors of the input data, the smaller the relative errors of the volume variations. Overall, the quality of the input data, expressed by the correlation coefficients and the relative errors, allows us to assess the quality of the resulting volume variations.

**Figure 22.** Impact assessment of the used input data (water levels, surface areas) on the resulting quality of volume variations. The dependency of the correlation coefficients *R*<sup>2</sup> and relative errors of the three data sets are shown for all water bodies. (**a**) Quality assessment of correlation coefficients *R*<sup>2</sup>. (**b**) Quality assessment of relative errors.

In further analyses, we examine the impact of the characteristics of the water bodies on the resulting volume variations. To this end, we compare the resulting relative errors of the volume variations, as well as the absolute volume errors, with the maximum surface area, surface area variations, water level variations and maximum volumes of the 28 water bodies (Figure 23). For none of the four parameters can a clear influence on the resulting errors be seen. This indicates that the quality of the volume estimates is generally independent of the lake's characteristics. Errors of less than 4% of the average lake volume can be achieved regardless of the water body characteristics.

**Figure 23.** Impact assessment of water body characteristics (maximum surface area, surface area variation, water level variation, maximum absolute volume) on the resulting relative and absolute errors. (**a**) Relative and absolute errors compared to the maximum surface areas. (**b**) Relative and absolute errors compared to the surface area variations. (**c**) Relative and absolute errors compared to the water level variations. (**d**) Relative and absolute errors compared to the maximum volumes.

#### **6. Conclusions**

The paper presents an improved approach for estimating time series of volume variations for lakes and reservoirs by combining water levels from satellite altimetry and surface areas from optical imagery. Both input data sets are derived from time series publicly available from the "Database for Hydrological Time Series of Inland Waters" (DAHITI). In a first step, a hypsometry model based on water levels and surface areas is calculated. For this purpose, a modified Strahler approach has been developed, which is optimized for non-continuous data sets. The fitted hypsometric curve is used to derive corresponding water levels for all surface areas. This results in a combined long-term water level time series based on satellite altimetry and surface areas. In the next step, all land-water masks and corresponding water levels are stacked in order to estimate a bathymetry between the minimum and maximum observed surface area. Finally, the bathymetry is intersected with water levels from satellite altimetry and surface areas in order to estimate volume variations with respect to the minimum observed surface area. The data holding of DAHITI has been extended by that new product. Additionally, all side products, namely the hypsometry models, bathymetries and the reconstructed long-term water level time series are available on DAHITI.
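To make the link between hypsometry and volume concrete, the following Python sketch integrates a level-area relationship with the trapezoidal rule. This is a deliberate simplification and not the pixel-based stacking of land-water masks described above; the function name and the sample inputs are hypothetical.

```python
def volume_variations(levels_m, areas_km2):
    """Cumulative volume (km^3) above the lowest level, by trapezoidal integration.

    Simplified stand-in for the bathymetry-based computation: it only uses
    level/area pairs, e.g. sampled from a fitted hypsometric curve.
    """
    pairs = sorted(zip(levels_m, areas_km2))
    volumes = [0.0]  # volume variation is zero at the minimum observed level
    for (h0, a0), (h1, a1) in zip(pairs, pairs[1:]):
        # Mean area (km^2) times level step (m converted to km) gives km^3.
        slab = 0.5 * (a0 + a1) * (h1 - h0) / 1000.0
        volumes.append(volumes[-1] + slab)
    return [h for h, _ in pairs], volumes
```

With a constant surface area of 100 km<sup>2</sup>, each additional metre of water level adds 0.1 km<sup>3</sup>, which gives a quick sanity check of the unit conversion.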

The performance of this new approach is assessed for 28 lakes and reservoirs located in Texas, United States. The results are compared with volume time series which are derived from water levels of in-situ stations and local bathymetric surveys. The average RMSE for all water bodies is 0.025 km<sup>3</sup>, corresponding to 8.3% with respect to the variations and 3.1% with respect to the overall volume. The validation shows that the quality of the resulting volume time series strongly depends on the quality of the used input data. If correlation coefficients *R*<sup>2</sup> of water levels and surface areas with in-situ data are larger than 0.90, then the resulting correlation coefficients *R*<sup>2</sup> of the volume variations are almost always larger than 0.90. For the 28 investigated water bodies, the resulting correlation coefficients *R*<sup>2</sup> of the volume variations vary between 0.80 and 0.99.

It can be concluded that, on the basis of precise water levels from satellite altimetry and precise surface areas from optical imagery in combination with the modified Strahler approach for estimating hypsometric curves, very accurate time series of volume variations can be achieved, also for smaller water bodies (≤6.0 km<sup>3</sup>, ≤782 km<sup>2</sup>). The approach provides consistent long-term time series starting in 1984 without the inconsistencies that in-situ data sets can contain, such as those caused by changes in the vertical datum or the recalculation of hypsometric curves. Since the method is solely based on remote sensing data, it can be easily applied on a global scale and to remote areas without human infrastructure. In addition to the volume variation time series, this new approach provides further products such as reconstructed water levels based on surface areas for periods since 1984 in which altimetry data is not yet available. Additionally, high-resolution bathymetry data sets between the minimum and maximum observed water level are provided.

#### **7. Data Availability**

All presented volume variation time series, hypsometric curves, reconstructed water level time series derived from surface areas and bathymetry data as well as results for many additional targets are freely available in DAHITI at https://dahiti.dgfi.tum.de.

**Author Contributions:** C.S. calculated the input data (water levels, surface areas), developed the approach for estimating volume variations, performed the validation and wrote the paper. D.D. and F.S. contributed to the manuscript writing and helped with the discussions of the applied methods and results. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding. The APC was funded by the Technical University of Munich (TUM) in the framework of the Open Access Publishing Program.

**Acknowledgments:** We thank the Texas Water Development Board (TWDB, https://www.waterdatafortexas.org) for providing water levels, surface areas, volumes and rating curves for the 28 lakes and reservoirs used in this study. We also thank the US Geological Survey (USGS) for in-situ data incorporated in products of the TWDB. We also thank the reviewers for helpful comments and suggestions which helped us to improve the quality of this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.



© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

**Wenqing Xu <sup>1</sup>, Like Ning <sup>2,3</sup> and Yong Luo <sup>1,</sup>\***


Received: 19 February 2020; Accepted: 16 March 2020; Published: 17 March 2020

**Abstract:** With the development of the wind power industry in China, accurate simulation of near-surface wind plays an important role in wind-resource assessment. Numerical weather prediction (NWP) models have been widely used to simulate the near-surface wind speed. By combining the Weather Research and Forecasting (WRF) model with the three-dimensional variational (3DVar) data assimilation system, our work applied satellite data assimilation to the wind resource assessment of coastal wind farms in Guangdong, China. We compared the simulation results with wind speed observations from seven wind observation towers in the Guangdong coastal area, and the results showed that satellite data assimilation with the WRF model can significantly reduce the root-mean-square error (RMSE) and improve the index of agreement (IA) and correlation coefficient (R). In different months and at different height layers (10, 50, and 70 m), the RMSE can be reduced by 0–0.8 m/s from the original 2.5–4 m/s, the IA can be increased by 0–0.2 from the original 0.5–0.8, and the R can be increased by 0–0.3 from the original 0.2–0.7. The results of the wind speed Weibull distribution show that, after data assimilation was used, the WRF model was able to simulate the distribution of wind speed more accurately. Based on the numerical simulation, our work proposes a combined wind resource evaluation approach of numerical modeling and data assimilation, which will benefit the wind power assessment of wind farms.

**Keywords:** data assimilation; WRF; WRFDA; 3DVar

#### **1. Introduction**

Energy from fossil fuels has played a major role in the development of modern human civilization, but it also brings serious environmental problems and climate issues, such as atmospheric environmental pollution and global warming. The development of renewable energy is one of the major ways to solve environmental problems and achieve sustainable development; wind energy has been developed rapidly as the main clean and renewable energy.

In 2018, China installed an additional wind power capacity of 21 GW and thus has a total wind power capacity of more than 200 GW [1]. Before the construction of wind farms, the wind resources in the wind farm areas need to be evaluated, and the location of a wind farm is selected mainly on the basis of the wind resource assessment. Therefore, wind speed simulation in the wind farm area is a key issue in the development of wind power.

After many years of development, wind speed simulation in wind resource assessment and prediction now relies on two methods: the statistical method and numerical simulation. Costa et al. [2] briefly reviewed 30 years of development in short-term wind speed forecasting, highlighting that the main forecast method has shifted from statistical models to numerical models, and that combinations of both have also begun to be used. Storm et al. [3] used the Weather Research and Forecasting (WRF) [4] model to simulate the low-level jet (LLJ), and the model was able to capture some characteristics of the LLJ, which indicates that the WRF model can be used for short-term wind energy simulation.

In order to improve the accuracy of the numerical weather model in wind speed simulation, there are two approaches: (1) developing the physical parameterization scheme to improve the wind simulation performance at near-surface levels and (2) applying the data assimilation to improve the initial condition of the atmosphere. Some research studies evaluated the parameterization scheme chosen and planetary boundary layer (PBL) development [5–8]. In addition to the selection and improvement of the PBL scheme, data assimilation is also widely used to improve the wind simulation results of the numerical model.

Liu et al. [9] combined the WRF model with a data assimilation system and a large eddy simulation (LES) model, which increased wind energy simulation resolution to the level of LES. Zhang et al. [10] used the WRF model and data assimilation to forecast near-surface wind speed. In that work, conventional observations and infrared satellite observations were assimilated with 3DVar to improve the model output wind speed. The results showed that, with the improved initial fields, the assimilation of conventional observations and infrared satellite observations significantly improved the wind forecast results. Ancell et al. [11] compared the effects of the ensemble Kalman filter (EnKF) and 3DVar data assimilation on wind forecasting. The results showed that EnKF assimilation performs better than 3DVar assimilation for 24-h forecasting. Ulazia et al. [12] compared different data assimilation schemes and found that assimilation at an interval of 6 h has a better effect on the simulation of wind speed than at an interval of 12 h. The study also suggested applying data assimilation techniques to mesoscale weather models in wind resource assessment. Che et al. [13] developed a system to predict wind speed at turbine height. The Kalman filter algorithm was used to assimilate the cabin wind data after quality control, and the wind speed prediction of the WRF model was improved. The study also pointed out that data assimilation can effectively reduce random errors and is more important in rare or extreme weather conditions. Ulazia et al. [14] used the WRF model to estimate the offshore wind energy resources on the Iberian Mediterranean coast and the Balearic Islands. The results with and without data assimilation were compared and showed that the bias of the wind speed simulation was significantly reduced after 3DVar data assimilation. Cheng et al. [15] improved short-term (0–3 h) wind energy forecasting by assimilating wind speed observed at wind turbines into a numerical weather forecast system. The results showed that the assimilation of wind speed can reduce the average absolute error of the wind speed forecast for 0–3 h by 0.5–0.6 m/s.

As can be seen from related works, data assimilation can improve short-term wind speed simulation results by changing the initial field and providing real-time updates during the model run. An efficient way is dividing long-time simulation into multiple short-time simulations, using the previous numerical weather prediction (NWP) output as the "first guess" field, and then applying data assimilation to update the initial condition and to continue the next short-time run.
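The cycling strategy described above can be illustrated with a toy Python sketch. It uses a single scalar state and a fixed gain in place of a real 3DVar analysis, so it only shows the control flow (short run, update, next short run); the function name `cycle`, the `gain` parameter and the `model_step` callback are hypothetical and are not part of WRF or WRFDA.

```python
def cycle(first_guess, observations, model_step, gain=0.5):
    """Chain short simulations, correcting the start of each with an observation.

    Toy scalar illustration: the analysis is the background nudged toward the
    observation by a fixed gain (an assumption standing in for 3DVar).
    """
    background = first_guess
    analyses = []
    for obs in observations:
        # Analysis step: update the initial condition using the observation.
        analysis = background + gain * (obs - background)
        analyses.append(analysis)
        # Short run: its output becomes the first guess of the next cycle.
        background = model_step(analysis)
    return analyses
```

With a persistence model (`model_step=lambda x: x`), a first guess of 0.0 and two observations of 2.0, the analyses move toward the observed value cycle by cycle (1.0, then 1.5).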

In this paper, we used the WRF (Weather Research and Forecasting) model to perform a one-year wind speed simulation for the coastal wind farm area in Yangjiang, Guangdong. Furthermore, 3DVar data assimilation was used to assimilate the satellite radiation data. The observation data of seven wind observation towers were used to evaluate the simulation results and to quantify the improvements that data assimilation brings to near-surface wind speed simulation. The remainder of this paper is organized as follows: Section 2 mainly introduces the experiment, the data, and the evaluation methods; Section 3 analyzes the results of the different tests; Section 4 discusses the results of this article compared with other work; and Section 5 presents the main conclusions.
