2.1. Study Area
This study focuses on a set of lakes located in the Biosphere Reserve of La Mancha Húmeda, the largest wetland-rich region in the Iberian Peninsula, which comprises up to 30,000 ha of wetlands and lakes [
28] distributed within the provinces of Albacete, Ciudad Real, Cuenca, and Toledo, in the Castilla-La Mancha region (Central Spain) (
Figure 1). In 1981, UNESCO designated the area as the Biosphere Reserve of La Mancha Húmeda within the Man and the Biosphere Programme (MAB), a scientific program to promote improved relationships between people and their environments. In 2014, this Biosphere Reserve was recognized as one of the largest wetland areas in Europe, and many of its wetlands are included in the Natura 2000 Network (European Habitats and Birds Directives) and the Convention on Wetlands of International Importance, known as the Ramsar Convention. Located in a very flat area (La Mancha), the Biosphere Reserve includes floodplain lagoons and, in particular, a variety of saline lakes, mostly endorheic, and represents one of the main saline lake districts in Europe.
A set of 13 shallow saline lakes was selected for this study (Table 1), all located between 638 and 690 m above sea level. Most are temporary lakes located within the Záncara and Cigüela River Basins in agricultural landscapes. They have small watersheds, mostly covered by vineyards and cereal crops. Most of the lakes have a marginal belt of halophilous plants, and some also have helophytic vegetation in areas where the water is influenced by discharges of treated wastewater, which lower the salinity of these lakes. The main water inflows come from direct precipitation, runoff from small basins, groundwater inputs in some cases and, for some lakes, treated wastewater discharged from nearby towns. Most of the lakes lack surface outlets and behave as endorheic systems in which evaporation is the main water loss process, causing salt accumulation in the lake beds; thus, these lakes range from mesosaline to hypersaline. Their hydroperiods fluctuate, though these lakes are mostly temporary as a result of the mixed Mediterranean-Continental semiarid climate patterns [
28], which can be illustrated by the flooding pattern of Lake Alcahozo over a specific time horizon (
Figure 2). Climate in this area shows a pronounced dry season with average annual rainfall of 400–500 mm [
29]. Sediments of the studied area are of continental nature; the terrain is flat; and the dominant lithology is limestone [
2].
Lakes in La Mancha Húmeda have rapidly deteriorated because of the alteration of their hydrological patterns and pollution, the former mainly linked to the continuous overexploitation of the aquifers within the last 50 years due to increased agricultural demand [
30]. This water extraction caused an unsustainable draft of the aquifers. Many of the lakes within the Biosphere Reserve were drained during the 20th century. Others, because their salinity made them unsuitable for agricultural use, were used as landfills or for wastewater disposal by the nearby towns.
2.3. Remote Sensing Data Collection and Pre-Processing
Images from the Enhanced Thematic Mapper Plus (ETM+) sensor onboard the Landsat 7 platform were used to conduct this study. All of the images are available free of charge from the United States Geological Survey website. The cloud-free Landsat images downloaded from the EarthExplorer viewer (http://earthexplorer.usgs.gov/) correspond to the surface reflectance product (CDR_sr), which is corrected for the atmospheric contribution with the 6S radiative transfer code. The downloaded scenes were synchronous or close in time to the reference data (Table 2). The study area is covered by the path/row combinations 200/33 and 201/32–33.
Satellite image pre-processing was required prior to the application of the different methods. All scenes were cropped to our area of interest, and the images were then normalized [31,32] using the image of July 2014 as a reference. The iteratively reweighted multivariate alteration detection (IRMAD) method was applied [
33] to minimize the spectral variability caused by seasonal sun-surface-sensor effects [
31]. The IRMAD technique, developed for the automatic radiometric normalization of multispectral and hyperspectral images, uses canonical correlation analysis to find linear combinations of the reference and target image bands, which generate a pair of new multispectral images. The components of these new images are called canonical variates. The technique assumes that, between two acquisitions of the same scene, reflectance changes in some areas but not everywhere. Under this assumption, pixels with the smallest differences between the canonical variates were labeled as pseudo-invariant pixels, which were then used to normalize each image band-by-band to the reference image. The linear regression equations used to spectrally align each of the six bands of an image were obtained with coefficients of determination (r²) > 0.90 and root mean square errors (RMSE) < 10%. After this pre-processing, all images were visually compared to ensure that they were correctly co-registered. Where this was the case, no further corrections or adjustments were necessary. The scan line corrector failure was corrected following the methodology proposed by Scaramuzza et al. [34]. To isolate our study area (lakes), a water mask was manually digitized from a Landsat 5 TM image (May 2010), corresponding to a wet year, to delineate the maximum flooding area of the lakes.
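The band-by-band normalization step can be illustrated with a minimal Python sketch. It assumes that the pseudo-invariant pixels have already been selected (the IRMAD selection itself is not reproduced here) and fits an ordinary least squares line for one band. The function names and reflectance values are hypothetical, and the RMSE is returned in absolute reflectance units rather than as a percentage.

```python
from math import sqrt
from statistics import mean


def fit_band_regression(target, reference):
    """Least-squares fit of reference = slope * target + intercept.

    `target` and `reference` hold the reflectance of the same
    pseudo-invariant pixels in the target and reference images.
    Returns (slope, intercept, r2, rmse).
    """
    mx, my = mean(target), mean(reference)
    sxx = sum((x - mx) ** 2 for x in target)
    sxy = sum((x - mx) * (y - my) for x, y in zip(target, reference))
    syy = sum((y - my) ** 2 for y in reference)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares of the fitted line
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(target, reference))
    r2 = 1.0 - ss_res / syy
    rmse = sqrt(ss_res / len(target))
    return slope, intercept, r2, rmse


def normalize_band(values, slope, intercept):
    """Apply the fitted line to every pixel value of the target band."""
    return [slope * v + intercept for v in values]


# Hypothetical reflectances of four pseudo-invariant pixels in one band
target_pif = [0.10, 0.20, 0.30, 0.40]
reference_pif = [0.11, 0.20, 0.29, 0.38]

slope, intercept, r2, rmse = fit_band_regression(target_pif, reference_pif)
normalized = normalize_band(target_pif, slope, intercept)
```

In practice, the fit would be repeated for each of the six bands, and a band would be accepted only if its fit met the quality criteria stated above.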
2.4. Water Mapping Methods
Several methods were tested to estimate the absence or presence of water in the lakebeds of the studied lakes (
Figure 3). For a rigorous comparison, we focused on the data from the intensive field campaign in July 2014 and on six test lakes (Alcahozo, Camino de Villafranca, El Longar, La Veguilla, Las Yeguas and Manjavacas) (Figure 4). The comparison covered two-band vegetation indices, two-band water indices, single-band thresholds, classification methods, and Artificial Neural Network (ANN), Support Vector Machine (SVM) and Genetic Programming (GP) algorithms. Note that the Landsat image used in this section (21 July 2014) is also the reference for the normalization procedure applied to the full image dataset. Regarding two-band spectral indices for vegetation, both the NDVI [
35] and the SAVI [
36] were tested. Among the two-band spectral indices for water, several variants of the NDWI were studied, including the Modified Normalized Difference Water Index (MNDWI) and the index proposed by Ángel-Martínez [37] (I_CEDEX), where CEDEX is the Centre for Studies and Experimentation on Public Works in Madrid, Spain. The approach proposed by Bustamante et al. [38], based on the MIR band, and the single-band NIR threshold used in [39] were also tested. Both supervised and unsupervised classification methods using different spectral bands were tested as well. The unsupervised methods were k-means and the Iterative Self-Organizing Data Analysis Technique (ISODATA), whereas the supervised methods were the parallelepiped, minimum distance, Mahalanobis distance and maximum likelihood methods (Figure 3).
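As an illustration of the two-band indices, the following Python sketch computes the NDVI, the SAVI and the MNDWI for a single pixel from their standard published formulations. The reflectance values, the soil adjustment factor L = 0.5 and the zero threshold used to separate water from non-water are assumptions made for the example, not the values calibrated in this study.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b), 0 if undefined."""
    denom = a + b
    return 0.0 if denom == 0 else (a - b) / denom


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index; soil_factor is the L term."""
    return (nir - red) * (1.0 + soil_factor) / (nir + red + soil_factor)


def mndwi(green, mir):
    """Modified NDWI, using the green and mid-infrared bands."""
    return normalized_difference(green, mir)


def is_water(index_value, threshold=0.0):
    """Label a pixel as water when a water index exceeds the threshold.

    The default threshold of 0 is an illustrative assumption.
    """
    return index_value > threshold


# Hypothetical surface reflectances for a flooded and a dry lakebed pixel
flooded = {"green": 0.08, "red": 0.06, "nir": 0.04, "mir": 0.02}
dry = {"green": 0.10, "red": 0.15, "nir": 0.25, "mir": 0.30}

flooded_mndwi = mndwi(flooded["green"], flooded["mir"])
dry_mndwi = mndwi(dry["green"], dry["mir"])
```

For Landsat 7 ETM+, the green, red, NIR and MIR inputs would correspond to bands 2, 3, 4 and 5, respectively.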
Machine learning algorithms were also considered, including SVM, GP and ANN, the last being a machine learning method inspired by the function of the human brain. These techniques are all nonparametric classifiers that require no assumptions about the distribution of the data and thus need no a priori knowledge about the characteristics of the feature data. An ANN model is built from three types of layers: an input layer (here, the input data include reflectance bands and/or water and vegetation indices), one or more hidden layers and an output layer (here, a dichotomous output of either the water or the non-water class) [
40]. The SVM is a supervised machine learning technique based on statistical learning theory [
41] used to find the boundaries between different classes. In our study, different kinds of SVM were tested to carry out the classification, including linear, quadratic, cubic and Gaussian kernels. MATLAB® software was used to implement the ANN and SVM. Finally, genetic programming (GP) is a subclass of evolutionary computation techniques designed to search for the best fit to perform a user-defined task. GP can decode system behaviors based on empirical data for symbolic regression, uncover relationships and make inferences using association path analysis, classification, clustering and forecasting [
42]. A principal advantage of GP is that the solution methodology can learn the relationship between the inputs and outputs without any a priori knowledge or preconceptions, thus placing the burden of the discovery process primarily on the GP, reducing data contribution and preprocessing by the user [
43]. In this study, the user-defined task was to develop a GP model that uses surface reflectance data from common bands and different two-band indices as inputs to predict the output class (water or non-water). We used Discipulus® [44] software to run the GP algorithms.
The ground-truth dataset was divided into two subsets for ANN, SVM and GP model training (2/3 of the total data; 56 data points) and validation (1/3 of the total data; 28 data points). Finally, the ANN, SVM and GP algorithms yielding the best results were compared to the methods above. A flowchart of the methodological steps (Figure 4) shows that all of the pixel values used for the training/validation/testing of all methods were extracted before the correction of the Scan Line Corrector (SLC) failure of Landsat 7; thus, this correction did not introduce any bias. A confusion matrix was obtained for each method, cross-tabulating the predicted class of each sample against its real class. To evaluate the consistency of our results, we used the confusion matrix to calculate the kappa coefficient (κ), an index ranging from −1 to +1; values higher than 0.4 are considered acceptable [45]. The confusion matrix can also be used to derive the commission and omission errors and the user and producer accuracies. The commission error is the percentage of pixels wrongly assigned to a certain class by the classifier, whereas user accuracy is the probability that a pixel assigned to a class by the classifier actually belongs to that class. The omission error is the percentage of pixels that belong to a ground-truth class but were assigned to another class, and the producer accuracy is the probability that a pixel belonging to a ground-truth class is correctly assigned to that class by the classifier.
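The accuracy measures described above can be sketched in Python as follows. The labels are hypothetical: they mimic a validation subset of 28 points, but they are not the study's data. The code assumes that every class occurs at least once in both the ground truth and the predictions, so no division by zero arises.

```python
def confusion_matrix(truth, predicted, classes=("water", "non-water")):
    """Rows are ground-truth classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(truth, predicted):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy_metrics(matrix):
    """Kappa plus per-class producer/user accuracy and error rates."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(matrix[i]) for i in range(k)]                  # truth
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    observed = sum(matrix[i][i] for i in range(k)) / n
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    producer = [matrix[i][i] / row_tot[i] for i in range(k)]
    user = [matrix[i][i] / col_tot[i] for i in range(k)]
    return {
        "kappa": (observed - expected) / (1.0 - expected),
        "producer_accuracy": producer,
        "user_accuracy": user,
        "omission_error": [1.0 - p for p in producer],
        "commission_error": [1.0 - u for u in user],
    }


# Hypothetical validation labels (10 water and 18 non-water points)
truth = ["water"] * 10 + ["non-water"] * 18
predicted = (["water"] * 8 + ["non-water"] * 2
             + ["water"] * 1 + ["non-water"] * 17)

matrix = confusion_matrix(truth, predicted)
metrics = accuracy_metrics(matrix)
```

With these definitions, the omission error is the complement of the producer accuracy, and the commission error is the complement of the user accuracy.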
The method showing the best performance in terms of discrimination of water/non-water pixels in our test area was selected and applied to the full set of images listed in
Table 2 and to a variety of lakes different from those used in the algorithm training. Finally, the relationship between precipitation-evaporation and lake water cover was analyzed for Lake Alcahozo, used as a model lake to explore the effect of the meteorological variability characteristic of the Mediterranean climate during 2013–2015. Meteorological data were downloaded from the Servicio Integral de Asesoramiento al Regante de Castilla-La Mancha (SIAR) (http://crea.uclm.es/siar/datmeteo/), which provides daily mean, absolute maximum and minimum values of temperature, humidity and wind speed, together with cumulative global solar radiation, daylight hours, precipitation and reference evapotranspiration estimated with the Penman–Monteith equation [46]. In the present study, we focused on precipitation and reference evapotranspiration data to explore the hydrological cycle dynamics.
We also include a brief discussion of climate elasticity as an application example. The climate elasticity of streamflow (e) is an index commonly used to quantify the sensitivity of streamflow to meteorological patterns and climate change. It is defined as the proportional change in the streamflow (lake flooded area in our case) relative to the proportional change in a climatic variable, such as precipitation (p). The index was applied to our lake data, and the nonparametric estimator was used to calculate the climate elasticity [
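47]. In particular, we focused on the variation between the area of a lake and the precipitation (Equation (1)):

$$e_p = \mathrm{median}\left(\frac{A_t - \bar{A}}{P_t - \bar{P}} \cdot \frac{\bar{P}}{\bar{A}}\right) \qquad (1)$$

where $e_p$ is the climate elasticity relative to precipitation, $A$ and $P$ represent area and precipitation, $\bar{A}$ and $\bar{P}$ are the corresponding yearly mean values and $t$ is time.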