1. Introduction
Salt pans are a special type of terrestrial wetland whose formation is linked to arid climates, topographic depressions, and salt-rich groundwater [
1,
2]. They can be defined as “(...) arid zone basins (...), subject to ephemeral surface water inundation of variable periodicity and extent” [
2]. Saline lakes, such as salt pans, are of vital importance for biodiversity and water management [
2,
3]; however, at the global scale, their number is declining mainly due to direct human intervention in their hydrology and climate change [
4,
5]. Although global data on salt pans are missing [
6], many case studies suggest a global trend toward salt pan degradation and decline [
2,
5,
7,
8,
9,
10]. These trends also apply to the salt pans in Seewinkel in eastern Austria [
11,
12], where key regional ecosystem functions are under threat. The lives of, among others, halophytes [
13], amphibians, reptiles [
14,
15], and birds [
16,
17] depend on these wetlands. Halophytes, such as communities of
Puccinellio-Salicornietea, require a high groundwater level facilitating capillary rise to ensure their water supply [
13]. Birds, such as the Kentish plover (
Charadrius alexandrinus), use high water levels in spring (for hatching [
18]) and summer [
17], as do amphibians and reptiles [
14,
15]. In Central Europe, such ecosystems can only be found in the Pannonian Basin [
19] due to the unique tectonic conditions in the region [
20]. In recent years, processes such as eutrophication, paludification, siltation, overgrowth with vegetation, fragmentation, long-term drying, and in consequence, habitat loss, have accelerated [
21,
22]. These are largely connected to excessive groundwater drainage for land use change [
21,
23]. The potential impact of climate change on the salt pans in Seewinkel is not yet fully understood [
23], although small, geographically isolated wetlands reportedly react rather quickly to meteorological forcing [
24].
The salt pans in the Seewinkel region follow the salt pan cycle [
1], in which the dry basin is the default, central, and recurrent moment, which alternates with its opposite state: the varying presence of water [
1,
25]. In summer, high evaporation rates in combination with an interruption of groundwater supply tend to outweigh precipitation [
14] leading to salt pan desiccation. Especially during late winter and early spring [
21], low evaporation rates allow precipitation combined with an increased contribution of groundwater to fill the basins. Wind contributes to important ecosystem processes as it influences evaporation rates and drives the mixing of water when the salt pan is inundated [
21]. It also strengthens capillary rise and blows out inorganic sediments from the salt pan basins during periods of desiccation [
20]. Salt pans in poor hydrological conditions lose additional water by surface water infiltrating into deeper soil layers [
14]. Thus, monitoring and predicting both the long-term and short-term variability of surface water occurrence in the Seewinkel salt pans are needed to assess ecosystem change and resilience.
Wetland hydroperiod [
26], a key characteristic and ecological indicator of intermittent wetlands, such as salt pans [
19,
27], can be characterized by means of water height (WH), water extent (WE), or water volume (WV) [
2,
28,
29]. WH derived from in situ water level gauges offers the most reliable and temporally frequent source of information. However, water gauges provide merely vertical, locally tied measurements and are costly to install and maintain. Especially for salt pans, the water level gauge must be positioned at the deepest point due to increased drying towards the edges. In many regions of the world, long-term, automatic in situ measurements are not widely available, as is also the case for the Seewinkel region (web address:
https://wasser.bgld.gv.at/hydrographie/die-seen and
https://ehyd.gv.at/ (accessed on 14 August 2023)).
WE is especially suited for studying the inundation state of the salt pans due to their shallow topography so that small changes in water volume cause substantial changes in water surface area. WE can be reliably retrieved from Earth observation (EO) satellite data that provide global, freely available information of high spatial and sufficient temporal resolution [
30,
31]. Multispectral imagery has proven to be suitable for studying salt pans because of the high reflectivity of exposed salt surfaces and the absorption of infrared radiation by water surfaces [
6]. Although commonly suffering from cloud cover, multispectral observations are less affected by wind than radar systems [
32,
33,
34], which have been widely used to monitor wetlands [
30,
31,
35,
36]. Most studies use data from the moderate resolution imaging spectroradiometer (MODIS) [
37] or a series of the Landsat missions, which together cover an observation period of nearly 50 years [
38]. Examples of global satellite-derived WE products are the global surface water (GSW) product [
39] and the dynamic surface water extent product [
40]. Additionally, continental-scale products exist [
41]. These large-scale products include data on salt pans (
Figure 1); however, they are often inaccurate for small-size ecosystems, such as those encountered in Seewinkel [
39]. Local case studies using remote sensing to derive WE and inundation states are numerous [
42,
43,
44,
45,
46,
47,
48,
49,
50].
Several modeling approaches exist that link various drivers to salt pan hydrological properties. Traditional hydrological modeling [
52,
53,
54] applied to wetlands depends on a certain quantity and quality of data for parameterization, which often hampers their spatial transferability to regions where these data are not available [
55,
56,
57,
58,
59]. Stochastic modeling has long been recognized as a vital alternative to process-based modeling [
60]. A number of studies have focused on summarizing past, present, and prospective machine learning (ML) methods for estimating different hydrological variables [
61,
62,
63,
64,
65,
66,
67,
68,
69,
70,
71], such as groundwater [
65,
66,
67]. Conventional ML models, such as the random forest (RF) approach [
72,
73], have been the most commonly used concepts for modeling hydrological variables [
70,
71]. They have the advantage of being well-explored, non-parametric, and often robust estimators that, in many cases, offer extensive algorithmic options for model interpretation [
70]. Hybrid models [
74,
75,
76,
77] and deep learning models [
78] have only recently gained attention in hydrological research [
77,
78]. Hybrid models are meant to incorporate the advantages of traditional hydrological modeling and ML modeling [
74,
75,
76,
77]. Although fit for complex pattern recognition tasks, deep learning models typically require large amounts of data for model training [
78,
79] and are harder to interpret [
80].
Advances in ML [
70,
71] have boosted the relevance of stochastic modeling for predicting lake WH [
81,
82,
83,
84,
85]. Past research in modeling wetland inundation dynamics using ML methods is often restricted to using in situ measurements for identifying the presence of water [
86,
87]. Greater data availability provided by EO [
39,
88,
89] has contributed to studies utilizing WE for modeling wetlands, although, to our knowledge, not for salt pans and in different temporal resolutions. The monthly inundation state of freshwater playas in the Great Plains of North America has been modeled on a large spatial scale using a monthly global water extent product based on Landsat [
39] and climate and land cover data [
90]. Inundation patterns in the Darling River Floodplain, Australia, were modeled using Landsat data and topography, meteorological, and hydrological data [
91]. Satellite-derived WE (lake surface area) of Lake Gregory, a salt lake in Australia, has been modeled using ML with precipitation and temperature as predictors [
92]. Wetland permanence in the Prairie Pothole Region was quantified for four water body permanence classes, although not for salt pans, by ref. [
93], who, in addition to climate and land cover, introduced features based on topography to ML modeling. Various ML models were used for the mentioned studies. Ref. [
90] used a long short-term memory neural network, ref. [
91] used RF, ref. [
92] applied a generalized group method of data handling, and ref. [
93] used extreme gradient boosting techniques.
The goal of this study is to combine long-term EO and ML to build seasonal prediction models of the inundation state of salt pans in the Seewinkel region of Austria. Modeling of the salt pan inundation state contributes to a better understanding of the effect of climate variability and groundwater exploitation on salt pan health. First, we derive and evaluate a reference data set of the yearly inundation states from 1984 to 2022 of 34 salt pans based on the Landsat satellite archive [
38,
94]. Second, we use this new long-term inundation state data set as a target variable in combination with meteorological data from global reanalysis and in situ groundwater data to develop ML models that predict in early spring (end of March) whether the salt pans fall dry during July, August, September, or October (JASO) of the same year. We develop models with and without the use of local in situ groundwater gauge data, which are precise but sparse at the global level. We thereby test whether meteorological predictors in combination with EO-based estimates of WE are fit for use in regions with limited availability of in situ data. In total, we create three core models: a meteorology-based model, a groundwater-based model, and one combining meteorological and groundwater data. To identify the most important drivers, we apply concepts from explainable artificial intelligence [
95].
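The target definition described above can be illustrated with a minimal sketch. It assumes that each salt pan has a time series of cloud-free WE observations and that a year is labeled 'dry' as soon as one JASO observation shows no surface water. The function name, the input layout, and the handling of years without JASO observations are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from datetime import date

JASO_MONTHS = {7, 8, 9, 10}  # July, August, September, October


def label_jaso_state(observations):
    """Map (date, water_extent_m2) pairs of one salt pan to yearly labels.

    Assumption: a year is 'dry' if any JASO observation has zero water
    extent, otherwise 'inundated'. Years without JASO observations are
    omitted (hypothetical handling of data gaps).
    """
    extents_by_year = defaultdict(list)
    for day, extent in observations:
        if day.month in JASO_MONTHS:
            extents_by_year[day.year].append(extent)
    return {
        year: "dry" if min(extents) <= 0.0 else "inundated"
        for year, extents in sorted(extents_by_year.items())
    }


# Illustrative values only, not data from the study
observations = [
    (date(2003, 5, 12), 41_400.0),  # spring scene, ignored for the label
    (date(2003, 8, 3), 0.0),        # desiccated in August
    (date(2010, 7, 20), 52_200.0),
    (date(2010, 9, 14), 30_600.0),
]
labels = label_jaso_state(observations)
print(labels)
```

Such a binary label per salt pan and year is what a classifier trained on predictors available at the end of March would be asked to predict.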
5. Discussion
5.1. Assumptions
The modeling framework has been built on several assumptions. First, the targets (i.e., inundation dynamics for different salt pans) are correlated. Second, the accumulation periods (particularly, the dominant 12-month period) hold explanatory power with regard to the inundation state in summer. Third, the salt pans are in an environmental condition that is good enough to allow them to react to the natural drivers applied in this study. In other words, the salt pans need to be at least sufficiently well connected to the salt pan cycle to respond to the groundwater-based and meteorological predictors. Fourth, we assume that the climatology always leads to the prediction of a drying from spring toward summer. Hence, the models cannot predict the ’inundated’ state based on dry conditions in spring. The exception is a salt pan-wise class distribution that is skewed towards the ’inundated’ state, which causes the models to always predict the ’inundated’ state (outcome TN). In other words, if the situation during the lead time deviates strongly from the climatology, the models will not capture many of the effects on the salt pan inundation state. This is a disadvantage in years when drivers change strongly and may lead to misclassifications.
In total, desiccation in spring (in April to June; here used as a proxy for dry conditions) in combination with the ’inundated’ state in JASO occurred for 41 events (
Figure 3a) and resulted in 20 TNs and 21 FPs for the GROUNDWATER model. The year 2008 alone accounted for fourteen FP outcomes. This circumstance also reveals that not all annual desiccation events were captured. Taking into account the desiccation events from the beginning of April to the end of October would result in a class distribution of 63%/37%.
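To make the outcome labels explicit, the following sketch tallies confusion-matrix outcomes under the assumption, inferred from the text, that 'dry' is the positive class and 'inundated' the negative class. The function names are hypothetical. The example reproduces only the counts stated above.

```python
from collections import Counter

POSITIVE = "dry"  # assumption: 'dry' is positive, 'inundated' is negative


def outcome(actual, predicted):
    """Return the confusion-matrix cell for one salt pan and year."""
    if actual == POSITIVE:
        return "TP" if predicted == POSITIVE else "FN"
    return "FP" if predicted == POSITIVE else "TN"


def tally(pairs):
    """Count outcomes over (actual, predicted) pairs."""
    return Counter(outcome(actual, predicted) for actual, predicted in pairs)


# The 41 events with spring desiccation but an observed 'inundated' JASO
# state: 20 were predicted 'inundated' and 21 were predicted 'dry'.
events = [("inundated", "inundated")] * 20 + [("inundated", "dry")] * 21
counts = tally(events)
print(counts["TN"], counts["FP"])
```

Because the observed state is 'inundated' in all 41 cases, only TN and FP outcomes can occur for these events.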
5.2. Predictors
The results of the EDA (only for Lange Lacke;
Figure 4a), feature importance (
Figure 7), and calculation of partial dependencies (
Figure 8) support the assumption of a close connection between salt pans and groundwater [
14,
21]. We suspect that the rather long time steps of the model and the correspondingly long-term predictor setup favor slow-reacting, groundwater-related features as key predictors. It remains to be determined whether the high importance of groundwater reflects its direct contribution to salt pan water status or, more generally, water abundance (i.e., drought conditions) in the region. The outstandingly high feature importance of the SGI is presumably connected to its continuous nature, as it does not rely on artificially thresholded integration periods [
115].
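The continuous nature of the SGI can be illustrated with a minimal sketch of a rank-based normal scores transform applied separately to each calendar month, so that no accumulation period has to be chosen. This is an assumption about how such an index is commonly computed, not the implementation used in this study. The plotting position (rank + 0.5)/n and the function names are likewise assumptions.

```python
from statistics import NormalDist


def normal_scores(values):
    """Rank-based transform of values to standard normal scores.

    Assumption: probabilities are spaced from 1/(2n) to 1 - 1/(2n).
    Ties are not treated specially in this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, index in enumerate(order):
        probability = (rank + 0.5) / n
        scores[index] = NormalDist().inv_cdf(probability)
    return scores


def sgi(monthly_levels):
    """Compute an SGI-like index from (year, month, level) records.

    Each calendar month is standardized on its own to remove seasonality.
    """
    index = {}
    for month in range(1, 13):
        subset = [(y, level) for y, m, level in monthly_levels if m == month]
        if not subset:
            continue
        scores = normal_scores([level for _, level in subset])
        for (year, _), score in zip(subset, scores):
            index[(year, month)] = score
    return index


# Illustrative March groundwater levels (arbitrary units), not study data
levels = [(2001, 3, 1.0), (2002, 3, 3.0), (2003, 3, 2.0),
          (2004, 3, 5.0), (2005, 3, 4.0)]
index = sgi(levels)
print(index[(2002, 3)])
```

In this toy series, the median March level maps to an index of zero, and lower or higher levels map to negative or positive values.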
Still, the METEOROLOGY model achieved similar scores compared to the GROUNDWATER model. Both were able to capture many of the interannual differences in the inundation state. We managed to find meteorological predictors that are of importance for spring salt pan water abundance, which is essential for the salt pan inundation state in JASO. The importance of meteorological predictors could stem from their temporal autocorrelation from one year to the next [
170,
171]. For the meteorological predictors, one or more months of the previous year that were not included could make a difference for spring water abundance.
We find that, apart from the continuous SGI, time periods of 12 months or more work best for predicting the salt pan inundation state. Such predictors exhibited large feature importance within their model setups. It is up to further research to determine whether the 12-monthly anomaly mean is the most appropriate integration period. This argument is particularly relevant as shifting climate patterns influence groundwater recharge. The SPI 6 and the GW level ratio relate to a similar time period (6 months). This period does not seem to be particularly relevant, as both predictors were comparatively unimportant in all three models (also when disregarding the SGI). Temperature-based predictors were most important in the METEOROLOGY model despite exhibiting low correlations with the SGI.
For some combinations of salt pans and predictors, the PDP (
Figure 8) exhibited sigmoid curves with a wide spread. The PDPs for Lange Lacke and Unterer Stinkersee showed a clear progression against the SGI and SPI 24, respectively. The SPI 24 is closely related to groundwater drought as suggested in the literature [
125] and by the correlation analysis (
Section 4.2.1). Therefore, our results confirm the observation made by ref. [
14] that both salt pans are closely connected to groundwater. This is even more true for Wörtenlacken 2, which is reported to have an atypically strong connection to groundwater, even greater than that of Lange Lacke [
14]. Similar inferences can be made for all other salt pans (
Appendix A). Additionally, the probability thresholds for the SGI were similar in the case of Lange Lacke and Wörtenlacken 2 (
Figure 8). However, such an analysis is prone to misinterpretations as partial dependency behavior can vary depending on the model setup and the underlying training data. For example, the hydrology of Katschitzlacke is reportedly similar to Lange Lacke [
14], whereas our results indicate a closer connection to the predictor number of days above 25 °C.
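The concept of partial dependence used above can be sketched as follows: for each grid value of one predictor, that predictor is overwritten in all samples and the model predictions are averaged. The toy model, the feature names, and the numbers below are hypothetical stand-ins and do not represent the RF models of this study.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the input stays unchanged
            modified[feature] = value  # overwrite the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_predict(row):
    """Hypothetical model: probability of the 'dry' state.

    The probability drops sharply once the SGI exceeds a threshold,
    which mimics a threshold-dependent response.
    """
    base = 0.9 if row["sgi"] < -0.5 else 0.2
    bonus = 0.05 if row["hot_days"] > 40 else 0.0
    return min(1.0, base + bonus)


rows = [{"sgi": -1.0, "hot_days": 50}, {"sgi": 0.3, "hot_days": 30}]
grid = [-1.5, -0.5, 0.5]
curve = partial_dependence(toy_predict, rows, "sgi", grid)
print(curve)
```

The resulting curve depends on both the model and the rows over which it is averaged, which is why partial dependence can vary with the model setup and the underlying training data.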
A drawback associated with the input data is their low spatial resolution. The argument is particularly valid for P anomalies, since groundwater level,
anomalies, and T anomalies vary less in space and time [
172,
173]. Here, future models could improve the (spatial) representation of precipitation. An understanding of the inundation state in JASO would require seasonal forecasts of hydrological and meteorological variables. Meteorological predictors that focus on depicting changing precipitation, evaporation, and temperature patterns in the region due to climate change should additionally prove beneficial [
24].
Features reflecting the human impact on the ecosystem, such as the (e.g., monthly) amount of groundwater extraction from wells and discharge into drainage canals, were not used, as, to our knowledge, no such information is available in the region. However, the use of this information could potentially enhance the knowledge to be gained from the models, especially if such information was available at the subregional scale or for each salt pan.
5.3. Target
Our results confirm that the EO-based inundation state is a useful target variable for ML-based modeling. Data from the Landsat missions have been shown to form a useful basis for quantifying interannual surface water dynamics [
39,
54,
174]. Although in some years, the impact of cloud cover was high, the summer/fall inundation status could be retrieved for all salt pans over the entire study period except for the years 2002 and 2012. The variation in this target variable roughly showed similar dynamics to some of the variables considered in other studies on a larger area, e.g., SPEI3 [
103] or long-term precipitation [
102]. The year 2015 represents an exception, as it is referred to as a drought year in ref. [
103] but appears rather wet in our analysis. This might be because of the rather wet conditions in fall-winter 2014, which are also visible in ref. [
103].
Uncertainties in the salt pan time series are expected to be larger for smaller salt pans, which have a larger relative proportion of mixed pixels with bordering land (
Section 4.1). However, this argument turns out to be secondary since the accuracy of the model target depends on the exact recognition of desiccation and not on the precise sensing of the true WE. Higher resolution remote sensing products, such as Sentinel-2 imagery, could reduce the error connected to spotting desiccation inside the ’last’ pixels. Such data would need to be used in combination with, e.g., the Landsat archive, to build the models on extensive time series. In addition to using satellite data with higher resolutions, we propose the use of alternative target variables to avoid the salt pan-wise (and year-wise) class imbalances. The time of the first desiccation (
Figure 3) would constitute an interesting target variable [
14,
21].
5.4. Model Error
Although in this study we were able to predict the salt pan inundation state in Seewinkel with only moderate accuracy, the average performance across the three independent test sets indicates a gain of 0.24 compared to the RANDOM model. We regard the average score across the models of 0.6 as acceptable insofar as the assumed reasons for the observed model error are numerous and, depending on the salt pan and year, weighty. The model error can therefore be approximately explained. Improving model performance by addressing the issues described in detail below is largely limited by data uncertainty and data availability. The failure of the model to make correct predictions if the meteorological conditions deviate strongly from the climatology, the artificial inundation, and the uncertain hydrological condition, meaning surface water possibly infiltrating into deeper layers, explain the results and provide starting points for future improvements to the model. In general, we consider the model setup performant and stable.
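As a reminder of how the reported score behaves, the following sketch computes the Matthews correlation coefficient (MCC) from confusion-matrix counts and averages the three MCC values reported in the Conclusions. The convention of returning 0 when a marginal sum is zero is a common choice and is an assumption here. The confusion-matrix counts are illustrative only.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumption: undefined cases are reported as 0
    return (tp * tn - fp * fn) / denominator


# A model that always predicts the majority class in a skewed fold is
# 90% accurate here but has no skill according to the MCC.
always_majority = mcc(tp=0, tn=90, fp=0, fn=10)

# Mean of the MCC values reported for METEOROLOGY, GROUNDWATER, COMBINED
reported = [0.66, 0.59, 0.57]
mean_mcc = sum(reported) / len(reported)
print(always_majority, round(mean_mcc, 2))
```

The example shows why the MCC is a stricter measure than accuracy when class distributions are skewed, as is the case for many salt pans and years.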
Many years exhibit highly skewed class distributions, especially since 2016. This influences the metrics since different years are connected to varying degrees of difficulty in correct estimation. The total skill of the three models is very similar. The indirect setup of the model, which means the prediction of the inundation state in summer via the water balance at the end of March, can be considered a major contributor to this outcome. Salt pans with a more balanced class distribution are more challenging to correctly estimate for the three models (
Table 5). On average, the GROUNDWATER model performed best in predicting these eight salt pans (Zicklacke, Katschitzlacke, Fuchslochlacke 3, Oberer Stinkersee, Mittlerer Stinkersee, Wörtenlacken 2, Neubruchlacke, and Lange Lacke), although interpreting these results proved difficult due to the widely varying hydrological conditions of the salt pans [
14].
Section 4.2.2 stresses the importance of the underlying physical conditions on the fold-wise performance. As already discussed in
Section 5.1, the moderate success is mainly due to the difficulty of estimating extremely dry and, especially, extremely wet conditions in summer (e.g., droughts around 1992, 2003, and 2016, and wet periods, especially around 1996 (1997) and 2010).
As stated in
Section 4.2.3, estimates were worse for years in which the inundation state shifted to the alternative state. The misclassifications are probably due to some salt pans reacting faster to hydrometeorological changes than others. Hence, for some salt pans, the environmental conditions of the previous months and year(s) have a stronger influence on the prediction of the current year than for others. Furthermore, the misclassifications may be partly due to the coarse resolution of the input features (i.e., they do not differ between salt pans) and partly due to the model seeking the best fit across all years.
Although we did not apply feature selection [
112] to reduce the number of features used in this study, we were able to inhibit overfitting in model testing. This was achieved by trimming the decision trees used in the four RF models as part of the hyperparameter optimization. It built on our model design, which enables independent model testing, and, in practice, on closely monitoring training–test differences throughout this study.
In addition to changing climate patterns, a process referred to as “drying from beneath” [
20] challenges the water-holding capacity of the salt pans. Depending on the ecological state of the salt pans, this mechanism can directly influence WE and, therefore, the inundation state. We suppose that the worse the ecological health of the salt pan, the higher the negative impact on model performance. However, it is not possible to characterize this ecological state based on our models and the available input data. Due to the skewed class distribution, the assumption that our predictor selection works better for more natural/ecologically healthy salt pans could not be tested within this model setup. The disregarded large human influence on the water cycle [
21] constitutes an additional source of error.
All models are subject to a division between the periods before and after 2004 (
Figure 5). This pattern cannot be found in the target variable (
Figure 3). Additional research is needed to clearly connect climate change and the phenomenon of “dying salt pans” to these observations. As artificially inundated salt pans were introduced into the modeling, year-wise estimates could additionally have been affected due to misguided thresholding.
5.5. Model Transferability
The model based on meteorological predictors can be transferred to any other salt pan ecosystem worldwide in combination with the use of high-resolution remote sensing imagery, such as that provided by Landsat. In general, globally available predictor data in sufficient temporal and spatial resolution with respect to the studied ecosystem are needed, ideally in combination with uncertainty quantification. This can be ensured by choosing an adequate spatial resolution of the predictors with regard to the catchment size. Here, ERA5-Land offers a good starting point with its 9 km × 9 km spatial resolution. The EO data should have a suitable temporal and spatial resolution to capture the dynamics of the studied ecosystem. For example, it is not possible to retrieve the water extent information of ecosystems of a smaller size than the Landsat resolution of 30 m × 30 m. Another important constraint is that this approach will likely not be suitable in the case of water bodies whose water extent shows a low sensitivity with respect to water volume, i.e., with steep bathymetry in which a drop in the water level will not lead to a proportional decrease in the water area.
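The bathymetry constraint can be illustrated with two idealized basin shapes. The sketch below compares an inverted cone with gently sloping sides to a basin with vertical walls. Both geometries, the function names, and all numbers are hypothetical simplifications chosen only to show how differently water extent responds to a loss of water volume.

```python
from math import pi


def cone_area(volume, slope):
    """Water surface area of an inverted cone (slope = rise over run).

    From V = pi * h**3 / (3 * slope**2) and r = h / slope.
    """
    depth = (3.0 * slope**2 * volume / pi) ** (1.0 / 3.0)
    return pi * (depth / slope) ** 2


def cylinder_area(volume, radius):
    """Water surface area of a basin with vertical walls.

    The area does not change with volume as long as water is present.
    """
    return pi * radius**2 if volume > 0 else 0.0


# Halving the water volume in each idealized basin
sloping_ratio = cone_area(500.0, 0.01) / cone_area(1000.0, 0.01)
steep_ratio = cylinder_area(500.0, 20.0) / cylinder_area(1000.0, 20.0)
print(round(sloping_ratio, 2), steep_ratio)
```

In the sloping basin, the area scales with the volume to the power of 2/3, so water extent tracks volume changes, whereas the vertical-walled basin shows no change in extent until it is empty.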
6. Conclusions
As salt pans in Seewinkel are increasingly vulnerable ecosystems in often poor hydrological conditions, we aimed to improve ecosystem understanding and, ultimately, decision-making by predicting the salt pan inundation state in summer and fall with ML models.
Our models stress the importance of groundwater for the estimation of the inundation state in summer/fall. This solidifies the general notion represented in the literature [
14,
21] and calls for sustainable groundwater management in the region to ensure the conservation of this ecosystem. We stress that the use of the SGI [
115] as a predictor is promising. The model based on meteorological predictors can be transferred to any other salt pan ecosystem worldwide in combination with the use of high-resolution remote sensing imagery, such as the Landsat archive. METEOROLOGY achieved an MCC of 0.66 compared to GROUNDWATER with 0.59 and COMBINED with 0.57, with respect to the independent test set. We identified the most likely sources of error, namely the difficulty of estimating the inundation state correctly in the case of extreme environmental conditions developing after March, human intervention in the water cycle by artificially inundating the salt pans, and surface water loss through possible infiltration into deeper layers caused by a failure of the water retention capacity [
20]. Furthermore, we highlight the potential of the concept of partial dependency [
166] to understand threshold-dependent ecosystems, such as salt pans in the Seewinkel region.
To our knowledge, the results represent the first data-driven prediction and understanding of salt pan dynamics in the Seewinkel region. We identified the main drivers and potential improvements for future model development. In this context, the use of more advanced ML algorithms could prove beneficial.
Furthermore, the possibility of transferring the METEOROLOGY model to other salt pan ecosystems in combination with EO data makes this study particularly valuable. We propose the application of our models to salt pans of larger sizes and ones that are less influenced by humans and in a better ecological condition. This could improve both performance and interpretability.
The possibility of predicting the salt pan inundation state in summer/fall is of potential importance to decision-makers in conservation and tourism [
14,
17]. A better understanding of salt pans can contribute to preserving this unique geographic space in the Pannonian Basin.