1. Introduction
Floods are the most frequent and second-costliest natural disasters worldwide, accounting for 43% of the recorded disaster events in 1998–2017 and affecting over 2 billion people, according to the survey of the United Nations Office for Disaster Risk Reduction (UNISDR) [1]. For the period 2008–2018, the International Federation of Red Cross and Red Crescent Societies (IFRC) counted 730 million flood-affected people, representing 52% of the world’s disaster victims, who suffered 23% of the economic damage and 7% of the recorded deaths related to natural disasters over that period [2]. With climate change and an intensifying water cycle, human losses, infrastructural damage, and economic losses are expected to increase in the future [3], a trend that will be exacerbated by growing flood vulnerability due to urbanisation, population growth, and land cover change.
In light of this threat, knowing where and when floods occur is imperative for authorities and disaster management units around the globe. Any effective flood management requires timely and detailed flood maps to enable preparation, planning, and rapid response to the local flood risk. Here, satellite remote sensing offers a rich data source [4], allowing the situation to be assessed from a bird’s-eye perspective. A recent review article [5] offered a synopsis of satellite remote sensing in flood management, expounding on the technological developments since the 1970s and on how the European Copernicus programme with its Sentinel satellite fleet can assist in various phases of flood management. Likewise, rainfall observations from space are well established and are used, for example, in riverine flood modelling, which shows significant potential for flood forecasting [6].
As to flood extent mapping, state-of-the-art imaging sensors circle the Earth in low Earth orbits, scanning our planet at a resolution of some 10 m with up to daily measurement frequency, depending on the geographic location, satellite mission constellation, and employed sensor technology. While optical sensors allow an easy-to-interpret analysis, they are often blocked by cloud cover during floods, when severe weather conditions prevail [5,7]. In contrast, Synthetic Aperture Radar (SAR) sensors have day-and-night and all-weather capabilities [8] and can observe the situation at every overpass. Although the peak extent of a flood might not coincide with a satellite overpass and is usually not recorded, SAR systems offer observations more regularly and can capture the flood’s rise and fall. In conjunction with their high sensitivity to water occurrences, satellite-based SARs are excellent instruments to directly map flooded surfaces on a regional scale, and they are well suited for automatic and global flood monitoring. That said, the employed scientific algorithms need to account for land cover and environmental effects on the radar signal. Due to the observational distance from space and the resolution limits of SAR sensors, the spatial detail of flood delineation may fall short of the requirements of users who work on the local scale or in areas with high complexity from topography, vegetation, or buildings. To enhance delineation fidelity, SAR data can be combined with elevation or optical datasets [9], or be assimilated into hydraulic models [10,11].
Methods that map floods directly in SAR imagery are well established and allow for fast and straightforward analysis on an image/scene level, as water bodies typically show a strong contrast to other surfaces in SAR images. The reason lies in the different microwave scattering mechanisms over water and land surfaces, combined with the side-looking geometry of SAR systems: water surfaces specularly reflect the radar pulses away from the sensor, so that the backscatter intensities received at the sensor are much lower than for most other land cover types [12]. This physical mechanism makes the mapping of open, calm water in principle rather straightforward. Accordingly, many SAR-based flood mapping algorithms have been developed based on image histogram analysis, deriving thresholds that divide the area into low and high backscatter regions and, respectively, into water and non-water.
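As an illustration of histogram-based thresholding, the sketch below applies Otsu’s method, one common way to split a bimodal histogram into a low and a high class. It is a generic stand-in, not the specific algorithm of any study cited here, and the synthetic water and land backscatter values (in dB) are assumptions.

```python
import numpy as np

def otsu_threshold(values_db: np.ndarray, n_bins: int = 256) -> float:
    """Threshold (dB) maximising the between-class variance of the histogram (Otsu's method)."""
    hist, edges = np.histogram(values_db, bins=n_bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()                 # normalised histogram
    w0 = np.cumsum(p)                     # probability of the low-backscatter (water) class
    mu_cum = np.cumsum(p * centres)       # cumulative first moment of the low class
    mu_total = mu_cum[-1]
    denom = w0 * (1.0 - w0)
    between = np.zeros_like(w0)
    valid = denom > 1e-12                 # skip splits that leave one class empty
    between[valid] = (mu_total * w0[valid] - mu_cum[valid]) ** 2 / denom[valid]
    k = int(np.argmax(between))
    return float(edges[k + 1])            # upper edge of the last "low" bin

# Synthetic scene: 25% calm water around -20 dB, 75% land around -8 dB (hypothetical values)
rng = np.random.default_rng(0)
image_db = np.concatenate([rng.normal(-20.0, 1.5, 5000), rng.normal(-8.0, 2.0, 15000)])
threshold = otsu_threshold(image_db)
water_mask = image_db < threshold
```

On real single images, this is exactly where water look-alikes fall below the threshold and become false positives, which motivates the refinements discussed next.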
Even when applied to only single SAR images, thresholding often produces quite satisfying results (e.g., [13,14,15]) and shows robust performance when refined with automated hierarchical thresholding in image tiles, fuzzy-logic post-classification, and region growing (e.g., [16,17]). However, this requires a suitable reference map of permanent and seasonal water bodies to distinguish flooded areas. Furthermore, applying such algorithms over larger regions in an automated fashion is challenging due to complex terrain, heterogeneous land cover, and varying environmental conditions. In particular, maps based on single images often contain false positives over land surfaces with backscatter signatures as low as that of water surfaces, i.e., an overestimation due to non-flood pixels detected as flooded. Such water look-alikes are typically found in areas featuring smooth surfaces (e.g., tarmac, sand, salt pans), dry and sparse vegetation (e.g., prairie grasslands), or bare rock, and in areas located in radar shadow, which appears along high mountain ranges, forest lines, or buildings.
The impact of these confounding effects can be minimised by using change detection approaches, which (1) are less sensitive to the generation of false positives and (2) directly yield flood areas instead of water areas. Here, changes between two subsequent measurements are attributed to sudden changes occurring on the ground, turning the flood mapping task into a classification problem between change and no-change. Following the computation of a difference image (i.e., the change image), different histogram thresholding approaches (e.g., [10,18]) can be applied to generate the binary classification. Such change detection methods assume that one type of change (i.e., a decrease in backscatter due to the specular reflection on water bodies) dominates all other changes; therefore, they might misclassify non-water pixels that show low backscatter due to dry soil conditions. Yet, the study of [19] showed that such false positives can be effectively reduced with dual-image processing approaches. Further enhancements are again achieved with method refinements, such as hierarchical image tiling and region growing, e.g., recently in [20].
Today, we have access to a large number of SAR satellites from multiple past and current missions, including Envisat ASAR, Radarsat, TerraSAR-X, and Sentinel-1. This allows us to go one step further and exploit the availability of many observations distributed over time. We can then build SAR backscatter time series and map floods based on the signal’s deviation from a priori statistical parameters. For example, in the method of [21], land pixels are classified as flooded when they show a distinct deviation from the expected seasonal backscatter, which is modelled with harmonic functions.
Another established approach to address ambiguities in SAR images between flooded and non-flooded areas is to produce a measure of uncertainty from the classification process. In this regard, probabilistic methods, specifically Bayesian inference and its multi-node extensions (so-called Bayesian networks), are popular choices [15,22,23,24,25]. Here, the probability of a given SAR backscatter pixel value is assessed against predefined flooded and non-flooded probability distributions, and the flood mapping is realised by selecting the more probable class, along with a certainty value. The required distributions can be inferred from historic observations and, when ingesting local time series, may establish a pixel-specific parametrisation.
In practice, the use of time series for detecting floods in satellite images requires the formation of a datacube, in which historic and new images are unified through well-defined methods for SAR preprocessing, gridding, and file storage. In a datacube, the temporal and spatial dimensions are treated alike; therefore, each SAR image eligible for flood mapping can be directly compared with the entire backscatter history, allowing different sorts of change detection algorithms to be implemented on a per-pixel basis in a straightforward and efficient manner. Subsequently, pixel-specific thresholds and model parameters can be applied without the need for spatial re-gridding or resampling, allowing fast flood classification. Recently, ref. [26] developed a flood algorithm on the Google Earth Engine platform that combines processing with access to satellite datacubes [27], using a decision tree informed by antecedent Sentinel-1 and Landsat time series.
Effectively, a datacube enables (1) a more robust handling of land surface heterogeneity, (2) the a priori determination of regions where open water cannot be detected for physical reasons (e.g., dense vegetation, urban areas, deserts), (3) the estimation of the flood mapping’s uncertainties, and (4) the generation of historic water extent maps, essentially as a by-product of the model calibration, which may serve as a reference for distinguishing between floods and the normal seasonal water extent. For instance, ref. [28] showed that time series analyses applied to SAR data archives are also well suited to improve the characterisation of permanent water bodies, and [29] derived an exclusion layer to remove overestimations of flood extent in arid regions.
In this paper, we present in Section 2 our new method based on a Sentinel-1 datacube, followed by a description of the input data collection, the flood mapping algorithm with its statistical model, and its procedures for identifying areas of insensitivity. For a major flood event in Thessaly, Greece, in 2018, Section 3 interprets the generated parameters and discusses an in-depth investigation of the method’s performance, and Section 4 draws conclusions and considers future research directions.
2. Data and Methods
We take up the datacube approach for flood mapping, and present a time series-based detection method for Sentinel-1 radar data, using a simple Bayes classifier in conjunction with data-driven masks for low-sensitivity areas.
2.1. Our New Method Based on a Sentinel-1 Datacube
The 2014-launched Sentinel-1 mission [30] of the European Earth observation programme Copernicus employs C-band SAR instruments (CSAR) operated at a 5.5 cm wavelength. It is the first SAR mission dedicated to systematic backscatter acquisitions, with a two-satellite constellation scanning all global landmasses at 10 m sampling within 12 days (Note: Since December 2021, Sentinel-1B has suffered from an operational anomaly and its CSAR sensor is not active, reducing global coverage roughly by a factor of 2. See also Section 4). With this, the mission offers unprecedented spatio-temporal coverage as well as radiometric accuracy and stability, and it fuels many applications by enabling the retrieval of geophysical variables such as soil moisture [31,32], vegetation density [33,34], crop status [35,36], or snow depth [37]. However, Sentinel-1’s sensor design and acquisition strategy pose new challenges, as they also constitute a break with the former C-band SAR missions ERS-1/2, Radarsat-1/2, and Envisat ASAR: (1) it provides VV-polarised radar observations over land areas, and (2) its satellites follow a strict acquisition scenario, scanning the ground under repeating viewing angles and thus limiting the range of observation angles. While the VV channel is considered most suitable for the detection of water surfaces owing to its generally higher sensitivity in this respect [17,38], the limited number of observation angles poses a challenge for backscatter normalisation, which is usually required to obtain consistent classification results across the extended image swath [39]. Moreover, the fixed orbit configuration of Sentinel-1 generates a distinctive swath footprint pattern (as discussed, e.g., in [32]), with some areas observed by only one or two orbits, many of them with a narrow or even non-existent incidence angle range. As a consequence, the incidence angle normalisation suffers from high uncertainty or relies on spatial proxies [40]. Some studies on water mapping ignore the incidence angle effect, arguing that Sentinel-1’s incidence angle range is rather narrow [41], while others use only acquisitions from identical relative orbits, tolerating a lower revisit frequency and less reliable model parameters.
The change detection algorithm presented here pursues a new strategy, exploiting the historical backscatter measurements within a spatially extensive multiyear Sentinel-1 datacube. Starting from the (obvious) definition that a flood is water where normally there is no water, our three central statements are the following:
For all water bodies around the globe, we assume that they have an identical C-band SAR backscatter signature, independent of local conditions such as depth, underwater ground, or turbidity. This assumption is justified by the fact that the penetration depth of microwaves into water is just a few millimetres at best. Provided that the water bodies are open (i.e., not covered by vegetation), calm (i.e., not roughened by wind), and non-frozen, the backscatter signal is of universal character and primarily related to the incidence angle. This allows us to estimate global backscatter parameters for water bodies, in particular including flood water bodies (hereafter water distribution), derived from selected Sentinel-1 measurements collected over calm and open water bodies. Eventually, we form these a priori backscatter distributions for a set of fine bins within the Sentinel-1 incidence angle range.
In contrast, the backscatter signal over land is diverse and heterogeneous; thus, we localise its parametrisation and retrieve the a priori local backscatter distribution for each individual pixel over the landmasses. Building upon the already available Sentinel-1 datacube, which comprises data since 2014, we estimate the local backscatter distribution per relative orbit geometry (hereafter local distribution) and do not apply any incidence angle normalisation. We further assume that flooded conditions are highly infrequent; hence, we neglect their impact on the multiyear statistics and declare the local distribution to represent non-flood conditions.
For the actual flood mapping, the values of the incoming Sentinel-1 image are analysed pixel-wise against the water distribution (for the respective incidence angle) and the local distribution (for the respective orbit). By means of Bayesian inference, it is then possible to derive the posterior probabilities of each value belonging to the water or the local distribution, and hence to decide whether the image pixel belongs to the flood or the non-flood class. Applying the Bayes decision rule yields not only the class allocation but also, implicitly, a probabilistic uncertainty measure at each pixel.
Accordingly, the algorithm fully exploits the entire Sentinel-1 signal history within the datacube through a set of a priori computed statistical parameters that provide, via a harmonic seasonality model, a specific SAR characterisation of the Earth’s land surface at the pixel level. Water surfaces are modelled globally, accounting for Sentinel-1’s incidence angle dependency. With those parameters as input, and with the mathematical legacy of Bayes, the flood delineation procedure can be designed to be computationally lean; it does not require any human interaction (e.g., for the selection of reference images), and it is hence well suited for automatic global operations in near-real-time (NRT). Limiting our water definition to calm and open waters is a concession made to achieve our operational objectives (including automation), recognising the various complications of SAR modelling over roughened and overgrown waters. Such situations can be highly dynamic and require thematic flagging a posteriori, which is beyond this paper’s scope.
In addition, the two obtained distribution parameter sets allow an a priori identification of our flood algorithm’s no-sensitivity areas, where the water and local distributions overlap too strongly and the flood decision is unreliable or impossible. This includes permanent and seasonal water bodies, as well as pixels with permanently low backscatter from water look-alikes, such as airports and motorways. Finally, a topography mask is applied to elevated areas where floods hardly occur, which further supports the robustness of a potential automatic global flood detection service.
Figure 1 summarises the general workflow between the main components of our flood mapping method.
2.2. Sentinel-1 Datacube Formation
Observational input to our flood algorithm is generated by the C-band sensor (CSAR) onboard the Sentinel-1A and -1B satellites, operated in the Interferometric Wide-swath (IW) mode, which is the mission’s main operational mode over land and measures backscatter in dual polarisation (VV and VH). In IW mode, Sentinel-1 offers a systematic and regular revisit of 1 to 9 local observations within 12 days, following the mission’s orbit cycle and its observation scenario (see https://sentinel.esa.int/web/sentinel/missions/sentinel-1/observation-scenario (accessed on 25 July 2022); details are discussed in [32,40]).
For this study, we collected for the period 2015–2020 the VV-polarised IW mode Ground Range-Detected at High resolution (IWGRDH) products, which hold backscatter amplitude data and are characterised by a 10 m pixel spacing, a nominal spatial resolution of 20 m × 22 m, and a radiometric accuracy of 1 dB (3σ) [30].
The build of our Sentinel-1 IW datacube is detailed in our recent dedicated publication [42] (together with a description of access options). In brief, all files underwent parallelised preprocessing comprising (1) precise orbit data usage, (2) image border noise removal (following an algorithm developed specifically for Sentinel-1 [43]), (3) thermal noise removal, (4) radiometric calibration, (5) terrain correction, yielding an intermediate image at 10 m pixel sampling in geographical coordinates, (6) reprojection onto the Equi7Grid, (7) downsampling with gdalwarp (cubic spline resampling) to a 20 m pixel size, and (8) splitting into 300 km-sized tiles (gdalwarp accessed on 25 July 2022 at https://gdal.org/programs/gdalwarp.html).
Figure 1.
Schematic overview of the flood mapping algorithm’s main components and data flow. Gray module: SAR preprocessing (not subject of this publication); blue module: offline/precomputed parameter retrieval; red module: online/NRT flood classification. SIG0: sigma nought backscatter coefficient (dB). IA: incidence angle (°).
The Equi7Grid [44] is a global spatial reference system designed to efficiently handle the archiving, processing, and analysis of high-resolution raster data over land, as it minimises data oversampling and preserves geometric accuracy. Its features have been found highly beneficial in global terrain analysis [45], and its design supports spatially accurate flood mapping around the globe.
The choice of the 20 m pixel sampling (instead of 10 m) is motivated by noise reduction. Owing to the nature of the SAR observation technique, a Sentinel-1 image inevitably carries speckle and signal noise; consequently, the effective resolution is somewhat coarser than the nominal resolution. Although the processing to GRDH already dampens the noise level, it can be reduced further through spatio-temporal averaging and filtering, closing the gap between nominal and effective resolution. For flood mapping with Sentinel-1 IW images, we considered downsampling to 20 m a good compromise between noise reduction and resolving power for water body delineation. Moreover, the reduced data size (by a factor of ∼4) significantly lowered the required storage and processing power, speeding up the parameter generation and flood estimation. Details on the SAR preprocessing methods used and how they are employed in High-Performance Computing (HPC) environments can be found in [40,46,47].
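The noise-versus-resolution trade-off can be illustrated numerically. The sketch below uses plain 2 × 2 block averaging of linear-scale backscatter as a simple stand-in for the cubic-spline resampling performed with gdalwarp, and it assumes synthetic, spatially uncorrelated speckle; real GRDH speckle is already multi-looked and partly correlated, so the gain on actual data will differ.

```python
import numpy as np

def block_average(img_lin: np.ndarray, factor: int = 2) -> np.ndarray:
    """Mean of non-overlapping factor x factor blocks (10 m -> 20 m for factor=2).

    Averaging is done on linear power values, not on dB.
    """
    rows = img_lin.shape[0] - img_lin.shape[0] % factor   # crop to a multiple of factor
    cols = img_lin.shape[1] - img_lin.shape[1] % factor
    blocks = img_lin[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Homogeneous surface (mean sigma0 of 0.05, about -13 dB) with synthetic multiplicative speckle
rng = np.random.default_rng(42)
img_10m = 0.05 * rng.exponential(1.0, size=(1000, 1000))
img_20m = block_average(img_10m, factor=2)

size_ratio = img_10m.size / img_20m.size      # a quarter of the pixels to store and process
noise_ratio = img_10m.std() / img_20m.std()   # close to 2 for uncorrelated speckle
```

Averaging four independent samples halves the standard deviation, which is the idealised upper bound of the noise gain from the 10 m to 20 m step.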
The obtained SAR images hold $\sigma^0$ (sigma nought) backscatter coefficient values in decibels (dB). They are co-registered and time-stacked over the Equi7Grid tile EU020M_E054N006T3 (covering our study site in Greece, cf. Section 2.7), range from January 2015 to December 2020, and provide ∼600 individual measurements from orbits D080 (descending orbit direction) and A175 (ascending) for the study area centre. The other five local orbits had no overpass during the flood event. Together with topographic information (based on digital elevation models, DEMs), these images form the basis for the parameter generation, flood mapping, and masking presented in this study. We note that the European Space Agency (ESA), as the primary provider, slices the Sentinel-1 IW products along-track per 25 s of sensing time (equivalent to about 170 km in azimuth direction), and we retain the initial slicing and timestamps in our preprocessed datacube. Because the algorithm is designed for NRT operations, adjacent Sentinel-1 slices stemming from the same overpass are not spatially merged during flood mapping, and a thin line along the product slicing may remain unclassified.
As to the observation geometry, the projected local incidence angle (PLIA) values are available as a by-product of the terrain correction step of the SAR preprocessing chain. For the purpose of flood mapping, PLIA appropriately describes the radar geometry over flat terrain and acts as the incidence angle (IA, $\theta$) in this study. Because of the self-repeating orbit geometries of the Sentinel-1 mission (the satellite positions are maintained within an orbital tube of 50 m (1σ) [30]), almost identical observation angles are established at each overpass. Globally, the Sentinel-1 mission has 175 (repeating) relative orbits, with up to 9 orbits locally. Consequently, when working with Sentinel-1 data separately per relative orbit (indexed in the following by $o$), a pixel’s value for $\theta$ can be assumed constant. We capitalise on this and use as input to our algorithm a set of constant $\theta_o$ values, which are computed a priori and per orbit as the average of the Sentinel-1A+B observations of the year 2020.
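Deriving the constant per-orbit angle $\theta_o$ amounts to a grouped temporal mean over the PLIA stack. A minimal NumPy sketch, assuming a stack of shape (time, rows, cols) with one relative-orbit label and one year per acquisition; the function name, array layout, and angle values are hypothetical.

```python
import numpy as np

def mean_incidence_per_orbit(plia, orbits, years, ref_year=2020):
    """Return {orbit: mean PLIA image} using only acquisitions from ref_year.

    plia:   (n_time, rows, cols) projected local incidence angle in degrees, NaN = no data
    orbits: (n_time,) relative orbit label per acquisition, e.g. "D080"
    years:  (n_time,) acquisition year per acquisition
    """
    theta_o = {}
    for orbit in np.unique(orbits):
        selected = (orbits == orbit) & (years == ref_year)
        if selected.any():
            theta_o[str(orbit)] = np.nanmean(plia[selected], axis=0)
    return theta_o

# Two hypothetical orbits over a 2 x 2 pixel patch; tiny jitter mimics the tight orbital tube
rng = np.random.default_rng(1)
orbits = np.array(["D080", "A175", "D080", "A175", "D080", "A175"])
years = np.array([2019, 2020, 2020, 2020, 2020, 2019])
nominal = np.where(orbits == "D080", 38.0, 43.0)[:, None, None]
plia = nominal + rng.normal(0.0, 0.05, size=(orbits.size, 2, 2))
theta_o = mean_incidence_per_orbit(plia, orbits, years)
```

Because the viewing geometry repeats so closely, the per-orbit mean is effectively a lookup table that can be computed once and reused at runtime.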
Our notation in this paper uses subscripts for data-based parameters, which are estimated a priori and stored on disk (e.g., $\theta_o$), whereas variables appearing at runtime are denoted with parentheses (e.g., $\sigma^0(t)$).
2.3. Backscatter Parameters for Water and Land Surfaces
Following our three central statements on mapping floods where a SAR backscatter image shows a water signature instead of the expected local (land) signature, we generate dedicated statistical parameters from the Sentinel-1 multiyear data archive. We use these parameters to compute the posterior probabilities for the classes flood and non-flood, which are subsequently input to the Bayes decision rule (Section 2.4).
Based on the premise that flood water bodies show the same signature as regular water bodies, we infer the flood backscatter probability distributions from the manually collected water distribution (Section 2.3.1).
To obtain the non-flood backscatter probability distributions, we derive from each pixel’s time series the so-called harmonic coefficients, which model the local seasonal signal and hold the expected values within the yearly cycle. With this, we declare the expected local distribution to represent the non-flooded conditions, irrespective of the actual land cover (including permanent or seasonal water bodies), as we define floods as water occurring where it normally does not (Section 2.3.2).
2.3.1. Flood Backscatter Probability Distributions
In radar images, water surfaces typically show a strong contrast to land surfaces. This was recently confirmed for Sentinel-1 CSAR and permanent water bodies in the global study of [40]. Consequently, and importantly for flood mapping, temporarily inundated surfaces introduce a drop in the backscatter time series of an affected land area. To differentiate these changes from other effects with similar outcomes, detailed knowledge of the backscatter behaviour over water surfaces is required.
As demonstrated in [25], the SAR backscatter behaviour over water surfaces can be represented by a normal distribution, which can be retrieved from a representative collection of SAR measurements over water bodies. Following this approach, we collected various backscatter observations $\sigma^0$ along with the respective incidence angles $\theta$ over ocean and inland water surfaces from the Sentinel-1 datacube. Because backscatter over water typically increases during wind or frost conditions, and the separability against land is much reduced in such cases, the collection was thoroughly filtered for calm conditions based on visual image inspection. As in [25], we extracted the actual water surfaces using global land cover data, and additionally removed the pixels on the edge line of each water body to avoid the influence of mixed land-water pixels. The representative water collection was then aggregated by averaging per month and orbit, and comprises ∼1000 individual composites that cover the 2015–2016 period and 12 European Equi7Grid tiles.
Before water backscatter signatures can be estimated, the strong linear relation between backscatter and incidence angle must be accounted for. To eliminate the impact of the incidence angle on the observed backscatter value, ref. [25] normalised Envisat ASAR backscatter to a reference incidence angle, while [15] applied an approach based on the assignment of reference angles to discrete classes (bins) and subsequent distribution sampling. Sentinel-1 CSAR, however, provides only a limited number of incidence angles per pixel; with only one or two incidence angles per pixel, the underdetermined equation system would introduce high uncertainty. Therefore, our method developed for Sentinel-1 provides an equivalent approach without the need for normalisation.
With the backscatter probability density function (PDF) $p(\sigma^0 \mid F, \theta)$, we model the flood likelihood for any given incidence angle $\theta$ (generalising from the per-orbit $\theta_o$). Assuming specifically a conditional normal distribution, its PDF is determined by its mean $\mu_F(\theta)$ and standard deviation $\sigma_F$. If the relationship between $\mu_F$ and $\theta$ is linear and $\sigma_F$ is constant, these parameters can be obtained by linear least squares. To verify these assumptions, we first sorted the backscatter samples $\sigma^0$ within our water-backscatter collection along the incidence angles $\theta$ and grouped them into incidence angle bins, using two-sided rounding to the bins’ centre values; this bin size is considered fine enough to capture the impact of the incidence angle on backscatter (Figure 2a).
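The binning step and the check for a constant spread can be sketched as follows. This is a minimal example under stated assumptions: the bin width, the synthetic water samples (generated with a linear trend and a constant spread), and the function name are hypothetical and do not reproduce the study’s collection. Rounding to the nearest bin centre is implemented here as round-half-up.

```python
import numpy as np

def bin_by_incidence(sig0_db, theta_deg, bin_width=1.0):
    """Assign samples to the nearest incidence-angle bin centre; return per-bin mean and std."""
    centres_per_sample = np.floor(theta_deg / bin_width + 0.5) * bin_width
    centres = np.unique(centres_per_sample)
    means = np.array([sig0_db[centres_per_sample == c].mean() for c in centres])
    stds = np.array([sig0_db[centres_per_sample == c].std(ddof=1) for c in centres])
    return centres, means, stds

# Synthetic water collection: linear mean trend, constant 2.5 dB spread (hypothetical numbers)
rng = np.random.default_rng(7)
theta = rng.uniform(30.0, 45.0, 20000)
sig0 = -0.4 * theta - 5.0 + rng.normal(0.0, 2.5, theta.size)
centres, means, stds = bin_by_incidence(sig0, theta, bin_width=1.0)

# Trend and spread of the per-bin standard deviations (the quantities shown in Figure 2b)
std_gradient = np.polyfit(centres, stds, deg=1)[0]
meta_std = stds.std(ddof=1)
```

A near-zero `std_gradient` supports treating the water-backscatter standard deviation as independent of the incidence angle.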
Figure 2.
(a) Scatterplot of collected Sentinel-1 backscatter coefficients $\sigma^0$ over water surfaces against incidence angles $\theta$, arranged within incidence angle bins. The solid red line is the fitted linear function, the dashed red line indicates one standard deviation, and the dotted line half of the standard deviation (used for masking in Section 2.5.2). (b) Standard deviation of backscatter coefficients within the respective incidence angle bins.
As expected, the mean water-backscatter values within each bin show a linear relation with the incidence angle, which can be parametrised as:

$$\mu_F(\theta) = \beta\,\theta + \alpha$$

Its gradient $\beta$ and intercept $\alpha$ can be estimated by means of linear regression (cf. solid red line in Figure 2a), with $\beta$ in dB per degree and $\alpha$ in dB.
To verify that the standard deviation $\sigma_F$ of $\sigma^0$ is constant and independent of $\theta$, we calculated the standard deviation per $\theta$-bin. As can be seen in Figure 2b, these standard deviations are very similar across the whole incidence angle range, with a very small gradient of −0.008 and a small (meta) standard deviation of 0.33 dB.
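This allows us to fit the linear model $\mu_F(\theta)$ for the backscatter as a function of the incidence angle $\theta$, with the corresponding standard deviation $\sigma_F$ computed by taking the square root of the sum of squared errors (SSE) divided by the number of data points ($n$), adjusted for the two degrees of freedom of the linear model (see [48], chapter 3):

$$\sigma_F = \sqrt{\frac{\mathrm{SSE}}{n-2}} = \sqrt{\frac{1}{n-2}\sum_{j=1}^{n}\left(\sigma^0_j - \mu_F(\theta_j)\right)^2}$$

Putting everything together, the globally applicable flood-backscatter PDF $p(\sigma^0 \mid F, \theta)$ is defined for a given incidence angle $\theta$ (in °) by the following equation:

$$p(\sigma^0 \mid F, \theta) = \frac{1}{\sigma_F\sqrt{2\pi}}\exp\left(-\frac{\left(\sigma^0 - \mu_F(\theta)\right)^2}{2\sigma_F^2}\right)$$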
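The fit of $\mu_F(\theta)$ and $\sigma_F$ and the resulting flood likelihood can be sketched as follows. This is a minimal example on synthetic samples: the generating slope, intercept, and spread are hypothetical and are not the study’s estimates; $\sigma_F$ is computed from the residuals with $n-2$ degrees of freedom.

```python
import numpy as np

def fit_water_model(sig0_db, theta_deg):
    """Least-squares line mu_F(theta) = beta * theta + alpha and residual std (n - 2 dof)."""
    beta, alpha = np.polyfit(theta_deg, sig0_db, deg=1)   # coefficients, highest power first
    residuals = sig0_db - (beta * theta_deg + alpha)
    sigma_f = np.sqrt(np.sum(residuals ** 2) / (sig0_db.size - 2))
    return beta, alpha, sigma_f

def flood_likelihood(sig0_db, theta_deg, beta, alpha, sigma_f):
    """Gaussian PDF p(sigma0 | F, theta) evaluated for the given incidence angle(s)."""
    mu_f = beta * theta_deg + alpha
    return np.exp(-0.5 * ((sig0_db - mu_f) / sigma_f) ** 2) / (sigma_f * np.sqrt(2.0 * np.pi))

# Synthetic water samples (hypothetical generating parameters)
rng = np.random.default_rng(7)
theta = rng.uniform(30.0, 45.0, 20000)
sig0 = -0.4 * theta - 5.0 + rng.normal(0.0, 2.5, theta.size)
beta, alpha, sigma_f = fit_water_model(sig0, theta)

# Likelihood at the model mean for a 40 degree incidence angle (the PDF's peak value)
peak = flood_likelihood(beta * 40.0 + alpha, 40.0, beta, alpha, sigma_f)
```

Since only three global numbers ($\beta$, $\alpha$, $\sigma_F$) are needed, the flood PDF for any pixel follows directly from its stored incidence angle.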
2.3.2. Non-Flood Backscatter Probability Distributions
Our method’s purpose is the detection of water surfaces over normally “dry” land surfaces. Hence, robust a priori knowledge of the local backscatter response under normal, non-flooded conditions is essential. The radar signal recorded over natural land surfaces is shaped by temporally constant factors (e.g., soil and bedrock composition, and sensor-related parameters) and by variable factors, such as soil moisture and vegetation conditions. Depending on the surface characteristics and the climatic conditions, the local backscatter time series usually shows a specific periodic, or harmonic, behaviour called seasonality.
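For the description of the backscatter’s seasonality, we use a harmonic model (Equation (5)), following the approach of [21]. Here, $D$ is the day-of-year derived from the actual acquisition time $t$ of the radar measurement by applying the day-of-year conversion $D = \mathrm{doy}(t)$; $c_i$ and $s_i$ represent the harmonic coefficients (parameters), and $\hat{\sigma}^0(D)$ is the expected radar backscatter at $D$. The first cosine coefficient $c_0$ equals $\bar{\sigma}^0$, which is the average radar backscatter, and the first sine coefficient $s_0$ reduces by definition to zero:

$$\hat{\sigma}^0(D) = \sum_{i=0}^{k}\left[c_i\cos\left(\frac{2\pi i D}{365}\right) + s_i\sin\left(\frac{2\pi i D}{365}\right)\right] = \bar{\sigma}^0 + \sum_{i=1}^{k}\left[c_i\cos\left(\frac{2\pi i D}{365}\right) + s_i\sin\left(\frac{2\pi i D}{365}\right)\right] \tag{5}$$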
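Based on each pixel’s backscatter time series, the harmonic parameters are computed via a linear least-squares estimation:

$$\mathbf{x} = \left(\mathbf{A}^\top\mathbf{A}\right)^{-1}\mathbf{A}^\top\mathbf{l} \tag{6}$$

where $\mathbf{A}$ is the Jacobian (design) matrix, $\mathbf{l}$ the observation vector (i.e., the pixel’s time series $\sigma^0(t)$), and $\mathbf{x}$ the vector containing the unknowns, defined as:

$$\mathbf{A} = \begin{bmatrix} 1 & \cos\left(\frac{2\pi D_1}{365}\right) & \sin\left(\frac{2\pi D_1}{365}\right) & \cdots & \cos\left(\frac{2\pi k D_1}{365}\right) & \sin\left(\frac{2\pi k D_1}{365}\right) \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & \cos\left(\frac{2\pi D_n}{365}\right) & \sin\left(\frac{2\pi D_n}{365}\right) & \cdots & \cos\left(\frac{2\pi k D_n}{365}\right) & \sin\left(\frac{2\pi k D_n}{365}\right) \end{bmatrix},\quad \mathbf{l} = \begin{bmatrix} \sigma^0(t_1) \\ \vdots \\ \sigma^0(t_n) \end{bmatrix},\quad \mathbf{x} = \begin{bmatrix} \bar{\sigma}^0 & c_1 & s_1 & \cdots & c_k & s_k \end{bmatrix}^\top \tag{7}$$

$\mathbf{A}$ has the shape $n \times (2k+1)$, where $n$ is the number of measurements in the time series $\sigma^0(t_j)$ and $k$ is the chosen order of the harmonic model (with $D_j = \mathrm{doy}(t_j)$, $j = 1, \ldots, n$). As suggested by [21], we set $k$ to 3, which is enough to reproduce oscillations caused by seasonal processes at time scales of ∼4 months (365 days / 3). Higher $k$-values would imply modelling processes occurring at shorter time scales and hence incorporate effects from outliers and short-term events, including floods.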
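The least-squares estimation of the harmonic parameters can be sketched in NumPy as follows. This is a minimal example, assuming a 365-day period and a synthetic six-year series with known, hypothetical coefficients. `np.linalg.lstsq` is used instead of explicitly inverting $\mathbf{A}^\top\mathbf{A}$; it yields the same solution for a well-conditioned design but is numerically more stable.

```python
import numpy as np

def harmonic_design(doy, k=3, period=365.0):
    """Jacobian A with columns [1, cos(2*pi*i*D/period), sin(2*pi*i*D/period)] for i = 1..k."""
    doy = np.asarray(doy, dtype=float)
    columns = [np.ones_like(doy)]
    for i in range(1, k + 1):
        angle = 2.0 * np.pi * i * doy / period
        columns += [np.cos(angle), np.sin(angle)]
    return np.column_stack(columns)            # shape (n, 2k + 1)

def fit_harmonics(doy, sig0_db, k=3):
    """Least-squares estimate x = [mean, c1, s1, ..., ck, sk] of the seasonal model."""
    A = harmonic_design(doy, k)
    x, *_ = np.linalg.lstsq(A, sig0_db, rcond=None)
    return x

def expected_backscatter(doy, x):
    """Evaluate the harmonic model for the given day(s) of year."""
    return harmonic_design(doy, k=(x.size - 1) // 2) @ x

# Synthetic series sampled every 12 days over six years (hypothetical coefficients, 0.3 dB noise)
rng = np.random.default_rng(3)
days = np.arange(0, 6 * 365, 12)
doy = days % 365 + 1
true_x = np.array([-10.0, 1.5, 0.8, 0.3, 0.0, 0.0, 0.0])
sig0 = expected_backscatter(doy, true_x) + rng.normal(0.0, 0.3, doy.size)
x_hat = fit_harmonics(doy, sig0, k=3)
```

With $k = 3$, the design matrix has seven columns, matching the shape $n \times (2k+1)$ given above.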
Similar to water bodies, radar backscatter from land surfaces depends on the incidence angle $\theta$, though generally to a lesser degree. Nevertheless, backscatter modelling over land is more demanding, as the strength of this dependence varies strongly with the particular land cover type. We have argued that water can be modelled globally without localised parameters, since we can presume that water bodies behave uniformly around the globe from the C-band SAR perspective. This does not hold for land pixels, which require localised parameters on the backscatter behaviour owing to the various land and vegetation surface characteristics.
To avoid confusing variations caused by vegetation- or soil-induced seasonality with incidence or azimuthal angle effects, the systematic impact of the observation geometry has to be eliminated. A normalisation step prior to the estimation of the harmonic parameters, as done for ASAR by [25], is not applicable to Sentinel-1, given its limited number of incidence angles per pixel. Fortunately, we can exploit the self-repeating orbit geometries and the constant incidence angle values $\theta_o$, and we estimate the desired harmonic parameters separately for each local relative orbit $o$. Ultimately, we can forward both the image and the parameters as non-normalised backscatter values to the flood model.
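Analogous to above, we model the local pixel's likelihood for non-flooded conditions, expressed as a generic PDF $p(\sigma^0 \mid N, \tilde{t}, o)$ for a given point in time $t$ and relative orbit $o$, as a Gaussian PDF $\mathcal{N}(\mu, \sigma)$. In particular, the expected backscatter value from the harmonic model $\hat{\sigma}^0$ acts as the mean parameter $\mu_N$ of the local distribution, given $\tilde{t}$ and $o$:
$$p(\sigma^0 \mid N, \tilde{t}, o) = \mathcal{N}\left(\mu_N, \sigma_N\right), \qquad \mu_N = \hat{\sigma}^0_o(\tilde{t})$$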
To illustrate Equation (8), Figure 4e in Section 3 plots an example backscatter time series $\sigma^0$ from the descending orbit D080 over our Greek study site, together with the estimates $\hat{\sigma}^0$ and the residuals between them.
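Furthermore, the standard deviation $\sigma_N$ is inferred from the time-independent sum of squared residuals between the pixel's actual time series $\sigma^0$ (from the datacube) and the expected values $\hat{\sigma}^0$ (from the harmonic model), divided by the model's degrees of freedom:
$$\sigma_N = \sqrt{\frac{\sum_{j=1}^{n}\left(\sigma^0_j - \hat{\sigma}^0(\tilde{t}_j)\right)^2}{n - (2k+1)}}$$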
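A sketch of the non-flood likelihood follows. It assumes, as in our reading of the equation above, that $\sigma_N$ is the square root of the residual sum of squares divided by $n-(2k+1)$ degrees of freedom, and that all values are backscatter in dB. `residual_std` and `gaussian_pdf` are hypothetical helper names, and the numbers in the usage lines are illustrative only.

```python
import numpy as np


def residual_std(sig0_db, sig0_hat_db, k=3):
    """sigma_N = sqrt(RSS / (n - (2k + 1))) for a harmonic model of order k."""
    sig0_db = np.asarray(sig0_db, dtype=float)
    sig0_hat_db = np.asarray(sig0_hat_db, dtype=float)
    dof = sig0_db.size - (2 * k + 1)  # n minus the number of estimated parameters
    if dof <= 0:
        raise ValueError("time series too short for the chosen harmonic order")
    rss = np.sum((sig0_db - sig0_hat_db) ** 2)
    return np.sqrt(rss / dof)


def gaussian_pdf(x, mu, sigma):
    """Normal density N(mu, sigma) evaluated at x."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Illustrative usage: mu_N from the harmonic model, sigma_N from its residuals
mu_n = -9.0
observed = [-8.0, -10.0, -9.5, -8.5, -9.0, -9.2, -8.8, -9.4, -8.6]
sigma_n = residual_std(observed, [mu_n] * len(observed), k=1)
print(gaussian_pdf(-15.1, mu_n, sigma_n))  # non-flood likelihood of a new observation
```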
2.4. Bayesian Flood Mapping
Reference [25] computed a water probability for a backscatter measurement from its residuals against the mean values of the given PDFs. We extend this approach and compute such a probability based on the above backscatter distributions for flood, $p(\sigma^0 \mid F)$, and non-flood, $p(\sigma^0 \mid N)$.
An incoming Sentinel-1 image is forwarded as a pixel array to the flood mapping algorithm and defines the day of acquisition $\tilde{t}$ and the relative orbit $o$. With incidence angle values $\theta_o$ from the 2020 mean incidence angle data, we can construct the flood PDF $p(\sigma^0 \mid F, \theta_o)$, and with the local harmonic parameters of orbit $o$ and the date $\tilde{t}$, we can construct the non-flood PDF $p(\sigma^0 \mid N, \tilde{t}, o)$ (for the sake of brevity, from here on, we omit the conditioning variables, except the class labels $F$ and $N$). These distributions can be related to new backscatter measurements $\sigma^0$ to assign them to either of the classes flood ($F$) or non-flood ($N$). Given the class-specific PDFs (or likelihoods), the two posterior probabilities $P(F \mid \sigma^0)$ and $P(N \mid \sigma^0)$ of class membership can be inferred using Bayes' theorem:
$$P(F \mid \sigma^0) = \frac{p(\sigma^0 \mid F)\,P(F)}{p(\sigma^0)}, \qquad P(N \mid \sigma^0) = \frac{p(\sigma^0 \mid N)\,P(N)}{p(\sigma^0)}$$
The denominator $p(\sigma^0)$ is referred to as the evidence and serves as a normalization factor that scales the posterior probabilities between 0 and 1 for each sample $\sigma^0$:
$$p(\sigma^0) = p(\sigma^0 \mid F)\,P(F) + p(\sigma^0 \mid N)\,P(N)$$
where $P(F)$ and $P(N)$ are called priors and represent the a priori probability of a pixel belonging to a certain class. In the Bayesian framework, these priors could be used to integrate information available before (i.e., a priori) the actual observation, e.g., from historical flood records or run-off models, into the inference. In general, we have no such information, which is reflected by choosing an uninformative prior distribution, assigning $P(F) = P(N) = 0.5$ and thus achieving an equal weighting.
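The posterior computation can be sketched as follows, assuming Gaussian class likelihoods with per-pixel parameters $(\mu_F, \sigma_F)$ and $(\mu_N, \sigma_N)$ and the equal priors stated above. The function names and the numeric values in the example are illustrative assumptions, not values from the study.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Normal density N(mu, sigma) evaluated at x."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def posterior_flood(sig0_db, mu_f, sigma_f, mu_n, sigma_n, prior_f=0.5):
    """P(F | sigma0) via Bayes' theorem; P(N | sigma0) equals 1 minus this value."""
    weighted_f = gaussian_pdf(sig0_db, mu_f, sigma_f) * prior_f          # p(s|F) P(F)
    weighted_n = gaussian_pdf(sig0_db, mu_n, sigma_n) * (1.0 - prior_f)  # p(s|N) P(N)
    evidence = weighted_f + weighted_n  # p(sigma0), the normalization factor
    return weighted_f / evidence


# Illustrative parameters: water around -20 dB, local land surface around -10 dB
print(posterior_flood(np.array([-22.0, -15.0, -8.0]), -20.0, 2.0, -10.0, 2.0))
```

With equal standard deviations and equal priors, the posterior is exactly 0.5 halfway between the two means, which is where the Bayes decision switches class.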
Inserting the posterior probabilities defined in Equations (12) and (13) into Bayes' decision rule
$$c = \underset{c' \in \{F,\,N\}}{\arg\max}\; P(c' \mid \sigma^0)$$
yields the most probable class $c$ from the overall class set $\{F, N\}$.
Figure 3 shows a graphical illustration of the Bayes flood mapping procedure for an exemplary backscatter observation with respect to the distributions.
The advantages of this approach are that it not only produces the locally optimal threshold for separating both classes, but also establishes a measure of uncertainty, described by the so-called conditional error. For each sample $\sigma^0$, the conditional error $\varepsilon(\sigma^0)$ can be defined as follows:
$$\varepsilon(\sigma^0) = \min\left[P(F \mid \sigma^0),\, P(N \mid \sigma^0)\right]$$
$\varepsilon(\sigma^0)$ is the posterior probability that the specific observation was generated by the class not chosen by the Bayes decision rule; it ranges from very certain ($\varepsilon = 0$) to tossing a coin ($\varepsilon = 0.5$). As such, it is an inverse measure of confidence in the classification.
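For two classes, the decision rule and the conditional error reduce to comparing $P(F \mid \sigma^0)$ with its complement. The sketch below assumes the flood posterior has already been computed per pixel; the function name is hypothetical, and assigning exact ties (0.5) to non-flood is our own arbitrary choice.

```python
import numpy as np


def bayes_decision(p_flood):
    """Return (is_flood, conditional_error) from posterior flood probabilities.

    With two classes, P(N | sigma0) = 1 - P(F | sigma0); the conditional error
    is the posterior of the class not chosen, i.e. the smaller of the two.
    """
    p_flood = np.asarray(p_flood, dtype=float)
    p_nonflood = 1.0 - p_flood
    is_flood = p_flood > p_nonflood  # ties (0.5) are assigned to non-flood here
    cond_error = np.minimum(p_flood, p_nonflood)
    return is_flood, cond_error


# The Figure 3 example: P(N | sigma0) = 0.8, i.e. P(F | sigma0) = 0.2
print(bayes_decision([0.2]))
```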
Figure 3.
Bayesian flood mapping procedure over one pixel for an exemplary backscatter measurement, including the probability density functions and posterior probabilities of the two classes flooded ($F$) and non-flooded ($N$). This example illustrates a challenging situation with relatively close local parameters. The marked backscatter observation at −15.1 dB has a probability of 0.8 of belonging to the class non-flood ($N$), with an uncertainty of 0.2.
2.5. Detection of No-Sensitivity
The interaction of C-band microwaves with the land surface is in general complex, and there are several situations where Sentinel-1 CSAR observations are insensitive to flood conditions for physical, geometric, or sensor-side reasons. With our statistical model parameters built from the multiyear Sentinel-1 datacube, we have a powerful tool at hand to identify such adverse conditions. In the following, we outline our methods to identify locations and observations for which the Bayes model does not allow for a robust decision between flood and non-flood, and for which our algorithm is thus ill-posed. The implemented masks overlap widely, but each one addresses particular aspects of insensitivity and thus increases the algorithm's robustness, ultimately also aiming at global application.
2.5.1. Masking of Exceeding Incidence Angles
The Sentinel-1 IW mode scans Earth with side-looking viewing angles between 29° and 46°. Consequently, flat areas feature incidence angles (IA) only from within this range, whereas IA outside it stem from sloped surfaces and hence are only found in rugged terrain. Naturally, water surfaces are always observed under IA from this flat range, and our water distributions (leading to the regression parameters in Equations (3) and (4)) could only be defined for this limited $\theta$ domain (cf. Figure 2).
To preclude any flood decision over areas with exceeding IA, we set hard $\theta$ thresholds before we apply the Bayes model to an incoming Sentinel-1 scene. To allow a decision on flooded conditions in areas at the onset of hills (while keeping the extrapolation moderate), we extend the acceptable $\theta$ range by a ∼10% buffer, relaxing the initial range to 27°–48°, and obtain the following incidence angle mask:
$$M_{\theta}(o) = \begin{cases} 1 & \text{if } \theta_o < 27^\circ \ \text{or}\ \theta_o > 48^\circ \\ 0 & \text{otherwise} \end{cases}$$
With this, we obtain an a priori mask dependent on the relative orbit, and all pixels with values outside this range are mapped as unclassified.
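The incidence angle mask is a simple range check. A minimal sketch, assuming $\theta_o$ is given in degrees per pixel and that `True` marks pixels to be mapped as unclassified (a hypothetical convention); the bounds are the relaxed 27° and 48° from the text.

```python
import numpy as np

IA_MIN_DEG = 27.0  # relaxed lower bound (29 deg minus buffer)
IA_MAX_DEG = 48.0  # relaxed upper bound (46 deg plus buffer)


def incidence_angle_mask(theta_deg):
    """True where the incidence angle lies outside the accepted 27-48 deg range."""
    theta = np.asarray(theta_deg, dtype=float)
    return (theta < IA_MIN_DEG) | (theta > IA_MAX_DEG)


print(incidence_angle_mask([25.0, 27.0, 35.0, 48.0, 50.0]))
```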
2.5.2. Identification of Conflicting Distributions
The key indicator driving our algorithm to decide whether a Sentinel-1 measurement stems from a flooded or non-flooded surface is the sharp decrease in backscatter when a normally dry pixel is water-covered. This implies that, under normal conditions, the pixel has higher backscatter values than a respective water surface, i.e., that the local distribution lies overall above the respective water distribution. Where this is not the case, our Bayes decision model is insensitive to flood conditions and cannot be used. Typical locations where this can occur are asphalt surfaces along highways or airstrips, salt pans, or arid sand and bedrock areas, which can be summarised from the SAR perspective as water look-alikes.
Fortunately, with our local and water distributions built from Sentinel-1 backscatter samples, we can determine such ambiguous locations a priori, with respect to the relative orbit $o$ and the day-of-year $\tilde{t}$. Whenever the local distribution is not distinguishable from the water distribution, we declare the model insensitive. In particular, we rule out every configuration where the mean of the local distribution is lower than the mean plus one half standard deviation of the water distribution:
$$M_{\text{conflict}}(o, \tilde{t}) = \begin{cases} 1 & \text{if } \mu_N < \mu_F + 0.5\,\sigma_F \\ 0 & \text{otherwise} \end{cases}$$
The choice of this (conservative) threshold not only ensures that we exclude locations whose backscatter values are on average lower than those of water, but also rules out configurations where the two distributions overlap considerably and the Bayes decision becomes arbitrary. From Equation (4), the threshold $\mu_F + 0.5\,\sigma_F$ follows directly and is illustrated as a dotted line parallel to the water's backscatter regression line in Figure 2. With this, we obtain an a priori mask dependent on the relative orbit and the day-of-year, and all pixels with non-separable distributions are mapped as unclassified.
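The conflicting distribution test compares the local mean against a shifted water mean. The sketch below assumes per-pixel arrays (or scalars) for $\mu_N$, $\mu_F$, and $\sigma_F$ in dB; the function name and the example values are hypothetical and not taken from Figure 2.

```python
import numpy as np


def conflicting_distribution_mask(mu_n, mu_f, sigma_f):
    """True where mu_N < mu_F + 0.5 * sigma_F, i.e. land and water are not separable."""
    mu_n = np.asarray(mu_n, dtype=float)
    threshold = np.asarray(mu_f, dtype=float) + 0.5 * np.asarray(sigma_f, dtype=float)
    return mu_n < threshold


# Illustrative values: water mean -20 dB and water std 2 dB give a threshold of -19 dB
print(conflicting_distribution_mask([-22.0, -19.5, -19.0, -12.0], -20.0, 2.0))
```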
Although many critical cases will be caught by the Bayes uncertainty mask (see Section 2.5.4), this conflicting distribution mask profits from the high spatial quality of the multiyear parameters, whose noise level is much lower than that of an individual Sentinel-1 IW scene, and thus suppresses speckled classifications in noisy SAR image sections when the Bayes decisions are tight.
2.5.3. Removal of Measurement Outliers
Our flood algorithm decides between normal and flood conditions on the basis of distributions sampled from multiyear time series. When an incoming Sentinel-1 IW image contains extreme values, i.e., statistical outliers, those measurements are not properly represented by our model's probabilities, and a Bayes decision is not meaningful. Independent of the reason (physical features on the ground, sensor-side image artifacts and energy overflows, or noise and speckle), we exclude such extreme image pixels and mask all values outside three standard deviations of the local distribution:
$$M_{\text{outlier}}(\sigma^0) = \begin{cases} 1 & \text{if } \left|\sigma^0 - \mu_N\right| > 3\,\sigma_N \\ 0 & \text{otherwise} \end{cases}$$
With this, we obtain an outlier mask dependent on incoming image values with respect to the local distribution, and all outlier-pixels are mapped as unclassified.
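The outlier test can be sketched as a symmetric three-sigma check against the local distribution. The symmetric form (masking both unusually low and unusually high values) is our reading of "outside three standard deviations"; the function name and example values are hypothetical.

```python
import numpy as np


def outlier_mask(sig0_db, mu_n, sigma_n, n_std=3.0):
    """True where the observation lies more than n_std local standard deviations from mu_N."""
    sig0_db = np.asarray(sig0_db, dtype=float)
    return np.abs(sig0_db - mu_n) > n_std * np.asarray(sigma_n, dtype=float)


# Illustrative local distribution: mu_N = -10 dB, sigma_N = 1 dB
print(outlier_mask([-14.0, -13.0, -10.0, -6.5], -10.0, 1.0))
```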
2.5.4. Denial of High Uncertainty on Decision
In addition to the flood/non-flood classification, the Bayes approach yields the conditional error as a measure of its uncertainty. For clear-cut situations, with backscatter values belonging in all likelihood either to the water or to the local distribution, this uncertainty measure is close to zero and we can accept the classification with high confidence. In contrast, when backscatter values of the incoming image lie somewhere between the two distributions, falling into their overlap so that neither class is much more probable than the other, the Bayes decision is very uncertain and the classification is not meaningful. The maximum value of the conditional error is by definition 0.5, and we define a threshold of 0.2 for an acceptable and meaningful decision, reflecting 4:1 odds that the assigned class is correct:
$$M_{\varepsilon}(\sigma^0) = \begin{cases} 1 & \text{if } \varepsilon(\sigma^0) > 0.2 \\ 0 & \text{otherwise} \end{cases}$$
With this, we obtain an (un-)certainty mask dependent on the Bayes decision, based on information from the parameters as well as the actual measurement. All pixels where the acceptable certainty is not reached are mapped as unclassified.
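The uncertainty mask is a threshold on the conditional error. In this sketch, a conditional error of exactly 0.2 is still accepted, which is our assumption about the boundary case; the constant and function names are hypothetical.

```python
import numpy as np

MAX_COND_ERROR = 0.2  # posterior odds of (1 - 0.2) / 0.2 = 4:1 for the chosen class


def uncertainty_mask(cond_error, max_error=MAX_COND_ERROR):
    """True where the conditional error exceeds the accepted maximum."""
    return np.asarray(cond_error, dtype=float) > max_error


print(uncertainty_mask([0.05, 0.2, 0.3, 0.5]))
```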
2.6. Postprocessing
SAR offers an advantageous observation principle when it comes to flood mapping (high sensitivity to water, clear view through clouds, independence from daylight). This comes with some inherent disadvantages, which might impair the correct flood identification, and necessitate a (mild) postprocessing of our classification results.
2.6.1. Morphological Operator
The backscatter measurement of a single Sentinel-1 pixel is composed of a superposition of signals from various scatterers at sub-pixel scale. This so-called speckle, although physical, appears in SAR images as noise and causes the backscatter to vary over homogeneous targets. As a consequence, single pixels may show a lower backscatter and could be confused with inundation. To reduce such false-positive detections, a small spatial majority filter is applied to the flood map resulting from the pixel-based Bayes decision. Since an overly large kernel would increase the risk of underdetection, we set the filter's kernel size to 3 pixels, based on visual inspection.
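The majority filter can be written with NumPy alone. This sketch assumes a binary flood map and a 3 × 3 window, in which a pixel stays flooded only if at least 5 of the 9 window pixels are flooded; replicating the border for edge pixels is our own choice, and the function name is hypothetical.

```python
import numpy as np


def majority_filter_3x3(flood):
    """Binary 3 x 3 majority filter: keep a pixel flooded only if >= 5 of 9 window pixels are flooded."""
    f = np.asarray(flood, dtype=bool).astype(np.int32)
    padded = np.pad(f, 1, mode="edge")  # replicate borders so edge pixels get a full window
    rows, cols = f.shape
    counts = np.zeros_like(f)
    for dr in range(3):
        for dc in range(3):
            counts += padded[dr:dr + rows, dc:dc + cols]  # sum the 9 shifted copies
    return counts >= 5


# An isolated "flood" pixel, e.g. caused by speckle, is removed by the filter
speckle = np.zeros((5, 5), dtype=bool)
speckle[2, 2] = True
print(majority_filter_3x3(speckle).any())
```

Note that the filter also trims the corners of small flooded patches, which illustrates why a larger kernel would increase the risk of underdetection.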
2.6.2. Topography
Another quality-degrading effect stems from the side-looking geometry of SAR observations, in the form of potential signal distortions in areas of strong topography. Range-Doppler terrain correction may not model the illuminated area sufficiently accurately, which can result in very low and very high backscatter along hills and mountain ridges.
To avoid strong over- or underdetection, we use the Height Above Nearest Drainage (HAND) index data based on the Shuttle Radar Topography Mission (SRTM) DEM to mask out areas with strong topography that are distant from water bodies. The HAND index value [49] represents the vertical distance between a DEM cell and the nearest cell of the drainage network. By excluding all pixels with a HAND index of 20 m or more, the impact of topography-related misclassification can be reduced significantly. Since floods occur predominantly in the vicinity of local aquifers and take effect only on flat terrain, the risk of missed detections due to this exclusion is minimal, as (relatively) low-lying areas remain unmasked.
This HAND mask largely overlaps with the incidence angle mask from Section 2.5.1, as in practice both are a function of topography. However, while the $\theta$ mask covers areas where our approach is unfit for flood mapping because of the Sentinel-1 SAR geometry, the HAND mask excludes areas where floods cannot be expected in general, from a hydrologic perspective. Effectively, the HAND mask removes artefacts on flattening hilltops and thus improves the algorithm's robustness.
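Putting the exclusion masks together, a final map could be assembled as sketched below. The integer encoding (1 = flood, 0 = non-flood, −1 = unclassified) and the function names are hypothetical; only the 20 m HAND limit is taken from the text.

```python
import numpy as np

HAND_MAX_M = 20.0  # pixels with HAND >= 20 m are excluded


def hand_mask(hand_m):
    """True where the height above nearest drainage is 20 m or more."""
    return np.asarray(hand_m, dtype=float) >= HAND_MAX_M


def compose_flood_map(is_flood, *exclusion_masks):
    """Return an int8 map: 1 = flood, 0 = non-flood, -1 = unclassified (hypothetical encoding)."""
    out = np.where(np.asarray(is_flood, dtype=bool), 1, 0).astype(np.int8)
    excluded = np.zeros(out.shape, dtype=bool)
    for mask in exclusion_masks:  # any single mask is enough to exclude a pixel
        excluded |= np.asarray(mask, dtype=bool)
    out[excluded] = -1
    return out


is_flood = [True, True, False, False]
ia_mask = [False, False, True, False]  # e.g. from the incidence angle check
print(compose_flood_map(is_flood, hand_mask([5.0, 25.0, 10.0, 3.0]), ia_mask))
```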
2.7. Study Flood Event and Reference Data
To test our method, we examine a major flood disaster that struck the Greek mainland in February 2018. This event has already been the subject of other studies, e.g., [26], and serves our experiment well, as it was well captured not only by Sentinel-1, but more importantly also by (usually rare) reference satellite products with similarly good temporal overlap.
In 2018, the region of Thessaly (overview map in Figure 4 in Section 3) was subject to persistent and unprecedentedly intense rainfall, causing rivers to overflow and inundating farmland and settlements. A weather station near the village of Zagora recorded 676 mm of rain between 21 and 26 February, including 209 mm within only 24 h on the 26th [50]. The most affected area spans about 50 km in the north-west of the Thessalian plain between the cities of Trikala and Larissa, and hosts dense agriculture with non-irrigated and permanently irrigated farmland on fertile soils. The plain is largely surrounded by mountain ranges and drains into the Aegean Sea via the Pineios River and its tributaries. This drainage basin has frequently been flooded throughout history [51] and remains flood-prone today, as shown by the severe flood of the Pineios River in February 2018.
After the flood peak around 26 February 2018, two Sentinel-1 images recorded the situation on 28 February 2018 at 04:39 and 16:31 from two overpasses along the orbit tracks with relative orbit numbers D080 (descending) and A175 (ascending). As validation references from the same date, we collected one flood delineation map from the Copernicus Emergency Management Service (CEMS) Rapid Mapping (https://emergency.copernicus.eu/mapping/, accessed on 13 January 2022) and one (almost cloud-free) Sentinel-2 multispectral optical acquisition. The latter was converted to a flood map by calculating the Normalised Difference Water Index (NDWI) and applying a threshold (while neglecting the very small permanent water bodies), following [52]. The reference data for the evaluation in Section 3 are detailed in Table 1.
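For reference, the NDWI in the sense of McFeeters is computed from green and near-infrared reflectances; for Sentinel-2 these are commonly bands B3 and B8. The sketch below assumes those inputs. The threshold used for the reference map is not restated here, so the default of 0.0 is a placeholder, not the study's value, and the function names are hypothetical.

```python
import numpy as np


def ndwi(green, nir):
    """NDWI = (green - nir) / (green + nir); NaN where both reflectances are zero."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = green + nir
    with np.errstate(divide="ignore", invalid="ignore"):
        out = (green - nir) / denom
    return np.where(denom == 0, np.nan, out)


def ndwi_water_map(green, nir, threshold=0.0):
    """True where NDWI exceeds the (placeholder) threshold; NaN pixels become False."""
    return ndwi(green, nir) > threshold


print(ndwi_water_map([0.3, 0.1], [0.1, 0.3]))
```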
4. Conclusions
In this paper, we presented our recent advances in flood mapping with Sentinel-1 SAR data, which resulted in a novel method fit for global and near-real-time monitoring. Operating autonomously, without human interaction or reference identification, it yields flood classifications and corresponding uncertainty values by comparing current SAR imagery against precomputed and localised parameters. The algorithm is centred on an optimised global datacube structure, is parametrised pixel-wise through harmonic time series analysis, and features a priori masking of insensitive areas and observations.
We established our approach on the basis of the monitoring capability of the European Sentinel-1 CSAR mission and its global and long-standing observation scenario (in light of the ongoing anomaly of Sentinel-1B, ESA advanced the launch schedule of Sentinel-1C to April 2023). Building upon the stability, frequency, and quality of the provided IWGRDH imagery, we derived seasonality parameters from 2015–2020 time series and collected a representative CSAR signature for water bodies. Moreover, we effectively capitalise on the mission's orbit repetition through the per-orbit model parametrisation and the use of static incidence angles, while at the same time minimising the systematic influence of the observation geometry. Data-driven exclusion masks identify situations with unfit parameter configurations, where Sentinel-1 flood mapping is unreliable or even impossible due to physical limitations of the SAR system.
In conjunction with this a priori and pixel-localised flood model calibration, the presented Bayes classification decision engine requires little computational effort and hence can be run fast in near-real-time (NRT) flood mapping applications. In fact, the algorithm is already an integral part of the recently launched Global Flood Monitoring (GFM, [53]) component (integrated in the Global Flood Awareness System (GloFAS), available at https://www.globalfloods.eu/ (accessed on 25 July 2022)) of the Copernicus Emergency Management Service (CEMS), as one of three independent flood mapping algorithms that are combined within one ensemble decision product. The GFM ensemble setup [54] promises robustness and accuracy in global flood monitoring, as the three employed Sentinel-1 algorithms complement each other through entirely different concepts of the flood decision, and thus represents well the current research on automated SAR-based flood detection [55]. The algorithm based on the work of [20] ingests a pair of recent images and maps changes therein through statistical modelling of backscatter distributions in hierarchical subsets, while the algorithm based on developments by [17] classifies single images using fuzzy-logic methods and topography-derived indices, with subsequent region growing. In contrast, our algorithm exploits, per pixel, the full Sentinel-1 signal history from the datacube and classifies on the basis of precomputed probabilities for flooded and non-flooded SAR signatures. A major advantage of our approach in terms of NRT readiness is that it does not depend on a recent and congruent precursor satellite image, because the change is detected against a precomputed synthetic image from the harmonic model.
In terms of flood map accuracy, our datacube-based Bayes decision performs reliably, in line with result metrics reported in the literature (e.g., in the review of [56], or specifically [26]). The obtained flood maps for the Thessaly event in February 2018 are largely congruent with the two available reference maps, with complete agreement on the general flood body structure and few notable deviations related to different observation timings and retreating floods. The comparatively coarse resolution of our 20 m-sampled datacube may be seen as a shortcoming here, as some details along vegetated rivers are lost. However, this mild downsampling, combined with the temporal aggregation by the harmonic synthesis, offers a clear representation of the expected local SAR signature, practically free from noise and speckle. Following this method paper, our group is currently preparing a subsequent evaluation study [57]. It examines in depth the flood mapping performance for multiple events on five continents in comparison to maps generated by CEMS Rapid Mapping activations, with findings that confirm the observations made in this study of the 2018 Greece event.
What remains to be tackled by future research is an appropriate handling of off-seasonality. This includes effects foremost from crop rotation in agriculture, progressing land cover changes, or extreme soil moisture conditions. In such cases, the harmonic model and the expected backscatter value may not fit the actual non-flood signal, and misinterpretation and false positives can occur. A solution could be to adapt the seasonal reference with current observations, e.g., by integrating antecedent Sentinel-1 images and merging them with the local harmonic function through temporal filtering. Furthermore, the impact of the input backscatter time series length, in terms of seasonal cycles, is to be addressed in upcoming experiments.
To detect windy conditions that roughen flood surfaces, a first attempt has been implemented within the GFM project, based on the Sentinel-1 data at hand during runtime. When the backscatter of the current SAR image over regular water bodies is strongly increased relative to localised long-term statistics, a wind flag is raised in the surrounding areas. As such, this is independent of auxiliary meteorological data that may be troublesome in global near-real-time operations.
That said, integrating auxiliary information is another research direction aiming at further increased accuracy, especially as Bayesian methods are adept at integrating pre-existing ancillary information into the labelling process. For floods, data on topography, morphology, or local water body seasonality may be integrated in the form of dynamic prior probabilities, e.g., in simple Bayesian inference [22], or further integrated in belief networks [24].
Finally, the masking of problematic areas opens up a wide field of possibilities to increase robustness, in particular when aiming for automated and global applications. The related work of [58] focused on the globally applicable generation of a dedicated exclusion mask for areas where SAR is insensitive to flood/non-flood conditions. In a quite similar approach, they applied time series analyses to the Sentinel-1 datacube to identify problematic land covers and radar geometries (e.g., shadows), and could effectively reduce classification errors.