1. Introduction
Water stress has emerged as a critical national security concern, with profound effects on quality of life, food and energy production, and political stability across an interconnected world. Monitoring water levels is a crucial tool for informing demand management, flood control, and safe dam operations [1]. Ancient societies constructed dams to store water for drinking and agricultural irrigation, with the oldest recorded reservoir built in the Black Desert of modern Jordan in the fourth millennium BCE. These early dams, characterized by simple earthen structures, have evolved over time with advancements in dam construction and water management techniques. In the contemporary context, reservoirs still play an important role in the regulation and management of surface water resources. Recent decades have witnessed a global surge in the construction of reservoirs and dams, driven by objectives such as flood control, hydropower generation, and irrigation [2]. The impact of reservoirs extends beyond their immediate functions, significantly influencing river discharge patterns, hydromorphology [3], water quality [4], and the spatiotemporal distribution of surface water resources [5]. Recent studies (e.g., [1]) indicate that approximately two-thirds of the world’s long rivers are currently affected by dams.
Despite the comprehensive documentation of dams and their configurations, the consistent observation of reservoir storage remains predominantly limited to developed countries, and even there, accessing records may be challenging. This gap contributes to tragic outcomes: as highlighted in [2], 2.2 billion people worldwide lack access to clean drinking water, and 3.5 billion people lack access to safely managed sanitation.
Understanding and addressing the complex dynamics of reservoir management and its implications for water resources are imperative in the face of this growing global water stress. Bridging the gap between historical practices and contemporary challenges requires the use of advanced monitoring techniques, data-driven approaches, and integrated water resources management strategies, which collectively enable the more efficient, sustainable, and adaptive management of water resources in response to changing environmental and societal needs.
In this context, monitoring reservoir volume is critically important for several reasons. Firstly, it supports a sustainable water supply for both urban and rural communities. By continuously tracking water levels, authorities can anticipate shortages and take measures to conserve water, thus preventing severe water restrictions during drought periods. Secondly, effective monitoring is vital for agricultural planning. Reliable data on reservoir volumes allows farmers to align their planting schedules and irrigation practices with actual water availability, maximizing crop yields and conserving water. This integration of water resources management supports sustainable agricultural practices and optimizes water use efficiency [
6].
Similar issues are present in the energy sector. For hydroelectric power production, monitoring reservoir levels ensures that sufficient water is available to meet energy demands. This enhances the efficiency of hydropower plants, ensuring a stable energy supply and reducing operational costs. By maintaining optimal water levels, hydroelectric plants can operate more efficiently, balancing energy production with environmental considerations [
7].
Many developing countries face significant limitations in estimating their water resources, leading to poor management and adverse conditions for their populations. Furthermore, there is a lack of comprehensive global information on this subject. The available data is highly decentralized, making it difficult to perform global-scale analyses. Additionally, the information that is accessible predominantly pertains to large reservoirs, neglecting smaller ones that are also crucial for agriculture and other purposes at a local or regional scale.
Flood risk management is another critical area where reservoir volume monitoring plays a key role. By predicting water levels accurately, authorities can manage water releases from reservoirs in a controlled manner, thereby minimizing downstream flooding and protecting communities and infrastructure.
The present work aims to contribute to reservoir volume estimation by developing and providing a web-based tool to monitor the volumes of reservoirs worldwide. This tool will facilitate the analysis of patterns and the assessment of the storage status of most reservoirs globally.
Tracking reservoir storage has been challenging due to the varying conditions each reservoir presents, such as differences in size, weather patterns, and available resources. A variety of methods are employed to achieve this, ranging from low-tech to high-tech solutions. One common approach is manual measurement, which involves using staff gauges: simple graduated boards or poles installed at various points around a reservoir (
Figure 1a). Observers read the water level directly from these gauges at regular intervals. This method is cost-effective and straightforward, but it requires consistent manual labor and can be less accurate due to human error and environmental conditions.
In addition to manual methods, pressure transducers are also used. These devices measure water pressure at a specific depth in the reservoir. The pressure readings are then converted into water level measurements. This method can provide real-time data and is often used in automated monitoring systems with data loggers for continuous recording.
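The pressure-to-level conversion follows the hydrostatic relation h = (p − p_atm) / (ρ g). The sketch below is only an illustration: it assumes an absolute (non-vented) transducer, a fresh water density of 1000 kg/m³, and hypothetical function and parameter names.

```python
RHO_WATER = 1000.0  # kg/m^3, assumed fresh water density
G = 9.80665         # m/s^2, standard gravity


def water_level_m(p_abs_pa, p_atm_pa, sensor_elevation_m):
    """Water surface elevation (m) from an absolute pressure reading (Pa)."""
    # Height of the water column above the sensor (hydrostatic relation)
    depth_above_sensor = (p_abs_pa - p_atm_pa) / (RHO_WATER * G)
    return sensor_elevation_m + depth_above_sensor


# Sensor installed at 100 m elevation, under 5 m of water
level = water_level_m(101325 + RHO_WATER * G * 5, 101325, 100.0)
```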
Ultrasonic and radar sensors, shown in Figure 1b, use sound waves or radio waves to determine the distance between the sensor and the water surface. Knowing this distance and the elevation of the sensor, the water level can be calculated. These sensors are typically mounted above the water surface and provide accurate non-contact measurements, making them ideal for remote or hard-to-reach locations. They can also transmit data wirelessly to monitoring stations.
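For an ultrasonic sensor, the distance is half the echo travel time multiplied by the speed of sound, and the level is the sensor elevation minus that distance. The following sketch assumes a speed of sound of 343 m/s (air at about 20 °C); the names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def level_from_echo(echo_time_s, sensor_elevation_m):
    """Water surface elevation (m) from a round-trip ultrasonic echo time (s)."""
    distance_to_water = SPEED_OF_SOUND * echo_time_s / 2  # one-way distance
    return sensor_elevation_m - distance_to_water


level = level_from_echo(0.02, 110.0)  # sensor mounted at 110 m elevation
```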
Remote sensing technology offers significant advantages, and several studies have addressed its challenges (such as cloud coverage or image processing issues) and have developed innovative solutions. These efforts often focus on algorithms for identifying water bodies and estimating area–volume or elevation–volume relationships for reservoirs. In most cases, this is achieved for a case-specific reservoir.
For instance, Ref. [
8] estimated monthly volume variations in the Thac Mo hydroelectric reservoir (2016–2021) using Sentinel-1 for surface area and Jason-3 altimetry for water levels. Validation against in situ data revealed high correlations (R = 0.98–0.99) between satellite-derived measurements and ground truth, demonstrating the approach’s precision. Similarly, Ref. [
9] combined TanDEM-X and COSMO-SkyMed SAR data with in situ measurements (2011–2013) to estimate water levels and reservoir volumes. Results showed a correlation coefficient of 0.99, with differences within ±1 m and a standard deviation of 0.60 m, emphasizing SAR data’s accuracy in volume estimation.
Using Google Earth Engine (GEE), Ref. [
10] calculated flooded areas and water volumes in reservoirs from Landsat imagery and the Modified Normalized Difference Water Index (MNDWI). Their approach, validated against historical in situ data across four reservoirs, achieved high Pearson correlations, highlighting GEE’s potential for automated large-scale reservoir monitoring. Ref. [
11] analyzed the Latyan Dam Reservoir in East Tehran with satellite data (Landsat, Sentinel-1, Sentinel-2) and artificial intelligence (AI) methods, including genetic algorithms (GA), differential evolution (DE), and cat swarm optimization (CSO). The DE method reached 98% accuracy in classification, with correlations above 0.9 between DE and in situ data, confirming AI’s reliability for monitoring complex reservoirs. Ref. [
12] conducted a comprehensive study on the global monitoring of large reservoir storage using satellite remote sensing. They analyzed 34 global reservoirs using a combination of five satellite altimeters over the period from 1992 to 2010. An unsupervised classification approach using a MODIS 16-day 250 m vegetation product was used to estimate surface water areas, which were then combined with satellite-altimeter-based water elevation estimates to derive elevation–area relationships. Their findings indicated that storage estimates were highly correlated with gauge observations for large reservoirs in the United States (
R = 0.92 to 0.99). The normalized root mean square error (NRMSE) ranged from 3% to 15%, with a mean absolute error of 4% of the reservoir capacity. This study demonstrated the feasibility and accuracy of using satellite data for global reservoir monitoring. These studies collectively highlight the progress in remote sensing technology and its application in reservoir monitoring. The methodologies developed and validated in these works demonstrate high accuracy and reliability, and all present valuable knowledge to start this research.
Table 1 presents a comparative summary of these works.
The present work explores the development and application of an innovative web-based tool designed to monitor reservoir volumes on a global scale. The tool employs satellite imagery and automated analysis techniques to generate accurate and timely information about reservoir storage levels. By integrating these capabilities, the system offers valuable insights into trends in reservoir volumes, enabling users to optimize water management strategies and understand the dynamics of water storage.
Although similar tools for estimating reservoir volumes already exist, they are typically site-specific and tailored to specific regions (cf. Table 1), limiting their broader applicability. In contrast, the tool presented in this paper is user-driven, allowing for the volume estimation of any reservoir based on user input, and is accessible directly through a web browser, ensuring greater flexibility and usability. One of the primary objectives of this tool is to enhance accessibility to crucial reservoir data. Freely available at https://github.com/joao862/BLU (accessed on 16 December 2024), it aims to empower decision makers, researchers, and water resource managers by providing an efficient and reliable means of tracking water availability.
The outcomes of this research are expected to contribute to global water security by fostering informed decision making and promoting the sustainable management of water resources. By mitigating the adverse effects of water stress, the tool aspires to strengthen resilience to evolving environmental conditions, ultimately supporting the equitable distribution of water resources and ensuring their availability for future generations.
2. Materials and Methods
2.1. General Procedure
The tool to automate and simplify the process of analyzing reservoir volumes through remote sensing will enable users to easily define a specific region of interest (ROI) and a custom date range for analysis, streamlining the workflow for those requiring fast and accurate insights into water bodies. By leveraging high-resolution satellite imagery, primarily from the Sentinel-2 mission, the tool aims to operate within the specified parameters, acquiring the necessary data for the selected period.
Once the satellite images are retrieved, the developed tool (herein called software) will conduct a series of image processing tasks, starting with the removal of any potential interferences such as clouds or atmospheric distortions and man-made structures present in the reservoir. It will then apply water surface masking techniques to isolate the reservoir from surrounding land features, ensuring the precise identification of the water body. Subsequently, the software will compute the area of the reservoir at different points in time, allowing for the temporal analysis of water levels.
In addition to calculating the area, the software will utilize or create storage capacity curves, which provide a mathematical relationship between the surface area and volume of the reservoir. By interpolating the volume from the calculated surface areas, the software will produce accurate volume estimates of the selected reservoir. The entire process, from image acquisition to volume calculation, is schematically presented in
Figure 2, offering a clear picture of the general procedure and final output.
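The last two steps of this procedure (pixel count to area, then area to volume through the storage curve) can be sketched as follows. This is a minimal illustration, assuming 10 m pixels, linear interpolation, and a hypothetical storage curve; the function names and all numbers are invented for the example.

```python
from bisect import bisect_right

PIXEL_AREA_M2 = 10 * 10  # one Sentinel-2 pixel at 10 m resolution


def area_km2(water_pixel_count):
    """Inundated area from the number of pixels classified as water."""
    return water_pixel_count * PIXEL_AREA_M2 / 1e6


def volume_from_area(area, curve):
    """Linear interpolation on sorted (area_km2, volume_hm3) pairs.

    Areas outside the curve are clamped to its first or last point.
    """
    areas = [a for a, _ in curve]
    if area <= areas[0]:
        return curve[0][1]
    if area >= areas[-1]:
        return curve[-1][1]
    i = bisect_right(areas, area)
    (a0, v0), (a1, v1) = curve[i - 1], curve[i]
    return v0 + (v1 - v0) * (area - a0) / (a1 - a0)


# Hypothetical storage curve and water pixel counts, for illustration only
curve = [(0.0, 0.0), (2.0, 10.0), (5.0, 40.0), (10.0, 120.0)]
pixel_counts = {"2024-01-05": 35_000, "2024-02-04": 50_000}
volumes = {d: volume_from_area(area_km2(n), curve) for d, n in pixel_counts.items()}
```

Linear interpolation is only one option; a fitted polynomial or power law could replace it where the storage curve is sparse.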
The tool will integrate a user-friendly interface and sophisticated analysis capabilities, aiming to enhance the accuracy and efficiency of reservoir volume estimation, providing valuable insights for water resources management and environmental monitoring worldwide.
2.2. Sentinel-2 Imagery
Google Earth Engine (GEE) is central to the present work, providing an accessible platform for satellite data and computational resources. It allows for remote sensing data acquisition and processing via its Python API. Sentinel-2 imagery [
13] was chosen for its global coverage, high spatial resolution (10 m), and 5-day revisit time, making it ideal for water resources management. The tool enables access to imagery based on region and cloud coverage percentage, ensuring only high-quality images are processed.
In contrast to traditional methods, remote sensing technologies offer a more advanced, efficient, and cost-effective solution for monitoring reservoir levels. Remote sensing involves gathering information from a distance using sensors mounted on satellites or aircraft.
The twin Sentinel-2 satellites, 2A and 2B, are designed to ensure a high revisit frequency of every 5 days at the equator, making them highly effective for continuous monitoring. These satellites are equipped with optical instruments that capture imagery across 13 spectral bands, each of which corresponds to a specific range of the electromagnetic spectrum. The bands include four at 10 m spatial resolution, six at 20 m resolution, and three at 60 m resolution, covering an orbital swath width of 290 km [
13].
The electromagnetic spectrum encompasses a wide range of energy, including radio waves, microwaves, infrared, visible light, ultraviolet, x-rays, and gamma rays. While the human eye can detect visible light, Sentinel-2’s instruments are designed to capture data not only in the visible spectrum, but also in the near-infrared and short-wave infrared bands. This capability allows the satellites to identify different features and materials on Earth’s surface based on their unique spectral signatures. The data gathered across these 13 spectral bands are crucial for applications such as monitoring changes in vegetation, water levels, and land use, and are invaluable for environmental management, disaster response, and resource conservation.
Resolution is a critical factor in remote sensing, encompassing radiometric, spatial, spectral, and temporal dimensions. The interplay of these aspects determines the suitability of a sensor for specific applications. High radiometric resolution enhances the sensor’s sensitivity to subtle variations in reflectance, enabling the precise discrimination of target features. Spatial resolution, in turn, dictates the level of detail captured, influencing the effectiveness of analysis in complex or heterogeneous landscapes.
Spectral resolution defines the range and granularity of wavelengths a sensor can detect, with Sentinel-2’s spectral bands offering versatility across applications such as vegetation monitoring and water body analysis. Temporal resolution, shaped by a satellite’s orbital configuration and swath width, ensures frequent revisit times, which are critical for monitoring dynamic processes like seasonal water fluctuations and vegetation cycles.
2.3. Inundated Area
Remote sensing techniques are fundamentally based on extracting coverage information from satellite images and analyzing an object’s ability to reflect radiant energy within a specific spectrum. By comparing the boundaries of surface water bodies, it is possible to accurately map areas containing water. These techniques rely on spectral indices (i.e., mathematical formulas), applied to spectral bands of satellite imagery to derive specific information about the Earth’s surface features. These formulas are designed to highlight or quantify characteristics of interest, which can be used to extract water bodies from a satellite image.
The near-infrared (NIR) band, ranging from 0.7 to 1.0 μm, is essential for vegetation analysis due to the strong reflection of NIR light by healthy plants, thanks to their leaf structure. This makes NIR crucial for monitoring plant health through indices like the NDVI. It also helps distinguish land from water, as water absorbs NIR and appears darker in images, while NIR is also useful for soil moisture and crop health assessments.
The short-wave infrared (SWIR) band, spanning from 1.1 to 2.5 µm, excels at detecting moisture content in soil and vegetation. Wet surfaces strongly absorb SWIR light, making it effective for identifying dry versus moist areas. Additionally, SWIR can penetrate thin clouds, aiding in clearer imaging. It is useful for distinguishing snow from clouds, monitoring fire-affected areas, and geological mapping due to the unique SWIR signatures of minerals.
The green band, ranging from 0.5 to 0.6 µm, is part of the visible spectrum and captures the green light reflected by plants. It is key for creating true-color images when combined with red and blue bands. The green band also plays a role in water body monitoring and vegetation analysis (e.g., indices like the NDWI and MNDWI).
Together, NIR, SWIR, and green bands offer powerful insights into Earth’s surface, enhancing environmental monitoring, agriculture, and urban planning (
Figure 3). Their combined use helps solve complex problems across various fields.
The Normalized Difference Water Index (NDWI), developed by [14], discerns water bodies from land surfaces by comparing the reflectance of the green and near-infrared bands. Water reflects green light and strongly absorbs NIR, whereas vegetation and soil reflect NIR strongly, so high index values indicate water.
Similarly, the Modified Normalized Difference Water Index (MNDWI), proposed by [15], replaces the NIR band with the SWIR band. Using the green and SWIR bands suppresses the signal from built-up features and vegetation, which enhances water detection accuracy in urban areas.
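Both indices are normalized differences of two bands. The sketch below uses the commonly cited green/NIR form of the NDWI and green/SWIR form of the MNDWI; the mapping to Sentinel-2 bands B3, B8, and B11 and the reflectance values are assumptions made for illustration.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 where both bands are zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndwi(green, nir):
    """NDWI from green (Sentinel-2 B3) and NIR (B8) reflectance."""
    return normalized_difference(green, nir)


def mndwi(green, swir):
    """MNDWI from green (Sentinel-2 B3) and SWIR (B11) reflectance."""
    return normalized_difference(green, swir)


# Illustrative reflectances, not measured values
water = {"green": 0.06, "nir": 0.02, "swir": 0.01}
vegetation = {"green": 0.08, "nir": 0.40, "swir": 0.20}
```

With these sample values, both indices are positive over water and negative over vegetation, which is the contrast the water mask relies on.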
These indices facilitate the precise mapping of surface water, supporting applications in environmental monitoring, flood prediction, and water management. Challenges include variability in water quality parameters and the need for improved algorithms to enhance accuracy across different geographical and temporal scales, underscoring ongoing research efforts in the field.
More recently, Ref. [16] proposed a new index for detecting water bodies called the Automated Water Extraction Index (AWEI). The AWEI integrates green, red, NIR, and SWIR bands to improve classification accuracy in areas with shadows and dark surfaces, which are not well classified by other methods. The authors concluded that the AWEI detects water bodies with high accuracy, especially in mountainous areas, where shadows cast by the highland terrain are one of the most important sources of classification error.
Table 2 summarizes these methods.
In addition, Ref. [16] developed a shadow-corrected version of the index, known as AWEIsh, which further enhances water detection in areas with significant shadow interference by incorporating blue and SWIR bands alongside green and NIR. Each of these indices has strengths and weaknesses, but all of them enable the mapping and monitoring of surface water dynamics. By leveraging the distinct spectral properties of water across different bands and applying each index in the cases where it is most effective, it is possible to create an accurate workflow for most cases [17].
Distinct spectral indices are applied to emphasize water features in satellite imagery by exploiting their distinct reflectance characteristics. Nonetheless, each index has its constraints. For example, the NDWI may have difficulties in metropolitan regions due to interference from constructed surfaces, whereas the MNDWI confronts obstacles in situations including cloud and shadow contamination. Likewise, the AWEI index, albeit useful in specific shadow-affected regions, exhibited discrepancies when juxtaposed with the NDWI in comparable imagery applications. Notwithstanding these constraints, the NDWI proved to be the most consistent and dependable metric throughout this investigation, especially across the several circumstances examined. Its resilience to noise caused by meteorological circumstances, including haze and partial cloud cover, rendered it exceptionally effective in preserving the accuracy of water delineation. Furthermore, the NDWI exhibited enhanced stability across reservoirs of diverse dimensions and geographical settings, reliably producing precise water surface detection. The index’s straightforwardness and computational efficiency rendered it optimal for the automated large-scale processing necessitated by this application.
The reflectance properties of materials have been used to develop a variety of spectral indices that are designed to emphasize specific features. Although certain indices exhibit exceptional accuracy in specific scenarios, their performance frequently deteriorates in other environments. For instance, near urbanized areas, the NDWI is less effective, the MNDWI is unable to manage clouds and shadows, and even though the AWEI and AWEIsh perform better in shadowed conditions and other challenging scenarios, they still demonstrate poor consistency in general.
The attributes of low sensitivity to atmospheric noise, stability in varied environments, and computational efficiency ultimately validated the choice of the NDWI as the optimal index for this study, facilitating accurate and dependable reservoir volume estimation across diverse conditions.
3. Development of the Tool
3.1. Workflow
The main steps of the tool are presented herein (see corresponding subsections in parentheses).
Users define a specific reservoir or select a region of interest, along with a date range.
The platform retrieves satellite imagery based on the defined parameters.
Non-water features such as bridges, identified using the Google Places API, are removed through morphological operations like dilation and erosion.
For images with high cloud coverage, cloud pixels are filtered out, and the reservoir shape is reconstructed using bathymetry data.
The Normalized Difference Water Index (NDWI) is applied with a threshold determined through an unsupervised learning method to detect water pixels.
After determining the inundated area, the area–volume relationship is selected from the following options: (a) available in the tool’s database; (b) provided by the user (txt or Excel file); (c) derived from a Digital Elevation Model (DEM) uploaded by the user; or (d) retrieved from global databases such as GLOBathy.
Volume time-series generation: the surface area computed for each image is converted to a volume by interpolating on the selected area–volume relationship, which yields the time series of reservoir volumes.
Figure 4 presents a flowchart illustrating these stages and their interconnections.
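Option (c) in the list above, deriving the area–volume relationship from a DEM, can be illustrated with a simple cell-counting sketch. It is not the tool's implementation: it assumes a DEM of the reservoir bed on a regular grid, ignores whether flooded cells are connected to the reservoir, and uses hypothetical names and elevations.

```python
def curve_from_dem(dem, cell_area_m2, levels):
    """Return (level_m, area_m2, volume_m3) rows for each candidate water level."""
    cells = [z for row in dem for z in row]
    rows = []
    for level in levels:
        flooded = [z for z in cells if z < level]  # cells below the water surface
        area = len(flooded) * cell_area_m2
        volume = sum(level - z for z in flooded) * cell_area_m2  # depth times cell area
        rows.append((level, area, volume))
    return rows


# Hypothetical bed elevations in metres on a 30 m grid (900 m^2 per cell)
dem = [[104, 102, 104],
       [102, 100, 102],
       [104, 102, 104]]
curve = curve_from_dem(dem, cell_area_m2=900, levels=[101, 103, 105])
```

The resulting area and volume columns form the kind of lookup table that the interpolation step requires.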
3.2. Backend
3.2.1. Determining the Maximum Perimeter of the Reservoir (AOI)
Convolutional neural network models, which are deep learning models capable of receiving images as input, have also been trained in this field. However, they require a very large number of labeled images for training in as many different circumstances as possible, making the process very time-consuming.
By restricting the maximum perimeter of the lake or reservoir through the shape file or polygon, it is possible to eliminate issues related to mistaking water for other features in the scene and also exclude small water bodies that are not part of the reservoir. This proposed method simplifies the process while still overcoming most of the limitations. In the tool, this is achieved through the integration of the HydroLAKES dataset [18], which provides global shoreline polygons for lakes with a surface area of at least 10 hectares. It includes 1.4 million lakes and reservoirs, covering a total surface area of 2.67 million km², a shoreline length of 7.2 million km, and a total storage volume of 181,900 km³. It offers geometric and limited attribute information such as surface area, shoreline length, average depth, water volume, and residence time. Additionally, it is part of the larger HydroATLAS database [18], which enhances it with more hydro-environmental characteristics.
The integration of this dataset in the software not only increases the efficiency of the method, as explained above, but also simplifies the process of choosing the region of interest (ROI) by allowing the user to define it just by clicking on the lake polygon. Speed and computational time are also a concern and can be strongly influenced by the size of uploaded files such as this dataset. To address this, the dataset can be uploaded to the GEE platform and then accessed in this project as an asset. This approach makes loading the 1.5 GB of geographical data much faster.
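Selecting the ROI from a click amounts to finding the lake polygon that contains the clicked coordinate. A minimal ray-casting sketch is given below; the function names, lake identifiers, and coordinates are hypothetical, and in practice a spatial filter on the server side would likely do this work.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the closed ring of vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge crosses the horizontal line through the point
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def select_lake(lon, lat, lakes):
    """Return the id of the first lake polygon containing the click, else None."""
    for lake_id, ring in lakes.items():
        if point_in_polygon(lon, lat, ring):
            return lake_id
    return None


# Two hypothetical square lake outlines as (lon, lat) vertices
lakes = {
    "lake_a": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "lake_b": [(10, 10), (12, 10), (12, 12), (10, 12)],
}
```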
3.2.2. Accessing the Satellite Imagery
The GEE platform provides access to a vast repository of satellite imagery and geospatial datasets, allowing for the creation of surface area time series for lakes and reservoirs. The temporal resolution of these images depends on the optical repository source (e.g., 16 days for Landsat, 10 days for a single Sentinel-2 satellite) and can be increased by combining data from different missions. GEE simplifies the process of accessing and processing satellite imagery by providing an integrated environment with powerful computational capabilities to operate and manipulate the images.
The processing of these image collections in the tool is performed on the Sentinel-2 data due to its global coverage, capturing imagery of almost the entire Earth’s land surface. Sentinel-2 offers relatively higher spatial resolution (10 m) compared to Landsat products (typically 30 m), and also provides a high temporal resolution of 5 days at the equator when both satellites are operational. As part of the Copernicus program, the available products from Sentinel-2A and Sentinel-2B satellites carry an innovative wide-swath high-resolution multispectral imager with 13 spectral bands, making them an excellent choice to fulfill the goal of this study.
Thus, the first main part of the method consists of gathering the satellite imagery, highlighting the water pixels, removing unwanted objects if necessary, and calculating the water area.
The Copernicus Sentinel-2 dataset “Harmonized Sentinel-2 MSI: Multispectral Instrument Level 1-C” in Google Earth Engine, provided by the ESA and the European Union, is available from 27 June 2015 at midnight UTC to the present day. The corresponding Earth Engine snippet of the image collection is “COPERNICUS/S2_HARMONIZED”.
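Filtering this collection by region, date range, and cloud percentage could look like the sketch below. It assumes the `earthengine-api` package and an already authenticated and initialized session; the helper names, the default cloud limit of 20%, and the sample coordinates are hypothetical and not taken from the tool.

```python
def roi_bounds(lon, lat, half_size_deg):
    """Bounding box [west, south, east, north] around a point, in degrees."""
    return [lon - half_size_deg, lat - half_size_deg,
            lon + half_size_deg, lat + half_size_deg]


def sentinel2_collection(bounds, start, end, max_cloud_pct=20):
    """Sentinel-2 L1C images over the ROI, filtered by date and cloud cover."""
    import ee  # requires ee.Authenticate() and ee.Initialize() beforehand

    roi = ee.Geometry.Rectangle(bounds)
    return (
        ee.ImageCollection("COPERNICUS/S2_HARMONIZED")
        .filterBounds(roi)
        .filterDate(start, end)
        .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", max_cloud_pct))
    )


bounds = roi_bounds(-7.0, 39.0, 0.5)  # arbitrary sample location
# images = sentinel2_collection(bounds, "2023-01-01", "2023-12-31")
```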
3.2.3. Detect the Presence of Bridges
Classifying aerial images to detect structures like bridges is a critical step in analyzing reservoir water areas. The BLIP (Bootstrapping Language–Image Pre-Training) machine learning model is leveraged for this task. BLIP excels in vision–language tasks, combining image–text retrieval and captioning. It handles noisy datasets by generating synthetic captions, which are filtered for accuracy, improving training quality.
For aerial image analysis, BLIP generates descriptive captions that identify features like bridges or roads. When a bridge is detected, morphological operations, including dilation and erosion, are applied to mitigate its impact on water area calculations. These operations address inconsistencies caused by lighting, shadows, or misclassified pixels. Dilation smooths discrepancies by expanding the binary mask, while erosion restores the mask’s original shape. This process reduces noise and enhances water surface delineation accuracy, as demonstrated in figures illustrating how bridges are effectively removed from calculations.
Map-based APIs like OpenStreetMap (OSM) or Google Places API were integrated to identify the relevant structures. These tools automate the detection of infrastructure, including bridges, within a reservoir’s boundaries. Using geographic coordinates, the APIs query vast datasets for man-made structures. For example, an OSM request can search specifically for bridges, returning metadata such as names, locations, and outlines. This automation reduces manual effort, improves accuracy, and scales to reservoirs of varying sizes.
The improved method begins by defining a reservoir’s geographic boundaries using spatial coordinates. These boundaries are queried in OSM or Google Places API for structures. If a bridge is detected, details such as its name and architecture are displayed.
Figure 5 illustrates how detected bridges are outlined and identified within the region of interest, streamlining the workflow.
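A request of this kind can be expressed in Overpass QL, the query language of the OSM Overpass API. The sketch below only builds the query string for ways tagged `bridge=yes` inside a bounding box; the function name and coordinates are hypothetical, and sending the query (for example to a public Overpass endpoint) is left out.

```python
def bridge_query(south, west, north, east):
    """Overpass QL query for ways tagged as bridges inside a bounding box."""
    bbox = f"{south},{west},{north},{east}"  # Overpass order: south, west, north, east
    return (
        "[out:json][timeout:25];"
        f'way["bridge"="yes"]({bbox});'
        "out geom;"  # return tags plus the geometry of each way
    )


query = bridge_query(38.9, -7.2, 39.1, -7.0)
```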
Once identified, the software applies morphological operations—first dilation, then erosion—to smooth out inconsistencies in the water mask caused by the presence of the bridge. These operations help to correct misclassified pixels, ensuring that the area calculation excludes the bridge’s interference. This approach allows for accurate water surface delineation, even in the presence of man-made structures.
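Dilation followed by erosion is known as morphological closing. The pure-Python sketch below applies it with a 3×3 window to a small binary mask in which 1 marks water (a convention chosen for this example) and a one-pixel-wide bridge splits the reservoir; pixels outside the grid are treated as land.

```python
def _window(mask, r, c):
    """Yield the 3x3 neighbourhood of (r, c); pixels outside the grid count as 0."""
    rows, cols = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def dilate(mask):
    """A pixel becomes water if any pixel in its 3x3 window is water."""
    return [[int(any(_window(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask):
    """A pixel stays water only if every pixel in its 3x3 window is water."""
    return [[int(all(_window(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def close_gaps(mask):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask))


mask = [[0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1, 1, 0],   # the 0 in the middle column is the bridge
        [0, 1, 1, 0, 1, 1, 0],
        [0, 1, 1, 0, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0]]
closed = close_gaps(mask)
```

In this example the bridge column is filled while the outer shoreline is unchanged, so the water pixel count rises from 12 to 15.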
3.2.4. Masking Clouds and Shadows
One of the major limitations and obstacles in remote sensing is the presence of adverse environmental and visual conditions such as clouds. In this context, clouds are responsible for covering the water areas as well as creating shadows, leading to holes in the binary masks. While it is possible for the user to choose a lower cloud cover percentage, thereby filtering images with less cloud appearance, this may result in very few images for analysis in some cases, especially for large areas.
To overcome this issue, the developed tool uses a solution based on the S2Cloudless algorithm, provided by Sentinel Hub [19], to mask the clouds and shadows.
S2Cloudless is a machine learning-based algorithm for pixel-wise cloud probability computation using 10 Sentinel-2 bands. The tool computes the NDWI, subtracts the cloud pixels from the NDWI binary image, and, based on the bathymetry of the lake, performs a polygon search to reconstruct the affected NDWI image. The simplicity and scalability of this approach make it effective for masking both clouds and shadows by combining cloud projection with NIR low-reflectance pixels. Thresholds were validated through comparisons with ground truth data, ensuring a balance between accuracy and computational efficiency.
This algorithm is applied to the image to mask the clouds and shadows, and the NDWI is then computed to visualize the area affected by these obstacles. The example in Figure 6 shows the cloud-masked pixels in orange for a cloudy day at the Caia Reservoir in Portugal.
As highlighted in the example, the clouds are successfully masked, and it is noticeable that some pixels intersect the reservoir and, consequently, the NDWI mask, leading to an underestimation of the lake’s area. Thus, the next step is to reconstruct the NDWI polygon, which can be accomplished using the bathymetry information. The bathymetry of a lake represents the measurements and behavior of the lake or any other water body with depth, containing a set of contours for each elevation.
Figure 7 illustrates an example of the NDWI applied to a reservoir image taken on a cloudy day.
In this case, the NDWI visibly underestimates the extent of the water-covered regions due to cloud interference. The reconstructed mask, however, addresses these imperfections by closing the gaps in the NDWI and providing a more accurate representation of the reservoir’s current inundated surface, visually fitting the actual water boundaries more effectively.
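One way to picture the polygon search is to compare each bathymetric contour with the water detected in the cloud-free part of the image and keep the contour that agrees best. The sketch below is an assumption about how such a search could work, not the tool's code: the contours are given as filled boolean masks per elevation, and the agreement score (intersection over union on clear pixels) is a hypothetical choice.

```python
import numpy as np

def reconstruct_water_mask(visible_water, cloudy, contour_masks):
    """Pick the bathymetric contour that best matches the cloud-free water.

    visible_water : bool array, water detected outside clouds
    cloudy        : bool array, pixels hidden by clouds or shadows
    contour_masks : dict {elevation: bool array flooded at that elevation}
    Returns (best_elevation, reconstructed_mask).
    """
    if not contour_masks:
        raise ValueError("at least one contour mask is required")
    clear = ~np.asarray(cloudy, dtype=bool)
    observed = np.asarray(visible_water, dtype=bool) & clear
    best_level, best_score = None, -1.0
    for level, flooded in contour_masks.items():
        # Compare only where the surface was actually visible.
        candidate = np.asarray(flooded, dtype=bool) & clear
        union = np.logical_or(observed, candidate).sum()
        score = np.logical_and(observed, candidate).sum() / union if union else 0.0
        if score > best_score:
            best_level, best_score = level, score
    return best_level, np.asarray(contour_masks[best_level], dtype=bool)
```

Because the selected contour is a complete polygon, it also covers the pixels that were hidden by clouds, which closes the gaps in the NDWI mask.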
3.2.5. Classifying Water Pixels Using Unsupervised Learning
An important aspect of the water region classification is the creation of the NDWI mask within the AOI. This step involves generating a binary mask representing the water extent, where pixels with a value of 0 indicate water and pixels with a value of 1 represent non-water regions. A simple method for creating this mask is to apply a threshold manually to the NDWI image; typically, values between 0 and 0.3 are classified as water, while values outside this range are considered non-water. However, this fixed threshold approach is not ideal for large-scale applications, as satellite images can vary significantly due to lighting, shadows, and other environmental conditions. Using a static threshold can lead to errors of underestimating or overestimating the water area. To address these limitations, the tool implements an unsupervised learning method to create the binary mask. This approach automatically adjusts the threshold based on the specific characteristics of each image, ensuring a more accurate representation of the water extent, even under varying conditions such as cloud cover.
The method used is K-means clustering. The main advantage of this algorithm compared with other classification methods based on spectral characteristics, such as Otsu’s method, is its availability on the Google Earth Engine, which significantly enhances its accessibility, ease of use, and processing speed, making the software more scalable and faster. The binary mask generated using this method differs from the one created with a fixed threshold. While the differences might not be immediately apparent visually, as seen in Figure 8a,b, they become evident when the areas of the polygons are calculated, which supports the hypothesis stated above and the choice of this algorithm for classification. This distinction is clearly illustrated in the chart presented in Figure 8c, which highlights the differences in the area calculations for the Caia Reservoir.
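To show how clustering yields an image-specific threshold, the following sketch runs a two-cluster K-means on the NDWI values of a single image using plain NumPy. It is a stand-in for illustration and does not reproduce the Google Earth Engine clusterer used by the tool; the function name and the initialization at the minimum and maximum values are assumptions.

```python
import numpy as np

def kmeans_water_threshold(ndwi_values, n_iter=50):
    """Two-cluster 1-D K-means on NDWI values.

    Returns (threshold, centers), where the threshold is the midpoint
    between the two cluster centers.
    """
    x = np.asarray(ndwi_values, dtype=float).ravel()
    x = x[np.isfinite(x)]
    if x.size == 0:
        raise ValueError("no finite NDWI values")
    centers = np.array([x.min(), x.max()])
    for _ in range(n_iter):
        # Assign each pixel to the nearest center.
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = centers.copy()
        for k in range(2):
            if np.any(labels == k):
                new_centers[k] = x[labels == k].mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return float(centers.mean()), centers
```

With two clusters in one dimension, the decision boundary is the midpoint between the centers, so the threshold shifts with the NDWI distribution of each image instead of staying fixed.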
3.2.6. Calculating the Water Area
Once the binary NDWI mask is created, the next step is to calculate the water area. This can generally be achieved in two ways: by summing all the pixels identified as water regions or by finding the contours of the largest water body and calculating the area of the resulting polygon. While the first approach is simpler, it may struggle with accuracy, particularly in the presence of small water bodies or wetlands near the reservoir, as these regions will be included in the calculation, potentially leading to an overestimation of the reservoir’s area. In contrast, the second approach, though more demanding in terms of computational resources, offers greater precision by focusing on the largest contiguous water body.
Figure 9 presents the example of the Abrilongo Reservoir in Portugal.
The plots above show, on the left, the binary mask where all the white pixels are counted, resulting in an area of 16.83 km². On the right, the corresponding polygon is plotted, with an area of 16.34 km², which is approximately 3% less. This comparison demonstrates the effectiveness of using contour-based methods to achieve a more accurate representation of the reservoir’s area, highlighting the potential for overestimation when using the simpler pixel-summing method.
Despite these considerations, the AOI, which represents the maximum extent of the reservoir, helps address this issue. The water area is calculated by counting water pixels within the AOI using a reducer in Google Earth Engine. A reducer is an efficient and straightforward GEE tool designed to aggregate data across dimensions, enabling operations such as summing pixel values.
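The difference between the two area estimates can be reproduced on a toy mask. The sketch below counts all water pixels and, separately, only those in the largest 4-connected water body; it uses pixel counts as a simple proxy for the contour polygon area and assumes square 10 m pixels. It is not the Google Earth Engine reducer used by the tool, and the function names are illustrative.

```python
from collections import deque
import numpy as np

def largest_component(mask):
    """Return a boolean mask of the largest 4-connected water body."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    best = np.zeros_like(mask)
    rows, cols = mask.shape
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        # Breadth-first flood fill from an unvisited water pixel.
        queue = deque([(r0, c0)])
        seen[r0, c0] = True
        pixels = []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and mask[rr, cc] and not seen[rr, cc]):
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        if len(pixels) > best.sum():
            idx = np.array(pixels)
            best = np.zeros_like(mask)
            best[idx[:, 0], idx[:, 1]] = True
    return best

def area_km2(mask, pixel_size_m=10.0):
    """Area of the True pixels in km², assuming square pixels."""
    return np.count_nonzero(mask) * pixel_size_m ** 2 / 1e6
```

Calling `area_km2(mask)` gives the pixel-summing estimate, while `area_km2(largest_component(mask))` excludes small detached water bodies near the reservoir.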
3.2.7. Calculating the Volume of the Reservoir
The storage capacity curve, also known as the area–storage or capacity–elevation curve, represents the relationship between the volume of water stored in a reservoir and the corresponding surface area or elevation of the water level. It will be used in this final stage of the tool for calculating the reservoir’s volume by interpolating the inundated water area using established relationships between surface area and stored water volume. This volume can be derived through several methodologies depending on available data and the reservoir’s characteristics. The approach employs a multi-source strategy to obtain the area–volume (A-V) relationship.
This curve is derived from detailed bathymetric surveys or elevation data of the reservoir basin. Such information is most readily obtained from local authorities, given their direct access to site-specific data, adherence to regulatory standards, and use of high-resolution, regularly updated measurements that accurately reflect local conditions. Therefore, the primary source of these data is either a file supplied by the user (Excel or .txt) or datasets published by these certified agencies.
The tool includes two datasets for A-V relationships, namely the Sistema Nacional de Informação de Recursos Hídricos (SNIRH) and Global Reservoir and Dam Database (GRanD).
The Sistema Nacional de Informação de Recursos Hídricos (SNIRH) provides historical hydrological data for the main reservoirs in Portugal. It can be accessed through the website of the Portuguese Environmental Agency, and it includes current and historical information on hydrological matters within the Portuguese territory. This includes storage levels for more than 60 reservoirs, along with details such as maximum capacity, maximum water level, storage–volume capacity curves, and other characteristics.
The Global Reservoir and Dam Database (GRanD) offers a comprehensive global dataset incorporating detailed information on reservoirs and their capacity relationships, managed by McGill University in collaboration with several key institutions [20]. The dataset draws from a broad range of underlying information, including regional and national inventories, gazetteers, publications, monographs, and maps. The most recent version, GRanD v1.3, added 458 reservoirs to the previous version, increasing the total to 7320 records and global reservoir storage by 666.5 km³. This dataset includes detailed storage capacity relationships for each reservoir it covers [20].
After the selection by the user, the corresponding reservoir is identified in the database and a linear interpolation is then applied to estimate the volume, utilizing the resulting water area and the reference dictionary. This integrated approach ensures a more accurate estimation of the reservoir volume based on the calculated water area.
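The interpolation step can be sketched with `numpy.interp`, assuming the A-V relationship is available as paired arrays of area and volume. The function name and the units (km² and hm³) are assumptions for illustration.

```python
import numpy as np

def volume_from_area(area_km2, curve_area_km2, curve_volume_hm3):
    """Linearly interpolate the stored volume for a measured water area.

    The curve points are sorted by area first, since np.interp expects
    increasing x values. Areas outside the curve are clamped to its ends.
    """
    areas = np.asarray(curve_area_km2, dtype=float)
    volumes = np.asarray(curve_volume_hm3, dtype=float)
    order = np.argsort(areas)
    return float(np.interp(area_km2, areas[order], volumes[order]))
```

Note that `numpy.interp` does not extrapolate: an area larger than the last curve point returns the last volume, so a measured area above the documented maximum should be checked separately.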
If users have a Digital Elevation Model (DEM) but no existing A-V relationships, the tool can generate these using the DEM. This involves calculating the area and volume at various elevation levels, providing a tailored estimate of the reservoir’s storage characteristics. The code divides the model into equal elevation slices by masking the pixels at the elevation of each step, and it defines two functions. The first one computes the volume below a specified elevation by determining the water height above each cell and summing these values, multiplied by the cell area. The second one creates a mask to identify inundated cells, counts them, and multiplies the count by the cell area to obtain the total inundated area. Lastly, a dictionary is created to store the calculated volumes and areas. This process systematically computes and logs the volume and area for a range of elevation levels, producing an area–volume dictionary that is also delivered as part of the output.
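A minimal sketch of this DEM-based procedure is given below, assuming the DEM is a 2-D NumPy array of bed elevations in meters and that every cell has the same area. The function names, the number of levels, and the dictionary layout are illustrative assumptions rather than the tool's actual code.

```python
import numpy as np

def volume_below(dem, level, cell_area):
    """Volume of water if the surface sits at `level`.

    The water height above each cell is (level - bed elevation),
    clipped at zero for cells that stay dry.
    """
    depth = np.clip(level - dem, 0.0, None)
    return float(depth.sum() * cell_area)

def inundated_area(dem, level, cell_area):
    """Area of the cells whose bed lies below `level`."""
    return float(np.count_nonzero(dem < level) * cell_area)

def area_volume_table(dem, n_levels=10, cell_area=1.0):
    """Build {elevation: {"area": ..., "volume": ...}} over equal slices."""
    dem = np.asarray(dem, dtype=float)
    levels = np.linspace(dem.min(), dem.max(), n_levels)
    return {
        float(z): {
            "area": inundated_area(dem, z, cell_area),
            "volume": volume_below(dem, z, cell_area),
        }
        for z in levels
    }
```

With elevations in meters and `cell_area` in m², the areas come out in m² and the volumes in m³; the resulting dictionary can then serve as the A-V curve for interpolation.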
In scenarios where neither A-V relationships nor DEM data are available, the tool defaults to the GLOBathy dataset. GLOBathy is a comprehensive global bathymetric dataset that covers over 1.4 million water bodies, developed to complement the HydroLAKES dataset. It employs a GIS-based approach to create bathymetric maps by integrating maximum depth estimates with the geometric and geophysical characteristics of water bodies as defined by HydroLAKES. These depth estimates have been validated against observed data from 1,503 different water bodies. Additionally, GLOBathy provides derived head–area–volume (h-A-V) relationships for these water bodies based on the generated bathymetric maps [21].
These relationships were modeled using polynomial functions, with coefficients estimated by fitting the bathymetric data across multiple depth layers, resulting in a detailed understanding of the water body’s volumetric properties as a function of water level [
21].
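As an illustration of fitting a polynomial to layered bathymetric data, the sketch below fits a head-volume curve with `numpy.polyfit` and evaluates it at a given water level. The polynomial degree, the function names, and the layout of the input arrays are assumptions; the exact functional form and fitting procedure used by GLOBathy are not reproduced here.

```python
import numpy as np

def fit_head_volume(heads_m, volumes, degree=2):
    """Least-squares polynomial coefficients for V(h), highest power first."""
    return np.polyfit(np.asarray(heads_m, dtype=float),
                      np.asarray(volumes, dtype=float), degree)

def volume_at_head(coefficients, head_m):
    """Evaluate the fitted polynomial at a given water level."""
    return float(np.polyval(coefficients, head_m))
```

The same pattern applies to a head-area curve; only the dependent variable changes.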
3.3. User Interface (UI)
This paper aimed not only to develop a method for calculating reservoir volumes using remote sensing, thereby overcoming significant challenges, but also to create a user-friendly tool that allows individuals and groups to analyze water resources as intuitively and practically as possible. Although the current tool is a prototype built on the Streamlit framework, it already incorporates the core principles and foundations that will guide future versions. This section will outline some of the key features developed to achieve these objectives.
The tool’s user interface is designed to be accessible and straightforward, with the following four main tabs: (i) About, (ii) Tutorial, (iii) Worldwide Analysis, and (iv) Country-Specific Analysis (in the current version, Portugal).
The About tab provides technical details about the project, including methodology explanations and other essential information. The Tutorial tab features a short video that guides users through the functionalities of the tool. The Worldwide Analysis tab is central to the tool, allowing users to select and analyze reservoirs globally. It offers a general overview of water storage data across different regions. The Country-Specific Analysis tab, currently tailored for Portugal, provides a more focused and detailed analysis of the country’s water reservoirs. It offers faster access to relevant data and deeper insights into water storage trends.
In the Worldwide Analysis tab, users have several options to select a water body for further analysis. They can click directly on a water body in the integrated map or run a quick search by entering the reservoir’s name or coordinates, as shown in Figure 10a,b. Alternatively, users can supply a region of interest (ROI) as a GeoJSON file or use the built-in tools to create and export a polygon that delineates the ROI (Figure 10c).
After selecting the reservoir, the user can specify the date range by choosing a start and end date (
Figure 10d). Next, the user can select the maximum cloud coverage percentage for the images retrieved from the collection (
Figure 10d). A higher cloud coverage percentage will filter out fewer images, allowing for a more comprehensive analysis over a broader date range, but with an increased likelihood of cloud interference. If cloud interference is detected, the code will automatically mask the clouds and locate the matching polygon, as described in previous sections. Finally, the user can press the ‘Start Computing’ button, and the code will execute the script, displaying the output as previously discussed.
5. Discussion
The accuracy and reliability of the tool in calculating reservoir volumes are quantitatively assessed using key performance metrics, including the Coefficient of Determination (R²), Mean Absolute Percentage Error (MAPE), and Mean Bias Deviation (Bias). These metrics provide a robust evaluation of the tool’s effectiveness under different scenarios and are presented in Table 3.
The performance metrics indicate a strong predictive capability of the model. The tests yield an average Mean Absolute Percentage Error (MAPE) of 5.35%, an average Coefficient of Determination (R²) of 0.90, and an average Bias of 0.004. The low MAPE reflects that the model’s predictions, on average, deviate by only 5.35% from the observed values, suggesting a high level of accuracy in the estimations. The R² value of 0.90 indicates that 90% of the variance in the observed data is explained by the model, demonstrating its robustness and reliability in capturing the underlying data patterns.
Moreover, the consistently low Bias, with an average value of 0.004, suggests the absence of systematic errors in the model’s predictions, implying that the model neither consistently overestimates nor underestimates the actual values. This balance is crucial for ensuring the credibility and generalizability of the model’s outputs across different datasets.
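For reference, the three metrics can be computed as in the sketch below. The definitions are assumptions, since the formulas are not stated in this section: R² is taken as one minus the ratio of residual to total sum of squares, MAPE as the mean absolute relative error in percent, and Bias as the mean of predicted minus observed values in the units of the inputs.

```python
import numpy as np

def evaluate(observed, predicted):
    """Return R², MAPE (%), and mean bias for paired volume series."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((obs - obs.mean()) ** 2)    # total sum of squares
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "mape": float(np.mean(np.abs((pred - obs) / obs)) * 100.0),
        "bias": float(np.mean(pred - obs)),
    }
```

MAPE is undefined when an observed value is zero, so empty-reservoir records would need to be excluded or handled separately before calling this function.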
The combination of low MAPE and Bias and high R² underscores the model’s efficacy in providing accurate and unbiased predictions, making it a valuable tool for further application in this domain. However, to comprehensively understand the method’s behavior across reservoirs with varying characteristics, such as differences in volume, surface area, and the area–volume (A-V) relationship derived from diverse data sources, a more detailed investigation was carried out. The goal of this investigation was to find any relationships and trends that would shed light on how the model functions in various scenarios.
A correlation analysis using a correlation matrix (Figure 17a) and a correlation chart (Figure 17b) reveals several key relationships between the variables, particularly between reservoir size (maximum volume) and the performance metrics (R² and MAPE). Maximum area and maximum volume are strongly positively correlated, which is expected, as larger reservoirs naturally have greater surface areas and volumes.
The relationship between R² and reservoir size is mixed; in some cases, R² increases with reservoir size, suggesting that the models fit larger reservoirs better. However, this trend is not consistent across all reservoirs. MAPE tends to increase with maximum area and volume as well, implying that larger reservoirs experience greater percentage errors in predictions. This may reflect the increased complexity of modeling larger water bodies, where simpler models struggle to capture all the nuances involved; for instance, the spatial resolution of the satellite images may cause some details to go missing. The size of the reservoir may also contribute to this discrepancy, with larger reservoirs typically having slightly larger differences.