Article

Utilizing Deep Learning and Object-Based Image Analysis to Search for Low-Head Dams in Indiana, USA

by
Brian M. Crookston
1,* and
Caitlin R. Arnold
1,2
1
Utah Water Research Laboratory, Department of Civil and Environmental Engineering, Utah State University, Logan, UT 84321, USA
2
J-U-B Engineers, Inc., Logan, UT 84321, USA
*
Author to whom correspondence should be addressed.
Water 2025, 17(6), 876; https://doi.org/10.3390/w17060876
Submission received: 11 February 2025 / Revised: 10 March 2025 / Accepted: 11 March 2025 / Published: 18 March 2025

Abstract:
Although low-head dams in the USA provide water supply, irrigation, and recreation opportunities, many are unknown to regulators. Unfortunately, hundreds of drownings occur each decade at these dams from an entrapment current that can form immediately downstream. To explore the ability of deep learning to scan large areas of terrain and identify the locations of low-head dams, ArcGIS Pro and its embedded deep learning models for object-based image analysis were investigated. The State of Indiana low-head dam dataset was selected for model training and validation. Aerial imagery (leaf-off conditions) captured from 2016 to 2018 for the nearly 94,000 km2 area had a minimum resolution of 304.8 mm. A new Python code was developed to automate the generation of training images, and searching was limited to 100 m wide river corridors. Due to bank vegetation, all low-head dams were assigned a visibility score to aid in training and performance analysis. A total of 19 backbone models were considered with single shot detection and options for RetinaNet, Faster R-CNN, and batch normalization. Additional identification classes were incorporated to overcome misidentification of visually similar objects. After four training iterations, the final trained model was a RetinaNet ResNet backbone model featuring 101 layers with an 83% recall rate for dams with high visibility and a 17% recall rate for those with moderate visibility.

1. Introduction

1.1. Public Safety at Low-Head Dams

Humans have long constructed dams, weirs, and similar overflow structures in rivers and waterways to impound and divert flows, for navigation and shipping, or to use river currents for mills and (in more modern times) hydropower [1]. The low-head dam is one such structure that typically spans the entire channel width with water flowing over the crest. Prior to 1900, thousands of low-head dams were constructed in the USA from a variety of materials (wood, stones, masonry, concrete, etc.) and cross-sectional shapes (reference), commonly ranging from 1.5 to 4.5 m in height. As the name suggests, these hydraulic structures generally have relatively shallow flow depths over the structure (e.g., 0.2 m to 2.0 m) and high tailwater levels that reduce the water surface elevation difference. For these reasons, identifying a low-head dam while traveling downstream on a river can be extremely difficult [2].
A dangerous entrapping current (more commonly known as a keeper, reverse roller, submerged hydraulic jump, or hydraulic) can form immediately downstream of a low-head dam (see Figure 1a). Flows over the dam plunge directly into the downstream waters towards the river bottom, entraining large amounts of air (see Figure 1b). As the flow's energy dissipates, the downstream tailwater deflects it towards the surface, forming a point where the flow splits (i.e., the boil); the portion of flow deflected upstream completes the circulating current (see Figure 2). Flow velocities near the dam are typically quite high, easily capable of overturning small boats and making it very difficult for watercraft and swimmers to escape. The significant amount of entrained air decreases the buoyancy of life vests and other floatation devices, while any woody debris or other entrapped materials tumble in the flow, further increasing the danger.
Indeed, the seemingly calm waters (e.g., Figure 1a) conceal the danger from anyone unfamiliar with these structures [3]. As a result, low-head dams in the USA have caused over 700 documented fatalities since 1950 and have been referenced by news media and others as drowning machines [2,4]. To reduce fatalities, the location of a low-head dam and any corresponding dangers should be made clear to the public.

1.2. Identification

The majority of low-head dams in the USA are unregulated from a safety perspective. Furthermore, the ownership of some low-head dams is unknown, and in 2020 most States and the Federal Government had not generated an inventory of these potentially dangerous structures. Therefore, in 2020, volunteers formed a Joint Task Committee sponsored by the American Society of Civil Engineers-Environmental & Water Resources Institute (ASCE-EWRI), the Association of State Dam Safety Officials (ASDSO), and the United States Society of Dams and Levees (USSD), which crowdsourced efforts to manually search publicly available aerial imagery for low-head dams, a time-consuming and labor-intensive process. However, additional efforts are needed to complete and refine this first database so that each potential location can be checked and confirmed by a trained professional. Thus, could a deep learning model and geographic information system (GIS) software be used to search aerial imagery and identify low-head dams as a human volunteer would? A computer-generated draft inventory of a region may increase identification efficiency or provide a redundant check on a human inventory. However, any inventory needs to be carefully reviewed by a qualified expert.

1.3. Automated Object Detection

Computer vision can be used for a variety of purposes including image classification, dynamic analyses, and various aspects of detections including feature extraction and object localization [5,6,7,8,9,10,11,12,13]. Artificial intelligence (AI) has advanced many detection techniques across a variety of applications and industries (e.g., [14,15,16]) yet object visibility (i.e., perspective, lighting, color and contrast, ambiguity) remains a primary challenge [17,18,19].
Machine learning (ML) is a subcategory of AI that uses computer systems to solve a problem (e.g., object detection) through experience or without explicit programming [20]. During the past decade, it has seen widespread commercial use for developing practical software and tools [21]. Deep learning (DL) is a subset of ML where training data are passed through multiple algorithms (i.e., model layers) that each create a different and complementary interpretation of the training data [22]. The presence of multiple layers and their simultaneous usage imitates the human brain and is known as a Convolutional Neural Network (CNN) [23,24,25]; the region-based CNN is a specific DL architecture used for object detection [26]. The introduction of regions in CNNs (R-CNN, [17]) provides a deeper architecture, with improved R-CNN models such as Faster R-CNN that jointly optimize classification and bounding box regression tasks [27]. Additional factors may increase the performance or training efficiency of R-CNN models, such as batch normalization [28] and the RetinaNet pyramid network architecture for object detection [29].
Once trained, a DL model can perform an image analysis in a fraction of the time required for manual processing and, in some instances, with high accuracy [24]. Examples of automated image classification in Civil Engineering include infrastructure damage reconnaissance [24], construction monitoring [30], pavement resilience and wear [31,32], stormwater and sewer system defect detection [33,34], and land-use classification in geographic images [35,36]. DL models may evaluate individual pixels (pixel-based detection) or pixel clusters (object-based classification) [37,38].
The process to create a DL model able to perform a certain image classification task generally includes conceptualizing the model, data acquisition, preparing training data, training the formulated model, and evaluating performance. Multiple iterations are common to refine and tune various aspects of the model before it is finalized and implemented.
With the rapid advancement of technology during the past 25 years, many software programs now include a DL model framework with training algorithms for geographic information system (GIS) applications. Programs that have produced highly accurate models include Google Earth Engine, eCognition, TopoNet, Python, R, ENVI, and ESRI’s ArcGIS Pro (v2.7) [39,40,41,42]. For example, ESRI’s ArcGIS Pro (v2.7) has 19 available backbone models that contain between 11 and 201 simultaneously operable algorithms or layers [43]. Most, if not all, of these 19 backbone models have already been trained on ImageNet, an online dataset of over a million labeled images [44,45]. To our knowledge, none of the DL models in ArcGIS Pro have been trained with low-head dam images, but these models have been trained on other image sets [40,42].
A critical aspect of training is obtaining aerial imagery of sufficient resolution to facilitate object detection [22,46]. For example, Landsat satellite imagery has been accessible to many researchers over the years but typically has a pixel resolution of 15 m or 30 m [46]. High-resolution satellite imagery is becoming increasingly available to the public; IKONOS and QuickBird sensors can capture imagery with pixel resolutions under 2 m. Piloted aircraft or unmanned aerial vehicles (UAVs) can produce very high-resolution imagery with pixel resolutions below 10 cm [41,47,48], but such datasets often have limited spatial extents. However, even high-resolution imagery can include features that obscure or encumber detection. Thus, image quality is time sensitive; imagery is best captured on clear days near noon or during seasons when vegetation is less dense.
For smaller task-specific efforts, the labeling and classifying of imagery for training may be performed manually [49], with image rotation and similar augmentations used to significantly increase the number of training tiles. The baseline number of samples needed for one type of classification is highly dependent on the task and desired outcome, the software, and the DL algorithms, method, and application [25]. Image pre-processing may include manipulating bands (e.g., red, green, blue, infrared, thermal, panchromatic) to highlight desired objects and features; each band acquires data for a particular range of frequencies along the electromagnetic spectrum [50]. There is no specific rule for partitioning training from validation data, but 50-50, 30-70, and 20-80 splits are common.
Effectively labeling features or objects in an image is critical for training. Python DL models typically require a set of images where the entire image denotes a specific object, whereas ArcGIS Pro requires a feature to be labeled within the image, thus allowing multiple features and multiple classes of features to be labeled within one image. The process to create training images requires polygons (of any shape) that outline the feature of interest; each finished training image carries attached metadata giving the x and y extents of each feature in the image, essentially converting the input polygon into a bounding rectangle. A model trained using these data is able to specify where within an image a feature is located. Various detection models take advantage of this, such as single shot multibox detection, which generates a bounding box fit to the size of an object and labels any object classes specified in training [51].
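To make this polygon-to-rectangle step concrete, the following minimal sketch (plain Python, with hypothetical pixel coordinates) shows how an arbitrarily shaped labeling polygon reduces to the x and y extents stored in the training metadata:

```python
# Minimal sketch (plain Python): how a labeling polygon reduces to the
# bounding-box extents stored in a training image's metadata.
# The polygon vertices below are hypothetical pixel coordinates.

def polygon_to_bbox(vertices):
    """Return (xmin, ymin, xmax, ymax) of a polygon in pixel coordinates."""
    xs = [x for x, y in vertices]
    ys = [y for x, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)

# A skewed polygon outlining a dam crest and its turbulent tailwater.
dam_polygon = [(67, 120), (349, 10), (340, 463), (80, 440)]
print(polygon_to_bbox(dam_polygon))  # -> (67, 10, 349, 463)
```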
There are various ways to present the training data to the model depending on the application and software. Options available to a user include selecting a backbone model, the number of times the images will be passed through the network layers, and the percentage of the provided dataset that will be used to validate the trained model. Training a DL model can require significant processing power depending on the input data, task, backbone model, and number of images to be passed through a specified number of layers [52]. Extensive run times may require cloud computing or similar capabilities and can be quite expensive [39]. Thus, leveraging readily accessible computers and employing efficient models is advantageous; for example, the first-place winner of the xView2 Challenge trained all their models in about 7 days on 2 GPUs with 12 GB of memory each [53,54].
As mentioned, two approaches for image analysis include pixel-based classification and object-based image analysis (OBIA), also known as geographic object-based image analysis (GEOBIA) for GIS applications [55]. These two types of classification exist for both supervised and automated classification processes [56,57]. Although each approach has utility, higher accuracies have been achieved by OBIA with high-resolution imagery [35,41,42,48]. This is presented graphically in Figure 3, adapted from [35]: Figure 3a has a low resolution of 20 m, where pixels are significantly larger than objects and sub-pixel techniques are needed for detection; Figure 3b has a medium resolution of 5 m, where pixel and object sizes are of the same order and pixel-by-pixel techniques would be appropriate; Figure 3c has a higher resolution of 1.25 m, where pixels are significantly smaller than the objects and regionalization of pixels into groups and, finally, objects is possible. Using the Shannon sampling theorem, Blaschke [35] concludes that the pixel size should be no more than one tenth of the object size.
OBIA locates the objects within an image. In an ArcGIS environment, OBIA classifications exist as a vector dataset. Because OBIA relies heavily on the association of nearby pixels, it can use spectral, texture, and shape characteristics of an object [35,41,46,47]. Two approaches for OBIA include (1) region growing or grouping techniques and (2) edge or boundary detection techniques. OBIA overcomes within-object variation that can cause misclassification in a pixel-based classification [41].
Variety in the size of an object or the environment surrounding it can further compromise the ability of DL to accurately detect objects [35,40,58]. The variation in surroundings has the potential to increase as the study area increases. In a review of OBIA for land cover applications, refs. [46,47] found that 95.6% of 254 studies involved an area of less than 3 km2; for studies using spatial resolutions of 15 m and finer, the study area was less than 1 km2. In conclusion, refs. [46,47] suggest the need for larger study areas in future research. This is supported by a prior effort where automated OBIA was used to update the UK national land cover map in 2007 [41]. Applications beyond land cover classification have adopted OBIA techniques as fine spatial resolution imagery has become available, including mapping trees and coral reefs [41,48]. Rather than classifying an entire image, refs. [25,40] used OBIA to find specific objects such as airplanes, wind turbines, or swimming pools; accuracies of 83% and 73% were achieved, yet it was not noted how these accuracies were impacted by study area.
Perhaps the most significant obstacle when identifying structures in waterways is the variation in surrounding terrain, urbanization, and land cover. Previous studies have focused on small regions where one object class has little opportunity for terrain variability [46,47,59,60]. However, low-head dams are found on rivers passing through metropolitan areas, farmland, deserts, areas with high vegetation, mountains, and plains. They vary in length to match the width of the water body they are on. Some have water flowing over the dam crest throughout the year (perennial rivers and streams) while others only see water during irrigation periods or seasonal flows.

1.4. Deep Learning and Low-Head Dam Identification

A DL model that identifies low-head dams would likely achieve the highest accuracy through object-based identification. Given the size of low-head dams relative to the pixel resolution of most imagery sources, a single pixel or small group of pixels belonging to a low-head dam would be difficult to distinguish from other features in a river. The main detection properties (geometry, color, and contrast) are best captured through OBIA and the relationships between pixels or resulting pixel groupings; that is, the characteristics of the entire low-head dam across all of its pixels are what distinguish it in overhead imagery.
Several studies have applied DL techniques to locate various types of dams. A review of the use of modal identification techniques to identify dams was recently completed [61]. A study in Brazil [58] used Google Colab [62] and semantic segmentation to locate mine tailings dams; the DL model identified areas with specific characteristics often associated with mining operations. While the accuracy was not reported, the authors verified results manually and found the model sufficiently useful. A DL model was developed in Australia to detect farm dams using a pre-trained ResNet34 model, achieving 94.8% accuracy [63]. That approach, however, faced challenges when dams spanned image borders, which could lead to missed or double-counted dams. By sampling 124,510 images, the authors detected 5105 dams and estimated over 1.7 million farm dams across Australia. Though effective for estimating numbers, this method was less suitable for locating individual dams [63].
Another DL model was developed [64] using Python to classify prepared image tiles with and without a low-head dam. Similarly to [63], this model processed individual image tiles with various rotations, mirroring, etc., and achieved 91% accuracy. This result is excellent but limited: it partly reflects image tile bias, and the model was not capable of scanning aerial imagery over large extents.
Therefore, this study considered how existing DL models in ArcGIS Pro v2.7 can locate low-head dams using object-based image analysis (OBIA) performed over continuous aerial imagery. Input data pertaining to the State of Indiana were selected because high-resolution imagery and a full low-head dam inventory existed. This study also explored the automation of training tile preparation, the use of multiple object classes, and whether searching could be limited to river corridors. Due to bank vegetation, all low-head dams were assigned a visibility score to aid in training and performance analyses.

2. Materials and Methods

2.1. Study Area

The State of Indiana is relatively flat with a diverse mix of urbanized and relatively natural perennial river systems. Early settlers (1790–1849) typically traveled to the southern region of the state via the Ohio River; large wooded areas were quickly cleared for agricultural purposes and many gristmill and sawmill businesses emerged that featured low-head dams (see Figure 4). In 2020, 170 low-head dams existed (https://www.indianamap.org/datasets/INMap::low-head-dams/about, accessed on 1 February 2020) and about 25% of the state was wooded with riparian river corridors lined with many large trees and dense vegetation with diverse fauna. The extensive river and stream network (approximately 57,410 km) is commonly accessed for recreational purposes by novices and experienced kayakers and boaters alike. Between 2010 and 2020, the Indiana Department of Homeland Security identified 25 fatalities at low-head dams; in 2020, House Bill 1099 was passed to place safety signage and prohibit people from accessing these structures and adjacent waters. An interactive low-head dam map and safety information can be accessed at (https://www.in.gov/dhs/get-prepared/general-safety/low-head-dam-safety/, accessed on 14 October 2020).
Model training and final area scanning was performed using a workstation instead of a cluster, assuming potential users of this model would not have access to higher computing resources. The machine had 16 cores at 3.5 GHz with 64 GB DDR4 RAM and a single 4 GB graphics card. ESRI’s ArcGIS Pro (v2.7) was selected for this study as this software is commonly used by civil engineers domestically and internationally, it is readily available, it includes software extensions for performing image classification using DL, and it is a powerful tool for obtaining, organizing, and analyzing geospatial data. Additional motivation for its usage was in the context of the Joint Task committee and National Dam Safety Program as professionals involved were familiar with or users of ArcGIS Pro. Note that the DL model trained in ArcGIS Pro is unrelated to the Model Builder application.
An overview of the DL model workflow used herein is presented in Figure 5. It includes five phases beginning with model purpose and objectives and concluding with model deployment. Novel aspects of this methodology include the visibility class, the automation of training image preparation, and the consideration of all DL backbone models available in ArcGIS Pro at the time of this study. These aspects are discussed in further detail.

2.2. Conceptualization

The DL tools in ArcGIS Pro v2.7 required the Image Analyst extension as well as installation of deep learning Python packages available at Github (see https://pro.arcgis.com/en/pro-app/help/analysis/image-analyst/install-deep-learning-frameworks.htm, accessed on 1 February 2020). Environments and other software settings were configured according to [65] and summarized at https://www.esri.com/arcgis-blog/products/arcgis-pro/imagery/deep-learning-with-arcgis-pro-tips-tricks/ (accessed on 19 December 2020). The DL tools have 19 available backbone models, each with varying characteristics and emphases (see Table 1). Please note that the number of layers each model contained is noted in the model title. All backbone models were tested in the initial stages of model training.

2.3. Data Acquisition

A total of 170 known low-head dams were included within this dataset for Indiana. This list of dams includes run-of-the-river low-head dams and does not include low-head dams off a waterway that are used for storage or to provide pressure and do not have the reverse roller. This dataset is believed to be complete, and all the low-head dams in the dataset have been field verified. The State of Indiana has orthophotos taken in leaf-off conditions in 2016, 2017, and 2018 with resolution of at least 304.8 mm [66]. At the request of some counties and cities, parts of the imagery have a resolution of 76.2 mm and 152.4 mm. Imagery was retrieved from the Indiana Office of Information Technology through a portal managed by Indiana University (https://gis.iu.edu/, accessed on 19 October 2020), and it was downloaded in a .tiff format in tiles that are 5000 pixels by 5000 pixels (e.g., 2.3 km2 at 304.8 mm resolution). The imagery dataset was produced to meet the American Society for Photogrammetry and Remote Sensing (ASPRS) Accuracy Standards [67] for 30 cm orthoimagery pixel size. It has 4 bands and was collected using a Leica ADS80 Airborne Digital Sensor (Leica Geosystems, Heerbrugg, Switzerland) [68].

2.4. Training Data Preparation

From the large aerial tiles, the locations of known low-head dams were provided by Manuela Johnson in an Excel file, including the name and number identifier of each dam, coordinates in Universal Transverse Mercator (UTM), and other metadata [69]. The geolocations of each dam were imported as a point feature class. Polygons around each low-head dam were created manually as part of this study, as each structure had a different size, orientation, geometry, and visibility due to vegetation. This was performed through standard shapefile creation tools in ArcGIS Pro rather than the “label objects for deep learning” command. Note that to aid in object identification, the polygons included a small portion of calm waters upstream of the dam and a portion of the turbulent (often white, aerated) waters downstream; polygons spanned the entire river width from bank to bank. The 170 low-head dams were approximately 5.5 m to 182.9 m in width, but these data could not be used to auto-generate polygons as lengths measured using the imagery varied from 0.5 to 3 times the listed length (not unexpected given water flowing over the structures). For dams with unclear imagery (overhead vegetation or poor contrast between water surface and land) or no visible crest, the channel width was approximated. As such, these measured lengths are only intended to characterize one visual property of a dam and should not be used for other purposes.
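As an illustration of this import step, a hedged arcpy sketch is shown below; the file paths, field names (“Easting”, “Northing”), and UTM zone are assumptions for illustration, not the actual schema of the provided spreadsheet.

```python
# Hedged sketch using arcpy (ArcGIS Pro): import an Excel list of known
# dams as a point feature class. Paths, field names, and the spatial
# reference are hypothetical.
import arcpy

arcpy.env.workspace = r"C:\lowhead\indiana.gdb"  # hypothetical workspace

# Convert the Excel sheet to a geodatabase table, then to points using
# the UTM zone of the source coordinates (zone 16N shown as an example).
arcpy.conversion.ExcelToTable(r"C:\lowhead\IN_dams.xlsx", "dam_table")
sr = arcpy.SpatialReference(26916)  # NAD83 / UTM zone 16N (assumed)
arcpy.management.XYTableToPoint("dam_table", "dam_points",
                                x_field="Easting", y_field="Northing",
                                coordinate_system=sr)
```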
About 34 dams were difficult to identify by the human eye due to overhead vegetation (bare trees, bushes), poor contrast of water, or no visible dam abutments. As visibility is critical to object identification, a visibility class was formulated as presented in Figure 6 and Table 2.
In this study, the Indiana dataset was divided approximately 50%-50% into training and validation regions running north to south (see Figure 4). The model was trained on the second and fourth groups and tested on the first and third groups. This division was advantageous because both the training and validation groups needed diverse characteristics: both contain agricultural land and portions of the city of Indianapolis, the split covers approximately half of the state as well as half of the low-head dams, both contain large and small rivers, and both contain some imagery from each year (2016, 2017, and 2018). Please note that the imagery in Allen County was saved in a different projection than the rest of the counties, which made it difficult to export training data using the same method; therefore, the four dams in that county were used for validation purposes. The total counts of the dams used for training and validation, also divided by visibility classification, are shown in Table 2.
The “export training data for deep learning” tool creates the image chips used for training. The imagery and low-head dam polygons are the inputs for this command. Additional characteristics of the chips are controlled by further parameters, including image chip size in pixels; stride, which determines how much overlap occurs between two image chips; metadata format; and rotation angle. The tile size must allow the entire dam to be contained within the image, and because all image chips must be the same size, the selected tile size must work for all training images. The metadata option prepares additional data that can be used in training. A rotation angle creates additional rotated copies of each image for training; the number of additional images created equals 360° divided by the input rotation angle. These images are near duplicates of each other at different angles, differing only near the corners of the image chip.
Another option in the “export training data” tool is the metadata output. Metadata is a set of data that describes other data; in deep learning, metadata often includes information that supplements an image, such as cloud cover percentage or sun angle. Pritt and Chern [41] found that including this type of metadata improves the accuracy of the classification. The metadata prepared by this tool, however, neither creates this type of supplemental metadata nor allows it to be input. The metadata prepared by this tool includes a label for each image indicating where within the image the low-head dam is located, the number of pixels in the prepared image, the bands in the image (e.g., RGB by default; infrared is also common), and statistics regarding the total number of images and features in the training dataset.
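A hedged sketch of one possible call to this tool is shown below, using the parameters discussed above. The paths are hypothetical, while the 550-pixel tile size and 45-degree rotation mirror values used elsewhere in this study; keyword names follow Esri’s documented tool signature.

```python
# Hedged sketch using arcpy (ArcGIS Pro): one possible invocation of the
# "Export Training Data For Deep Learning" tool. Paths are hypothetical.
import arcpy

arcpy.ia.ExportTrainingDataForDeepLearning(
    in_raster=r"C:\lowhead\mosaic_rgb.tif",        # pre-processed imagery
    out_folder=r"C:\lowhead\chips\dam_032",
    in_class_data="dam_polygons",                  # labeling polygons
    image_chip_format="TIFF",
    tile_size_x=550, tile_size_y=550,              # must contain the whole dam
    stride_x=550, stride_y=550,                    # no overlap between chips
    output_nofeature_tiles="ONLY_TILES_WITH_FEATURES",
    metadata_format="PASCAL_VOC_rectangles",       # bounding-box labels
    rotation_angle=45                              # 360/45 = 8 rotated copies
)
```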
It is significant to note that in ArcGIS Pro v2.7 the authors did not find a way to include additional metadata to help train the model, such as the distance to a stream or river feature or information regarding elevation. Including such data is possible in other deep learning applications, such as those implemented directly in Python, but was not possible within ArcGIS Pro at the time of this study. Figure 7 shows the process resulting in the image chips. Figure 7a is an image of a 131 m wide low-head dam on the Eel River. Figure 7b shows the polygon surrounding the dam and the turbulent tailwater. Figure 7c shows the image chip created before image pre-processing. Figure 7d shows the image chip created after removing the fourth band from the source image. The metadata associated with this final image specifies a width and height of 550 pixels with the polygon residing within 67 pixels < x < 349 pixels (counted from left to right) and 10 pixels < y < 463 pixels (counted from the top down).
Image chips are a critical part of the DL model creation process. As such, image pre-processing can enhance the quality of all images by not only centering the low-head dam in the tile but also manipulating the spectral bands. For example, Figure 7c is very washed out, with poor contrast and coloring; this was resolved by removing the infrared band from the Indiana imagery (see Figure 7d). Note that this was performed before the creation of image chips. Another type of image pre-processing performed was creating mosaic datasets and mosaic rasters. Through this effort, any low-head dam that occurs on the border between two raster images is made continuous. Mosaicking also condensed the rasters so there were fewer inputs to the ArcGIS Pro “export training data for deep learning” tool. To clarify, Figure 8 shows an example of how this tool functions; the software begins at the upper left corner of the input imagery and creates the first image chip with the specified number of pixels (height and width). The location of subsequent image chips is determined by the stride parameter. If the “Output No Feature Tiles” parameter is set to “Only Tiles with Features”, then any tile containing any part of a low-head dam polygon will be saved. For large features that rarely occur near each other, the output images will often contain only a portion of a dam and not a sufficient portion for DL image recognition. Through this effort, the preliminary exports created 1067 training images, with only 628 considered to adequately depict a low-head dam and fewer still showing both dam abutments (or riverbanks).
Figure 8 shows two blue polygons marking low-head dams. When this image is used to create training data, the first extracted training image (1) is the upper left corner and appears in a dashed yellow box. If the stride is the same as the pixel width and length, the second extracted image (2) would be immediately right (green dashed box). If only the images intersecting a dam polygon are saved, only the five clips outlined in purple (3) are exported. The bottom left purple image contains all of low-head dam #32. The bottom right image is sufficient for identifying low-head dam #31. The other three training images in purple do not contain a sufficient portion of low-head dam #32 to adequately depict the properties of a dam; thus, only two of the five created images would be deemed appropriate for training.
Manipulation of the tool is necessary to properly generate training tiles. Two methods were considered herein: (1) clip each raster image such that the upper left corner lies northwest of the centroid of a low-head dam at a distance of half the desired pixel width and height; this solution was not selected as it would require rasters clipped for each dam and at each desired pixel dimension, making automation cumbersome and complicating the organization of dams; (2) alter the extents (environment setting) to set the upper left starting location convenient to low-head dams. This second solution was implemented: the Xmin and Ymax coordinates were readily determined as 1.2 times the distance from the edge of the image to the centroid pixel, and the Xmax and Ymin coordinates as 1.8 times that distance. These values allowed centering of low-head dams without creating any image chips containing only fragments of dams.
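A simplified sketch of this second approach follows. For clarity, it places the extent symmetrically about the dam centroid rather than using the offset 1.2/1.8 factors described above; the centroid coordinates, tile size, and cell size are illustrative.

```python
# Hedged sketch: set the export extent environment around a dam so the
# resulting chip contains the whole structure. Values are illustrative.
import arcpy

tile_px = 550          # chip width/height in pixels
cell = 0.3048          # raster cell size in meters (304.8 mm imagery)
half = 0.5 * tile_px * cell

cx, cy = 594321.0, 4412345.0   # hypothetical dam centroid (UTM meters)
arcpy.env.extent = arcpy.Extent(cx - half, cy - half, cx + half, cy + half)
# ...then run ExportTrainingDataForDeepLearning for this dam only.
```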
Setting the extents of the export requires that image chips for each dam be exported separately. The metadata for each export was created individually, so it became necessary to combine the metadata as well as the image chips. To accomplish this, a Python code was developed to automate the task; it is available with open access through HydroShare [70]. A further challenge with image chips in this study was that the metadata only gives the x and y extents of the polygon, not the exact feature outline; depending on the orientation of the polygon, much more than just the low-head dam could be included within the bounding box. Therefore, additional image chips were created using a 45-degree rotation angle in the “export training data for deep learning” command in order to have a mix of boxes that contained only the dam and an additional set that contained the dam with some adjacent context. A rotation angle of 0 is recommended for initial exports, until the tool and Python code are calibrated, to minimize the time required to export training tiles.
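The sketch below illustrates the kind of merging the HydroShare code [70] automates. It is a simplified stand-in for the published code, not a copy of it, and it assumes the “images”/“labels” folder layout of the PASCAL VOC export format.

```python
# Hedged sketch: combine per-dam exports into one training set. In
# practice the <filename> element inside each VOC .xml would also need
# updating to match the renamed chip.
import shutil
from pathlib import Path

src_root = Path(r"C:\lowhead\chips")   # one subfolder per dam export
dst = Path(r"C:\lowhead\combined")
(dst / "images").mkdir(parents=True, exist_ok=True)
(dst / "labels").mkdir(parents=True, exist_ok=True)

for i, export in enumerate(sorted(src_root.iterdir())):
    if not (export / "images").is_dir():
        continue                       # skip stray files
    for img in (export / "images").glob("*.tif"):
        xml = export / "labels" / (img.stem + ".xml")
        if xml.exists():               # keep only chips with a label file
            shutil.copy(img, dst / "images" / f"{i:03d}_{img.name}")
            shutil.copy(xml, dst / "labels" / f"{i:03d}_{xml.name}")
```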

2.5. Training Deep Learning Model

Training the DL model occurred in Python but can also be performed through the ArcGIS tool called “train model for deep learning”. This takes the framework created by the “export training data for deep learning” tool and creates definitions based on observed features, iterating through the provided data to refine definitions that fit the specified data. The output includes a DL package file (.dlpk), an ESRI model definition file (.emd) to be used for detecting objects, and ArcGIS Pro model metrics summarizing how training was performed and providing an anticipated accuracy metric for the model. Two inputs to this command are the backbone model and the pre-trained model. The backbone models (see Table 1) were previously trained by ESRI via the ImageNet dataset; a backbone model prescribes the number of layers and the neural network’s basic setup. The training process takes one of these input models and retrains or refines it with additional training data. If a previous model showed promise but had less than ideal accuracy, using it as a pre-trained model is preferred to further train the algorithm for quicker and more accurate object identification. The deep learning model in ArcGIS Pro was trained in four iterations or stages, summarized in Table 3.
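For readers working in Python directly, a hedged sketch of an equivalent training run using the arcgis.learn API (which backs the ArcGIS Pro tool) is given below; the paths, epoch count, and validation split are illustrative, not the study’s exact settings.

```python
# Hedged sketch using the arcgis.learn Python API. The study's final
# model used a RetinaNet with a ResNet-101 backbone.
from arcgis.learn import prepare_data, RetinaNet

data = prepare_data(r"C:\lowhead\combined",
                    batch_size=8,          # limited by GPU memory
                    val_split_pct=0.5)     # ~50/50 train/validation split

model = RetinaNet(data, backbone="resnet101")  # pre-trained on ImageNet
lr = model.lr_find()        # suggests a learning rate in recent versions
model.fit(epochs=20, lr=lr)
model.save(r"C:\lowhead\LowHeadDamLocator")    # writes .dlpk/.emd outputs
```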

2.6. Model Performance and Validation

DL model validation is a function of the application, characteristics, available information, and objectives of the model [71,72]. In this study, a combination of several methods (bootstrapping and cross validation) used in environmental and water modeling fields was tailored to the use of a DL model to locate low-head dams. A contingency table was developed (Table 4) to consider model performance coupled with three additional parameters: accuracy (Equation (1)), a bias score (Equation (2)), and a recall value (Equation (3)):
Table 4. Deep learning model confusion matrix.

|                            | Actual positive | Actual negative | Total |
|----------------------------|-----------------|-----------------|-------|
| Model prediction: positive | True positive (TP): accurately located low-head dam | False positive (FP): incorrectly located low-head dam | Total low-head dam locations identified by model, δ |
| Model prediction: negative | False negative (FN) | True negative (TN) | Total locations not identified by model, Δ |
$$\text{Accuracy} = \frac{TP + TN}{\delta} \tag{1}$$

$$\text{Bias} = \frac{TP + FP}{TP + FN} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
where the total count of actual dams derives from the 170 known low-head dams, TP is a correctly identified low-head dam, FP is a feature that is not a low-head dam but was identified as such by the model, and FN is a low-head dam not identified by the model. TN (true negative) is less relevant in this study because the DL model correctly passed over a vast number of objects that were not low-head dams; TN in Equation (1) was therefore assumed to be 0, since the correctly rejected non-dam features in a state-wide search were so abundant that they would have skewed metrics intended to focus on TP and were not worth saving and counting. Δ is the total number of low-head dams not identified by the model, and δ is the total number of locations identified by the model as low-head dams (TP + FP).
A completely accurate model will have accuracy = 1 and bias = 1. For this study, it is significant to note that FP > 0 (within reason) is preferred over high accuracy and bias scores, as false positives can be checked manually by a professional for inventory purposes. FN > 0, by contrast, prevents this validated model from preparing a complete low-head dam inventory for checking and quality assurance. To ensure the model produces fewer false negatives, only a bias score greater than 1 is permissible. Recall values are common DL performance metrics and describe how often a real low-head dam was detected by the model.
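A minimal sketch implementing Equations (1)–(3), with the TN = 0 assumption noted above, follows; the example counts are illustrative values chosen to roughly reproduce the final validation metrics reported in Section 3.2, not the study’s exact tallies.

```python
# Minimal sketch implementing Equations (1)-(3) with TN = 0.
def model_metrics(tp: int, fp: int, fn: int, tn: int = 0):
    delta = tp + fp                  # locations flagged as dams by the model
    accuracy = (tp + tn) / delta     # Equation (1)
    bias = (tp + fp) / (tp + fn)     # Equation (2); bias > 1 is required here
    recall = tp / (tp + fn)          # Equation (3)
    return accuracy, bias, recall

# Illustrative counts only (not the study's tallies).
acc, bias, rec = model_metrics(tp=26, fp=3575, fn=55)
print(f"accuracy={acc:.3f}, bias={bias:.2f}, recall={rec:.3f}")
# -> accuracy=0.007, bias=44.46, recall=0.321
```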
Bootstrapping, or random sampling, occurred when one data point was left out and the model was created with all other data and tested against this point. The data point was reintroduced to the pool, and another was removed to again train the model and test against this point. Bootstrapping occurred within the “train model for deep learning” tool according to a validation percentage, an input value determining how many data points must go through bootstrapping for the final model. After the GIS tools ran bootstrapping for the freshly trained model, cross validation was performed. To perform this, the trained model was run on the half of the spatial extents of Indiana where low-head dams were not used in training. The low-head dams found by the model were compared to the known locations of low-head dams in these areas.
For each iteration, an initial validation was performed on a relatively small area of 14.09 km2. This small area (shown in Figure 9) has two colors of water bodies, two verified low-head dams, and one additional feature that appeared to be a low-head dam from above. This image was a part of the second group of imagery used for training purposes. The results obtained for this area were not included in the final results since this imagery was also used for training.
Once a trained DL model performed sufficiently at preliminary validation, it was directed to scan other aerial imagery that included low-head dams. For final runs where validation data were collected for contingency purposes, the imagery was prepared specifically to decrease run time by limiting the DL model to scan only along stream corridors. These stream data were gathered from the Indiana Map portal, which provides the National Hydrography Dataset (NHD) for the state of Indiana [73]. The lines in this stream shapefile should show the centerline of canals, rivers, and streams throughout the state but are not always accurate. Therefore, a buffer of 100 m from the centerline was used to clip the aerial imagery, as all dams in Indiana fell well within this buffer. The clipping in ArcGIS Pro was performed using the “Extract by Mask” tool in the Spatial Analyst toolbox. This process, shown in Figure 10, subsets the imagery to delineate where the model should search for low-head dams and speeds up run times significantly; for example, a preliminary run that took two days prior to clipping took two hours after clipping.
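A hedged arcpy sketch of this corridor-clipping step is shown below; the layer names and paths are assumptions.

```python
# Hedged sketch: buffer NHD stream centerlines by 100 m, then use
# "Extract by Mask" (Spatial Analyst) to limit the imagery to scan.
import arcpy
from arcpy.sa import ExtractByMask

arcpy.CheckOutExtension("Spatial")

arcpy.analysis.Buffer("nhd_streams", "stream_corridors", "100 Meters",
                      dissolve_option="ALL")   # one merged corridor polygon
corridor_imagery = ExtractByMask(r"C:\lowhead\county_mosaic.tif",
                                 "stream_corridors")
corridor_imagery.save(r"C:\lowhead\county_corridors.tif")
```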

3. Results

3.1. Model Training Outcomes

As this study utilized commercially available software, a summary of each training phase is provided to give insight into model development.
During the first iteration, all ArcGIS default settings were applied for the aforementioned tools and no constraints were set when exporting the training tiles. As a result, the training images were not centered on the dams, and a total of 1067 training images were produced. The “train model for deep learning” tool was used with the single shot detection model. However, in the “detect objects for deep learning” tool, the results showed dams scattered across the entire image. This led to a complete failure in detection accuracy, as the model falsely predicted many objects in the images as low-head dams.
To address this, a second iteration was conducted. The original training dataset was reprocessed, resulting in 628 images that were deemed beneficial for teaching the model to recognize low-head dams. The “train model for deep learning” tool was again used with the single shot detection model, achieving accuracy = 0.755. Figure 11a demonstrates a gradual decrease in training loss, indicating effective learning over time. The loss curve shows a smooth downward trend, confirming the model’s ability to reduce errors on the training set. However, the validation loss exhibited significant fluctuations, including several large spikes. After processing around 600 batches, the validation loss showed some improvement but remained unstable, signaling the need for further refinement.
Although the model successfully detected both dams in image 308 (see Figure 12), it also incorrectly identified 47 other features as dams. These misidentified features shared characteristics such as straight edges with white on one side, bare white trees next to a river, and channel-like features (see Figure 13). For example, in Figure 13a the model incorrectly identified the roadway, which visually appears similar to the adjacent low-head dam. In Figure 13b, the outlines and contrast of large buildings were also a challenge for the model. Contrast from bridges and shorelines also produced false positives (see Figure 13c).
In the third iteration, two additional classes, “water” and “not water”, were introduced to distinguish low-head dams from visually similar features in the environment. To streamline the training process, only relevant features were used for training, with 424 images assigned to the dam class, 120 images to the water class, and 80 images to the not water class. All 19 backbone models were tested equivalently using their default settings, and the top five performers were further evaluated on a separate set of 47 images containing two low-head dams. Model metrics for this configuration are presented in Figure 11b and Figure 14. Based on these results (see Table 5), the RetinaNet ResNet model with 101 layers had the best accuracy and thus was selected for further testing. This model had accuracy for dam = 0.881 (TP = 81/92), water = 0.348, and not water = 0.760. The RetinaNet architecture includes four major components: a bottom-up pathway that calculates feature maps at various scales; a top-down pathway; lateral connections that upsample and merge the top-down and bottom-up layers; and classification and regression subnetworks that predict object classes at anchor boxes and regress the box coordinates. For more in-depth detail, please see the ArcGIS Pro guidance and [70].
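The backbone screening can be summarized by the following hedged sketch using the arcgis.learn API. Only RetinaNet ResNet variants are shown for brevity (the study also screened single shot detection and Faster R-CNN options), and the epoch count is illustrative.

```python
# Hedged sketch of backbone screening: train each candidate with default
# settings and compare validation average precision per class.
from arcgis.learn import prepare_data, RetinaNet

data = prepare_data(r"C:\lowhead\combined_3class", batch_size=4)

scores = {}
for backbone in ["resnet18", "resnet34", "resnet50", "resnet101", "resnet152"]:
    model = RetinaNet(data, backbone=backbone)
    model.fit(epochs=10)                                 # illustrative
    scores[backbone] = model.average_precision_score()  # per-class AP

for backbone, ap in scores.items():
    print(backbone, ap)
```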
As shown in Figure 11b, both the training and validation losses rapidly decreased within the first few hundred batches, indicating that the model was learning efficiently. The training loss continued to decline steadily, while the validation loss plateaued after approximately 1500 batches, with minor fluctuations. This indicated that while the model was fitting the training data well, its performance on unseen validation data was no longer improving significantly, suggesting potential overfitting, as highlighted in Figure 15, which includes FP objects.
Given this result, a fourth iteration was undertaken, introducing an additional class, “forest”, to distinguish forested areas from water and non-water features. Care was taken to prepare training data only from groups 2 and 4 to maintain the prior training framework; similarly, group 3 was reserved for validation. This resulted in 640 images for the dam class, 144 for the water class, 120 for the not water class, and 72 for the forest class. Performance is highlighted in Figure 11c, Figure 16 and Figure 17. This model had accuracy for dam = 0.738 (TP = 68/92), water = 0.913, not water = 1.000, and forest = 0.643. Although low-head dam identification accuracy slightly declined, accuracies in the other classes increased. Iteration 4 was thus identified as the final trained model configuration given the limited number of training tile samples and the complexity of the terrain.

3.2. Trained Deep Learning Model Validation and Final Performance

The final model (LowHeadDamLocator.dlpk) was developed, as noted, through four iterations of preparing training data and training the deep learning model. The final performance is summarized in Figure 18 and Tables 6 and 7. Due to the very high number of FP, the model achieved an overall accuracy of 0.7%, which may be expected given the significant area and complex terrain being scanned. Final validation performance included bias = 44.46 and recall = 32.1%. The likelihood of detecting a dam was found to be strongly associated with its visibility level: the model detected 20/24 dams of visibility class 3 but only 5/29 of visibility class 2. This model successfully achieved the study’s objectives, demonstrating a recall value of 83.3% for the high visibility class and 17.2% for the moderate visibility class while openly searching aerial imagery.
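For deployment, a final scan can be launched with the “Detect Objects Using Deep Learning” tool; a hedged sketch of such a call is shown below, with illustrative paths and argument values.

```python
# Hedged sketch: scan corridor-clipped imagery with the trained model and
# write candidate dams to a feature class for expert review.
import arcpy

arcpy.env.workspace = r"C:\lowhead\indiana.gdb"  # hypothetical workspace

arcpy.ia.DetectObjectsUsingDeepLearning(
    in_raster=r"C:\lowhead\county_corridors.tif",
    out_detected_objects="candidate_dams",
    in_model_definition=r"C:\lowhead\LowHeadDamLocator.emd",
    arguments="padding 56;threshold 0.5;batch_size 4",  # illustrative values
    run_nms="NMS",          # suppress overlapping duplicate detections
)
```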

4. Discussion

Machine learning techniques for object identification using aerial imagery can be effective, but their success heavily depends on image resolution, the quantity of training tiles, identification classes, and the specific features the model is trained to recognize. This study also confirmed that easily identifiable characteristics, such as distinct geometric shapes, colors, or contrasts, can aid in identification but may pose challenges during training if those key identifiers are too similar to other objects in the image.
This study found a significant identification performance difference between DL algorithms that analyze only image chips for recognition and those that search large areas, similar to a human’s approach. Although many approaches are available, the existing ArcGIS Pro software proved a valuable tool capable of this task. For both easily recognizable low-head dams (category 3) and harder-to-spot dams (categories 0 and 1), the algorithm’s performance is comparable to human identification. Several insights were also gained from instances of false positives and unsuccessful training iterations. These lessons include the following:
  • The number of pixels in the training data significantly affected the computer memory needed to train the model. On the computer used, images with 1024 pixels could only be successfully trained on backbone models with fewer than 30 layers, and with a batch size limited to one or two. However, images with 400 or 500 pixels could be trained on all models with batch sizes of four or eight.
  • When using the export training data for deep learning tool and limiting the extents, the tool needs to fit an image within the specified area. If the extents provided are smaller than the width or height of a pixel in the raster image, the tool will disregard those extents and export images based on the full size of the raster image instead. It is important to note that the extents are defined using the units of the map’s coordinate system, which may differ from the raster image’s internal coordinate system. Therefore, the extents should be carefully set in accordance with the map’s coordinate units to ensure the desired area is properly selected for export.
  • It was not possible to add additional metadata to the training dataset, such as elevation information or distance to the nearest stream or river.
Furthermore, an initial inventory of low-head dams for any region merits a careful review by a qualified expert to confirm locations. The authors recommend this review regardless of whether the inventory was human generated or produced using the method detailed herein or similar. The advantage of a computer-generated inventory, even with 1000+ false positives, is that the initial challenge of generating a reasonably accurate inventory has been overcome, and the result can then be quickly checked; in this case, the inventory is already in ArcGIS Pro and navigating to each location is highly efficient.
Finally, additional research and effort could be applied to this topic, such as retraining the model with many more image tiles of different rotations, contrasts, and visibilities. Specifically, exploring the performance of this model is merited in states where visibility along river corridors may be generally higher due to less precipitation or colder temperatures. Exploring model performance and further training for different aerial imagery resolutions may also be beneficial. It would also be expected that performance would increase if 80% of the data were reserved for training and only 20% for validation.

5. Conclusions

The primary outcome of this study is a method for locating low-head dams from aerial imagery using ArcGIS Pro and high-resolution aerial imagery to support ongoing public safety efforts by inventorying all low-head dams in the USA. This study selected, tested, and successfully trained a deep learning model to accomplish this task, with the supporting data and methodology published with open access. Key findings from this object identification effort include the following:
  • The tools within ArcGIS Pro were able to train a deep learning model when training images adequately depict a low-head dam, which can include centering it in an image.
  • An image classification was developed to help model users prepare images for scanning other regions of the USA.
  • Residual Network models are the pretrained models with the highest accuracy for the low-head dam application.
  • The pretrained models with fewer layers tended to have lower model accuracy.
  • Low-head dams can be located with sufficient visibility within the study area of Indiana.
  • The visibility criteria greatly influenced the ability for the model to locate a given dam.
  • Object identification over a large area was feasible with sufficient computational resources (i.e., a single robust desktop may require 1+ months depending on the size of the region). Additional machines may speed up run time with each machine running a subset of the desired area.

Author Contributions

Conceptualization, C.R.A. and B.M.C.; methodology, C.R.A. and B.M.C.; software, C.R.A.; validation, C.R.A. and B.M.C.; formal analysis, C.R.A. and B.M.C.; investigation, C.R.A. and B.M.C.; resources, B.M.C.; data curation, C.R.A.; writing—original draft preparation, C.R.A. and B.M.C.; writing—review and editing, B.M.C.; visualization, C.R.A. and B.M.C.; supervision, B.M.C.; project administration, B.M.C.; funding acquisition, B.M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the State of Utah through Utah State University and by D. Campbell.

Data Availability Statement

The original data presented in the study are openly available in HydroShare at http://www.hydroshare.org/resource/4ea2d62f7c864f0691e4441587c8116f.

Acknowledgments

The authors express sincere thanks to Manuela Johnson for providing the Indiana low-head dam dataset and for multiple discussions regarding dam safety, Indiana data preparation, and preliminary findings.

Conflicts of Interest

Author Caitlin R. Arnold was employed by the company J-U-B Engineers, Inc.; however, the research was conducted independently and all authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Angelakis, A.N.; Baba, A.; Valipour, M.; Dietrich, J.; Fallah-Mehdipour, E.; Krasilnikoff, J.; Bilgic, E.; Passchier, C.; Tzanakakis, V.A.; Kumar, R.; et al. Water Dams: From ancient to present times and into the future. Water 2024, 16, 1889.
  2. Tschantz, B. What we know (and don’t know) about low-head dams. J. Dam Saf. 2014, 12, 37–45.
  3. Tschantz, B.A.; Wright, K.R. Hidden dangers and public safety at low-head dams. J. Dam Saf. 2011, 9, 7–17.
  4. Schweiger, P.; Barfuss, S.; Foos, W.; Richards, G. Don’t Go with the Flow! Identifying and Mitigating Hydraulic Hazards at Dams; Association of State Dam Safety Officials: Lexington, KY, USA, 2017.
  5. Sung, K.-K.; Poggio, T. Example-based learning for view-based human face detection. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 39–51.
  6. Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1627–1645.
  7. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2012, 60, 84–90.
  8. Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3 November 2014; pp. 675–678.
  9. Chen, C.; Seff, A.; Kornhauser, A.; Xiao, J. DeepDriving: Learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2722–2730.
  10. Yang, Z.; Nevatia, R. A multi-scale cascade fully convolutional network face detector. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4 December 2016; IEEE: New York, NY, USA, 2016; pp. 633–638.
  11. Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7291–7299.
  12. Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1907–1915.
  13. Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232.
  14. Dundar, A.; Jin, J.; Martini, B.; Culurciello, E. Embedded streaming deep neural networks accelerator with applications. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 1572–1583.
  15. Cintra, R.J.; Duffner, S.; Garcia, C.; Leite, A. Low-complexity approximate convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5981–5992.
  16. Stuhlsatz, A.; Lippel, J.; Zielke, T. Feature extraction with deep neural networks by a generalized discriminant analysis. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 596–608.
  17. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
  18. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
  19. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
  20. Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260.
  21. Sarker, I.H. Machine learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2021, 2, 160.
  22. Koulali, R.; Hajar, Z.; Zaim, M. Evaluation of Several Artificial Intelligence and Machine Learning Algorithms for Image Classification on Small Datasets. In Advances on Smart and Soft Computing; Saeed, F., Al-Hadhrami, T., Mohammed, F., Mohammed, E., Eds.; Springer: Singapore, 2020; pp. 51–60.
  23. Lv, Q.; Dou, Y.; Niu, X.; Xu, J.Q.; Xia, F. Remote sensing image classification based on DBN model. J. Comput. Res. Dev. 2014, 51, 1911–1918.
  24. Patterson, B.; Leone, G.; Pantoja, M.; Behrouzi, A.A. Deep learning for automated image classification of seismic damage to built infrastructure. In Proceedings of the Eleventh US National Conference on Earthquake Engineering, Los Angeles, CA, USA, 25–26 June 2018.
  25. Pritt, M. Deep learning for recognizing mobile targets in satellite imagery. In Proceedings of the 2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 9 October 2018; pp. 1–7.
  26. Trigka, M.; Dritsas, E. A Comprehensive Survey of Machine Learning Techniques and Models for Object Detection. Sensors 2025, 25, 214.
  27. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
  28. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
  29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  30. Arabi, S.; Haghighat, A.; Sharma, A. A deep learning based solution for construction equipment detection: From development to deployment. arXiv 2019, arXiv:1904.09021.
  31. Li, Y.; Che, P.; Liu, C.; Wu, D.; Du, Y. Cross-scene pavement distress detection by a novel transfer learning framework. Comput.-Aided Civ. Infrastruct. Eng. 2021, 36, 1398–1415.
  32. Doycheva, K.; Koch, C.; König, M. Computer vision and deep learning for real-time pavement distress detection. In Advances in Informatics and Computing in Civil and Construction Engineering, Proceedings of the 35th CIB W78 2018 Conference: IT in Design, Construction, and Management, Chicago, IL, USA, 1–3 October 2018; Springer International Publishing: Cham, Switzerland, 2019; pp. 601–607.
  33. Moselhi, O.; Shehab-Eldeen, T. Automated detection of surface defects in water and sewer pipes. Autom. Constr. 1999, 8, 581–588.
  34. Kumar, S.S.; Wang, M.; Abraham, D.M.; Jahanshahi, M.R.; Iseley, T.; Cheng, J.C. Deep learning–based automated detection of sewer defects in CCTV videos. J. Comput. Civ. Eng. 2020, 34, 04019047.
  35. Blaschke, T. Object based image analysis for remote sensing. ISPRS J. Photogramm. Remote Sens. 2010, 65, 2–16.
  36. Cornic, A.; Ose, K.; Ienco, D.; Barbe, E.; Cresson, R. Assessment of urban land-cover classification: Comparison between pixel and object scales. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11 July 2021; IEEE: New York, NY, USA, 2021; pp. 5716–5719.
  37. Pan, X.; Zhang, C.; Xu, J.; Zhao, J. Simplified object-based deep neural network for very high resolution remote sensing image classification. ISPRS J. Photogramm. Remote Sens. 2021, 181, 218–237.
  38. Tassi, A.; Gigante, D.; Modica, G.; Di Martino, L.; Vizzari, M. Pixel-vs. Object-based landsat 8 data classification in google earth engine using random forest: The case study of maiella national park. Remote Sens. 2021, 13, 2299. [Google Scholar] [CrossRef]
  39. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27. [Google Scholar] [CrossRef]
  40. Pritt, M.; Chern, G. Satellite image classification with deep learning. In Proceedings of the 2017 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 10 October 2017; IEEE: New York, NY, USA, 2017; pp. 1–7. [Google Scholar]
  41. Aplin, P.; Smith, G.M. Advances in object-based image classification. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2008, 37, 725–728. [Google Scholar]
  42. Murray, J.; Sargent, I.; Holland, D.; Gardiner, A.; Dionysopoulou, K.; Coupland, S.; Hare, J.; Zhang, C.; Atkinson, P.M. Opportunities for machine learning and artificial intelligence in national mapping agencies: Enhancing ordnance survey workflow. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 43, 185–189. [Google Scholar] [CrossRef]
  43. ESRI ArcGIS Pro Resources. 2020. Available online: https://www.pro.arcgis.com/en/ (accessed on 19 October 2020).
  44. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20 June 2009; IEEE: New York, NY, USA, 2009; pp. 248–255. [Google Scholar]
  45. Stanford Vision Lab; Stanford University; Princeton University. ImageNet Tree View. 2010. Available online: https://www.image-net.org/index.php (accessed on 11 December 2020).
  46. Ma, L.; Li, M.; Ma, X.; Cheng, L.; Du, P.; Liu, Y. A review of supervised object-based land-cover image classification. ISPRS J. Photogramm. Remote Sens. 2017, 130, 277–293. [Google Scholar] [CrossRef]
  47. Ma, L.; Li, M.; Gao, Y.; Chen, T.; Ma, X.; Qu, L. A novel wrapper approach for feature selection in object-based image classification using polygon-based cross-validation. IEEE Geosci. Remote Sens. Lett. 2017, 14, 409–413. [Google Scholar] [CrossRef]
  48. Benfield, S.L.; Guzman, H.M.; Mair, J.M.; Young, J.A. Mapping the distribution of coral reefs and associated sublittoral habitats in Pacific Panama: A comparison of optical satellite sensors and classification methodologies. Int. J. Remote Sens. 2007, 28, 5047–5070. [Google Scholar] [CrossRef]
  49. Diaz, O.; Kushibar, K.; Osuala, R.; Linardos, A.; Garrucho, L.; Igual, L.; Radeva, P.; Prior, F.; Gkontra, P.; Lekadir, K. Data preparation for artificial intelligence in medical imaging: A comprehensive guide to open-access platforms and tools. Phys. Medica 2021, 83, 25–37. [Google Scholar] [CrossRef]
  50. United States Geological Survey. What Are the Band Designations for the Landsat Satellites? Available online: https://www.usgs.gov/faqs/what-are-band-designations-landsat-satellites?qt-news_science_products=0#qt-news_science_products (accessed on 15 October 2021).
  51. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Part I 14. Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
  52. Ghannadi, P.; Kourehli, S.S. Data driven method of damage detection using sparse sensors installation by SEREPa. J. Civ. Struct Health Monit. 2019, 9, 459–475. [Google Scholar] [CrossRef]
  53. Alstad, C. The xView2 AI Challenge. 1 April 2022. Available online: https://www.ibm.com/cloud/blog/the-xview2-ai-challenge (accessed on 9 April 2022).
  54. GitHub. DIUx-xView2_First_Place: First Place Solution for View2: Assess Building Damage challenge. 6 August 2020. Available online: https://github.com/DIUx-xView/xView2_first_place (accessed on 14 December 2020).
  55. Hay, G.J.; Castilla, G. Geographic Object-Based Image Analysis (GEOBIA): A new name for a new discipline. In Object-Based Image Analysis: Spatial Concepts for Knowledge-Driven Remote Sensing Applications 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 75–89. [Google Scholar]
  56. Domadia, S.G.; Thakkar, F.N.; Ardeshana, M.A. Recent advancement in learning methodology for segmenting brain tumor from magnetic resonance imaging-a review. Multimed. Tools Appl. 2023, 82, 34809–34845. [Google Scholar] [CrossRef]
  57. Zhang, C.; Cheng, J.; Tian, Q. Unsupervised and semi-supervised image classification with weak semantic consistency. IEEE Trans. Multimed. 2019, 21, 2482–2491. [Google Scholar] [CrossRef]
  58. Balaniuk, R.; Isupova, O.; Reece, S. Mining and tailings dam detection in satellite imagery using deep learning. Sensors 2020, 20, 6936. [Google Scholar] [CrossRef]
  59. Aguilar, M.A.; Saldaña, M.M.; Aguilar, F.J. GeoEye-1 and WorldView-2 pan-sharpened imagery for object-based classification in urban environments. Int. J. Remote Sens. 2013, 34, 2583–2606. [Google Scholar] [CrossRef]
  60. Jamil, A.; Bayram, B.; Seker, D.Z. Mapping Hazelnut Trees from High Resolution Digital Orthophoto Maps: A Quantitative Comparison of an Object and a Pizel Based Approach. FEB-Fresenius Environ. Bull. 2019, 28, 561–567. [Google Scholar]
  61. Mostafaei, H. Modal identification techniques for concrete dams: A comprehensive review and application. Sci 2024, 6, 40. [Google Scholar] [CrossRef]
  62. Google Colaboratory. 2024. Available online: https://colab.research.google.com/ (accessed on 1 November 2020).
  63. Malerba, M.E.; Wright, N.; Macreadie, P.I. A continental-scale assessment of density, size, distribution and historical trends of farm dams using deep learning convolutional neural networks. Remote Sens. 2021, 13, 319. [Google Scholar] [CrossRef]
  64. Chakraborty, S.; Cardwell, M.; Crookston, B.; Hotchkiss, R.H.; Johnson, M. Using Deep Learning and Aerial Imagery to Identify Low-Head Dams. In Dam Safety; Association of State Dam Safety Officials: Nashville, TN, USA, 2021; p. 177. [Google Scholar]
  65. Alouta, R.; Hess, K. Deep Learning with ArcGIS Pro Pro Tips & Tricks: Part 1. 22 February 2021. Available online: https://www.esri.com/arcgis-blog/products/arcgis-pro/imagery/deep-learning-with-arcgis-pro-tips-tricks/ (accessed on 12 July 2021).
  66. Indiana University. 2016–2018 Indiana Orthophotography Refresh. Available online: https://gis.iu.edu/dataset/statewide/in_2016.html (accessed on 17 September 2020).
  67. Abdullah, Q. The ASPRS Positional Accuracy Standards, Edition 2: The Geospatial Mapping Industry Guide to Best Practices. Photogramm. Eng. Remote Sens. 2023, 89, 581–588. [Google Scholar] [CrossRef]
  68. Indiana Office of Information Technology. Dataset Download Interface. Indiana Spatial Data Portal. 2016. Available online: http://gis.iu.edu/ (accessed on 14 December 2020).
  69. Johnson, M. Indiana Low-Head Dam Inventory. 2020. [Google Scholar]
  70. Arnold, C. Deep Learning and Low Head Dams, HydroShare. 2021. Available online: http://www.hydroshare.org/resource/4ea2d62f7c864f0691e4441587c8116f (accessed on 1 February 2021).
  71. Wujek, B.; Hall, P.; Günes, F. Best Practices for Machine Learning Applications; SAS Institute Inc.: Cary, NC, USA, 2016. [Google Scholar]
  72. Bennett, N.D.; Croke, B.F.; Guariso, G.; Guillaume, J.H.; Hamilton, S.H.; Jakeman, A.J.; Marsili-Libelli, S.; Newham, L.T.; Norton, J.P.; Perrin, C.; et al. Characterising performance of environmental models. Environ. Model. Softw. 2013, 40, 1–20. [Google Scholar] [CrossRef]
  73. Indiana Geographic Information Council, Inc. n.d. Layer Gallery, Indiana Map. Available online: https://maps.indiana.edu/layerGallery.html?category=WaterBodies (accessed on 13 July 2021).
Figure 1. Examples of low-head dams: (a) the Dock Street Dam in Pennsylvania with over 30 fatalities (photo courtesy Benjamin Israel Devadason) and (b) the boil and entrapment current.
Figure 2. Overview of a low-head dam with the entrapment current.
Figure 3. Relationship between objects under consideration and spatial resolution with (a) a 20 m pixel, (b) a 5 m pixel, and (c) a 1.25 m pixel (adapted from [35]).
Figure 4. The State of Indiana with 170 low-head dams represented by pink (training data) and green (validation data) markers.
Figure 5. Low-head dam DL model workflow.
Figure 6. Low-head dam visibility class examples: (a) Class 0, no visibility or poor visibility; (b) Class 1, low visibility; (c) Class 2, moderate visibility; and (d) Class 3, high visibility.
Figure 7. Creation of a low-head dam image chip with (a) a low-head dam in Indiana, (b) the polygon denoting the dam, (c) the created image chip without image pre-processing, and (d) the image chip after image pre-processing.
Figure 8. Overview of the ArcGIS Pro Export training data for the DL tool process.
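The export step shown in Figure 8 can also be scripted with the arcpy Image Analyst module rather than run interactively in Pro. The following is a minimal sketch of such a script; the paths, tile size, stride, and metadata format are illustrative assumptions, not the exact settings used in this study.

    import arcpy

    arcpy.CheckOutExtension("ImageAnalyst")

    # Hypothetical inputs: the statewide orthophoto mosaic and the polygon
    # feature class outlining each labeled low-head dam (see Figure 7).
    imagery = r"C:\data\indiana_orthophotos.crf"
    dam_polygons = r"C:\data\lowhead_dams.gdb\dam_polygons"

    # Export image chips with PASCAL VOC rectangle labels, keeping only
    # tiles that actually contain a labeled feature.
    arcpy.ia.ExportTrainingDataForDeepLearning(
        imagery,
        r"C:\data\training_chips",
        dam_polygons,
        "TIFF",
        256, 256,    # tile size (x, y) in pixels
        128, 128,    # stride (x, y) in pixels
        "ONLY_TILES_WITH_FEATURES",
        "PASCAL_VOC_rectangles",
    )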
Figure 9. Raster for initial model validation.
Figure 10. Creation of the 100 m buffer area for searching with (a) the imagery with NHD streamlines, (b) the buffer area, and (c) the extracted imagery for DL model searching.
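Scripting this corridor extraction is straightforward: buffering the NHD streamlines by 50 m on each side yields the 100 m wide search corridor, which is then used to mask the imagery. A minimal sketch follows, with hypothetical paths; the study's actual workflow may have differed in tool choices and environment settings.

    import arcpy
    from arcpy.sa import ExtractByMask

    arcpy.CheckOutExtension("Spatial")

    # Hypothetical inputs: NHD streamlines and the statewide imagery.
    streams = r"C:\data\hydro.gdb\nhd_flowlines"
    imagery = r"C:\data\indiana_orthophotos.crf"
    corridor = r"C:\data\hydro.gdb\corridor_100m"

    # 50 m on each side of a streamline gives a 100 m wide corridor;
    # dissolving ("ALL") merges overlapping buffers into one search polygon.
    arcpy.analysis.Buffer(streams, corridor, "50 Meters", "FULL", "ROUND", "ALL")

    # Mask the imagery so the detector only scans within the corridor.
    corridor_imagery = ExtractByMask(imagery, corridor)
    corridor_imagery.save(r"C:\data\corridor_imagery.crf")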
Figure 11. Training and validation loss for (a) Iteration 2, (b) Iteration 3, and (c) Iteration 4.
Figure 12. Iteration 2 training identification samples.
Figure 13. Iteration 2 example performance results including correct and incorrect object identification for (a) the low-head dams, (b) building footprints, and (c) riverbanks and bridges.
Figure 14. Iteration 3 training identification samples.
Figure 15. Iteration 3 example performance results including correct and incorrect object identification such as (a) roads, (b) bridges and low-head dams, (c) buildings, and (d) tree boundaries.
Figure 16. Iteration 4 training identification samples.
Figure 17. Iteration 4 example performance results including correct and incorrect object identification such as (a) a low-head dam, (b) false positives, and (c) buildings.
Figure 18. Deep learning model search results of the State of Indiana (93,495.7 km2).
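The statewide search summarized in Figure 18 corresponds to running the trained model over the corridor imagery from Figure 10. As a rough sketch only (the model path, confidence threshold, and batch size are illustrative assumptions), the detection step can be invoked as follows:

    import arcpy

    arcpy.CheckOutExtension("ImageAnalyst")
    arcpy.env.processorType = "GPU"  # fall back to "CPU" if no GPU is available

    # Hypothetical paths: corridor imagery from Figure 10 and the trained
    # RetinaNet model package from the final training iteration.
    arcpy.ia.DetectObjectsUsingDeepLearning(
        r"C:\data\corridor_imagery.crf",
        r"C:\data\results.gdb\candidate_dams",
        r"C:\models\lowhead_retinanet.dlpk",
        "padding 0;threshold 0.5;batch_size 4",  # illustrative arguments
        "NMS",  # non-maximum suppression removes duplicate boxes
    )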
Table 1. ArcGIS Pro backbone model test matrix. An X marks each detector/backbone configuration tested.

Backbone Model        Batch Normalization   Single Shot Detector   RetinaNet   Faster R-CNN
ResNet-18                                   X                      X           X
ResNet-34                                   X                      X           X
ResNet-50                                   X                      X           X
ResNet-101                                  X                      X           X
ResNet-152                                  X                      X           X
DenseNet-121                                X
DenseNet-169                                X
DenseNet-161                                X
DenseNet-201                                X
VGG-11                                      X
VGG-11                X                     X
VGG-13                                      X
VGG-13                X                     X
VGG-16                                      X
VGG-16                X                     X
VGG-19                                      X
VGG-19                X                     X
MobileNet version 2                         X
DarkNet-53                                  X
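Each row of Table 1 corresponds to a model that can be instantiated through the arcgis.learn API. As one hedged example (the chip folder, batch size, and epoch count are placeholders, not the study's exact hyperparameters), the final RetinaNet/ResNet-101 combination could be trained as follows:

    from arcgis.learn import prepare_data, RetinaNet

    # Hypothetical chip folder produced by the export step (Figure 8).
    data = prepare_data(r"C:\data\training_chips", batch_size=16)

    # RetinaNet on a ResNet-101 backbone, the combination ultimately
    # selected in this study (see Table 5).
    model = RetinaNet(data, backbone="resnet101")

    # fit() selects a learning rate automatically when none is supplied;
    # it also records the training/validation losses shown in Figure 11.
    model.fit(epochs=20)

    print(model.average_precision_score())
    model.save(r"C:\models\lowhead_retinanet")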
Table 2. Training and validation data of low-head dams divided by visibility classification.

              Class 0   Class 1   Class 2   Class 3   Total
Training      8         18        33        33        92
Validation    9         16        29        24        78
Total         17        34        62        57        170
Table 3. DL model training iteration summary.

Iteration   Overview
1           The majority of training images did not showcase a low-head dam. Zero accuracy.
2           Training images included at least half of a low-head dam. Approximately 10% accuracy and 10% recall.
3           Training images centered on a low-head dam. Two additional training classes: (1) water and (2) not water. Approximately 50% accuracy and high recall.
4           One additional training class added: (3) forest. Search limited to a 100 m wide corridor along NHD streams/rivers. Approximately 1% accuracy and 32.1% recall.
Table 5. Iteration 3, the top 5 of 19 performing models.

Model Type     Backbone Model   TP       FP
RetinaNet      ResNet-101       2 of 2   0
RetinaNet      ResNet-152       2 of 2   1
Faster R-CNN   ResNet-50        2 of 2   13
RetinaNet      ResNet-50        1 of 2   0
Faster R-CNN   ResNet-18        1 of 2   0
Table 6. Trained DL model validation confusion matrix results.

Model Prediction   Actual Positive             Actual Negative              Total
Positive           True positive (TP = 25)     False positive (FP = 3443)   Total low-head dam locations identified by model (δ)
Negative           False negative (FN = 53)    True negative (TN)           Total locations not identified by model (Δ)
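The headline metrics follow directly from the counts in Table 6; a short calculation (plain Python, for illustration only) reproduces the precision of roughly 1% noted for Iteration 4 in Table 3 and the overall 32% recall:

    # Counts from Table 6 (validation over the statewide search).
    tp, fp, fn = 25, 3443, 53

    precision = tp / (tp + fp)  # 25 / 3468 ≈ 0.007, i.e., roughly 1%
    recall = tp / (tp + fn)     # 25 / 78 ≈ 0.321

    print(f"precision = {precision:.3f}, recall = {recall:.3f}")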
Table 7. Trained DL model validation metrics according to visibility class.

Visibility Class   # Dams   TP   FN   Expected Recall   Actual Recall
0                  9        0    9    0.00              0.00
1                  16       0    16   0.00              0.00
2                  29       5    24   0.50              0.17
3                  24       20   4    0.95              0.83
Total              78       25   53   0.48              0.32
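The "Expected Recall" total in Table 7 is the visibility-weighted average of the per-class expectations, which the snippet below (illustrative, using only the tabulated values) reproduces along with the actual overall recall:

    # Per-class dam counts and expected recall values from Table 7.
    dams = {0: 9, 1: 16, 2: 29, 3: 24}
    expected = {0: 0.00, 1: 0.00, 2: 0.50, 3: 0.95}

    total = sum(dams.values())  # 78 validation dams
    expected_recall = sum(dams[c] * expected[c] for c in dams) / total  # ≈ 0.48
    actual_recall = 25 / total  # 25 true positives (Table 6) ≈ 0.32

    print(f"expected = {expected_recall:.2f}, actual = {actual_recall:.2f}")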
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
