5.1. Input Layer Performance
Human eyes distinguish watercourses better in layers such as TPI and sky view than in a DEM. In our study, however, the DEM produced the best results based on f1-scores. One possible explanation, besides information loss when generating the derived layers, is that although these layers highlight watercourses well in many terrain types, they render watercourses in less detail in others. Perhaps surprisingly, combining input datasets did not increase the f1-score. Because most of the other layers are derived from the DEM, the results may indicate that they provide no additional information for training the CNN. Although the orthophoto is not a good input dataset on its own, because trees block the view of small watercourses, watercourses in agricultural fields and along roads are often visible in it. However, orthophotos also did not provide enough additional information to improve the results. Further research is needed to fully understand why adding datasets did not improve the results.
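The layer comparison above rests on the pixel-wise f1-score. As a minimal illustration, the function and toy arrays below are our own sketch of how such a score is computed from binary prediction and label masks, not the study's implementation:

```python
import numpy as np

def f1_score(pred: np.ndarray, label: np.ndarray) -> float:
    """Pixel-wise f1-score for binary masks (1 = watercourse, 0 = background)."""
    tp = np.sum((pred == 1) & (label == 1))  # true positives
    fp = np.sum((pred == 1) & (label == 0))  # false positives
    fn = np.sum((pred == 0) & (label == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy 2x3 masks: 2 TP, 1 FP, 1 FN -> precision = recall = f1 = 2/3
pred = np.array([[1, 1, 0], [0, 1, 0]])
label = np.array([[1, 0, 0], [0, 1, 1]])
print(f"f1 = {f1_score(pred, label):.3f}")  # f1 = 0.667
```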
The relatively large differences in f1-scores between folds indicate that the results were sensitive to the choice of validation area. The study would therefore have benefitted from a larger training and validation dataset, which would have yielded more precise metrics. On the other hand, the scores of the ten models trained with the DEM had small variance, meaning that the random elements of the study, including the initial weights of the network, the random rotation and mirroring augmentations, and the random locations from which the training data were sampled from the full training dataset, did not cause major variance in the scores.
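The two sources of variance discussed above can be compared directly by computing the spread of scores over folds versus over repeated trainings. The values below are illustrative placeholders, not the study's actual scores:

```python
from statistics import mean, stdev

# Hypothetical f1-scores (illustrative values only):
# five cross-validation folds over different geographic areas, versus
# ten repeated trainings where only random elements (initial weights,
# augmentation, training-sample locations) vary.
fold_scores = [0.62, 0.71, 0.55, 0.68, 0.64]
repeat_scores = [0.640, 0.645, 0.638, 0.642, 0.641,
                 0.644, 0.639, 0.643, 0.640, 0.642]

print(f"between-fold:   mean={mean(fold_scores):.3f} sd={stdev(fold_scores):.4f}")
print(f"between-repeat: mean={mean(repeat_scores):.3f} sd={stdev(repeat_scores):.4f}")
```

A much larger standard deviation between folds than between repeats is what supports attributing the fold differences to geography rather than to training randomness.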
5.2. Causes of False Predictions
Almost a third of false-negative predictions (FNPs) were caused by the offset between the labeled data and the predictions. In vector data derived from the predictions, these false predictions cause a positional offset (assuming the center line is correct in the labeled data) rather than a missing feature or feature section. Because the displacement is at most one meter, it affects few, if any, use cases of small-watercourse data. The results also show that relaxed recall removes a higher percentage of FNPs for the higher CCs. Although not surprising, this highlights that relaxed recall may explain a larger percentage of FNPs when only clearer features are semantically segmented. The results indicate that the number of FN predictions under regular recall that become TP under relaxed recall increases as the features become less clear. This may be because partially predicted watercourse features become more complete not only across the width of the watercourse but also along its length; the more watercourses are predicted incompletely along their length, the more such predictions relaxed recall recovers.
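The distinction between regular and relaxed recall can be sketched as follows, assuming, for illustration, that relaxed recall accepts a predicted pixel within a one-pixel neighborhood of a labeled pixel (the study's exact tolerance and implementation may differ):

```python
import numpy as np

def dilate(mask: np.ndarray) -> np.ndarray:
    """Binary dilation with a 3x3 structuring element (pure NumPy)."""
    padded = np.pad(mask, 1)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
    return out

def recall(pred: np.ndarray, label: np.ndarray, relaxed: bool = False) -> float:
    """Regular recall, or relaxed recall tolerating a one-pixel offset."""
    hits = dilate(pred) if relaxed else pred
    tp = np.sum((hits == 1) & (label == 1))
    fn = np.sum(label == 1) - tp
    return tp / (tp + fn)

# A prediction offset by one pixel: regular recall 0.0, relaxed recall 1.0
pred = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]])
label = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0]])
print(recall(pred, label), recall(pred, label, relaxed=True))
```

The toy example mirrors the offset case described above: the watercourse is found, only displaced, so relaxed recall counts it as correct.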
Of the remaining FNPs under relaxed recall, roughly seven out of eight are caused by CC4 and CC5, which contain a lot of uncertainty. The remaining FNPs in CC1–CC3 account for only 3.88% of the total watercourse pixels in the labeled data. The visual analysis showed that some of these prediction errors can be explained by the CC being overestimated during digitizing, or by small unclear sections in otherwise clear watercourses. Improving the architecture or optimizing hyperparameters might therefore not significantly increase the quantitative results. On the other hand, the metrics would likely benefit from increasing the quantity and quality of the labeled data. Visual inspection showed that many of the unclear features can be explained by an uneven point-cloud distribution that leaves some areas with sparsely distributed points; filling these gaps in the point cloud would likely solve the issue. Roadside ditches were common among both FP and FN results. Interpreting roadside ditches can be difficult because there are typically elevation changes on both sides of the road: when observed in layers such as TPI and relief shading, these changes can be misinterpreted as ditches, and actual ditches can be misinterpreted as other features. FNPs were also relatively common when the watercourse was wide but shallow, in contrast to the most typical watercourse type in the area, narrow ditches. In such cases, modifying the CNN method to account for the imbalance in feature types and/or increasing the quantity of training data, so that the model can train more on such features, could improve their prediction.
Most FPPs were caused by narrow, hard-to-see watercourses that were not digitized into the ground-truth data, suggesting that the completeness of the digitized watercourse dataset could be improved; doing so would likely improve precision. Broersen et al. [2] noted a similar finding in their study, stating that roughly half of their false-positive findings were in fact watercourses. Quality improvements could also be made to the already digitized watercourses by correcting the CCs of features and by allowing sections of a watercourse to belong to different classes. Nevertheless, because the visual assessment did not identify cases of FNPs caused by inaccurate digitizing of the ground-truth data, the results suggest that the original dataset is mostly accurate in terms of position, and quality improvements to the data may therefore not significantly improve the predictions for the higher CCs. Increasing the quantity of data could potentially improve the results for some feature types, such as wide watercourse sections or natural streams. Additional tests could be conducted to determine how much training data is enough for optimal results. Stanislawski et al. [5] used varying amounts of their 4600 km² of training data (at 5 m resolution) for segmenting watercourses with U-Net, starting from 3% and increasing up to 35% of the available data. They found that increasing the amount of data beyond 15% did not improve the model metrics. Although their results cannot be transferred directly to our study, due to differences in resolution and in the density of watercourses between the datasets, the training data in our study exceeds, in terms of pixels, 15% of their data, while the density of watercourses in our study area was high.
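As a rough check on the pixel comparison above, the pixel count of 15% of 4600 km² at 5 m resolution can be computed directly (a back-of-the-envelope calculation, not a figure from either study):

```python
# 15% of 4600 km^2 of training data at 5 m resolution
area_m2 = 4600 * 0.15 * 1e6    # 690 km^2 expressed in square meters
pixels = area_m2 / (5 * 5)     # 25 m^2 per pixel at 5 m resolution
print(f"{pixels:.3e} pixels")  # 2.760e+07
```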
The results showed that excluding CC5 watercourses from the labeled dataset improved the f1-score but reduced the recall of the remaining CCs, compared to training with CC1–CC5 features in the labeled dataset. This means that to achieve optimal completeness for the clearer features, the less clear ones need to be included in the dataset. Completeness is often important in the automatic extraction of watercourses because it ensures that the extracted features are continuous and that the hydrological network is depicted correctly. Based on the results, the typical machine-learning metrics, f1-score, recall, and precision, may not validate the result accurately; appropriate validation depends both on the requirements of the final watercourse dataset derived from the predictions and on the post-processing steps. For accurate validation, the accuracy and completeness of such a final watercourse dataset need to be assessed. Watercourses are one part of a hydrological network that also includes other features, for example, underground watercourses, culverts, ponds, and lakes. Automatic methods have been developed, for example, for detecting culverts from DEMs [36] and for detecting ponds [37]. As these methods develop and mature, they need to be combined into a fully automated workflow that produces an optimal final hydrological network.
Multiple factors that were not accounted for in the study could have impacted both the comparison of input datasets and the analysis of the causes of false predictions. The quality of the digitized training dataset was shown to have caused false predictions, which means it could also have affected the results of the input data comparison. For example, errors of omission by the digitizer would falsely increase the f1-score of layers in which the missing watercourses are not found and decrease the score of layers in which they are found. The study did not consider alternative loss functions or optimizer algorithms, nor the resolution of the input and training data. Different loss functions have been shown to give different results, for example, in road segmentation from remotely sensed data with CNNs, and the function should be selected based on the use case and dataset [38]. Śliwiński et al. [39] found that when delineating watercourses from DEMs of different resolutions, moving from a 1 m to a 1.5 m resolution DEM resulted in only a small decrease in the lengths of the watercourse lines, which indicates that the 0.5 m resolution of the DEM used in our study is sufficient for capturing most watercourses.