1. Introduction
The Greenland Ice Sheet (GrIS) is the second largest ice sheet in the world after the Antarctic Ice Sheet. GrIS could cause a global sea level rise of 7 m if it melts completely, with the uncertainty at 35% [
1]. GrIS has shown accelerated mass loss and contribution to sea level rise in recent decades due to global warming [
2,
3]. GrIS has made a total contribution of 13.7 ± 1.1 mm to the global sea level rise since 1972 [
4]. Mass loss of GrIS mainly results from a combination of surface runoff, submarine melt, and iceberg calving [
5,
6,
7,
8]. Surface runoff led to nearly 40% of GrIS’s mass loss from 2000 to 2008 [
5]. Increased surface melting and runoff has contributed to 84% of the increase in mass loss from 2009 to 2012 [
9]. Therefore, surface runoff of GrIS has currently drawn attention in the relevant studies [
10].
In general, GrIS’s surface meltwater, mainly caused by solar radiation, is part of supraglacial hydrological system. The system is mainly located in the ablation area and includes Supraglacial lakes (SGLs), streams, crevasses, and moulins [
11]. Meltwater accumulates in SGLs and is transported through streams and crevasses; some goes directly across the GrIS surface to the ocean [
12], while most flows through vertical surface-to-bed hydrologic connections (moulins) and finally reaches the subglacial area [
13] and then the ocean. The influence of GrIS surface meltwater on the ice sheet mass balance needs to be understood, and an important step is to quantify the surface meltwater in SGLs. As the most representative and identifiable hydrologic feature, SGLs have been investigated based on geomorphological characterizations such as quantity, area, depth, and volume [
11]. Besides these physical properties’ extraction, it has also been found that SGLs contribute to the negative mass balance of GrIS in two main ways. First, the melting rate of ice sheets is affected by the low albedos of SGLs, which cause increased solar radiation absorption [
14,
15,
16]. Second, SGLs influence the ice sheet dynamic process by transporting surface meltwater to the GrIS bed, reducing basal friction, and thereby accelerating ice sheet motion in early-melt season [
17,
18,
19,
20,
21,
22,
23,
24,
25], although this is followed by deceleration in marginal regions [
26].
Different remote sensing images and threshold methods have been used in previous studies of SGLs. MODIS images are mainly used in long time series to track the evolution of SGLs because of the long time coverage and high temporal resolution. Sundal et al. [
27] found the monthly area variations of SGLs at different elevations using 260 MODIS images acquired between 2003 and 2007, based on setting the threshold of ratio between red and blue bands. Selmes et al. [
13] used 3704 MODIS images acquired from 2005 to 2009 to map the evolution of 2600 SGLs for the whole GrIS by the moving window thresholding method. However, compared with the low spatial resolution of MODIS images, Landsat images have a higher spatial resolution and more spectral information, so they are more capable of extracting the area and drainage events of SGLs in detail. Miles et al. [
28] used a combined Landsat 8 and Sentinel-1 dataset and the threshold of NDWI (Normalized Difference Water Index) to extract the drainage events of SGLs from June to September 2015. Williamson et al. [
29,
30] designed two algorithms named “FAST” and “FASTER” to extract the area and volume of SGLs based on thresholds of the ratio between red and blue and NDWI, respectively. Schwatke et al. [
31] used five water indexes MNDWI (Modified Normalized Difference Water Index), NWI (New Water Index), AWEI
nsh (Automated Water Extraction Index for Non-Shadow Areas), AWEI
sh (Automated Water Extraction Index for Shadow Areas) and TC
wet (Tasseled Cap for Wetness) to combine their individual strengths for the water identification in the summer of 2018.
Overall, previous water identification methods are mainly based on setting the threshold of different water feature indices and decision rules. They extract objects by designing effective or representative features, which are called feature engineering. Feature engineering relies on empirical knowledge and there can be a lack of transferability when the study area changes. On the other hand, during the process of feature learning, the input samples are labeled and then the relationship between the input and the label is established in the model. Therefore, the model can actively learn the features of the samples, without defining what exactly the features are [
32]. Feature learning is a black-box operation learning rules and patterns from sample data [
33]. Although it lacks the explanation of physical processes and principles, it avoids manually designing features and does not depend on empirical knowledge, so the adaptability is improved. As one of the most widely used feature learning method, convolutional neural network (CNN) which combines spectral and spatial features has achieved success [
34,
35,
36,
37] in the recognition of different objects including handwritten words [
38], human faces [
39], buildings [
40], and so on. These researches prompted us to explore the water identification of SGLs using CNN due to its capacity for feature learning.
In this study, we aim to provide an efficient method of extracting SGLs using the established CNN framework and morphological and geometrical operations based on Landsat 8 images during the melt seasons of 2014–2018. Moreover, we assess the water identification results of the CNN by comparing them with those of the widely used threshold method (Otsu) and two other machine learning algorithms (RF and SVM). Otsu uses the global threshold for the water highlighting index NDWI to identify water. RF and SVM can learn the water characteristics and classification rules for water recognition based on the spectral information of sample pixels. The comparison results show that CNN has the best water identification ability of these four methods in this study area.
3. Methods
We first computed the mean image of each band for each year. Secondly, we selected samples based on 10-band composition mean images of B1–B8 from Landsat 8, the Normalized Difference Vegetation Index (NDVI), and NDWI because all 10 bands include the characteristics of water. Thirdly, the CNN model was trained, validated, and tested by all samples selected from these mean images. Finally, the images acquired from five consecutive years were put into the well-trained CNN model to identify water pixels. Then morphological and geometrical operations were applied to extract SGLs from the water.
We extracted SGLs through two main procedures. As shown in
Figure 4, the first step was to identify water pixels by training and testing the CNN model using the selected samples. The second step was to extract the object SGLs in five years based on some filtration conditions including the area and width of the water patches, and mathematical operations of geometry and morphology.
3.1. Water Identification Based on CNN
To benefit from the multiple spectral bands in Landsat 8 OLI sensors, many empirical indices have been used to effectively characterize land cover categories by a combination of a subset of Landsat spectral bands such as NDWI [
43] and modified NDWI (MNDWI) [
44]. Zhang et al. [
45] input Landsat 8 image patches with the size of 5 × 5 into the CNN to identify the built-up areas in different cities in China. This inspired us to design the networks as shown in
Figure 5, consisting of input layer, convolution layer, fully connected layer and output layer.
The B8 (with an original resolution of 15 m) of the Landsat 8 OLI images was resampled to 30-m resolution using the nearest neighborhood sampling. In order to make full use of the spectral information of the Landsat 8-OLI image, B1–B8 are stacked together with the NDWI images calculated by Equation (1) and the NDVI images calculated by Equation (2). Hence, the 10-band image was taken as input for the CNN. Due to the variability and inhomogeneity of the geometrical shape of surface meltwater, spatial information plays an important role in the accurate identification of water pixels. In the 30-m resolution image, we considered the size of every image patch as N × N (3, 5, 7, 9, and 11 have all been tested as the value of N). For each pixel, an N-neighborhood was considered, which means the size of image patch is N × N × 10. So, a 10-band image patch shown as N × N neighborhood centered on each sample point was inputted into the neural network. The optimal value of N was determined based on the accuracy of the training process. When the training process converged, the corresponding N with the highest accuracy would be the optimal value. There are two convolutional layers and two max-pooling layers that aim to extract spectral features and spatial features and higher-grade semantic features. There are three layers in the fully connected layer. In order to prevent overfitting, a batch-normalization layer and a dropout layer were added. The output layer consists of a softmax operator, which outputs water and background into two separate categories. In the whole network, we use the activation function called Rectified Linear Unit (ReLU) [
46] to solve the vanishing gradient problem for training epochs in the network.
For the CNN, we established three hyperparameters to better train the system: batch size, epoch and learning rate. The batch size was set to 128 and the epoch was set to 200. We used cross entropy as the loss function and stochastic gradient descent (SGD) as the optimizer to learn the model parameters. The initial learning rate of SGD was set to 0.001. When the learning rate was no less than 0.00001, the learning rate decreased exponentially with the increase in training epoch indices. The formula for the learning rate is shown in Equation (3). In this equation, 0.001 represents the initial learning rate,
lrate represents the learning rate after each iteration, and
n indicates the epoch of iteration.
The hyperparameters connecting the networks in CNN need to be determined during the training process. Here we used 42%, 18%, and 40% [
45] of all the 10,000 samples as the training set, validation set, and testing set, respectively. The training set is used to learn and adjust the weights and offsets in the neural network, while the validation set is used to verify that the hyperparameters are reliable. The training process stops after 200 epochs. Therefore, if the accuracies of training and validation sets are high (greater than 0.9) and close (the difference is smaller than 0.05) and the losses are low (smaller than 0.2), the network is successfully trained, which indicates that the model has converged.
3.2. Accuracy Assessment for Water Identification
The test samples were classified into two categories to get the predictive label. A confusion matrix shown in
Table 1 was then obtained according to the true label and the predictive label. The overall accuracy (OA), recall, and precision were calculated based on the confusion matrix. OA represents the proportion of correctly predictive samples in all samples. Recall indicates the percentage of predictive water samples in all true water samples. Precision indicates the percentage of true water samples in all predictive water samples. These three precision indices can be used to assess the accuracy of water area identification.
The
OA,
Recall, and
Precision are calculated using Equations (4)–(6), respectively:
3.3. The Extraction of SGLs from Water
SGLs’ geometric characteristics such as area, length, width, and shape are very important for SGLs extraction. Therefore, we considered extracting SGLs from the water identification results by using an algorithm for geometry and morphology. There are three conditions that should be taken into consideration in the process of extracting SGLs from water: (1) SGLs smaller than the minimum area threshold should be removed; (2) holes resulting from the floating ice inside the SGL should be filled and the outer contour should represent the size of the SGL; (3) narrow water patches that represent rivers should be removed. The flow chart shown in
Figure 6 illustrates how these problems have been solved.
First, we converted the water identification image to binary in order to proceed with the morphological and geometrical algorithm. Then, according to the minimum SGL size of two MODIS pixels (250 m) in previous research [
13,
29], water patches smaller than 200 pixels (~0.18 km
2, called tiny water patches) were removed. Next, we conducted opening and closing operations based on small rectangle structural elements. The following Equations (7)–(10) show the opening, closing, dilation, and erosion operations displayed by the symbols ○,●,⊕,⊙, respectively. The opening operation (shown in Equation (7)) means that erosion is done before the dilation to eliminate isolated water spots; it is used to delete edge fragmentation around water patches to form more rounded shapes. Contrarily, the closing operation (shown in Equation (8)) means the dilation is done before the erosion process to connect scattered water patches inside SGLs to fill small holes. After that, we conducted a dilation operation (shown in Equation (9)) based on a large rectangle structural element to connect major SGLs, and we conducted an erosion operation (shown in Equation (10)) based on a large ellipse structural element to make the SGLs more compact. The hole-filling process was iterated until no void appeared in each SGL. To distinguish narrow rivers from SGLs, water patches with a width of the minimum bounding rectangles less than 5 pixels (~150 m, called slender rivers) were regarded as narrow rivers [
47] and were removed from the final results. All of these processes were accomplished on a Python platform using two open-source libraries: Opencv and Scikit-image.
The opening operation of structural element B to A:
The closing operation of structural element B to A:
The dilation operation of structural element B to A:
The erosion operation of structural element B to A:
3.4. The Temporal–Spatial Change Detection of SGLs
The change degree of area in temporal scale is a quantitative description of the dynamic changing characteristics of SGLs over the study period [
48]. A dynamic variability model [
49], shown in Equation (11) is used to describe the rate of change of SGLs’ area within a certain time frame in the research area. In this formula,
Sa and
Sb represent the total area of SGLs in different year “a” (previous) and year “b” (next) respectively. K represents the interannual areal changing rate of SGLs. Positive and negative K values mean increasing and decreasing, respectively, the absolute value of K implies the speed of SGLs’ change:
We calculated the difference in lake area on pixel basis between two years. As shown in Equation (12),
Da and
Db represent SGLs’ spatial distribution in years “a” (previous) and “b” (next).
CH represents the result of change detection and has three possible values: 0 shows no change, 1 indicates it changes from background to SGL, and −1 indicates it changes from SGL to background.
3.5. Three Algorithms (Otsu, SVM, and RF) Used for Comparison with CNN
We use both unsupervised and supervised algorithms to make well-rounded comparison with CNN. Otsu [
50] is an unsupervised algorithm to determine the global threshold for image binary segmentation. It obtains the threshold when the interclass variance between the target and the background image is the largest. This is consistent with our aim to distinguish water from the background. NDWI can highlight the water information in the image with the value range of [−1, 1]. When the value is close to 1, the pixel is water, and when the value is near to −1, the pixel is non-water. Water and non-water can be distinguished by setting an appropriate threshold. The calculation of Otsu is simple and fast, but when the ratio between the target and background is large, the variance of classes displays multiple peaks and the segmentation effect can be inaccurate.
Besides unsupervised algorithms, supervised algorithms should also be included in the comparison. RF and SVM are both classical and widely used supervised machine learning algorithms with the high classification accuracy and efficiency. The main idea of SVM [
51] is to establish an optimal decision hyperplane to maximize the distance between the nearest two classes of samples on both sides of the plane. SVM has the advantage of low generalization error rate, but it has difficulty in solving multiple classification problems. The essence of RF [
52] belongs to a parallel ensemble classification algorithm, which is an improvement on the decision tree algorithm that merges multiple decision trees together; the establishment of each tree depends on the samples extracted independently. RF is computationally efficient and able to process images with multi-features. However, it could perform overfitting when there is obvious image noise. In this research, we use the two classifiers (svm.SVC and ensemble.RandomForestClassifier) provided by the Python sklearn module. Referring to [
45], and through theoretical analysis and experiment, the parameters of the two classifiers are shown in
Table 2.
6. Conclusions
In this study, we successfully extracted SGLs in southwest Greenland and found the dynamic change characteristics of SGLs from 2014 to 2018. The results show that our method has achieved 94% effectiveness of correctly identifying water, and the accuracy was higher than that achieved by Otsu based on the same samples. What is more, CNN has a better water outlining capability, and includes less noise and fewer omission errors than the other two widely used machine learning methods, RF and SVM. Therefore, CNN combined with the morphological and geometrical algorithms enabled us to extract SGLs automatically and effectively in this study area. The uncertainty of our results is mainly reflected in the total area because of the calculation of mean images during the entire melt season.
Our findings suggest that, in our study region, the total area of SGLs in 2015 (393.36 km2) represented an obvious increase compared to 2014, and kept decreasing after that, reaching a minimum of 158.73 km2 in 2018. There were 138 SGLs with the area of 23.71 km2 existing during all the five years. Seven hundred SGLs were observed in 2015, which was the highest quantity of the five years, while the largest individual lake developed in 2016. There were 175 new SGLs with a total area of 112.45 km2 in 2015, and 303 SGLs with a total area of 234.63 km2 disappeared from 2015 to 2018. In conclusion, the most active surface melting occurred in 2015 and the most limited in 2018.
Spatially, the new SGLs appearance peaked in 2015 in the southern area and 2016 in the northern area. Similarly, many SGLs disappeared in 2016 in the southern area and in 2017 in the northern area. In 2018, the change of SGLs was the most limited. Disappearing SGLs were mainly located in the southern area, and there were a few new SGLs in the northern area. Combining with DEM, we find that SGLs were most active in the area between 1000 and 1500 m, and SGLs in 2016 more appeared at higher elevations than in the other years.
If the spectral information is sufficient and the selected sample are representative and distributed evenly in the study area, the CNN model can be transferred and used in water identification in other regions. In further studies, the transferring and generalization abilities of our CNN model needs to be verified. What is more, we can calculate the volumes of meltwater by combining the SGLs’ area from our output and depth from physical algorithm, then assess the mass balance of GrIS. As it has been proven that SGLs form at higher altitudes as temperatures rise [
54], and lakes above 1600 m are likely to drain as rapidly as lakes located at lower elevations [
55], applying the method we developed in high-temporal-resolution images to explore the relationship between temporal–spatial changes of SGLs and drainage events or meteorological factors is a topic worth investigating.