#### *3.2. Discriminator*

The inputs of the discriminator are the binary masks generated in the previous stage and the true labels of glacial lakes. First, lake information is enhanced by taking the element-wise product of the input masks and the Landsat imagery. ResNet-152 is used as the backbone to extract features from the resulting product. The corresponding output is a 2048-dimensional feature vector of a glacial lake; this is then processed by two fully convolutional layers and fed into a single sigmoid layer to determine whether each pixel is that of a glacial lake.
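The masking step at the start of the discriminator can be illustrated with a minimal sketch. It is a toy under stated assumptions: the image is a nested list of shape (bands, height, width), the names `mask_image` and `sigmoid` are hypothetical, and the ResNet-152 backbone and the layers that follow it are omitted entirely.

```python
import math

def mask_image(image, mask):
    """Element-wise product of a multi-band image with a binary mask.

    image: list of bands, each a list of rows (bands x H x W)
    mask:  list of rows (H x W) holding 0 or 1
    """
    return [
        [[pixel * m for pixel, m in zip(row, mask_row)]
         for row, mask_row in zip(band, mask)]
        for band in image
    ]

def sigmoid(z):
    """Logistic function mapping a raw score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Toy 2-band, 2x2 image and a binary lake mask (illustrative values only).
image = [
    [[0.2, 0.8], [0.5, 0.1]],
    [[0.3, 0.7], [0.4, 0.9]],
]
mask = [[1, 0], [0, 1]]
masked = mask_image(image, mask)
```

Pixels outside the mask are zeroed in every band, so only the candidate lake regions contribute to the features extracted afterwards.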

#### *3.3. Loss Function*

A GAN defines a competitive game between a generator and a discriminator, and the final stable state of this game is evaluated by an adversarial loss function, as follows:

$$\begin{aligned} \min_{G} \max_{D} \; & E_{M \sim P_{\text{label}}} \left[ \log(D(M)) \right] \\ & + E_{I \sim p_{\text{label}}} \left[ \log(1 - D(G(I))) \right] \end{aligned} \tag{4}$$

where *G* and *D* are the generator and discriminator, respectively, *I* is the input Landsat image, and *M* is the input mask.
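As a rough illustration of how the value in Eq. (4) behaves, the sketch below estimates it from a handful of discriminator outputs. The function name `adversarial_value` and the sample probabilities are hypothetical; the expectations are approximated by simple sample means.

```python
import math

def adversarial_value(d_real, d_fake):
    """Sample-mean estimate of Eq. (4).

    d_real: discriminator outputs D(M) on label masks, each in (0, 1)
    d_fake: discriminator outputs D(G(I)) on generated masks, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from generated masks well ...
confident = adversarial_value([0.9, 0.8], [0.1, 0.2])
# ... versus one that cannot tell them apart and outputs 0.5 everywhere.
undecided = adversarial_value([0.5, 0.5], [0.5, 0.5])
```

The discriminator tries to increase this value and the generator tries to decrease it. When the discriminator outputs 0.5 for every input, the value equals 2 log 0.5.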

However, training the GAN model directly with this loss function is unstable because it may lead to mode collapse or convergence failure [38]. We therefore employ the loss function of WGAN-GP, which imposes a Lipschitz constraint by penalizing the norm of the gradient of the discriminator output with respect to the input binary masks. The penalty term is defined as follows:

$$E_{\hat{x} \sim P_{\hat{x}}}\left[\left(\left\|\nabla_{\hat{x}} D(\hat{x})\right\|_{2}-1\right)^{2}\right] \tag{5}$$

where *P<sub>x̂</sub>* is the distribution obtained by sampling uniformly along straight lines between pairs of points drawn from the label distribution *P<sub>label</sub>* and the lake distribution *P<sub>lake</sub>*.
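The penalty in Eq. (5) can be sketched in a few lines. This is a minimal illustration, not the training code: the critic is a hypothetical linear function, samples are short flat lists rather than masks, and the gradient is approximated by central finite differences, whereas a real implementation would normally rely on automatic differentiation.

```python
import math
import random

def interpolate(real, fake, eps):
    """Point on the straight line between a real and a generated sample."""
    return [eps * r + (1.0 - eps) * f for r, f in zip(real, fake)]

def numerical_gradient(critic, x, h=1e-6):
    """Central finite-difference approximation of the gradient of critic at x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((critic(up) - critic(down)) / (2.0 * h))
    return grad

def gradient_penalty(critic, real_batch, fake_batch, rng):
    """Sample mean of (||grad D(x_hat)||_2 - 1)^2 over interpolated points."""
    total = 0.0
    for real, fake in zip(real_batch, fake_batch):
        x_hat = interpolate(real, fake, rng.random())  # eps ~ U[0, 1)
        grad = numerical_gradient(critic, x_hat)
        norm = math.sqrt(sum(g * g for g in grad))
        total += (norm - 1.0) ** 2
    return total / len(real_batch)

def toy_critic(x):
    """Hypothetical linear critic; its gradient is (3, 4) everywhere."""
    return 3.0 * x[0] + 4.0 * x[1]

rng = random.Random(0)
real_batch = [[1.0, 0.0], [0.0, 1.0]]
fake_batch = [[0.2, 0.3], [0.6, 0.1]]
penalty = gradient_penalty(toy_critic, real_batch, fake_batch, rng)
```

For this toy critic the gradient norm is 5 at every point, so the penalty is (5 - 1)^2 = 16 regardless of where the interpolated samples fall. A critic whose gradient norm is 1 would incur no penalty.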

To verify whether the glacial lake information can be effectively discriminated, we use an L2 loss function as the content loss in the discriminator. It measures the similarity between the image features derived from the generated masks and those derived from the ground truth, as follows:

$$l_{\text{content}}(G) = \frac{1}{N} \sum_{i=1}^{N} \left\| G(I)_i - B_i \right\|^2 \tag{6}$$

where *B* is the ground-truth binary mask. Finally, combining the WGAN-GP adversarial loss and the content loss, our loss function can be expressed as:

$$l(G, D) = l_{\text{adversarial}}(G, D) + l_{\text{content}}(G) \tag{7}$$
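Eqs. (6) and (7) can be sketched as follows. The text does not define *N*, so the sketch assumes it is the number of mask samples, each flattened to a list; the function names and the adversarial value of 0.25 are hypothetical placeholders.

```python
def content_loss(generated, truth):
    """Mean squared L2 distance between generated and ground-truth masks (Eq. 6).

    generated, truth: lists of N flattened masks of equal length
    (N as the number of samples is an assumption of this sketch).
    """
    n = len(generated)
    total = 0.0
    for g, b in zip(generated, truth):
        total += sum((gi - bi) ** 2 for gi, bi in zip(g, b))
    return total / n

def total_loss(adversarial, content):
    """Unweighted sum of the adversarial and content terms (Eq. 7)."""
    return adversarial + content

# Two flattened 3-pixel masks; each differs from its ground truth in one pixel.
generated = [[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
truth = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
l_content = content_loss(generated, truth)

# Placeholder adversarial loss value, for illustration only.
l_total = total_loss(0.25, l_content)
```

Each sample contributes a squared distance of 1, so the content loss is 1.0 and the combined loss with the placeholder adversarial term is 1.25.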
