**3. Results**

The results are presented according to the three objectives of the paper. The first part of the analysis considers a small dataset on which different supervised and automated classifiers are compared. The second part considers a ten-year dataset in which about 8000 images are processed using automated solutions. Finally, the FSC estimated by terrestrial photography is compared with the output obtained from remotely sensed data.

#### *3.1. Comparison between Supervised and Automated Classifiers*

This first part of the analysis includes two steps: the orthorectification of the panoramic view observed by the webcam, and the image classification based on the color components of an RGB color space. The first step applied a geometrical correction and produced a weighting mask, which all the considered classification algorithms subsequently used. The classification step was carried out on a small dataset of 30 images because the supervised methods (ML, MD, MA and PD) require user intervention to define the snow-covered ROIs. This is a strong limitation for the analysis of long time series, and it underlines the need for automated solutions: the BT and SS algorithms, for example, did not require any user decision. The results obtained by the BT method and the SS algorithm were preliminarily analyzed by computing the confusion matrix of each image and estimating the average overall accuracy, as reported in Table 1.
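
The role of the weighting mask can be illustrated with a minimal sketch. It assumes that the mask stores, for each pixel, the ground area it represents after orthorectification (zero outside the region of interest); the function name and the example values are hypothetical and this is not the authors' implementation.

```python
def fractional_snow_cover(snow_mask, weight_mask):
    """Return the FSC in percent as the area-weighted share of snow pixels.

    snow_mask   -- 2D list of 0/1 flags (1 = pixel classified as snow)
    weight_mask -- 2D list of per-pixel ground areas (assumed meaning of
                   the weighting mask; 0 for pixels outside the ROI)
    """
    total_area = 0.0
    snow_area = 0.0
    for snow_row, weight_row in zip(snow_mask, weight_mask):
        for is_snow, weight in zip(snow_row, weight_row):
            total_area += weight
            if is_snow:
                snow_area += weight
    if total_area == 0:
        raise ValueError("weight mask contains no valid pixels")
    return 100.0 * snow_area / total_area


# Two distant snow pixels (large projected area) and two close bare pixels.
snow = [[1, 1], [0, 0]]
weights = [[4.0, 4.0], [1.0, 1.0]]
print(fractional_snow_cover(snow, weights))  # 80.0, versus 50% by pixel count
```

In this toy case half of the pixels are snow, but they account for 80% of the projected surface, which is why a pixel-based accuracy and an area-based estimate can diverge.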

Considering only two cover classes (snow and non-snow pixels), the comparison between automated and supervised classifiers showed good general agreement, with an overall accuracy higher than 90%. Furthermore, SS performed better than BT, with an average accuracy about 1–2% higher in terms of pixel number. While BT reached full agreement with the supervised methods in 10% of images, SS matched the classifications obtained by the traditional approaches in more than 30% of images. The quality of the automated algorithms is confirmed by Cohen's kappa coefficient, which increases from 0.89 for BT to 0.93 for SS. Both averages indicated very good agreement between supervised and automated solutions, but they confirmed the better performance of the algorithm based on Spectral Similarity. Although these differences may seem limited, the contribution of 2000–5000 pixels (in a masked part of the camera image of 250,000 pixels) can be important in terms of surface, depending on the distance of each pixel. The projection of each pixel on the surface can increase considerably from closer to faraway pixels. From this perspective, the impact of omissions and false discoveries on the projected area could be higher than the overall accuracy in terms of pixels suggests, and it should be analyzed case by case.
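
As an illustration of the two agreement measures used above, the following minimal sketch derives the overall accuracy and Cohen's kappa from a two-class confusion matrix. The function names and the flattened 0/1 pixel lists are assumptions made for the example; the supervised classification plays the role of the reference.

```python
def confusion_counts(reference, predicted):
    """Count agreements and disagreements between two binary pixel lists."""
    tp = fp = fn = tn = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1          # snow in both maps
        elif pred:
            fp += 1          # false discovery: snow only in the automated map
        elif ref:
            fn += 1          # omission: snow only in the reference map
        else:
            tn += 1          # non-snow in both maps
    return tp, fp, fn, tn


def overall_accuracy(tp, fp, fn, tn):
    """Share of pixels on which the two classifications agree."""
    return (tp + tn) / (tp + fp + fn + tn)


def cohen_kappa(tp, fp, fn, tn):
    """Agreement corrected for the agreement expected by chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    if expected == 1.0:
        # Degenerate case (both maps contain a single identical class);
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # e.g., a supervised result
automated = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]   # e.g., an automated result
counts = confusion_counts(reference, automated)
print(counts, overall_accuracy(*counts), round(cohen_kappa(*counts), 3))
```

Averaging these per-image values over the 30 test images would give summary figures of the kind reported in Table 1.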

**Table 1.** Overall accuracy of automated algorithms, Blue Thresholding (BT) and Spectral Similarity (SS), versus supervised classifiers: the Mahalanobis distance (MA); the Maximum Likelihood (ML); the Minimum Distance (MD); and the Parallelepiped classifier (PD).


#### *3.2. Comparison between Automated Classifiers*

The comparison between the snow-covered areas estimated by the two automated algorithms (Figure 4a, and Figure S1 for one example) confirmed the tendency of BT to underestimate the snow extent compared to SS (see Table S1 for the raw data). The FSC estimated by the two methods differed slightly (the non-parametric Kruskal-Wallis chi-squared test indicated no statistically significant difference) and the Root Mean Squared Error (RMSE) was about 7.4%. The two FSC estimations were well correlated (R<sup>2</sup> close to 0.95), and the regression had a slope of 0.91 and an intercept of 11.5%.
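
The statistics quoted above can be reproduced for any pair of FSC series with a short, dependency-free sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the Kruskal-Wallis statistic is written for two groups with the usual tie correction, and its p-value uses the chi-squared approximation with one degree of freedom.

```python
import math


def rmse(x, y):
    """Root mean squared error between two equally long series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def linear_fit(x, y):
    """Least-squares fit of y on x; returns slope, intercept and R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


def kruskal_wallis_two_groups(x, y):
    """Kruskal-Wallis H and approximate p-value for two samples."""
    pooled = sorted((v, g) for g, group in enumerate((x, y)) for v in group)
    n = len(pooled)
    rank_sums = [0.0, 0.0]
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2      # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h = 12.0 / (n * (n + 1)) * (
        rank_sums[0] ** 2 / len(x) + rank_sums[1] ** 2 / len(y)
    ) - 3.0 * (n + 1)
    h = max(h / correction, 0.0)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return h, math.erfc(math.sqrt(h / 2.0))


# Hypothetical FSC values (percent) from two methods on the same images.
fsc_a = [10.0, 35.0, 60.0, 80.0, 95.0]
fsc_b = [18.0, 41.0, 68.0, 85.0, 97.0]
print(rmse(fsc_a, fsc_b), linear_fit(fsc_a, fsc_b), kruskal_wallis_two_groups(fsc_a, fsc_b))
```

With real data, a library routine such as `scipy.stats.kruskal` would normally be preferred over a hand-written test.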

**Figure 4.** Performance of Blue Thresholding (BT) algorithm versus the Spectral Similarity (SS) method considering only the test dataset (**a**). Comparison between the two methods considering the complete dataset (**b**).

Although the BT and SS estimations were almost consistent on the small dataset, the complete dataset highlighted an improved performance of SS (Figure 4b). The Kruskal-Wallis test indicated differences with a significance level higher than 99% and the RMSE was about 12%. The two FSC estimations showed a weaker correlation (R<sup>2</sup> close to 0.87) than on the small dataset, and the regression had a slope of 0.87 and an intercept of 14.5%. The snow-covered area detected by SS was generally larger than that obtained by BT, and in a few occurrences the snow cover was completely missed by BT (see Table S2 for the raw data). The points closer to the left axis were, in fact, situations where light conditions (low sun elevation or intense cloud cover) affected the BT output. Those illumination conditions were also important in additional cases, where BT underestimated the snow-covered area compared to SS.

#### *3.3. Comparison between FSC Estimations Obtained by Terrestrial Photography and Remote Sensing*

The comparison between satellite products and terrestrial photography retrievals focused on evaluating the relationship between the two data sources (see Table S3 for the raw data). We considered remotely sensed data with different spatial resolutions and data chains. A total of 189 Landsat images were available in the considered time range, but 55 images were discarded due to intense cloud coverage. Cloud-free MODIS values were obtained 2314 times out of 6556 overpasses within the studied period. Finally, 289 GlobSnow data points were available during the considered period. While Landsat and MODIS data were converted to FSC using the state-of-the-art relation described by [8], the GlobSnow product is ready to use, since it relies on the ground-truth support of the calibration sites identified in the images.
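
The conversion step can be sketched as follows, assuming that the relation is linear in the NDSI and that the result is clamped to the 0–100% range. The NDSI itself is the standard normalized difference of the green and shortwave-infrared reflectances; the slope and intercept passed below are hypothetical placeholders, not the coefficients of [8] or of Equation (2).

```python
def ndsi(green, swir):
    """Normalized Difference Snow Index from green and SWIR reflectance."""
    total = green + swir
    if total == 0:
        raise ValueError("green + swir must be non-zero")
    return (green - swir) / total


def ndsi_to_fsc(ndsi_value, slope, intercept):
    """Linear NDSI-to-FSC conversion in percent, clamped to [0, 100].

    slope and intercept are left as parameters on purpose: the actual
    coefficients must be taken from the relation adopted in the paper.
    """
    fsc = 100.0 * (intercept + slope * ndsi_value)
    return min(100.0, max(0.0, fsc))


# Placeholder coefficients, chosen only for illustration.
SLOPE, INTERCEPT = 1.5, 0.1
for green, swir in [(0.80, 0.10), (0.30, 0.20), (0.10, 0.30)]:
    index = ndsi(green, swir)
    print(round(index, 2), round(ndsi_to_fsc(index, SLOPE, INTERCEPT), 1))
```

Clamping is a design choice of this sketch: without it, a linear relation returns values below 0% for strongly negative NDSI and above 100% for high NDSI.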

The Landsat sensors provided 24 observations (Figure 5a), 10 of which were characterized by an NDSI higher than 0.6, indicating total snow coverage of the pixels. While two observations showed NDSI values coherent with the camera estimates (when snow cover was absent, the NDSI was negative), the intermediate values were slightly above the results expected from Equation (2) in 3 cases and substantially higher (overestimation of more than 30%) in 9 cases. While differences due to illumination could be addressed by defining a site-specific relation, large differences occurred when clouds cast a partial shadow on the ground during the satellite revisit. The non-parametric Kruskal-Wallis chi-squared test indicated differences with a significance level of 80%, the RMSE was about 21% and the correlation coefficient was 0.59.

The MODIS sensors provided 430 observations (Figure 5b), 205 of which were characterized by an NDSI higher than 0.6, indicating total snow coverage in terms of pixels. The intermediate values were, also in this case, generally above the expected results. A first group of 26 observations showed a camera FSC more than 30% higher than the expected NDSI-derived values; 33 observations were up to 30% higher; and in 15 cases the MODIS products did not detect any snow cover while the camera measured an FSC between 10% and 60%. All of these situations occurred when the cloud screening failed to identify partial cloud shadows on the ground during the satellite overpass. This comparison, in addition to the Landsat indications, showed negative estimations in eight cases. These estimations (more than 20%) were artifacts associated with incorrect cloud masking (there was no snow on the ground and the sky was fully overcast). The non-parametric Kruskal-Wallis chi-squared test indicated differences with a significance level of 99%, the RMSE was about 14% and the correlation coefficient was 0.91.
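
A grouping of this kind can be obtained by tallying the paired observations according to the difference between the camera FSC and the satellite-derived FSC. The sketch below is only indicative: the category names, the 5% tolerance used to call two estimates coherent, and the example pairs are hypothetical, while the 30% limit follows the text.

```python
from collections import Counter


def classify_pair(camera_fsc, satellite_fsc, tolerance=5.0, large=30.0):
    """Assign one camera/satellite FSC pair (percent) to a category."""
    difference = camera_fsc - satellite_fsc
    if satellite_fsc == 0 and camera_fsc > tolerance:
        return "snow_missed_by_satellite"
    if difference > large:
        return "camera_much_higher"
    if difference > tolerance:
        return "camera_higher"
    if difference < -tolerance:
        return "camera_lower"
    return "coherent"


def tally(pairs):
    """Count how many (camera, satellite) pairs fall in each category."""
    return Counter(classify_pair(camera, satellite) for camera, satellite in pairs)


# Hypothetical (camera FSC, satellite FSC) pairs in percent.
pairs = [(100, 100), (80, 40), (55, 35), (40, 0), (0, 25), (62, 60)]
print(tally(pairs))
```

Listing the dates that fall in the non-coherent categories is one way to check them against cloud and cloud-shadow conditions case by case.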

Finally, the GlobSnow SE product provided 62 observations (Figure 5c). The estimated output was coherent with the camera observations 57 times (with full snow coverage at the ground), whereas the GlobSnow product failed to detect the snow cover 5 times. The non-parametric Kruskal-Wallis chi-squared test indicated differences with a significance level of 99%, the RMSE was about 18% and the correlation coefficient was 0.84. From a statistical point of view, all the satellite products showed significant differences compared to the camera-based estimations, even if the correlation was good. This observation is influenced, of course, by the number of outliers included in the available dataset composed of the different satellite revisits, which depends mostly on cloud screening.

**Figure 5.** Comparison between Fractional Snow Cover estimations obtained by terrestrial photography and remote sensing. Plots refer to Landsat (**a**), MODIS (**b**) and GlobSnow (**c**).
