**4. Methods**

#### *4.1. Deriving the Snow Depletion Curves from Snow Observations*

To help capture the range of observed SWE and SCF relationships, a histogram analysis is introduced that allows for reducing the dispersion of data and retaining most measurements within the analysis. A binned scatterplot method is used where SNOTEL SWE observations (along the *x*-axis) are grouped in to bins of equal length (e.g., 40 mm of SWE) and the corresponding MODIS SCF observations are averaged within the given bin. This approach allows for highlighting relationships that may otherwise be masked in a normal scatter plot (e.g., [46,47]). For each bin, a boxplot can be produced to reflect the spread in MODIS SCF values for that given bin and be represented by that bin's statistics (mean, median, minimum/maximum, etc.). Figure 2 presents the full scatter across the different SNOTEL SWE and MODIS SCF observation ranges and the SCF averages per 40 mm SWE bin for the WA and CO domains. For this study, the average SCF (arithmetic mean) of each SWE bin is used.

For the period of interest, WYs 2000–2010, two years were selected, WY2004 and 2005, for the validation period, since they represent a range of snow conditions for the two domains. For both CO and WA, WY2004 experienced slightly less than average annual SWE. However, for WY2005, CO experienced above normal SWE and WA had very low SWE amounts. The years used for the binning and SDC equations include WYs 2000–2003 and 2006–2010, which are considered the "training" dataset. The WYs 2004–2005 are only included for the main DA experiments and validation period (see Section 5).

Figure 2 shows the range of MODIS SCF values (*y*-axis) in relation to the SNOTEL SWE points (*x*-axis) as scatter points. The large range of observed SCF and SWE values is noted here as a representative of the different snowpack and cover conditions that can occur in mountainous regions, e.g., low SWE amounts with high SCF during the accumulation phase. MODIS SCF values of 0% are excluded, and zero-valued SWE values are not included in the 40 mm bins. Some bins are removed from the curves if less than five MODIS SCF measurements were in a bin or were considered extreme outliers, resulting in a total of four bins removed for WA region and one for the CO region. The WA region scatterplot points (red) reveal much higher SNOTEL SWE values (greater than 2500 mm) than the CO SWE points. Since the points are scattered and weighted in a logarithmic fashion, logarithmic functions were fitted to the points to estimate relationships between the values. The WA logarithmic-regression line (red) indicates less change in SCF as SWE increases than the CO regressed line (green). Regression equations and coefficient of determination, R2, values are highlighted in the lower right-hand corner boxes, where the red (green) box corresponds to WA (CO). The logarithmic equations and R<sup>2</sup> values shown here were generated within a Microsoft Excel Spreadsheet analysis. For the binned average values, logarithmic functions are linearly regressed onto them, producing high R<sup>2</sup> values: 0.822 and 0.931 for WA and CO, respectively. With binning the SWE observations and averaging the corresponding SCF values within each bin, much of the high data variability and spread is reduced with the binned averages, thus providing a greater "signal", as reflected with the *R*<sup>2</sup> values.

To show the robustness of this approach, other histogram tests were applied to the data to see if the relationships between the SWE bin and SCF average would change with different bin sizes or input data. First, an equal number of SNOTEL SWE observations were grouped per bin (1000 per bin) with SCF averaged across each new SWE bin. Another test was also applied where the SNOTEL observation sites were separated into two separate groups, thus two groups of 23 for WA (56 total sites) and 49 for CO (98 total) were set. These tests provided support for the use of the binned scatterplot approach in deriving the SDC-type relationships between the SNOTEL SWE and Terra MODIS SCF observations. For more details of these results, please see Section S1 of the Supplementary Material.

**Figure 2.** The binned scatterplot bin averages for Washington (WA) (large red squares) and Colarado (CO) (larger green triangles) imposed over all scatter plot points that are used in generating the bins. Logarithmic functions are fitted here to the binned snow cover fraction (SCF) averages with the red (green) line corresponding to the equation and *R*<sup>2</sup> value in the lower right-hand side red (green) box for WA (CO).

#### *4.2. Applying the New Snow Depletion Curves as Observation Operators*

In this section, the focus is on incorporating the logarithmic fitted functions, of the SWE-bin averaged SCF values, as new observation operators within the EnKF method. The observation operator used in the EnKF and other DA approaches, is typically represented with the symbol, **H** or **h**, which is used here as a vector function to transform the model SWE or snow depth states into the predicted SCF estimates, equivalent to the SCF observations. These two new annual-based, SDC-type observation operators for both the WA and CO regions are considered observation-based and referred to hereinafter as *obs*-**h** functions. We compare these new observation-based operators with the default, CLM2 model-based snow depletion curve, referred to hereinafter as the *model*-**h** observation operator. The 40 mm SWE binned SCF averages are plotted against predicted SCF values estimated with the CLM2 function, which is used as the default model-based SDC or observation-operator. The default CLM2 SDC function requires snow depth as the main input to calculate SCF. Thus, to compare it and its shape against the new observation-based SDC logarithmic functions on the same plot, we convert the 40 mm SWE bin values to snow depth, which requires a snow density estimate. Figure 3 compares these different SDC observation operators over the range of the 40 mm SWE-bin values, used as inputs to the model curve. For CLM2, three different bulk snow densities (100, 250, 400 kg m<sup>−</sup>3) were used to derive snow depth values from the SWE bin inputs to show the impact on the predicted SCF values in relation to the observation-based functions.

**Figure 3.** Comparison of the different SDC-observation operators (**h**), including the Community Land Model-based SDCs (CLM2 and CLM4-Niu and Yang (2007; NY07), and the observation-based observation operators (*obs*-**h**) for WA (maroon dashes) and CO (orange dashes). The 40 mm snow water equivalent (SWE) values are used in the CLM curves (*model*-**h**) to derive SCF estimates for this comparison. For the CLM2 curve, three different snow densities are used to show snow-depth to SWE (100, 250, 400 kg m<sup>−</sup>3) ranges in SCF estimates, and the NY07 curve for just the 250 kg m<sup>−</sup><sup>3</sup> snow density value is used (navy blue diamonds).

In this comparison, the CLM2 range of density-dependent curves tend to have a greater rate of change in SCF for the first 400 mm of SWE in relation to the range of observation-based points (for both WA and CO). These differences translate into how model predicted SWE is converted into predicted SCF for the EnKF updates. In addition to the default CLM2 SDC function, the CLM version 4 (CLM4) SDC function, developed by Niu and Yang (2007; referred here as NY07), is included in Figure 3, using its default settings that have been used in some global climate model simulations [17] and snow cover DA studies [13]. For the CLM4 curve, only the bulk snow density of 250 kg m<sup>−</sup><sup>3</sup> was used, and shown in Figure 3; however, observation-based snow densities have been estimated to be between 175 and 320 kg m<sup>−</sup><sup>3</sup> at coarser scales (at 1.0◦ × 1.0◦) by Niu and Yang [20]. For their data assimilation experiments, Su et al. [13] tuned the melt factor shape parameter and set the minimum snow density to 100 kg m<sup>−</sup>3. The main difference in the CLM4 equation and that used in CLM2 is that the CLM4 hyperbolic tangent function reaches a value of 1, whereas the CLM2 function only approaches 1. Other studies have used the NY07 SDC type formulation for snow cover DA experiments, and either tuned the parameters or used the default model formulation [16].

The CLM4 NY07 SDC function would predict very high SCF values, even overpredict, for higher resolutions applications, such as used in this study with a 0.01◦ grid size. The CLM4 SDC was originally tuned to coarser scale SCF and snow depth observations (e.g., 1.0◦ grid size), so the higher predicted SCF values, shown in Figure 3, would not be as applicable to our high-resolution snow cover data assimilation study. The observation-based SDC logarithmic functions do sugges<sup>t</sup> lower predicted SCF, with WA averages barely extending above 90% fractional snow coverage. Though it is argued here that the observation-based SDCs may reflect more realistic SCF estimates, these curves do not take in to account specific density changes, nor underestimation that can occur with the presence of dense forests and highly variable terrain, both major issues that impact MODIS SCF detection. The noticeable

differences between the WA and CO binned SCF averages is most likely a result of such detection issues in WA, where there are especially much thicker and darker forest canopies than found in CO.

Next, we apply these different observation operators with the EnKF method and added new options in LIS to handle the two new *obs*-**h** functions for WA and CO. These two functions are considered representative of annually based climatologies and are therefore considered static in nature, encompassing a range of snow conditions, similar to the CLM2 *model*-**h** curves. In Figure 2, the two logarithmic functions shown were derived using a minimum of five observations per histogram bin. To impose a slightly stricter constraint in generating these equations, a minimum of ten observations per SWE bin was applied in generating the final SDC-type observation operator equations used in the data assimilation step. The final two annual observation-based *obs*-**h** logarithmic equations are:

$$\text{WA: SCF}\_{\text{i}}^{\text{f}} = 5.7928 \,\,\text{\*}\,\text{ln(SWE}\_{\text{i}}^{\text{f}}) + 43.0622,\tag{1}$$

$$\text{'CO:SCF}\_{\text{l}}^{\text{f}} = 6.8359 \,\,\text{\*}\,\text{ln(SWE}\_{\text{l}}^{\text{f}}) + 45.8553,\tag{2}$$

where f denotes the model forecast and i denotes the ensemble member. Though the slope intercepts of these two logarithmic Equations (1) and (2), and what is shown in Figures 2 and 3, occur around 40 to 50% SCF when SWE is shown to be at the 0 value. However, when the model predicted SWE is at 0., the predicted SCF is also set to 0. Thus, when the predicted SWE goes above 0, Equation (1) or Equation (2), depending on the region, is applied as the observation operator in the DA innovation step, when comparing with the MODIS SCF estimate.

#### *4.3. Generating Temporal and Physiographic Varying Snow Depletion Curves*

Physiographic conditions are known to be a major control on larger spatial patterns of snow cover distribution, especially topography [2,3,48,49]. Also, snow cover patterns vary greatly throughout the year with extensive (lower) snow cover and low (high) snowpack depths in fall (spring), affected by snow accumulation (snowmelt) processes. As similarly derived for the annually representative observation-based SDC equations (described in the two previous sections), additional *obs*-**h** SDC relationships are derived and evaluated for different temporal and physiographic conditions, e.g., elevation bands and vegetation type, for the two regions, Washington and Colorado. The observation-based operators derived in the previous section will be referred to hereinafter as the *annual* based *obs*-**h** operators, to distinguish them from the ones that are derived here for the varying conditions.

Similar to the description in Section 4.1, the binned scatterplot approach is applied, but for varying elevation bands, vegetation types, and calendar months for both WA and CO. The SNOTEL SWE and MODIS SCF measurements are used in the same manner but binned for specific cases. The scatter-bins and logarithmic functions are generated for each case (see Supplementary Material for more information). For time of year, an *obs*-**h** curve is derived for each month, and separately for each region. The summer months were found in different calculations to average near zero, so no observed curves were fit for this season, and the CLM2 *model*-**h** curve is used instead when a MODIS SCF observation is present. For vegetation type, only the vegetation types associated with the MODIS-based UMD landcover map and collocated with a SNOTEL station (related to the equivalent gridcell location) are included. For WA and CO, five overlapping UMD classes resulted from the group of SNOTEL sites, so bins and curves were derived for those types. These five landcover types include evergreen needleleaf, mixed forest, woodland, open shrub and grassland. For CO, there was also one deciduous broadleaf case found collocated with a SNOTEL site. This was not included in the final DA experiment and the mixed forest SDC was applied at this location.

For elevation bands, elevation ranges were based on SNOTEL elevation values with near average mid-points of elevation taken as a cutoff value to distinguish higher versus lower elevation bands, as similarly done in Andreadis and Lettenmaier [10] and Su et al. [13]. For WA, the average midpoint elevation is about 1500 m and used to set the lower elevation range as 1000–1500 m and higher elevation range as 1501–2000 m. For CO, the two elevation band ranges are 2500–3000 m for the lower and 3001–3600 m for the upper band. These two bands represent the range of elevation conditions involving the SNOTEL sites, even though higher elevations extend above the highest SNOTEL station. Applying the scatterbin plot approach to each group of elevation points, two sets of logarithmic curves are generated like before for each region, one reflecting a lower and the other a higher elevation band. Table 1 presents an overview of the different model experiments and naming conventions used.

**Table 1.** A summary table of the different baseline and data assimilation-based experiments and the naming convention used for each experiment. Same experiments were conducted for both WA and CO domains over the water years, 2004 and 2005.


Additional details, figures, and formulas for the newly fitted curves for the different conditions and regions are provided in the Supplementary Material, Section S2, that accompany this paper. Please refer to that published material for how the vegetation type, elevation band, and 12-month binned and fitted SDC functions were derived.

#### *4.4. Estimating the Observational Standard Errors*

To estimate the total observational standard error, <sup>σ</sup>total, associated with each of these SDC-based observation operators, a method is followed that is similar to what is outlined in Arsenault et al. (2013). Using SNOTEL SWE and MODIS SCF measurements for WYs 2000–2003 and WYs 2006–2010, the new estimated <sup>σ</sup>total values for the *obs*-**h** are: 33.04% for WA, and 28.67% for CO. For the *model*-**h** estimated <sup>σ</sup>total values, they are 37.60% for WA, and 27.3% for CO. Previous studies have used a static 10% observational standard error [10,13,16], while others used a more dynamic or calibrated set of observational errors when assimilating MODIS snow cover [14,17]. It is hypothesized that by using the MODIS SCF observations themselves to derive new SDC formulas for the EnKF approach, should bring the model SCF predictions in line with the SCF observations. In a way, this is considered an observation bias reduction approach for the filter, by using observation operators that more closely match the observational averages.

For the different temporal and physiographic type observation-based operators, corresponding observation standard errors, σ**total**, were also estimated by using these new logarithmic curves along with the available SNOTEL SWE and MODIS SCF observations for the period, WYs 2000–2003 and 2006–2010, which is outside of the data assimilation experiment window. The new σ**total** values are calculated for the different monthly, vegetation and elevation conditions and for each region by inputting the SNOTEL SWE values in to the new logarithmic functions and then compared to the MODIS SCF estimates. This is again to capture both the errors in the SCF detection rates and newly derived observation operator functions. As shown in Figure 4, the results for both CO (blue bars on the right) and WA (red bars on the left) are compared side-by-side to examine the regional differences in the

estimated observation errors. CO-associated errors are lower overall than WA, especially during the winter months. When these observational errors are lower, again more dependence on the observations is given, and thus through the EnKF updates, the model will draw closer to the observations. Stronger responses to the SCF observations are then expected in CO than in WA region for the DA results.

**Figure 4.** A bar plot of the varying observation standard error values, <sup>σ</sup>**total**, (in % of SCF) for each given set of derived observation-based observation operators, *obs*-**h**, including annual-based, monthly, vegetation type and elevation band. Blue (red) bars indicate observation errors associated with CO (WA).
