**1. Introduction**

Snow depletion curves (SDC) define the relationship between changes in snow cover area (SCA) and the snow pack, which can impact, for example, snow-albedo feedback in global climate models [1] and the amount of water storage and melt for hydrological models [2,3]. SDCs typically involve functions that are fit or tuned to a given set of snow-based observations or theoretical conditions. Currently, many hydrological, land surface models (LSMs), and climate models use simple to complex schemes to define this snow depth-cover relationship, with many models still using very simple schemes, which do not account for even regional or temporal changes [4]. Some approaches to estimating snow depth-snow cover relationships involve statistical approaches, from using simple fitted functions to more shape and scale parameter-based gamma and beta distributions [4–8]. Much of the snowpack depletion in mountainous regions relates to late winter and early spring peak snow water equivalent (SWE) melt energy and radiation spatial variations (e.g., [2,9]), even though SWE can decrease without decreasing areal snow cover.

Several SDC schemes have been used to map the predicted SWE or snow depth states to snow cover fraction (SCF) (or vice versa) for different snow cover data assimilation (DA) approaches [10–12]. The SDC scheme can be used as the observation operator, which relates the model variables to the observations. In this case, the SDC acts as the observation operator and converts snow-based model estimates (e.g., SWE) to be in the same units and similar value range as the snow cover observation estimates (e.g., snow cover fraction), for calculating the innovation and assimilation update. The innovation step is simply the difference taken between model-generated snow cover and the observed snow cover estimates. Some studies have utilized the Ensemble Kalman Filter (EnKF) to assimilate snow cover fraction or area observations [10,13,14], as it has been shown to perform overall better than simpler methods, e.g., direct insertion [15]. EnKF relies on an ensemble of model forecasts generated using error covariances. Snow cover assimilation using the EnKF method has been applied at different scales, including global [16], continental [13], and regional [10,14,15,17]. Other ensemble-based snow cover DA studies have used the ensemble square-root filter [18] and the particle filter [19].

A variety of SDC functions have been used as the observation operator in these past snow-cover DA studies. Cumulative density functions (CDFs) of beta distributions for varying conditions have been applied regionally [10], or tuning of a hyperbolic tangent formulation, using shape parameters and a snow density parameter [20], has been applied also for different scales [13,16]. Other studies have used, for example, a two-parameter based lognormal probability distribution function, where snow cover area was represented as a summation of areal SWE and the SDCs were tuned by varying different coefficient of variation (CV) parameters for melt and accumulation periods [18]. Such SDC-based observation operator schemes have accounted for varying conditions, such as elevation, vegetation and accumulation and ablation phases [10,13,17]. Accounting for the accumulation and ablation phases, also referred to as the hysteresis characteristic of snow, can be a very important aspect of the SDC scheme, since different curves can represent these phases and are reflected in the curve parameters (e.g., [10,17]).

Previous studies that have assimilated satellite-based snow cover to improve model snow states and other hydrological variables (e.g., streamflow) have utilized snow observations to tune the snow depletion curves. Despite these previous efforts, many of the SDCs used in snow cover assimilation may have been considered theoretically simplistic [12,14] or tuned with very coarse spatial scale snow observations [20], and these SDCs may not be suitable enough for assimilating snow cover data for finer scale, mountainous regions. Also, what if the snow observations themselves were used to derive the snow depletion curves to be used as the observation operators for snow cover assimilation? One recent study used satellite-based snow cover and five snow depth stations to derive SDC functions, based on the hyperbolic tangent formulation, to derive such observation operators for a mountainous catchment in China [17]. Xu et al. [17] generated curves separately for each station and the two snow phases, accumulation and ablation. Assimilating snow cover into an LSM using these highly tuned observational based observation operators improved the modeled snow depth states at those few sites [17]. Another recent study, which focused on generating SDCs at different scales by using observed SWE and satellite-based snow cover area, showed how using such observational information can help estimate better SWE conditions (e.g., peak SWE) when 100% snow cover occurs (for an area or grid cell), given the location or area being represented [2].

In this study, we present a new approach to estimating snow depletion curves and their application for assimilating snow cover fraction observations, using an EnKF data assimilation approach and a land surface model with a multi-layer snow physics scheme. Building upon these recent studies, we use observed SWE and snow cover fraction estimates to derive new SDCs, using a larger array of observations, spanning two different mountainous regions in the United States (U.S.). We refer to these new SDC-type observation operators as "observation-based" and benchmark their skill against the default, model-based SDC function. These new SDC observation-based observation operators are hypothesized to improve the model-based SCF forecasts and snow state analysis. A secondary goal in applying this SDC approach is to see how accounting for varying vegetation, elevation, and temporal conditions may better capture heterogeneous features related to the snowpack and snow cover patterns when assimilating snow cover observations. Finally, the new SDC-based observation operators are used to derive the observational errors, which are used in the EnKF method. Accounting for different errors related to the varying conditions provides different weighting of the snow cover observations against that of the model in the EnKF innovation and update steps.

The primary research question is: How do these new observation-based, SDC-type operational operators perform relative to a default model-based scheme when assimilating snow cover estimates? The secondary research question is: How does accounting for different temporal and physiographic conditions in relation to the new observation-based SDC impact the assimilation of the snow cover estimates? We address these questions throughout the paper within the following sections. Section 2 provides an overview of the study regions, and Section 3 provides several details of the snow observations, the multi-layer snow model, and data assimilation method used in this study. Section 4 describes the method on how the new observation-based SDCs are derived and applied in the EnKF assimilation approach, and Section 5 presents the results of the new SDCs applied as the observation operator when assimilating the snow cover observations. Finally, we discuss the results, benefits and challenges of this new SDC approach in the Discussion and Conclusions section.

## **2. Study Area**

The study domains span the mountainous regions of Washington (WA) and Colorado (CO) states within the United States. The spatial grid is on an equirectangular projection with a resolution of 0.01 degree (~1 km) with bounding coordinates of the lower left corner (45.005◦ N, −121.995◦ W) and upper right corner (48.995◦ N, −116.995◦ W) for WA (400 latitudinal and 501 longitudinal points), and lower left corner (37.005◦ N, −108.995◦ W) and upper right corner (40.995◦ N, −102.005◦ W) for CO (400 latitudinal and 700 longitudinal points). The 0.01◦ gridcells are represented at the center of the gridcell, as reflected in the bounding coordinates. Figure 1 highlights the WA and CO study areas, depicting the areas' high topographic variability (e.g., mountainous and low-land areas). The 0.01◦ elevation maps are derived from the 30 m U.S. Geological Survey (USGS) National Elevation Dataset (NED; https://lta.cr.usgs.gov/NED). Also, these areas have a wide range of soil and vegetation types (e.g., coniferous forests, grasslands, agricultural, etc.). The regions were selected to capture not only the topographic variability, but also variability in surface snow amounts. Figure 1 also shows each region's associated in-situ U.S. Department of Agriculture's (USDA) Natural Resources Conservation Service (NRCS) SNOwpack TELemetry (SNOTEL) measurement networks used in this study. After additional quality-control checks, 56 SNOTEL stations are identified as being within the WA domain, and 98 SNOTEL stations are within the CO domain (represented by black triangles). These stations are used in deriving and the new SDCs, as described in Section 4.1.

**Figure 1.** Elevation maps of Washington and Colorado domains, from the U.S. Geological Survey (USGS) National Elevation Dataset (NED) dataset. All SNOwpack TELemetry (SNOTEL) site (black triangles) are used in generating the observation-based snow depletion curves (SDCs), and a reduced amount of site locations are used only in the model experiments and validation (grey open circles).

#### **3. Data and Model Sources**

#### *3.1. Observational Datasets Used*

The Terra MODIS Level 3, collection 5 (MOD10A1), daily 500 m snow cover fraction (SCF) product has been selected for the work presented here [21,22]. This product is derived from the MODIS Normalized Difference Snow Index (NDSI) to estimate an SCF value for each 500 m pixel [21, 23]) The MODIS snow cover products are obtained from the National Snow and Ice Data Center (NSIDC) in Boulder, Colorado, USA (http://nsidc.org/data/mod10a1) [24]. The 500 m MODIS snow cover estimates have been shown to be sufficient in high mountain terrain [25]. The daily 500 m products come in a sinusoidal projection, which are resampled with the nearest neighbor using the MODIS Reprojection Tool [26] to a 0.01◦ geographic coordinate system (GCS) for our model and DA experiments. The original 500 m sinusoidal projection dataset is used for deriving the observation-based SDCs and observational errors, which are further described in Section 4.

The main SWE observation dataset used in this study is the SNOTEL network dataset [27]. The daily SNOTEL SWE measurements are observed using snow pillow instruments. The SNOTEL network is sometimes one of the only datasets available in mountainous areas in the U.S. to validate model estimates [28], which provide important information, like streamflow, in key water supply areas in the mountainous West [29]. The SNOTEL data used in this study include Water Years 2000–2010, which overlap the Terra/MODIS data period selected. Note: A water year (WY) is used in many hydrological studies and starts on October 1 and goes to September 30 of the following year, spanning one full snow season cycle. The SNOTEL stations are point measurements, but applied here as a representative over the gridcell (e.g., 0.01◦ cells). New SDCs are derived from the above MODIS SCF and SNOTEL SWE observations (which is further described in Section 4).

#### *3.2. Land Surface Model Description and Setup*

The LSM used in this study is the Community Land Model, version 2.0 (CLM2) [30–32]. CLM2 is run in an off-line mode, uncoupled to any atmospheric or other modeling component, and driven within the Land Information System (LIS) framework [33]. The use of LIS allows us to take advantage of its many DA, observational, and model interface capabilities. At the time of this study, later versions of CLM were not available in LIS though CLM2 was already implemented and well tested. The model contains ten soil layers with varying thickness, and a five-layer snow scheme, which accounts for liquid, ice, and heat energy storages and changes within each layer. The water balance equations are mass conserving with inputs provided mainly by precipitation and outputs associated through evapotranspiration sources (e.g., vegetation and soil) and runoff. The snow parameterizations are adapted mostly from other studies [34–36]. The snow scheme's layers can grow and collapse, with the number of layers and a layer's thickness being dependent on snow depth. Any solid precipitation (i.e., snowfall) is automatically integrated with surface snow or soil layers, and any rainfall is incorporated after fluxes and temperatures are updated [32].

Snow cover fraction affects surface albedo, net shortwave radiation, and almost all subsequent model energy and temperature calculations. Despite being such an important model component, it is parameterized in CLM2 and other LSMs as a predicted variable, dependent mainly on total snow depth (m). Therefore, any knowledge of snow cover from the previous timestep is only retained via the depth of the snowpack. CLM2's SCF calculation employs a single, nonlinear formula that is a function of both the snow depth state and momentum of roughness parameter (m) for soil, which is set to a default value of 0.01 m. The snow fraction value is then used in the overall direct and diffuse beam ground albedo calculations and subsequent energy and moisture flux predictions. Comparison studies have shown CLM2 to perform well in different regions and settings, especially for snow [31,37,38]. Subsequent versions of CLM (3.5 and greater) have incorporated additional updates to the snow physics with varying improvement (e.g., [20,39]) and will be made available in future releases of LIS.

The meteorological forcing dataset used to drive the CLM2 simulations is from the North American Land Data Assimilation System (NLDAS) [40]. The University of Maryland (UMD) land classification dataset, derived from the Terra MODIS (collection 4) UMD land cover classification product is used [41]. The CLM2's plant function type classification scheme and parameters were mapped to the UMD classification. For the elevation map, the 1 km NED elevation dataset is used. For further details of the CLM2 model set up and its use for this study, please see Arsenault et al. [15].

#### *3.3. Data Assimilation Method and Experiment Setup*

To assimilate the MODIS snow cover fraction estimates into CLM2, the 1-D Ensemble Kalman Filter (EnKF) method is used in this study, as adapted from Reichle et al. [42,43], which is also part of the LIS DA framework [44]. The SWE and snow depth states are perturbed, along with forcing fields that more effectively drive snow dynamics (e.g., precipitation, shortwave radiation), to generate the ensemble of model forecast conditions with 12 members. The DA methods and parameters used in this study are further outlined and described in De Lannoy et al. [14] and Arsenault et al. [15]. The MODIS SCF observations are also perturbed, as done with the EnKF method. The Terra MODIS SCF product is assimilated into CLM2 close to its local overpass time near 10:00 am local overpass time. Additional details of the snow cover assimilation, observation operators, and observation errors are further described in Sections 4 and 5.

An open-loop (OL) simulation was generated, which is a baseline simulation where no observations are assimilated, and is used as the benchmark experiment to compare the assimilated runs against. The CLM2 OL and DA experiments include a spin-up time period from 30 September 1999, to 30 September 2003. For the OL and DA experiments, a slightly smaller number of stations were used, since CLM2 was found to build "glacier-like" points (i.e., snowpack does not completely melt off during the summer months), due to (1) a cold-bias in the NLDAS temperature field, especially when brought to finer elevation scales (e.g., 1 KM); and (2) an absence of blowing snow represented in the model, which can be a major snow removal process near mountain terrain peaks [45]. Eliminating these "glacier" type points from the DA and analysis restricts the evaluation of the new SDC-based observation operators to the more normally varying snow accumulation and melt grid points. The reduction in stations resulted in a final 40 stations for WA and 78 for CO, versus the original 56 and 98, respectively, used in deriving the observation-based SDCs. These final stations and corresponding gridcells are

highlighted with open grey circles in Figure 1, and they are used for the DA experiments and statistical results, shown in Section 5.
