*3.1. Summary*

Our methods were designed to address two objectives. The first objective was to determine which would be more accurate, ET predictions from a watershed model calibrated to streamgage (ETwm) or ET predictions from a watershed model of the same area calibrated to ETrs product (rather than streamgage). We did this by comparing the ET accuracy of two models: (1) MODIS MOD16, (2) SWAT calibrated to streamgage. Model accuracies were evaluated relative to ET observations at three eddy-covariance flux towers (Figure 1). If ETrs is more accurate than ETwm, then it follows that calibrating the watershed model to that remote-sensing ET product could potentially improve the watershed model's ET accuracy over that of a purely stream-based calibration approach. Conversely, if ETrs is less accurate than ETwm, then calibrating the watershed model to those ETrs would only degrade the watershed model's ET accuracy relative to a stream-based calibration approach. ET observations and simulations were considered on a monthly basis during water years 2009–2018, with a water year defined to extend from October 1 through September 30. For metrics of ET model accuracy, we used the Nash-Sutcliffe efficiency (NSE) [47] which equals the fraction of variance in observations explained by a model, and model percent-bias in average ET (PBIAS) [48]. We also evaluated how well the two models producing ETrs and ETwm captured the relationship between average ET and elevation (1160–2700 m). Metrics of NSE and PBIAS were reported for the period of all water years during 2009–2018, water years with annual precipitation below the 2001–2019 median of 803 mm ("dry years") based on PRISM AN81d product [34,35], and water years with annual precipitation at or above that median.

The second objective of this study was to compare and contrast ET from flux towers and models in a way that identifies seasonal conditions associated with model strengths and weaknesses. We started this out with an examination of monthly time series plots. Next, we compared and contrasted ET "seasonality", defined as the set of monthly ET averages during the calendar year. Lastly, we examined relationships between monthly ET and weather variables of air temperature and vapor pressure deficit (Section 3.2), variables which influence atmospheric water demand and are also used to limit canopy conductance in the ETrs model (Section 3.4).

#### *3.2. Weather Data*

Weather data for watershed modeling and examination of ET-weather relationships were obtained from daily 2.5-arcminute PRISM AN81d product [34,35,49] and GridMET product [50]. Air temperature and vapor pressure deficit were computed from PRISM values of minimum air temperature, maximum air temperature, and dewpoint temperature, along with the equation for saturation vapor pressure in Murray [51]. Downward solar radiation and wind speed were obtained from GridMET. Daily weather time series

were assembled for each subbasin of the watershed (N = 47; Figure 1, lower right) by downsampling the gridded weather data to approximately 250-m resolution, masking the downsampled data to subbasin boundaries, and computing spatial averages (Supplement section S1). These weather time series were input to ArcSWAT to force spatial elements of the watershed model referred to as Hydrologic Response Units (HRUs) (Section 3.5). The weather data used for the examination of ET-weather relationships were taken from the model HRUs nearest to the flux towers as shown in Figure 1.

#### *3.3. Flux Tower Observations of ET*

The ET observations used in this study were collected at three flux towers (Figure 1) situated along an environmental gradient ranging from rain-dominated Ponderosa pine at 1160 m to snow-dominated Lodgepole pine at 2700 m [28] (Table 1). The observations were collected during the day at 30-minute intervals using the eddy covariance method [52], at a height of 5–10 m above the tallest trees [28]. We first obtained the 30-minute ET observations and "Flexible Filler" processing script (Matlab) from the Goulden Lab [45]. Using the Flexible Filler script, we filtered out 30-minute data during calm atmospheric conditions and filled the resulting gaps using the regression approach in Goulden et al. [28,52]. The filtered, gap-filled data were then aggregated to monthly values, requiring at least 50% data availability for each month. Dates of data availability slightly differed for each flux tower, as shown in Table 1.

**Table 1.** Site characteristics of eddy covariance flux towers providing ET observations for this study [28]. These sites are mapped in Figure 1. Water years of data used for this study are listed at right. Years with partial data availability, indicated with asterisk, provided less than 11 out of 12 months' worth of data.


#### *3.4. Remote Sensing of ET*

The ETrs product used in this study was the 8-day, 0.25-arcminute global MOD16A2 product from MODIS sensors onboard the Terra and Aqua satellites [11,12,53]. The model behind this product uses a modifed Penman-Monteith approach adapted from Cleugh et al. [6] to predict evaporation from wet and dry soil, evaporation from wet canopy, and transpiration from dry canopy [12,54]. The model operates at a daily time step using meteorology from NASA's Global Modeling and Assimilation Office (GMAO) reanalysis dataset (1.0 × 1.25◦ resolution), 8-day composites of absorbed photosynthetically active radiation (FPAR) and leaf area index (LAI), and 16-day composites of albedo. Limitations on ET associated with water availability and stomatal physiology (e.g., Jarvis [55]) are modeled in the MOD16A2 product using canopy conductance correction factors consisting of linear functions of vapor pressure deficit and minimum air temperature [11,12,54]. These functions are parameterized by biome-specific look-up tables organized by land cover classification (Table 3.2 of Running et al. [54]).

There was some uncertainty as to how ET in individual MODIS cells, each approximately 500-m across, would correspond to flux tower observations which collect fluxes from areas (footprints) typically measuring 100–2000 m across [56]. For the comparisons to ET from SWAT and flux towers, and in consideration of plausible intra-footprint variability in MODIS data (see Supplement Figure S3), we decided to obtain the MODIS ET data from the single MODIS cells containing the flux tower locations (Figure 1). This decision followed from the results in Supplement Figure S3 showing that MODIS intra-footprint

variability tends to be substantially less than same-cell differences between MODIS and flux tower. We filtered out any MOD16A2 data having QA/QC flags not of "good quality" and not free of "significant" cloud cover, then aggregated it to monthly values using time-weighted averaging and requiring at least 50% data availability for each month.

#### *3.5. Watershed Modeling of ET*

We modeled ET in the upper Kings River using the Soil & Water Assessment Tool (SWAT) version 2012, with ArcSWAT version 2012 for model construction. SWAT dates back to the late 1970s for research into land managemen<sup>t</sup> impacts on water, sediment, nutrients, and pesticides [57–59]. It has been widely used to simulate hydrologic fluxes in snowinfluenced watersheds such as this study area [60–65]. In SWAT, the water conservation equation is solved at a daily time step in HRUs, each of which is a numerical 1-D element ("bucket") of homogeneous land cover, soil, and slope. Each HRU exchanges water/energy with the atmosphere (including ET) and delivers runoff directly to the stream reach within enclosing subbasin (subbasins shown in Figure 1, lower right). We selected the Penman-Monteith option for potential ET; actual ET was computed based on potential ET and water availability in soil and underlying aquifer [66].

To construct the watershed model, we first built a "base model" in ArcSWAT using the parameters and input datasets for elevation, soil, land cover, meteorology, and snow cover detailed in Supplement section S1. We then initialized the base model with LAI and biomass from the end of a 30-year spin-up forced by dynamic steady-state weather (Supplement section S2), forming the "plant spin-up" model representing mature forest conditions at the start of simulations. Next, we applied a calibration procedure to the plant spin-up model as follows. First, we used a global sensitivity analysis (GSA) to identify the most influential parameters of the plant spin-up model to estimate via calibration. Using the Sobol method of GSA [67], we identified the influential parameters (N = 12) accounting for 99% of the variance in model streamflow error (Supplement section S3). Next, we calibrated the plant spin-up model by estimating values of influential parameters that minimized model errors relative to 2003–2010 monthly, full-natural streamflow at the watershed outlet [36], as described in Supplement section S4. We used for calibration the Sequential Uncertainty Fitting algorithm (SUFI-2) calibration and uncertainty tool of the SWAT-CUP software package [68–70]. Lastly, we validated the calibrated model to 2011–2019 monthly, full-natural streamflow using the parameter ranges obtained from calibration (Supplement Table S6). Simulations for calibration and validation were spun up to a total of 13 years of weather data, consisting of 10 years of dynamic steady-state weather (for aquifer equilibration) followed by three years of real weather (for soil and plant equilibration) (Supplement sections S2 and S4). In terms of model performance, the calibrated and validated models both showed "very good" results (Figure 2) based on criteria in Moriasi et al. [48] and Abbaspour et al. [68]. For the comparisons to flux tower and MODIS, we used monthly ET of the "best" simulation (Figure 2) taken from the HRUs nearest to the flux towers (Figure 1).

**Figure 2.** Monthly stream discharge at Pine Flat Dam from observations versus best simulation of SWAT watershed model. 95% simulation probability also shown, defined at the 2.5– to 97.5–percentile of monthly simulations. Fit statistics are for "best simulation" having the highest objective function, i.e., the Kling-Gupta efficiency (KGE) [71]. NSE = Nash-Sutcliffe efficiency [47], PBIAS = model percent bias [48].
