*2.1. Potential for Remote Sensing as a Tool to Collect Data on the Subwatershed Scale*

We used the mean annual fraction of cloudy days from the EarthEnv global cloud cover model dataset [28] as a surrogate for how likely it is to acquire remote sensing imagery of a 1 km<sup>2</sup> scene and how hard it is to process that imagery for use (Figure 2a). This dataset provides mean conditions at 1 km resolution for the period 2001 to 2014. We used the global map of minimum travel time from a location to a population center (defined as a city or township with more than 50,000 people) produced by the Malaria Atlas Project [29] as a surrogate for the accessibility of a given 1 km<sup>2</sup> grid cell (Figure 2b). This raster map is the output of a geospatial travel time model that takes as input existing road and rail networks, land cover, and 30 m resolution digital terrain models of the entire globe. Together, these two datasets indicate, respectively, how likely it is to acquire a scene remotely and how easy it is to ground-truth reflectances in order to link the remotely sensed data to land cover characteristics and water quality impairments. The tacit assumption is that stakeholders tasked with managing the watershed can reach any major population center and then travel from there to perform in-situ sampling. To link these datasets to the hydrological units of interest, we used the HUC-12 layer of the USGS National Hydrography Dataset (NHDplus) [3].
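Linking gridded values to subwatershed units amounts to a zonal aggregation. The following is a minimal sketch of that step, assuming the two rasters and a raster of subwatershed identifiers are already co-registered on the same grid. It takes the per-zone maximum of travel time and the per-zone mean of cloud fraction, the two statistics used in this section. The function name `zonal_max_and_mean`, the array names, and the toy values (travel time in hours) are hypothetical and are not taken from the original workflow.

```python
import numpy as np

def zonal_max_and_mean(zones, travel_time, cloud_fraction, n_zones):
    """Aggregate two co-registered rasters to zones.

    zones          : int array, zone id (0..n_zones-1) of every pixel
    travel_time    : float array, same shape, travel time per pixel
    cloud_fraction : float array, same shape, cloudy-day fraction per pixel
    Returns (max travel time per zone, mean cloud fraction per zone).
    Assumes every zone id in 0..n_zones-1 occurs at least once.
    """
    z = zones.ravel()
    # Per-zone maximum: start at -inf and keep a running maximum per zone id.
    tau_max = np.full(n_zones, -np.inf)
    np.maximum.at(tau_max, z, travel_time.ravel())
    # Per-zone mean: sum of values divided by pixel count.
    sums = np.bincount(z, weights=cloud_fraction.ravel(), minlength=n_zones)
    counts = np.bincount(z, minlength=n_zones)
    mu_cloud = sums / counts
    return tau_max, mu_cloud

# Toy 2x3 grid with two zones (hypothetical values).
zones = np.array([[0, 0, 1],
                  [0, 1, 1]])
travel_time = np.array([[0.5, 2.0, 7.0],
                        [1.0, 3.0, 4.0]])
cloud_fraction = np.array([[0.2, 0.4, 0.8],
                           [0.3, 0.6, 0.7]])
tau_max, mu_cloud = zonal_max_and_mean(zones, travel_time, cloud_fraction, 2)
```

In practice the zone raster would come from rasterizing the HUC-12 polygons onto the 1 km grid; that step is omitted here.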

**Figure 2.** Datasets used to model potential for remote sensing in watershed monitoring: (**a**) mean annual fraction of cloudy days, and (**b**) maximum shortest travel time to population center within each HUC-12 subwatershed. These quantities are surrogates for acquisition and accessibility of remotely sensed scenes and in-situ data needed to ground-truth them.

We modeled the potential for using remote sensing for monitoring and data collection within each subwatershed as the product of categorical levels of $\tau_{\text{max}}^{\text{Shortest}}$, the maximum of the shortest time to reach a population center from anywhere within the subwatershed, and categorical levels of $\mu_{\text{Cloud Cover}}$, the mean annual fraction of cloudy days within the subwatershed. Within each subwatershed, we estimated $\tau_{\text{max}}^{\text{Shortest}}$ as the maximum, over all pixels within the area, of the shortest time to reach a population center. We estimated $\mu_{\text{Cloud Cover}}$ as the mean, over all pixels within the area, of the mean annual fraction of cloudy days. Thus, for the $i$th subwatershed, we have

$$P_{\text{Remote Sensing},i} = \mathbb{C}\left(\tau_{\text{max},i}^{\text{Shortest}}\right) \times \mathbb{C}\left(\mu_{\text{Cloud Cover},i}\right) \tag{1}$$

Here, the function $\mathbb{C}(\cdot)$ assigns a category from 0 to 3 to both the acquisition ($\mu_{\text{Cloud Cover}}$) and access ($\tau_{\text{max}}^{\text{Shortest}}$) variables according to Table 3, where 0 indicates a low potential for success and 3 indicates the highest potential for success. Because each factor is an integer from 0 to 3, the product can only take the values 0, 1, 2, 3, 4, 6, and 9. The values of $P_{\text{Remote Sensing}}$ obtained in this way are binned into four categories for ease of application: 0 (unsuitable), 1–3 (low), 4–6 (good), and 9 (excellent). We note that the numeric values of $P_{\text{Remote Sensing}}$ are meaningful only through the category they fall into: "unsuitable," "low," "good," or "excellent."
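The product and binning described above can be sketched in a few lines. This is an illustrative sketch only: it takes the two category scores as already assigned, so it does not depend on the thresholds in Table 3. The function names `remote_sensing_potential` and `potential_class` are hypothetical.

```python
def remote_sensing_potential(c_access, c_acquisition):
    """Product of two category scores, each an integer from 0 to 3."""
    for c in (c_access, c_acquisition):
        if c not in (0, 1, 2, 3):
            raise ValueError("category scores must be integers from 0 to 3")
    return c_access * c_acquisition

def potential_class(p):
    """Bin the product into the four classes named in the text."""
    if p == 0:
        return "unsuitable"
    if 1 <= p <= 3:
        return "low"
    if 4 <= p <= 6:
        return "good"
    if p == 9:
        return "excellent"
    raise ValueError(f"{p} falls outside the defined bins")

# Enumerate every combination of the two scores.
table = {(a, b): potential_class(remote_sensing_potential(a, b))
         for a in range(4) for b in range(4)}
products = sorted({a * b for a in range(4) for b in range(4)})
```

Enumerating all sixteen score pairs shows that a subwatershed is rated "excellent" only when both scores are 3, and "unsuitable" whenever either score is 0.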


**Table 3.** Cost-payoff matrix for modeling potential of remote sensing for watershed monitoring.

Stakeholders in the watershed management domain may weigh differently the costs of traveling to remote locations for in-situ ground-truthing of remote sensing data and of carefully scheduling in-situ data collection to coincide with clear-sky days. We therefore represented these weights in cost-payoff matrices that set the categorical values of the acquisition and access variables (Table 3). These weights effectively lower the $\tau_{\text{max}}^{\text{Shortest}}$ and $\mu_{\text{Cloud Cover}}$ threshold values for the accessibility and acquisition criteria, respectively, across the different categories, going from conservative to normal to optimistic estimates of the cost-payoff relationships (Table 3).

For example, one regulatory agency may be willing to send an engineer on a six-hour journey to collect water quality data. Any value of $\tau_{\text{max},i}^{\text{Shortest}}$ smaller than six hours for a subwatershed within the jurisdiction of this agency would then get a $\mathbb{C}\left(\tau_{\text{max}}^{\text{Shortest}}\right)$ value of 3. Another agency may decide that it does not want its engineers to spend more than one hour traveling to a remote site. In this case, $\mathbb{C}\left(\tau_{\text{max}}^{\text{Shortest}}\right)$ for any $\tau_{\text{max}}^{\text{Shortest}}$ larger than one hour would be 2 or lower. Similarly, a consultant may be willing to invest the effort in building robust processing pipelines to deal with largely cloudy scenes, or to invest time and resources in the data fusion-based cloud removal algorithms that are becoming more popular across various platforms [30,31]. This consultant might assign a $\mathbb{C}(\mu_{\text{Cloud Cover}})$ value of 2 or higher to a subwatershed where the sky is likely to be cloudy for almost 75% of the days in the year. Another consultant, geared toward directly using analysis-ready products, may assign a $\mathbb{C}(\mu_{\text{Cloud Cover}})$ value of 3 to a subwatershed only if the sky is likely to be clear for 75% of the days in the year. By presenting maps of the categorical levels of the potential across a range of $\mathbb{C}(\cdot)$ values, we allow decision-makers to choose the cost-payoff level they are comfortable with and plan their workflows accordingly.
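The effect of different cost-payoff levels can be illustrated with a small sketch. All threshold values below are hypothetical and chosen only to be consistent with the examples in the preceding paragraph (a six-hour versus a one-hour travel tolerance, and a 75% cloudy versus a 75% clear requirement). They are not the values of Table 3. The scenario labels "tolerant" and "strict", the function name `categorize`, and the example subwatershed values are likewise assumptions.

```python
def categorize(value, limits):
    """Score a 'smaller is better' variable from 0 to 3.

    limits = (t3, t2, t1) with t3 <= t2 <= t1: values up to t3 score 3,
    up to t2 score 2, up to t1 score 1, and anything larger scores 0.
    """
    for score, limit in zip((3, 2, 1), limits):
        if value <= limit:
            return score
    return 0

# Hypothetical threshold sets: (travel-time limits in hours,
# cloudy-day-fraction limits). Not taken from Table 3.
scenarios = {
    "tolerant": ((6.0, 12.0, 24.0), (0.50, 0.75, 0.90)),
    "strict":   ((1.0, 3.0, 6.0),   (0.25, 0.50, 0.75)),
}

# One hypothetical subwatershed: 4 hours from the nearest population
# center at its most remote pixel, cloudy on 60% of days on average.
tau_max_hours = 4.0
mu_cloud = 0.60

scores = {}
for name, (access_limits, cloud_limits) in scenarios.items():
    c_access = categorize(tau_max_hours, access_limits)
    c_cloud = categorize(mu_cloud, cloud_limits)
    # Store (access score, acquisition score, product).
    scores[name] = (c_access, c_cloud, c_access * c_cloud)
```

Under these assumed thresholds the same subwatershed yields a product of 6 ("good") for the tolerant stakeholder and 1 ("low") for the strict one, which is the kind of spread the maps across cost-payoff levels are meant to expose.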
