2.3.2. Implementation

Runoff is the component of the water cycle in which surface water flows overland instead of infiltrating into the ground or evaporating. Under stated assumptions about the distribution of rainfall over the study area, the GIS RUNOFF procedure directly computes the catchment of every pixel in a grid scene. We therefore used the "WATERSHED" and "RUNOFF" procedures of the IDRISI analysis package of the TerrSet software [23]. A critical element of watershed delineation is locating and vectorizing the point at the lowest elevation of the watershed, i.e., the mouth of the stream that has gathered all the streams of the catchment area into a single channel just before it empties into a larger body of water such as the sea or a larger river. For the Bumbu, this point, also known as the pour point, lies where the Bumbu Stream enters the Huon Gulf. Watershed delineation can be sensitive to the seed image provided as the lower extremity of the watershed. The CONTOUR function under "feature extraction" was found to be helpful in locating a proper seed image. The steps involved in a drainage analysis based on the line vector characterization of stream data are summarized as follows:

• Obtain the SRTM 1 arc-second DEM supplied by USGS with 30 m resolution and window the DEM to the study area. Using the DEM and the appropriate WATERSHED function of available GIS software, delineate the [WATERSHED] raster (see Figure 4a) and convert the raster to a watershed vector polygon (Figure 4b).

**Figure 4.** Bumbu Watershed Polygon overlain with (**a**) 20 m elevation contour lines and (**b**) major stream lines.


It is important to note that runoff units are all in pixels, where each pixel is equivalent to 977.21 m². The assumption is that one unit of precipitation falls on each pixel unit of the watershed, or in the case of categories, on each pixel unit of the category. In the case of differential precipitation, fractional precipitation is assumed to fall on each pixel. The runoff procedure accumulates pixels into the stream network as a proxy for actual precipitation.
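To make the pixel-accumulation idea concrete, the following Python sketch routes one unit (or a fractional weight) of precipitation per pixel down a DEM using a simple D8 steepest-descent rule. It is an illustration only: the internal algorithm of the IDRISI RUNOFF module is not described in this section, so the D8 rule is an assumption, and the names `accumulate_runoff` and `pixels_to_area_m2` are hypothetical. Only the pixel area of 977.21 m² is taken from the text.

```python
PIXEL_AREA_M2 = 977.21  # area of one pixel, as stated in the text


def accumulate_runoff(dem, weights=None):
    """Accumulate per-pixel precipitation downslope (assumed D8 rule).

    dem     : list of rows of elevations
    weights : optional precipitation per pixel (defaults to 1 unit each)
    Returns a grid in which each cell holds its own weight plus
    everything that drains into it from upslope.
    """
    rows, cols = len(dem), len(dem[0])
    if weights is None:
        acc = [[1.0] * cols for _ in range(rows)]
    else:
        acc = [[float(w) for w in row] for row in weights]

    # Visit cells from highest to lowest so that every donor cell is
    # complete before the cell it drains into is processed.
    order = sorted(
        ((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    for elev, r, c in order:
        target, steepest = None, 0.0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    slope = (elev - dem[nr][nc]) / (dr * dr + dc * dc) ** 0.5
                    if slope > steepest:
                        target, steepest = (nr, nc), slope
        if target is not None:  # pits and flat cells keep their total
            acc[target[0]][target[1]] += acc[r][c]
    return acc


def pixels_to_area_m2(pixel_count):
    """Convert an accumulated pixel count to contributing area in m²."""
    return pixel_count * PIXEL_AREA_M2
```

Passing a grid of 1/0 values as `weights` corresponds to accumulating a single category, and passing fractional values corresponds to the differential precipitation case described above.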

### *2.4. Raster Based GIS Layers*

Similar techniques can be applied to capture the relative potential of other layers to correlate with water sampling station results. This section elaborates on how we rely on aerial photography and/or satellite imagery to create vegetative density and urbanization layers for similar runoff extraction at sampling stations. High-definition aerial photography and satellite imagery of the study area are available from USGS Earth Explorer [21]. Categorization of landscape elements in imagery can be accomplished using various techniques of cluster and classification analysis. In the aerial photography shown in Figure 5, it can be readily seen that the study area varies from what appears to be a mature virgin forest to a highly industrialized urban environment. A study by Doaemo et al. [25] revealed that the Bumbu Watershed has undergone extensive deforestation and an increase in urbanization in the last 33 years (1987–2020). In this instance, we settled on five arbitrary but intuitively selected categories of (1) dense forest, (2) regen (regenerating) forest, (3) green space, (4) semi-urban and (5) highly urban environments as relevant to the water quality study. The land-use types are largely self-explanatory with the exception of "green space". This land-use category arose as a result of aerial photo interpretation of the landscape. The "green space" characterization was designed to differentiate between land primarily characterized by vegetation in various stages of tree growth (mature and regenerating forest) and non-vegetated land (designated urban classes). Close inspection of these vegetated areas in aerial photography revealed extensive garden cultivation of otherwise vacant land. The proximity of these garden plots to highly urban and semi-urban areas suggests these areas are used extensively for food production.

Prototypes for the various groups were envisioned. Sampling of the prototypes was accomplished by identifying points in areas assumed to be prime examples of the proposed classes. We identified thirty sample points per class; the sample points were saved as a shapefile and then rasterized onto a raster of the same dimensions and location as the DEM. Sample points were expanded to rectangles of 5 by 5 pixels covering approximately 2.7 hectares each and converted back to vector shapefile polygons. The distribution of signature polygons is shown in Figure 5.
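The expansion of rasterized sample points into 5 by 5 pixel training windows can be sketched as follows. This is a minimal illustration in plain Python rather than the GIS workflow actually used; the function name `expand_points` and the (row, column, class) point format are assumptions made for the example.

```python
def expand_points(points, n_rows, n_cols, half_width=2):
    """Burn a square window around each sample point into a raster.

    points     : iterable of (row, col, class_id) tuples (hypothetical format)
    half_width : 2 gives a 5 by 5 window, as in the text
    Returns a raster of class ids, with 0 where no sample window falls.
    Windows are clipped at the raster edge; later points overwrite
    earlier ones where windows overlap.
    """
    raster = [[0] * n_cols for _ in range(n_rows)]
    for row, col, class_id in points:
        for r in range(max(0, row - half_width), min(n_rows, row + half_width + 1)):
            for c in range(max(0, col - half_width), min(n_cols, col + half_width + 1)):
                raster[r][c] = class_id
    return raster
```

The resulting class-id raster plays the role of the signature polygons: every non-zero cell is a training pixel for its class.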

These polygons constitute the sampling areas to be superimposed on the satellite imagery for the development of class signature profiles. In this study, signature profiles were developed by sampling the individual color bands of the satellite imagery of Hansen et al. (2013) [26]. The profiles/signatures that were developed were then used to hard classify the entire study area using maximum likelihood estimation (MLE) for final classification of the watershed as shown in Figure 6.
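As an illustration of how class signatures lead to a hard classification, the sketch below fits a per-class Gaussian to the training pixels and assigns each pixel to the class with the highest likelihood. It is a simplification, not the procedure of the GIS package: it assumes independent bands (a diagonal covariance), whereas maximum likelihood classifiers in GIS software commonly use the full covariance matrix. All function names are hypothetical.

```python
import math


def train_signatures(samples):
    """Build a (means, variances) signature per class.

    samples : {class_name: [pixel, ...]} where each pixel is a tuple of
              band values; at least two pixels per class are required.
    """
    signatures = {}
    for name, pixels in samples.items():
        n = len(pixels)
        bands = list(zip(*pixels))  # one tuple of values per band
        means = [sum(b) / n for b in bands]
        # Sample variance per band, floored to avoid division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in b) / (n - 1), 1e-6)
            for b, m in zip(bands, means)
        ]
        signatures[name] = (means, variances)
    return signatures


def log_likelihood(pixel, means, variances):
    """Gaussian log-likelihood assuming independent bands (assumption)."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        for x, m, var in zip(pixel, means, variances)
    )


def classify(pixel, signatures):
    """Hard classification: return the most likely class for one pixel."""
    return max(signatures, key=lambda name: log_likelihood(pixel, *signatures[name]))
```

Applying `classify` to every pixel of the imagery yields a single categorical raster of the kind described above.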

Each land-use class was individually coded as a categorical variable and mapped as a separate layer. As discussed in the example of Section 2.3.2, a Roads and Streets line vector shapefile available from Open Street Map [24] was similarly dummy coded and transformed into a categorical gridded map layer. Subsequently, a separate Population/Habitation layer was extracted from aerial photography by filtering pixels exhibiting high reflectance values >90 for all three RGB bands. The high reflectance was assumed to be the sun's reflection from metal rooftops. This layer was deemed advisable as a secondary measure of human habitation and human activity that might be missed by other urban classifications. Results of the grid transformations and categorizations are shown in Figure 7a,b.
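The reflectance filter used for the Population/Habitation layer amounts to a per-pixel threshold on all three bands. A minimal sketch follows, assuming the three bands are available as equally sized grids; the function name `habitation_mask` is hypothetical, and the default threshold of 90 is the value quoted in the text, whose reflectance scale is not specified here.

```python
def habitation_mask(red, green, blue, threshold=90):
    """Return a 1/0 raster: 1 where all three bands exceed the threshold.

    red, green, blue : equally sized lists of rows of band values
    threshold        : 90 as quoted in the text (scale not specified)
    """
    return [
        [
            1 if r > threshold and g > threshold and b > threshold else 0
            for r, g, b in zip(red_row, green_row, blue_row)
        ]
        for red_row, green_row, blue_row in zip(red, green, blue)
    ]
```

Because the output is already coded 1/0, it can be used directly as a weight grid for runoff accumulation in the same way as the other categorical layers.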

**Figure 5.** Signature polygons used for categorization of the Bumbu Watershed. In total, 30 sample points for each class were identified to implement the MLE classification technique.

After categorization of the watershed, the multi-category rasters are converted into individual single-category feature rasters coded 1/0. The roads and population/habitation rasters are similarly recoded 1/0. From this point, the procedure is the same as that described in the previous section for roads. We utilized the DEM to accumulate [CLASS# RUNOFF] for each class. Next, by overlaying the [SAMPLING POINT] raster onto each [CLASS# RUNOFF] raster, a class# runoff value is assigned to each sampling point and saved in an attribute values file for later incorporation into correlational analyses along with other sampling station results. It is again useful to convert the rasters to [CLASS# RUNOFF] line vector shapefiles and point vector shapefiles for graphic presentation of results.
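The recoding and extraction steps can be sketched in a few lines of Python. This is an illustrative outline under simple assumptions: rasters are lists of rows, sampling stations are given as (row, column) positions, and the names `recode_binary` and `extract_station_values` are hypothetical.

```python
def recode_binary(class_raster, class_id):
    """Convert a multi-category raster into a 1/0 raster for one class."""
    return [[1 if v == class_id else 0 for v in row] for row in class_raster]


def extract_station_values(runoff_by_class, stations):
    """Read each class runoff raster at every sampling station.

    runoff_by_class : {class_name: accumulated runoff raster}
    stations        : {station_id: (row, col)} (hypothetical format)
    Returns {station_id: {class_name: runoff value}}, i.e. one attribute
    record per station for later correlation analysis.
    """
    return {
        station_id: {
            name: raster[row][col] for name, raster in runoff_by_class.items()
        }
        for station_id, (row, col) in stations.items()
    }
```

Each 1/0 raster from `recode_binary` would first be accumulated along the DEM-derived flow paths to give the corresponding class runoff raster before `extract_station_values` is applied.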

**Figure 6.** Categorization of the Bumbu Watershed in land use categories 1 to 5, including (1) Dense Forest, (2) Regen Forest, (3) Green Space, (4) Semi-urban and (5) Highly Urban.

### *2.5. Point Vector Based GIS Layers*

A third scenario considers the case where only point estimates of important socio-economic or environmental variables are available. Such is the case for rainfall and other weather-related variables measured at individual sampling stations. Thus far in the development of the protocol, only spatially and temporally uniform rainfall across the watershed has been assumed. In reality, rainfall varies spatially and temporally. Such data require interpolation to landscape coverage for analysis using the methods described in Section 2 above for aerial and satellite imagery. A spatially diverse rainfall pattern is a good example for general application. Unfortunately for this study, only sparse rainfall weather station data are available for the Bumbu Watershed. Given the sparseness of available data, estimates of the spatial pattern of rainfall for the Lae area resulted in a rainfall mapping with substantial uncertainties associated with the estimated spatial pattern. For the purposes of protocol development, these large uncertainties in the estimates of the spatial and temporal distribution of rainfall will be ignored.

**Figure 7.** Shapefile input layers and result of grid transformation (**a**) road line vectors and (**b**) population/habitation polygons.

Spatial interpolation is a well-researched field of geology, where point samples of geologic formations must be used for interpolation and reliable estimation of the amount and value of mineral resources. Methods of spatial interpolation include Triangulated Irregular Network (TIN) and Kriging. Our sample points were too sparse to satisfy the Kriging requirement for a sufficient number of sample points to estimate spatial autocorrelation. Thus, for the purposes of this study, the less demanding TIN method was employed. The current study confines itself to consideration of the spatial variation of average annual rainfall. At present, weather station data are too sparse to reliably estimate the spatial variation of rainfall across the Bumbu Watershed. Historically, the situation is slightly better. McAlpine et al. (1975) [27] reported results of a 15-year study at 600 weather stations across mainland PNG and the islands. Though the McAlpine data are out of date and climate patterns are changing, they represent the best currently available estimate of the spatial pattern of rainfall in the study area, even if absolute amounts of annual rainfall have changed.

TIN and TIN surface estimation are standard features of GIS packages. Thirteen weather stations in the vicinity of Lae from the McAlpine study, as shown in Figure 8a, were used for the current study. Using the 13 McAlpine point estimates of average annual rainfall for 1957–1972 represented in Figure 8a, a TIN and TIN surface were compiled for the estimated pattern of spatial variation across the Bumbu Watershed study area, as illustrated in Figure 8b. The rainfall surface was generated on a grid compatible in location and resolution with the watershed DEM used for the previous RUNOFF analyses. It is convenient to convert the raw rainfall into a grid coded 0 to 1 as the fraction of maximum expected annual rainfall across the watershed. The scaled RAINFALL surface and contours are shown in Figure 8b. Other examples of spatially and temporally distributed variables, measured or estimated by point sources, are results of geo-located population surveys of water, sanitation and hygiene (WASH) practices. In upcoming studies, MDF will undertake community surveys for these variables in proximity to the same 22 water sampling points of this study of the Bumbu Watershed in order to study their relation to runoff water quality.
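A TIN surface is piecewise linear: within each triangle, the value at a point is the barycentric-weighted average of the three vertex values. The sketch below shows this interpolation for a single triangle, together with the 0 to 1 scaling of the resulting grid. It is an illustration only. Building the triangulation itself (for example, a Delaunay triangulation of the 13 stations) is left to the GIS package, and the function names are hypothetical.

```python
def tin_interpolate(point, triangle):
    """Linearly interpolate a value inside one TIN facet.

    point    : (x, y) location to estimate
    triangle : three (x, y, value) vertices, e.g. weather stations
    Returns the interpolated value, or None if the point lies outside
    the triangle.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    px, py = point
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:  # small tolerance for points on an edge
        return None
    return w1 * z1 + w2 * z2 + w3 * z3


def scale_to_fraction(grid):
    """Rescale a rainfall grid to fractions of its maximum (0 to 1).

    Cells equal to None (outside the TIN) are left as None.
    """
    peak = max(v for row in grid for v in row if v is not None)
    return [[None if v is None else v / peak for v in row] for row in grid]
```

The scaled grid can then serve as the fractional precipitation weight per pixel in the runoff accumulation, in place of the uniform one unit per pixel assumed earlier.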

**Figure 8.** Rainfall analysis data sites and results: (**a**) 13 weather station locations from McAlpine (1975) used to model average annual rainfall pattern across the Bumbu Watershed and (**b**) average annual rainfall contours as estimated by Triangulated Irregular Network (TIN) scaled 1 to 100% of maximum.

### *2.6. Observed Limitations and Rectifications*

No major impediments to using the protocol appear to exist other than the data limitations explained below. The value of the protocol will emerge with application to correlation analysis of water quality measurements with the derived inputs. At this time, the imprecision of the DEM necessitated the estimation of the locations of some sampling stations on the derived stream lines. Where there was a discrepancy between derived stream lines and actual streams, sampling station points were relocated to estimated positions of equivalent hydrologic position. The uncertainties created by this process are unknown at this point. Figure 9 shows the differences between the positions of the actual 22 sampling sites and their estimated "equivalent" hydrologic positions on the DEM. The sites numbered 3, 12, 13, 14 and 18 required the greatest adjustment and are highlighted in the figure.
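One simple way to approximate such an adjustment, and to quantify its size, is to move each station to the nearest pixel of the DEM-derived stream raster. The sketch below does this in plain Python. Note that nearest-pixel snapping is a stand-in chosen for illustration and is not the procedure used in this study, where equivalent hydrologic positions were estimated by judgment; the function name `snap_to_stream` is hypothetical.

```python
import math


def snap_to_stream(station, stream_raster):
    """Move a station to the nearest stream pixel.

    station       : (row, col) of the surveyed sampling site
    stream_raster : 1/0 raster of DEM-derived stream pixels
    Returns ((row, col), distance_in_pixels), or (None, None) if the
    raster contains no stream pixels.
    """
    best, best_d2 = None, None
    for r, row in enumerate(stream_raster):
        for c, is_stream in enumerate(row):
            if is_stream:
                d2 = (r - station[0]) ** 2 + (c - station[1]) ** 2
                if best_d2 is None or d2 < best_d2:
                    best, best_d2 = (r, c), d2
    if best is None:
        return None, None
    return best, math.sqrt(best_d2)
```

Recording the returned distance for each of the 22 sites would give a first indication of which stations carry the largest positional uncertainty.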

**Figure 9.** Geo-adjustment of sample site locations to coincide with imprecise Digital Elevation Model (DEM) stream delineation. Points of largest deviation from derived points are highlighted.

In practice, it was found that high-resolution aerial photography down to 1 m pixel resolution, as portrayed on the USGS Earth Explorer [21], was superior to satellite imagery for identifying appropriate locations for signature polygons. These images are not always available for download but can nevertheless be used for informal geo-location. Landsat imagery proved superior for signature definition and more exact geo-location. There was a tendency for satellite imagery to incorrectly identify streambeds and water bodies as "highly urban" and "semi-urban". Attempts to create a sixth signature and category for water were unsuccessful, but masking out the stream layer obtained by DEM analysis partially compensated for this shortcoming. The results presented are based on the use of the 5 land-use categories for categorization of 4-band satellite imagery as compiled by Hansen et al. (2013) [26]. Refinements for any given application can explore what is most appropriate for that specific study scene. For the purposes of this protocol development, no "ground truthing" was performed other than verification of the categorizations against aerial photography.
