### *2.1. Study Area*

Our investigation centers on the Bumbu basin, located in Morobe Province, Papua New Guinea (PNG). Figure 1 shows the study area at different scales. The watershed is bounded on the west by the Markham river basin and on the east by the Busu river basin. The Bumbu River flows through Lae, the capital of Morobe Province and the second-largest city in PNG. The river originates in the Atzera Range and is relatively narrow, flowing downstream at a moderate pace. During the extreme rainfall of the flooding season, however, the flow rate is much higher [15], resulting in rapid erosion of the sandy loam that is the main constituent of the Bumbu floodplain [16]. All coordinates in this study are referenced to the World Geodetic System 1984 (WGS84) datum.

**Figure 1.** Depiction of the study area on different scales: (**a**) Papua New Guinea (PNG), (**b**) Morobe Province, (**c**) Bumbu basin.

In total, 22 water sampling points spread across the Bumbu river basin were chosen for this research, as shown in Table 1. These sites lie at different locations and elevations, with varying levels of vegetation and urbanization. Water samples were collected at these points for further analysis, and the coordinates of each point were recorded using GPS. The positions of the sampling sites within the Bumbu river basin are shown in Figure 2. The sampling points are divided into three main categories: Bumbu main channel, left-hand Bumbu stream and right-hand Bumbu stream. The Station IDs of these points belong to the UA, UB and UC series, respectively. The recorded GPS details can be found in Appendix A, Tables A1–A3.
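The mapping between Station ID series and sampling category can be written down as a small lookup. The following is a minimal sketch only: the exact ID format (for example `UA1`) and the helper name `station_category` are assumptions made for illustration and are not taken from Table 1 or Appendix A.

```python
# Series-to-category mapping as stated in the text.
# The ID format ("UA1", "UB3", ...) is an assumption for illustration.
SERIES_TO_CATEGORY = {
    "UA": "Bumbu main channel",
    "UB": "left-hand Bumbu stream",
    "UC": "right-hand Bumbu stream",
}


def station_category(station_id: str) -> str:
    """Return the sampling category for a station ID such as 'UA1'."""
    prefix = station_id.strip().upper()[:2]
    if prefix not in SERIES_TO_CATEGORY:
        raise ValueError(f"Unknown station series: {station_id!r}")
    return SERIES_TO_CATEGORY[prefix]


if __name__ == "__main__":
    # Hypothetical IDs, one per series.
    for sid in ("UA1", "UB2", "UC3"):
        print(sid, "->", station_category(sid))
```

Grouping stations by series in this way makes it straightforward to summarize results separately for the main channel and for each tributary stream.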

**Table 1.** Water sampling points and the names of the respective sites.


**Figure 2.** Positions of the 22 water sampling sites, the Bumbu river main channel and other relevant sites of interest in the Bumbu watershed.

### *2.2. Overview of the Protocol*

Multiple geographically distributed factors exist in the watershed, such as rainfall, geophysical and geochemical conditions, human and animal populations, residential and commercial WASH conditions and practices, and general landscape use and conditions [4–11,17–20]. Together with roads, habitations and forests, these potentially affect water quality and represent the social, economic and environmental (SEE) factors present. To analyze and understand the impact of these factors on water quality, they must be assessed appropriately at water quality (WQ) sampling stations throughout the watershed. For almost all SEE variables, however, direct measurement of their impacts is practically impossible. Consequently, it is essential to incorporate into the analysis a systematic and consistent spatial interpolation protocol for estimating the accumulation of these factors at each WQ sampling station. The data for the factors considered in such studies usually exist in spatial formats such as raster layers and vector shapefiles. Vector GIS layers can be point-, line- or polygon-based. Vector features are stored in a shapefile using geographic latitude and longitude coordinates to define the points, lines and edges of geometric shapes. Individual survey points lend themselves to point characterization, while boundaries lend themselves to line characterization. Buildings and other surface structures lend themselves to polygonal characterization. Rivers and streams can be characterized by lines or polygons, but the relevant hydrologic information can be conveyed by line vectors.
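To make the distinction between point, line and polygon features concrete, the sketch below represents one feature of each type as a plain GeoJSON-style dictionary. It is an illustration only: the coordinates are placeholder values rather than surveyed positions, the helper names `make_feature` and `vertex_count` are hypothetical, and the (longitude, latitude) ordering follows the GeoJSON convention rather than anything specified in this study.

```python
def make_feature(geometry_type, coordinates, **properties):
    """Build a GeoJSON-style feature dictionary (illustrative helper)."""
    return {
        "type": "Feature",
        "geometry": {"type": geometry_type, "coordinates": coordinates},
        "properties": properties,
    }


# Placeholder coordinates in (longitude, latitude) order, WGS84 assumed.
survey_point = make_feature("Point", (146.99, -6.72), kind="survey point")

road = make_feature(
    "LineString",
    [(146.98, -6.71), (146.99, -6.72), (147.00, -6.73)],
    kind="road",
)

# A polygon is a list of rings; each ring repeats its first vertex at the end.
building = make_feature(
    "Polygon",
    [[(146.990, -6.720), (146.991, -6.720), (146.991, -6.721),
      (146.990, -6.721), (146.990, -6.720)]],
    kind="building",
)


def vertex_count(feature):
    """Count the vertices stored for a point, line or polygon feature."""
    geom = feature["geometry"]
    coords = geom["coordinates"]
    if geom["type"] == "Point":
        return 1
    if geom["type"] == "LineString":
        return len(coords)
    if geom["type"] == "Polygon":
        return sum(len(ring) for ring in coords)
    raise ValueError(f"Unsupported geometry type: {geom['type']}")


if __name__ == "__main__":
    for feature in (survey_point, road, building):
        print(feature["properties"]["kind"], vertex_count(feature))
```

A river could be stored either as a `LineString` along its centerline or as a `Polygon` outlining its banks; the line form is the one that carries the flow-path information used for the drainage network.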

Because the datasets vary, we need to apply different techniques to extract appropriate runoff information for each factor, depending on whether the factor is described by point-, line- or raster-based GIS layers. Owing to the lack of WASH-related data at the time of this exploratory study, we limited the application of the protocol to measuring the influence of runoff of anthropogenic and environmental factors. Data on WASH parameters and other SEE variables are usually gathered through community surveys of households and are hence restricted to estimation at point sources. In Section 2.5, we explain how point sources of information can be studied, using rainfall data in this format as an example and demonstrating the use of spatial interpolation techniques. Figure 3 depicts the generalized process of runoff extraction using the different environmental and anthropogenic factors formatted in point-, line- and raster-based GIS layers. The resulting outputs include Flow Runoff, Road Runoff, Dense Forest Runoff, Green Space Runoff, Highly Urban Runoff, Habitation Runoff, Semi-Urban Runoff and Rainfall Runoff. The categorization based on environmental and anthropogenic factors is also shown. The procedures for extracting raw runoff are explained in more detail in Sections 2.3–2.6, and the procedures for compiling the relative importance of the factors are discussed in Section 3.1.

### *2.3. Line Vector-Based GIS Layers*

This section explains the methodology for analyzing factors that have line vector characteristics. In this context, we delineated the stream drainage network and extracted the pattern and associated values of road network runoff.
