**2. Methods**

## *2.1. Site Description*

We collected data from the following forest managemen<sup>t</sup> units (FMUs) in Sabah, Malaysia (Figure 1): Deramakot FMU (5◦14–28 N, 117◦20–38 E, 551 km2) and Tangkulap FMU (5◦18–31 N, 117◦11–22 E, 276 km2). Supplementary data were also collected from Segaliud Lokan FMU (5◦20–27 N, 117◦23–39 E, 576 km2). The three FMUs are adjacent to each other with the same undulating topography and the same equatorial tropical rain forest climate. Selective logging was a major driver of the changes of carbon stock and tree community composition in these forests. The elevation ranges from 12 m to 355 m. The original vegetation is a lowland dipterocarp rainforest throughout the three FMUs.

The areas of the three FMUs were initially logged in 1956, 1970 and 1958, respectively, using conventional logging methods (i.e., high-impact logging with no environmental considerations). In Deramakot, conventional logging continued until 1989, when all logging activities were halted for regrowth [25]. Then, a long-term managemen<sup>t</sup> plan with RIL was introduced to Deramakot in 1995. RIL is an improved method of selective logging, including pre-harvest inventory, mapping of all canopy trees, directional felling, liana cutting, and planning of skid trails, log decks, and roads [26]. In combination with reduced-impact logging, a longer cutting cycle (i.e., 40 years) was strictly adhered to in accordance with the long-term managemen<sup>t</sup> plan. These combined approaches helped to preserve forest intactness [24,27]. Deramakot was the first tropical forest certified by the FSC in 1997, and was considered a model of sustainable forest managemen<sup>t</sup> by the Sabah Government [25,26]. Reflecting this logging history, we observed that the forests inside the Deramakot FMU were less disturbed based on aboveground biomass and various biological communities [27]. In contrast, Tangkulap was repeatedly logged using conventional logging until 2003 and all logging activities were suspended thereafter. Tangkulap was entirely designated as a conservation forest in 2008 and later also certified by the FSC in recognition of its conservation-oriented long-term managemen<sup>t</sup> plan in 2011. Both Deramakot and Tangkulap are currently managed directly by the Sabah Forestry Department. By contrast, Segaliud Lokan was repeatedly logged using conventional logging until 2002, after which RIL was implemented by the KTS Plantation Corp. Several areas were converted to plantation forests. It is currently certified by the Malaysian Timber Certification Council (MTCC).

**Figure 1.** Map indicating the locality of the three forest managemen<sup>t</sup> units (1, Segaliud Lokan; 2, Deramakot; and 3, Tangkulap), Sabah, Malaysia.

In the following analysis, we compare spatiotemporal changes of ecosystem services, including carbon stock and forest intactness, between Deramakot and Tangkulap. A comparison of these two FMUs may elucidate differences between their respective spatiotemporal patterns in ecosystem services, reflecting the difference in past logging intensity, despite the fact that they are now both FSC-certified. We excluded Segaliud Lokan from this comparison because BOLEH was designed primarily for natural forests. The occurrence of plantation forests in Segaliud Lokan would interfere with the analysis. However, we included natural-forest data of Segaliud Lokan for the following remote-sensing analysis to enhance data points and statistical reliance.

## *2.2. Field Sampling*

A total of 50 circular plots (20-m radius with 0.126 ha area each) were placed in the FMUs from December 2011 until May 2012 (hereafter 2012 inventory). Plots were placed with a stratified random design to represent various forest statuses, ranging from a near pristine high-stock forest to a highly degraded low-stock forest.

Prior to the placement of plots, the entire area was stratified to 6 degradation strata, from primary forest (stratum 1) to highly degraded, open area without trees (stratum 6), with the aid of Landsat Thematic Mapper (TM) band-7 imagery (see Imai et al. [21] for the detailed procedure). Imagery obtained in 2010 was used for the stratification procedure. Briefly, we radiometrically converted Landsat data into Top-of-the-Atmosphere (TOA) reflectance values. Illumination artifacts caused by heterogeneous topography were reduced following the sun-canopy-sensor (SCS) model using TOA values [28]. Shuttle Radar Topography Mission (SRTM) data were employed to correct for illumination artifacts. After masking all non-forest pixels, band-7 values were categorized into 6 strata using a level slicing method. Subsequently, we randomly placed 10 plots in each stratum of the five strata, except for stratum 6. Stratum 6, which is devoid of forests, was not actually sampled. Therefore, we established a total of 50 plots in strata 1 through 5. The distance between any two plots was greater than 100 m in order to minimize spatial autocorrelation.

All trees with ≥10 cm in diameter at breast height (dbh) were measured in dbh and identified as genera (actually operationally to species) in each circular plot. Voucher specimens were collected from the trees that could not be identified in the field and later used for identification at the herbarium of the Forest Research Centre, Sandakan. A polyvinyl chloride (PVC) stake was driven into the ground at the center of each circular plot as a permanent marker, and the exact coordinates (latitude and longitude) above the stake were recorded by averaging for two hours using a portable Global Positioning System (GPS).

In August 2014, we repeated the same procedure to inventory the forests as of 2014 (hereafter 2014 inventory). This time, we placed a total of 88 20-m-radius circular plots again evenly in strata 1–5 (actually, some of the plots had a 30 m × 40 m rectangular shape with approximately the same area as the circular plot because we used these rectangular plots for another purpose). None of the 88 plots overlapped with the 2012 plots because random sampling was deployed. In reality, a total of 50 plots would have been sufficient to derive reliable values of carbon stock and forest intactness according to our standard protocol manuals; however, we added 38 more plots that were placed for another research purpose to enhance model accuracy.

#### *2.3. Pre-Processing of Satellite Images*

We used the following two Landsat images to elucidate the temporal changes of carbon and forest intactness between 2009 and 2014 at the landscape level:


The raw digital numbers of each image were converted into top-of-atmosphere radiance. Subsequently, top-of-atmosphere radiance was converted to surface reflectance to compensate for atmospheric scattering and absorption effects using an atmospheric correction algorithm based on the Second Simulation of a Satellite Signal in the Solar Spectrum radiative transfer code (version. 1.1, [29,30]). The topographic effects on illumination were removed using the method described in Ekstrand [31] with Shuttle Radar Topography Mission (SRTM) data. Finally, pixels covered with clouds/shadow were removed according to the procedure described in Fujiki et al. [22]; pixels covered/affected by cloud/shadow were clustered as segments based on a homogeneity criterion with eCognition Developer 8.7 and all segments above a certain threshold were removed as cloud/shadow. Removed segments were filled in using the cloud-free areas of temporally adjoining data. Calibrated cloud-free parts of adjoining secondary images were incorporated into the missing parts of the base image (see Fujiki et al. [22] for the details). We used ERDAS Imagine version.11.0 and ArcGIS 9.3.1 for these pre-processing procedures.

#### *2.4. Estimating Spatiotemporal Changes of Carbon Stock*

Above-ground biomass (AGB) of each plot was estimated according to the allometric equation obtained by Chave et al. [32] as:

$$\text{AGB} = \rho \times \exp(-1.499 + 2.148 \ln(\text{D}) + 0.207 (\ln(\text{D}))^2 - 0.0281 (\ln(\text{D}))^3), \tag{1}$$

where D is dbh (cm) and ρ is the wood-specific gravity (g/cm3). We obtained the wood-specific gravity ρ for the sampled species/genera from various sources (see Imai et al. [21] for the detailed procedure). We estimated AGB of each plot for both the 2012 and 2014 inventories. Subsequently, the estimated AGB of each plot was multiplied by 0.48 to derive the amount of carbon.

A multivariate regression model was established with the amount of carbon per plot as a dependent variable and reflectance of the corresponding pixel on a Landsat image as an independent variable. Here, textural metrics of the 3 × 3 pixels surrounding each plot were also added as independent variables. The amounts of carbon derived from the 2012 inventory were regressed

with the reflectance and textural metrics of the 2009 Landsat imagery (Landsat TM, Path117/Row56, 11 August 2009). The amounts of carbon derived from the 2014 inventory were regressed with the reflectance and textural metrics of the 2014 Landsat imagery (Landsat OLI, Path117/Row56, 06 June 2014). The 2012 inventory data were regressed with the 2009 Landsat imagery without correcting for the tree growth between 2009 and 2012 for a first approximation of the 2009 condition because there were neither 2009 inventory data nor clear 2012 Landsat images; this was logical because all plots measured in 2012 were also present in 2009. However, modeling using the 2012 inventory data without the correction for tree growth would slightly overestimate the modeled 2009 carbon stock.

The following Landsat metrics were used as independent variables: reflectance value of each band (Band1TM/OLI, Band2TM/OLI, Band3TM/OLI, Band4TM/OLI, Band5TM/OLI, Band6OLI, Band7TM/OLI), NDVI [33], normalized difference water index (NDWI) [34,35], normalized difference soil index (NDSI) [36], and enhanced vegetation index (EVI) [37]. We calculated mean value of each Landsat metric within a 20-m radius from the center of a plot to represent the plot. The normalized indices were calculated as follows:

$$\text{NDVI-TM(OL)} = \text{(band 4(band5) -- band 3(band4))} / \text{(band 4(band5) + band 3(band4))},\tag{2}$$

$$\text{NDWI-TM(OL)} = \text{(band 3(band4) -- band 5(band6))} / \text{(band 3(band4) + band5(band6))},\tag{3}$$

$$\text{NDSI-TM(OLI)} = \text{(band 5(band6) -- band 4(band5))} / \text{(band 5(band6) + band 4(band5))}, \qquad (4)$$

$$\begin{aligned} \text{EVI-TM(OLI)} &= 2.5 \times \text{(band 4(band5) -- band 3(band4))/(band 4(band5) + 6 \times 5)} \\ &\quad \text{band3(band4) -- 7.5 \times \text{band 1(band2) + 1)}, \end{aligned} \tag{5}$$

In addition, we used the following metrics as proxies for spectral heterogeneity because degradation of forest canopies might affect the heterogeneity of the spectral pattern: the coefficient of variation (CV), standard deviation (SD), and textures of the gray-level co-occurrence matrix (GLCM) [38]. CV, SD, and textures of the GLCM were derived using a 3 × 3 pixel window based on the reflectance values and each of the normalized indices. GLCM is a tabulation explaining how often different combinations of gray levels occur at a specified distance and orientation in an image object [39]. Homogeneity, contrast, angular second moment, entropy, dissimilarity, correlation, mean, and standard deviation were derived as the indices of the textures. Overall, 120 and 132 metrics were generated based on Landsat TM and OLI, respectively. For developing an adequate regression model, independent variables were selected using a stepwise selection from the full model containing all metrics to avoid multi-collinearity among the independent variables. We did not find significant multi-collinearity because the variance inflation factor (VIF) of selected independent variables, which is an indicator of multi-collinearity, was in all cases less than 10. Established models are shown in Table 1. Subsequently, each of the models (2009 and 2014 models) was extrapolated to the entire area to estimate the amount of carbon outside the inventory plots based on the 2009 or 2014 Landsat imagery.

We took a Monte Carlo approach to show model accuracy and to test for significant differences of mean AGB values between 2009 and 2014 for a given FMU, or between FMUs for a given year (2009 or 2014). Four-fifths of the plots in each stratum (e.g., a total of 40 plots in the case of 2009, and 70 plots in 2014) were randomly selected to construct the AGB models for 2009 and 2014, respectively. The model predictions for AGB of the remaining one-fifth of the plots were regressed with the observed values both for 2009 and 2014, and the adjusted R2 values were determined. Based on these models, we estimated the mean AGB value each for Deramakot and Tangkulap each in 2009 and 2014. These steps were reiterated 1000 times for 2009 and 2014, respectively, to derive the 95% confidence intervals (CIs) of the adjusted R2 and the mean AGB values. The statistical tests were conducted using R ver. 3.20 [40].


**Table 1.** Selected multivariate regression models to explain carbon density based on all plots for 2009 and 2014.

N, number of plots used for each model; R2, adjusted R-squared value; Coefficient, standardized partial regression coefficient; SE, standard error; B1, band 1; B3, band 3; B7 band 7; GLCM, textures of grey-level co-occurrence matrix; NDSI, normalized difference soil index.

#### *2.5. Estimating Spatiotemporal Changes of Forest Intactness*

It has been well established that tree-species/genus composition is one of the best indexes of the magnitude of forest degradation [21,23,27]. Similarity of the species/genus composition of a given forest with that of pristine forests decreases linearly with increasing magnitude of degradation of the forest [21,27]. We applied this principle to the forests of Deramakot and Tangkulap and evaluated the status of a given forest in terms of its compositional similarity with pristine forests [21]. The compositional similarity with pristine forests is termed here "forest intactness" and can be mapped through a special extrapolation procedure [22]. Here, we compared generic composition among plots instead of species composition because logging causes similar shifts in generic composition as in species composition [21,23]. The procedure of mapping "forest intactness" for a given time has been described by Fujiki et al. [22]. We elaborate here on the procedure to elucidate temporal changes of "forest intactness".

Firstly, differences in community composition among these plots laid out for estimating carbon storage were examined by using an ordination technique. We used a composite of the data of both the 2012 and 2014 inventories to derive a single data matrix to allow for a comparison between these two years using standardized scores. The Chao distance [41] and the number of trees of each genus were used to calculate a distance matrix. An ordination of plots was conducted with non-metric multidimensional scaling (nMDS) using the "metaMDS" procedure in the Vegan package in R [42].

Although plots were located distantly from each other, we could not completely rule out the possibility of spatial autocorrelations among plots. Therefore, we tested autocorrelations among plots, but did not find significant effects of spatial autocorrelations (see Figure S1, Supplementary Materials).

A multivariate regression model was established with the derived nMDS axis-1 scores of plots as dependent variables, and reflectance and textural metrics of the corresponding pixels on a Landsat image as independent variables; the nMDS axis-1 scores of the 2012 inventory plots were regressed with the 2009 Landsat TM imagery and those from the 2014 inventory were regressed with the 2014 Landsat OLI imagery, as was conducted for the carbon analysis. We did not correct the nMDS axis-1 scores of 2012 when regressed with the 2009 imagery because the tree growth during 3 years between 2009 and 2012 would not substantially change tree-species composition. We used the same Landsat metrics as a carbon stock estimate in this analysis. Established models are indicated in Table 2. Each model was extrapolated to the entire area on the 2009 and 2014 Landsat imagery, respectively. This procedure yielded the maps of forest intactness for 2009 and 2014.

We also took a Monte Carlo approach to test significant differences of mean nMDS axis-1 scores between 2009 and 2014 for a given FMU, or between FMUs for a given year (2009 or 2014) [22]. Four-fifths of the plots in each stratum were randomly selected to construct the 2009 and 2014 model, respectively. Based on these models, we estimated mean nMDS axis-1 scores for Deramakot and

for Tangkulap in 2009 and in 2014. The model predictions for nMDS axis-1 scores of the remaining one-fifths of the plots were regressed with the observed values both for 2009 and 2014, and the adjusted R<sup>2</sup> values were collected. Based on these models, we estimated mean nMDS axis-1 scores for Deramakot and for Tangkulap in 2009 and in 2014. These steps were reiterated 1000 times each for 2009 and 2014 to derive the 95% CIs of the adjusted R<sup>2</sup> and the mean reiterated nMDS axis-1 scores. The statistical tests were conducted using R ver. 3.20 [40].

**Table 2.** Selected multivariate regression models to explain forest intactness (nMDS axis-1 values) based on all plots for 2009 and 2014. Two of the plots for 2014 were not used in the analysis due to incomplete species identification, yielding N = 86.


N, number of plots used for each model; R2, adjusted R-squared value; Coefficient, standardized partial regression coefficient; SE, standard error; B3, band 3; B5, band 5; B7 band 7; GLCM, textures of the grey-level co-occurrence matrix; NDSI, normalized difference soil index; EVI, enhanced vegetation index.
