Importance of Spectral Information, Seasonality, and Topography on Land Cover Classification of Tropical Land Cover Mapping

Sovann, Chansopheaktra; Olin, Stefan; Mansourian, Ali; Sakhoeun, Sakada; Prey, Sovann; Kok, Sothea; Tagesson, Torbern

doi:10.3390/rs17091551

Open Access Article


by Chansopheaktra Sovann ¹,²,*, Stefan Olin ¹, Ali Mansourian ¹, Sakada Sakhoeun ³, Sovann Prey ⁴, Sothea Kok ² and Torbern Tagesson ¹

¹ Department of Physical Geography and Ecosystem Science, Lund University, Sölvegatan 12, 223 62 Lund, Sweden

² Department of Environmental Science, Royal University of Phnom Penh, Phnom Penh 120404, Cambodia

³ Provincial Department of Environment, Ministry of Environment, Siem Reap 171201, Cambodia

⁴ Independent Researcher, 142Eo, Street 19, Chey Chumneah Commune, Daunh Penh District, Phnom Penh 120208, Cambodia

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(9), 1551; https://doi.org/10.3390/rs17091551

Submission received: 28 February 2025 / Revised: 21 April 2025 / Accepted: 23 April 2025 / Published: 27 April 2025


Abstract

Tropical forests provide essential ecosystem services, playing a critical role in climate regulation, biodiversity conservation, and regional hydrological cycles while also supporting livelihoods. However, they are increasingly threatened by deforestation and land-use change. Accurate land cover (LC) mapping is vital to monitor these changes, but mapping tropical forests is challenging due to complex spatial patterns, spectral similarities, and frequent cloud cover. This study aims to improve LC classification accuracy in such a heterogeneous tropical forest region in Southeast Asia, namely Kulen, Cambodia, which is characterized by natural forests, regrowth forests, and agricultural lands including cashew plantations and croplands, using Sentinel-2 imagery, recursive feature elimination (RFE), and Random Forest. We generated 65 variables of spectral bands, indices, bi-seasonal differences, and topographic data from Sentinel-2 Level-2A and Shuttle Radar Topography Mission datasets. These variables were extracted from 1000 random points per 12 LC classes from reference polygons based on observed GPS points, Uncrewed Aerial Vehicle imagery, and high-resolution satellite data. The random forest models were optimized through correlation-based filtering and recursive feature elimination with hyperparameter tuning to improve classification accuracy, validated via confusion matrices and comparisons with global and national-scale products. Our results highlight the significant role of topographic variables such as elevation and slope, along with red-edge spectral bands and spectral indices related to tillage, leaf water content, greenness, chlorophyll, and tasseled cap transformation for tropical land cover mapping. The integration of bi-seasonal datasets improved classification accuracy, particularly for challenging classes like semi-evergreen and deciduous forests. 
Furthermore, correlation-based filtering and recursive feature elimination reduced the variable set from 65 to 19, improving model efficiency without sacrificing accuracy. Combining these variable selection methods with hyperparameter tuning optimized the classification, providing a more reliable LC product that outperforms existing LC products and proves valuable for deforestation monitoring, forest management, biodiversity conservation, and land use studies.

Keywords: land cover; tropical forest; recursive feature elimination; random forest; Kulen; Cambodia; Southeast Asia

1. Introduction

Tropical forests provide essential ecosystem functions and social benefits [1,2]. They play a critical role in global climate regulation and biodiversity conservation, influence regional hydrological cycles, and support livelihoods [3,4]. Despite their significance, these forests are under threat from deforestation and land cover change due to agricultural land demand, urban expansion, and resource extraction [5,6], with approximately 13 million hectares lost annually [7]. Urgent conservation actions, such as the implementation of Reducing Emissions from Deforestation and Forest Degradation (REDD+) initiatives and protected area designations, are crucial for safeguarding remaining forest lands [8]. Therefore, accurately mapping land cover (LC) is essential to monitor landscape structure changes, guide conservation strategies, and promote sustainable forest management and spatial planning [9].

Mapping LC in tropical landscapes poses significant challenges due to complex spatial patterns, spectral similarities among different forest types, rapid land use transformation (e.g., from forest to agriculture and settlements), and frequent cloud cover, especially in mountainous regions during the rainy season [10,11]. To address these challenges, various strategies enrich the surface information available to the classifier through diverse variable inputs, with the goal of improving classification accuracy. These approaches include integrating additional spectral bands, using multi-temporal image datasets, employing sensor-level and feature-level data fusion techniques, and incorporating topographic variables [12].

Previous studies highlight the effectiveness of spectral indices, tasseled cap transformations (TCT), texture features, and multi-temporal datasets in improving classification accuracy in both complex agricultural and forest landscapes [13,14]. Additionally, synergistic use of optical and radar data, alongside topographic variables such as elevation, slope, and aspect, has proven beneficial for LC mapping in mountainous tropical regions [15]. These terrain attributes offer critical insights into physical surface characteristics, influencing land use patterns, hydrology, vegetation distribution, and human accessibility [16]. Furthermore, using multi-seasonal composite images as training input for LC classification outperforms a single-season or an annual composite, particularly in mapping diverse land use classes in tropical regions [13]. While previous studies have demonstrated that spectral indices, topographic variables, and multi-temporal data improve LC classification accuracy across various contexts, the integration of all these variables may introduce redundancy, overfitting, or noise, potentially reducing model stability and applicability [17]. Therefore, identifying an optimal subset of variables that balances classification accuracy, computational efficiency, and model generalizability remains crucial for effective LC mapping in tropical landscapes.

Variable selection in remote sensing is necessary for managing high-dimensional datasets to enhance prediction accuracy, efficiency, and interpretability [18]. Most previously applied methods are categorized into filter and wrapper approaches. Filter methods use statistical metrics such as variance and correlation coefficients to identify relevant variables independently of model performance, suitable for large datasets but potentially overlooking critical feature interactions [19,20]. In contrast, wrapper methods like recursive feature elimination (RFE) iteratively select variable subsets based on their impact on model performance, integrating cross-validation and variable importance ranking to improve predictive accuracy, robustness, and interpretability. RFE has proven effective in optimizing variable subsets across diverse datasets and landscapes, including urban areas, croplands, and grasslands [21,22,23]. Previous studies suggest that in LC classification, RFE based on Random Forest (RF) typically outperforms RFE based on support vector machines, particularly as the number of variables increases, and is considered suitable for various classifiers [23,24,25]. However, its application remains underexplored in complex tropical forest landscapes characterized by diverse LC classes and spatial heterogeneity resulting from mixed rural and forest land uses. Bridging this gap necessitates integrating RFE with robust classifiers and scalable geospatial computing platforms, such as cloud-based or high-performance systems, capable of handling the complex patterns and high variability typical of such environments, thus enhancing the accuracy and reliability of LC mapping in tropical landscapes [26].

Factors influencing the accuracy of LC classification encompass input variables (wavelength band, index, sensor type, and their quality), the quality and amount of reference data, the number of classes, and the chosen classification methods [27,28,29]. The Random Forest classifier is recognized for its robustness and versatility in LC mapping. By utilizing multiple decision trees, RF ensures reliable results and enhances interpretability through its variable importance ranking [30,31]. Unlike other machine learning classifiers, RF shows less sensitivity to training sample quality and overfitting, offering advantages such as minimal training time, scalability for large datasets, and straightforward parameterization [32,33]. Moreover, RF’s integrated variable selection, particularly when coupled with RFE, simplifies model complexity and improves interpretability without compromising predictive accuracy [21,23]. Recent studies of tropical LC mapping in Cambodia and Vietnam highlight RF’s effectiveness, achieving validation accuracies exceeding 89% and 90%, respectively [34,35,36]. Integrating RF with efficient cloud-based platforms such as Google Earth Engine (GEE) shows promise in addressing LC classification challenges, facilitating efficient data handling, processing, and comprehensive large-scale mapping with minimal time and effort [37,38]. This integration of RF with RFE and GEE holds the potential for overcoming challenges in mapping tropical forest landscapes.

Building on these advancements, this study aims to improve LC classification accuracy in mountainous tropical rural-forest landscapes by integrating Sentinel-2 data with topographic variables and applying advanced machine-learning techniques. Specifically, our objective is to rank variable importance and optimize variable selection for LC classification using recursive feature elimination. We further investigate the combined influence of bi-seasonal difference and topographic variables on classification accuracy, and map the current LC of a Southeast Asian tropical landscape that was once predominantly pristine forests but has since been extensively modified by human activity, resulting in a heterogeneous mosaic of remaining primary forest, regrowth forest, and agricultural land.

2. Materials and Methods

2.1. Study Area

The study area encompassed Phnom Kulen National Park (Kulen) and a 10 km buffer zone surrounding it. Kulen, located in the center of Siem Reap Province, Cambodia, covers approximately 37,380 hectares (13°34′07.4″N, 104°06′29.6″E). Situated 30 km northeast of the Angkor Wat UNESCO World Heritage Site, Kulen is recognized as a hotspot for Cambodia’s ecosystem services. Positioned on sandstone plateaus with the highest peak of 496 m [39] and receiving significant annual rainfall of approximately 2290 mm [40], Kulen serves as a crucial water source for the Angkor Wat region, supporting local livelihoods [41,42]. The park is also a significant tourist attraction, featuring natural waterfalls, substantial forest lands, and important archaeological sites. Kulen comprises two main plateaus—Phnom Kbal Spean in the northwest and Phnom Kulen in the southwest—spanning the districts of Svay Leu, Banteay Srey, and Varin (Figure 1). The area is home to 5370 residents from eight villages in Knang Phnom commune, who rely heavily on cash crops, particularly cashew nuts, as well as livestock, tourism, and non-timber forest products [43]. Given the diverse topography, varied land use practices, and ecological importance, it is an ideal location for studying the complexity of mapping land cover in tropical mountainous regions.

Cambodia’s rapid economic and population growth has increased pressure on natural forests for resource extraction and agricultural expansion [44]. Kulen and its surrounding areas have faced similar deforestation and degradation pressures despite their protected status [45]. The villages within and around Kulen, characterized by high poverty and low educational levels, depend heavily on farming and forest resources, further threatening the remaining intact forests [46]. Previous studies indicate that only 13% of the forest remains nearly intact, with the remainder consisting of degraded forest, selectively logged area, regrowth forest, barren land, and agricultural land [47]. This highlights the urgent need for detailed land cover maps and analyses of land change dynamics in the Kulen area, both of which are critical for sustaining ecosystem services and for effective management planning.

2.2. Methodology

2.2.1. Overview

The methodological workflow is summarized in Figure 2. We prepared a total of 65 variables, comprising 62 derived from Sentinel-2 Level-2A imagery—including annual spectral bands, annual spectral indices, and bi-seasonal difference variables calculated from dry and rainy season spectral bands and spectral indices—and three topographic variables (slope, aspect, and elevation) derived from NASA’s Shuttle Radar Topography Mission. Reference land cover polygons, generated from GPS points from field observations (GCP), Uncrewed Aerial Vehicle (UAV) imagery, and high-resolution satellite imagery, were utilized to extract 1000 random pixel values per LC class. These data were used to train and evaluate six different Random Forest models employing diverse datasets and optimization techniques. The most accurate model was then used to generate an LC map, which was compared against existing global and national-scale products.

2.2.2. Data Sources

Sentinel-2 Dataset

We used Sentinel-2 MSI (Multi-Spectral Instrument) data, part of the European Space Agency’s Copernicus program, as the primary remote sensing data for land cover classification. With a swath width of 290 km and frequent revisit time (5 days at the equator with its two satellites), Sentinel-2 offers free and publicly accessible global coverage from 56°S to 83°N [48]. Featuring fine spatial resolution (10 to 60 m) and rich spectral resolution (13 bands spanning 442–2202 nm), alongside low radiometric calibration uncertainty [49], Sentinel-2 is well-suited for mapping and monitoring LC, particularly in tropical regions with diverse vegetation cover and high atmospheric interference (including high cloud coverage).

We generated one median composite image each for the annual period, the dry season, and the rainy season for LC classification in 2021, using Sentinel-2 Level-2A surface reflectance data [50]. The annual image collection consisted of 288 images from 1 May 2020 to 30 April 2021, while the bi-seasonal collections included 148 images from the rainy season (1 May–31 October 2020) and 140 images from the dry season (1 November 2020–30 April 2021). The three 60 m bands were excluded from the analysis, as they are primarily designed for atmospheric correction and are not commonly used in LC classification [51,52]. Pixels with cloud shadows and low-to-high cloud probabilities were excluded using the scene classification layer of each image. Median compositing was applied to the blue, green, red, red-edge (RE) 1–4, near-infrared (NIR), and short-wave infrared (SWIR) 1–2 bands across the three image collections, resulting in one representative composite image each for the annual, rainy, and dry periods.
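The per-pixel median compositing described above can be illustrated with a minimal sketch. It is not the Google Earth Engine workflow used in the study; the function name `median_composite` and the use of `None` to mark cloud- or shadow-masked observations are assumptions made for illustration, and the reflectance values are invented.

```python
from statistics import median

def median_composite(observations):
    """Per-pixel median of a reflectance time series, ignoring masked dates.

    `observations` holds one value per acquisition date for a single pixel
    and band; None marks a date masked as cloud or cloud shadow
    (hypothetical encoding, not the Sentinel-2 scene classification codes).
    """
    valid = [v for v in observations if v is not None]
    if not valid:
        return None  # no clear observation in the compositing period
    return median(valid)

# Toy red-band series for one pixel: two masked dates and one bright outlier.
rainy_red = [0.041, None, 0.038, 0.310, None, 0.044]
composite_value = median_composite(rainy_red)
```

The median is largely insensitive to the single bright outlier, which is one reason it is a common choice for compositing in cloud-prone regions.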

Remote Sensing Indices

We calculated 21 spectral indices from the spectral bands of the annual and bi-seasonal composite images. The spectral indices were divided into two main groups. The first group comprised 12 variables targeting water-related, disturbance, vegetation-related, chlorophyll-related, and built-up indices (Table 1). The second group included nine variables from tasseled cap transformations, a standardized method for transforming multispectral data into brightness, greenness, and wetness components, applied following Shi et al. (2019) [53]. Additionally, tasseled cap angles (tcAngle) and distances (tcDist) were computed in Equations (1) and (2) [54,55].

$$\mathrm{tcAngle}(TCT_1, TCT_2) = \mathrm{atan2}\left(\frac{TCT_1}{TCT_2}\right) \tag{1}$$

$$\mathrm{tcDist}(TCT_1, TCT_2) = \sqrt{TCT_1^{2} + TCT_2^{2}} \tag{2}$$

where $TCT_1$ and $TCT_2$ represent two of the three TCT components (brightness, greenness, and wetness), $\mathrm{tcAngle}(TCT_1, TCT_2)$ is the tasseled cap angle of the two TCT components, and $\mathrm{tcDist}(TCT_1, TCT_2)$ is the Euclidean distance between the two TCT components.
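As a rough illustration of Equations (1) and (2), the sketch below computes both quantities for a single pixel. The function names are hypothetical, the angle is left in radians because the equations state no unit conversion, and passing the first component as the first argument of `math.atan2` is an assumed reading of the ratio in Equation (1).

```python
import math

def tc_angle(tct1, tct2):
    """Tasseled cap angle of a component pair, in radians.

    Assumption: the ratio TCT1/TCT2 in Equation (1) is read as the
    two-argument call atan2(TCT1, TCT2).
    """
    return math.atan2(tct1, tct2)

def tc_dist(tct1, tct2):
    """Tasseled cap distance: sqrt(TCT1^2 + TCT2^2), as in Equation (2)."""
    return math.hypot(tct1, tct2)

# Invented brightness and greenness values for one pixel.
brightness, greenness = 0.3, 0.4
angle_bg = tc_angle(brightness, greenness)
dist_bg = tc_dist(brightness, greenness)
```

Using the two-argument form avoids a division by zero when the second component is zero.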

Bi-Seasonal Difference

Bi-seasonal difference variables were calculated as the difference between rainy and dry season values for each spectral band and index. This process produced bi-seasonal differences for 10 spectral bands and 21 spectral indices hereinafter identified by the suffix “_diff”.
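A minimal sketch of this step, assuming each seasonal composite is available as a mapping from variable name to value for one pixel. The function name and the numbers are hypothetical; only the rainy-minus-dry direction and the "_diff" suffix come from the text.

```python
def bi_seasonal_difference(rainy, dry):
    """Return rainy-minus-dry differences keyed by '<name>_diff'.

    Only variables present in both seasonal composites are differenced.
    """
    return {f"{name}_diff": rainy[name] - dry[name]
            for name in rainy if name in dry}

# Invented values for two variables of a single pixel.
rainy = {"SWIR1": 0.21, "NDTI": 0.12}
dry = {"SWIR1": 0.27, "NDTI": 0.05}
diffs = bi_seasonal_difference(rainy, dry)
```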

Topographic Data

Topographic variables, including elevation, slope, and aspect, were included in the Random Forest classification model as part of the full feature set, together with spectral bands, spectral indices, and bi-seasonal differences. Elevation data were obtained from NASA’s Shuttle Radar Topography Mission at a 1 arc-second resolution (approximately 30 m) [70]. Slope and aspect were calculated in degrees from the elevation data using GEE’s “ee.Terrain” function. These metrics were derived from the 4-connected neighbors of each target pixel in Equations (3) and (4). Aspect represents the direction of the steepest slope, measured clockwise from 0 degrees (north) to 360 degrees.

$$\mathrm{slope} = \arctan\left(\sqrt{G_{EW}^{2} + G_{NS}^{2}}\right) \times \frac{180}{\pi} \tag{3}$$

$$\mathrm{aspect} = \mathrm{atan2}\left(-G_{EW}, G_{NS}\right) \times \frac{180}{\pi} \tag{4}$$

where slope and aspect are measured in degrees, $\arctan$ is the inverse tangent of a single value, and $\mathrm{atan2}$ is the 2-argument arctangent. $G_{EW}$ and $G_{NS}$ represent the elevation differences between eastern and western neighbors, and northern and southern neighbors, respectively, over a distance of twice the pixel cell size. If the calculated aspect is negative, 360° is added to convert it to a positive value.
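Equations (3) and (4) can be sketched for a single pixel as follows. This is not the GEE implementation; the function name is hypothetical, and the sign conventions (east minus west, north minus south) are assumptions, since the text does not state which neighbor is subtracted from which.

```python
import math

def slope_aspect(z_north, z_south, z_east, z_west, cell_size):
    """Slope and aspect in degrees from the 4-connected neighbors of a pixel.

    Gradients are elevation differences over twice the cell size.
    Assumed sign conventions: G_EW = east - west, G_NS = north - south.
    """
    g_ew = (z_east - z_west) / (2.0 * cell_size)
    g_ns = (z_north - z_south) / (2.0 * cell_size)
    slope = math.degrees(math.atan(math.hypot(g_ew, g_ns)))
    aspect = math.degrees(math.atan2(-g_ew, g_ns))
    if aspect < 0:
        aspect += 360.0  # map negative angles into the 0-360 degree range
    return slope, aspect

# Invented elevations (m) around a pixel on a 30 m grid.
slope_deg, aspect_deg = slope_aspect(130.0, 70.0, 100.0, 100.0, 30.0)
```

With a 60 m elevation difference over 60 m of ground distance, the gradient is 1 and the slope is 45°, which gives a quick sanity check of the unit conversion.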

Reference Data of Land Cover Classes

Within the 10 km buffer zone surrounding Kulen, twelve distinct LC classes were identified: evergreen forests, semi-evergreen forests, deciduous forests, regrowth forests, bamboo, croplands, paddy fields, villages, water bodies, cashew plantations, rubber plantations, and other tree plantations (e.g., eucalyptus, acacia, teak) (Table A1 in Appendix A). All LC class descriptions were adopted from the Cambodian national LC classification [71], except for cashew plantations, which we introduced in this study due to their recent expansion as a prominent agricultural crop in the area [47]. To generate LC classification reference polygons, we utilized three primary data sources as ground control points: GPS points from field observations, uncrewed aerial vehicle imagery, and high-resolution satellite imagery. Between March and April 2021, a total of 494 field observed GPS points were collected within Kulen and its 10 km buffer zone. The GPS points were preselected to include at least 200 points for cashew plantations due to their rapid expansion in the region, limited availability of existing GPS data, and the difficulty of identifying them from UAV or satellite imagery without ground verification. Additional GPS points were collected from representative LC patches across other classes, identified through a literature review and discussions with local key informants, with each GCP covering at least 1 ha. For each GCP, we collected field information including LC class name, GPS location, geo-tagged photos, elevation, site slope (0–5°, 5–30°, >30°), average tree height, and additional site notes such as cashew tree ages, growing areas, and crop types. Of the 494 GPS points collected, 236 were from cashew plantations, followed by 71 from croplands, 54 from villages, 41 from regrowth forests, 23 each from deciduous and evergreen forests, 18 from paddy fields, 12 from semi-evergreen forests, 8 from bamboo, 6 from tree plantations, and 2 from rubber plantations.

UAV imagery from 10 sites within the study area was captured between March and April 2021 using a DJI Phantom 4 RTK (DJI, Shenzhen, China) and processed with Pix4D software, version 4.4.12 (Pix4D S.A., Prilly, Switzerland). A total of 673 aerial images were collected, covering 617 ha, with an average coverage of 88 ha per site (range: 16–180 ha; Table S1). The processed UAV imagery had a ground resolution of 0.04 to 0.10 m, with a geolocation RMSE for X and Y under 1.30 m. Reference polygons were digitized from portions of UAV imagery where high visual interpretation confidence was possible or from areas overlapping with GCP. These reference polygons contributed 410 ha (6.8% of the total reference area) across deciduous forests, semi-evergreen forests, evergreen forests, regrowth forests, croplands, villages, and cashew plantations.

Additional reference polygons were digitized from the March 2021 PlanetScope imagery, which has a spatial resolution of 4.8 m [72]. Reference polygons from this data source included areas with high visual interpretability, such as water bodies, and areas visited during GCP data collection, such as rubber plantations, other tree plantations, and evergreen forests. To reduce uncertainty in LC class reference polygon labelling, we cross-checked the digitized areas using multi-date imagery from Google Earth and the Cambodian national LC map [71], and validated the information through interviews with local rangers in Phnom Kulen National Park.

A total of 437 reference polygons were generated across the twelve LC classes, covering 6062 ha. All reference data are available at Zenodo (https://zenodo.org/records/14927089), deposited on 28 February 2025 (Figure S1). The average area per LC class was 505.1 ± 575.7 ha, with individual class areas ranging from 25 ha to 1732 ha. With these reference polygons, we aimed to generate 1000 random points per class for LC classification. However, some classes had fewer due to the limited number of reference polygons: bamboo (993), cashew plantations (957), croplands (999), rubber plantations (982), villages (981), and water bodies (811). The reference points for each class were randomly split into 70% for training and 30% for validation.
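The per-class 70/30 split can be sketched as below. The function name, the fixed seed, and rounding the training count to the nearest integer are assumptions; the text only states the proportions.

```python
import random

def split_class_points(points, train_fraction=0.7, seed=42):
    """Randomly split one class's reference points into training and validation.

    The seed and the rounding rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Point counts for two classes as reported in the text; point IDs are placeholders.
class_sizes = {"water bodies": 811, "evergreen forests": 1000}
splits = {name: split_class_points(range(n)) for name, n in class_sizes.items()}
```

Splitting within each class keeps the class proportions of the training and validation sets aligned with those of the full reference sample.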

2.2.3. Data Analyses

Random Forest Classifier and Variable Importance

The Random Forest, a decision tree-based ensemble method devised by Breiman (2001) [30], was utilized to develop the Kulen LC classification and quantify variable importance. RF improves classification by aggregating predictions from multiple decision trees, each trained on random subsets of the data and input variables. This ensemble approach reduces overfitting and minimizes the effects of multicollinearity while enhancing model robustness, making it particularly effective for high-dimensional datasets such as those used in this study. RF requires minimal configuration, primarily involving the adjustment of two critical parameters: the number of decision trees (ntree) and the number of variables tried at each split (mtry) [73,74,75]. Variable importance was calculated as the sum of decreases in Gini impurity, indicating each variable’s contribution to improving class separation across all decision trees in the Random Forest model [76,77]. Variables with higher importance scores were more frequently selected for data splitting, indicating their stronger influence on improving model performance [78]. This metric provides a straightforward and interpretable means of ranking predictor variables according to their impact on classification accuracy, allowing us to prioritize those most relevant to distinguishing land cover types in our landscape.

To ensure stable variable importance rankings and efficient processing, we implemented the RF model in Google Earth Engine, configuring it with 300 trees (following Nguyen et al., 2018 [35]) and using default settings for all other parameters. This setup is referred to hereafter as the “300-tree RF”. The raw variable importance scores generated by RF were subsequently normalized as percentages of the total importance across all variables, facilitating direct comparisons of their relative contribution to the model.
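The normalization of raw importance scores to percentages amounts to dividing each score by the total, as in this minimal sketch. The function name and the raw scores are invented for illustration.

```python
def normalize_importance(raw_scores):
    """Express raw importance scores as percentages of their total."""
    total = sum(raw_scores.values())
    return {name: 100.0 * score / total for name, score in raw_scores.items()}

# Invented raw Gini-based scores for three variables.
raw = {"elevation": 30.0, "NDTI": 10.0, "slope": 10.0}
percent = normalize_importance(raw)
ranking = sorted(percent, key=percent.get, reverse=True)
```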

Variable Selections

We addressed multicollinearity in land cover classification variables through correlation-based filtering and recursive feature elimination. First, we computed the Pearson correlation matrix for all 65 variables from 1000 random sampling points to assess multicollinearity and independently ranked variable importance using a 300-tree RF to evaluate their contribution to classification accuracy. Based on these two results, highly correlated but less important variables were separately filtered at correlation thresholds of 0.80, 0.90, 0.95, and 0.99, generating four subsets with reduced redundancy.
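One way to combine the correlation matrix with the importance ranking is a greedy filter, sketched below. The exact procedure used in the study is not spelled out, so this is an assumed variant: variables are visited from most to least important, and a variable is kept only if its absolute Pearson correlation with every already-kept variable does not exceed the threshold. All names and sample values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined for constant variables

def correlation_filter(samples, importance, threshold=0.90):
    """Greedy filter: of each highly correlated pair, keep the more important variable."""
    kept = []
    for name in sorted(samples, key=importance.get, reverse=True):
        if all(abs(pearson(samples[name], samples[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Invented values at four sample points; NDVI and SAVI are nearly collinear.
samples = {
    "elevation": [120.0, 80.0, 300.0, 95.0],
    "NDVI": [0.20, 0.40, 0.60, 0.80],
    "SAVI": [0.15, 0.31, 0.44, 0.61],
}
importance = {"elevation": 0.5, "SAVI": 0.3, "NDVI": 0.2}
selected = correlation_filter(samples, importance, threshold=0.90)
```

Here NDVI is dropped because it is strongly correlated with the more important SAVI, while elevation is retained because it is only weakly correlated with either index.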

We further applied an RFE based on Random Forest to each filtered subset, aiming to reduce variable quantity while maintaining robust classification accuracy [23]. RFE is a wrapper-based variable selection method that iteratively fits a learning algorithm, such as Random Forest, and removes the least important variables according to model performance metrics in order to identify the optimal subset of variables. RFE was performed using a Random Forest with 500 trees, conducting five iterations of 10-fold cross-validation to ensure the reliability and stability of the variable selection process. For this run, we used the training dataset, which included between 568 and 700 reference points per LC class. The entire RFE process was executed using the caret package (version 6.0.94) in R software (version 4.2.3) [79]. The final variable subset achieving the highest classification accuracy with the fewest variables was selected to develop the land cover classification model for Kulen, with its variable importance reported.
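The loop structure of RFE can be sketched independently of any particular classifier. The study ran RFE with the caret package in R; the Python sketch below only shows the elimination logic and the "highest accuracy with the fewest variables" selection rule. The callbacks `rank_features` and `cv_accuracy` are hypothetical stand-ins for a Random Forest importance ranking and a repeated cross-validation estimate, and the toy gains are invented.

```python
def recursive_feature_elimination(features, rank_features, cv_accuracy):
    """Backward elimination with best-subset selection.

    At each step the current subset is scored, re-ranked, and its least
    important variable removed. The best-scoring subset is returned, with
    ties resolved in favour of fewer variables.
    """
    current = list(features)
    history = []
    while current:
        history.append((cv_accuracy(current), list(current)))
        ranked = rank_features(current)  # most important first
        current = ranked[:-1]            # drop the least important variable
    best_acc = max(acc for acc, _ in history)
    best_subset = min((s for acc, s in history if acc == best_acc), key=len)
    return best_subset, best_acc

# Toy stand-ins: three informative variables and two that add nothing.
toy_gain = {"elevation": 60, "slope": 20, "NDTI": 10, "NDVI": 0, "SAVI": 0}

def toy_rank(subset):
    return sorted(subset, key=toy_gain.get, reverse=True)

def toy_accuracy(subset):
    return sum(toy_gain[f] for f in subset)  # accuracy in percentage points

subset, accuracy = recursive_feature_elimination(list(toy_gain), toy_rank, toy_accuracy)
```

In the toy case the two uninformative variables are eliminated without any loss in the score, so the three-variable subset is selected.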

Comparing Different Random Forest Models

Six Random Forest models were used to systematically evaluate the impact of input variable combinations, variable selection, and hyperparameter tuning on the LC classification accuracy. These models included: (1) annual spectral bands only (hereafter referred to as “Spectral”), (2) annual spectral bands combined with spectral indices (Spectral+SI), (3) Spectral+SI combined with bi-seasonal differences (Spectral+SI+Diff), (4) Spectral+SI+Diff combined with topographic variables (Spectral+SI+Diff+Topo), (5) RFE-selected variables (RFEvar), and (6) RFEvar with hyperparameter-tuned Random Forest (RFEvar-Hyper). All models, except RFEvar-Hyper, were classified using a 300-tree RF as previously described; however, RFEvar-Hyper employed optimal ntree and mtry hyperparameters identified by grid search [80].

One-sided Z-tests for comparing overall accuracies (OA) were conducted to evaluate statistically significant differences in classification accuracies among the models, providing insight into the relative contributions of spectral indices, bi-seasonal differences, topographic variables, variable selection, and hyperparameter tuning [75,81]. The pairwise comparisons included Spectral vs. Spectral+SI, to examine the effect of adding spectral indices; Spectral+SI vs. Spectral+SI+Diff, to evaluate the impact of incorporating bi-seasonal differences; Spectral+SI+Diff vs. Spectral+SI+Diff+Topo, to assess the contribution of topographic variables; Spectral+SI+Diff+Topo vs. RFEvar, to evaluate the effectiveness of variable selection; and RFEvar vs. RFEvar-Hyper, as well as Spectral+SI+Diff+Topo vs. RFEvar-Hyper, to assess the impact of hyperparameter tuning. The one-sided Z-test statistic was calculated using Equation (5) [82,83]. Z-values greater than 1.64, 2.33, and 3.09 indicate statistically significant differences at the 95% (one-sided p-value < 0.05), 99% (one-sided p-value < 0.01), and 99.9% (one-sided p-value < 0.001) confidence levels, respectively, based on a one-sided test in the positive direction.

$$Z = \frac{OA_A - OA_B}{\sqrt{\widehat{\mathrm{var}}_A + \widehat{\mathrm{var}}_B}} \tag{5}$$

where $OA_A$ and $OA_B$ are the estimated overall accuracies for the error matrices (maps) A and B, while $\widehat{\mathrm{var}}_A$ and $\widehat{\mathrm{var}}_B$ are their corresponding estimated variances, derived from the standard errors.
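Equation (5) and the associated one-sided p-value can be sketched as follows. The function names are hypothetical, and the binomial variance helper is an assumed simple-random-sampling estimator included only to make the example self-contained; the study derives the variances from the standard errors of each error matrix.

```python
import math

def one_sided_z_test(oa_a, oa_b, var_a, var_b):
    """Z statistic of Equation (5) and its upper-tail p-value P(Z > z)."""
    z = (oa_a - oa_b) / math.sqrt(var_a + var_b)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value

def binomial_variance(oa, n):
    """Assumed variance estimator for an overall accuracy from n sample points."""
    return oa * (1.0 - oa) / n

# Invented accuracies for two maps, each assessed with 3000 validation points.
oa_a, oa_b = 0.90, 0.85
z, p = one_sided_z_test(oa_a, oa_b,
                        binomial_variance(oa_a, 3000),
                        binomial_variance(oa_b, 3000))
```

Because the statistic is signed, swapping the two maps flips the sign of Z, so the order of A and B must match the direction of the hypothesis being tested.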

We employed confusion matrices to report both map-level and class-level accuracy metrics of the six Random Forest models. The metrics included overall accuracy, kappa coefficient (Kappa), F1 score, user accuracy (UA), producer accuracy (PA), omission error, and commission error [84,85]. OA and Kappa measure the agreement between predicted and referenced pixels, with values above 85% indicating good agreement [86,87]. The F1 score, ranging from 0 to 1, provides an integrated accuracy measure for individual classes, representing the harmonic mean of UA and PA [88].
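The map-level and class-level metrics listed above can be computed from a confusion matrix as in this minimal sketch. The function name and the matrix layout (rows are reference classes, columns are predicted classes) are assumptions, and the counts are invented.

```python
def accuracy_metrics(matrix):
    """OA, Kappa, and per-class PA, UA, and F1 from a square confusion matrix.

    Assumed layout: matrix[i][j] counts reference class i predicted as class j.
    Classes with no reference or no predicted samples would cause a division
    by zero and are not handled in this sketch.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(k)]
    row_tot = [sum(matrix[i]) for i in range(k)]                       # reference totals
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    oa = sum(diag) / n
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)  # chance agreement
    kappa = (oa - expected) / (1.0 - expected)
    pa = [d / r for d, r in zip(diag, row_tot)]  # producer accuracy = 1 - omission error
    ua = [d / c for d, c in zip(diag, col_tot)]  # user accuracy = 1 - commission error
    f1 = [2.0 * u * p / (u + p) for u, p in zip(ua, pa)]
    return {"OA": oa, "Kappa": kappa, "PA": pa, "UA": ua, "F1": f1}

# Invented two-class example with 50 reference points per class.
metrics = accuracy_metrics([[45, 5], [10, 40]])
```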

Comparison with Other Land Cover Products

We compared our optimized LC map (hereafter referred to as “KuLandCover”) with two widely used products: the European Space Agency WorldCover 2020 (“ESA”) and the SERVIR-SEA Cambodia National Land Cover 2021 (“SERVIR”) [89,90]. These datasets were selected for their comprehensive LC classification, comparable production years, and spatial resolutions (Table A2). To facilitate comparison, all three LC products and our reference polygons were reclassified into five land-cover classes of the Intergovernmental Panel on Climate Change (IPCC) [91]: forest lands, croplands, wetlands, settlements, and other lands (Table A3). We generated 200 random points per class within the reclassified reference polygons to obtain representative reference points for the accuracy assessment of the three products. These points were spaced at least 30 m apart to match SERVIR’s spatial resolution, ensuring independent sampling and preventing the extraction of identical pixel values. These points were used to extract LC values from each product, and confusion matrices were constructed to compute OA and Kappa for accuracy comparison.

3. Results

3.1. Variable Selection and Variable Importance Ranking

Variable importance ranking among all 65 input variables reveals that elevation was the most important predictor, followed by NDTI, slope, aspect, and RE1 (Figure 3). Among TCT variables, wetness ranked highest (rank 6). For bi-seasonal difference variables, Blue_diff ranked highest (rank 12), followed by SWIR1_diff (17), NDTI_diff (18), MCARI_diff (19), and Wetness_diff (20). In contrast, vegetation indices such as NDVI, EVI2, and SAVI ranked lower, indicating a limited contribution to model performance.

A Pearson correlation matrix revealed high multicollinearity among all 65 input variables. To reduce redundancy, correlation-based filtering at thresholds of 0.80, 0.90, 0.95, and 0.99 reduced the variable count to 16, 25, 36, and 59, respectively (Figure 4 and Figures S2–S5). Recursive feature elimination further refined the 0.90, 0.95, and 0.99 subsets to 19, 26, and 38 variables, achieving highest classification accuracies of 91.0%, 91.0%, and 90.2%, respectively. In contrast, the 0.80-filtered subset retained all 16 variables after RFE, with an accuracy of 90.3%. The 0.90-filtered subset with 19 RFE-selected variables was thereby considered optimal, maximizing accuracy with minimal input and thus selected for land cover classification in this study.

RFE applied to the 0.90-filtered subset effectively reduced redundancy while maintaining classification accuracy (Figure 5a). The highest accuracy, 91.0 ± 0.9% (mean ± standard deviation (SD)), was achieved with 19 selected variables. Adding more variables did not improve performance, indicating redundancy and potential overfitting. Reducing the subset caused a minor accuracy drop of 0.4% with 16 variables, remaining within one SD of the peak accuracy. However, models with 11 or fewer variables showed a substantial performance loss, with accuracy dropping 2.4% at 11 variables and 28.0% with a single variable.

The final 19-variable subset represents a balanced integration of variable groups, comprising all topographic variables (3/3), annual spectral bands (2/10), annual spectral indices (7/21), and bi-seasonal difference variables (7/31) (Figure 5b). Comparing these selected variables with the top 19 variables before selection (Figure 3) revealed a 58% overlap, with elevation, slope, and NDTI consistently ranking as the three top predictors. The top five variables (elevation, slope, NDTI, RE4, and RE1) together raised accuracy by 25.5 percentage points, from 63.0% with elevation alone to 88.5% once the other four variables were added.

Topographic variables, particularly elevation and slope, dominated the top rankings, while spectral indices such as NDTI, tcDistBG, MNDWI, as well as red-edge bands (RE4, RE1), contributed prominently to model performance. Additionally, bi-seasonal difference variables (SWIR1_diff, Blue_diff, MCARI_diff, RE3_diff, RE1_diff) capturing seasonal variability, further improved classification accuracy. These results highlight the effectiveness of RFE in optimizing variable selection, reducing redundancy, and maintaining high classification accuracy by integrating diverse variable groups across spectral bands, spectral indices, topographic variables, and bi-seasonal data.

3.2. Impact of Spectral Indices, Bi-Seasonal Differences, and Topography on Accuracy in Land Cover Classification

Figure 6 demonstrates that integrating bi-seasonal differences and topographic variables with variable selection and hyperparameter tuning significantly improved land cover classification accuracy. We observed that the addition of spectral indices to spectral bands (Spectral vs. Spectral+SI) did not yield a statistically significant improvement in the classification (Z = 0.36, one-sided p-value > 0.05). However, incorporating bi-seasonal variations (Spectral+SI vs. Spectral+SI+Diff) significantly improved overall accuracy by 6.7% (Z = −6.88, one-sided p-value < 0.001). Furthermore, adding topographic variables (Spectral+SI+Diff vs. Spectral+SI+Diff+Topo) further enhanced accuracy by 6.7% (Z = −8.07, one-sided p-value < 0.001), emphasizing the significance of elevation, slope, and aspect in LC classification.
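For readers who want to see how a Z statistic of this kind can arise, the sketch below implements a pooled two-proportion Z-test on overall accuracy. It is only one common choice and may differ from the test used in this study, which is defined in the methods. It also treats the two validation samples as independent, whereas a paired test such as McNemar's is often preferred when both models are scored on the same points. The counts are hypothetical. The sign convention assumed here makes Z negative when the first model is less accurate than the second.

```python
import math


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Pooled two-proportion Z-test for accuracy(a) versus accuracy(b).

    Returns (z, p_lower), where p_lower = P(Z <= z) is the one-sided
    p-value for the alternative "model a is less accurate than model b".
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; accuracies are degenerate")
    z = (p_a - p_b) / se
    # Standard normal lower-tail probability via the complementary error function.
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return z, p_lower


# Hypothetical counts: 780 of 1000 correct versus 850 of 1000 correct.
z, p_value = two_proportion_z_test(780, 1000, 850, 1000)
```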

The RFE-selected variables (19 variables) achieved an overall accuracy of 90.1%, showing only a minor, non-significant improvement over the full 65-variable model (Spectral+SI+Diff+Topo vs. RFEvar, Z = 1.53, one-sided p-value > 0.05). Similarly, further hyperparameter optimization of the Random Forest classifier on variable selection (Supplementary Figure S6) yielded just a minimal, non-significant 1.1% increase in OA (RFEvar vs. RFEvar-Hyper, Z = −1.60, one-sided p-value > 0.05), indicating that the RFEvar subset had already achieved near-optimal performance. In contrast, the RFEvar-Hyper model markedly outperformed the Spectral+SI+Diff+Topo model (Z = −3.12, one-sided p-value < 0.001), highlighting the combined effects of variable selection and hyperparameter tuning in enhancing classification accuracy. With 91.2% OA and 90.4% Kappa, within the acceptable thresholds for LC classification suggested by previous studies [86,87], RFEvar-Hyper was selected as the most accurate model for LC mapping in Kulen.

The class-level accuracy assessment of the six Random Forest models, using user accuracy, producer accuracy, and F1 score (Figure 7), aligns with overall accuracy trends at the map level (Figure 6), highlighting the benefits of bi-seasonal and topographic variables, variable selection, and hyperparameter tuning in LC classification. Across all models, rubber plantations and water bodies consistently achieved the highest UA, PA, and F1 score, whereas annual dataset models (Spectral and Spectral+SI) performed weakly, particularly for semi-evergreen forests, which fell below 65% for all three metrics. The incorporation of bi-seasonal variables (Spectral+SI+Diff) significantly improved accuracy for challenging classes such as semi-evergreen forests, deciduous forests, bamboo, tree plantations, paddy fields, and croplands. The integration of topographic variables (Spectral+SI+Diff+Topo) markedly enhanced UA and PA across all LC classes, except for cashew plantation and cropland UA. By reducing redundancy from Spectral+SI+Diff+Topo, the RFEvar model consistently achieved UA and PA above 80% across all classes. Following further optimization, the RFEvar-Hyper model improved slightly on RFEvar and was established as the most accurate and efficient model for LC classification in this study.
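The three class-level metrics can be derived from a confusion matrix as sketched below. The layout (rows = reference, columns = map), the function name, and the 2-class matrix are assumptions made for illustration; the numbers are hypothetical and are not taken from Figure 7.

```python
def class_metrics(matrix, labels):
    """Per-class user accuracy (UA), producer accuracy (PA), and F1 score.

    Assumed layout: matrix[i][j] = points with reference class i mapped as j.
    """
    k = len(labels)
    metrics = {}
    for i, label in enumerate(labels):
        correct = matrix[i][i]
        reference_total = sum(matrix[i])                  # row total
        mapped_total = sum(matrix[r][i] for r in range(k))  # column total
        # PA: share of reference points of this class that were mapped correctly.
        pa = correct / reference_total if reference_total else 0.0
        # UA: share of points mapped as this class that truly belong to it.
        ua = correct / mapped_total if mapped_total else 0.0
        # F1: harmonic mean of UA and PA.
        f1 = 2 * ua * pa / (ua + pa) if (ua + pa) else 0.0
        metrics[label] = {"UA": ua, "PA": pa, "F1": f1}
    return metrics


# Hypothetical 2-class matrix, for illustration only.
result = class_metrics([[40, 10], [5, 45]], ["forest", "cropland"])
```

UA reflects commission errors and PA reflects omission errors, so a class can score well on one and poorly on the other; F1 summarizes both.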

3.3. Comparison with Other Land Cover Products

The overall accuracies achieved by our model (KuLandCover), the ESA, and SERVIR were 92.1%, 68.8%, and 60.4%, respectively. All LC maps exhibited consistent patterns, with forest land predominating in Kulen and croplands being the dominant land cover in the surrounding buffer zone (Figure 8 and Figure A1). However, ESA and SERVIR overestimated forest land by 7.7–7.8% compared to KuLandCover, which estimated forest cover as 38.9% for the study area (Figure 8a–c, Figure A1 and Figure A2). Additionally, KuLandCover captured more settlements, classifying 13.0% of the study area, compared to less than 1.0% for both ESA and SERVIR. ESA, in contrast, allocated 15.0% of the area to “other land,” while SERVIR and KuLandCover classified less than 1.0% of the area as “other land”. KuLandCover accurately identified croplands in southern Kulen (Figure 8d), whereas ESA and SERVIR misclassified this area as forest lands. Similarly, KuLandCover correctly distinguished settlements surrounded by croplands (Figure 8e–f), while ESA and SERVIR misclassified these regions as forest lands.

3.4. Final Land Cover Map

Figure 9a illustrates the final land cover map of Kulen and its surrounding 10 km buffer zone in 2021, clearly distinguishing between natural forest and non-forest lands in the landscape. Forest lands cover 37.3% of the total area (164,706.7 ha), primarily concentrated in the central and northwestern parts of the landscape, particularly within the Kulen protected area. The forest lands consist of evergreen forests (8.3%), semi-evergreen forests (7.8%), deciduous forests (12.6%), regrowth forests (6.2%), and bamboo (2.4%). Meanwhile, non-forest lands dominate 62.7% of the landscape, primarily driven by croplands (22.2%) and paddy fields (18.9%), which are mainly located in the eastern and southern regions (Figure 9b).

Within Phnom Kulen National Park, forest lands are dominant, covering 72.1% of the total area, primarily consisting of evergreen forests (30.3%), semi-evergreen forests (16.5%), and deciduous forests (11.6%), extending from the central to northwestern regions (Figure 9a). Non-forest lands account for 27.9%, with cashew plantations as a major contributor (15.4%) in the southeast region. Other major non-forest lands, such as croplands, villages, and paddy fields, constitute 12.1% of the area (Figure 9c).

Similarly, the 10 km buffer zone outside Kulen is predominantly non-forest land (73.0%), mainly villages, paddy fields, and croplands, which collectively contribute 66.5% of the total area (Figure 9d). Despite the dominance of agricultural and settlement land uses, natural forests persist (27.0%), with deciduous forests accounting for 12.9%, primarily concentrated in the northern part of the buffer zone. Large patches of evergreen and semi-evergreen forests, covering 7.1% of the total area, are found mainly along the northwestern border of Kulen.

4. Discussions

4.1. Variable Selection and Variable Importance Ranking

4.1.1. Correlation-Based Filtering and Recursive Feature Elimination

The integration of correlation-based filtering and recursive feature elimination effectively reduced the number of variables from 65 to 19 while maintaining high classification accuracy (90.5% vs. 89.3% for the full dataset) (Figure 5 and Figure 6). A 0.9 correlation threshold, when combined with RFE, was optimal for retaining essential variables while minimizing redundancy. In contrast, previous studies [20,92] applied a 0.8 threshold, which proved overly restrictive here because it removed valuable variable interactions that RF models are capable of exploiting for enhanced classification accuracy. We found that applying a 0.8 threshold reduced accuracy, consistent with Schulz et al. (2021) [20], who reported similar declines when using RF models with strict correlation filtering.

Additionally, our findings align with prior studies applying RFE across diverse remote sensing datasets and landscapes, such as Demarchi et al. (2020) [21], who reported a reduction from 188 to 24–28 variables in grassland classification, and Ma et al. (2022) [23], who reduced variables from 33 to 10 in cropland classification, all while maintaining classification accuracy. By systematically selecting high-impact variables from the full dataset, RFE retained essential, complementary variables, thereby enhancing model efficiency and interpretability while mitigating overfitting and reducing computational demands [22]. These findings support RFE as an effective method for variable selection in high-dimensional remote sensing datasets for diverse LC classification tasks. Beyond optimizing variable selection, RFE enhanced model interpretability by directly linking each variable’s importance with classification accuracy throughout the iterative selection process (Figure 5). This capability provides a clear understanding of how the inclusion or exclusion of specific variables influences model performance, facilitating informed decisions about the trade-off between accuracy and computational complexity [22,93]. For instance, although the optimal subset of 19 variables achieved the highest accuracy, models with 5 variables still performed within the acceptable accuracy thresholds for LC classification [86,87], offering a computationally efficient alternative for large-scale mapping efforts, particularly where computational resources are constrained.
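The iterative logic of RFE, and the way it ties each removal to a change in accuracy, can be sketched generically. In the sketch below the model is abstracted into two callables, `evaluate` and `importance`, which in practice would wrap a trained classifier such as Random Forest. Those callables, the toy gains, and the per-variable penalty are hypothetical stand-ins and do not represent the models or results of this study.

```python
def recursive_feature_elimination(features, evaluate, importance):
    """Drop the least important variable one at a time and track accuracy.

    evaluate(subset) -> accuracy-like score for a model built on subset.
    importance(subset) -> dict of variable name -> importance within subset.
    Returns (best_subset, best_score, history).
    """
    current = list(features)
    history = []
    while current:
        history.append((tuple(current), evaluate(current)))
        scores = importance(current)
        weakest = min(current, key=lambda name: scores[name])
        current.remove(weakest)
    # Best score wins; ties are broken in favour of the smaller subset.
    best_subset, best_score = max(history, key=lambda item: (item[1], -len(item[0])))
    return best_subset, best_score, history


# Toy stand-ins for a real classifier (hypothetical values).
TOY_GAIN = {"elevation": 0.60, "slope": 0.15, "NDTI": 0.10, "noise_a": 0.0, "noise_b": 0.0}


def toy_importance(subset):
    return {name: TOY_GAIN[name] for name in subset}


def toy_evaluate(subset):
    # Summed gain minus a small penalty per variable, mimicking redundancy cost.
    return sum(TOY_GAIN[name] for name in subset) - 0.01 * len(subset)


best_subset, best_score, history = recursive_feature_elimination(
    list(TOY_GAIN), toy_evaluate, toy_importance
)
```

The `history` list plays the role of the accuracy curve in Figure 5a: it records the score at every subset size, so a smaller subset can be chosen deliberately when a small loss in accuracy is acceptable.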

However, applying RFE to large datasets presents computational challenges, and its outcomes may vary based on the training dataset’s characteristics, requiring customized variable selection strategies tailored to the dataset’s specific properties [22]. Despite these challenges, RFE remains an indispensable method in remote sensing applications, offering significant benefits in reducing dimensionality and improving model reliability and performance by focusing on the most informative variables. These findings highlight the importance of integrating RFE into LC classification workflows, particularly for complex tropical landscapes where diverse input variables and large-scale datasets demand efficient, robust solutions [22,94,95].

Despite valuable insights gained from applying RFE and a Random Forest classifier for tropical land cover classification, a few limitations must be acknowledged. The focus on a limited number and types of LC classes specific to the study area may limit the generalizability of the findings to other landscapes with more diverse or spatially complex LC compositions. Additionally, the exclusive focus on Random Forest classifiers, and a specific selection of input variables, raises concerns about the robustness of the model’s performance. The effectiveness of these methods is significantly influenced by the quality and availability of training data, as well as the complexity of the landscape. In this study, the use of extensive, high-quality training data ensured their robust performance. However, applying these approaches to regions with varying data quality and characteristics may yield differing results, highlighting the need for further research to evaluate their applicability in diverse environments. Future studies should explore a variety of machine learning and deep learning techniques to validate and enhance these findings. Integrating synthetic aperture radar data and a gray-level co-occurrence matrix could further improve classification accuracy [96,97]. Furthermore, adopting stacked ensemble models that combine multiple machine learning algorithms, such as Random Forest, support vector machines, or XGBoost, may capture diverse data patterns and enhance overall LC classification performance [23].

4.1.2. Variable Importance

Impact of Topography, Tillage, SWIR, Red Edge, Water, and Vegetation Indices on Land Cover Mapping of Tropical Regions

Our findings highlight elevation and slope as the most significant variables for LC classification, consistently ranking as top contributors in variable importance analysis before and after variable selection (Figure 3 and Figure 5b). The inclusion of topographic variables significantly improved classification accuracy, increasing OA by 6.7% compared to models without these variables (Figure 6). This result aligns with previous studies conducted across various landscapes [98,99,100], emphasizing the critical role of topographic data in accurately mapping LC classes. Furthermore, the topographic variables exhibited low correlation with other variables (Figure 4), consistent with Ma et al. (2023) [23] and Zhang et al. (2020) [75], indicating that they provide complementary information that enhances model performance and offers a unique physical and social context for distinguishing LC patterns [101]. Specifically, in this study, most natural forest lands, particularly evergreen forests, semi-evergreen forests, and bamboo, are predominantly found at higher elevations and on steeper slopes within the Kulen protected area. These elevated areas receive significantly higher annual rainfall than the surrounding lowlands, along with high relative humidity and tropical temperatures, creating favorable ecological conditions that support dense evergreen and semi-evergreen forest ecosystems [40]. In contrast, human-modified LC classes such as croplands, paddy fields, rubber plantations, and tree plantations occupy lower, flatter areas outside the protected zone (Figure 9). Cambodia’s topography is predominantly flat, with approximately 70% of the country composed of low-elevation plains, while elevated areas such as plateaus and mountainous regions are often designated as protected areas to conserve natural forests [102,103]. Therefore, topographic variables are highly recommended to improve land cover classification in landscapes similar to those found in Cambodia.

NDTI is a widely used spectral index, primarily applied in agricultural management to assess soil tillage intensity and crop residue cover [104,105]. By utilizing short-wave infrared wavelengths, NDTI is highly sensitive to variations in soil exposure and moisture, allowing a clear distinction between bare soil conditions, non-photosynthetic vegetation, and dense green vegetation, which is critical for reducing misclassification between agricultural lands and natural forests in LC classification [106,107]. Agricultural areas often experience soil disturbance from tillage, harvesting, and planting, leading to greater soil exposure than forested areas, which typically maintain dense, stable canopies with minimal exposed soil. Furthermore, in the Kulen landscape which includes diverse agricultural and forested areas, NDTI not only differentiates agricultural lands from forests, but also provides critical insights into diverse agricultural practices, aiding in the distinction between croplands, paddy fields, and plantations, all of which may require different soil management (Table A1). These properties make NDTI a key variable for improving classification accuracy in our analysis.

The importance of the red-edge bands for LC classification stems from their sensitivity to reflectance changes related to chlorophyll content and leaf structure across the red-to-near-infrared transition zone [108,109]. This sensitivity enables precise assessment of plant health and productivity, which is crucial for distinguishing subtle variations between vegetation and other land cover types [108,110]. In this study, we classify multiple vegetation-related land cover types differing in species composition, canopy density, leaf structure, and chlorophyll content. The red-edge bands effectively capture these fine-scale structural and physiological differences, improving class separability and highlighting their significant role in accurate land cover classification.

Our findings demonstrate the significant contribution of tcDistBG, tcAngleBW, and tcAngleGW to LC classification accuracy, providing additional evidence of the effectiveness of tasseled cap transformation in remote sensing. By reducing spectral dimensionality and enhancing spectral separability, TCT effectively extracts brightness (urban and bare soil), greenness (vegetation density), and wetness (moisture content), which are widely applied in land cover classification and land cover change detection [53,111]. tcDistBG, representing the distance from the origin in the brightness–greenness plane, has been recognized for its role in assessing forest stand age, structure, and biomass [54,112,113] and has been further supported in this study as a key classification variable. Meanwhile, tcAngleBW and tcAngleGW, though previously introduced in land cover classification [55], emerge in this study as critical variables, demonstrating their significance in characterizing directional spectral relationships relative to their primary axes and enhancing class separability. While tcAngleBG has been associated with vegetation cover detection [114,115], the broader ecological and biophysical significance of these angular variables remains underexplored, emphasizing the need for further research to refine their interpretation and applications in remote sensing.
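The angle and distance variables can be illustrated with a short sketch. It assumes one commonly used convention: each angle is the arctangent of the second-named component over the first-named one (for example, tcAngleBG = arctan(greenness / brightness)), and each distance is the Euclidean norm of the two components. The exact definitions and any rescaling used in this study may differ. The function name and input values are hypothetical.

```python
import math


def tasseled_cap_derivatives(brightness, greenness, wetness):
    """Angle and distance variables from tasseled cap components.

    Assumed convention: angle XY = atan2(Y, X); distance XY = hypot(X, Y).
    Angles are returned in radians without further scaling.
    """
    return {
        "tcAngleBG": math.atan2(greenness, brightness),
        "tcAngleGW": math.atan2(wetness, greenness),
        "tcAngleBW": math.atan2(wetness, brightness),
        "tcDistBG": math.hypot(brightness, greenness),
        "tcDistGW": math.hypot(greenness, wetness),
        "tcDistBW": math.hypot(brightness, wetness),
    }


# Hypothetical component values chosen so the arithmetic is easy to check.
derived = tasseled_cap_derivatives(brightness=3.0, greenness=4.0, wetness=0.0)
```

Under this convention, tcDistBG grows with the combined magnitude of brightness and greenness, while tcAngleBG captures their ratio, so the two variables carry complementary information.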

MNDWI can be regarded as a key variable due to its complementary role in distinguishing moisture content in soil and vegetation. MNDWI was developed as an improvement on NDWI to enhance the detection of water bodies by reducing interference from vegetation, shadows, soil, and impervious surfaces [116,117], which is supported by our choice of MNDWI instead of NDWI (Figure 5b). Meanwhile, MCARI and GNDVI were selected for their ability to capture detailed leaf chlorophyll concentration and canopy structure, which could improve vegetation differentiation. MCARI, integrating red-edge bands, exhibits high sensitivity to leaf chlorophyll while minimizing interference from non-photosynthetic materials [67,118]. It strongly correlates with leaf area index [119] and is effective for crop type distribution mapping [120]. Unlike NDVI, GNDVI replaces the red band with green combined with NIR, minimizing atmospheric interference, enhancing leaf chlorophyll detection, and reducing saturation in dense canopies such as evergreen and semi-evergreen forests [121].
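Several of the indices discussed in this subsection share the same normalized-difference form, which the sketch below makes explicit using their standard published definitions: NDTI = (SWIR1 − SWIR2) / (SWIR1 + SWIR2), MNDWI = (Green − SWIR1) / (Green + SWIR1), and GNDVI = (NIR − Green) / (NIR + Green). Which Sentinel-2 bands were assigned to each role in this study is not restated here, and the reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b); NaN if a + b is zero."""
    denominator = a + b
    if denominator == 0:
        return float("nan")
    return (a - b) / denominator


def ndti(swir1, swir2):
    """Normalized Difference Tillage Index."""
    return normalized_difference(swir1, swir2)


def mndwi(green, swir1):
    """Modified Normalized Difference Water Index."""
    return normalized_difference(green, swir1)


def gndvi(nir, green):
    """Green Normalized Difference Vegetation Index."""
    return normalized_difference(nir, green)


# Hypothetical surface reflectances for two pixels, for illustration only.
water = {"green": 0.06, "nir": 0.02, "swir1": 0.01, "swir2": 0.01}
canopy = {"green": 0.05, "nir": 0.40, "swir1": 0.18, "swir2": 0.08}

water_mndwi = mndwi(water["green"], water["swir1"])
canopy_mndwi = mndwi(canopy["green"], canopy["swir1"])
canopy_gndvi = gndvi(canopy["nir"], canopy["green"])
canopy_ndti = ndti(canopy["swir1"], canopy["swir2"])
```

In this toy example the water pixel yields a positive MNDWI and the canopy pixel a negative one, which is the contrast the index is designed to produce.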

Significance of Multi-Temporal Data in Tropical Land Cover Classification

Our analysis demonstrates that LC classification models incorporating bi-seasonal datasets consistently outperform models using only annual datasets, regardless of whether spectral bands or spectral indices are included (Figure 6). This finding aligns with previous studies emphasizing the importance of bi-seasonal data for effectively classifying forest types, crop types, and mixed land uses across diverse landscapes [13,20]. The higher accuracy of the Spectral+SI+Diff model stems from improved classification of forest classes such as semi-evergreen forests, bamboo, deciduous forests, and tree plantations (Figure 7). Similar findings were reported by Nguyen et al. (2020) [13], where bi-seasonal data significantly improved the classification accuracy of semi-evergreen and plantation forests in Vietnam. This improvement is largely due to the ability of bi-seasonal datasets to capture phenological changes in deciduous and semi-evergreen forests, which are otherwise challenging to differentiate using single-season data [13,122].

Additionally, bi-seasonal data improve the classification of agricultural lands by capturing temporal variations in human land use practices [123]. This capability is particularly valuable for distinguishing between crop types, growth stages, and management practices, such as irrigation and harvesting [124]. Specifically, our study highlights improved classification between paddy fields and croplands (mainly cassava), which dominate agricultural land use in the region. In the dry season, paddy fields and cassava croplands show similar characteristics, such as dryness and sparse vegetation cover. In contrast, the rainy season distinguishes them, with paddy fields becoming inundated for rice cultivation while cassava fields remain unflooded. Future studies should incorporate bi-seasonal datasets to capture seasonal phenological changes and temporal variations, thereby improving LC classification accuracy for forest and agricultural lands.
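The paddy-versus-cassava contrast can be illustrated with a sketch of how bi-seasonal difference variables might be formed for one pixel. The sketch assumes the difference is taken as dry-season value minus rainy-season value; the direction used in this study is not restated here. The function name and all index values are hypothetical.

```python
def seasonal_differences(dry, wet):
    """Build "<name>_diff" variables for the variables present in both seasons.

    dry, wet: dicts mapping variable name -> value for one pixel.
    Assumed direction: dry-season value minus rainy-season value.
    """
    shared = sorted(set(dry) & set(wet))
    return {f"{name}_diff": dry[name] - wet[name] for name in shared}


# Hypothetical MNDWI values: both fields look dry in the dry season,
# but only the paddy field is inundated in the rainy season.
paddy = seasonal_differences({"MNDWI": -0.30}, {"MNDWI": 0.20})
cassava = seasonal_differences({"MNDWI": -0.30}, {"MNDWI": -0.25})
```

With identical dry-season values, the two pixels are inseparable on a single-season index, whereas the difference variable separates them clearly.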

4.2. Comparison with Other Land Cover Products

Our LC map, KuLandCover, outperforms ESA and SERVIR in accuracy. This is consistent with several studies reporting that global and national-scale land cover maps typically exhibit lower accuracy than locally derived models, particularly when capturing fine-scale, localized features [125,126]. This emphasizes the advantages of using locally derived models for more accurate land cover classification, especially in heterogeneous tropical mountain areas.

The observed discrepancies in forest land cover estimates among the three products are due to variations in their original class definitions relative to IPCC’s criteria for forest lands. The IPCC defines forest lands as areas with woody vegetation, including natural forests and tree plantations mainly used for timber production [71,91]. KuLandCover aligns closely with this definition by clearly distinguishing between natural forest classes and identifying tree plantations, ensuring compliance with the IPCC’s forest land classification. While SERVIR classifies natural forests similarly, it does not have a distinct tree plantation class, limiting its alignment with the IPCC’s definition. Although SERVIR’s natural forests contribute to its forest land estimate, the misclassification of croplands and settlements as forest lands (Figure 8) leads to an overestimation. In contrast, ESA uses a broad “Tree cover” definition, encompassing areas with over 10% tree cover, merging natural forests, tree plantations, and other plantations including cashew and rubber, which are not classified as forest lands. This broad categorization complicates the distinction between tree plantations and other plantations and often results in overestimating forest lands. Therefore, the detailed LC classes in KuLandCover, which are closely aligned with the IPCC’s forest land definition and effectively distinguish tree plantations from other plantations, make it well suited for studies on deforestation, forest management, and biodiversity.

KuLandCover estimates a larger settlement area due to its broader classification definition, which includes built-up areas and surrounding cultivated land, reflecting typical rural settlements in Cambodia. In contrast, ESA restricts settlements to built-up areas, resulting in a lower estimate [71,127]. KuLandCover achieves higher accuracy, with user and producer accuracies of 84.9% and 82.0%, compared to ESA’s 67.7% and 67.9%, respectively (Figure 7; [128]). Although SERVIR does not report class-level accuracy, its development builds on Saah et al. (2020) [55], which documented low accuracy in settlement classification. Ultimately, KuLandCover delivers more accurate and comprehensive settlement estimates through precise classification, thereby capturing the human dimension of landscapes, enhancing our understanding of human–forest interactions, and informing land use and forest management [129].

The 15.0% estimate of IPCC’s “other land” in ESA is primarily attributed to the misclassification of grasslands, a class extensively included in ESA. In contrast, KuLandCover and SERVIR report much lower estimates (0.0% and 0.2%, respectively), as natural grasslands were not observed in our study area and were therefore excluded from classification. Both KuLandCover and SERVIR likely provide more accurate estimates, since grasslands covered only 1.76% of Cambodia in 2018, mainly around Tonle Sap Lake and Mondulkiri Province, outside our study area [71,129,130]. The inclusion of grasslands in ESA’s “other land” category is inconsistent with the landscape characteristics of our study area, where grasslands are absent. Furthermore, previous studies, including Venter et al. (2022) [131] and Zhai et al. (2023) [132], have shown that ESA overestimates grassland coverage globally and in mainland Southeast Asia, with low user accuracy. These discrepancies support the validity of our estimation, which reflects the actual LC characteristics of the study area.

Although we applied the IPCC classification scheme to harmonize the ESA, SERVIR, and KuLandCover products before the comparison, differences in the original class definitions likely contributed to the observed discrepancies in the accuracy results. These inconsistencies highlight the challenges of comparing products developed with different classification schemes, and we acknowledge that this limited our ability to assess how specific class-level differences affected agreement across products.

4.3. Land Cover of Kulen

Our study provides a detailed, locally specific LC classification for the Kulen area in Cambodia, addressing critical data gaps for effective landscape management in Phnom Kulen National Park. Notably, we found disagreements between our estimates of natural forest and cashew plantation areas and those reported by Singh et al. (2019) [47]. Despite differences in mapping years, our estimate of natural forest lands in 2021 (72.1%) is much higher than the 2016 estimate of less than 40% by Singh et al. [47], which itself contrasts with other studies reporting at least 70.0% forest lands for the same region in 2016 [41,55]. This suggests that our estimate is more consistent with the prevailing understanding of the region’s forest cover. Furthermore, our area estimate of cashew plantations (15.4%) contrasts sharply with the estimate of greater than 60% by Singh et al., an apparent overestimate that likely reflects challenges in generating accurate reference data for cashew classification, particularly when relying on high-resolution imagery from sources like Google Earth [133,134]. In contrast, our study incorporated a robust dataset of 236 GPS points, specifically for cashew plantations, which significantly improved classification accuracy. Singh et al. (2019) [47], however, did not provide sufficient details regarding their reference data collection methodology, which likely impacted the precision of their estimates. Additionally, given the steady increase in cashew plantation area and production in Cambodia over the last decade [134,135], the 2016 estimate by Singh et al. [47] would imply an even larger cashew area by 2021, widening the discrepancy with our findings. Therefore, our results offer a more plausible and reliable representation of LC in Kulen.

5. Conclusions

This study investigated how integrating spectral bands, spectral indices, bi-seasonal differences, and topographic variables, alongside advanced machine-learning techniques, affects land cover classification accuracy in tropical mountainous landscapes. Our findings demonstrate the significant role of topographic variables, such as elevation and slope, in improving model performance by providing a complementary physical and social context for distinguishing land cover patterns. Spectral bands and indices, including NDTI, red edge, MNDWI, MCARI, GNDVI, and tasseled cap transformations, significantly contributed to the differentiation of forest types, agricultural lands, and water bodies, while bi-seasonal datasets captured essential phenological changes and temporal variations in land use, improving the classification of challenging land cover classes like semi-evergreen forests, deciduous forests, bamboo, tree plantations, paddy fields, and croplands.

Applying correlation-based filtering and recursive feature elimination for variable selection reduced the variable set from 65 to 19, improving model efficiency and interpretability while preserving accuracy. Combining recursive feature elimination with hyperparameter tuning in Random Forest further optimized performance, ensuring model stability and reliability. Our LC map, KuLandCover, outperforms global and national-scale products (ESA, SERVIR), providing more accurate estimates of natural forest cover, croplands, settlements, and cashew plantations with finer resolution and greater reliability. This demonstrates the advantages of using locally derived models and high-quality ground control points for accurate LC classification in heterogeneous landscapes, making the map well suited for studies of deforestation, forest management, biodiversity conservation, and land use.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs17091551/s1, Table S1: Description of UAV imagery data from 10 sites; Figure S1: Reference polygons used for generating training and validation datasets overlaid on March 2021 PlanetScope imagery; Figure S2: Pearson correlation matrix of all 65 variables; Figure S3: Pearson correlation matrix of the 59 variables retained after applying correlation-based filtering at a 0.99 threshold; Figure S4: Pearson correlation matrix of the 36 variables retained after applying correlation-based filtering at a 0.95 threshold; Figure S5: Pearson correlation matrix of the 16 variables retained after applying correlation-based filtering at a 0.80 threshold; Figure S6: Hyperparameter tuning of Random Forest for variable selection.

Author Contributions

Conceptualization, C.S., T.T. and S.O.; methodology, C.S., T.T., S.O. and A.M.; software, C.S. and S.P.; formal analysis, C.S.; investigation, C.S. and S.S.; resources, S.K. and S.S.; data curation, C.S. and S.P.; writing—original draft preparation, C.S.; writing—review and editing, C.S., T.T., S.O., A.M. and S.K.; visualization, C.S.; supervision, T.T., S.O., S.K. and A.M.; funding acquisition, S.K., T.T. and S.O. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Swedish International Development Cooperation Agency through the “Sweden-Royal University of Phnom Penh Bilateral program” (Contribution Number: 11599). Tagesson was additionally funded by the Swedish National Space Agency (SNSA Dnr: 2021-00144) and Formas (Dnr. 2021-00644; 2023-02436). We also acknowledge funding from Fysiografen. The research presented in this paper is a contribution to the Strategic Research Area “Biodiversity and Ecosystem Services in a Changing Climate”, BECC, funded by the Swedish government.

Data Availability Statement

The original data presented in the study are openly available in Zenodo at https://zenodo.org/records/14927089, deposited on 28 February 2025.

Acknowledgments

We are grateful to the Ministry of Environment (Cambodia) and Siem Reap Provincial Administration for granting permissions and for their administrative support and accommodation during our fieldwork. A special note of thanks goes to Seng Saingheat for his dedication and leadership, along with his ranger colleagues, including Sou Sy, Let Chey, Khun Chi, Soun Sao, Kroem Veng, Choun Choy, and Ti Has. We greatly appreciate Mot Ly and Chim Lychheng, as well as Rum Pheara, Nhong Vatey, Svay Chanboth, and Mach Sokmean, for their support throughout our data collection journey.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

LC	Land cover
RFE	Recursive feature elimination
REDD+	Reducing Emissions from Deforestation and Forest Degradation
RF	Random Forest
GEE	Google Earth Engine
Kulen	Phnom Kulen National Park
MSI	Multi-Spectral Instrument
NIR	Near Infrared
RE	Red Edge
SWIR	Short-Wave Infrared
TCT	tasseled cap transformations
tcAngle	tasseled cap angles
tcDist	tasseled cap distances
MNDWI	Modified Normalized Difference Water Index
NDWI	Normalized Difference Water Index
NDMI	Normalized Difference Moisture Index
NBR	Normalized Burn Ratio
NDTI	Normalized Difference Tillage Index
NDVI	Normalized Difference Vegetation Index
EVI2	2-band Enhanced Vegetation Index
GNDVI	Green Normalized Difference Vegetation Index
SAVI	Soil Adjusted Vegetation Index
NDRE	Normalized Difference Red Edge Index
MCARI	Modified Chlorophyll Absorption Ratio Index
BUI	Built-Up Index
SRTM	Shuttle Radar Topography Mission
GCP	GPS points from field observations
GPS	Global Positioning System
UAV	Uncrewed Aerial Vehicle
Spectral	Annual spectral bands
Spectral+SI	Annual spectral bands combined with spectral indices
Spectral+SI+Diff	Spectral+SI combined with bi-seasonal differences
Spectral+SI+Diff+Topo	Spectral+SI+Diff combined with topographic variables
RFEvar-Hyper	RFEvar with hyperparameter-tuned Random Forest
RFEvar	RFE-selected variables
OA	Overall accuracy
Kappa	Kappa coefficient
UA	User accuracy
PA	Producer accuracy
ESA	European Space Agency (ESA) WorldCover 2020
KuLandCover	Our optimized LC map
SERVIR	SERVIR-SEA Cambodia National Land Cover 2021
IPCC	Intergovernmental Panel on Climate Change
_diff	Suffix for bi-seasonal differences in 10 spectral bands and 21 indices.
NASA	National Aeronautics and Space Administration
ntree	number of decision trees
mtry	number of variables tried at each split
300-tree RF	The RF model was configured with 300 trees, using the default settings of the “ee.Classifier.smileRandomForest” function in Google Earth Engine: mtry as the square root of the number of variables, minLeafPopulation as 1, bagFraction as 0.5, no limit on maxNodes, and seed as 0
SD	standard deviation

Appendix A

Table A1. Land cover class definitions used in this study.

No.	Land Cover Class	Description
1	Evergreen forests	Areas covered by trees that maintain their leaves throughout the year.
2	Semi-evergreen forests	Contain variable percentages of evergreen and deciduous trees.
3	Deciduous forests	Comprise dry mixed deciduous forests and dry Dipterocarp forests.
4	Regrowth forests	Areas with more than 50% naturally regenerated forest with clearly visible indications of human activities (selective logging, previous agricultural land use, recovering from human-induced fire).
5	Bamboo	Areas dominated by bamboo.
6	Tree plantations	Plantations of teak, eucalyptus, acacia, jatropha, and others.
7	Rubber plantations	Areas with more than 50% rubber plantation.
8	Croplands	Arable and tillage land, and agro-forestry systems where tree cover falls below the thresholds used for the forest land category. Examples of cropland include cassava and mango plantations.
9	Paddy rice fields	Flooded parcels of arable land used for growing semi-aquatic rice.
10	Villages	Patches of land with houses and their surrounding gardens.
11	Water bodies	Areas of fresh water and seawater.
12	Cashew plantations	Areas primarily dominated by cashew trees, ranging from small household-scale plantations to larger commercial plantations.

Table A2. Characteristics of land cover products used for comparison.

Product	Resolution	Year	Coverage	Classes	Reference
ESA WorldCover 10 m 2020 v100	10 m	2020	Global	Tree cover, shrubland, mangroves, cropland, bare/sparse vegetation, permanent water bodies, herbaceous wetland, built-up, grassland, snow and ice, moss and lichen.	[90]
SERVIR-SEA Cambodia National Land Cover	30 m	2021	Cambodia	Mangrove, shrub, evergreen, deciduous, flooded forest, semi-evergreen, other plantation, rice, cropland, rubber, water, wetland, built-up area, village, grass, others.	[89]

Table A3. Reclassification of original land cover products into standardized IPCC land cover classes for comparative analysis.

IPCC Land Covers [91]	Original LC Classes: ESA [127]	Original LC Classes: SERVIR [89]	Original LC Classes: Our Reference Polygons and LC Product
Forest land	Tree cover, shrubland	Evergreen, deciduous, semi-evergreen, flooded forest, shrub	Evergreen forest, semi-evergreen forest, deciduous forest, regrowth forest, bamboo, tree plantation
Cropland	Cropland, bare/sparse vegetation	Other plantations, rice, cropland, rubber	Rubber plantation, cropland, paddy field, cashew
Wetland	Permanent water bodies, herbaceous wetland	Water, wetland	Water
Settlement	Built-up	Built-up area, village	Village
Other land	Grassland	Grass, others	-
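
The reclassification in Table A3 amounts to a lookup from each product's original class name to an IPCC category. The following is a minimal Python sketch of that lookup, not the authors' implementation: the names `TABLE_A3`, `LOOKUP`, and `to_ipcc` are hypothetical, the class names are transcribed from Table A3, and returning `None` for classes that the table does not list is an assumption made here for illustration.

```python
from typing import Optional

# IPCC category -> original class names per product, transcribed from Table A3.
TABLE_A3 = {
    "Forest land": {
        "ESA": ["Tree cover", "Shrubland"],
        "SERVIR": ["Evergreen", "Deciduous", "Semi-evergreen",
                   "Flooded forest", "Shrub"],
        "KuLandCover": ["Evergreen forest", "Semi-evergreen forest",
                        "Deciduous forest", "Regrowth forest", "Bamboo",
                        "Tree plantation"],
    },
    "Cropland": {
        "ESA": ["Cropland", "Bare/sparse vegetation"],
        "SERVIR": ["Other plantations", "Rice", "Cropland", "Rubber"],
        "KuLandCover": ["Rubber plantation", "Cropland", "Paddy field", "Cashew"],
    },
    "Wetland": {
        "ESA": ["Permanent water bodies", "Herbaceous wetland"],
        "SERVIR": ["Water", "Wetland"],
        "KuLandCover": ["Water"],
    },
    "Settlement": {
        "ESA": ["Built-up"],
        "SERVIR": ["Built-up area", "Village"],
        "KuLandCover": ["Village"],
    },
    "Other land": {
        "ESA": ["Grassland"],
        "SERVIR": ["Grass", "Others"],
        "KuLandCover": [],  # Table A3 lists no class of ours under "Other land"
    },
}

# Invert the table into {product: {original class (lower case): IPCC category}}.
LOOKUP = {}
for ipcc_class, products in TABLE_A3.items():
    for product, originals in products.items():
        for name in originals:
            LOOKUP.setdefault(product, {})[name.lower()] = ipcc_class


def to_ipcc(product: str, original_class: str) -> Optional[str]:
    """Return the IPCC category for an original class name.

    Assumption: classes not listed in Table A3 yield None instead of a guess.
    """
    return LOOKUP.get(product, {}).get(original_class.strip().lower())
```

For example, `to_ipcc("SERVIR", "Rubber")` gives `"Cropland"`. Some classes listed in Table A2, such as the ESA mangroves class, do not appear in Table A3, so the sketch returns `None` for them; how such classes were treated in the comparison is not stated in the table.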

Figure A1. Comparison of land cover between (a) KuLandCover, (b) ESA, and (c) SERVIR.

Figure A2. Percentage of shared land cover areas between classification products.

References

1. Brandon, K. Ecosystem services from tropical forests: Review of current science. Cent. Glob. Dev. Work. Pap. 2014, 7, 380.
2. Leemans, R.; De Groot, R.S. Millennium Ecosystem Assessment: Ecosystems and Human Well-Being: A Framework for Assessment; CIFOR: Bogor, Indonesia, 2003.
3. Artaxo, P.; Hansson, H.C.; Machado, L.A.T.; Rizzo, L.V. Tropical forests are crucial in regulating the climate on Earth. PLoS Clim. 2022, 1, e0000054.
4. Davis, K.F.; Koo, H.I.; Dell’Angelo, J.; D’Odorico, P.; Estes, L.; Kehoe, L.J.; Kharratzadeh, M.; Kuemmerle, T.; Machava, D.; Pais, A.d.J.R.; et al. Tropical forest loss enhanced by large-scale land acquisitions. Nat. Geosci. 2020, 13, 482–488.
5. Lamarre, G.P.A.; Fayle, T.M.; Segar, S.T.; Laird-Hopkins, B.C.; Nakamura, A.; Souto-Vilarós, D.; Watanabe, S.; Basset, Y. Chapter Eight—Monitoring tropical insects in the 21st century. In Advances in Ecological Research; Dumbrell, A.J., Turner, E.C., Fayle, T.M., Eds.; Academic Press: Cambridge, MA, USA, 2020; Volume 62, pp. 295–330.
6. Pauly, M.; Crosse, W.; Tosteson, J. High deforestation trajectories in Cambodia slowly transformed through economic land concession restrictions and strategic execution of REDD+ protected areas. Sci. Rep. 2022, 12, 17102.
7. Kobayashi, S. Landscape rehabilitation of degraded tropical forest ecosystems: Case study of the CIFOR/Japan project in Indonesia and Peru. For. Ecol. Manag. 2004, 201, 13–22.
8. Agrawal, A.; Nepstad, D.; Chhatre, A. Reducing Emissions from Deforestation and Forest Degradation. Annu. Rev. Environ. Resour. 2011, 36, 373–396.
9. Zekeng, J.C.; Sebego, R.; Mphinyane, W.N.; Mpalo, M.; Nayak, D.; Fobane, J.L.; Onana, J.M.; Funwi, F.P.; Mbolo, M.M.A. Land use and land cover changes in Doume Communal Forest in eastern Cameroon: Implications for conservation and sustainable management. Model. Earth Syst. Environ. 2019, 5, 1801–1814.
10. Wang, C.; Yu, M.; Gao, Q. Continued Reforestation and Urban Expansion in the New Century of a Tropical Island in the Caribbean. Remote Sens. 2017, 9, 731.
11. Tsai, Y.H.; Stow, D.; Chen, H.L.; Lewison, R.; An, L.; Shi, L. Mapping Vegetation and Land Use Types in Fanjingshan National Nature Reserve Using Google Earth Engine. Remote Sens. 2018, 10, 927.
12. Ferrer Velasco, R.; Lippe, M.; Tamayo, F.; Mfuni, T.; Sales-Come, R.; Mangabat, C.; Schneider, T.; Günter, S. Towards accurate mapping of forest in tropical landscapes: A comparison of datasets on how forest transition matters. Remote Sens. Environ. 2022, 274, 112997.
13. Nguyen, H.T.; Doan, T.M.; Tomppo, E.; McRoberts, R.E. Land Use/Land Cover Mapping Using Multitemporal Sentinel-2 Imagery and Four Classification Methods—A Case Study from Dak Nong, Vietnam. Remote Sens. 2020, 12, 1367.
14. Eggen, M.; Ozdogan, M.; Zaitchik, B.F.; Simane, B. Land Cover Classification in Complex and Fragmented Agricultural Landscapes of the Ethiopian Highlands. Remote Sens. 2016, 8, 1020.
15. Pizarro, S.E.; Pricope, N.G.; Vargas-Machuca, D.; Huanca, O.; Ñaupari, J. Mapping Land Cover Types for Highland Andean Ecosystems in Peru Using Google Earth Engine. Remote Sens. 2022, 14, 1562.
16. Salinas-Melgoza, M.A.; Skutsch, M.; Lovett, J.C. Predicting aboveground forest biomass with topographic variables in human-impacted tropical dry forest landscapes. Ecosphere 2018, 9, e02063.
17. Gregorutti, B.; Michel, B.; Saint-Pierre, P. Correlation and variable importance in random forests. Stat. Comput. 2017, 27, 659–678.
18. Hamad, Z.O. Review Of Feature Selection Methods Using Optimization Algorithm (Review Paper For Optimization Algorithm). Polytech. J. 2023, 12, 24.
19. Jović, A.; Brkić, K.; Bogunović, N. A review of feature selection methods with applications. In Proceedings of the 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 25–29 May 2015; pp. 1200–1205.
20. Schulz, D.; Yin, H.; Tischbein, B.; Verleysdonk, S.; Adamou, R.; Kumar, N. Land use mapping using Sentinel-1 and Sentinel-2 time series in a heterogeneous landscape in Niger, Sahel. ISPRS J. Photogramm. Remote Sens. 2021, 178, 97–111.
21. Demarchi, L.; Kania, A.; Ciężkowski, W.; Piórkowski, H.; Oświecimska-Piasko, Z.; Chormański, J. Recursive Feature Elimination and Random Forest Classification of Natura 2000 Grasslands in Lowland River Valleys of Poland Based on Airborne Hyperspectral and LiDAR Data Fusion. Remote Sens. 2020, 12, 1842.
22. Ramezan, C.A. Transferability of Recursive Feature Elimination (RFE)-Derived Feature Sets for Support Vector Machine Land Cover Classification. Remote Sens. 2022, 14, 6218.
23. Ma, Z.; Li, W.; Warner, T.A.; He, C.; Wang, X.; Zhang, Y.; Guo, C.; Cheng, T.; Zhu, Y.; Cao, W.; et al. A framework combined stacking ensemble algorithm to classify crop in complex agricultural landscape of high altitude regions with Gaofen-6 imagery and elevation data. Int. J. Appl. Earth Obs. Geoinf. 2023, 122, 103386.
24. Cánovas-García, F.; Alonso-Sarría, F. Optimal Combination of Classification Algorithms and Feature Ranking Methods for Object-Based Classification of Submeter Resolution Z/I-Imaging DMC Imagery. Remote Sens. 2015, 7, 4651–4677.
25. Georganos, S.; Grippa, T.; Vanhuysse, S.; Lennert, M.; Shimoni, M.; Kalogirou, S.; Wolff, E. Less is more: Optimizing classification performance through feature selection in a very-high-resolution remote sensing object-based urban application. GIScience Remote Sens. 2018, 55, 221–242.
26. Xu, S.; Xiao, W.; Ruan, L.; Chen, W.; Du, J. Assessment of ensemble learning for object-based land cover mapping using multi-temporal Sentinel-1/2 images. Geocarto Int. 2023, 38, 2195832.
27. Manandhar, R.; Odeh, I.O.A.; Ancev, T. Improving the Accuracy of Land Use and Land Cover Classification of Landsat Data Using Post-Classification Enhancement. Remote Sens. 2009, 1, 330–344.
28. Nguyen, T.T.H. Forestry Remote Sensing: Multi-Source Data in Natural Evergreen Forest Inventory in the Central Highlands of Vietnam; LAP LAMBERT Academic Publishing: Saarbrücken, Germany, 2011.
29. Heydari, S.S.; Mountrakis, G. Effect of classifier selection, reference sample size, reference class distribution and scene heterogeneity in per-pixel classification accuracy using 26 Landsat sites. Remote Sens. Environ. 2018, 204, 648–658.
30. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
31. Georganos, S.; Grippa, T.; Niang Gadiaga, A.; Linard, C.; Lennert, M.; Vanhuysse, S.; Mboga, N.; Wolff, E.; Kalogirou, S. Geographical random forests: A spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling. Geocarto Int. 2019, 36, 121–136.
32. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
33. Pelletier, C.; Valero, S.; Inglada, J.; Champion, N.; Dedieu, G. Assessing the robustness of Random Forests to map land cover with high resolution satellite image time series over large areas. Remote Sens. Environ. 2016, 187, 156–168.
34. Tieng, T.; Sharma, S.; MacKenzie, R.A.; Venkattappa, M.; Sasaki, N.K.; Collin, A. Mapping mangrove forest cover using Landsat-8 imagery, Sentinel-2, Very High Resolution Images and Google Earth Engine algorithm for entire Cambodia. IOP Conf. Ser. Earth Environ. Sci. 2019, 266, 012010.
35. Nguyen, H.T.T.; Doan, T.M.; Radeloff, V. Applying random forest classification to map land use/land cover using Landsat 8 OLI. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 42, 363–367.
36. Ha, T.V.; Tuohy, M.; Irwin, M.; Tuan, P.V. Monitoring and mapping rural urbanization and land use changes using Landsat data in the northeast subtropical region of Vietnam. Egypt. J. Remote Sens. Space Sci. 2020, 23, 11–19.
37. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
38. Amani, M.; Ghorbanian, A.; Ahmadi, S.A.; Kakooei, M.; Moghimi, A.; Mirmazloumi, S.M.; Moghaddam, S.H.A.; Mahdavi, S.; Ghahremanloo, M.; Parsian, S.; et al. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications: A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5326–5350.
39. Geissler, P.; Hartmann, T.; Ihlow, F.; Neang, T.; Seng, R.; Wagner, P.; Bohme, W. Herpetofauna of the Phnom Kulen. Cambodian J. Nat. Hist. 2019, 40, 780.
40. Sovann, C.; Tagesson, T.; Vestin, P.; Sakhoeun, S.; Kim, S.; Kok, S.; Olin, S. Characteristics of ecosystems under various anthropogenic impacts in a tropical forest region of Southeast Asia. EGUsphere 2025, 2025, 3784.
41. Somaly, O.; Sasaki, N.; Kimchhin, S.; Tsusaka, T.W.; Shrestha, S.; Malyne, N. Impact of Forest Cover Change in Phnom Kulen National Park on Downstream Local Livelihoods along Siem Reap River, Cambodia. Int. J. Environ. Rural Dev. 2020, 11, 93–99.
42. Hang, P.; Ishwaran, N.; Hong, T.; Delanghe, P. From Conservation to Sustainable Development—A Case Study of Angkor World Heritage Site, Cambodia. J. Environ. Sci. Eng. A 2016, 5, 141–155.
43. Provincial Department of Planning Siem Reap. Commune Socio-Economic Situation; Provincial Department of Planning Siem Reap: Siem Reap, Cambodia, 2024.
44. Magliocca, N.R.; Khuc, Q.V.; de Bremond, A.; Ellicott, E.A. Direct and indirect land-use change caused by large-scale land acquisitions in Cambodia. Environ. Res. Lett. 2020, 15, 024010.
45. Motzke, I.; Wanger, T.C.; Zanre, E.; Tscharntke, T.; Barkmann, J. Socio-economic context of forest biodiversity use along a town-forest gradient in Cambodia. Raffles Bull. Zool. 2012, 30, 37–53.
46. Chim, K.; Tunnicliffe, J.; Shamseldin, A.; Ota, T. Land Use Change Detection and Prediction in Upper Siem Reap River, Cambodia. Hydrol.-Basel 2019, 6, 64.
47. Singh, M.; Evans, D.; Chevance, J.-B.; Tan, B.S.; Wiggins, N.; Kong, L.; Sakhoeun, S. Evaluating remote sensing datasets and machine learning algorithms for mapping plantations and successional forests in Phnom Kulen National Park of Cambodia. PeerJ 2019, 7, e7841.
48. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services. Remote Sens. Environ. 2012, 120, 25–36.
49. Gorroño, J.; Banks, A.C.; Fox, N.P.; Underwood, C. Radiometric inter-sensor cross-calibration uncertainty using a traceable high accuracy reference hyperspectral imager. ISPRS J. Photogramm. Remote Sens. 2017, 130, 393–417.
50. Müller-Wilm, U.; Devignot, O.; Pessiot, L. S2 MPC Sen2Cor Configuration and User Manual; EASE: San Francisco, CA, USA, 2017.
51. Hosseiny, B.; Abdi, A.M.; Jamali, S. Urban land use and land cover classification with interpretable machine learning—A case study using Sentinel-2 and auxiliary data. Remote Sens. Appl. Soc. Environ. 2022, 28, 100843.
52. Pour, A.B.; Ranjbar, H.; Sekandari, M.; Abd El-Wahed, M.; Hossain, M.S.; Hashim, M.; Yousefi, M.; Zoheir, B.; Wambo, J.D.T.; Muslim, A.M. 2-Remote sensing for mineral exploration. In Geospatial Analysis Applied to Mineral Exploration; Pour, A.B., Parsa, M., Eldosouky, A.M., Eds.; Elsevier: Amsterdam, The Netherlands, 2023; pp. 17–149.
53. Shi, T.; Xu, H. Derivation of Tasseled Cap Transformation Coefficients for Sentinel-2 MSI At-Sensor Reflectance Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4038–4048.
54. Powell, S.L.; Cohen, W.B.; Healey, S.P.; Kennedy, R.E.; Moisen, G.G.; Pierce, K.B.; Ohmann, J.L. Quantification of live aboveground forest biomass dynamics with Landsat time-series and field inventory data: A comparison of empirical modeling approaches. Remote Sens. Environ. 2010, 114, 1053–1068.
55. Saah, D.; Tenneson, K.; Poortinga, A.; Nguyen, Q.; Chishtie, F.; Aung, K.S.; Markert, K.N.; Clinton, N.; Anderson, E.R.; Cutter, P.; et al. Primitives as building blocks for constructing land cover maps. Int. J. Appl. Earth Obs. Geoinf. 2020, 85, 101979.
56. Xu, H. Modification of normalised difference water index (NDWI) to enhance open water features in remotely sensed imagery. Int. J. Remote Sens. 2006, 27, 3025–3033.
57. McFeeters, S.K. The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. Int. J. Remote Sens. 1996, 17, 1425–1432.
58. Gao, B.-C. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 1996, 58, 257–266.
59. García, M.J.L.; Caselles, V. Mapping burns and natural reforestation using thematic Mapper data. Geocarto Int. 1991, 6, 31–37.
60. Van Deventer, A.P.; Ward, A.D.; Gowda, P.M.; Lyon, J.G. Using thematic mapper data to identify contrasting soil plains and tillage practices. Photogramm. Eng. Remote Sens. 1997, 63, 87–93.
61. Rouse, J.W.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring Vegetation Systems in the Great Plains with ERTS; NASA: Washington, DC, USA, 1973.
62. Jiang, Z.; Huete, A.R.; Didan, K.; Miura, T. Development of a two-band enhanced vegetation index without a blue band. Remote Sens. Environ. 2008, 112, 3833–3845.
63. Huang, X.; Xiao, J.; Ma, M. Evaluating the Performance of Satellite-Derived Vegetation Indices for Estimating Gross Primary Productivity Using FLUXNET Observations across the Globe. Remote Sens. 2019, 11, 1823.
64. Gitelson, A.A.; Kaufman, Y.J.; Merzlyak, M.N. Use of a green channel in remote sensing of global vegetation from EOS-MODIS. Remote Sens. Environ. 1996, 58, 289–298.
65. Huete, A.R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
66. Barnes, E.M.; Clarke, T.R.; Richards, S.E.; Colaizzi, P.D.; Haberland, J.; Kostrzewski, M.; Waller, P.; Choi, C.; Riley, E.; Thompson, T. Coincident detection of crop water stress, nitrogen status and canopy density using ground based multispectral data. In Proceedings of the Fifth International Conference on Precision Agriculture, Bloomington, MN, USA, 16 July 2000.
67. Daughtry, C.S.T.; Walthall, C.L.; Kim, M.S.; de Colstoun, E.B.; McMurtrey, J.E. Estimating Corn Leaf Chlorophyll Concentration from Leaf and Canopy Reflectance. Remote Sens. Environ. 2000, 74, 229–239.
68. Zhang, H.; Li, J.; Liu, Q.; Lin, S.; Huete, A.; Liu, L.; Croft, H.; Clevers, J.G.P.W.; Zeng, Y.; Wang, X.; et al. A novel red-edge spectral index for retrieving the leaf chlorophyll content. Methods Ecol. Evol. 2022, 13, 2771–2787.
69. He, C.; Shi, P.; Xie, D.; Zhao, Y. Improving the normalized difference built-up index to map urban built-up areas using a semiautomatic segmentation approach. Remote Sens. Lett. 2010, 1, 213–221.
70. Farr, T.G.; Rosen, P.A.; Caro, E.; Crippen, R.; Duren, R.; Hensley, S.; Kobrick, M.; Paller, M.; Rodriguez, E.; Roth, L.; et al. The Shuttle Radar Topography Mission. Rev. Geophys. 2007, 45, 2.
71. Ministry of Environment (Cambodia). Cambodia Forest Cover 2018; Ministry of Environment (Cambodia): Phnom Penh, Cambodia, 2020.
72. Planet Team. Planet Application Program Interface: In Space for Life on Earth; NASA: San Francisco, CA, USA, 2017.
73. Pedergnana, M.; Marpu, P.R.; Mura, M.D.; Benediktsson, J.A.; Bruzzone, L. Classification of Remote Sensing Optical and LiDAR Data Using Extended Attribute Profiles. IEEE J. Sel. Top. Signal Process. 2012, 6, 856–865.
74. Lu, D.; Weng, Q. A survey of image classification methods and techniques for improving classification performance. Int. J. Remote Sens. 2007, 28, 823–870.
75. Zhang, F.; Yang, X. Improving land cover classification in an urbanized coastal area by random forests: The role of variable selection. Remote Sens. Environ. 2020, 251, 112105.
76. Vizzari, M.; Lesti, G.; Acharki, S. Crop classification in Google Earth Engine: Leveraging Sentinel-1, Sentinel-2, European CAP data, and object-based machine-learning approaches. Geo-Spat. Inf. Sci. 2024, 27, 1–16.
77. Badda, H.; Cherif, E.K.; Boulaassal, H.; Wahbi, M.; Yazidi Alaoui, O.; Maatouk, M.; Bernardino, A.; Coren, F.; El Kharki, O. Improving the Accuracy of Random Forest Classifier for Identifying Burned Areas in the Tangier-Tetouan-Al Hoceima Region Using Google Earth Engine. Remote Sens. 2023, 15, 4226.
78. Behnamian, A.; Millard, K.; Banks, S.N.; White, L.; Richardson, M.; Pasher, J. A Systematic Approach for Variable Selection With Random Forests: Achieving Stable Variable Importance Values. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1988–1992.
79. Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Softw. 2008, 28, 1–26.
80. Shih, H.-C.; Stow, D.A.; Tsai, Y.H. Guidance on and comparison of machine learning classifiers for Landsat-based land cover and land use mapping. Int. J. Remote Sens. 2019, 40, 1248–1274.
81. Sun, J.; Ongsomwang, S. Optimal parameters of random forest for land cover classification with suitable data type and dataset on Google Earth Engine. Front. Earth Sci. 2023, 11, 1188093.
82. Congalton, R.G.; Green, K. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices; CRC Press: Boca Raton, FL, USA, 2019.
83. Foody, G.M. Explaining the unsuitability of the kappa coefficient in the assessment and comparison of the accuracy of thematic maps obtained by image classification. Remote Sens. Environ. 2020, 239, 111630.
84. Olofsson, P.; Foody, G.M.; Herold, M.; Stehman, S.V.; Woodcock, C.E.; Wulder, M.A. Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 2014, 148, 42–57.
85. Congalton, R.G.; Oderwald, R.G.; Mead, R.A. Assessing Landsat classification accuracy using discrete multivariate analysis statistical techniques. Photogramm. Eng. Remote Sens. 1983, 49, 1671–1678.
86. Anderson, J.R.; Hardy, E.E.; Roach, J.T.; Witmer, R.E. A Land Use and Land Cover Classification System for Use with Remote Sensor Data; U.S. Government Printing Office: Washington, DC, USA, 1976; p. 964.
87. Foody, G.M. Status of land cover classification accuracy assessment. Remote Sens. Environ. 2002, 80, 185–201.
88. Xia, T.; He, Z.; Cai, Z.; Wang, C.; Wang, W.; Wang, J.; Hu, Q.; Song, Q. Exploring the potential of Chinese GF-6 images for crop mapping in regions with complex agricultural landscapes. Int. J. Appl. Earth Obs. Geoinf. 2022, 107, 102702.
89. SERVIR-SEA. Cambodia Biophysical Monitoring and Evaluation Dashboard. Available online: https://servir.adpc.net/tools/biophysical-me-dashboard (accessed on 8 November 2024).
90. Zanaga, D.; Van De Kerchove, R.; De Keersmaecker, W.; Souverijns, N.; Brockmann, C.; Quast, R.; Wevers, J.; Grosu, A.; Paccini, A.; Vergnaud, S.; et al. ESA WorldCover 10 m 2020 v100; Zenodo: Genève, Switzerland, 2021.
91. IPCC. 2006 IPCC Guidelines for National Greenhouse Gas Inventories; Eggleston, H.S., Buendia, L., Miwa, K., Ngara, T., Tanabe, K., Eds.; IGES: Kanagawa, Japan, 2006.
92. Jin, Z.; Azzari, G.; You, C.; Di Tommaso, S.; Aston, S.; Burke, M.; Lobell, D.B. Smallholder maize area and yield mapping at national scales with Google Earth Engine. Remote Sens. Environ. 2019, 228, 115–128.
93. Kasraei, B.; Schmidt, M.G.; Zhang, J.; Bulmer, C.E.; Filatow, D.S.; Arbor, A.; Pennell, T.; Heung, B. A framework for optimizing environmental covariates to support model interpretability in digital soil mapping. Geoderma 2024, 445, 116873.
94. Liu, H.; An, H. Preliminary tests on the performance of MLC-RFE and SVM-RFE in Lansat-8 image classification. Arab. J. Geosci. 2020, 13, 130.
95. Wei, X.; Zhang, W.; Zhang, Z.; Huang, H.; Meng, L. Urban land use land cover classification based on GF-6 satellite imagery and multi-feature optimization. Geocarto Int. 2023, 38, 2236579.
96. Xing, H.; Niu, J.; Feng, Y.; Hou, D.; Wang, Y.; Wang, Z. A coastal wetlands mapping approach of Yellow River Delta with a hierarchical classification and optimal feature selection framework. CATENA 2023, 223, 106897.
97. Qian, H.; Bao, N.; Meng, D.; Zhou, B.; Lei, H.; Li, H. Mapping and classification of Liao River Delta coastal wetland based on time series and multi-source GaoFen images using stacking ensemble model. Ecol. Inform. 2024, 80, 102488.
98. Grabska, E.; Frantz, D.; Ostapowicz, K. Evaluation of machine learning algorithms for forest stand species mapping using Sentinel-2 imagery and environmental data in the Polish Carpathians. Remote Sens. Environ. 2020, 251, 112103.
99. Phan, T.N.; Kuch, V.; Lehnert, L.W. Land Cover Classification using Google Earth Engine and Random Forest Classifier—The Role of Image Composition. Remote Sens. 2020, 12, 2411.
100. Wang, H.; Liu, C.; Zang, F.; Yang, J.; Li, N.; Rong, Z.; Zhao, C. Impacts of Topography on the Land Cover Classification in the Qilian Mountains, Northwest China. Can. J. Remote Sens. 2020, 46, 344–359.
101. Cao, W.; Sofia, G.; Tarolli, P. Geomorphometric characterisation of natural and anthropogenic land covers. Prog. Earth Planet. Sci. 2020, 7, 2.
102. Sovann, C.; Polya, D.A. Improved groundwater geogenic arsenic hazard map for Cambodia. Environ. Chem. 2014, 11, 595–607.
103. ICEM. Cambodia National Report on Protected Areas and Development. In Review of Protected Areas and Development in the Lower Mekong River Region; Indooroopilly: Queensland, Australia, 2003; p. 148.
104. Zheng, B.; Campbell, J.; Serbin, G.; Daughtry, C. Multitemporal remote sensing of crop residue cover and tillage practices: A validation of the minNDTI strategy in the United States. J. Soil Water Conserv. 2013, 68, 120–131.
105. Sonmez, N.K.; Slater, B. Measuring Intensity of Tillage and Plant Residue Cover Using Remote Sensing. Eur. J. Remote Sens. 2016, 49, 121–135.
106. Qin, Q.; Xu, D.; Hou, L.; Shen, B.; Xin, X. Comparing vegetation indices from Sentinel-2 and Landsat 8 under different vegetation gradients based on a controlled grazing experiment. Ecol. Indic. 2021, 133, 108363.
107. Dai, J.; Roberts, D.; Dennison, P.; Stow, D. Spectral-radiometric differentiation of non-photosynthetic vegetation and soil within Landsat and Sentinel 2 wavebands. Remote Sens. Lett. 2018, 9, 733–742.
108. Forkuor, G.; Dimobe, K.; Serme, I.; Tondoh, J.E. Landsat-8 vs. Sentinel-2: Examining the added value of sentinel-2’s red-edge bands to land-use and land-cover mapping in Burkina Faso. GIScience Remote Sens. 2018, 55, 331–354.
109. Amani, M.; Salehi, B.; Mahdavi, S.; Brisco, B. Spectral analysis of wetlands using multi-source optical satellite imagery. ISPRS J. Photogramm. Remote Sens. 2018, 144, 119–136.
110. Persson, M.; Lindberg, E.; Reese, H. Tree Species Classification with Multi-Temporal Sentinel-2 Data. Remote Sens. 2018, 10, 1794.
111. Zhai, Y.; Roy, D.P.; Martins, V.S.; Zhang, H.K.; Yan, L.; Li, Z. Conterminous United States Landsat-8 top of atmosphere and surface reflectance tasseled cap transformation coefficients. Remote Sens. Environ. 2022, 274, 112992.
112. Cohen, W.B.; Spies, T.A.; Fiorella, M. Estimating the age and structure of forests in a multi-ownership landscape of western Oregon, USA. Int. J. Remote Sens. 1995, 16, 721–746.
113. Duane, M.V.; Cohen, W.B.; Campbell, J.L.; Hudiburg, T.; Turner, D.P.; Weyermann, D.L. Implications of alternative field-sampling designs on Landsat-based mapping of stand age and carbon stocks in Oregon forests. For. Sci. 2010, 56, 405–416.
114. Pflugmacher, D.; Cohen, W.B.; Kennedy, R.E. Using Landsat-derived disturbance history (1972–2010) to predict current forest structure. Remote Sens. Environ. 2012, 122, 146–165.
115. Allen, H.; Simonson, W.; Parham, E.; Santos, E.d.B.e.; Hotham, P. Satellite remote sensing of land cover change in a mixed agro-silvo-pastoral landscape in the Alentejo, Portugal. Int. J. Remote Sens. 2018, 39, 4663–4683.
116. Singh, K.V.; Setia, R.; Sahoo, S.; Prasad, A.; Pateriya, B. Evaluation of NDWI and MNDWI for assessment of waterlogging by integrating digital elevation model and groundwater level. Geocarto Int. 2015, 30, 650–661.
117. Du, Y.; Zhang, Y.; Ling, F.; Wang, Q.; Li, W.; Li, X. Water Bodies’ Mapping from Sentinel-2 Imagery with Modified Normalized Difference Water Index at 10-m Spatial Resolution Produced by Sharpening the SWIR Band. Remote Sens. 2016, 8, 354.
118. Gao, S.; Yan, K.; Liu, J.; Pu, J.; Zou, D.; Qi, J.; Mu, X.; Yan, G. Assessment of remote-sensed vegetation indices for estimating forest chlorophyll concentration. Ecol. Indic. 2024, 162, 112001.
119. Xue, J.; Su, B. Significant Remote Sensing Vegetation Indices: A Review of Developments and Applications. J. Sens. 2017, 2017, 1353691.
120. Pasternak, M.; Pawluszek-Filipiak, K. The Evaluation of Spectral Vegetation Indexes and Redundancy Reduction on the Accuracy of Crop Type Detection. Appl. Sci. 2022, 12, 5067.
121. Yoder, B.J.; Waring, R.H. The normalized difference vegetation index of small Douglas-fir canopies with varying chlorophyll concentrations. Remote Sens. Environ. 1994, 49, 81–91.
122. Xie, Z.; Chen, Y.; Lu, D.; Li, G.; Chen, E. Classification of Land Cover, Forest, and Tree Species Classes with ZiYuan-3 Multispectral and Stereo Data. Remote Sens. 2019, 11, 164.
123. Macintyre, P.; van Niekerk, A.; Mucina, L. Efficacy of multi-season Sentinel-2 imagery for compositional vegetation classification. Int. J. Appl. Earth Obs. Geoinf. 2020, 85, 101980.
124. Lira Melo de Oliveira Santos, C.; Augusto Camargo Lamparelli, R.; Kelly Dantas Araújo Figueiredo, G.; Dupuy, S.; Boury, J.; Luciano, A.C.d.S.; Torres, R.d.S.; le Maire, G. Classification of Crops, Pastures, and Tree Plantations along the Season with Multi-Sensor Image Time Series in a Subtropical Agricultural Region. Remote Sens. 2019, 11, 334.
125. Duarte, D.; Fonte, C.; Costa, H.; Caetano, M. Thematic Comparison between ESA WorldCover 2020 Land Cover Product and a National Land Use Land Cover Map. Land 2023, 12, 490.
126. Fonte, C.C.; Duarte, D.; Jesus, I.; Costa, H.; Benevides, P.; Moreira, F.; Caetano, M. Accuracy Assessment and Comparison of National, European and Global Land Use Land Cover Maps at the National Scale—Case Study: Portugal. Remote Sens. 2024, 16, 1504.
127. Van De Kerchove, R.; Zanaga, D.; Xu, P.; Tsendbazar, N.; Lesiv, M. Product User Manual V 2.0; ESA: Paris, France, 2022.
128. Tsendbazar, N.; Li, L.; Koopman, M.; Carter, S.; Herold, M.; Georgieva, I.; Lesiv, M. Product Validation Report (D12-PVR) v 1.1; ESA: Paris, France, 2021.
129. Teck, V.; Poortinga, A.; Riano, C.; Dahal, K.; Legaspi, R.M.B.; Ann, V.; Chea, R. Land use and land cover change implications on agriculture and natural resource management of Koah Nheaek, Mondulkiri province, Cambodia. Remote Sens. Appl. Soc. Environ. 2023, 29, 100895.
Chea, M.; Fraser, B.T.; Nay, S.; Sok, L.; Strasser, H.; Tizard, R. A Survey of Changes in Grasslands within the Tonle Sap Lake Landscape from 2004 to 2023. Diversity 2024, 16, 448. [Google Scholar] [CrossRef]
Venter, Z.S.; Barton, D.N.; Chakraborty, T.; Simensen, T.; Singh, G. Global 10 m Land Use Land Cover Datasets: A Comparison of Dynamic World, World Cover and Esri Land Cover. Remote Sens. 2022, 14, 4101. [Google Scholar] [CrossRef]
Zhai, J.; Xiao, C.; Feng, Z.; Liu, Y. Are there suitable global datasets for monitoring of land use and land cover in the tropics? Evidences from mainland Southeast Asia. Glob. Planet. Change 2023, 229, 104233. [Google Scholar] [CrossRef]
Pereira, S.C.; Lopes, C.; Pedro Pedroso, J. Mapping Cashew Orchards in Cantanhez National Park (Guinea-Bissau). Remote Sens. Appl. Soc. Environ. 2022, 26, 100746. [Google Scholar] [CrossRef]
Chaya, V.; Poortinga, A.; Nimol, K.; Sokleap, S.; Sophorn, M.; Chhin, P.; McMahon, A.; Nicolau, A.P.; Tenneson, K.; Saah, D. Is Cambodia the World’s Largest Cashew Producer? arXiv 2024, arXiv:2405.16926. [Google Scholar]
Keo, C.; Ngin, H.; Michael, B.; Sathya, S.; Ky, B. Cambodian Cashew Nut Value Chain Assessment Report; HEKS/EPER Cambodia: Phnom Penh, Cambodia, 2019. [Google Scholar]

Figure 1. Map of Phnom Kulen National Park (Kulen, green polygon), its 10 km buffer zone (black-dashed polygon), highlighted villages, and stream system as the upstream source for the Angkor Wat region.

Figure 2. Flowchart of the methodology applied for the land cover classification. The Random Forest models included: (1) annual spectral bands only (hereafter referred to as “Spectral”), (2) annual spectral bands combined with spectral indices (Spectral+SI), (3) Spectral+SI combined with bi-seasonal differences (Spectral+SI+Diff), (4) Spectral+SI+Diff combined with topographic variables (Spectral+SI+Diff+Topo), (5) RFE-selected variables (RFEvar), and (6) RFEvar with hyperparameter-tuned Random Forest (RFEvar-Hyper). All models except RFEvar-Hyper used a 300-tree Random Forest; RFEvar-Hyper used the optimal ntree and mtry hyperparameters identified by grid search.

Figure 3. Variable importance rankings of all 65 variables.

Figure 4. Pearson correlation matrix of the 25 variables retained after applying correlation-based filtering at a 0.90 threshold.

Figure 5. Change in accuracy (mean ± SD) with the number of selected variables included (a), and the key variables identified for land cover mapping (b).

Figure 6. Comparison of overall accuracy and kappa among different methods. “Spectral”: Random Forest model with annual spectral bands (10 variables, 300-tree Random Forest); “Spectral+SI”: RF model with annual spectral bands and indices (31 variables, 300-tree RF); “Spectral+SI+Diff”: RF model with spectral bands, indices, and bi-seasonal differences (62 variables, 300-tree RF); “Spectral+SI+Diff+Topo”: RF model with Spectral+SI+Diff and topographic variables (65 variables, 300-tree RF); “RFEvar”: RF model with 19 optimized variables from Spectral+SI+Diff+Topo (300-tree Random Forest); “RFEvar-Hyper”: hyperparameter tuning of RFEvar. Statistical significance between groups is indicated as follows: ns (not significant, one-sided p-value > 0.05), *** (one-sided p-value < 0.001).

Figure 7. Comparison of user accuracy (a), producer accuracy (b), and F1 score (c) among different methods.

Figure 8. Comparison of land cover maps of (a) KuLandCover, (b) the European Space Agency WorldCover 2020 (ESA), (c) the SERVIR-SEA Cambodia National Land Cover 2021 (SERVIR), (d) zoomed view of the purple outline, (e) zoomed view of the white outline, and (f) zoomed view of the blue outline.

Figure 9. Land cover map (a) and land cover composition (b–d) of Kulen with surrounding area in 2021.
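
The accuracy measures compared in Figures 6 and 7 (overall accuracy, kappa, user accuracy, producer accuracy, and F1 score) are all derived from a confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' own code; the function name `accuracy_metrics` and the convention that rows hold reference classes and columns hold mapped classes are assumptions made for illustration.

```python
import math


def accuracy_metrics(cm):
    """Compute accuracy measures from a square confusion matrix.

    Assumed convention (hypothetical, not taken from the article):
    cm[i][j] = number of samples of reference class i mapped as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[k][k] for k in range(n)]
    ref_totals = [sum(cm[k]) for k in range(n)]  # row sums
    map_totals = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # column sums

    overall = sum(diag) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(ref_totals, map_totals)) / total**2
    kappa = (overall - expected) / (1 - expected) if expected != 1 else math.nan

    producer, user, f1 = [], [], []
    for k in range(n):
        # Producer accuracy: correct samples / reference samples of the class.
        pa = diag[k] / ref_totals[k] if ref_totals[k] else math.nan
        # User accuracy: correct samples / samples mapped as the class.
        ua = diag[k] / map_totals[k] if map_totals[k] else math.nan
        # F1: harmonic mean of user and producer accuracy.
        score = 2 * ua * pa / (ua + pa) if ua + pa > 0 else math.nan
        producer.append(pa)
        user.append(ua)
        f1.append(score)

    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        "producer_accuracy": producer,
        "user_accuracy": user,
        "f1": f1,
    }


# Illustrative two-class matrix with made-up counts (not data from the study).
example = accuracy_metrics([[8, 2], [1, 9]])
print(example)
```

With 12 land cover classes, the same function would take a 12 × 12 matrix and return one user accuracy, producer accuracy, and F1 value per class.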

Table 1. Remote sensing indices used for the land cover classification.

Index ID	Index Name	Bands Used	Formula	Application	Reference
Water-related indices
MNDWI	Modified Normalized Difference Water Index	Green, SWIR1	$\frac{\mathrm{Green} - \mathrm{SWIR1}}{\mathrm{Green} + \mathrm{SWIR1}}$	Improving water variable visibility while reducing noise from built-up land, vegetation, and soil.	[56]
NDWI	Normalized Difference Water Index	Green, NIR	$\frac{\mathrm{Green} - \mathrm{NIR}}{\mathrm{Green} + \mathrm{NIR}}$	Detecting surface water bodies and moisture content variations in landscapes.	[57]
NDMI	Normalized Difference Moisture Index	NIR, SWIR1	$\frac{\mathrm{NIR} - \mathrm{SWIR1}}{\mathrm{NIR} + \mathrm{SWIR1}}$	Assessing vegetation and soil moisture contents.	[58]
Disturbance indices
NBR	Normalized Burn Ratio	NIR, SWIR2	$\frac{\mathrm{NIR} - \mathrm{SWIR2}}{\mathrm{NIR} + \mathrm{SWIR2}}$	Assessing forest fire severity and natural reforestation.	[59]
NDTI	Normalized Difference Tillage Index	SWIR1, SWIR2	$\frac{\mathrm{SWIR1} - \mathrm{SWIR2}}{\mathrm{SWIR1} + \mathrm{SWIR2}}$	Distinguishing non-photosynthetic vegetation biomass from green vegetation biomass; assessing tillage intensity, soil disturbance, and agricultural land management practices.	[60]
Vegetation-related indices
NDVI	Normalized Difference Vegetation Index	Red, NIR	$\frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$	Commonly used for vegetation density, health, and greenness.	[61]
EVI2	2-band Enhanced Vegetation Index	Red, NIR	$2.5 \times \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + 2.4 \times \mathrm{Red} + 1}$	Enhancing vegetation health and dynamics monitoring through its sensitivity to dense vegetation and strong correlation with forest ecosystem gross primary production.	[62,63]
GNDVI	Green Normalized Difference Vegetation Index	Green, NIR	$\frac{\mathrm{NIR} - \mathrm{Green}}{\mathrm{NIR} + \mathrm{Green}}$	Estimating photosynthetic activity and determining water and nitrogen uptake into the plant canopy.	[64]
SAVI	Soil Adjusted Vegetation Index	Red, NIR	$1.5 \times \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red} + 0.5}$	Compensating for soil brightness to improve vegetation indices’ accuracy.	[65]
Chlorophyll indices
NDRE	Normalized Difference Red Edge Index	RE1, NIR	$\frac{\mathrm{NIR} - \mathrm{RE1}}{\mathrm{NIR} + \mathrm{RE1}}$	Assessing plant chlorophyll content using red-edge spectral regions, especially in the mid-to-late growing season, when plants are mature and ready to be harvested.	[66]
MCARI	Modified Chlorophyll Absorption Ratio Index	Red, RE1, RE2	$[(\mathrm{RE2} - \mathrm{RE1}) - 0.2 \times (\mathrm{RE2} - \mathrm{Red})] \times \frac{\mathrm{RE2}}{\mathrm{RE1}}$	Quantifying leaf chlorophyll concentration while minimizing the background reflectance from soil and other non-photosynthetic materials.	[67,68]
Build-up index
BUI	Build-Up Index	Red, NIR, SWIR1	$\frac{\mathrm{SWIR1} - \mathrm{NIR}}{\mathrm{SWIR1} + \mathrm{NIR}} - \mathrm{NDVI}$	Distinguishing urban from non-urban land cover.	[69]
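
The formulas in Table 1 translate directly into code. The following is a minimal per-pixel sketch, not the authors' processing chain; the function names and argument names are hypothetical, and the inputs are assumed to be non-negative surface reflectance values on a 0 to 1 scale (the additive constants in EVI2 and SAVI presuppose reflectance units rather than raw digital numbers).

```python
import math


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else math.nan


def spectral_indices(green, red, re1, re2, nir, swir1, swir2):
    """Compute the 12 indices of Table 1 for one pixel.

    Assumption (not from the article): all inputs are non-negative
    surface reflectance values scaled to the 0-1 range.
    """
    ndvi = normalized_difference(nir, red)
    # MCARI divides by RE1, so guard against a zero red-edge reflectance.
    if re1 != 0:
        mcari = ((re2 - re1) - 0.2 * (re2 - red)) * (re2 / re1)
    else:
        mcari = math.nan
    return {
        # Water-related indices
        "MNDWI": normalized_difference(green, swir1),
        "NDWI": normalized_difference(green, nir),
        "NDMI": normalized_difference(nir, swir1),
        # Disturbance indices
        "NBR": normalized_difference(nir, swir2),
        "NDTI": normalized_difference(swir1, swir2),
        # Vegetation-related indices
        "NDVI": ndvi,
        "EVI2": 2.5 * (nir - red) / (nir + 2.4 * red + 1),
        "GNDVI": normalized_difference(nir, green),
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),
        # Chlorophyll indices
        "NDRE": normalized_difference(nir, re1),
        "MCARI": mcari,
        # Build-up index
        "BUI": normalized_difference(swir1, nir) - ndvi,
    }


# Illustrative reflectances for a vegetated pixel (made-up values).
pixel = spectral_indices(
    green=0.05, red=0.04, re1=0.10, re2=0.30, nir=0.40, swir1=0.20, swir2=0.10
)
for name, value in pixel.items():
    print(f"{name}: {value:.4f}")
```

Nine of the twelve indices share the same normalized-difference form, which is why a single helper covers most of the table; only EVI2, SAVI, and MCARI need their own expressions.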

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
