Article

Ensemble Learning for the Land Cover Classification of Loess Hills in the Eastern Qinghai–Tibet Plateau Using GF-7 Multitemporal Imagery

1 School of Geographical Science, Qinghai Normal University, Xining 810008, China
2 Institute of Qinghai Meteorological Science Research, Xining 810008, China
3 Ministry of Education Key Laboratory of Tibetan Plateau Land Surface Processes and Ecological Conservation, Xining 810008, China
4 Qinghai Province Key Laboratory of Physical Geography and Environmental Process, Xining 810008, China
5 Qinghai Province Key Laboratory of Disaster Prevention and Mitigation, Xining 810008, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(14), 2556; https://doi.org/10.3390/rs16142556
Submission received: 14 June 2024 / Revised: 9 July 2024 / Accepted: 10 July 2024 / Published: 12 July 2024
(This article belongs to the Special Issue Monitoring Cold-Region Water Cycles Using Remote Sensing Big Data)

Abstract

The unique geographic environment, diverse ecosystems, and complex landforms of the Qinghai–Tibet Plateau make accurate land cover classification a significant challenge in plateau earth sciences. Given advancements in machine learning and satellite remote sensing technology, this study investigates whether emerging ensemble learning classifiers and submeter-level stereoscopic images can significantly improve land cover classification accuracy in the complex terrain of the Qinghai–Tibet Plateau. This study utilizes multitemporal submeter-level GF-7 stereoscopic images to evaluate the accuracy of 11 typical ensemble learning classifiers (representing bagging, boosting, stacking, and voting strategies) and 3 classification datasets (single-temporal, multitemporal, and feature-optimized datasets) for land cover classification in the loess hilly area of the Eastern Qinghai–Tibet Plateau. The results indicate that compared to traditional single strong classifiers (such as CART, SVM, and MLPC), ensemble learning classifiers can improve land cover classification accuracy by 5% to 9%. The classification accuracy differences among the 11 ensemble learning classifiers are generally within 1% to 3%; HistGBoost, LightGBM, and AdaBoost-DT achieve a classification accuracy comparable to that of CNNs, with the highest overall classification accuracy (OA) exceeding 93.3%. All ensemble learning classifiers achieved better classification accuracy using multitemporal datasets, with the classification accuracy differences among the three classification datasets generally within 1% to 3%. Feature selection and feature importance evaluation show that spectral bands (e.g., the summer near-infrared (NIR-S) band), topographic factors (e.g., the digital elevation model (DEM)), and spectral indices (e.g., the summer ratio resident-area index (RRI-S)) contribute significantly to the accuracy of each ensemble learning classifier. Using feature-optimized datasets, ensemble classifiers can improve classification efficiency. This study preliminarily confirms that GF-7 images are suitable for land cover classification in complex terrains and that using ensemble learning classifiers and multitemporal datasets can improve classification accuracy.

Graphical Abstract

1. Introduction

Land use and land cover (LULC) data play a vital role in climate change, hydrological modeling, ecosystem studies, and land management policy formulation [1,2]. Accurate estimates of land use/land cover change (LUCC) help scientists and decision makers to better understand and address environmental challenges and achieve a balance between socioeconomic development and ecological protection [3]. As the highest plateau in the world, the Qinghai–Tibet Plateau has a unique ecosystem, a fragile ecological environment, and abundant water resources. For the past half-century, under the influence of climate change, human activities, and the “China Western Development” policy, significant changes in land use modes and land cover patterns have occurred. Given the important position of the Qinghai–Tibet Plateau in terms of its global climate and ecosystem, the study of Qinghai–Tibet Plateau LUCC is of great significance for ecological and environmental protection, climate change research, and sustainable development in the region and globally [4,5,6,7].
The Huangshui River Basin is part of the transition zone between the Qinghai–Tibet Plateau and the Loess Plateau. It has diverse landforms, including middle–high mountains, loess hills, and river valley plains. The loess hill areas are affected by concentrated precipitation, fragmented topography, numerous gullies, and localized microclimates. Vegetation cover is highly heterogeneous and crop species are diverse, which is reflected in the common phenomena of “different objects with the same spectrum” and “the same object with different spectra” among ground features in remote sensing images. Therefore, obtaining accurate land cover data in the Huangshui River Basin is challenging [8,9].
Since 2010, researchers have continued to study LULC classification methods in the Huangshui River Basin. Li [10] first used 2011 Landsat Thematic Mapper (TM) images and three classifiers to classify land cover in the basin. The results showed that the decision tree (DT) method achieved the highest accuracy, with an overall classification accuracy of 84.93%. Thereafter, Jia [11] introduced the object-oriented classification method to better address the fragmentation of the classification map and increased the overall classification accuracy to 86.59%. In recent years, with the continuous optimization and development of machine learning models, novel ensemble learning methods have gradually been introduced for remote sensing land cover classification. For example, based on Landsat 8 Operational Land Imager (OLI) imagery, Gu [12], Ma [13], and Shen [14] confirmed that the classification accuracy of the random forest (RF) was superior to that of the three-layer back propagation neural network (BPNN), DT, and support vector machine (SVM). Li et al. [15] used gradient boosting decision trees (GBDTs) and the RF to conduct land cover classification; the classification accuracy of the GBDTs was better than that of the RF. In summary, previous classification studies in the Huangshui River Basin mainly used Landsat series satellite data and attempted to find a better classifier for complex terrain areas to improve classification accuracy. Given the spatial resolution limitation of 15–30 m, it is very difficult to significantly improve the land cover classification accuracy of complex terrain areas using the classifier alone.
The successful launch of new high-resolution remote sensing satellites, including the European Space Agency (ESA) Sentinel series and the Chinese Gaofen (GF) series, has created new data opportunities to improve the classification accuracy of land cover remote sensing. For example, Cui and Li et al. [16] employed Gaofen-5 HSI data and the innovative Superpixel-based and Spatially regularized Diffusion Learning (S2DL) algorithm for unsupervised mangrove species mapping in the Mai Po Nature Reserve, Hong Kong. Maung et al. [17] utilized Sentinel-2 (10 m) imagery and employed the U-Net model for LULC classification in the Wunbaik Mangrove Area in Myanmar. Lam et al. [18] utilized C-band Sentinel-1 SAR time series dual-polarization (VV/VH) data and employed a Convolutional Neural Network (CNN), RF, and multilayer perceptron (MLP) for LULC classification in the Mekong Basin in Vietnam. Meanwhile, some scholars have also used high-spatial-resolution images to carry out land cover classification in the Huangshui River Basin. For example, Tang et al. [9] used Satellite pour l’Observation de la Terre (SPOT) 6 (6 m) images, fused various classification feature data in this watershed, and achieved an overall classification accuracy of more than 90%, preliminarily confirming that higher-spatial-resolution imagery can improve land cover classification accuracy in the basin. Li et al. [19] used remote sensing images at five spatial resolutions—GF-2 (4 m), SPOT-6 (6 m), Sentinel-2A (10 m), and Landsat-8 (15 m/30 m)—and found that the classification accuracy in this watershed was better when using remote sensing images with a resolution finer than 10 m.
Although the overall classification accuracy of land cover in the Huangshui Basin based on high-resolution images with spatial resolutions of 10 m or finer has reached more than 90%, it is still difficult to obtain a high level of classification accuracy for some of the ground feature classes located in the mountainous and loess hill areas of the basin [20]. Specifically, ground field surveys and comparisons with unmanned aerial vehicle (UAV) aerial photography showed that the land cover types of these “misclassification-prone” areas—for example, the steep slopes between artificial terraces in mountainous and hilly areas—were often misclassified as farmland. In addition, in ecological protection and construction project areas, such as those involved in returning farmland to forest and national land greening, forestland, forestland returned from farmland, farmland, and grassland are easily confused in terms of the color tone of the patches in the image. The main reason for the above problems may be that the spatial and spectral resolutions of the images used are insufficient for areas with complex terrain, making it difficult to effectively distinguish the spectral and textural features of mixed pixels. Secondly, the commonly used digital elevation model (DEM), with a spatial resolution of 30 m, is one of the main limiting factors for areas with complex terrain. In addition, there may be a high degree of similarity in the spectrum and texture of the objects to be classified, which the classifiers used in existing studies cannot sufficiently discriminate.
Therefore, achieving high classification accuracy in the abovementioned topographic areas cannot be accomplished by selecting a better classifier alone; obtaining imagery with ultra-high spatial resolution is also essential. In fact, since the late 1990s, submeter-scale optical imagery such as QuickBird (0.61 m), WorldView-2 (0.5 m), and GF-2 (0.8 m) images has been used by researchers in urban land classification and change studies [21,22]. However, due to its high cost, such data are relatively uncommon in complex hilly and mountainous terrain areas. Data from the Chinese GF-7 satellite have been officially available since August 2020. Compared with data from other GF satellites, these data are easier to obtain and provide continuous submeter-level optical stereo observations, from which submeter-level stereo images can be generated and digital surface models (DSMs) can be constructed. Whether these satellite data can improve the accuracy of land cover classification in complex terrain areas is a research topic worthy of exploration.
With the continuous development of satellite remote sensing technology, developing machine learning algorithms for land cover classification has become a focal point of remote sensing research [10,11,12,13,14,15,16,17,18,19]. Since the term “machine learning” was proposed by Arthur Samuel in 1959, numerous machine learning algorithms (such as SVM, BPNN, DT, RF, GBDT, and CNN) have emerged and have been widely used in remote sensing classification scenarios such as land cover, wetland, and crop classification [23,24,25,26,27,28,29]. Among the abovementioned machine learning methods, RFs and GBDTs are considered ensemble learning methods, while CNNs are considered single strong classifiers. Ensemble learning was first proposed by Hansen and Salamon; its core idea is that any single strong classifier has its own advantages and deficiencies, and combining several base classifiers can improve the classification performance on complex data and achieve a classification accuracy higher than that of a single classifier [30]. At present, various types of ensemble learning methods, such as bagging, boosting, stacking, and voting, have been developed. The RF is a typical bagging-type method, and many studies have shown that this algorithm can obtain good classification results and generalizability. However, for other ensemble learning classifiers, the performance and potential for land cover classification have not been fully explored [31].
Based on previous studies in land cover classification, it is evident that current practices predominantly rely on remote sensing images with spatial resolutions ranging from 2 to 30 m. Few studies have evaluated the suitability of submeter-level stereoscopic images for complex land cover classification scenarios. Furthermore, the performance and potential of emerging ensemble learning classifiers, such as HistGBoost and LightGBM, have rarely been compared and tested for land cover classification across the intricate terrains of the Qinghai–Tibet Plateau. In this study, loess hills in the Eastern Qinghai–Tibet Plateau were selected as an example. Eleven ensemble learning classifiers, three classification datasets, and four single strong classifiers were used to conduct land cover classification studies in complex topographical areas. The classification results were validated using ground-based surveys and low-altitude UAV remote sensing data. The aim of this study was to explore a refined land cover classification method suitable for complex topographic areas. The main research objectives are as follows: (1) to prove the excellent applicability of GF-7 satellite data to the land cover classification of complex surfaces; (2) to determine the effects of single-temporal data, multitemporal data, and feature optimization data on the accuracy and efficiency of land cover classification; and (3) to compare the performance of 11 ensemble learning classifiers in the land cover classification of complex terrain areas.

2. Study Area and Data Sources

2.1. Research Area

The Huangshui River Basin is located in the northeastern corner of the Qinghai–Tibet Plateau and belongs to the transition zone between the Qinghai–Tibet Plateau and the Loess Plateau (Figure 1). The total area is approximately 1.6 × 10⁴ km², and the terrain declines from northwest to southeast, with elevations of 1655–4860 m. Due to long-term water erosion, the basin has gradually formed a geomorphic pattern of alternating canyons and basins. The climate is arid and semiarid, and the complex topographic structure and unique climatic characteristics make the basin rich in natural landscapes, with various land cover types and high spatial heterogeneity [8]. In 2020, the basin’s population exceeded 3 million, the urbanization rate exceeded 80%, and the contradiction between humans and the land is prominent [32]. The valley plain and loess hill areas in the central part of the Huangshui River Basin were selected as the study area. The study area contains diverse land cover types, including forestlands, open forestlands, shrub forestlands and shrubs, nurseries and orchards, terraced fields, dry lands, and grasslands. Ground features in this area are also severely fragmented, making land cover classification difficult.

2.2. Data and Preprocessing

2.2.1. GF-7 Images and Preprocessing

Fully overlapping GF-7 stereo image data from the same orbital track were obtained on 25 August 2020 and 18 February 2021. These data include rear-view panchromatic (0.65 m), multispectral (2.6 m), and front-view panchromatic (0.8 m) images. The images were obtained from the land observation satellite data service platform of the China Center for Resource Satellite Data and Applications (CRESDA, http://www.cresda.com/CN/ (accessed on 1 May 2021)). Detailed information is shown in Table 1. After the GF-7 original images were decompressed, radiometric calibration, FLAASH atmospheric correction, and orthorectification were performed sequentially on the rear-view multispectral image, while radiometric calibration and orthorectification were performed on the front- and rear-view panchromatic images. Because the nearest-neighbor diffusion pan-sharpening algorithm (NNDiffuse) achieves high performance on large, ultra-high-resolution images and is suitable for application scenarios that require a high degree of detail, we used this tool to fuse the preprocessed rear-view panchromatic and multispectral images [22]. An enhanced image with a 0.65 m spatial resolution and four multispectral bands was generated. All the above preprocessing steps were completed with ENVI 5.5 software (Exelis Visual Information Solutions, Boulder, CO, USA).

2.2.2. DEM Production Based on a GF-7 Stereo Image Pair

Previous studies have shown that using a DEM and its derived topographic features, such as slope, aspect, and hillshade, can improve the accuracy of land cover classification in complex terrain areas [14,20]. However, the 30 m resolution of the Shuttle Radar Topography Mission (SRTM) DEM, the advanced spaceborne thermal emission and reflection radiometer (ASTER) global DEM (GDEM), and the ASTER GDEM V2 differs greatly from the 0.65 m resolution of the GF-7 images. Therefore, the GF-7 stereo pair acquired on 18 February 2021 was used to produce a DEM. In ENVI 5.5, point cloud data and a DSM were generated by dense image matching through image registration, block adjustment, 3D reconstruction, and 3D point cloud filtering. Next, in Global Mapper 18.2 (Global Mapper software, Hallowell, ME, USA), classification and hierarchical filtering were performed on the point cloud data according to the ground, low vegetation, and building layers. After the height information of all objects above the ground surface (including low vegetation and buildings) was removed, a triangulated irregular network (TIN) was created and ultimately converted to raster DEM data.

2.2.3. UAV Images and Preprocessing

UAV images feature a high spatial resolution, real-time performance, easy acquisition, and accurate positioning and can be used in land cover surveys and accuracy verification [33]. In this study, a Phantom 4 RTK multirotor high-precision aerial surveying UAV (SZ DJI Technology Co., Shenzhen, China) with a 20-megapixel CMOS RGB camera was used; the positioning accuracy of this system reaches the centimeter level without ground control points. In April and July 2021, low-altitude aerial photographs of seven typical areas in the study area (mainly regions with complex ground objects prone to misclassification, such as areas of reforestation and grassland restoration, and regions inaccessible due to complex terrain) were captured along planned routes, ensuring that the seasonal phases of the obtained UAV data were consistent with the GF-7 imagery. The flying altitude of the UAV was set to 200 m, and the coverage area of each route was greater than 1 km². After the camera parameters, image-matching parameters, and geographic coordinate system were set in Pix4Dmapper 4.5.6 (Pix4D SA, Switzerland), the software automatically performed processing steps such as image matching, 3D reconstruction, and point cloud generation. The UAV orthophotos were then generated, and a DSM was constructed (Figure 2). A manual comparison showed that the spatial resolution of the UAV orthophotos was better than 7 cm and that they matched the GF-7 images well spatially.

2.2.4. Classification Sample Data

Two periods, summer (August 2020) and winter (February 2021), were selected for ground surveys in the study area to ensure consistency with the GF-7 imagery acquisition times. During these surveys, global navigation satellite system (GNSS) receivers (Huace X900 GNSS RTK, Huace Navigation Inc., Hangzhou, China) and single-lens reflex cameras (Canon EOS 5D, Canon Inc., Tokyo, Japan) were used to locate and photograph each of the surface cover types along the planned survey routes. At the same time, for some “difficult-to-reach” and “prone-to-misclassification” areas, a surveying-grade UAV was used to acquire orthophotos, and ArcGIS 10.3 software (ESRI Inc., Redlands, CA, USA) was used to randomly generate sampling points within the UAV image areas, which were labeled after manual visual identification. The training and validation samples used in this study were mainly composed of ground survey samples and UAV image collection samples, totaling 2041 samples.

2.2.5. Classification System

The determination of the classification system is the basis for land cover classification. The land cover classification system used in the study area was developed with reference to the National Remote Sensing Land Use and Land Cover Classification System [34] and consideration of the actual land cover types. The marker library and interpretation are shown in Table 2. The library includes six land cover types, i.e., urban land, unutilized land, grassland, farmland, forestland, and forestland returned from farmland, as well as shadow areas caused by ground object occlusion. In the GF-7 images, the shadow areas caused by terrain and ground object occlusion were ubiquitous, and they were often confused with forestland and water bodies during classification; therefore, this study treated shadowed areas as a separate category. In addition, both forestland and forestland returned from farmland are present in the study area. These two forestland types partially overlap spatially, and their spectral characteristics in the GF-7 images are also similar. Previous studies often conflated the two forestland types into one type; however, we attempted to achieve a refined identification of both [20].

3. Methods

3.1. Overview

The land cover classification workflow in this study involved four steps (Figure 3): (1) After the GF-7 images were preprocessed, multispectral images and DEMs corresponding to the winter and summer seasons were obtained. A total of 34 distinct classification features were derived through the computation of spectral indices, the extraction of texture features, and the consideration of topographic factors. These classification features were combined and optimized to form three datasets for LC classification: a single-temporal classification dataset, a multitemporal classification dataset, and a feature optimization classification dataset. (2) Detailed information on each type of surface coverage within the study area was collated via the synergy of UAV remote sensing and field surveys, culminating in a comprehensive classification sample library. To ensure that the training and validation sets accurately reflected the actual distribution of ground features, they were delineated using a stratified random sampling method, with training sets comprising 70% and validation sets comprising 30%. (3) A total of 11 ensemble learning classifiers, along with 4 single strong classifiers, were chosen for LC classification. A meticulous hyperparameter tuning and selection process was employed to guarantee that each classifier performed optimally during training. (4) Utilizing the 11 ensemble learning classifiers across the 3 classification datasets, the GF-7 imagery was employed for LC classification. This multi-dataset and multimodel approach offered a comprehensive and robust LC classification methodology. It facilitated an in-depth exploration of the impact of different classifiers and datasets on the LC classification accuracy, aiming to identify the most suitable classification method for the Huangshui River Basin.
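As a reference, the following is a minimal sketch of the stratified 70/30 split described in step (2). The file name and column names are illustrative assumptions, not the authors' actual data layout.

```python
# Minimal sketch of the stratified 70/30 sample split (step 2).
# "classification_samples.csv" and the "land_cover" column are hypothetical names.
import pandas as pd
from sklearn.model_selection import train_test_split

samples = pd.read_csv("classification_samples.csv")   # 2041 labeled sample points
X = samples.drop(columns=["land_cover"])              # up to 34 candidate classification features
y = samples["land_cover"]                             # 7 classes (incl. shadow), encoded as integers 0-6

# stratify=y keeps the per-class proportions identical in the training and validation sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
```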

3.2. Classifiers

Since Dasarathy et al. proposed the idea of ensemble learning in 1979, various ensemble learning classifiers have been developed. However, ensemble learning strategies are based mainly on bagging, boosting, stacking, and voting [30,31]. In this study, we chose 11 ensemble learning classifiers and 4 machine learning classifiers (SVM, MLP, CNN, and a classification and regression tree (CART)) to comparatively analyze how effectively ensemble learning improves classification performance in the land cover classification scenario of complex terrain areas. The above classifiers were all implemented using Python machine learning libraries, namely Python 3.8.13, Scikit-Learn 1.0.2, categorical boosting (CatBoost 1.0.5), light gradient boosting machine (LightGBM 3.3.2), extreme gradient boosting (XGBoost 1.6.0), and TensorFlow 2.8.0. Reasonably setting the hyperparameters of each classifier during training is critical for improving the performance and generalizability of the classifier. The hyperparameters that need to be set for the 15 classifiers and their main functions are listed in Table 3. Because a large number of hyperparameters need to be set for each classifier, it is difficult to manually select the optimal combination. In this study, we used a random grid search method to efficiently select candidate hyperparameters; moreover, 10-fold cross-validation with 10 repetitions was performed to more accurately estimate the performance of each set of hyperparameters and improve the robustness of the results. The classification sample library, mainly composed of ground survey samples and UAV image collection samples, was divided using stratified random sampling, with 70% of the samples used for training each classifier and 30% used to test the classification accuracy of each classifier. The hardware used in the classification experiments included 2 Xeon Gold 6246R CPUs, 2 RTX A6000 GPUs, 512 GB of memory, and a 2 TB SSD.
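The hyperparameter tuning procedure described above can be sketched as follows with scikit-learn; the LightGBM parameter grid and the number of sampled candidates are illustrative assumptions, not the values actually tuned in the study.

```python
# Hedged sketch of the random grid search with 10-fold cross-validation and 10 repetitions.
from lightgbm import LGBMClassifier
from sklearn.model_selection import RandomizedSearchCV, RepeatedStratifiedKFold

param_distributions = {            # illustrative candidate values only
    "n_estimators": [100, 300, 500, 800],
    "learning_rate": [0.01, 0.05, 0.1],
    "num_leaves": [15, 31, 63],
    "max_depth": [-1, 5, 10],
}

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=42)
search = RandomizedSearchCV(
    LGBMClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=30,                     # number of hyperparameter combinations sampled
    cv=cv,
    scoring="accuracy",
    n_jobs=-1,
    random_state=42,
)
search.fit(X_train, y_train)
best_lgbm = search.best_estimator_
```

The same search-and-validate loop is applied to each of the 15 classifiers, with the parameter grid adapted to the hyperparameters listed in Table 3.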

3.2.1. Bagging

Bagging aims to reduce the model’s variance by randomly drawing multiple subsample sets, training base learners in parallel on each subsample set, and finally averaging or voting on the prediction results of each base learner, thereby improving the robustness of the model [24,31]. In this study, we chose two typical representative classifiers that belong to the bagging ensemble strategy, namely, the RF and extremely randomized trees (ExtraTrees). The difference between the two methods is that ExtraTrees introduces additional randomness in feature selection and in the choice of splitting thresholds, as sketched below.
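The hyperparameter values in the following sketch are placeholders standing in for those selected by the random grid search.

```python
# Sketch of the two bagging-type classifiers used in this study.
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

# RF: bootstrap sampling plus the best split within a random feature subset
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            n_jobs=-1, random_state=42)

# ExtraTrees: additionally draws candidate split thresholds at random,
# which is the extra source of randomness mentioned above
et = ExtraTreesClassifier(n_estimators=500, max_features="sqrt",
                          n_jobs=-1, random_state=42)

rf.fit(X_train, y_train)
et.fit(X_train, y_train)
```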

3.2.2. Boosting

Boosting emphasizes the synergy of base classifiers. In each iteration, the base classifiers are adjusted so that the learner focuses more on the samples misclassified by the previous learner, better fitting those samples and constructing a model with better generalizability [31,35]. In this study, we selected six representative algorithms for boosting ensemble strategies, namely, the histogram-based gradient boosting classification tree (HistGBoost), XGBoost, LightGBM, adaptive boosting cascading multiple decision trees (AdaBoost-DT), CatBoost, and the GBDT. AdaBoost is a pioneering boosting method that can use any base learner and adjusts the weights of misclassified samples in each iteration to improve the model’s performance; AdaBoost-DT is a special implementation of AdaBoost that uses decision trees as base classifiers. The GBDT is also a boosting technique whose base classifier is a decision tree; it gradually optimizes the model’s performance by fitting the residuals in each iteration. XGBoost introduces regularization and second-order Taylor expansion on the basis of the GBDT to improve the generalizability and efficiency of the model. HistGBoost introduces histogram optimization into gradient boosting: continuous feature values are binned into histograms, and the optimal splitting points are selected from the binned values, which improves the performance and speed of the model on large datasets [36]. LightGBM differs from HistGBoost mainly in that it uses a depth-limited leaf-wise splitting strategy, which more efficiently selects the splitting point with the maximum gain and accelerates training on large-scale datasets. CatBoost can automatically process categorical features to simplify feature engineering and uses a symmetric tree algorithm to improve the model’s accuracy [36].
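For reference, the six boosting classifiers can be instantiated as sketched below; the hyperparameters shown are library defaults or placeholders, not the tuned values used in the experiments.

```python
# Illustrative instantiation of the six boosting classifiers compared in this study.
from sklearn.ensemble import (HistGradientBoostingClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

boosting_models = {
    "HistGBoost":  HistGradientBoostingClassifier(random_state=42),
    "XGBoost":     XGBClassifier(random_state=42),
    "LightGBM":    LGBMClassifier(random_state=42),
    # base_estimator is named "estimator" in scikit-learn >= 1.2
    "AdaBoost-DT": AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=5),
                                      n_estimators=300, random_state=42),
    "CatBoost":    CatBoostClassifier(verbose=0, random_state=42),
    "GBDT":        GradientBoostingClassifier(random_state=42),
}

for name, model in boosting_models.items():
    model.fit(X_train, y_train)
```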

3.2.3. Stacking

Stacking combines the prediction results of multiple heterogeneous primary classifiers as the input of a secondary classifier (also referred to as a metaclassifier), and the final prediction is made by the secondary classifier. Stacking can flexibly combine different types of classifiers and leverage the advantages of multiple classifiers. The advantage of these classifiers is that they represent asymptotically optimal learning systems [24,37]. In this study, we used HistGBoost as a secondary classifier for stacking the LightGBM, XGBoost, and RF primary classifiers.
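The stacking configuration described above can be expressed with scikit-learn's StackingClassifier; the base-classifier hyperparameters below are placeholders for the tuned values.

```python
# Sketch of the stacking ensemble: LightGBM, XGBoost, and RF as primary classifiers,
# HistGBoost as the secondary (meta) classifier.
from sklearn.ensemble import (StackingClassifier, RandomForestClassifier,
                              HistGradientBoostingClassifier)
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

stacking = StackingClassifier(
    estimators=[
        ("lgbm", LGBMClassifier(random_state=42)),
        ("xgb", XGBClassifier(random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=500, random_state=42)),
    ],
    final_estimator=HistGradientBoostingClassifier(random_state=42),
    cv=5,        # out-of-fold predictions of the primary classifiers feed the meta-classifier
    n_jobs=-1,
)
stacking.fit(X_train, y_train)
```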

3.2.4. Voting

Voting combines the prediction results of multiple classifiers through hard voting and soft voting. Both voting methods can achieve better generalizability than stacking by combining the prediction results of the base classifiers. In addition, voting does not involve multiple levels; thus, its structure is simpler [38]. In this study, the predictions of four classifiers (LightGBM, XGBoost, HistGBoost, and AdaBoost-DT) were combined by voting to yield the final classification results.
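The hard- and soft-voting ensembles over the four base classifiers named above can be sketched as follows; the hyperparameters are placeholders.

```python
# Sketch of the hard- and soft-voting ensembles used in this study.
from sklearn.ensemble import (VotingClassifier, HistGradientBoostingClassifier,
                              AdaBoostClassifier)
from sklearn.tree import DecisionTreeClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

base = [
    ("lgbm", LGBMClassifier(random_state=42)),
    ("xgb", XGBClassifier(random_state=42)),
    ("hgb", HistGradientBoostingClassifier(random_state=42)),
    ("ada", AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=5),
                               random_state=42)),
]

hard_voting = VotingClassifier(estimators=base, voting="hard")   # majority vote on predicted labels
soft_voting = VotingClassifier(estimators=base, voting="soft")   # average of predicted class probabilities

hard_voting.fit(X_train, y_train)
soft_voting.fit(X_train, y_train)
```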

3.2.5. Single Strong Classifier

MLPC, SVM, and CART are selected as typical representatives of traditional strong classifiers. MLPC is a simple feedforward artificial neural network (ANN) classifier composed of multiple units called neurons, and the neurons are organized into an input layer, an output layer, and one or more hidden layers interconnected by neuron connections [18]. In this study, a three-layer multilayer perceptron neural network with two hidden nodes and seven output classes is employed. The hyperparameters of MLPC, including the number of neurons in the hidden layers and the activation functions, are configured using a random grid search method.
SVM is a classical supervised learning algorithm successfully applied in domains such as image classification, object detection, and pattern recognition [39,40]. Its effectiveness stems from its ability to efficiently handle high-dimensional data and large-scale datasets while demonstrating robust performance even in the presence of noisy data and limited training samples [41,42]. The combination of random grid search and cross-validation was used to optimize the kernel function and regularization parameter of the SVM in this study. It is noteworthy that, in addition to the traditional SVM, there are several SVM variants, such as the ν-support vector classifier (νSVC), which directly controls the number of support vectors through the ν parameter, thereby adjusting the model’s complexity and generalization ability [43]. Such improved SVM variants deserve wider application in LULC classification in future research. CART is a tree-based machine learning algorithm with interpretability and intuitiveness, making it suitable for handling multi-output problems [44,45].
A CNN, as a widely used deep learning framework in high-spatial-resolution image classification tasks, has a better generalization ability and higher classification accuracy than traditional single strong classifiers. In constructing the CNN model, this study mainly refers to the study by Li et al. [19], which used a CNN model for land cover classification in the Huangshui River Basin. The CNN structure built in this study consists of nine layers, employing two convolution kernels (5 × 5 pixels and 3 × 3 pixels) to fully extract features of different scales. Each convolution layer is followed by an activation function. To avoid excessive information loss, only a max-pooling layer of 2 × 2 pixels size and stride of 2 is used after the fourth convolution layer and activation function. Additionally, a Dropout layer is employed after the second fully connected layer to mitigate overfitting. Finally, a softmax classifier is used to complete the classification task. Since CNN models cannot directly utilize point-based training samples as constructed in this study, we implemented the following transformation: in the classification sample library, 70% of the sample points were randomly selected and spatially overlaid onto the classification dataset using ArcGIS 10.3 software (ESRI Inc., Redlands, CA, USA). For each selected sample point, a rectangular window of 5 × 5 pixels was used to extract a subset from each classification dataset and assign a land cover type label, creating a suitable sample set for CNN model training.
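A minimal Keras sketch of an architecture consistent with the description above is given below (5 × 5 and 3 × 3 convolutions, a single 2 × 2 max-pooling layer after the fourth convolution, dropout after the second fully connected layer, and a softmax output). The filter counts, dense-layer widths, and number of input channels are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of a CNN consistent with the architecture described in the text.
from tensorflow.keras import layers, models

n_features = 22   # channels of each 5x5-pixel sample patch (e.g., single-temporal dataset); assumed
n_classes = 7     # six land cover types plus shadow

cnn = models.Sequential([
    layers.Input(shape=(5, 5, n_features)),
    layers.Conv2D(32, (5, 5), padding="same", activation="relu"),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),     # only pooling layer, after the 4th conv
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                                   # dropout after the 2nd fully connected layer
    layers.Dense(n_classes, activation="softmax"),
])

cnn.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
```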

3.3. Classification Feature Extraction and Optimization

Twelve spectral indices, including the normalized difference vegetation index (NDVI), normalized difference water index (NDWI), and enhanced vegetation index (EVI), were calculated for summer and winter using the GF-7 multispectral images. Two time series (TS) spectral indices, NDVI-TS and EVI-TS, were also calculated to reflect vegetation growth and change. The four spectral bands of the GF-7 summer scene were transformed via principal component analysis, and the first principal component was used to compute the gray-level co-occurrence matrix (GLCM), from which eight texture features, such as contrast, correlation, and entropy, were obtained [20]. The GF-7 DEM was used to calculate and generate topographic factors such as slope, aspect, and hillshade. The classification dataset constructed in this study contains a total of 34 classification features, including spectral bands, spectral indices, textures, and topographic features [46,47,48,49,50]. The specific classification feature information is shown in Table 4.
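As an illustration of the spectral index calculations, the following sketch computes NDVI, NDWI, and EVI from the four fused GF-7 bands; the file name is hypothetical, and the EVI coefficients shown are the commonly used standard values, which the study does not explicitly state.

```python
# Sketch of spectral index computation on the fused GF-7 bands (blue, green, red, NIR).
import numpy as np
import rasterio

with rasterio.open("gf7_summer_fused.tif") as src:      # hypothetical file name
    blue_s, green_s, red_s, nir_s = src.read().astype("float32")

def ndvi(nir, red):
    return (nir - red) / (nir + red + 1e-10)

def ndwi(green, nir):
    return (green - nir) / (green + nir + 1e-10)

def evi(nir, red, blue):
    # standard EVI coefficients (G = 2.5, C1 = 6, C2 = 7.5, L = 1)
    return 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1 + 1e-10)

ndvi_summer = ndvi(nir_s, red_s)
ndwi_summer = ndwi(green_s, nir_s)
evi_summer = evi(nir_s, red_s, blue_s)
```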
Considering that there may be information redundancy among the abovementioned classification features and that the contribution of each feature to the performance of each classifier differs, we used the meta-transformer in scikit-learn in combination with the built-in feature importance score (FIS) method of each classifier to calculate the feature importance and automatically select the best combination of classification features. The FIS methods used by the classifiers differ, so the raw FIS results are not directly comparable; for the convenience of comparison, the FIS of each classifier was normalized. Moreover, it should be noted that the SVM, CNN, and MLP classifiers lack built-in importance scoring; therefore, importance evaluation was not performed for them. The classification features ultimately selected for the three combined classifiers (stacking, hard voting, and soft voting) are the union of the feature optimization results of their respective base classifiers.
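A hedged sketch of this automatic selection step is shown below; SelectFromModel is assumed to be the scikit-learn meta-transformer referred to in the text, and the mean-importance threshold is an illustrative choice.

```python
# Sketch of importance-based feature selection wrapping a classifier's built-in FIS.
from sklearn.feature_selection import SelectFromModel
from lightgbm import LGBMClassifier

selector = SelectFromModel(LGBMClassifier(random_state=42), threshold="mean")
selector.fit(X_train, y_train)

selected_features = X_train.columns[selector.get_support()]   # the feature-optimized subset
X_train_opt = selector.transform(X_train)

# normalize the importance scores so they are comparable across classifiers
importances = selector.estimator_.feature_importances_
importances_norm = importances / importances.max()
```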

3.4. Classification Dataset Construction

The complexity of the terrain and the variability in local microclimates in the Huangshui River Basin result in significant heterogeneity in the surface vegetation cover. In such complex terrain and vegetation conditions, constructing single-temporal, multitemporal, and feature optimization classification datasets allows for the comparison and integration of the advantages of different datasets [20]. This approach enables better adaptation to the challenges of LC classification in complex terrain and vegetation conditions. In this study, a single-temporal classification dataset, a multitemporal classification dataset, and a feature optimization classification dataset were constructed sequentially for land cover classification. Detailed information on the classification datasets is shown in Table 5.
Single-temporal classification datasets capture the surface characteristics of a specific season, typically selecting summer data during the peak vegetation growth period. The main reason for this is that the spectral and textural features of vegetation are more prominent during the summer, resulting in the most significant differences between vegetation and non-vegetation types [9,11,13,20]. However, there may also be limitations in distinguishing between different land cover types, especially in areas with minimal seasonal variations and diverse vegetation types [20]. Multitemporal classification datasets, by combining images from different seasons and classification features, can provide a more comprehensive representation of seasonal variations on the surface, typically aiding in overcoming the limitations of single-temporal datasets when seasonal changes are minimal or confusing [26,49]. The single-temporal classification datasets in this study contain 22 classification features, specifically including 4 spectral bands from the summer GF-7 imagery, 6 summer spectral indices, 4 terrain features, and 8 texture features. The multitemporal classification datasets build upon the single-temporal datasets by adding four spectral bands from the winter GF-7 imagery, six winter spectral indices, and two time series spectral indices.
Several studies have shown that not all classification features have a positive impact on land cover classification. Using an excessive number of introduced features may reduce the classification accuracy and increase the computation time [50]. The construction of feature optimization classification datasets aims to identify the classification features that contribute most to the accuracy of each classifier, effectively reducing dimensionality and enhancing the efficiency of classification algorithms [20]. This study automatically selects classification features based on the inherent feature importance scoring method of each classifier and creates a feature selection dataset for classification.

3.5. Accuracy Evaluation Metrics

To evaluate the performance of each classifier and accurately identify the land cover type, the accuracy was evaluated (mainly at the pixel scale) by constructing a confusion matrix and using widely used indicators such as the overall accuracy (OA), kappa coefficient, producer accuracy, and user accuracy [51]. Moreover, considering that the number of samples in this study may be imbalanced among each category, the weighted average F1-score and mean balanced accuracy (MBA) were used as two comprehensive indicators to perform more comprehensive and objective performance evaluations [20].
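These metrics can be computed with scikit-learn as sketched below for any fitted classifier (here the tuned LightGBM model from the earlier sketch); the per-class producer and user accuracies are derived from the confusion matrix.

```python
# Sketch of the accuracy evaluation on the 30% validation samples.
import numpy as np
from sklearn.metrics import (confusion_matrix, accuracy_score, cohen_kappa_score,
                             f1_score, balanced_accuracy_score)

y_pred = best_lgbm.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
oa = accuracy_score(y_test, y_pred)                    # overall accuracy (OA)
kappa = cohen_kappa_score(y_test, y_pred)              # kappa coefficient
f1_w = f1_score(y_test, y_pred, average="weighted")    # weighted average F1-score
mba = balanced_accuracy_score(y_test, y_pred)          # mean balanced accuracy (MBA)

producer_acc = np.diag(cm) / cm.sum(axis=1)            # per-class producer accuracy (recall)
user_acc = np.diag(cm) / cm.sum(axis=0)                # per-class user accuracy (precision)
```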

4. Classification Results and Accuracy Evaluation

4.1. Classification Results

The optimal classification results of representative classifiers are shown in Figure 4, with four local areas enlarged for display. The main land cover types in Area A include large farmland areas, urban land, and forestlands with relatively flat terrain. The classification results of the ensemble learning classifiers and the CNN were generally good; however, shadows and forestlands were sometimes confused, and the SVM single classifier confused some farmland pixels with urban areas, making the land use information difficult to extract. The main land cover types in Area B include terraces, forestlands, grasslands, and unutilized lands. The classification results of HistGBoost-M, LightGBM-M, and Stacking-M were better for this area than those of the other classifiers and completely represented the “stacked” distribution characteristics of the terraced fields. These methods also identified the shrub forest between the upper and lower terraces (ridge areas) and the contiguous forestland in the river valley well, and the bare soil area formed by a small amount of manual excavation in the terraces (unutilized land) was also accurately identified. In contrast, RF-M and ExtraTrees-SF confused unutilized land with construction land, the CNN failed to fully identify the small areas of shrub forest between terraces, and the SVM extracted too many bare soil (unutilized land) and shadow areas in Area B. The ground cover types in Area C include mainly forestland, forestland returned from farmland, and grassland; these cover types were difficult to classify in the study area and were prone to confusion. The poor SVM-M classification results were mainly reflected in the over-extraction of bare soil and shadows; moreover, a small amount of farmland was misclassified as grassland, and salt-and-pepper noise was more obvious in the classification results. However, the classification results of the ensemble learning classifiers and CNN-M were generally better: the planting boundaries representing the conversion from farmland to forest were clearly extracted, and the grassland exposed under some forestlands could also be finely extracted. Only ExtraTrees-SF and RF-M extracted overly large shadow areas where woodland and understory shadows were mixed. We also found that the ensemble learning classifiers and the CNN still confused forestland and forestland returned from farmland in some areas. Field surveys of these misclassified areas showed that most were affected by artificial tree planting in original forest areas with poor canopy closure, where the two forestland types highly overlap spatially; a small number of areas were still misclassified owing to insufficient classifier performance or insufficient discriminability of the classification data. Area D mainly featured construction land, farmland, and forestland. All classifiers accurately extracted the boundaries of farmland and forestland in this area; however, the extraction range of SVM-M for construction land was too large, while the extraction ranges of ExtraTrees-SF and RF-M were relatively small.

4.2. Accuracy Evaluation

4.2.1. Evaluation of the Classification Accuracy Using Single-Temporal Data

The evaluation results regarding the land cover classification accuracy of each classifier using the single-temporal dataset are shown in Table 6. For the 11 ensemble learning classifiers, the OAs were 0.896–0.907, the kappa coefficients were 0.872–0.886, the MBAs were 0.903–0.916, and the F1-scores were 0.897–0.908. For the three traditional single classifiers (CART, SVM, and MLP), the OAs were 0.809–0.856, the kappa coefficients were 0.768–0.824, the MBAs were 0.779–0.851, and the F1-scores were 0.806–0.856. In general, for each of the ensemble learning classifiers, all the accuracy evaluation indicators were significantly greater than those of the traditional single classifiers. HistGBoost, stacking, soft voting, and LightGBM achieved consistent performance in terms of the four accuracy evaluation indicators and ranked among the top four in that order, indicating relatively good accuracy, while hard voting and GBDT ranked lower for all four indicators, indicating relatively poor performance. The accuracy indicators of ExtraTrees, XGBoost, CatBoost, AdaBoost-DT, the RF, and the CNN were average, and these classifiers ranked in the middle. The OA (0.904) and kappa coefficient (0.882) of AdaBoost-DT, the RF, and ExtraTrees were identical, as were the OA (0.902) and kappa coefficient (0.880) of XGBoost and CatBoost; however, AdaBoost-DT achieved better accuracy in terms of the MBA (0.910) and F1-score (0.905). We further analyzed the ability of the classifiers to identify each land cover type. Except for the producer accuracies (PAs) of the SVM and MLP for forestland and farmland, which reached a level comparable to those of the ensemble classifiers, the PAs of the ensemble learning classifiers were better. The differences in the PAs of the 11 ensemble learning classifiers were mainly manifested in the grassland and forestland returned from farmland types; the PAs of HistGBoost, the RF, and AdaBoost-DT were the highest for forestland returned from farmland (0.92), and the PAs of stacking, the RF, and GBDT were the highest for grassland (0.87).

4.2.2. Evaluation of the Classification Accuracy Using Multitemporal Data

The evaluation results regarding the land cover classification accuracy of each classifier using the multitemporal dataset are shown in Table 7. The OAs of the 11 ensemble learning classifiers were 0.922–0.935, the kappa coefficients were 0.905–0.920, the MBAs were 0.923–0.937, and the F1-scores were 0.921–0.935. For the three traditional single classifiers (CART, SVM, and MLP), the OAs were 0.834–0.889, the kappa coefficients were 0.796–0.866, the MBAs were 0.806–0.897, and the F1-scores were 0.831–0.892. In general, the accuracies of the ensemble learning classifiers were significantly higher than those of the three traditional single classifiers. The performance of HistGBoost in terms of the OA (0.935), kappa coefficient (0.920), MBA (0.937), and F1-score (0.935) was optimal and comparable to that of the CNN. The evaluation indicators for ExtraTrees, hard voting, and the RF indicate that their performance was relatively poor. The AdaBoost-DT, LightGBM, CatBoost, and stacking algorithms all performed well in terms of the MBA (0.933), but further comparisons of the OA (0.933), kappa coefficient (0.918), and F1-score (0.933) showed that the classification accuracies of AdaBoost-DT and LightGBM were only slightly lower than that of HistGBoost. The accuracy indicators and rankings of soft voting, GBDT, and XGBoost were average, with OA, MBA, and F1-score values of 0.93. Further analysis of the ability of the classifiers to identify each land cover type showed that, except for the PAs of the SVM, CART, and MLP for shadow and farmland, which reached the same level as those of the ensemble learning classifiers, the PAs for the other classes were better for the ensemble learning classifiers. The differences in PAs among the 11 ensemble learning classifiers were mainly manifested in the grassland and forestland returned from farmland types; the PAs of the stacking, XGBoost, and RF classifiers for forestland returned from farmland were better (0.90–0.92), and the PAs of HistGBoost and stacking (0.90) were the highest for grassland.

4.3. Classification Feature Optimization and Classification Accuracy Evaluation

4.3.1. Evaluation of Classification Feature Importance and Feature Optimization

In terms of classification feature selection, existing studies have shown that spectral bands and topographic factors offer significant advantages, although textural features and vegetation indices cannot be ignored [22]. In this study, we compared the FISs of the 34 classification features for each classifier (Figure 5). The mean FISs of the summer near-infrared band (NIR-S; FIS: 0.77), DEM (FIS: 0.71), winter soil-adjusted vegetation index (SAVI-W; FIS: 0.69), and mean texture feature (mean; FIS: 0.60) were the top four across the classifiers, indicating that these four features contributed greatly to the classification performance of each classifier. A comparison of the importance of the different types of features revealed that, among the spectral band features, the mean FISs of the summer bands were greater than those of the winter bands. Among the spectral index features, the FISs of the summer normalized difference vegetation index (NDVI-S; FIS: 0.53), summer ratio resident-area index (RRI-S; FIS: 0.54), SAVI-W (FIS: 0.69), summer/winter green vegetation index (VIgreen-S/-W; FISs: 0.49/0.45), and NDVI-TS (FIS: 0.46) were relatively high. Furthermore, the FISs of the DEM (FIS: 0.71) and the mean texture (FIS: 0.60) were significantly greater than those of the other topographic and texture features, respectively. The FIS of each feature and the number of times it was automatically selected by the classifiers were further summed and sorted. The NIR-S band was selected by every classifier, and its cumulative FIS was the highest. The DEM, SAVI-W, mean texture, summer green (Green-S) band, RRI-S, NDVI-S, summer red (Red-S) band, VIgreen-S/-W, and NDVI-TS features rounded out the top ten in terms of cumulative FIS, and the number of selections by each classifier was also high (8–11). In contrast, some texture features (such as dissimilarity, entropy, second moment, and correlation), some spectral bands (such as the winter red (Red-W) and winter NIR (NIR-W) bands), and some spectral indices (such as the winter RRI (RRI-W), winter EVI (EVI-W), and summer EVI (EVI-S)) were selected fewer times. For each classifier, the original 34 classification features were filtered according to their feature importance; after automatic selection, the number of retained features was mainly concentrated in the range of 10–19, indicating some redundancy in the original set of classification features.

4.3.2. Classification Results and Accuracy Evaluation after Classification Feature Optimization

The accuracy evaluation results of the classifiers using their respective automatic feature optimization datasets for land cover classification are shown in Table 8. In general, the accuracies of the 11 ensemble learning classifiers are significantly greater than the accuracy of the CART classifier. An analysis of the accuracy of each ensemble learning classifier showed that AdaBoost-DT, stacking, and ExtraTrees performed better in terms of all the accuracy evaluation indicators, with OA and F1-score values all exceeding 0.93. XGBoost, on the other hand, had the worst performance in terms of all the accuracy indicators, with OA and F1-score values of 0.90 and an MBA of 0.913. The MBAs (0.923–0.924) and F1-scores (0.917–0.925) of the remaining seven ensemble learning classifiers were very close; however, further comparison of the OA and kappa coefficient values revealed that the OAs of CatBoost (0.926), soft voting (0.925), and the RF (0.925) were better. Further analysis of the ability of the ensemble classifiers to identify each LC type showed that the differences were mainly manifested in the grassland and forestland returned from farmland types, with AdaBoost-DT and CatBoost achieving the highest PAs (0.92) for forestland returned from farmland and HistGBoost achieving the highest PA (0.90) for grassland.

5. Discussion

5.1. Impact of the Classifier on the Classification Results

5.1.1. Comparison of the Overall Accuracies of the Classifiers

It was confirmed in the literature that ensemble learning classifiers can achieve better land cover classification accuracy than traditional single classifiers [24]. We confirmed this finding in the current study, that is, the eleven ensemble learning classifiers used in this study significantly outperformed the three traditional single classifiers (CART, SVM, and MLP) regarding each classification accuracy indicator on three different classification datasets.
We further compared the performance of the four ensemble learning strategies. Among the representative classifiers using the bagging strategy, the accuracy of the RF on the three classification datasets was more robust due to the advantages of the RF itself and its resistance to overfitting, whereas ExtraTrees achieved higher accuracy on some datasets. However, increasing the training speed by introducing additional randomness when constructing the decision trees can also increase the risk of instability and overfitting [52]. Among the six representative algorithms using the boosting strategy, the accuracies of HistGBoost and LightGBM, which utilize histogram-based feature processing, were robust and better on the multitemporal and single-temporal classification datasets. The good classification performance of LightGBM has been confirmed by many studies [36]. However, the application of HistGBoost is relatively novel, and its use in different classification scenarios is still very limited. This study preliminarily confirmed that both HistGBoost and LightGBM perform well in complex classification scenarios. Luo et al. [22] reported that AdaBoost can assemble many weak classifiers into a strong classifier to achieve better classification performance, especially when good performance is retained after feature optimization. In this study, AdaBoost-DT likewise achieved better classification accuracy on both the multitemporal and feature-optimized datasets.
Compared with bagging or boosting, which can train only different versions of the same type of classifier, stacking can flexibly use various types of classifiers to create a series of new classification models, and many studies have confirmed that stacking can improve the overall classification accuracy [24]. Most of the classification results in this study showed that, compared with the accuracy of the member classifier, stacking led to a higher accuracy for all four accuracy evaluation indicators, especially in the case of the single-temporal and feature optimization temporal datasets. The accuracy improvement when applying the concentrated stacking model was more significant.
Because voting combines the predictions of multiple models and attempts to compensate for the deficiencies of each model, the classification performance of the base models may be improved [38]. In this study, soft and hard voting were applied to some datasets (the multitemporal and feature optimization datasets), and some improvement in the overall classification accuracy compared to that of the base classifiers was observed, with soft voting performing better in terms of all the accuracy indicators. The difference in accuracy between the two voting classifiers is mainly due to the voting method: hard voting aggregates the class labels predicted by the base classifiers and determines the final class label according to the “majority vote” principle, whereas soft voting sums the predicted probabilities of the base classifiers for each class label and predicts the final class label according to the “maximum probability” principle [19].
Previously, Peppes et al. [53] used three base classifiers (k-nearest neighbor (K-NN), DT, and RF) to integrate soft and hard voting algorithms. They compared the performance differences and found that soft voting can often achieve a better accuracy than that of other methods due to the use of a more complicated voting mechanism. Although the research results of this study support the findings described above, some studies showed that the performance of the voting classifier is still affected by the performance of the base classifier, the characteristics of the dataset, and the specific application scenarios. Therefore, the difference in performance between the two voting classifiers requires further full evaluation [24,54].
Finally, the differences in accuracy of the 11 ensemble learning classifiers on the same classification dataset were compared. The results were as follows: 0.013–0.032 for the OA, 0.016–0.039 for the kappa coefficient, 0.015–0.018 for the MBA, and 0.01–0.03 for the F1-score. In other words, in this study, the improvement in accuracy achieved by selecting a better ensemble classifier generally ranged from 1% to 3% (Figure 6).
Ensemble and deep learning have gained increasing popularity in LULC classification, with their superior classification performance over traditional single classifiers being extensively verified by numerous studies [17,23,24,27]. Recently, comparative research on the accuracy and efficiency of image classification between ensemble learning and deep learning has triggered discussions. For example, Lam et al. [18] used time series Sentinel-1 data to conduct flood and non-flood inundation area classification using a CNN, RF, and MLP. The results indicated that the CNN could generate more reliable classification results (99%) than other methods. Xie et al. [27] conducted comparative tests of machine learning algorithms (SVM and RF) and deep learning algorithms (CNN) for LULC classification in coastal areas using high-spatial-resolution satellite images (SPOT 5 and Sentinel 2). The results indicated that the CNN generally achieved the highest classification accuracy. Arrechea-Castillo et al. [55] proposed a simple CNN based on the LeNet architecture for the LULC classification of Sentinel-2 images, aiming to enhance the classification efficiency of deep learning. The results demonstrated that the LeNet-CNN achieved a classification accuracy of 96.51%.
In this study, based on the accuracy of each ensemble learning classifier, the classification potential of bagging, boosting, stacking, and voting is briefly discussed. The results demonstrate that the performance of ensemble learning classifiers is better than that of traditional single classifiers. Furthermore, when comparing state-of-the-art ensemble classifiers (HistGBoost, LGBM, and AdaBoost-DT) with a complex CNN deep learning model for LULC classification in the Huangshui River Basin, the preliminary results showed that their accuracies were largely comparable. However, it should be noted that due to the high complexity of the CNN model, parameter optimization and training typically require more time and computational resources to achieve a robust classification accuracy [18,27]. In contrast, ensemble learning classifiers may balance the classification accuracy and efficiency better. Therefore, choosing ensemble learning classifiers is an appropriate decision given the complexity of the classification task.
Presently, scholars continue to compare the performance of different ensemble learning and deep learning classifiers in classification tasks across application scenarios. However, because new classifiers are constantly being released and existing classifiers are being applied to an ever-wider range of problems, the applicability and accuracy potential of each ensemble learning and deep learning classifier still require further investigation.

5.1.2. Differences in the Recognition Accuracy of Specific Ground Features across Classifiers

We further explored the producer accuracy (PA) of each land type for each ensemble classifier. In terms of the average PA (Figure 7), forestland (0.910), farmland (0.929), urban land (0.935), unutilized land (0.955), and shadows (0.964) were generally identified with higher accuracy by every classifier. The PA scatter distributions of unutilized land and shadows were also relatively concentrated, indicating that these two land types were identified accurately and consistently by all classifiers; the main reason is likely the pronounced differences in their spectral, textural, and other features. Although the PAs for forestland, farmland, and urban land varied among the ensemble learning classifiers, HistGBoost, AdaBoost_DT, LightGBM, and stacking generally performed better. The PAs of the ensemble classifiers for grassland (0.861) and forestland returned from farmland (0.892) were poorer, and their scatter points were likewise relatively concentrated, indicating that all classifiers consistently struggled to identify these two land types; even so, AdaBoost_DT, HistGBoost, and LightGBM achieved comparatively better PAs.
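For reference, producer accuracy can be derived directly from a confusion matrix, as in the following minimal sketch (the labels and counts are purely illustrative):

```python
# PA (producer's accuracy) = correctly classified reference samples of a class
# divided by the total reference samples of that class (per-class recall).
# UA (user's accuracy) = diagonal divided by the predicted totals (per-class precision).
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])   # reference labels (illustrative)
y_pred = np.array([0, 1, 1, 1, 2, 2, 2, 2, 1, 0])   # classifier output (illustrative)

cm = confusion_matrix(y_true, y_pred)                # rows: reference, columns: predicted
producer_accuracy = np.diag(cm) / cm.sum(axis=1)     # diagonal / reference totals
user_accuracy = np.diag(cm) / cm.sum(axis=0)         # diagonal / predicted totals
print("PA per class:", producer_accuracy)
print("UA per class:", user_accuracy)
```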

5.2. Effect of Classification Data on Separability and the Classification Results of Ground Features

5.2.1. Effect of Classification Datasets on the Classification Accuracy

In previous land cover classification studies, images were mainly collected in summer. During this period, surface vegetation grows most vigorously, the spectral and textural features in the images are most distinct, and vegetation and non-vegetation types are best differentiated; therefore, good classification results can usually be obtained [20]. In recent years, many scholars have used multitemporal imagery for crop classification, salt marsh vegetation mapping, and forestland extraction and have found that multitemporal data can capture changes in vegetation phenology, thereby improving the identification accuracy of land cover types. However, adding classification data also introduces problems such as information redundancy and increased computation time [54,56]. Therefore, in this study, single-temporal, multitemporal, and feature optimization datasets were constructed to investigate their effects on classification accuracy and efficiency. Figure 8 shows the classification accuracies obtained with the three datasets. In general, all the accuracy indicators of land cover classification based on the multitemporal data were better than those based on the single-temporal data; the OA increased by 0.02–0.03, and the kappa coefficient, MBA, and F1-score increased by 0.02–0.04. When classification was repeated after feature optimization of the multitemporal data, the OA and MBA decreased by 0.005 on average, and the kappa coefficient and F1-score decreased by 0.006 on average. For the individual classifiers, the classification accuracy decreased notably after feature optimization only for XGBoost, LightGBM, and HistGBoost, with an OA decrease of more than 0.01; for the other classifiers, the classification accuracies before and after feature optimization were essentially the same.
A comprehensive comparison of the accuracies of the same classifier on the three classification datasets was also performed. The accuracy differences were 0.011–0.029 for the OA, 0.023–0.036 for the kappa coefficient, 0.016–0.033 for the MBA, and 0.02–0.03 for the F1-score; that is, the improvement in accuracy achieved by selecting a better classification dataset was generally within 1–3%. Previous studies in the Huangshui River Basin using multitemporal Sentinel-2A/B MSI data with GBDT and RF classifiers confirmed that multitemporal data can improve the overall land cover classification accuracy [20]. This study, using multitemporal GF-7 data and many different types of classifiers, confirms that conclusion. After feature optimization, the volume of classification data was significantly reduced, yet some classifiers still maintained good classification accuracy. These findings are important for improving the efficiency and accuracy of land cover classification in later stages.
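One possible way to build such a feature-optimized dataset is sketched below, under the assumption of impurity-based importances from a tree ensemble and an illustrative 90% cumulative-importance cutoff (the study relies on each classifier's own importance scoring, so the specific model and threshold here are assumptions):

```python
# Select the smallest feature subset whose cumulative importance reaches 90%.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=3000, n_features=34, n_informative=15,
                           n_classes=7, n_clusters_per_class=1, random_state=2)
rf = RandomForestClassifier(n_estimators=300, random_state=2).fit(X, y)

order = np.argsort(rf.feature_importances_)[::-1]       # features ranked by importance
cum = np.cumsum(rf.feature_importances_[order])          # cumulative importance
keep = order[:np.searchsorted(cum, 0.90) + 1]             # smallest prefix reaching 90%
print(f"kept {keep.size} of {X.shape[1]} features:", sorted(keep.tolist()))
X_selected = X[:, keep]                                    # reduced dataset for reclassification
```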

5.2.2. Effect of Classification Datasets on the Separability of Land Types

In this study, using the multitemporal and feature optimization datasets improved the overall accuracy of land cover classification. The main reason may be that these datasets effectively increase the interclass separability of some land cover types, thereby improving the model’s recognition accuracy for each class. The Jeffries–Matusita distance (JMD) is commonly used to evaluate the degree of separation between categories in the classification feature space, and its value ranges from 0 to 2. Generally, when the JMD between two classes is greater than 1.9, the classes differ significantly in the given features and are not easily confused [57]. To confirm this, we computed this indicator for the three classification datasets (Figure 9). The separability between the forestland and grassland classes was generally poor (JMD less than 1.9), which led to a large number of misclassifications; however, the interclass distances increased significantly when the multitemporal dataset was used, and the overall classification result also improved (Figure 9b). Using the feature optimization dataset (Figure 9a) increased the distances between land types such as shadows, forestland, and forestland returned from farmland, but the improvement in the separability of farmland, grassland, and forestland returned from farmland was limited; therefore, optimal classification accuracy still could not be achieved by some of the poorer-performing classifiers.
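For reference, the JMD between two classes can be computed from their class-conditional means and covariances under a Gaussian assumption, as in the following minimal sketch (the two sample clouds are illustrative):

```python
# Jeffries-Matusita distance: JM = 2 * (1 - exp(-B)), where B is the Bhattacharyya
# distance between two (assumed Gaussian) class distributions; JM approaches 2
# for well-separated classes.
import numpy as np

def jeffries_matusita(x1, x2):
    """JM distance between two classes given their feature samples (n x d arrays)."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    s1, s2 = np.cov(x1, rowvar=False), np.cov(x2, rowvar=False)
    s = (s1 + s2) / 2.0
    diff = (m1 - m2).reshape(-1, 1)
    term1 = 0.125 * (diff.T @ np.linalg.inv(s) @ diff).item()
    term2 = 0.5 * np.log(np.linalg.det(s) /
                         np.sqrt(np.linalg.det(s1) * np.linalg.det(s2)))
    b = term1 + term2                       # Bhattacharyya distance
    return 2.0 * (1.0 - np.exp(-b))         # Jeffries-Matusita distance

rng = np.random.default_rng(0)
grass = rng.normal([0.3, 0.5], 0.05, size=(200, 2))    # e.g., (red, NIR) samples, illustrative
forest = rng.normal([0.2, 0.7], 0.05, size=(200, 2))
print("JM(grass, forest) =", round(jeffries_matusita(grass, forest), 3))
```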
Previous studies have confirmed that introducing multitemporal data can improve the differentiation between land types and thus both the per-class and overall classification accuracy. However, whether this approach improves the extraction of the true boundaries of each ground feature type still needs further analysis. In this study, two typical areas were selected (Figure 10 and Figure 11). Area A includes terraced fields, and the 3D surface model constructed from UAV images fully reflects the complexity of this region’s topography. Land types such as farmland, forestland, and forestland returned from farmland were differentiated more clearly in the summer images, whereas unutilized land was prominent in the images from both seasons. In the classification results based on the single-temporal data, the boundaries of the abovementioned types were extracted largely completely; however, some farmland with sparse crops was wrongly classified as grassland, unutilized land (bare soil) was wrongly classified as urban land, and the classification map was fragmented. In contrast, the classification results based on the multitemporal data show that the farmland misclassification problem was effectively alleviated. By comparing the GF-7 and UAV images captured in winter, we found that much of the farmland had been plowed and covered with transparent film. These human activities altered the differences among land types such as farmland, grassland, and forestland in the winter images, which may be the main reason for the improved identification accuracy of these land types.
Area B is a typical area where farmland was converted to forest. Two sites where forestland, forestland returned from farmland, and grassland overlap spatially to a high degree and are easily misclassified were selected, and the image features and classification results of these sites in different seasons were compared. The GF-7 submeter resolution images provided abundant spatial and structural information in both winter and summer; the forestland returned from farmland class exhibited a distinctive artificial planting texture in these images, increasing the spectral separability between the forestland returned from farmland and natural forestland classes. The spectral and textural features of the many pine trees planted in the forestland returned from farmland were clear in both winter and summer. However, in some areas, the pine saplings had been planted only recently; the plants were therefore short, exposing the underlying grass and low shrubs and causing pixel confusion in the image. The natural forestland in this area consists mainly of broad-leaved forest and a small amount of mixed needle- and broad-leaved forest, which is well differentiated from grassland in the summer images. In the multitemporal classification results, the boundaries between the forestland returned from farmland, forestland, and grassland classes were clearer and more accurate, and fewer shrubs and grassland patches in the forest understory were misclassified. Within the Huangshui River Basin, where large-scale afforestation and reforestation projects have been implemented, the combination of GF-7 data and ensemble learning classifiers can effectively distinguish farmland reforestation from natural forestland, providing a technical reference for monitoring and evaluating the effectiveness of afforestation and reforestation policies.

5.2.3. Impact of Classification Datasets on Classification Time

In land cover classification, scholars continuously strive to improve classification accuracy. However, with the growing use of high-resolution satellite imagery and complex classification models, the classification process faces increased computational complexity and longer processing times. A successful classification scheme should weigh accuracy against computational resources and time costs; in particular, when the differences in classification accuracy are small, the time spent on classification becomes an important aspect of classifier performance evaluation [22,50].
This study evaluated the time consumption of 11 ensemble learning classifiers and 3 traditional single classifiers across 40 classification schemes (Figure 12). All classification tasks were performed on the same computer and executed in CPU training mode. Because CNN training is generally time-consuming, the CNN was trained in GPU mode to accelerate the process; its classification time is therefore not included in the comparison. For the same classifier, the multitemporal classification dataset required the most time and the single-temporal dataset the least. Because the classifications were run in the same operating and code environment, these time differences are mainly attributable to the larger amount of data processed by the classifier. For the same classifier, the classification time for the feature optimization dataset was slightly greater than that for the single-temporal dataset. Although the number of retained features was significantly reduced, so that the classification itself should have been faster, the workflow included additional steps such as feature importance scoring and hyperparameter optimization for each classifier and dataset, which increased the overall classification time. However, for HistGBoost, XGBoost, and LightGBM, techniques such as histogram optimization and parallel computing allowed feature selection and classifier training to be completed quickly, so their overall classification time was essentially the same as that for the single-temporal dataset. These findings will help to streamline the classification of ultra-high-resolution images, maintaining classification accuracy while improving efficiency.
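A minimal sketch of how such per-classifier CPU timings can be logged (with illustrative data and only two of the classifiers) is shown below:

```python
# Time the fit step of each classifier on identical data so the comparison is fair.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=34, n_informative=15,
                           n_classes=7, n_clusters_per_class=1, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

for name, clf in [("HistGBoost", HistGradientBoostingClassifier(random_state=3)),
                  ("RF", RandomForestClassifier(n_estimators=300, random_state=3))]:
    t0 = time.perf_counter()
    clf.fit(X_tr, y_tr)                      # training time on the CPU
    t1 = time.perf_counter()
    oa = clf.score(X_te, y_te)               # overall accuracy on the hold-out set
    print(f"{name}: fit {t1 - t0:.2f} s, OA {oa:.3f}")
```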

5.3. Limitations and Future Work

Different machine learning classifiers have inherent hyperparameters, and hyperparameter tuning is decisive in achieving optimal classification performance [58,59]. During the hyperparameter tuning phase, two issues generally arise: how to set up the hyperparameter combinations and which hyperparameter optimization method to choose. This study determined multiple key hyperparameters and their reasonable ranges for the various classifiers based on the literature and extensive experimentation, and a random grid search, which is efficient for high-dimensional hyperparameter spaces, was chosen to select the optimal parameter scheme for each classifier. Judging from the final classification results, this hyperparameter optimization design appears reasonable. However, the method still has some shortcomings. On the one hand, different types of hyperparameters usually affect classifier accuracy to different degrees, depending on the type of classifier, the specific task, and the data characteristics; it is therefore necessary to use methods such as sensitivity analysis to identify the hyperparameters that most strongly affect classifier performance. On the other hand, each hyperparameter tuning method has its own advantages and disadvantages; for example, the random sampling of a random grid search may miss some key hyperparameter combinations. Hence, future work should compare the advantages and disadvantages of different hyperparameter optimization methods in more detail and explore the scenarios in which each method is applicable.
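A minimal sketch of such a random grid search, assuming the lightgbm and scikit-learn packages and a hypothetical search space loosely mirroring Table 3, is given below:

```python
# Randomized search over LightGBM hyperparameters with 5-fold cross-validation.
from lightgbm import LGBMClassifier
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=3000, n_features=34, n_informative=15,
                           n_classes=7, n_clusters_per_class=1, random_state=4)

param_distributions = {
    "learning_rate": uniform(0.01, 0.29),   # roughly 0.01-0.3
    "max_depth": randint(1, 51),            # 1-50
    "n_estimators": randint(1, 301),        # 1-300
    "min_child_samples": randint(1, 51),    # 1-50
}
search = RandomizedSearchCV(LGBMClassifier(boosting_type="gbdt", random_state=4),
                            param_distributions, n_iter=50, cv=5,
                            scoring="accuracy", n_jobs=-1, random_state=4)
search.fit(X, y)
print("best CV OA:", round(search.best_score_, 3))
print("best params:", search.best_params_)
```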
Compared to traditional single strong classifiers, ensemble learning classifiers have shown significant improvements in LULC classification accuracy in complex terrain areas; nevertheless, it must be considered whether these marginal improvements are sufficient to offset the additional computational and time costs of the more complex algorithms. In recent years, the effectiveness of many advanced deep learning models for remote sensing classification tasks has been repeatedly confirmed. However, this study did not evaluate a wider range of deep learning models; therefore, we could not comprehensively compare the accuracy and efficiency of deep learning and ensemble learning methods for the LULC classification of submeter resolution GF-7 imagery. This comparison will be conducted in future research.

6. Conclusions

In this study, to address the low accuracy of the land cover classification of loess hills in the Northeastern Qinghai–Tibet Plateau, multitemporal GF-7 stereo imagery from the typical area of the Huangshui River Basin was used to investigate the accuracy improvement potential of ensemble learning classifiers on different classification datasets representing complex terrain classification scenarios. The following conclusions were drawn:
(1)
Compared with the accuracies of the traditional single classifiers (CART, SVM, and MLPC), the accuracies of the 11 ensemble learning classifiers were superior across all three land cover classification datasets, and using ensemble learning classifiers improved the LULC classification accuracy for the study area by 5% to 9%. For the same classification dataset, the differences in accuracy indicators among the 11 ensemble learning classifiers were generally 1–3%. Boosting classifiers that adopt techniques such as histogram-based processing, including HistGBoost and LightGBM, performed best in terms of accuracy. The bagging classifiers performed robustly, with ExtraTrees achieving the better accuracy among them. Stacking improved the overall accuracy relative to its member classifiers more markedly than voting did.
(2)
Compared with the land cover classification accuracies when using the feature optimization and single-temporal datasets, the land cover classification accuracy achieved by each classifier on the multitemporal dataset was better. The differences in accuracy indicators achieved using different classification datasets with the same classifier were generally 1–3%. Using multitemporal datasets could greatly improve differentiation among forestland returned from farmland, grassland, and forestland, which helps to improve the overall classification accuracy.
(3)
Classification features such as NIR-S, DEM, mean, RRI-S, Green-S, and SAVI-W were selected most frequently during feature optimization and contributed substantially to the performance of each classifier. After feature optimization of the multitemporal dataset, every ensemble classifier reduced its classification time, and AdaBoost-DT, stacking, and ExtraTrees maintained better classification accuracy.
(4)
Based on a comprehensive consideration of the accuracy evaluation results of each classifier, the GF-7 satellite data were confirmed to have good applicability in the land cover classification of complex topographic areas and can provide data support for the accurate identification of land cover types in loess hills and similar topographic areas.

Author Contributions

Conceptualization, X.G. and F.S.; methodology, F.S.; software, H.Z.; validation, R.L.; formal analysis, X.G.; data curation, R.L.; writing—original draft preparation, F.S.; writing—review and editing, X.G.; supervision, H.Z.; project administration, X.G.; funding acquisition, X.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Qinghai Province of China (2021-ZJ-913).

Data Availability Statement

The data presented in this study are available upon request from the first author.

Acknowledgments

We thank the Institute of Qinghai Meteorological Science Research for providing the computing environment for this study. We also extend our gratitude to the anonymous reviewers for their insightful comments, which greatly improved this manuscript. Furthermore, we appreciate Hongyi Li, Research Fellow at the Northwest Institute of Eco-Environment and Resources, CAS, for his guidance on the manuscript revisions. Finally, we thank Hongda Li for his assistance with the CNN model code.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Camilleri, S.; De Giglio, M.; Stecchi, F.; Pérez-Hurtado, A. Land use and land cover change analysis in predominantly man-made coastal wetlands: Towards a methodological framework. Wetl. Ecol. Manag. 2017, 25, 23–43. [Google Scholar] [CrossRef]
  2. Defries, R. Terrestrial vegetation in the coupled human-earth system: Contributions of remote sensing. Ann. Rev. Environ. Resour. 2008, 33, 369–390. [Google Scholar] [CrossRef]
  3. Gong, P.J.; Wang, J.; Yu, L.; Zhao, Y.; Zhao, Y.; Liang, L.; Niu, Z.; Huang, X.; Fu, H.; Liu, S.; et al. Finer resolution observation and monitoring of global land cover: First mapping results with Landsat TM and ETM+ data. Int. J. Remote Sens. 2013, 34, 2607–2654. [Google Scholar] [CrossRef]
  4. Zhou, Y.; Liu, F.; Zhang, G.; Wang, J. Response of the Normalized Difference Vegetation Index (NDVI) to Snow Cover Changes on the Qinghai–Tibet Plateau. Remote Sens. 2024, 16, 2140. [Google Scholar] [CrossRef]
  5. Sun, H.; Zheng, D.; Yao, T.; Zhang, Y. Protection and construction of the national ecological security shelter zone on Tibetan Plateau. Acta Geogr. Sin. 2012, 67, 3–12. [Google Scholar]
  6. Zhang, Y.; Liu, L.; Wang, Z.; Bai, W.; Ding, M.; Wang, X. Spatial and temporal characteristics of land use and cover changes in the Tibetan Plateau. Chin. Sci. Bull. 2019, 64, 2865–2875. [Google Scholar]
  7. Liu, Z.; Liu, S.; Qi, W.; Jin, H. The settlement intention of floating population and the factors in Qinghai-Tibet Plateau: An analysis from the perspective of short-distance and long-distance migrants. Acta Geogr. Sin. 2022, 76, 1907–1919. [Google Scholar]
  8. Shi, F.; Zhou, B.; Zhou, H.; Zhang, H.; Li, H.; Li, R.; Guo, Z.; Gao, X. Spatial Autocorrelation Analysis of Land Use and Ecosystem Service Value in the Huangshui River Basin at the Grid Scale. Plants 2022, 11, 2294. [Google Scholar] [CrossRef] [PubMed]
  9. Tang, M. Land Use/Land Cover Information Extraction from SPOT6 Imagery with Object-Oriented and Random Forest Methods in the Huangshui River Basin. Master’s Thesis, Qinghai Normal University, Xining, China, 2020. [Google Scholar]
  10. Li, J. Research on Land Use/Land Cover Classification in Complex Terrain Areas. Master’s Thesis, Qinghai Normal University, Xining, China, 2013. [Google Scholar]
  11. Jia, W. Research on Object-Oriented Land Use Information Extraction in Complex Terrain Areas. Master’s Thesis, Qinghai Normal University, Xining, China, 2015. [Google Scholar]
  12. Gu, X. Research on Land Use/Land Cover Classification in Huangshui Basin Based on Machine Learning. Master’s Thesis, Qinghai Normal University, Xining, China, 2018. [Google Scholar]
  13. Ma, H. Land Use/Land Cover Change Detection in Huangshui River Basin Based on Random Forest. Master’s Thesis, Qinghai Normal University, Xining, China, 2018. [Google Scholar]
  14. Shen, Z. Land Use/Land Cover Classification and Accuracy Assessment in Huangshui Basin Based on GEE’s Landsat Image Long-Term Series Data. Master’s Thesis, Qinghai Normal University, Xining, China, 2020. [Google Scholar]
  15. Li, R. Research on Land Cover Classification Based on Ensemble Learning—A Case Study of the Huangshui River Basin in the Northeast of Qinghai-Tibet Plateau. Master’s Thesis, Qinghai Normal University, Xining, China, 2020. [Google Scholar]
  16. Cui, K.; Li, R.; Polk, S.L.; Lin, Y.; Zhang, H.; Murphy, J.M.; Plemmons, R.J.; Chan, R.H. Superpixel-based and Spatially-regularized Diffusion Learning for Unsupervised Hyperspectral Image Clustering. IEEE Trans. Geosci. Remote Sens. 2024, 5, 4. [Google Scholar] [CrossRef]
  17. Maung, W.S.; Tsuyuki, S.; Guo, Z. Improving Land Use and Land Cover Information of Wunbaik Mangrove Area in Myanmar Using U-Net Model with Multisource Remote Sensing Datasets. Remote Sens. 2024, 16, 76. [Google Scholar] [CrossRef]
  18. Lam, C.-N.; Niculescu, S.; Bengoufa, S. Monitoring and Mapping Floods and Floodable Areas in the Mekong Delta (Vietnam) Using Time-Series Sentinel-1 Images, Convolutional Neural Network, Multi-Layer Perceptron, and Random Forest. Remote Sens. 2023, 15, 2001. [Google Scholar] [CrossRef]
  19. Li, H.; Gao, X.; Tang, M. Research on land cover classification of images with different spatial resolutions based on CNN. Remote Sens. Technol. Appl. 2020, 35, 749–758. [Google Scholar]
  20. Li, H. Research on Land Cover Classification of Sentinel-2 Multi-Seasonal Data Based on Gradient Boosting Tree and Random Forest. Master’s Thesis, Qinghai Normal University, Xining, China, 2021. [Google Scholar]
  21. Hao, T.; Elith, J.; Lahoz-Monfort, J.J.; Guillera-Arroita, G. Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models. Ecography 2020, 43, 549–558. [Google Scholar] [CrossRef]
  22. Luo, H.; Li, M.; Dai, S.; Li, H.; Li, Y.; Hu, Y.; Zheng, Q.; Yu, X.; Fang, J. Combinations of Feature Selection and Machine Learning Algorithms for Object-Oriented Betel Palms and Mango Plantations Classification Based on Gaofen-2 Imagery. Remote Sens. 2022, 14, 1757. [Google Scholar] [CrossRef]
  23. Kohavi, R.; Provost, F. Glossary of terms: Machine learning. Appl. Mach. Learn. Knowl. Discov. Process 1998, 30, 271. [Google Scholar]
  24. Wen, L.; Hughes, M. Coastal Wetland Mapping Using Ensemble Learning Algorithms: A Comparative Study of Bagging, Boosting and Stacking Techniques. Remote Sens. 2020, 12, 1683. [Google Scholar] [CrossRef]
  25. Castillo-Navarro, J.; Saux, B.L.; Boulch, A.; Audebert, N.; Lefèvre, S. Semi-Supervised Semantic Segmentation in Earth Observation: The MiniFrance Suite, Dataset Analysis and Multi-Task Network Study. Mach. Learn. 2022, 111, 3125–3160. [Google Scholar] [CrossRef]
  26. Cuypers, S.; Nascetti, A.; Vergauwen, M. Land Use and Land Cover Mapping with VHR and Multi-Temporal Sentinel-2 Imagery. Remote Sens. 2023, 15, 2501. [Google Scholar] [CrossRef]
  27. Xie, G.; Niculescu, S. Mapping and Monitoring of Land Cover/Land Use (LCLU) Changes in the Crozon Peninsula (Brittany, France) from 2007 to 2018 by Machine Learning Algorithms (Support Vector Machine, Random Forest, and Convolutional Neural Network) and by Post-classification Comparison (PCC). Remote Sens. 2021, 13, 3899. [Google Scholar] [CrossRef]
  28. Sánchez, A.-M.S.; González-Piqueras, J.; de la Ossa, L.; Calera, A. Convolutional Neural Networks for Agricultural Land Use Classification from Sentinel-2 Image Time Series. Remote Sens. 2022, 14, 5373. [Google Scholar] [CrossRef]
  29. Kroupi, E.; Kesa, M.; Navarro-Sánchez, V.D.; Saeed, S.; Pelloquin, C.; Alhaddad, B.; Moreno, L.; Soria-Frisch, A.; Ruffini, G. Deep convolutional neural networks for land-cover classification with Sentinel-2 images. J. Appl. Remote Sens. 2019, 13, 024525. [Google Scholar] [CrossRef]
  30. Polikar, R. Ensemble based systems in decision making. IEEE Circuits Syst. 2006, 6, 21–45. [Google Scholar] [CrossRef]
  31. Zhou, Z.H. Ensemble Methods: Foundations and Algorithms; CRC Press: Boca Raton, FL, USA; London, UK; New York, NY, USA, 2012. [Google Scholar]
  32. Qinghai Provincial Bureau of Statistics. Qinghai Statistical Yearbook 2020; China Statistics Press: Beijing, China, 2020; pp. 1–23. [Google Scholar]
  33. Li, Z.; Chen, Z.; Cheng, Q.; Duan, F.; Sui, R.; Huang, X.; Xu, H. UAV-Based Hyperspectral and Ensemble Machine Learning for Predicting Yield in Winter Wheat. Agronomy 2022, 12, 202. [Google Scholar] [CrossRef]
  34. Liu, J.; Kuang, W.; Zhang, Z.; Xu, X.; Qin, Y.; Ning, J.; Chi, W. Spatiotemporal characteristics, patterns, and causes of land-use changes in China since the late 1980s. J. Geogr. Sci. 2014, 24, 195–210. [Google Scholar] [CrossRef]
  35. Chan, J.C.W.; Huang, C.; DeFries, R. Enhanced algorithm performance for land cover classification from remotely sensed data using bagging and boosting. IEEE Trans. Geosci. Remote Sens. 2001, 39, 693–695. [Google Scholar]
  36. Ahn, J.M.; Kim, J.; Kim, K. Ensemble Machine Learning of Gradient Boosting (XGBoost, LightGBM, CatBoost) and Attention-Based CNN-LSTM for Harmful Algal Blooms Forecasting. Toxins 2023, 15, 608. [Google Scholar] [CrossRef] [PubMed]
  37. Van der Laan, M.J.; Polley, E.C.; Hubbard, A.E. Super learner. Stat. Appl. Genet. Mol. Biol. 2007, 6, 9. [Google Scholar] [CrossRef] [PubMed]
  38. Shuai, S.; Zhang, Z.; Zhang, T.; Luo, W.; Tan, L.; Duan, X.; Wu, J. Innovative Decision Fusion for Accurate Crop/Vegetation Classification with Multiple Classifiers and Multisource Remote Sensing Data. Remote Sens. 2024, 16, 1579. [Google Scholar] [CrossRef]
  39. El-Naqa, I.; Yang, Y.; Wernick, M.N.; Galatsanos, N.P.; Nishikawa, R.M. A Support Vector Machine Approach for Detection of Microcalcifications. IEEE Trans. Med. 2002, 21, 1552–1563. [Google Scholar] [CrossRef] [PubMed]
  40. Pontil, M.; Verri, A. Support Vector Machines for 3d Object Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 6, 637–646. [Google Scholar] [CrossRef]
  41. Ren, J.; Wang, R.; Liu, G.; Wang, Y.; Wu, W. An SVM-Based Nested Sliding Window Approach for Spectral-Spatial Classification of Hyperspectral Images. Remote Sens. 2021, 13, 114. [Google Scholar] [CrossRef]
  42. Hsu, C.W.; Lin, C.J. A Comparison of Methods for Multiclass Support Vector Machines. IEEE Trans. Neural Netw. 2002, 13, 415–425. [Google Scholar] [PubMed]
  43. Chan, R.H.; Li, R. A 3-Stage Spectral-Spatial Method for Hyperspectral Image Classification. Remote Sens. 2022, 14, 3998. [Google Scholar] [CrossRef]
  44. Cheng, F.; Ou, G.; Wang, M.; Liu, C. Remote Sensing Estimation of Forest Carbon Stock Based on Machine Learning Algorithms. Forests 2024, 15, 681. [Google Scholar] [CrossRef]
  45. Breiman, L.; Friedman, J.H.; Olshen, R. Classification and Regression Trees; Routledge: London, UK, 2017. [Google Scholar]
  46. Peña-Barragán, J.M.; Ngugi, M.K.; Plant, R.E.; Six, J. Object-based crop identification using multiple vegetation indices, textural features and crop phenology. Remote Sens. Environ. 2011, 115, 1301–1316. [Google Scholar] [CrossRef]
  47. Reschke, J.; Christian, H. Continuous field mapping of Mediterranean wetlands using sub-pixel spectral signatures and multi-temporal Landsat data. Int. J. Appl. Earth Obs. Geoinf. 2014, 28, 220–229. [Google Scholar]
  48. Haralick, R.M.; Shanmugam, K.S. Combined spectral and spatial processing of ERTS imagery data. Remote Sens. Environ. 1974, 3, 3–13. [Google Scholar] [CrossRef]
  49. Padhee, S.K.; Dutta, S. Spatio-Temporal Reconstruction of MODIS NDVI by Regional Land Surface Phenology and Harmonic Analysis of Time-Series. GISci. Remote Sens. 2019, 56, 1261–1288. [Google Scholar]
  50. Laliberte, A.S.; Browning, D.M.; Rango, A. A comparison of three feature selection methods for object-based classification of sub-decimeter resolution Ultra Cam-L imagery. Int. J. Appl. Earth Obs. 2012, 15, 70–78. [Google Scholar]
  51. Ferri, C.; Hernández-Orallo, J.; Modroiu, R. An experimental comparison of performance measures for classification. Pattern Recognit. Lett. 2009, 30, 27–38. [Google Scholar] [CrossRef]
  52. Wang, D.; Huo, Z.; Miao, P.; Tian, X. Comparison of Machine Learning Models to Predict Lake Area in an Arid Area. Remote Sens. 2023, 15, 4153. [Google Scholar] [CrossRef]
  53. Peppes, N.; Daskalakis, E.; Alexakis, T.; Adamopoulou, E.; Demestichas, K. Performance of Machine Learning-Based Multi-Model Voting Ensemble Methods for Network Threat Detection in Agriculture 4.0. Sensors 2021, 21, 7475. [Google Scholar] [CrossRef] [PubMed]
  54. Kpienbaareh, D.; Sun, X.; Wang, J.; Luginaah, I.; Bezner Kerr, R.; Lupafya, E.; Dakishoni, L. Crop Type and Land Cover Mapping in Northern Malawi Using the Integration of Sentinel-1, Sentinel-2, and PlanetScope Satellite Data. Remote Sens. 2021, 13, 700. [Google Scholar] [CrossRef]
  55. Arrechea-Castillo, D.A.; Solano-Correa, Y.T.; Muñoz-Ordóñez, J.F.; Pencue-Fierro, E.L.; Figueroa-Casas, A. Multiclass Land Use and Land Cover Classification of Andean Sub-Basins in Colombia with Sentinel-2 and Deep Learning. Remote Sens. 2023, 15, 2521. [Google Scholar] [CrossRef]
  56. Cheng, K.; Scott, G.J. Deep Seasonal Network for Remote Sensing Imagery Classification of Multi-Temporal Sentinel-2 Data. Remote Sens. 2023, 15, 4705. [Google Scholar] [CrossRef]
  57. Vanniel, T.; McVicar, T.; Datt, B. On the relationship between training sample size and data dimensionality: Monte Carlo analysis of broadband multi-temporal classification. Remote Sens. Environ. 2005, 98, 468–480. [Google Scholar] [CrossRef]
  58. Ajibola, S.; Cabral, P. A Systematic Literature Review and Bibliometric Analysis of Semantic Segmentation Models in Land Cover Mapping. Remote Sens. 2024, 16, 2222. [Google Scholar] [CrossRef]
  59. Nasiri, V.; Darvishsefat, A.A.; Arefi, H.; Griess, V.C.; Sadeghi, S.M.; Borz, S.A. Modeling Forest Canopy Cover: A Synergistic Use of Sentinel-2, Aerial Photogrammetry Data, and Machine Learning. Remote Sens. 2022, 14, 1453. [Google Scholar] [CrossRef]
Figure 1. The study area location. (a) The Huangshui River Basin, located in the eastern part of the Qinghai–Tibet Plateau. (b) The study region, located in the central region of the Huangshui River Basin. (c) Summer false-color imagery of the study area captured by a GF-7 satellite. (d) Winter true-color imagery of the study area captured by a GF-7 satellite.
Figure 2. Sample points and UAV image distribution. (a) Sample points and drone aerial photography area. The distribution of sample points and UAV aerial photography area in the GF-7 3D model of the study area. (b) A 3D model of a UAV orthophoto of cultivated land in the loess hilly region during the summer. (c) A 3D model of a UAV orthophoto of cultivated land in a flat terrain during the summer. (d) A 3D model of a UAV orthophoto of farmland reforestation during the summer. (e) A 3D model of a UAV orthophoto of cultivated land in the loess hilly region during the summer. (f) A 3D model of a UAV orthophoto of farmland reforestation during the winter. (g) A 3D model of a UAV orthophoto of cultivated land in the loess hilly region during the winter. (h) A 3D model of a UAV orthophoto of cultivated land in the loess hilly region during the summer.
Figure 3. Workflow illustrating the methodology of this study, representing four main stages: (1) Data preprocessing and feature extraction; (2) construction of the classification sample library; (3) classifier selection and optimization; (4) validation of the classification results and comparative analysis. Detailed tasks for each stage are explained.
Figure 4. The classification results and partial classification details. Area a is characterized by flat farmland. Area b features terraces. Area c consists of land converted from farmland back to forest and grassland. Area d encompasses urban areas.
Figure 5. Sorting of feature importance score and feature selection results. (a) Feature importance score. (b) Cumulative feature importance scores and ranking. (c) The cumulative number and order of each classification feature.
Figure 6. Boxplot of the accuracy evaluation index of the classifier. (a) Boxplot of OA. (b) Boxplot of kappa. (c) Boxplot of MBA. (d) Boxplot of the F1-score.
Figure 7. Boxplot of PA and UA of the classifier. (a) Boxplot of PA. (b) Boxplot of UA.
Figure 8. Scatter plot of accuracy evaluation indicators under each classification dataset.
Figure 9. A comparison of the separability of surface types based on 3 classification datasets. (a) The separability of land surface types based on the feature optimization dataset. (b) The separability of land surface types based on the multitemporal classification dataset. (c) The separability of land surface types based on the single-temporal classification dataset.
Figure 10. A comparison of images and classification results at different phases of Area A.
Figure 11. A comparison of images and classification results at different phases of Area B.
Figure 12. A comparison of the classification time.
Table 1. Specifications of the GF-7 satellite.

| Image Thumbnail | Date Acquired | Image Type | Spatial Resolution (m) | Spectral Ranges (μm) | Cloud Cover (%) |
|---|---|---|---|---|---|
| (image) | 25 August 2020 | Backward multispectral | 2.6 | B1 0.45–0.52; B2 0.52–0.59; B3 0.63–0.69; B4 0.77–0.89 | 0 |
| (image) | 18 February 2021 | Backward panchromatic | 0.65 | 0.45–0.90 | |
| | | Forward panchromatic | 0.8 | | |
Table 2. Classification system and shadow and visual interpretation signs. For each class, the original table provides visual interpretation examples as image chips from the GF-7 true-color composite (25 August 2020), the GF-7 false-color composite (25 August 2020), the GF-7 true-color composite (18 February 2021), and true-color unmanned aerial vehicle imagery (2021); only the class names and descriptions are reproduced here.

| Land Cover Types | Description |
|---|---|
| Urban land | Urban construction land and urban roads |
| Forestland | Shrubs and woodlands |
| Farmland reforestation | Forest land returned from farmland |
| Farmland | Cultivated land in the loess hilly region; cultivated land in flat terrain |
| Grassland | Grassland in the loess hilly region |
| Unutilized land | Bare land |
| Shadow | Shadows from forested and built-up areas |
Table 3. Hyperparameters of the classifiers.

| Classifiers | Parameters | Description | Tuning Ranges |
|---|---|---|---|
| HistGBoost, LGBM, AdaBoost_DT, CatBoost, XGBoost, GBDT, and CNN | learning_rate | The learning rate, controlling the update magnitude of model parameters in each iteration. | 0.01–0.3 |
| HistGBoost, LGBM, XGBoost, GBDT, RF, ExtraTrees, and CART | max_depth | The maximum depth of each tree. | 1–50 |
| LGBM, AdaBoost_DT, XGBoost, GBDT, RF, and ExtraTrees | n_estimators | The number of trees, representing the number of iterations. | 1–300 |
| HistGBoost, RF, ExtraTrees, CART, and GBDT | min_samples_leaf | The minimum number of samples a leaf node must have. | 1–20 |
| HistGBoost, RF, ExtraTrees, CART, and GBDT | min_samples_split | The minimum number of samples a node must have to be split. | 1–20 |
| XGBoost | min_child_weight | The minimum sum of instance weights in each leaf node. | 1–50 |
| LGBM | boosting_type | The type or strategy of the gradient boosting algorithm. | gbdt |
| LGBM | min_child_samples | The minimum number of samples in each leaf node. | 1–50 |
| AdaBoost_DT | algorithm | The algorithm implementation for AdaBoost. | SAMME.R, SAMME |
| CatBoost | iterations | The number of iterations, representing the number of boosting rounds. | 1000 |
| CatBoost | depth | The depth of the trees. | 1–50 |
| CatBoost | loss_function | The loss function utilized during the training process. | MultiClass |
| GBDT | criterion | The criterion for measuring split quality. | Friedman_mse |
| RF | max_features | The number of features to consider when looking for the best split. | None |
| SVM | kernel | The kernel function employed for transforming the input data. | RBF |
| SVM | C | The parameter that controls the penalty for misclassification. | 1–100 |
| MLPC | hidden_layer_sizes | The number of neurons in each hidden layer. | 1–100 |
| MLPC | activation | The activation function for the hidden layers. | Relu, Tanh, Logistic |
| MLPC | solver | The optimization algorithm used for weight optimization. | Adam, Sgd, Lbfgs |
| MLPC | max_iter | The number of iterations. | 1500 |
| CNN | Number of convolutional layers | The depth and complexity of feature extraction. | 9 |
| CNN | Kernel size | The spatial extent of each convolutional filter. | 5.3 |
| CNN | Regularization | The technique used to prevent overfitting. | Dropout |
| CNN | Activation function | The function that introduces non-linearity into the model. | Relu |
Table 4. Classification features extracted from GF-7 images.

| Types | Features | Description | References |
|---|---|---|---|
| Summer and winter temporal spectral bands | Blue band, Green band, Red band, NIR band | Calculated from the 1st, 2nd, 3rd, and 4th bands of the GF-7 backward multispectral and backward panchromatic spectral fusion images from 25 August 2020 and 18 February 2021, with a spatial resolution of 0.68 m. | [46] |
| Summer and winter temporal spectral indices | Normalized differential vegetation index (NDVI), normalized differential water index (NDWI), ratio of the resident area index (RRI), green vegetation index (VIgreen), soil-adjusted vegetation index (SAVI), enhanced vegetation index (EVI) | Calculated from the 1st, 2nd, 3rd, and 4th bands of the GF-7 backward multispectral and backward panchromatic spectral fusion images from 25 August 2020 and 18 February 2021, with enhancement of vegetation, water bodies, and urban areas. | [46] |
| Time series spectral index | Time series normalized vegetation index (NDVI-TS), time series enhanced vegetation index (EVI-TS) | Calculated from the GF-7 satellite images from summer and winter; these indices are utilized to assess vegetation growth variations during different periods. | [47] |
| Terrain information | DEM, slope, aspect, shaded relief | The backward and forward panchromatic images of GF-7 from 25 August 2020 are used for digital elevation model (DEM) calculation, with a spatial resolution of 0.68 m; the DSM is used for slope, aspect, and shaded relief calculation, with a spatial resolution of 0.68 m, mainly indicating topographic information. | [14,20] |
| Texture information | Mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment, correlation | After principal component analysis (PCA) of the 1st, 2nd, 3rd, and 4th bands of the GF-7 backward multispectral and backward panchromatic spectral fusion images from 25 August 2020, the first principal component is used to calculate the gray-level co-occurrence matrix (GLCM), reflecting the distance, grayscale level, and direction information in the image. | [48,50] |
Table 5. Feature dataset for classification.

| Types | Features | Number |
|---|---|---|
| Single-temporal classification dataset (-S) | Summer temporal spectral bands (Blue-S, Green-S, Red-S, NIR-S); summer temporal spectral indices (NDVI-S, NDWI-S, RRI-S, VIgreen-S, SAVI-S, EVI-S); terrain information (DEM, slope, aspect, shaded relief); texture information (mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment, correlation) | 22 |
| Multitemporal classification dataset (-M) | Summer temporal spectral bands (Blue-S, Green-S, Red-S, NIR-S); winter temporal spectral bands (Blue-W, Green-W, Red-W, NIR-W); summer temporal spectral indices (NDVI-S, NDWI-S, RRI-S, VIgreen-S, SAVI-S, EVI-S); winter temporal spectral indices (NDVI-W, NDWI-W, RRI-W, VIgreen-W, SAVI-W, EVI-W); time series spectral indices (NDVI-TS, EVI-TS); terrain information (DEM, slope, aspect, shaded relief); texture information (mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment, correlation) | 34 |
| Feature selection dataset for classification (-FS) | Classification features selected automatically based on the feature importance scoring method inherent to each classifier | 10–19 |
Table 6. The accuracy evaluation of the classification results of the single-temporal dataset.

| Model | PA: Urban | PA: Unutilized Land | PA: Shadow | PA: Farmland Returned to Forest Land | PA: Grassland | PA: Forest | PA: Farmland | OA | Kappa | MBA | F1-Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AdaBoost_DT | 0.89 | 1.00 | 0.97 | 0.92 | 0.84 | 0.90 | 0.89 | 0.904 | 0.882 | 0.910 | 0.905 |
| CART | 0.81 | 0.95 | 0.95 | 0.85 | 0.81 | 0.82 | 0.86 | 0.856 | 0.824 | 0.851 | 0.856 |
| CatBoost | 0.92 | 0.98 | 0.98 | 0.89 | 0.86 | 0.86 | 0.90 | 0.902 | 0.880 | 0.904 | 0.903 |
| CNN | 0.92 | 0.99 | 0.98 | 0.90 | 0.84 | 0.88 | 0.91 | 0.905 | 0.883 | 0.910 | 0.906 |
| ExtraTrees | 0.92 | 1.00 | 0.97 | 0.86 | 0.86 | 0.87 | 0.91 | 0.904 | 0.882 | 0.905 | 0.904 |
| GBDT | 0.94 | 0.96 | 0.98 | 0.90 | 0.87 | 0.89 | 0.87 | 0.900 | 0.878 | 0.901 | 0.902 |
| Hard Voting | 0.98 | 0.96 | 0.96 | 0.89 | 0.81 | 0.92 | 0.87 | 0.896 | 0.872 | 0.903 | 0.897 |
| HistGBoost | 0.89 | 1.00 | 0.97 | 0.92 | 0.84 | 0.90 | 0.89 | 0.907 | 0.886 | 0.916 | 0.908 |
| LGBM | 0.96 | 0.98 | 0.98 | 0.90 | 0.86 | 0.89 | 0.89 | 0.907 | 0.886 | 0.911 | 0.908 |
| MLPC | 0.52 | 0.59 | 0.88 | 0.75 | 0.78 | 0.95 | 0.91 | 0.809 | 0.768 | 0.779 | 0.806 |
| RF | 0.88 | 1.00 | 0.97 | 0.92 | 0.87 | 0.86 | 0.90 | 0.904 | 0.882 | 0.904 | 0.904 |
| Soft Voting | 0.94 | 0.98 | 0.97 | 0.89 | 0.86 | 0.89 | 0.90 | 0.907 | 0.886 | 0.915 | 0.908 |
| Stacking | 0.94 | 0.98 | 0.97 | 0.91 | 0.87 | 0.87 | 0.90 | 0.907 | 0.886 | 0.915 | 0.908 |
| SVM | 0.81 | 0.95 | 0.89 | 0.78 | 0.77 | 0.92 | 0.90 | 0.855 | 0.824 | 0.860 | 0.854 |
| XGBoost | 0.92 | 0.98 | 0.97 | 0.90 | 0.86 | 0.88 | 0.89 | 0.902 | 0.880 | 0.908 | 0.903 |
Table 7. Accuracy evaluation of the classification results of the multitemporal dataset.

| Model | PA: Urban | PA: Unutilized Land | PA: Shadow | PA: Farmland Returned to Forest Land | PA: Grassland | PA: Forest | PA: Farmland | OA | Kappa | MBA | F1-Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AdaBoost-DT | 0.96 | 0.96 | 0.97 | 0.89 | 0.87 | 0.92 | 0.96 | 0.933 | 0.918 | 0.933 | 0.933 |
| CART | 0.85 | 0.90 | 0.97 | 0.84 | 0.79 | 0.86 | 0.92 | 0.876 | 0.849 | 0.880 | 0.876 |
| CatBoost | 0.98 | 0.96 | 0.98 | 0.89 | 0.87 | 0.93 | 0.95 | 0.931 | 0.916 | 0.933 | 0.932 |
| CNN | 0.99 | 0.93 | 0.99 | 0.91 | 0.90 | 0.92 | 0.92 | 0.936 | 0.923 | 0.937 | 0.936 |
| ExtraTrees | 0.96 | 0.96 | 0.93 | 0.87 | 0.88 | 0.91 | 0.95 | 0.922 | 0.905 | 0.923 | 0.921 |
| GBDT | 0.96 | 0.98 | 0.97 | 0.86 | 0.88 | 0.92 | 0.95 | 0.928 | 0.912 | 0.931 | 0.928 |
| Hard Voting | 0.96 | 0.96 | 0.96 | 0.89 | 0.86 | 0.95 | 0.93 | 0.922 | 0.905 | 0.919 | 0.921 |
| HistGBoost | 0.98 | 0.94 | 0.97 | 0.89 | 0.90 | 0.93 | 0.95 | 0.935 | 0.920 | 0.937 | 0.935 |
| LGBM | 0.98 | 0.94 | 0.98 | 0.89 | 0.89 | 0.92 | 0.95 | 0.933 | 0.918 | 0.933 | 0.933 |
| MLPC | 0.74 | 0.79 | 0.94 | 0.82 | 0.70 | 0.83 | 0.94 | 0.834 | 0.796 | 0.806 | 0.831 |
| RF | 0.96 | 0.96 | 0.97 | 0.90 | 0.86 | 0.91 | 0.95 | 0.925 | 0.908 | 0.924 | 0.924 |
| Soft Voting | 0.98 | 0.94 | 0.97 | 0.89 | 0.86 | 0.95 | 0.95 | 0.930 | 0.914 | 0.932 | 0.930 |
| Stacking | 0.98 | 0.94 | 0.97 | 0.90 | 0.90 | 0.92 | 0.94 | 0.931 | 0.916 | 0.933 | 0.932 |
| SVM | 0.86 | 1.00 | 0.92 | 0.80 | 0.81 | 0.89 | 0.97 | 0.889 | 0.866 | 0.897 | 0.892 |
| XGBoost | 0.98 | 0.94 | 0.97 | 0.92 | 0.89 | 0.90 | 0.94 | 0.928 | 0.912 | 0.932 | 0.929 |
Table 8. PA evaluation of the classification results of the feature selection dataset.

| Model | PA: Urban | PA: Unutilized Land | PA: Shadow | PA: Farmland Returned to Forest Land | PA: Grassland | PA: Forest | PA: Farmland | OA | Kappa | MBA | F1-Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AdaBoost-DT | 0.94 | 0.92 | 0.97 | 0.92 | 0.88 | 0.97 | 0.96 | 0.934 | 0.919 | 0.932 | 0.934 |
| CART | 0.91 | 0.96 | 0.93 | 0.83 | 0.81 | 0.81 | 0.92 | 0.876 | 0.848 | 0.880 | 0.876 |
| CatBoost | 0.89 | 0.91 | 0.94 | 0.92 | 0.85 | 0.98 | 0.96 | 0.926 | 0.909 | 0.924 | 0.925 |
| ExtraTrees | 0.96 | 0.98 | 0.93 | 0.91 | 0.86 | 0.92 | 0.97 | 0.930 | 0.915 | 0.928 | 0.931 |
| GBDT | 0.91 | 0.94 | 0.96 | 0.90 | 0.86 | 0.93 | 0.95 | 0.922 | 0.905 | 0.924 | 0.922 |
| Hard Voting | 0.92 | 0.92 | 0.97 | 0.91 | 0.85 | 0.95 | 0.94 | 0.923 | 0.907 | 0.924 | 0.923 |
| HistGBoost | 0.89 | 0.98 | 0.95 | 0.87 | 0.90 | 0.88 | 0.97 | 0.923 | 0.907 | 0.924 | 0.924 |
| LGBM | 0.98 | 0.94 | 0.97 | 0.88 | 0.88 | 0.89 | 0.93 | 0.917 | 0.898 | 0.924 | 0.917 |
| RF | 0.94 | 0.90 | 0.96 | 0.89 | 0.86 | 0.93 | 0.96 | 0.925 | 0.909 | 0.923 | 0.925 |
| Soft Voting | 0.92 | 0.92 | 0.96 | 0.91 | 0.83 | 0.96 | 0.96 | 0.925 | 0.909 | 0.923 | 0.925 |
| Stacking | 0.92 | 0.92 | 0.96 | 0.91 | 0.85 | 0.98 | 0.97 | 0.931 | 0.916 | 0.931 | 0.932 |
| XGBoost | 0.85 | 0.89 | 0.97 | 0.84 | 0.83 | 0.95 | 0.93 | 0.902 | 0.880 | 0.913 | 0.901 |