1. Introduction
Mountainous areas account for approximately a quarter of the Earth’s total land area [1]. A mountain–plain transition zone consists of plains (with an elevation below 200 m above sea level), hills (200–500 m), and mountains. Mountains can be further categorized into low mountains (50–500 m), medium mountains (500–2500 m, with terrain relief exceeding 100 m or a slope gradient steeper than 25°), and high mountains (>2500 m). A mountain–plain transition zone is usually characterized by diverse vegetation types due to the distinct convergence of horizontal and vertical land covers [2,3]. Cropland, grassland, shrubland, and forests play significant roles in sustainable agriculture, soil and water conservation, climate regulation, and biodiversity protection [4]. However, the unique stepped geomorphic structures of mountain–plain transition zones can lead to poor surface material stability, fragile ecosystems, and frequent mountain hazards. Human activities such as deforestation and agricultural expansion further disrupt ecosystems, exacerbating the conflict between economic development and ecological conservation in many transition zones [5,6]. For instance, deforestation in mountainous areas often leads to significant soil erosion, reducing soil fertility and water retention capacity, which in turn affects downstream agricultural productivity and water resources. Similarly, agricultural expansion in plains, often associated with monoculture farming, disrupts biodiversity and alters carbon cycling processes. These land-use changes significantly impact ecological indicators such as soil stability, vegetation health, and water quality. To protect and manage resources effectively in such zones, advanced methods for high-resolution vegetation mapping are urgently needed to identify and monitor vegetation types more accurately. However, complex terrain, steep slopes, and diverse vegetation types make traditional field surveys impractical in mountain–plain transition zones: such surveys require extensive human and financial resources and thus often fail to provide timely data [7]. Remote sensing techniques help overcome these limitations; they offer multi-temporal and multispectral data acquisition, cost-effectiveness, and efficiency, and have thus become an increasingly effective tool for vegetation classification [8].
Applying remote sensing techniques to vegetation identification draws extensively on data from MODIS [9], Landsat [10], Sentinel [11], Gaofen (GF) satellites [12], and UAV imagery [13]. MODIS, with a high temporal but low spatial resolution, is suitable for large-scale dynamic monitoring when combined with other datasets. Landsat provides a medium spatial resolution and is thus well suited for vegetation classification over large areas [14,15]. Sentinel series data, with their high spatial resolution (10 m) and frequent revisit cycle (5 days), offer significant advantages in deciphering small-scale imagery, monitoring fine changes in vegetation, and addressing challenges in both mountainous and plain areas [16]. The fine spatial resolution of Sentinel-2 enables improved vegetation classification in terrain-shadowed mountainous regions, while the radar capabilities of Sentinel-1 facilitate effective monitoring under cloudy conditions [7,17]. These advantages, combined with global coverage and free accessibility, support the reliability and applicability of Sentinel data for vegetation classification in mountain–plain transition zones.
Remote sensing classification approaches mainly comprise Object-Based Image Analysis (OBIA) and Pixel-Based Classification (PBC) [18,19]. Studies have shown that, compared with traditional pixel-based methods, object-based methods (which utilize features such as shape and texture) better preserve spatial information and reduce noise in classification results. OBIA methods often produce more accurate classifications and perform well in vegetation classification across diverse geographic regions, including mountains, plains, and watersheds [20,21].
Including too many features can reduce algorithm efficiency and dilute the importance of key features, decreasing the accuracy of vegetation recognition [22]. To address this challenge, researchers have attempted to combine different features to achieve better classification performance [23,24]. While such combinations can somewhat improve accuracy, they can introduce new problems such as collinearity and data redundancy [25]. To enhance feature selection efficiency and identify key features, feature selection algorithms such as Recursive Feature Elimination based on Random Forest (RF-RFE) and ReliefF have been widely applied in studies of grasslands, forests, and urban vegetation types [25,26,27,28]. However, no studies have yet verified which feature selection method performs best for vegetation type recognition in mountain–plain transition zones.
Remote sensing-based vegetation recognition methods primarily include supervised, unsupervised, and machine learning approaches [29]. Supervised and unsupervised methods are widely used. However, achieving high-accuracy classification in the complex terrain of a mountain–plain transition zone is hindered not only by difficulties in data acquisition but also by the limitations of classification algorithms, making it even harder to meet the demand for fine-grained vegetation classification [30].
Machine learning methods, known for their outstanding capabilities in nonlinear data processing and feature extraction, have been widely applied in vegetation classification [31]. In plains, methods commonly used for vegetation extraction include Random Forest (RF) [32,33], convolutional neural networks (CNNs), Support Vector Machines (SVMs), and Backpropagation Neural Networks (BPNNs), which are mainly applied to grassland and crop identification [34,35,36,37]. In mountainous regions, RF, one-dimensional convolutional neural networks (1D-CNNs) [38,39], hierarchical classifiers, and Gradient Tree Boosting (GTB) are employed to classify various forest types [7,40,41]. However, the performance of these methods in the complex terrain of mountain–plain transition zones remains unclear, highlighting the need to explore the potential of machine learning methods in such transitional landscapes.
To address this, we selected a 1D-CNN [38,39], RF [32,33], and a Multilayer Perceptron (MLP) [42,43] for testing. These algorithms have demonstrated distinct strengths under complex terrain conditions and excellent performance in mountain vegetation classification. For example, MLP excels at capturing global features in continuous vegetation transitions; RF is highly effective in terms of noise resistance and efficient feature selection, making it suitable for distinguishing vegetation types with subtle spectral differences; and a 1D-CNN, with its ability to extract local spatial patterns, is particularly well suited to areas with complex terrain and high vegetation heterogeneity. These characteristics make the selected algorithms well suited to meeting the demands for efficient vegetation classification in mountain–plain transition zones.
Mountain–plain transition zones pose several challenges for remote sensing-based vegetation mapping. First, the availability of data sources is limited by frequent cloud and fog cover, and the available data are susceptible to terrain shadows. For instance, in the mountainous regions of Mianzhu City, persistent cloud cover during the rainy season leads to incomplete satellite imagery, necessitating additional cloud removal and terrain correction, which dramatically increases the complexity of data preprocessing [44,45]. Second, vegetation types in mountainous regions are primarily influenced by elevation and climate, with forests being a good example [46,47]. In contrast, vegetation in plains (predominantly croplands and grasslands) is mainly influenced by human settlements and policy factors [48]. For example, forest patches in the mountains of Mianzhu often transition to grasslands or croplands in the adjacent plains due to land-use policies, resulting in highly heterogeneous vegetation patterns. Therefore, selecting features that account for both mountain and plain vegetation characteristics remains a significant challenge. Third, the applicability of classification algorithms is limited. While existing algorithms have achieved sound results in standalone mountain or plain areas, their effectiveness in more complicated terrain such as a mountain–plain transition zone remains to be verified.
To address these challenges, this study selected Mianzhu City in Sichuan Province, China, as the case study area. Mianzhu is geographically characterized by both mountainous and plain terrain. Using an object-based classification approach and Sentinel-1, Sentinel-2, and DEM data, we extracted spectral, texture, topographic, and SAR features as well as vegetation indices. By integrating three machine learning algorithms (1D-CNN, MLP, and RF), we systematically evaluated the performance of different feature combinations and feature selection methods (RF-RFE and ReliefF) for vegetation classification in this mountain–plain transition zone. This study contributes to the literature in three respects. First, it analyzes the effects of different feature combinations on vegetation classification in a mountain–plain transition zone. Second, it compares the performance of the RF-RFE and ReliefF feature selection algorithms to determine the optimal feature combination for vegetation classification in the study area. Third, it evaluates the performance of 1D-CNN, MLP, and RF in vegetation classification within the mountain–plain transition zone, providing both theoretical and practical support for high-accuracy vegetation classification in complex terrain. Additionally, it addresses the Special Issue’s question “How can machine learning and artificial intelligence improve the analysis and interpretation of remote sensing data for vegetation monitoring?”
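As a brief illustration of how a vegetation index is derived from spectral bands, the sketch below computes the Normalized Difference Vegetation Index (NDVI). The text does not state which indices were used, so NDVI is an assumption chosen only as a familiar example; the reflectance values and object names are hypothetical.

```python
def ndvi(nir, red):
    """NDVI = (NIR - red) / (NIR + red) for one pixel or one object mean."""
    denom = nir + red
    if denom == 0:
        return 0.0  # convention chosen here to avoid division by zero
    return (nir - red) / denom


# Hypothetical mean reflectances per image object.
# For Sentinel-2, NIR is commonly taken from band 8 and red from band 4.
objects = {
    "forest": (0.45, 0.05),
    "cropland": (0.35, 0.08),
    "bare soil": (0.25, 0.20),
}

for name, (nir, red) in objects.items():
    print(f"{name}: NDVI = {ndvi(nir, red):.2f}")
```

Dense green vegetation reflects strongly in the near infrared and absorbs red light, so its NDVI approaches 1, whereas sparsely vegetated surfaces yield values closer to 0.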
4. Results
4.1. Classification Accuracy of Different Feature Combinations
In this study, spectral features, texture features, terrain features, vegetation indices, and SAR features were sequentially added to the three machine learning algorithms to investigate the impact of different feature combinations on classification accuracy and execution time (Table 4; execution time is reported in seconds). To minimize errors caused by sample selection variability, all three algorithms used the same training and validation samples, with the random seed set to 42.
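The evaluation protocol described above can be sketched as follows. This is a minimal illustration rather than the study's actual pipeline: it assumes scikit-learn, uses synthetic data, tests only an RF classifier, and enumerates every non-empty combination of three hypothetical feature groups (the group sizes and class count are placeholders). The only details taken from the text are the shared training/validation split and the random seed of 42.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42  # random seed stated in the text

rng = np.random.default_rng(SEED)
n_samples = 300
labels = rng.integers(0, 3, size=n_samples)  # three synthetic classes

# Hypothetical feature groups; column counts are placeholders.
groups = {
    "spectral": rng.normal(size=(n_samples, 4)) + labels[:, None],        # informative
    "terrain": rng.normal(size=(n_samples, 2)) + 0.5 * labels[:, None],   # weakly informative
    "texture": rng.normal(size=(n_samples, 3)),                           # pure noise
}

# One shared split, so every combination sees the same samples.
train_idx, val_idx = train_test_split(
    np.arange(n_samples), test_size=0.3, random_state=SEED, stratify=labels
)


def evaluate(group_names):
    """Train on the stacked feature groups and return validation accuracy."""
    X = np.hstack([groups[g] for g in group_names])
    model = RandomForestClassifier(n_estimators=100, random_state=SEED)
    model.fit(X[train_idx], labels[train_idx])
    return model.score(X[val_idx], labels[val_idx])


results = {}
for k in range(1, len(groups) + 1):
    for combo in combinations(groups, k):
        results[combo] = evaluate(combo)

for combo, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{' + '.join(combo):<30s} OA = {acc:.3f}")
```

Fixing the split and the seed means that differences in accuracy between rows can be attributed to the feature combination rather than to sampling variability.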
Based on Table 4 and Figure 6, among single features (F1–F5), spectral features (F1) achieved the highest classification accuracy in the 1D-CNN, MLP, and RF algorithms, reaching 75.29%, 79.74%, and 75.95%, respectively. Therefore, for efficiency, subsequent feature combinations were built primarily around spectral features to evaluate their accuracy in the three machine learning algorithms. Sentinel-1 features (F5) showed the lowest accuracy and were excluded from subsequent three-feature and four-feature combinations.
For two-feature combinations (F6–F9), the combination of spectral and terrain features (F8) achieved the highest classification accuracy in 1D-CNN, MLP, and RF, with accuracies of 76.96%, 82.13%, and 80.23%, respectively. Among three-feature combinations (F10–F12), the MLP algorithm performed best with the combination of spectral features, vegetation indices, and terrain features (F11), achieving an accuracy of 82.29%. Meanwhile, 1D-CNN and RF performed best with the combination of spectral, texture, and terrain features (F12), achieving accuracies of 77.75% and 79.95%, respectively.
For the four-feature (F13) and five-feature (F14) combinations, the MLP and RF algorithms performed best with the combination of spectral, terrain, and texture features and vegetation indices (F13), achieving accuracies of 80.29% and 79.13%, respectively. The 1D-CNN algorithm performed best with the five-feature combination (F14: spectral, terrain, texture, vegetation indices, and Sentinel-1 SAR), achieving an accuracy of 79.38%.
Overall, spectral features (F1) demonstrated significant importance across all algorithms, while the combination of terrain and spectral features (e.g., F8 and F11) further improved classification performance, highlighting the complementary role of the features. This result indicates that multi-feature combinations are beneficial for enhancing the accuracy of vegetation classification. The RF algorithm was highly sensitive to terrain and spectral features (F8), achieving its highest accuracy when these two features were combined. Building on this, adding vegetation indices (F11) enabled the MLP algorithm to achieve the highest accuracy. As the number of input features increased, the classification accuracy of the 1D-CNN algorithm improved consistently, reaching its best performance when all features (F14) were included. In terms of execution time (Table 4), the RF algorithm was the most efficient, while the 1D-CNN algorithm was the least efficient.
4.2. Comparison Between RF-RFE and ReliefF Feature Optimization Algorithms
This study used a total of 80 features. To avoid multicollinearity among features, those with correlations above 0.96 were removed, leaving 72 features. To further improve classification accuracy and computational efficiency, and to reduce redundancy within the same category, these 72 features were filtered using two feature selection methods. The RF-RFE algorithm ultimately retained 18 features. For a like-for-like comparison, the ReliefF algorithm also retained the top 18 features based on weight rankings. The feature weights and retention results are reported in Table 5. RF-RFE retained four types of features, with the highest contributions from topographic and spectral features. ReliefF retained five types of features, with topographic features and vegetation indices showing the highest contributions.
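The correlation-based pre-filter can be sketched as follows. This is a minimal illustration under stated assumptions: it uses NumPy, takes the absolute Pearson correlation, and greedily keeps the first feature of any highly correlated pair. The text gives only the 0.96 threshold; the choice of which feature to drop, the function name, and the synthetic data are all hypothetical.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.96):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature correlation
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]


rng = np.random.default_rng(42)
a = rng.normal(size=200)
b = rng.normal(size=200)
# Column f1 is a near-duplicate of f0; f2 is independent of both.
X = np.column_stack([a, a + 0.01 * rng.normal(size=200), b])

X_kept, kept_names = drop_correlated(X, ["f0", "f1", "f2"])
print(kept_names)
```

Because the filter is order-dependent, placing the features considered most interpretable first determines which member of a correlated pair survives.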
The difference in the retained features reflects the distinct selection mechanisms of the two algorithms. RF-RFE selects features by recursively eliminating the least important ones based on the importance scores calculated by the Random Forest model, prioritizing features that minimize redundancy and maximize model performance. In contrast, ReliefF ranks features by evaluating their ability to distinguish between neighboring samples of different classes, focusing on feature relevance rather than redundancy. These differing criteria explain the variation in the final selected features.
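The two selection mechanisms can be contrasted with the short sketch below. It is illustrative only: RF-RFE is expressed with scikit-learn's `RFE` wrapper around a `RandomForestClassifier`, and the ReliefF function is a simplified NumPy version written for this example (Manhattan distance, min-max scaled features, k nearest hits and misses), not the implementation used in the study. The synthetic data, the number of retained features, and all function names are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE


def rf_rfe(X, y, n_keep):
    """Recursively drop the feature with the lowest RF importance, one per step."""
    selector = RFE(
        RandomForestClassifier(n_estimators=100, random_state=42),
        n_features_to_select=n_keep,
        step=1,
    )
    selector.fit(X, y)
    return np.flatnonzero(selector.support_)


def relieff_weights(X, y, k=5):
    """Simplified ReliefF: reward features that differ on near misses, not near hits."""
    X = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)  # scale diffs to [0, 1]
    classes, counts = np.unique(y, return_counts=True)
    prior = counts / len(y)
    weights = np.zeros(X.shape[1])
    for i in range(len(X)):
        diffs = np.abs(X - X[i])      # per-feature difference to every sample
        dist = diffs.sum(axis=1)      # Manhattan distance
        dist[i] = np.inf              # never pick the sample itself
        own = np.searchsorted(classes, y[i])
        for ci, c in enumerate(classes):
            idx = np.flatnonzero(y == c)
            nearest = idx[np.argsort(dist[idx])[:k]]
            mean_diff = diffs[nearest].mean(axis=0)
            if ci == own:
                weights -= mean_diff  # near hits: differences are penalized
            else:
                weights += prior[ci] / (1.0 - prior[own]) * mean_diff
    return weights / len(X)


# Synthetic data: only columns 0-2 carry class information.
rng = np.random.default_rng(42)
n_samples, n_features, n_keep = 300, 10, 3
y = rng.integers(0, 3, size=n_samples)
X = rng.normal(size=(n_samples, n_features))
X[:, :3] += 2.0 * y[:, None]

rfe_idx = rf_rfe(X, y, n_keep)
relief_idx = np.argsort(relieff_weights(X, y))[::-1][:n_keep]
print("RF-RFE keeps :", sorted(rfe_idx.tolist()))
print("ReliefF keeps:", sorted(relief_idx.tolist()))
```

On data this clean both methods agree. They tend to diverge when features are correlated: ReliefF scores each feature on its own relevance, whereas RF-RFE re-estimates importances after each elimination.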
When the selected features were input into the three classification models to compare classification accuracy and runtime (Table 6), RF-RFE outperformed ReliefF. Specifically, the features selected by RF-RFE achieved higher classification accuracy across all classification methods, with particularly outstanding performance with the MLP algorithm. This indicates that RF-RFE better addresses the specific classification challenges in the mountain–plain transition zone of the study area. Therefore, in subsequent vegetation classification predictions for the entire study area, the 18 features retained by RF-RFE were used as the final feature set.
4.3. Comparison of Mountain Vegetation Mapping Based on Different Classifiers
Based on the 18 optimized features selected by the RF-RFE algorithm, final classification results were obtained for the training, testing, and prediction sets across the three classifiers (Figure 7). Cultivated vegetation is primarily distributed in the southern plains surrounding residential areas. Grasslands are mainly distributed along rivers and roads, with some areas near cultivated land. Forests are mainly located in the northern mountainous regions, where, with increasing elevation, the vegetation transitions through evergreen broadleaved forest (EBF), deciduous broadleaved forest (DBF), shrubland, and coniferous forest (CF).
According to Figure 8, the most widely distributed vegetation type in Mianzhu City is cropland, followed by shrubland, CF, DBF, and EBF, with grassland covering the smallest area.
4.4. Accuracy Assessment of Classification Results
The overall classification accuracy of the three models and their classification performance for different vegetation types are detailed in Table 7. For shrubland, producer's accuracy (PA) ranges from 82.95% to 90.18% and user's accuracy (UA) from 76.70% to 77.59% across the three models, indicating that shrubland is more likely to be omitted than misclassified. RF is the model most prone to omission and misclassification for shrubland. For EBF, PA ranges from 51.89% to 61.56% and UA from 69.05% to 71.73%, suggesting that EBF is more likely to be misclassified than omitted. MLP and 1D-CNN are the models most prone to omission and misclassification for EBF, with 1D-CNN and MLP showing the fewest cases of each, respectively. For DBF, PA ranges from 75.47% to 80.25% and UA from 70.39% to 73.22%, indicating that DBF has a higher chance of omission than misclassification. RF and MLP are the models most prone to omission and misclassification, while 1D-CNN and MLP show the fewest cases of each. For CF, PA ranges from 78.98% to 82.64% and UA from 79.72% to 86.67%, indicating a higher likelihood of misclassification than omission. RF and 1D-CNN are the most prone to omission and misclassification, with RF and MLP having the highest rates for each. For grassland, PA ranges from 75.84% to 88.16% and UA from 85.67% to 91.01%, showing that misclassification is more likely than omission. MLP and 1D-CNN are the most prone to both, but 1D-CNN and MLP have the fewest cases. For cropland, PA ranges from 90.78% to 96.00% and UA from 91.81% to 96.73%, also indicating a higher likelihood of misclassification than omission. 1D-CNN and MLP are the models most prone to each, but 1D-CNN and MLP exhibit the fewest cases.
Overall, among the three models, MLP shows the best performance in terms of OA, the most balanced classification accuracy (AA), and the strongest model reliability (Kappa). Although each model varies in performance across vegetation types, shrubland, grassland, and cropland show relatively good classification results across all models. However, for the complex EBF category, all three models exhibit misclassifications, indicating that improving classification accuracy for this category remains a key focus for future research.
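For reference, the accuracy measures reported here (OA, AA, Kappa, PA, and UA) can all be derived from a confusion matrix, as in the sketch below. It assumes NumPy, a matrix whose rows are reference classes and whose columns are predicted classes, and AA defined as the mean of the per-class PA values; these conventions and the two-class example matrix are assumptions, not details taken from the study.

```python
import numpy as np


def accuracy_metrics(cm):
    """cm[i, j] = number of reference samples of class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    row_sums = cm.sum(axis=1)  # reference totals per class
    col_sums = cm.sum(axis=0)  # predicted totals per class

    pa = diag / row_sums       # producer's accuracy
    ua = diag / col_sums       # user's accuracy
    oa = diag.sum() / total    # overall accuracy
    aa = pa.mean()             # average accuracy (mean PA, assumed definition)

    # Cohen's kappa: agreement beyond what the marginals would give by chance.
    expected = (row_sums * col_sums).sum() / total**2
    kappa = (oa - expected) / (1.0 - expected)
    return {"OA": oa, "AA": aa, "Kappa": kappa, "PA": pa, "UA": ua}


# Hypothetical two-class confusion matrix.
metrics = accuracy_metrics([[50, 10],
                            [5, 35]])
for key in ("OA", "AA", "Kappa"):
    print(f"{key}: {metrics[key]:.4f}")
print("PA:", np.round(metrics["PA"], 4))
print("UA:", np.round(metrics["UA"], 4))
```

Reporting PA and UA per class alongside OA and Kappa shows which classes drive the errors, which a single overall score would hide.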
Figure 9 shows that cropland has the best classification results across all models (1D-CNN: 96.73%; MLP: 91.81%; RF: 93.59%), with the most frequent misclassification occurring as grassland. EBF has the poorest classification results across all models (1D-CNN: 51.89%; MLP: 61.56%; RF: 54.60%), with the most frequent misclassification occurring as DBF.
Overall, 1D-CNN achieves the highest prediction accuracy for cropland but shows notable misclassifications between shrubland and EBF. MLP has better accuracy for grassland predictions but higher misclassification rates for EBF and DBF. RF exhibits greater misclassification between shrubland and EBF but has higher accuracy for DBF, grassland, and cropland.
To further compare differences among classifiers, two sub-regions within the study area were selected for localized analysis. One region is located in the mountainous northern part of the study area (Figure 10b), while the other is in the southwestern part, containing a mix of mountains and plains (Figure 10f). In the elliptical region of Figure 10b, the actual vegetation type is primarily EBF, with a smaller proportion of shrubland. However, in the results using the 1D-CNN algorithm (Figure 10c) and RF algorithm (Figure 10e), parts of EBF were misclassified as shrubland. In contrast, the MLP algorithm (Figure 10d) produced results more consistent with the actual vegetation distribution in the study area. In the elliptical region of Figure 10f, the actual vegetation type is predominantly cropland, with grassland confined to areas near rivers. However, the 1D-CNN algorithm (Figure 10g) misclassified parts of cropland as grassland. By comparison, the MLP algorithm (Figure 10h) and the RF algorithm (Figure 10i) accurately identified grassland and cropland, producing results that align better with the actual vegetation distribution in the study area.
6. Conclusions
This study screened key features with high contribution rates using different feature combinations and feature selection algorithms, and applied the 1D-CNN, MLP, and RF algorithms to classify and predict vegetation types in a mountain–plain transition zone. There are three main conclusions. First, multi-feature combinations improved classification accuracy compared with single features. The combination of spectral and topographic features was the most consistently effective across the three algorithms. Although the inclusion of vegetation indices, texture features, and SAR features expanded the data dimensions, the added redundancy did not consistently improve accuracy and, for the MLP and RF algorithms, tended to reduce it. Second, RF-RFE outperformed ReliefF as a feature selection algorithm. In this study, RF-RFE identified eighteen key features across four feature types; this selection strategy reduced data redundancy and improved classification accuracy. Third, all three algorithms performed well, but the MLP algorithm outperformed the others. In vegetation type classification, the 1D-CNN, MLP, and RF algorithms all showed strong classification capabilities. Among them, the MLP algorithm achieved the highest OA of 81.65% and a Kappa coefficient of 77.75%. It performed particularly well in classifying shrubland, evergreen broadleaved forest, and grassland. This study demonstrates that integrating remote sensing data with machine learning enables efficient and accurate vegetation classification in mountain–plain transition zones. The proposed method provides reliable technical support for ecological restoration, conservation planning, and sustainable land-use decision-making. By reducing the cost and time of manual surveys, minimizing data redundancy, and lowering computational requirements, the method is practical for regional management and shows potential for large-scale ecological assessments and policy development.