Analysis of Vegetation Canopy Spectral Features and Species Discrimination in Reclamation Mining Area Using In Situ Hyperspectral Data

Wang, Xu; Xu, Hang; Zhou, Jianwei; Fang, Xiaonan; Shuai, Shuang; Yang, Xianhua

doi:10.3390/rs16132372

Open Access Article


by Xu Wang ¹,*, Hang Xu ¹, Jianwei Zhou ², Xiaonan Fang ¹, Shuang Shuai ³ and Xianhua Yang ⁴,⁵

¹ School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China

² School of Environmental Studies, China University of Geosciences, Wuhan 430074, China

³ School of Civil Engineering and Architecture, Wuhan Polytechnic University, Wuhan 430023, China

⁴ Sichuan Key Laboratory of Rare Earth Strategic Resources, Sichuan Geological Survey Institute, Chengdu 610081, China

⁵ Geo-Big Data Center, Sichuan Geological Survey Institute, Chengdu 610081, China

* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(13), 2372; https://doi.org/10.3390/rs16132372

Submission received: 30 April 2024 / Revised: 18 June 2024 / Accepted: 25 June 2024 / Published: 28 June 2024

(This article belongs to the Special Issue Local-Scale Remote Sensing for Biodiversity, Ecology and Conservation)


Abstract

The effective identification of reclaimed vegetation species is important for the subsequent management of ecological restoration projects in mining areas. Hyperspectral remote sensing has been used to identify vegetation species. However, few studies have focused on mine-reclaimed vegetation, and the few existing studies in this field have mainly used traditional discriminant analyses. The environmental conditions of reclaimed mining areas lead to significant intraclass spectral differences in reclaimed vegetation, which makes the identification of reclaimed vegetation species with traditional classification models uncertain. In this study, in situ hyperspectral data were used to analyze the spectral variation in the reclaimed vegetation canopy in mine restoration areas and to evaluate the potential of these data for identifying reclaimed vegetation species. We measured the canopy spectral reflectance of five vegetation species in the study area using an ASD FieldSpec 4 spectroradiometer. The spectral characteristics of the vegetation canopy were analyzed by mathematically transforming the original spectra, including Savitzky–Golay smoothing, first derivative, reciprocal logarithm, and continuum removal. In addition, we calculated indicators for identifying vegetation species from the mathematically transformed hyperspectral data. The metrics were submitted to a feature selection procedure (recursive feature elimination) to optimize model performance and reduce model complexity. Different classification algorithms (regularized logistic regression, back propagation neural network, support vector machines with radial basis function kernel, and random forest) were constructed to explore optimal procedures for identifying reclaimed vegetation species based on the best feature metrics. The results showed that applying different mathematical transformations to the spectra can improve the separability between the spectra of reclaimed vegetation. 
The most important spectral metrics extracted by the recursive feature elimination (RFE) algorithm were related to the visible and near-infrared spectral regions, mainly in the vegetation pigments and water absorption bands. Among the four identification models, the random forest had the best recognition ability for reclaimed vegetation species, with an overall accuracy of 0.871. Our results provide a quantitative reference for the future exploration of reclaimed vegetation mapping using hyperspectral data.

Keywords:

reclamation mining area; in situ hyperspectral data; species discrimination; feature recursive elimination


1. Introduction

The exploitation of coal resources has played a significant role in the economic development of China [1]. However, surface mining, a crucial method of coal mining, inevitably leads to environmental and land issues [2]. Surface mining permanently alters the topography and geological structure by removing vegetation and soil and disrupts the hydrological structure of the surface and underground [3]. Land reclamation is one of the most commonly used technologies to reduce soil degradation and restore ecological functions in mining areas [4]. Monitoring the vegetation composition in reclaimed areas is important for understanding the succession process of reclaimed vegetation communities and evaluating the effectiveness of mine rehabilitation treatment. Remote sensing provides a necessary means for monitoring the physiological conditions of reclaimed vegetation over large areas [5], but the prerequisite for accurate monitoring is to identify the reclaimed vegetation species [6]. Traditional multispectral remote sensing technology has been used to identify general vegetation classes or broad vegetation communities, while hyperspectral remote sensing technology has been further applied to species-level vegetation identification due to its multiple wavebands and narrow channels [7].

Hyperspectral data have been shown to have good potential for vegetation species discrimination [8]. However, few studies have used hyperspectral data for vegetation species classification in mine reclamation areas. Vegetation species are still difficult to distinguish in coal mine reclamation areas, which has limited research on vegetation health assessment in reclaimed areas [9]. Previous work that used in situ hyperspectral data to discriminate reclaimed vegetation species found that soil affected the spectra of reclaimed vegetation, resulting in poor species-level classification accuracy [10]. The complex habitat conditions within mine sites affect vegetation growth, resulting in high variability in the vegetation coverage of the same species [11]. This variation in coverage leads to different proportions of soil within the field of view (FOV) of the spectrometer, weakening the spectral characteristics of reclaimed herbaceous vegetation. Spatial heterogeneity in soil type, soil color, and soil moisture can cause large variations in canopy spectra, even for the same vegetation species [12], thereby challenging the generality of reclaimed vegetation identification models. To mitigate soil effects, spectral transformation methods have been proposed to extract or enhance critical information from the quasi-continuous spectral signal [13]. The major spectral transformation methods include derivative transformation [14], logarithmic transformation [15], and continuum removal with band depth analysis [16]. Therefore, when in situ hyperspectral data are used to identify reclaimed vegetation, the spectra need to be transformed to suppress soil effects and enhance the spectral characteristics of the vegetation canopy.

There are many challenges in the practical application of species classification using hyperspectral data, such as high dimensionality, feature redundancy, and the selection of appropriate classification algorithms. Therefore, when hyperspectral data are used for vegetation species discrimination, it is necessary to minimize data redundancy and to use a classification method suited to high-dimensional features [17]. Discriminant analysis is a commonly used supervised classification method [18]. Discriminant analysis methods, such as stepwise and Fisher discriminant analysis, can serve as both dimensionality reduction and classification techniques [19]. As a result, they are widely used in hyperspectral classification [20] and have also been used for reclaimed vegetation species identification in recent studies [6]. However, in practice, discriminant analysis is prone to computational instability with small samples [21], and multicollinearity among the classification variables may impair its generalization ability [22]. Therefore, further research is needed to explore different procedures for constructing reclaimed vegetation species discrimination models from hyperspectral data. Regularized logistic regression, a parametric model, is simple and efficient, can handle multicollinearity, and avoids the overfitting caused by high-dimensional data [23], which has made it widely used in various classification problems [24]. In addition, non-parametric models, such as neural networks, support vector machines, and random forests, are used for hyperspectral classification because they can identify complex nonlinear relationships and handle small samples of high-dimensional data [25]. Compared with traditional discriminant analysis models, these techniques may identify reclaimed vegetation species more accurately.

In addition to the choice of classification algorithm, the high dimensionality of hyperspectral data presents many challenges for classification. Examples include the Hughes phenomenon when numerous bands are used for classification [26] and overfitting when multivariate statistical techniques are applied [27]. Therefore, reducing the dimensionality of the feature space is necessary to ensure the accuracy and interpretability of the model. In this context, feature selection methods such as recursive feature elimination (RFE) are effective in determining the best set of predictors for a model. RFE, an embedded feature selection method, maximizes model performance by reducing model complexity, improving generalization, and avoiding overfitting [28].

To address these challenges, this study aims to explore the optimal procedures to improve the identification of reclaimed vegetation using in situ hyperspectral data in mining areas, through a comparative analysis of different algorithms. For this purpose, we analyzed the canopy spectral characteristics of reclaimed vegetation and constructed a variety of hyperspectral metrics to maximize the potential information related to vegetation species identification. By using the RFE algorithm, we processed the data dimensions and evaluated the impact of reducing the number of input metrics on model performance.

2. Materials and Methods

2.1. Study Area

The study area is a large open-pit coal mining region located in northeastern Qinghai Province, comprising two mining sites, the western mining area and the eastern mining area, covering approximately 400 km² (Figure 1). The altitude ranges from 3800 to 4300 m, and it experiences a typical plateau continental climate, with an average annual temperature of −4.2 °C and an annual rainfall ranging from 473 to 484 mm. The natural vegetation type in the area is alpine meadow, with the dominant species including Kobresia tibetica, Carex atrofusca Schkuhr Riedgr, Carex scabrirostris and Carex moorcroftii [29].

Coal is the primary mineral extracted from the mining area, and mining began in the 1970s. Restoration treatment commenced in 2020 and involved the comprehensive remediation of mining pits and slag hills, soil reconstruction, and regreening by planting grass. Because the mines are located in an alpine cold region, reclamation species must be selected for resistance to harsh environmental conditions, particularly cold [11]. Previous studies conducted reclamation experiments in designated areas with various grass species and selected those suitable for restoring grassland diversity [30]. Therefore, native grass species adapted to alpine environments were chosen for this reclamation project, including Elymus nutans (E. nutans), Poa pratensis (P. pratensis), Poa crymophila (P. crymophila), and Festuca sinensis (F. sinensis). However, vegetation coverage varies greatly within each species because of environmental factors (such as slope) that differ among habitats (Figure 1d). For detailed information on the vegetation (chemical composition and plant height), please refer to the Supplementary Files (Tables S1 and S2).

2.2. In Situ Hyperspectral Measurements and Pre-Processing

A field survey was conducted in August 2022 during the peak of vegetation maturity. We used a portable spectrometer (ASD FieldSpec 4) to measure the canopy spectra of natural vegetation (communities mainly consisting of Kobresia tibetica, Carex atrofusca Schkuhr Riedgr, and Carex scabrirostris), E. nutans, P. pratensis, P. crymophila, and F. sinensis in the natural meadow and the reclaimed area of the mine site. The spectrometer covers 350–2500 nm. In the 350–1000 nm region it has a spectral resolution of 3 nm and a sampling interval of 1.4 nm, and in the 1000–2500 nm region these values are 6 nm and 2 nm, respectively. Spectra were collected under sunny, windless, and dry conditions between 10:00 and 14:00. The field of view (FOV) of the spectrometer is 25°. We positioned the probe vertically approximately 1 m above the top of the vegetation canopy, so the FOV on the ground was approximately 0.5 m in diameter. The instrument was optimized once before measurement, and dark current was corrected. A standard white reference panel was then used for calibration to determine absolute reflectance. We collected 69 grassland canopy spectra at different vegetation coverage levels throughout the study area: 13 samples of P. pratensis, 14 of P. crymophila, 14 of E. nutans, 13 of F. sinensis, and 14 of natural vegetation. The natural vegetation samples served as a reference group for comparison with the reclaimed vegetation spectra, to assess intraclass spectral differences and species identification accuracy. Natural vegetation samples were obtained near the mine site, and reclaimed vegetation samples were obtained from the reclaimed area. Sampling points were spaced 50 m apart to avoid spatial autocorrelation caused by the proximity of samples in the dataset. 
Each sample was measured 5 times, and the average spectrum was used as the final canopy reflectance curve. Because atmospheric water vapor introduced noise into the spectra, the spectral ranges of 1350–1400 nm, 1800–1940 nm, and 2400–2500 nm were excluded from subsequent analysis. To remove noise while preserving the original characteristics of the spectral profile [31], the spectra were smoothed using a Savitzky–Golay (S–G) filter with a moving window of 5 bands and second-order polynomial fitting. The S–G-smoothed spectra are hereafter referred to as the original spectra (OR). First derivative (d(R)) [14], reciprocal logarithm (LogR⁻¹) [15], and continuum removal (CR) [16] transformations were then applied.
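The pre-processing chain described above can be sketched in Python with NumPy and SciPy. This is a minimal illustration, not the authors' code. It makes several assumptions. The ASD scans have already been exported as reflectance resampled to a 1 nm grid. The excluded ranges are treated as inclusive. `scipy.signal.savgol_filter` stands in for the S–G filter and `numpy.gradient` for the first derivative. LogR⁻¹ is taken as a base-10 logarithm, which is also an assumption. The helper name `preprocess` is hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter

# Noisy atmospheric water-vapour ranges (nm) excluded from analysis; bounds treated as inclusive.
EXCLUDED_RANGES = [(1350, 1400), (1800, 1940), (2400, 2500)]


def preprocess(scans, wl):
    """Hypothetical helper: scans is (n_repeats, n_bands) reflectance for one sample, wl in nm."""
    mean_spec = scans.mean(axis=0)                       # average of the repeated scans
    # S-G smoothing: 5-band moving window, second-order polynomial -> "OR" spectrum
    orig = savgol_filter(mean_spec, window_length=5, polyorder=2)
    d1 = np.gradient(orig, wl)                           # first derivative d(R), per nm
    log_inv = np.log10(1.0 / np.clip(orig, 1e-6, None))  # reciprocal logarithm log10(1/R)
    # Smooth the continuous spectrum first, then drop the water-vapour bands,
    # so the filter window never spans a gap in the wavelength axis
    keep = np.ones(wl.shape, dtype=bool)
    for lo, hi in EXCLUDED_RANGES:
        keep &= ~((wl >= lo) & (wl <= hi))
    return {"wl": wl[keep], "OR": orig[keep], "dR": d1[keep], "LogR_inv": log_inv[keep]}


wl = np.arange(350.0, 2501.0)                            # 1 nm grid, 2151 bands
rng = np.random.default_rng(0)
scans = 0.3 + 0.01 * rng.standard_normal((5, wl.size))   # synthetic stand-in for 5 scans
out = preprocess(scans, wl)
print(out["wl"].size, "bands kept")
```

The continuum removal step is omitted here because it needs a convex-hull construction.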

2.3. Calculation of Hyperspectral Indicators

We calculated several indicators from the hyperspectral data: 14 vegetation indexes (Table S3) and 21 indicators extracted from the reciprocal logarithm, first derivative, and continuum-removed spectra (Table S4). These indicators reflect various biophysical properties of the vegetation, such as chlorophyll content, leaf surface canopy, canopy structure, the efficiency with which vegetation uses incident light for photosynthesis, relative nitrogen content of the canopy, dry-matter carbon content (cellulose and lignin), stress-related pigments, and canopy water content. For detailed information and the equations for the indicators, please refer to the Supplementary Files (Tables S3 and S4).
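The exact index definitions are in Table S3, which is not reproduced here. As an illustration, four indexes that appear later in the results (PRI, WBI, MSI, and ARI2) can be computed with their commonly cited literature formulations. The wavelengths below are assumptions and may differ from those in Table S3. The nearest-band lookup `band` and the function `vegetation_indices` are hypothetical helpers.

```python
import numpy as np


def band(wl, refl, target_nm):
    """Reflectance at the band nearest to target_nm (refl may be 1-D or (n_samples, n_bands))."""
    return refl[..., np.argmin(np.abs(wl - target_nm))]


def vegetation_indices(wl, refl):
    """Commonly cited formulations; wavelengths are assumptions, not taken from Table S3."""
    def r(nm):
        return band(wl, refl, nm)

    return {
        # Photochemical Reflectance Index (xanthophyll cycle / photosynthetic activity)
        "PRI": (r(531) - r(570)) / (r(531) + r(570)),
        # Water Band Index (water absorption near 970 nm)
        "WBI": r(900) / r(970),
        # Moisture Stress Index (SWIR water absorption relative to NIR)
        "MSI": r(1599) / r(819),
        # Anthocyanin Reflectance Index 2
        "ARI2": r(800) * (1.0 / r(550) - 1.0 / r(700)),
    }


wl = np.arange(350.0, 2501.0)
# Crude vegetation-like spectrum: low visible, green bump, high NIR plateau (illustrative only)
refl = np.where(wl < 700, 0.05, 0.45)
refl = np.where((wl > 520) & (wl < 600), 0.10, refl)
print(vegetation_indices(wl, refl))
```

Because `band` indexes the last axis, the same call works for a whole matrix of samples.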

2.4. Classification Models

Four classification models were considered in this study: regularized logistic regression (RLR), back propagation neural network (BPNN), support vector machines with radial basis function kernel (SVM), and random forest (RF) (Table 1), all of which were implemented using the CRAN R package caret.

2.4.1. Regularized Logistic Regression

Logistic regression is a simple and effective supervised classification method. It provides the probabilities of the output classes given the input variables, and its regression coefficients can be interpreted [32]. Logistic regression classifiers are linear classifiers usually used for binary classification problems [17], but they can also handle multiclass problems. In the LiblineaR package, multiclass logistic regression is implemented by building one classifier for each category (one-vs-rest). When there are only a few samples with many features, the performance of logistic regression can be improved through regularization [33]. The fundamental idea of regularization is to constrain the regression coefficients with a penalty function [23]. The most commonly used penalty functions are L1 and L2 regularization. L1 regularization, also called lasso regularization, adds the L1 penalty (the absolute magnitude of the coefficients) to the model's loss function; this removes non-informative features and thus performs feature selection [34]. L2 regularization, also called ridge regularization, addresses the collinearity problem by adding the L2 penalty (the squared magnitude of the coefficients) to the model's loss function [35]. We tested three regularization settings: L1 regularization, and L2 regularization solved in either its dual (constrained) or its primal (regularized) formulation. The primal formulation is preferable when the number of samples exceeds the number of features. In addition, two parameters must be set: (1) the cost of constraint violation (Cost) and (2) the tolerance of the termination criterion for optimization (epsilon).
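The study tuned this model through caret with LiblineaR in R. A rough Python analogue, sketched below, uses scikit-learn's `LogisticRegression(solver="liblinear")` wrapped in `OneVsRestClassifier`, so that one binary model is fitted per class. Here `C` plays the role of Cost and `tol` the role of epsilon. The grid covers the three regularization settings: L1, L2 dual, and L2 primal. The grid values, the synthetic data, and the use of scikit-learn are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 69 samples, 35 metrics, 5 classes
X, y = make_classification(n_samples=69, n_features=35, n_informative=8,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X = StandardScaler().fit_transform(X)

# One binary logistic model per class (one-vs-rest), solved by LIBLINEAR
model = OneVsRestClassifier(LogisticRegression(solver="liblinear", max_iter=5000))

# Three regularization settings: L1 (primal only), L2 dual, L2 primal
shared = {"estimator__C": [0.1, 1.0, 10.0],     # analogue of LiblineaR's cost
          "estimator__tol": [1e-4, 1e-2]}       # analogue of epsilon
param_grid = [
    {"estimator__penalty": ["l1"], "estimator__dual": [False], **shared},
    {"estimator__penalty": ["l2"], "estimator__dual": [True, False], **shared},
]
search = GridSearchCV(model, param_grid, scoring="accuracy",
                      cv=StratifiedKFold(n_splits=4, shuffle=True, random_state=0))
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```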

2.4.2. Back Propagation Neural Network

Neural networks, also generally referred to as artificial neural networks, are loosely modeled on biological nervous systems and can perform tasks such as pattern recognition and decision making [36]. To improve learning efficiency, this study used the back propagation neural network (BPNN). A BPNN is a feedforward neural network trained with the backpropagation algorithm, consisting of input, hidden, and output layers connected by neurons [37]. When data are input into the BPNN, they propagate forward through each layer until reaching the output layer, and the output is compared with the desired output. If there is a discrepancy, the error terms of the preceding layers are derived backward from the output layer. The partial derivatives of the objective function with respect to each parameter are then calculated and used to update the parameters iteratively [38]. The main challenge in modeling with a BPNN is to establish the network structure, including the input nodes, hidden layers, hidden nodes, and output nodes. The number of input nodes is determined by the input indicators. It has been proved that a single hidden layer with enough nodes allows a neural network to approximate continuous functions of arbitrary complexity with arbitrary accuracy [39]. Therefore, we used one hidden layer in our model, with the number of hidden nodes determined experimentally. The BPNN involves two parameters that control the learning process: (1) the number of units in the hidden layer (size) and (2) the weight decay parameter (decay).
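To make the forward and backward passes concrete, here is a minimal NumPy sketch of a one-hidden-layer BPNN. It has `size` logistic hidden units, a softmax output, and an L2 weight-decay term `decay`. It is not the study's implementation. Training uses plain full-batch gradient descent with a fixed learning rate, chosen here for transparency. R's nnet, whose `size` and `decay` parameters these mirror, uses a quasi-Newton (BFGS) optimizer instead. The learning rate, epoch count, function names, and synthetic data are hypothetical.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bpnn(X, y, size=5, decay=1e-3, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer network trained by back propagation (full-batch gradient descent).

    size: number of hidden units; decay: weight-decay (L2) coefficient on the weight matrices.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = int(y.max()) + 1
    Y = np.eye(k)[y]                                   # one-hot targets
    W1, b1 = rng.normal(0.0, 0.5, (d, size)), np.zeros(size)
    W2, b2 = rng.normal(0.0, 0.5, (size, k)), np.zeros(k)
    for _ in range(epochs):
        # Forward pass: logistic hidden layer, softmax output layer
        H = _sigmoid(X @ W1 + b1)
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        # Backward pass: error at the output, then propagated to the hidden layer
        dZ = (P - Y) / n                               # gradient of mean cross-entropy
        dW2 = H.T @ dZ + decay * W2
        db2 = dZ.sum(axis=0)
        dH = (dZ @ W2.T) * H * (1.0 - H)
        dW1 = X.T @ dH + decay * W1
        db1 = dH.sum(axis=0)
        # Gradient-descent update of every parameter
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def predict_bpnn(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(_sigmoid(X @ W1 + b1) @ W2 + b2, axis=1)


# Synthetic three-class example (stand-in for the spectral metrics)
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])
y = np.repeat(np.arange(3), 30)
params = train_bpnn(X, y, size=5, decay=1e-3)
train_acc = (predict_bpnn(params, X) == y).mean()
print(f"training accuracy: {train_acc:.2f}")
```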

2.4.3. Support Vector Machines with Radial Basis Function Kernel

The support vector machine (SVM) is a non-parametric machine learning method widely used in classification tasks [40]. SVMs use kernel functions (such as linear, polynomial, and radial basis functions) to project data into a high-dimensional feature space, where they find the maximum-margin hyperplane separating the classes. The kernel function is an important component of an SVM. It implicitly maps the original data, which may not be linearly separable, into a high-dimensional feature space in which the classes may become linearly separable. Different kernel functions give the model different characteristics [41]. The linear kernel is computationally fast but is limited to linearly separable problems [42]. The polynomial kernel can handle nonlinearly separable problems and has global properties, but its many parameters slow computation [43]. The radial basis function kernel is known for its ability to handle nonlinear data and its strong learning ability [44]. In practical applications, SVMs with a radial basis function kernel have been reported to outperform those with linear and polynomial kernels [40]. Besides the kernel function, two parameters are required: cost, which controls the complexity of the decision boundary defined by the support vectors, and sigma, a smoothing parameter that sets the width of the radial basis function [28]. SVMs have demonstrated robustness to high dimensionality and to outliers in the training data, as well as strong generalization ability [45].
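The RBF kernel and its two tuning parameters can be illustrated as follows. The kernel below uses the parameterization k(x, z) = exp(−sigma·‖x − z‖²), the form used by R's kernlab. Under that form, sigma corresponds to `gamma` in scikit-learn's `SVC`. The grid search over cost and sigma uses scikit-learn as a stand-in for caret's tuning. The grid values and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC


def rbf(X, Z, sigma):
    """RBF kernel k(x, z) = exp(-sigma * ||x - z||^2) (kernlab-style sigma)."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-sigma * np.maximum(sq_dist, 0.0))   # clip tiny negative round-off


# Synthetic stand-in for 10 selected metrics and 5 classes
X, y = make_classification(n_samples=69, n_features=10, n_informative=6,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
K = rbf(X, X, sigma=0.05)

# Grid search over cost (C) and sigma (scikit-learn's gamma) with 4-fold CV
search = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": [0.25, 1.0, 4.0, 16.0], "gamma": [0.01, 0.05, 0.1]},
    cv=StratifiedKFold(n_splits=4, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

A large sigma makes the kernel very local, giving a wiggly boundary. A large cost penalizes training errors heavily. Both push toward overfitting, which is why they are tuned jointly.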

2.4.4. Random Forest

Random forest is a machine learning algorithm based on bagging (bootstrap aggregating) ensemble learning and the random subspace method [46]. RF is an ensemble of decision tree classifiers that are trained and combined using bagging. For each tree, a bootstrap sample is drawn with replacement from the training set, and a decision tree is grown on that sample. In addition, each node of a tree is split using a specified number of randomly selected features (mtry). Because of the resampling, about one-third of the samples are left out of each bootstrap sample (out-of-bag samples), and the remaining samples (in-bag samples) are used to build the tree. The out-of-bag samples are used to estimate model performance without a separate validation set. After model calibration, each tree votes on the class of a new sample, and the sample is assigned to the category with the most votes. RF reduces the correlation between trees through bootstrapping and random feature selection, so that different trees are built from different subsets of samples and features [47]; this mitigates the problems caused by collinearity among variables. Numerous theoretical and empirical studies have demonstrated that RF has high prediction accuracy, good tolerance to outliers and noise, and is not prone to overfitting [48].
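The "about one-third" figure follows from bootstrap sampling. A given sample is missed by all n draws with replacement with probability (1 − 1/n)ⁿ, which tends to e⁻¹ ≈ 0.368. The sketch below checks this for n = 69 and then tunes mtry (`max_features`) using the out-of-bag accuracy of scikit-learn's `RandomForestClassifier`. It is an assumption-laden stand-in for the caret workflow, and the data are synthetic. Note that scikit-learn combines trees by averaging their predicted class probabilities rather than by hard majority voting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

n = 69  # number of field samples in this study

# Probability that a given sample is never drawn in n draws with replacement
expected_oob = (1.0 - 1.0 / n) ** n

# Empirical check: simulate 2000 bootstrap samples and measure the left-out fraction
rng = np.random.default_rng(0)
draws = rng.integers(0, n, size=(2000, n))
in_bag = np.zeros((2000, n), dtype=bool)
in_bag[np.arange(2000)[:, None], draws] = True
empirical_oob = 1.0 - in_bag.mean()

# Synthetic stand-in for 22 selected metrics and 5 vegetation classes
X, y = make_classification(n_samples=n, n_features=22, n_informative=8,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

# Tune mtry (max_features) by out-of-bag accuracy
oob_accuracy = {}
for mtry in (2, 5, 10):
    rf = RandomForestClassifier(n_estimators=300, max_features=mtry, bootstrap=True,
                                oob_score=True, random_state=0)
    rf.fit(X, y)
    oob_accuracy[mtry] = rf.oob_score_
best_mtry = max(oob_accuracy, key=oob_accuracy.get)
print(f"expected OOB fraction {expected_oob:.3f}, simulated {empirical_oob:.3f}")
print("OOB accuracy by mtry:", oob_accuracy, "-> best mtry", best_mtry)
```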

2.5. Vegetation Identification Model Construction and Validation

We constructed a feature set consisting of the original bands processed by S–G smoothing (1860 bands) and 35 metrics, namely the vegetation indexes and the metrics extracted from the mathematically transformed spectra. First, all metrics were standardized by centering them on their mean and scaling them by their standard deviation (z-scores). Subsequently, four classification algorithms (RLR, BPNN, SVM, and RF) were applied in a modeling framework consisting of three steps: feature selection, model training, and validation (Figure 2).

For feature selection, we used RFE from the caret package to select the best set of features for each classification algorithm. The RFE algorithm evaluated the impact of the feature set on model performance, measured as classification accuracy in 4-fold cross-validation repeated 10 times. The parameters of each classification algorithm were optimized by grid search; the combination giving the highest accuracy in an internal 4-fold cross-validation was selected as the best model parameters. For each feature subset Sᵢ (i = 1, 2, 3, …, n), the top i most important feature metrics were extracted to construct a new dataset. The classification accuracies of the models obtained from the different feature subsets were then calculated on these new datasets and compared.

The best set of metrics and parameters selected for each classification method was used to train the model. We employed a 4-fold cross-validation scheme with 10 repetitions to quantify model performance in terms of overall accuracy and F1 score. The F1 score is the harmonic mean of precision and recall. Precision is the proportion of samples assigned to a class that truly belong to it, and recall is the proportion of samples of a class that the model correctly identifies. By combining precision and recall, the F1 score summarizes the model's accuracy for a specific class [49].
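For concreteness, per-class precision, recall, and F1, together with overall accuracy, can all be read off a confusion matrix. The NumPy sketch below is illustrative. The function name is hypothetical, and classes with no predicted or no true samples are given a score of 0 by assumption.

```python
import numpy as np


def classification_scores(y_true, y_pred, n_classes):
    """Per-class precision, recall and F1, plus overall accuracy, from a confusion matrix."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)          # rows: reference class, columns: predicted
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)                  # samples assigned to each class
    actual = cm.sum(axis=1)                     # samples truly in each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    # F1: harmonic mean of precision and recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    overall_accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, overall_accuracy


prec, rec, f1, oa = classification_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
print("F1 per class:", f1.round(3), "macro F1:", f1.mean().round(3), "OA:", round(oa, 3))
```

In this toy example class 1 has perfect recall but one false positive. The F1 score therefore lies between its precision and recall, pulled toward the lower value.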

K-fold cross-validation, or repeated k-fold cross-validation, is widely used for assessing the generalizability of classification algorithms [50]. In k-fold cross-validation, the dataset is split into k subsets (folds). The model is trained on k−1 folds and its performance is evaluated on the remaining fold, which was not used for training. This process is repeated k times so that every fold is used once for validation, and the reported performance measure is the average of the k values. In the performance evaluation of the four models, we used a 4-fold cross-validation strategy: 75% of the data (52 samples) were used to train the models and the remaining 25% (17 samples) were used for validation, and this step was repeated four times. To account for random effects of the data partitioning, the grouped 4-fold cross-validation was iterated 10 times, and the final model performance metric was the average of the values calculated from these 10 cross-validations.
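The repeated k-fold loop can be sketched with NumPy alone. Each repetition reshuffles the 69 samples and splits them into four folds. Because 69 is not divisible by 4, one fold necessarily holds 18 samples and the others 17. This sketch uses plain shuffled (unstratified) folds; a stratified variant would keep class proportions equal across folds. The nearest-centroid classifier is a hypothetical stand-in used only to make the loop runnable.

```python
import numpy as np


def repeated_kfold(n_samples, k=4, repeats=10, seed=0):
    """Yield (train_idx, test_idx) for `repeats` independent shufflings of k-fold CV."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(n_samples), k)
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            yield train, test


def nearest_centroid_accuracy(X, y, train, test):
    """Hypothetical stand-in classifier: assign each test sample to the closest class mean."""
    classes = np.unique(y[train])
    centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in classes])
    dist = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[np.argmin(dist, axis=1)]
    return float((pred == y[test]).mean())


rng = np.random.default_rng(42)
y = np.arange(69) % 5                                   # 5 classes, 13-14 samples each
X = rng.standard_normal((69, 10)) + y[:, None]          # synthetic, class-shifted features
scores = [nearest_centroid_accuracy(X, y, tr, te)
          for tr, te in repeated_kfold(69, k=4, repeats=10)]
print(f"{len(scores)} validation folds, mean accuracy {np.mean(scores):.3f}")
```

The reported metric is the mean over all 4 × 10 validation folds, which is the same as averaging the 10 per-repetition means when every repetition has the same number of folds.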

Finally, we calculated the importance of each feature in the model based on specific feature rank criteria, depending on the classification algorithm (Table 1).

2.6. Spectral Separability

The spectral separability between vegetation types was measured by computing the Jeffries–Matusita (JM) distance, using the optimal combination of parameters selected by each classifier and the RFE algorithm. The JM distance quantifies the average distance between two class density functions [51]. The lower and upper values of the JM distance are 0 and √2 (≈1.414), respectively. A value of 0 represents complete non-separability between categories, and larger values indicate increasing separability, up to the upper value of 1.414, which represents perfect separability between categories [13]. We used a JM distance of ≥1.3718 (i.e., ≥97% separability accuracy) to indicate perfect separability between pairs of species [52].
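Under the usual assumption that each class is multivariate normal, the JM distance is computed from the Bhattacharyya distance. For class means m₁, m₂ and covariances Σ₁, Σ₂, with Σ = (Σ₁ + Σ₂)/2, the Bhattacharyya distance is B = (1/8)(m₁ − m₂)ᵀΣ⁻¹(m₁ − m₂) + (1/2)ln(|Σ| / √(|Σ₁||Σ₂|)), and JM = √(2(1 − e⁻ᴮ)). This form gives the 0 to √2 range stated above. The threshold 1.3718 is 97% of √2. The NumPy sketch below is illustrative, and the Gaussian assumption is ours. Each class needs more samples than features for its covariance matrix to be invertible.

```python
import numpy as np

JM_THRESHOLD = 0.97 * np.sqrt(2.0)  # ~1.3718, i.e. 97% of the maximum sqrt(2)


def jm_distance(X1, X2):
    """Jeffries-Matusita distance between two classes under a Gaussian assumption.

    X1, X2: (n_samples, n_features) arrays; each class needs more samples than
    features so that its covariance matrix is invertible.
    """
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    S = 0.5 * (S1 + S2)
    diff = m1 - m2
    # Bhattacharyya distance: mean term + covariance term (log-determinants for stability)
    mean_term = diff @ np.linalg.solve(S, diff) / 8.0
    _, logdet_s = np.linalg.slogdet(S)
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    cov_term = 0.5 * (logdet_s - 0.5 * (logdet_1 + logdet_2))
    b = mean_term + cov_term
    # Clip at 0 so round-off cannot produce the square root of a tiny negative number
    return float(np.sqrt(max(2.0 * (1.0 - np.exp(-b)), 0.0)))


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(14, 3))   # e.g. 14 samples, 3 selected metrics
b = rng.normal(1.5, 1.0, size=(14, 3))
jm = jm_distance(a, b)
print(f"JM = {jm:.3f}, separable: {jm >= JM_THRESHOLD}")
```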

3. Results

3.1. Spectral Characterization

The mathematical transformation spectra were obtained by different preprocessing methods, including the reciprocal logarithm, first derivative, and continuum removal transformations (Figure 3).

3.1.1. Original Spectral Analysis

The reflectance of each vegetation type varied, but the average reflectance spectra of the reclaimed species had similar shapes, as shown in Figure 4. The reclaimed species shared similar peaks and troughs, which differed considerably from the spectral waveforms of the natural vegetation near the mine area. In addition, the intraclass variation in the original spectra of each reclaimed species was greater than that of natural vegetation (Figure 3). The reflectance of all vegetation types was relatively low in the ultraviolet (UV, 350–400 nm) and visible (VIS, 400–700 nm) regions. However, the spectral shapes differed substantially in the 620–660 nm band, where P. crymophila showed a rapid increase in reflectance, forming a waveform significantly different from those of the other vegetation types (Figure 4d). In the near-infrared (NIR, 700–1100 nm) region, the reflectance of all vegetation types increased rapidly at the red edge (700–750 nm), where their spectral curves overlapped (Figure 4e). In the range of 760–1300 nm, the reflectance spectra of each vegetation type formed three peaks and two troughs. Notably, the absorption depths of the two troughs of natural vegetation were significantly smaller than those of the four reclaimed species. The analysis of the original spectra showed that natural vegetation differed from reclaimed vegetation not only in reflectance but also in waveform, giving it greater separability. This was especially true in the NIR region, where the differences between the peaks and troughs at 760–1000 nm were more pronounced. The waveform differences of P. crymophila in certain bands may also allow it to be distinguished from the other vegetation types.

3.1.2. Reciprocal Logarithm Spectral Analysis

As shown in Figure 5, compared with the original spectra, the reciprocal logarithm-transformed spectra enhanced the spectral characteristics of each vegetation type in the ultraviolet (UV, 350–400 nm), visible (VIS, 400–700 nm), and short-wave infrared (SWIR, 1100–2500 nm) regions. In the range of 350–700 nm, the "one valley and two peaks" feature of the original spectra, consisting of the blue peak, red peak, and green valley, changed to two valleys and one peak. At the blue peak (430–480 nm), the transformed values of E. nutans were significantly greater than those of the other vegetation types, which could effectively distinguish E. nutans from them.

3.1.3. First Derivative Spectral Analysis

The overlapping reflectance curves of each vegetation in the red-edge region of the original spectra were well distinguished by the first derivative transformation (Figure 6), making it more effective for extracting the most important trilateral parameter: the red-edge parameter.

The trilateral parameters (red-edge, blue-edge, and yellow-edge parameters) are important features of the first derivative spectra. As shown in Table 2, the yellow-edge positions of the five vegetation species did not differ and were all at 629 nm. The blue-edge position of natural vegetation is at 525 nm, and the blue-edge positions of the reclaimed species are all shifted towards shorter wavelengths. As an important indicator of vegetation health, the red-edge position reflects the leaf area, growth, and chlorophyll content of vegetation, and it also characterizes the degree of stress on vegetation [53]. The red-edge positions of the five vegetation species differed markedly. P. pratensis and F. sinensis were located at 730 nm and 728 nm, respectively, and P. crymophila, natural vegetation, and E. nutans at 716 nm, 718 nm, and 719 nm, respectively.
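The red-edge parameters are read directly from the first-derivative spectrum. The red-edge position is the wavelength of the maximum first derivative within the red-edge window, and the red-edge amplitude Dᵣ is the derivative value at that point. The sketch below assumes a commonly used 680–760 nm search window. The blue and yellow edges would be found analogously in their own windows, but their windows and sign conventions are not given here and are not reproduced. The function name and the synthetic spectrum are hypothetical.

```python
import numpy as np


def red_edge(wl, refl, lo=680.0, hi=760.0):
    """Red-edge position (nm) and amplitude Dr from the first-derivative spectrum.

    The 680-760 nm search window is an assumed, commonly used range.
    """
    d1 = np.gradient(refl, wl)          # first derivative, reflectance per nm
    win = (wl >= lo) & (wl <= hi)
    i = int(np.argmax(d1[win]))         # steepest rise of the red edge
    return float(wl[win][i]), float(d1[win][i])


wl = np.arange(400.0, 901.0)
# Synthetic canopy-like red edge: logistic rise centred at 720 nm
refl = 0.05 + 0.40 / (1.0 + np.exp(-(wl - 720.0) / 10.0))
rep, dr = red_edge(wl, refl)
print(f"red-edge position {rep:.0f} nm, amplitude {dr:.4f} per nm")
```

A shift of the steepest-slope point toward longer wavelengths (as for P. pratensis and F. sinensis) moves the returned position accordingly. This is why the red-edge position is sensitive to chlorophyll content and canopy condition.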

3.1.4. Continuum Removal Transform Spectral Analysis

Continuum removal was applied to the original spectra (Figure 7). It substantially reduced the influence of soil background noise on the vegetation spectra and enhanced the characteristics of the absorption valleys. The continuum-removed spectra of the five vegetation species had six obvious absorption valleys, located at 410 nm (390–430 nm), 485 nm (460–510 nm), 635 nm (560–710 nm), 985 nm (920–1050 nm), 1185 nm (1120–1250 nm), and 1470 nm (1400–1540 nm) (Figure 7b–g). In the UV and VIS regions, there were three absorption valleys. The third (560–710 nm, related to the cell structure reflectance peak) had the largest absorption intensity and corresponds to the red valley of vegetation, where reflectance is low because red light is absorbed for photosynthesis. In the NIR region, there were two absorption bands: the 985 nm band, which is related to starch content, and the 1185 nm band, a narrow absorption band of water and oxygen. In the SWIR region, there was only one absorption band (1470 nm), a strong absorption band of water and carbon dioxide.
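Continuum removal divides a spectrum by its continuum, the piecewise-linear upper convex hull of the reflectance curve. The result is 1 wherever the spectrum touches the hull and below 1 inside absorption valleys. The NumPy sketch below builds the upper hull with a monotone-chain scan. It is an illustration under assumptions. The study's software, and whether the hull was fitted over the whole spectrum or per absorption interval, are not specified here. The function name and synthetic spectrum are hypothetical.

```python
import numpy as np


def continuum_removed(wl, refl):
    """Divide a spectrum by its upper convex hull (the continuum).

    wl must be strictly increasing and refl positive; returns values <= 1,
    equal to 1 at points lying on the hull.
    """
    hull = []
    for i in range(len(wl)):
        # Pop the last hull point while it lies on or below the segment to point i
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            cross = ((wl[i2] - wl[i1]) * (refl[i] - refl[i1])
                     - (refl[i2] - refl[i1]) * (wl[i] - wl[i1]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    continuum = np.interp(wl, wl[hull], refl[hull])   # piecewise-linear hull
    return refl / continuum


wl = np.arange(900.0, 1101.0)
# Synthetic absorption valley centred at 1000 nm on a flat 0.5 continuum
refl = 0.5 - 0.2 * np.exp(-0.5 * ((wl - 1000.0) / 20.0) ** 2)
cr = continuum_removed(wl, refl)
print(f"band depth at {wl[np.argmin(cr)]:.0f} nm: {1 - cr.min():.3f}")
```

Dividing by the hull rather than subtracting it normalizes out overall brightness differences, which is the property exploited to suppress background effects.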

We calculated three metrics to quantify the characteristics of absorption valleys based on six obvious absorption valleys, including absorption area, absorption slope, and absorption symmetry (Figure 8). These were used as metrics for the classification model. Significant differences were observed in the absorption areas at 985 nm (A₉₈₅) and the absorption slopes at 410 nm (K₄₁₀) between natural vegetation and reclaimed vegetation. In addition, the absorption slopes and absorption symmetries were significantly different for P. crymophila compared to other vegetation in the 635 nm absorption band. Similarly, the absorption slopes and absorption symmetries in the 485 nm absorption band were significantly different for E. nutans compared to other vegetation. These absorption characteristics enable the distinction of specific species from other vegetation.
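The exact definitions of the three absorption metrics are in Table S4 and are not reproduced here. The sketch below therefore uses common, assumed definitions for illustration only. Area A is the integral of the band-depth profile 1 − CR over the absorption interval. Symmetry S is the share of that area lying left of the absorption centre; some studies use a left/right ratio instead. Slope K is the slope of the original spectrum between the two shoulders of the interval. The function names are hypothetical.

```python
import numpy as np


def _trapezoid(y, x):
    """Trapezoidal integral (avoids np.trapz/np.trapezoid version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def absorption_metrics(wl, refl, cr, start, end):
    """Assumed definitions of area (A), slope (K), symmetry (S) for one absorption valley.

    wl: wavelengths (nm); refl: original (OR) spectrum; cr: continuum-removed spectrum;
    start, end: shoulders of the absorption interval (nm).
    """
    m = (wl >= start) & (wl <= end)
    w, r, depth = wl[m], refl[m], 1.0 - cr[m]        # band-depth profile 1 - CR
    i_min = int(np.argmax(depth))                    # absorption centre (deepest point)
    area = _trapezoid(depth, w)                      # A: area enclosed by the valley
    left = _trapezoid(depth[: i_min + 1], w[: i_min + 1])
    symmetry = left / area if area > 0 else float("nan")  # S: left-side share of A
    slope = float((r[-1] - r[0]) / (w[-1] - w[0]))   # K: slope between the shoulders
    return {"A": area, "K": slope, "S": symmetry,
            "depth": float(depth[i_min]), "centre": float(w[i_min])}


wl = np.arange(900.0, 1101.0)
cr = 1.0 - 0.4 * np.exp(-0.5 * ((wl - 1000.0) / 20.0) ** 2)   # symmetric synthetic valley
refl = 0.5 * cr                                               # flat continuum at 0.5
metrics = absorption_metrics(wl, refl, cr, 900.0, 1100.0)
print(metrics)
```

For a symmetric valley S is 0.5 and K is 0. Departures from these values capture the kind of shape differences reported for the 485 nm and 635 nm bands.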

3.2. Selection of Characteristic Parameters

The effect of feature subset size on the performance of each classifier was assessed by embedding the RFE algorithm in each classification algorithm (Figure 9). The performance of the four classification algorithms (RLR, BPNN, SVM, and RF) peaked at 25, 27, 10, and 22 input features, respectively. For RLR, grid search selected the L2_dual setting (L2-regularized logistic regression solved in its dual form). Regularization kept the classification accuracy of RLR stable even when the number of input features exceeded the optimum, preventing large fluctuations as the feature dimension increased. BPNN selected the largest number of features: its performance was optimal at 27 features and fluctuated somewhat as more features were added. The SVM model with the radial basis function kernel selected the fewest features, achieving optimal performance with only 10; beyond this number, its performance declined slowly. The RF model performed best, with superior classification accuracy and stability across feature dimensions: its performance was optimal at 22 features and remained robust as the number of features varied.
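As an illustration only, the sketch below implements the RFE loop with NumPy around a stand-in one-vs-rest ridge classifier, whose absolute weights serve as feature importances; this classifier, the 4-fold scoring, and the synthetic data are assumptions, not the study's RLR/BPNN/SVM/RF set-up. Kernel SVMs and neural networks expose no per-feature weights, so embedding RFE in them needs another importance measure (for example, permutation importance); how the study handled this is not stated in this section. In practice, scikit-learn's `RFE`/`RFECV` would usually be used.

```python
import numpy as np


def fit_ridge_ovr(X, y, n_classes, lam=1.0):
    """One-vs-rest ridge classifier; returns weights of shape (n_features + 1, n_classes)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append a bias column
    Y = -np.ones((len(X), n_classes))
    Y[np.arange(len(y)), y] = 1.0                  # +1 for the true class, -1 otherwise
    reg = lam * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0                              # leave the bias unpenalized
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)


def predict(W, X):
    return np.argmax(np.hstack([X, np.ones((len(X), 1))]) @ W, axis=1)


def cv_accuracy(X, y, n_classes, k=4, seed=0):
    """Mean accuracy over k random folds (same folds for every feature subset)."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    scores = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        W = fit_ridge_ovr(X[train], y[train], n_classes)
        scores.append(np.mean(predict(W, X[test]) == y[test]))
    return float(np.mean(scores))


def rfe_curve(X, y, n_classes):
    """Recursive feature elimination: drop the feature with the smallest |weight| each round."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize (on all samples, for brevity)
    remaining = list(range(X.shape[1]))
    curve = {}
    while True:
        curve[len(remaining)] = (cv_accuracy(X[:, remaining], y, n_classes), list(remaining))
        if len(remaining) == 1:
            return curve
        W = fit_ridge_ovr(X[:, remaining], y, n_classes)
        importance = np.abs(W[:-1]).sum(axis=1)    # ignore the bias row
        remaining.pop(int(np.argmin(importance)))


rng = np.random.default_rng(42)
y = np.repeat(np.arange(3), 30)                    # three classes, 30 samples each
X = rng.normal(size=(90, 8))                       # eight features, mostly noise
X[:, 0] += 3.0 * (y == 0)                          # feature 0 separates class 0
X[:, 1] += 3.0 * (y == 1)                          # feature 1 separates class 1
curve = rfe_curve(X, y, 3)
best = max(curve, key=lambda n: curve[n][0])
print(best, curve[best])
```

`curve` maps each subset size to its cross-validated accuracy and retained feature indices, so the optimal subset size for a classifier is simply the size with the highest score.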

3.3. Importance of the Feature Indicators

Different classification algorithms selected different feature subsets and assigned different importance to each feature variable (Figure 10). The SVM selected the fewest indicators (10 metrics), of which the most important was the red edge amplitude (D_r, which is sensitive to plant chlorophyll content), followed by vegetation indexes: PRI, which characterizes the photosynthetic activity of plants by tracking xanthophyll dynamics, and WBI and MSI, which are used to estimate plant water concentration. RLR and BPNN selected 25 and 27 indicators, respectively, and both prioritized the absorption band metrics extracted from the continuum-removed spectra. Several vegetation indexes, such as PRI and ARI2 (used to estimate the anthocyanin content of plant leaves), were also highly ranked. The RF model, which achieved the highest classification accuracy, selected various types of metrics, including original band reflectance, vegetation indexes, and feature metrics extracted from the different mathematically transformed spectra, and these metrics contributed more evenly to model performance. In all models, most of the selected indicators were vegetation indexes or metrics extracted from mathematically transformed spectra, especially the continuum-removed spectra, including the absorption area, slope, and symmetry of the 410 nm and 485 nm chlorophyll absorption bands (A₄₁₀, A₄₈₅, K₄₁₀, K₄₈₅, S₄₁₀, S₄₈₅). Original band reflectance was selected by only two classification algorithms (RLR and RF), and only four original bands were selected: Band_{552 nm}, Band_{710 nm}, Band_{718 nm} (related to plant chlorophyll), and Band_{1946 nm} (related to plant water concentration).
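The red edge amplitude D_r is typically defined as the maximum of the first-derivative spectrum within the red-edge region, and its companion red-edge position λ_r as the wavelength where that maximum occurs. The sketch below assumes a 680–760 nm search window and central finite differences; the window, function names, and synthetic sigmoid spectrum are illustrative assumptions rather than the study's exact procedure.

```python
import math


def first_derivative(wl, refl):
    """Central differences in the interior, one-sided differences at the two ends."""
    n = len(wl)
    d = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        d.append((refl[hi] - refl[lo]) / (wl[hi] - wl[lo]))
    return d


def red_edge(wl, refl, lo=680, hi=760):
    """Return (D_r, lambda_r): the maximum first derivative in [lo, hi] nm and its wavelength."""
    d = first_derivative(wl, refl)
    return max((d[i], wl[i]) for i in range(len(wl)) if lo <= wl[i] <= hi)


# Synthetic canopy spectrum: a sigmoid red edge centered at 720 nm (illustrative only)
wl = list(range(650, 801))
refl = [0.05 + 0.45 / (1 + math.exp(-(w - 720) / 12)) for w in wl]
d_r, lambda_r = red_edge(wl, refl)
print(d_r, lambda_r)
```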

3.4. Model Validation and Comparison

The overall classification accuracy of each model was calculated from the confusion matrix, and the discrimination accuracy for each vegetation species was measured by the F1 score; both were estimated with 4-fold cross-validation repeated 10 times (Table 3). For natural vegetation, the SVM model achieved an F1 score of 0.917, whereas the other three classification algorithms achieved a perfect F1 score of 1. The F1 scores of the four reclaimed vegetation species differed more markedly. The overall classification accuracy of RLR was 0.821, with the lowest accuracy for E. nutans (F1 score of only 0.751). The BPNN model had the lowest overall accuracy of the four models (0.817): the F1 scores of P. pratensis, P. crymophila, and E. nutans were all below 0.8, and F. sinensis had the highest F1 score (0.823). The SVM model had an overall accuracy of 0.824, with its highest F1 score among the four reclaimed species for P. pratensis (0.813). The RF model performed best of the four models, with F1 scores above 0.800 for all four reclaimed species and an overall accuracy of 0.871.
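Per-class F1 and overall accuracy can both be read off a confusion matrix, with F1 = 2TP / (2TP + FP + FN) for each class. The pure-Python sketch below computes both and generates the splits for 4-fold cross-validation repeated 10 times; stratifying folds by species is an assumption (this section does not say whether folds were stratified), and the labels and predictions are toy values.

```python
import random


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    pos = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[pos[t]][pos[p]] += 1
    return m


def overall_accuracy(m):
    return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))


def f1_per_class(m):
    """F1 = 2TP / (2TP + FP + FN) for each class (0.0 if the class never occurs)."""
    scores = []
    for i in range(len(m)):
        tp = m[i][i]
        fp = sum(row[i] for row in m) - tp   # predicted i, truly another class
        fn = sum(m[i]) - tp                  # truly i, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def repeated_stratified_kfold(y, k=4, repeats=10, seed=0):
    """Yield (train, test) index lists; each class is dealt round-robin across folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for c in sorted(set(y)):
            members = [i for i, label in enumerate(y) if label == c]
            rng.shuffle(members)
            for j, i in enumerate(members):
                folds[j % k].append(i)
        for fold in folds:
            test = set(fold)
            yield [i for i in range(len(y)) if i not in test], sorted(test)


labels = ["natural", "E. nutans", "P. pratensis"]
y_true = ["natural", "natural", "E. nutans", "E. nutans", "P. pratensis", "P. pratensis"]
y_pred = ["natural", "natural", "E. nutans", "P. pratensis", "P. pratensis", "P. pratensis"]
m = confusion_matrix(y_true, y_pred, labels)
print(overall_accuracy(m), f1_per_class(m))
```

Averaging each species' F1 over all 40 splits (4 folds × 10 repeats) gives one summary value per species and model.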

Overall, the classification models combined with RFE-based selection from the high-dimensional feature set effectively recognized the vegetation species in the coal mine reclamation area. Compared with the original band reflectance, the indicators extracted from the mathematically transformed spectra and the vegetation indexes played a more significant role in each model. The accuracy of the four discriminative models ranked as follows: RF > SVM > RLR > BPNN. These results indicate that RF is the most suitable of the four algorithms for vegetation classification in coal mine reclamation areas with high-dimensional features: it achieved the highest overall classification accuracy and good recognition accuracy for each reclaimed vegetation species.

3.5. Spectral Separability Analysis

The JM distances between the vegetation types were calculated using the optimal feature subset selected by each classifier combined with the RFE algorithm. For all vegetation type pairs, the spectral separability (JM distance) exceeded 1.40, above the criterion for perfect separation between species pairs (JM distance ≥ 1.3718). This further demonstrates that the five species are separable in the feature spaces formed by the indicators that RFE selected for each classifier. The feature space selected by RF combined with RFE yielded the greatest separation among the five vegetation species: only one pair (P. crymophila and E. nutans) had a JM distance of 1.413, while all other species pairs reached 1.414, the maximum possible JM value (√2 ≈ 1.414). In the feature spaces of the other classifiers combined with RFE, most species pairs also had a JM distance of 1.414, with only a small proportion between 1.40 and 1.414. For RLR, two pairs (E. nutans and P. crymophila; F. sinensis and P. pratensis) had JM distances of 1.409 and 1.413, respectively; for BPNN, one pair (F. sinensis and P. pratensis) had a JM distance of 1.412; and for SVM, one pair (E. nutans and P. crymophila) had a JM distance of 1.412.
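Assuming Gaussian class distributions, the JM distance is computed from the Bhattacharyya distance B as JM = √(2(1 − e^(−B))), where B = (1/8)(μ₁ − μ₂)ᵀ Σ⁻¹ (μ₁ − μ₂) + (1/2) ln(|Σ| / √(|Σ₁||Σ₂|)) and Σ = (Σ₁ + Σ₂)/2. The upper bound √2 ≈ 1.414 is the value reported for the best-separated pairs. The NumPy sketch below uses synthetic data; the optional `ridge` term is a hypothetical safeguard for the near-singular covariances that arise when a class has few spectra relative to the number of selected features.

```python
import numpy as np


def jm_distance(a, b, ridge=0.0):
    """Jeffries-Matusita distance between two classes, assuming Gaussian class distributions.

    a, b: arrays of shape (n_samples, n_features). The optional ridge is added to each
    covariance diagonal to stabilize near-singular estimates (a hypothetical safeguard).
    """
    m1, m2 = a.mean(axis=0), b.mean(axis=0)
    p = a.shape[1]
    s1 = np.atleast_2d(np.cov(a, rowvar=False)) + ridge * np.eye(p)
    s2 = np.atleast_2d(np.cov(b, rowvar=False)) + ridge * np.eye(p)
    s = (s1 + s2) / 2
    diff = m1 - m2
    # Bhattacharyya distance: mean-separation term + covariance-mismatch term
    mean_term = diff @ np.linalg.solve(s, diff) / 8
    log_det = lambda c: np.linalg.slogdet(c)[1]     # log|C|, numerically stable
    cov_term = 0.5 * (log_det(s) - 0.5 * (log_det(s1) + log_det(s2)))
    bhat = max(float(mean_term + cov_term), 0.0)    # clip tiny negative rounding errors
    return float(np.sqrt(2 * (1 - np.exp(-bhat))))


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(200, 2))   # synthetic class 1
b = rng.normal(10.0, 1.0, size=(200, 2))  # synthetic class 2, far from class 1
print(jm_distance(a, a), jm_distance(a, b), np.sqrt(2))
```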

4. Discussion

Monitoring the growth of reclaimed vegetation is essential for understanding the effectiveness of ecological restoration projects in mining areas [54]. Remote sensing enables monitoring over large areas, and better discrimination of reclaimed vegetation species can enhance the ability of remote sensing techniques to assess the ecological effects of mine restoration projects [55]. In this study, we analyzed the canopy spectral differences between natural and reclaimed vegetation in mine ecological restoration areas and evaluated how the original and transformed hyperspectral data, the feature selection algorithm, and the classification algorithms affected classification accuracy. These findings can support the subsequent scientific management of mines and the ecological restoration of reclaimed mining areas.

4.1. Differences in Vegetation Canopy Spectra

Analysis of the original and mathematically transformed spectra revealed significant spectral differences between natural and reclaimed vegetation, as well as some differences among the reclaimed species. In the VIS region, the spectra showed the greatest variation in the green peak region and at 650 nm. Although the chemical properties of the vegetation were not directly measured in this study, the spectral differences within the VIS range can be attributed to the presence and concentration of different photosynthetic pigments [56]. In the NIR region, the red-edge positions differed among the vegetation types, which may be due to their different chlorophyll concentrations and water content. A previous study in this restoration area also reported differences in chlorophyll content among several reclaimed vegetation species (Table S1) [11]. At the canopy scale, differences in foliage density, plant height, strata complexity, crown gaps, and clumping can explain the spectral variability among the five vegetation types [57]. For instance, grazing disturbance has dwarfed the natural meadow plants in this area [58]. In contrast, in the reclaimed area, soil covering or the application of organic fertilizer increases the soil nutrients needed for reclaimed vegetation growth, resulting in taller reclaimed plants [59]. This was also observed in our study area, where the average plant height of natural vegetation (8.19 cm) was lower than that of reclaimed vegetation (E. nutans: 18.95 cm; P. pratensis: 28.80 cm; P. crymophila: 39.78 cm; F. sinensis: 34.55 cm) (Table S2). In addition, natural and reclaimed vegetation differed greatly in community structure, with the reclaimed vegetation having more canopy gaps than the natural vegetation (Figure 1d). This canopy structure lowers near-infrared reflectance through photon trapping [60]. The VIS and NIR regions are therefore important for discriminating reclaimed from natural vegetation and can guide the selection of the most appropriate satellite imagery source.

The reclaimed vegetation in this study showed high spectral variability, which can be attributed to the diverse mining environment. For example, flat ground, sunny slopes, and shady slopes have different impacts on vegetation restoration on coal mine spoils [61]. Moving from sunny slopes to flat ground and then to shady slopes has been reported to increase soil water content but reduce light and heat availability [62]. Such differences in hydrothermal conditions among habitats lead to significant differences in plant growth and development [63]. The fractional vegetation cover (FVC) on flat ground and sunny slopes is known to be higher than on shady slopes [61]. Similarly, reclaimed vegetation tends to adjust its physiological and biochemical indicators and molecular composition (e.g., chlorophyll content, soluble sugar, and free proline) to the surrounding environment [11]. All of these factors affect the spectral variability of reclaimed vegetation in the VNIR region.

4.2. Classification Accuracy

The high accuracy achieved in this study can be attributed to the hyperspectral dataset and its mathematical transformations, the RFE algorithm for selecting features from high-dimensional data, and robust classification algorithms (parametric and non-parametric) evaluated with cross-validation. The high spectral resolution data used in this study contain rich information that helps distinguish vegetation species in the mine restoration area. In addition, applying different mathematical transformations to the spectra effectively reduced soil background noise and highlighted spectral differences among the reclaimed vegetation species [64]. Such transformations have also been shown to yield higher classification accuracy than the original spectral reflectance [65]. The intraclass differences within each reclaimed species decreased after the raw spectra were mathematically transformed, especially in the NIR region (Figure 3), and this reduction in intraspecific variation favors species discrimination [66]. Previous studies have shown that mathematically transformed spectra are a promising basis for species classification [67], and this study supports that conclusion: the metrics calculated from the transformed spectra played a greater role in classification than the original spectra.
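One mechanism by which transformations can shrink intraclass differences (our illustration, not an analysis from the study) is that a multiplicative brightness difference between two spectra of the same species, for example from illumination or canopy geometry, becomes a constant offset after the reciprocal-logarithm transform and then vanishes in the first derivative. The sketch below assumes a 5-point quadratic Savitzky–Golay window and synthetic spectra; the study's window size and derivative scheme are not given in this section.

```python
import math

# 5-point quadratic Savitzky-Golay smoothing weights (window length is an assumption)
SG5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def sg_smooth(y):
    """Smooth with SG5; the two samples at each end are copied unchanged."""
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(w * y[i + j - 2] for j, w in enumerate(SG5))
    return out


def log_reciprocal(y):
    """Reciprocal-logarithm transform log10(1 / R)."""
    return [math.log10(1.0 / r) for r in y]


def first_difference(wl, y):
    """Forward-difference first derivative (one value per adjacent band pair)."""
    return [(y[i + 1] - y[i]) / (wl[i + 1] - wl[i]) for i in range(len(y) - 1)]


# Two spectra of the same canopy differing only by a 30% brightness factor (synthetic)
wl = list(range(500, 510))
dim = [0.20 + 0.01 * i for i in range(10)]
bright = [1.3 * r for r in dim]
print(max(abs(p - q) for p, q in zip(dim, bright)))        # raw reflectance difference
d_dim = first_difference(wl, log_reciprocal(sg_smooth(dim)))
d_bright = first_difference(wl, log_reciprocal(sg_smooth(bright)))
print(max(abs(p - q) for p, q in zip(d_dim, d_bright)))    # the constant log offset cancels
```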

The choice of classification algorithm also affects classification accuracy. We selected four classification algorithms that are relatively insensitive to high-dimensional features, and we used regularization to adapt a traditional linear model (logistic regression), which is otherwise poorly suited to high-dimensional datasets. We also included two adaptable non-parametric machine learning methods (SVM and RF), both of which are suitable for processing high-dimensional data. SVM and RF performed well in this study and require relatively few training samples [68], which is particularly useful for studies of reclaimed species with a limited number of field samples.
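The regularization mentioned above adds a penalty on the weight vector to the logistic loss, for example the L2 penalty (λ/2)‖w‖², which keeps the weights bounded even when features outnumber samples and the classes are perfectly separable. The binary NumPy sketch below minimizes this objective by plain gradient descent on synthetic data; it only illustrates the effect of λ and is not the study's multiclass RLR, whose settings were chosen by grid search.

```python
import numpy as np


def fit_l2_logreg(X, y, lam=1.0, lr=0.1, n_iter=2000):
    """Binary logistic regression minimizing mean log-loss + (lam / 2) * ||w||^2.

    Plain batch gradient descent; the intercept b is not penalized.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(y = 1)
        err = prob - y
        w -= lr * (X.T @ err / n + lam * w)          # gradient of loss + penalty
        b -= lr * err.mean()
    return w, b


# More features than samples, as with high-dimensional spectral metrics (synthetic data)
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 100))
y = (X[:, 0] > 0).astype(float)
w_weak, _ = fit_l2_logreg(X, y, lam=1e-4)
w_strong, _ = fit_l2_logreg(X, y, lam=1.0)
print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```

A larger λ trades some training fit for smaller, more stable weights, which is the stabilizing effect described for RLR in Section 3.2.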

The choice of metrics for identifying reclaimed vegetation is another factor affecting classification accuracy. Unlike other studies that used hyperspectral data to identify vegetation in mining areas with classification algorithms insensitive to high-dimensional features [6,10], we used RFE to reduce the dimensionality of the feature set before applying the classification algorithms. Our results showed that RFE could reduce the number of input features without a loss of classification accuracy. Even for classification algorithms that are not strongly affected by high-dimensional features, reducing the input features makes the model more concise.

4.3. Significant Spectral Predictors

The majority of the indicators selected by the models were related to vegetation pigmentation, suggesting that pigmentation differences play a significant role in species discrimination. These indicators included vegetation indexes, spectral metrics, and band reflectance. For example, all classification algorithms selected numerous indicators extracted from mathematically transformed spectra, including the red-edge parameters D_r and λ_r (which are sensitive to plant chlorophyll content) extracted from the derivative spectra. In addition, most metrics were extracted from the continuum-removed spectra, including the absorption valleys at 410 nm, 485 nm, and 635 nm. These absorption characteristics have been correlated with the chlorophyll, lutein, and anthocyanin concentrations of vegetation [69]. Previous studies have indicated that the wavebands most sensitive for vegetation identification lie mostly in the NIR and SWIR regions, which cover a larger reflectance range (greater amplitude) than the VIS region [70]. In this study, however, the selected band reflectances were at 552 nm and 718 nm, with lower amplitude; these bands lie in the VNIR and are highly correlated with plant pigments. Similarly, pigment-related indexes such as PSRI, PRI, ARI1, and ARI2 were also selected, further suggesting that pigmentation differences among vegetation play a critical role in species identification. Notably, ARI1 and ARI2 reflect the content of anthocyanin, a plant pigment that can increase in response to environmental stress and minimize photoinhibition [71]. The indicators selected by the models thus demonstrated species separability to some extent. This may be due to the different responses of reclaimed vegetation to the mining environment, such as light regimes, temperature, and water availability. Specifically, environmental stress in the mining area may alter vegetation pigment content [72], leading to variations in spectral reflectance in the pigment-sensitive bands [73] and thereby enhancing spectral separability between species with differing stress resistance. Vegetation water content can also reflect differences in stress resistance [74]; accordingly, NDWI, WBI, and MSI, which characterize vegetation canopy water content, were also selected and captured canopy spectral differences among the vegetation types.
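The pigment and water indexes discussed above are simple band combinations. The sketch below uses their commonly published forms (PRI = (R531 − R570)/(R531 + R570); PSRI = (R680 − R500)/R750; ARI1 = 1/R550 − 1/R700; ARI2 = R800(1/R550 − 1/R700); NDWI = (R860 − R1240)/(R860 + R1240); WBI = R900/R970; MSI = R1599/R819). These are assumptions: the study's exact formulas are listed in its Table S3 and may use different bands. The spectrum is a synthetic, 1 nm-sampled dict covering 350–2500 nm.

```python
def band(spectrum, nm):
    """Reflectance at the sampled wavelength nearest to nm (spectrum: dict nm -> reflectance)."""
    return spectrum[min(spectrum, key=lambda w: abs(w - nm))]


def vegetation_indices(spectrum):
    """Commonly published forms of the indexes (assumed; see the article's Table S3)."""
    def R(nm):
        return band(spectrum, nm)

    return {
        "PRI": (R(531) - R(570)) / (R(531) + R(570)),     # photosynthetic activity (xanthophyll)
        "PSRI": (R(680) - R(500)) / R(750),               # plant senescence reflectance index
        "ARI1": 1 / R(550) - 1 / R(700),                  # anthocyanin reflectance index 1
        "ARI2": R(800) * (1 / R(550) - 1 / R(700)),       # anthocyanin reflectance index 2
        "NDWI": (R(860) - R(1240)) / (R(860) + R(1240)),  # canopy water (Gao's NDWI)
        "WBI": R(900) / R(970),                           # water band index
        "MSI": R(1599) / R(819),                          # moisture stress index
    }


# Synthetic flat spectrum with three altered bands (illustrative only)
spectrum = {w: 0.3 for w in range(350, 2501)}
spectrum.update({550: 0.1, 700: 0.2, 800: 0.5})
print(vegetation_indices(spectrum))
```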

5. Conclusions

In this study, in situ hyperspectral data of the vegetation canopy were measured in a mine rehabilitation area, and the results demonstrated that these data can distinguish the vegetation species (natural vegetation, P. pratensis, P. crymophila, E. nutans, and F. sinensis) in the mine ecological restoration area. Most of the indicators selected by the RFE algorithm, including NDWI, PSRI, D_r, PRI, S₄₁₀, and A₄₈₅, were vegetation indexes or metrics derived from transformed spectra; these are related to the absorption bands of vegetation pigments and water content in the VIS and NIR spectral regions. This demonstrates the importance of spectral transformation in enhancing the detection of spectral differences among reclaimed vegetation and attenuating soil effects. Among the four classification models, RF outperformed RLR, BPNN, and SVM in identifying reclaimed vegetation species in mining areas, with an overall classification accuracy of 0.871, meeting the accuracy requirements for classification. The results of this study are useful for the reclamation and management of coal mine restoration areas, and provide important information for subsequent management and decision-making in coal mine ecological restoration projects.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs16132372/s1. Table S1: Chemical composition for each vegetation species; Table S2: Summary statistics of plant height for each vegetation species; Table S3: Vegetation index and calculation formula; Table S4: Spectral characteristic parameters and definitions [6,11,75,76,77,78,79,80,81,82,83,84].

Author Contributions

Conceptualization, X.W. and H.X.; methodology, X.W. and H.X.; software, H.X.; validation, X.W.; formal analysis, X.W. and H.X.; investigation, X.W. and H.X.; resources, X.W. and J.Z.; data curation, H.X.; writing—original draft preparation, X.W., H.X., J.Z., X.F., S.S. and X.Y.; writing—review and editing, X.W. and H.X.; visualization, H.X.; supervision, X.W.; project administration, X.W., J.Z. and X.Y.; funding acquisition, X.W., J.Z. and X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Project of the Department of Ecology and Environment of Qinghai Province [2021046279]; and the Natural Science Foundation of Sichuan Province [2022NSFSCO18].

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Feng, H.; Zhou, J.; Zhou, A.; Bai, G.; Li, Z.; Chen, H.; Su, D.; Han, X. Grassland ecological restoration based on the relationship between vegetation and its below-ground habitat analysis in steppe coal mine area. Sci. Total Environ. 2021, 778, 146221. [Google Scholar] [CrossRef] [PubMed]
Li, K.; Liang, T.; Wang, L.; Yang, Z. Contamination and health risk assessment of heavy metals in road dust in Bayan Obo Mining Region in Inner Mongolia, North China. J. Geogr. Sci. 2015, 25, 1439–1451. [Google Scholar] [CrossRef]
Shrestha, R.K.; Lal, R. Changes in physical and chemical properties of soil after surface mining and reclamation. Geoderma 2011, 161, 168–176. [Google Scholar] [CrossRef]
Swab, R.M.; Lorenz, N.; Byrd, S.; Dick, R. Native vegetation in reclamation: Improving habitat and ecosystem function through using prairie species in mine land reclamation. Ecol. Eng. 2017, 108, 525–536. [Google Scholar] [CrossRef]
Xu, H.; Xu, F.; Lin, T.; Xu, Q.; Yu, P.; Wang, C.; Aili, A.; Zhao, X.; Zhao, W.; Zhang, P.; et al. A systematic review and comprehensive analysis on ecological restoration of mining areas in the arid region of China: Challenge, capability and reconsideration. Ecol. Indic. 2023, 154, 110630. [Google Scholar] [CrossRef]
Zhou, B.; Li, H.; Xu, F. Analysis and discrimination of hyperspectral characteristics of typical vegetation leaves in a rare earth reclamation mining area. Ecol. Eng. 2022, 174, 106465. [Google Scholar] [CrossRef]
Ustin, S.L.; Roberts, D.A.; Gamon, J.A.; Asner, G.P.; Green, R.O. Using Imaging Spectroscopy to Study Ecosystem Processes and Properties. BioScience 2004, 54, 523–534. [Google Scholar] [CrossRef]
Nidamanuri, R.R. Hyperspectral discrimination of tea plant varieties using machine learning, and spectral matching methods. Remote Sens. Appl. 2020, 19, 100350. [Google Scholar] [CrossRef]
Karan, S.K.; Samadder, S.R.; Maiti, S.K. Assessment of the capability of remote sensing and GIS techniques for monitoring reclamation success in coal mine degraded lands. J. Environ. Manag. 2016, 182, 272–283. [Google Scholar] [CrossRef]
Sun, H.; Li, M.; Li, D. The vegetation classification in coal mine overburden dump using canopy spectral reflectance. Comput. Electron. Agric. 2011, 75, 176–180. [Google Scholar] [CrossRef]
Sun, Y.; Yao, X.; Li, C.; Xie, Y. Physiological adaptability of three gramineae plants under various vegetation restoration models in mining area of Qinghai-Tibet Plateau. J. Plant Physiol. 2022, 276, 153760. [Google Scholar] [CrossRef] [PubMed]
Xu, D.; Wang, C.; Chen, J.; Shen, M.; Shen, B.; Yan, R.; Li, Z.; Karnieli, A.; Chen, J.; Yan, Y.; et al. The superiority of the normalized difference phenology index (NDPI) for estimating grassland aboveground fresh biomass. Remote Sens. Environ. 2021, 264, 112578. [Google Scholar] [CrossRef]
Waititu, J.M.; Mundia, C.N.; Sichangi, A.W. Spectral discrimination of invasive Lantana camara L. From co-occurring species. Int. J. Remote Sens. 2023, 119, 103307. [Google Scholar] [CrossRef]
Cho, M.A.; Skidmore, A.K.; Atzberger, C. Towards red-edge positions less sensitive to canopy biophysical parameters for leaf chlorophyll estimation using properties optique spectrales des feuilles (PROSPECT) and scattering by arbitrarily inclined leaves (SAILH) simulated data. Int. J. Remote Sens. 2008, 29, 2241–2255. [Google Scholar] [CrossRef]
Fourty, T.; Baret, F. On spectral estimates of fresh leaf biochemistry. Int. J. Remote Sens. 1998, 19, 1283–1297. [Google Scholar] [CrossRef]
Mutanga, O.; Skidmore, A.K. Hyperspectral band depth analysis for a better estimation of grass biomass (Cenchrus ciliaris) measured under controlled laboratory conditions. Int. J. Appl. Earth Obs. Geoinf. 2004, 5, 87–96. [Google Scholar] [CrossRef]
Algamal, Z.Y.; Lee, M.H. Regularized logistic regression with adjusted adaptive elastic net for gene selection in high dimensional cancer classification. Comput. Biol. Med. 2015, 67, 136–145. [Google Scholar] [CrossRef]
Thessler, S.; Sesnie, S.; Ramos Bendaña, Z.S.; Ruokolainen, K.; Tomppo, E.; Finegan, B. Using k-nn and discriminant analyses to classify rain forest types in a Landsat TM image over northern Costa Rica. Remote Sens. Environ. 2008, 112, 2485–2494. [Google Scholar] [CrossRef]
Volpi, M.; Petropoulos, G.P.; Kanevski, M. Flooding extent cartography with Landsat TM imagery and regularized kernel Fisher’s discriminant analysis. Comput. Geosci. 2013, 57, 24–31. [Google Scholar] [CrossRef]
Lottering, R.T.; Govender, M.; Peerbhay, K.; Lottering, S. Comparing partial least squares (PLS) discriminant analysis and sparse PLS discriminant analysis in detecting and mapping Solanum mauritianum in commercial forest plantations using image texture. ISPRS J. Photogramm. Remote Sens. 2020, 159, 271–280. [Google Scholar] [CrossRef]
Cao, Z.; Zhang, S.; Liu, Y.; Smith, C.J.; Sherman, A.M.; Hwang, Y.; Simpson, G.J. Spectral classification by generative adversarial linear discriminant analysis. Anal. Chim. Acta 2023, 1261, 341129. [Google Scholar] [CrossRef] [PubMed]
Souza, J.d.C.; Soares, S.F.C.; de Paula, L.C.M.; Coelho, C.J.; de Araújo, M.C.U.; Silva, E.C.d. Bat algorithm for variable selection in multivariate classification modeling using linear discriminant analysis. Microchem. J. 2023, 187, 108382. [Google Scholar] [CrossRef]
Li, L.; Liu, Z.-P. Biomarker discovery for predicting spontaneous preterm birth from gene expression data by regularized logistic regression. Comput. Struct. Biotechnol. J. 2020, 18, 3434–3446. [Google Scholar] [CrossRef] [PubMed]
Cantorna, D.; Dafonte, C.; Iglesias, A.; Arcay, B. Oil spill segmentation in SAR images using convolutional neural networks. A comparative analysis with clustering and logistic regression algorithms. Appl. Soft Comput. 2019, 84, 105716. [Google Scholar] [CrossRef]
Chunhui, Z.; Bing, G.; Lejun, Z.; Xiaoqing, W. Classification of Hyperspectral Imagery based on spectral gradient, SVM and spatial random forest. Infrared Phys. Technol. 2018, 95, 61–69. [Google Scholar] [CrossRef]
Hughes, G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 1968, 14, 55–63. [Google Scholar] [CrossRef]
Bajcsy, P.; Groves, P. Methodology for Hyperspectral Band Selection. Photogramm. Eng. Remote Sens. 2004, 70, 793–802. [Google Scholar] [CrossRef]
Almeida, C.T.d.; Galvão, L.S.; Aragão, L.E.d.O.C.e.; Ometto, J.P.H.B.; Jacon, A.D.; Pereira, F.R.d.S.; Sato, L.Y.; Lopes, A.P.; Graça, P.M.L.d.A.; Silva, C.V.d.J.; et al. Combining LiDAR and hyperspectral data for aboveground biomass modeling in the Brazilian Amazon using different regression algorithms. Remote Sens. Environ. 2019, 232, 111323. [Google Scholar] [CrossRef]
Shao, Q.; Cao, W.; Fan, J.; Huang, L.; Xu, X. Effects of an ecological conservation and restoration project in the Three-River Source Region, China. J. Geog. Sci. 2017, 27, 183–204. [Google Scholar] [CrossRef]
Zhang, H.; Li, X.; Li, L.; Zhang, J. Effects of Species Combination on Community Diversity and Productivity of Alpine Artificial Grassland. Acta Agrestia Sin. 2020, 28, 1436–1443. [Google Scholar] [CrossRef]
Tsai, F.; Philpot, W. Derivative Analysis of Hyperspectral Data. Remote Sens. Environ. 1998, 66, 41–51. [Google Scholar] [CrossRef]
Bielza, C.; Robles, V.; Larrañaga, P. Regularized logistic regression without a penalty term: An application to cancer classification with microarray data. Expert Syst. Appl. 2011, 38, 5110–5118. [Google Scholar] [CrossRef]
Zhang, X.; Akber, M.Z.; Zheng, W. Predicting the slump of industrially produced concrete using machine learning: A multiclass classification approach. J. Build. Eng. 2022, 58, 104997. [Google Scholar] [CrossRef]
Park, M.Y.; Hastie, T. L1-Regularization Path Algorithm for Generalized Linear Models. J. R. Stat. Soc. B 2007, 69, 659–677. [Google Scholar] [CrossRef]
Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
Kisi, O.; Kerem Cigizoglu, H. Comparison of different ANN techniques in river flow prediction. Civ. Eng. Environ. Syst. 2007, 24, 211–231. [Google Scholar] [CrossRef]
Hammerstrom, D. Working with neural networks. IEEE Spectr. 1993, 30, 46–53. [Google Scholar] [CrossRef]
Chen, Y.; Chen, B.; Yao, Y.; Tan, C.; Feng, J. A spectroscopic method based on support vector machine and artificial neural network for fiber laser welding defects detection and classification. NDT E Int. 2019, 108, 102176. [Google Scholar] [CrossRef]
Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259. [Google Scholar] [CrossRef]
Zhong, Z.; Carr, T.R. Application of mixed kernels function (MKF) based support vector regression model (SVR) for CO₂—Reservoir oil minimum miscibility pressure prediction. Fuel 2016, 184, 590–603. [Google Scholar] [CrossRef]
Caraka, R.E.; Bakar, S.A.; Tahmid, M. Rainfall forecasting multi kernel support vector regression seasonal autoregressive integrated moving average (MKSVR-SARIMA). AIP Conf. Proc. 2019, 2111, 020014. [Google Scholar] [CrossRef]
Cheng, K.; Lu, Z.; Wei, Y.; Shi, Y.; Zhou, Y. Mixed kernel function support vector regression for global sensitivity analysis. Mech. Syst. Signal Process. 2017, 96, 201–214. [Google Scholar] [CrossRef]
Zhu, B.; Ye, S.; Wang, P.; Chevallier, J.; Wei, Y.-M. Forecasting carbon price using a multi-objective least squares support vector machine with mixture kernels. J. Forecast. 2022, 41, 100–117. [Google Scholar] [CrossRef]
Monnet, J.M.; Chanussot, J.; Berger, F. Support Vector Regression for the Estimation of Forest Stand Parameters Using Airborne Laser Scanning. IEEE Geosci. Remote Sens. Lett. 2011, 8, 580–584. [Google Scholar] [CrossRef]
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Cheng, S.; Yang, X.; Yang, G.; Chen, B.; Chen, D.; Wang, J.; Ren, K.; Sun, W. Using ZY1-02D satellite hyperspectral remote sensing to monitor landscape diversity and its spatial scaling change in the Yellow River Estuary. Int. J. Appl. Earth Obs. Geoinf. 2024, 128, 103716. [Google Scholar] [CrossRef]
Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31. [Google Scholar] [CrossRef]
Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef]
Bergmeir, C.; Hyndman, R.J.; Koo, B. A note on the validity of cross-validation for evaluating autoregressive time series prediction. Comput. Stat. Data Anal. 2018, 120, 70–83. [Google Scholar] [CrossRef]
Schmidt, K.S.; Skidmore, A.K. Spectral discrimination of vegetation types in a coastal wetland. Remote Sens. Environ. 2003, 85, 92–108.
Adam, E.; Mutanga, O. Spectral discrimination of papyrus vegetation (Cyperus papyrus L.) in swamp wetlands using field spectrometry. ISPRS J. Photogramm. Remote Sens. 2009, 64, 612–620.
Cho, M.A.; Debba, P.; Mutanga, O.; Dudeni-Tlhone, N.; Magadla, T.; Khuluse, S.A. Potential utility of the spectral red-edge region of SumbandilaSat imagery for assessing indigenous forest structure and health. Int. J. Appl. Earth Obs. Geoinf. 2012, 16, 85–93.
Zipper, C.E.; Burger, J.A.; Skousen, J.G.; Angel, P.N.; Barton, C.D.; Davis, V.; Franklin, J.A. Restoring Forests and Associated Ecosystem Services on Appalachian Coal Surface Mines. Environ. Manag. 2011, 47, 751–765.
Bao, N.-s.; Wu, L.-x.; Liu, S.-j.; Li, N. Scale parameter optimization through high-resolution imagery to support mine rehabilitated vegetation classification. Ecol. Eng. 2016, 97, 130–137.
Asner, G.P. Biophysical and Biochemical Sources of Variability in Canopy Reflectance. Remote Sens. Environ. 1998, 64, 234–253.
Ollinger, S. Sources of variability in canopy reflectance and the convergent properties of plants. New Phytol. 2010, 189, 375–394.
Niu, Y.; Yang, S.; Wang, G.; Liu, L.; Hua, L. Effects of grazing disturbance on plant diversity, community structure and direction of succession in an alpine meadow on Tibet Plateau, China. Acta Ecol. Sin. 2018, 38, 179–185.
Vloon, C.; Evju, M.; Klanderud, K.; Hagen, D. Alpine restoration: Planting and seeding of native species facilitate vegetation recovery. Restor. Ecol. 2021, 30, e13479.
Fernandes, M.R.; Aguiar, F.C.; Silva, J.M.N.; Ferreira, M.T.; Pereira, J.M.C. Spectral discrimination of giant reed (Arundo donax L.): A seasonal study in riparian areas. ISPRS J. Photogramm. Remote Sens. 2013, 80, 80–90.
Jin, L.; Li, X.; Sun, H.; Zhang, J.; Zhou, W. Characteristics of vegetations and soils under different aspects of slag mountain in alpine mining area. Soil 2020, 52, 831–839.
Wang, J.; Wang, H.; Cao, Y.; Bai, Z.; Qin, Q. Effects of soil and topographic factors on vegetation restoration in opencast coal mine dumps located in a loess area. Sci. Rep. 2016, 6, 22058.
Pan, J.; Bai, Z.; Cao, Y.; Zhou, W.; Wang, J. Influence of soil physical properties and vegetation coverage at different slope aspects in a reclaimed dump. Environ. Sci. Pollut. Res. 2017, 24, 23953–23965.
Peñuelas, J.; Gamon, J.A.; Fredeen, A.L.; Merino, J.; Field, C.B. Reflectance indices associated with physiological changes in nitrogen- and water-limited sunflower leaves. Remote Sens. Environ. 1994, 48, 135–146.
Dao, P.D.; He, Y.; Proctor, C. Plant drought impact detection using ultra-high spatial resolution hyperspectral images and machine learning. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102364.
Erudel, T.; Fabre, S.; Houet, T.; Mazier, F.; Briottet, X. Criteria Comparison for Classifying Peatland Vegetation Types Using In Situ Hyperspectral Measurements. Remote Sens. 2017, 9, 748.
Harrison, D.; Rivard, B.; Sánchez-Azofeifa, A. Classification of tree species based on longwave hyperspectral data from leaves, a case study for a tropical dry forest. Int. J. Appl. Earth Obs. Geoinf. 2018, 66, 93–105.
Guzmán, Q.J.A.; Laakso, K.; López-Rodríguez, J.C.; Rivard, B.; Sánchez-Azofeifa, G.A. Using visible-near-infrared spectroscopy to classify lichens at a Neotropical Dry Forest. Ecol. Indic. 2020, 111, 105999.
Asner, G.P.; Jones, M.O.; Martin, R.E.; Knapp, D.E.; Hughes, R.F. Remote sensing of native and invasive species in Hawaiian forests. Remote Sens. Environ. 2008, 112, 1912–1926.
Bradter, U.; O’Connell, J.; Kunin, W.E.; Boffey, C.W.H.; Ellis, R.J.; Benton, T.G. Classifying grass-dominated habitats from remotely sensed data: The influence of spectral resolution, acquisition time and the vegetation classification system on accuracy and thematic resolution. Sci. Total Environ. 2020, 711, 134584.
van den Berg, A.; Perkins, T. Nondestructive Estimation of Anthocyanin Content in Autumn Sugar Maple Leaves. HortScience 2005, 40, 685–686.
Zhang, X.; Li, M.; Yang, H.; Li, X.; Cui, Z. Physiological responses of Suaeda glauca and Arabidopsis thaliana in phytoremediation of heavy metals. J. Environ. Manag. 2018, 223, 132–139.
Dutta, S.; Narayan, K.S. Spectroscopic studies of photoinduced transport in polymer field effect transistors. Synth. Met. 2005, 155, 328–331.
Li, R.; Yan, C.; Zhao, Y.; Wang, P.; Qiu, G.Y. Discriminating growth stages of an endangered Mediterranean relict plant (Ammopiptanthus mongolicus) in the arid Northwest China using hyperspectral measurements. Sci. Total Environ. 2019, 657, 270–278.
Sims, D.A.; Gamon, J.A. Relationships between leaf pigment content and spectral reflectance across a wide range of species, leaf structures and developmental stages. Remote Sens. Environ. 2002, 81, 337–354.
Vogelmann, J.; Rock, B.; Moss, D.M. Red edge spectral measurements from sugar maple leaves. Int. J. Remote Sens. 1993, 14, 1563–1575.
Gamon, J.A.; Peñuelas, J.; Field, C.B. A narrow-waveband spectral index that tracks diurnal changes in photosynthetic efficiency. Remote Sens. Environ. 1992, 41, 35–44.
Penuelas, J.; Filella, I.; Lloret, P.; Munoz, F.; Vilajeliu, M. Reflectance Assessment of Mite Effects on Apple-Trees. Int. J. Remote Sens. 1995, 16, 2727–2733.
Hunt, E.R.; Rock, B.N. Detection of changes in leaf water content using Near- and Middle-Infrared reflectances. Remote Sens. Environ. 1989, 30, 43–54.
Gitelson, A.; Keydan, G.; Merzlyak, M.; Gitelson, C. Three-Band Model for Noninvasive Estimation of Chlorophyll Carotenoids and Anthocyanin Contents in Higher Plant Leaves. Geophys. Res. Lett. 2006, 33.
Penuelas, J.; Pinol, J.; Ogaya, R.; Filella, I. Estimation of plant water concentration by the reflectance Water Index WI (R900/R970). Int. J. Remote Sens. 1997, 18, 2869–2875.
Gao, B.-c. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 1996, 58, 257–266.
Li, X.; Wei, Z.; Peng, F.; Liu, J.; Han, G. Estimating the distribution of chlorophyll content in CYVCV infected lemon leaf using hyperspectral imaging. Comput. Electron. Agric. 2022, 198, 107036.
Kokaly, R.F.; Asner, G.P.; Ollinger, S.V.; Martin, M.E.; Wessman, C.A. Characterizing canopy biochemistry from imaging spectroscopy and its application to ecosystem studies. Remote Sens. Environ. 2009, 113, S78–S91.

Figure 1. Description of the study area: (a) location of the study area; (b,c) natural color and false color infrared images of the study area. Data source: Sentinel-2B satellite image (bands 4, 3, 2) acquired on 7 September 2021 at a spatial resolution of 10 m. (d) Photos of natural vegetation, E. nutans, P. pratensis, P. crymophila, and F. sinensis. FVC represents fractional vegetation cover.

Figure 2. Flowchart of data analysis.

Figure 3. Original and transformed spectral curves of the reclaimed vegetation: original spectra (OR), reciprocal logarithm transformed spectra (LogR⁻¹), first-order derivative transformed spectra (d(R)), and continuum-removed spectra (CR).
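The LogR⁻¹ and d(R) transforms named in Figure 3 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' processing chain: the base-10 logarithm, the finite-difference scheme of `np.gradient`, and the toy sigmoid spectrum are all choices made here for illustration.

```python
import numpy as np


def reciprocal_log(reflectance):
    """Reciprocal logarithm transform log10(1/R); base 10 is an assumption."""
    r = np.asarray(reflectance, dtype=float)
    return np.log10(1.0 / r)


def first_derivative(wavelengths, reflectance):
    """First-order derivative dR/dλ by finite differences.

    np.gradient uses central differences in the interior and one-sided
    differences at the two ends of the spectrum.
    """
    return np.gradient(np.asarray(reflectance, dtype=float),
                       np.asarray(wavelengths, dtype=float))


# Hypothetical toy spectrum (not measured data): a sigmoid "red edge"
# sampled every 1 nm between 680 and 760 nm.
wl = np.arange(680, 761, 1.0)
refl = 0.05 + 0.40 / (1.0 + np.exp(-(wl - 720.0) / 8.0))

log_inv = reciprocal_log(refl)
d1 = first_derivative(wl, refl)
print("steepest rise at", wl[np.argmax(d1)], "nm")
```

Passing the wavelength array to `np.gradient` keeps d(R) in reflectance per nanometre even if the sampling interval is not 1 nm.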

Figure 4. (a) Mean spectral reflectance curves of the vegetation species and the reflectance curves in the focal spectral regions: (b) ultraviolet region (350–400 nm), (c) blue region (400–500 nm), (d) green-red region (500–700 nm), and (e) red-edge region (680–750 nm).

Figure 5. (a) Mean reciprocal logarithm transformed spectral curves of the vegetation species and the corresponding curves in the focal spectral regions: (b) ultraviolet region (350–400 nm), (c) blue region (400–500 nm), (d) green-red region (500–700 nm), and (e) red-edge region (680–750 nm).

Figure 6. (a) Mean first derivative spectral curves of the vegetation species and the corresponding curves in the focal spectral regions: (b) ultraviolet region (350–400 nm), (c) blue region (400–500 nm), (d) green-red region (500–700 nm), and (e) red-edge region (680–750 nm).

Figure 7. (a) Continuum-removed spectral curves of the vegetation species and (b–g) absorption bands in the continuum-removed reflectance spectra.

Figure 8. Absorption characteristics of the mean continuum-removed spectral curves of the vegetation species: (a) absorption area, (b) absorption slope, and (c) absorption symmetry.
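Continuum removal and the absorption descriptors shown in Figures 7 and 8 can be illustrated with a short NumPy sketch. It assumes the common convex-hull definition of the continuum and defines absorption area as the integral of 1 - CR over a window, depth as the maximum of 1 - CR, and symmetry as the area left of the absorption center divided by the area right of it. These definitions, the helper names, and the toy spectrum are assumptions, since this section does not give the authors' formulas; absorption slope is not computed here.

```python
import numpy as np


def _trapezoid(x, y):
    """Trapezoidal integral of y over x (written out to avoid NumPy-version differences)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def continuum_removed(wl, refl):
    """Divide reflectance by its continuum, taken here as the upper convex hull."""
    wl = np.asarray(wl, dtype=float)
    refl = np.asarray(refl, dtype=float)
    hull = []  # indices of upper-hull vertices, built left to right (monotone chain)
    for i in range(len(wl)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # cross >= 0: point i lies on or above the line a->b, so b is not on the upper hull
            cross = (wl[b] - wl[a]) * (refl[i] - refl[a]) - (refl[b] - refl[a]) * (wl[i] - wl[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    continuum = np.interp(wl, wl[hull], refl[hull])  # straight segments between hull vertices
    return refl / continuum


def absorption_features(wl, cr, lo, hi):
    """Center, depth, area and left/right area symmetry of one absorption feature in [lo, hi] nm."""
    wl = np.asarray(wl, dtype=float)
    cr = np.asarray(cr, dtype=float)
    mask = (wl >= lo) & (wl <= hi)
    x, a = wl[mask], 1.0 - cr[mask]  # band depth relative to the continuum
    k = int(np.argmax(a))            # absorption center = deepest point
    left = _trapezoid(x[: k + 1], a[: k + 1])
    right = _trapezoid(x[k:], a[k:])
    symmetry = left / right if right > 0 else float("inf")
    return {"center": float(x[k]), "depth": float(a[k]),
            "area": left + right, "symmetry": symmetry}


# Hypothetical toy spectrum: sloping continuum with a Gaussian absorption centred at 670 nm
wl = np.arange(600, 741, 1.0)
refl = (0.2 + 0.001 * (wl - 600)) * (1 - 0.3 * np.exp(-((wl - 670) / 15.0) ** 2))
cr = continuum_removed(wl, refl)
print(absorption_features(wl, cr, 600, 740))
```

Because the hull lies on or above every sample, the continuum-removed values never exceed 1, and they equal 1 exactly at the hull vertices.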

Figure 9. Effect of feature subset size on the accuracy of the classification models. The selected feature size is the subset size that yielded the highest accuracy.
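Figure 9 summarizes a recursive feature elimination (RFE) run. The loop below is a generic backward-elimination sketch, not the authors' R implementation: `fit_and_score` is a hypothetical callback that should refit a classifier on the given indicator subset and return a resampled accuracy together with per-indicator importances. The indicator names, weights, and toy scorer are made up, and resolving ties in favour of the smaller subset is an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scorer = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def recursive_feature_elimination(features: Sequence[str], fit_and_score: Scorer):
    """Backward RFE: score the current subset, drop its least important
    feature, and repeat down to a single feature; keep the best subset."""
    current = list(features)
    history: Dict[int, float] = {}  # subset size -> accuracy
    best_subset: List[str] = list(current)
    best_acc = float("-inf")
    while current:
        acc, importance = fit_and_score(current)
        history[len(current)] = acc
        if acc >= best_acc:  # ties favour the smaller subset (an assumption)
            best_acc, best_subset = acc, list(current)
        weakest = min(current, key=lambda f: importance[f])
        current.remove(weakest)
    return best_subset, history


# Hypothetical indicators and a toy scorer standing in for a resampled model fit
WEIGHTS = {"REP": 0.5, "CR_area": 0.3, "LogR_450": 0.1, "noise1": 0.0, "noise2": 0.0}


def toy_fit_and_score(subset):
    n_noise = sum(1 for f in subset if WEIGHTS[f] == 0.0)
    acc = 0.5 + 0.5 * sum(WEIGHTS[f] for f in subset) - 0.01 * n_noise
    return acc, {f: WEIGHTS[f] for f in subset}


best, history = recursive_feature_elimination(list(WEIGHTS), toy_fit_and_score)
print(best, history)
```

In practice the accuracy returned by the callback should come from cross-validation or held-out data; otherwise the curve in a plot like Figure 9 would be optimistically biased.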

Figure 10. Importance ranking of the feature indicators selected for each classification method.

Table 1. Description of the classification algorithms utilized, including the parameters considered, the basis for feature ranking, and the R package used.

Type	Abbr.	Model	Parameters	Feature Rank Criteria	R Package
Parametric Model	RLR	Regularized Logistic Regression	cost = 1; loss = L1, L2_dual, L2_primal; epsilon = 0.001, 0.00325, 0.0055, 0.00775, 0.01	Decrease in accuracy value by permuting a variable *	LiblineaR
Non-parametric Model	BPNN	Back Propagation Neural Network	size = 1, 2, …, 20; decay = 0, 0.1	Combinations of the absolute values of the weights	nnet
Non-parametric Model	SVM	Support Vector Machines with Radial Basis Function Kernel	cost = $10^{n}$ ($n = -4, -3, \dots, 3$); sigma = 0.1, 0.2, …, 1	Decrease in accuracy value by permuting a variable *	kernlab
Non-parametric Model	RF	Random Forest	n.tree = 300, 500, 700, 900, 1000, 1500; mtry = $\sqrt{k}$ ($k$ is the number of input indicators)	Decrease in accuracy value by permuting a variable	randomForest

* Implemented with the vip package. The final selected parameter is marked in bold italics.
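Table 1 ranks features for RLR, SVM, and RF by the decrease in accuracy when a variable is permuted. The function below is a minimal NumPy sketch of that permutation-importance idea for any model exposed as a `predict` function; it is not the vip or randomForest implementation, and `n_repeats`, the seed, and the toy model are assumptions made for illustration.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean decrease in accuracy when one column of X is randomly shuffled."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    baseline = np.mean(predict(X) == y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link with y while keeping its distribution
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += baseline - np.mean(predict(X_perm) == y)
    return drops / n_repeats


def predict_toy(data):
    """Hypothetical fitted model: class 1 when the first indicator is positive."""
    return (data[:, 0] > 0).astype(int)


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)
print(permutation_importance(predict_toy, X, y))
```

A feature the model ignores gets an importance of exactly zero, while a feature the model relies on shows a large accuracy drop once shuffled.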

Table 2. Mean values of the trilateral parameters extracted from the first derivative spectra. $D_{b}$, $D_{y}$, and $D_{r}$ are the blue-, yellow-, and red-edge amplitudes in the first derivative spectra, and $λ_{b}$, $λ_{y}$, and $λ_{r}$ are the wavelengths (nm) at which they occur.

Vegetation Type	$D_{b}$	$λ_{b}$ (nm)	$D_{y}$	$λ_{y}$ (nm)	$D_{r}$	$λ_{r}$ (nm)
Natural Vegetation	0.000954	525	0.000113	629	0.005023	718
P. pratensis	0.001055	522	0.000199	629	0.008661	730
P. crymophila	0.000799	518	0.000547	629	0.006918	716
E. nutans	0.001285	522	0.000133	629	0.006486	719
F. sinensis	0.001421	521	0.000019	629	0.009942	728
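The trilateral parameters in Table 2 can be read off a first derivative spectrum by locating the peak within each edge window. In this sketch the amplitude is taken as the largest first-derivative value in the window and λ as the wavelength where it occurs; the window bounds in `EDGE_WINDOWS` are hypothetical placeholders rather than the bounds used in the study, and the synthetic derivative spectrum is made up.

```python
import numpy as np

# Hypothetical edge windows (nm); replace with the bounds actually used for each edge.
EDGE_WINDOWS = {"b": (490, 530), "y": (560, 640), "r": (680, 760)}


def trilateral_parameters(wl, d1, windows=EDGE_WINDOWS):
    """Amplitude D (largest first-derivative value) and its wavelength in each edge window."""
    wl = np.asarray(wl, dtype=float)
    d1 = np.asarray(d1, dtype=float)
    params = {}
    for edge, (lo, hi) in windows.items():
        mask = (wl >= lo) & (wl <= hi)
        k = int(np.argmax(d1[mask]))
        params[f"D_{edge}"] = float(d1[mask][k])
        params[f"lambda_{edge}"] = float(wl[mask][k])
    return params


# Hypothetical first-derivative spectrum: three Gaussian peaks, not measured data
wl = np.arange(480, 771, 1.0)
d1 = (0.001 * np.exp(-((wl - 520) / 5.0) ** 2)
      + 0.0002 * np.exp(-((wl - 629) / 5.0) ** 2)
      + 0.008 * np.exp(-((wl - 720) / 5.0) ** 2))
print(trilateral_parameters(wl, d1))
```

Averaging the returned values over all samples of a species would give per-species means in the layout of Table 2.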

Table 3. F1 scores for discriminating each species and overall classification accuracy of each model.

Model	Natural Vegetation (F1)	P. pratensis (F1)	P. crymophila (F1)	E. nutans (F1)	F. sinensis (F1)	Overall Accuracy
RLR	1.000	0.768	0.808	0.751	0.755	0.821
BPNN	1.000	0.766	0.726	0.766	0.823	0.817
SVM	0.917	0.813	0.810	0.771	0.810	0.824
RF	1.000	0.838	0.827	0.824	0.859	0.871
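The per-species F1 scores and overall accuracies in Table 3 follow the standard definitions: F1 is the harmonic mean of precision and recall for one class, and overall accuracy is the fraction of correctly classified samples. A small sketch computing both from a confusion matrix (the labels and predictions in the example are hypothetical):

```python
import numpy as np


def f1_and_accuracy(y_true, y_pred, labels):
    """Per-class F1 scores and overall accuracy from a confusion matrix."""
    idx = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)  # rows: true class, columns: predicted
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)
    actual = cm.sum(axis=1)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return dict(zip(labels, f1)), float(accuracy)


# Hypothetical two-class example
f1, acc = f1_and_accuracy(["A", "A", "B", "B"], ["A", "B", "B", "B"], ["A", "B"])
print(f1, acc)
```

Classes with no predicted or no true samples get an F1 of zero instead of a division error.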

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Wang, X.; Xu, H.; Zhou, J.; Fang, X.; Shuai, S.; Yang, X. Analysis of Vegetation Canopy Spectral Features and Species Discrimination in Reclamation Mining Area Using In Situ Hyperspectral Data. Remote Sens. 2024, 16, 2372. https://doi.org/10.3390/rs16132372

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
