Identifying the Restoration Stages of Degraded Alpine Meadow Patches Using Hyperspectral Imaging and Machine Learning Techniques

Luo, Wei; Wang, Lu; Cui, Lulu; Zheng, Min; Li, Xilai; Li, Chengyi

doi:10.3390/agriculture14071097


by Wei Luo ¹,², Lu Wang ¹,²,*, Lulu Cui ¹,², Min Zheng ³, Xilai Li ³ and Chengyi Li ³

¹ College of Computer Technology and Applications, Qinghai University, Ningzhang Road, Xining 810016, China
² Qinghai Provincial Laboratory for Intelligent Computing and Application, Qinghai University, Ningzhang Road, Xining 810016, China
³ College of Agriculture and Animal Husbandry, Qinghai University, Ningzhang Road, Xining 810016, China
* Author to whom correspondence should be addressed.

Agriculture 2024, 14(7), 1097; https://doi.org/10.3390/agriculture14071097

Submission received: 5 June 2024 / Revised: 4 July 2024 / Accepted: 5 July 2024 / Published: 9 July 2024

(This article belongs to the Section Digital Agriculture)


Abstract:

The accurate identification of different restoration stages of degraded alpine meadow patches is essential to effectively curb the deterioration trend of ‘Heitutan’ (areas of severely degraded alpine meadows in western China). In this study, hyperspectral imaging (HSI) and machine learning techniques were used to develop a method for accurately distinguishing the different restoration stages of alpine meadow patches. First, hyperspectral images representing the four restoration stages of degraded alpine meadow patches were collected, and spectral reflectance, vegetation indexes (VIs), color features (CFs), and texture features (TFs) were extracted. Secondly, valid features were selected by competitive adaptive reweighted sampling (CARS), ReliefF, recursive feature elimination (RFE), and F-test algorithms. Finally, four machine learning models, including the support vector machine (SVM), k-nearest neighbor (KNN), random forest (RF), and extreme gradient boosting (XGBoost), were constructed. The results demonstrated that the SVM model based on the optimal wavelengths (OWs) and prominent VIs achieved the best value of accuracy (0.9320), precision (0.9369), recall (0.9308), and F1 score (0.9299). In addition, the models that combine multiple sets of preferred features showed a significant performance improvement over the models that relied only on a single set of preferred features. Overall, the method combined with HSI and machine learning technology showed excellent reliability and effectiveness in identifying the restoration stages of meadow patches, and provided an effective reference for the formulation of grassland degradation management measures.

Keywords:

degraded alpine meadow patches; hyperspectral imaging; machine learning; feature fusion

1. Introduction

Grassland is an important part of terrestrial ecosystems, accounting for 40.5% of the total global land area (except ice caps and ice sheets) [1,2], and has important ecological and food production functions [3,4]. Alpine meadows are the main grassland ecosystems on the Tibetan Plateau, and their unique biodiversity and ecological functions are important for maintaining the ecological stability of the Tibetan Plateau region and the livelihoods of local herders [5]. However, in recent years, alpine meadows have experienced the emergence of diverse types of bald patches, characterized by limited or absent vegetation cover. These patches have arisen primarily due to overgrazing and rodent activities, which may have been compounded or intensified by the effects of climate change, such as altered precipitation patterns and increased temperature extremes [6]. As shown in Figure 1B, the restoration stages of alpine meadow patches are mainly divided into active patches (Stage 0), inactive patches (Stage 1), recovering patches (Stage 2), and healthy alpine meadow (Stage 3). When active patches multiply, become connected, and merge into large areas of bare land, continued degradation turns the meadow into ‘Heitutan’ degraded grassland (Figure 1A), which seriously affects the production and livelihoods of local herders [7]. Xilai [8] pointed out that Heitutan degraded grassland can be recovered, but the recovery requirements for different degrees of Heitutan degradation vary greatly. Based on model simulations, Heitutan degraded grassland can form after 21 years of high-intensity grazing. Recovery requires at least 50 years with virtually no external interference, while under typical grazing conditions, the recovery period for Heitutan ranges from 115 to 500 years [9,10].

The timely intervention and development of management measures at the stages of alpine meadow patchiness can effectively reduce the difficulty of the subsequent management of Heitutan degraded grasslands. Previous studies mainly focused on the distribution of patches in degraded alpine meadows, succession patterns, soil microorganisms, etc. Huo et al. [11] carried out patch-scale investigations in alpine meadows at different stages of degradation in the Tibetan Plateau. The study revealed that the changes in patch properties and vegetation shifts in alpine meadows were mainly affected by climate change, human activities, and soil erosion. This research underscores the importance of patch dynamics as indicators of alpine meadow degradation, providing valuable insights for sustainable grassland management. Song et al. [12] used model simulations and field experiments to investigate the impact of grazing activities on the construction of plant communities in the degradation of grasslands. The study revealed the remodeling effects of environmental variables on plant community structure under patchy degradation scenarios, and the effects of different herbivores on plant community construction during grassland degradation. Du et al. [13] focused on the phenomenon of patchiness in degraded alpine meadows of the Sanjiangyuan, and analyzed the characteristics of plant communities at the center and edge of the patches and the role of their associated soil physical properties in combating soil erosion through field investigations and laboratory tests. The results showed that the root–soil complex in the patches had a positive effect on the erosion resistance of degraded soils. Duan et al. [14] carried out a field-based biodiversity study on patches with different degradation levels in alpine meadows and explored the positive effects of β-diversity of fungi on soil multifunctionality in the process of the natural restoration of degraded patches in alpine meadows. 
Additionally, they elucidated the key roles of soil pH and moisture in regulating the relationship between microbial diversity and soil function. These findings provide an important scientific basis for improving the recovery ability of degraded grassland ecosystems. Li et al. [15] conducted an enclosure experiment on patchy degraded alpine meadows in the Yellow River source area of the Tibetan Plateau. The results showed that a 5-year enclosure could effectively improve soil nutrients and carbon sequestration, and could maintain grassland productivity without long-term enclosures. These studies mostly rely on field surveys, which are subjective and destructive to alpine meadows [16], whereas few studies have examined alpine meadow patches objectively and non-destructively using images.

In recent years, hyperspectral imaging (HSI) technology has been widely used in smart agriculture because of its ability to capture the rich spectral information of features [17]. HSI technology combines the advantages of machine vision and spectral analysis and is able to acquire three-dimensional data cubes containing numerous consecutive spectral bands. Through the in-depth analysis of these data, the detailed spectral features of each pixel point can be used as an effective prediction basis for stage identification [18,19]. In addition, machine learning techniques have demonstrated excellent capabilities in processing large amounts of complex data. By combining different features of images, efficient regression and classification models can be constructed. Mansour et al. [20] studied grassland degradation in Okhombe shared grazing land in South Africa by analyzing canopy hyperspectral reflectance to differentiate between four classes of grasses representing different levels of degradation. They used the random forest (RF) algorithm combined with the forward variable selection technique to select a set of effective feature sets containing eight key wavelengths among a large number of wavelengths, achieving good classification accuracy (88.64%). Guan et al. [21] applied hyperspectral technology to the estimation of soil organic matter content in the degraded grassland of Sanjiangyuan, and confirmed the excellent performance of the RF model in predicting soil organic matter content. Gu et al. [22] conducted a study on the non-destructive detection of early tomato spotted wilt virus (TSWV) infection in tobacco by using HSI and machine learning technology. They used multiple wavelength selection methods with different classification models for comparative analysis, and finally determined that the model combining successive projection algorithms and boosted regression tree performed the best, with an accuracy of 85.2%. 
This consequently provides an effective method to achieve a fast and accurate non-invasive diagnosis of TSWV in early stages. Fu et al. [23] innovatively fused multispectral images from the Jilin-1 satellite (JL101K) and UAV platforms, and used the Gram–Schmidt algorithm to improve image quality, then made a breakthrough in the karst wetland vegetation classification problem. The results show that the light gradient boosting optimization model is the most advantageous classification model.

In addition, a series of studies have emphasized the importance of multi-feature fusion for improving model performance [24]. Guo et al. [25] combined spectral and textural features to identify the tasseling date of summer maize. The integration of spectral and texture features to generate a new index using the improved adaptive feature weighting method resulted in a reduction in the root-mean-square error for the tasseling date prediction to 5.77 days. Johari et al. [26] successfully identified different instar stages of Metisa plana larvae using HSI and machine learning techniques. A weighted k-nearest neighbor (KNN) constructed based on 506 nm and 538 nm reflectances combined with significant morphological parameters achieved the best identification results. Guo et al. [27] focused on the utilization of crop height to identify the critical stages of maize growth and development. They obtained RGB vegetation indexes (VIs), texture features (TFs), and multispectral VIs, and constructed a maize plant height prediction model by linear regression analysis. The results showed that constructing a maize height model based on multi-source images is an important complementary tool for extracting different maize phenology. Ali et al. [28] conducted a classification study of six types of maize seeds by machine learning methods. The researchers integrated color features (CFs), TFs, and spectral features to construct a hybrid feature set, and used the correlation-based feature selection for feature preference. And, the multi-layer perceptron model constructed based on the preferred features achieved an overall classification accuracy of 98.93%. Yan et al. [29] used RF and neural network models combined with multi-source data from geography, meteorology, plants, and microorganisms to predict the degradation degree of grassland in northern China. 
Among them, the RF model showed the best prediction performance, with a relative error of only 16.9%, which provided theoretical support for the design of a grassland degradation early warning system.

Although HSI technology and machine learning methods have achieved remarkable results, few studies have combined spectral reflectance, VIs, CFs, and TFs to identify the restoration stages of degraded alpine meadow patches on the Tibetan Plateau. Therefore, this study aims to develop an effective method for identifying the restoration stages of alpine meadow patches based on hyperspectral images. This could further be used to provide a scientific reference for the analysis of plant community composition and dominant species on patches in degraded alpine meadows. More specifically, this study has the following purposes: (1) to validate the applicability of HSI in identifying the restoration stages of alpine meadow patches on the Tibetan Plateau; (2) to obtain the optimal wavelengths (OWs), prominent VIs, significant CFs, and effective TFs using the CARS, ReliefF, RFE, and F-test feature selection algorithms, respectively; (3) to develop identification models with different machine learning techniques, including the support vector machine (SVM), KNN, RF, and extreme gradient boosting (XGBoost); and (4) to determine the optimal combination of feature sets and predictive models for identifying the restoration stages of alpine meadow patches.

2. Materials and Methods

2.1. Overview of the Study Area

This study was conducted in Qilian County, Haibei Tibetan Autonomous Prefecture, Qinghai Province, China (Figure 2). This area has a flat and broad topography, with the Babao River crossing in the middle, and an average altitude of 3000 m [30]. The region presents typical plateau continental climate characteristics, with an average annual temperature of 1 °C, an annual precipitation of about 420 mm, abundant sunshine time, and significant temperature difference between day and night. Due to the constraints of a high altitude and cold climate conditions, the vegetation type in this area is mainly dominated by alpine meadows, subalpine scrub, and alpine grassland [31,32].

Images were collected in degraded grassland near the Babao River watershed in the Ebao Township (37°56′59.51″ N, 100°56′29.43″ E). We selected six experimental sample plots near this domain to acquire hyperspectral image data. Figure 2 shows the locations of the six meticulously chosen sample plots along the banks of the Babao River. To guarantee sufficient spatial heterogeneity and independence among the sampling regions, we intentionally set these plots far apart from each other. In each of the six selected sample plots, we randomly acquired images of degraded alpine meadow patches representing different stages of restoration.

2.2. Hyperspectral Image Acquisition and Preprocessing

2.2.1. Hyperspectral Imaging Systems

As shown in Figure 3, the HSI system used in this study consisted of a main computer (Lenovo, Beijing, China), an SOC710VP (Surface Optics Corporation, San Diego, CA, USA) hyperspectral camera, and a tripod (BENRO, Zhongshan, China). The computer was connected to the spectrometer through a wired interface, which enabled the control of exposure settings and the capture of spectral images. The tripod had a leveling device to keep the spectrometer level. In addition, a portable power supply powered the spectrometer throughout acquisition. The SOC710VP hyperspectral camera acquires spectral data across 128 bands within the wavelength range of 400–1000 nm, with a spectral resolution of 4.6875 nm. It has an image resolution of 696 × 520 pixels, a 12-bit dynamic range (grayscale values), and a CCD sensor. The camera was equipped with a Schneider 35 mm infrared-corrected lens with a C-mount interface, enabling a capture speed of 30 lines per second.

2.2.2. Image Acquisition and Calibration

Hyperspectral images were acquired in late July 2023. To guarantee good imaging conditions, acquisitions were scheduled between 10:00 a.m. and 2:00 p.m. (Beijing time), strictly on clear and cloud-free days. We randomly selected alpine meadow patches from the experimental sample plots and photographed each patch only after experts had verified it. While acquiring the hyperspectral images, we also recorded the restoration stage of each patch for subsequent experiments. In addition, the distribution of alpine meadow patches is discrete and fragmented, which makes it difficult to image an entire patch. To keep the spectral imaging consistent, we used the central region of each patch as a standardized shooting target and maintained a uniform imaging height.

Each time the environment changed, we captured images incorporating a calibration board to prepare for image correction, and the correction formula is as follows:

R = \frac{R_{sample} - R_{dark}}{R_{background} - R_{dark}}

(1)

where R is the calibrated reflectance, R_{sample} is the measured (uncalibrated) response of the sample, R_{dark} is the camera response obtained after covering the lens with an opaque black cap, and R_{background} is the response obtained from the high-reflectance white calibration board.
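Equation (1) can be applied element-wise to a whole data cube. The following is a minimal NumPy sketch under stated assumptions: the function name `calibrate_reflectance`, the `eps` guard against a zero denominator, and the toy arrays are illustrative and are not taken from the manufacturer's software.

```python
import numpy as np


def calibrate_reflectance(sample, dark, white, eps=1e-9):
    """Apply Equation (1) element-wise: (sample - dark) / (white - dark).

    All three inputs are arrays of the same shape (rows x cols x bands).
    `eps` is an assumed safeguard against division by zero.
    """
    sample = np.asarray(sample, dtype=np.float64)
    dark = np.asarray(dark, dtype=np.float64)
    white = np.asarray(white, dtype=np.float64)
    denom = white - dark
    # Replace near-zero denominators so the division stays finite.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (sample - dark) / denom


# Toy data (hypothetical values): a sample lying 25% of the way
# between the dark response and the white-board response.
dark = np.full((4, 5, 128), 100.0)
white = np.full((4, 5, 128), 4000.0)
sample = dark + 0.25 * (white - dark)
reflectance = calibrate_reflectance(sample, dark, white)
```

On the toy data every calibrated value equals 0.25, the fraction used to build the sample.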

The denoising and calibration of the hyperspectral images were performed using SRAnalysis Toolkit software (version 2.0) provided by the manufacturer. After image preprocessing, we obtained a total of 206 images: 42 active patches (S0), 63 inactive patches (S1), 41 recovering patches (S2), and 60 healthy alpine meadows (S3).

2.3. Feature Extraction and Selection

2.3.1. Spectral Feature Extraction and Selection

The spectral reflectance of each image was extracted using PyCharm and the Spectral Python toolkit. Each image is a 696 × 520 × 128 reflectance cube, so the reflectance of every pixel at every wavelength was available. Such high data dimensionality can lead to long model runtimes and the “curse of dimensionality”, degrading model performance. Therefore, for each of the 128 wavelengths we averaged the reflectance over the image and used the resulting 128 mean values as the spectral reflectance features.
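The reduction from a full cube to one mean value per wavelength can be sketched as follows. This assumes the cube is already loaded as a NumPy array with the band axis last; the function name `mean_spectrum` and the small random cube are illustrative only.

```python
import numpy as np


def mean_spectrum(cube):
    """Average a (rows, cols, bands) reflectance cube over all pixels."""
    cube = np.asarray(cube, dtype=np.float64)
    if cube.ndim != 3:
        raise ValueError("expected a (rows, cols, bands) array")
    # Flatten the two spatial axes, then average per band.
    return cube.reshape(-1, cube.shape[-1]).mean(axis=0)


# Small stand-in cube (the real images are 696 x 520 pixels x 128 bands).
cube = np.random.default_rng(0).random((52, 69, 128))
features = mean_spectrum(cube)
```

Each image then contributes a single 128-element feature vector instead of several hundred thousand per-pixel spectra.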

To further reduce the dimensionality, we used the competitive adaptive reweighted sampling (CARS) method to select OWs. This method combines Monte Carlo sampling with partial least squares (PLS) regression and selects key wavelengths over multiple sampling iterations. For the wavelength subset obtained in each sampling run, a PLS model was built and cross-validated to obtain the root mean square error of cross-validation (RMSECV) [33]. The subset with the lowest RMSECV is retained. Its wavelengths are considered the most effective contributors to model prediction, which achieves the goal of screening key wavelengths from high-dimensional spectral data.

2.3.2. Vegetation Index Extraction and Selection

The ground composition of degraded alpine meadow patches varies across different stages, where S0 predominantly consists of bare soil, while the remaining stages exhibit vegetation coverage of diverse densities. Based on the sensor type of the camera and the application scenarios of VIs, we selected 50 VIs related to structure, water content, vegetation pigmentation, and physicochemical properties from the related literature to achieve the identification of patches at different restoration stages. Table 1 presents the names and formulas of the VIs used in this study.

To reduce the number of wavelengths needed to calculate the VIs, the prominent VIs were selected using the ReliefF algorithm, which can handle multi-category problems [34]. In each iteration, a sample R_i is randomly drawn from the training set. Its k nearest neighbors in the same category and its k nearest neighbors in each of the other categories are then found, and the feature weights are updated according to the differences in feature values between R_i and these neighbors. Finally, the desired features are obtained by thresholding the weights. The weight is calculated as shown in the following equation:

W(A) = W(A) - \sum_{j=1}^{k} \frac{diff(A, R_{i}, H_{j})}{mk} + \sum_{C \neq Class(R_{i})} \frac{\left[ \frac{P(C)}{1 - P(Class(R_{i}))} \sum_{j=1}^{k} diff(A, R_{i}, M_{j}(C)) \right]}{mk}

(2)

where diff(A, R_{i}, H_{j}) and diff(A, R_{i}, M_{j}(C)) are the Euclidean distances between the two samples on feature A, H_{j} is the jth nearest neighbor of R_{i} in its own category, M_{j}(C) is the jth nearest neighbor of R_{i} in category C, P(C) is the probability of occurrence of category C, Class(R_{i}) is the category to which the random sample R_{i} belongs, P(Class(R_{i})) is the probability of occurrence of that category, k is the number of nearest neighbors, and m is the number of sampling iterations.
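The weight update in Equation (2) can be sketched directly in NumPy. This is an illustration under stated assumptions: the function name `relieff_weights`, the defaults for `n_neighbors` and `n_iter`, and the range-normalized absolute difference used for diff (the usual ReliefF choice for numeric features) are not specified in the text.

```python
import numpy as np


def relieff_weights(X, y, n_neighbors=3, n_iter=100, seed=0):
    """Return one ReliefF weight per feature, following Equation (2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # avoid dividing by zero
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / n_samples                # P(C) for every class
    rng = np.random.default_rng(seed)
    weights = np.zeros(n_features)
    for _ in range(n_iter):
        i = rng.integers(n_samples)            # random sample R_i
        diff = np.abs(X - X[i]) / span         # diff(A, R_i, .) per feature
        dist = diff.sum(axis=1)                # distance used to rank neighbors
        dist[i] = np.inf                       # exclude R_i itself
        p_own = priors[classes == y[i]][0]
        for c, p_c in zip(classes, priors):
            members = np.flatnonzero((y == c) & np.isfinite(dist))
            nearest = members[np.argsort(dist[members])[:n_neighbors]]
            contrib = diff[nearest].sum(axis=0) / (n_iter * n_neighbors)
            if c == y[i]:
                weights -= contrib             # nearest hits H_j
            else:
                weights += p_c / (1.0 - p_own) * contrib   # nearest misses M_j(C)
    return weights


# Toy data: feature 0 tracks the class label, feature 1 is pure noise.
rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 20)
X = np.column_stack([y + 0.1 * rng.normal(size=60), rng.random(60)])
w = relieff_weights(X, y)
```

Features whose values differ little among same-class neighbors but strongly among other-class neighbors receive large positive weights, and these are the ones kept after thresholding.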

Table 1. Vegetation indexes used in this study.

No.	Vegetation Index	Equation	Reference
1	Visible atmospherically resistant index	VARI = (R550 − R670)/(R550 + R670 − R450)	[35]
2	Coloration index	CI = (R670 − R450)/R670	[36]
3	Shape index	IF = (2 × (R670) − R550 − R450)/(R550 − R450)	[36]
4	Ratio analysis of reflection of spectral chlorophyll-a	RARSa = R675/R700	[37]
5	Photochemical reflectance index	PRI = (R531 − R570)/(R531 + R570)	[38]
6	Physiological reflectance index	PhRI = (R550 − R531)/(R550 + R531)	[38]
7	Structure insensitive pigment index	SIPI = (R800 − R445)/(R800 + R680)	[39]
8	Normalized chlorophyll pigment ratio index	NCPI = (R670 − R450)/(R670 + R450)	[39]
9	Nitrogen reflectance index	NRI = (R570 − R670)/(R570 + R670)	[40]
10	Plant pigment ratio	PPR = (R550 − R450)/(R550 + R450)	[41]
11	Green leaf index	GLI = (2 × R550 − R670 − R450)/(2 × R550 + R670 + R450)	[42]
12	Greenness index	GI = R554/R667	[43]
13	Normalized difference vegetation index	NDVI = (R800 − R670)/(R800 + R670)	[44]
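Several of the indexes in Table 1 can be computed from a mean spectrum by looking up the band closest to each nominal wavelength. The sketch below is illustrative: the helper names, the nearest-band lookup, the evenly spaced band centers, and the synthetic spectrum are assumptions, and real band centers should come from the camera header.

```python
import numpy as np


def band_reflectance(spectrum, wavelengths, target_nm):
    """Reflectance of the band whose center is closest to target_nm."""
    idx = int(np.argmin(np.abs(wavelengths - target_nm)))
    return float(spectrum[idx])


def vegetation_indexes(spectrum, wavelengths):
    """Compute a few Table 1 indexes from one mean spectrum."""
    def R(nm):
        return band_reflectance(spectrum, wavelengths, nm)

    return {
        "NDVI": (R(800) - R(670)) / (R(800) + R(670)),
        "VARI": (R(550) - R(670)) / (R(550) + R(670) - R(450)),
        "PRI": (R(531) - R(570)) / (R(531) + R(570)),
        "GI": R(554) / R(667),
    }


# Assumed evenly spaced band centers and a crude vegetation-like spectrum:
# low visible reflectance, a small green bump, high near-infrared plateau.
wavelengths = np.linspace(400, 1000, 128)
spectrum = np.where(wavelengths < 700, 0.05, 0.45)
spectrum[np.abs(wavelengths - 550) < 20] = 0.10
vis = vegetation_indexes(spectrum, wavelengths)
```

For this synthetic spectrum, NDVI is 0.8 and VARI is 0.5, which can be checked by hand from the three reflectance levels.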

2.3.3. Color Feature Extraction and Selection

To extract the CFs of the hyperspectral image, the bands corresponding to the three visible wavelengths (red, green, and blue) were first selected and their intensity ratios adjusted; the three bands were then merged to generate an RGB image. From this image, we extracted color histogram features from each channel and calculated three color moment features: the first-order color moment (Equation (3)), second-order color moment (Equation (4)), and third-order color moment (Equation (5)). The formulas for these moments are as follows:

E_{i} = \frac{1}{N} \sum_{j = 1}^{N} P_{i, j}

(3)

σ_{i} = \sqrt{(\frac{1}{N} \sum_{j = 1}^{N} {(P_{i, j} - E_{i})}^{2})}

(4)

S_{i} = \sqrt[3]{(\frac{1}{N} \sum_{j = 1}^{N} {(P_{i, j} - E_{i})}^{3})}

(5)

In the above equations, N denotes the number of pixels in the image, i indexes the color channel, P_{i, j} is the color component of the jth pixel on the ith color channel, and E_{i} is the mean color value of the ith color channel over all pixels.
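Equations (3) to (5) translate directly into a few NumPy operations. The sketch below is illustrative; the function name `color_moments` and the two-pixel test image are assumptions, and the histogram features are not covered.

```python
import numpy as np


def color_moments(rgb):
    """Return [E_R, E_G, E_B, sigma_R, sigma_G, sigma_B, S_R, S_G, S_B]."""
    rgb = np.asarray(rgb, dtype=np.float64)
    pixels = rgb.reshape(-1, rgb.shape[-1])          # N pixels x channels
    mean = pixels.mean(axis=0)                       # Equation (3)
    centered = pixels - mean
    std = np.sqrt((centered ** 2).mean(axis=0))      # Equation (4)
    # np.cbrt keeps the sign of a negative third central moment.
    skew = np.cbrt((centered ** 3).mean(axis=0))     # Equation (5)
    return np.concatenate([mean, std, skew])


# Two-pixel toy image: one black pixel and one pixel with values (2, 4, 6).
img = np.zeros((1, 2, 3))
img[0, 1, :] = [2.0, 4.0, 6.0]
moments = color_moments(img)
```

For the toy image the means and standard deviations are both (1, 2, 3) and the third-order moments are zero, because each channel is symmetric about its mean.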

To efficiently select the significant CFs, the recursive feature elimination (RFE) algorithm was used. RFE employs a stepwise elimination strategy [45]: the importance of each feature is measured by an evaluator (a base estimator), and the least influential features are removed in each iteration. This process is repeated, together with cross-validation, until the desired number of features is reached.
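A minimal scikit-learn sketch of this elimination loop follows. The base estimator is not named in the text, so logistic regression is an assumed stand-in, and the synthetic data and the target of two features are illustrative.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: only features 0 and 3 determine the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

# Remove one feature per iteration (step=1) until two remain.
selector = RFE(
    estimator=LogisticRegression(max_iter=1000),  # assumed evaluator
    n_features_to_select=2,
    step=1,
)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)
```

When the number of features to keep is not known in advance, scikit-learn's `RFECV` performs the same elimination and chooses the count by cross-validation.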

2.3.4. Texture Feature Extraction and Selection

The texture of images reflects the physicochemical properties and structural features; a total of four TFs were extracted based on the gray-level co-occurrence matrix (GLCM) [46]. They are homogeneity, contrast, energy, and entropy, which are calculated by the following formulas:

Homogeneity = \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{P(i, j)}{1 + (i - j)^{2}}

(6)

Contrast = \sum_{i=1}^{N} \sum_{j=1}^{N} (i - j)^{2} P(i, j)

(7)

Energy = \sum_{i=1}^{N} \sum_{j=1}^{N} P(i, j)^{2}

(8)

Entropy = - \sum_{i=1}^{N} \sum_{j=1}^{N} P(i, j) \lg P(i, j)

(9)

where N denotes the number of gray levels, i and j denote the gray levels of the two pixels being compared, and P(i, j) denotes the number of times gray levels i and j co-occur in the GLCM.

During TF extraction, the four TFs were computed from the GLCM in each of four directions (0°, 45°, 90°, 135°), giving 16 values (four directions × four TFs); the average over the four directions was then used as the final value of each texture feature. Repeating this for every wavelength yielded 512 TFs per hyperspectral image (128 wavelengths × 4 TFs).
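The per-band texture computation can be sketched without an image-processing library. Several choices below are assumptions rather than details from the text: 16 gray levels, a pixel distance of 1, a symmetric GLCM normalized to probabilities, and all function names.

```python
import numpy as np

# (row, column) offsets for the four directions at distance 1.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, offset, levels):
    """Symmetric, normalized gray-level co-occurrence matrix."""
    dr, dc = offset
    rows, cols = image.shape
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    ref = image[r0:r1, c0:c1]                       # reference pixels
    nbr = image[r0 + dr:r1 + dr, c0 + dc:c1 + dc]   # their neighbors
    counts = np.zeros((levels, levels))
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)
    counts = counts + counts.T
    return counts / counts.sum()


def texture_features(P):
    """Homogeneity, contrast, energy, entropy (Equations (6) to (9))."""
    i, j = np.indices(P.shape)
    nz = P > 0                                      # skip log of zero
    return np.array([
        (P / (1 + (i - j) ** 2)).sum(),
        ((i - j) ** 2 * P).sum(),
        (P ** 2).sum(),
        -(P[nz] * np.log10(P[nz])).sum(),
    ])


def band_texture(band, levels=16):
    """Quantize one band and average the four TFs over four directions."""
    band = np.asarray(band, dtype=np.float64)
    lo, hi = band.min(), band.max()
    scaled = np.zeros_like(band) if hi == lo else (band - lo) / (hi - lo)
    q = np.minimum((scaled * levels).astype(int), levels - 1)
    feats = [texture_features(glcm(q, off, levels)) for off in OFFSETS.values()]
    return np.mean(feats, axis=0)


def cube_texture(cube, levels=16):
    """Concatenate the four averaged TFs of every band."""
    return np.concatenate([band_texture(cube[..., b], levels)
                           for b in range(cube.shape[-1])])


flat = band_texture(np.ones((6, 6)))
checker = band_texture(np.array([[0, 1], [1, 0]]), levels=2)
tfs = cube_texture(np.random.default_rng(0).random((8, 8, 3)))
```

A uniform band gives homogeneity 1, contrast 0, energy 1, and entropy 0, and a cube with 128 bands would give 128 × 4 = 512 values, matching the count in the text.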

In order to achieve the fast and efficient identification of the restoration stages of patches, effective TFs were obtained using the F-test. The F-test can be used to assess the degree of correlation or difference between each feature and target variable [47], which in turn identifies effective features. The formula for calculating the F-value is as follows:

F = \frac{SS(between) / (m - 1)}{SS(error) / (n - m)}

(10)

where SS(between) is the sum of squares between groups, SS(error) is the sum of squares within groups, m is the number of categories of the target variable, and n is the total number of samples.
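Equation (10) can be evaluated for a single feature as in the sketch below. It takes n as the total number of samples, the usual one-way ANOVA convention for the n − m degrees of freedom; the function name `anova_f` and the toy values are illustrative.

```python
import numpy as np


def anova_f(x, y):
    """One-way ANOVA F-value of feature x across the classes in y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, m = x.size, classes.size          # total samples, number of classes
    grand_mean = x.mean()
    ss_between = sum((y == c).sum() * (x[y == c].mean() - grand_mean) ** 2
                     for c in classes)
    ss_error = sum(((x[y == c] - x[y == c].mean()) ** 2).sum()
                   for c in classes)
    return (ss_between / (m - 1)) / (ss_error / (n - m))


# Three classes with means 2, 3, and 4 and identical spread.
x = [1, 2, 3, 2, 3, 4, 3, 4, 5]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
f_value = anova_f(x, y)
```

Here SS(between) = 6 with 2 degrees of freedom and SS(error) = 6 with 6 degrees of freedom, so F = 3. Features are then ranked by their F-values and the highest-scoring ones are kept.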

2.4. Feature Fusion and Model Building

The fusion of multiple features introduces richer information, potentially making it more effective to utilize the fused preferential feature dataset for identifying the restoration stages of degraded alpine meadow patches. In this study, four classifiers, SVM, KNN, RF, and XGBoost, were selected to construct the identification model of the restoration stages of degraded alpine meadow patches.

SVM is a potent classifier, which distinguishes among different classes by identifying an optimal separating hyperplane. This hyperplane is designed to maximize the shortest distance margin between sample points of different classes, known as the maximum margin classification. The SVM’s strength lies in its generalization ability, even for complex data distributions, where it can leverage kernel tricks (such as RBF and polynomial kernels) to map data into higher-dimensional spaces, facilitating the discovery of highly discriminative hyperplanes. Additionally, by introducing the concept of soft margins, the SVM can tolerate a degree of noise and outliers, enhancing the model’s robustness [48].

KNN is a simple and intuitive classifier, which classifies samples into different classes based on the distance between a test sample and training samples. Specifically, KNN calculates the distances from the test sample to all known class samples, selects the K-nearest neighbors with the smallest distances, and determines the category of the test sample by majority voting or other strategies among these neighbors. The choice of K is a crucial hyperparameter, influencing model complexity and classification performance. A small K value may make the model sensitive to noise, while a large K value may introduce too many irrelevant samples, blurring the classification boundaries [49].

RF is an ensemble learning method, which performs classification or regression by constructing multiple decision trees and combining their predictions. Each decision tree in RF is built on a random subset of the training data and considers only a subset of features, adding diversity to the model and reducing the risk of overfitting. During prediction, RF aggregates the outputs of all trees, either by voting (for classification) or averaging (for regression), to produce the final result. RF is known for its high accuracy, robustness, and ability to handle large-scale datasets efficiently [50].

XGBoost is an ensemble learning method based on gradient-boosted trees, which enhances prediction performance by iteratively training multiple weak classifiers and combining them into a strong classifier. XGBoost builds upon the gradient boosting framework with several optimizations, including the introduction of regularization terms to control model complexity, the use of second-order Taylor expansions to approximate the objective function, and support for column sampling. These improvements allow XGBoost to maintain a high predictive accuracy while significantly improving training speed and generalization capability. Furthermore, XGBoost offers a wide range of configurable parameters, enabling users to tailor the model to specific tasks for optimal performance [51].

The experimental platform is based on a computer with a Windows 11 operating system, which is equipped with an Intel Core i7 12th generation processor (i7-12700H), with 16 GB of DDR5-4800 MHz RAM, and a 1 TB NVMe SSD solid-state disk to ensure efficient data processing. At the software level, we created the Python 3.9.7 environment using the Anaconda version manager. Experimental code development and execution were mainly performed on the PyCharm IDE (Professional Edition) and Jupyter Notebook platform (version 6.4.5). In addition, we utilized the Scikit-Learn machine learning framework (version 0.24.2) and the hyperspectral data processing dependency library (Spectral Python 0.23.1) to construct models.

2.5. Evaluation Indicators

To select the best identification model for the restoration stages of alpine meadow patches, the identification results were evaluated using stratified 5-fold cross-validation and four evaluation metrics: accuracy (Acc), precision (P), recall (R), and F1 score (F1). Because the number of samples was limited, we chose stratified K-fold cross-validation; stratification keeps the class proportions in each fold close to those of the full dataset, so every restoration stage is represented in both training and validation. The average of the metrics over the folds was used to judge the model’s performance. The computation of the evaluation metrics was based on the confusion matrix, which is shown in Figure 4. The specific formulas for the four evaluation metrics are as follows:

Accuracy = \frac{TP + TN}{TP + FP + FN + TN}

(11)

Precision = \frac{TP}{TP + FP}

(12)

Recall = \frac{TP}{TP + FN}

(13)

F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}

(14)
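For a four-class problem these metrics are computed per class from the confusion matrix and then combined. The sketch below assumes macro averaging (an unweighted mean over classes), which the text does not specify; the function names and the toy labels are illustrative. In the multi-class case, accuracy is the share of samples on the diagonal of the confusion matrix.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def macro_metrics(cm):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp             # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp             # belonging to the class, but missed
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


# Toy predictions for stages S0 to S3 (two samples per stage).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 0, 2, 2, 3, 1]
scores = macro_metrics(confusion_matrix(y_true, y_pred, 4))
```

In the toy example six of eight samples are correct, so accuracy is 0.75; macro recall is also 0.75 and macro precision is 19/24.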

2.6. Research Process

In this study, we used HSI and machine learning techniques to construct a model for identifying the restoration stages of degraded alpine meadow patches by combining the spectral reflectance, VIs, CFs, and TFs of the images, and Figure 5 illustrates the research flowchart of the study.

3. Results

3.1. Spectral Analysis and Modeling

In this study, 128 wavelengths of spectral reflectance in the 400–1000 nm range were selected to analyze the spectral features. Vegetation growth and development, health, and growing conditions affect the spectral reflectance and form unique spectral profiles. In the visible region, the green wavelength band (around 550 nm) exhibits a significant reflectance peak, which is characteristic of vegetation and directly related to the low absorption of green light by chlorophyll. Meanwhile, the blue wavelength band (around 450 nm) and red wavelength band (around 670 nm) show pronounced absorption bands due to the presence of chlorophyll a and b in plant tissues. These pigments selectively absorb blue and red light to drive the photosynthetic process, resulting in reduced reflectance in these spectral regions. Thus, the distinct spectral profile of vegetation, with a high reflectance in the green band and lower reflectance in the blue and red bands, is a manifestation of the selective light-absorbing properties of chlorophyll, which plays a pivotal role in the energy capture mechanisms of plants. In addition, the near-infrared region (700–800 nm) shows a sharp increase in reflectance, which is another signature feature of vegetation spectra [52]. The different restoration stages of degraded alpine meadow patches differed in vegetation cover and species richness, which resulted in different reflectances in spectral wavelengths, a feature that can be used to effectively differentiate the restoration stages.

Four models (SVM, KNN, RF, and XGBoost) were constructed using spectral reflectance as the input. The data were normalized during the experiments to prevent the models from favoring features with larger numerical values due to differences in scale. Random search was used to determine the optimal parameters for each model, with the number of iterations proportional to the number of model parameters. All subsequent experiments followed the same settings and steps. The results of the models based on the optimal parameters are shown in Table 2. The SVM performed best, with accuracy, precision, and recall above 0.95 and an F1 score of 0.9487. Although the remaining three models did not match the SVM, they still identified the restoration stages of alpine meadow patches effectively.
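The normalization and random search steps can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions: the data are synthetic, min-max scaling stands in for the unspecified normalization, the search ranges are hypothetical, and the rule of ten random draws per tuned parameter is only one way to make the number of iterations proportional to the number of parameters.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the real data: 200 samples x 128 wavelengths, 4 stages.
y = np.repeat(np.arange(4), 50)
X = rng.normal(size=(200, 128)) + 0.5 * y[:, None]

# Scaling lives inside the pipeline so it is refit on each training fold.
pipeline = make_pipeline(MinMaxScaler(), SVC())

# Hypothetical search ranges (assumption, not the study's settings).
param_distributions = {
    "svc__C": np.logspace(-2, 3, 50),
    "svc__gamma": np.logspace(-4, 1, 50),
    "svc__kernel": ["rbf", "linear"],
}
# Assumed rule: 10 random draws per tuned hyperparameter.
n_iter = 10 * len(param_distributions)

search = RandomizedSearchCV(
    pipeline,
    param_distributions,
    n_iter=n_iter,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_
best_cv_accuracy = search.best_score_
```

The same pattern applies to KNN, RF, and XGBoost by swapping the final estimator and its parameter ranges.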

To simplify the model, we used CARS to select the OWs from the 128 wavelengths. The number of random sampling runs in this process was 50, and 20 OWs were selected (411.3, 421.1, 485.6, 545.8, 566.1, 663.3, 689.2, 704.8, 720.5, 736.2, 751.9, 788.8, 820.7, 842.0, 944.4, 955.3, 977.1, 982.6, 993.5, and 1010.0 nm). The wavelength selection process of CARS and the distribution of the selected wavelengths are shown in Figure 6. The results of the four models using the OWs are shown in Table 2; the SVM based on OWs performed best, with accuracy, precision, recall, and F1 score of 0.9318, 0.9297, 0.9304, and 0.9279, respectively. Compared with the full-wavelength models, all models still achieved satisfactory results. The performances of KNN and XGBoost improved across the board, which indicates the effectiveness of the feature selection strategy. Although the performances of SVM and RF decreased slightly, the number of input features was reduced from 128 to 20, which greatly improved the efficiency of model construction while maintaining the discrimination ability of the model, and makes subsequent large-scale data processing feasible.

3.2. Vegetation Index Analysis and Modeling

VIs reflect the growth status of vegetation and are widely used in research on vegetation growth monitoring, disaster monitoring, and water resource management [53]. Analyzing the vegetation health and restoration status of patches requires a suite of VIs that capture various aspects of plant condition and ecosystem dynamics. In this study, 50 VIs were selected from the relevant literature to enable the identification of patches at different restoration stages. Among them, PSSRa and GI are related to vegetation pigments and can capture pigment concentration and photosynthesis [54]. SIPI is used to study the growth status of plants and can eliminate the influence of structure on VIs [55]. NRI and PRI are closely related to the physiological status of plants, reflecting photosynthetic efficiency and supporting the estimation of nutrient element content [56]. NDVI is considered to be one of the best VIs for estimating the chlorophyll content and cover of plants, and it correlates effectively with changes in vegetation greenness [57].
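To make the index calculation concrete, the sketch below computes NDVI, defined as (R_NIR − R_red)/(R_NIR + R_red), from a reflectance cube. The band centres (670 nm and 800 nm), the evenly spaced wavelength grid, and the synthetic cube are assumptions for illustration; the definitions actually used for the 50 VIs are those listed in Table S1.

```python
import numpy as np

def band_index(wavelengths, target_nm):
    """Index of the band whose centre wavelength is closest to target_nm."""
    return int(np.argmin(np.abs(np.asarray(wavelengths) - target_nm)))

def ndvi(cube, wavelengths, red_nm=670.0, nir_nm=800.0):
    """Per-pixel NDVI from a reflectance cube shaped (rows, cols, bands)."""
    red = cube[..., band_index(wavelengths, red_nm)].astype(float)
    nir = cube[..., band_index(wavelengths, nir_nm)].astype(float)
    denom = nir + red
    # Guard against division by zero on completely dark pixels.
    return np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)

# Assumed wavelength grid: 128 evenly spaced bands between 400 and 1000 nm.
wavelengths = np.linspace(400.0, 1000.0, 128)

# Synthetic vegetation-like cube: low visible reflectance, high NIR reflectance.
cube = np.full((2, 2, 128), 0.05)
cube[..., wavelengths >= 700] = 0.45

ndvi_map = ndvi(cube, wavelengths)
patch_ndvi = ndvi_map.mean()   # one scalar feature per patch image
```

Averaging the per-pixel index over the patch gives a single feature value; other VIs follow the same pattern with different band combinations.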

Table 3 shows the performance of each model constructed based on the 50 VIs. The SVM model performed best, with an accuracy of 0.9514, precision of 0.9570, recall of 0.9481, and F1 score of 0.9318. Notably, compared with the models based on spectral reflectance, the performance of three models based on VIs (KNN, RF, and XGBoost) improved greatly, by more than 10% in each case. This suggests that hyperspectral VIs can outperform hyperspectral reflectance for identification with these models. Given that calculating the 50 VIs requires reflectance at 43 wavelengths, it is necessary to filter out the VIs with weak effects in order to further optimize the performance of the models.

Ten VIs (PSSRa, SIPI, NRI, PPR, GLI, GI, NDVI, IF, PhRI, and VARI) were selected using ReliefF, and the significance scores for each of the VIs are shown in Figure 7. The results of machine learning modeling using the prominent VIs are shown in Table 3; the performance of SVM, KNN, and RF decreased slightly (by no more than 0.03), but the number of required VIs was reduced from 50 to 10, and the number of reflectance values required to calculate them was reduced from 43 to 12, which greatly reduced the computational complexity.

3.3. Color Feature Analysis and Modeling

A color histogram reflects the overall color characteristics of an image [58]. We segmented brightness into 20 intervals, treating each as a distinct color feature. Color moments are statistical quantities that describe the color distribution of an image [59]. First-order color moments reflect the overall lightness and darkness of an image; the larger the value, the brighter the image. Second-order color moments reflect the range of the color distribution of the image; the larger the value, the wider the range. Third-order color moments reflect the symmetry of the color distribution. We assigned the data from the red-, green-, and blue-light wavelengths of the hyperspectral image to the three RGB channels and synthesized an RGB image. Then, we extracted the color histogram features and color moment features for each channel, giving a total of 69 CFs (three channels × (20 color histogram features + three color moment features)).
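The count of 69 CFs can be reproduced with a short sketch. It assumes an RGB image scaled to [0, 1], equal-width brightness intervals, and the common definition of the three color moments as mean, standard deviation, and cube root of the third central moment; the exact definitions used in the study may differ.

```python
import numpy as np

def color_features(rgb, n_bins=20):
    """Histogram and color-moment features for an RGB image scaled to [0, 1]."""
    features = []
    for channel in range(3):
        values = rgb[..., channel].ravel().astype(float)
        # Normalized brightness histogram with n_bins equal-width intervals.
        hist, _ = np.histogram(values, bins=n_bins, range=(0.0, 1.0))
        hist = hist / values.size
        mean = values.mean()                              # first-order moment
        std = values.std()                                # second-order moment
        third = np.cbrt(np.mean((values - mean) ** 3))    # third-order moment
        features.extend(hist.tolist())
        features.extend([mean, std, third])
    return np.array(features)

rng = np.random.default_rng(0)
rgb_example = rng.random((32, 32, 3))   # synthetic stand-in for a patch image
cf = color_features(rgb_example)        # 3 x (20 + 3) = 69 values
```

Each channel contributes 23 consecutive entries: 20 histogram bins followed by the three moments.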

The results of the model evaluation using 69 CFs are shown in Table 4, where the accuracy values of the four models are 0.7573, 0.7138, 0.7282, and 0.7529. The performance is worse than that of the spectral reflectance and spectral vegetation index. This may be explained by the fact that the color tones on S1 and S2 of the alpine meadow patches have similar distributions.

To pick out the significant CFs, we used RFE, which selected two histogram features for the red channel (brightness 1 and brightness 15), one histogram feature for the green channel (brightness 20), and three histogram features for the blue channel (brightness 1, brightness 3, and brightness 19). The first-, second-, and third-order moments for the red channel, the third-order moment for the green channel, and the first-order moment for the blue channel were also selected. The models constructed using the significant CFs (Table 4) show an improvement in all evaluation metrics for the SVM and KNN, an improvement in accuracy for XGBoost, and a slight decrease in performance for RF.
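Recursive feature elimination down to 11 of 69 features can be sketched with scikit-learn's `RFE`. The base estimator (a random forest), the step size, and the synthetic data are assumptions; this section does not state which estimator drove the elimination.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
# Synthetic stand-in: 160 samples x 69 color features, 4 restoration stages.
y = np.repeat(np.arange(4), 40)
X = rng.normal(size=(160, 69))
X[:, :11] += y[:, None]   # make the first 11 columns informative

# Assumed base estimator; RFE drops the least important feature at each step.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=30, random_state=0),
    n_features_to_select=11,
    step=1,
)
selector.fit(X, y)

selected_idx = np.flatnonzero(selector.support_)   # indices of the kept features
ranking = selector.ranking_                        # 1 marks a selected feature
```

Features with `ranking == 1` are the ones retained; higher ranks were eliminated earlier.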

3.4. Textural Feature Analysis and Modeling

Based on the GLCM, we obtained 512 TFs from each sample. The extraction results of the TFs are shown in Figure 8. These features reveal a variety of texture attributes: homogeneity describes the uniformity of the local gray levels of the image, contrast reflects the clarity of the image and the depth of its texture furrows, entropy measures the randomness of the information contained in the image, and energy reflects the smoothness of the image’s gray-level distribution [60].
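The figure of 512 TFs is consistent with four GLCM statistics computed for each of the 128 wavelengths (128 × 4 = 512). The NumPy sketch below illustrates that reading. The number of gray levels, the single horizontal offset, the symmetric normalization, and the use of the angular second moment as "energy" are all assumptions, since the GLCM settings are not given in this section.

```python
import numpy as np

def glcm_features(band, levels=16):
    """Contrast, homogeneity, entropy and energy of one band (values in [0, 1])."""
    # Quantize reflectance into integer gray levels 0..levels-1.
    q = np.clip((band * levels).astype(int), 0, levels - 1)
    # Count horizontally adjacent pixel pairs (distance 1, angle 0 degrees).
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (left, right), 1)
    glcm = glcm + glcm.T      # make the co-occurrence counts symmetric
    p = glcm / glcm.sum()     # joint probabilities
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    energy = np.sum(p ** 2)   # angular second moment
    return np.array([contrast, homogeneity, entropy, energy])

rng = np.random.default_rng(0)
cube = rng.random((24, 24, 128))   # synthetic (rows, cols, bands) cube
tf = np.concatenate([glcm_features(cube[..., b]) for b in range(cube.shape[-1])])

# A perfectly uniform band has zero contrast and entropy, unit homogeneity and energy.
flat = glcm_features(np.full((8, 8), 0.5))
```

In practice a library routine such as scikit-image's GLCM functions would usually replace the hand-written counting step.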

Table 5 shows the modeling results based on TFs, from which it can be seen that all the models can effectively identify the different restoration stages. However, the identification results of KNN are slightly inferior, which is attributed to the excessive number of TFs causing the model to suffer from the “curse of dimensionality”. Hence, it is imperative to select effective TFs.

Ten effective TFs (contrast and entropy at 741.4, 746.7, 751.9, and 757.2 nm, and contrast at 778.3 and 783.5 nm) were selected using the F-test. Machine learning modeling using the effective TFs showed (Table 5) that all models were capable of effectively discriminating the restoration stages of degraded alpine meadow patches. Although the SVM, RF, and XGBoost models showed a decrease in performance, the KNN model showed a significant improvement. Moreover, the 512 TFs were reduced to ten effective TFs, which greatly reduced the model complexity.
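An F-test filter of this kind can be sketched with scikit-learn's `SelectKBest` and the one-way ANOVA scorer `f_classif`. The synthetic data and the choice of these particular scikit-learn helpers are assumptions; only the target of ten features out of 512 comes from the text.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
# Synthetic stand-in: 160 samples x 512 texture features, 4 restoration stages.
y = np.repeat(np.arange(4), 40)
X = rng.normal(size=(160, 512))
X[:, :10] += 2.0 * y[:, None]   # only the first 10 columns differ between stages

# Score each feature by its ANOVA F-statistic and keep the 10 highest.
selector = SelectKBest(score_func=f_classif, k=10).fit(X, y)
top10 = np.sort(selector.get_support(indices=True))
f_scores = selector.scores_
```

Because each feature is scored independently, the filter is fast but does not account for redundancy between neighbouring wavelengths.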

3.5. Discriminative Modeling of Fusion Features

Eleven fused feature datasets were constructed based on OWs, prominent VIs, significant CFs, and effective TFs. The feature composition of each dataset is given in Table 6.
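The number eleven matches the count of all combinations of two or more of the four preferred feature sets (6 pairs + 4 triples + 1 full set). The short sketch below enumerates them; the enumeration order is an assumption and is not meant to reproduce the letter labels of Table 6.

```python
from itertools import combinations

feature_sets = ["OWs", "prominent VIs", "significant CFs", "effective TFs"]

# Every combination that fuses at least two of the four preferred feature sets.
fused = [
    combo
    for size in range(2, len(feature_sets) + 1)
    for combo in combinations(feature_sets, size)
]
n_fused = len(fused)
```

Building each fused dataset then amounts to concatenating the corresponding feature columns for every sample.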

The models were constructed using the fused feature datasets, and the evaluation results are shown in Table 7 and Table 8. The best performance was obtained by combining the OWs and prominent VIs with the SVM model, with accuracy, precision, recall, and F1 score of 0.9320, 0.9369, 0.9308, and 0.9299, respectively, which are higher than the results of the best model based on OWs or prominent VIs alone.

In addition, by adding OWs or prominent VIs to the effective TFs or significant CFs, the performance of the model can be further improved, especially for KNN, RF, and XGBoost. Based on dataset F (effective TFs + significant CFs), the accuracies of the four models (SVM, KNN, RF, and XGBoost) are 0.8355, 0.7668, 0.7521, and 0.7670, respectively. After adding OWs and prominent VIs (dataset K), the model accuracies improve by 0.0821, 0.1215, 0.1554, and 0.1501 for SVM, KNN, RF, and XGBoost, respectively, with KNN, RF, and XGBoost each improving by more than 0.10. It is worth mentioning that, although the best model for most of the datasets is the SVM, the performance of XGBoost on datasets E, G, H, and J is also excellent, and it even outperforms the SVM on dataset D. Therefore, XGBoost also has the potential to efficiently identify degraded alpine meadow patches.
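The size of these gains can be checked with simple arithmetic. The sketch below uses only the accuracies and gains quoted above; the dataset K accuracies are derived here by addition and are not read from Table 7 or Table 8.

```python
# Accuracies reported for dataset F and the reported gains when moving to dataset K.
acc_f = {"SVM": 0.8355, "KNN": 0.7668, "RF": 0.7521, "XGBoost": 0.7670}
gain = {"SVM": 0.0821, "KNN": 0.1215, "RF": 0.1554, "XGBoost": 0.1501}

# Implied dataset K accuracy (derived by addition).
acc_k = {m: round(acc_f[m] + gain[m], 4) for m in acc_f}

# The same gain expressed in percentage points and as a relative change.
points = {m: round(100 * gain[m], 2) for m in acc_f}
relative_pct = {m: round(100 * gain[m] / acc_f[m], 2) for m in acc_f}
```

Under this arithmetic the implied dataset K accuracies are 0.9176, 0.8883, 0.9075, and 0.9171, and the gains for KNN, RF, and XGBoost exceed ten percentage points whether they are read as absolute or relative changes.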

To further validate the classification performance of the SVM based on fused dataset A, we plotted the receiver operating characteristic (ROC) curves and precision–recall (PR) curves for each stage and calculated the area under the curve (AUC) for both, as shown in Figure 9. The SVM model achieves ROC-AUC and PR-AUC values of 1 for S0 and S1, 0.9918 and 0.9838 for S2, and 0.9962 and 0.9861 for S3, respectively, which is a highly satisfactory result that strongly supports the SVM model’s excellent performance on this dataset.
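Per-stage ROC-AUC and PR-AUC values can be obtained by treating each stage as a one-vs-rest problem. The following scikit-learn sketch shows one way to do so; the synthetic data, the hold-out split, and the use of SVM decision scores rather than probabilities are assumptions, not the study's exact procedure.

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for a fused dataset: 240 samples x 30 features, 4 stages.
y = np.repeat(np.arange(4), 60)
X = rng.normal(size=(240, 30)) + y[:, None]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
model = SVC(kernel="rbf", decision_function_shape="ovr").fit(X_train, y_train)
scores = model.decision_function(X_test)            # shape (n_samples, 4)
y_bin = label_binarize(y_test, classes=[0, 1, 2, 3])

roc_auc, pr_auc = {}, {}
for k, stage in enumerate(["S0", "S1", "S2", "S3"]):
    # One-vs-rest: stage k is the positive class, all other stages are negative.
    roc_auc[stage] = roc_auc_score(y_bin[:, k], scores[:, k])
    prec, rec, _ = precision_recall_curve(y_bin[:, k], scores[:, k])
    pr_auc[stage] = auc(rec, prec)
```

PR-AUC is more sensitive than ROC-AUC to errors on the positive class, which is why reporting both gives a fuller picture for each stage.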

In addition, we also plotted the ROC and PR curves for each stage under the KNN, RF, and XGBoost models, based on dataset A, and calculated the corresponding ROC-AUC and PR-AUC values (as shown in Figures S1–S3). Notably, while the SVM model yielded the best outcomes, it is worth mentioning that the other three models also produced effective results, with their ROC-AUC and PR-AUC values at S0 both reaching 1, indicative of the good performance of all models in recognizing S0.

To evaluate the accuracy of model identification across each restoration stage, we depicted the confusion matrices of the optimal models derived from both the preferred feature datasets and the fused feature dataset in Figure 10. For each optimal model, the five confusion matrices obtained from stratified five-fold cross-validation were summed to obtain the final confusion matrix. A careful analysis of the confusion matrices revealed clear differences in classification accuracy among the patch restoration stages. For S0, the recognition accuracy is high: only a very few samples are incorrectly categorized as S1, which suggests that the models handle S0 samples with high specificity and accuracy. S1 is easily misidentified as S0 and is occasionally misidentified as S2. The identification of S2 is more complex: some S2 samples are incorrectly assigned to S1 and others to S3, which indicates that the features of S2 may overlap subtly with those of S1 and S3, thus challenging the recognition ability of the models. For S3, the recognition performance is outstanding, with the vast majority of samples being correctly recognized and only a few being incorrectly classified as S2, which reflects the high reliability and stability of the models when dealing with S3 samples. Overall, S0 and S3 are the best identified, while S1 and S2 are slightly weaker but still effectively identified.
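Superimposing the fold-wise confusion matrices can be sketched as follows, assuming the five-fold cross-validation is stratified by restoration stage. The synthetic data and the default SVM settings are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 samples x 30 features, 50 samples per stage.
y = np.repeat(np.arange(4), 50)
X = rng.normal(size=(200, 30)) + 0.6 * y[:, None]

total_cm = np.zeros((4, 4), dtype=int)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    model = SVC().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    # Fix the label order so every fold yields a 4 x 4 matrix that can be summed.
    total_cm += confusion_matrix(y[test_idx], pred, labels=[0, 1, 2, 3])
```

Because every sample appears in exactly one test fold, each row of the summed matrix adds up to the number of samples in that stage.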

4. Discussion

The applications of HSI in the field of grassland ecology mainly include disease monitoring, biomass estimation, and the discrimination of grass species, as well as the detection of different grass cover types [61]. Hyperspectral imaging has the unique advantage of providing information in hundreds of contiguous spectral wavelengths, revealing spectral features that cannot be directly perceived by the human eye.

The purpose of this study is to identify different restoration stages based on spectral, color, and texture features extracted from hyperspectral images of degraded alpine meadow patches. The experimental results show that the importance of spectral features is greater than that of CFs, and the importance of CFs is greater than that of TFs. In this study, six OWs (421.1, 485.6, 545.8, 566.1, 663.3, and 689.2 nm) were selected near the two absorption bands (450 nm and 670 nm) and the reflectance peak (550 nm) in the visible region, which are important for distinguishing the different restoration stages. These OWs are strongly correlated with the chlorophyll content [62]. In addition, the fused dataset of OWs and prominent VIs combined with the optimized SVM model demonstrated the best performance, which is consistent with the results of the study by Wu et al. [63].

The results based on the fused feature datasets show that the SVM model is the most effective in identifying the restoration stages of alpine meadow patches, with all of its evaluation metrics greater than 0.8. The performance of the XGBoost model is second only to that of the SVM and improves as features are added. The RF and KNN models are not as effective as the SVM and XGBoost, but they are also able to identify the restoration stages of degraded alpine meadow patches. In addition, the average accuracy of the models based on a single preferred feature set was 0.7828, while that of the models constructed by fusing preferred features improved to 0.8626. This indicates that fusing multiple feature sets improves the overall performance of the model more effectively than relying on a single preferred feature set. This finding is consistent with related studies [64,65]. It is worth noting that the overall performance of the model can be significantly improved by including the prominent VIs or OWs, as a comparison of the experimental outcomes shows. Compared with the results based on dataset F (effective TFs + significant CFs), the results of dataset K (OWs + prominent VIs + effective TFs + significant CFs) show a significant improvement; specifically, the accuracies of KNN, RF, and XGBoost improved by more than 0.10. Furthermore, the fact that the best model is based on dataset A (OWs + prominent VIs) underscores the clear advantages of these two feature sets in this study. Thus, in interdisciplinary grassland ecology and computer science research, prioritizing OWs and VIs is a judicious and effective strategy. While the SVM model excelled in identifying alpine meadow restoration stages, our approach mainly focuses on vegetation production and recovery, leaving room for a deeper exploration of plant community structure and species diversity. Future work will integrate ground-based ecological assessments with remote sensing data, collaborating with grassland ecologists to broaden our ecological indicators. This multi-dimensional approach aims to enhance our understanding of restoration impacts on biodiversity and community dynamics, offering more comprehensive insights into grassland ecosystem health.

For each stage of identification, S0 and S3 were the most effective. S1 and S2 were also effectively identified, but some models encountered challenges in identifying S2. This difficulty stems from the unique ecological community structure of S2, which contains not only 1–2-year-old weeds from S1, but also plants such as the sedge family and perennial weeds that dominate in S3 [66]. Furthermore, the plant community of S2 exhibits a higher degree of species diversity and compositional complexity, with some species emerging as dominant players, affecting the recovery trajectory. Understanding the specific roles and interactions of these species is crucial for optimizing restoration strategies. These findings not only reveal the advantages and limitations of the model in dealing with different categories, but more importantly, they provide valuable insights for subsequent studies, especially for the exploration of strategies on how to improve the recognition accuracy of S2. In the future, we plan to delve deeper into the plant community dynamics of S2, incorporating multi-scale analysis and additional environmental factors. By leveraging ensemble methods and focusing on the interactions between species, we aim to enhance the robustness and accuracy of our model, especially for S2.

The method proposed in this study opens up new perspectives for the practical application of hyperspectral technology in degraded meadows, and the adoption of this advanced digital agricultural technology in the Tibetan Plateau region also has broad social and policy implications. Among the restoration stages, S0 is an important node in the degradation process of alpine meadows, and timely and effective intervention is the key to stopping the spread of localized active patches and preventing them from being transformed into irreversibly degraded Heitutan grassland. The method can be used as a key tool for identifying the restoration stages of alpine meadow patches, thus determining the optimal timing of degraded grassland management. The adoption of this digital agriculture approach has great potential for rural communities in the Tibetan Plateau. By facilitating early detection and targeted interventions, local farmers and herders can benefit from improved land management practices that increase rangeland productivity and maintain livestock health. This may help alleviate poverty and promote sustainable development in remote areas, in line with national policies aimed at rural revitalization and ecological conservation. In addition, the integration of solutions such as hyperspectral imaging and machine learning algorithms into agricultural practices may require specialized personnel capable of operating and maintaining complex equipment. This shift could drive the demand for education and training programs focused on digital literacy and advanced agricultural technologies, creating new jobs in rural areas and promoting a knowledge-intensive economy. Thus, the proposed method not only provides a scientific reference for grassland ecology, but also has more far-reaching implications for social well-being and policy development in the Tibetan Plateau region.

5. Conclusions

This study investigated the potential of utilizing HSI and machine learning techniques to identify the restoration stages of degraded alpine meadow patches. An integrated approach for the identification of the restoration stages of alpine meadow patches was developed using an HSI system combined with feature selection algorithms (CARS, ReliefF, RFE, and F-test) and machine learning algorithms (SVM, KNN, RF, and XGBoost). Furthermore, 20 OWs were selected based on CARS, 10 prominent VIs were selected by ReliefF, 11 significant CFs were selected by RFE, and 10 effective TFs were selected by F-test. The models constructed based on the preferred feature datasets, as well as the fused feature datasets, achieved satisfactory identification results. It is worth mentioning that the SVM model constructed based on OWs and prominent VIs obtained the best values for accuracy (0.9320), precision (0.9369), recall (0.9308), and F1 score (0.9299). Therefore, this study proposes an objective, non-invasive method to identify the restoration stages of degraded alpine meadow patches, so as to provide references for the sustainable use and intelligent monitoring or management of alpine meadows. However, this study only identified four types of degraded alpine meadow patches in one watershed in Qilian County, in the Qinghai–Tibet Plateau. In the future, we plan to improve the hyperspectral unmanned aerial vehicle image acquisition system to obtain patch data in different watersheds, and further improve the accuracy and applicability of the classification model to realize the digital and accurate identification of the restoration stages of degraded patches on a larger scale. Furthermore, we plan to strengthen our close collaboration with grassland ecologists to delve deeper into ecosystem health from a more nuanced perspective, such as species diversity and composition, aiming for a more precise identification of the restoration stages of patches, particularly for S2. Lastly, extending the application of this research methodology to a wider range of cases is also one of the crucial directions for our future work.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriculture14071097/s1. Table S1: 50 spectral indices; Figure S1: ROC and PR curves for KNN; Figure S2: ROC and PR curves for RF; Figure S3: ROC and PR curves for XGBoost; Figure specification: illustration documents of Figures S1–S3.

Author Contributions

Conceptualization, W.L.; Data curation, W.L. and L.W.; Formal analysis, L.C., M.Z. and C.L.; Funding acquisition, X.L.; Investigation, W.L., L.W., L.C. and C.L.; Methodology, W.L., L.W. and C.L.; Project administration, L.W.; Resources, L.W. and X.L.; Software, W.L., L.C. and M.Z.; Supervision, X.L.; Validation, W.L. and L.C.; Visualization, W.L. and M.Z.; Writing—original draft, W.L. and L.C.; Writing—review and editing, W.L., L.W. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by the National Natural Science Foundation of China (U21A20191, U23A20159), Qinghai Science and Technology Department (2023-QY-210), and Higher Education Discipline Innovation Project, the 111 Project of China (D18013).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available at Zenodo: Luo, W. (2024). Hyperspectral images of patches at different stages of degraded alpine meadows [dataset]. Zenodo. https://doi.org/10.5281/zenodo.10990325 (accessed on 18 April 2024). The code for this study can be found on GitHub: https://github.com/luowei-jia/HSI (accessed on 7 April 2024).

Acknowledgments

We gratefully acknowledge the support provided by the High-Performance Computing Center of Qinghai University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Bai, Y.; Huang, J.; Zheng, S.; Pan, Q.; Zhang, L.; Zhou, H.; Ma, J. Drivers and regulating mechanisms of grassland and desert ecosystem services. Chin. J. Plant Ecol. 2014, 38, 93–102.
Finn, J.; Kirwan, L.; Connolly, J. Ecosystem function enhanced by combining four functional types of plant species in intensively managed grassland mixtures: A 3-year continental-scale field experiment. J. Appl. Ecol. 2013, 50, 365–375.
Fang, J.; Bai, Y.; Li, L. Scientific basis and practical ways for sustainable development of China’s pasture regions. Chin. Sci. Bull. 2016, 61, 155–164.
Li, X.L. Mechanisms Underlying the Dwarf Phenotype of Leymus chinensis Induced by Long-Term Overgrazing. Ph.D. Dissertation, Chinese Academy of Agricultural Sciences, Beijing, China, 2016.
Tan, Y.; Chen, Z.; Liu, W.; Yang, M.; Du, Z.; Wang, Y.; Wu, D. Grazing exclusion alters denitrification N₂O/(N₂O + N₂) ratio in alpine meadow of Qinghai–Tibet Plateau. Sci. Total Environ. 2024, 912, 169358.
Song, Z.; Li, X.; Li, J.; Ma, G.L. Research on changes in plant functional groups and root-soil complex characteristics in different disturbed patches of alpine meadows. Ecol. Sci. 2022, 41, 31–38.
Yang, L.; Li, X.; Shi, D.; Sun, H.; Yang, Y.W. Research on the succession patterns of degraded grassland vegetation in “Heitutan” on the Qinghai-Tibet Plateau. Qinghai Pratacult. 2005, 1, 2–5, 15.
Li, X. The Spatio-Temporal Dynamics of Four Plant-Functional Types (PFTs) in Alpine Meadow as Affected by Human Disturbance, Sanjiangyuan Region, China. Ph.D. Dissertation, University of Auckland, Auckland, New Zealand, 2012. Available online: http://hdl.handle.net/2292/19565 (accessed on 20 April 2024).
Shang, Z.; Dong, Q.; Shi, J.; Zhou, H.; Dong, S.; Shao, X.; Cao, G. Progress of research on degraded grasslands and their ecological restoration in the Qinghai-Tibet Plateau in the past 10 years—A parallel study on ecological restoration of Sanjiangyuan. J. Grassl. 2018, 26, 1–21.
Li, X.L.; Perry, G.; Brierley, G. Quantitative assessment of degradation classifications for degraded alpine meadows (Heitutan), Sanjiangyuan, western China. Land Degrad. Dev. 2014, 25, 417–427.
Huo, J.; Zhu, J.; Song, M.; Li, Y.; Xu, X.; Zhou, H. Vegetation patch characteristics during degradation succession in alpine meadows on the Qinghai-Tibet Plateau. Acta Agrestia Sin. 2022, 11, 3113–3118.
Song, Y.; Xu, M.; Xu, T.; Zhao, X.; Yue, Y.; Yu, H.; Wang, L. Changes in plant community assembly from patchy degradation of grasslands and grazing by different-sized herbivores. Ecol. Appl. 2023, 33, e2803.
Du, C.; Qiao, Y.; Zhao, L.; Chen, H.; Cao, M.; Liu, Y. Research on root-soil complex characteristics in degraded alpine meadow patches. Acta Agrestia Sin. 2023, 31, 202–209.
Duan, C.; Li, X.; Li, C.; Yang, P.; Chai, Y.; Xu, W. Positive effects of fungal β diversity on soil multifunctionality mediated by pH in the natural restoration succession stages of alpine meadow patches. Ecol. Indic. 2023, 148, 110122.
Li, C.; Li, X.; Yang, Y.; Zhang, S.; Yang, P.N. The impact of enclosure reclamation on carbon exchange and its components in patchy degraded alpine meadows within the Yellow River source region. Acta Ecol. Sin. 2023, 24, 10228–10237.
Cui, L.; Wang, L.; Su, J.; Song, Z.; Li, X. Classification and identification of degraded alpine meadows based on machine learning techniques. In Proceedings of the 2023 4th International Conference on Computer Vision, Image and Deep Learning (CVIDL), Zhuhai, China, 12–14 May 2023; pp. 263–267.
Neri, I.; Caponi, S.; Bonacci, F.; Clementi, G.; Cottone, F.; Gammaitoni, L.; Mattarelli, M. Real-Time AI-Assisted Push-Broom Hyperspectral System for Precision Agriculture. Sensors 2024, 24, 344.
Thomas, S.; Kuska, M.T.; Bohnenkamp, D.; Brugger, A.; Alisaac, E.; Wahabzada, M. Benefits of hyperspectral imaging for plant disease detection and plant protection: A technical perspective. J. Plant Dis. Prot. 2018, 125, 5–20.
Singh, V.; Sharma, N.; Singh, S. A review of imaging techniques for plant disease detection. Artif. Intell. Agric. 2020, 4, 229–242.
Mansour, K.; Mutanga, O.; Everson, T. Discriminating indicator grass species for rangeland degradation assessment using hyperspectral data resampled to AISA Eagle resolution. ISPRS J. Photogramm. Remote Sens. 2012, 70, 56–65.
Guan, W.; Liu, Z.; He, G. Spectral simulation estimation of soil organic matter content in degraded alpine grasslands of the Sanjiangyuan Region. Grassl. Turf 2022, 42, 28–36.
Gu, Q.; Sheng, L.; Zhang, T.; Lu, Y.; Zhang, Z.; Zheng, K.; Zhou, H. Early detection of tomato spotted wilt virus infection in tobacco using the hyperspectral imaging technique and machine learning algorithms. Comput. Electron. Agric. 2019, 167, 105066.
Fu, B.; Zuo, P.; Liu, M.; Lan, G.; He, H.; Lao, Z.; Gao, E. Classifying vegetation communities karst wetland synergistic use of image fusion and object-based machine learning algorithm with Jilin-1 and UAV multispectral images. Ecol. Indic. 2022, 140, 108989.
Guo, Y.; Chen, S.; Fu, Y.H.; Xiao, Y.; Wu, W.; Wang, H.; Beurs, K.D. Comparison of multi-methods for identifying maize phenology using phenocams. Remote Sens. 2022, 14, 244.
Guo, Y.; Xiao, Y.; Li, M.; Hao, F.; Zhang, X.; Sun, H.; He, Y. Identifying crop phenology using maize height constructed from multi-sources images. Int. J. Appl. Earth Obs. Geoinf. 2022, 115, 103121.
Johari, S.; Khairunniza-Bejo, S.; Shariff, A.; Husin, N.; Basri, M.; Kamarudin, N. Identification of bagworm (Metisa plana) instar stages using hyperspectral imaging and machine learning techniques. Comput. Electron. Agric. 2022, 194, 106739.
Guo, Y.; Fu, Y.H.; Chen, S.; Bryant, C.R.; Li, X.; Senthilnath, J.; de Beurs, K. Integrating spectral and textural information for identifying the tasseling date of summer maize using UAV based RGB images. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102435.
Ali, A.; Qadri, S.; Mashwani, W.K.; Brahim Belhaouari, S.; Naeem, S.; Rafique, S.; Anam, S. Machine learning approach for the classification of corn seed using hybrid features. Int. J. Food Prop. 2020, 23, 1110–1124.
Yan, H.; Ran, Q.; Hu, R.; Xue, K.; Zhang, B.; Zhou, S.; Wang, Y. Machine learning-based prediction for grassland degradation using geographic, meteorological, plant and microbial data. Ecol. Indic. 2022, 137, 108738.
Li, J.; Liu, Y.; Mo, C.; Wang, L.; Pang, G.; Cao, M. IKONOS image-based extraction of the distribution area of Stellera chamaejasme L. in Qilian County of Qinghai Province, China. Remote Sens. 2016, 8, 148.
Hou, Q.; Ji, Z.; Yang, H.; Yu, X. Impacts of climate change and human activities on different degraded grassland based on NDVI. Sci. Rep. 2022, 12, 15918. [Google Scholar] [CrossRef]
Liu, Y.; Zhao, F.; Wang, L.; He, W.; Liu, J.; Long, Y. Spatial Distribution and Influencing Factors of Soil Fungi in a Degraded Alpine Meadow Invaded by Stellera chamaejasme. Agriculture 2021, 11, 1280. [Google Scholar] [CrossRef]
Li, H.; Liang, Y.; Xu, Q.; Cao, D. Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal. Chim. Acta 2009, 648, 77–84. [Google Scholar] [CrossRef]
Robnik-Šikonja, M.; Kononenko, I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 2003, 53, 23–69. [Google Scholar] [CrossRef]
Hunt, J.; Daughtry, C.; Eitel, J. Remote sensing leaf chlorophyll content using a visible band index. Agron. J. 2011, 103, 1090–1099. [Google Scholar] [CrossRef]
Escadafal, R.; Belghit, A.; Ben-Moussa, A. Indices spectraux pour la télédétection de la dégradation des milieux naturels en Tunisie aride. In Proceedings of the 6th International Symposium on Physical Measurements and Signatures in Remote Sensing, Val d’Isère, France, 17–24 January 1994; Guyot, G., Ed.; pp. 253–259. Available online: https://www.scirp.org/reference/ReferencesPapers?ReferenceID=1933253 (accessed on 20 April 2024).
Chappelle, E.; Kim, M.; McMurtrey, J., III. Ratio analysis of reflectance spectra (RARS): An algorithm for the remote estimation of the concentrations of chlorophyll a, chlorophyll b, and carotenoids in soybean leaves. Remote Sens. Environ. 1992, 39, 239–247.
Gamon, J.; Penuelas, J.; Field, C.B. A narrow-waveband spectral index that tracks diurnal changes in photosynthetic efficiency. Remote Sens. Environ. 1992, 41, 35–44.
Penuelas, J.; Baret, F.; Filella, I. Semi-empirical indices to assess carotenoids/chlorophyll a ratio from leaf spectral reflectance. Photosynthetica 1995, 31, 221–230.
Filella, I.; Serrano, L.; Serra, J.; Penuelas, J. Evaluating wheat nitrogen status with canopy reflectance indices and discriminant analysis. Crop Sci. 1995, 35, 1400–1405.
Metternicht, G. Vegetation indices derived from high-resolution airborne videography for precision crop management. Int. J. Remote Sens. 2003, 24, 2855–2877.
Baroni, F.; Boscagli, A.; Di Lella, L.A.; Protano, G.; Riccobono, F. Arsenic in soil and vegetation of contaminated areas in southern Tuscany (Italy). J. Geochem. Explor. 2004, 81, 1–14.
Zarco-Tejada, P.J.; Berjon, A.; Lopez-Lozano, R.; Miller, J.R.; Martín, P.; Cachorro, V.; De Frutos, A. Assessing vineyard condition with hyperspectral indices: Leaf and canopy reflectance simulation in a row-structured discontinuous canopy. Remote Sens. Environ. 2005, 99, 271–287.
Rouse, J.W., Jr.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring the Vernal Advancement and Retrogradation (Green Wave Effect) of Natural Vegetation (No. NASA-CR-132982). 1973. Available online: https://ntrs.nasa.gov/citations/19750020419 (accessed on 20 April 2024).
Chen, Q.; Meng, Z.; Liu, X.; Jin, Q.; Su, R. Decision variants for the automatic determination of optimal feature subset in RF-RFE. Genes 2018, 9, 301.
Mohanaiah, P.; Sathyanarayana, P.; GuruKumar, L. Image texture feature extraction using GLCM approach. Int. J. Sci. Res. Publ. 2013, 3, 1–5.
Lix, L.M.; Keselman, J.C.; Keselman, H.J. Consequences of assumption violations revisited: A quantitative review of alternatives to the one-way analysis of variance F test. Rev. Educ. Res. 1996, 66, 579–619.
Schuldt, C.; Laptev, I.; Caputo, B. Recognizing human actions: A local SVM approach. In Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, 26 August 2004; IEEE: Piscataway, NJ, USA; Volume 3, pp. 32–36.
Zhang, S.; Li, X.; Zong, M.; Zhu, X.; Cheng, D. Learning k for knn classification. ACM Trans. Intell. Syst. Technol. (TIST) 2017, 8, 43.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
Li, L.; Zhang, Q.; Huang, D. A review of imaging techniques for plant phenotyping. Sensors 2014, 14, 20078–20111.
Xue, J.; Su, B. Significant remote sensing vegetation indices: A review of developments and applications. J. Sens. 2017, 2017, 1353691.
Eitel, J.; Long, D.S.; Gessler, P.E.; Smith, A. Using in-situ measurements to evaluate the new RapidEye™ satellite series for prediction of wheat nitrogen status. Int. J. Remote Sens. 2007, 28, 4183–4190.
Strachan, I.B.; Pattey, E.; Boisvert, J.B. Impact of nitrogen and environmental conditions on corn as detected by hyperspectral reflectance. Remote Sens. Environ. 2002, 80, 213–224.
Peñuelas, J.; Gamon, J.A.; Fredeen, A.L.; Merino, J.; Field, C. Reflectance indices associated with physiological changes in nitrogen-and water-limited sunflower leaves. Remote Sens. Environ. 1994, 48, 135–146.
Pettorelli, N.; Vik, J.O.; Mysterud, A.; Gaillard, J.M.; Tucker, C.J.; Stenseth, N.C. Using the satellite-derived NDVI to assess ecological responses to environmental change. Trends Ecol. Evol. 2005, 20, 503–510.
Ali, H.; Lali, M.I.; Nawaz, M.Z.; Sharif, M.; Saleem, B.A. Symptom based automated detection of citrus diseases using color histogram and textural descriptors. Comput. Electron. Agric. 2017, 138, 92–104.
Malakar, A.; Mukherjee, J. Image clustering using color moments, histogram, edge and K-means clustering. Int. J. Sci. Res. 2013, 2, 532–537.
Gebejes, A.; Huertas, R. Texture characterization based on grey-level co-occurrence matrix. Databases 2013, 9, 375–378.
Wang, Z.; Ma, Y.; Zhang, Y.; Shang, J. Review of remote sensing applications in grassland monitoring. Remote Sens. 2022, 14, 2903.
Blackburn, G.A. Hyperspectral remote sensing of plant pigments. J. Exp. Bot. 2007, 58, 855–867.
Wu, H.; Song, Z.; Niu, X.; Liu, J.; Jiang, J.; Li, Y. Classification of toona sinensis young leaves using machine learning and UAV-borne hyperspectral imagery. Front. Plant Sci. 2022, 13, 940327.
Wu, G.; Fang, Y.; Jiang, Q.; Cui, M.; Li, N.; Ou, Y.; Zhang, B. Early identification of strawberry leaves disease utilizing hyperspectral imaging combing with spectral features, multiple vegetation indices and textural features. Comput. Electron. Agric. 2023, 204, 107553.
Shi, Y.; Han, L.; Kleerekoper, A.; Chang, S.; Hu, T. Novel cropdocnet model for automated potato late blight disease detection from unmanned aerial vehicle-based hyperspectral imagery. Remote Sens. 2022, 14, 396.
Sun, H.F.; Li, X.L.; Jin, L.Q.; Zhao, Y.R.; Li, C.Y.; Zhang, J.; Song, Z.H.; Su, X.X.; Liu, K. Changes in Soil Bacterial Community Diversity in Degraded Patches of Alpine Meadow in the Source Area of the Yellow River. Environ. Sci. 2022, 43, 4662–4673.

Figure 1. Succession process of degraded alpine meadow patches: (A) persistent deterioration of patches and (B) succession of four restoration stages in patches.

Figure 2. Geographic location of the study area: (A) geographic location of Qinghai Province; (B) geographic location of Ebao Town; (C) detailed deployment of six sample plots in Ebao Town.

Figure 3. Hyperspectral image acquisition system. The hyperspectral camera (SOC710VP), mounted on a tripod and controlled via a portable computer, was used for data collection in Ebao Town.

Figure 4. Confusion matrix.
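
As an illustration of how the four evaluation indicators reported in the tables (Acc, P, R, F1) can be derived from a four-class confusion matrix, the following is a minimal sketch. It assumes macro averaging over the four stages, which the captions and tables here do not state explicitly; the function name and the example counts are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def metrics_from_confusion(cm: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, recall, and F1.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Macro averaging is an assumption of this sketch.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical counts for four stages (rows: true stage, columns: predicted).
example_cm = [
    [48, 2, 0, 0],
    [3, 45, 2, 0],
    [0, 4, 44, 2],
    [0, 0, 1, 49],
]
result = metrics_from_confusion(example_cm)
print({name: round(value, 4) for name, value in result.items()})
```

Because the macro F1 is the mean of the per-class F1 values, it can be lower than both the macro precision and the macro recall, a pattern that also appears in several rows of the tables.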

Figure 5. Flowchart of the experiment. Starting with the acquisition of HSI data, the workflow progresses through feature extraction, feature selection, model training, and finally model validation. Each step is detailed as follows: (A) Acquisition of HSI data from the study area. (B) Extraction of spectral reflectance, VIs, CFs, and TFs from the acquired images, which serve as important features for model development. (C) Preprocessing of the extracted features, including feature selection to identify and retain the optimal subset of features for subsequent analysis. (D) Application of machine learning algorithms to train models using the preprocessed dataset, aiming to identify the restoration stages of degraded alpine meadow patches. (E) Validation of the trained models through cross-validation techniques to evaluate their accuracy.

Figure 6. Wavelength selection process (A) and optimal wavelength distribution (B) (MC stands for Monte Carlo). Note: The red square represents the number of features selected under optimal sampling.

Figure 7. Vegetation index preferred feature scores. The red bubble represents the vegetation index with the highest score and the green bubble denotes the vegetation index with the lowest score.

Figure 8. Example of extracted texture features. Note: RGB is the synthesized true-color image.

Figure 9. ROC curve (A) and PR curve (B) for the SVM model at different stages. The figure displays ROC and PR curves at four stages, with the AUC values—specifically, ROC-AUC and PR-AUC—provided in the figure legend. The stages include: Stage 0, representing active patches; Stage 1, representing inactive patches; Stage 2, representing recovering patches; and Stage 3, representing healthy alpine meadow areas.
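
A per-stage ROC-AUC of the kind reported in the figure legend can be obtained by treating one stage as the positive class and the remaining three as negative (one-vs-rest). The sketch below computes that area through the equivalent rank statistic: the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half. The function name, labels, and scores are hypothetical; how the authors computed their curves is not stated here.

```python
from typing import List, Sequence


def roc_auc_one_vs_rest(labels: Sequence[int],
                        scores: Sequence[float],
                        positive: int) -> float:
    """ROC-AUC for one stage against all others.

    scores[i] is the model's score for the `positive` stage on sample i.
    """
    pos: List[float] = [s for y, s in zip(labels, scores) if y == positive]
    neg: List[float] = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Hypothetical stage labels (0 to 3) and scores for Stage 0.
labels = [0, 0, 1, 2, 3, 3]
stage0_scores = [0.9, 0.6, 0.7, 0.2, 0.1, 0.3]
auc0 = roc_auc_one_vs_rest(labels, stage0_scores, positive=0)
print(auc0)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; a sort-based rank computation would be preferable for large datasets.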

Figure 10. Heat map of the best model confusion matrix based on different datasets of features.

Table 2. Identification results of full hyperspectral reflectance features and preferred features.

Model	Full Spectrum				Optimal Wavelength
Model	Acc	P	R	F1	Acc	P	R	F1
SVM	0.9513	0.9515	0.9509	0.9487	0.9318	0.9297	0.9304	0.9279
KNN	0.7621	0.7611	0.7535	0.7529	0.8008	0.7955	0.7917	0.7908
RF	0.7766	0.7790	0.7611	0.7619	0.7718	0.7821	0.7552	0.7590
XGBoost	0.7720	0.7758	0.7511	0.7497	0.7961	0.8036	0.7848	0.7833

Table 3. Identification results of full vegetation index features and preferred features.

Model	Total Vegetation Index				Prominent Vegetation Index
Model	Acc	P	R	F1	Acc	P	R	F1
SVM	0.9514	0.9570	0.9481	0.9485	0.9318	0.9325	0.9290	0.9281
KNN	0.9416	0.9489	0.9401	0.9364	0.9270	0.9289	0.9248	0.9220
RF	0.9075	0.9106	0.9019	0.9009	0.8832	0.8891	0.8786	0.8783
XGBoost	0.9075	0.9116	0.9061	0.9038	0.9171	0.9215	0.9168	0.9146

Table 4. Identification results of full color features and preferred features.

Model	All Color Features				Significant Color Features
Model	Acc	P	R	F1	Acc	P	R	F1
SVM	0.7573	0.7745	0.7411	0.7375	0.8061	0.7981	0.7869	0.7839
KNN	0.7138	0.7311	0.6854	0.6749	0.7427	0.7386	0.7174	0.7140
RF	0.7282	0.7159	0.6980	0.6893	0.7088	0.6916	0.6883	0.6730
XGBoost	0.7529	0.7472	0.7293	0.7268	0.7430	0.7629	0.7251	0.7194

Table 5. Identification results of full texture features and preferred features.

Model	All Texture Features				Effective Texture Features
Model	Acc	P	R	F1	Acc	P	R	F1
SVM	0.8008	0.7984	0.7931	0.7907	0.7569	0.7716	0.7424	0.7380
KNN	0.6696	0.6662	0.6479	0.6489	0.7375	0.7525	0.7257	0.7179
RF	0.8108	0.8107	0.8053	0.8036	0.6794	0.6914	0.6732	0.6646
XGBoost	0.8397	0.8430	0.8345	0.8343	0.7037	0.7223	0.6934	0.6857

Table 6. Eleven fused feature datasets.

Fused Feature Datasets	Features Included
A	OWs + prominent VIs
B	OWs + effective TFs
C	OWs + significant CFs
D	prominent VIs + effective TFs
E	prominent VIs + significant CFs
F	effective TFs + significant CFs
G	OWs + prominent VIs + effective TFs
H	OWs + prominent VIs + significant CFs
I	OWs + effective TFs + significant CFs
J	prominent VIs + significant CFs + effective TFs
K	OWs + effective TFs + prominent VIs + significant CFs
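
The eleven fused datasets correspond to every combination of two, three, or four of the four preferred feature sets (6 + 4 + 1 = 11). A minimal sketch of that enumeration follows; the list and function names are illustrative, and the letters are assigned here simply by enumeration order.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# The four preferred feature sets named in Table 6.
FEATURE_SETS = ["OWs", "prominent VIs", "effective TFs", "significant CFs"]


def fused_datasets(feature_sets: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every combination of two or more feature sets."""
    combos: List[Tuple[str, ...]] = []
    for size in range(2, len(feature_sets) + 1):
        combos.extend(combinations(feature_sets, size))
    return combos


fused = fused_datasets(FEATURE_SETS)
for label, members in zip("ABCDEFGHIJK", fused):
    print(label, " + ".join(members))
```

With the feature sets listed in this order, the enumeration yields the same membership as rows A to K of the table, although the order of the members within rows J and K differs from the order printed there.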

Table 7. Identification results of preferred two-feature combinations.

Model	Evaluation	A	B	C	D	E	F
SVM	Acc	0.9320	0.8936	0.8643	0.8977	0.9080	0.8355
	P	0.9369	0.8710	0.8593	0.8970	0.9174	0.8278
	R	0.9308	0.8626	0.8619	0.8997	0.9013	0.8245
	F1	0.9299	0.8599	0.8559	0.8943	0.9041	0.8171
KNN	Acc	0.9126	0.7716	0.7478	0.8734	0.8931	0.7668
	P	0.9157	0.7464	0.7532	0.8740	0.8999	0.7559
	R	0.9034	0.7504	0.7209	0.8730	0.8904	0.7459
	F1	0.9057	0.7380	0.7189	0.8684	0.8838	0.7337
RF	Acc	0.8831	0.7232	0.7765	0.8831	0.8880	0.7521
	P	0.8856	0.7387	0.7635	0.8874	0.9039	0.7163
	R	0.8786	0.7114	0.7532	0.8793	0.8821	0.7323
	F1	0.8765	0.7091	0.7460	0.8775	0.8822	0.7160
XGBoost	Acc	0.8976	0.7815	0.8010	0.9075	0.9075	0.7670
	P	0.9067	0.7822	0.8003	0.9202	0.9178	0.7752
	R	0.8942	0.7661	0.7886	0.9074	0.9026	0.7477
	F1	0.8936	0.7620	0.7904	0.9057	0.9023	0.7366

Table 8. Identification results of preferred three- and four-feature combinations.

Model	Evaluation	G	H	I	J	K
SVM	Acc	0.9123	0.9176	0.8643	0.9177	0.9176
	P	0.9218	0.9233	0.8680	0.9248	0.9220
	R	0.9092	0.9089	0.8560	0.9128	0.9103
	F1	0.9077	0.9120	0.8554	0.9131	0.9118
KNN	Acc	0.8639	0.8738	0.8104	0.8738	0.8883
	P	0.8709	0.8769	0.8173	0.8823	0.9017
	R	0.8591	0.8572	0.7956	0.8634	0.8707
	F1	0.8563	0.8600	0.7990	0.8637	0.8740
RF	Acc	0.8831	0.8930	0.8104	0.8927	0.9075
	P	0.8840	0.8989	0.8173	0.8966	0.9118
	R	0.8803	0.8884	0.7956	0.8873	0.9043
	F1	0.8784	0.8883	0.7990	0.8869	0.9016
XGBoost	Acc	0.9026	0.9075	0.8202	0.9171	0.9171
	P	0.9129	0.9104	0.8180	0.9234	0.9226
	R	0.9033	0.9071	0.8105	0.9144	0.9144
	F1	0.9001	0.9043	0.8104	0.9130	0.9129

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
