To address the challenges in estimating forage P content outlined above, two primary issues require attention: (1) multispectral data mining methods, constrained by their discrete spectral sampling, fail to capture the inherent continuous geometric relationships within multispectral data, which limits the accurate extraction of continuous spectral features; and (2) traditional symbolic regression algorithms based on genetic programming suffer from dimensionality issues, low search efficiency, and limited global search capability. This section introduces two methods (Figure 2(2)) to tackle these challenges. First, the CHSF extraction method, developed using “Graph Theory”, generates continuous spectral curves directly from multispectral data and creates CHSF images (CHSFIs) with multiple spectral dispersions using various sampling approaches. Second, integrating deep reinforcement learning and genetic programming into symbolic regression enhances global search capability and efficiency (Figure 2(3)).
Figure 2.
Framework of this study. (1) We collected data including DEM, multispectral images, and ground sampling. (2) We defined the CHSF and generated a method base and CHSFI dataset. (3) Using the DRL-GP algorithm, we built a symbolic regression model to acquire the optimal symbolic inversion model and spectral dispersion. (4) We analyzed the spatiotemporal dynamics of P content during growing seasons.
2.3.1. Extracting CHSF from Sentinel-2 MSI for Forage P Content Estimation
Sunlight reflection on surfaces can generally be visualized as a continuous curve. However, due to the discrete sampling methods of multispectral sensors, current data mining approaches remain limited to discrete points [35]. To address this limitation, this study leverages “Graph Theory” [36], which has been applied in various fields, including algebraic topology [37], quantum computing [38], probability statistics [39], and transportation planning [40]. “Graph Theory” has also been increasingly applied in the field of remote sensing. For example, Xie et al. used remote sensing data and “Graph Theory” to identify urban structures [41]; Wang et al. extracted river width from remote sensing images based on “Graph Theory” [42]; and Matthias et al. applied “Graph Theory” to extract agricultural fields from remote sensing imagery [43]. Building on these applications, we leverage “Graph Theory” for feature mining in remote sensing imagery.
“Graph Theory” is commonly employed to model relationships between events, where vertices represent individual events and edges connect vertices to indicate relationships [44] (Figure 2(2)—“Graph Theory”). From this perspective, the spectral curve of each pixel in multispectral or hyperspectral remote sensing images can be represented as a network of interconnected components. Shifting from the original one-dimensional “sequence” to a two-dimensional “image”, pixel values from different spectral bands are treated as “points”, and the edges between them represent the spectral variation of different land features. As a result, the spectral sequences form a “polygonal graph”. However, this multispectral/hyperspectral graph exhibits discrete connectivity because of the sensor’s discrete spectral sampling (Figure 2(2)—remote sensing imagery graph). To overcome this, we compute continuous spectral curves, forming a “curvilinear graph”. This method, termed CHSF extraction, generates continuous spectral curves from multispectral data (Figure 2(2)—remote sensing imagery graph). Furthermore, we produce 10 CHSFIs with varying spectral dispersions using equidistant discrete sampling (1–10 nm) (Figure 2(2)—CHSFI dataset) to estimate forage P content.
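To make this idea concrete, the following is a minimal sketch assuming a single pixel's band reflectances and central wavelengths (illustrative values, not the study's exact configuration): the discrete band "points" are joined by a continuous spectral curve (here a cubic spline), which is then resampled at equidistant spectral dispersions of 1–10 nm to form CHSF vectors.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical Sentinel-2 band centres (nm) and reflectances for one pixel.
band_centres = np.array([490, 560, 665, 705, 740, 783, 842, 865, 1610, 2190], float)
reflectance = np.array([0.04, 0.07, 0.05, 0.12, 0.21, 0.28, 0.32, 0.33, 0.24, 0.14])

# "Curvilinear graph": a continuous spectral curve through the discrete band points.
chsf_curve = CubicSpline(band_centres, reflectance)

# Equidistant resampling at spectral dispersions of 1-10 nm.
chsf_vectors = {}
for dispersion in range(1, 11):
    wavelengths = np.arange(band_centres[0], band_centres[-1] + 1, dispersion)
    chsf_vectors[dispersion] = chsf_curve(wavelengths)

print({d: v.shape for d, v in chsf_vectors.items()})
```

Swapping `CubicSpline` for another interpolation or fitting routine yields a different CHSF method; this is the basis of the method base described below.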
In the context of the “Spectral Graph”, numerous ways exist to connect different points graphically. However, not all curves can be considered CHSF curves for remote sensing imagery. The extraction of these curves must align with the fundamental principles of remote sensing image acquisition and the characteristics of terrestrial features [45]. The proposed CHSF aims to establish an unsupervised system for extracting continuous hyperspectral features. Assuming a set of $n$ known points (where $n$ represents the number of spectral bands in the spectral image) denoted as $x_1, x_2, \ldots, x_n$, the CHSF extraction involves computing the corresponding function values $f(x_i)$ for $x_i$, where $i = 1, 2, \ldots, n$. Let $f(x)$ be a function defined on the interval $[a, b]$ (where $[a, b]$ represents the spectral range of the input imagery), let $x_1, x_2, \ldots, x_n$ be $n$ distinct points on $[a, b]$, and let $G$ be a given class of functions. If there exists a function $\varphi(x)$ in $G$ satisfying the following equation, then $\varphi(x)$ is considered the CHSF method of $f(x)$ with respect to the points $x_i$:

$$\varphi(x_i) = f(x_i), \quad i = 1, 2, \ldots, n \tag{1}$$

where $n$ refers to the number of bands, $\varphi$ denotes a particular CHSF method, and $x_i$ is the position of each band.
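As a quick illustration of this condition, the sketch below (with made-up band positions and values) checks that an interpolating CHSF method reproduces $f(x_i)$ exactly at the band positions, whereas a fitting method only approximates them; this distinction underlies the two method families introduced next.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical band positions x_i (nm) and observed reflectances f(x_i).
x = np.array([490, 560, 665, 842, 1610, 2190], float)
f_x = np.array([0.05, 0.08, 0.06, 0.30, 0.22, 0.12])

phi_interp = CubicSpline(x, f_x)              # interpolation: passes through every point
phi_fit = np.poly1d(np.polyfit(x, f_x, 3))    # cubic polynomial fit: approximates the points

print(np.allclose(phi_interp(x), f_x))        # True  -> satisfies phi(x_i) = f(x_i)
print(np.allclose(phi_fit(x), f_x))           # typically False -> points lie off the curve
```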
The CHSF, depicted in Figure 2(2)—CHSF method base, categorizes points into two types: those lying on the curve and those off the curve. This section focuses on the extraction of spectral features using two primary methodological approaches: interpolation and fitting. Interpolation methods encompass spline interpolation (quadratic, cubic, and quartic), radial basis function (RBF) interpolation (multiple quadratic, inverse polynomial, and thin plate), and polynomial function interpolation (linear and Hermite). Conversely, fitting methods consist of polynomial fitting (quadratic and cubic), piecewise polynomial fitting (linear, quadratic, and cubic), and machine learning-based fitting (k-nearest neighbors, decision tree, and random forest). Interpolation estimates new data points within a given range based on known discrete points [46], addressing the case of points lying on a continuous curve. Fitting, in contrast, approximates the original point function curve within a range using known discrete points, which resolves the case of points not lying on the curve [47]. This section applies these sixteen methods to spectral data and Sentinel-2 multispectral imagery (MSI) to construct CHSF datasets.
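A minimal sketch of how such a method base might be organized in code, assuming scipy and scikit-learn implementations as stand-ins for the sixteen methods (only a representative subset is shown, and the entry names are ours, not the study's):

```python
import numpy as np
from scipy.interpolate import make_interp_spline, Rbf, PchipInterpolator, interp1d
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor


def _ml_fit(model, x, y):
    """Wrap a scikit-learn regressor so it behaves like a 1-D spectral curve."""
    model.fit(x.reshape(-1, 1), y)
    return lambda w: model.predict(np.asarray(w).reshape(-1, 1))


# Representative CHSF method base: each entry maps band positions x and
# reflectances y to a callable continuous spectral curve.
CHSF_METHODS = {
    # Interpolation ("points lying on the curve")
    "spline_cubic":   lambda x, y: make_interp_spline(x, y, k=3),
    "rbf_thin_plate": lambda x, y: Rbf(x, y, function="thin_plate"),
    "linear":         lambda x, y: interp1d(x, y, kind="linear"),
    "hermite":        lambda x, y: PchipInterpolator(x, y),
    # Fitting ("points lying off the curve")
    "poly_fit_cubic": lambda x, y: np.poly1d(np.polyfit(x, y, 3)),
    "knn_fit":        lambda x, y: _ml_fit(KNeighborsRegressor(n_neighbors=2), x, y),
    "tree_fit":       lambda x, y: _ml_fit(DecisionTreeRegressor(), x, y),
    "forest_fit":     lambda x, y: _ml_fit(RandomForestRegressor(n_estimators=50), x, y),
}
```

Given band centres and a pixel's reflectances, `CHSF_METHODS["spline_cubic"](bands, values)` returns a curve that can be evaluated at any wavelength, which is how the CHSFIs described below are produced.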
Initially, grassland spectral data from the USGS were utilized to establish the CHSF method base. This base was instrumental in extracting CHSFIs with a spectral dispersion of 5 nm from multispectral data obtained through satellite remote sensing imagery.
Figure 3 illustrates the challenge of “points lying on the line” and demonstrates the application of interpolation methods within the CHSF framework. The key observations are as follows: (1) Spline interpolation: quadratic, cubic, and quartic splines were utilized; for spline orders above four, the CHSF curves exhibited negative trends, deviating from the expected reflection mechanisms in grassland remote sensing. Quadratic and cubic splines shared similar characteristics, displaying multiple wave crests and troughs along their profiles. (2) RBF interpolation: compared to spline interpolation, the RBF methods showed more pronounced wave crests and troughs across the spectral range. Multiple quadratic and inverse polynomial basis function interpolations exhibited distinct features between 470 nm and 1045 nm, while the thin-plate basis function interpolation revealed specific peaks and troughs at 1621.5 nm and 1923.6 nm. (3) Polynomial interpolation: linear and Hermite interpolations produced smoother curves than the spline and RBF methods; although smoother, these profiles still adhered to the fundamental principles of grassland remote sensing reflection.
Figure 4 visually represents the “points lying off the line” problem through the application of various fitting methods: (1) Machine learning fitting: CHSF curves generated using decision tree and k-nearest neighbor algorithms tend to be relatively simple, while random forest produces more varied results. In general, machine learning fitting lacks the distinct wave crests and troughs typically observed in spectral curves; instead, it tends to display a noticeable step-like pattern. (2) Piecewise polynomial fitting: when the polynomial degree exceeds three, the resulting CHSF curve deviates from the expected grassland remote sensing reflection mechanisms, often falling below zero. Therefore, piecewise linear, quadratic, and cubic polynomials are applied. The CHSF curve generated by piecewise linear fitting lacks evident wave crests and troughs, whereas quadratic and cubic piecewise polynomial fitting produce curves with clear wave crests and troughs concentrated in the 560 nm to 880 nm range (Figure 4). (3) Polynomial fitting: similarly, when the polynomial degree is greater than three, the CHSF curve falls below zero, and linear fitting results in a straight line. Therefore, quadratic and cubic polynomials are selected. While these methods generate CHSF curves with noticeable wave crests and troughs, they tend to diverge from the fundamental trends in grassland spectral reflection.
In summary, these CHSF curves are derived from equidistant spectral discrete sampling, generating a dataset of CHSFIs with varying spectral dispersions (ranging from 1 to 10 nm) based on Sentinel-2 MSI data. The detailed process for this CHSF extraction and dataset generation is illustrated in Figure 5.
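The per-pixel dataset generation can be sketched as follows, assuming a Sentinel-2 reflectance array of shape (bands, height, width); the band centres, the toy cube, and the choice of a cubic spline are illustrative only:

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Hypothetical Sentinel-2 band centres (nm) and a small reflectance cube (bands, H, W).
band_centres = np.array([490, 560, 665, 705, 740, 783, 842, 865, 1610, 2190], float)
cube = np.random.rand(len(band_centres), 4, 4).astype(np.float32)


def make_chsfi(cube, band_centres, dispersion_nm):
    """Build a CHSFI: a continuous-spectrum layer stack for every pixel at the given dispersion."""
    wavelengths = np.arange(band_centres[0], band_centres[-1] + 1, dispersion_nm)
    bands, h, w = cube.shape
    flat = cube.reshape(bands, -1)                           # (bands, H*W)
    curves = make_interp_spline(band_centres, flat, k=3, axis=0)
    return curves(wavelengths).reshape(len(wavelengths), h, w)


# CHSFI dataset with spectral dispersions from 1 to 10 nm.
chsfi_dataset = {d: make_chsfi(cube, band_centres, d) for d in range(1, 11)}
print({d: img.shape for d, img in chsfi_dataset.items()})
```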
2.3.2. Symbolic Regression Based on DRL-GP for Forage P Content Estimation
Genetic algorithms have been widely employed in symbolic regression problems because of their powerful search capabilities. However, they often struggle to efficiently capture relevant knowledge during the evolutionary process, requiring significant computational resources and time for each iteration. To fully exploit the advantages of genetic algorithms in symbolic regression, this section integrates deep learning and reinforcement learning algorithms. Reinforcement learning is a machine learning algorithm that employs rewards and penalties to enable an agent to learn through interactions with its environment; its goal is to discover the optimal strategy and maximize cumulative rewards [48]. In this study, deep reinforcement learning is embedded within the genetic algorithm framework, assigning an agent to each gene point of symbolic expressions. The specific process consists of three key steps (Figure 2(3)): state input, action space definition, and searching for the optimal expression.
The neural network uses all training data as input. Specifically, the input consists of the CHSFs for each measured sample plot, and the output corresponds to the measured P content of each plot (Figure 5). To improve the efficiency of the learning process, all images are converted into sequential inputs.
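As one illustration of this conversion (our own sketch, not the study's code), a CHSFI covering a sample plot can be flattened so that each pixel's continuous spectrum becomes one input sequence, paired with that plot's measured P content:

```python
import numpy as np
import torch

def chsfi_to_sequences(chsfi, p_content):
    """Flatten a CHSFI of shape (wavelengths, H, W) into per-pixel spectral sequences.

    Each pixel's continuous spectrum becomes one 1-D sequence, and every sequence
    from the plot is paired with the plot's measured P content as the target.
    """
    wavelengths, h, w = chsfi.shape
    x = torch.from_numpy(chsfi.reshape(wavelengths, -1).T.copy()).float()  # (H*W, wavelengths)
    y = torch.full((h * w,), float(p_content))                             # one label per sequence
    return x, y

# Hypothetical 5 nm CHSFI (341 wavelengths over 490-2190 nm) for a 4 x 4 plot with P content 0.21%.
chsfi = np.random.rand(341, 4, 4).astype(np.float32)
x, y = chsfi_to_sequences(chsfi, 0.21)
print(x.shape, y.shape)  # torch.Size([16, 341]) torch.Size([16])
```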
The action space for the skeletal component of symbolic expressions consists of a predefined function library, whereas the action space for the feature component encompasses all input parameters. These two action spaces are mutually exclusive and do not intersect.
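A minimal way to encode these two disjoint action spaces (the names and feature labels are ours, chosen for illustration):

```python
# Skeletal action space: the predefined function library (see the operators listed at the end of this section).
SKELETON_ACTIONS = ["+", "-", "*", "/", "sin", "cos", "tan"]

# Feature action space: all input parameters, here one CHSF value per resampled wavelength.
FEATURE_ACTIONS = [f"chsf_{w}nm" for w in range(490, 2191, 5)]

# The two spaces are mutually exclusive.
assert set(SKELETON_ACTIONS).isdisjoint(FEATURE_ACTIONS)
```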
In each generation, a symbolic expression is generated, and this experiment utilizes R², MSE, and MAE as reward metrics for the reinforcement learning algorithm. Iterations continue until the highest coefficient of determination (R²) (Equation (2)), lowest mean squared error (MSE) (Equation (3)), and lowest mean absolute error (MAE) (Equation (4)) values are determined, serving as criteria for identifying the optimal symbolic expression:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \tag{2}$$

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \tag{3}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{4}$$

where $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ refers to the sum of squared errors between the individual observed values $y_i$ and their corresponding predicted values $\hat{y}_i$, and $\sum_{i=1}^{n}(y_i - \bar{y})^2$ is the sum of squares between each observed value and their mean $\bar{y}$.
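These reward metrics can be computed, for instance, with scikit-learn (a sketch with made-up observed and predicted values):

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Hypothetical observed and predicted forage P contents for one generation's expression.
y_obs = np.array([0.18, 0.22, 0.25, 0.19, 0.21])
y_pred = np.array([0.17, 0.23, 0.24, 0.20, 0.22])

reward = {
    "R2": r2_score(y_obs, y_pred),              # Equation (2): higher is better
    "MSE": mean_squared_error(y_obs, y_pred),   # Equation (3): lower is better
    "MAE": mean_absolute_error(y_obs, y_pred),  # Equation (4): lower is better
}
print(reward)
```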
For training, ten-fold cross-validation is utilized to mitigate the influence of overfitting or underfitting on model performance. The hyperparameters are configured as follows: the PyTorch 1.10.2 framework is employed, and the loss function is defined as a combination of the coefficient of determination (R²), MSE, and MAE.
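One plausible form of such a combined loss is sketched below; the equal weighting of the three terms is our assumption, since the exact combination is not specified in the text:

```python
import torch

def combined_loss(y_pred, y_obs, eps=1e-8):
    """Loss combining R2, MSE, and MAE; equal weighting is assumed, not taken from the paper."""
    mse = torch.mean((y_obs - y_pred) ** 2)
    mae = torch.mean(torch.abs(y_obs - y_pred))
    ss_res = torch.sum((y_obs - y_pred) ** 2)
    ss_tot = torch.sum((y_obs - y_obs.mean()) ** 2) + eps
    r2 = 1.0 - ss_res / ss_tot
    return (1.0 - r2) + mse + mae  # minimizing this maximizes R2 and minimizes MSE/MAE

# Example with made-up values.
y_obs = torch.tensor([0.18, 0.22, 0.25, 0.19, 0.21])
y_pred = torch.tensor([0.17, 0.23, 0.24, 0.20, 0.22])
print(combined_loss(y_pred, y_obs))
```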
In this experiment, we selected a training size of 75% and a test size of 25%. During the selection phase, we used a tournament selector with five participants. The population size was set to 100 because the selection operator chooses individuals from both the population and the archive. The number of evaluations was set to 10,000 and the number of evolutionary generations to 100, contingent on the population size. This study employed a neural network with two fully connected layers, comprising a total of 128 neurons.
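Those settings might be collected as follows; the 64 + 64 split of the 128 neurons across the two fully connected layers, and the output head, are our assumptions, as the text only gives the total:

```python
import torch
import torch.nn as nn

# Hyperparameters reported in the text (the per-layer split is assumed; only the total of 128 is given).
CONFIG = {
    "train_size": 0.75,
    "test_size": 0.25,
    "tournament_size": 5,
    "population_size": 100,
    "n_evaluations": 10_000,
    "n_generations": 100,
    "hidden_neurons": (64, 64),
}

class AgentNet(nn.Module):
    """Two fully connected layers (128 neurons in total) mapping a CHSF sequence to action scores."""

    def __init__(self, n_inputs, n_actions, hidden=CONFIG["hidden_neurons"]):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, hidden[0]),
            nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]),
            nn.ReLU(),
            nn.Linear(hidden[1], n_actions),
        )

    def forward(self, x):
        return self.net(x)

model = AgentNet(n_inputs=341, n_actions=7)  # 341 CHSF values (5 nm), 7 skeleton actions (illustrative)
print(model(torch.randn(2, 341)).shape)      # torch.Size([2, 7])
```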
For the skeleton of the symbolic expressions, the function space includes basic mathematical operators such as addition (+), subtraction (−), multiplication (×), division (÷), and trigonometric functions (sin, cos, and tan).
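For completeness, a sketch of such a function library as Python callables; the protected division (returning 1 when the denominator is near zero) is a common genetic programming convention and an assumption on our part:

```python
import numpy as np

def protected_div(a, b):
    """Division guarded against near-zero denominators (a common GP convention; assumed here)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.where(np.abs(b) > 1e-6, a / np.where(np.abs(b) > 1e-6, b, 1.0), 1.0)

FUNCTION_LIBRARY = {
    "+": np.add,
    "-": np.subtract,
    "*": np.multiply,
    "/": protected_div,
    "sin": np.sin,
    "cos": np.cos,
    "tan": np.tan,
}

# Example: evaluate the skeleton "sin(x1) + x2 / x3" on three hypothetical CHSF features.
x1, x2, x3 = 0.31, 0.24, 0.12
print(FUNCTION_LIBRARY["+"](FUNCTION_LIBRARY["sin"](x1), FUNCTION_LIBRARY["/"](x2, x3)))
```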