Leveraging Remote Sensing-Derived Dynamic Crop Growth Information for Improved Soil Property Prediction in Farmlands

Geng, Jing; Tan, Qiuyuan; Zhang, Ying; Lv, Junwei; Yu, Yong; Fang, Huajun; Guo, Yifan; Cheng, Shulan

doi:10.3390/rs16152731


by Jing Geng ¹,², Qiuyuan Tan ¹, Ying Zhang ¹, Junwei Lv ¹, Yong Yu ¹, Huajun Fang ³,⁴,*, Yifan Guo ³ and Shulan Cheng ⁵

¹ School of Geospatial Engineering and Science, Sun Yat-sen University, Zhuhai 519082, China

² Key Laboratory of Natural Resources Monitoring in Tropical and Subtropical Area of South China, Ministry of Natural Resources, Zhuhai 519082, China

³ Key Laboratory of Ecosystem Network Observation and Modeling, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

⁴ The Zhongke-Ji’an Institute for Eco-Environmental Sciences, Ji’an 343000, China

⁵ College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China

* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(15), 2731; https://doi.org/10.3390/rs16152731

Submission received: 8 June 2024 / Revised: 6 July 2024 / Accepted: 20 July 2024 / Published: 26 July 2024

(This article belongs to the Section Remote Sensing in Agriculture and Vegetation)


Abstract:

Rapid and accurate mapping of soil properties in farmlands is crucial for guiding agricultural production and maintaining food security. Traditional methods using spectral features from remote sensing prove valuable for estimating soil properties, but are restricted to short periods of bare soil occurrence within agricultural settings. Addressing the challenge of predicting soil properties under crop cover, this study proposed an improved soil modeling framework that integrates dynamic crop growth information with machine learning techniques. The methodology’s robustness was tested on six key soil properties in an agricultural region of China, including soil organic carbon (SOC), total nitrogen (TN), total phosphorus (TP), dissolved organic carbon (DOC), dissolved organic nitrogen (DON), and pH. Four experimental scenarios were established to assess the impact of crop growth information, represented by the normalized difference vegetation index (NDVI) and phenological parameters. Specifically, Scenario I utilized only natural factors (terrain and climate data); Scenario II added phenological parameters based on Scenario I; Scenario III incorporated time-series NDVI based on Scenario I; and Scenario IV combined all variables (traditional natural factors and crop growth information). These were evaluated using three advanced machine learning models: random forest (RF), Cubist, and Extreme Gradient Boosting (XGBoost). Results demonstrated that incorporating phenological parameters and time-series NDVI significantly improved model accuracy, enhancing predictions by up to 36% over models using only natural factors. Moreover, although both are crop growth factors, the contribution of the time-series NDVI variable to model accuracy surpassed that of the phenological variable for most soil properties. 
Relative importance analysis suggested that the crop growth information, derived from time-series NDVI and phenology data, collectively explained 14–45% of the spatial variation in soil properties. This study highlights the significant benefits of integrating remote sensing-based crop growth factors into soil property inversion under crop-covered conditions, providing valuable insights for digital soil mapping.

Keywords:

soil properties mapping; crop growth factor; time-series NDVI; farmland soils


1. Introduction

Soil, as a fundamental component in human sustenance and agricultural production, plays a crucial role in achieving food security, ecological balance, and sustainable development goals [1]. In agricultural settings, various soil properties play a crucial role in providing essential nutrients and creating favorable environmental conditions for crop development [2]. For instance, soil organic carbon (SOC), an indispensable component of soil nutrients, directly affects the physical, chemical, and biological attributes of the soil, thereby facilitating optimal crop growth [3]. Total nitrogen (TN) and total phosphorus (TP), which are vital for crop vitality, impact energy transfer and cellular processes in plants. However, excessive levels of TN and TP can lead to soil fertility degradation and environmental issues such as water pollution [4]. Soil pH, a measure of acidity or alkalinity, is pivotal for crop growth as it affects various soil biological activities. The adaptability of crops to different pH levels varies significantly, underscoring the importance of precise pH management practices [5]. Additionally, dissolved organic carbon (DOC) and dissolved organic nitrogen (DON), representing water-soluble fractions of organic carbon and nitrogen, respectively, are integral to both crop uptake processes and soil microbial activities [6]. The acquisition of a comprehensive understanding of the spatial distribution patterns of these key soil properties is therefore crucial. This knowledge is indispensable for implementing effective agricultural management strategies that prioritize environmental stewardship.

Traditional methods for investigating these soil properties typically rely on field sample collection followed by laboratory chemical analyses. However, the costs associated with field surveys and the limitations on the number of samples that can be collected pose challenges in conducting extensive soil surveys over large areas [7]. Moreover, these methods yield localized data points that fail to effectively capture the continuous spatial variation in soil properties [8]. With the advancement of computer technology and geographic information systems, digital soil mapping has emerged as an efficient method for expressing the spatial distribution of soil properties [9]. In addition, the rapid advancement of remote sensing sensors, combined with improvements in the resolution of remote sensing images, has enabled cost-effective, extensive, and prompt acquisition of surface change information [10,11]. This significantly enhances both the accuracy and timeliness of digital soil mapping.

Numerous studies have demonstrated a strong correlation between the spectral reflectance captured in remote sensing images and soil properties, and established prediction models for estimating soil properties based on spectral features [12,13]. However, these methodologies are primarily tailored for assessing bare soil observations and face significant limitations when evaluating soil attributes under vegetation cover. This is particularly problematic given that bare-soil time windows are often short and can be easily affected by cloudy and rainy weather conditions. To address this issue, scholars have adopted an alternative approach by using crop growth as an indirect indicator of soil nutrients. This method is increasingly supported by evidence that the interaction between crop growth and soil properties significantly influences the spatial variation in soil attributes [14,15]. For instance, many studies have employed vegetation indices from remote sensing imagery as predictive variables, such as the normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI) [16,17]. However, most existing studies have primarily employed static vegetation indices as variables, such as annually averaged composite values or peak growing season-aggregated values. Consequently, these studies often overlook the impacts of dynamic changes in vegetation throughout the season on soil characteristics.

Recent evidence indicates that time-series vegetation indices, which incorporate multiple temporal data points to characterize crop growth processes, offer enhanced potential for predicting soil properties compared to single temporal vegetation indices [13,18]. Moreover, it is essential to acknowledge that soil formation and modification are not discrete events but rather integral components of a continuous, interconnected process. Therefore, there are inherent limitations in using single temporal remote sensing images to infer soil properties [19,20]. Furthermore, phenological parameters derived from multi-temporal remote sensing data also reflect the temporal dynamics of soil properties through seasonal and annual variations associated with crop growth stages and anthropogenic management practices [18]. This encompasses alterations in soil nutrient composition resulting from the incorporation of crop residues and fluctuations in nutrient uptake efficiency during different growth stages. Such reciprocal interactions are pivotal for the spatial variability of soil properties and underscore the necessity of grasping these complex dynamics for effective land management [21]. However, the quantitative comparison of the efficacy of time-series vegetation indices and phenological parameters, both derived from temporally diverse remotely sensed data, remains largely unexplored in digital soil mapping.

Moreover, the utilization of a diverse range of predictive models has been instrumental in elucidating the intricate relationships between soil properties and environmental factors [22]. These models encompass statistical approaches such as multiple linear regression, geostatistical models including ordinary Kriging and regression Kriging, as well as machine learning techniques like random forests (RF), support vector machines (SVM), Cubist, and XGBoost. Machine learning methods are particularly advantageous due to their ability to model complex nonlinear relationships between soil properties and environmental variables without assuming a specific data distribution, potentially enhancing the accuracy of soil property predictions [23]. Although no single machine learning model consistently outperforms others in all scenarios, tree-based models such as RF, Cubist, and XGBoost have demonstrated superior predictive performance in numerous studies and have become prevalent in the digital mapping of soil properties [24]. However, the effectiveness of these models in areas characterized by fragmented ground cover, rapid surface changes, and complex terrain warrants further investigation.

The aim of this study is to address the challenges identified in soil property mapping by predicting six fundamental soil properties (SOC, TN, TP, DOC, DON, and pH) in farmlands under crop cover conditions. The specific objectives are as follows: (1) evaluate the efficacy of temporal vegetation indices and phenological parameters in predicting various soil attributes in comparison to traditional climatic and topographic variables; (2) analyze the relative contributions of multiple environmental variables, with a focus on crop growth information, to soil properties in the southern hilly region characterized by red soils; and (3) develop optimal machine learning models for various soil properties, thereby providing a detailed analysis of the spatial distribution of soil properties.

2. Materials and Methods

2.1. Study Area

This study was conducted in Taihe County, situated within Ji’an City, Jiangxi Province. This region spans approximately 2667 km², and is located in the hilly terrain of southeastern China (Figure 1). Geographically, the study area features mountains of higher elevation in its southeastern and western parts, while the central region encompasses the Jitai Basin, complemented by surrounding hilly landscapes. The area is characterized by a subtropical monsoon climate, with an average annual temperature of 18.6 °C and annual precipitation averaging 1726 mm. The primary soil type in this study area is red paddy soil, which is characteristic of the typical red soil regions prevalent in southern China. The sampling sites were specifically concentrated within the paddy soil regions, located predominantly in the central area.

2.2. Soil Sampling and Soil Properties Determination

A total of 90 soil samples were collected from rice fields in the study area, utilizing a randomized sampling method, in December 2020. December was chosen as the sampling month to capture the cumulative impact of the entire growing season on soil properties. By this time, the soil had undergone various interactions with the crops, including nutrient uptake, root expansion, and organic matter deposition from crop residues and root exudates. This timing ensures that the collected samples reflect the full period of vegetation development and provide a comprehensive snapshot of soil properties influenced by the entire cycle of vegetation growth and development. The sampling procedure employed a five-point sampling method, with soil samples collected from a depth of 0–20 cm. After collection, each soil sample was thoroughly homogenized and immediately placed in a freezer for preservation until laboratory analysis. The geographical coordinates of each sampling site were accurately recorded using a portable GPS device (CHCNAV LT60H, GPS, Shanghai, China) with differential positioning, ensuring an error of no more than 0.05 m.

In the laboratory, the soil samples were initially sieved through a 2 mm mesh to remove stones and roots. A portion of each sample was then air-dried and further passed through a 100-mesh screen in preparation for nutrient analysis. The contents of soil organic carbon (SOC) and total nitrogen (TN) were determined through combustion analysis using an elemental analyzer (vario MACRO cube, Elementar, Langenselbold, Germany), while total phosphorus (TP) was quantified using an acid digestion method, followed by spectrophotometric analysis with a spectrophotometer (UV-2450, Shimadzu, Kyoto, Japan).

For the determination of dissolved organic carbon (DOC) and dissolved organic nitrogen (DON), fresh soil samples were subjected to aqueous and potassium chloride (KCl) extractions, respectively. These extracts were then filtered and analyzed using a total organic carbon analyzer (Liqui TOCII, Elementar, Langenselbold, Germany) for DOC, and a continuous flow Auto Analyzer (AA3, SEAL, Norderstedt, Germany) for DON. Soil pH measurements were conducted on a soil–water mixture at a 1:2.5 ratio, utilizing a pH meter (FE28-Meter, Mettler Toledo, Greifensee, Switzerland).

2.3. Environmental Covariates and Preprocessing

The environmental variables considered in this study encompass terrain, climate, vegetation indices, and phenological parameters. To assess the efficacy of incorporating crop growth information in soil property inversion and to compare the capabilities of time-series vegetation indices and phenological parameters in representing crop growth information, this study systematically categorized four scenarios of environmental variables (Figure 2). Specifically, Scenario I included only terrain and climate variables. Scenario II augmented this base with phenological parameters. Scenario III introduced time-series NDVI alongside the terrain and climate variables. Finally, Scenario IV combined all variables: terrain, climate, phenological parameters, and time-series NDVI.
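The nesting of the four scenarios can be illustrated by building each covariate set from named groups. This is only a sketch: the variable names follow Sections 2.3.1–2.3.4, while the `NDVI_01` to `NDVI_12` band names and the `SCENARIOS` dictionary are hypothetical labels introduced here for illustration.

```python
# Covariate groups; names follow Sections 2.3.1-2.3.4 of the text.
TERRAIN = ["slope", "aspect", "PlanC", "ProC", "CNBL", "MRRTF",
           "MRVBF", "RSP", "TRI", "TWI", "VD"]
CLIMATE = [f"BIO{i}" for i in range(1, 20)]            # BIO1 ... BIO19
PHENOLOGY = ["Greenup", "MidGreenup", "Maturity", "Peak", "Senescence",
             "MidGreendown", "Dormancy",
             "EVI_Minimum", "EVI_Amplitude", "EVI_Area"]
# Hypothetical names for the 12 monthly maximum NDVI layers.
NDVI_SERIES = [f"NDVI_{m:02d}" for m in range(1, 13)]

# Scenario I is the baseline; II and III each add one crop growth group;
# IV combines everything.
SCENARIOS = {
    "I": TERRAIN + CLIMATE,
    "II": TERRAIN + CLIMATE + PHENOLOGY,
    "III": TERRAIN + CLIMATE + NDVI_SERIES,
    "IV": TERRAIN + CLIMATE + PHENOLOGY + NDVI_SERIES,
}

for name, covariates in SCENARIOS.items():
    print(f"Scenario {name}: {len(covariates)} candidate covariates")
```

These are candidate sets only; the covariates actually entering each model are those retained by the variable selection step described in Section 2.4.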

All data were raster-based and uniformly transformed to the WGS 84 / UTM zone 50N projected coordinate system for consistency. Nearest neighbor resampling was then employed to bring all datasets to a uniform spatial resolution [25]. A high-resolution (10 m) distribution dataset of paddy rice in China was utilized to extract rice planting areas within the study area for the year 2020 [26]. Subsequently, these identified regions were employed to mask the environmental variable data, ensuring that only relevant areas were included in the soil property prediction analysis.
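As a rough illustration of these two preprocessing steps, the sketch below resamples a coarse grid to a finer one by nearest neighbor and then applies a rice mask. It assumes that both grids cover exactly the same extent and are already in the same projection; the function names, the toy values, and the use of `None` as the no-data marker are hypothetical, and a real workflow would rely on a GIS library rather than nested lists.

```python
def resample_nearest(grid, out_rows, out_cols):
    """Resample a 2D grid to (out_rows, out_cols) by nearest neighbor.

    Each output cell takes the value of the input cell that contains
    the output cell's centre.
    """
    in_rows, in_cols = len(grid), len(grid[0])
    out = []
    for r in range(out_rows):
        src_r = min(int((r + 0.5) * in_rows / out_rows), in_rows - 1)
        row = []
        for c in range(out_cols):
            src_c = min(int((c + 0.5) * in_cols / out_cols), in_cols - 1)
            row.append(grid[src_r][src_c])
        out.append(row)
    return out


def apply_mask(grid, mask, nodata=None):
    """Keep cells where the mask is truthy; set all others to nodata."""
    return [[value if keep else nodata for value, keep in zip(g_row, m_row)]
            for g_row, m_row in zip(grid, mask)]


# Hypothetical 2 x 2 coarse climate grid resampled to 4 x 4.
coarse = [[18.2, 18.9],
          [19.1, 19.4]]
fine = resample_nearest(coarse, 4, 4)

# Hypothetical rice mask on the fine grid (1 = paddy rice).
rice_mask = [[1, 1, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 1, 1],
             [0, 0, 0, 1]]
masked = apply_mask(fine, rice_mask)
```

Nearest neighbor resampling copies existing cell values instead of interpolating, so no new values are introduced when a coarse layer (for example, the 1 km climate data) is brought to a finer grid.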

2.3.1. Terrain Data

The terrain variable data were obtained from the ASTER GDEM, which has a spatial resolution of 30 m (https://www.gscloud.cn/, accessed on 15 January 2023). The ASTER GDEM is particularly suitable for estimating soil properties over larger areas due to its widespread use and appropriate resolution for regional studies [4,27]. By utilizing this elevation data, a suite of derivative terrain variables was generated using SAGA GIS software (version 8.5.1). These derived variables encompassed various topographical and morphological indices, including slope, aspect, plan curvature (PlanC), profile curvature (ProC), erosion base height (CNBL), multi-scale ridge-top flatness (MRRTF), multi-scale valley floor flatness (MRVBF), relative slope position index (RSP), terrain roughness index (TRI), topographic wetness index (TWI), and valley depth (VD). Collectively, these variables provided a comprehensive characterization of the topographic features that play a crucial role in comprehending and modeling the environmental factors influencing soil properties within the study area.

2.3.2. Climate Data

The climatic data utilized in this study were sourced from the WorldClim2.1 Bioclimatic dataset (https://worldclim.org/, accessed on 15 January 2023), which provides a comprehensive set of 19 bioclimatic variables at a spatial resolution of 1 km. These variables are derived from monthly temperature and rainfall data, offering ecologically significant insights into the climate patterns within the study area. They encompass various aspects of climate, including trends, seasonality, and extreme conditions, with a particular focus on temperature and precipitation characteristics. Specifically, these climate variables include the annual mean temperature (BIO1), mean diurnal range (BIO2), isothermality (BIO3), temperature seasonality (BIO4), max temperature of warmest month (BIO5), min temperature of coldest month (BIO6), temperature annual range (BIO7), mean temperature of wettest quarter (BIO8), mean temperature of driest quarter (BIO9), mean temperature of warmest quarter (BIO10), mean temperature of coldest quarter (BIO11), annual precipitation (BIO12), precipitation in wettest month (BIO13), precipitation in driest month (BIO14), precipitation seasonality (BIO15), precipitation in wettest quarter (BIO16), precipitation in driest quarter (BIO17), precipitation in warmest quarter (BIO18), and precipitation in coldest quarter (BIO19). The aforementioned variables offer a comprehensive and nuanced insight into the climatic influences on the ecological and agricultural dynamics within the study area.

2.3.3. Time-Series Crop NDVI Data

The normalized difference vegetation index (NDVI) is a widely used vegetation index for assessing vegetation growth and is extensively applied in digital soil mapping [14,20]. To evaluate the effectiveness of temporal vegetation index variables in predicting soil properties, this study processed Sentinel-2 Level 2A images on the Google Earth Engine (GEE) cloud platform with the objective of synthesizing the maximum monthly NDVI values for the study area in 2020.

De-clouding was primarily performed using the QA band, a standard procedure complemented by the Fmask cloud removal method to enhance data quality. In order to evaluate plant health characteristics, we employed a widely recognized method in time-series analysis for synthesizing monthly maximum NDVI, which effectively captures peak vegetation conditions. Consistent with previous studies [20,28], the data gaps in regions lacking maximum NDVI imagery for certain months in 2020 were addressed by utilizing corresponding peak NDVI values from the same month in adjacent years, specifically either 2019 or 2021. Furthermore, the time-series NDVI dataset was compiled by incorporating a geographic mask of the study area, which is an essential step to ensure the alignment of our analysis with the specific spatial context of the study area.
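The compositing and gap-filling logic described above can be sketched for a single pixel as follows. This is not the Google Earth Engine workflow used in the study; it is a plain-Python illustration in which cloud-masked observations are represented by `None`, the function names are hypothetical, and the order in which adjacent years are tried (2019 before 2021) is an assumption, since the text does not specify a priority.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); for Sentinel-2, NIR is B8 and Red is B4."""
    total = nir + red
    return (nir - red) / total if total != 0 else None


def monthly_max(observations):
    """Build a monthly maximum NDVI composite for one pixel and one year.

    observations: iterable of (month, ndvi_or_None); None marks a
    cloud-masked observation. Months without clear data stay None.
    """
    best = {month: None for month in range(1, 13)}
    for month, value in observations:
        if value is None:
            continue
        if best[month] is None or value > best[month]:
            best[month] = value
    return best


def fill_gaps(target, *fallbacks):
    """Fill empty months from the first fallback year that has a value."""
    filled = dict(target)
    for month, value in target.items():
        if value is None:
            for fallback in fallbacks:
                if fallback.get(month) is not None:
                    filled[month] = fallback[month]
                    break
    return filled


# Hypothetical reflectance values for one pixel.
obs_2020 = [(5, ndvi(0.45, 0.05)), (5, ndvi(0.40, 0.08)), (6, None)]
composite_2020 = monthly_max(obs_2020)
composite_2019 = monthly_max([(6, 0.71)])
composite_2021 = monthly_max([(6, 0.69), (7, 0.75)])

# Assumed priority: try 2019 first, then 2021.
filled = fill_gaps(composite_2020, composite_2019, composite_2021)
```

Months that have no clear observation in any of the three years remain `None`, which makes residual gaps explicit instead of silently interpolating them.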

2.3.4. Crop Phenological Parameters Data

The phenological variables for this study were derived from the MODIS vegetation phenology data (MCD12Q2) with a spatial resolution of 500 m for the year 2020. By utilizing the MODIS MCD43A4 reflectance data (NBAR) as a foundation, the MODIS phenology data were derived by first computing a time series of the two-band enhanced vegetation index (EVI2). The dynamics within the EVI2 time series were then analyzed using a sliding window technique to identify intervals indicative of vegetation growth seasons. The classification of an interval as a growth season is determined by whether its maximum value and range meet predefined threshold criteria.

For each identified growth season, piecewise logistic regression was applied to pinpoint the occurrence of seven key phenological events based on changes in curvature within the EVI2 series. In chronological order, these events include emergence of greenness marking the beginning of the growing season (Greenup), continued vegetative growth with increasing greenness (MidGreenup), stabilization of growth as plants reach full development (Maturity), highest level of greenness indicating full vegetative development (Peak), decline of greenness signaling the end of active growth (Senescence), accelerated reduction in greenness as plants prepare for dormancy (MidGreendown), and period of minimal biological activity in vegetation (Dormancy). Moreover, the MCD12Q2 product provides three supplementary phenological metrics for each growth season: the minimum value of EVI2 baseline (EVI_Minimum), the difference between maximum and minimum values (EVI_Amplitude), and the integral of EVI2 over the entire season, representing cumulative EVI2 (EVI_Area).
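To make the three supplementary metrics concrete, the sketch below computes EVI2 from red and near-infrared reflectance and derives a minimum, an amplitude, and an integrated area from a short seasonal series. It is a simplified approximation rather than the MCD12Q2 algorithm: the function names and sample values are hypothetical, the minimum is taken directly from the samples instead of a fitted baseline, and the area is a trapezoidal integral over the sampled days.

```python
def evi2(nir, red):
    """Two-band enhanced vegetation index: 2.5 * (NIR - Red) / (NIR + 2.4 * Red + 1)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


def season_metrics(days, values):
    """Approximate seasonal metrics from an EVI2 series.

    days: ascending day-of-year values; values: EVI2 on those days.
    """
    minimum = min(values)
    amplitude = max(values) - minimum
    area = 0.0
    # Trapezoidal integration between consecutive observations.
    for i in range(len(days) - 1):
        area += 0.5 * (values[i] + values[i + 1]) * (days[i + 1] - days[i])
    return {"EVI_Minimum": minimum, "EVI_Amplitude": amplitude, "EVI_Area": area}


# Hypothetical single-season series sampled every 50 days.
days = [100, 150, 200, 250, 300]
series = [0.2, 0.5, 0.7, 0.4, 0.2]
metrics = season_metrics(days, series)
```

For this toy series the amplitude is 0.5 and the integrated area is about 90 EVI2-days; the units and scaling of the actual product layers should be taken from the product documentation.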

2.4. Modeling Techniques and Accuracy Evaluation

In order to optimize the model’s efficiency and accuracy while addressing multicollinearity concerns among variables, we implemented the recursive feature elimination (RFE) algorithm for variable selection. RFE is a backward selection technique that progressively refines the variable set by eliminating less significant variables, ultimately identifying the optimal subset for each combination of environmental factors [29]. The RFE process was carried out using a random forest estimator to provide robust importance scores for the variables. The number of features to select was determined through cross-validation, ensuring optimal predictive performance. The step parameter was set to 1, removing one feature at each iteration. Variable importance was assessed based on the model’s accuracy metrics, including R-squared and RMSE. A ten-fold cross-validation approach was employed to validate the selection process and prevent overfitting [30,31].
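The backward elimination loop with a step of 1 can be sketched generically as below. In the study the ranking comes from random forest importance scores and the subset score from ten-fold cross-validation; here both are replaced by toy stand-in functions with hypothetical importance values, so the example shows only the control flow, not the actual estimator.

```python
def recursive_feature_elimination(features, rank_fn, score_fn, min_features=1):
    """Backward selection that removes one feature per iteration (step = 1).

    rank_fn(subset)  -> dict mapping feature name to importance
    score_fn(subset) -> validation score for the subset (higher is better)
    Returns the best-scoring subset and its score.
    """
    current = list(features)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) > min_features:
        importance = rank_fn(current)
        weakest = min(current, key=lambda name: importance[name])
        current.remove(weakest)
        score = score_fn(current)
        if score >= best_score:  # on ties, prefer the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins (hypothetical values, not results from the study).
TOY_IMPORTANCE = {"NDVI_07": 0.40, "TWI": 0.25, "BIO12": 0.20,
                  "noise_a": 0.02, "noise_b": 0.01}


def toy_rank(subset):
    return {name: TOY_IMPORTANCE[name] for name in subset}


def toy_score(subset):
    # Informative features add to the score; noise features are penalized.
    gain = sum(TOY_IMPORTANCE[n] for n in subset if not n.startswith("noise"))
    penalty = 0.05 * sum(1 for n in subset if n.startswith("noise"))
    return gain - penalty


selected, selected_score = recursive_feature_elimination(
    list(TOY_IMPORTANCE), toy_rank, toy_score)
print(selected)
```

With these toy functions the two noise features are dropped first and the three informative features are retained, because removing any of them lowers the score.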

2.4.1. Machine Learning Techniques

The random forest model (RF) is an ensemble learning technique that constructs multiple decision trees during the training process [32]. It utilizes bootstrap sampling to randomly select samples for building each decision tree, thereby forming a “forest”. Key parameters include the number of trees in the forest (ntree) and the number of features considered for splitting at each node (mtry). For regression tasks such as those in this study, the final prediction is obtained by averaging the predictions of all trees, whereas majority voting is used for classification. This approach effectively addresses the risk of overfitting and enhances the generalization capability of the model. The RF model was executed using the ‘randomForest’ package in R 4.2.2.

The cubist model (Cubist), an evolution of the M5 model tree, is distinguished by its utilization of linear regression functions at the terminal nodes of the tree [33]. It constructs a regression tree model through an assembly of piecewise linear models. Cubist simplifies the model tree into a set of rules, each representing a condition that enables capturing local linear trends within the variable space, thereby enhancing prediction accuracy. The main parameters for this model encompass the number of base decision trees (committees) and the count of nearest samples (neighbors) for locally weighted regression, which influences the model’s fit to local data patterns. In this study, the Cubist model was implemented using the ‘Cubist’ package in R 4.2.2.

The extreme gradient boosting model (XGBoost) is an advanced gradient boosting decision tree algorithm that is widely used for its effectiveness in regression tasks [34]. It strengthens predictive performance by sequentially combining multiple weak learners into a robust learner. XGBoost distinguishes itself by applying a second-order Taylor expansion of the loss function and incorporating regularization techniques to mitigate overfitting while optimizing model efficiency. Key parameters for XGBoost encompass the learning rate (eta), the number of boosting iterations (nrounds), the maximum depth of individual trees (max_depth), the fraction of features sampled for each tree (colsample_bytree), the fraction of samples drawn for each tree (subsample), and the minimum sum of instance weights required in a child node (min_child_weight). The XGBoost model was executed using the ‘xgboost’ package in R.
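For readers unfamiliar with the second-order expansion mentioned above, the standard form of the XGBoost objective at boosting round t is given below. The symbols are not defined in the original text and follow the usual notation of the XGBoost literature: g_i and h_i are the first and second derivatives of the loss with respect to the previous round's prediction for sample i, f_t is the tree added at round t, T is its number of leaves, w_j is the weight of leaf j, I_j is the set of samples falling in leaf j, and γ and λ are regularization parameters.

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ g_{i} f_{t}(x_{i}) + \tfrac{1}{2} h_{i} f_{t}^{2}(x_{i}) \right] + \Omega(f_{t}),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_{j}^{2}

w_{j}^{*} = - \frac{G_{j}}{H_{j} + \lambda},
\qquad
G_{j} = \sum_{i \in I_{j}} g_{i}, \quad H_{j} = \sum_{i \in I_{j}} h_{i}
```

For the squared-error loss ½(O_i − P_i)², g_i is the current residual P_i − O_i and h_i = 1, so H_j is simply the number of samples in leaf j; this is the quantity that min_child_weight constrains.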

2.4.2. Model Performance Evaluation

The model’s performance was rigorously evaluated in this study using the ten-fold cross-validation technique, a widely recognized and standard process in machine learning and statistical analysis [35,36]. This method randomly partitioned the input data into ten equally sized subsets [27]. During each iteration of this process, nine subsets were combined to form the training set, while the remaining subset was used as the validation set. This cycle was repeated ten times, with a distinct subset serving as the validation set in each iteration. The overall validation accuracy of the model was then determined by averaging the accuracies obtained from all ten iterations. This approach provides a robust estimate of the model’s performance by reducing the potential bias and variance associated with a single data partitioning [37].
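The partitioning described above can be sketched with the standard library alone. The function name and the random seed are hypothetical; with the 90 samples of this study, each fold holds 9 validation samples and 81 training samples.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # random partition
    folds = [indices[i::k] for i in range(k)]    # k near-equal subsets
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield training, validation


splits = list(k_fold_indices(90, k=10, seed=42))
for fold_number, (training, validation) in enumerate(splits, start=1):
    print(fold_number, len(training), len(validation))
```

Each sample appears in exactly one validation fold, so averaging the per-fold metrics uses every observation once for validation.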

The effectiveness of the different models deployed in this study was assessed and compared using three statistical metrics: the coefficient of determination (R²), the root mean square error (RMSE), and the mean absolute error (MAE). The formulas for these accuracy evaluation metrics are delineated as follows:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(O_{i} - P_{i})}^{2}}{\sum_{i = 1}^{n} {(O_{i} - \bar{O})}^{2}}

(1)

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} {(O_{i} - P_{i})}^{2}}{n}}

(2)

MAE = \frac{\sum_{i = 1}^{n} |O_{i} - P_{i}|}{n}

(3)

where n represents the total number of samples, i indexes a single sample, O_i represents the observed value of the i-th sample, \bar{O} represents the average of the observed values of all samples, and P_i represents the predicted value of the i-th sample.
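The three metrics can be computed as in the following sketch, which uses the standard residual-sum-of-squares form of R² (one minus the ratio of the residual sum of squares to the total sum of squares about the observed mean). The function names and sample values are hypothetical.

```python
import math


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Hypothetical observed and predicted values (e.g., SOC in g/kg).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(r2(observed, predicted), rmse(observed, predicted), mae(observed, predicted))
```

For these toy values the residual sum of squares is 26 and the total sum of squares is 500, giving R² = 0.948, RMSE ≈ 2.55, and MAE = 2.5.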

3. Results and Discussion

3.1. Statistical Characteristics of Observed Six Soil Properties

The descriptive statistical analysis of observed soil properties within the study area is presented in Table 1. For SOC, the observed range was 7.57 to 40.39 g/kg, indicating a wide variation in organic carbon content across the region. To further understand this variability, we calculated the coefficient of variation (CV) for SOC, which was 24.46%. This CV suggests that while certain areas exhibit high organic carbon content, most of the region maintains moderate levels. The moderate CV value indicates a balanced distribution of organic carbon, reflecting both high and moderate SOC areas across the landscape [38].
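The coefficient of variation is the standard deviation expressed as a percentage of the mean. The sketch below assumes the sample standard deviation (n − 1 denominator), which the text does not specify; the function name and input values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


# Hypothetical SOC values in g/kg.
soc = [10.0, 12.0, 14.0]
cv_soc = coefficient_of_variation(soc)
print(round(cv_soc, 2))
```

Because the CV is unitless, it allows the variability of properties measured on different scales (for example, SOC in g/kg and DOC in mg/kg) to be compared directly.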

TN values ranged from 0.56 to 3.18 g/kg, and TP measurements varied from 0.26 to 0.91 g/kg, indicating a moderate spread in nutrient distribution within these soils. The CV for TN was 26.40%, and for TP, it was 25.93%, indicating that the variation is moderate but within acceptable limits for agricultural soils. These results suggest that, despite some variability, the essential nutrients for plant growth are relatively evenly distributed across the studied soils [39].

Additionally, the CV for pH was below 10%, denoting minimal variability with values ranging from 4.58 to 6.87. This characterized the soil as predominantly acidic, a common feature in many of southern China’s red soil regions [40]. The relatively narrow pH range observed in this study aligned with findings from other similar ecological settings, where acidic soil conditions prevail due to the climatic influences and the types of parent materials that have contributed to their formation [41]. For DOC and DON, the CVs were 62.37% and 50.88%, respectively, reflecting a moderate level of variation that was nonetheless considerably higher than that of the other properties. The significantly positive skewness and high kurtosis value for DOC indicated that while most sampling points clustered around a concentration of 4–5 mg/kg, a minority exhibited substantially higher levels of DOC. This variability in DOC and DON concentrations could be attributed to specific local practices, such as the type of crops grown or the use of organic amendments [42]. For instance, areas with intensive agricultural practices or high organic matter inputs typically show greater variability in these properties [39].

3.2. Spatial Variability and Temporal Dynamics of Crop NDVI and Phenological Parameters Variables

The dynamic crop NDVI in Taihe County revealed a distinct temporal and spatial pattern in vegetation health and growth (Figure 3). The NDVI time-series over 12 months clearly exhibited fluctuations that closely corresponded to the phenological stages of the region’s crop. Elevated NDVI values observed from May to October indicated the peak growth periods of rice, signifying dense and healthy crop cover (Figure 3e–j). Conversely, lower NDVI values observed during the remaining months were indicative of harvest and fallow periods, characterized by a significant reduction in vegetation cover [43]. Moreover, the crop growth within this region displayed significant spatial heterogeneity, as evidenced by consistently higher NDVI values observed in central areas over multiple months. This suggested disparities in agronomic factors, such as soil fertility, water availability, and local microclimates, which significantly impact crop health and productivity [44]. Interestingly, elevated NDVI levels observed in specific regions during the months of February to April may indicate the cultivation of alternative crops such as rapeseed or vegetables prior to rice planting. This hypothesis is supported by studies that have documented similar agricultural practices and their impact on NDVI levels in the region [45,46]. This suggests diverse agricultural practices that could contribute to soil conservation and income diversification for the farmers [47,48].

The spatial–temporal characterization of phenological parameters within Taihe County, as derived from the remotely sensed vegetation phenology data in 2020, is a pivotal element for understanding the life cycle of the region’s vegetation. Phenological events spanned from April to November, with distinct temporal dynamics: Greenup in April, MidGreenup in May, Maturity during June and July, Peak in August, Senescence in September, MidGreendown in October, and Dormancy by November (Figure 4a–g). These stages reflect the intricate growth patterns of vegetation throughout the year [14,49]. The non-uniform emergence of Greenup and the dispersion of Peak and Maturity phases across the county reflect the area’s diverse agronomic conditions, which play a crucial role in crop management and agricultural productivity [50].

In addition to these phenological stages, the spatial distribution of EVI2-based metrics, namely the minimum value of EVI2 baseline (EVI_Minimum), the difference between maximum and minimum values (EVI_Amplitude), and the integral of EVI2 over the entire season representing cumulative EVI2 (EVI_Area), further delineates variations in vegetative activity and vigor across different areas within the county (Figure 4h–j). The observed variation in EVI_Minimum across the landscape serves as an indicator of disparities in baseline vegetation health, which are likely influenced by differences in soil quality, terrain, and other environmental factors, including weather anomalies [51]. Additionally, the distributions of EVI_Amplitude and EVI_Area revealed variations in vegetative vigor and total productivity, particularly in the central plains of Taihe County, where the amplitude of EVI changes is more pronounced and overall values are higher. By correlating these variations with phenological stages and soil properties, it becomes evident that agricultural strategies can be effectively customized to the distinct environmental conditions of each agricultural zone within the county, thereby enhancing targeted agricultural productivity and sustainability.

3.3. Comparison of Modeling Performance with Different Variable Scenarios

The comparative analysis demonstrated that the selection of both environmental variables and model types significantly influenced the prediction accuracy of different soil properties (Table 2). The inclusion of phenological parameters (Scenario II), time-series NDVI variables (Scenario III), and their combination (Scenario IV) all showed a clear association with an increase in R², as well as decreases in RMSE and MAE, compared to using solely natural predictive variables (climate and terrain, Scenario I). While it is understood that the addition of more variables generally leads to a reduction in RMSE, our analysis focused on the specific contribution of these crop growth parameters. The improvements in model accuracy, as demonstrated by the adjusted R² and validated through cross-validation techniques, highlight the significant and unique predictive value of phenological parameters and time-series NDVI variables, beyond the mere statistical effect of adding more variables. Notably, models incorporating natural variables and crop growth factors (Scenario IV) exhibited a significant enhancement in accuracy, ranging from an 8% improvement in DOC to a 36% improvement in soil TP, compared to models using only traditional variables of climate and terrain (Scenario I). This highlighted the significance of considering temporal vegetation indices alongside phenological parameters for soil property mapping, particularly in regions characterized by complex terrain and soil types [52]. Moreover, Scenario III (incorporating time-series NDVI) demonstrated significantly better accuracy in modeling pH, SOC, DOC, and DON compared to Scenario II (incorporating phenological parameters). This indicated a superior explanatory power of temporal vegetation indices for capturing soil property variations in red soil hilly areas compared to phenological parameters. These findings aligned with previous studies emphasizing the significance of temporal vegetation indices in capturing spatiotemporal variations in soil properties [18,25].
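The quoted improvements can be checked against Table 2 with a short calculation. The sketch below assumes that the percentages refer to the relative gain in R² from Scenario I to Scenario IV, and that the TP figure comes from the RF model and the DOC figure from the XGBoost model; the text does not state this pairing explicitly, so it is an interpretation rather than the authors' documented procedure.

```python
def relative_gain_pct(r2_baseline, r2_new):
    """Relative change in R² from a baseline scenario, in percent."""
    return 100.0 * (r2_new - r2_baseline) / r2_baseline


# R² values copied from Table 2 (Scenario I versus Scenario IV).
tp_rf_gain = relative_gain_pct(0.183, 0.248)    # TP, random forest
doc_xgb_gain = relative_gain_pct(0.528, 0.570)  # DOC, XGBoost

print(f"TP (RF): {tp_rf_gain:.1f}%")
print(f"DOC (XGBoost): {doc_xgb_gain:.1f}%")
```

Under these assumptions, the two values round to 36% and 8%, matching the range reported above.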

In terms of model types, our results have demonstrated the efficacy of XGBoost and RF models in soil property mapping, while various studies suggest different optimal models for predicting soil properties [53,54]. Specifically, XGBoost emerged as the preferred model for SOC, TP, DOC, and DON prediction, while RF demonstrated superior performance in TN and pH estimation (Table 2). Both XGBoost and RF models are renowned for their robustness in handling complex datasets with nonlinear relationships between predictors and response variables, enabling them to effectively capture intricate patterns present in soil property data [55]. Additionally, the regularization mechanisms employed by XGBoost and RF models play a crucial role in mitigating overfitting risk, particularly when dealing with high-dimensional datasets featuring collinearity among predictors [24]. Moreover, the XGBoost model with all natural and crop growth variables (Scenario IV) established the highest prediction accuracy for DOC (R² = 0.570, RMSE = 1.908 mg/kg, MAE = 1.333 mg/kg) among the six soil properties. This indicated that incorporating crop growth information significantly improved the accuracy of DOC mapping, as it is closely linked to crop root exudates that exhibit substantial spatial variability during crop growth [56]. The enhanced understanding and accurate mapping of DOC are essential due to its critical role in nutrient cycling and soil fertility.
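The model preferences stated above can be reproduced from the Scenario IV rows of Table 2. This sketch assumes that the preferred model for each property is simply the one with the highest R² under Scenario IV; the dictionary layout and variable names are illustrative.

```python
# Scenario IV R² values copied from Table 2.
scenario_iv_r2 = {
    "SOC": {"RF": 0.300, "Cubist": 0.293, "XGBoost": 0.313},
    "TN": {"RF": 0.256, "Cubist": 0.232, "XGBoost": 0.253},
    "TP": {"RF": 0.248, "Cubist": 0.228, "XGBoost": 0.276},
    "pH": {"RF": 0.468, "Cubist": 0.441, "XGBoost": 0.436},
    "DOC": {"RF": 0.550, "Cubist": 0.466, "XGBoost": 0.570},
    "DON": {"RF": 0.440, "Cubist": 0.420, "XGBoost": 0.471},
}

# Pick the model with the highest R² for each soil property.
best_model = {
    prop: max(scores, key=scores.get) for prop, scores in scenario_iv_r2.items()
}

for prop, model in best_model.items():
    print(f"{prop}: {model} (R² = {scenario_iv_r2[prop][model]:.3f})")
```

Ranking by RMSE or MAE instead could change the outcome for properties where the models are close, such as TN.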

3.4. Relative Importance of Predictors Based on the Optimal Model

To investigate the major influencing factors on different soil properties in the red soil hilly region, we identified the top ten variables and compared the overall relative importance of four types of environmental variables (Figure 5). The results revealed that climatic variables were the primary factors influencing the variability of soil properties in the red soil hilly region. However, the contribution of individual climatic variables differed among soil properties. For instance, climatic variables accounted for 80% of the relative importance for soil pH (Figure 5d). The temperature seasonality coefficient (BIO4) and mean diurnal range of temperature (BIO2) were the most significant climatic variables for SOC and DON, while mean temperature of the warmest quarter (BIO10) and annual mean temperature (BIO1) were crucial for TN (Figure 5b). For DOC, primary climatic factors included BIO9 (mean temperature of the driest quarter) and BIO1 (annual mean temperature) (Figure 5e). In general, these variables are primarily associated with temperature, which is closely linked to the decomposition of organic matter by microorganisms [57]. For instance, elevated temperature can enhance microbial activity, resulting in accelerated rates of decomposition and impacting the contents of SOC, DOC, and DON [58].
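The grouping of variable importance into four covariate types can be sketched as follows. The importance scores, the variable names `Elevation` and `NDVI_Aug`, and the helper `group_importance` are hypothetical placeholders; only the idea of normalizing scores and summing them by covariate type reflects the comparison summarized in Figure 5.

```python
from collections import defaultdict


def group_importance(importances, groups):
    """Sum normalized importance scores (in percent) by covariate type."""
    total = sum(importances.values())
    shares = defaultdict(float)
    for name, score in importances.items():
        shares[groups[name]] += 100.0 * score / total
    return dict(shares)


# Hypothetical importance scores for a handful of covariates.
importances = {
    "BIO4": 0.30,
    "BIO2": 0.20,
    "Elevation": 0.10,
    "Greenup": 0.15,
    "NDVI_Aug": 0.25,
}
groups = {
    "BIO4": "climate",
    "BIO2": "climate",
    "Elevation": "terrain",
    "Greenup": "phenology",
    "NDVI_Aug": "NDVI",
}

shares = group_importance(importances, groups)
top_three = sorted(importances, key=importances.get, reverse=True)[:3]
print(shares)
print(top_three)
```

In practice, the scores would come from the fitted RF or XGBoost model, and the ranking step would be extended to the top ten variables.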

The annual precipitation (BIO12) was identified as the primary factor influencing soil TP contents in this red soil hilly region (Figure 5c). Generally, precipitation has a substantial impact on the leaching process, which involves the transportation of nutrients from the upper soil profile to deeper layers, or their complete removal from the soil system [59]. Although soil TP tends to strongly bind to soil particles, particularly in soils abundant in iron and aluminum oxides commonly found in red soil regions, this binding capacity restricts its vertical leaching while rendering it vulnerable to losses through surface runoff and erosion [60]. Moreover, the phenological variables demonstrated a greater explanatory power for TP in comparison to their contribution to other soil properties. The key phenological parameters for TP prediction encompassed the initiation of the growing season (Greenup) and the minimum value of enhanced vegetation index during the growing season (EVI_Minimum) (Figure 5c). These indicators were shaped by both the recurring natural cycles and the impact of human agricultural practices [28]. These initial points can provide insights into the early stages of crop growth, potentially correlating with the timing and management of fertilizer application by farmers [61]. Therefore, this relationship emphasized the interconnected roles played by natural phenology and agricultural management in influencing soil TP contents.

The utilization of time-series NDVI indices exhibited notable potential in explaining the spatial variations in these soil properties. Specifically, these indices accounted for 25% of the variation in SOC, 14% in TN, 17% in TP, 14% in pH, 44% in DOC, and 18% in DON. This finding aligned with previous studies, which demonstrated a significant correlation between NDVI time series and soil nutrient concentrations [17,20]. The diverse growth stages of crops, which are influenced by soil fertility, can be effectively captured by time-series vegetation indices that reflect the underlying dynamics of nutrients [62]. However, the correlation between soil nutrients and vegetation indices is also influenced by the growth stages and crop types, resulting in temporal variations in the predictive importance of these indices for different soil properties. For instance, practices such as incorporating rice straw back into the fields after harvest further increase soil organic matter inputs [63]. Consequently, NDVI values during these specific months are notably important for the prediction of SOC and TN contents, highlighting the intricate interplay between crop management practices, vegetation indices, and soil nutrient profiles.

3.5. Spatial Distribution of Soil Properties Maps

The spatial distributions of SOC, TP, DOC, and DON predicted by XGBoost, as well as those of TN and pH predicted by RF, all based on the combination of all variables (Scenario IV), are presented in Figure 6. SOC and TN exhibited similar spatial patterns, with high concentrations in the southeastern regions, moderate levels centrally, and lower concentrations predominantly in the northeast (Figure 6a,b). Specifically, SOC concentrations ranged between 12 and 34 g/kg, while TN concentrations varied from 1.3 to 2.2 g/kg. Soil TP levels were generally between 0.4 and 0.8 g/kg, with higher concentrations primarily in the southwestern cultivated lands and notably lower concentrations in the central cultivated areas (Figure 6c). The spatial distribution of soil pH indicated higher values in the eastern regions and lower values in the western regions, though the pH values of the rice red soils across the entire area predominantly ranged from 5.2 to 6.1 (Figure 6d), indicating overall acidic conditions. DOC levels were below 10 mg/kg in most areas, with some higher concentrations observed in the western and southeastern regions (Figure 6e). The regional DON content typically ranged from 0.5 to 1.8 mg/kg, with higher levels in the southeastern cultivated lands and lower levels centrally (Figure 6f).

Overall, the comparison between the predicted and measured values for soil properties demonstrated a high level of consistency in spatial distribution trends, thereby validating the reliability of the prediction results. However, there were discrepancies observed between the predicted and measured ranges of soil properties. For instance, the range of predicted TN values (1.0 g/kg to 2.52 g/kg) was narrower than that of measured values (0.56 g/kg to 3.18 g/kg). The lowest predicted value for SOC (4.04 g/kg) fell below the minimum measured value (7.57 g/kg), while the lowest predicted pH value (5.04) exceeded the minimum measured value (4.58). These discrepancies may have arisen from the tendency of machine learning models to underestimate high values and overestimate low values, a common issue known as regression to the mean [64]. This effect can result in narrower predicted ranges compared to the actual measured values. Additionally, the model’s training process might not fully capture the extreme variations in soil properties due to the inherent smoothing effect of machine learning algorithms. To address this, further refinement of the model and incorporation of techniques that can better handle extreme values, such as ensemble methods or anomaly detection algorithms, may be necessary [65]. Future work should also consider the potential benefits of increasing the diversity and representativeness of the training data to improve the model’s ability to predict extreme values accurately.
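The narrowing of predicted ranges can be illustrated with a small simulation. The sketch below uses ordinary least squares on synthetic data as a simple stand-in for the RF and XGBoost models of this study; the data-generating values are hypothetical. When a model explains only part of the variance, the spread of its fitted values is smaller than the spread of the observations.

```python
import random
import statistics

random.seed(0)

# Synthetic data: a weak signal plus substantial noise (hypothetical values).
n = 200
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.8 + 0.25 * xi + random.gauss(0.0, 0.4) for xi in x]

# Ordinary least squares fit with a single predictor.
mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x
pred = [intercept + slope * xi for xi in x]

# Fitted values are less dispersed than the observations they were fit to.
observed_sd = statistics.pstdev(y)
predicted_sd = statistics.pstdev(pred)
print(f"observed:  sd={observed_sd:.3f}, range=({min(y):.2f}, {max(y):.2f})")
print(f"predicted: sd={predicted_sd:.3f}, range=({min(pred):.2f}, {max(pred):.2f})")
```

For a least-squares fit, the variance of the fitted values equals R² times the variance of the observations, so the compression grows as R² falls. Tree ensembles are not bound by this exact identity, but they show a comparable smoothing tendency.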

4. Conclusions

This study quantitatively evaluated the effectiveness of integrating crop growth information, including time-series vegetation indices and phenological parameters, in digital soil mapping. To achieve a robust assessment, we tested six fundamental soil properties, including SOC, TN, TP, DOC, DON, and pH. The results demonstrated that both time-series NDVI and phenological parameters significantly enhance the predictive accuracy for these properties, with time-series NDVI demonstrating superior accuracy, particularly in the prediction of SOC, DOC, DON, and pH. Furthermore, the study revealed variability in the performance of machine learning models depending on the soil property being predicted. Notably, the XGBoost model, utilizing variable combinations of terrain, climate, time-series NDVI, and crop phenology metrics, achieved the highest prediction accuracy for DOC, with an R² of 0.57, RMSE of 1.91 mg/kg, and MAE of 1.33 mg/kg. The analysis revealed that time-series NDVI and phenological variables account for 14–45% of the variation in soil properties, highlighting their importance in the context of agricultural soil mapping in hilly regions. Therefore, we recommend incorporating crop growth predictors derived from multi-source remote sensing into the modeling framework for mapping soil properties in agricultural settings. This approach enhances prediction results and benefits agricultural management and precise fertilizer guidance.

Author Contributions

Conceptualization, J.G. and H.F.; methodology, Q.T. and Y.Z.; software, Y.Z., J.L. and Y.Y.; validation, S.C., J.L. and Y.G.; formal analysis, J.G., Q.T. and H.F.; investigation, J.G., H.F., Y.G. and S.C.; resources, J.G., S.C. and H.F.; data curation, Y.Z., J.L. and Y.Y.; writing—original draft preparation, J.G.; writing—review and editing, J.G., Q.T., Y.Z., H.F., J.L., Y.Y., S.C. and Y.G.; visualization, Q.T. and Y.Z.; supervision, J.G. and H.F.; project administration, J.G. and H.F.; funding acquisition, J.G. and H.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the “Unveiling the List of Hanging” Science and Technology Project of Jinggangshan Agricultural High-tech Industrial Demonstration Zone (No. 20222-051244) and the National Natural Science Foundation of China (32101301, 32371725).

Data Availability Statement

The data presented in this study are available from the corresponding author on reasonable request.

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive suggestions and comments to improve the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Lal, R.; Bouma, J.; Brevik, E.; Dawson, L.; Field, D.J.; Glaser, B.; Hatano, R.; Hartemink, A.E.; Kosaki, T.; Lascelles, B. Soils and sustainable development goals of the United Nations: An International Union of Soil Sciences perspective. Geoderma Reg. 2021, 25, e00398.
2. Stenberg, B. Soil attributes as predictors of crop production under standardized conditions. Biol. Fertil. Soils 1998, 27, 104–112.
3. Schjønning, P.; Jensen, J.L.; Bruun, S.; Jensen, L.S.; Christensen, B.T.; Munkholm, L.J.; Oelofse, M.; Baby, S.; Knudsen, L. The role of soil organic matter for maintaining crop yields: Evidence for a renewed conceptual basis. Adv. Agron. 2018, 150, 35–79.
4. Zhou, T.; Geng, Y.; Chen, J.; Sun, C.; Haase, D.; Lausch, A. Mapping of soil total nitrogen content in the middle reaches of the Heihe River Basin in China using multi-source remote sensing-derived variables. Remote Sens. 2019, 11, 2934.
5. Husson, O. Redox potential (Eh) and pH as drivers of soil/plant/microorganism systems: A transdisciplinary overview pointing to integrative opportunities for agronomy. Plant Soil 2013, 362, 389–417.
6. Surey, R.; Schimpf, C.M.; Sauheitl, L.; Mueller, C.W.; Rummel, P.S.; Dittert, K.; Kaiser, K.; Böttcher, J.; Mikutta, R. Potential denitrification stimulated by water-soluble organic carbon from plant residues during initial decomposition. Soil Biol. Biochem. 2020, 147, 107841.
7. Forkuor, G.; Dimobe, K.; Serme, I.; Tondoh, J.E. Landsat-8 vs. Sentinel-2: Examining the added value of sentinel-2’s red-edge bands to land-use and land-cover mapping in Burkina Faso. GISci. Remote Sens. 2018, 55, 331–354.
8. Wang, H.; Zhang, X.; Wu, W.; Liu, H. Prediction of Soil Organic Carbon under Different Land Use Types Using Sentinel-1/-2 Data in a Small Watershed. Remote Sens. 2021, 13, 1229.
9. Luo, C.; Zhang, W.; Zhang, X.; Liu, H. Mapping of soil organic matter in a typical black soil area using Landsat-8 synthetic images at different time periods. Catena 2023, 231, 107336.
10. Khanal, S.; Kc, K.; Fulton, J.P.; Shearer, S.; Ozkan, E. Remote sensing in agriculture—Accomplishments, limitations, and opportunities. Remote Sens. 2020, 12, 3783.
11. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sens. Environ. 2012, 120, 25–36.
12. Luo, C.; Wang, Y.; Zhang, X.; Zhang, W.; Liu, H. Spatial prediction of soil organic matter content using multiyear synthetic images and partitioning algorithms. Catena 2022, 211, 106023.
13. Silvero, N.E.Q.; Demattê, J.A.M.; Amorim, M.T.A.; dos Santos, N.V.; Rizzo, R.; Safanelli, J.L.; Poppiel, R.R.; de Sousa Mendes, W.; Bonfatti, B.R. Soil variability and quantification based on Sentinel-2 and Landsat-8 bare soil images: A comparison. Remote Sens. Environ. 2021, 252, 112117.
14. Yang, L.; He, X.; Shen, F.; Zhou, C.; Zhu, A.; Gao, B.; Chen, Z.; Li, M. Improving prediction of soil organic carbon content in croplands using phenological parameters extracted from NDVI time series data. Soil Tillage Res. 2020, 196, 104465.
15. Wu, T.; Wang, D.; Mu, C.; Zhang, W.; Zhu, X.; Zhao, L.; Li, R.; Hu, G.; Zou, D.; Chen, J. Storage, patterns, and environmental controls of soil organic carbon stocks in the permafrost regions of the Northern Hemisphere. Sci. Total Environ. 2022, 828, 154464.
16. Guo, L.; Fu, P.; Shi, T.; Chen, Y.; Zhang, H.; Meng, R.; Wang, S. Mapping field-scale soil organic carbon with unmanned aircraft system-acquired time series multispectral images. Soil Tillage Res. 2020, 196, 104477.
17. Zeraatpisheh, M.; Garosi, Y.; Owliaie, H.R.; Ayoubi, S.; Taghizadeh-Mehrjardi, R.; Scholten, T.; Xu, M. Improving the spatial prediction of soil organic carbon using environmental covariates selection: A comparison of a group of environmental covariates. Catena 2022, 208, 105723.
18. Zhang, L.; Cai, Y.; Huang, H.; Li, A.; Yang, L.; Zhou, C. A CNN-LSTM model for soil organic carbon content prediction with long time series of MODIS-based phenological variables. Remote Sens. 2022, 14, 4441.
19. McBratney, A.B.; Santos, M.M.; Minasny, B. On digital soil mapping. Geoderma 2003, 117, 3–52.
20. Zhang, Y.; Guo, L.; Chen, Y.; Shi, T.; Luo, M.; Ju, Q.; Zhang, H.; Wang, S. Prediction of soil organic carbon based on Landsat 8 monthly NDVI data for the Jianghan Plain in Hubei Province, China. Remote Sens. 2019, 11, 1683.
21. Yang, L.; Cai, Y.; Zhang, L.; Guo, M.; Li, A.; Zhou, C. A deep learning method to predict soil organic carbon content at a regional scale using satellite-based phenology variables. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102428.
22. Lamichhane, S.; Kumar, L.; Wilson, B. Digital soil mapping algorithms and covariates for soil organic carbon mapping and their implications: A review. Geoderma 2019, 352, 395–413.
23. Wadoux, A.M.-C.; Minasny, B.; McBratney, A.B. Machine learning for digital soil mapping: Applications, challenges and suggested solutions. Earth-Sci. Rev. 2020, 210, 103359.
24. Zhou, T.; Geng, Y.; Ji, C.; Xu, X.; Wang, H.; Pan, J.; Bumberger, J.; Haase, D.; Lausch, A. Prediction of soil organic carbon and the C: N ratio on a national scale using machine learning and satellite data: A comparison between Sentinel-2, Sentinel-3 and Landsat-8 images. Sci. Total Environ. 2021, 755, 142661.
25. Zhang, X.; Xue, J.; Chen, S.; Wang, N.; Shi, Z.; Huang, Y.; Zhuo, Z.Q. Digital Mapping of Soil Organic Carbon with Machine Learning in Dryland of Northeast and North Plain China. Remote Sens. 2022, 14, 2504.
26. Pan, B.; Zheng, Y.; Shen, R.; Ye, T.; Zhao, W.; Dong, J.; Ma, H.; Yuan, W. High resolution distribution dataset of double-season paddy rice in China. Remote Sens. 2021, 13, 4609.
27. Zhou, T.; Geng, Y.; Chen, J.; Liu, M.; Haase, D.; Lausch, A. Mapping soil organic carbon content using multi-source remote sensing variables in the Heihe River Basin in China. Ecol. Indic. 2020, 114, 106288.
28. He, X.; Yang, L.; Li, A.; Zhang, L.; Shen, F.; Cai, Y.; Zhou, C. Soil organic carbon prediction using phenological parameters and remote sensing variables generated from Sentinel-2 images. Catena 2021, 205, 105442.
29. Bao, Y.; Ustin, S.; Meng, X.; Zhang, X.; Guan, H.; Qi, B.; Liu, H. A regional-scale hyperspectral prediction model of soil organic carbon considering geomorphic features. Geoderma 2021, 403, 115263.
30. Sabetizade, M.; Gorji, M.; Roudier, P.; Zolfaghari, A.A.; Keshavarzi, A. Combination of MIR spectroscopy and environmental covariates to predict soil organic carbon in a semi-arid region. Catena 2021, 196, 104844.
31. Xiao, Y.; Xue, J.; Zhang, X.; Wang, N.; Hong, Y.; Jiang, Y.; Zhou, Y.; Teng, H.; Hu, B.; Lugato, E.; et al. Improving pedotransfer functions for predicting soil mineral associated organic carbon by ensemble machine learning. Geoderma 2022, 428, 116208.
32. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
33. Biney, J.K.M.; Blöcher, J.R.; Bell, S.M.; Borůvka, L.; Vašát, R. Can in situ spectral measurements under disturbance-reduced environmental conditions help improve soil organic carbon estimation? Sci. Total Environ. 2022, 838, 156304.
34. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
35. Wang, Y.; Wang, S.; Adhikari, K.; Wang, Q.; Sui, Y.; Xin, G. Effect of cultivation history on soil organic carbon status of arable land in northeastern China. Geoderma 2019, 342, 55–64.
36. Piikki, K.; Wetterlind, J.; Söderström, M.; Stenberg, B. Perspectives on validation in digital soil mapping of continuous attributes—A review. Soil Use Manag. 2021, 37, 7–21.
37. Hong, Y.; Guo, L.; Chen, S.; Linderman, M.; Mouazen, A.M.; Yu, L.; Chen, Y.; Liu, Y.L.; Liu, Y.F.; Cheng, H.; et al. Exploring the potential of airborne hyperspectral image for estimating topsoil organic carbon: Effects of fractional-order derivative and optimal band combination algorithm. Geoderma 2020, 365, 114228.
38. Tripathi, R.; Nayak, A.; Shahid, M.; Lal, B.; Gautam, P.; Raja, R.; Mohanty, S.; Kumar, A.; Panda, B.; Sahoo, R. Delineation of soil management zones for a rice cultivated area in eastern India using fuzzy clustering. Catena 2015, 133, 128–136.
39. Fan, M.; Lal, R.; Zhang, H.; Margenot, A.J.; Wu, J.; Wu, P.; Zhang, L.; Yao, J.; Chen, F.; Gao, C. Variability and determinants of soil organic matter under different land uses and soil types in eastern China. Soil Tillage Res. 2020, 198, 104544.
40. Han, Y.; Yi, D.; Ye, Y.; Guo, X.; Liu, S. Response of spatiotemporal variability in soil pH and associated influencing factors to land use change in a red soil hilly region in southern China. Catena 2022, 212, 106074.
41. Cai, Z.; Yang, C.; Du, X.; Zhang, L.; Wen, S.; Yang, Y. Parent material and altitude influence red soil acidification after converted rice paddy to upland in a hilly region of southern China. J. Soils Sed. 2023, 23, 1628–1640.
42. Zeng, Q.; Yin, M.; Fu, L.; Singh, B.K.; Liu, S.; Chen, H.; Ge, A.; Han, L.; Zhang, L. Green manure substitution for potassium fertilizer promotes agro-ecosystem multifunctionality via triggering interactions among soil, plant and rhizosphere microbiome. Plant Soil 2023, 498, 431–450.
43. Wu, Z.; Liu, Y.; Han, Y.; Zhou, J.; Liu, J.; Wu, J. Mapping farmland soil organic carbon density in plains with combined cropping system extracted from NDVI time-series data. Sci. Total Environ. 2021, 754, 142120.
44. Gan, Y.; Siddique, K.H.; Turner, N.C.; Li, X.; Niu, J.; Yang, C.; Liu, L.; Chai, Q. Ridge-furrow mulching systems—An innovative technique for boosting crop productivity in semiarid rain-fed environments. Adv. Agron. 2013, 118, 429–476.
45. Tian, Z.; Ji, Y.; Xu, H.; Qiu, H.; Sun, L.; Zhong, H.; Liu, J. The potential contribution of growing rapeseed in winter fallow fields across Yangtze River Basin to energy and food security in China. Resour. Conserv. Recycl. 2021, 164, 105159.
46. Liu, W.; Li, S.; Tao, J.; Liu, X.; Yin, G.; Xia, Y.; Wang, T.; Zhang, H. CARM30: China annual rapeseed maps at 30 m spatial resolution from 2000 to 2022 using multi-source data. Sci. Data 2024, 11, 356.
47. Tao, J.; Wu, W.; Liu, W.; Xu, M. Exploring the Spatio-Temporal Dynamics of Winter Rape on the Middle Reaches of Yangtze River Valley Using Time-Series MODIS Data. Sustainability 2020, 12, 266.
48. Ndip, F.E.; Molua, E.L.; Mvodo, M.S.; Nkendah, R.; Choumbou, R.F.D.; Tabetando, R.; Akem, N.F. Farmland Fragmentation, crop diversification and incomes in Cameroon, a Congo Basin country. Land Use Policy 2023, 130, 106663.
49. Wu, F.; Qiu, Y.; Huang, W.; Guo, S.; Han, Y.; Wang, G.; Li, X.; Lei, Y.; Yang, B.; Xiong, S. Water and heat resource utilization of cotton under different cropping patterns and their effects on crop biomass and yield formation. Agric. For. Meteorol. 2022, 323, 109091.
50. Zhang, C.; Diao, C. A Phenology-guided Bayesian-CNN (PB-CNN) framework for soybean yield estimation and uncertainty analysis. ISPRS J. Photogramm. Remote Sens. 2023, 205, 50–73.
51. Bégué, A.; Arvor, D.; Bellon, B.; Betbeder, J.; De Abelleyra, D.; Ferraz, R.P.D.; Lebourgeois, V.; Lelong, C.; Simões, M.; Verón, S.R. Remote sensing and cropping practices: A review. Remote Sens. 2018, 10, 99.
52. Liu, X.; Wang, J.; Song, X. Improving the Spatial Prediction of Soil Organic Carbon Content Using Phenological Factors: A Case Study in the Middle and Upper Reaches of Heihe River Basin, China. Remote Sens. 2023, 15, 1847.
53. Ahirwal, J.; Nath, A.; Brahma, B.; Deb, S.; Sahoo, U.K.; Nath, A.J. Patterns and driving factors of biomass carbon and soil organic carbon stock in the Indian Himalayan region. Sci. Total Environ. 2021, 770, 145292.
54. Geng, L.; Che, T.; Ma, M.; Tan, J.; Wang, H. Corn biomass estimation by integrating remote sensing and long-term observation data based on machine learning techniques. Remote Sens. 2021, 13, 2352.
55. Tan, Q.; Geng, J.; Fang, H.; Li, Y.; Guo, Y. Exploring the Impacts of Data Source, Model Types and Spatial Scales on the Soil Organic Carbon Prediction: A Case Study in the Red Soil Hilly Region of Southern China. Remote Sens. 2022, 14, 5151.
56. Wilson, C.H.; Caughlin, T.T.; Rifai, S.W.; Boughton, E.H.; Mack, M.C.; Flory, S.L. Multi-decadal time series of remotely sensed vegetation improves prediction of soil carbon in a subtropical grassland. Ecol. Appl. 2017, 27, 1646–1656.
57. Moinet, G.Y.; Hunt, J.E.; Kirschbaum, M.U.; Morcom, C.P.; Midwood, A.J.; Millard, P. The temperature sensitivity of soil organic matter decomposition is constrained by microbial access to substrates. Soil Biol. Biochem. 2018, 116, 333–339.
58. Jia, Y.; Kuzyakov, Y.; Wang, G.; Tan, W.; Zhu, B.; Feng, X. Temperature sensitivity of decomposition of soil organic matter fractions increases with their turnover time. Land Degrad. Dev. 2020, 31, 632–645.
59. Hou, E.; Chen, C.; Luo, Y.; Zhou, G.; Kuang, Y.; Zhang, Y.; Heenan, M.; Lu, X.; Wen, D. Effects of climate on soil phosphorus cycle and availability in natural terrestrial ecosystems. Glob. Chang. Biol. 2018, 24, 3344–3356.
60. Zhang, H.; Shi, L.; Fu, S. Effects of nitrogen deposition and increased precipitation on soil phosphorus dynamics in a temperate forest. Geoderma 2020, 380, 114650.
61. Tittonell, P.; Shepherd, K.D.; Vanlauwe, B.; Giller, K.E. Unravelling the effects of soil and crop management on maize productivity in smallholder agricultural systems of western Kenya—An application of classification and regression tree analysis. Agric. Ecosyst. Environ. 2008, 123, 137–150.
62. Guo, L.; Fu, P.; Shi, T.; Chen, Y.; Zeng, C.; Zhang, H.; Wang, S.Q. Exploring influence factors in mapping soil organic carbon on low-relief agricultural lands using time series of remote sensing data. Soil Tillage Res. 2021, 210, 104982.
63. Wang, W.; Lai, D.; Wang, C.; Pan, T.; Zeng, C. Effects of rice straw incorporation on active soil organic carbon pools in a subtropical paddy field. Soil Tillage Res. 2015, 152, 8–16.
64. dos Santos, E.P.; Moreira, M.C.; Fernandes-Filho, E.I.; Demattê, J.A.M.; dos Santos, U.J.; da Silva, D.D.; Cruz, R.R.P.; Moura-Bueno, J.M.; Santos, I.C.; Sampaio, E.V.D.S.B. Improving the generalization error and transparency of regression models to estimate soil organic carbon using soil reflectance data. Ecol. Inf. 2023, 77, 102240.
65. Meng, X.; Bao, Y.; Wang, Y.; Zhang, X.; Liu, H. An advanced soil organic carbon content prediction model via fused temporal-spatial-spectral (TSS) information based on machine learning and deep learning algorithms. Remote Sens. Environ. 2022, 280, 113166.

Figure 1. (a) Geographical location, (b) land cover types and sampling points, and (c) elevation distribution across the study area.

Figure 2. Schematic of the experimental design and workflow for the study.

Figure 3. Spatial–temporal distribution of maximum monthly NDVI values for 2020 across the croplands in the study area. Panels (a–l) depict maximum monthly NDVI values from January to December, respectively.

Figure 4. Spatial–temporal patterns of phenological parameters, including phenological stages (a–g) and EVI2-derived metrics (h–j), throughout 2020 in the croplands of the study area.

Figure 5. Ranking of the relative importance of the top ten variables, with pie charts illustrating the overall relative importance of four types of environmental variables in predicting six soil properties.

Figure 6. Spatial predictions of soil properties in the study area’s croplands using terrain, climate, phenological parameters, and time-series NDVI variables. XGBoost was employed for the prediction of SOC, TP, DOC, and DON, while RF was used for TN and pH. Insets in the lower left corner provide an enlarged view of localized areas, with triangles representing observed values.

Table 1. Statistical characteristics of the observed six soil properties in the study area.

Soil Property	SOC (g/kg)	TN (g/kg)	TP (g/kg)	pH	DOC (mg/kg)	DON (mg/kg)
Max	40.39	3.18	0.91	6.87	21.35	3.06
Min	7.57	0.56	0.26	4.58	1.42	0.01
Mean	23.92	1.78	0.54	5.70	4.73	1.14
SD	5.85	0.47	0.14	0.40	2.95	0.58
CV%	24.46%	26.40%	25.93%	7.02%	62.37%	50.88%
Kurt	0.62	0.71	0.14	0.08	11.54	0.70
Skew	−0.11	0.05	0.73	−0.02	2.88	0.34

Notes: Max, Min, SD, CV, Kurt, and Skew refer to the maximum, minimum, standard deviation, coefficient of variation, kurtosis, and skewness of observed soil properties, respectively.

Table 2. Comparative analysis of model accuracy for soil property inversion using three machine learning models across four variable scenarios. Scenario I includes terrain + climate; Scenario II includes terrain + climate + phenology; Scenario III includes terrain + climate + time-series NDVI; Scenario IV includes terrain + climate + phenology + time-series NDVI. The covariates in each scenario were selected by a recursive feature elimination (RFE) algorithm before modelling.

Soil Property	Covariate Scenarios	RF			Cubist			XGBoost
Soil Property	Covariate Scenarios	R²	RMSE	MAE	R²	RMSE	MAE	R²	RMSE	MAE
SOC	Scenario I	0.276	5.370	4.057	0.276	5.351	4.015	0.276	5.351	4.015
	Scenario II	0.286	5.253	3.981	0.283	5.306	4.044	0.283	5.306	4.044
	Scenario III	0.286	5.295	4.017	0.283	5.310	4.051	0.301	5.205	4.063
	Scenario IV	0.300	5.234	3.936	0.293	5.274	4.026	0.313	5.216	3.952
TN	Scenario I	0.212	0.444	0.339	0.231	0.451	0.345	0.216	0.438	0.342
	Scenario II	0.236	0.435	0.328	0.242	0.450	0.343	0.238	0.422	0.329
	Scenario III	0.218	0.437	0.333	0.212	0.459	0.345	0.219	0.436	0.334
	Scenario IV	0.256	0.428	0.321	0.232	0.455	0.344	0.253	0.430	0.330
TP	Scenario I	0.183	0.138	0.107	0.173	0.141	0.109	0.221	0.133	0.104
	Scenario II	0.203	0.136	0.107	0.174	0.146	0.114	0.263	0.130	0.103
	Scenario III	0.224	0.133	0.104	0.198	0.138	0.108	0.243	0.131	0.103
	Scenario IV	0.248	0.131	0.103	0.228	0.138	0.106	0.276	0.130	0.104
pH	Scenario I	0.427	0.318	0.253	0.376	0.347	0.278	0.400	0.326	0.263
	Scenario II	0.425	0.319	0.253	0.378	0.347	0.278	0.415	0.326	0.261
	Scenario III	0.456	0.310	0.252	0.407	0.344	0.275	0.424	0.321	0.258
	Scenario IV	0.468	0.308	0.250	0.441	0.330	0.263	0.436	0.317	0.253
DOC	Scenario I	0.516	2.041	1.444	0.434	2.309	1.593	0.528	1.940	1.363
	Scenario II	0.528	2.086	1.423	0.416	2.412	1.595	0.531	1.920	1.319
	Scenario III	0.540	1.946	1.349	0.469	2.180	1.517	0.541	1.890	1.326
	Scenario IV	0.550	1.927	1.353	0.466	2.187	1.526	0.570	1.908	1.333
DON	Scenario I	0.366	0.486	0.377	0.353	0.502	0.387	0.354	0.490	0.380
	Scenario II	0.371	0.484	0.377	0.372	0.490	0.374	0.364	0.488	0.378
	Scenario III	0.433	0.469	0.365	0.402	0.474	0.371	0.453	0.446	0.352
	Scenario IV	0.440	0.464	0.365	0.420	0.465	0.368	0.471	0.439	0.341

Notes: RMSE and MAE units for SOC, TN, and TP are in g/kg, and for DOC and DON are in mg/kg.
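To make the scenario comparison in Table 2 easier to read, the relative change in R² between Scenario I and Scenario IV can be computed directly from the tabulated values. This is a minimal sketch for a few property and model pairs; expressing the gain as a percentage change in R² is an assumption about how improvements are summarized, and the helper name `relative_gain` is illustrative.

```python
# R² values copied from Table 2: (Scenario I, Scenario IV).
r2 = {
    ("TP", "RF"): (0.183, 0.248),
    ("TP", "Cubist"): (0.173, 0.228),
    ("DON", "XGBoost"): (0.354, 0.471),
    ("SOC", "XGBoost"): (0.276, 0.313),
}


def relative_gain(baseline: float, improved: float) -> float:
    """Percentage change of `improved` relative to `baseline` (assumed gain measure)."""
    return (improved - baseline) / baseline * 100.0


# Round to one decimal for display.
gains = {key: round(relative_gain(*pair), 1) for key, pair in r2.items()}

for (prop, model), gain in gains.items():
    print(f"{prop:>3} {model:<8} +{gain}%")
```

Among these pairs, the largest relative gain is for TP with RF, at roughly 36%; the absolute change in R² for that case is 0.065.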

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Geng, J.; Tan, Q.; Zhang, Y.; Lv, J.; Yu, Y.; Fang, H.; Guo, Y.; Cheng, S. Leveraging Remote Sensing-Derived Dynamic Crop Growth Information for Improved Soil Property Prediction in Farmlands. Remote Sens. 2024, 16, 2731. https://doi.org/10.3390/rs16152731
