Hydro-Topographic Contribution to In-Field Crop Yield Variation Using High-Resolution Surface and GPR-Derived Subsurface DEMs

Chang, Jisung Geba; Anderson, Martha; Gao, Feng; Russ, Andrew; Zhao, Haoteng; Cirone, Richard; Pachepsky, Yakov; Johnson, David M.

doi:10.3390/rs17173061


by Jisung Geba Chang ¹,*, Martha Anderson ¹, Feng Gao ¹, Andrew Russ ¹, Haoteng Zhao ¹, Richard Cirone ¹, Yakov Pachepsky ² and David M. Johnson ³

¹ Hydrology and Remote Sensing Laboratory, USDA ARS, Beltsville, MD 20705, USA

² Environmental Microbial and Food Safety Laboratory, USDA ARS, Beltsville, MD 20705, USA

³ National Agricultural Statistics Service, USDA, Washington, DC 20250, USA

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(17), 3061; https://doi.org/10.3390/rs17173061

Submission received: 4 July 2025 / Revised: 21 August 2025 / Accepted: 31 August 2025 / Published: 3 September 2025

(This article belongs to the Section Remote Sensing in Agriculture and Vegetation)


Abstract

Understanding spatial variability in crop yields across fields is critical for developing precision agricultural strategies that optimize productivity while reducing negative environmental impacts. This variability often arises from a complex interplay of topographic features, soil characteristics, and hydrological conditions. This study investigates the influence of hydro-topographic factors on corn and soybean yield variability from 2016 to 2023 at the well-managed experimental sites in Beltsville, Maryland. A high-resolution surface digital elevation model (DEM) and subsurface DEM derived from ground-penetrating radar (GPR) were used to quantify topographic factors (elevation, slope, and aspect) and hydrological factors (surface flow accumulation, depth from the surface to the subsurface-restricting layer, and distance from each crop pixel to the nearest subsurface flow pathway). Topographic variables alone explained yield variation, with a relative root mean square error (RRMSE) of 23.7% (r² = 0.38). Adding hydrological variables reduced the error to 15.3% (r² = 0.73), and further combining with remote sensing data improved the explanatory power to an RRMSE of 10.0% (r² = 0.87). Notably, even without subsurface data, incorporating surface-derived flow accumulation reduced the RRMSE to 18.4% (r² = 0.62), which is especially important for large-scale cropland applications where subsurface data are often unavailable. Annual spatial yield variation maps were generated using hydro-topographic variables, enabling the identification of long-term persistent yield regions (LTRs), which served as stable references to reduce spatial anomalies and enhance model robustness. In addition, by combining remote sensing data with interannual meteorological variables, prediction models were evaluated with and without hydro-topographic inputs. 
The inclusion of hydro-topographic variables improved spatial characterization and enhanced prediction accuracy, reducing error by an average of 4.5% across multiple model combinations. These findings highlight the critical role of hydro-topography in explaining spatial yield variation for corn and soybean and support the development of precise, site-specific management strategies to enhance productivity and resource efficiency.

Keywords:

hydro-topography; long-term persistent yield region; precision agriculture; Random Forest; SHapley Additive exPlanations; yield spatial variability

1. Introduction

Understanding the spatiotemporal variability in crop yield plays a critical role in enabling efficient agricultural management and achieving sustainable agriculture with both economic and environmental benefits. Temporal trends in crop yields and their variability have been widely studied and are generally understood to be driven by climatic factors such as precipitation, solar irradiance, and temperature, along with the biophysical responses of crops [1,2,3]. However, spatial variability in yields within fields remains a challenge due to the complex interplay among topographic features, soil characteristics, hydrological conditions, and management practices. Even though above-canopy weather conditions are generally uniform at the field scale and management practices such as fertilization and planting density are applied consistently within the field, considerable spatial variation in crop yield is still observed. This variation is influenced by several factors, including (1) nutrient loss due to runoff, especially under intense rainfall interacting with soil characteristics and slope; (2) variation in energy received, driven by topographic features such as elevation and aspect; and (3) differences in water availability resulting from spatial patterns in soil moisture retention and subsurface water flow pathways [2,4,5,6]. In addition to these main factors, ecological or operational influences may also contribute to variability. For example, annual tillage direction can have a slight effect on water movement along furrows, and field edges may be more exposed to wildlife or environmental stress.

Remote sensing (RS) technologies have significantly improved the monitoring of spatiotemporal changes in agricultural conditions, providing valuable information on vegetation dynamics and crop conditions down to small spatial scales [7]. RS data reflect the current biophysical status of vegetation, such as canopy cover, structure, and density, and often show strong correlations with crop yield. However, this information primarily captures surface-level conditions and has limitations in fully explaining complex yield variability relating to subsurface properties [8]. In particular, optical RS data are primarily sensitive to vegetation cover, so most reflectance-based vegetation indices (such as the Normalized Difference Vegetation Index, NDVI) mainly reflect canopy characteristics [9,10]. Active sensors such as Synthetic Aperture Radar (SAR) can partially penetrate the canopy, but still have limited capacity to detect subsurface features [11], and SAR signals are also influenced by vegetation biomass, soil moisture, and surface roughness [12]. These limitations underscore the importance of incorporating complementary variables into spatial yield analysis.

In this context, environmental factors that influence underlying field conditions become particularly important. Topography and soil properties play a central role in explaining spatial yield variability within fields. Soil layers influence water movement and the distribution of chemicals, and soil texture is a key factor affecting infiltration and nutrient retention, which can increase spatial variability in crop yield [5]. When rainfall exceeds the soil’s infiltration capacity—which is determined by both soil texture and rainfall intensity—water and nutrients are transported downslope via surface runoff [2,13]. In addition, the timing of rainfall relative to fertilizer application can significantly affect nutrient availability and movement. Since soils are inherently heterogeneous, even a single field can be divided into distinct management zones based on varying needs. However, in fields where soil conditions are generally uniform, the influence of soil texture on yield variability is relatively limited, and hydro-topographical factors—such as slope and elevation differences—become more important in redistributing sediments and nutrients through runoff. Topographic variation is another key driver of within-field yield differences. Changes in elevation, slope, and aspect significantly affect yield response by modifying the biophysical conditions that influence plant growth [3,5]. For instance, south-facing slopes in the Northern Hemisphere generally receive more sunlight, which enhances photosynthesis. Topographic variables may have limited explanatory power for temporal yield variability because they change little over time. However, they can make a significant contribution to understanding spatial yield variation within the field. In particular, high-resolution digital elevation models (DEMs) allow for more precise analysis of these topographic effects.

Subsurface conditions also influence yield variability. Groundwater availability is affected by the depth of marginal soil layers, such as clay horizons, which influence moisture retention and availability [1,5,14,15]. These layers can be effectively detected using Ground Penetrating Radar (GPR) [16]. The relationship between crop yield and subsurface layer was evaluated by Gish et al. (2002) at the agricultural optimizing site in Beltsville Agricultural Research Center (BARC), MD [8]. They found that strong dielectric discontinuities detected by GPR at this site—often from clay lenses under sandy layers—indicated zones of differing water-holding capacity. Gish et al. (2005) further showed that corn yields were higher near the GPR-identified subsurface flow pathway network (SFPN) during drought, although this effect was absent where restrictive layers were too deep [5]. This SFPN formed complex three-dimensional hydrological networks combining lateral flow and localized perched water tables [5,17]. Building on this, Morgan et al. (2019) quantified the role of shallow groundwater and SFPN in yield variability [2] and found that yields in these fields tend to increase near SFPN in dry years but not in wet years, suggesting that the influence of subsurface flow is climate dependent.

While many studies have investigated the influence of surface topography and subsurface hydrology on crop yield variability, their ability to explain its spatial patterns remains limited [2]. An integrated set of hydro-topographic variables derived from high-resolution DEMs and GPR might provide valuable insights into the subsurface and terrain-driven processes that influence water and nutrient redistribution, which are key factors in spatial yield variability. This study aims to (1) identify the contribution of key hydro-topographic variables to spatial yield variability in corn and soybean from 2016 to 2023 by integrating high-resolution surface and subsurface DEMs at an experimental field site, and (2) examine the synergistic effects and contribution of hydro-topographic variables when integrated with remote sensing data for spatial crop yield analysis and prediction.

2. Materials and Methods

2.1. Study Area

The Optimizing Production Inputs for Economic and Environmental Enhancement (OPE3) experimental site is located on the East Farm (EF) of the Beltsville Agricultural Research Center (BARC) in Beltsville, Maryland, as shown in Figure 1. In this study, yield variability was investigated in four OPE3 fields, in north-to-south order: EF 5-3A (5.56 hectares), EF 5-3B (3.27 hectares), EF 5-3C (3.46 hectares), and EF 5-3D (3.94 hectares). The OPE3 site is managed as a rainfed system without supplemental irrigation. Based on a 1 m resolution DEM (see Section 2.3) [18], the elevation across the OPE3 fields ranges from 31 to 43 m above sea level, with slopes varying from 0 to 5 degrees.

According to the soil classification by SoilWeb [19], Fields A, B, and D primarily consist of the Russett–Christiana complex (RcB) and Beltsville silt loam (BaB), as shown in the lower right panel of Figure 1. The RcB unit comprises approximately 40% Russett and 35% Christiana soils, with smaller portions of Christiana (10%) and Hambrook soils. The BaB unit consists of approximately 70% Beltsville soils, along with 10% Reybold and 10% Aquasco soils. Field C is primarily characterized by the Christiana–Downer complex (CcC) and BaB soils, with the CcC unit composed of approximately 45% Christiana, 30% Downer, and 10% Beltsville soil, resulting in about 50% silt loam coverage in this Field. Despite these variations in soil classification, all fields share similar soil textures, dominated by silt loam, resulting in comparable soil physical properties, including a saturated soil moisture range of approximately 33–39% [2,6,20].

2.2. Crop Yield and Meteorological Data

The OPE3 fields were planted with corn (Zea mays L.) and soybean (Glycine max (L.) Merr.) from 2016 to 2023. From 2016 to 2021, all four fields were planted with the same crop each year. However, in 2022 and 2023, Fields A and B were planted with one crop, while Fields C and D were planted with a different crop. Nitrogen was applied to the four fields of the OPE3 site every year, with a total average application rate of approximately 150 kg N/ha. Overall, total nitrogen use increased approximately 4.5% to 6.9% per year (from 134 kg N/ha in 2016 to 169 kg N/ha in 2023). Crop yields were recorded using Ag Leader yield monitors equipped with differential GPS to ensure sub-meter accuracy. Raw data were cleaned to remove errors such as zero flow rates or outliers, and high-quality gridded yield maps were produced by interpolating the cleaned point data using ordinary kriging at a 5 × 5 m resolution, providing spatially continuous datasets suitable for precision agriculture research and management applications [3]. The detailed workflow is shown in Appendix A Figure A1, which outlines the data collection, preprocessing, and feature extraction steps used to assess the spatial variability in crop yield.

Figure 2 and Figure 3 show the spatial variation and average crop yield at the OPE3 site from 2016 to 2023. Note that yield data are missing for all fields in 2021 and for Fields C and D in 2023. To ensure full site coverage, soybean data from 2022 and 2023 were merged across Fields A, B, C, and D. Field A consistently exhibited higher average yields than the other fields across all years, and the average crop yields of the other fields were generally similar, except for Field D in 2016, as shown in Figure 3. It is important to emphasize that, despite the relatively minor temporal variation in average yield shown in Figure 3, distinct spatial heterogeneity, evident in Figure 2, was consistently observed across years under uniform management conditions, underscoring the significance of spatial variability as a key factor in crop production analysis.

Alongside crop yield data, meteorological data were collected for the study period. Precipitation was included in the analysis as a key variable, as water availability during critical growth stages—such as root development, nutrient uptake, and grain filling—directly affects yield, particularly for soybeans. Precipitation data were collected from a BARC weather station located approximately 1.5 km from the study site (as shown in Table 1), along with PRISM (Parameter-elevation Regressions on Independent Slopes Model) precipitation data (gridded to 800 m) for use as input in the machine learning model [21]. These two datasets showed a high level of agreement, with a correlation of 88.5%. Solar radiation directly drives photosynthesis and thus plays a key role in biomass and yield. Higher radiation during growth stages boosts productivity if moisture is adequate. Downward Shortwave Radiation (SW), a measure of incoming solar energy, was obtained as daily total SW from the MCD18A1 product (Version 6.2) at 500 m resolution. In addition, other meteorological variables, such as air temperature (TA) and vapor pressure deficit (VPD), were collected from the CONUS-wide PRISM dataset [21], which is gridded at an 800 m resolution. Furthermore, daily land surface temperature (LST) data from MODIS (MOD11A1) were also acquired at 1 km resolution. LST influences crop growth, as high temperatures can potentially reduce yield. LST also reflects crop health, with higher transpiration rates leading to lower LST via evaporative cooling of the vegetated component of the scene. Although LST is generally considered a remote sensing-based diagnostic of surface conditions and can be downscaled to the crop yield pixel level using high-resolution thermal sensors and spatial sharpening techniques, in this section, we analyzed LST in conjunction with other meteorological variables as a regional indicator to explore its relationship with crop yield. 
All daily meteorological data (SW, TA, and VPD) and surface temperature (LST) from April to October between 2016 and 2023 were processed into monthly means on the Google Earth Engine (GEE) platform and subsequently extracted over the experiment site. Table 1 presents the crop planting dates for each year, along with the corresponding meteorological variables derived from these processed values. Precipitation and SW represent the accumulation of monthly averages from June to October, while LST, TA, and VPD were calculated as the average of monthly means for July and August.
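The seasonal aggregation described above can be sketched as follows. This is a minimal illustration that assumes monthly means are already available in a plain dictionary keyed by month number; the function name `seasonal_summary`, the variable keys, and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean

def seasonal_summary(monthly):
    """Summarize one year of monthly means.

    monthly: {variable: {month_number: monthly mean}} (hypothetical layout).
    Precipitation and SW are accumulated over June-October;
    LST, TA, and VPD are averaged over July-August.
    """
    accumulate_months = range(6, 11)  # June to October
    average_months = (7, 8)           # July and August
    summary = {}
    for var in ("precip", "sw"):
        summary[var] = sum(monthly[var][m] for m in accumulate_months)
    for var in ("lst", "ta", "vpd"):
        summary[var] = mean(monthly[var][m] for m in average_months)
    return summary

# Made-up monthly values for April-October, for illustration only.
months = range(4, 11)
example = {
    "precip": {m: 100.0 for m in months},
    "sw": {m: 20.0 for m in months},
    "lst": {m: 20.0 + m for m in months},
    "ta": {m: 15.0 + m for m in months},
    "vpd": {m: 1.0 for m in months},
}
summary = seasonal_summary(example)
print(summary)
```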

2.3. Hydro-Topographic Variables and Remote Sensing Data

High-resolution topographic variables can improve the representation of local environmental heterogeneity, thereby enhancing the explanation of yield variability. In this study, a 1 m DEM provided by the U.S. Geological Survey (USGS) (https://www.usgs.gov/3d-elevation-program/about-3dep-products-services, accessed on 16 December 2024), derived from high-resolution Light Detection and Ranging (LiDAR) data, was downloaded for topographic analysis. Multiple DEM tiles were mosaicked to cover the entire BARC field, as shown in Figure 1. Aspect and slope were calculated from the DEM data, representing the direction and steepness of the terrain, respectively. Aspect, expressed as a directional orientation from 0° to 360°, was normalized to a range between 0 (north-facing) and 1 (south-facing) using the equation ‘Aspect_norm = 1 − |(aspect − 180)/180|’. Slope was normalized to a range between 0 and 1 using the transformation ‘Slope_norm = 1 − cos(θ)’, where θ is the slope angle in radians. These normalized variables were used as inputs for the machine learning models for analyzing spatial variability.
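The two normalizations can be written directly from the equations given above. The following is a minimal sketch; the function names are illustrative and not from the study.

```python
import math

def normalize_aspect(aspect_deg):
    """Map aspect in degrees (0-360) to 0 (north-facing) .. 1 (south-facing)."""
    return 1.0 - abs((aspect_deg - 180.0) / 180.0)

def normalize_slope(slope_rad):
    """Map a slope angle in radians to 1 - cos(theta)."""
    return 1.0 - math.cos(slope_rad)

for aspect in (0, 90, 180, 270, 360):
    print(aspect, normalize_aspect(aspect))

# Slopes at the site range from 0 to 5 degrees.
print(normalize_slope(math.radians(5)))
```

East- and west-facing cells (90° and 270°) both map to 0.5, so the transformation encodes only the north-south component of orientation. For the 0 to 5 degree slopes reported at this site, the slope transformation yields values between 0 and roughly 0.004.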

Hydrological variables were derived to capture both surface and subsurface water movement patterns. Surface-level flow accumulation was calculated in ArcGIS Pro 3.3 using the 1 m DEM. Flow accumulation indicates areas of potential water concentration and surface runoff. To complete the calculation, the DEM was first preprocessed using the Fill tool to remove sinks and ensure continuous flow paths. Subsequently, the Flow Direction tool was applied to determine water movement directions across the terrain. Finally, the Flow Accumulation tool was used to quantify the number of upstream cells contributing flow to each pixel, thereby identifying zones of concentrated runoff.
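The flow direction and flow accumulation steps can be illustrated with a generic D8-style sketch, in which each cell drains to its steepest downslope neighbor. This is not the ArcGIS Pro implementation and does not reproduce the tool settings used in the study; it assumes that sinks have already been filled, and the function name and the toy 3 × 3 DEM are hypothetical.

```python
import math

def d8_flow_accumulation(dem):
    """Count the upstream cells draining into each cell (D8, steepest descent).

    dem: list of rows of elevations, assumed sink-free.
    """
    rows, cols = len(dem), len(dem[0])
    neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]

    # Flow direction: each cell drains to its steepest downslope neighbor.
    receiver = {}
    for r in range(rows):
        for c in range(cols):
            best, best_drop = None, 0.0
            for dr, dc in neighbors:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Drop per unit distance (diagonals are sqrt(2) away).
                    drop = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if drop > best_drop:
                        best, best_drop = (nr, nc), drop
            receiver[(r, c)] = best  # None for outlets and flat cells

    # Flow accumulation: visit cells from highest to lowest so that every
    # contributor is processed before the cell it drains into.
    acc = [[0] * cols for _ in range(rows)]
    order = sorted(receiver, key=lambda rc: dem[rc[0]][rc[1]], reverse=True)
    for r, c in order:
        target = receiver[(r, c)]
        if target is not None:
            acc[target[0]][target[1]] += acc[r][c] + 1
    return acc

# Toy DEM with a valley running down the middle column.
toy_dem = [
    [9, 8, 9],
    [8, 5, 8],
    [7, 3, 7],
]
toy_acc = d8_flow_accumulation(toy_dem)
for row in toy_acc:
    print(row)
```

In the toy example all eight other cells ultimately drain to the lowest cell, which therefore receives an accumulation of 8.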

In addition, subsurface flow pathways were identified using subsurface DEM data generated by a GPR system equipped with a 150 MHz antenna. The raw data were collected within 25 × 25 m plots along north-to-south transects, with additional data gathered by towing the antenna at 2 m intervals. The data were processed through distance normalization and autocorrelation, then resampled to an 8 × 8 m grid [8]. The calculation of subsurface flow pathways involved an additional step beyond standard flow accumulation—applying a threshold value to define significant pathways. In our study, a threshold value of 8 was applied, as higher values did not yield substantially different results, as noted by Morgan et al. [2].

Furthermore, using both surface and subsurface DEMs, we calculated two additional variables: the depth from the surface to the infiltration-restricting subsurface layer [5] (ranging from 0.7 to 2.7 m across the four fields), and the distance from each crop pixel to the nearest subsurface flow pathway. Figure 4 is a schematic representation of the hydro-topographic variables, including DEMs with contour lines. To enhance visualization, surface flow accumulation values were transformed using a 0.25 power function, which reduced the influence of extreme values caused by pixel convergence in high-resolution data. Throughout the manuscript, the term “hydro” is used as shorthand for this combined set of hydrological variables (flow accumulation, distance, and depth) unless otherwise specified. These hydrological variables, combined with topographic variables (“topo”) such as surface elevation, slope, and aspect, were integrated to comprehensively assess the spatial variability of crop yield.
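The depth and distance variables can be sketched as below. The sketch assumes that the surface and subsurface DEMs have already been resampled to a common grid, that pathway cells are those whose subsurface flow accumulation meets or exceeds the threshold (whether the threshold is inclusive is an assumption), and that distance is Euclidean. All function names and sample values are hypothetical; for large rasters a distance-transform routine from an array library would be more practical than this brute-force search.

```python
import math

def depth_to_restricting_layer(surface, subsurface):
    """Depth (m) from the surface to the restricting layer, cell by cell."""
    return [[s - b for s, b in zip(s_row, b_row)]
            for s_row, b_row in zip(surface, subsurface)]

def pathway_mask(subsurface_accumulation, threshold=8):
    """Mark cells whose subsurface flow accumulation reaches the threshold."""
    return [[a >= threshold for a in row] for row in subsurface_accumulation]

def distance_to_pathway(mask, cell_size):
    """Euclidean distance (m) from each cell to the nearest pathway cell."""
    pathway_cells = [(r, c) for r, row in enumerate(mask)
                     for c, is_path in enumerate(row) if is_path]
    if not pathway_cells:
        raise ValueError("no pathway cells found for this threshold")
    return [[min(math.hypot(r - pr, c - pc) for pr, pc in pathway_cells) * cell_size
             for c in range(len(mask[0]))]
            for r in range(len(mask))]

# Made-up 2 x 2 example on an 8 m grid.
surface = [[40.0, 40.5], [41.0, 41.5]]
subsurface = [[39.0, 39.0], [39.5, 39.0]]
accumulation = [[0, 9], [2, 12]]

depth = depth_to_restricting_layer(surface, subsurface)
mask = pathway_mask(accumulation, threshold=8)
distance = distance_to_pathway(mask, cell_size=8.0)
print(depth)
print(distance)
```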

Along with hydro-topographic variables and meteorological data, remote sensing (RS) data were also acquired to analyze the degree of improvement in prediction model performance when integrating hydro-topographic and RS data. Since this study primarily focuses on yield spatial variability, we aimed to evaluate the contribution of hydro-topographic variables using only the most commonly used vegetation index in RS data—the Normalized Difference Vegetation Index (NDVI). NDVI was derived from Sentinel-2 (10 m) and Landsat (30 m) imagery available on the GEE platform. Sentinel-2 data (Level-2A orthorectified surface reflectance, COPERNICUS/S2_SR_HARMONIZED) with less than 20% cloud cover were primarily used due to their higher spatial resolution; however, Landsat NDVI was used to supplement the time periods or years when Sentinel-2 data were unavailable [22].
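For reference, NDVI is computed from near-infrared (NIR) and red reflectance as (NIR − Red)/(NIR + Red). The sketch below shows that calculation together with the Sentinel-2-first, Landsat-as-fallback selection described above. It operates on single reflectance values rather than GEE image objects, and the function names and sample values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / total

def select_ndvi(sentinel2_ndvi, landsat_ndvi):
    """Prefer Sentinel-2 NDVI; fall back to Landsat when it is unavailable."""
    return sentinel2_ndvi if sentinel2_ndvi is not None else landsat_ndvi

dense_canopy = ndvi(nir=0.5, red=0.1)
fallback = select_ndvi(None, 0.62)
print(dense_canopy, fallback)
```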

2.4. Explanatory Performance and Feature Importance Score

To assess the contribution of hydro-topographic variables to spatial yield variability, the Random Forest (RF) algorithm was implemented on GEE. RF is a non-parametric ensemble method capable of capturing nonlinear relationships and evaluating variable importance [23].

For variable importance, SHAP (SHapley Additive exPlanations) was used to provide a consistent and theoretically grounded interpretation of feature contributions [24]. The general SHAP formulation is given as follows:

ϕ_{i} = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! (|F| - |S| - 1)!}{|F|!} [f(S \cup \{i\}) - f(S)]    (1)

where ϕ_{i} is the SHAP value for feature i, F is the set of all features, S is a subset of features excluding i, and f(S) is the model output evaluated using only the features in S.

This equation measures the marginal contribution of each feature across all possible feature subsets, ensuring fair attribution based on cooperative game theory principles. The mean absolute SHAP value (\bar{|ϕ_{i}|}) was calculated to represent each variable’s general contribution (average impact on model output magnitude). In addition, a SHAP summary plot was generated to visualize the distribution and magnitude of each feature’s impact, and SHAP dependence plots were used to describe how changes in each variable influence the model output.
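Equation (1) can be evaluated exactly for a small feature set by enumerating every subset. The sketch below does this for a toy value function that stands in for the model output f(S). The toy function, its numbers, and the function names are hypothetical, and practical SHAP implementations for tree models use faster algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values for a value function defined on feature subsets."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|F| - |S| - 1)! / |F|! from Equation (1).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

def toy_value(subset):
    """Hypothetical stand-in for f(S): additive effects plus one interaction."""
    v = 0.0
    if "dem" in subset:
        v += 2.0
    if "depth" in subset:
        v += 1.0
    if "dem" in subset and "depth" in subset:
        v += 1.0  # interaction, shared equally between the two features
    return v       # "aspect" never changes the output

features = ["dem", "depth", "aspect"]
phi = shapley_values(features, toy_value)
print(phi)
```

The values sum to the difference between the output with all features and the output with none, and the interaction term is split evenly between the two features involved.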

Explanatory performance was evaluated using the root mean squared error (RMSE) and relative RMSE (RRMSE), which provide insights into how effectively different combinations of variables explain spatial yield variability. Additionally, the coefficient of determination (r²) was calculated to compare the fit of models across different feature combinations [25]. Although the Random Forest model is nonlinear and r² does not directly represent the percentage of explained variance, it remains a useful indicator for comparing models with the same structure using different sets of input variables. The formulas for RMSE, RRMSE, and r² are as follows:

RMSE = \sqrt{\frac{1}{N} \sum_{k = 1}^{N} (y_{k} - \hat{y}_{k})^{2}}    (2)

RRMSE = \frac{RMSE}{\bar{y}} \times 100    (3)

r^{2} = 1 - \frac{\sum_{k = 1}^{N} (y_{k} - \hat{y}_{k})^{2}}{\sum_{k = 1}^{N} (y_{k} - \bar{y})^{2}}    (4)

where N is the total number of observations, y_{k} is the observed value for the kth observation, \hat{y}_{k} is the predicted value for the kth observation, and \bar{y} is the mean of the observed values.
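The three metrics can be computed as in the following minimal sketch, which follows Equations (2) to (4). The function names and the sample observed and predicted values are made up for illustration.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error, Equation (2)."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)

def rrmse(observed, predicted):
    """Relative RMSE in percent of the mean observed value, Equation (3)."""
    mean_observed = sum(observed) / len(observed)
    return rmse(observed, predicted) / mean_observed * 100

def r_squared(observed, predicted):
    """Coefficient of determination, Equation (4)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    return 1 - ss_res / ss_tot

observed = [10.0, 12.0, 14.0, 16.0]   # e.g., yields in t/ha (made up)
predicted = [11.0, 12.0, 13.0, 17.0]
print(rmse(observed, predicted))
print(rrmse(observed, predicted))
print(r_squared(observed, predicted))
```

Because RRMSE divides by the mean observed yield, it allows crops with different yield scales to be compared on a common basis.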

Various combinations of hydro-topographic variables were tested using RF to evaluate their explanatory power for corn and soybean yield variation from 2016 to 2023, with the model design kept consistent and repeated for each year.

2.5. Spatially Normalized Yield Maps and Long-Term Persistent Yield Regions (LTRs)

Through the above process, key hydro-topographic variables that contribute most significantly to yield spatial variability were identified. Crop yield (CY) estimates were generated for each crop year using the per-year RF models based on these selected hydro-topographic variables. To focus on spatial variation while accounting for differences in average yield and crop type across years, all estimated yield values were first normalized as ‘CY_norm = (CY − CY_min)/(CY_max − CY_min) × 10 − 5’, resulting in a range of −5 to 5. Prior to normalization, the top and bottom 2% of values were truncated based on percentile thresholds (i.e., ±2%) to reduce the influence of outliers and noise.
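The truncation and normalization can be sketched as follows. The sketch interprets "truncated" as clipping values to the 2nd and 98th percentiles, which is an assumption (removing those values instead would be another reading), and it uses the inclusive percentile method from the Python standard library. The function name and the sample data are hypothetical.

```python
from statistics import quantiles

def normalize_yield(values, trim_percent=2):
    """Clip to the [trim, 100 - trim] percentiles, then rescale to -5..5."""
    cuts = quantiles(values, n=100, method="inclusive")  # 1st..99th percentiles
    low = cuts[trim_percent - 1]
    high = cuts[100 - trim_percent - 1]
    if high == low:
        raise ValueError("yield values have no spread after truncation")
    clipped = [min(max(v, low), high) for v in values]
    # After clipping, the minimum and maximum equal the percentile bounds.
    return [(v - low) / (high - low) * 10 - 5 for v in clipped]

sample_yields = list(range(101))  # made-up values 0..100
normalized = normalize_yield(sample_yields)
print(min(normalized), max(normalized), normalized[50])
```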

Because hydro-topographic variables are temporally static, combining them with temporally dynamic variables, such as interannual meteorological and remote sensing data, can introduce noise if year-specific spatial anomalies are not properly accounted for. To isolate areas that might be better explained by hydro-topographic variables, this study identifies regions with consistent yield patterns across years, referred to as long-term persistent yield regions (LTRs), using the standard deviation of normalized yield estimates over multiple years. Areas with low standard deviation were interpreted as stable regions that maintained relatively consistent spatial yield characteristics over time [26,27]. This approach helps mitigate the influence of temporal variability inherent in raw crop yield data, which reflects both temporal and spatial fluctuations. As a result, LTRs enable clearer interpretation of spatial yield variability. LTRs are also particularly important to consider when applying hydro-topographic variables in machine learning models.
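Identifying LTRs from the per-pixel standard deviation can be sketched as below. The threshold `max_std` is a hypothetical parameter, since no cutoff is stated in this section, and the use of the population standard deviation is likewise an assumption. The sample values are made up.

```python
from statistics import pstdev

def persistent_yield_mask(yearly_maps, max_std):
    """Flag pixels whose normalized yield is stable across years.

    yearly_maps: one list of normalized yields per year, same pixel order.
    max_std: hypothetical cutoff below which a pixel counts as persistent.
    """
    n_pixels = len(yearly_maps[0])
    stds = [pstdev([year[p] for year in yearly_maps]) for p in range(n_pixels)]
    mask = [s <= max_std for s in stds]
    return mask, stds

# Three years of normalized yield (-5..5) for three pixels.
yearly_maps = [
    [1.0, -2.0, 4.0],
    [1.0, 2.0, 4.0],
    [1.0, 0.0, -2.0],
]
ltr_mask, ltr_stds = persistent_yield_mask(yearly_maps, max_std=2.0)
print(ltr_mask)
print(ltr_stds)
```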

2.6. Machine Learning-Based Crop Yield Prediction with Hydro-Topo Integration

A better understanding of spatial variation in crop yield can also enhance the accuracy of temporal yield predictions. In this study, we developed a crop yield prediction model using machine learning algorithms within the domain of artificial intelligence (AI). As a preliminary step, several models, including Multiple Linear Regression (MLR), XGBoost (XGB), Random Forest (RF), and Deep Neural Network (DNN), were evaluated, and the most suitable and reliable model was selected for this study. Using the selected ML model, two model configurations were compared across multiple combinations of RS and supplementary data: one model incorporating hydro-topographic variables and the other excluding them.

This analysis, informed by the findings of the previous sections, aims to assess the contribution and potential of hydro-topographic features in improving model performance. The RF model was trained using multi-year historical data and tested on the most recent year. To ensure model robustness and reduce noise from anomalous yield patterns, long-term persistent yield points were used as training samples. By focusing on these spatially stable regions, the model’s generalizability and reliability were significantly improved.
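The training and testing split described above can be sketched as follows. The sketch assumes that training samples come from earlier years and are restricted to LTR pixels, while the most recent year is held out in full; whether the test set was also restricted to LTR pixels is not stated here, so that choice is an assumption. The record layout, function name, and values are hypothetical.

```python
def split_train_test(samples, stable_pixels):
    """Train on earlier years at LTR pixels; test on the most recent year."""
    latest_year = max(s["year"] for s in samples)
    train = [s for s in samples
             if s["year"] < latest_year and s["pixel"] in stable_pixels]
    test = [s for s in samples if s["year"] == latest_year]
    return train, test

# Made-up records: pixel 0 is treated as a persistent-yield pixel.
samples = [
    {"year": 2019, "pixel": 0, "yield": 9.8},
    {"year": 2019, "pixel": 1, "yield": 7.1},
    {"year": 2020, "pixel": 0, "yield": 10.2},
    {"year": 2020, "pixel": 1, "yield": 11.5},
    {"year": 2022, "pixel": 0, "yield": 9.9},
    {"year": 2022, "pixel": 1, "yield": 8.0},
]
train, test = split_train_test(samples, stable_pixels={0})
print(len(train), len(test))
```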

3. Results

3.1. Interannual Relationship Between Crop Yields and Meteorological Variables and LST

As a preliminary analysis before examining the spatial variability in crop yields, the temporal relationships between monthly meteorological variables and crop yield were evaluated using five years of corn data and three years of soybean data. Figure 5 presents the monthly Pearson correlation coefficients (r) between meteorological variables and crop yield from 2016 to 2023. Since the number of years of data for each crop is limited, the soybean yield data were scaled using the average yield ratio between corn and soybean to extend the analysis across eight years. The results show that precipitation remains an important driver throughout the growing season, with the strongest correlations observed in July (r = 0.51) and June (r = 0.45), reflecting the significance of water availability during early reproductive stages. Other variables, particularly LST and TA, demonstrated more consistent associations with yield during the summer months. In August, LST and TA showed strong correlations with yield, at r = 0.65 and r = 0.58, respectively. Overall, SW and VPD showed lower correlations with crop yield but had some association in August (SW: r = 0.32, VPD: r = 0.15).
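The scaling and correlation steps can be sketched as follows. The sketch assumes that soybean yields are multiplied by the ratio of mean corn yield to mean soybean yield so that both crops share a common scale; the exact scaling used in the study may differ. The function names and the sample yields are made up.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def scale_soybean(soybean_yields, corn_yields):
    """Rescale soybean yields by the corn-to-soybean mean yield ratio."""
    ratio = mean(corn_yields) / mean(soybean_yields)
    return [y * ratio for y in soybean_yields]

corn = [10.0, 12.0, 14.0]   # made-up yields, t/ha
soybean = [3.0, 4.0, 5.0]   # made-up yields, t/ha
scaled_soybean = scale_soybean(soybean, corn)
print(scaled_soybean)
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```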

These findings suggest that summer-season meteorological data, particularly from July and August, have the greatest influence on interannual variability in crop yield in this region, and may therefore be useful for yield prediction models. However, these correlations primarily reflect interannual variability, highlighting the need for a more in-depth investigation to understand the fundamental causes of spatial variability in yield distribution.

3.2. Contribution of Hydro-Topographic Variables to Explaining the Spatial Variation in Crop Yield

We analyzed the contribution of hydro-topographic variables using the RF model because it demonstrated the best performance among the multiple ML models, including MLR, XGB, and DNN, as shown in Appendix A Figure A2. Based on the seven-year averages, RF achieved the highest mean R² (0.82), whereas MLR showed very poor performance (mean R² = 0.07). XGB also performed strongly (mean R² = 0.71), while DNN exhibited only moderate accuracy (mean R² = 0.28). In addition, a correlation analysis was conducted to identify potential collinearity among variables, as shown in Appendix A Figure A3. The results indicate that there is no strong direct relationship between most variables (correlation coefficients mostly < 0.3) and no linear relationships (MLR R² < 0.1), suggesting that linear-based analyses are not suitable for this purpose, even though they may offer stronger interpretability. The RF model was implemented to estimate crop yield using hydro-topographic variables based on 3000 samples each year, demonstrating their explanatory power and contribution to spatial yield variability. As noted above, the model was trained separately for each year to focus on spatial (rather than temporal) variability, using the same network/model structure. Table 2 summarizes the performance of different hydro-topographic variable combinations from 2016 to 2023, including average RMSE for corn (based on four years) and soybean (based on three years), along with relative RMSE (RRMSE) and r² averaged across all years.

The model using only topographic variables (DEM, slope, and aspect) exhibited limited explanatory power, with an average RRMSE of 23.7% (r² = 0.38). However, the inclusion of hydrological variables significantly improved performance, reducing RRMSE by approximately 5–8 percentage points depending on the combination. Among the hydrological variables, the subsurface depth variable had the greatest influence, contributing more to explaining spatial yield variability than other hydrological variables. In the full model incorporating subsurface information such as depth and distance, the RRMSE was further reduced to 15.3% (r² = 0.73). Notably, even without the subsurface DEM, combining flow accumulation derived from the surface DEM with the topographic variables reduced RRMSE by 5.3 percentage points, to 18.4% (r² = 0.62). This demonstrates that in large-scale cropland areas where subsurface DEMs are not available, meaningful improvements in explanatory power can still be achieved using high-resolution surface DEM data.

Due to differences in average crop yield between corn and soybean, their RMSE values also showed distinct patterns. For corn, the RMSE decreased from approximately 2.7 t/ha when using only topographic variables to around 1.5 t/ha when all hydro-topographic variables were included. In contrast, soybean, which exhibits lower yield variability, showed a smaller reduction in RMSE, from about 0.4 t/ha to 0.3 t/ha. Despite this difference in scale, both crops demonstrated comparable improvements in the explanation of spatial yield variability when hydro-topographic variables were incorporated. Given the yield scale differences between the two crops, relative RMSE (RRMSE) and r² were used as the primary metrics to evaluate the contribution of hydro-topographic variables across different years and crop types.

The upper panel (A) of Figure 6 illustrates the importance scores of each variable in the RF model when all hydro-topographic variables were included. DEM consistently exhibited the highest contribution across all years, followed by depth, aspect, and distance. Flow accumulation showed relatively low importance scores, likely due to its high correlation with depth and distance, which reduced its relative contribution within the RF model. The lower panel (B) of Figure 6 provides a more detailed view of each feature’s contribution in relation to its values by SHAP value. In general, the contributions of individual features were highly mixed—certain value ranges increased yield, while other ranges decreased it—highlighting that simple models may not achieve high accuracy. However, DEM and flow accumulation showed patterns that could be interpreted at least approximately: lower DEM values tended to increase yield, whereas higher DEM values were associated with lower yield; higher flow accumulation generally increased yield and vice versa. Nonetheless, the contribution of flow accumulation remained relatively low compared to other features.
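
The global importance in panel (A) is described in the Figure 6 caption as the mean absolute SHAP value per feature. The sketch below shows that aggregation for a small table of per-sample SHAP values. The values and the function name are hypothetical, and the sketch assumes the SHAP values have already been produced by an explainer.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Return {feature: mean |SHAP|}, ordered from most to least important."""
    n = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    scores = {name: totals[j] / n for j, name in enumerate(feature_names)}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

features = ["DEM", "depth", "flow_acc"]
# Hypothetical per-sample SHAP values (one row per pixel); not from the study.
shap_rows = [
    [0.8, -0.4, 0.1],
    [-0.6, 0.5, -0.1],
    [0.7, -0.3, 0.1],
]
ranking = mean_abs_shap(shap_rows, features)
print(ranking)
```

Taking absolute values before averaging matters here: a feature whose contributions are mixed in sign would otherwise average towards zero and appear unimportant.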

It is important to note that these analyses do not represent direct cause–effect relationships between features and yield. Rather, the scores reflect each variable’s overall influence within the model, not its isolated or independent effect on spatial yield variation.

Furthermore, the SHAP dependence for each feature was analyzed, as shown in Figure 7, which illustrates how a feature affects the target variable (crop yield) across the entire dataset. DEM exhibited a highly complex relationship with crop yield, showing both negative and positive contributions depending on specific value ranges. This suggests that DEM and depth influence crop yield patterns through complex interactions among local environmental factors, rather than a simple monotonic effect of elevation. Slope and aspect generally showed negative associations with crop yield at higher values, although their trends were not sharply defined. Distance and flow accumulation displayed patterns that were difficult to interpret. Overall, the dependence plots indicate that spatial variation in crop yield is shaped not by the isolated influence of any single variable, but by the complex interactions among multiple hydro-topographic variables, as summarized in Table 2.

3.3. Estimated Crop Yield Map Using Hydro-Topographic Variables and LTRs

Based on the results discussed in Section 3.2, crop yield maps were generated for each year and then classified into generalized regions of low, medium, and high yield. Figure 8 shows normalized yield maps estimated using four hydro-topographic variables (DEM, slope, aspect, and flow accumulation) without subsurface variables (depth and distance) on the left side, and another on the right side using the full set, including subsurface variables. Note that no meteorological data were used in these models; only static (fixed-in-time) spatial variables were used. Incorporating subsurface hydrological variables resulted in more detailed spatial patterns and improved estimation of spatial variation compared to observed yields. Nonetheless, surface-based hydro-topographic variables alone performed well in capturing spatial yield variability, demonstrating their practicality and effectiveness for large-scale crop yield mapping.

Figure 9 shows the LTRs, derived from the standard deviation of the spatially normalized crop yield maps shown on the right side of Figure 8. The left panel displays LTRs for corn (based on four years of data), the right panel for soybean (based on three years, although only two years provide full spatial coverage), and the center panel shows combined results for both crops. Standard deviation values ranged from 0 to 3, with soybean exhibiting lower values overall due to the limited temporal coverage, resulting in broader spatially stable regions. Across all maps, field boundaries consistently showed lower temporal stability and higher spatial variability. This pattern corresponds with the original crop yield maps in Figure 2, where greater variation was observed near the field edges. These edge effects are likely driven by external environmental influences or animal activity and were consistently observed across years, regardless of crop type. It is important to highlight that stability maps derived from raw crop yield data are often highly sensitive to year-specific spatial anomalies. In contrast, the hydro-topo-based estimated stability maps (Figure 9) are less affected by such temporal irregularities, as these external drivers are not included in the modeling process.
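
The LTR derivation described above can be sketched as follows. Each yearly map is spatially normalized, the per-pixel standard deviation is taken across years, and the result is binned into the six classes listed in the Figure 9 caption. The z-score normalization, the 0.5 cut-off used to flag stable pixels, and the toy maps are assumptions made for illustration only.

```python
import statistics

CLASS_LABELS = ["Very Low", "Low", "Moderate", "High", "Very High", "Extremely High"]

def normalize_map(yield_map):
    """Spatially normalize one year's yields to z-scores (assumed normalization).

    Raises ZeroDivisionError if the map has no spatial variation.
    """
    mean = statistics.fmean(yield_map)
    std = statistics.pstdev(yield_map)
    return [(v - mean) / std for v in yield_map]

def temporal_std(normalized_maps):
    """Per-pixel standard deviation across years."""
    return [statistics.pstdev(pixel_values) for pixel_values in zip(*normalized_maps)]

def stability_class(std_value):
    """Map a standard deviation onto the six 0.5-wide classes of Figure 9."""
    index = min(int(std_value // 0.5), len(CLASS_LABELS) - 1)
    return CLASS_LABELS[index]

# Hypothetical estimated yields for four pixels in three years (flattened maps).
yearly_maps = [
    [8.0, 10.0, 12.0, 10.0],
    [4.0, 5.0, 6.0, 5.0],
    [10.0, 10.0, 12.0, 8.0],
]
stds = temporal_std([normalize_map(m) for m in yearly_maps])
classes = [stability_class(s) for s in stds]
ltr_mask = [s < 0.5 for s in stds]  # hypothetical cut-off for "stable" pixels
print(classes, ltr_mask)
```

In this toy case the first two years share the same spatial pattern at different yield levels, so normalization removes the level difference and only the third year's departure raises the standard deviation at the first and last pixels.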

These stable regions are particularly useful for analyzing spatio-temporal variation, as they help minimize year-specific anomalies that can introduce complexity and noise. Moreover, LTRs contribute to improving the accuracy of prediction models that incorporate interannual climate variability by filtering out such anomalies. This is especially important for advanced AI-based models such as ensemble-based machine learning and deep learning approaches, which are susceptible to overfitting due to such noise. In this context, LTRs serve as a valuable foundation for enhancing model robustness and generalizability.

3.4. Improved Capture of Spatial Variability in Crop Yield by Synergistic Use of Hydro-Topo Variables and Remote Sensing Data

To further evaluate the spatial explanatory power of hydro-topo variables and their synergy with RS data, we conducted a comparative analysis using different input combinations. Figure 10 illustrates how different combinations of hydro-topographic and RS data enhance the explanatory power for spatial yield variation, with crop yield estimates from 2020 used as a representative case. The upper panel in Figure 10 shows a bar plot of the coefficient of determination (R²) and the mean RRMSE, with its standard deviation indicated within each bar, while the lower panel presents the estimated crop yield maps based on the corresponding variable combinations. The left three bars represent the results from Table 2, whereas the two bars on the right show the explanatory power when using only remote sensing data (maximum NDVI from July to August) and when integrating it with hydro-topographic variables. Without RS input, the model using only topographic variables yielded an average RRMSE of 23.7% (standard deviation: 7.46%) and an average R² of 0.38 over seven years. Adding flow accumulation derived from the surface DEM reduced the average RRMSE to 18.4% and increased R² to 0.62. Incorporating additional hydro-geomorphic variables, including subsurface topography, further improved model performance (RRMSE = 15.3%, R² = 0.73). RS data (NDVI) alone performed reasonably well in explaining spatial yield variability (RRMSE = 12.3%, R² = 0.81). When combined with hydro-topographic variables, the model’s explanatory accuracy improved even further (RRMSE = 10.0%, R² = 0.88). This indicates that high-resolution hydro-topographic variables enhance the spatial detail and explanatory power of remote sensing data.

When the spatial resolution of the DEM was reduced, explanatory performance declined: the average RRMSE increased from 16.7% (R² = 0.83) at 5 m resolution to 20.1% (R² = 0.76) at 10 m and 21.8% (R² = 0.69) at 30 m. These results suggest that to maximize the benefits of integrating hydro-topographic variables with RS data, their spatial resolution should be finer than that of the crop yield data.
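
One simple way to emulate a coarser DEM is to average non-overlapping blocks of cells, for example 2 × 2 blocks to go from 5 m to 10 m. The text does not state which resampling method was used, so the block-mean approach, the function name, and the toy grid below are assumptions.

```python
def coarsen(grid, factor):
    """Aggregate a 2-D grid by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

# Hypothetical 4 x 4 elevation grid at 5 m spacing (values in metres).
dem_5m = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 6, 6],
    [4, 4, 8, 8],
]
dem_10m = coarsen(dem_5m, 2)  # 2 x 2 grid at 10 m spacing
print(dem_10m)
```

Averaging smooths out within-block relief, which is one plausible reason why slope, aspect, and flow accumulation derived from coarser grids explain less of the within-field yield pattern.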

3.5. Improvement of Yield Prediction Accuracy by Synergistic Use of Hydro-Topographic Variables and Remote Sensing Data

The results in Section 3.3 demonstrate that hydro-topographic variables provide strong explanatory power for spatial variability in crop yield within individual years (see Table 2). However, because these variables are static in nature, they are insufficient to reliably predict interannual yield variability on their own. Section 3.4 and Figure 10 show that integrating hydro-topographic variables with RS data improves their ability to explain spatial yield variability. Furthermore, since RS data can reflect year-to-year changes, it is essential to integrate hydro-topographic, remote sensing, and meteorological variables—particularly those linked to temporal yield variation—to enhance the accuracy of yield predictions across years. To evaluate the extended contribution of hydro-topographic variables in this context, an interannual prediction model was developed using a combination of NDVI, hydro-topographic variables, and interannual meteorological data. Based on the results in Section 3.1, summer (July and August) meteorological data were considered, which showed the strongest associations with yield. Input variables included accumulated precipitation and mean values of LST, SW, TA, and VPD, used in combination with NDVI and hydro-topographic variables. Due to the limited availability of soybean data (only three years), the interannual prediction analysis was conducted using corn yield data only.
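
The meteorological predictors described above (accumulated precipitation and mean LST, SW, TA, and VPD over July and August) can be assembled as in the following sketch. The record layout, key names, units, and values are hypothetical; only the aggregation rule (sum for precipitation, mean for the other variables) follows the text.

```python
def summer_features(monthly, months=(7, 8)):
    """Aggregate monthly records into July-August predictors.

    Precipitation is summed; LST, SW, TA and VPD are averaged.
    """
    selected = [rec for rec in monthly if rec["month"] in months]
    if not selected:
        raise ValueError("no records for the requested months")
    features = {"precip_sum": sum(rec["precip"] for rec in selected)}
    for name in ("lst", "sw", "ta", "vpd"):
        features[name + "_mean"] = sum(rec[name] for rec in selected) / len(selected)
    return features

# Hypothetical monthly values for one year; units are assumptions.
records = [
    {"month": 6, "precip": 90, "lst": 30.0, "sw": 250.0, "ta": 23.0, "vpd": 1.0},
    {"month": 7, "precip": 100, "lst": 32.0, "sw": 260.0, "ta": 26.0, "vpd": 1.4},
    {"month": 8, "precip": 80, "lst": 30.0, "sw": 240.0, "ta": 25.0, "vpd": 1.2},
]
features = summer_features(records)
print(features)
```

These values are constant across all pixels in a given year, so they carry the interannual signal, while NDVI and the hydro-topographic variables carry the spatial signal.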

Table 3 summarizes the 2020 corn yield prediction results, using training data from 2016, 2017, and 2019. For model training, 3000 sample points were randomly selected within the LTRs from each of the three training years, and these were combined to prepare the training dataset before applying the model to the target year. Models were constructed with different combinations of meteorological variables and NDVI, with and without hydro-topographic variables, and evaluated using a randomly selected prediction–observation comparison on the target dataset, calculating R², RMSE, and RRMSE. Overall, the inclusion of hydro-topographic variables reduced prediction error by an average of 4.5%. Among the meteorological variables, the combination of precipitation and LST produced the best model performance, surpassing even models that included all meteorological inputs. Note that the predictive performance of the interannual model varied not only by the combination of input variables but also by the specific month(s) or seasonal span used.
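
The sampling step described above can be sketched as follows: a fixed number of points is drawn at random from the LTR pixels for each training year, and the draws are pooled into one training set. The pixel identifiers, the seed, and the function name are hypothetical, and the sketch assumes sampling without replacement within each year.

```python
import random

def sample_training_points(years, ltr_pixels, n_per_year=3000, seed=0):
    """Draw n_per_year random LTR pixels from each training year and pool them."""
    rng = random.Random(seed)
    candidates = sorted(ltr_pixels)  # fixed order keeps the draw reproducible
    pooled = []
    for year in years:
        k = min(n_per_year, len(candidates))
        for pixel in rng.sample(candidates, k):  # without replacement within a year
            pooled.append((year, pixel))
    return pooled

ltr_pixels = set(range(5000))  # hypothetical ids of pixels inside the LTRs
pooled = sample_training_points([2016, 2017, 2019], ltr_pixels)
print(len(pooled))
```

Each (year, pixel) pair would then be joined with that pixel's NDVI, hydro-topographic values, and the year's meteorological features to form one training row.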

Figure 11 presents the workflow for the 2020 corn yield prediction along with the predicted crop yield maps generated using different combinations of RS data and hydro-topographic variables. From left to right, the figure shows the observed yield map for 2020, the yield map from the training years based on hydro-topographic variables, the normalized spatial variation map used to identify LTRs, the predicted yield using NDVI and meteorological variables (excluding hydro-topographic variables), and the predicted yield using NDVI, meteorological variables, and hydro-topographic variables. Although NDVI showed a very high spatial correlation with crop yield within individual years, its interannual correlation was low, resulting in reduced prediction accuracy. This result demonstrates that combining hydro-topographic variables improves prediction accuracy, primarily due to enhanced spatial resolution and better representation of spatial yield variability.

4. Discussion

4.1. Scalability of Hydro-Topographic Variables

For sustainable agricultural management, analyzing spatiotemporal variability of crop yields and applying this analysis appropriately to the field is essential for optimizing yield and environmental benefits [7]. For more practical applications, model scalability across wider regions is required. Model scalability refers to the ability of a model to maintain performance and efficiency when applied to larger datasets or broader geographic areas. Model upscaling requires a comprehensive approach that considers not only the accuracy of analysis and modeling but also computational efficiency and interpretability.

First, in terms of wide-region scalability, the hydro-topographic variables derived from the subsurface DEM (such as depth and distance) used in this study will have very limited practical utility, even though they showed strong explanatory power in explaining spatial yield variability. This is because such information is costly and time-consuming to collect and is not widely available; on the other hand, high-resolution surface DEMs (at a 1 m or sub-meter resolution) are widely accessible and updated in the United States and many other countries, making them more suitable for wide-area applications [18]. Therefore, for wide area mapping, it is appropriate to focus on using high-resolution surface topographic variables and the flow accumulation derived from them. It is important to note that for improving within-field spatial analysis and yield prediction, calculating flow accumulation over an entire watershed at high resolution may be unnecessary and computationally excessive. Instead, it is more practical to compute flow accumulation within field-scale or nearby localized areas, which better reflect the relevant topographic influences at the scale of interest.
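
Computing flow accumulation on a field-scale grid is inexpensive, as the following minimal sketch suggests. It uses D8 routing, in which each cell drains to its steepest downslope neighbour; the routing scheme is an assumption, since the text does not specify one, and the sketch omits the pit filling and flat resolution that production GIS tools perform.

```python
import math

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def d8_flow_accumulation(dem):
    """Count cells draining through each cell (itself included) using D8 routing."""
    rows, cols = len(dem), len(dem[0])
    acc = [[1] * cols for _ in range(rows)]
    # Visit cells from highest to lowest so upstream totals are final before routing.
    order = sorted(((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)
    for elev, r, c in order:
        best_slope, target = 0.0, None
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Drop per unit distance; diagonal neighbours are sqrt(2) away.
                slope = (elev - dem[nr][nc]) / math.hypot(dr, dc)
                if slope > best_slope:
                    best_slope, target = slope, (nr, nc)
        if target is not None:  # pits and flat cells keep their own count
            acc[target[0]][target[1]] += acc[r][c]
    return acc

# Hypothetical 3 x 3 DEM sloping towards the lower-right corner.
dem = [
    [9, 8, 7],
    [8, 6, 4],
    [7, 4, 1],
]
acc = d8_flow_accumulation(dem)
print(acc)
```

For display, the accumulation values can be compressed with a 0.25 power transform (`a ** 0.25`), as noted in the Figure 4 caption, so that a few strongly convergent cells do not dominate the colour scale.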

Second, in terms of model accuracy and interpretability, crop-specific modeling is more effective than modeling multiple crops simultaneously. Although hydro-topographic variables provide strong spatial explanatory power regardless of crop type, their sensitivity to crop responses varies. As a result, spatial variability may appear differently within the same field, depending on the crop planted and the prevailing weather conditions. When applying advanced machine learning models with many parameters, overfitting or misinterpretation can occur if variables and crop-specific sensitivities are not properly considered. Advanced machine learning and deep learning models typically achieve high accuracy but often rely on complex combinations of variables [28,29]. In this study, the Random Forest model was implemented using GEE’s default hyperparameters, typically involving 20–50 trees and tree depths ranging from 10 to 20. While such complexity enhances predictive performance, it reduces interpretability and hinders practical, field-level decision-making. For instance, straightforward questions such as “Will a high-slope, low-elevation area produce low or high yields?” cannot be easily answered. Therefore, complementary research focusing on simpler or more interpretable models is necessary to balance high accuracy with actionable, field-level insights, whether using hydro-topographic variables or remote sensing-based inputs.

It should be highlighted that these trade-offs between interpretability and model performance for scalability in crop yield mapping should also be considered in general remote sensing applications, especially when adopting advanced machine learning techniques. Figure 12 illustrates the SHAP dependence plots for NDVI (representing RS data), DEM (representing topographic variables), and distance (representing hydrological variables) based on the seven-year average. Despite NDVI showing high explanatory power with a generally positive relationship at high NDVI (Figure 12A), its interannual correlation with crop yield (July r = 0.22 and August r = 0.08) was considerably lower than that of other meteorological variables, as shown in Figure 5. This indicates that while NDVI effectively captures intra-annual yield variation, it does not adequately represent interannual yield variability, which involves more complex interactive relationships with weather conditions and crop sensitivity. This explains why prediction models using RS data alone may not achieve high accuracy. Furthermore, the relationship between NDVI and hydro-topo variables is also complex and non-linear, as shown in Figure 12B–D, rather than following a simple pattern similar to crop yield. These non-linear interactions between RS and hydrological variables complicate interpretation but also indicate their potential complementarity when combined to provide additional information. Such relationships, when integrated with meteorological data, should be the focus of ongoing research to improve not only accuracy but also interpretability, thereby enabling better scalability of yield prediction models.
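
The contrast between within-year and interannual correlation can be made concrete with a short sketch of the Pearson coefficient applied at both levels. All numbers below are hypothetical and are chosen only to show that a strong spatial correlation within one year does not imply a strong correlation between yearly means.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical pixel values within a single year: NDVI tracks yield closely.
ndvi_pixels = [0.6, 0.7, 0.8, 0.9]
yield_pixels = [6.0, 8.0, 9.0, 12.0]
within_r = pearson_r(ndvi_pixels, yield_pixels)

# Hypothetical field-mean values across four years: the link is weak.
ndvi_years = [0.80, 0.82, 0.81, 0.83]
yield_years = [10.0, 9.0, 11.0, 10.0]
across_r = pearson_r(ndvi_years, yield_years)

print(round(within_r, 2), round(across_r, 2))
```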

Additionally, to address both computational efficiency and interpretability, applying regionalization methods based on climatic and ecological similarity can improve model performance [29,30]. This strategy is effective not only in environmental modeling but also in agriculture, as it allows for adaptive modeling tailored to different climate zones, which vary in species composition and phenological characteristics over large areas.

4.2. Further Improvements and Diverse Scenarios for Integrated Modeling

Further research is needed to explore diverse integration strategies that combine hydro-topographic, remote sensing (RS), and meteorological data to enhance model performance and applicability. The integration of hydro-topographic and RS data enhances model performance and applicability through spatiotemporal synergy. In particular, such integrated models can be implemented in various ways. Figure 13 illustrates the diverse integration scenarios for constructing spatial variability and yield prediction models using hydro-topographic (hydro-topo), remote sensing (RS), and climate (or meteorological) data. Figure 13A presents the same integration method described in Section 3.3, where hydro-topographic and RS data are applied together as independent features in a single model. In contrast, Figure 13B,C—which build upon the spatial variation maps shown in Section 3.4—provide examples of pre-integrating hydro-topographic and RS data to improve the explanatory power for spatial variability, which are then combined with climate data in the prediction model. The difference between (B) and (C) lies in their modeling strategy: (B) learns yield patterns from data across multiple years before making predictions, while (C) builds a model for each year and applies an ensemble technique for prediction.
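
The difference between strategies (B) and (C) can be illustrated with a deliberately simple stand-in model. In the sketch below a one-variable least-squares line replaces the RF model, the ensemble is a plain average of the per-year predictions, and the data are hypothetical; none of these choices is taken from the study.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a*x + b; stands in for the RF model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def predict(model, x):
    """Apply a fitted (slope, intercept) pair to new inputs."""
    a, b = model
    return [a * xi + b for xi in x]

def pooled_prediction(train_by_year, x_new):
    """Scenario (B): one model trained on all years combined."""
    xs = [xi for x, _ in train_by_year.values() for xi in x]
    ys = [yi for _, y in train_by_year.values() for yi in y]
    return predict(fit_line(xs, ys), x_new)

def ensemble_prediction(train_by_year, x_new):
    """Scenario (C): one model per year, predictions averaged across models."""
    per_model = [predict(fit_line(x, y), x_new) for x, y in train_by_year.values()]
    return [sum(vals) / len(vals) for vals in zip(*per_model)]

# Hypothetical (predictor, yield) samples for two training years.
train = {
    2016: ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),
    2017: ([0.0, 1.0, 2.0], [2.0, 4.0, 6.0]),
}
print(pooled_prediction(train, [3.0]), ensemble_prediction(train, [3.0]))
```

With a linear model and balanced years the two strategies coincide, as they do here. With non-linear learners or unequal sample sizes per year they generally differ, which is why the choice between them is worth testing.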

These integration techniques can significantly contribute to improving the accuracy and spatial resolution of RS-based yield prediction models. In addition, integrating remote sensing data from different sensor types has the potential to enhance spatial modeling. Water-related indices such as the normalized difference water index (NDWI) can be particularly informative when combined with hydro-topographic variables [31]. Evapotranspiration (ET), which is strongly correlated with crop yield, and the Evaporative Stress Index (ESI) are likewise promising inputs [32,33]. Additionally, Synthetic Aperture Radar (SAR) data, which are sensitive to vegetation structure, soil moisture, and topography, provide valuable inputs [34]. Integrating SAR-based and optical indices can complement hydro-topographic variables and improve model robustness across diverse environmental conditions [23,35].

This study was conducted in a small experimental field at BARC, with crop yield data limited in duration. In addition, the subsurface DEM data used here are limited in both spatial coverage and temporal update frequency, and may not fully capture dynamic subsurface changes over time. In ongoing work, analyses using surface hydro-topo variables are being extended to the full BARC yield archive, which includes 43 fields over 12 years. We expect reliability to improve as more years of yield data are added to the living archive, sampling a wider range of climatic conditions and more years per crop type for training. Long-term and sustained research efforts are necessary to validate and refine these approaches across broader geographic regions [36]. These efforts will expand the utility of hydro-topographic variables, enabling more robust, scalable, and interpretable yield predictions across a variety of agricultural environments.

5. Conclusions

This study demonstrates the critical contribution of hydro-topographic variables in explaining spatial variability in corn and soybean yields under uniform management practices. Combining hydro-topographic variables derived from high-resolution surface and subsurface DEMs with remote sensing data significantly improved both the explanatory power for spatial yield variation and prediction accuracy. Long-term persistent yield regions (LTRs), derived from spatially normalized crop yield maps estimated using hydro-topographic variables, helped reduce spatial anomalies, while normalization minimized temporal variability and enhanced model robustness by mitigating residual spatial inconsistencies. These results emphasize the potential for integrating hydro-topographic and remote sensing data to develop precise, scalable management strategies, thereby optimizing productivity and resource efficiency in agricultural systems.

Author Contributions

Conceptualization, J.G.C., M.A., and F.G.; Methodology, J.G.C.; Software, J.G.C.; Validation, J.G.C., M.A., and F.G.; Formal analysis, J.G.C.; Investigation, J.G.C., A.R., M.A., F.G., and D.M.J.; Resources, J.G.C. and A.R.; Data curation, J.G.C., M.A., F.G., D.M.J., H.Z., R.C. and Y.P.; Writing—original draft, J.G.C. and M.A.; Writing—review & editing, M.A., F.G., A.R., R.C., Y.P. and D.M.J.; Visualization, J.G.C.; Supervision, M.A. and F.G.; Project administration, M.A. and F.G.; Funding acquisition, M.A., F.G., and J.G.C. All authors have read and agreed to the published version of the manuscript.

Funding

Funding for this research was provided by the Foundation for Food and Agriculture Research (FFAR grant number: Dsnew-0000000028). The work was conducted as part of the Long-Term Agroecosystem Research (LTAR) network, which is supported by the United States Department of Agriculture.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The USDA is an equal opportunity provider and employer. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. Workflow illustrating the data collection, preprocessing, and feature extraction steps used to assess the spatial variability in crop yield [2,8].

Figure A2. Machine learning model comparison for crop yield from 2016 to 2023 using hydro-topographic variables, showing the average performance across seven years and the range of R² and MAE values for Multiple Linear Regression (MLR), Random Forest (RF), XGBoost (XGB), and Deep Neural Network (DNN) models. The results highlight the superior performance of RF across most years, while MLR generally exhibits the lowest accuracy.

Figure A3. Correlations between hydro-topo variables and crop yield.

References

1. Kim, S.; Daughtry, C.; Russ, A.; Pedrera-Parrilla, A.; Pachepsky, Y. Analysis of Spatiotemporal Variability of Corn Yields Using Empirical Orthogonal Functions. Water 2020, 12, 3339.
2. Morgan, B.J.; Daughtry, C.S.T.; Russ, A.L.; Dulaney, W.P.; Gish, T.J.; Pachepsky, Y.A. Effect of Shallow Subsurface Flow Pathway Networks on Corn Yield Spatial Variation under Different Weather and Nutrient Management. Int. Agrophys. 2019, 33, 271–276.
3. Dulaney, W.P.; Anderson, M.C.; Gao, F.; Stern, A.; Moglen, G.; Meyers, G.; Daughtry, C.S.T.; White, W.; Akumaga, U.; Showalter, J. Development of a Gridded Yield Data Archive for Farm Management and Research at the USDA Beltsville Agricultural Research Center. Agrosyst. Geosci. Environ. 2024, 7, e20474.
4. Keeney, D.R.; DeLuca, T.H. Des Moines River Nitrate in Relation to Watershed Agricultural Practices: 1945 versus 1980s. J. Environ. Qual. 1993, 22, 267–272.
5. Gish, T.J.; Walthall, C.L.; Daughtry, C.S.T.; Kung, K. Using Soil Moisture and Spatial Yield Patterns to Identify Subsurface Flow Pathways. J. Environ. Qual. 2005, 34, 274–286.
6. De Lannoy, G.J.M.; Verhoest, N.E.C.; Houser, P.R.; Gish, T.J.; Van Meirvenne, M. Spatial and Temporal Characteristics of Soil Moisture in an Intensively Monitored Agricultural Field (OPE3). J. Hydrol. 2006, 331, 719–730.
7. Gao, F.; Anderson, M.; Daughtry, C.; Johnson, D. Assessing the Variability of Corn and Soybean Yields in Central Iowa Using High Spatiotemporal Resolution Multi-Satellite Imagery. Remote Sens. 2018, 10, 1489.
8. Gish, T.J.; Dulaney, W.P.; Kung, K.-J.; Daughtry, C.S.T.; Doolittle, J.A.; Miller, P.T. Evaluating Use of Ground-penetrating Radar for Identifying Subsurface Flow Pathways. Soil Sci. Soc. Am. J. 2002, 66, 1620–1629.
9. Herrmann, I.; Pimstein, A.; Karnieli, A.; Cohen, Y.; Alchanatis, V.; Bonfil, D.J. LAI Assessment of Wheat and Potato Crops by VENμS and Sentinel-2 Bands. Remote Sens. Environ. 2011, 115, 2141–2151.
10. Chang, G.J.; Oh, Y.; Goldshleger, N.; Shoshany, M. Biomass Estimation of Crops and Natural Shrubs by Combining Red-Edge Ratio with Normalized Difference Vegetation Index. J. Appl. Remote Sens. 2022, 16, 014501.
11. Chang, G.J.; Oh, Y.; Shoshany, M. Biomass Estimation along a Climatic Gradient Using Multi-Frequency Polarimetric Radar Vegetation Index. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, V-3-2022, 369–374.
12. Ulaby, F.T.; Batlivala, P.P.; Dobson, M.C. Microwave Backscatter Dependence on Surface Roughness, Soil Moisture, and Soil Texture: Part I-Bare Soil. IEEE Trans. Geosci. Electron. 1978, 16, 286–295.
13. Nawar, S.; Corstanje, R.; Halcro, G.; Mulla, D.; Mouazen, A.M. Delineation of Soil Management Zones for Variable-Rate Fertilization: A Review. Adv. Agron. 2017, 143, 175–245.
14. Kalita, P.K.; Kanwar, R.S. Effect of Water-Table Management Practices on the Transport of Nitrate-N to Shallow Groundwater. Trans. Am. Soc. Agric. Eng. 1993, 36, 413–422.
15. Kung, K.J.S. Preferential Flow in a Sandy Vadose Zone: 2. Mechanism and Implications. Geoderma 1990, 46, 59–71.
16. Kitchen, N.R.; Blanchard, P.E.; Hughes, D.F.; Lerch, R.N. Impact of Historical and Current Farming Systems on Groundwater Nitrate in Northern Missouri. J. Soil Water Conserv. 1997, 52, 272–277.
17. Zhu, Q.; Lin, H.S. Simulation and Validation of Concentrated Subsurface Lateral Flow Paths in an Agricultural Landscape. Hydrol. Earth Syst. Sci. 2009, 13, 1503–1518.
18. Arundel, S.T.; Phillips, L.A.; Lowe, A.J.; Bobinmyer, J.; Mantey, K.S.; Dunn, C.A.; Constance, E.W.; Usery, E.L. Preparing The National Map for the 3D Elevation Program–Products, Process and Research. Cartogr. Geogr. Inf. Sci. 2015, 42, 40–53.
19. O’Geen, A.; Walkinshaw, M.; Beaudette, D. SoilWeb: A Multifaceted Interface to Soil Survey Information. Soil Sci. Soc. Am. J. 2017, 81, 853–862.
20. Beaudette, D.E.; O’Geen, A.T. Soil-Web: An Online Soil Survey for California, Arizona, and Nevada. Comput. Geosci. 2009, 35, 2119–2128.
21. Daly, C.; Taylor, G.H.; Gibson, W.P.; Parzybok, T.W.; Johnson, G.L.; Pasteris, P.A. High-Quality Spatial Climate Data Sets for the United States and Beyond. Trans. Am. Soc. Agric. Eng. 2000, 43, 1957–1962.
22. Masek, J.; Ju, J.; Roger, J.; Skakun, S.; Vermote, E.; Claverie, M.; Dungan, J.; Yin, Z.; Freitag, B.; Justice, C. HLS Operational Land Imager Surface Reflectance and TOA Brightness Daily Global 30 m v2.0; NASA Land Processes Distributed Active Archive Center: Sioux Falls, SD, USA, 2021.
23. Chang, J.G.; Kraatz, S.; Anderson, M.; Gao, F. Enhanced Polarimetric Radar Vegetation Index and Integration with Optical Index for Biomass Estimation in Grazing Lands Across the Contiguous United States. Remote Sens. 2024, 16, 4476.
24. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 1–10.
25. Spiess, A.-N.; Neumeyer, N. An Evaluation of R² as an Inadequate Measure for Nonlinear Models in Pharmacological and Biochemical Research: A Monte Carlo Approach. BMC Pharmacol. 2010, 10, 6.
26. Zhao, J.; Yang, X. Distribution of High-Yield and High-Yield-Stability Zones for Maize Yield Potential in the Main Growing Regions in China. Agric. For. Meteorol. 2018, 248, 511–517.
27. Kucharik, C.J.; Ramiadantsoa, T.; Zhang, J.; Ives, A.R. Spatiotemporal Trends in Crop Yields, Yield Variability, and Yield Gaps across the USA. Crop Sci. 2020, 60, 2085–2101.
28. Chang, G.J. Biodiversity Estimation by Environment Drivers Using Machine/Deep Learning for Ecological Management. Ecol. Inform. 2023, 78, 102319.
29. Chang, J.G.; Gao, F.; Anderson, M.; Cirone, R.; Zhao, H. Regionalization Analysis of Environmental Drivers of CONUS Grazing Land Biomass. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 12634–12644.
30. Loveland, T.R.; Merchant, J.M. Ecoregions and Ecoregionalization: Geographical and Ecological Perspectives. Environ. Manag. 2004, 34, S1–S13.
31. Gao, B.-C. NDWI—A Normalized Difference Water Index for Remote Sensing of Vegetation Liquid Water from Space. Remote Sens. Environ. 1996, 58, 257–266.
32. Anderson, M.; Kustas, W. Thermal Remote Sensing of Drought and Evapotranspiration. Eos Trans. Am. Geophys. Union 2008, 89, 233–234.
33. Anderson, M.C.; Allen, R.G.; Morse, A.; Kustas, W.P. Use of Landsat Thermal Imagery in Monitoring Evapotranspiration and Managing Water Resources. Remote Sens. Environ. 2012, 122, 50–65.
34. Chang, J.G.; Oh, Y.; Shoshany, M. Soil Moisture Mapping Along Climatic Gradient by Dual-Polarization Sentinel-1 C-Band Data. IEEE Geosci. Remote Sens. Lett. 2023, 20, 2500205.
35. Chang, J.; Shoshany, M. Mediterranean Shrublands Biomass Estimation Using Sentinel-1 and Sentinel-2. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 5300–5303.
36. Bean, A.R.; Coffin, A.W.; Arthur, D.K.; Baffaut, C.; Holifield Collins, C.; Goslee, S.C.; Ponce-Campos, G.E.; Sclater, V.L.; Strickland, T.C.; Yasarer, L.M. Regional Frameworks for the USDA Long-Term Agroecosystem Research Network. Front. Sustain. Food Syst. 2021, 4, 612785.

Figure 1. The Optimizing Production Inputs for Economic and Environmental Enhancement (OPE3) experiment site at the Beltsville Agricultural Research Center (BARC), showing the DEM referenced to mean sea level and enlarged views over the OPE3 fields of aspect, slope, and soil mapping unit from SoilWeb.

Figure 2. Crop yield for corn and soybean in the OPE3 site from 2016 to 2023, at a 5 m spatial resolution. Yield data are missing for all fields in 2021 and for Fields C and D in 2023. The soybean data from 2022 and 2023 were combined across Fields A, B, C, and D to provide full site coverage.

Figure 3. Average crop yield across four fields at the OPE3 site from 2016 to 2023.

Figure 4. Schematic illustration of hydro-topographic variables. The upper panel shows, from left to right, the surface DEM, a schematic figure, and the subsurface DEM following the subsurface flow pathway (threshold > 8). The lower panels show, from left to right, the surface flow accumulation derived from surface DEM (transformed using the 0.25 power to improve visualization of high values caused by pixel convergence in high-resolution data), the depth from the surface to the subsurface infiltration-restricting layer, and the Euclidean distance from each crop yield pixel to the nearest subsurface flow pathway.

Figure 5. Monthly correlation coefficients (r) between meteorological variables and crop yield from 2016 to 2023, based on five years of corn data and three years of soybean data (adjusted by yield ratio).

Figure 6. Variable importance scores (mean and standard deviation across multiple years) based on SHAP. The upper panel (A) shows the overall contribution of each feature as its mean absolute SHAP value, and the lower panel (B) presents the signed SHAP values, which show the direction of each feature’s impact (positive values indicate that the feature raises the predicted yield, while negative values indicate that it lowers it).

Figure 7. SHAP dependence plots for each feature, representing how a feature affects the target variable (‘crop yield’) across the entire dataset: (A) DEM, (B) Depth, (C) Aspect, (D) Distance, (E) Slope, and (F) Flow accumulation.

Figure 8. Spatially normalized crop yield maps estimated using surface hydro-topographic variables (DEM, slope, aspect, and flow accumulation) (left) and both surface and subsurface hydro-topographic variables (including depth and distance) (right).

Figure 9. Long-term persistent yield regions (LTRs) were identified based on the standard deviation of the multi-year normalized spatial variation maps (range: −5 to 5) as shown in Figure 8. The standard deviation values are mapped on a scale from 0 to 3, where lower values indicate higher stability across years. The values are categorized into six classes: 0–0.5 (Very Low), 0.5–1.0 (Low), 1.0–1.5 (Moderate), 1.5–2.0 (High), 2.0–2.5 (Very High), and 2.5–3.0 (Extremely High). LTRs were extracted separately for corn (four years) (A), soybean (three years) (B), and both crops combined (seven years) (C).
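
The classification in the Figure 9 caption amounts to a per-pixel standard deviation across years followed by binning into 0.5-wide classes. The sketch below illustrates that idea under stated assumptions: the function names and sample values are hypothetical, the population form of the standard deviation is assumed because the caption does not say which form was used, and a value that falls exactly on a class boundary is assigned to the higher class.

```python
import statistics

# Class labels in the order given in the Figure 9 caption.
CLASS_LABELS = ["Very Low", "Low", "Moderate", "High", "Very High", "Extremely High"]


def pixel_std(yearly_values):
    """Standard deviation of one pixel's normalized yield across years.

    Population standard deviation is an assumption made for this sketch.
    """
    return statistics.pstdev(yearly_values)


def stability_class(std_value, bin_width=0.5):
    """Map a standard deviation to one of the six 0.5-wide classes.

    Values at or above 3.0 are clipped into the last class.
    """
    index = min(int(std_value // bin_width), len(CLASS_LABELS) - 1)
    return CLASS_LABELS[index]


# Hypothetical normalized values (range -5 to 5) for two pixels over four years.
stable_pixel = [1.0, 1.2, 0.9, 1.1]
variable_pixel = [-3.0, 2.0, 4.0, -1.0]

stable_label = stability_class(pixel_std(stable_pixel))
variable_label = stability_class(pixel_std(variable_pixel))
```

Pixels in the lowest classes are the ones that would qualify as long-term persistent, since a low standard deviation means the normalized yield changes little from year to year.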

Figure 10. Improvement in explanatory accuracy for spatial variation in 2020 crop yields obtained with different combinations of hydro-topographic and remote sensing data. The bar plot shows the coefficient of determination (R²), with relative RMSE (%) and its standard deviation range indicated within each bar.

Figure 11. Workflow and results of the 2020 corn yield prediction using long-term persistent yield pixels. From left to right: observed crop yield in 2020, yearly estimated yield maps based on hydro-topographic variables, spatially normalized modeled crop yield based on LTRs, predicted yield using NDVI and meteorological variables (excluding hydro-topographic variables), and predicted yield using NDVI, meteorological variables, and hydro-topographic variables.

Figure 12. SHAP dependence plots showing the relationships between NDVI (representing RS data), DEM (representing topographic variables), and distance (representing hydrological variables): (A) NDVI SHAP values; (B) interaction between NDVI and DEM; (C) interaction between NDVI and distance; (D) interaction between DEM and distance.

Figure 13. Three conceptual scenarios for integrating hydro-topographic (hydro-topo), remote sensing (RS), and climate data to enhance spatial variability analysis and yield prediction. (A) Hydro-topo and RS data used together as independent features within a single model; (B) pre-integration of hydro-topo and RS data to enhance explanatory power for spatial variability, combined with climate data and trained across multiple years; (C) pre-integration of hydro-topo and RS data with climate data in year-specific models, followed by an ensemble prediction. AI (ML/DL) refers to artificial intelligence techniques (machine learning and deep learning), and CYM and LTR are the estimated spatial crop yield map and the long-term persistent yield regions, respectively.

Table 1. Crop planting dates from 2016 to 2023 with corresponding meteorological data. Precipitation represents accumulated values from June to October, while shortwave radiation (SW), surface temperature (LST), air temperature (TA), and vapor pressure deficit (VPD) are averaged over July and August.

Planting Date (Crop)	Precipitation (mm)	SW (W/m²)	LST (°C)	TA (°C)	VPD (hPa)
28 May 2016 (Corn)	463.2	769	29.4	24.9	21.8
10 June 2017 (Corn)	545.0	740	28.3	23.4	19.7
18 June 2018 (Soybean)	906.2	688	30.8	23.8	20.8
8 June 2019 (Corn)	474.0	773	29.7	25.9	23.3
20 May 2020 (Corn)	831.5	771	29.8	25.9	20.8
1 June 2022 (Soybean)	691.3	699	29.6	25.2	21.1
8 June 2023 (Soybean)	455.4	714	27.6	24.8	21.5

Table 2. Explanatory powers of different hydro-topographic variable combinations for spatial variation in crop yield from 2016 to 2023. The table presents the mean r² and relative RMSE (RRMSE) across all years, along with the average RMSE for corn (based on four years) and soybean (based on three years).

Hydro-Topographic Variables	Avg. RMSE Corn (t/ha)	Avg. RMSE Soybean (t/ha)	Avg. RRMSE	Avg. r²
Topo: DEM, slope, aspect	2.29	0.42	23.7%	0.38
Topo and flowAccum	1.77	0.33	18.4%	0.62
Topo and distance	1.58	0.30	16.5%	0.69
Topo and depth	1.53	0.29	15.9%	0.71
Topo and Hydro	1.47	0.28	15.3%	0.73
Hydro: flowAccum, distance, depth	2.10	0.40	22.0%	0.46
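
The three accuracy measures reported in Table 2 can be written out explicitly. The following sketch assumes that RRMSE is the RMSE divided by the mean observed yield and that r² is the squared Pearson correlation between observed and modeled yield; neither definition is stated in this part of the article, and the function names and sample values are illustrative only.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (e.g. t/ha)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rrmse_percent(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Made-up yields in t/ha for four pixels (not data from the study).
observed = [8.0, 10.0, 12.0, 10.0]
predicted = [9.0, 9.0, 11.0, 11.0]

error_t_ha = rmse(observed, predicted)          # 1.0 t/ha
error_pct = rrmse_percent(observed, predicted)  # 10.0 %
fit = r_squared(observed, predicted)            # 0.5
```

Normalizing by the mean yield is what would let corn and soybean errors, which differ several-fold in t/ha, be averaged into a single RRMSE column.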

Table 3. Prediction performance for 2020 corn yield using different combinations of NDVI, hydro-topographic variables, and meteorological variables, with corresponding relative RMSE (%) and R² values.

Feature Variables	RRMSE	R²
Prec., NDVI	33.9%	0.28
Prec., NDVI, Hydro-topo	29.8%	0.45
Prec., DSR, NDVI	37.0%	0.15
Prec., DSR, NDVI, Hydro-topo	29.8%	0.45
Prec., LST, NDVI	33.0%	0.32
Prec., LST, NDVI, Hydro-topo	28.2%	0.50
Prec., LST, TA, DSR, VPD, NDVI	32.6%	0.34
Prec., LST, TA, DSR, VPD, NDVI, Hydro-topo	30.3%	0.43
Average without Hydro-topo	34.1%	0.27
Average with Hydro-topo	29.6%	0.46
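
The two summary rows of Table 3 can be checked by averaging the four feature sets in each group. The snippet below is a simple arithmetic check on the rounded values as printed; the variable and function names are illustrative.

```python
# (feature set, RRMSE in %, R²) rows transcribed from Table 3.
rows = [
    ("Prec., NDVI", 33.9, 0.28),
    ("Prec., NDVI, Hydro-topo", 29.8, 0.45),
    ("Prec., DSR, NDVI", 37.0, 0.15),
    ("Prec., DSR, NDVI, Hydro-topo", 29.8, 0.45),
    ("Prec., LST, NDVI", 33.0, 0.32),
    ("Prec., LST, NDVI, Hydro-topo", 28.2, 0.50),
    ("Prec., LST, TA, DSR, VPD, NDVI", 32.6, 0.34),
    ("Prec., LST, TA, DSR, VPD, NDVI, Hydro-topo", 30.3, 0.43),
]


def group_average(table_rows, with_hydro_topo):
    """Mean RRMSE and mean R² over rows with or without hydro-topo inputs."""
    selected = [row for row in table_rows
                if ("Hydro-topo" in row[0]) == with_hydro_topo]
    count = len(selected)
    mean_rrmse = sum(row[1] for row in selected) / count
    mean_r2 = sum(row[2] for row in selected) / count
    return mean_rrmse, mean_r2


without_rrmse, without_r2 = group_average(rows, with_hydro_topo=False)
with_rrmse, with_r2 = group_average(rows, with_hydro_topo=True)
```

Averaging the printed values gives about 34.1% and 0.27 without hydro-topographic inputs and about 29.5% and 0.46 with them. The 0.1-point gap from the tabulated 29.6% is presumably because the published averages were computed from unrounded values.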

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
