*3.3. Relationships between STP and FWM P Concentrations*

Positive relationships were observed between STP at both sampling depths and the FWM DRP concentrations in both surface runoff and tile drainage (Figure 1). For surface runoff DRP, similar regression slopes (0.015 vs. 0.016) were observed from relationships with shallow vs. agronomic STP, indicating that increases in agronomic or shallow STP were associated with similar increases in FWM DRP concentration (Table 3). However, shallow STP explained more variation in surface DRP concentration (R<sup>2</sup> of 0.31 vs. 0.19) and improved the predictions of surface DRP concentration (RMSE of 0.66 vs. 0.72) compared to agronomic STP. For tile drainage DRP, regression slopes were also similar for relationships using both soil sampling depths (0.014 vs. 0.016). Greater variation in tile drainage DRP was explained with shallow STP (R<sup>2</sup> of 0.44 vs. 0.32) and predictions likewise improved (RMSE of 0.56 vs. 0.62) (Table 3). Comparing R<sup>2</sup> and RMSE values between surface and tile DRP models demonstrated that STP, measured from shallow and agronomic depths, was a stronger predictor of tile drainage DRP than surface DRP.

Flow-weighted mean TP concentrations in surface runoff and tile drainage were also positively related to STP at both sampling depths (Figure 2). However, the R<sup>2</sup> of the regression models indicated that less variation in TP concentrations was explained by shallow and agronomic STP compared to DRP (Table 3). As observed with DRP, the slopes of both the surface runoff and tile drainage TP regressions were similar for agronomic and shallow depth samples. More variation in surface runoff TP was explained with shallow STP compared to agronomic STP (R<sup>2</sup> of 0.26 and 0.21, respectively), but prediction accuracy was similar (RMSE of 0.44 and 0.45, respectively). The tile drainage TP showed similar patterns to surface runoff, with greater variation explained by shallow STP compared to agronomic STP (R<sup>2</sup> of 0.19 and 0.11, respectively) and similar prediction accuracy (RMSE of 0.50 and 0.53, respectively). In general, differences between the predictive power of agronomic and shallow depth samples were smaller for TP than those observed for DRP.

**Figure 1.** Regression relationships between soil test P (STP) and flow-weighted mean dissolved reactive P (FWM DRP) concentrations. Surface runoff FWM DRP vs. STP in the (**A**) agronomic soil horizon (0–20 cm) and (**B**) the shallow soil horizon (0–5 cm). Tile drainage FWM DRP vs. STP in the (**C**) agronomic soil horizon and the (**D**) shallow soil horizon. Regressions were performed on log transformed runoff P concentrations; data are plotted in original units with log-scale x-axis.

**Table 3.** Regression results for field average soil test P vs. flow weighted mean dissolved reactive P (DRP) and total P (TP) concentrations. Regressions were performed on natural log transformed FWM P concentrations.


\* P concentration data were log transformed prior to regression; all regression slopes and intercepts were significant at the *p* < 0.01 level.
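The single-factor regressions summarized above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the data are synthetic, the helper name `fit_log_linear` is hypothetical, and RMSE is computed on the natural log scale as the square root of the mean squared residual, which is an assumption since the RMSE convention is not stated here. Because the response is natural log transformed, a slope b corresponds to a relative change of exp(b) − 1 in concentration per unit increase in STP, so a slope of 0.015 is roughly a 1.5% increase per unit STP.

```python
import math
import random


def fit_log_linear(stp, conc):
    """Ordinary least squares fit of ln(conc) = intercept + slope * stp."""
    y = [math.log(c) for c in conc]
    n = len(stp)
    mean_x = sum(stp) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in stp)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(stp, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * x) for x, yi in zip(stp, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),  # assumed convention: RMSE on the ln scale
        "residuals": residuals,
    }


# Synthetic, illustrative values only (NOT the study's measurements).
rng = random.Random(0)
stp_agronomic = [rng.uniform(10, 150) for _ in range(60)]
drp = [math.exp(-4.0 + 0.015 * s + rng.gauss(0, 0.6)) for s in stp_agronomic]

fit = fit_log_linear(stp_agronomic, drp)
pct_per_unit = (math.exp(fit["slope"]) - 1.0) * 100.0
print(f"slope={fit['slope']:.4f}  R2={fit['r2']:.2f}  RMSE={fit['rmse']:.2f}")
print(f"approx. {pct_per_unit:.1f}% change in concentration per unit STP")
```

Fitting the same function once with agronomic STP and once with shallow STP, then comparing `r2` and `rmse`, mirrors the comparison reported in the text.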

The maximum agronomic and shallow STP values in each field were also tested as predictors of EoF P concentrations (Table S3). Maximum STP at both sampling depths was positively and significantly related to DRP and TP concentrations in surface runoff and tile drainage (regressions; R<sup>2</sup> range 0.07 to 0.38, *p* < 0.05 for all). However, the field average shallow and agronomic STP models explained more variation and improved predictions of EoF P concentration data in all cases.

Residuals of the regression of DRP concentration against agronomic STP were positively correlated with Pstrat for both surface runoff (r = 0.34, *p* = 0.01) and tile drainage (r = 0.36, *p* < 0.001; Figure 3). A significant positive correlation between residual TP and Pstrat was also observed in tile drainage (r = 0.28, *p* < 0.01) but not surface runoff (Figure 4). The positive correlations indicate that fields with greater Pstrat tended to also have more positive residuals, i.e., runoff P concentrations were prone to underprediction by agronomic STP. Likewise, runoff P concentrations in fields with lesser Pstrat were prone to overprediction by agronomic STP.
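The residual analysis described above can be illustrated with a short sketch. It assumes synthetic data in which ln-transformed DRP depends on both agronomic STP and Pstrat. The function names (`ols_residuals`, `pearson_r`) and all numeric values are hypothetical, and significance testing (the *p* values reported in the text) is omitted.

```python
import math
import random


def ols_residuals(x, y):
    """Residuals (observed minus fitted) of the least squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic, illustrative values only (NOT the study's measurements).
rng = random.Random(1)
n_fields = 60
stp_agronomic = [rng.uniform(10, 150) for _ in range(n_fields)]
pstrat = [rng.uniform(1.0, 3.0) for _ in range(n_fields)]
ln_drp = [-4.5 + 0.015 * s + 0.5 * p + rng.gauss(0, 0.4)
          for s, p in zip(stp_agronomic, pstrat)]

# Step 1: regress ln(DRP) on agronomic STP alone and keep the residuals.
residuals = ols_residuals(stp_agronomic, ln_drp)
# Step 2: correlate those residuals with the stratification ratio.
r = pearson_r(residuals, pstrat)
print(f"r(residual, Pstrat) = {r:.2f}")
```

A positive `r` here means the STP-only model underpredicts (positive residual) where Pstrat is high and overpredicts where it is low, which is the pattern the text describes.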

**Figure 2.** Regression relationships between soil test P (STP) and flow-weighted mean total P (FWM TP) concentrations (natural log transformed). Surface runoff FWM TP vs. STP for the (**A**) agronomic soil horizon (0–20 cm) and the (**B**) shallow soil horizon (0–5 cm). Tile drainage FWM TP vs. STP for the (**C**) agronomic soil horizon and the (**D**) shallow soil horizon. Regressions were performed on log transformed runoff P concentrations; data are plotted in original units with log-scale x-axis.

**Figure 3.** Correlation between residual FWM DRP (from regression of agronomic STP (0–20 cm depth) vs. natural log transformed FWM DRP concentrations) and the P stratification ratio for tile drainage (**A**) and surface runoff (**B**).

**Figure 4.** Correlation between residual FWM TP (from regression of agronomic STP (0–20 cm depth) vs. natural log transformed FWM TP concentrations) and the P stratification ratio for tile drainage (**A**) and surface runoff (**B**).

Phosphorus stratification ratio was a significant factor when added to the regressions of agronomic STP vs. EoF P concentrations for surface DRP, tile DRP, and tile TP concentrations (Table 4). However, Pstrat was not significant when added to the surface TP model. Interactions between STP and Pstrat were not significant and so were not included in the final models. Model fit and explanatory power of the two-factor models were similar to or slightly better than those of the single-factor shallow sample models (Table 3). In addition, comparison of residuals from the agronomic STP models to residuals from the agronomic STP + Pstrat models showed that the improvement in fit provided by addition of Pstrat was not related to soil texture class, field slope, STP, or average daily discharge for either DRP or TP (ANOVA and regressions; *p* > 0.05 for all; data not shown).

**Table 4.** Multiple linear regression results predicting edge-of-field FWM P concentrations with agronomic soil test P (STP) and P stratification ratio (Pstrat). Regressions were performed on natural log transformed FWM P concentrations.


\* Model factors were significant at the *p* < 0.01 level.
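A two-factor model of the kind reported in Table 4 can be sketched as below, again with synthetic data and hypothetical helper names (`solve_linear_system`, `fit_ols`). The sketch fits ln(FWM P) = b0 + b1·STP + b2·Pstrat by ordinary least squares through the normal equations and compares it with the STP-only model. It does not test factor significance or interactions, and computing RMSE on the ln scale is an assumption.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(predictors, y):
    """OLS with an intercept; predictors is a list of equal-length columns.

    Returns (coefficients, R2, RMSE), intercept first.
    """
    n = len(y)
    columns = [[1.0] * n] + [list(col) for col in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * yi for u, yi in zip(ci, y)) for ci in columns]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * col[i] for c, col in zip(coef, columns)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Synthetic, illustrative values only (NOT the study's measurements).
rng = random.Random(2)
stp = [rng.uniform(10, 150) for _ in range(60)]
pstrat = [rng.uniform(1.0, 3.0) for _ in range(60)]
ln_conc = [-4.5 + 0.015 * s + 0.5 * p + rng.gauss(0, 0.4) for s, p in zip(stp, pstrat)]

coef_1, r2_1, rmse_1 = fit_ols([stp], ln_conc)          # agronomic STP only
coef_2, r2_2, rmse_2 = fit_ols([stp, pstrat], ln_conc)  # agronomic STP + Pstrat
print(f"STP only:     R2={r2_1:.2f}  RMSE={rmse_1:.2f}")
print(f"STP + Pstrat: R2={r2_2:.2f}  RMSE={rmse_2:.2f}")
```

Because the STP-only model is nested in the two-factor model, R<sup>2</sup> cannot decrease when Pstrat is added; whether the gain is meaningful is what the significance tests in Table 4 address.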

### **4. Discussion**

#### *4.1. Soil Test P*

Soil test P values and Pstrat of individual fields were highly dynamic over the 3 years of this study, highlighting the importance of frequent soil sampling. A large portion of the observed STP changes between sampling events was presumably due to inherent within-field spatial variability in STP. This study used field average STPs derived from multiple samples that were taken at a sampling intensity (average of 1 sample per 1.5 ha) similar to that of commercial agronomic soil sampling in the WLEB. Other studies have shown agronomic STP frequently varies dramatically at the scale of tens of meters [36–38], so more intensive sampling may be required to improve the accuracy of calculated field average STP. This study was not designed to quantify the relationship between sampling intensity and STP variability, so further research will be required to quantify the soil sampling intensity needed to achieve acceptable precision of field average STP values.

Changes in STP and Pstrat can also be caused by management activities such as tillage and P fertilizer management (e.g., P rate, fertilizer placement) [21,39], and soil sampling schemes should take into account these management activities. However, in this study we did not see a large influence of management practices on changes in STP or Pstrat. For example, the amount of P applied between two soil sampling events, which ranged from 0–119 kg P ha<sup>−1</sup> (Table S2), was not associated with statistically significant changes in STP, indicating that recent (i.e., within the past 3 years) P application rate was not a primary driver of STP over that period. Relatively large additions of P fertilizer are required to substantially increase STP. For instance, annual P applications of 44 kg ha<sup>−1</sup> increased STP by only 2.5 mg kg<sup>−1</sup> yr<sup>−1</sup> at a location in Iowa [40] and, in Ohio, P fertilizer rates double the crop removal rate did not substantially increase STP after 9 years at three locations [41]. It is likely that within-field variability in STP was much greater than changes induced by P fertilizer applications and consequently overwhelmed observation of these changes. In contrast, manure application was associated with increases in STP in subsequent soil sampling events, likely due in part to the relatively large amounts of P (average 44 kg P ha<sup>−1</sup>) added to the manured fields. Additionally, compared to chemical P fertilizer, manure may maintain a greater proportion of P in labile forms that are extracted by the Mehlich-3 extractant over a period of at least several months after application [42]. Manure application has been previously identified as a major factor driving P losses in runoff and tile drainage [43,44]. Interestingly, an earlier analysis of EoF water quality from the same fields used in this study found that P losses were significantly greater in fields that received manure applications [45]. These results highlight the importance of manure management for addressing agricultural P losses, and affirm that the connections between manure application, STP, and P losses should be a priority for future research.

Tillage operations prior to a soil sampling event were not related to changes in STP or Pstrat. However, it is important to note that the tillage operations used were either vertical, non-inversion tillage or shallow tillage. Such conservation tillage operations limit the mixing of surface and deeper soil horizons, and thus typically maintain significant soil stratification [46,47]. A previous study suggested that one-time inversion tillage that fully eliminated P stratification could greatly reduce EoF P losses [18]. Our results provide evidence that non-inversion tillage practices will not substantially mitigate P stratification, and that more intensive inversion tillage will be required to reduce the level of P stratification.

The limited influence of management on STP changes in this study is likely due in part to the soil sampling intensity and frequency. More intensive or frequent sampling may have strengthened our ability to identify management influences. However, this study used a soil sampling intensity similar to that of commercial farms in the region, so these results may be a reasonable approximation of the strength of individual fertilization and non-inversion tillage effects on STP that could be expected from commercially collected STP data across individual crop fields.

#### *4.2. Relationships between STP and FWM P Concentrations*

Stratified soil sampling provided improved prediction of environmental P losses compared to typical agronomic soil sampling. Shallow sample STP accounted for more variability in both surface runoff and tile drainage FWM P concentrations (DRP and TP) compared to agronomic sample STP. Additionally, the best model fits for surface runoff DRP and tile drainage DRP and TP were obtained with regressions using agronomic sample STP combined with Pstrat. Thus, widespread testing of stratified soil samples could be used to improve identification of fields in the WLEB with increased risk of environmental P losses due to high P stratification. Agronomic soil testing is currently widely employed by producers in the WLEB [48], so stratified soil sample collection and analysis could be readily integrated into existing soil testing efforts. However, producer incentives may be required to encourage widespread implementation since stratified soil sampling increases soil testing costs and provides little direct benefit to producers.

This study provided new evidence that stratified sampling was useful for improving predictions of P loss risk for soil types and geography common in the WLEB, but previous research from small experimental plots has provided mixed evidence that stratified soil sampling can improve relationships between environmental P losses and STP. For example, a rainfall simulation experiment on four soils in Texas showed that 0–5 cm soil samples produced stronger relationships between surface runoff DRP and STP compared to 0–15 cm samples for two soil types, but on the other two soils the 0–15 cm samples produced the stronger relationships [26]. In Manitoba, Wilson et al. (2019) [49] found that surface runoff DRP from snowmelt was better predicted by 0–5 cm STP than 0–15 cm STP. Conversely, a rainfall simulation study in Wisconsin showed that shallow samples (0–2 cm) did not provide consistent improvements in relationships between surface runoff DRP and STP, relative to 0–15 cm samples [15]. Similarly, in pasture soils, shallow (0–2 cm) samples were not better predictors of surface runoff DRP than deeper (0–10 cm) samples [50]. Finally, sandy pasture soils under simulated rainfall showed no change in the relationship between surface runoff P concentrations and STP between several sample depths (0–2, 0–5, or 0–10 cm), but the soils in that study had relatively little P stratification [51]. The depth of agronomic sampling in this study (20 cm) likely caused greater distinction between sampling depths than in studies with shallower sampling depths, enabling differentiation between the effects of these soil depths on EoF P concentrations. Additionally, removal of EoF observations immediately following P applications may have reduced variability in EoF P concentrations and enhanced our ability to identify differences in the relationships between STP and EoF P concentrations. Furthermore, the finely textured soils that dominate NW Ohio may favor the enhanced importance of surface soil layers to EoF water quality, as the zone of interaction with water is likely more limited than in coarsely textured soils with greater infiltration rates. Finally, this study encompassed a relatively large number of fields with a wide range of STP and Pstrat, which provided a sufficient range of conditions from which we were able to observe significant relationships between STP and EoF P concentrations. The predictive benefit of adding Pstrat to the agronomic STP models was not related to differences in soil texture, slope, or hydrologic conditions within this study, suggesting that soil P stratification was important across the full range of conditions in the studied fields. However, it should be noted that the patterns observed here may not extend to regions with differing climates, management regimes, and soil characteristics.

Concentrations of DRP were more closely related to STP and P stratification than TP concentrations, indicating that STP is relatively more important for understanding DRP losses compared to TP losses. Agronomic soil tests use extractants, including the Mehlich-3 extractant used in this study, that aim to indicate the plant availability of orthophosphate over the period of a growing season, and thus can be expected to relate relatively closely to DRP concentrations in runoff [51,52]. In contrast, TP in runoff typically includes a significant portion of sediment-bound particulate P that is not measured by soil tests. Surface runoff TP loss has been shown to be closely related to sediment loss, which is controlled by multiple factors in addition to STP, such as soil erodibility, ground cover, and conservation practices [15,53]. Incorporating these factors into runoff P concentration predictions (in addition to STP) could further improve model predictive power, particularly for TP loss.

The relationships between STP and P concentrations in runoff reported in this work were based on EoF monitoring, and were unsurprisingly weaker than STP-runoff P concentration relationships previously reported from more controlled rainfall simulator studies (e.g., [15,26]). In this study, the variability in runoff P concentrations that was unexplained by STP is likely in large part a result of the EoF data collection effort occurring over the broad range of environmental conditions and management practices included in the USDA EoF network. Management and environmental factors are known to influence DRP and TP concentrations in runoff, and recent research has identified crop rotation [54], soil texture [55,56], tillage [57,58], and precipitation characteristics [59–61] as important factors that can influence surface runoff or tile drainage P concentrations. Regardless, this study demonstrated that the influence of STP on EoF P concentrations was readily observed despite the variability in management and environmental characteristics over time and space. Similarly, the benefits of stratified soil sampling were robust enough to be observed across a wide range of crop production scenarios. Thus, predictions of P losses, whether made using empirical relationships or process-based models, could be improved by enhanced efforts to gather and use information on soil P stratification.

An additional challenge of developing relationships at the field scale is the limited understanding of the variability in contributions to EoF water quality across a large field area with variable soil properties and topography. Improving the accuracy of predictions of environmental P losses from heterogeneous fields can be achieved through soil testing regimes that account for disproportionate contributions of field areas to P losses, particularly through surface runoff [62,63]. Furthermore, in soils where macropores play a role in drainage, subsurface tile drains have been shown to have direct connection with surface soils above the drains [64,65]. Additionally, small "hotspots" of high STP within fields could potentially play a disproportionate role in determining EoF P concentrations [38,66]. In this study, the maximum observed STP did not provide better predictions of EoF P concentrations than the field average STP for either surface runoff or tile discharge, but a more intensive soil sampling regime may be necessary to effectively characterize P hotspots. While this study did not take into account any spatial variation in contributions to EoF water quality across the field areas, future research should investigate the feasibility of designing targeted environmental soil testing to account for spatial differences in contributions to EoF water quality in tile-drained landscapes.

Soil P and runoff P concentrations were measured in a relatively large number of crop fields (39) in this study, yet how closely these fields represent the broader landscape of the WLEB remains an important question. A recent study of STP stratification in croplands in the WLEB presented results of over 140,000 soil samples [18]. The region-wide average STP presented in that study was similar to the average of the fields studied in this research; the agronomic (0–20 cm) STP averaged 48.1 mg kg<sup>−1</sup> compared to 44 mg kg<sup>−1</sup> in this study. Furthermore, in Baker et al. (2017) [18], 71% of agronomic STPs were <47 mg P kg<sup>−1</sup>, i.e., the state extension recommended range for "build-up" and "maintenance" of STP for corn, whereas in this study 69% of field average STPs were also below that threshold. Finally, on average surface (0–5 cm) STP was 68% higher compared to the 5–20 cm depth, resulting in an average Pstrat of 1.68 in Baker et al. (2017) [18], which was somewhat less than in this study (1.88). The greater P stratification in this study may be due to the high prevalence of no-till and non-inversion tillage in the fields included in the USDA-ARS EoF network [9]. However, the relatively close agreement between our study and the findings of Baker et al. (2017) [18] indicates that the relationships observed in this study should be expected to hold true across the broader WLEB.
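The arithmetic linking "68% higher" to a Pstrat of 1.68 can be written out explicitly. This assumes Pstrat is the ratio of 0–5 cm STP to 5–20 cm STP, as the two values above imply, and treats the reported averages as if they described a single representative field:

```latex
P_{\mathrm{strat}}
  = \frac{\mathrm{STP}_{0\text{--}5\,\mathrm{cm}}}{\mathrm{STP}_{5\text{--}20\,\mathrm{cm}}}
  = \frac{(1 + 0.68)\,\mathrm{STP}_{5\text{--}20\,\mathrm{cm}}}{\mathrm{STP}_{5\text{--}20\,\mathrm{cm}}}
  = 1.68
```

By the same reasoning, the average Pstrat of 1.88 in this study corresponds to 0–5 cm STP being about 88% higher than 5–20 cm STP.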

### **5. Conclusions**

Robust relationships between agronomic STP and P concentrations were observed across 39 production crop fields in Ohio. Phosphorus stratification varied widely across the fields, and P concentrations in both tile discharge and surface runoff were found to be related more closely to STP of shallow samples (0–5 cm) compared to the agronomic samples (0–20 cm). In fields with greater Pstrat, predicting EoF P concentrations using the agronomic sample STP resulted in systematic underestimation of tile discharge DRP and TP concentrations and surface runoff DRP concentration. The improvement in model predictive power from using shallow sample STP rather than agronomic sample STP was greater for DRP compared to TP. Additionally, both STP and Pstrat varied significantly within fields and were dynamic over time, highlighting the need for frequent and intensive soil sampling to accurately estimate the P status and risk of environmental P loss of fields. Overall, our results suggest stratified soil sampling can be a readily implemented method to improve understanding of the risk of environmental P losses in the WLEB.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2571-8789/4/4/67/s1, Table S1: Soil characteristics of fields monitored for edge-of-field P losses; Table S2: Initial values and changes (delta) in soil test phosphorus (STP), CV of STP, and P stratification ratio (Pstrat); Table S3: Linear regression results for field maximum STP vs. FWM P concentrations.

**Author Contributions:** K.K. conceived and designed the experiments; K.K., M.W., and E.D. performed the experiments; W.O. and E.D. analyzed the data; W.O. and B.H. wrote the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** The Ohio State University helped support this research through hosting the USDA ARS Soil Drainage Research Unit and providing academic access to faculty resources, libraries, and students. Funding for this project was provided by several sources including the 4R Research Fund (IPNI-2014-USA-4RN09), USEPA (DW-12-92342501-0), Conservation Innovation Grants (The Ohio State University: 69-3A75-12-231; Heidelberg University: 69-3A75-13-216), the Ohio Corn and Wheat Growers Association, the Ohio Soybean Association, the Ohio Farm Bureau, the NRCS Mississippi River Basin Initiative, the Nature Conservancy, the NRCS Cooperative Conservation Partnership Initiative, and the USDA Conservation Effects Assessment Project.

**Acknowledgments:** The authors would like to thank the landowners of the study sites who provided access to the fields and management data; Jedediah Stinner, Katie Rumora, Marie Pollock, Phil Levison and Sara Henderson for their help in data collection and site maintenance; and Eric Fischer for analytical expertise. This research was a contribution from the Long-term Agroecosystem Research (LTAR) network. The LTAR network is supported by the USDA. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture.

**Conflicts of Interest:** The authors declare no conflict of interest.
