Article

An Improved Pedotransfer Function for Soil Hydrological Properties in New Zealand

Manaaki Whenua Landcare Research, Lincoln 7640, New Zealand
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(10), 3997; https://doi.org/10.3390/app14103997
Submission received: 3 April 2024 / Revised: 1 May 2024 / Accepted: 3 May 2024 / Published: 8 May 2024

Featured Application

Landowners, regional and national governments, and researchers can use predictions of the soil hydrological properties created in this work, such as wilting point, field capacity, macroporosity, and total available water content, to characterize soils for soil management decisions in New Zealand, e.g., in terms of irrigation requirements, or for policy, e.g., nutrient budgets and regulations.

Abstract

This paper describes a new pedotransfer function (PTF) for the soil water content of New Zealand soils at seven specific tensions (0, −5, −10, −20, −40, −100, −1500 kPa) using explanatory variables derived from the S-map soil mapping system. The model produces unbiased and physically plausible estimates of the response at each tension, as well as unbiased and physically plausible estimates of the response differences that define derived properties (e.g., macroporosity and total available water content). The PTF is a development of an earlier model, using approximately double the number of sites compared with the earlier study, a change in fitting methodology to a semi-parametric GAM with a Beta response, and the inclusion of sample depth. Standard goodness-of-fit measures computed on validation data show that the new model significantly improves the estimates of soil water content and derived quantities. A comparison with an international PTF whose explanatory variables are compatible with those available from S-map (EUPTF2) suggests that the new model predicts soil water content better when only the limited information available from the S-map system is used.

1. Introduction

Soil hydraulic modeling aims to estimate key parameters relating to the retention of water in the soil matrix. These parameters are used in a wide range of modeling applications including catchment hydrology, soil management, crop production, nutrient losses, and soil ecosystem functions. It is very challenging to obtain these parameters directly from detailed field measurement at the landscape-to-national scale, since they are costly and time-consuming to gather, so a pedotransfer function (PTF) is usually employed to estimate these important parameters.
In previous work, a PTF for the soil water content was developed [1] for tensions of 0, −5, −10, −20, −40, −100, and −1500 kPa, as well as derived quantities (e.g., total available water content (TAW)—the difference between −10 and −1500 kPa), using attributes from a soil mapping system for New Zealand (S-map [2]) as a source of explanatory variables. Since that model was developed, new training data have become available, and additional explanatory variables have been identified, which suggest that revision of the model is advisable. Further, some shortcomings of the existing PTF model have become apparent, such as poor estimation of the response for some soil types of high importance (e.g., Allophanic and Pumice soils), poor estimates of the upper range of estimates of derived products such as TAW for some soil orders, and poor characterization of the uncertainty associated with predictions.
While the current PTF [1] has used New Zealand-derived sample data for training, there are several potential alternatives available, such as the globally applicable Rosetta and European hydraulic PTFs models [3,4], as well as many regional-scale physical and empirical models (e.g., [5,6]). It is therefore relevant to compare any New Zealand PTF with results from other models.
The aim of this paper is to describe an updated soil water PTF model for soils in New Zealand. It describes changes in the assumptions concerning the response variable, alterations in the explanatory variables that can be used, a better functional structure for the modeling (i.e., generalized additive model or GAM versus the original logistic regression), and a considerable change in the spatial extent and number of training samples available. The paper also offers a comparison between prediction results from the new model and other relevant global models for the soil–water response that it is feasible to compute for New Zealand soils. Typically, these models are focused on estimates of TAW, which is the derived parameter of most practical use in applications.

Application of the Model

The model developed in this paper is intended to produce estimates of soil water content for each functional horizon within a soil profile. In practice, the model must be integrated into the S-map inference engine [7] to provide estimates of the water content while also accounting for stones, since the effect of stones is not handled by the model here. Based on work in [8], the water content held by the stones is adjusted according to the type of rock. The fines and rock water content are then combined according to the proportion of stones in each horizon. The S-map inference engine provides estimates of TAW calculated to various depths, including 30, 60, and 100 cm (or to a rooting barrier if this is shallower). These stone-corrected estimates are provided on the S-map factsheets, and maps are available for those parts of New Zealand covered by S-map (see https://smap.landcareresearch.co.nz, accessed on 1 May 2024). While not yet fully implemented, the intention in S-map is to eventually provide the 90% confidence interval of estimates of stone-corrected TAW at each tension for each functional horizon.
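The combination step described above can be illustrated with a minimal sketch. It assumes a simple volume-weighted mix of fine-earth and rock water content and reports TAW in millimetres of water; the class, function names, mixing rule, units, and example values are assumptions made for illustration, and the actual S-map inference engine [7] and the rock-type adjustments of [8] may differ.

```python
from dataclasses import dataclass

@dataclass
class Horizon:
    top_cm: float
    bottom_cm: float
    taw_fines: float       # TAW of the fine earth (volumetric fraction)
    taw_rock: float        # TAW held by stones (volumetric fraction, assumed per rock type)
    stone_fraction: float  # volume fraction of stones in the horizon

def stone_corrected_taw(h):
    """Volume-weighted mix of fines and rock TAW (an assumed mixing rule)."""
    return (1.0 - h.stone_fraction) * h.taw_fines + h.stone_fraction * h.taw_rock

def profile_taw_mm(horizons, depth_cm, rooting_barrier_cm=None):
    """Accumulate TAW (mm of water) to depth_cm, or to a shallower rooting barrier."""
    limit = depth_cm if rooting_barrier_cm is None else min(depth_cm, rooting_barrier_cm)
    total = 0.0
    for h in horizons:
        thickness_cm = max(0.0, min(h.bottom_cm, limit) - h.top_cm)
        total += stone_corrected_taw(h) * thickness_cm * 10.0  # cm of soil to mm of water
    return total

# Hypothetical two-horizon profile: a stone-free topsoil over a 50% stony subsoil.
profile = [
    Horizon(0, 20, taw_fines=0.20, taw_rock=0.02, stone_fraction=0.0),
    Horizon(20, 60, taw_fines=0.15, taw_rock=0.02, stone_fraction=0.5),
]
taw_60 = profile_taw_mm(profile, 60)
taw_60_barrier = profile_taw_mm(profile, 60, rooting_barrier_cm=30)
```

In this sketch a rooting barrier at 30 cm truncates the accumulation, so `taw_60_barrier` is smaller than `taw_60`.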

2. Materials and Methods

2.1. Available Predictors

Both the existing and the proposed newly derived PTF in this paper have a limited range of explanatory variables that could be used, dictated by the information available from the S-map soil mapping system [2]. Specifically, this refers to the soil order and group (group is a subdivision of the soil order), along with the soil drainage class and rock class of fines (the dominant rock type that forms the fine soil material with a particle size < 2 mm) as defined by the New Zealand Soil Classification (NZSC) [9]. A correlation between NZSC soil order and the corresponding WRB Reference Soil Groups is given in Appendix A in Table A1. Also available as explanatory variables were the numeric estimates of sand, silt, and clay particle size fractions (soil texture), and the soil horizon depths. Categorical information concerning soil morphology was available from the S-map functional horizon description. This description is a character string that, when appropriately parsed, determines whether the soil is a topsoil or not and provides a texture class (see Table 1), information on the size of structure, soil strength or consistency, and identifies whether the soil is formed from basic or acidic volcanic material [10,11].
The collection of parameters available for PTF training is shown in Table 1; some of these were expected to have very little influence on soil water content. Additionally, the S-map functional horizon string also provided coding describing the stoniness of the soil (e.g., not stony, extremely stony), but there are no soil water content data for stony soils. Stony soils are handled by a procedure applied to profiles based on 2021 work by Robertson, Almond, Carrick, Penny, Eger, Chau, and Smith [8], as described in a later section.
Organic horizons, which are indicated by the organic categorical variable defined as True in Table 1, required special treatment. For these soils, the only explanatory variables used were the sample depth, carbon content (available for organic horizons), and the organic matter composition (fibrous peat or humic peat). Fibrous peat is weakly decomposed and consists predominantly of well-preserved fibrous plant remains. Humic peat consists of moderately to strongly decomposed organic material. The range of explanatory variables was limited because there are too few samples of organic horizons in the training dataset to include other explanatory variables or categories. As a result, a separate PTF model was developed for organic horizons.
It is important to note that some other potentially relevant explanatory variables were not available from the S-map soil mapping system. Examples include fine-earth bulk density, particle density, total porosity, and wilting point, all of which are used with success in various PTFs described in the literature. For example, the PTF described in [13] requires the −1500 kPa response, bulk density, carbon concentration, and sample depth, producing estimates of 0 and −1500 kPa, as well as estimates of saturated soil hydraulic conductivity. In that approach, the model is physically based and defined in terms of the −1500 kPa response, so that the soil hydraulic conductivity and −1500 kPa response are consistent with one another. The result is a model that produces near-exact estimates of soil water content at specific tensions such as 0 or −1500 kPa, along with physical model estimates of soil hydraulic conductivity. By contrast, the model described here is a soil classification and soil morphology model, designed to estimate the soil water content at a specific set of tensions, for which explanatory variable information is available for extensive areas of New Zealand for thousands of sites in soil surveys.

2.2. Available Data

2.2.1. Data Sources and Distribution

Training and validation data for development of the PTF came from information provided in the New Zealand National Soil Data Repository (NSDR) [14] and the National Soils Database (NSD) [15]. The NSDR, which also includes the older NSD, contains soil classification and soil morphological descriptions, together with soil analytical data on sand, silt, and clay content and water retention at 0, −5, −10, −20, −40, −100, and −1500 kPa tensions. The NSDR is a point (soil profile site) database, and a fundamental assumption in the present modeling is that the water-release relationships found between the soil description data and the associated analytical data (NSDR) can be applied to identical soil description data recorded in the S-map database (and thus be applied to the soil map units containing these soils). S-map itself provides the best available soil survey data for extensive areas of New Zealand, providing the basic soil property data given in Table 1.
Table 2 shows the numeric distribution of samples and sites among NZSC soil orders, while Figure 1 shows the spatial distribution of the site data. Sites of the NSDR are individual soil profiles, and samples are the laboratory-analyzed soil horizons in a profile. The equivalence between NZSC soils and US Soil Taxonomy may be found in [9]. The point data used for development of the PTF are clearly not spatially representative, nor spatially random, and areas of intensive agricultural development are over-represented at the expense of low-productivity natural vegetation and agricultural land. However, the distribution of point data among NZSC soil orders in Figure 1 provides good coverage of the dominant soil types used for agricultural production. Sparsely distributed NZSC soil orders (e.g., Oxidic and Semiarid) have relatively few samples available and are more likely to be subject to overfitting using regression-based modeling approaches.

2.2.2. Data Processing

Soil texture was provided by the percentage values of sand, silt, and clay, with the values summing to 100%, forming a ternary simplex. Texture can be provided from a particle size distribution curve (PSD) or from values for sand, silt, and clay. Where the texture ternary simplex values did not sum to 100%, all three values were scaled to force the sum to that value, provided the original sum was within the range [95%, 100%]. Where both PSD and discrete values for the sand, silt, and clay were available, the PSD values were used by preference. PSD values were available for most of the sample soil profiles in the NSDR.
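The rescaling rule for the texture simplex can be sketched as follows. The function name is hypothetical, and returning `None` for sums outside the accepted range is an assumption of this sketch, since the text does not say how such samples were treated.

```python
def normalise_texture(sand, silt, clay, lower=95.0, upper=100.0):
    """Rescale sand/silt/clay percentages to sum to 100%.

    Rescaling is applied only when the original sum lies within
    [lower, upper]; otherwise None is returned (an assumption of this sketch).
    """
    total = sand + silt + clay
    if not (lower <= total <= upper):
        return None
    scale = 100.0 / total
    return sand * scale, silt * scale, clay * scale

# A sample summing to 96% is rescaled; one summing to 90% is rejected.
rescaled = normalise_texture(48.0, 28.0, 20.0)
rejected = normalise_texture(40.0, 30.0, 20.0)
```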
Sand, silt, and clay estimates were available in the NSDR for most samples from appropriate values of the particle size distribution. For a small number of samples where this was not the case, texture was estimated by fitting a monotonically constrained B-Spline curve [16] to the cumulative PSD data, parameterized by the log transform of the particle size and using the logit-transformed cumulative PSD as the response; for stability, we also added continuity constraints for the PSD at values of 0.0 and 1.0 for nominal particle size values of $10^{-3}$ and $10^{4}$ μm, respectively. The definitions of ‘sand’, ‘silt’, and ‘clay’ vary by country, typically affecting the border between silt and sand (see Table 3). Where the Australian, European, or North American texture system is needed for a PTF and the PSD is available, sand, silt, and clay estimates were generated so that the data can be used in PTFs for TAW using the relevant definition. In this paper, the texture definitions implicitly refer to the New Zealand system, but for PTFs developed elsewhere, we assumed that the appropriate texture definition had been used. In addition to texture values provided in the ternary simplex, texture is also defined as a categorical variable (e.g., sandy, loamy, clayey, see Table 1) from the parsed functional horizon string. This categorical descriptor is defined by ranges of texture values [11], so it might be expected to have low influence in the regression model.
The texture ternary simplex has an inherent structural correlation, since changing any one value changes the other two, regardless of any functional correlation between them. The ternary simplex values were transformed to a two-component Cartesian system denoted $\omega_1$ and $\omega_2$ such that $\omega_1 = 2\rho_{SAND} - \rho_{SILT} - \rho_{CLAY}$ and $\omega_2 = \rho_{SILT} - \rho_{CLAY}$, where $\rho_{SAND}$, $\rho_{SILT}$, and $\rho_{CLAY}$ are the fractions of soil in sand, silt, and clay, respectively [19]. These Cartesian components were used as predictors in the regression.
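A short sketch of the simplex-to-Cartesian transform and its inverse is given below. It assumes the contrasts $\omega_1 = 2\rho_{SAND} - \rho_{SILT} - \rho_{CLAY}$ and $\omega_2 = \rho_{SILT} - \rho_{CLAY}$ with the three fractions summing to one; the function names are hypothetical, and the signs of the contrasts are an assumption of this sketch.

```python
def texture_to_cartesian(sand, silt, clay):
    """Map texture fractions (summing to 1) to the two Cartesian components."""
    omega1 = 2.0 * sand - silt - clay  # equals 3*sand - 1 on the simplex
    omega2 = silt - clay
    return omega1, omega2

def cartesian_to_texture(omega1, omega2):
    """Invert the transform, using sand + silt + clay = 1."""
    sand = (omega1 + 1.0) / 3.0
    silt = (1.0 - sand + omega2) / 2.0
    clay = 1.0 - sand - silt
    return sand, silt, clay

omega = texture_to_cartesian(0.5, 0.3, 0.2)
recovered = cartesian_to_texture(*omega)
```

Because the two components determine the three fractions uniquely, no information is lost, while the built-in sum-to-one dependence between predictors is removed.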
Several NZSC soil orders were split. This splitting was based on the NZSC soil group where groups differentiated soils of different bulk densities, which is an important influence in predicting the water response. Brown soils were split into Allophanic and non-Allophanic Browns, Gley soils into Immature and Mature Gleys, and Pallic soils into Immature and Mature Pallic. Other data pruning and processing steps include:
  • The total porosity (TP) was calculated from fine-earth bulk density (BD) and particle density (PD) if TP was missing, using the functional relationship $\mathrm{TP} = 1 - \mathrm{BD}/\mathrm{PD}$.
  • Water content values outside the range [0, 1] (0% to 100%) were dropped.
  • Water content values for a sample were only retained where the soil water values from tensions of 0 to −1500 kPa were non-increasing. However, TP is measured using a different method from other tensions, and sometimes the TP response is less than the response measured at −5 kPa by a very small amount (typically 0.5%). Where this is the case, the TP was increased above the −5 kPa value by a value equal to half of the minimum difference between all the TP and the −5 kPa values (after dropping TP values equal to or smaller than the −5 kPa response). This step was used to ensure non-increasing values but avoids dropping too many TP values.
  • Samples were dropped if the sample top and/or bottom were missing, or if the sample top was negative (i.e., above ground level).
  • Since Allophanic soils are characterized by a high porosity (>60%), Allophanic soils with a total porosity of less than 60% were dropped. This involved only a very small number of cases (2% of Allophanic soils).
  • Any samples with a functional horizon indicating a pan were dropped, since these dense layers of soil are largely impervious to water and were problematic to model. Anthropic soils are man-made soils whose characteristics are often drastically changed from mixing and compaction and are likely to have confusing effects on regression models. Raw soils are soils that lack soil development and are likely to be outliers in soil–water relationships. These dropped cases involved very few samples, since pan, anthropic, and raw soils are of low interest for modeling of physical characteristics in agricultural landscapes.
  • Any stony soil samples, as defined by the functional horizon description, were dropped, since stony soils were modeled as a post hoc adjustment of water release predictions, as noted in Section 1.
  • Any samples (other than for Organic or Pumice soils) where the bulk density is less than 0.5 were dropped, since these low-density soils suggested incorrect soil classification. Samples were also only retained where the bulk density was measured from an undisturbed core sample.
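The first three processing steps in the list above can be sketched with NumPy. The function names are hypothetical, and the sketch adjusts every sample whose TP is at or below the −5 kPa response, whereas the text applies the adjustment only where the shortfall is very small; it also assumes at least one sample has TP above the −5 kPa response.

```python
import numpy as np

def total_porosity(bulk_density, particle_density):
    """TP = 1 - BD/PD, used only where a measured TP is missing."""
    return 1.0 - bulk_density / particle_density

def nudge_total_porosity(tp, theta_5kpa):
    """Lift TP above the -5 kPa response where TP <= theta(-5 kPa).

    The increment is half of the minimum positive gap between TP and the
    -5 kPa response across the dataset.
    """
    tp = np.asarray(tp, dtype=float)
    theta_5kpa = np.asarray(theta_5kpa, dtype=float)
    gap = tp - theta_5kpa
    valid = gap > 0
    half_min_gap = 0.5 * gap[valid].min()
    return np.where(valid, tp, theta_5kpa + half_min_gap)

def is_non_increasing(theta):
    """True if water content does not increase from 0 to -1500 kPa."""
    return bool(np.all(np.diff(np.asarray(theta, dtype=float)) <= 0))

tp_fixed = nudge_total_porosity([0.60, 0.50, 0.45], [0.55, 0.48, 0.455])
tp_missing = total_porosity(1.3, 2.65)
```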
The available data were split into a training and validation dataset, with 70% in the training and the remaining 30% retained as validation. The split into two sets was carried out while preserving the relative ratios of the number of samples in each soil order, ensuring that relatively rare soil types were appropriately represented within the two subsets (see Table 2). Table 4 shows a summary of the statistics for the responses at all tensions.
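A stratified 70/30 split of this kind might look like the sketch below. It is written in Python with NumPy for illustration only (the statistical modelling in this work was conducted in R); the function name and seed are hypothetical, and splitting sample by sample rather than keeping whole profiles together is an assumption of this sketch.

```python
import numpy as np

def stratified_split(soil_orders, train_fraction=0.7, seed=0):
    """Return a boolean mask (True = training) that preserves the share
    of each soil order in both subsets."""
    rng = np.random.default_rng(seed)
    soil_orders = np.asarray(soil_orders)
    train_mask = np.zeros(soil_orders.size, dtype=bool)
    for order in np.unique(soil_orders):
        idx = np.flatnonzero(soil_orders == order)
        rng.shuffle(idx)  # random assignment within the soil order
        n_train = int(round(train_fraction * idx.size))
        train_mask[idx[:n_train]] = True
    return train_mask

# Hypothetical example: one common and one sparse soil order.
orders = np.array(["Brown"] * 100 + ["Oxidic"] * 10)
mask = stratified_split(orders)
```

Stratifying by soil order keeps sparsely sampled orders present in both subsets, which a purely random split would not guarantee.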

2.3. Modeling Approach

Two regression models were developed for: (a) non-organic horizons (the most common class); (b) organic horizons. These two regressions only apply on a sample-by-sample basis, and did not consider the influence of sample layers above or below the specific horizon from which a given sample was taken. Adjustments for stony soils and for collections of samples in a profile are applied separately (i.e., post hoc) and are described in Section 1.
In the modeling, 7 samples were dropped from the training data (none from the same profile) since they were assessed as being excessively influential, by inspection of the quantile plot of the deviance residuals. A total of 9 profiles were dropped as they were assessed as being anomalous after inspection of their characteristics compared to others in the same soil order. These anomalous profiles had functional horizon descriptions that were considered inconsistent for their soil order, as assessed by a pedologist. The statistical modelling was conducted in R version 4.3.1 [20].
The soil hydraulic PTF aimed to provide an estimate of the volumetric water content contained in a sample of soil, ranging from 0–1 (i.e., 0%–100%), given a specified tension or suction applied to the soil. The water content or response, $\theta$, is a function of the tension, $\psi$ (i.e., $\theta_{\psi}$). The $\theta_{\psi}$ ranges from zero (completely dry soil) to a value at saturation (at total porosity, $\theta_{0} = \theta_{TP} \equiv \theta_{\psi = 0\,\mathrm{kPa}}$), and the PTF generated estimates of $\theta_{\psi}$ denoted by $\hat{\theta}_{\psi}$. This relationship between $\theta$ and $\psi$ is termed the ‘soil water retention curve’. In New Zealand, for historical reasons, sample data are most frequently available for tensions of 0 (total porosity, TP), −5, −10 (Field Capacity or FC), −20, −40, −100, and −1500 kPa (wilting point or WP). Fewer than 1% of the data had response measurements for tensions other than these figures, and for this reason, the decision was made to only model the response at these 7 tension values.
In the literature, a common approach is to model the response at each tension, providing marginal estimates of the $\hat{\theta}_{\psi_i}$ for each specific tension $\psi_i$. This approach does not account for the joint relationship between the response at two different tensions, such as $\theta_{\psi_i}$ and $\theta_{\psi_j}$, which is important if the user wishes to calculate the response difference to estimate TAW or macroporosity. In this work, it was a requirement to generate a PTF that estimates both the marginal and joint soil–water response such that:
  • The soil–water response is non-increasing with increasing tension value;
  • Estimates of derived properties (e.g., TAW, macroporosity) are bounded below by zero and above by 1;
  • Lower and upper uncertainty estimates for the soil–water response and derived properties are bounded below by zero and above by one.
The previous model for water response [1] used logistic regression for a tension of −1500 kPa and logistic regression to model scaled response difference values. This approach was adopted after comparing several different approaches. The difficulty with that approach was that incorporating new explanatory variables (in this paper, sample depth) made the regression model very complicated. In the present paper, a similar approach to that used in [1] is employed, but logistic regression is replaced by Beta regression [20] within the GAM framework.
The principal advantage of the GAM approach is that the nature of the complicated relationship between (say) soil depth and soil–water response can be determined from the data semi-parametrically, thus providing flexibility in the nature of the relationship. The disadvantages are that considerable computational effort is required to determine the form of the smooth function [21], and the complexity of the resulting model is far greater than a simple parametric model. The choice of parametric effects in the GAM was determined largely by experience using the previous model [1]. However, the increased number of samples available for the GAM meant that additional interactions between explanatory variables could be considered, while also incorporating automatic smoothing. In many cases, interactions between variables were not feasible; for example, not all drainage and rock classes of the fines class levels are present for all soil orders of the NZSC and, therefore, were not considered.

2.4. Non-Organic Horizons

For non-organic soil samples, the −1500 kPa response was modeled by a structure like:
$$g(\mu) = A\zeta + f_i(\omega_1, \omega_2) + g_i(\log \mathrm{depth}) + \epsilon$$
where $\mu \equiv E(Y)$ and $Y \sim \mathrm{EF}(\mu, \varphi)$. $Y$ is the response variable (the −1500 kPa response), and $\mathrm{EF}(\mu, \varphi)$ is an exponential family distribution with mean $\mu$ and scale parameter $\varphi$. $A$ is a model matrix for parametric model coefficients (e.g., soil order, rock class of fines, topsoil), $\zeta$ is a parameter vector, and $f_i(\omega_1, \omega_2)$ and $g_i(\log \mathrm{depth})$ are smooth functions of the Cartesian texture values and the log-transformed sample depth, respectively. The residual uncertainty was captured by $\epsilon$, which is distributed as a Normal (i.e., $\epsilon \sim N(0, \sigma^2)$). It should be noted that $f_i(\omega_1, \omega_2)$ and $g_i(\log \mathrm{depth})$ are specific to each soil order $i$ ($i = 1, \ldots, N$) for $N = 16$ soil orders (see Table 2). The distribution of the −1500 kPa response was taken to be $Y \sim \mathrm{Beta}(\mu, \varphi)$, which was used since the response was bounded below by a value of zero and above by one.
Once the −1500 kPa response model has been fitted, the −100 kPa response was modeled by forming the response difference between −100 and −1500 kPa, scaled so that it covers the interval [0, 1], using:
$$\Delta_{-100,-1500} = \frac{\theta_{-100\,\mathrm{kPa}} - \theta_{-1500\,\mathrm{kPa}}}{1 - \theta_{-1500\,\mathrm{kPa}}}$$
where $\theta_{-100}$ represents the response at a tension of −100 kPa. Then, the regression model was:
$$g_{-100,-1500}(\mu) = A\zeta + f_i(\omega_1, \omega_2) + g_i(\log \mathrm{depth}) + \epsilon$$
This procedure was repeated for pairs of scaled response differences up to the final model $g_{0,-5}$. The set of explanatory variables for all GAMs was the same for each response difference, although the relative size of the coefficients changed. One important assumption in the above approach when used to estimate $\theta$ for any of the discrete tensions was that the response differences were independent of one another. Table 5(a) shows the correlation between $\theta$ for different tensions, while Table 5(b) shows the correlation between response differences (e.g., between $\theta_{-10} - \theta_{-20}$ and $\theta_{-5} - \theta_{-10}$). The correlation between $\theta$ values at different tensions is generally high, especially for tensions at approximately −20 kPa (the correlation between $\theta_{-20}$ and $\theta_{-40}$ is 0.99). The correlation between response differences was much lower, and in most cases was not significant. The moderate correlation of the response differences at −10 to −40 kPa is due to the relatively flat nature of the soil–water response at these tensions.
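To make the chained structure concrete, the sketch below converts a retention curve into scaled response differences and rebuilds it from the −1500 kPa response. It assumes that every successive difference is scaled in the same way as the −100/−1500 kPa pair, i.e. $(\theta_a - \theta_b)/(1 - \theta_b)$ for adjacent tensions $a$ (wetter) and $b$ (drier); the text gives this form explicitly only for the first pair, so the general rule, the function names, and the example values are assumptions.

```python
# Tensions ordered from the reference (-1500 kPa) towards saturation (0 kPa).
TENSIONS_KPA = [-1500, -100, -40, -20, -10, -5, 0]

def scaled_differences(theta):
    """theta: water contents ordered as TENSIONS_KPA; values must be < 1."""
    return [(theta[k] - theta[k - 1]) / (1.0 - theta[k - 1])
            for k in range(1, len(theta))]

def reconstruct(theta_wp, deltas):
    """Rebuild the curve from the -1500 kPa response and scaled differences."""
    theta = [theta_wp]
    for delta in deltas:
        previous = theta[-1]
        theta.append(previous + delta * (1.0 - previous))
    return theta

# Hypothetical retention curve (volumetric fractions).
observed = [0.15, 0.25, 0.30, 0.33, 0.36, 0.40, 0.50]
deltas = scaled_differences(observed)
rebuilt = reconstruct(observed[0], deltas)
taw = rebuilt[TENSIONS_KPA.index(-10)] - rebuilt[TENSIONS_KPA.index(-1500)]
```

Because each scaled difference lies in [0, 1], any set of predicted differences rebuilds a curve that is non-increasing with increasing tension and bounded by one, so derived quantities such as TAW cannot be negative.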
The disadvantage of the response-difference approach was that calculating estimates of $\hat{\theta}$ was more complicated. The uncertainty of $\hat{\theta}$ estimates also had to be calculated by simulation [21], followed by quantile summary to estimate the lower and upper prediction intervals (i.e., posterior distribution sampling), using the assumption that response differences were independent.
Our method started from a tension of −1500 kPa as a reference, in which case all the differences to lower tension values are positive. The same approach could also use 0 kPa as the reference, in which case all the differences to higher tension values would be negative. In that case, the response for the GAM would need to be the negative of the differences, since the Beta response must be constrained to the interval [0, 1]. Although the choice of a reference tension is arbitrary, the −1500 kPa reference was chosen because the response $\theta_{-1500}$ was the lowest of all responses, so the calculations are simple. Choosing a reference tension other than 0 or −1500 kPa could be problematic since the response differences would then be a mix of positive and negative values, so the calculations would need to include an offset to force the response in the Beta regression to the [0, 1] interval.
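The simulation-and-quantile idea can be sketched as below. This is a simplified illustration, not the procedure used for the fitted GAMs: it draws only from Beta response distributions with assumed means and a common precision $\varphi$ (shape parameters $\mu\varphi$ and $(1-\mu)\varphi$), ignores uncertainty in the fitted coefficients, and assumes each scaled difference has the form $(\theta_a - \theta_b)/(1 - \theta_b)$. All names and numbers are hypothetical.

```python
import numpy as np

def simulate_theta(mu, phi, n_draws=2000, seed=1):
    """Draw independent Beta samples for theta(-1500 kPa) and each scaled
    difference, then chain them into water contents at each tension.

    mu[0] is the mean of theta(-1500 kPa); mu[1:] are the means of the
    scaled differences, ordered towards 0 kPa.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    phi = np.asarray(phi, dtype=float)
    draws = rng.beta(mu * phi, (1.0 - mu) * phi, size=(n_draws, mu.size))
    theta = np.empty_like(draws)
    theta[:, 0] = draws[:, 0]
    for k in range(1, mu.size):
        theta[:, k] = theta[:, k - 1] + draws[:, k] * (1.0 - theta[:, k - 1])
    return theta

def interval(x, level=0.90):
    """Lower bound, median, and upper bound of a central interval."""
    alpha = (1.0 - level) / 2.0
    return np.quantile(x, [alpha, 0.5, 1.0 - alpha])

# Columns: -1500, -100, -40, -20, -10, -5, 0 kPa (hypothetical means).
theta_sim = simulate_theta([0.15, 0.12, 0.06, 0.04, 0.04, 0.06, 0.15],
                           np.full(7, 200.0))
taw_sim = theta_sim[:, 4] - theta_sim[:, 0]  # theta(-10 kPa) - theta(-1500 kPa)
taw_lo, taw_med, taw_hi = interval(taw_sim)
```

Every simulated curve respects the ordering constraint, so the interval for TAW stays within [0, 1] by construction.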

2.5. Organic Horizons

Organic horizons, indicated in the functional horizon string, have a carbon concentration of at least 18% (>30% of soil volume) [11]. They are usually, but not necessarily, found in soil profiles in the Organic soil order. A feature of organic horizons is that the sample top can be above ground, depending on the state of overlying vegetation. To ensure compatible modeling between horizons, where the sample top was above ground the profile depth was shifted so that the sample top was zero. For organic horizons, fewer tension responses were available, and the decision was made to generate response estimates for 0, −10, −100, −1500 kPa, and TAW. A total of 259 sample rows were available for analysis after two highly influential points were removed after inspection of the deviance residuals [22].
The modeling approach for organic horizons was like that for non-organic horizons, except that the only explanatory variables were the sample depth, carbon concentration (or, functionally equivalently, the organic matter content), and the horizon composition (humic peat or fibrous peat). Thus, a GAM was used, with a Beta response with a logit link for a tension of −1500 kPa, as well as a Beta response with a logit link for the response differences.

2.6. Model Assessment

Goodness-of-fit (GOF) measures provide a way of comparing models by aggregating results over all training or validation samples of data. They are useful as a general method of comparing paired modeled and observed values where the observed data are reliable, but they differentiate only crudely between models and cannot describe detailed differences at different response values. In this comparison, we used mean error (ME, range unbounded), mean absolute error (MAE, range [0, ∞]), root-mean-square error (RMSE, range [0, ∞]), Nash–Sutcliffe efficiency (NSE, range [−∞, 1] [23]), coefficient of determination ($R^2$, range [0, 1]), and the index of agreement (d, range [0, 1], [24]), defined as follows:
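$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\left(y_{m,i} - y_{o,i}\right)$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_{m,i} - y_{o,i}\right|$$
$$R^2 = \left[\frac{\sum_{i=1}^{N}\left(y_{m,i} - \bar{y}_m\right)\left(y_{o,i} - \bar{y}_o\right)}{\sqrt{\sum_{i=1}^{N}\left(y_{m,i} - \bar{y}_m\right)^2 \sum_{i=1}^{N}\left(y_{o,i} - \bar{y}_o\right)^2}}\right]^2$$
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N}\left(y_{m,i} - y_{o,i}\right)^2}{N}}$$
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(y_{m,i} - y_{o,i}\right)^2}{\sum_{i=1}^{N}\left(y_{o,i} - \bar{y}_o\right)^2}$$
$$d = 1 - \frac{\sum_{i=1}^{N}\left(y_{m,i} - y_{o,i}\right)^2}{\sum_{i=1}^{N}\left(\left|y_{m,i} - \bar{y}_o\right| + \left|y_{o,i} - \bar{y}_o\right|\right)^2}$$
where $y_{m,i}$ and $y_{o,i}$ are the modeled and observed values of the response, respectively, for $i = 1, \ldots, N$ pairs of values, and $\bar{y}_o$ and $\bar{y}_m$ are the mean observed and modeled values (respectively) of $y$ over all possible values.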
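The six measures can be computed in a few lines of NumPy, as in the sketch below. The function name is hypothetical, and the index of agreement is implemented in the standard form attributed to Willmott, which is assumed here to be the intended definition.

```python
import numpy as np

def goodness_of_fit(y_mod, y_obs):
    """Return ME, MAE, RMSE, NSE, R2, and index of agreement d."""
    m = np.asarray(y_mod, dtype=float)
    o = np.asarray(y_obs, dtype=float)
    err = m - o
    sse = np.sum(err ** 2)
    return {
        "ME": err.mean(),
        "MAE": np.abs(err).mean(),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "NSE": 1.0 - sse / np.sum((o - o.mean()) ** 2),
        "R2": np.corrcoef(m, o)[0, 1] ** 2,
        "d": 1.0 - sse / np.sum((np.abs(m - o.mean()) + np.abs(o - o.mean())) ** 2),
    }

# Hypothetical observed values and a prediction with a constant bias of 0.1.
obs = np.array([0.10, 0.20, 0.30, 0.40])
perfect = goodness_of_fit(obs, obs)
biased = goodness_of_fit(obs + 0.1, obs)
```

The biased example shows why several measures are reported together: a constant offset leaves $R^2$ at one while ME, RMSE, NSE, and d all register the error.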

2.7. Comparison with Other PTF Models

There are many PTFs in the literature that provide estimates for the soil–water response for some of the tensions used in this present work. What is often required are estimates of derived properties such as the available water, calculated from the difference in the soil–water response between two different tension values. In New Zealand, total available water content is defined as the difference in response between −10 and −1500 kPa, but there are other definitions of this property in the literature, adapting to local circumstances, and for specific purposes. Since most PTFs have been designed with sets of explanatory variables that happen to be available, and because the definition of derived properties varies, it is relatively rare for PTFs to be straightforwardly compared. Added to this complexity are the differences in the definition of the size of silt and sand (Table 3), which is frequently an influential explanatory variable in soil water PTFs [25].
The requirement was to compare the PTF generated in this paper with one or more PTFs in the literature where the explanatory variables from S-map could be used and where the various explanatory variables had the same definition as used in S-map. PTFs used for the comparison had to be computable from published equations or available via software packages. We excluded PTFs that use components of the soil–water response (e.g., wilting point, total porosity) as explanatory parameters, as is sometimes done in the literature, since these are highly influential predictors of the response. In such PTFs, the total porosity (TP) or wilting point (WP) is a required parameter, which means that predictions for 0 and −1500 kPa are perfect or nearly perfect (depending on the nature of the model).
Given the above restrictions, very few PTFs were available for direct comparison with the PTF developed in this paper. One suitable set of PTFs, however, was the European hydraulic pedotransfer functions (EUPTF2, [26]). These are an update of previously published European PTFs [4], EUPTF v1.4.0, which provides prediction uncertainty. EUPTF2 was derived for point predictions of soil water content at saturation (0 kPa), field capacity (at both 10 and 33 kPa), wilting point (1500 kPa), plant available water, and saturated hydraulic conductivity, as well as the van Genuchten model parameters of the moisture retention and hydraulic conductivity curve [27]. The models can use a selection of inputs, with a minimum of soil depth, sand, silt, and clay content. Additional information such as soil organic carbon content, bulk density, calcium carbonate content, pH, and cation exchange capacity can also be included, although these are not viable as explanatory parameters for S-map.
A complication when using S-map information in the EUPTF2 model is that texture is defined using different classifications for sand, silt, and clay compared to those used in New Zealand (Table 3), so texture cannot be used without some kind of transformation of standard S-map texture values from New Zealand to those compatible with EUPTF2. For the purposes of comparison with the PTF developed in this paper, sample data from the particle size distribution were used to generate estimates of sand, silt, and clay content using the USDA/FAO definitions (Table 3) with a monotonically constrained B-Spline curve [15], as outlined in Section 2.2.2, along with the sample depth. No other explanatory variables from S-map were available for use in the EUPTF2 model. For simplicity, the EUPTF2 model was only used to predict soil water content for non-organic soils.

3. Results

3.1. Non-Organic Horizons

Figure 2 shows predictions of responses using the new PTF for all tensions as well as TAW against measured values, for validation data. The estimates for TAW are particularly important since they show a derived response defined by the difference between tensions of −10 and −1500 kPa, which follows from the structural method of predicting marginal responses and confirms the correctness of the posterior simulation method for prediction. For all responses, the data follow the 1:1 line, and departures from this behavior, when they occur, tend to be at the upper end of the dynamic range, where training data are sparse. TAW predictions tended to depart from good prediction above a value of about 38% to 40%, which is a very high TAW value since the 95% quantile of the sample TAW data was 35%.
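As a rough illustration of how a derived response such as TAW can be summarized from simulation output, the sketch below assumes that each posterior simulation draw yields a water content at both −10 and −1500 kPa, so that the difference can be taken draw by draw. The Gaussian draws are hypothetical stand-ins for the model's posterior simulation, and the function names are not from the paper.

```python
import random

def quantile(values, q):
    """Empirical quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def derived_difference(draws_wet, draws_dry, level=0.95):
    """Lower bound, median, and upper bound of the draw-by-draw difference."""
    diffs = [w - d for w, d in zip(draws_wet, draws_dry)]
    alpha = (1.0 - level) / 2.0
    return quantile(diffs, alpha), quantile(diffs, 0.5), quantile(diffs, 1.0 - alpha)

# Hypothetical draws (percent water content) standing in for posterior simulation.
rng = random.Random(42)
wc_10kpa = [rng.gauss(38.0, 2.0) for _ in range(2000)]
wc_1500kpa = [rng.gauss(17.0, 1.5) for _ in range(2000)]

lower, median, upper = derived_difference(wc_10kpa, wc_1500kpa)
print(f"TAW median {median:.1f}% with 95% interval [{lower:.1f}, {upper:.1f}]")
```

Working with differences of draws, rather than differences of summary statistics, means that the interval for the derived quantity reflects the joint behavior of the two marginal responses.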
Figure 3 shows probability density plots of the response residuals for training and validation data for all responses. The distributions are all centered about zero, and there is no detectable difference between the distributions, which suggests that overfitting is unlikely.

3.2. Organic Horizons

Figure 4 shows the measured against predicted response using a Beta regression GAM with a smooth function for sample depth and carbon concentration, by composition class. The error bars represent the 95% prediction interval, calculated by posterior simulation. The responses for organic soils show greater variation in uncertainty than is the case for non-organic horizons (Figure 2), but all responses, including TAW, are predicted satisfactorily, with values centered around the one-to-one line.
Figure 5 shows a contour plot of the estimated TAW as a function of sample depth and carbon concentration, for humic peat and fibrous peat soils, along with the original sample data. There is a marked contrast between the two composition types. Fibrous peat soils show considerable sensitivity to organic matter, while for humic peat the response is less sensitive to organic content. For humic peat, the response increases with sample depth, while for fibrous peat soils the response is less clear and depends on the organic matter content.

3.3. Model Assessment

Figure 6 shows plots of GOF measures for training and validation subsets of the data, for each separate response, for non-organic soils. The dashed line in each case corresponds to the ideal or perfect fit between the observed and predicted data (e.g., a value of 1.0 for R²). The general trend from Figure 6 is that the −1500 kPa response is the best, although there are exceptions to this rule, such as for mean error where the −5 kPa model has the best GOF measure. Generally, most GOF measures degrade as the tension changes from −1500 kPa to lower tensions and are generally (but not always) the worst for total porosity. The GOF measures for TAW and MP are generally worse than for marginal tensions, as might be expected.
Figure 6 also suggests that validation subset results are slightly worse compared with training versions, although there are rare exceptions, such as for some cases of mean error. Poorer validation GOF values compared with training data are expected, since the validation data are not used in the development of the model. The result suggests only very mild overfitting.
There is no universal criterion for acceptable values of GOF in the literature, except that it is clear that the aim is to minimize certain values (e.g., MAE) or maximize others (e.g., NSE). GOFs can be used to compare models and to provide signs of overfitting, but this should be carried out with caution since most measures are sensitive to outliers in large values. Here, our primary aim was to use GOFs to indicate potential overfitting and general trends in performance.
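For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a few commonly used GOF measures, written with their standard textbook definitions. The exact set of measures and their definitions used in this study are given in the methods section; the function name and example values here are hypothetical.

```python
import math

def gof_measures(observed, predicted):
    """Mean error, mean absolute error, RMSE, and Nash-Sutcliffe efficiency."""
    n = len(observed)
    # Residuals are predicted minus measured, matching the convention of Figure 3.
    resid = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "ME": sum(resid) / n,                    # ideal value 0 (unbiased)
        "MAE": sum(abs(r) for r in resid) / n,   # ideal value 0
        "RMSE": math.sqrt(ss_res / n),           # ideal value 0
        "NSE": 1.0 - ss_res / ss_tot,            # ideal value 1
    }

# Hypothetical measured and predicted water contents (percent).
measured = [10.0, 20.0, 30.0]
modeled = [12.0, 18.0, 33.0]
scores = gof_measures(measured, modeled)
print(scores)
```

Because RMSE and NSE are built from squared residuals, a single large residual can dominate them, which is one reason for treating comparisons based on these measures with caution.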

3.4. Error by Covariates

A relevant question to ask about the model is how the uncertainty of predictions changes with respect to specific covariates. An important issue of interest, for instance, is how the uncertainty changes by depth. To answer this, a slightly simpler definition for the accuracy of the model is needed, since the upper and lower 95% confidence intervals are asymmetric about the estimated mean, particularly for small response values. A useful definition adopted here is that the average accuracy is defined as: ‘the average of the differences between each of the upper and lower 95% confidence intervals and the fitted value.’ That is, the average accuracy is defined as $[(\hat{f}_{0.975} - \hat{f}_{0.500}) + (\hat{f}_{0.500} - \hat{f}_{0.025})]/2$, which is $(\hat{f}_{0.975} - \hat{f}_{0.025})/2$, where $\hat{f}_{q}$ is the $q$-quantile prediction estimate of the distribution of the soil water content at some specific tension or values such as TAW or MP, for some quantile $q$ ($0 \le q \le 1$).
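The definition above can be written as a short sketch. It assumes that a set of simulated values of the response (for example, from posterior simulation) is available for a given prediction; the empirical quantile helper and the example values are hypothetical and are not taken from the paper.

```python
def quantile(values, q):
    """Empirical quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def average_accuracy(draws):
    """Mean distance of the 2.5% and 97.5% quantiles from the median."""
    upper = quantile(draws, 0.975)
    median = quantile(draws, 0.500)
    lower = quantile(draws, 0.025)
    # The median cancels, so this equals (upper - lower) / 2.
    return ((upper - median) + (median - lower)) / 2.0

# Hypothetical simulated values: the integers 0 to 100.
draws = [float(v) for v in range(101)]
print(average_accuracy(draws))
```

Because the median cancels, the measure is simply half the width of the 95% interval, which gives a single number even when the interval is asymmetric about the fitted value.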
Figure 7 shows a plot of the predicted TAW as a function of the sample depth, by soil order, along with the average accuracy (as defined above). Note that the maximum value of the sample depth for the combined training and validation datasets was 155 cm, although this varies by soil order. The general trend is for the predicted value to be high near the surface, reducing to a more-or-less plateau level as depth increases. Exceptions to this rule included Podzol soils, where the predicted value increased below a depth of about 75 cm, and to a lesser extent Pumice, Semiarid, and Mature Gley soils. The average accuracy, by contrast, was generally low for samples near the surface, increasing slightly with sample depth. Again, there were exceptions, such as Granular and Ultic soils where the accuracy diminishes with depth. It is important to remember when interpreting Figure 7 that the average accuracy depends to a significant extent on the texture (sand, silt, and clay content), in addition to sample depth and all the other lesser-influential explanatory variables.

3.5. Comparisons with Other PTF Models

It is useful to compare the results from the PTF described here with other models available from the literature, to see how well the model performs and to check whether another PTF with better performance is viable using S-map information. Almost all PTFs tend to be built for use with locally available explanatory variables, so the nature of any comparison is inherently flawed, if not unfair. The vast majority of PTFs in the literature were ruled out for comparison because the required explanatory variables were not readily available from conventional soil mapping in New Zealand, or required laboratory estimates that would be difficult to obtain in the field (e.g., bulk density, carbon concentration). A further group of potentially useful PTFs was ruled out since code for them was not readily available.
One set of PTFs that has been extensively described are the EUPTF models [26], which can use various sets of explanatory variables. After consideration of the available models, only one of the EUPTF models was employed, using only sand, silt, and clay content. The clear flaw in this approach is that the soil types used to fit EUPTF differ from those available in New Zealand, so systematic differences in predictions might be expected.
Figure 8 shows the measured versus predicted plot for TAW using the EUPTF2 model, by soil order. TAW in this case was calculated using the difference between the EUPTF2 estimates of −10 and −1500 kPa, and comparing this with the measured difference between −10 and −1500 kPa from New Zealand sample data. Over the practical range of TAW (approximately 10% to 30%), the tendency was for EUPTF-derived estimates to be biased low.
It is difficult to draw definitive conclusions from the results in Figure 8. Although the EUPTF results appear to be biased with respect to the PTF developed here, this could be due to the different constituent soil types used in each PTF, or a combination of other relevant explanatory variables. The result suggests that it is important to be aware of the potential for bias when attempting to generate predictions for areas of New Zealand where mapping from S-map is not available.
Figure 9 shows plots of goodness-of-fit (GOF) measures for the previous New Zealand model [1], the New Zealand model developed in this paper, and the EUPTF2 model. The best of the models was the New Zealand model developed in this paper, while the EUPTF2 model gave the poorest results. In particular, the EUPTF2 model provided estimates that were generally biased below the true value (as observed in Figure 8). It should be noted that the available explanatory variables for use in the EUPTF2 model are very restricted, and much better results would be obtained if, for example, bulk density or total porosity were available from S-map. Unfortunately, using a richer range of explanatory variables was not a viable option in the New Zealand case, since the objective was to estimate the responses using variables readily available from S-map.

4. Discussion

4.1. Model Improvements

The important improvements between the previously published New Zealand model [1] and the present version are enumerated below.
  • There is a twofold increase in the number of sites available for fitting the PTF, especially for less common soils in New Zealand (e.g., Pumice soils).
  • Inclusion of the sample depth as an explanatory variable; this is justified since sample depth is associated with changes in carbon concentration [28] and bulk density.
  • Refinements to the processing of the training data, including removal of all stony soils to a post hoc procedure, and more consistent estimation of texture classes.
  • A change from logistic regression in [1] to the GAM semi-parametric approach with a Beta response. This allows the model to adapt to the data characteristics while still adhering to the requirement for physically plausible estimates of the soil–water response and derived values (e.g., TAW).
The structure of the models presented in this paper is such that the soil–water response is physically plausible (e.g., the values are bounded in the range [0, 100] (i.e., 0% to 100%) and monotonic with respect to tension), but the models are not physically interpretable. There are clear advantages to using a physically interpretable model (e.g., [29]), but there is a trade-off between complexity and fidelity between the measured and modeled values. Empirical models tend to have close fidelity between measured and modeled values, particularly if (as in this case) the fitting GAM methodology is semi-parametric, but the result is a very complicated model. For example, the −1500 kPa model developed in this paper has 89 coefficients, most of which are associated with the smooth functions of sample depth, as well as the sand, silt, and clay content. By contrast, a physically based model may have a relatively small number of independent parameters (as few as 4–6), but the fidelity between measured and modeled values for out-of-sample data is typically worse than for the empirical approach. The choice between the two approaches depends on how important it is to the practitioner to have results that are physically interpretable.
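The two plausibility conditions mentioned above (bounded values and monotonicity with tension) can be checked on any set of predictions with a sketch like the following. It assumes water contents in percent, ordered from 0 to −1500 kPa, and takes monotonic to mean that water content does not increase as the tension becomes more negative. The function names and the example profile are hypothetical.

```python
TENSIONS_KPA = [0, -5, -10, -20, -40, -100, -1500]

def is_physically_plausible(water_content_percent):
    """True if values lie in [0, 100] and do not increase from 0 to -1500 kPa."""
    wc = list(water_content_percent)
    if len(wc) != len(TENSIONS_KPA):
        raise ValueError("expected one value per tension")
    bounded = all(0.0 <= w <= 100.0 for w in wc)
    monotonic = all(a >= b for a, b in zip(wc, wc[1:]))
    return bounded and monotonic

def total_available_water(water_content_percent):
    """TAW as the difference between the -10 and -1500 kPa water contents."""
    by_tension = dict(zip(TENSIONS_KPA, water_content_percent))
    return by_tension[-10] - by_tension[-1500]

# Hypothetical predicted profile (percent water content at each tension).
profile = [55.0, 48.0, 42.0, 38.0, 34.0, 30.0, 15.0]
print(is_physically_plausible(profile), total_available_water(profile))
```

When the marginal responses satisfy both conditions, derived differences such as TAW are automatically non-negative and bounded by 100%, which is why the constraints matter for the derived properties as well as for the individual tensions.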

4.2. Comparisons between Models

A comparison between the previous New Zealand model [1], the present model developed in this paper, and the EUPTF2 model clearly suggests that the model presented here is the best, given that only a restricted set of explanatory variables is available for predictions. Estimates of TAW from the EUPTF2 model were biased with respect to measured values, but this is likely due to the restricted range of explanatory variables provided and the fact that the soil types in New Zealand do not match the soil types used to fit the EUPTF2 model. Other published PTFs were not included in the comparison for several reasons.
  • Many PTFs require soil inputs such as bulk density that are not spatially available in New Zealand.
  • Some PTFs are so geographically specific that their use for New Zealand soils would be pointless [30].
  • Either the coefficients for the published model were not available for application with New Zealand data, or the software was not suitable for use in calculations of large quantities of S-map sample data.
  • Global maps [31] in image format of the mean and coefficient of variation of saturated moisture content, field capacity, and permanent wilting point estimated using an ensemble model can be found online (https://doi.org/10.7910/DVN/VPIN2B, accessed on 1 May 2024). In this case, the PTF predictions are available at such a low spatial resolution (10 km postings) that a comparison with New Zealand point data was pointless. We also note that recent research is finding that local models usually perform better than global models (e.g., [32,33,34,35]).

4.3. Limitations

While refinements to the earlier PTF [1] have produced improved results, as described in this paper, there are still some fundamental limitations of this approach.
  • The PTF is physically informed, with estimates bounded above and below by physically plausible limits (0%, 100%), and the estimates are enforced to be monotonic with increasing tension. However, the differences in response between soil types and other explanatory variables may not be explainable in a straightforward manner [36]. As for all PTF models, there is a compromise choice between the fidelity of predictions to measured data and the interpretability of the results.
  • The model indicates associative relationships between the various explanatory variables and the soil–water response, but those relationships may not be causal. For example, sample depth does not by itself cause a strong change in soil–water response; its beneficial inclusion is likely due to the strong association of sample depth with bulk density and carbon concentration, both of which are known to affect soil water retention [6,37,38,39].
  • The training data used in development of the model were derived from historical data and do not represent random sampling over the landscape or over explanatory variables. It is difficult to determine the effect of this non-random sampling, but certain diagnostics were used to provide checks on potential problems. For example, the smooth effect of adding spatial coordinates was extracted to determine where spatial extrapolation of the estimated response (e.g., at −1500 kPa) was notable. Although not shown here, the result did not suggest that there are extensive areas where the predicted response is likely to be biased.

5. Conclusions

This paper has described a new model for the soil–water response at seven specific tensions (0, −5, −10, −20, −40, −100, −1500 kPa), using a development of an earlier model [1]. Diagnostics show that the model produces unbiased and physically plausible estimates of the response at each tension, as well as unbiased and physically plausible estimates of the response differences that define derived properties (e.g., macroporosity and total available water content [TAW]). An increase in the available sample data, a change in fitting methodology to a semi-parametric GAM Beta response, and the inclusion of sample depth have all resulted in significant improvements for the response estimates using GOF measures, based on withheld validation data. A comparison with an international PTF using explanatory variables compatible with variables available from S-map (EUPTF2) showed that the model developed in this paper is better for prediction of the marginal and derived responses given the limited information available from the S-map system.

Author Contributions

Conceptualization, S.M., L.L., T.W. and S.C.; methodology, S.M., L.L., T.W. and S.C.; software, S.M. and S.V.; validation, S.M., L.L., S.V., T.W. and S.C.; formal analysis, S.M. and S.V.; investigation, S.M., L.L., T.W. and S.C.; resources, L.L.; data curation, S.V. and S.M.; writing—original draft preparation, S.M.; writing—review and editing, S.M., L.L., T.W. and S.C.; visualization, S.M.; supervision, L.L. and S.C.; project administration, L.L.; funding acquisition, L.L. and S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the New Zealand Government’s Ministry of Business, Innovation and Employment Strategic Science Investment Fund, and their Endeavour Fund (Critical Pathway Programme).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Soil data are accessible via the Manaaki Whenua—Landcare Research Soils Portal at https://soils.landcareresearch.co.nz (accessed on 1 May 2024).

Conflicts of Interest

Authors Stephen McNeill, Linda Lilburne, Shirley Vickers, Trevor Webb and Samuel Carrick were employed by the company Manaaki Whenua Landcare Research. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Table A1 provides a correlation from [40] between soil orders of the New Zealand Soil Classification [9], the soil taxonomy, and World Reference Base equivalent reference soil group soil types [40,41]. The correlations in Table A1 are a guide only; for accurate classifications of a specific soil, the soil properties and details of the particular classification should be consulted in [40].
Table A1. Correspondence between soil orders of the New Zealand Soil Classification, soil taxonomy, and World Reference Base equivalent reference soil group.
| New Zealand Soil Classification (Soil Order) | Soil Taxonomy (Equivalent Soil Order, or Suborder/Great Group) | World Reference Base (Equivalent Reference Soil Group) |
| --- | --- | --- |
| Allophanic | Andisols (except Vitrands) | Andosols |
| Anthropic | Entisols or unclassified | Anthrosols or Technosols |
| Brown | Inceptisols (also Entisols) | Cambisols |
| Gley | Aquic suborders of Inceptisols, Entisols, Oxisols, etc. | Gleysols |
| Granular | Ultisols | Nitisols |
| Melanic | Mollisols, Vertisols | Vertisols, Calcisols, Chernozems or Phaeozems |
| Organic | Histosols | Histosols |
| Oxidic | Oxisols | Ferralsols |
| Pallic | Alfisols, Inceptisols | Cambisols, Luvisols, or Planosols |
| Podzol | Spodosols | Podzols |
| Pumice | Andisols: mainly Vitrands (also Vitricryands, Vitriquands) | Andosols |
| Raw | Not-soil or Entisols | Regosols, Arenosols, or unclassified |
| Recent | Entisols (also Inceptisols) | Fluvisols, Arenosols, Leptosols, Cambisols, Regosols, or Umbrisols |
| Semiarid | Aridisols | Luvisols, Cambisols, or Solonetz |
| Ultic | Ultisols | Planosols |

References

  1. McNeill, S.J.; Lilburne, L.R.; Carrick, S.; Webb, T.H.; Cuthill, T. Pedotransfer functions for the soil water characteristics of New Zealand soils using S-map information. Geoderma 2018, 326, 96–110.
  2. Manaaki Whenua Landcare Research. S-Map—New Zealand’s National Digital Soil Map. Available online: https://smap.landcareresearch.co.nz/ (accessed on 12 March 2024).
  3. Schaap, M.G.; Nemes, A.; van Genuchten, M.T. Comparison of models for indirect estimation of water retention and available water in surface soils. Vadose Zone J. 2004, 3, 1455–1463.
  4. Toth, B.; Weynants, M.; Nemes, A.; Mako, A.; Bilas, G.; Toth, G. New generation of hydraulic pedotransfer functions for Europe. Eur. J. Soil Sci. 2015, 66, 226–238.
  5. Weynants, M.; Vereecken, H.; Javaux, M. Revisiting Vereecken pedotransfer functions: Introducing a closed-form hydraulic model. Vadose Zone J. 2009, 8, 86–95.
  6. Saxton, K.E.; Rawls, W.J. Soil water characteristic estimates by texture and organic matter for hydrologic solutions. Soil Sci. Soc. Am. J. 2006, 70, 1569–1578.
  7. Lilburne, L.R.; Hewitt, A.; Webb, T. Soil and informatics science combine to develop S-map: A new generation soil information system for New Zealand. Geoderma 2012, 170, 232–238.
  8. Robertson, B.B.; Almond, P.C.; Carrick, S.T.; Penny, V.; Eger, A.; Chau, H.W.; Smith, C.M.S. The influence of rock fragments on field capacity water content in stony soils from hard sandstone alluvium. Geoderma 2021, 389, 114912.
  9. Hewitt, A.E. New Zealand Soil Classification, 3rd ed.; Manaaki Whenua Press: Lincoln, Canterbury, New Zealand, 2010; p. 136.
  10. Webb, T.H. Identification of functional horizons to predict physical properties for soils from alluvium in Canterbury, New Zealand. Aust. J. Soil Res. 2003, 41, 1005–1019.
  11. Webb, T.H.; Lilburne, L.R. Criteria for Defining the Soil Family and Soil Sibling: The Fourth and Fifth Categories of the New Zealand Soil Classification, 2nd ed.; Manaaki Whenua Press: Lincoln, Canterbury, New Zealand, 2011; p. 38.
  12. Milne, J.D.G.; Clayden, B.; Singleton, P.L.; Wilson, A.D. Soil Description Handbook; Manaaki Whenua Press: Lincoln, Canterbury, New Zealand, 1995.
  13. Littleboy, M. Spatial Generalisation of Biophysical Simulation Models for Quantitative Land Evaluation: A Case Study for Dryland Wheat Growing Areas of Queensland. Ph.D. Thesis, The University of Queensland, Brisbane, Australia, 1997.
  14. Manaaki Whenua Landcare Research. National Soils Data Repository (NSDR). Available online: https://viewer-nsdr.landcareresearch.co.nz/ (accessed on 12 March 2024).
  15. Manaaki Whenua Landcare Research. National Soils Database (NSD). Available online: https://viewer-nsdr.landcareresearch.co.nz/search (accessed on 12 March 2024).
  16. Ng, P.; Maechler, M. A fast and efficient implementation of qualitatively constrained quantile smoothing splines. Stat. Model. 2007, 7, 315–328.
  17. Minasny, B.; McBratney, A.B. The Australian soil texture boomerang: A comparison of the Australian and USDA/FAO soil particle-size classification systems. Soil Res. 2001, 39, 1443–1451.
  18. U.S. Department of Agriculture Natural Resources Conservation Service. National Soil Survey Handbook, Title 430-VI; 2023. Available online: https://directives.sc.egov.usda.gov/ (accessed on 1 May 2024).
  19. Cornell, J.A. Experiments with Mixtures; John Wiley and Sons: New York, NY, USA, 1981.
  20. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: https://www.R-project.org (accessed on 1 May 2024).
  21. Marra, G.; Wood, S.N. Coverage properties of confidence intervals for generalized additive model components. Scand. J. Stat. 2012, 39, 53–74.
  22. Wood, S.N. Generalized Additive Models: An Introduction with R, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2017.
  23. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290.
  24. Willmott, C.J. On the validation of models. Phys. Geogr. 1981, 2, 184–194.
  25. Nemes, A.; Rawls, W.J. Evaluation of different representations of the particle-size distribution to predict soil water retention. Geoderma 2006, 132, 47–58.
  26. Szabó, B.; Weynants, M.; Weber, T.K.D. Updated European hydraulic pedotransfer functions with communicated uncertainties in the predicted variables (euptfv2). Geosci. Model Dev. 2021, 14, 151–175.
  27. van Genuchten, M.T. A closed-form equation for predicting the hydraulic conductivity of unsaturated soils. Soil Sci. Soc. Am. J. 1980, 44, 892–898.
  28. Kelliher, F.M.; Parfitt, R.L.; van Koten, C.; Schipper, L.A.; Rys, G. Use of shallow samples to estimate the total carbon storage in pastoral soils. New Zealand J. Agric. Res. 2013, 56, 86–90.
  29. Pollacco, J.A.P.; Fernandez-Galvez, J.; Ackerer, P.; Belfort, B.; Lassabatere, L.; Angulo-Jaramillo, R.; Rajanayaka, C.; Lilburne, L.; Carrick, S.; Peltzer, D.A. HyPix: 1D physically based hydrological model with novel adaptive time-stepping management and smoothing dynamic criterion for controlling Newton-Raphson step. Environ. Model. Softw. 2022, 153, 105386.
  30. Rudiyanto; Minasny, B.; Chaney, N.W.; Maggi, F.; Giap, S.G.E.; Shah, R.M.; Fiantis, D.; Setiawan, B.I. Pedotransfer functions for estimating soil hydraulic properties from saturation to dryness. Geoderma 2021, 403, 115194.
  31. Zhang, Y.; Schaap, M.G.; Wei, Z. Development of hierarchical ensemble model and estimates of soil water retention with global coverage. Geophys. Res. Lett. 2020, 47, e2020GL088819.
  32. Helfenstein, A.; Mulder, V.L.; Heuvelink, G.B.M.; Okx, J.P. Tier 4 maps of soil pH at 25 m resolution for the Netherlands. Geoderma 2022, 410, 115659.
  33. Liu, F.; Wu, H.Y.; Zhao, Y.G.; Li, D.C.; Yang, J.L.; Song, X.D.; Shi, Z.; Zhu, A.X.; Zhang, G.L. Mapping high resolution national soil information grids of China. Sci. Bull. 2022, 67, 328–340.
  34. Mulder, V.L.; Lacoste, M.; Richer-de-Forges, A.C.; Martin, M.P.; Arrouays, D. National versus global modelling the 3D distribution of soil organic carbon in mainland France. Geoderma 2016, 263, 16–34.
  35. Vitharana, U.W.A.; Mishra, U.; Mapa, R.B. National soil organic carbon estimates can improve global estimates. Geoderma 2019, 337, 55–64.
  36. Peters, A.; Durner, W.; Wessolek, G. Consistent parameter constraints for soil hydraulic functions. Adv. Water Resour. 2011, 34, 1352–1365.
  37. Pollacco, J.A.P. A generally applicable pedotransfer function that estimates field capacity and permanent wilting point from soil texture and bulk density. Can. J. Soil Sci. 2008, 88, 761–774.
  38. Contreras, C.P.; Bonilla, C.A. A comprehensive evaluation of pedotransfer functions for predicting soil water content in environmental modeling and ecosystem management. Sci. Total Environ. 2018, 644, 1580–1590.
  39. Rawls, W.J.; Pachepsky, Y.A.; Ritchie, J.C.; Sobecki, T.M.; Bloodworth, H. Effect of soil organic carbon on soil water retention. Geoderma 2003, 116, 61–76.
  40. Hewitt, A.E.; Balks, M.R.; Lowe, D.J. The Soils of Aotearoa New Zealand; Springer Link: Cham, Switzerland, 2021.
  41. IUSS Working Group WRB. World Reference Base for Soil Resources. International Soil Classification System for Naming Soils and Creating Legends for Soil Maps, 4th ed.; International Union of Soil Sciences (IUSS): Vienna, Austria, 2022.
Figure 1. Distribution of points across New Zealand in the assembled NSDR dataset. One point is missing from this map since there were no spatial coordinates available.
Figure 2. Measured versus modeled soil–water response [0, 1] for all tensions by soil order, as well as total available water content (TAW), using validation data. The error bars are the 95% prediction intervals, calculated by posterior simulation.
Figure 3. Probability density of the response residuals (i.e., predicted minus measured values of the response) for training and validation data, for all tensions. TAW—total available water content.
Figure 4. Plot of measured against predicted response using a Beta regression GAM with a smooth function for sample depth and carbon concentration, by composition class. TAW—total available water content.
Figure 5. Contour plot of the estimated median total available water content (TAW) for organic horizons as a function of sample depth and carbon fraction for: (a) fibrous peaty soils; (b) humic peat composition soils. The markers are the data points used for model fitting.
Figure 6. Plot of selected goodness-of-fit (GOF) measures for training and validation subsets of the data for each response, for non-organic soils. The dashed line corresponds to the value for the GOF measure for an ideal or perfect fit between the observed and predicted data. TAW—total available water content, MP—Macroporosity.
Figure 7. Plot of the estimated median of total available water content (TAW) (left axis, solid line) and average accuracy (mean of the upper and lower 95% confidence values of TAW; right axis, dashed line) as a function of the sample depth. The shaded region in each case is plus-and-minus one standard error of the estimate.
Figure 8. Measured against predicted total available water content (TAW) response for the EUPTF2 model. The predicted EUPTF2 TAW value was calculated from the difference between the EUPTF2 predicted responses at −10 and −1500 kPa. The long-dashed line is the linear fit between measured and modeled values, while the short-dashed line is the 1:1 line.
Figure 9. Goodness-of-fit (GOF) measures for the 2018 logistic regression model from [1], the 2024 GAM model from this study, and the EUPTF V2 models [26], by measure. The preferred direction varies between measures; the dashed red line shows the ideal value for each of the measures.
Table 1. Summary of the explanatory variables available from S-map for use in the PTF model, derived from New Zealand National Soil Data Repository soils data.
| Parameter | Data Type | Interpretation |
|---|---|---|
| NZSC soil order | Categorical | Soil order defined as in [9] |
| NZSC soil group | Categorical | Subdivision of specific soil order classes defined as in [9] |
| Sample depth | Numeric | Mid-depth of sample (maximum 1 m) |
| Sand, silt, clay fractions | Numeric | Sand, silt, and clay content, as defined in the National Soils Data Repository, and documented in [12] |
| Drainage class | Categorical | An ordered set of 5 categories from “verypoorlydrained” to “welldrained”, as defined in [12] |
| Rock class of fines | Categorical | An unordered set of 8 categories from “Siliceous”, “SoftRocks”, “HardRocks”, “BasicRocks”, “Limestone”, “Rhyolitic”, “OtherRocks”, or “NotApplicable” (where rock class of fines is not specified, as opposed to being missing), as defined by Webb and Lilburne [11] |
| Topsoil | Boolean | True/False (from functional horizon description) |
| Organic horizon | Boolean | True/False, as defined in [12] |
| Texture | Categorical | An unordered set of 4 categories from “sandy”, “loamy”, “clayey”, or “nulltexture” (where texture class is not applicable, as opposed to being missing), as defined in [11] |
| Ped size | Categorical | An unordered set of 4 categories from “earthy”, “fine”, “coarse”, or “nullpedsize” (where ped size is not applicable, as opposed to being missing), as defined in [11] |
| Strength | Categorical | An unordered set of 4 categories from “loose”, “weakslightlyfirm”, “firm”, or “nullstrength” (where strength is not applicable, as opposed to being missing), as defined in [11] |
| Tephra | Categorical | An unordered set of 3 volcanic soil categories, from “acidic”, “basic”, or “nulltephra” (for non-tephric soil material), as defined in [11] |
| Composition | Categorical | [Only organic horizons] Organic composition of the horizon, an unordered set of 2 categories from either “peaty” or “humic”, as defined in [11] |
Table 2. Number of samples and sites in the assembled NSDR dataset in each NZSC soil order, as well as in total.
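| NZSC Soil Order | Number of Samples | Number of Sites |
|---|---|---|
| Allophanic | 554 | 64 |
| Allophanic Brown | 164 | 25 |
| Granular | 51 | 7 |
| Immature Gley | 148 | 19 |
| Immature Pallic | 958 | 110 |
| Mature Gley | 1005 | 125 |
| Mature Pallic | 743 | 81 |
| Melanic | 104 | 14 |
| Non-Allophanic Brown | 841 | 159 |
| Organic | 57 | 9 |
| Oxidic | 51 | 8 |
| Podzol | 210 | 30 |
| Pumice | 157 | 28 |
| Recent | 691 | 93 |
| Semiarid | 98 | 10 |
| Ultic | 124 | 14 |
| All soil orders | 5956 | 796 |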
Table 3. Definition of particle size distribution values associated with sand, silt, and clay texture classes. Closed and open intervals are represented by ‘[’ and ‘)’, respectively.
Particle size distribution range (µm), by classification system:

| Component | New Zealand (1) | Australia (2) | USDA/FAO (2, 3) |
|---|---|---|---|
| Sand | [60, 2000) | [20, 2000) | [50, 2000) |
| Silt | [2, 60) | [2, 20) | [2, 50) |
| Clay | [0, 2) | [0, 2) | [0, 2) |

1: See [12]. 2: For a comprehensive comparison, see [17]. 3: See Figure 618-A17 of [18].
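The half-open intervals in Table 3 map directly onto a lookup that assigns a particle diameter to a class under each system. The sketch below is illustrative only: the names `SIZE_CLASSES` and `classify_particle` are hypothetical, and the limits are transcribed from the table above.

```python
# Half-open size ranges [lower, upper) in micrometres, transcribed from Table 3.
SIZE_CLASSES = {
    "New Zealand": {"clay": (0, 2), "silt": (2, 60), "sand": (60, 2000)},
    "Australia": {"clay": (0, 2), "silt": (2, 20), "sand": (20, 2000)},
    "USDA/FAO": {"clay": (0, 2), "silt": (2, 50), "sand": (50, 2000)},
}


def classify_particle(diameter_um, system="New Zealand"):
    """Return the class for a particle diameter, or None if outside [0, 2000) um."""
    for name, (lower, upper) in SIZE_CLASSES[system].items():
        # The lower limit is included and the upper limit is excluded.
        if lower <= diameter_um < upper:
            return name
    return None


# A 55 um particle falls in a different class depending on the system used.
nz_class = classify_particle(55, "New Zealand")
au_class = classify_particle(55, "Australia")
usda_class = classify_particle(55, "USDA/FAO")
```

Under these limits a 55 µm particle is silt in the New Zealand system but sand in the Australian and USDA/FAO systems, which is why sand and silt fractions are not directly interchangeable between systems.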
Table 4. Selected summary statistics for responses at all tensions, for both the training and validation sample subsets.
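Columns give the response tension.

| Subset | Statistic | Total Porosity | −5 kPa | −10 kPa | −20 kPa | −40 kPa | −100 kPa | −1500 kPa |
|---|---|---|---|---|---|---|---|---|
| Training | Num. samples | 2788 | 3520 | 3558 | 2944 | 2811 | 3041 | 4135 |
| Training | Minimum | 0.291 | 0.086 | 0.050 | 0.046 | 0.037 | 0.018 | 0.010 |
| Training | 2.5% quantile | 0.372 | 0.250 | 0.174 | 0.128 | 0.107 | 0.085 | 0.037 |
| Training | Median | 0.527 | 0.419 | 0.394 | 0.373 | 0.354 | 0.325 | 0.186 |
| Training | Mean | 0.542 | 0.430 | 0.401 | 0.378 | 0.357 | 0.327 | 0.197 |
| Training | 97.5% quantile | 0.741 | 0.635 | 0.614 | 0.601 | 0.581 | 0.552 | 0.393 |
| Training | Maximum | 0.939 | 0.850 | 0.820 | 0.737 | 0.727 | 0.705 | 0.566 |
| Validation | Num. samples | 1174 | 1505 | 1518 | 1263 | 1223 | 1304 | 1757 |
| Validation | Minimum | 0.314 | 0.080 | 0.054 | 0.049 | 0.041 | 0.029 | 0.014 |
| Validation | 2.5% quantile | 0.370 | 0.281 | 0.200 | 0.156 | 0.133 | 0.106 | 0.041 |
| Validation | Median | 0.529 | 0.419 | 0.395 | 0.374 | 0.354 | 0.326 | 0.190 |
| Validation | Mean | 0.541 | 0.431 | 0.402 | 0.379 | 0.356 | 0.326 | 0.198 |
| Validation | 97.5% quantile | 0.759 | 0.644 | 0.621 | 0.607 | 0.592 | 0.563 | 0.400 |
| Validation | Maximum | 0.934 | 0.849 | 0.749 | 0.738 | 0.728 | 0.703 | 0.520 |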
Table 5. (a) Correlation between soil–water response at different tensions (kPa). (b) Correlation between soil–water response pairs at different tension (kPa) combinations.
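(a)

|  | θ₅ | θ₁₀ | θ₂₀ | θ₄₀ | θ₁₀₀ | θ₁₅₀₀ |
|---|---|---|---|---|---|---|
| θ₀ | 0.72 | 0.62 | 0.56 | 0.52 | 0.50 | 0.28 |
| θ₅ |  | 0.96 | 0.91 | 0.87 | 0.83 | 0.54 |
| θ₁₀ |  |  | 0.98 | 0.96 | 0.92 | 0.65 |
| θ₂₀ |  |  |  | 0.99 | 0.97 | 0.73 |
| θ₄₀ |  |  |  |  | 0.99 | 0.76 |
| θ₁₀₀ |  |  |  |  |  | 0.81 |

(b)

|  | (θ₅, θ₁₀) | (θ₁₀, θ₂₀) | (θ₂₀, θ₄₀) | (θ₄₀, θ₁₀₀) | (θ₁₀₀, θ₁₅₀₀) | θ₁₅₀₀ |
|---|---|---|---|---|---|---|
| (θ₀, θ₅) | 0.45 | 0.27 | 0.15 | 0.05 | −0.26 | −0.41 |
| (θ₅, θ₁₀) |  | 0.61 | 0.38 | 0.15 | −0.13 | −0.46 |
| (θ₁₀, θ₂₀) |  |  | 0.62 | 0.52 | −0.01 | −0.46 |
| (θ₂₀, θ₄₀) |  |  |  | 0.63 | 0.10 | −0.37 |
| (θ₄₀, θ₁₀₀) |  |  |  |  | 0.11 | −0.30 |
| (θ₁₀₀, θ₁₅₀₀) |  |  |  |  |  | −0.02 |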
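An upper-triangular correlation table of the kind shown in Table 5(a) can be built by correlating each pair of response columns. The sketch below makes two assumptions that the table does not state: that the coefficient is the Pearson correlation, and that each pair uses only the samples measured at both tensions (Table 4 shows that sample counts differ by tension). All names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation using only samples where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


# Hypothetical water contents at three tensions; None marks a missing measurement.
theta = {
    "theta_5": [0.42, 0.38, 0.47, None, 0.35],
    "theta_10": [0.40, 0.35, 0.45, 0.41, 0.33],
    "theta_1500": [0.20, 0.15, 0.30, 0.22, None],
}

# Upper triangle only: each unordered pair of tensions appears once.
names = list(theta)
corr = {
    (a, b): pearson(theta[a], theta[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
```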
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

McNeill, S.; Lilburne, L.; Vickers, S.; Webb, T.; Carrick, S. An Improved Pedotransfer Function for Soil Hydrological Properties in New Zealand. Appl. Sci. 2024, 14, 3997. https://doi.org/10.3390/app14103997
