Article

Prediction of Soil Field Capacity and Permanent Wilting Point Using Accessible Parameters by Machine Learning

1
Zachary Department of Civil and Environmental Engineering, Texas A&M University, College Station, TX 77843, USA
2
General Research Service Center, National Pingtung University of Science and Technology, Pingtung 91201, Taiwan
3
Department of Civil Engineering, National Pingtung University of Science and Technology, Pingtung 91201, Taiwan
4
International Irrigation Research and Development Service Center, National Pingtung University of Science and Technology, Pingtung 91201, Taiwan
*
Author to whom correspondence should be addressed.
AgriEngineering 2024, 6(3), 2592-2611; https://doi.org/10.3390/agriengineering6030151
Submission received: 5 June 2024 / Revised: 15 July 2024 / Accepted: 17 July 2024 / Published: 2 August 2024
(This article belongs to the Special Issue Implementation of Artificial Intelligence in Agriculture)

Abstract

The field capacity (FC) and permanent wilting point (PWP) are fundamental hydrological properties critical for assessing water availability within soils, rather than direct measures of soil health. Due to the challenges associated with their field measurement, alternative assessment methods are necessary. In this study, global-scale accessible soil data were retrieved from the World Soil Information Service (WoSIS) database, and artificial neural network (ANN) and gene-expression programming (GEP) algorithms were used to predict soil FC and PWP based on easily obtainable parameters from the database. The best-fit variable combination for FC (longitude, latitude, altitude, sand content, silt content, clay content, and electrical conductivity) and PWP (best-fit FC combination plus pH) modeling was determined. Both ANN and GEP showed greater accuracy than linear-based models in simulating the FC and PWP from the best-fit variables. The mean absolute error (MAE) was reduced by 51.54% for the FC and 56.38% for the PWP by the ANN model, compared with the linear model used in the previous literature. The normalized root mean square error (NRMSE) evaluation indicated that the ANN model performed best for PWP prediction (NRMSE of 19.9%), while the GEP model was superior for FC prediction (NRMSE of 29.9%). Between the ANN and GEP models, the ANN model showed slightly higher model interpretability; however, the GEP model exhibited a similar or better ability to avoid large errors, based on the error distribution. Overall, our results demonstrated that machine learning is effective in predicting the FC and PWP from easily accessible data from WoSIS, and the GEP model is preferable for FC and PWP modeling.

1. Introduction

Plant available water capacity describes the amount of water that is stored in soil and can be used by plants. It is determined by the difference between the soil field capacity (FC, −33 kPa of suction pressure) and the permanent wilting point (PWP, −1500 kPa of suction pressure) [1]. While the FC and PWP are critical for understanding soil water dynamics, these parameters primarily reflect the hydrological aspects of soil rather than direct measures of soil health. These properties are integral to applications such as irrigation management [2], crop modeling [3], and precision agriculture [4]. However, the FC and PWP of soil are often not available because their measurements are both time-consuming and costly [5,6]. In the latest soil health database, SoilHealthDB [7], only 81 results for the FC and PWP were reported out of 5908 data entries across 41 countries. This paucity of direct data underscores the need for more efficient and cost-effective methods to estimate these soil parameters. Therefore, the capability to estimate the FC and PWP from other easily measurable parameters is highly desirable [8]. Historically, pedotransfer functions (PTFs) have been applied to derive FC and PWP values based on parameters like soil texture, soil bulk density, and soil organic matter (SOM) [9,10]. These traditional methods have faced several challenges, most prominently their limited accuracy. To improve the precision of these models, parameters like the soil cation exchange capacity (CEC) [11], soil calcium carbonate (CaCO3) content [12], geographical locations [13], and even geographical nuances [13] have been incorporated. Past efforts have shed light on the limitations of conventional models. For instance, Santra et al. [9] calculated the gravimetric FC and PWP in arid western India with sand, silt, and clay contents; organic carbon; and bulk density as input variables of the PTF. The results showed that the coefficients of determination (R2) of the best-fit FC and PWP models were 0.63 and 0.34, respectively. Cueff et al. [14] used CEC and phosphorus content as inputs in addition to those used by Santra and colleagues to model the FC and PWP for soil in southwest France. The root mean squared error (RMSE) of their model ranged from 3.7% to 8.0% for the FC and 3.4% to 5.7% for the PWP. A significant drawback of previous efforts is the reliance on linear algorithms, which has inherently restricted the accuracy of these models [15,16].
Considering these challenges, there is a clear and present need for novel methodologies that move beyond traditional linear models, incorporating advanced algorithms that can capture the complexity and variability of soils, leading to more accurate and reliable estimations of the FC and PWP. For example, geostatistical approaches such as Kriging [17] and k-nearest neighbors (k-NN) [18] and parameter approaches such as regression tree (RT) [16], artificial neural network (ANN) [19], neuro-fuzzy (NF) [20], random forest (RF) [21], support vector machine (SVM) [22], gene-expression programming (GEP) [20], and deep learning (DL) [23] have been used for FC and PWP modeling in various studies. In a recent study, Shiri et al. [20] modeled FC and PWP for soil in Iran (0–60 cm) using five different algorithms: SVM, GEP, NF, RF, and RT. The contents of sand, silt, and clay; equivalent CaCO3; bulk density; particle density; geometric mean of particle diameter; and geometric standard deviation of soil particle size were used as input variables. The study findings showed that the NF model had the highest R2 followed by GEP. Taşan and Demir [12] simulated the FC and PWP in Turkey by conventional linear regression (REG) and an ANN, with sand, silt, and clay contents; bulk density; OM; CEC; and CaCO3 as input variables. The results showed that the ANN model displayed higher modeling accuracy, and R2, RMSE, and the mean absolute error (MAE) were 0.80, 3.12%, and 2.27% for the FC and 0.83, 1.84%, and 2.40% for the PWP, respectively. Yamaç et al. [23] modeled the FC and PWP of soil in Turkey using previously published PTFs, as well as k-NN, DL, and ANN using the same input data of sand, silt, and clay content; lime; bulk density; OM; particle density; and aggregate stability. The DL and ANN approaches had comparable R2 and MAE values of 0.829 and 2.7% in the FC modeling. In PWP modeling, k-NN showed the best performance with an R2 = 0.800 and MAE = 2.1%. 
Although previous studies demonstrated remarkable FC and PWP modeling ability by advanced algorithms, they were primarily based on site-specific datasets [14,24]. Therefore, models that can be applied to broader geographical regions are still missing.
Due to the importance of the FC and PWP in agricultural production, more reliable and broadly applicable FC and PWP modeling based on global-scale soil datasets is needed [12,13,14,23]. This study leveraged machine learning (ML) algorithms, specifically ANN and GEP, to predict FC and PWP values using an extensive range of data. The models incorporated physical (sand, silt, and clay content), chemical (electrical conductivity (EC) and pH), and geographical (longitude, latitude, and altitude) parameters. Both EC and pH have been found to be significantly related to soil water content (SWC) [25,26] and are easy to measure [27,28], but they were not used as modeling inputs in previous FC and PWP simulations.
ANNs were chosen for their proven ability to handle complex, non-linear interactions and large datasets, which makes them ideal for analyzing intricate relationships among various soil properties. Their effectiveness is supported by research that highlights their utility in modeling diverse soil dynamics, such as soil compaction and water retention [29,30]. Conversely, GEP was selected for its strengths in modeling non-linear and complex interactions, including soil water capacity [20] and soil stability [31], offering a robust framework for scenarios where traditional models may falter. This adaptability is crucial for fine-tuning models to uncover subtle patterns and interactions in the soil data. By integrating these advanced algorithms, this study aimed to enhance the predictive accuracy of FC and PWP models, providing a more comprehensive understanding of soil behavior that is critical for improving agricultural practices and irrigation strategies. The inclusion of ANNs and GEP represents a significant advancement over conventional linear models, offering new insights into soil dynamics that could lead to more effective and environmentally sustainable agricultural outcomes. Specific objectives of this study included the following:
(i)
Determine the optimal combination of variables for FC and PWP modeling;
(ii)
Apply machine learning algorithms to predict the FC and the PWP from easily measurable inputs from global scale accessible data.

2. Materials and Methods

2.1. Data Source and Process

Both the FC and PWP (cm3/100 cm3; %) and related soil composition (sand, silt, and clay contents; %), geographical location (longitude, latitude; decimal degree), pH, and EC (dS/m) were retrieved from the latest global soil database (WoSIS) [32]. Another geographical variable, altitude (m), was retrieved from Google Maps by the longitude and latitude information of the sample site. The WoSIS database includes over 5.8 million quality-assessed records from 173 countries. The procedure for data standardization and detailed information on the WoSIS database can be found in [32,33,34]. In this study, the WoSIS data were cleaned by only keeping information on the topsoil from 0 to 35 cm and by using a dataset that had complete information on the input variables, the FC, and the PWP. A total of 210 FC and 254 PWP values were extracted from the database for modeling. Descriptive statistical information for FC and PWP modeling, including the average (μ), standard deviation (σ), maximum (Max.) and minimum (Min.) of all used variables, and correlation coefficient (r) between each input variable and the FC or PWP, are summarized in Table 1. The geographical distribution of the FC and PWP from the database are shown in Figure 1. The soil classification and countries from which the applied dataset for FC and PWP modeling were derived are shown in the Appendix A Table A1. The boxplot-based dataset distribution is illustrated in Figure A1.
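The cleaning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`upper_depth_cm`, `lower_depth_cm`, `ec`, `fc`, and so on) are hypothetical and do not reflect the actual WoSIS column names, "topsoil from 0 to 35 cm" is read here as a layer lying entirely within that depth range, and the sample records are made up.

```python
# Hypothetical field names; the real WoSIS export uses its own column names.
REQUIRED_FC = ("longitude", "latitude", "altitude",
               "sand", "silt", "clay", "ec", "fc")


def is_topsoil(record, max_depth_cm=35.0):
    """True when the sampled layer lies entirely within 0..max_depth_cm."""
    return record["upper_depth_cm"] >= 0 and record["lower_depth_cm"] <= max_depth_cm


def is_complete(record, required):
    """True when every required field is present and not None."""
    return all(record.get(name) is not None for name in required)


def clean(records, required):
    """Keep topsoil records that carry every required field."""
    return [r for r in records if is_topsoil(r) and is_complete(r, required)]


# Made-up records for illustration only.
base = {"upper_depth_cm": 0, "lower_depth_cm": 20,
        "longitude": 30.1, "latitude": -25.7, "altitude": 1300.0,
        "sand": 60.0, "silt": 15.0, "clay": 25.0, "ec": 0.4, "fc": 22.0}
records = [
    base,                                                  # kept
    {**base, "upper_depth_cm": 40, "lower_depth_cm": 80},  # too deep
    {**base, "ec": None},                                  # missing EC
]
kept = clean(records, REQUIRED_FC)
print(len(kept))
```

For the PWP dataset the same filter would be applied with a different list of required fields.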

2.2. Previous PTFs and Linear Regression Algorithm

Three regression-based PTFs applied in Saxton and Rawls [35], Adhikary et al. [36], and Tóth et al. [16] were collected as base models for FC and PWP modeling (Appendix B, Table A3). In addition, this study developed a WoSIS-based PTF model based on a linear regression algorithm, which is described as Equation (1).
$$\mathrm{FC\ or\ PWP} = \sum_{i=1}^{n} \alpha_i x_i + \beta + \varepsilon \tag{1}$$
where $n$ is the number of input variables; $x_i$ is the $i$th input variable; $\alpha_i$ is the slope of the linear equation for $x_i$; $\beta$ is the intercept of the regression; and $\varepsilon$ is the error term.

2.3. Artificial Neural Networks (ANNs)

In addition to the four linear regression-based models, two ML algorithms, ANN and GEP, were evaluated for their accuracy in predicting the FC and PWP. An ANN is a powerful algorithm for non-linear problems that learns from experience, with an inherent penalty mechanism. Compared with a conventional linear model, an ANN represents data features in multiple dimensions, which enables more comprehensive and accurate predictions. This study used NeuroSolutions 7.1 software (NeuroDimension Inc., Gainesville, FL, USA) to develop predictive models for the FC and PWP with a classic backpropagation neural network (BPNN). The maximum number of epochs was 1000 and the learning rate was 0.01. The Levenberg–Marquardt gradient search method was used, with an early stopping callback to prevent overfitting. A single hidden layer with 10 neurons was employed [37]. The modeling dataset was normalized using Equation (2) and randomly divided into three sub-datasets for training, cross-validation (CV), and testing, with data ratios of 70%, 15%, and 15%, respectively. The training dataset was used for initial model development, and the CV dataset was used for hyper-parameter adjustment; the adjusted model was then applied to the testing dataset for model performance evaluation [38]. Cross-validation was unavailable in the linear regression (REG) model; therefore, the CV dataset was added to the training set for REG modeling. The training and testing data ratios of the REG model were 85% and 15%, respectively.
$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{2}$$
where $x_{norm}$ is the normalized dimensionless variable, $x$ is the observed value, $x_{min}$ is the minimum observed value, and $x_{max}$ is the maximum observed value of the variable.
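The normalization in Equation (2) and the 70%/15%/15% random split can be illustrated with a short Python sketch. It is not the NeuroSolutions workflow used in the study: the function names, the fixed random seed, and the integer rounding of the subset sizes are assumptions made for illustration, and the sand values are made up.

```python
import random


def min_max_normalize(values):
    """Scale values to [0, 1] following Equation (2)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant variable")
    return [(v - lo) / (hi - lo) for v in values]


def split_indices(n, seed=0):
    """Shuffle the indices 0..n-1 and split them 70/15/15.

    Returns (train, cv, test). Sizes use integer division, so any
    remainder goes to the test subset (an assumption of this sketch).
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * 70 // 100
    n_cv = n * 15 // 100
    return idx[:n_train], idx[n_train:n_train + n_cv], idx[n_train + n_cv:]


sand = [12.0, 48.5, 90.0, 33.0]          # made-up sand contents (%)
sand_norm = min_max_normalize(sand)
train, cv, test = split_indices(210)     # 210 FC records in this study
print(sand_norm)
print(len(train), len(cv), len(test))
```

For the REG model, which has no cross-validation step, the `train` and `cv` index lists would simply be concatenated to give the 85%/15% split described above.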

2.4. Gene-Expression Programming (GEP)

The GEP algorithm was the second ML-based algorithm investigated in this study. The algorithm is capable of establishing mathematical relationships between input variables and output parameters, similar to the genetic algorithm (GA) and genetic programming (GP). However, its computation is significantly faster and its accuracy is appreciably higher than those of the GA and GP [39]. A classic GEP run begins with an initial population and undergoes a continuous evolutionary process, including selection, replication, mating, mutation, adaptation, reversal, and transformation, to evolve toward a predetermined objective [40,41,42]. In this study, GeneXproTools 5.0 software (Gepsoft Ltd., Bristol, UK) was used for FC and PWP modeling. The algorithm was trained, cross-validated, and tested on the same input dataset as the ANN model. The FC model incorporates two genes, whereas the PWP model is more intricate, encompassing five genes. The gene linking function determines how the outputs of these genes are combined. For the FC model, a “minimum” linking function is employed, meaning that the smallest output among its genes is selected. Conversely, the PWP model uses a “multiplication” linking function, meaning that the outputs of its genes are multiplied together to produce the final value. Both models used a head size of ten, fifty chromosomes, and ten thousand generations, with function-set elements including +, −, *, /, x^−1, x^2, x^5, x^(1/3), x^(1/4), x^(1/5), natural log (ln), floor, power, no processing, sine, cosine, tangent, arcsine, arctangent, average, and minimum. These elements are the mathematical functions from which the algorithm assembles candidate expressions. Other parameters, such as the mutation rate, inversion rate, and gene transposition rate, were set at default values, as described in the GEP theory in [43].
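The two linking functions described above can be illustrated as follows. This sketch only shows how gene outputs are combined; the numeric gene outputs are made up, the function names are hypothetical, and the actual evolved expressions are those given in Appendix D.

```python
import math


def link_minimum(gene_outputs):
    """'Minimum' linking: the smallest gene output is the prediction."""
    return min(gene_outputs)


def link_multiplication(gene_outputs):
    """'Multiplication' linking: the product of all gene outputs."""
    return math.prod(gene_outputs)


# Made-up gene outputs for a single soil sample.
fc_gene_outputs = [27.4, 31.0]                 # FC model: two genes
pwp_gene_outputs = [2.0, 1.5, 0.5, 4.0, 2.0]   # PWP model: five genes

fc_prediction = link_minimum(fc_gene_outputs)
pwp_prediction = link_multiplication(pwp_gene_outputs)
print(fc_prediction, pwp_prediction)
```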

2.5. Assessment of the Best-Fit Combination of Input Variables for ML Based Models

The best-fit input variables for FC and PWP prediction using ML-based algorithms were determined using an ANN. Fifteen different combinations of input variables were evaluated: geographical location (L; longitude, latitude, and altitude), soil texture (S; sand, silt, and clay content), pH, EC, L + S, pH + EC, L + pH, L + EC, L + pH + EC, S + pH, S + EC, S + pH + EC, L + S + pH, L + S + EC, and L + S + pH + EC. Each combination was used as the input to an ANN, and typical model evaluation metrics, namely R2, RMSE, normalized RMSE (NRMSE), and MAE [44,45,46], were calculated to determine the model performance of each combination, as shown in Equations (3)–(6). Furthermore, following the approach of Jamieson et al. [47], the NRMSE was categorized into four classes: simulations with an NRMSE less than 10% were deemed excellent, those between 10% and 20% were good, those between 20% and 30% were fair, and those above 30% were considered poor.
$$R^2 = \frac{\sum_{i=1}^{n} \left(\hat{y}_i - \bar{y}\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2} \tag{3}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{n - 1}} \tag{4}$$
$$NRMSE = \frac{RMSE}{\bar{y}} \tag{5}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} AE = \frac{1}{n} \sum_{i=1}^{n} \left|y_i - \hat{y}_i\right| \tag{6}$$
where $R^2$ is the coefficient of determination, $\hat{y}_i$ is the predicted value of the $i$th datum, $y_i$ is the observed value of the $i$th datum, $\bar{y}$ is the average of the observations, $n$ is the number of actual observations, RMSE is the root mean square error, NRMSE is the normalized RMSE, MAE is the mean absolute error, and AE is the absolute error.
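As a worked illustration, the four metrics in Equations (3)–(6) can be written directly in Python. This is a minimal sketch with made-up observations and predictions; it follows the definitions as read here, including the $n - 1$ denominator in the RMSE, and returns the NRMSE as a fraction (multiply by 100 for a percentage).

```python
import math


def mean(values):
    return sum(values) / len(values)


def r_squared(obs, pred):
    """Equation (3): explained variation over total variation."""
    y_bar = mean(obs)
    return (sum((p - y_bar) ** 2 for p in pred)
            / sum((o - y_bar) ** 2 for o in obs))


def rmse(obs, pred):
    """Equation (4): root mean square error with an n - 1 denominator."""
    squared = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return math.sqrt(squared / (len(obs) - 1))


def nrmse(obs, pred):
    """Equation (5): RMSE divided by the mean observation."""
    return rmse(obs, pred) / mean(obs)


def mae(obs, pred):
    """Equation (6): mean of the absolute errors."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


observed = [10.0, 20.0, 30.0]    # made-up FC values (%)
predicted = [12.0, 20.0, 28.0]
print(r_squared(observed, predicted), rmse(observed, predicted),
      nrmse(observed, predicted), mae(observed, predicted))
```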
Among the four linear models, Adhikary’s model [36] only used soil texture (sand, silt, and clay content) data as input variables, but the other two published linear models required both OM and soil texture for modeling; the OM data were also retrieved from the WoSIS database. A total of 210 data points for the FC and 254 data points for the PWP were used in the REG model and in Adhikary’s model, the same dataset used in the ANN and GEP approaches in this study. A total of 179 data points for the FC and 219 data points for the PWP, for which OM was available, were used in the models of Saxton and Rawls [35] and Tóth et al. [16].
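The fifteen input combinations evaluated in this section are exactly the non-empty subsets of the four variable groups L, S, pH, and EC (2^4 − 1 = 15). A short Python sketch, with hypothetical variable names, can enumerate them:

```python
from itertools import combinations

# Variable groups as defined in this section; the column names are illustrative.
GROUPS = {
    "L": ("longitude", "latitude", "altitude"),
    "S": ("sand", "silt", "clay"),
    "pH": ("pH",),
    "EC": ("EC",),
}


def input_combinations(groups=GROUPS):
    """Return every non-empty subset of the group labels."""
    labels = list(groups)
    return [combo
            for size in range(1, len(labels) + 1)
            for combo in combinations(labels, size)]


def expand(combo, groups=GROUPS):
    """Flatten a combination of group labels into individual variables."""
    return [variable for label in combo for variable in groups[label]]


combos = input_combinations()
print(len(combos))
print(expand(("L", "S", "EC")))
```

Each entry of `combos` would then be passed through `expand` to select the input columns for one ANN run.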

2.6. Rank the Input Variables for FC and PWP Modeling

The test dataset used for the FC and PWP simulation with ML models was evaluated with a Kolmogorov–Smirnov test (K-S test) to determine the relative importance of the input variables. The K-S test is a nonparametric method for comparing two samples; it makes no assumption about the samples’ frequency distribution and was used here to compare their underlying probability distributions. The test was performed in IBM SPSS software, version 22 (IBM Corp., Armonk, NY, USA). The test dataset was divided into two groups by MAE: one group in which the absolute error between observed targets and simulated results was greater than the MAE, and another in which it was smaller than the MAE.
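The grouping and comparison described above can be sketched in Python. This is an illustration rather than the SPSS procedure used in the study: it computes only the two-sample K-S statistic D (the maximum distance between the two empirical distribution functions), not the associated p-value; it assigns errors exactly equal to the MAE to the low-error group, which is an assumption; and the clay values and errors are made up.

```python
from bisect import bisect_right


def split_by_mae(values, abs_errors):
    """Split one input variable into low- and high-error groups by MAE."""
    mae = sum(abs_errors) / len(abs_errors)
    low = [v for v, e in zip(values, abs_errors) if e <= mae]
    high = [v for v, e in zip(values, abs_errors) if e > mae]
    return low, high


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: max |ECDF_a(v) - ECDF_b(v)| over all v."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in a + b:
        ecdf_a = bisect_right(a, v) / len(a)
        ecdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


clay = [10.0, 20.0, 30.0, 40.0]      # made-up clay contents (%)
abs_errors = [1.0, 1.0, 1.0, 5.0]    # made-up absolute errors (%)
low, high = split_by_mae(clay, abs_errors)
print(low, high, ks_statistic(low, high))
```

A large D for a given input variable indicates that its values are distributed differently in the low-error and high-error groups. Where SciPy is available, `scipy.stats.ks_2samp` returns both the statistic and a p-value.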

3. Results and Discussion

3.1. Determination of the Best-Fit Variables

To systematically elucidate the criteria behind the selection of variable combinations for FC and PWP modeling, our approach was grounded in a comprehensive analysis of soil properties’ interactions and their predictive significance. The modeling results from combinations of different inputs are shown in Table 2. The selection process was rigorously informed by a review of the existing literature and empirical evidence demonstrating the individual and combined effects of these properties on soil hydrology. In the FC model, the combination of L + S + EC resulted in the best performance. In the PWP simulation, L + S + pH + EC led to the highest accuracy. While the individual L and S datasets appeared to generate reasonable FC and PWP predictions, the combination of L + S generated more accurate FC and PWP simulations. Interestingly, while the model performance using ionic strength or EC alone as the input variable was quite low, their combination with L + S significantly enhanced the model performance in predicting both the FC and PWP. Previous studies suggested that EC is more sensitive to soil water variation [48,49] because the relative dielectric permittivity of water is generally more than an order of magnitude larger than that of other soil components [50]. As a result, the bulk relative dielectric permittivity of soil is primarily a function of the soil water content (SWC) [51]. The pH appears to have lower impact on the FC than the PWP, because the SWC easily fluctuates with the salinity content in the environment [52]. Chronically high or low pH has been found to cause soil water storage variation [26], but the SWC does not significantly change with rapidly varying pH. In addition, pH in the environment is relatively stable due to the carbonate buffering capacity, thus, its impact on SWC is not significant [53].

3.2. Comparison of Simulated FC and PWP by Different Models

GEP and REG models were developed based on the best-fit combination identified above. As a comparison, the same set of data used for these models was also applied to three PTFs from the literature mentioned earlier (i.e., Saxton and Rawls, Adhikary et al., and Tóth et al.). The outputs from the published PTFs, REG, ANN, and GEP models, are shown in Figure 2 and Figure 3 and Appendix C, Table A4, respectively. The results showed that the model used in Adhikary et al. [36] has the highest R2 in the published PTF in both the FC (0.683) and the PWP (0.823) simulation. In the ANN, GEP, and REG models, the ANN performed the best, as indicated by the R2, RMSE, NRMSE, and MAE in both the FC and PWP models, followed by the GEP and REG model. ML-based ANN and GEP models exhibit a greater modeling accuracy than the REG and other three PTF models. The result is consistent with [20]. The RMSE, NRMSE, and MAE in the ML-based ANN model were decreased by 38.87%, 20.28%, and 51.54% in the FC simulation and 68.07%, 42.35%, and 56.38% in the PWP simulation compared with the PTF model used in Adhikary et al. The modeling results of each model can be found in Table 3 and Appendix C, Figure A2. The expression equations derived from the GEP model for the FC and PWP models with the Python program are provided in Appendix D, and the equations of the FC and PWP developed with the REG are shown in Equations (7) and (8), respectively.
The performance of the different models in predicting the FC and PWP is summarized below using their respective NRMSE values. For FC prediction, the models exhibited varying degrees of accuracy. According to the classification criteria adapted from Jamieson et al. [47], the GEP model, with an NRMSE of 29.9%, performed the best among all models and was classified as fair. The ANN model had a slightly higher NRMSE of 31.9%, which places it just inside the poor category. The REG model, the model by Adhikary et al., and the model by Saxton and Rawls had NRMSE values of 49.2%, 52.2%, and 56.3%, respectively, and were all categorized as poor. The model of Tóth et al. had the highest NRMSE at 123.9%, highlighting that existing FC models may produce significant errors when applied to global-scale datasets. For PWP prediction, the ANN model outperformed the other models, with an NRMSE of 19.9%, classifying it as good. The GEP model followed with an NRMSE of 22.3%, which is considered fair. The model of Tóth et al. showed an NRMSE of 37.7%, falling into the poor category. The Saxton and Rawls, Adhikary et al., and REG models had NRMSE values of 41.6%, 62.2%, and 69.0%, respectively, and were all classified as poor. The NRMSE evaluation of the FC and PWP indicates that the ML-based algorithms provide relatively accurate predictions compared with the linear models.
$$FC_{REG} = 0.3067 \cdot Lo + 0.1836 \cdot La + 0.0008 \cdot Al + 0.3495 \cdot Sa + 0.4724 \cdot Si + 0.7345 \cdot C - 0.093 \cdot EC - 24.8313 \tag{7}$$
$$PWP_{REG} = 0.0474 \cdot Lo + 0.0752 \cdot La + 0.0004 \cdot Al + 0.2925 \cdot Sa + 0.3302 \cdot Si + 0.6530 \cdot C - 0.0049 \cdot EC + 0.7750 \cdot pH - 31.5174 \tag{8}$$
where $FC_{REG}$ is the field capacity and $PWP_{REG}$ is the permanent wilting point in the REG model (%); $Lo$ is the longitude; $La$ is the latitude; $Al$ is the altitude (m); $Sa$ is the sand content (%); $Si$ is the silt content (%); $C$ is the clay content (%); $EC$ is the electrical conductivity (dS/m); and $pH$ is the soil pH. In the ANN model, due to the complex connections between neurons at different layers, a mathematical formula cannot be derived.
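The NRMSE classes applied in this section can be expressed as a small Python function. The thresholds follow the classification adapted from Jamieson et al. [47]; how values falling exactly on a boundary (10%, 20%, 30%) are assigned is not stated in the text, so the choice below is an assumption. The NRMSE values are those reported in this section.

```python
def classify_nrmse(nrmse_percent):
    """Map an NRMSE (in %) to a performance class.

    Boundary handling is an assumption: 10 and 20 count as 'good',
    30 counts as 'fair'.
    """
    if nrmse_percent < 10:
        return "excellent"
    if nrmse_percent <= 20:
        return "good"
    if nrmse_percent <= 30:
        return "fair"
    return "poor"


# NRMSE values (%) reported in this section.
fc_nrmse = {"GEP": 29.9, "ANN": 31.9, "REG": 49.2,
            "Adhikary et al.": 52.2, "Saxton and Rawls": 56.3,
            "Tóth et al.": 123.9}
pwp_nrmse = {"ANN": 19.9, "GEP": 22.3, "Tóth et al.": 37.7,
             "Saxton and Rawls": 41.6, "Adhikary et al.": 62.2,
             "REG": 69.0}

fc_classes = {model: classify_nrmse(v) for model, v in fc_nrmse.items()}
pwp_classes = {model: classify_nrmse(v) for model, v in pwp_nrmse.items()}
print(fc_classes)
print(pwp_classes)
```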

3.3. Identification of Dominant Input Variables

In order to determine the relative importance of the input variables included in the best-fit combinations of variables, K-S analysis was conducted and the results for altitude, sand, silt, clay, EC, and pH, are shown in Table 4. The results showed that clay, sand, and silt in the FC model and altitude and EC in the PWP model have significant impacts on preventing higher absolute errors in FC and PWP simulation. Longitude and latitude were not included in the K-S analysis due to the wide range of values for these two parameters in the database. However, it is clear that geographical locations play essential roles in FC and PWP modeling, which agrees with the conclusions of [9,13,23]. Despite the fact that pH and EC do not always decrease the error in FC and PWP modeling, the inclusion of these two parameters in the combination of input variables generally improves the model’s performance.
For the model error distribution analysis, the overall modeling capacity of each model was first evaluated by the above-mentioned indicators. However, R2 only describes the proportion of variation in the dependent variable that the model explains, and RMSE, NRMSE, and MAE are averaged errors that cannot represent the error distribution, i.e., the range in which most errors fall. In order to evaluate each model’s error distribution, the absolute error (AE) was calculated on the testing dataset of each model for both the FC and PWP (Figure 4). Five categories were used: AE ≤ 1, 1 < AE ≤ 2, 2 < AE ≤ 3, 3 < AE ≤ 4, and AE > 4. Figure 4a shows the AE distribution in the FC model. In the linear-based models, only 0% to 9.4% of predictions fell in the most accurate class (AE ≤ 1%), while 40.6% to 96.4% had non-negligible errors (AE > 4%). In contrast, in the ML-based ANN and GEP models, 25.0% to 37.5% of predictions had an AE ≤ 1%, and 28.1% to 31.3% had large errors (AE > 4%). Over 60% of AEs in the three PTFs from previous studies were greater than 4%. Although the FC modeling result from the model of Adhikary et al. [36] presented an acceptable R2 (0.683), around 78.1% of its AEs were higher than 4%.
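The five-class AE distribution described above can be computed with a short Python sketch. The function name and the sample values are illustrative; the class boundaries are those listed in the text, with AE expressed in the same units as the FC and PWP (%).

```python
def ae_distribution(observed, predicted):
    """Return the share (%) of predictions falling in each AE class."""
    labels = ["AE<=1", "1<AE<=2", "2<AE<=3", "3<AE<=4", "AE>4"]
    counts = dict.fromkeys(labels, 0)
    for obs, pred in zip(observed, predicted):
        ae = abs(obs - pred)
        if ae <= 1:
            counts["AE<=1"] += 1
        elif ae <= 2:
            counts["1<AE<=2"] += 1
        elif ae <= 3:
            counts["2<AE<=3"] += 1
        elif ae <= 4:
            counts["3<AE<=4"] += 1
        else:
            counts["AE>4"] += 1
    total = len(observed)
    return {label: 100 * count / total for label, count in counts.items()}


# Made-up observations and predictions, one per AE class.
observed = [20.0, 20.0, 20.0, 20.0, 20.0]
predicted = [20.5, 21.5, 22.5, 23.5, 30.0]
distribution = ae_distribution(observed, predicted)
print(distribution)
```

Unlike the averaged metrics, this breakdown shows directly how many predictions carry large errors, which is the comparison made in Figure 4.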
The results for the PWP simulation were similar: the ANN and GEP models had greater modeling ability than the linear-based models in terms of AE distribution, as shown in Figure 4b. Notably, the GEP model showed a greater ability to avoid large errors (AE > 4%) than the ANN model: almost 90% of AEs in the GEP model were lower than 4%, while this value was only 84% in the ANN model. It should be mentioned that the model applied by Adhikary et al. [36] only used sand and silt in modeling the FC and clay in modeling the PWP, meaning that this model has the lowest data requirement. In situations in which resources to analyze OM, pH, and EC are not available, the equations from Adhikary et al. [36] can be used to provide rough estimates of the FC and PWP values. When EC and pH data are available, the ML-based ANN model is an effective and more accurate algorithm for predicting the FC and PWP. However, the ANN model requires more sophisticated hardware support and knowledge than the GEP model. Therefore, considering both applicability and performance, GEP has some advantages over ANN for FC and PWP simulation.

4. Conclusions

Field capacity (FC) and permanent wilting point (PWP) are critical metrics in soil health assessment. Yet, direct measurement of their values is often challenging. In this study, data from the World Soil Information Service (WoSIS) were utilized, and advanced algorithms, specifically the artificial neural network (ANN) and gene-expression programming (GEP) algorithms, were employed to predict the FC and PWP using accessible parameters. The optimal variables identified for modeling the FC and PWP were longitude; latitude; altitude; sand, silt, and clay content; electrical conductivity; and pH. Significantly, the ML-based ANN and GEP models achieved higher modeling accuracy than the conventional, linear-based FC and PWP model developed in Adhikary et al. [36].
The NRMSE evaluation showed that the ANN model was best for PWP prediction, while the GEP model performed best for FC prediction, both providing relatively accurate results. Although greater model interpretability was exhibited by the ANN, greater resilience against substantial errors was demonstrated by the GEP model. However, it is important to note that caution must be exercised when applying these models to current soils because while soil changes are minimal when undisturbed or undeveloped, the collection times for data in the WoSIS database vary. This might result in disparities between the database’s information and the current state of soil development.
Overall, this study revealed that machine learning, particularly in the GEP model, can be effectively used to predict the FC and PWP using WoSIS data. These results can be used to improve rational irrigation practices, akin to IoT-based smart irrigation systems as reported by [54]. This enhanced irrigation scheme could lower water consumption in agriculture, reduce the likelihood of crop failures, and potentially decrease greenhouse gas emissions, marking a significant advancement in soil health evaluation and environmental stewardship. Future studies are encouraged to explore a broader range of machine learning models and to refine predictive accuracy further, thereby enhancing the management of soil resources and the resilience of agricultural ecosystems.

Author Contributions

Conceptualization, L.L. and X.M.; methodology, L.L.; software, L.L.; validation, L.L. and X.M.; formal analysis, L.L.; resources, L.L. and X.M.; data curation, L.L.; writing—original draft preparation, L.L.; writing—review and editing, X.M.; visualization, L.L. and X.M.; supervision, X.M.; funding acquisition, L.L. and X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work is partially supported by the USDA National Institute of Food and Agriculture, AFRI project (2023-67021-39755), the Ministry of Science and Technology, Taiwan (109-2917-I-020-001), and the National Science and Technology Council, Taiwan (113-2221-E-020-009-MY3).

Data Availability Statement

The data used in this study can be found in the following references: [32]. Batjes, N. H., Ribeiro, E., & Van Oostrum, A. (2020). Standardised soil profile data to support global mapping and modelling (WoSIS snapshot 2019). Earth System Science Data, 12(1), 299-320. [34]. Ribeiro, E., Batjes, N. H., & Van Oostrum, A. J. M. (2018). World Soil Information Service (WoSIS)-Towards the standardization and harmonization of world soil data. Procedures manual, 166.

Acknowledgments

The authors gratefully acknowledge the Department of Civil and Environmental Engineering, Texas A&M University for providing computing equipment and the Taiwan government for research funding.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

Abbreviation    Meaning
AE              absolute error
Al              altitude
ANN             artificial neural network
BPNN            backpropagation neural network
CEC             cation exchange capacity
C               clay content
CV              cross-validation
DL              deep learning
EC              electrical conductivity
FC              field capacity
FCANN           field capacity in ANN model
FCGEP           field capacity in GEP model
FCREG           field capacity in REG model
GA              genetic algorithm
GEP             gene-expression programming
GP              genetic programming
i               datum order
k-NN            k-nearest neighbors
K-S test        Kolmogorov–Smirnov nonparametric test
La              latitude
Lo              longitude
MAE             mean absolute error
Max.            maximum
Min.            minimum
ML              machine learning
n               number of actual observations
NF              neuro-fuzzy
NRMSE           normalized root mean square error
OM              organic matter
PTF             pedotransfer function
PWP             permanent wilting point
PWPANN          permanent wilting point in ANN model
PWPGEP          permanent wilting point in GEP model
PWPREG          permanent wilting point in REG model
r               correlation coefficient
REG             regression
RF              random forest
RMSE            root mean square error
R2              coefficient of determination
RT              regression tree
S               soil textures (sand, silt, and clay content)
Sa              sand content
Si              silt content
SOM             soil organic matter
SVM             support vector machine
SWC             soil water content
WoSIS           World Soil Information Service
x               observed value
xmax            maximum observed value
xmin            minimum observed value
xnorm           normalized dimensionless variable
y_i             observed value in ith datum
ŷ_i             predicted value in ith datum
ȳ               average of the observations
μ               average
σ               standard deviation
α               slope of the linear equation
β               intercept of the regression
ε               error term

Appendix A. Soil Group and Nations of the FC and PWP Model

In total, data from 20 countries covering 24 Food and Agriculture Organization (FAO) soil groups were used for field capacity (FC) model development, and data from 27 countries covering 25 FAO soil groups were used for permanent wilting point (PWP) model establishment (Tables A1 and A2).
Table A1. FAO soil group and observed country in FC and PWP model.
CountryTargetAcrisolsAndosolsArenosolsCalcisolsCambisolsChernozemsFerralsolsFluvisolsKastanozemsLeptosolsLixisolsLuvisols
AlbaniaFC 2
PWP 2
BeninFC
PWP 1
Burkina FasoFC 2
PWP 2
CanadaFC 1 2
PWP 12 1 2
ColombiaFC 5
PWP 5
EcuadorFC
PWP 1
EthiopiaFC
PWP
GermanyFC 101 5 1
PWP 101 5 1
IndiaFC 4
PWP 4
IndonesiaFC6
PWP6 1
JamaicaFC
PWP 1
JordanFC 1
PWP 1
KenyaFC 1 2
PWP 1 2
MozambiqueFC1 1 11
PWP1 1 11
PolandFC
PWP
PortugalFC 2 2 1
PWP 2 2 1
Puerto RicoFC
PWP1
Sierra LeoneFC1
PWP1
South AfricaFC15 1519 21111822
PWP15 1519 21111822
SurinameFC
PWP2 9
SwedenFC
PWP
ThailandFC 1
PWP 1
UKFC
PWP 1
TanzaniaFC 1 2
PWP 1 2
USAFC
PWP
ZambiaFC4
PWP4
ZimbabweFC 2
PWP 2
UncategorizedFC4
PWP21
Table A2. FAO soil group and observed country in FC and PWP model continued.
CountryTargetNitisolsNitosolsPhaeozemsPlanosolsPlinthosolsPodzolsPodzoluvisolsRegosolsRendzinasSolonetzVertisolsXerosolsYermosols
AlbaniaFC 1 1
PWP 1 1
BeninFC
PWP
Burkina FasoFC
PWP
CanadaFC
PWP
ColombiaFC
PWP
EcuadorFC
PWP
EthiopiaFC 2
PWP 2
GermanyFC 3
PWP 3
IndiaFC 441
PWP 441
IndonesiaFC
PWP
JamaicaFC
PWP
JordanFC 2 1
PWP 2 1
KenyaFC
PWP
MozambiqueFC
PWP
PolandFC
PWP 1
PortugalFC
PWP
Puerto RicoFC
PWP
Sierra LeoneFC
PWP
South AfricaFC6 172 9 62
PWP6 172 9 62
SurinameFC
PWP 5
SwedenFC 1
PWP 1
ThailandFC
PWP
UKFC
PWP
TanzaniaFC 1
PWP 1
USAFC 3
PWP 4
ZambiaFC1
PWP1
ZimbabweFC
PWP
Figure A1. Boxplots of the modeling dataset for (a) the FC and (b) the PWP. The top and bottom lines are the maximum and minimum observations within the fences (1.5 × interquartile range beyond the box), respectively. The upper and lower boundaries of the box are the 75th and 25th percentiles, respectively. The cross mark and the line in the box are the mean and median of the dataset, respectively.
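The boxplot elements described in the Figure A1 caption can be sketched in a few lines of Python. This is only an illustration: the function name `boxplot_summary` is hypothetical, the quartile interpolation method ("inclusive") is an assumption because the plotting software is not stated, and the whiskers are taken as the most extreme observations inside the 1.5 × interquartile range fences.

```python
from statistics import mean, quantiles


def boxplot_summary(values):
    """Return the elements of a Tukey-style boxplot (hypothetical helper)."""
    data = sorted(values)
    # Quartiles; the "inclusive" interpolation method is an assumption.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences sit 1.5 x IQR beyond the box edges.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": mean(data),
        "whisker_low": min(inside),   # bottom line of the plot
        "whisker_high": max(inside),  # top line of the plot
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }
```

Calling `boxplot_summary` on a list of FC or PWP observations would return the box edges, the whisker ends, and any points that fall outside the fences.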

Appendix B. Used Parameters of Each Factor of Published PTFs

Table A3. Used parameters of each factor of published PTFs.
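| PTF | Target | Sand (Sa) | Silt (Si) | Clay (C) | Organic matter (OM) | Sa × OM | C × OM | Sa × C | Si × C | OM' = 1/(OM + 1) | Si × OM' | C × OM' | Constant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Saxton and Rawls | FC' | −0.251 | | 0.195 | 0.011 | 0.006 | −0.027 | 0.452 | | | | | 0.299 |
| Adhikary et al. | FC | −0.51 | −0.27 | | | | | | | | | | 56.37 |
| Tóth et al. | FC | | 0.00154 | 0.00453 | | | | | −0.000511 | −0.1887 | 0.00144 | 0.00087 | 0.2449 |
| Saxton and Rawls | PWP' | −0.024 | | 0.487 | 0.006 | 0.005 | −0.013 | 0.068 | | | | | 0.031 |
| Adhikary et al. | PWP | | | 0.44000 | | | | | | | | | 0.71 |
| Tóth et al. | PWP | | −0.00084 | 0.00213 | | | | | 0.000385 | −0.0767 | 0.00095 | 0.00233 | 0.09878 |

For the Saxton and Rawls PTF, the final values are obtained from the intermediate terms FC' and PWP':
FC = FC' + (1.283(FC')² − 0.374(FC') − 0.015)
PWP = PWP' + (0.14(PWP') − 0.02)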
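To show how the coefficients in Table A3 combine into a prediction, the following minimal sketch evaluates the Saxton and Rawls FC equation and the two Adhikary et al. equations. The function names are hypothetical. The units are an assumption that is not stated in the table: sand and clay as weight fractions and organic matter in percent for Saxton and Rawls, and all texture inputs in percent for Adhikary et al.

```python
def saxton_rawls_fc(sand, clay, om):
    """FC from the Saxton and Rawls coefficients in Table A3.

    Assumed units (not given in the table): sand and clay as fractions,
    om in percent; the result is a volumetric fraction.
    """
    fc_prime = (-0.251 * sand + 0.195 * clay + 0.011 * om
                + 0.006 * sand * om - 0.027 * clay * om
                + 0.452 * sand * clay + 0.299)
    # Second-step adjustment listed under the FC' row of Table A3.
    return fc_prime + (1.283 * fc_prime ** 2 - 0.374 * fc_prime - 0.015)


def adhikary_fc(sand, silt):
    """FC (%) from the Adhikary et al. row; inputs assumed to be in percent."""
    return 56.37 - 0.51 * sand - 0.27 * silt


def adhikary_pwp(clay):
    """PWP (%) from the Adhikary et al. row; clay assumed to be in percent."""
    return 0.71 + 0.44 * clay
```

The Tóth et al. rows can be evaluated the same way by adding the Si × C and 1/(OM + 1) interaction terms.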

Appendix C. Model Performance and Comparisons

Table A4. Model performance of published PTFs, ANN, GEP, and REG.
| Target | Model | Data | R² | RMSE | NRMSE | MAE |
|---|---|---|---|---|---|---|
| FC | Saxton and Rawls | Total: 179 | Total: 0.467 | Total: 11.254 | Total: 55.6% | Total: 8.120 |
| FC | Adhikary et al. | Total: 210 | Total: 0.527 | Total: 9.969 | Total: 49.2% | Total: 7.369 |
| FC | Tóth et al. | Total: 179 | Total: 0.347 | Total: 17.225 | Total: 85.0% | Total: 15.575 |
| FC | REG | 178 / - / 32 | 0.667 / - / 0.577 | 8.200 / - / 7.053 | 38.5% / - / 49.2% | 5.438 / - / 5.005 |
| FC | ANN | 146 / 32 / 32 | 0.875 / 0.763 / 0.898 | 4.910 / 8.013 / 4.574 | 23.2% / 36.3% / 31.9% | 3.683 / 6.022 / 3.133 |
| FC | GEP | 146 / 32 / 32 | 0.700 / 0.753 / 0.843 | 7.487 / 8.337 / 4.290 | 35.4% / 37.7% / 29.9% | 4.628 / 4.672 / 3.115 |
| PWP | Saxton and Rawls | Total: 221 | Total: 0.485 | Total: 18.918 | Total: 157.5% | Total: 14.392 |
| PWP | Adhikary et al. | Total: 254 | Total: 0.501 | Total: 17.699 | Total: 147.3% | Total: 13.494 |
| PWP | Tóth et al. | Total: 221 | Total: 0.472 | Total: 16.244 | Total: 135.2% | Total: 12.705 |
| PWP | REG | 217 / - / 37 | 0.612 / - / 0.837 | 5.836 / - / 8.475 | 49.8% / - / 69.0% | 3.531 / - / 6.122 |
| PWP | ANN | 180 / 37 / 37 | 0.660 / 0.808 / 0.915 | 6.031 / 3.774 / 2.442 | 49.1% / 36.9% / 19.9% | 3.514 / 2.778 / 1.758 |
| PWP | GEP | 180 / 37 / 37 | 0.852 / 0.665 / 0.889 | 3.723 / 4.551 / 2.746 | 30.8% / 46.5% / 22.3% | 2.542 / 3.083 / 2.000 |

Entries with three values are listed as Training / CV / Testing; "-" indicates that no cross-validation subset was used.
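The error metrics reported in Table A4 can be computed as in the sketch below. It is an illustration under stated assumptions, not the authors' script: the function name `error_metrics` is hypothetical, and NRMSE is assumed to be the RMSE divided by the mean of the observations and expressed in percent, which is one common definition and is consistent with the symbols listed in the Abbreviations.

```python
from math import sqrt


def error_metrics(observed, predicted):
    """Return MAE, RMSE, and NRMSE (%) for paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    mean_observed = sum(observed) / n
    # Assumption: normalization by the mean of the observed values.
    nrmse = 100.0 * rmse / mean_observed
    return {"MAE": mae, "RMSE": rmse, "NRMSE": nrmse}
```

Under this assumed definition, an RMSE of 4.574 with an NRMSE of 31.9% (ANN, FC testing) would correspond to a testing-set mean FC of roughly 14%.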
Figure A2. Model testing dataset comparison in (a) FC and (b) PWP.

Appendix D. Python-Based FCGEP and PWPGEP Models

1. Python-based FCGEP model

# This model was implemented using Python 3.8. Ensure compatibility with this version.
# Future Python updates may cause compatibility issues with certain modules, so users running a different Python version should check that the code below still behaves as expected.

from math import cos, tan


def fieldCapacity(d):
    # Constants of the GEP expression
    C1 = -21.2657870372787
    C2 = -9.7779381694998
    C3 = 4.57090956378595
    C4 = 1.06872399344557

    # Positions of the input variables in the list d
    latitude = 0
    longitude = 1
    altitude = 2
    EC = 3
    clay = 4
    sand = 5
    silt = 6

    # The prediction is the smaller of two sub-expressions
    y = (((d[clay]+(1.0/((d[latitude]/C1))))/2.0)+((((((d[latitude]+d[sand]+d[sand]+d[silt])/4.0)+min(d[silt],d[sand],d[sand])+d[silt]+d[latitude])/4.0)+cos(d[altitude]))/2.0))
    y = min(y,(((tan((d[EC]+d[longitude]))+(((d[EC]*d[EC]*d[silt])+((d[silt]+d[clay]+C3+C4)/4.0)+d[latitude]+d[silt])/4.0))/2.0)+((d[clay]+((C2+d[altitude])/2.0))/2.0)))

    return y
2. Python-based PWPGEP model

# This model was implemented using Python 3.8. Ensure compatibility with this version.
# Future Python updates may cause compatibility issues with certain modules, so users running a different Python version should check that the code below still behaves as expected.

# pow is imported from math on purpose, matching the original "from math import *"
from math import asin, atan, floor, log, pow, sin, tan


def gep3Rt(x):
    # Real cube root that also accepts negative inputs
    if x < 0.0:
        return -pow(-x, (1.0/3.0))
    else:
        return pow(x, (1.0/3.0))


def gep5Rt(x):
    # Real fifth root that also accepts negative inputs
    if x < 0.0:
        return -pow(-x, (1.0/5.0))
    else:
        return pow(x, (1.0/5.0))


def permanentWiltingPoint(d):
    # Constants of the GEP expression
    C1 = -4.9227785356074
    C2 = -8.26012200863814
    C3 = -5.76989349040193
    C4 = -0.9464415417951
    C5 = -3.81722107697073
    C6 = -6.20128482924894
    C7 = -8.31278145411878

    # Positions of the input variables in the list d
    latitude = 0
    longitude = 1
    altitude = 2
    EC = 3
    pH = 4
    clay = 5
    sand = 6
    silt = 7

    # The prediction is the product of five sub-expressions
    y = atan(gep3Rt(floor(pow(pow(pow(gep3Rt(d[sand]),(1.0/4.0)),5.0),floor(atan(log(d[pH])))))))
    y = y * (d[clay]-gep3Rt((min((C1*d[latitude]),(d[clay]+d[clay]+d[pH]+C1))-((d[silt]+d[sand])/2.0)-tan(d[latitude])-d[latitude])))
    y = y * asin((atan((1.0/((C2+((((d[silt]+d[latitude]+d[longitude]+C4)/4.0))/((d[altitude]+C3+d[altitude]+d[clay])/4.0))))))))
    y = y * min(C5,((((d[sand]-d[EC]-C5-d[altitude])+(d[longitude]-C6))+tan((d[longitude]-C7))+pow((d[altitude]-d[sand]),2.0))/3.0))
    y = y * pow(sin(gep5Rt(d[pH])),5.0)

    return y
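Both functions in this appendix read their inputs by position from the list d, and the PWP model inserts pH at index 4, which shifts clay, sand, and silt by one place. The sketch below is a hypothetical helper (the names `FC_INPUT_ORDER`, `PWP_INPUT_ORDER`, and `build_input` are not part of the published code) that assembles d from named values in the order given by the index assignments above. The sample values are invented for illustration, and the units are assumed to follow Table 1.

```python
# Variable order taken from the index assignments in the two GEP functions.
FC_INPUT_ORDER = ("latitude", "longitude", "altitude", "EC", "clay", "sand", "silt")
PWP_INPUT_ORDER = ("latitude", "longitude", "altitude", "EC", "pH", "clay", "sand", "silt")


def build_input(order, sample):
    """Return the positional list d for one soil sample (hypothetical helper)."""
    missing = [name for name in order if name not in sample]
    if missing:
        raise KeyError("missing input variables: " + ", ".join(missing))
    return [float(sample[name]) for name in order]


# Invented sample; units assumed as in Table 1 (decimal degrees, m, dS/m, %).
sample = {
    "latitude": 22.6, "longitude": 120.6, "altitude": 30.0,
    "EC": 0.5, "pH": 6.8, "clay": 25.0, "sand": 40.0, "silt": 35.0,
}
d_fc = build_input(FC_INPUT_ORDER, sample)
d_pwp = build_input(PWP_INPUT_ORDER, sample)
```

The resulting lists could then be passed to fieldCapacity and permanentWiltingPoint, respectively; building them by name avoids silently swapping clay, sand, and silt between the two models.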

References

  1. Brady, N.C.; Weil, R.R. The Nature and Properties of Soils; Prentice Hall: Upper Saddle River, NJ, USA, 2008; Volume 13.
  2. Assi, A.T.; Blake, J.; Mohtar, R.H.; Braudeau, E. Soil aggregates structure-based approach for quantifying the field capacity, permanent wilting point and available water capacity. Irrig. Sci. 2019, 37, 511–522.
  3. Hoogenboom, G.; Porter, C.; Shelia, V.; Boote, K.; Singh, U.; White, J.; Hunt, L.; Ogoshi, R.; Lizaso, J.; Koo, J. Decision Support System for Agrotechnology Transfer (DSSAT); Version 4.7; DSSAT Foundation: Gainesville, FL, USA, 2017; Available online: https://DSSAT.net (accessed on 1 March 2023).
  4. Pentoś, K.; Pieczarka, K.; Serwata, K. The Relationship between Soil Electrical Parameters and Compaction of Sandy Clay Loam Soil. Agriculture 2021, 11, 114.
  5. Mohanty, B.; Gaur, N. Near Surface Soil Moisture Controls Beyond the Darcy Support Scale: A Remote Sensing Perspective. In Proceedings of the AGU Fall Meeting Abstracts, San Francisco, CA, USA, 15–19 December 2014; p. H13D-1135.
  6. Tunçay, T.; Başkan, O.; Bayramın, I.; Dengız, O.; Kılıç, Ş. Geostatistical approach as a tool for estimation of field capacity and permanent wilting point in semi-arid terrestrial ecosystem. Arch. Agron. Soil Sci. 2018, 64, 1240–1253.
  7. Jian, J.; Du, X.; Stewart, R.D. A database for global soil health assessment. Sci. Data 2020, 7, 16.
  8. McBratney, A.B.; Santos, M.M.; Minasny, B. On digital soil mapping. Geoderma 2003, 117, 3–52.
  9. Santra, P.; Kumar, M.; Kumawat, R.; Painuli, D.; Hati, K.; Heuvelink, G.; Batjes, N. Pedotransfer functions to estimate soil water content at field capacity and permanent wilting point in hot Arid Western India. J. Earth Syst. Sci. 2018, 127, 35.
  10. Morgan, C.L. Assessing Soil Health: Soil Water Cycling. Crops Soils 2020, 53, 35–41.
  11. Nourbakhsh, F.; Afyuni, M.; Abbaspour, K.C.; Schulin, R. Research note: Estimation of field capacity and wilting point from basic soil physical and chemical properties. Arid Land Res. Manag. 2004, 19, 81–85.
  12. Taşan, S.; Demir, Y. Comparative Analysis of MLR, ANN, and ANFIS Models for Prediction of Field Capacity and Permanent Wilting Point for Bafra Plain Soils. Commun. Soil Sci. Plant Anal. 2020, 51, 604–621.
  13. Jin, X.; Wang, S.; Yu, N.; Zou, H.; An, J.; Zhang, Y.; Wang, J.; Zhang, Y. Spatial predictions of the permanent wilting point in arid and semi-arid regions of Northeast China. J. Hydrol. 2018, 564, 367–375.
  14. Cueff, S.; Coquet, Y.; Aubertot, J.-N.; Bel, L.; Pot, V.; Alletto, L. Estimation of soil water retention in conservation agriculture using published and new pedotransfer functions. Soil Tillage Res. 2021, 209, 104967.
  15. Rotnitzky, A.; Wypij, D. A note on the bias of estimators with missing data. Biometrics 1994, 50, 1163–1170.
  16. Tóth, B.; Weynants, M.; Nemes, A.; Makó, A.; Bilas, G.; Tóth, G. New generation of hydraulic pedotransfer functions for Europe. Eur. J. Soil Sci. 2015, 66, 226–238.
  17. Tunçay, T.; Kılıç, Ş.; Dedeoğlu, M.; Dengiz, O.; Başkan, O.; Bayramin, İ. Assessing soil fertility index based on remote sensing and gis techniques with field validation in a semiarid agricultural ecosystem. J. Arid Environ. 2021, 190, 104525.
  18. Keshavarzi, R.; Mohammadi, S. A new approach for numerical modeling of hydraulic fracture propagation in naturally fractured reservoirs. In Proceedings of the SPE/EAGE European Unconventional Resources Conference and Exhibition—From Potential to Production, Vienna, Austria, 20–22 March 2012; p. cp-285-00039.
  19. Mohanty, M.; Sinha, N.K.; Painuli, D.; Bandyopadhyay, K.; Hati, K.; Reddy, K.S.; Chaudhary, R. Modelling soil water contents at field capacity and permanent wilting point using artificial neural network for Indian soils. Natl. Acad. Sci. Lett. 2015, 38, 373–377.
  20. Shiri, J.; Keshavarzi, A.; Kisi, O.; Karimi, S. Using soil easily measured parameters for estimating soil water capacity: Soft computing approaches. Comput. Electron. Agric. 2017, 141, 327–339.
  21. Gunarathna, M.; Sakai, K.; Nakandakari, T.; Momii, K.; Kumari, M.; Amarasekara, M. Pedotransfer functions to estimate hydraulic properties of tropical Sri Lankan soils. Soil Tillage Res. 2019, 190, 109–119.
  22. Ghorbani, M.A.; Shamshirband, S.; Haghi, D.Z.; Azani, A.; Bonakdari, H.; Ebtehaj, I. Application of firefly algorithm-based support vector machines for prediction of field capacity and permanent wilting point. Soil Tillage Res. 2017, 172, 32–38.
  23. Yamaç, S.S.; Şeker, C.; Negiş, H. Evaluation of machine learning methods to predict soil moisture constants with different combinations of soil input data for calcareous soils in a semi arid area. Agric. Water Manag. 2020, 234, 106121.
  24. Hateffard, F.; Dolati, P.; Heidari, A.; Zolfaghari, A.A. Assessing the performance of decision tree and neural network models in mapping soil properties. J. Mt. Sci. 2019, 16, 1833–1847.
  25. McCutcheon, M.; Farahani, H.; Stednick, J.; Buchleiter, G.; Green, T. Effect of soil water on apparent soil electrical conductivity and texture relationships in a dryland field. Biosyst. Eng. 2006, 94, 19–32.
  26. Frost, P.S.; van Es, H.M.; Rossiter, D.G.; Hobbs, P.R.; Pingali, P.L. Soil health characterization in smallholder agricultural catchments in India. Appl. Soil Ecol. 2019, 138, 171–180.
  27. Corwin, D.L.; Lesch, S.M. Apparent soil electrical conductivity measurements in agriculture. Comput. Electron. Agric. 2005, 46, 11–43.
  28. Allen, D.E.; Singh, B.P.; Dalal, R.C. Soil health indicators under climate change: A review of current knowledge. In Soil Health and Climate Change; Springer: Berlin/Heidelberg, Germany, 2011; pp. 25–45.
  29. Sinha, S.K.; Wang, M.C. Artificial Neural Network Prediction Models for Soil Compaction and Permeability. Geotech. Geol. Eng. 2008, 26, 47–64.
  30. Besalatpour, A.; Hajabbasi, M.; Ayoubi, S.; Afyuni, M.; Jalalian, A.; Schulin, R. Soil shear strength prediction using intelligent systems: Artificial neural networks and an adaptive neuro-fuzzy inference system. Soil Sci. Plant Nutr. 2012, 58, 149–160.
  31. Pham, V.-N.; Oh, E.; Ong, D.E.L. Effects of binder types and other significant variables on the unconfined compressive strength of chemical-stabilized clayey soil using gene-expression programming. Neural Comput. Appl. 2022, 34, 9103–9121.
  32. Batjes, N.H.; Ribeiro, E.; van Oostrum, A. Standardised soil profile data to support global mapping and modelling (WoSIS snapshot 2019). Earth Syst. Sci. Data 2020, 12, 299–320.
  33. Batjes, N.H.; Ribeiro, E.; van Oostrum, A.; Leenaars, J.; Hengl, T.; Mendes de Jesus, J. WoSIS: Providing standardised soil profile data for the world. Earth Syst. Sci. Data 2017, 9, 1–14.
  34. Ribeiro, E.; Batjes, N.; Van Oostrum, A.J.M. World Soil Information Service (WoSIS)—Towards the Standardization and Harmonization of World Soil Data; ISRIC, World Soil Information: Wageningen, The Netherlands, 2018; Volume 166.
  35. Saxton, K.E.; Rawls, W.J. Soil water characteristic estimates by texture and organic matter for hydrologic solutions. Soil Sci. Soc. Am. J. 2006, 70, 1569–1578.
  36. Adhikary, P.P.; Chakraborty, D.; Kalra, N.; Sachdev, C.; Patra, A.; Kumar, S.; Tomar, R.; Chandna, P.; Raghav, D.; Agrawal, K. Pedotransfer functions for predicting the hydraulic properties of Indian soils. Soil Res. 2008, 46, 476–484.
  37. Liu, L.-W.; Hsieh, S.-H.; Lin, S.-J.; Wang, Y.-M.; Lin, W.-S. Rice Blast (Magnaporthe oryzae) Occurrence Prediction and the Key Factor Sensitivity Analysis by Machine Learning. Agronomy 2021, 11, 771.
  38. Hsieh, S.-H.; Liu, L.-W.; Chung, W.-G.; Wang, Y.-M. Sensitivity analysis on the rising relation between short-term rainfall and groundwater table adjacent to an artificial recharge lake. Water 2019, 11, 1704.
  39. Ferreira, C. Gene expression programming in problem solving. In Soft Computing and Industry; Springer: London, UK, 2002; pp. 635–653.
  40. Liu, L.-W.; Wang, Y.-M. Modelling reservoir turbidity using Landsat 8 satellite imagery by gene expression programming. Water 2019, 11, 1479.
  41. Wang, X.; Liu, L.; Zhang, W.; Ma, X. Prediction of Plant Uptake and Translocation of Engineered Metallic Nanoparticles by Machine Learning. Environ. Sci. Technol. 2021, 55, 7491–7500.
  42. Lee, C.-H.; Liu, L.-W.; Wang, Y.-M.; Leu, J.-M.; Chen, C.-L. Drone-Based Bathymetry Modeling for Mountainous Shallow Rivers in Taiwan Using Machine Learning. Remote Sens. 2022, 14, 3343.
  43. Ferreira, C. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2006; Volume 21.
  44. Liu, L. Drone-based photogrammetry for riverbed characteristics extraction and flood discharge modeling in Taiwan’s mountainous rivers. Measurement 2023, 220, 113386.
  45. Faloye, O.T.; Ajayi, A.E.; Ajiboye, Y.; Alatise, M.O.; Ewulo, B.S.; Adeosun, S.S.; Babalola, T.; Horn, R. Unsaturated Hydraulic Conductivity Prediction Using Artificial Intelligence and Multiple Linear Regression Models in Biochar Amended Sandy Clay Loam Soil. J. Soil Sci. Plant Nutr. 2022, 22, 1589–1603.
  46. Lee, C.-H.; Hsu, M.-K.; Wang, Y.-M.; Leu, J.-M.; Chen, C.-L.; Liu, L. Evaluating gradient descent variations for artificial neural network bathymetry modeling and sensitivity analysis. J. Appl. Remote Sens. 2024, 18, 022204.
  47. Jamieson, P.D.; Porter, J.R.; Wilson, D.R. A test of the computer simulation model ARCWHEAT1 on wheat crops grown in New Zealand. Field Crops Res. 1991, 27, 337–350.
  48. Moral, F.J.; Serrano, J.M. Using low-cost geophysical survey to map soil properties and delineate management zones on grazed permanent pastures. Precis. Agric. 2019, 20, 1000–1014.
  49. Nocco, M.A.; Ruark, M.D.; Kucharik, C.J. Apparent electrical conductivity predicts physical properties of coarse soils. Geoderma 2019, 335, 1–11.
  50. Carter, M.R.; Gregorich, E.G. Soil Sampling and Methods of Analysis; CRC Press: Boca Raton, FL, USA, 2007.
  51. Topp, G.C.; Davis, J.L.; Annan, A.P. Electromagnetic determination of soil water content: Measurements in coaxial transmission lines. Water Resour. Res. 1980, 16, 574–582.
  52. Ratcliffe, R.G.; Rengel, Z. (Eds.) Handbook of plant growth. pH as the master variable. Ann. Bot. 2003, 92, 165–166.
  53. Tang, C.; Rengel, Z. Handbook of Soil Acidity; Marcel Dekker: New York, NY, USA, 2003; pp. 57–81.
  54. Liu, L.-W.; Ismail, M.H.; Wang, Y.-M.; Lin, W.-S. Internet of Things based Smart Irrigation Control System for Paddy Field. AGRIVITA J. Agric. Sci. 2021, 43, 378–389.
Figure 1. Geographic origin of the samples in the modeling dataset. Circle symbols represent the field capacity (FC) and triangle symbols stand for the permanent wilting point (PWP). Different colors indicate the range of data values of the FC or PWP.
Figure 2. Modeling results of field capacity (FC) from (a) Saxton and Rawls, Adhikary et al., and Tóth et al. models, (b) REG model, (c) ANN model, and (d) GEP model.
Figure 3. Modeling results of permanent wilting point (PWP) from (a) Saxton and Rawls, Adhikary et al., and Tóth et al. models, (b) REG model, (c) ANN model, and (d) GEP model.
Figure 4. Occurrence of absolute error (AE) from different models for (a) FC and (b) PWP simulation.
Table 1. Values of the input parameters for model fitting.
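| Target | Parameter | Clay (%) | Sand (%) | Silt (%) | Longitude (decimal) | Latitude (decimal) | Altitude (m) | pH (-) | EC (dS/m) | FC or PWP (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| FC (n = 210) | μ | 24.314 | 53.401 | 21.405 | 23.519 | −5.726 | 834.962 | 7.345 | 3.425 | 20.257 |
| FC (n = 210) | σ | 17.276 | 27.241 | 17.058 | 38.883 | 29.995 | 569.649 | 1.337 | 7.117 | 13.736 |
| FC (n = 210) | Max. | 79.000 | 98.000 | 78.000 | 116.721 | 69.433 | 2604.000 | 10.400 | 50.500 | 72.000 |
| FC (n = 210) | Min. | 2.000 | 1.000 | 0.000 | −154.850 | −33.821 | −2.000 | 3.500 | 0.000 | 1.000 |
| FC (n = 210) | r | 0.618 | −0.755 | 0.603 | −0.180 | 0.623 | −0.240 | −0.093 | −0.007 | - |
| PWP (n = 254) | μ | 24.348 | 53.361 | 21.536 | 15.843 | −3.078 | 803.590 | 7.191 | 3.120 | 11.937 |
| PWP (n = 254) | σ | 17.831 | 27.677 | 16.770 | 45.224 | 28.847 | 597.034 | 1.431 | 6.728 | 9.485 |
| PWP (n = 254) | Max. | 79.000 | 98.000 | 78.000 | 116.721 | 69.433 | 2604.000 | 10.400 | 50.500 | 66.000 |
| PWP (n = 254) | Min. | 2.000 | 1.000 | 0.000 | −154.850 | −33.821 | −2.000 | 3.500 | 0.000 | 1.000 |
| PWP (n = 254) | r | 0.706 | −0.701 | 0.424 | −0.187 | 0.410 | −0.003 | 0.048 | 0.042 | - |

Clay through EC are model inputs; FC or PWP is the model output.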
Table 2. Performance of FC and PWP modeling by ANN using combinations of input variables.
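| Target | Combination | Training R² | Training RMSE | Training NRMSE | Training MAE | CV R² | CV RMSE | CV NRMSE | CV MAE | Testing R² | Testing RMSE | Testing NRMSE | Testing MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FC | L | 0.655 | 8.063 | 38.1% | 6.338 | 0.707 | 9.908 | 44.8% | 8.187 | 0.259 | 9.530 | 66.4% | 8.019 |
| FC | S | 0.739 | 6.986 | 33.0% | 4.979 | 0.722 | 9.164 | 41.5% | 6.683 | 0.596 | 6.700 | 46.7% | 5.178 |
| FC | PH | 0.016 | 13.787 | 65.2% | 11.088 | 0.036 | 15.511 | 70.2% | 11.964 | 0.013 | 11.666 | 81.3% | 10.662 |
| FC | EC | 0.002 | 13.616 | 64.4% | 11.146 | 0.013 | 15.406 | 69.7% | 12.149 | 0.058 | 12.298 | 85.7% | 11.254 |
| FC | L + S | 0.722 | 7.601 | 35.9% | 5.115 | 0.777 | 9.008 | 40.8% | 5.742 | 0.716 | 6.186 | 43.1% | 4.578 |
| FC | PH + EC | 0.024 | 13.501 | 63.8% | 10.931 | 0.054 | 15.748 | 71.3% | 12.468 | 0.002 | 11.829 | 82.5% | 10.891 |
| FC | L + PH | 0.747 | 6.925 | 32.7% | 5.409 | 0.714 | 8.856 | 40.1% | 7.070 | 0.279 | 11.271 | 78.6% | 9.219 |
| FC | L + EC | 0.670 | 7.900 | 37.4% | 6.254 | 0.613 | 10.220 | 46.3% | 8.464 | 0.257 | 9.902 | 69.0% | 8.328 |
| FC | L + PH + EC | 0.598 | 8.851 | 41.8% | 7.280 | 0.707 | 10.223 | 46.3% | 7.749 | 0.386 | 8.877 | 61.9% | 7.457 |
| FC | S + PH | 0.775 | 6.511 | 30.8% | 4.808 | 0.722 | 8.875 | 40.2% | 6.888 | 0.575 | 7.062 | 49.2% | 5.586 |
| FC | S + EC | 0.750 | 6.819 | 32.2% | 5.165 | 0.696 | 8.823 | 39.9% | 6.316 | 0.575 | 7.036 | 49.1% | 4.574 |
| FC | S + PH + EC | 0.807 | 5.996 | 28.3% | 4.489 | 0.696 | 8.928 | 40.4% | 6.389 | 0.712 | 5.754 | 40.1% | 4.137 |
| FC | L + S + PH | 0.727 | 7.403 | 35.0% | 4.936 | 0.762 | 8.502 | 38.5% | 5.186 | 0.752 | 5.705 | 39.8% | 4.495 |
| FC | L + S + EC * | 0.875 | 4.910 | 23.2% | 3.683 | 0.763 | 8.013 | 36.3% | 6.022 | 0.898 | 4.574 | 31.9% | 3.133 |
| FC | L + S + PH + EC | 0.791 | 6.637 | 31.4% | 4.598 | 0.698 | 9.554 | 43.2% | 5.689 | 0.819 | 4.801 | 33.5% | 3.647 |
| PWP | L | 0.151 | 9.154 | 74.6% | 6.665 | 0.415 | 6.367 | 62.3% | 4.887 | 0.398 | 7.781 | 64.4% | 6.135 |
| PWP | S | 0.514 | 6.877 | 56.0% | 3.994 | 0.654 | 4.773 | 46.7% | 3.742 | 0.790 | 4.327 | 35.8% | 3.235 |
| PWP | PH | 0.000 | 9.959 | 81.2% | 7.194 | 0.015 | 8.055 | 78.8% | 6.425 | 0.031 | 9.503 | 78.7% | 7.787 |
| PWP | EC | 0.006 | 9.957 | 81.1% | 7.137 | 0.024 | 7.936 | 77.6% | 6.304 | 0.070 | 9.845 | 81.5% | 8.041 |
| PWP | L + S | 0.639 | 5.939 | 48.4% | 3.686 | 0.727 | 4.205 | 41.1% | 3.195 | 0.834 | 3.840 | 31.8% | 2.777 |
| PWP | PH + EC | 0.007 | 9.904 | 80.7% | 7.174 | 0.009 | 8.002 | 78.3% | 6.411 | 0.087 | 9.855 | 81.6% | 8.062 |
| PWP | L + PH | 0.419 | 7.651 | 62.4% | 5.649 | 0.473 | 5.962 | 58.3% | 4.330 | 0.425 | 7.372 | 61.0% | 6.125 |
| PWP | L + EC | 0.537 | 6.941 | 56.6% | 5.006 | 0.503 | 5.924 | 58.0% | 4.458 | 0.382 | 7.719 | 63.9% | 6.023 |
| PWP | L + PH + EC | 0.487 | 7.248 | 59.1% | 5.325 | 0.557 | 5.657 | 55.3% | 4.386 | 0.370 | 7.857 | 65.1% | 6.457 |
| PWP | S + PH | 0.723 | 5.208 | 42.4% | 3.520 | 0.697 | 4.502 | 44.0% | 3.398 | 0.777 | 4.578 | 37.9% | 3.384 |
| PWP | S + EC | 0.535 | 6.721 | 54.8% | 3.872 | 0.672 | 4.632 | 45.3% | 3.575 | 0.785 | 4.400 | 36.4% | 3.148 |
| PWP | S + PH + EC | 0.596 | 6.310 | 51.4% | 3.594 | 0.723 | 4.247 | 41.5% | 3.049 | 0.813 | 4.188 | 34.7% | 2.949 |
| PWP | L + S + PH | 0.655 | 5.882 | 47.9% | 3.555 | 0.784 | 3.894 | 38.1% | 3.059 | 0.872 | 3.405 | 28.2% | 2.508 |
| PWP | L + S + EC | 0.662 | 4.910 | 40.0% | 3.502 | 0.798 | 4.230 | 41.4% | 2.890 | 0.877 | 3.578 | 29.6% | 2.366 |
| PWP | L + S + PH + EC * | 0.660 | 6.031 | 49.1% | 3.514 | 0.808 | 3.774 | 36.9% | 2.778 | 0.915 | 2.442 | 19.9% | 1.758 |

L: longitude, latitude, and altitude; S: sand, silt, and clay content; * marks the optimum combination for the field capacity and the permanent wilting point, respectively (shown in bold font in the original table).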
Table 3. Model input and performance parameters on the testing dataset.
| Target | Model | Input Variables | Testing Dataset | R² | RMSE | NRMSE | MAE |
|---|---|---|---|---|---|---|---|
| FC | Saxton and Rawls | Sa, C, OM | 28 | 0.644 | 8.077 | 56.3% | 6.467 |
| FC | Adhikary et al. | Sa, Si | 32 | 0.683 | 7.482 | 52.2% | 6.465 |
| FC | Tóth et al. | Si, C, OM | 28 | 0.490 | 17.766 | 123.9% | 16.405 |
| FC | REG | L, Sa, Si, C, EC | 32 | 0.577 | 7.053 | 49.2% | 5.005 |
| FC | ANN | L, Sa, Si, C, EC | 32 | 0.898 | 4.574 | 31.9% | 3.133 |
| FC | GEP | L, Sa, Si, C, EC | 32 | 0.843 | 4.290 | 29.9% | 3.115 |
| PWP | Saxton and Rawls | Sa, C, OM | 31 | 0.788 | 5.107 | 41.6% | 3.567 |
| PWP | Adhikary et al. | C | 37 | 0.823 | 7.646 | 62.2% | 4.031 |
| PWP | Tóth et al. | Si, C, OM | 31 | 0.776 | 4.632 | 37.7% | 4.144 |
| PWP | REG | L, Sa, Si, C, EC, pH | 37 | 0.837 | 8.475 | 69.0% | 6.122 |
| PWP | ANN | L, Sa, Si, C, EC, pH | 37 | 0.915 | 2.442 | 19.9% | 1.758 |
| PWP | GEP | L, Sa, Si, C, EC, pH | 37 | 0.889 | 2.746 | 22.3% | 2.000 |
L: longitude, latitude, and altitude; Sa: sand; Si: silt; C: clay; OM: organic matter; EC: electrical conductivity.
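The MAE reductions quoted in the abstract can be approximately reproduced from the testing MAE values in Table 3. The sketch below assumes that the linear baseline is the Adhikary et al. PTF; this choice is an assumption made here because it yields values close to those reported, and the function name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, model):
    """Percentage by which `model` lowers an error metric relative to `baseline`."""
    return 100.0 * (baseline - model) / baseline


# Testing MAE values from Table 3 (assumed baseline: Adhikary et al.).
fc_reduction = relative_reduction(6.465, 3.133)    # ANN vs. baseline for FC
pwp_reduction = relative_reduction(4.031, 1.758)   # ANN vs. baseline for PWP
```

Both results land within about 0.01 percentage points of the 51.54% (FC) and 56.38% (PWP) reductions stated in the abstract, which suggests, but does not confirm, that the same baseline was used there.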
Table 4. Significance analysis by Kolmogorov–Smirnov test.
| Variable | FC p-Value | FC Significance | PWP p-Value | PWP Significance |
|---|---|---|---|---|
| Altitude | 0.704 | | 0.012 | * |
| Sand | 0.017 | * | 0.331 | |
| Silt | 0.042 | * | 0.099 | |
| Clay | 0.042 | * | 0.331 | |
| EC | 0.545 | | 0.039 | * |
| pH | - | | 0.415 | |
* Significant at p < 0.05.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, L.; Ma, X. Prediction of Soil Field Capacity and Permanent Wilting Point Using Accessible Parameters by Machine Learning. AgriEngineering 2024, 6, 2592-2611. https://doi.org/10.3390/agriengineering6030151
