Digital Mapping of Soil pH Based on Machine Learning Combined with Feature Selection Methods in East China
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Areas
2.2. Data Source
2.3. Feature Selection
2.4. Random Forest
- (1) Sample the training set D with replacement to generate a bootstrap resample Dr.
- (2) For each bootstrap sample, grow a tree. At each node of the tree, the best split is chosen from a random subset of mtry features.
- (3) Repeat the above steps until the specified number of trees (ntree) has been grown; together, these trees form the random forest.
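The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the helper names (`bootstrap`, `best_split`, `grow_tree`, `grow_forest`, `predict`), the depth limit `max_depth`, the squared-error split criterion, and the synthetic data are all hypothetical choices made for the example. Only `mtry` and `ntree` come from the text.

```python
import random
from statistics import mean


def bootstrap(rows, rng):
    """Step (1): draw len(rows) samples from rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows, features):
    """Return the (feature, threshold) pair that most reduces SSE, or None."""
    best, best_score = None, sse([y for _, y in rows])
    for f in features:
        # Every distinct value except the largest leaves both children non-empty.
        for t in sorted({x[f] for x, _ in rows})[:-1]:
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(rows, mtry, rng, depth=0, max_depth=4):
    """Step (2): grow a regression tree, testing only mtry random features per node."""
    split = None
    if depth < max_depth and len(rows) > 1:
        candidates = rng.sample(range(len(rows[0][0])), mtry)
        split = best_split(rows, candidates)
    if split is None:
        return mean(y for _, y in rows)  # leaf: mean response of the node
    f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t,
            grow_tree(left, mtry, rng, depth + 1, max_depth),
            grow_tree(right, mtry, rng, depth + 1, max_depth))


def grow_forest(rows, ntree, mtry, seed=0):
    """Step (3): repeat steps (1) and (2) until ntree trees exist."""
    rng = random.Random(seed)
    return [grow_tree(bootstrap(rows, rng), mtry, rng) for _ in range(ntree)]


def predict(forest, x):
    """Average the leaf values reached by x across all trees."""
    values = []
    for node in forest:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if x[f] <= t else right
        values.append(node)
    return mean(values)


# Synthetic data (not from the study): the response depends on feature 0 only.
data = [((float(i), float((i * 7) % 5)), 4.5 + 0.1 * i) for i in range(30)]
forest = grow_forest(data, ntree=25, mtry=2)
print(predict(forest, (2.0, 1.0)), predict(forest, (27.0, 1.0)))
```

Averaging over trees grown on different bootstrap resamples is what distinguishes the forest from a single tree; setting `mtry` below the number of features additionally decorrelates the trees.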
2.5. Support Vector for Regression
2.6. Parameter Tuning and Analysis
2.7. Model Accuracy Assessment
3. Results and Discussion
3.1. Soil pH Data Analysis
3.2. Feature Selection Results
3.3. Effect of Parameter Optimization on Performance of RF and SVR Model
3.3.1. Single Parameter Optimization Analysis
3.3.2. Multi-Parameter Optimization Analysis
3.4. Mapping Soil pH and Model Accuracy Assessment
4. Conclusions
- (1) RF and SVR models built after RFE, SAFS, or SBF feature selection predicted soil pH more accurately than the same models built without feature selection. Conducting feature selection before establishing machine learning models is therefore necessary and can significantly enhance model accuracy.
- (2) In the study area, the RFE feature selection method combined with RF and SVR modeling produced the best soil pH prediction models, compared with the other feature selection methods, in terms of prediction accuracy and generalization ability. Both the RFE-RF and RFE-SVR models achieved high prediction accuracy. The validation set accuracy of the RFE-RF model was greater than that of the RFE-SVR model, with a smaller difference between the calibration and validation R² values. These findings indicate that the RFE-RF model has higher prediction and generalization ability than the RFE-SVR model, making it more suitable as a soil pH prediction model.
- (3) For the RF model, individual parameters did not appear to have a significant impact on model accuracy, and increasing the number of trees (ntree) did not lead to a significant improvement in prediction accuracy. For the SVR model, the penalty coefficient (cost) and the gamma parameter of the radial basis function kernel had a significant impact on model accuracy.
- (4) Based on the mapping results, the RF and SVR models produced similar spatial distribution patterns of predicted soil pH, which matched the “south acid and north alkaline” pattern of soil acidity in the study area. Both models are therefore useful for the spatial prediction and mapping of soil pH.
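The generalization argument in conclusion (2) can be checked directly against the R² values reported in the model comparison table. The short sketch below assumes that "generalization gap" means the absolute difference between training and testing R²; the function name `generalization_gap` is hypothetical, and the numbers are copied from the table.

```python
# (training R², testing R²) for each model, copied from the comparison table.
r2 = {
    "ALL-RF": (0.56, 0.45),
    "RFE-RF": (0.61, 0.65),
    "SAFS-RF": (0.57, 0.71),
    "SBF-RF": (0.59, 0.69),
    "ALL-SVR": (0.51, 0.44),
    "RFE-SVR": (0.66, 0.52),
    "SAFS-SVR": (0.66, 0.50),
    "SBF-SVR": (0.65, 0.51),
}


def generalization_gap(train_r2, test_r2):
    """Absolute difference between training and testing R² (assumed definition)."""
    return abs(train_r2 - test_r2)


gaps = {name: round(generalization_gap(*pair), 2) for name, pair in r2.items()}
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:9s} gap = {gap:.2f}")
```

Under this definition the RFE-RF model has the smallest gap (0.04), against 0.14 for RFE-SVR, which is consistent with the conclusion that RFE-RF generalizes better.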
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Soil pH | n | Min | 25% Quantile | Median | 75% Quantile | Max | Mean | Std | CV/% |
---|---|---|---|---|---|---|---|---|---|
Samples | 140 | 4.58 | 5.50 | 6.01 | 7.20 | 8.67 | 6.37 | 1.16 | 18.21 |
Training set | 108 | 4.58 | 5.50 | 6.01 | 7.20 | 8.67 | 6.41 | 1.16 | 18.10 |
Testing set | 32 | 4.68 | 5.49 | 6.00 | 7.27 | 8.51 | 6.25 | 1.17 | 18.72 |
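The CV/% column in the table above appears to be the coefficient of variation, that is, the standard deviation divided by the mean and expressed as a percentage. The sketch below recomputes it from the rounded Mean and Std columns as a consistency check; the function name `coefficient_of_variation` is hypothetical, and small deviations are expected because the inputs are already rounded to two decimals.

```python
def coefficient_of_variation(mean_value, std_value):
    """Coefficient of variation in percent: 100 * std / mean."""
    return 100.0 * std_value / mean_value


# (mean, std, reported CV/%) copied from the soil pH summary table.
summary = {
    "Samples": (6.37, 1.16, 18.21),
    "Training set": (6.41, 1.16, 18.10),
    "Testing set": (6.25, 1.17, 18.72),
}

for name, (mean_value, std_value, reported) in summary.items():
    cv = coefficient_of_variation(mean_value, std_value)
    print(f"{name:12s} computed = {cv:.2f}%, reported = {reported:.2f}%")
```

The recomputed values agree with the reported column to within rounding.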
Models | Selected Features | Training RMSE | Training MAE | Training R² | Testing RMSE | Testing MAE | Testing R² | Optimal Parameters |
---|---|---|---|---|---|---|---|---|
ALL-RF | − | 0.78 | 0.61 | 0.56 | 0.85 | 0.67 | 0.45 | mtry = 3, ntree = 100 |
RFE-RF | MAP, MrRVBF, EVI, MAT, MrRTF | 0.73 | 0.57 | 0.61 | 0.68 | 0.56 | 0.65 | mtry = 1, ntree = 1000 |
SAFS-RF | EVI, NDVI, MrVBF, TWI, plan, CI, MAP | 0.77 | 0.59 | 0.57 | 0.65 | 0.53 | 0.71 | mtry = 5, ntree = 200 |
SBF-RF | EVI, NDVI, MrVBF, MrRTF, TWI, plan, slope, elevation, MAP | 0.75 | 0.59 | 0.59 | 0.66 | 0.57 | 0.69 | mtry = 3, ntree = 200 |
ALL-SVR | − | 0.83 | 0.63 | 0.51 | 0.89 | 0.72 | 0.44 | gamma = 0.0625, C = 1 |
RFE-SVR | MAP, MrRVBF, EVI, MAT, MrRTF | 0.69 | 0.53 | 0.66 | 0.79 | 0.66 | 0.52 | gamma = 0.125, C = 0.5 |
SAFS-SVR | EVI, MrVBF, TPI, plan, aspect, MAP, MAT | 0.69 | 0.50 | 0.66 | 0.84 | 0.69 | 0.50 | gamma = 0.015625, C = 16 |
SBF-SVR | EVI, NDVI, MrVBF, MrRTF, TWI, plan, slope, elevation, MAP | 0.70 | 0.52 | 0.65 | 0.80 | 0.65 | 0.51 | gamma = 0.015625, C = 4 |
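The three accuracy measures reported in the table above (RMSE, MAE, and R²) can be computed as in the following sketch. The formulas are the standard definitions; in particular, R² is taken here as 1 − SS_res/SS_tot, which is an assumption, since a squared correlation coefficient is another common choice. The function names and the three-point example data are hypothetical and are not taken from the study.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up pH values for illustration only.
observed = [5.0, 6.0, 7.0]
predicted = [5.5, 6.0, 6.5]
print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

Lower RMSE and MAE and higher R² indicate better agreement between predicted and observed soil pH.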
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, Z.-D.; Zhao, M.-S.; Lu, H.-L.; Wang, S.-H.; Lu, Y.-Y. Digital Mapping of Soil pH Based on Machine Learning Combined with Feature Selection Methods in East China. Sustainability 2023, 15, 12874. https://doi.org/10.3390/su151712874