Soil Organic Matter Estimation Model Integrating Spectral and Profile Features
Abstract
1. Introduction
2. Materials and Methods
2.1. Soil Profile Data and Spectral Data Acquisition
2.2. Predictive Modeling of Soil Organic Matter Content Using Fusion Features
- Data preprocessing: Soil profile information at the different profile levels of each sampling point in the study region is collated together with the laboratory-measured organic matter content and the spectral data of the soil samples. The profile information is then preprocessed with the leave-one-out method and normalization to derive the profile features.
- Feature extraction: The risk of overfitting rises when the number of features exceeds the number of samples. To avoid this, PCA is used to reduce the dimensionality of both the full-band spectra and the feature bands selected with SCARS, yielding principal component features. Separately, the Lasso method and the SCARS feature selection technique are used to select pertinent bands from the full-band spectra. The principal component features and selected bands are then merged with the profile features, yielding three sets of combined features: PCA features fused with profile features, Lasso features fused with profile features, and SCARS-PCA features fused with profile features.
- Model construction: The combined features and the normalized soil organic matter (SOM) content are fed to a regression algorithm to construct the SOM content prediction model.
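The preprocessing and fusion steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes that "leave-one-out method" means leave-one-out target encoding (each categorical value is replaced by the mean SOM of the other samples sharing that value), that normalization is min-max scaling, and that fusion is simple concatenation. The function names, the toy values, and the stand-in PCA scores are all hypothetical.

```python
from collections import defaultdict


def leave_one_out_encode(categories, targets):
    """Replace each category by the mean target of the OTHER samples sharing it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, tgt in zip(categories, targets):
        sums[cat] += tgt
        counts[cat] += 1
    global_mean = sum(targets) / len(targets)
    encoded = []
    for cat, tgt in zip(categories, targets):
        if counts[cat] > 1:
            encoded.append((sums[cat] - tgt) / (counts[cat] - 1))
        else:
            # A category seen only once has no other samples; use the global mean.
            encoded.append(global_mean)
    return encoded


def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse(spectral_features, profile_features):
    """Concatenate the spectral and profile feature vectors of each sample."""
    return [list(s) + list(p) for s, p in zip(spectral_features, profile_features)]


# Toy data (assumed values): one categorical profile feature and measured SOM.
porosity = ["High", "Medium", "Medium", "High", "Medium"]
som = [30.0, 12.0, 10.0, 26.0, 14.0]

encoded = leave_one_out_encode(porosity, som)
profile = [[v] for v in min_max_normalize(encoded)]

# Stand-in for two principal component scores per sample (assumed values).
pca_scores = [[0.1, -0.2], [0.4, 0.0], [-0.3, 0.5], [0.2, 0.1], [-0.4, -0.4]]
fused = fuse(pca_scores, profile)
print(fused[0])
```

Each fused row holds the spectral features followed by the encoded profile features, and this combined vector is what a regression algorithm would receive in the model construction step.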
2.2.1. Data Preprocessing
2.2.2. Feature Variable Selection Method
- PCA feature extraction
- Lasso feature selection
- SCARS feature selection
2.2.3. Regression Algorithm
2.2.4. Metrics for Model Assessment
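The result tables report R2 (the coefficient of determination) and RMSE for each model on the training and test sets. As a minimal sketch, assuming the conventional definitions of both metrics, they can be computed as follows; the function names and sample values are illustrative only.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example (assumed values): measured vs. predicted normalized SOM content.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(measured, predicted), 3), round(rmse(measured, predicted), 3))
```

A higher R2 and a lower RMSE on the test set indicate better generalization. A large gap between training and test values, as several tree-based models show in the tables, points to overfitting.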
3. Results
3.1. Prediction of SOM Content Using Single-Type Features
3.2. Prediction of SOM Content Using Fusion Features
4. Discussion
4.1. Lasso-Selected Features’ Impact on SOM Prediction Models
4.2. SCARS-Selected Features’ Impact on SOM Prediction Models
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Feature Name | CS-03-Aa | CS-03-Ap | CS-03-B | CS-03-Br | CS-03-Bg | CS-03-Er |
---|---|---|---|---|---|---|
Profile_level | 1 | 2 | 3 | 4 | 5 | 6 |
Color_class | 7.5 | 5.5 | 10 | 10 | 10 | 2.5 |
Color_value | 4 | 4 | 4 | 3 | 2 | 5 |
Color_chroma | 4 | 6 | 4 | 3 | 2 | 3 |
Plant_root_thickness | Medium | Thin | Thin | Minuteness | None | None |
Plant_root_abundance | Many | Few | Few | Seldom | None | None |
Degree_of_soil_structure_development | Strong | Strong | Medium | Weak | Weak | Weak |
Porosity | High | Medium | Medium | Medium | Medium | Medium |
Pore_size | Medium | Thin | Thin | Minuteness | Minuteness | Minuteness |
Pore_abundance | Medium | Few | Few | Seldom | Seldom | Seldom |
Plasticity | Medium | Medium | Medium | Medium | Strong | Strong |
pH | 9 | 8.5 | 8.5 | 8.5 | 8.2 | 8.2 |
Feature Name | CS-03-Aa | CS-03-Ap | CS-03-B | CS-03-Br | CS-03-Bg | CS-03-Er |
---|---|---|---|---|---|---|
Profile_level | 0.947 | 0.578 | 0.315 | 0.234 | 0.142 | 0.066 |
Color_class | 0.287 | 0.152 | 0.021 | 0.019 | 0.049 | 0.324 |
Color_value | 0.675 | 0.676 | 0.684 | 0.896 | 0.466 | 0.218 |
Color_chroma | 0.761 | 0.376 | 0.779 | 0.892 | 0.824 | 0.988 |
Plant_root_thickness | 0.676 | 0.403 | 0.412 | 0.101 | 0.015 | 0.020 |
Plant_root_abundance | 0.885 | 0.416 | 0.431 | 0.045 | 0.018 | 0.024 |
Degree_of_soil_structure_development | 0.924 | 0.927 | 0.707 | 0.237 | 0.294 | 0.311 |
Porosity | 0.855 | 0.345 | 0.356 | 0.354 | 0.386 | 0.396 |
Pore_size | 0.838 | 0.739 | 0.754 | 0.325 | 0.389 | 0.407 |
Pore_abundance | 0.220 | 0.008 | 0.017 | 0.031 | 0.079 | 0.093 |
Plasticity | 0.930 | 0.934 | 0.945 | 0.943 | 0.633 | 0.654 |
pH | 0.292 | 0.775 | 0.872 | 0.856 | 0.000 | 0.080 |
Algorithm | Hyperparameter Configuration |
---|---|
RCV | alphas = np.arange(1, 10, 0.2) |
LR | normalize = False |
PLSR | normalize = False |
RF | n_estimators = 300, criterion = 'mse', max_depth = 7 |
SVM | kernel = 'rbf' |
CatBoost | iterations = 100, depth = 10 |
LightGBM | objective = 'regression', n_estimators = 300 |
ExtraTrees | criterion = 'mse', min_samples_split = 2 |
XGBoost | n_estimators = 300, learning_rate = 0.08, gamma = 0, subsample = 0.75, colsample_bytree = 1, max_depth = 7, tree_method = 'approx' |
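Algorithm | Five PCA Features (Full-Band Spectra), Training R² | Five PCA Features (Full-Band Spectra), Training RMSE | Five PCA Features (Full-Band Spectra), Test R² | Five PCA Features (Full-Band Spectra), Test RMSE | Twelve Profile Features, Training R² | Twelve Profile Features, Training RMSE | Twelve Profile Features, Test R² | Twelve Profile Features, Test RMSE |
---|---|---|---|---|---|---|---|---|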
RCV | 0.684 | 0.172 | 0.680 | 0.145 | 0.709 | 0.165 | 0.633 | 0.155 |
LR | 0.688 | 0.170 | 0.661 | 0.149 | 0.740 | 0.156 | 0.484 | 0.184 |
PLSR | 0.688 | 0.170 | 0.661 | 0.149 | 0.711 | 0.164 | 0.596 | 0.163 |
RF | 0.950 | 0.068 | 0.780 | 0.120 | 0.954 | 0.065 | 0.746 | 0.129 |
SVM | 0.810 | 0.133 | 0.797 | 0.116 | 0.876 | 0.107 | 0.561 | 0.170 |
CatBoost | 0.970 | 0.053 | 0.707 | 0.139 | 0.978 | 0.022 | 0.732 | 0.133 |
LightGBM | 0.830 | 0.126 | 0.687 | 0.144 | 0.945 | 0.071 | 0.618 | 0.159 |
ExtraTrees | 0.974 | 0.049 | 0.795 | 0.119 | 0.986 | 0.020 | 0.738 | 0.131 |
XGBoost | 0.978 | 0.020 | 0.798 | 0.115 | 0.961 | 0.031 | 0.767 | 0.124 |
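Algorithm | Eleven Lasso-Selected Features, Training R² | Eleven Lasso-Selected Features, Training RMSE | Eleven Lasso-Selected Features, Test R² | Eleven Lasso-Selected Features, Test RMSE | Five SCARS-PCA Features, Training R² | Five SCARS-PCA Features, Training RMSE | Five SCARS-PCA Features, Test R² | Five SCARS-PCA Features, Test RMSE |
---|---|---|---|---|---|---|---|---|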
RCV | 0.732 | 0.161 | 0.684 | 0.144 | 0.756 | 0.151 | 0.759 | 0.126 |
LR | 0.670 | 0.175 | 0.660 | 0.150 | 0.802 | 0.136 | 0.695 | 0.142 |
PLSR | 0.679 | 0.173 | 0.646 | 0.153 | 0.802 | 0.136 | 0.695 | 0.142 |
RF | 0.946 | 0.071 | 0.718 | 0.136 | 0.958 | 0.062 | 0.738 | 0.131 |
SVM | 0.760 | 0.149 | 0.757 | 0.127 | 0.851 | 0.118 | 0.729 | 0.134 |
CatBoost | 0.965 | 0.052 | 0.762 | 0.125 | 0.955 | 0.073 | 0.627 | 0.157 |
LightGBM | 0.923 | 0.085 | 0.651 | 0.152 | 0.918 | 0.087 | 0.668 | 0.148 |
ExtraTrees | 0.988 | 0.012 | 0.738 | 0.131 | 0.985 | 0.016 | 0.837 | 0.103 |
XGBoost | 0.980 | 0.018 | 0.759 | 0.126 | 0.946 | 0.072 | 0.712 | 0.138 |
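Algorithm | Five PCA Features + 12 Profile Features, Training R² | Five PCA Features + 12 Profile Features, Training RMSE | Five PCA Features + 12 Profile Features, Test R² | Five PCA Features + 12 Profile Features, Test RMSE | Eleven Lasso-Selected Features + 12 Profile Features, Training R² | Eleven Lasso-Selected Features + 12 Profile Features, Training RMSE | Eleven Lasso-Selected Features + 12 Profile Features, Test R² | Eleven Lasso-Selected Features + 12 Profile Features, Test RMSE | Five SCARS-PCA Features + 12 Profile Features, Training R² | Five SCARS-PCA Features + 12 Profile Features, Training RMSE | Five SCARS-PCA Features + 12 Profile Features, Test R² | Five SCARS-PCA Features + 12 Profile Features, Test RMSE |
---|---|---|---|---|---|---|---|---|---|---|---|---|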
RCV | 0.850 | 0.118 | 0.694 | 0.142 | 0.814 | 0.132 | 0.706 | 0.139 | 0.864 | 0.112 | 0.718 | 0.136 |
LR | 0.868 | 0.111 | 0.609 | 0.160 | 0.858 | 0.115 | 0.769 | 0.123 | 0.957 | 0.063 | 0.749 | 0.128 |
PLSR | 0.842 | 0.121 | 0.634 | 0.155 | 0.829 | 0.126 | 0.687 | 0.144 | 0.805 | 0.135 | 0.711 | 0.138 |
RF | 0.976 | 0.048 | 0.896 | 0.083 | 0.974 | 0.049 | 0.885 | 0.087 | 0.975 | 0.048 | 0.915 | 0.075 |
SVM | 0.912 | 0.091 | 0.785 | 0.119 | 0.916 | 0.088 | 0.815 | 0.110 | 0.921 | 0.086 | 0.774 | 0.122 |
CatBoost | 0.964 | 0.050 | 0.695 | 0.142 | 0.963 | 0.052 | 0.883 | 0.095 | 0.965 | 0.058 | 0.813 | 0.111 |
LightGBM | 0.971 | 0.052 | 0.808 | 0.113 | 0.948 | 0.069 | 0.760 | 0.126 | 0.955 | 0.064 | 0.723 | 0.135 |
ExtraTrees | 0.964 | 0.053 | 0.931 | 0.068 | 0.968 | 0.051 | 0.907 | 0.078 | 0.950 | 0.064 | 0.903 | 0.080 |
XGBoost | 0.948 | 0.069 | 0.874 | 0.091 | 0.941 | 0.071 | 0.892 | 0.084 | 0.953 | 0.061 | 0.888 | 0.086 |
Experiment | Alpha Configuration | Optimal Alpha | Selected Wavelength |
---|---|---|---|
1 | np.arange(0.01, 1, 0.01) | 0.01 | ['770', '772', '773', '774', '775', '1279'] |
2 | np.arange(0.001, 0.01, 0.0001) | 0.0014 | ['353', '767', '768', '769', '770', '1356', '1357', '1358', '1359'] |
3 | np.arange(0.0001, 0.001, 0.00001) | 0.00061 | ['353', '767', '768', '769', '770', '1356', '1357', '1358', '1359', '1360', '2486'] |
4 | np.arange(0.00001, 0.0001, 0.000001) | 0.000099 | ['353', '610', '611', '612', '613', '614', '1355', '1356', '1357', '1358', '1359', '1360', '2408'] |
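The table shows that smaller alpha values retain more wavelengths. The sketch below illustrates why, using a hand-rolled coordinate-descent Lasso on toy data. It is an assumption-laden illustration rather than the procedure used in the experiments: the data, the absence of an intercept, and the helper names are hypothetical, and how the optimal alpha was chosen within each grid is not stated in the table.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of band j with the residual that ignores band j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / scale
    return w


def select_bands(band_names, X, y, alpha):
    """Return the names of the bands whose Lasso coefficient is nonzero."""
    w = lasso_coordinate_descent(X, y, alpha)
    return [name for name, coef in zip(band_names, w) if coef != 0.0]


# Toy, already-centered data: three hypothetical bands, four samples.
bands = ["770", "1279", "2408"]
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.05, -1.95, 1.95, -2.05]  # 2 * band 770 + 0.05 * band 1279

for alpha in (0.1, 0.01):
    print(alpha, select_bands(bands, X, y, alpha))
```

On this toy data, alpha = 0.1 keeps only band 770, while alpha = 0.01 also retains the weakly informative band 1279. In practice a library implementation such as scikit-learn's `LassoCV`, which accepts an `alphas` grid, would replace the hand-written solver.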
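Algorithm | Lasso-Selected Features, Mean | Lasso-Selected Features, Standard Deviation | Lasso-Selected Features, Minimum | Lasso-Selected Features, Maximum | Lasso-Selected Features + 12 Profile Features, Mean | Lasso-Selected Features + 12 Profile Features, Standard Deviation | Lasso-Selected Features + 12 Profile Features, Minimum | Lasso-Selected Features + 12 Profile Features, Maximum |
---|---|---|---|---|---|---|---|---|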
RCV | 0.675 | 0.037 | 0.619 | 0.697 | 0.709 | 0.007 | 0.703 | 0.715 |
LR | 0.733 | 0.035 | 0.685 | 0.769 | 0.608 | 0.048 | 0.566 | 0.672 |
PLSR | 0.670 | 0.014 | 0.660 | 0.692 | 0.700 | 0.010 | 0.687 | 0.710 |
RF | 0.698 | 0.099 | 0.572 | 0.789 | 0.899 | 0.009 | 0.886 | 0.907 |
SVM | 0.771 | 0.036 | 0.727 | 0.802 | 0.731 | 0.038 | 0.675 | 0.756 |
CatBoost | 0.669 | 0.102 | 0.529 | 0.750 | 0.853 | 0.003 | 0.850 | 0.856 |
LightGBM | 0.627 | 0.020 | 0.602 | 0.651 | 0.739 | 0.055 | 0.704 | 0.819 |
ExtraTrees | 0.651 | 0.123 | 0.472 | 0.750 | 0.903 | 0.008 | 0.892 | 0.909 |
XGBoost | 0.653 | 0.099 | 0.513 | 0.728 | 0.904 | 0.010 | 0.892 | 0.914 |
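Algorithm | Five SCARS-PCA Features, Mean | Five SCARS-PCA Features, Standard Deviation | Five SCARS-PCA Features, Minimum | Five SCARS-PCA Features, Maximum | Five SCARS-PCA Features + 12 Profile Features, Mean | Five SCARS-PCA Features + 12 Profile Features, Standard Deviation | Five SCARS-PCA Features + 12 Profile Features, Minimum | Five SCARS-PCA Features + 12 Profile Features, Maximum |
---|---|---|---|---|---|---|---|---|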
RCV | 0.702 | 0.007 | 0.692 | 0.710 | 0.712 | 0.005 | 0.708 | 0.721 |
LR | 0.761 | 0.089 | 0.648 | 0.870 | 0.727 | 0.080 | 0.633 | 0.809 |
PLSR | 0.664 | 0.034 | 0.622 | 0.689 | 0.622 | 0.021 | 0.602 | 0.645 |
RF | 0.714 | 0.037 | 0.657 | 0.761 | 0.872 | 0.033 | 0.822 | 0.915 |
SVM | 0.749 | 0.049 | 0.670 | 0.791 | 0.761 | 0.009 | 0.745 | 0.770 |
CatBoost | 0.700 | 0.059 | 0.635 | 0.769 | 0.801 | 0.028 | 0.774 | 0.848 |
LightGBM | 0.649 | 0.079 | 0.525 | 0.725 | 0.675 | 0.033 | 0.646 | 0.725 |
ExtraTrees | 0.670 | 0.014 | 0.652 | 0.689 | 0.858 | 0.028 | 0.833 | 0.903 |
XGBoost | 0.742 | 0.063 | 0.645 | 0.820 | 0.870 | 0.022 | 0.835 | 0.889 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
He, S.; Tan, S.; Shen, L.; Zhou, Q. Soil Organic Matter Estimation Model Integrating Spectral and Profile Features. Sensors 2023, 23, 9868. https://doi.org/10.3390/s23249868