Soil Heavy-Metal Pollution Prediction Methods Based on Two Improved Neural Network Models
Abstract
1. Introduction
- (1) A new SA-FOA-BP neural network model is constructed to predict missing values of heavy metals in soil. The model combines the simulated annealing (SA) algorithm with the fruit fly optimization algorithm (FOA) and uses the combined SA-FOA, in place of traditional parameter-optimization methods, to optimize the parameters of the BP neural network, thereby addressing the shortcomings of the traditional BP neural network.
- (2) A spatial information fusion graph convolutional network prediction model, SE-GCN, is proposed. It establishes a spatial information encoder capable of perceiving spatial contextual information and combines the resulting embedding with spatial autocorrelation, which serves as auxiliary learning, to predict the heavy-metal content in soil.
- (3) Comparative experiments on predicting the missing values and the contents of heavy metals in soil were conducted with the two neural network models, and their application scenarios and applicability were analyzed. Compared with the traditional BP neural network, the SA-FOA-BP neural network model improved on the error evaluation metrics, indicating that SA-FOA effectively optimized the parameters. The SE-GCN model predicted soil heavy-metal contents more accurately than existing methods, suggesting that the spatial encoder effectively extracted spatial information.
2. Literature Review
2.1. Research on Missing Value Prediction
2.2. Prediction of Soil Heavy-Metal Content
3. Predicting Missing Soil Heavy-Metal Values Based on SA-FOA-BP
3.1. Adaptive Step-Size Fruit Fly Optimization Algorithm Based on Individual Differences
3.1.1. Steps and Problems of FOA
- (1) Randomly initialize the starting position of the fruit fly population, and set the maximum number of iterations maxgen and the population size.
- (2) Determine the position of each fruit fly from its flight direction and search step length.
- (3) Calculate the concentration judgement value Si, which is the reciprocal of the distance between the fruit fly's position and the origin.
- (4) Substitute Si into the fitness function, which is also the odor concentration judgement function, to obtain the odor concentration of the individual fruit fly.
- (5) Compare the odor concentrations across the fruit fly population and select the individual with the highest odor concentration as the optimal fruit fly position.
- (6) Save the best odor concentration value, find the coordinates of the individual with that value, and let the other fruit flies fly toward it.
- (7) Begin iterative optimization: repeat Steps 2–6 and check whether the odor concentration is better than that of the previous iteration. If it is better, continue the search; otherwise, keep the position of the previous generation of fruit flies and end the algorithm.
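The seven steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `foa_maximize`, the `step` parameter, the two-dimensional search space, and the test objective are all hypothetical, and the loop runs for a fixed `maxgen` generations (keeping the previous position when no improvement is found) rather than stopping at the first non-improving generation.

```python
import math
import random


def foa_maximize(smell, maxgen=200, pop_size=20, step=1.0, seed=0):
    """Basic FOA sketch: maximise smell(S), with S = 1 / distance to origin."""
    rng = random.Random(seed)
    # Step 1: random starting position of the swarm.
    x_axis, y_axis = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
    best_smell, best_s = -math.inf, None
    for _ in range(maxgen):
        flies = []
        for _ in range(pop_size):
            # Step 2: random flight direction and step around the swarm.
            x = x_axis + step * rng.uniform(-1.0, 1.0)
            y = y_axis + step * rng.uniform(-1.0, 1.0)
            # Step 3: judgement value S_i, the reciprocal of the distance to
            # the origin (the distance is floored to avoid division by zero).
            s = 1.0 / max(math.hypot(x, y), 1e-12)
            # Step 4: odor concentration of this fly.
            flies.append((smell(s), s, x, y))
        # Step 5: fly with the highest odor concentration in this generation.
        gen_smell, gen_s, gen_x, gen_y = max(flies)
        # Steps 6-7: store the best value and move the swarm only on improvement.
        if gen_smell > best_smell:
            best_smell, best_s = gen_smell, gen_s
            x_axis, y_axis = gen_x, gen_y
    return best_s, best_smell


# Hypothetical objective whose odor concentration peaks at S = 2.
best_s, best_smell = foa_maximize(lambda s: -(s - 2.0) ** 2)
```

Because every candidate is drawn within a fixed `step` of the current swarm position, the search can stall near a local optimum, which is the kind of weakness a combination with simulated annealing is meant to address.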
3.1.2. FOA Combined with Simulated Annealing Algorithm
- (1) Set the objective function f(x), the initial temperature t0, and the minimum temperature tmin.
- (2) Let the current temperature be t and the feasible solution be x. The current solution (initially the starting solution) is randomly perturbed to generate a new solution x′.
- (3) Calculate the energy difference ΔE = f(x′) − f(x). If ΔE < 0, accept the new solution, replace the old solution, and move to the next iteration; if ΔE ≥ 0, decide whether to keep the new solution with probability exp(−ΔE/t), according to the Metropolis criterion.
- (4) Repeat Steps 2–3, gradually reducing the temperature after a certain number of iterations. When t < tmin, stop iterating and output the optimal solution.
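The annealing loop described in these steps can be sketched as follows for a one-dimensional minimisation problem. It is a sketch under assumptions, not the paper's code: the geometric cooling factor `alpha`, the number of iterations per temperature, the perturbation width `step`, and the quadratic test objective are illustrative choices, and the acceptance rule is the standard Metropolis criterion for minimisation.

```python
import math
import random


def simulated_annealing(f, x0, t0=1.0, tmin=1e-3, alpha=0.9,
                        iters_per_temp=50, step=0.5, seed=0):
    """Minimise f starting from x0 with Metropolis acceptance."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    t = t0
    while t >= tmin:
        for _ in range(iters_per_temp):
            # Step 2: random perturbation of the current solution.
            x_new = x + rng.uniform(-step, step)
            f_new = f(x_new)
            # Step 3: energy difference and Metropolis criterion.
            delta = f_new - fx
            if delta < 0 or rng.random() < math.exp(-delta / t):
                x, fx = x_new, f_new
                if fx < best_fx:
                    best_x, best_fx = x, fx
        # Step 4: lower the temperature (geometric cooling is an assumption).
        t *= alpha
    return best_x, best_fx


# Hypothetical objective with its minimum at x = 3.
best_x, best_fx = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=0.0)
```

At high temperature the exponential term is close to 1, so worse solutions are accepted often and the search can leave a local optimum; as t falls the rule approaches plain greedy descent.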
3.2. SA-FOA-BP Model
- (1) Model initialization: Set the number of neurons in each layer of the BP neural network, namely m neurons in the input layer, n neurons in the hidden layer, and l neurons in the output layer. Set the initial position of the fruit fly population, the maximum number of iterations maxgen, and the population size.
- (2) SA-FOA optimization: The process is introduced in Section 3.1 and is not repeated here. In this study, the error function e of the BP neural network is taken as the objective function, which plays the role of the concentration value in the FOA; the solution vector S is the combined vector of the connection weights and threshold values of the BP network. When the SA-FOA terminates, the weight and threshold values in the solution vector S are taken as the weights and thresholds of the BP neural network for the current training samples.
- (3) BP neural network training: Assign the weight and threshold values obtained from the SA-FOA optimization to the BP neural network and train the BP neural network according to the preset parameters.
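To make the role of the solution vector S concrete, the sketch below decodes a flat vector into the weights and thresholds of an m-n-l network and evaluates the error e(S) that the optimizer would minimise. The function names, the ordering of the entries in S, the tanh hidden activation, the linear output layer, and the mean-squared-error form of e are all assumptions made for illustration; the source does not specify them.

```python
import math


def decode(s, m, n, l):
    """Split flat vector s into W1 (n x m), b1 (n), W2 (l x n), b2 (l)."""
    assert len(s) == n * m + n + l * n + l
    i = 0
    w1 = [s[i + r * m: i + (r + 1) * m] for r in range(n)]
    i += n * m
    b1 = s[i: i + n]
    i += n
    w2 = [s[i + r * n: i + (r + 1) * n] for r in range(l)]
    i += l * n
    b2 = s[i: i + l]
    return w1, b1, w2, b2


def forward(x, params):
    """One hidden layer with tanh units and a linear output layer."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


def error(s, samples, m, n, l):
    """Mean squared error e(S) over (input, target) pairs."""
    params = decode(s, m, n, l)
    total = 0.0
    for x, y in samples:
        pred = forward(x, params)
        total += sum((p - t) ** 2 for p, t in zip(pred, y))
    return total / len(samples)


# A 2-2-1 network needs 2*2 + 2 + 1*2 + 1 = 9 parameters.
s = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 2.0, 1.0, 0.5]
e = error(s, [([1.0, 2.0], [0.5])], m=2, n=2, l=1)
```

An optimizer such as SA-FOA would search over `s` to reduce `error(s, ...)`, and the best vector found would then be passed through `decode` to initialise the network before ordinary BP training.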
4. Prediction of Soil Heavy-Metal Content Using Graph Convolutional Networks Integrating Spatial Information
4.1. Design of Spatial Information Encoders
4.2. Spatial Autocorrelation for Aiding Learning
4.3. Construction of Graph Convolutional Networks Integrating Spatial Information
- (1) Determine the graph structure. Input the training dataset of soil heavy-metal sampling points, including all features of the spatial dimension, the pollution source dataset, and the sampling-point inputs, to form the input vector. Integrate the characteristics of the pollution sources to construct the adjacency matrix, and initialize the model parameters.
- (2) Perform spatial feature extraction. The spatial features of each sampling point are extracted by the graph convolutional network. The coordinate embedding matrix obtained from the spatial information encoder is concatenated with the node features extracted by the model to provide training data for the GCN operator.
- (3) Train the model further, output the Moran’s I index of spatial autocorrelation, and use it as auxiliary learning for the prediction results.
- (4) Train the SE-GCN model iteratively, calculating its loss function value at each iteration, until an optimal SE-GCN model is obtained.
- (5) Input the test set into the optimal SE-GCN model to predict the heavy-metal content in the soil and obtain the predicted values at the required points.
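Two building blocks of the procedure above can be illustrated in plain Python: a single graph-convolution step applied to node features concatenated with a coordinate embedding, and the global Moran's I statistic. This is only a sketch under assumptions. The coordinate embedding is passed in as a given matrix because the encoder's design is not described in this section; the propagation rule is the commonly used symmetric normalisation with self-loops; the adjacency matrix is reused as the spatial weight matrix for Moran's I; and how Moran's I enters the training loss is not shown.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, a common GCN propagation matrix."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(p, q):
    """Dense matrix product of two lists of lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def gcn_layer(a_hat, features, coord_embed, weight):
    """Concatenate node features with coordinate embeddings, then apply
    one propagation step ReLU(A_hat @ H @ W)."""
    h = [f + c for f, c in zip(features, coord_embed)]  # row-wise concat
    z = matmul(matmul(a_hat, h), weight)
    return [[max(0.0, v) for v in row] for row in z]


def morans_i(values, w):
    """Global Moran's I for node values and spatial weight matrix w."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    w_sum = sum(sum(row) for row in w)
    return (n / w_sum) * (num / den)


# Hypothetical path graph with four sampling points: 0-1-2-3.
path = [[0.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 1.0, 0.0],
        [0.0, 1.0, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0]]
smooth = morans_i([1.0, 2.0, 3.0, 4.0], path)        # neighbours are similar
alternating = morans_i([1.0, -1.0, 1.0, -1.0], path)  # neighbours differ
```

Positive values of Moran's I indicate that neighbouring points carry similar values and negative values indicate that they differ, which is why the statistic can act as a supervisory signal about spatial structure.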
5. Analysis of Experimental Results
5.1. Analysis of Prediction Results of Missing Soil Heavy-Metal Values
5.2. Analysis of Soil Heavy-Metal Prediction Results
5.3. Analysis of Two Improved Applications of Neural Network Models
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Heavy Metal Element | Sample Number | Average Actual Value (mg/kg) | SA-FOA-BP Average Predicted Value (mg/kg) | SA-FOA-BP Mean Relative Error | SA-FOA-BP Overall Mean Error | BP Average Predicted Value (mg/kg) | BP Mean Relative Error | BP Overall Mean Error |
---|---|---|---|---|---|---|---|---|
Hg | 1–10 | 0.191 | 0.208 | 19.46% | 16.35% | 0.183 | 27.57% | 27.601% |
Hg | 11–20 | 0.168 | 0.184 | 14.25% | | 0.159 | 26.86% | |
Hg | 21–30 | 0.172 | 0.167 | 18.36% | | 0.175 | 28.66% | |
Hg | 31–40 | 0.171 | 0.175 | 13.28% | | 0.156 | 27.32% | |
Cd | 1–10 | 0.367 | 0.365 | 17.14% | 16.37% | 0.332 | 27.31% | 26.58% |
Cd | 11–20 | 0.197 | 0.192 | 15.95% | | 0.186 | 26.21% | |
Cd | 21–30 | 0.274 | 0.257 | 16.44% | | 0.226 | 25.95% | |
Cd | 31–40 | 0.163 | 0.174 | 16.74% | | 0.175 | 26.83% | |
Pb | 1–10 | 23.251 | 24.459 | 15.84% | 14.69% | 24.248 | 27.24% | 26.389% |
Pb | 11–20 | 26.658 | 25.909 | 15.16% | | 26.876 | 26.32% | |
Pb | 21–30 | 27.685 | 28.440 | 13.39% | | 25.855 | 26.81% | |
Pb | 31–40 | 24.429 | 24.138 | 14.35% | | 24.274 | 25.14% | |
Element | | RMSE | MAE | R² |
---|---|---|---|---|
Hg | 0 | 0.0265 | 0.0007 | 0.857 |
Hg | 0.25 | 0.0207 | 0.0004 | 0.912 |
Hg | 0.5 | 0.0226 | 0.0005 | 0.896 |
Hg | 0.75 | 0.0223 | 0.0005 | 0.898 |
Ni | 0 | 1.095 | 4.973 | 0.837 |
Ni | 0.25 | 0.774 | 3.517 | 0.918 |
Ni | 0.5 | 0.818 | 3.763 | 0.909 |
Ni | 0.75 | 0.86 | 3.836 | 0.899 |
Element | Model | RMSE | MAE | R² |
---|---|---|---|---|
Hg | SE-GCN | 0.0207 | 0.0004 | 0.912 |
Hg | GCN | 0.0273 | 0.0007 | 0.848 |
Hg | DCNN | 0.0219 | 0.0005 | 0.902 |
Hg | Kriging | 0.0393 | 0.0015 | 0.686 |
Hg | RBF | 0.0370 | 0.0014 | 0.720 |
Ni | SE-GCN | 0.774 | 3.517 | 0.918 |
Ni | GCN | 2.394 | 5.733 | 0.871 |
Ni | DCNN | 2.722 | 7.41 | 0.834 |
Ni | Kriging | 3.512 | 12.335 | 0.723 |
Ni | RBF | 3.687 | 13.596 | 0.695 |
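For readers who want to reproduce this kind of comparison on their own data, the three column metrics can be computed as below. The sketch assumes the standard textbook definitions of RMSE, MAE, and R²; the source does not state the exact formulas the authors used, and the sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted contents (arbitrary units).
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
scores = (rmse(measured, predicted), mae(measured, predicted),
          r2(measured, predicted))
```

Lower RMSE and MAE and a higher R² indicate a closer fit between predicted and measured contents.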
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, Z.; Zhang, W.; He, Y. Soil Heavy-Metal Pollution Prediction Methods Based on Two Improved Neural Network Models. Appl. Sci. 2023, 13, 11647. https://doi.org/10.3390/app132111647