*Article* **Prediction of Heavy Metal Concentrations in Contaminated Sites from Portable X-ray Fluorescence Spectrometer Data Using Machine Learning**

**Feiyang Xia 1,2,† , Tingting Fan 1,2,†, Yun Chen 1,2, Da Ding 1,2 , Jing Wei 1,2, Dengdeng Jiang 1,2 and Shaopo Deng 1,2,\***


**Abstract:** Portable X-ray fluorescence (pXRF) spectrometers provide simple, rapid, nondestructive, and cost-effective analysis of the metal contents in soils. The current method for improving pXRF measurement accuracy is soil sample preparation, which inevitably consumes significant amounts of time. To eliminate the influence of sample preparation on pXRF measurements, this study evaluates how well pXRF measurements predict the contents of eight heavy metals using two machine learning models: linear regression (LR) and multivariate adaptive regression splines (MARS). Soil samples were collected from five industrial sites and separated into high-value and low-value datasets with pXRF measurements above or below the background values. The results showed that for Cu and Cr, the MARS models were better than the LR models at prediction (the MARS-R<sup>2</sup> values were 0.88 and 0.78; the MARS-RPD values were 2.89 and 2.11). For the pXRF low-value dataset, the multivariate MARS models improved the pXRF measurement accuracy, with the R<sup>2</sup> values increasing by 0.032 to 0.39 and the RPD values increasing by 0.02 to 0.37. For the pXRF high-value dataset, the univariate MARS models predicted the content of Cu and Cr with less calculation. Our study reveals that machine learning methods can better predict the Cu and Cr of large samples from multiple contaminated sites.

**Keywords:** site investigation; in situ pXRF; multivariate adaptive regression splines (MARS); heavy metals; rapid field screening

### **1. Introduction**

Heavy metals are indestructible and non-biodegradable. They can accumulate in living organisms through biomagnification and bioaccumulation and can be present in high amounts in the environment, which poses potential risks to human health and the environment [1–4]. Heavy metals can cause adverse effects on humans through the inhalation of respirable dust particles, the ingestion of foods from living organisms exposed to heavy metals, and dermal absorption [1–4].

Portable X-ray fluorescence (pXRF) spectrometers can provide simple, rapid, nondestructive, and cost-effective analysis of the metal contents in soils and have been widely used to assess environmental risks, predict soil properties, and evaluate soil fertility, among other uses [5–9]. According to the Chinese Standard Technical Guidelines for the Investigation on Soil Contamination of Land for Construction [10], the heavy metal rapid detector is recommended for the qualitative and quantitative analysis of heavy metals in soils in situ. The pXRF instrument can help to guide the selection of samples to be analyzed in the laboratory and make investigative and remediative decisions [11,12].

**Citation:** Xia, F.; Fan, T.; Chen, Y.; Ding, D.; Wei, J.; Jiang, D.; Deng, S. Prediction of Heavy Metal Concentrations in Contaminated Sites from Portable X-ray Fluorescence Spectrometer Data Using Machine Learning. *Processes* **2022**, *10*, 536. https://doi.org/10.3390/pr10030536

Academic Editors: Guining Lu, Zenghui Diao and Kaibo Huang

Received: 23 January 2022 Accepted: 28 February 2022 Published: 9 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The pXRF instrument performs the qualitative and quantitative analysis of soil properties through X-ray fluorescence intensities. Normally, X-ray fluorescence intensities are used to evaluate elemental concentrations, mostly using fundamental parameters (FP), empirical coefficients methods, or Compton peak ratios [13], based on the assumptions of sample homogeneity, a flat surface, negligible particle size effects, and a priori knowledge of the sample matrix composition [14]. Therefore, some factors, such as physical matrix effects (e.g., particle size, homogeneity, surface conditions), moisture content, and chemical matrix effects (e.g., the presence of iron reduces Cu but enhances Cr measurements), influence the accuracy of the measurements [13,15]. The current method for improving pXRF measurement accuracy is the preparation of soil samples, including screening, grinding, drying, etc. [13,16–18]. Method 6200 [13] recommends that, to obtain high-quality data, samples should be dried for 2 to 4 h in a convection or toaster oven at a temperature not greater than 150 °C, and then ground with a mortar and pestle and passed through a 60-mesh sieve to achieve a uniform particle size. Sample grinding should continue until at least 90 percent of the original sample passes through the sieve.

In practice, in most site investigations, the pXRF instrument directly measures heavy metal contents without sample preparation. To eliminate the influence of sample preparation on pXRF measurements, models have been used to correct pXRF measurements through the correlation between pXRF measurements and laboratory concentrations. Linear regression (LR) models are commonly used to evaluate the accuracy of pXRF measurements [12,16,17]. Caporale et al. defined metal-based linear models predicting laboratory concentrations from pXRF measurements for two case studies (agricultural and industrial sites) [19]. Their linear regressions revealed strong variability among their studied metals, providing good correlations only for Cu, Pb, and Zn at both sites [19]. For most of the metals, each metal-regression line significantly differed between the two case studies, indicating the site-dependence of the regression fits [19]. Chen et al. built a general modeling method and process based on the relationship between pXRF measurements and site parameters (organic matter and water content) to construct pXRF correction models, which could improve each site's measurement accuracy [20]. The error in heavy metal pXRF measurements decreased from 22.9–75.7% to 9.6–26.9% and showed that models can be used to improve pXRF measurements for Pb, Zn, Fe, and Mn [20]. The results also indicated it is difficult to develop a model that is suitable for every site, because of the particularity of different sites [20]. In addition to site-specific models, some models were built on a large scale. Adler et al. adopted the machine learning method multiple linear regression (MLR), multivariate adaptive regression spline (MARS), and random forest (RF) to create national prediction models for Cu, Zn, and Cd concentrations in agricultural soil [21]. 
Predictive models using pXRF measurements were created and found to be applicable at the farm and national scales, and the results showed that the MLR model had good performance for predicting Zn, while the MARS model had better performance in the prediction of Cu and Cd in small-scale farmland [21].

In general, although sample preparations could improve pXRF measurement accuracy, they inevitably consume significant amounts of time and preclude the rapid selection of samples to be analyzed in the laboratory. Models have been used to correct pXRF measurements, but they were mainly site-specific models. Studies of predictive models for multiple industrial sites are still limited. Machine learning methods were successfully used in national agricultural soils in Sweden [21]. However, unlike agricultural soils, large variability in the spatial distribution and content of metals is generally recognized in the anthropogenically polluted soils of industrial sites [22]. Therefore, machine learning methods, including LR and non-linear MARS models, were explored to predict eight heavy metals (Cr, Ni, Cu, Zn, As, Cd, Hg, and Pb) in soil samples from five industrial sites. Laboratory concentrations were used to evaluate the predictive performance. In this study, the objectives were to (a) build prediction models of each heavy metal for samples from multiple industrial sites, (b) compare the performance of LR and MARS models for each heavy metal, and (c) examine the models' performance when predicting heavy metals above or below the natural background values (BVs).

### **2. Materials and Methods**

### *2.1. Soil Sampling and pXRF Rapid Measurement*

The study is based on five site investigations; the industrial sites were formerly used as fertilizer factories, pesticide factories, or steel plants. Soil samples were collected according to the standard of Technical Specification for Soil Environmental Monitoring [23] and, subsequently, the pXRF was used to analyze the contents of heavy metals. After removing the gravel and other debris in soil samples, samples were put into transparent polyvinyl chloride (PVC) plastic bags, tamped, and flattened to ensure their thicknesses were at least 15 mm; they were then tested by pXRF in soil mode for at least 60 s.

In this study, the pXRF instruments included Explorer 9000 (Jiangsu Skyray Instrument Co., Ltd., Kunshan, China), X-MET7000 (Oxford Instruments, Shanghai, China), VANTA-VLW (Olympus, Center Valley, PA, USA), DP-4050 (Olympus, Center Valley, PA, USA), and VANTA-VCA (Olympus, Center Valley, PA, USA). A different pXRF instrument was used at each site, giving a total of five instrument types; the limits of detection for each heavy metal are shown in Table 1. All operators were well trained, and the procedure followed the manufacturer's instructions and the recommendations of Method 6200 [13]. Therefore, the influences of pXRF instruments were neglected and not discussed.


**Table 1.** Limits of detection (LODs) of each pXRF instrument (mg/kg).

### *2.2. Laboratory Analyses*

Soil samples were sent to laboratories to analyze Cr, Cu, Pb, As, Ni, Cd, Zn, and Hg concentrations. The analytical methods of each heavy metal are shown in Supplementary Table S1. The soil samples were air-dried, ground, and passed through a 100-mesh sieve, and then Cr, Ni, Cu, Zn, Cd, and Pb in soil samples were extracted by HCl-HNO<sub>3</sub>-HF-HClO<sub>4</sub> electric heating plate digestion. The Hg and As were extracted by aqua regia water bath digestion.

### *2.3. Data Preprocessing Method*

Before performing the regression analysis, outliers were removed based on the box-and-whiskers plot [24] and calculated in Python 3.7, according to the following upstream criteria:
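The exact criteria are not reproduced in this text. As a minimal sketch only, the block below assumes the conventional box-plot fences (Tukey's rule with a multiplier of 1.5 on the interquartile range); the function name `remove_outliers_iqr`, the multiplier `k`, and the sample readings are hypothetical and are not taken from the study.

```python
import statistics

def remove_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (assumed box-plot fences)."""
    # Quartiles computed with the inclusive method (data treated as the full population).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]

# Hypothetical pXRF readings (mg/kg); 950.0 is an obvious outlier.
readings = [12.0, 15.0, 14.0, 13.0, 16.0, 15.5, 14.5, 950.0]
cleaned = remove_outliers_iqr(readings)
print(cleaned)
```

In a paired dataset, the laboratory value belonging to a removed pXRF reading would need to be dropped as well so that the two series stay aligned.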


### *2.4. Statistical Method*

Descriptive statistics (including mean, standard deviation, and coefficient of variation) were calculated in Python 3.7.

Pearson correlation coefficients between the pXRF measurements and the laboratory concentrations were calculated in SPSS 20.0.0. The Pearson correlation indicates the linearity between two parameters, and it is generally believed that a coefficient of 0.8–1.0 indicates a very strong correlation; 0.6–0.8, a strong correlation; 0.4–0.6, a moderate correlation; 0.2–0.4, a weak correlation; and 0.0–0.2, a very weak correlation or no correlation [25].
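The study computed these coefficients in SPSS; the dependency-free sketch below only illustrates the same calculation and the interpretation bands. The function names and the treatment of band boundaries (each lower bound inclusive) are assumptions made for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_strength(r):
    """Map |r| to the qualitative bands quoted above (lower bounds inclusive)."""
    magnitude = abs(r)
    if magnitude >= 0.8:
        return "very strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "very weak or none"

# Hypothetical paired pXRF and laboratory values (mg/kg).
r = pearson_r([10.0, 20.0, 30.0, 40.0], [12.0, 19.0, 33.0, 41.0])
print(round(r, 3), correlation_strength(r))
```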

The geo-accumulation index (*I<sub>geo</sub>*) is widely used to estimate the magnitude of anthropogenic activities [19]. *I<sub>geo</sub>* was originally proposed by Müller [26] and can be calculated as follows:

$$I_{\rm geo} = \log_2\left(\frac{C_n}{1.5B_n}\right) \tag{1}$$

where *C<sub>n</sub>* is the metal content determined in the laboratory (mg/kg) and *B<sub>n</sub>* is the background concentration (mg/kg); the factor 1.5 accounts for natural fluctuations and very small anthropogenic influences. According to Müller [27], categories based on *I<sub>geo</sub>* were established as follows: unpolluted (*I<sub>geo</sub>* ≤ 0), unpolluted-to-moderately-polluted (0 < *I<sub>geo</sub>* ≤ 1), moderately polluted (1 < *I<sub>geo</sub>* ≤ 2), moderately-to-heavily-polluted (2 < *I<sub>geo</sub>* ≤ 3), heavily polluted (3 < *I<sub>geo</sub>* ≤ 4), heavily-to-extremely-polluted (4 < *I<sub>geo</sub>* ≤ 5), and extremely polluted (*I<sub>geo</sub>* > 5).
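As a worked illustration of Equation (1) and the categories above, the following sketch computes the index and assigns a class. The function names and the example concentrations are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

# Upper bounds of the classes listed above; values above 5 fall in the last class.
IGEO_CLASSES = [
    (0, "unpolluted"),
    (1, "unpolluted to moderately polluted"),
    (2, "moderately polluted"),
    (3, "moderately to heavily polluted"),
    (4, "heavily polluted"),
    (5, "heavily to extremely polluted"),
]

def igeo(c_n, b_n):
    """Geo-accumulation index from Equation (1); both inputs in mg/kg."""
    return math.log2(c_n / (1.5 * b_n))

def igeo_class(value):
    """Return the pollution category whose upper bound first covers the value."""
    for upper, label in IGEO_CLASSES:
        if value <= upper:
            return label
    return "extremely polluted"

# Hypothetical example: 150 mg/kg measured against a 50 mg/kg background,
# so the ratio inside the logarithm is 150 / 75 = 2.
example = igeo(150.0, 50.0)
print(example, igeo_class(example))
```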

### *2.5. Prediction Model*

### 2.5.1. Model Introduction

### Linear Regression

As one of the most basic machine learning methods, the LR model is widely used in various fields. Linear regression is a statistical method used to determine the quantitative relationship between two or more variables. The optimal parameters of the model are calculated by the least squares method.

### Multivariate Adaptive Regression Spline Model

The MARS model is a spline regression method that can adaptively process high-dimensional data; it was proposed by the statistician Jerome Friedman in 1991 [28]. It is a nonparametric statistical method based on a divide-and-conquer strategy in which the training data sets are partitioned into separate piecewise linear segments (splines) of differing gradients (slope). MARS makes no assumptions about the underlying functional relationships between dependent and independent variables. In general, the splines are connected smoothly together, and these piecewise curves (polynomials), also known as basis functions, result in a flexible model that can handle both linear and nonlinear behavior [29,30].
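For reference, the general MARS form can be written as a weighted sum of hinge basis functions. This is the standard textbook formulation rather than the specific models fitted in this study, and the symbols $\beta_m$, $B_m$, $t$, and $M$ are notation introduced here for illustration.

$$\hat{f}(x) = \beta_0 + \sum_{m=1}^{M} \beta_m B_m(x), \qquad B_m(x) \in \{\max(0,\, x - t),\ \max(0,\, t - x)\}$$

Here $t$ is a knot at which the slope of the fitted line may change, and the coefficients $\beta_m$ are estimated by least squares. In a multivariate model, a basis function may also be a product of hinge functions on different predictors.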

### Univariate and Multivariate Models

In this study, univariate LR and MARS and multivariate MARS models were adopted. The univariate model used pXRF measurements of one heavy metal as the predictor and realized prediction of its corresponding heavy metal content in soil samples. By contrast, the multivariate model used pXRF measurements of eight heavy metals as the predictors. The MARS model was used to build the multivariate model, since it allowed missing values in the predictors while the LR model did not.

### 2.5.2. Model Prediction and Validation Process

The model process is presented in Figure 1. Leave-one-out cross-validation (LOOCV) was adopted to evaluate the model's performance [7,31]. If the size of the dataset was N, then N − 1 samples were used for training, and the remaining sample was used for validation. Each sample was used once for validation until all samples had been validated, so the model was fitted a total of N times. LOOCV is suitable for small datasets, helps to guard against over-fitting, and evaluates the model's generalization ability.
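The leave-one-out procedure can be sketched without external libraries as shown below. The study itself used Scikit-learn and Py-earth; this block only mirrors the procedure for a univariate linear model, and the function names and data values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def loocv_predictions(x, y):
    """Predict each sample from a model trained on the other N - 1 samples."""
    predictions = []
    for i in range(len(x)):
        # Hold out sample i and train on the remainder.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        predictions.append(intercept + slope * x[i])
    return predictions

# Hypothetical paired values (mg/kg); here lab = pXRF + 2 exactly.
pxrf = [10.0, 20.0, 30.0, 40.0, 50.0]
lab = [12.0, 22.0, 32.0, 42.0, 52.0]
preds = loocv_predictions(pxrf, lab)
print(preds)
```

The held-out predictions collected this way are the values compared against the laboratory concentrations when the validation statistics are computed.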

The linear model was implemented with Scikit-learn 0.22.1 in Python 3.7, and the MARS model was from Py-earth 0.1.0. The LOOCV was from LeaveOneOut in Scikit-learn 0.22.1.

**Figure 1.** Flowchart of the prediction model.

### 2.5.3. Model Evaluation

Three parameters were used to evaluate the predictive accuracy of the models: the coefficient of determination (R<sup>2</sup>), the root mean squared error (RMSE) of prediction, and the ratio of percent deviation (RPD). The R<sup>2</sup> value reflects the stability of model establishment and verification; the closer the R<sup>2</sup> value is to 1, the better the model. A model with an R<sup>2</sup> value greater than 0.7 was generally considered good [32]. The smaller the RMSE, the more stable the model's performance. RPD is the ratio of the standard deviation of the validation data to the RMSE of the predictions and was used to judge the model's predictive ability. When RPD < 1.4, the model could not realize prediction; when 1.4 ≤ RPD < 2.0, the model had fair predictive performance and could be used to perform rough predictions; when RPD ≥ 2.0, the model had excellent predictive ability [33].
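The three statistics can be computed as in the sketch below. The function names and sample values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) for RPD is an assumption, since the text does not state which form was used.

```python
import math
import statistics

def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean squared error of the predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def rpd(observed, predicted):
    """Standard deviation of the validation data divided by the RMSE.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return statistics.stdev(observed) / rmse(observed, predicted)

def rpd_class(value):
    """Thresholds quoted above: < 1.4, 1.4 to < 2.0, and >= 2.0."""
    if value < 1.4:
        return "not predictive"
    if value < 2.0:
        return "rough prediction"
    return "excellent"

# Hypothetical laboratory values and held-out predictions (mg/kg).
lab_values = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_values = [1.1, 1.9, 3.2, 3.8, 5.0]
print(r2_score(lab_values, predicted_values),
      rmse(lab_values, predicted_values),
      rpd_class(rpd(lab_values, predicted_values)))
```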

### **3. Results**

### *3.1. Descriptive Statistics of pXRF-Measured Data and Laboratory-Analyzed Data*

The descriptive statistics and coefficients of variation (CVs) of the pXRF measurements and laboratory concentrations of each heavy metal are presented in Table 2. The heavy metals As, Pb, and Cu had large sample sizes (2721, 2502, and 2232, respectively). The average concentrations of Cr, Ni, Cu, Zn, As, Cd, Hg, and Pb measured by pXRF were 102.95, 23.59, 47.18, 82.30, 10.81, 2.31, 0.38, and 27.20 mg/kg, respectively, while the corresponding average laboratory concentrations were 121.61, 32.83, 57.93, 125.21, 11.87, 0.11, 0.13, and 36.14 mg/kg. The pXRF averages were therefore smaller than the laboratory averages for every metal except Cd and Hg.

The CVs of the pXRF measurements were comparable to those of the laboratory concentration of each heavy metal. The CVs of the pXRF measurements and laboratory concentrations of Cr, Cu, Zn, Cd, Hg, and Pb were greater than 1, which indicated higher variation, and that these heavy metals were greatly affected by anthropogenic influences [31]. Apart from Cd, no significant difference in metal concentration between the pXRF-measured data and the laboratory-measured data was observed for other metals, indicating that in situ pXRF can be reliably used to investigate the concentrations of heavy metals. For Cd, the statistical characteristics of the concentrations differed between the pXRF-measured data and the laboratory-measured data. These results may be explained by the limited sensitivity of the pXRF instrument for Cd (detection limit of 2 mg/kg).


**Table 2.** Statistical characteristics of pXRF and laboratory-analyzed results (mg/kg).

### *3.2. Univariate LR and MARS Model Predictive Results*

### 3.2.1. Predictive Results of Soil Samples from the Whole pXRF-Measured Dataset

The pXRF measurements of each heavy metal were used to predict the contents analyzed in the laboratory through the univariate LR and MARS model, according to the modeling process in Section 2.5.2. The predicted contents against the laboratory concentrations are shown in Figure 2.

The R<sup>2</sup> and RPD values of the MARS models for predicting Cr (0.88, 2.89) and Cu (0.77, 2.11) were larger than those of the LR models for Cr (0.8, 2.22) and Cu (0.73, 1.94), which indicated that the MARS models were better than the LR models at predicting Cu and Cr. For the other six heavy metals, the R<sup>2</sup> values of the LR and MARS models were smaller than 0.7, and their RPD values were smaller than 1.4, indicating that the LR and MARS models could not be used for predicting them. The fitness of the LR model for predicting Cr and Cu, and that of the MARS model for predicting Cu, were consistent with other research [12,20,21].

**Figure 2.** Concentrations of heavy metals predicted from each pXRF measurement using LR and MARS models against laboratory concentrations of the whole dataset. The red dashed line is the 1:1 line, the black line is the regression line, and the points are semi-transparent to show point density. (**a**–**h**) Prediction results of LR models. (**i**–**p**) Prediction results of MARS models.

### 3.2.2. Predictive Results of Samples in pXRF High-Value and Low-Value Datasets

Considering the need to accurately select high concentrations of heavy metal samples to be analyzed in the laboratory and the fact that contaminated industrial sites were primarily impacted by human activities, the first level in the Environmental Quality Standards for Soils [34] was used as the BV to divide the pXRF dataset into two parts (Table 3). The samples of pXRF measurements larger than the BV were classified into the pXRF high-value dataset, and the samples of the pXRF measurements that were lower than the BV were classified into the pXRF low-value dataset. Their corresponding laboratory-analyzed data were also divided into two datasets. The detailed statistical characteristics of the pXRF low-value and high-value datasets are shown in Supplementary Tables S2 and S3. The models were trained to predict heavy metal concentrations for samples in the pXRF high-value and the pXRF low-value datasets separately, and the predicted results are presented in Figures 3 and 4.

**Table 3.** Natural background value of each heavy metal and the concentration range of the two datasets.


The results of the pXRF low-value dataset showed that the R<sup>2</sup> and RPD values of the LR and MARS models were smaller than 0.1 and 1.4, which indicated that the models could not predict the concentrations of each heavy metal through the pXRF measurements (Figure 3).

For the pXRF high-value dataset (Figure 4), the R<sup>2</sup> and RPD values of the MARS models for predicting Cr (0.88, 2.84) and Cu (0.79, 2.18) were larger than those of the LR models for Cr (0.8, 2.22) and Cu (0.75, 2.00), which indicated that the MARS models were better than the LR models at predicting Cu and Cr. However, neither the LR model nor the MARS model was suitable for predicting the concentrations of the other six heavy metals for the samples in the pXRF high-value dataset.

In Figure 4a,b,i,j, when the laboratory concentrations of Cu and Cr were smaller than 2000 mg/kg, the LR model had more accurate predictive results than the MARS model, since the black points in the LR model were closer to the 1:1 line (the closer the points were to the 1:1 line, the more closely the predicted results matched the laboratory concentrations). When the laboratory concentrations were greater than 2000 mg/kg, the MARS model had more accurate predictive results than the LR model, and the predicted points were closer to the 1:1 line in the MARS model than in the LR model. The same results were also found for the samples from the whole dataset when predicting Cu and Cr (Figure 2).

**Figure 3.** Concentrations of heavy metals predicted from each pXRF measurement using LR and MARS models against laboratory values of samples in the pXRF low-value dataset. The red dashed line is the 1:1 line, the black line is the regression line, and the points are semi-transparent to show point density. (**a**–**h**) Prediction results of LR models. (**i**–**p**) Prediction results of MARS models.

**Figure 4.** Concentrations of heavy metals predicted from each pXRF measurement using LR and MARS models against laboratory values of samples in the pXRF high-value dataset. The red dashed line is the 1:1 line, the black line is the regression line, and the points are semi-transparent to show point density. (**a**–**h**) Prediction results of LR models. (**i**–**p**) Prediction results of MARS models.

### *3.3. Multivariate MARS Model Predictive Results*

### 3.3.1. Predictive Results of Samples in the pXRF Low-Value Dataset

Unlike the univariate MARS model, which used the pXRF measurement of one heavy metal as the predictor, the multivariate MARS model also used the pXRF measurements of the other seven heavy metals as predictors. We explored whether the increase in the number of predictors could improve the predicted results. The results of the multivariate MARS models for predicting the contents of Cr, Cu, Pb, As, Ni, Cd, Zn, and Hg in the samples from the pXRF low-value dataset are shown in Table 4.

**Table 4.** Validation statistics for the predictive results of heavy metals in samples from the pXRF low-value dataset, and of Cr and Cu in samples from the pXRF high-value dataset, using univariate and multivariate MARS models.


Comparing the results between the univariate MARS and multivariate MARS models, the R<sup>2</sup> values increased by 0.032 to 0.39, the RPD values increased by 0.02 to 0.37, and the RMSE decreased by 0.22 to 4.73. The results showed that the predictive performance of the multivariate MARS models significantly improved for the heavy metals, except for Cd.

The multivariate MARS model had the best predictive ability for Cu (R<sup>2</sup> and RPD values were 0.51, 1.43, respectively). Compared with the univariate predictive result of Cu (R<sup>2</sup> and RPD values were 0.12, 1.06, respectively), the R<sup>2</sup> and RPD values of the multivariate MARS model increased by 0.39 and 0.37. However, for the other heavy metals, the RPD values were less than 1.4, indicating that the predictive abilities of the multivariate models for the other heavy metals were still limited.

### 3.3.2. Predictive Results of Samples in the pXRF High-Value Dataset

Given that the univariate MARS model performed well for Cr and Cu in the samples from the pXRF high-value dataset, Cr and Cu were selected to be predicted by the multivariate MARS models, and the results are presented in Table 4. The results showed that the predictive abilities of the univariate and multivariate MARS models for Cr were similar (the R<sup>2</sup> and RPD values were 0.87 and 2.80 for the univariate model and 0.88 and 2.92 for the multivariate model). For Cu, the univariate MARS model was better than the multivariate MARS model at prediction (the R<sup>2</sup> and RPD values were 0.79 and 2.18 versus 0.71 and 1.84, respectively).

Overall, the multivariate MARS model offered only a slight improvement over the univariate MARS model for Cr and no improvement for Cu.

### **4. Discussion**

### *4.1. Influences of pXRF's Accuracy on Model's Predictive Results*

The high accuracy of the pXRF instrument when measuring heavy metals resulted in strong linearity between the pXRF measurements and laboratory concentrations. For the univariate models, the linear correlations between the pXRF measurements and laboratory concentrations were related to the predictive performance of the models, especially the LR models. This study predicted Cu and Cr from the corresponding pXRF measurements, while the models could not predict the other heavy metals. The Pearson correlation coefficients showed the linearity between the pXRF measurements and the corresponding laboratory concentrations, which were 0.9 and 0.88 for Cu and Cr, respectively (Figure 5). The coefficients of Cu and Cr were larger than those of the other heavy metals, which were smaller than 0.8 (Figure 5), and could explain the excellent predictive performance of the models.

**Figure 5.** Pearson correlation between pXRF and laboratory value of samples in the whole dataset.

The accuracy of the pXRF instrument differed among heavy metals. The excellent linearity between the pXRF measurements and the laboratory concentrations of Cu coincided with the research of Kilbride et al. and Potts et al. [12,35]. Kilbride et al. measured Cu over a range from 3 to 5140 mg/kg, which was similar to the range in the current study (4–5000 mg/kg), and found good accuracy of pXRF for Cu [12]. Potts et al. found that pXRF was not sufficiently sensitive for the determination of Cu at concentrations lower than 200 mg/kg [35]. Therefore, the wide range of Cu in our study could be accurately measured by pXRF. Some research also found a strong linear correlation between pXRF measurements and laboratory concentrations of As [12,16], whereas the results for As in the present study did not echo those of previous studies. Tian et al. found a weak correlation between the pXRF and laboratory data of As that might have been attributable to the narrow range of concentrations [36]. The range of As in our research was relatively narrow compared with those of Cu or Cr, which might be the reason for the poorer linearity of As. Another possible explanation for the poor linearity of As is the presence of Pb. Some research indicated that high concentrations of Pb could compromise the pXRF's precision for As [37,38], since Pb and As X-rays cause spectral interferences and affect each other during measurement [13].

The low Pearson coefficients of Ni and Cd (0.37 and 0.07, respectively) indicated the poor accuracy of pXRF for Ni and Cd, which was also found by Kilbride et al. [12]. For Hg, most soil samples (75%) had laboratory concentrations smaller than 0.1 mg/kg (Table 2), which were below the pXRF detection limit of 0.8 mg/kg (Table 1). Similarly, for Cd, more than 75% of the samples had laboratory concentrations below the pXRF detection limit of 2.2 mg/kg (Tables 1 and 2). Therefore, poor accuracy of pXRF was found for Hg and Cd [17].
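The detection-limit argument for Hg and Cd can be checked with a short calculation. The following is a minimal sketch, not the authors' code: the function name `fraction_below_limit` and the sample values are hypothetical, and only the 0.8 mg/kg limit for Hg is taken from the text.

```python
import numpy as np


def fraction_below_limit(lab_values, detection_limit):
    """Return the share of samples whose laboratory concentration
    lies below the instrument's detection limit."""
    lab = np.asarray(lab_values, dtype=float)
    return float(np.mean(lab < detection_limit))


# Hypothetical Hg laboratory concentrations in mg/kg (illustrative only,
# not data from the study).
hg_lab = [0.02, 0.05, 0.08, 0.09, 0.03, 0.07, 1.20, 2.50]
HG_DETECTION_LIMIT = 0.8  # mg/kg, the pXRF limit for Hg quoted in the text

hg_share = fraction_below_limit(hg_lab, HG_DETECTION_LIMIT)
print(f"Share of Hg samples below the detection limit: {hg_share:.2f}")
```

When this share is large, most pXRF readings for that metal carry little information, which is consistent with the low coefficients reported for Hg and Cd.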

### *4.2. Influences of Concentration on Model's Predictive Results*

Much research has confirmed that wider ranges of concentrations result in stronger linearity between pXRF measurements and laboratory concentrations for Cu and Zn. The smaller the metal concentration in the soil sample, the larger the difference between the pXRF measurement and the laboratory concentration [36,39]. Li et al. also found that the accuracy of the pXRF instrument was high when the concentrations of Cu and Cr were greater than the first standard in the Environmental Quality Standards for Soils [34,40], which was used to separate the pXRF high-value and low-value datasets. Therefore, high-concentration soil samples would result in better predictive model performance than low-concentration samples. The current study also confirmed the different predictive results between the pXRF high-value dataset and the pXRF low-value dataset. The R<sup>2</sup> and RPD values of the univariate prediction models for the samples from the pXRF high-value dataset were larger than those for the samples from the pXRF low-value dataset (Figures 3 and 4).
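For readers who wish to reproduce the two validation statistics, the sketch below computes them under stated assumptions: R<sup>2</sup> is taken as the coefficient of determination, and RPD as the sample standard deviation of the laboratory values divided by the RMSE of prediction. The exact variants used in the study may differ, and the function names and numbers are hypothetical.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of observations / RMSE
    (assumed definition, using the sample standard deviation)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((obs - pred) ** 2))
    return float(np.std(obs, ddof=1) / rmse)


# Hypothetical laboratory and predicted concentrations in mg/kg.
lab = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"R2 = {r_squared(lab, predicted):.3f}, RPD = {rpd(lab, predicted):.2f}")
```

Because RPD scales the prediction error by the spread of the reference data, a dataset with a wide concentration range can reach a higher RPD than a narrow-range dataset with the same absolute error.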

Although the high-concentration samples had good predictive results, the univariate predictive results for the samples from the whole dataset and the pXRF high-value dataset were not significantly different. Although the sample size of the pXRF low-value dataset was larger than that of the pXRF high-value dataset, the low-concentration data had little influence on the prediction model. The high-concentration data, especially some abnormally high-concentration data, were found in contaminated sites and usually came from anthropogenic activities. These data were few, but they stretched the x-axis and clustered the small-value data together, so that the small-value data exerted little influence on the predictive results. Thus, the univariate models made no obvious difference to the prediction of heavy metal contents between the samples from the pXRF high-value and whole datasets.

The results showed that the univariate models had a similar predictive ability for the samples from the pXRF high-value and whole datasets. Given the need to investigate high-concentration data in site investigations, and the fact that the soil samples with concentrations above the BVs were fewer than the soil samples with concentrations below the BVs, pXRF could help select high-concentration data (above the BVs) to train the models with fewer calculations.
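Selecting the training subset by background value amounts to a simple threshold on the pXRF reading. The sketch below is illustrative only: the function name, the sample values, and the background value of 100 mg/kg are hypothetical, and assigning readings exactly equal to the BV to the low-value subset is an assumption.

```python
import numpy as np


def split_by_background(pxrf, lab, background_value):
    """Split paired pXRF/laboratory data into high-value and low-value
    subsets according to whether the pXRF reading exceeds the BV."""
    pxrf = np.asarray(pxrf, dtype=float)
    lab = np.asarray(lab, dtype=float)
    is_high = pxrf > background_value  # readings equal to the BV go to "low"
    high = (pxrf[is_high], lab[is_high])
    low = (pxrf[~is_high], lab[~is_high])
    return high, low


# Hypothetical paired measurements in mg/kg.
pxrf_cu = [10.0, 50.0, 120.0, 300.0]
lab_cu = [12.0, 45.0, 130.0, 280.0]

high_set, low_set = split_by_background(pxrf_cu, lab_cu, background_value=100.0)
print("High-value pXRF readings:", high_set[0])
print("Low-value pXRF readings:", low_set[0])
```

Only the smaller high-value subset then needs to be passed to the model-fitting step.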

### *4.3. Comparison between LR and MARS Models*

In this study, the MARS models showed less bias at high concentrations than the LR models (Figures 2 and 4). Adler et al. reported the same result: MARS models had the least negative bias when predicting Cu and Cd at higher concentrations compared to MLR and RF models [21]. The better predictive ability of the MARS model in high-concentration ranges may be explained by the accuracy of the pXRF instrument and the advantage of the MARS model as a nonlinear model. The higher the concentrations of heavy metals in samples, the more accurate the pXRF instrument [36,39]. Therefore, it could be inferred that the linear relation between the pXRF-measured data and the laboratory-analyzed data of the heavy metals differed between high- and low-concentration samples. The linear relationship was stronger in the samples with high concentrations than in those with low concentrations, and the MARS model could build different linear models for different concentration ranges. This could help explain why the MARS model performed better than the LR model: it created more than one linear model for predicting the concentrations of Cu and Cr.
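The idea of fitting different linear models over different concentration ranges can be illustrated with a single hinge function. The sketch below is a deliberately reduced stand-in for MARS, not the algorithm used in the study: it fits an intercept, a linear term, and one hinge term max(0, x - knot) by least squares and picks the knot with the smallest squared error, whereas a full MARS implementation selects several mirrored hinge pairs by forward selection and backward pruning. All names and data are hypothetical.

```python
import numpy as np


def hinge(x, knot):
    """Hinge basis function max(0, x - knot)."""
    return np.maximum(0.0, x - knot)


def fit_one_knot(x, y, candidate_knots):
    """Least-squares fit of y ~ b0 + b1*x + b2*hinge(x, knot),
    choosing the knot that minimises the sum of squared errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for knot in candidate_knots:
        design = np.column_stack([np.ones_like(x), x, hinge(x, knot)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(knot), coef)
    return best[1], best[2]


def predict(x, knot, coef):
    """Evaluate the fitted piecewise-linear model."""
    x = np.asarray(x, dtype=float)
    return coef[0] + coef[1] * x + coef[2] * hinge(x, knot)


# Synthetic example: slope 0.5 below 100 mg/kg and slope 1.0 above it.
x_demo = np.linspace(0.0, 200.0, 41)
y_demo = 0.5 * x_demo + 0.5 * hinge(x_demo, 100.0)

knot, coef = fit_one_knot(x_demo, y_demo, candidate_knots=x_demo[1:-1])
print(f"Selected knot: {knot:.1f} mg/kg, slopes: {coef[1]:.2f} "
      f"below and {coef[1] + coef[2]:.2f} above")
```

A single straight line fitted to the same synthetic data would have to compromise between the two slopes, which is the kind of concentration-dependent bias discussed above.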

### *4.4. Comparison between Univariate and Multivariate Models*

The multivariate MARS model was better than the univariate MARS model at predicting the heavy metal concentrations for the samples in the pXRF low-value dataset (Table 4). However, this result was not strongly expressed in the samples in the pXRF high-value dataset (Table 4). The soil samples from the pXRF low-value dataset had heavy metal contents measured by pXRF lower than the BV and were assumed not to have been affected by other pollution sources. The *Igeo* of each heavy metal was calculated from the laboratory concentrations and BVs and indicated the magnitude of anthropogenic influences (Figure 6). For the pXRF low-value dataset, the pXRF measurements of the heavy metals were below the BV, and the *Igeo* results for all the metals were negative, indicating no anthropogenic discharge contributions (Figure 6a). Hence, the heavy metals in the samples from the pXRF low-value dataset came from the same natural source. According to research on the source apportionment of heavy metals, heavy metals with similar sources are highly correlated [41–43]. The correlation among the heavy metals would contribute to the good predictive performance of the multivariate models compared with the univariate model.

**Figure 6.** *Igeo* of laboratory concentration of samples in pXRF low-value (**a**) and high-value datasets (**b**).
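The *Igeo* values in Figure 6 can be reproduced from a laboratory concentration and a background value. The sketch below assumes the commonly used Müller form, Igeo = log2(C / (1.5 × BV)), with 1.5 as the conventional correction factor; whether the study used exactly this form is an assumption here, and the numbers are hypothetical.

```python
import numpy as np


def igeo(concentration, background_value, factor=1.5):
    """Geoaccumulation index, assumed form: log2(C / (factor * BV))."""
    c = np.asarray(concentration, dtype=float)
    return np.log2(c / (factor * background_value))


# Hypothetical laboratory concentrations (mg/kg) against a BV of 10 mg/kg.
lab_conc = [7.5, 15.0, 30.0]
values = igeo(lab_conc, background_value=10.0)

for c, v in zip(lab_conc, values):
    label = "no anthropogenic signal" if v <= 0 else "possible anthropogenic input"
    print(f"C = {c:5.1f} mg/kg -> Igeo = {v:+.2f} ({label})")
```

Under this form, a negative *Igeo* means the concentration is below 1.5 times the background value, which is how the negative values for the low-value dataset are read in the text.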

In the pXRF low-value dataset, the Pearson correlation coefficients between the pXRF measurements of Pb and the laboratory concentrations of Cr and Cu were larger than 0.6 (Figure 7a), which exceeded the coefficients between each metal's own pXRF measurement and its laboratory concentration (0.35 for Cr and 0.33 for Cu). For Hg, the coefficient between the pXRF measurement of Cu and the laboratory concentration of Hg (0.57) was higher than the coefficient between the pXRF measurement and the laboratory concentration of Hg (0.31). Therefore, adding the pXRF measurements of other heavy metals could improve the multivariate model's performance.

In the pXRF high-value dataset, the samples had heavy metal concentrations larger than the BV and were collected from different industrial sites, which meant that these samples may have been polluted by different pollution sources. For the pXRF high-value dataset, a positive *Igeo* was observed for Cr, Cu, Zn, and Pb, and the *Igeo* for Zn was the largest, which showed moderate pollution (Figure 6b). The *Igeo* for Cr, Cu, and Pb showed unpolluted to moderately polluted levels, and Ni, As, Cd, and Hg showed no anthropogenic influences. These results indicated that the pXRF instrument could roughly identify the anthropogenic pollution for Cr, Cu, Zn, and Pb. For Ni and As, the pXRF instrument performed poorly at identifying anthropogenic influences. For Cd and Hg, since most of the samples had concentrations below the detection limit, the pXRF results were not convincing. Caporale et al. found that the laboratory content was much closer to the content measured by pXRF when the source of the soil metal pollution was partially or completely from anthropogenic contamination [19]. This coincided with the finding in the current research that the coefficients between the pXRF measurements and the laboratory concentrations of Cu and Cr were larger than those of the other six heavy metals (Figure 7b). By contrast, Zn and Pb showed relatively low coefficients, which meant that the accuracy of pXRF at detecting them was poor compared to Cu and Cr (Figure 7b).

**Figure 7.** Pearson correlation between pXRF measurement with laboratory concentrations of samples in pXRF low-value (**a**) and high-value datasets (**b**). In figure b, the high coefficients between laboratory data of Ni with pXRF measurement of Hg and Cr were due to the small sample size.
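A cross-correlation matrix of the kind summarised in Figure 7 pairs the pXRF reading of every metal with the laboratory concentration of every metal. The following is a minimal sketch on synthetic data. The function name, array layout (samples in rows, metals in columns), and random values are hypothetical, and constant columns are not handled.

```python
import numpy as np


def cross_correlation(pxrf, lab):
    """Return a matrix r where r[i, j] is the Pearson correlation between
    the pXRF reading of metal i and the laboratory value of metal j."""
    pxrf = np.asarray(pxrf, dtype=float)
    lab = np.asarray(lab, dtype=float)
    # Standardise each column with the population standard deviation,
    # so that the averaged cross-product equals Pearson's r.
    pxrf_z = (pxrf - pxrf.mean(axis=0)) / pxrf.std(axis=0)
    lab_z = (lab - lab.mean(axis=0)) / lab.std(axis=0)
    return pxrf_z.T @ lab_z / pxrf.shape[0]


# Synthetic data: 50 samples, 3 metals (values carry no physical meaning).
rng = np.random.default_rng(0)
pxrf_demo = rng.lognormal(mean=3.0, sigma=0.5, size=(50, 3))
lab_demo = pxrf_demo * 0.9 + rng.normal(scale=5.0, size=(50, 3))

r_matrix = cross_correlation(pxrf_demo, lab_demo)
print(np.round(r_matrix, 2))
```

Off-diagonal entries that exceed the diagonal ones point to candidate predictors for a multivariate model, as described for Pb with Cr and Cu in the low-value dataset.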

In the pXRF high-value dataset, the correlation coefficients between the pXRF measurements and the laboratory concentrations of Cr and Cu were the highest (Figure 7). No other heavy metal had pXRF measurements that were significantly correlated with the laboratory concentrations of Cr and Cu. The different pollution sources explained why the correlations between different heavy metals were not strong. Therefore, the pXRF measurements of other heavy metals were weakly correlated with the laboratory content of the target heavy metal, and adding their pXRF measurements could hardly improve the model's performance.

### **5. Conclusions**

This study demonstrates that machine learning methods can predict Cu and Cr contents from pXRF measurements of soil samples from multiple contaminated sites. For Cu and Cr, the MARS model was better than the LR model at predicting the contents. The predicted results of samples in the pXRF high-value and pXRF low-value datasets showed that the univariate and multivariate MARS models performed well.

In general, different predictive models can be chosen for different purposes. To obtain accurate predictions for high-concentration soil samples, high-concentration soil samples (pXRF measurements above BVs) can be used to train the univariate MARS models with fewer calculations. To obtain accurate predictions for low-concentration soil samples, multivariate MARS models can be used.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pr10030536/s1, Table S1: The standards of the analyzed method for selected metals in the laboratory; Table S2: Statistics characteristics of pXRF and Lab analyzed result of samples from pXRF low-value dataset; Table S3: Statistics characteristics of pXRF and Lab analyzed result of samples from pXRF high-value dataset; Table S4: Validation statistics for predictive results of heavy metals using LR model and MARS model; Table S5: Validation statistics for predictive results of heavy metals of samples in the pXRF low-value dataset using univariate LR model and MARS model; Table S6: Validation statistics for predictive results of heavy metals of sample in pXRF high-value dataset using univariate LR model and MARS model.

**Author Contributions:** Conceptualization, F.X., T.F., Y.C. and D.D.; methodology, F.X.; simulation, F.X.; validation, F.X. and D.J.; writing-original draft, F.X. and T.F.; writing-review & editing, F.X., Y.C., D.D. and J.W.; supervision, S.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** No new data were created or analyzed in this study. Data sharing is not applicable to this article.

**Acknowledgments:** This work was financially supported by the Natural Science Foundation of Jiangsu Province (No. BK 20180112), the National Natural Science Foundation of China (41807473), and the National Key R&D Program of China (2018YFC1801001).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**

