Author Contributions
Conceptualization, K.A., K.P., and M.S.; methodology, K.A., K.P., and M.S.; software, K.A.; validation, K.A.; formal analysis, K.A.; investigation, K.A.; data curation, K.A., O.A., and J.E.; writing—original draft preparation, K.A.; writing—review and editing, K.A., K.P., M.S., J.E., and O.A.; supervision, K.P., M.S., and J.E. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Map of Sweden showing soil sampling locations used in the present study. Farm dataset refers to the nine farms that were used for independent validation of the models. National dataset refers to the calibration samples. Base map courtesy of Environmental Systems Research Institute (ESRI) (Redlands, CA, USA).
Figure 2.
Concentrations of copper (Cu), zinc (Zn), and cadmium (Cd) predicted from portable X-ray fluorescence (PXRF) measurements using multiple linear regression (MLR), random forest regression (RF), and multivariate adaptive regression splines (MARS) for national-scale data using leave-one-out cross-validation, compared with 7 M HNO3 extraction and inductively coupled plasma (ICP) analysis. The symbols are semi-transparent to show point density.
Figure 3.
Concentrations of copper (Cu), zinc (Zn), and cadmium (Cd) predicted from portable X-ray fluorescence (PXRF) measurements using multiple linear regression (MLR), random forest regression (RF), and multivariate adaptive regression splines (MARS) on the farm dataset, compared with 7 M HNO3 extraction and inductively coupled plasma (ICP) analysis. The models were calibrated at the national scale and applied to the farm dataset. Each color represents a specific farm.
Table 1.
The minimum, maximum, mean, median, and standard deviation (SD) of cation exchange capacity (CEC) at pH 7 (cmolc kg−1), base saturation (%), soil organic matter (SOM) (%), clay content (%), and pH in the topsoil samples of arable land in Sweden used in the analyses (n = 1520).
| Property | Minimum | Maximum | Mean | Median | SD |
|---|---|---|---|---|---|
| CEC | 3 | 70 | 17 | 15 | 8 |
| Base saturation | 8 | 100 | 69 | 72 | 21 |
| SOM | 0.8 | 16.6 | 4.5 | 4.2 | 1.8 |
| Clay content | 2 | 80 | 23 | 19 | 15 |
| pH | 4.5 | 8.4 | 6.2 | 6.2 | 0.6 |
Table 2.
Descriptive statistics of the elements used for modelling after removal of samples with “not a number” (NaN) classification in any of the elements included (n = 1520). Minimum, maximum, mean, median, and standard deviation (SD) are presented as mg kg−1, where values < 1000 were rounded to the closest integer and values > 1000 to three significant digits. Rec = mean recovery rates from four measurements based on reference standard 2709a from the National Institute of Standards and Technology (NIST) (%); Rec-SD = standard deviation of the four recovery rates (%).
| Element | Minimum | Maximum | Mean | Median | SD | Rec | Rec-SD |
|---|---|---|---|---|---|---|---|
| Pb | 8 | 146 | 19 | 18 | 7 | 63 | 10.8 |
| Cs | 10 | 56 | 33 | 34 | 9 | 970 | 33.1 |
| Zn | 16 | 518 | 72 | 67 | 32 | 92 | 2.1 |
| V | 33 | 411 | 93 | 90 | 30 | 123 | 18.6 |
| Rb | 32 | 181 | 104 | 100 | 26 | 83 | 0.8 |
| Sr | 71 | 378 | 142 | 132 | 49 | 92 | 0.8 |
| Zr | 71 | 955 | 251 | 240 | 77 | 65 | 0.9 |
| Ba | 197 | 1140 | 491 | 487 | 98 | 87 | 2.4 |
| Mn | 124 | 6000 | 542 | 481 | 345 | 97 | 2.7 |
| Ti | 1630 | 6890 | 3860 | 3880 | 765 | 114 | 1.8 |
| Ca | 2980 | 196,000 | 11,100 | 9710 | 9390 | 105 | 1.5 |
| Fe | 4370 | 93,000 | 21,500 | 19,300 | 9760 | 84 | 0.6 |
| K | 11,400 | 36,200 | 24,100 | 24,300 | 4180 | 96 | 1.3 |
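
The Rec and Rec-SD columns of Table 2 can be illustrated with a minimal sketch, assuming that a recovery rate is the measured concentration expressed as a percentage of the certified reference value. The function name, the certified value, and the four readings below are hypothetical and are not taken from the study or from the NIST 2709a certificate. The sketch uses the sample standard deviation; whether the study used the sample or the population form is not stated.

```python
from statistics import mean, stdev


def recovery_rates(measured, certified):
    """Return each measurement as a percentage of the certified value."""
    return [100.0 * m / certified for m in measured]


# Hypothetical certified concentration and four repeated PXRF readings
# (mg/kg); these numbers are illustrative only.
certified_value = 100.0
measurements = [90.0, 92.0, 94.0, 92.0]

rates = recovery_rates(measurements, certified_value)
rec = mean(rates)      # corresponds to the "Rec" column (%)
rec_sd = stdev(rates)  # corresponds to "Rec-SD" (%), sample SD assumed

print(f"Rec = {rec:.1f} %, Rec-SD = {rec_sd:.2f} %")
```

Under this definition, a value well above or below 100% indicates that the instrument systematically over- or underestimates that element relative to the reference material.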
Table 3.
Descriptive statistics of lab-analyzed copper (Cu), zinc (Zn), and cadmium (Cd) for the calibration data (national dataset, n = 1520) and validation data (farm dataset, n = 179). Minimum, maximum, mean, median, and standard deviation (SD) are presented as mg kg−1 rounded to the closest integer, apart from those for Cd.
| Lab-Analyzed Element | Minimum | Maximum | Mean | Median | SD |
|---|---|---|---|---|---|
| National dataset | | | | | |
| Cu | 2 | 130 | 14 | 11 | 10 |
| Zn | 6 | 557 | 61 | 56 | 33 |
| Cd | 0.04 | 4.07 | 0.20 | 0.17 | 0.17 |
| Farm dataset | | | | | |
| Cu | 3 | 77 | 22 | 17 | 19 |
| Zn | 22 | 135 | 72 | 67 | 30 |
| Cd | 0.06 | 1.60 | 0.37 | 0.21 | 0.38 |
Table 4.
Validation statistics from the cross-validation of the multiple linear regression (MLR), random forest regression (RF), and multivariate adaptive regression spline (MARS) models for copper (Cu), zinc (Zn), and cadmium (Cd). R2 = coefficient of determination; MAE = mean absolute error (mg kg−1); ROI = range of interest (0–20 mg kg−1 for Cu and 0–0.5 mg kg−1 for Cd).
| Model | R2 | MAE | R2-ROI | MAE-ROI |
|---|---|---|---|---|
| Cu-MLR | 0.58 | 3.87 | 0.06 | 3.00 |
| Cu-RF | 0.63 | 3.48 | 0.20 | 2.69 |
| Cu-MARS | 0.59 | 3.72 | 0.04 | 2.94 |
| Zn-MLR | 0.92 | 5.60 | - | - |
| Zn-RF | 0.86 | 5.93 | - | - |
| Zn-MARS | 0.92 | 5.63 | - | - |
| Cd-MLR | 0.49 | 0.065 | −0.17 | 0.057 |
| Cd-RF | 0.48 | 0.053 | 0.40 | 0.043 |
| Cd-MARS | 0.70 | 0.054 | 0.20 | 0.047 |
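
The statistics in Table 4 can be sketched as follows. This is a minimal illustration under two assumptions that the captions do not confirm: that R2 is computed as 1 − SS_res/SS_tot (a definition that can go negative, which would be consistent with the −0.17 reported for Cd-MLR), and that the range of interest is applied by filtering on the lab-analyzed values. The function names and the five sample values are hypothetical.

```python
from statistics import mean


def r2_score(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (may be negative)."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(observed, predicted):
    """Mean of the absolute differences between observed and predicted."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def restrict_to_roi(observed, predicted, low, high):
    """Keep only pairs whose observed (lab-analyzed) value lies in [low, high]."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if low <= o <= high]
    return [o for o, _ in pairs], [p for _, p in pairs]


# Hypothetical lab-analyzed and predicted Cu concentrations (mg/kg).
lab = [5.0, 10.0, 15.0, 40.0, 80.0]
pred = [9.0, 9.0, 12.0, 45.0, 75.0]

# Statistics over the full range, then within an ROI of 0-20 mg/kg.
r2_all = r2_score(lab, pred)
mae_all = mean_absolute_error(lab, pred)
lab_roi, pred_roi = restrict_to_roi(lab, pred, 0.0, 20.0)
r2_roi = r2_score(lab_roi, pred_roi)
mae_roi = mean_absolute_error(lab_roi, pred_roi)

print(f"all: R2 = {r2_all:.2f}, MAE = {mae_all:.2f}")
print(f"ROI: R2 = {r2_roi:.2f}, MAE = {mae_roi:.2f}")
```

In this toy example R2 falls sharply inside the ROI even though MAE improves, because the variance of the observed values (SS_tot) is much smaller in the restricted range. The same mechanism may contribute to the low R2-ROI values in the table.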
Table 5.
Validation statistics from the farm dataset of the multiple linear regression (MLR), random forest regression (RF), and multivariate adaptive regression spline (MARS) models for copper (Cu), zinc (Zn), and cadmium (Cd). R2 = coefficient of determination; MAE = mean absolute error (mg kg−1); ROI = range of interest (0–20 mg kg−1 for Cu and 0–0.5 mg kg−1 for Cd).
| Model | R2 | MAE | R2-ROI | MAE-ROI |
|---|---|---|---|---|
| Cu-MLR | 0.90 | 4.40 | 0.12 | 3.56 |
| Cu-RF | 0.84 | 4.51 | 0.54 | 2.43 |
| Cu-MARS | 0.94 | 3.21 | 0.47 | 2.72 |
| Zn-MLR | 0.96 | 4.40 | - | - |
| Zn-RF | 0.94 | 5.40 | - | - |
| Zn-MARS | 0.97 | 4.00 | - | - |
| Cd-MLR | 0.74 | 0.121 | 0.34 | 0.052 |
| Cd-RF | 0.74 | 0.109 | 0.44 | 0.050 |
| Cd-MARS | 0.80 | 0.087 | 0.50 | 0.043 |
Table 6.
Confusion matrices for classifications above and below the thresholds for copper (Cu) fertilization and for sewage sludge application with respect to Cu, zinc (Zn), and cadmium (Cd), using the best model for each element in the cross-validation. Swedish recommendations suggest that there is a risk of Cu deficiency if the Cu concentration in the soil is below 8 mg kg−1, while sewage sludge application is prohibited if the concentrations of Cu, Zn, and Cd exceed 40, 100, and 0.4 mg kg−1, respectively.
| Cu Fertilization | | Lab-Analyzed Below Threshold | Lab-Analyzed Above Threshold | Total |
|---|---|---|---|---|
| Predicted | Below Threshold | 224 | 70 | 294 |
| Predicted | Above Threshold | 200 | 1026 | 1226 |
| Total | | 424 | 1096 | |

| Cu Sewage Sludge | | Lab-Analyzed Below Threshold | Lab-Analyzed Above Threshold | Total |
|---|---|---|---|---|
| Predicted | Below Threshold | 1490 | 27 | 1517 |
| Predicted | Above Threshold | 2 | 1 | 3 |
| Total | | 1492 | 28 | |

| Zn Sewage Sludge | | Lab-Analyzed Below Threshold | Lab-Analyzed Above Threshold | Total |
|---|---|---|---|---|
| Predicted | Below Threshold | 1337 | 21 | 1358 |
| Predicted | Above Threshold | 44 | 118 | 162 |
| Total | | 1381 | 139 | |

| Cd Sewage Sludge | | Lab-Analyzed Below Threshold | Lab-Analyzed Above Threshold | Total |
|---|---|---|---|---|
| Predicted | Below Threshold | 1437 | 49 | 1486 |
| Predicted | Above Threshold | 18 | 16 | 34 |
| Total | | 1455 | 65 | |
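
As a worked illustration of how the counts in Table 6 can be summarized, the sketch below derives overall accuracy and per-class recall from each confusion matrix. The counts are copied from the table. The class, field, and method names are illustrative assumptions, and these summary metrics are not reported in the table itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts of samples by predicted and lab-analyzed class."""
    pred_below_lab_below: int
    pred_below_lab_above: int
    pred_above_lab_below: int
    pred_above_lab_above: int

    @property
    def total(self) -> int:
        return (self.pred_below_lab_below + self.pred_below_lab_above
                + self.pred_above_lab_below + self.pred_above_lab_above)

    def accuracy(self) -> float:
        """Share of all samples placed on the correct side of the threshold."""
        correct = self.pred_below_lab_below + self.pred_above_lab_above
        return correct / self.total

    def recall_below(self) -> float:
        """Share of lab-analyzed below-threshold samples predicted as below."""
        lab_below = self.pred_below_lab_below + self.pred_above_lab_below
        return self.pred_below_lab_below / lab_below

    def recall_above(self) -> float:
        """Share of lab-analyzed above-threshold samples predicted as above."""
        lab_above = self.pred_below_lab_above + self.pred_above_lab_above
        return self.pred_above_lab_above / lab_above


# Counts taken from Table 6.
TABLE_6 = {
    "Cu fertilization": ConfusionMatrix(224, 70, 200, 1026),
    "Cu sewage sludge": ConfusionMatrix(1490, 27, 2, 1),
    "Zn sewage sludge": ConfusionMatrix(1337, 21, 44, 118),
    "Cd sewage sludge": ConfusionMatrix(1437, 49, 18, 16),
}

for name, cm in TABLE_6.items():
    print(f"{name}: n = {cm.total}, accuracy = {cm.accuracy():.3f}, "
          f"recall below = {cm.recall_below():.3f}, "
          f"recall above = {cm.recall_above():.3f}")
```

Reading the classes separately matters here because they are strongly imbalanced. For the Cu sewage sludge threshold, 1491 of 1520 samples are classified correctly, yet only 1 of the 28 samples that the lab analysis placed above the threshold is predicted as such.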