2.1. Characteristics of Soil Properties and PAHs Concentrations
The collected soils were dominated by loamy sands with 70.7%, 26.8% and 1.9% of sand, silt and clay, respectively (
Table 1). Most of soils exhibited relatively high acidity (median for pH
KCl was 5.2; interquartile range 4.6–5.6).
The content of TN and C
org was in the range of 0.2–11.7 g kg
−1 (CoV 112%) and 5.2–187.2 g kg
−1 (CoV 142%), respectively. However, in 75% of soils, TN did not exceed 1.4 g kg
−1, and C
org was below 17.6 g kg
−1, which corresponds to the lower range according to the European Union criteria [
33]. The low content of organic matter and clay promotes the increase in contaminant availability, which may result in disturbance of soil functions and create risk to soil organisms [
11].
The total concentration of 16PAHs varied strongly throughout the sampling area (CoV = 410%), and ranged from 311 to 316.1 × 10
3 µg kg
−1, with a mean value of 11.8 × 10
3 µg kg
−1 and a median of 1252 µg kg
−1 (
Table 1). Average PAHs content significantly exceeded the values reported for the agricultural soils from non-industrial areas in Poland (291–616 µg kg
−1 [
4,
5,
34]), Spain (112 µg kg
−1 [
35]), Norway (63 µg kg
−1 [
36]), Italy (333 µg kg
−1 [
37]), as well as in China (202–219 µg kg
−1 [
6,
38]). However, our results were comparable to the PAHs concentration reported for agricultural soils in Czech Republic (847 µg kg
−1 [
39]), for soils from rural regions in the surroundings of coking plant (691 µg kg
−1 [
40], 3016 µg kg
−1 [
7]) and mining areas (1280 µg kg
−1 [
41], 917 µg kg
−1 [
42]) in China. Similar high levels of PAHs were also observed in forest soils from the Silesia region in Poland (1050 µg kg
−1 [
8]), as well as in urban soils from Spain (1002 µg kg
−1 [
35]), Portugal (1544 µg kg
−1 [
2]), Slovakia (2065 µg kg
−1 [
43]) and Poland (1331 µg kg
−1 [
44]).
Contamination status of the area was characterized using previously developed classification system proposed by Maliszewska-Kordybach [
45] and Terelak et al. [
46]. The 16PAH concentrations were classified as follows: background value (<200 µg kg
−1, class 0), non-contaminated (200–600 µg kg
−1, class 1), moderately contaminated (600–1000 µg kg
−1, class 2), contaminated (1000–5000 µg kg
−1, class 3), heavily contaminated (5000–10,000 µg kg
−1, class 4), and very heavily contaminated (>10,000 µg kg
−1, class 5). According to these criteria, 78% of soil samples should be classified as contaminated to varying degree, with PAH concentrations ranging from 1000 to 5000 µg kg
−1 in 50% of samples, and PAH > 5000 µg kg
−1 in 12% of samples. The highest PAH levels were found in the samples from central and north parts of the Czerwionka region and might be attributed to PAH emission sources: historical long-term coke production and coal mining, and nowadays production of bituminous products, waste recovery, high traffic volume and house heating [
11].
The high molecular weight hydrocarbons (HMW; ≥4 rings) dominated the PAHs profile in the region (
Table 1) and ranged from 180 to 270 × 10
3 µg kg
−1. 54% of the HMW pool were compounds considered to be carcinogenic (ΣPAH
Carcin), and 30% four hydrocarbons (BbF, BkF, BaPyr, IndPyr), that are monitored within the European Monitoring and Evaluation Programme of Air Emissions [
47]. The annual emission of ΣPAH
4em in the study area is estimated at 1.75 kg per km
2, and the dust emission (another anthropogenic index) at 6827 kg per km
2 [
4]. The predominance of HMW is typical in areas of high anthropogenic pressure [
2,
4,
6,
7,
42] and may be due to the higher persistence of these compounds in soils, a dominance of combustion over petrogenic sources, and the tendency of HMW PAHs to accumulate in soils that are close to emission sources [
2,
4,
5,
10,
36,
44].
The PAHs isomer ratio of individual compounds is a common tool to identify sources of these hydrocarbons [
4,
7,
10,
37]. Three of the most stable molecular ratios were used in this study (
Table 1). The Fln/(Fln + Pyr) ratio ranged between 0.41 to 0.67 and pointed to coal and biomass combustion in almost 99% of samples. Similarly, the BaA/(BaA + Ch) ratios above 0.35 indicated combustion of vegetation and fossil fuel as a PAH source. This was also confirmed by the IndPyr/(IndPyr + BPer) ratio > 0.5, however, in 36% of samples this ratio ranged from 0.2 to 0.5, which is more typical for liquid fossil fuel combustion (vehicle and crude oil) [
4,
37].
2.2. Evaluation of Potential Ecological Risk—Chemical Indexes
Potential ecological risk was firstly assessed using hazard quotient values calculated for 10 individual PAHs included in Polish soil regulation [
48]. According to Polish guidelines, the acceptable maximum permissible concentration (MPC) for Anth, Ch, IndPyr and BPer is 200 μg kg
−1, while for Napht, BaA, BbF, BkF, BaPyr and DahA, the MPC is 100 μg kg
−1. HQ values of individual PAH compound > 1 indicate a possible ecological risk [
11,
17,
49]. Average HQ value of a single chemical (expressed as median) varied significantly depending on PAH compound from 0.06 for Anth to 1.32 for BbF (
Table 2), what allows us to conclude that according to individual PAH, 7 to 62% of the study area might be under ecological risk (HQ above 1), and needs further actions (e.g., site-specific evaluation or remediation).
To reflect the potential risk caused by PAHs mixture the hazard index (HI) was additionally calculated, and to assess the degree of potential risk the approach of Moreno-Jiménez et al. [
17] was used. As a threshold value for HI index above which the possible risk occurs—the value of 10 was assigned [
11,
14]. Obtained HI values ranged between 1.56 and 1523 (
Table 2) but exceeded the threshold for HI only in 23% of soil samples from the study region. The moderate risk (HI = 10–100) and high risk (HI > 100) were noted in 13 and 4 soil samples, respectively.
Many authors have emphasized that the HQ-based methodology, although frequently used for screening risk assessment [
17,
49], as a very conservative approach, which may lead to risk overestimation [
3,
9]. Furthermore, in our study HQ/HI indexes were calculated only for 10PAH compounds to thus better predict ecological risk in the study region, other approaches were applied and toxic units and toxic pressure were also calculated.
The model of toxic units is frequently used in ecotoxicology and represents the ratio between the concentration of a component in the mixture and its toxicological endpoint, e.g., effect concentration (EC
50) or non-observed effect concentration (NOEC) [
3,
49,
50,
51]. Toxic units of the mixture (TUm) is calculated based on the concentration addition model (CA), which is based on the assumption that the mixture components possess a similar toxicological mode or mechanism of action, and in the case of PAHs it is caused by narcosis [
50,
51]. In our study, the measured soil PAH concentrations were compared to PNEC values for 16PAH compounds (
Table 3). PNECs were adapted from the EU risk assessment report for coal-tar pitch [
3,
52], which were derived by applying an assessment factor to the lowest available NOEC values for the most sensitive terrestrial species, e.g.,
Folsomia candida,
Folsomia fimetaria,
Lactuca sativa or nitrifying bacteria (depending on the PAH compound). Although such deterministic approach based on only one species is rather conservative, it is a good protective measure and it is suggested for risk assessment of chemical mixtures in a first-tier of ERA process [
51,
52]. The TU values calculated for each individual PAH and TUm for 16PAHs can be found in
Table 3.
Toxic units higher than 1 indicate that unacceptable effects on soil organisms are likely to occur. The TU for individual hydrocarbons varied significantly (CoV from 325 to 465%) and on average was 0.12 for Flu up to 20.5 for BaPyr. Generally, TU above 1 was noted mainly for higher molecular PAH compounds, an unacceptable effect on soil organisms may occur in 22–23% of soil samples contaminated with IndPyr, BPer and BbF, and in 43 and 66% of soil samples due to contamination by BaA and BaPyr, respectively. For other PAHs, the risk predicted with TU was acceptable (TU < 1) in 88–96% of soil samples (
Table 3). The toxic units calculated for the 16PAHs mixture (TUm) ranged from 1.2 to 1858 with a median value of 5.7, which indicated that the exposure to the 16PAHs mixture may reveal the risk to soil organisms in 100% of soil samples (
Table 3). For the rating of the TUm values and to differentiate the magnitude of risk to soil organisms, the classification system elaborated by Persoone et al. [
53] was adapted, according to which soil samples with TUm < 1 reveal no toxicity, when TUm is between 1-10, samples are considered as lowly toxic, while samples with TUm above 10 and 100 could be classified as highly toxic and very highly toxic, respectively. It was found that TUm values for the majority (77%) of soil samples ranged between 1.26 to 9.56 and could be considered as lowly toxic; the highest TUms above 100 were only in four sampling points (5.4% of the study area) located in the surroundings of the main contamination sources—coking plant and coal mining landfill [
11].
The third approach used to assess potential ecological risk to the environment was based on toxic pressure (TP) calculation for both individual compounds, as well as for the 16PAHs mixture (
Figure 1,
Table 4).
The response addition model suggested by Jensen et al. [
13] was applied for TP of mixture determination. This model assumes dissimilar modes of action for all compounds [
13], although, the same mode of action is normally considered for chemicals like PAHs [
3]. TP for single chemicals were calculated using the soil screening values as the effect level. Since according to Polish soil guidelines, screening values are available only for 10PAHs, to calculate toxic pressure for the 16PAHs, the maximum permissible concentrations (MPC,
Table 4) from the Dutch regulations were adopted [
3,
54]. MPC values reflect PAH concentrations that should protect all species in the ecosystem from adverse effects. In our study, TPs obtained for individual hydrocarbons were rather low, and ranged on average from 0.03 for Flu to 0.26 for BaPyr (
Figure 1). To assess the magnitude of risk that pollutants may pose to the soil organisms, TP classification according to Jensen et al. [
13] and Dagnino et al. [
15] were applied. Considering the TP of single chemicals, the majority of samples (66–96%) have TP values below the lowest threshold 0.25 (
Table 4), which indicates no risk in this area. The highest TPs were noted for higher molecular weight (>4 ring) PAH, and can be ordered as follows: BaPyr > BaA > BbF > IndPyr > BPer > Pyr > DahA = Ch = BkF. Completely different results were obtained when TPs were calculated for the 16PAH mixture, TPm ranged from 0.0 to 1.0 with an average of 0.53 (
Figure 1). The lowest TPm < 0.25 (no risk) was found in only 34% of soil samples (
Table 4,
Figure 2), in 11% of samples TPm ranged 0.34–0.50 (low risk), in 20% of samples between 0.51 and 0.74 (moderate risk), and as much as 35% of soil samples had TPm value above 0.75 (high risk). Extremely high TPm values (above 0.9) were observed in 16 soil samples from the study region. However, it should be remembered that TP index in our study reflects the total PAHs concentration and is based on the conservative assumption that 100% of measured contaminant concentration is bioavailable [
9,
11,
13]. On the other hand, it is considered that such conservative approach (highly protective for soil organisms) should be used in a generic first tier risk assessment [
3,
51].
2.3. Evaluation of Potential Human Health Risk—Chemical Indexes
Toxic equivalency factors provided by Nisbet and LaGoy [
55] were used in our study to estimate the exposure risks posed by individual compounds and mixture of PAHs to human health. The toxic potency of each PAH (TEQ) was assessed on the basis of its benzo(a)pyrene equivalent concentration [
3,
7,
37,
55,
56] and for 16PAHs ranged from 21 to 52 × 10
3 µg kg
−1 with a median value of 151 µg kg
−1 (
Table 5).
Our results were comparable to toxic PAH concentrations reported for industrial and residential regions in Poland [
8,
44], Spain [
35] and Italy [
37], as well as for rural regions under high anthropogenic pressure in China [
7,
42]. Results highlighted a significant variation of TEQs values (CoV up to 465%) through the study region (
Figure 3). For the majority of samples TEQs ranged from 21 to 584 µg kg
−1 and were below the safe value of 600 µg kg
−1 established as a limit for protection of human health (regardless of the land use) in Canadian soil quality guidelines [
3,
57], which indicated a generally low risk in about 88% of samples from Czerwionka region. To better evaluate our results, we compared TEQ values with other guidelines values available for BaPyr [
3,
57,
58]. All TEQ in our study exceeded the most conservative US EPA soil screening level of BaPyr for residential areas (15 µg kg
−1 [
3]). About 38% had the TEQ below 100 µg kg
−1, and 31% of samples < 200 µg kg
−1, which corresponded to values for residential areas in Italy and Spain [
3,
58], respectively. The highest toxicity value of PAHs (TEQ above Danish cut-off criteria 1000 µg kg
−1 [
3]) in soils was concentrated in the central and north part of the Czerwionka region (
Figure 3) in the immediate vicinity of the main contamination sources (coke plant, asphalt production plant, post-mining landfill and coal recovery plant).
It is worth emphasizing that only in one sampling point TEQ exceeded 2000 µg kg
−1 corresponding to the Finnish guideline value [
58] which considers a target total lifetime cancer risk (TLCR) of 10
−5, and only in three samples TEQ was above 10,000 µg kg
−1 (Italian screening value for commercial and residential areas, [
3]).
2.4. Application of VIS-NIR for Potential Ecological and Human Risk Prediction
Until now, VIS-NIR spectroscopy was used for total and bioavailable PAHs, alkanes and heavy metals determination [
27], alkanes and PAHs in oil-contaminated soil [
28] or the bioremediation of soils polluted with fuel oil [
31]. In our study, this technique was used for diagnostic risk screening of long-term PAH contaminated soils, and to predict their toxicity based on various risk indexes.
The obtained spectra (
Figure 4) are typical for mineral soils with a high sand content (loamy sand) [
59] but are relatively flat with no clear features. This is the result of sample drying and sieving but also of the relatively low C
org content [
20,
32] of most samples (upper quartile for C
org 17.6 g kg
−1). The differences between samples are mostly noticeable in baseline and peak intensities. The reflectance is usually increased with a lower SOM content [
60]. In the present study, C
org is probably not the only factor governing the reflectance intensities since the mean C
org content from class 1 (200–600 μg kg
−1) to class 5 (>10,000 μg kg
−1) are 9.56; 9.81; 22.96; 17.33, and 87.94 g kg
−1, respectively. The absorption features at 1400 and 1900 nm are due to water (OH bonds) and the absorption feature at 2210 nm is due to hygroscopic water in clay minerals [
61]. Overall, there was a reflectance decrease with the increase of PAHs content. This was also observed by Bray et al. [
12] and Okparanma et al. [
16,
26], and could partly be explained by the correlation between soil organic carbon and 16PAHs.
To investigate further the structure hidden within the soil spectra, a principal component analysis (PCA) was undertaken. The score plot is presented in
Figure 5. Most of the spectral variations (99%) are described by the first component (PC-1). Samples are distributed along the PC-1 axis. Two cluster are well separated (low PAH content (class 1) and high PAH content (class 5). Variations of C
org content and soil texture in the other classes were too great to allow for clustering according to PAH content. Bray et al. [
12] on suburban soil samples have noticed that samples were roughly distributed along PC-1 axis. That behaviour could be related to soil properties (e.g., C
org content).
One sample (down the PC-2 axis) seems to be an outlier, but was not removed from the models since preliminary investigations have shown that it did not have any significant effect on the modelling. That sample was an organic soil (muck) while the other samples were mineral soils. The green triangles (
Figure 5) are soils from the 16PAH 1000–5000 μg kg
−1 class. Their spread is probably due to the fact that they present most of the soil texture variation and texture affects soil reflectance [
62], but also the most C
org variation.
The texture heterogeneity leads to more spectral variation related to diffuse reflected light inside the class than the other PAH classes. No features in the 1600–1800 nm related to hydrocarbon contamination [
16,
26] were noticeable in the raw spectra probably due to the fact that samples were sieved and scanned dry. However, even if most features are not discernible to the naked eye, chemometric techniques (e.g., PLS) are able to extract important information from a spectral database.
The PLS regression models of individual PAHs were poorly defined with r
2 < 0.5 and high root mean square errors and are not reported here. Models with MA pre-processing were better than with other pre-processing methods and are the only ones presented below. The quantitative predictions of individual PAHs and 16PAHs were not satisfactory probably due to the low representation of extreme values and to the limited number of samples. The non-normal distribution of PAH contents is also a factor affecting the predictions. To our knowledge, only one paper predicted an individual PAH (BaPyr) concentration with success [
12] on 65 urban soils from Australia. However, their BaPyr content was much higher than in our dataset (74 samples). Phenanthrene, as an individual PAH was also predicted in another study but on artificially contaminated soil samples with a very large amount of diesel oil [
29]. The lack of robust prediction for individual PAH is probably due to an insufficient number of samples representing each class of contamination and to a relatively low PAHs content (except for a few samples). Despite good correlation with C
org (r
2 = 0.81 for the five composite samples from
Figure 5), PAH contents were not possible to predict. Nevertheless, the grouping of samples along the PC-1 component leads us to believe in the possibility to classify samples according to the different risk indexes presented in the above sections.
Figure 6 presents a plot of PLS regression coefficients obtained by the PLS regression of total 16PAH. That plot allows identifying important variables (wavelengths) of the model construction.
Assignment to specific compounds here is difficult too. No distinctive peaks were identified as important but rather regions (thick black line). Those regions are mostly related to organic matter (OM) and clay minerals. The first region (560–804 nm) is related to chromophores of OM and iron oxides [
20,
63]. The second region (1517–1898 nm), which is the first overtone region is mostly related to OM and contains the absorption bands by several bonds (CH, CH
2, CH
3, and Ar-CH aromatic ring [
64]). Moreover, Okparanma et al. [
16] identified a part of that region (1600–1800 nm) as related to hydrocarbon contamination. That important variables region is discontinued before the 1900 nm absorption peak related to water. The next region (1983–2046 nm) starts just after that peak and is related to N-H bonds and aromatic amines [
64]. The last region (2188–2236 nm) is probably related to the 2210 nm absorption band from hygroscopic water and clay mineral [
61].
We, therefore, conclude that predictions of risk indexes (especially binary models) by VIS-NIR spectroscopy are possible because of the PAH relationship with OM but also with clay and perhaps iron oxides (see
Figure 6). Since individual PAHs were poorly predicted, we decided to run the PLS algorithm on the index classes to use the method as a fast screening tool to predict environmental risk. Classes (e.g., 1 to 4 for TUm) were predicted instead of actual index values.
Table 6 summarises the prediction of the classification results.
Prediction of Σ16PAHs classes was accurate in only 58.1% of the samples (
Table 6). The model could not correctly classify samples from the 200–600 μg kg
−1 class (non-contaminated samples). We, therefore, divided the dataset into two classes (threshold—600 μg kg
−1) to test if it was possible to predict if a sample was contaminated or not. In that case, prediction accuracy was 79.7% (59 correctly classified samples out of 74). Still, the model had difficulties in detecting samples of the non-contaminated class (11 cases were predicted as contaminated but being non-contaminated).
Besides, the prediction of the soil PAH concentrations, the VIS-NIR method was also used to predict environmental risk. The prediction of hazard index (HI) classes was better with 83.8% accuracy. The model very accurately predicted samples of low impact (1 < HI < 10), where HI index did not exceed threshold value for acceptable risk. Out of the 12 samples wrongly classified, four were classified as class 2 (HI = 10–100, moderate risk) instead of class 1 (HI = 1–10, low risk), six samples were classified as class 1 instead of class 2, and two samples classified as class 2 instead of class 3 (HI > 100, high risk). When considering for a binary prediction only HI index above which the possible risk occurs (HI > 10), the accuracy was 86.5% with eight samples classified as class 2 (HI > 10) instead of class 1 (HI < 10) and only two samples the other way around.
The classification of toxic units for 16PAH mixture (TUm) according to Persoone et al. [
53] has provided satisfactory results with an accuracy of 83.8%. When predicting if a sample presented toxicity (TUm > 10) with a binary prediction, the accuracy was 87.9% with no over prediction of one class over the second. This is in agreement with Bray et al.’s [
12] accuracy of 90.2% for the prediction of Σ16PAHs > 10 mg kg
−1 using ordinal logistic regression.
The TP class predictions were unsatisfactory, with only 33.8% accuracy classifying the samples into the four TP classes [
15]. The best predicted class was the low risk class (0.25 < TP < 0.50) with 75% accuracy. Nevertheless, there was rarely more than one class error difference (e.g., class “no risk” instead of class “moderate risk”). A binary prediction with a threshold of 0.5 was completed with an aim at detecting if there is a risk or not [
15], and the obtained prediction risk accuracy was 71.6%. It needs to be highlighted that more investigations are needed to develop more accurate models of TP classification when using the 4 classes by including more samples of each class.
The prediction of TEQ classes were made separately for each threshold according to Carlon et al. [
58] and Cachada et al. [
3]. There were no samples of TEQ < 15 μg kg
−1, therefore seven independent models were produced (
Table 6). There was an increase in prediction accuracy with increase of the TEQ threshold value. Bray et al. [
12] obtained an opposite pattern, with lower accuracy while predicting higher threshold values (but for Σ16PAHs; 10, 20, and 40 mg kg
−1). In our case, this is related to the fact that there were fewer samples with higher contamination values, and thus higher TEQs. For a TEQ threshold of 600 μg kg
−1, which is the limit Canadian value for the protection of human health [
57], the binary model (if a sample has a TEQ higher or lower than 600 μg kg
−1) accuracy is 94.6%.
There was overestimation in the predictions of 16PAHs and HI index but overall, our results are promising and the method could be of interest to use a fast screening tool to decide whether or not more investigations are needed in a specific area. This is the first time that VIS-NIR was used to predict HI, TUm, TP, and TEQ indexes in a view to allow a fast screening risk assessment of a contaminated area.