Article

An Assessment of the Mobility of Toxic Elements in Coal Fly Ash Using the Featured BPNN Model

1 School of Resources & Civil Engineering, Northeastern University, Shenyang 110819, China
2 Laboratory 3SR, CNRS UMR 5521, Grenoble Alpes University, 38000 Grenoble, France
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(23), 16389; https://doi.org/10.3390/su152316389
Submission received: 12 October 2023 / Revised: 20 November 2023 / Accepted: 27 November 2023 / Published: 28 November 2023

Abstract

This study proposes a novel backpropagation neural network (BPNN) featured with sequential forward selection (SFS), named the BPNN_s model, to characterize the leaching behavior of toxic elements (TEs) in coal fly ash (CFA). A total of 400 datasets and 54 features are used to predict the fractions of TEs. The determination coefficient (R2), root mean square error (RMSE), variance accounted for (VAF) and Willmott’s index (WI) are used to validate the BPNN_s, and its predictive performance is compared with that of three other models: the unified BPNN (BPNN_u), adaptive boosting (AdaBoost) and random forest (RF) models. The results indicate that the BPNN_s outperforms the others in predicting the fractions of TEs and that feature selection is an imperative step in model development. Moreover, the features selected with SFS suggest that element properties have a greater influence on the predicted fractions of TEs than either chemical properties or concentration. Atomic weight is found to be the most critical feature in the prediction through a Shapley additive explanations (SHAP) analysis. This study helps to assess TEs’ mobility rapidly and accurately and provides a foundation for gaining insights into the relationship between the features and the fractions of TEs.

1. Introduction

Fossil fuels still play a pivotal role in the global energy supply due to their high energy density and widespread availability. Coal is one of the most widely used energy sources and has served various applications over the years, such as electricity generation, heating, chemical production and manufacturing [1]. However, these uses inevitably generate a significant amount of solid mining waste. Among these wastes, coal fly ash (CFA) has raised widespread concern since (1) it occupies large areas of land, with an estimated 600–800 million tons of CFA generated annually worldwide [2], and (2) it contains considerable amounts of toxic elements (TEs) such as arsenic (As), boron (B) and selenium (Se) [3]. These TEs easily leach into soil and water if CFA is not disposed of properly, which can further cause water and soil contamination and detrimental effects on the ecosystem. Therefore, it is important to understand the leaching characteristics of TEs in CFA.
A potential environmental risk assessment is often carried out to help manage the adverse effects of different TEs in CFA, ensuring that efforts can be focused on the sources most likely to pose a high risk [4]. Based on different dimensions of concern, such as bioavailability, toxicity, exposure frequency and concentration, various indices have been proposed and applied across a great number of fields. For example, Rezapour et al. [5] applied the single pollution index (PI), the comprehensive pollution index (PIN) and the pollution load index (PLI) to evaluate the health risk of soils affected by the leachate of heavy metals. To distinguish the toxicity differences between heavy metals, Men et al. [6] proposed a new method (named the NIRI) based on the Nemerow integrated pollution index (NIPI) and the potential ecological risk index (RI). Among these indices, the PI, PIN and NIPI disregard variations in the toxic response factor, which differ significantly across types of heavy metals. Moreover, because the mobility of TEs is also related to their chemical speciation, these indices can provide only limited information for understanding the role of TEs in environmental risks [7,8]. To evaluate environmental risk more rationally, Qi et al. [9] proposed a novel indicator on the basis of the PI, and its superiority in ecological risk analysis and critical source identification was validated through a case study of CFA. Meanwhile, the risk assessment code (RAC) was proposed and utilized to evaluate bioavailability based on the percentage of TEs in the exchangeable and carbonate fractions [10,11]. Therefore, access to the fractions is critical for environmental risk assessment.
To gain the leaching characteristics of elements, Tessier et al. [12] proposed a chemical analysis (known as the sequential extraction) in which the elements are divided into five categories, including exchangeable, bound to carbonates, bound to Fe-Mn oxides, bound to organic matter and residual. The first three phases are considered as the bioavailability phase (BAP), and the latter two phases are regarded as non-bioavailability phases (NBAPs) [13]. The greater the percentage of BAP for a TE, the more harmful it is to the environment. As the approach can simulate different environmental conditions and ensure the accuracy of the experimental results, it has gained a lot of applications in the field of environmental risk assessment [14].
However, the method has limitations: it is poorly reproducible, time-consuming, expensive and labor-intensive [15]. These deficiencies have seriously hindered the fast assessment of potential ecological risks [9,16]. Alternatively, machine learning methods have been developed to perform classification and prediction in many practical fields, such as infrastructure construction [17] and rock uniaxial compressive strength measurement [18,19]. Regarding TEs, investigations have also been conducted into content prediction, mobility evaluation, risk judgment and recovery potential assessment. To determine the leaching risk of heavy metals in fly ash more quickly, Liu et al. [20] constructed an extreme gradient boosting (XGBoost) model using 160 samples; however, the model incorporated only six factors as inputs. Zheng et al. [21] developed a gradient boosting regression tree (GBRT) model to predict various forms of heavy metal occurrence within tailings, which could be helpful for risk assessment and tailings management. Nevertheless, more data are still needed to enhance that model’s generalization. Ghosh et al. [4] evaluated the relationship between the metal content and the corresponding fractions of Ni, Cd, Cr and Pb using a machine learning algorithm powered by the random forest (RF) model. Khosravi et al. [15] used three machine learning methods (i.e., extreme gradient boosting (XGBoost), RF and support vector regression (SVR)) to predict the contents of Co and Ni in soil samples collected from a copper mine, and validated the predictions with chemical measurements. However, the limited number of toxic elements considered restricts their utilization in a wider range of situations. To evaluate the recovery potential of TEs in CFA, Wu et al. [22] constructed two hybrid models combining the gradient boosting decision tree (GBDT) and particle swarm optimization (PSO) algorithms to predict the fractions of TEs.
Given that the artificial neural network (ANN) model has excellent self-learning and adaptive abilities, Qi et al. [9] developed an ANN optimized with the backpropagation (BP) algorithm model (BPNN) to predict the fractions of TEs in mining solid waste. The outcome indicated that this model achieved a satisfactory predictive performance. Therefore, the BPNN technique was chosen to predict the fractions of TEs in this work.
However, feature selection (FS), a critical step in the machine learning model development process, has been neglected in these studies, which relied on experience and existing conclusions for FS. Inevitably, some redundant features with low relevance were introduced, which not only reduces interpretability but also increases computational costs. Therefore, to eliminate noisy features and enhance prediction performance, implementing FS is crucial for predicting the fractions of TEs [23,24]. To date, the methods used to select key features can generally be partitioned into three categories: filter, wrapper and embedded [25]. The filter method selects features based on information, distance, consistency or statistical criteria. Because this method can be implemented before constructing a machine learning model, it has been widely applied [26,27]. In contrast to the filter method, the embedded method executes FS and the learning algorithm simultaneously; since decision tree algorithms and regression models have their own scoring mechanisms, they can be incorporated into FS more easily [28]. Alternatively, the wrapper method evaluates a model’s performance with specified features in the objective function by using various techniques, such as stepwise regression, forward selection and backward selection, and then chooses the optimal feature subset [29]. Figure 1 displays a detailed introduction to these methods and their relevant differences. Compared to the filter and embedded methods, the wrapper method is more flexible and can provide superior and consistent results [28]. Within the wrapper category, the sequential forward selection (SFS) technique is used for FS in this study, since its cooperation with the BPNN has generated satisfactory results in many studies [30,31].
Moreover, given that complex machine learning models only deliver final results from the selected input factors, there remains an urgent need to uncover the patterns between the inputs and outputs. To this end, explainable artificial intelligence (XAI) is increasingly important. Generally, interpretable methods can be divided into two categories: self-explanatory models and external co-explanations [32]. The former includes decision tree-based and linear-based models. The latter contains instance-based, Shapley additive explanations (SHAP), knowledge graph, deep learning and clustering models. Among these methods, SHAP was introduced into the field of machine learning by Lundberg in 2017 and has been widely utilized to provide valuable insights into the relationships between features and targets [33]. It is used in this study for the interpretation of the constructed models.
This study aims to predict the fractions of TEs in CFA quickly, interpretably and accurately. A popular machine learning model, the BPNN, is featured with SFS and optimized using hyperparameter adjustment techniques. Next, the performance of the proposed model (BPNN_s) is evaluated and compared to three other classical models: the unified BPNN (BPNN_u), the adaptive boosting (AdaBoost) and RF models. Finally, the SHAP algorithm is used to measure the importance of the features selected with SFS and to reveal the patterns between the features and the fractions.

2. Data Collection

To date, a large number of studies have confirmed the effects of total concentration and chemical composition on the mobility of TEs [34,35]. In addition, the impacts of elemental properties, such as the atomic weights and atomic numbers on the fractions, have also been investigated and verified [22]. The reason for choosing the cumulative particle size distribution as the factor is that the TEs in the smaller-sized particles are more susceptible to leaching than the larger bulk particles [36]. Considering the above factors, 5 categories of sequential extraction conditions, 16 sample characteristics comprising 5 physical and 11 chemical features, 32 element properties and the total concentrations are considered as the inputs (Table 1). The fractions serve as the outputs to indicate the mobility of TEs under various sequential extraction conditions. For example, when exploring the leaching characteristics of As under the water-extractable condition, the model will make a prediction based on this specific sequential extract condition and the corresponding properties of As. Similarly, fractions under other sequential extraction conditions can also be provided with the model constructed in this study.
The mobility of 8 TEs, namely boron (B), vanadium (V), chromium (Cr), arsenic (As), selenium (Se), molybdenum (Mo), antimony (Sb) and tungsten (W), is assessed under different sequential extraction conditions. The specific element properties were obtained from Jiang et al. [37]. The corresponding experimental data were extracted from a published study [3], in which 10 CFA samples were collected and the leaching characteristics of TEs were investigated with a modified sequential extraction approach, as shown in Table 2.
In this study, a total of 400 datasets were compiled and split into a training set (80% of samples, i.e., 320 samples) and a test set (20% of samples, i.e., 80 samples). The former is dedicated to training the predictive model, while the latter is used to assess the model performance.
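As a minimal sketch, the 80/20 split described above can be reproduced with a seeded random permutation; the use of NumPy and the particular seed are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility (assumption)

n_samples = 400
indices = rng.permutation(n_samples)

# 80/20 split as described: 320 training samples, 80 test samples
n_train = int(0.8 * n_samples)
train_idx, test_idx = indices[:n_train], indices[n_train:]
```

The permutation guarantees the two index sets are disjoint and jointly cover all 400 samples.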

3. Methodology

This section aims to provide a detailed introduction to the implemented methods, including the prediction model, grid search, cross-validation (CV), SFS and SHAP. Moreover, four evaluation metrics for model validation and comparison are presented.

3.1. Backpropagation Neural Network

The BPNN is a supervised algorithm proposed by Rumelhart et al. [38]. Given the capacities of nonlinear fitting, self-adapting and strong generalization, it has been applied in many complex engineering projects [39,40,41]. Similar to the pattern of information transmission in the human brain, the algorithm involves two procedures: information forward transmission and error backpropagation. During the learning process, some critical parameters, weights and biases will be continuously adjusted based on the specific dataset until the termination criterion is met.
To obtain the desired result, four steps are carried out. Initially, weights w and biases θ in the hidden layer and the output layer are initialized through the technique of random number. Then, the target of the output layer is obtained based on the following equations:
z_k = f\left( \sum_{j=1}^{m} w_{jk} y_j - \theta_k \right) \quad (k = 1, 2, \ldots, l)
y_j = f\left( \sum_{i=1}^{n} w_{ij} x_i - \theta_j \right) \quad (j = 1, 2, \ldots, m)
where x, y and z are the output values of the input layer, the hidden layer and the output layer, respectively; f is the activation function with the ability of nonlinear mapping; and n, m and l are the numbers of neurons in the input, hidden and output layers, respectively.
Afterward, the loss functions, Ej and Ek, are adopted in the phase of backpropagation, during which the weights in the different layers are modified based on the following specialized rules:
\Delta w_{ij} = -\eta \frac{\partial E_j}{\partial w_{ij}}
\Delta w_{jk} = -\eta \frac{\partial E_k}{\partial w_{jk}}
where Δwij and Δwjk are the tuning weights between the input and hidden layers as well as between the hidden and output layers, respectively. η is the scale coefficient known as the learning rate.
Next, the weights in the hidden layer and the output layer are updated. The loop stops when termination criteria are satisfied. Generally, the number of iterations is considered.
w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}
w_{jk}(t+1) = w_{jk}(t) + \Delta w_{jk}
where t is the number of iterations.
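The four steps above (initialization, forward transmission, error backpropagation and weight update) can be sketched in a minimal NumPy implementation. The toy layer sizes, learning rate and the tanh (Tansig) activation are illustrative assumptions, and bias updates are omitted for brevity:

```python
import numpy as np

def tansig(x):
    return np.tanh(x)  # Tansig activation, as used later in the paper

# Toy dimensions: n inputs, m hidden neurons, l = 1 output (illustrative only)
n, m, eta = 4, 6, 0.01
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(n, m)), np.zeros(m)   # input -> hidden weights/biases
W2, b2 = rng.normal(size=(m, 1)), np.zeros(1)   # hidden -> output weights/biases

def train_step(x, z_true):
    """One forward pass plus one backpropagation update; returns the current error."""
    global W1, W2
    # Forward transmission: y_j = f(sum_i w_ij x_i - theta_j), z_k likewise
    y = tansig(x @ W1 - b1)
    z = tansig(y @ W2 - b2)
    # Backpropagation of E = 0.5 * (z - z_true)^2, using tanh'(u) = 1 - tanh(u)^2
    dz = (z - z_true) * (1.0 - z ** 2)
    dy = (dz @ W2.T) * (1.0 - y ** 2)
    # Weight updates: delta w = -eta * dE/dw (bias updates omitted for brevity)
    W2 = W2 - eta * np.outer(y, dz)
    W1 = W1 - eta * np.outer(x, dy)
    return float(0.5 * (z - z_true).item() ** 2)
```

Iterating `train_step` on a sample drives the loss down, mirroring the loop that stops once the termination criterion (here, an iteration count) is met.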

3.2. Hyperparameter Optimization Techniques

In the course of developing a prediction model, the model structure and the training pattern are determined by the hyperparameters. Therefore, the optimal hyperparameter combination of the proposed model should be selected before predicting the target. In other words, the best model has the smallest prediction error among all trained models with different hyperparameter combinations.
To minimize the generalization error, several optimization methods have been proposed to deal with the combination problem, including the trial-and-error approach, grid search, random search and meta-heuristic algorithms. Among these, grid search has great potential for optimizing the hyperparameters of neural network-based models [42]. The whole optimization process can be described as follows: (1) a search grid is constructed based on the number of hyperparameter types and their ranges of values; (2) each combination in the search grid is applied in the model to calculate the corresponding generalization error; (3) the hyperparameter combination with the minimal error is selected for the developed model.
In addition, different splitting patterns in the dataset also have an influence on a model’s performance. To address this issue, one of the most popular approaches, k-fold CV, is adopted to avoid the bias of data splitting and to minimize the generalization error [43]. In this method, the original dataset is randomly split into k subsets, where k–1 subsets are selected to train the model, and the remaining one is applied to the performance evaluation. During this process, the model is trained and evaluated repeatedly k times and the final performance of the model with the specialized hyperparameters is obtained by averaging the k performance evaluations. The fold number is identified as 5 in this study after fully considering a model’s performance and the computational costs.
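The grid search and 5-fold CV steps above can be sketched as follows; for simplicity, this illustration tunes a single penalty hyperparameter of a ridge regression in place of the BPNN (the model, grid values and synthetic data are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

def cv_rmse(lam, k=5):
    """k-fold CV error of ridge regression with penalty lam (illustrative model)."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Closed-form ridge solution: (X'X + lam*I) w = X'y
        A = X[train].T @ X[train] + lam * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[train].T @ y[train])
        errs.append(np.sqrt(np.mean((X[test] @ w - y[test]) ** 2)))
    # Averaging the k evaluations gives the model's score for this hyperparameter
    return float(np.mean(errs))

# Grid search: evaluate every candidate, keep the one with minimal CV error
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(grid, key=cv_rmse)
```

The same pattern generalizes to multi-dimensional grids: build every combination, score each with k-fold CV, and keep the minimizer.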

3.3. FS

Given that irrelevant features can reduce a model’s performance and increase its computational time, the FS technique is employed to choose an optimal feature subset that maximizes relevance and minimizes redundancy. The sequential feature selection used in this study can be divided into two categories according to the direction of the sequential search: SFS and sequential backward selection (SBS) [44]. Compared to SBS, SFS is more computationally time-saving [23]. Therefore, it is used to reduce the number of redundant features in the original dataset. The SFS algorithm constructs an empty candidate subset before implementing FS. Following that, in each round, the feature whose addition yields the smallest prediction error is added to the candidate subset. The loop continues until the predefined number of features is reached. The procedures are shown in Figure 2.
It should be noted that the root mean square error (RMSE) is employed as the criterion for calculations of differences between the predicted and observed values. Moreover, the 5-fold CV is also adopted here to validate the performance of the candidate subsets for superior robustness.
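The SFS loop can be sketched as follows. For brevity, this illustration scores candidate subsets by the in-sample RMSE of a linear least-squares fit rather than the 5-fold CV of a BPNN used in the paper; the synthetic data and scoring model are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 8))
# Only features 0, 3 and 5 actually drive the target (illustrative data)
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + X[:, 5] + rng.normal(scale=0.1, size=120)

def rmse_of(features):
    """RMSE of a least-squares fit restricted to the given feature subset."""
    A = X[:, features]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sqrt(np.mean((A @ w - y) ** 2)))

def sfs(n_features):
    selected = []                      # start from an empty candidate subset
    while len(selected) < n_features:
        remaining = [f for f in range(X.shape[1]) if f not in selected]
        # Greedy step: add the feature whose inclusion minimizes the RMSE
        best = min(remaining, key=lambda f: rmse_of(selected + [f]))
        selected.append(best)
    return selected

chosen = sfs(3)
```

On this toy data, the greedy loop recovers exactly the three informative features, which is the behavior the RMSE criterion is meant to enforce.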

3.4. Evaluation Metrics

To comprehensively verify the performance of the models, four metrics, including the determination coefficient (R2), RMSE, the variance accounted for (VAF) and the Willmott’s index (WI) were utilized in this study. The evaluation metrics can be defined as follows:
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
\mathrm{VAF} = \left[ 1 - \frac{\mathrm{var}(y_i - \hat{y}_i)}{\mathrm{var}(y_i)} \right] \times 100\%
\mathrm{WI} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} \left( |\hat{y}_i - \bar{y}| + |y_i - \bar{y}| \right)^2}
where n is the number of samples in the dataset. y i and y ^ i are the actual value and the predicted value of the ith sample, respectively. y ¯ is the mean value of the observed samples. Therefore, the model with higher values of R2, VAF and WI but lower RMSE values performs better.
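These four metrics translate directly into code; the following NumPy sketch mirrors the definitions above (a perfect prediction yields R2 = 1, RMSE = 0, VAF = 100% and WI = 1):

```python
import numpy as np

def r2(y, yhat):
    """Determination coefficient: 1 minus residual over total sum of squares."""
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def rmse(y, yhat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def vaf(y, yhat):
    """Variance accounted for, in percent."""
    return (1.0 - np.var(y - yhat) / np.var(y)) * 100.0

def wi(y, yhat):
    """Willmott's index of agreement."""
    ybar = y.mean()
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum(
        (np.abs(yhat - ybar) + np.abs(y - ybar)) ** 2
    )
```
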

3.5. SHAP

With the advancements in big data and the increasing demand for accuracy, complex models have been gaining popularity. Nevertheless, the significant number of features involved in these models leads to poor interpretability. Consequently, various methods have been proposed to help understand the intrinsic mechanisms of the constructed models. Due to its high extendibility, SHAP has gained attention in academia and industry for improving the interpretability of models [45]. In this method, the SHAP value used to measure the contribution of each feature to the prediction is calculated based on cooperative game theory. For a specific sample, the sum of the SHAP values of all features corresponds to the difference between the base value and the model output. Moreover, the positive/negative correlations between a specific feature and the target are also provided explicitly. In this work, these factors are utilized to explain the model. Taking all of the above factors into account, the framework of this work is illustrated in Figure 3.
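To make the additivity property concrete (the sum of the SHAP values equals the difference between the model output and the base value), here is a from-scratch exact Shapley computation for a tiny hypothetical model; the model, the background values and the replacement of "absent" features by their background values are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from itertools import combinations
from math import factorial

# Background point standing in for "feature absent" (a common simplification)
background = np.array([1.0, 2.0, 3.0])

def f(x):
    return 2.0 * x[0] + x[1] * x[2]  # hypothetical model, not the paper's BPNN

def value(subset, x):
    """Model output when only the features in `subset` take their true values."""
    z = background.copy()
    for i in subset:
        z[i] = x[i]
    return f(z)

def shapley(x, n=3):
    """Exact Shapley values via the weighted average of marginal contributions."""
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for s in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(set(s) | {i}, x) - value(set(s), x))
    return phi

x = np.array([2.0, 1.0, 4.0])
phi = shapley(x)
# Efficiency/additivity: base value + sum of Shapley values = model output f(x)
```

For the purely linear term 2·x0, the Shapley value of feature 0 is simply 2·(x0 − background0), independent of the other features, which the exact computation reproduces.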

4. Results and Discussion

This section provides several discussions, which include the development of the BPNN models, the performance validation and comparison with the existing models and a SHAP analysis of the best model.

4.1. Development of BPNN Models

For the purpose of increasing the speed of convergence and minimizing the effects of features’ magnitudes, data scaling is first implemented before model training. The Tansig function is considered as the activation function; then, the data with different features are preprocessed as follows [46]:
y = \frac{2 (x - x_{\min})}{x_{\max} - x_{\min}} - 1
where xmin and xmax are the minimum and maximum of data x, and y is the scaling result.
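The scaling step maps each feature into [−1, 1], matching the output range of the Tansig activation. A minimal sketch (the column-wise application to a feature matrix is an assumption):

```python
import numpy as np

def scale_to_tansig_range(x):
    """Min-max scale each feature column to [-1, 1]: y = 2(x - xmin)/(xmax - xmin) - 1."""
    xmin, xmax = x.min(axis=0), x.max(axis=0)
    return 2.0 * (x - xmin) / (xmax - xmin) - 1.0
```
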
During the model development process, the architecture should also be determined. In this study, the number of hidden layers is set to 1, given the proven capability of a single layer to map any nonlinear relationship [47]. Furthermore, providing an optimal configuration for the number of neurons in the hidden layer is always an essential issue. To achieve satisfactory performance, the neuron number is determined empirically as follows [48]:
i = \sqrt{m + n} + a
where m and n are the number of neurons in the input and output layers. a is a constant in the range from 1 to 10.
Due to the characteristics of the original dataset, the number of neurons in the output layer is set to 1. In the input layer, the number of neurons is adjusted according to the number of features selected from SFS. It is worth pointing out that the concentration (as a basic property) is not involved in the course of FS; in other words, it is always maintained as an input of the model. Similarly, the 5 sequential extraction conditions are also retained, since they are required to provide categorical information for the prediction. As a result, the number of neurons in the hidden layer is adjusted in the range from 4 to 18.
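A small helper reproduces the neuron ranges quoted in the text, assuming the empirical rule is read as i = ⌈√(m + n)⌉ + a with a from 1 to 10 (the rounding-up is an assumption, chosen because it is consistent with the reported ranges of 4–18 here and 9–18 for the full-feature model):

```python
import math

def hidden_neuron_candidates(n_inputs, n_outputs=1):
    """Candidate hidden-layer sizes from i = ceil(sqrt(m + n)) + a, a = 1..10."""
    base = math.ceil(math.sqrt(n_inputs + n_outputs))
    return [base + a for a in range(1, 11)]
```

For example, with all 54 features as inputs the candidates run from 9 to 18, matching the range used for the BPNN_u model later in the text.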
Figure 4 shows the variation of the RMSE with an increasing number of features when different numbers of neurons are assigned to the hidden layer, using the grid search approach with an increment of 1. Overall, the RMSE gradually decreases as the number of features selected with SFS increases, indicating that the generalization capacity of the model improves. Moreover, it can be seen that the model obtains relatively small RMSE values when the number of selected features is between 15 and 30. In particular, the best performance (i.e., the lowest RMSE value) is achieved by the model with 12 neurons in the hidden layer and 28 features, rather than by the model with the most neurons or features. Therefore, the numbers of neurons in the input and hidden layers for the BPNN_s are set as 28 (i.e., the number of used features) and 12, respectively, as shown in Figure 5.
Table 3 presents the features selected in each round along with the corresponding category and RMSE. It is seen that the selected features are almost all element properties, except for K2O, which suggests that the physical and chemical properties have limited effects on the prediction task. In addition, the RMSE reaches its lowest value when the feature named Nunfilled is selected from the original dataset.
To demonstrate the performance improvement of the model coupled with FS (BPNN_s), the BPNN_u model with all features is also trained in this work. According to Equation (12), the number of neurons in the hidden layer is adjusted in the range from 9 to 18. Similarly, the RMSE with 5-fold CV is used as the criterion to select the optimal hyperparameter. Figure 6 presents the results of the models with different numbers of neurons in the hidden layer. Taken as a whole, the loss is minimized with a neuron count of 15, although this may not yield the best performance in every fold. Therefore, the number of neurons in the hidden layer for the BPNN_u model is set to 15.

4.2. Performance Comparison

When the structures and hyperparameters of both models (BPNN_u and BPNN_s) are determined, the prediction performances are compared in the training and test sets, and the results can be found in Table 4 and Figure 7 and Figure 8, respectively. Table 4 provides the comparison results using 4 evaluation metrics as mentioned in Section 3.4. It is worth pointing out that the BPNN_u model achieved a similar performance to the previously published work where the value of R2 in the test set was 0.83 [9]. In addition, the BPNN_s is superior to the BPNN_u by resulting in a lower RMSE (train: 7.0455, test: 7.3213) and higher values of R2 (train: 0.9150, test: 0.9062), VAF (train: 91.4989, test: 90.7010) and WI (train: 0.8635, test: 0.8551). Considering that the BPNN_s outperforms the BPNN_u with only 28 features, this implies that the original dataset has many redundant features that affect its prediction performance. Therefore, FS needs to be conducted before developing a model for predicting fractions.
Figure 7 provides the regression diagrams of both models in the training and test sets. In the figures, two types of lines are involved in evaluating the goodness of fit between the observed and predicted values. One is the black diagonal line, on which the predicted value equals the observed value; the closer the points are to this line, the better the performance of the model. The level of deviation between the predicted and observed values is reflected by the dashed lines, and a model with more points outside the dashed lines performs worse. It can be seen that the BPNN_s has more data points concentrated on the diagonal line and fewer data points outside the ±30% dashed lines than the BPNN_u in both the training and test sets. Consequently, the BPNN_s produces more satisfactory results than the BPNN_u.
To clearly demonstrate the error between the observed and predicted values of both models, the prediction results are compared on the test set, as shown in Figure 8. Compared to the BPNN_u, the curve of the BPNN_s overlaps more closely with the observed curve (black line), which means that the BPNN_s generates the desired values. Moreover, it can also be observed that predictions with larger errors in the BPNN_u, such as samples 15–25, are improved in the BPNN_s. Although differences are inevitably generated, most of them fall within 10%. Therefore, it can be concluded that the process of FS can significantly improve performance and curb errors.
Although the BPNN_s performs better than the model without FS (BPNN_u), this does not indicate that it is superior in contrast to the other types of models. Therefore, two ensemble models, namely the RF and AdaBoost, are developed to further validate the performance of the BPNN_s. Unlike the BPNN-based models, both ensemble models improve performance by constructing a strong evaluator, which is achieved by training multiple weak evaluators. Table 5 presents details on the hyperparameter settings of the two models that are determined through the trial-and-error approach.
Figure 9 shows the performance comparison with the Taylor diagram. The figures include correlation coefficients (yellow radial lines), RMSEs (blue lines) and standard deviations (black lines), which are used to reflect the degree of agreement, the value of error and the level of dispersion between the observed and predicted values, respectively. Three indicators of the four models mentioned above are calculated based on their own predicted values. It is noted that the correlation coefficient and RMSE of the observed point are equal to 1 and 0, respectively.
As can be seen in this figure, the point of the RF is the closest to the observed point in the training set, which means that the RF obtained a better performance than the other models. However, this superiority is not maintained in the test set because of overfitting. Unlike the RF model, the AdaBoost performs the worst in both datasets. In contrast to the two ensemble learning models, the BPNN_s and BPNN_u perform stably, especially the BPNN_s, with correlation coefficients of 0.9565 (train) and 0.9519 (test) and standard deviations of 21.9225 (train) and 21.9183 (test). Therefore, the BPNN_s is considered the best model for predicting the fractions of TEs in this study.

4.3. Model Validation

While the BPNN_s model performs well on the dataset, its accuracy on other samples needs to be further investigated. To achieve this, three samples were obtained from another experiment [49], with CFA samples being obtained from the combustion of bituminous coal in the Xinganmeng Coalfield. The chemical composition of CFA and the concentrations of TEs were measured using X-ray fluorescence spectrometry (XRF) and inductively coupled plasma mass spectrometry (ICP-MS), respectively. This study utilized the classical sequential chemical extraction method to ascertain the fractions of TEs. Figure 10 shows the comparison between the actual and predicted fractions of three TEs (Sb, V and Cr) under different sequential extraction conditions. It can be seen that the BPNN_s model produces prediction outcomes with minimal differences. Due to the different locations where the CFA samples were collected, the different combustion methods and the slight variations in the procedures for sequential extraction conditions, these differences are inevitable and acceptable. As a result, it can be inferred that the BPNN_s model, which has demonstrated robust performance, can be utilized for predictions with a more extensive range of fractions of TEs.

4.4. Interpretation of the Black Box—SHAP

The importance of features selected with SFS is evaluated using the SHAP values, as shown in Figure 11. It is noted that the five features indicating different sequential extraction conditions are not involved, even though they are critical for the prediction, as they only provide the categorical information in the model. It can be seen that the atomic weight contributed the most to the fractions. The SHAP values of the chemical properties (K2O) and concentrations are 1.38 and 1.23, respectively. Moreover, the importance of element properties accounts for over 95%, which is far ahead of the chemical properties (1.31%) and concentrations (1.16%). Therefore, element properties have significant effects on the prediction of the fractions of TEs.
In contrast to the importance ranking, SHAP also provides information about how each feature contributes to the fraction of each sample. Taking sample #2 as an example, Figure 12 illustrates the details in the fraction prediction based on the SHAP values. It is remarkably clear that the predicted fraction in the selected sample is 2.70%, which is obtained with the sum of SHAP values for each feature and the base value (20.20%). In addition, SHAP also provides valuable relations between the features and the fraction from which more in-depth insights related to reducing the bioavailability of TEs can be acquired. It is noted that the SHAP values and the positive/negative relations can vary from sample to sample, which deserves further investigation.

5. Conclusions

The leachate of TEs not only causes serious environmental problems but also threatens human health through the food chain. In this study, a BPNN model was developed to predict the fractions of TEs in CFA. Considering both the performance and the interpretability of the model, SFS was used for feature selection (FS). The performance of the BPNN model combined with SFS (BPNN_s) was then tested, and its superiority was validated against three models trained with all features. Finally, the importance of the features selected with SFS was analyzed using the SHAP technique. The main conclusions are as follows:
(1) Four machine learning models, the BPNN_s, BPNN_u, AdaBoost and RF, were trained to predict the fractions of TEs in CFA, and their performances were compared. Among them, the BPNN_s model achieved the most satisfactory performance in both the training stage (R2: 0.9150; RMSE: 7.0455) and the testing stage (R2: 0.9062; RMSE: 7.3213), indicating that it provides stable and accurate predictions.
(2) Compared with the BPNN_u model, which considers all features, the BPNN_s achieved robust performance using only the significant features selected with SFS. FS is thus an imperative step for improving a model's performance and should not be neglected when developing models.
(3) The features selected with SFS and the SHAP values suggest that element properties have a profound effect on the fractions of TEs in CFA. These properties provide the most valuable information for predicting the fractions, which in turn supports a precise assessment of the potential environmental risk.
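As an illustration of the FS step highlighted in conclusion (2), the sketch below wraps greedy forward selection around a small BPNN-style network using scikit-learn's SequentialFeatureSelector. The dataset, network architecture and number of retained features are stand-ins, not the study's settings:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neural_network import MLPRegressor

# Hypothetical stand-in data (the study's 400 samples x 54 features are not reproduced here)
X, y = make_regression(n_samples=80, n_features=8, n_informative=3, noise=5.0, random_state=0)

# A small BPNN-style regressor; the architecture is illustrative
bpnn = MLPRegressor(hidden_layer_sizes=(8,), max_iter=300, random_state=0)

# Greedy forward selection: at each step, add the candidate feature that most
# reduces the cross-validated RMSE, in the spirit of the stepwise RMSE column of Table 3
sfs = SequentialFeatureSelector(
    bpnn,
    n_features_to_select=3,
    direction="forward",
    scoring="neg_root_mean_squared_error",
    cv=3,
)
sfs.fit(X, y)
print(np.flatnonzero(sfs.get_support()))  # indices of the retained features
```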
Predicting the fractions of TEs reduces experimental cost and improves assessment efficiency, helping to curb environmental pollution, evaluate metal recovery potential and advance sustainable development. Overall, this study provides a reference and foundation in FS for predicting the fractions of TEs. Given that machine learning algorithms rely heavily on data, more samples should be added to improve the generalization capacity of the BPNN_s model, and other TEs can be explored in future studies to broaden the scope of the machine learning model.

Author Contributions

Conceptualization, J.Z., C.L. and T.Z.; methodology, J.Z. and C.L.; software, J.Z.; validation, J.Z., C.L. and T.Z.; formal analysis, J.Z.; investigation, J.Z. and C.L.; resources, J.Z. and C.L.; data curation, J.Z. and C.L.; writing—original draft preparation, J.Z.; writing—review and editing, C.L. and T.Z.; visualization, J.Z.; supervision, C.L. and T.Z.; project administration, C.L. and T.Z.; funding acquisition, C.L. and T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the China Scholarship Council (CSC No. 201906690049). The financial support is greatly appreciated.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ju, T.; Meng, Y.; Han, S.; Lin, L.; Jiang, J. On the state of the art of crystalline structure reconstruction of coal fly ash: A focus on zeolites. Chemosphere 2021, 283, 131010.
2. Qi, C.; Wu, M.; Xu, X.; Chen, Q. Chemical signatures to identify the origin of solid ashes for efficient recycling using machine learning. J. Clean. Prod. 2022, 368, 133020.
3. Tian, Q.; Guo, B.; Nakama, S.; Sasaki, K. Distributions and leaching behaviors of toxic elements in fly ash. ACS Omega 2018, 3, 13055–13064.
4. Ghosh, S.; Mondal, S.; Mandal, J.; Mukherjee, A.; Bhattacharyya, P.K. Effect of metal fractions on rice grain metal uptake and biological parameters in mica mines waste contaminated soils. J. Environ. Sci. 2024, 136, 313–324.
5. Rezapour, S.; Samadi, A.K.; Kalavrouziotis, I.K.; Ghaemian, N. Impact of the uncontrolled leakage of leachate from a municipal solid waste landfill on soil in a cultivated-calcareous environment. Waste Manag. 2018, 82, 51–61.
6. Men, C.; Liu, R.; Xu, L.; Wang, Q.; Guo, L.; Miao, Y.; Shen, Z. Source-specific ecological risk analysis and critical source identification of heavy metals in road dust in Beijing, China. J. Hazard. Mater. 2019, 388, 121763.
7. McBride, M.B. Toxic metal accumulation from agricultural use of sludge: Are USEPA regulations protective? J. Environ. Qual. 1995, 24, 5–18.
8. Shrivastava, S.K.; Banerjee, D.K. Speciation of metals in sewage sludge and sludge-amended soils. Water Air Soil Pollut. 2004, 152, 219–232.
9. Qi, C.; Wu, M.; Liu, H.; Liang, Y.; Liu, X.; Lin, Z. Machine learning exploration of the mobility and environmental assessment of toxic elements in mining-associated solid wastes. J. Clean. Prod. 2023, 401, 136771.
10. Sundaray, S.K.; Nayak, B.B.; Lin, S.; Bhatta, D. Geochemical speciation and risk assessment of heavy metals in the river estuarine sediments—A case study: Mahanadi basin, India. J. Hazard. Mater. 2011, 186, 1837–1846.
11. Singh, K.P.; Mohan, D.; Singh, V.K.; Malik, A. Studies on distribution and fractionation of heavy metals in Gomti river sediments—A tributary of the Ganges, India. J. Hydrol. 2005, 312, 14–27.
12. Tessier, A.; Campbell, P.G.C.; Bisson, M. Sequential extraction procedure for the speciation of particulate trace metals. Anal. Chem. 1979, 51, 844–851.
13. Kwon, Y.T.; Lee, C.-W. Ecological risk assessment of sediment in wastewater discharging area by means of metal speciation. Microchem. J. 2001, 70, 255–264.
14. Villen-Guzman, M.; Cerrillo-Gonzalez, M.M.; Paz-Garcia, J.M.; Vereda-Alonso, C.; Gomez-Lahoz, C.; Rodriguez-Maroto, J.M. Sequential extraction procedure: A versatile tool for environmental research. Detritus 2020, 13, 23–28.
15. Khosravi, V.; Gholizadeh, A.; Agyeman, P.C.; Ardejani, F.D.; Yousefi, S.; Saberioon, M. Further to quantification of content, can reflectance spectroscopy determine the speciation of cobalt and nickel on a mine waste dump surface? Sci. Total Environ. 2023, 872, 161996.
16. Rauret, G.; Rubio, R.; López-Sánchez, J.F. Optimization of Tessier procedure for metal solid speciation in river sediments. Int. J. Environ. Anal. Chem. 1989, 36, 69–83.
17. Yan, K.; Dai, Y.; Xu, M.; Mo, Y. Tunnel surface settlement forecasting with ensemble learning. Sustainability 2019, 12, 232.
18. Khan, N.M.; Cao, K.; Yuan, Q.; Hashim, M.H.B.M.; Rehman, H.; Hussain, S.; Emad, M.Z.; Ullah, B.; Shah, K.S.; Khan, S. Application of machine learning and multivariate statistics to predict uniaxial compressive strength and static Young’s modulus using physical properties under different thermal conditions. Sustainability 2022, 14, 9901.
19. Barzegar, R.; Sattarpour, M.; Deo, R.; Fijani, E.; Adamowski, J. An ensemble tree-based machine learning model for predicting the uniaxial compressive strength of travertine rocks. Neural Comput. Appl. 2020, 32, 9065–9080.
20. Liu, Z.; Lu, M.; Zhang, Y.; Zhou, J.; Wang, J. Identification of heavy metal leaching patterns in municipal solid waste incineration fly ash based on an explainable machine learning approach. J. Environ. Manag. 2022, 317, 115387.
21. Zheng, J.; Wu, M.; Yaseen, Z.; Qi, C. Machine learning models for occurrence form prediction of heavy metals in tailings. Int. J. Min. Reclam. Environ. 2023, 1–18.
22. Wu, M.; Qi, C.; Chen, Q.-S.; Liu, H. Evaluating the metal recovery potential of coal fly ash based on sequential extraction and machine learning. Environ. Res. 2023, 224, 115546.
23. Aggrawal, R.; Pal, S. Sequential feature selection and machine learning algorithm-based patient’s death events prediction and diagnosis in heart disease. SN Comput. Sci. 2020, 1, 344.
24. Anami, B.S.; Malvade, N.N.; Palaiah, S. Classification of yield affecting biotic and abiotic paddy crop stresses using field images. Inf. Process. Agric. 2020, 7, 272–285.
25. Roffo, G.; Melzi, S.; Castellani, U.; Vinciarelli, A. Infinite latent feature selection: A probabilistic latent graph-based ranking approach. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1407–1415.
26. Armaghani, D.J.; Hajihassani, M.; Mohamad, E.T.; Marto, A.; Noorani, S.A. Blasting-induced flyrock and ground vibration prediction through an expert artificial neural network based on particle swarm optimization. Arab. J. Geosci. 2014, 7, 5383–5396.
27. Barkhordari, M.S.; Armaghani, D.J.; Fakharian, P. Ensemble machine learning models for prediction of flyrock due to quarry blasting. Int. J. Environ. Sci. Technol. 2022, 19, 8661–8676.
28. Jović, A.; Brkić, K.; Bogunovic, N. A review of feature selection methods with applications. In Proceedings of the 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 25–29 May 2015; pp. 1200–1205.
29. Leray, P.; Gallinari, P. Feature selection with neural networks. Behaviormetrika 1999, 26, 145–166.
30. Ferjaoui, R.; Cherni, M.A.; Boujnah, S.; Kraiem, N.E.H.; Kraiem, T. Machine learning for evolutive lymphoma and residual masses recognition in whole body diffusion weighted magnetic resonance images. Comput. Methods Programs Biomed. 2021, 209, 106320.
31. Kurtulmuş, F.; Alibas, I.; Kavdir, I. Classification of pepper seeds using machine vision based on neural network. Int. J. Agric. Biol. Eng. 2016, 9, 51–62.
32. Mi, J.; Li, A.; Zhou, L. Review study of interpretation methods for future interpretable machine learning. IEEE Access 2020, 8, 191969–191985.
33. Lundberg, S.M.; Lee, S. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874.
34. Tra Ho, T.L.; Egashira, K. Heavy metal characterization of river sediment in Hanoi, Vietnam. Commun. Soil Sci. Plant Anal. 2000, 31, 2901–2916.
35. Liu, H.; Li, L.; Yin, C.; Shan, B. Fraction distribution and risk assessment of heavy metals in sediments of Moshui Lake. J. Environ. Sci. 2008, 20, 390.
36. Seshadri, P.; Seames, W.; Sisk, M.D.; Bowman, F.; Benson, S. Mobility of Semi-volatile Trace Elements from the Fly Ash Generated by the Combustion of a Sub-bituminous Coal—The Effects of the Combustion Temperature. Energy Fuels 2020, 34, 15411–15423.
37. Jiang, Y.; Chen, D.; Chen, X.; Li, T.; Wei, G.-W.; Pan, F. Topological representations of crystalline compounds for the machine-learning prediction of materials properties. Npj Comput. Mater. 2021, 7, 28.
38. Rumelhart, D.E.; McClelland, J.L. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations; MIT Press: Cambridge, MA, USA, 1986.
39. Yari, M.; Bagherpour, R.; Jamali, S.; Shamsi, R. Development of a novel flyrock distance prediction model using BPNN for providing blasting operation safety. Neural Comput. Appl. 2016, 27, 699–706.
40. Al-Jarrah, R.; AL-Oqla, F.M. A novel integrated BPNN/SNN artificial neural network for predicting the mechanical performance of green fibers for better composite manufacturing. Compos. Struct. 2022, 289, 115475.
41. Liu, E.; Li, J.; Zheng, A.; Liu, H.; Jiang, T. Research on the prediction model of the used car price in view of the PSO-GRA-BP neural network. Sustainability 2022, 14, 8993.
42. Tang, L.N.; Na, S. Comparison of machine learning methods for ground settlement prediction with different tunneling datasets. J. Rock Mech. Geotech. Eng. 2021, 13, 1274–1289.
43. Lipu, M.S.H.; Hannan, M.A.; Hussain, A.; Ayob, A.; Saad, M.H.M.; Karim, T.F.; How, D.N.T. Data-driven state of charge estimation of lithium-ion batteries: Algorithms, implementation factors, limitations and future trends. J. Clean. Prod. 2020, 277, 124110.
44. Kannangara, K.K.P.M.; Zhou, W.; Ding, Z.; Hong, Z. Investigation of feature contribution to shield tunneling-induced settlement using Shapley additive explanations method. J. Rock Mech. Geotech. Eng. 2022, 14, 1052–1063.
45. Qi, C.; Wu, M.; Zheng, J.; Chen, Q.-S.; Chai, L. Rapid identification of reactivity for the efficient recycling of coal fly ash: Hybrid machine learning modeling and interpretation. J. Clean. Prod. 2022, 343, 130958.
46. Nagalakshmi, S.; Kamaraj, N. Evaluation of loadability limit of pool model with TCSC using optimal featured BPNN. In Proceedings of the 2011 International Conference on Recent Advancements in Electrical, Electronics and Control Engineering, Sivakasi, India, 15–17 December 2011; pp. 454–458.
47. Li, B.; Li, Y.; Wang, H.; Ma, Y.; Qiang, H.; Ge, F. Compensation of automatic weighing error of belt weigher based on BP neural network. Measurement 2018, 129, 625–632.
48. Liu, H.; Qin, X.; Huang, S.; Jin, L.; Wang, Y.; Lei, K. Geometry characteristics prediction of single track cladding deposited by high power diode laser based on genetic algorithm and neural network. Int. J. Precis. Eng. Manuf. 2018, 19, 1061–1070.
49. Zhou, C.; Li, C.; Li, W.; Sun, J.; Li, Q.; Wu, W.-Q.; Liu, G. Distribution and preconcentration of critical elements from coal fly ash by integrated physical separations. Int. J. Coal Geol. 2022, 261, 104095.
Figure 1. Schematic diagram of three feature selection techniques.
Figure 2. SFS feature selection procedure.
Figure 3. Flowchart of the implemented methods.
Figure 4. Extraction process of features.
Figure 5. The structure of the BPNN_s model.
Figure 6. RMSE of the BPNN_u with different neurons using 5-fold CV.
Figure 7. Regression comparison for the BPNN_u and BPNN_s models. (a) Training set; (b) test set.
Figure 8. Prediction comparison for the BPNN_u and BPNN_s models in the test set.
Figure 9. Prediction comparison for the models. (a) Training set; (b) test set.
Figure 10. Comparison between the experimental and predicted fractions.
Figure 11. SHAP values for considered features.
Figure 12. Analysis for features' contribution in the fraction.
Table 1. Introduction of all selected features.

Type | Number | Content | Explanation
Sequential extraction conditions | 5 | F1, F2, F3, F4, F5 | Water-extractable fraction (F1); acid-soluble fraction (F2); reducible fraction (F3); oxidizable fraction (F4); residual fraction (F5)
Physical properties | 5 | D10, D30, D50, D60, D90 | Particle size when the cumulative proportion of the size distribution reaches the corresponding percentage
Chemical properties | 11 | SiO2, Al2O3, Fe2O3, Na2O, K2O, CaO, MgO, P2O5, TiO2, SO2, LOI | Chemical compositions and loss on ignition (LOI) of fly ash samples (% by weight)
Concentration | 1 | — | Total concentration of TEs
Element properties | 32 | — | See note

Note: atomic volume; atomic weight; boiling temperature; column on periodic table (column); covalent radius of each element (covalent radius); density of element at STP (density); dipole polarizability; electron affinity: the ease of an atom of an element to gain an electron; Pauling electronegativity: differences in electron affinity between elements; energy to remove the first electron from an element (first electron); enthalpy of fusion (fusion enthalpy); DFT bandgap energy of T = 0 K ground state (GSbandgap); DFT magnetic moment of T = 0 K ground state (GSmagmon); specific heat capacity at STP (HeatCMa); molar heat capacity at STP (HeatCMo); enthalpy of fusion for elements at their melting temperatures (heat fusion); melting temperature of element (MeltingT); Mendeleev number; number of unfilled d valence orbitals (NdUnfilled); number of unfilled f valence orbitals (NfUnfilled); number of unfilled p valence orbitals (NpUnfilled); number of unfilled s valence orbitals (NsUnfilled); number of filled p valence orbitals (NpValence); number of filled d valence orbitals (NdValence); number of filled f valence orbitals (NfValence); number of filled s valence orbitals (NsValence); atomic number; number of unfilled valence orbitals (Nunfilled); number of valence electrons (NValence); static average electric dipole polarizability (polarizability); row on periodic table (row); space group of T = 0 K ground state structure (SpaceG).
Table 2. Five-step sequential extraction approach.

Fraction | Material | Procedure | Temperature
Water-extractable | 1 g fly ash; 40 mL ultrapure water | Shake 16 h | 25 °C
Acid-soluble | 40 mL 0.11 mol·L−1 CH3COOH | Shake 16 h | 25 °C
Reducible | 40 mL 0.5 mol·L−1 NH2OH·HCl | Shake 16 h | 25 °C
Oxidizable | 10 mL 30% H2O2 | Shake 1 h | 25 °C
 | | — | 85 °C
 | 10 mL 30% H2O2 | — | 85 °C
 | 50 mL 1 mol·L−1 CH3COONH4 | Shake 16 h | 25 °C
Residual | 7 mL 60% HNO3; 2.5 mL 46% HF; 0.5 mL 30% H2O2 | 20 min in microwave | 220 °C
Table 3. Details about the features chosen with SFS.

Feature Category | Feature Selected | RMSE
— | Concentration; five sequential extraction conditions | 22.19
Element properties | Covalent radius | 12.18
 | Boiling temperature | 10.7
 | Electron affinity | 9.62
 | GSbandgap | 9.53
 | NfUnfilled | 9.39
 | Density | 9.28
 | Dipole polarizability | 9.21
 | Row | 9.37
 | Atomic weight | 9.18
 | Mendeleev number | 9.62
Chemical properties | K2O | 9.39
Element properties | Atomic volume | 9.35
 | Fusion enthalpy | 9.31
 | Polarizability | 9.01
 | NdValence | 9.11
 | Column | 9.15
 | HeatCMo | 8.54
 | Atomic number | 9.67
 | NpUnfilled | 9.07
 | NpValence | 9.78
 | Heat fusion | 9.41
 | Nunfilled | 8.49
Table 4. Performance comparison of BPNN_s and BPNN_u.

Metric | Training set (BPNN_s) | Training set (BPNN_u) | Test set (BPNN_s) | Test set (BPNN_u)
R2 | 0.9150 | 0.8620 | 0.9062 | 0.8246
RMSE | 7.0455 | 8.9760 | 7.3213 | 10.0113
VAF | 91.4989 | 86.5717 | 90.7010 | 83.5423
WI | 0.8635 | 0.8086 | 0.8551 | 0.7784
Table 5. Key parameter settings in the two ensemble models.

Model | Parameter | Value | Explanation
RF | n_estimators | 140 | The number of trees in the random forest
 | min_samples_split | 2 | The minimum number of samples required to split an internal node
 | Sampling method | Bootstrap | Bootstrap sampling is used when building trees
AdaBoost | n_estimators | 33 | The maximum number of estimators
 | learning_rate | 1 | Weight applied to each regressor at each boosting iteration
 | loss | linear | The function used to update the weights after each iteration
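The Table 5 settings map directly onto scikit-learn's ensemble regressors. The sketch below shows this mapping on a hypothetical stand-in dataset (the study's CFA data are not reproduced here):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor

# Tiny illustrative dataset, not the study's data
X, y = make_regression(n_samples=60, n_features=5, noise=1.0, random_state=0)

# RF settings from Table 5 (bootstrap sampling is scikit-learn's default)
rf = RandomForestRegressor(n_estimators=140, min_samples_split=2,
                           bootstrap=True, random_state=0)

# AdaBoost settings from Table 5
ada = AdaBoostRegressor(n_estimators=33, learning_rate=1.0,
                        loss="linear", random_state=0)

rf.fit(X, y)
ada.fit(X, y)
print(len(rf.estimators_), len(ada.estimators_))
```

Note that AdaBoost may stop early, so its fitted ensemble can contain fewer than the 33 estimators requested.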

Share and Cite

MDPI and ACS Style

Zhang, J.; Li, C.; Zhang, T. An Assessment of the Mobility of Toxic Elements in Coal Fly Ash Using the Featured BPNN Model. Sustainability 2023, 15, 16389. https://doi.org/10.3390/su152316389
