Article

Machine Learning Models for Predicting Bioavailability of Traditional and Emerging Aromatic Contaminants in Plant Roots

by Siyuan Li, Yuting Shen, Meng Gao, Huatai Song, Zhanpeng Ge, Qiuyue Zhang, Jiaping Xu, Yu Wang * and Hongwen Sun *
MOE Key Laboratory of Pollution Processes and Environmental Criteria, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
*
Authors to whom correspondence should be addressed.
Toxics 2024, 12(10), 737; https://doi.org/10.3390/toxics12100737
Submission received: 10 September 2024 / Revised: 8 October 2024 / Accepted: 10 October 2024 / Published: 12 October 2024
(This article belongs to the Section Emerging Contaminants)

Abstract

To predict the behavior of aromatic contaminants (ACs) in complex soil–plant systems, this study developed machine learning (ML) models to estimate the root concentration factor (RCF) of both traditional (e.g., polycyclic aromatic hydrocarbons, polychlorinated biphenyls) and emerging ACs (e.g., phthalate acid esters, aryl organophosphate esters). Four ML algorithms were employed, trained on a unified RCF dataset comprising 878 data points, covering 6 features of soil–plant cultivation systems and 98 molecular descriptors of 55 chemicals, including 29 emerging ACs. The gradient-boosted regression tree (GBRT) model demonstrated strong predictive performance, with a coefficient of determination (R2) of 0.75, a mean absolute error (MAE) of 0.11, and a root mean square error (RMSE) of 0.22, as validated by five-fold cross-validation. Multiple explanatory analyses highlighted the significance of soil organic matter (SOM), plant protein and lipid content, exposure time, and molecular descriptors related to electronegativity distribution pattern (GATS8e) and double-ring structure (fr_bicyclic). An increase in SOM was found to decrease the overall RCF, while other variables showed strong correlations within specific ranges. This GBRT model provides an important tool for assessing the environmental behaviors of ACs in soil–plant systems, thereby supporting further investigations into their ecological and human exposure risks.

1. Introduction

Aromatic contaminants (ACs), defined by the presence of one or more aromatic rings in their structure, are important environmental pollutants [1]. Traditional ACs, such as polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyls (PCBs), and organochlorine pesticides, are typically hydrophobic and chemically stable, which makes them resistant to natural degradation processes [2]. These compounds are widespread in the environment due to human activities, including residential heating [3], incineration [4,5], and industrial processes [6]. Whether released as primary substances, intermediates, or by-products, traditional ACs are of significant concern due to their persistence and potential for bioaccumulation [7,8,9]. For example, PAHs, with their fused aromatic ring structures, exhibit low water solubility and a strong affinity for organic matter, leading to their persistence in soils and sediments [10]. Similarly, the biphenyl structure of PCBs, combined with varying degrees of chlorination, increases their lipophilicity and environmental stability, promoting bioaccumulation in the fatty tissues of organisms and resistance to biodegradation [6]. Organochlorine pesticides, such as dichlorodiphenyltrichloroethane, share these characteristics due to their high chlorine content, enabling them to persist in the environment for decades and undergo long-range transport [11].
Recently, several emerging contaminants with aromatic structures, including plasticizers and flame retardants, have been identified in soil–plant systems [12,13]. Unlike traditional ACs, emerging ACs such as polybrominated diphenyl ethers (PBDEs), phthalate acid esters (PAEs), and aryl organophosphate esters (OPEs) often exhibit polar and degradable structural features. PBDEs are flame retardants commonly used in consumer products. These hydrophobic, halogenated compounds have long half-lives, resist degradation, and are prone to bioaccumulation and biomagnification [14]. Despite their strong affinity for sediments, they can travel long distances and persist in both aquatic systems and soils [15]. PAEs, widely used as plasticizers, are easily released into the environment because they are not chemically bound to the plastic matrix [16]. These compounds are commonly found in water, sediment, and soil, and their semi-volatile nature allows them to move through various environmental media [17,18]. Depending on factors such as molecular structure and environmental conditions, the half-lives of PAEs in soil range from less than a week to several months [19]. Aryl OPEs, used as substitutes for organohalogen flame retardants, have been widely detected across various environmental media. Research indicates that emissions of aryl OPEs are increasing [20]. The environmental and health risks posed by these ACs are substantial. PAHs and PCBs are linked to carcinogenic, mutagenic, and endocrine-disrupting effects [21,22]. Similarly, organochlorine pesticides are highly lipophilic, accumulating in the fatty tissues of animals and leading to chronic exposure risks, reproductive harm, and neurological damage [23]. PBDEs are associated with neurodevelopmental toxicity and endocrine disruption [24], while PAEs are linked to reproductive toxicity and developmental disorders in humans and wildlife [25]. Aryl OPEs have been connected to reproductive and neurological dysfunction, as well as genotoxicity [26,27]. Additionally, due to their lipophilic properties, ACs are prone to accumulate in soil and edible plants. The substantial structural differences among various ACs lead to distinct accumulation behaviors in plant roots [28,29,30,31,32,33,34], complicating their migration behaviors in soil–plant systems and making risk assessment more challenging.
The root concentration factor (RCF) is a commonly used metric for evaluating a plant’s ability to accumulate organic compounds, defined as the ratio of the concentration in plant roots to the concentration in soil at equilibrium [35]. Previous studies have shown that the RCF is influenced by chemical properties such as hydrophobicity and sorption–desorption features in the soil–water interface [36,37]. Several factors, such as SOM content, plant species, exposure time, root protein content, root lipid content, cultivation mode, temperature, and transpiration capacity, can influence the RCF of ACs [31,38,39]. Among these, SOM plays a critical role in determining contaminant bioavailability. High SOM levels typically reduce the bioavailability of hydrophobic organic compounds by adsorbing them onto soil particles, thus limiting their mobility and reducing plant uptake. This interaction between SOM and organic contaminants underscores the importance of soil properties in controlling accumulation processes in plant systems [40]. Different plant species exhibit varying capacities to accumulate contaminants, which can be attributed to differences in root structure, protein content, and lipid content. For instance, studies have shown that variations in the PAE absorption among plant species may be linked to differences in their lipid content [41]. Another study found that OPEs can associate with non-specific lipid transfer proteins (nsLTPs), facilitating their absorption by plant roots [42]. While plant exposure time can influence chemical absorption, this effect is often limited under conditions of uniform soil and stable chemical concentrations [37]. Cultivation methods also impact the absorption of contaminants by plant roots [43]. However, many studies have focused primarily on single-factor laboratory investigations, neglecting the combined effects of multiple factors on RCF. 
While empirical regression models have been widely used to predict RCFs [44], these models traditionally rely on a limited set of physicochemical properties, such as the octanol–water partition coefficient (Kow) [45]. Compared to similarly halogenated PCBs, PBDEs tend to accumulate more in plant roots due to their higher hydrophobicity, as compounds with high Kow values are generally more lipophilic and readily absorbed by roots [46]. Moreover, the relationships between the chemical structures of emerging ACs and their accumulation in plants have not been thoroughly explored. Therefore, the structural complexity of ACs and the diverse properties of soil–plant systems necessitate timely and effective approaches to comprehensively assess and predict the bioavailability of these compounds in plant roots.
Machine learning (ML) refers to the process by which computers learn patterns from data without the need for explicit programming [47]. Due to the precision and convergence advantages of supervised learning, ML has been extensively utilized in environmental science studies [48,49], including predicting the RCF of organic compounds. ML models have been developed to account for various uptake processes [50,51]; for example, one study examined the RCF of a wide range of per- and polyfluoroalkyl substances (PFAS) and investigated the critical threshold range for PFAS uptake [52]. Additionally, various ML algorithms have been employed to predict RCF across different crops, such as wheat and cabbage, and the accompanying analysis of multiple factors identified molar refractivity and molecular volume as key descriptors for accurate prediction [44]. However, current research on ML models related to the bioavailability of contaminants in plant roots has not sufficiently explored emerging contaminants, particularly aryl OPEs and PBDEs [53,54].
Moreover, most studies use Abraham molecular descriptors to characterize the properties of pollutants. However, Abraham descriptors have limitations, as they require experimental measurements for certain physicochemical properties, which can be challenging and time-consuming [55]. Molecular descriptors derived from ChemDes Version 3.2, which integrates multiple software packages and tools for descriptor calculation, provide a more comprehensive description of chemical structure. These descriptors have been successfully applied to predict various chemical and biological properties, such as mercury ecotoxicity and target protein interactions [52,56,57]. For example, GATSe8 is a Geary autocorrelation descriptor that captures the electronegativity distribution pattern within a molecule at lag 8, offering insights into how molecular structure can influence environmental behavior. chiChain.3, a topological descriptor, is related to molecular connectivity, helping assess the stability and reactivity of a compound’s structure. Meanwhile, xlogP reflects a compound’s lipophilicity, indicating its affinity for lipid environments and serving as a predictor of bioaccumulation potential. Together, these descriptors provide a comprehensive view of a chemical’s behavior in an environmental matrix [57]. Given the complex chemical properties of both traditional and emerging ACs, incorporating a broader range of molecular descriptors is essential for conducting a more thorough analysis of their impact on ML model performance.
In general, ACs, including emerging compounds such as aryl OPEs and PBDEs, exhibit complex chemical structures and soil behaviors, making it difficult to assess their bioavailability and long-term impacts. The ML model allows for rapid prediction of these compounds’ behavior in soil–plant systems, eliminating the need for lengthy and complex laboratory tests. Therefore, this study aims to develop an ML model to predict the RCF of ACs in soil for edible plants. The model incorporates the chemical properties of traditional and emerging ACs (PAHs, PCBs, organochlorine pesticide, PBDEs, aryl OPEs, PAEs), molecular descriptors (e.g., GATSe8, chiChain.3, xlogP), soil–plant properties (SOM content, plant species, exposure time, protein content, lipid content), and cultivation modes. The model can be used to obtain RCF values for ACs without prior knowledge of their soil–plant behavior. Furthermore, a feature importance analysis was conducted to identify the key factors influencing the accumulation of traditional and emerging ACs in plant roots, providing valuable insights into their potential risks to soil health and food safety. The developed ML model facilitates the rapid assessment of ACs’ environmental behavior, which is crucial for managing contamination in agricultural ecosystems.

2. Materials and Methods

2.1. Dataset Collection

The dataset used in this study was compiled from literature indexed in the Web of Science database for the period from 1967 to 2024. The search query was “aromatic pollutants” AND “plants or crops” AND “bioavailability or accumulation or uptake or translation” AND “soil”. The literature data were summarized based on key parameters, including SOM content, plant species, exposure time, protein content, lipid content, and cultivation mode. Cultivation mode was encoded for each record as field cultivation (1) or potted cultivation (0). Where the literature reported only soil organic carbon content, SOM content was calculated as SOM content = soil organic carbon content × 1.724 [58]. Studies lacking information on exposure time were excluded. Tabulated experimental data, including root bioconcentration factors (BCF) or RCF values, were taken directly from the literature. For records where plant protein and lipid contents were not provided, values were supplemented using the US Department of Agriculture Food Ingredient Database (https://fdc.nal.usda.gov/index.html, accessed on 15 February 2024). In cases where specific plant exposure durations were missing, they were estimated based on the average growth period of the crops in the study region. The SMILES strings of the organic compounds were obtained by querying the PubChem database (https://pubchem.ncbi.nlm.nih.gov/, accessed on 15 February 2024). The final dataset consisted of 878 data points, including 55 ACs (PAHs, PCBs, organochlorine pesticide, PBDEs, aryl OPEs, and PAEs) and 17 common edible plants (amaranth, cabbage, cape, carrot, Chinese cabbage, clove, leek, lemon, maize, onions, potato, pumpkin, radish, rice, ryegrass, turnips, and wheat). The numerical distribution of input and output features in the dataset is shown in Figure S1.
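The two preprocessing rules stated above (the organic carbon conversion factor and the 1/0 coding of cultivation mode) can be illustrated with a minimal sketch. The function names and the example record are hypothetical and are not taken from the authors' code in Text S2.

```python
# Conversion factor quoted in the text: SOM content = SOC content * 1.724
SOC_TO_SOM = 1.724


def som_from_soc(soc_percent: float) -> float:
    """Convert soil organic carbon (%) to soil organic matter (%)."""
    return soc_percent * SOC_TO_SOM


def encode_cultivation(mode: str) -> int:
    """Encode cultivation mode as described in the text: field -> 1, potted -> 0."""
    codes = {"field": 1, "potted": 0}
    return codes[mode.strip().lower()]


# Hypothetical literature record that reports only organic carbon
record = {"soc_percent": 2.0, "mode": "Potted"}
som = som_from_soc(record["soc_percent"])
mode_code = encode_cultivation(record["mode"])
print(f"SOM = {som:.3f} %, cultivation mode code = {mode_code}")
```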

2.2. Selection of Molecular Descriptors

The studied ACs in the dataset were processed through the ChemDes platform, which generated 3679 molecular descriptors. Because many of these descriptors are mutually correlated, multicollinearity was addressed as part of feature engineering. To reduce the risk of overfitting, Spearman correlation coefficients were used to assess the relationships between descriptors and logRCF. Data filtering was performed using the Pandas library, eliminating descriptors that were not significantly correlated with logRCF (p > 0.05) or were only weakly correlated (|R| ≤ 0.1). Where two descriptors were highly correlated with each other (|R| > 0.8), the one with the weaker correlation to logRCF was removed and the one with the stronger correlation was retained [52].
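The filtering logic described above can be sketched in plain Python. This is an illustrative reading of the procedure, not the authors' Pandas implementation: the p-value filter is omitted for brevity, and the descriptor names and values are invented toy data.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def select_descriptors(descriptors, target, weak=0.1, redundant=0.8):
    """descriptors: dict of name -> values; target: list of logRCF values."""
    rho = {name: spearman(vals, target) for name, vals in descriptors.items()}
    # Step 1: drop descriptors only weakly correlated with logRCF
    kept = [name for name in descriptors if abs(rho[name]) > weak]
    # Step 2: visit the strongest first and skip any descriptor that is
    # highly correlated with one that has already been selected
    kept.sort(key=lambda name: abs(rho[name]), reverse=True)
    selected = []
    for name in kept:
        if all(abs(spearman(descriptors[name], descriptors[m])) <= redundant
               for m in selected):
            selected.append(name)
    return selected


# Toy data (hypothetical descriptor names and values)
log_rcf = [0.1, 0.4, 0.35, 0.8, 0.9, 1.2]
toy = {
    "d_a": [1, 3, 2, 4, 5, 6],      # strongly related to logRCF
    "d_b": [1, 2, 3, 4, 5, 6],      # redundant with d_a
    "d_c": [3, 4, 1, 6, 2, 5],      # moderately related, not redundant
    "d_noise": [2, 6, 3, 5, 1, 4],  # nearly unrelated to logRCF
}
selected = select_descriptors(toy, log_rcf)
print(selected)
```

Visiting descriptors in order of decreasing correlation with logRCF guarantees that, within any highly correlated pair, the member retained is the one more strongly related to the target.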

2.3. Machine Learning Models

The models use a set of input variables comprising the selected molecular descriptors (Text S1), SOM content, plant cultivation mode, exposure time, protein content, and lipid content, with logRCF as the target output (Table S1). Categorical features were converted using one-hot encoding. The data were normalized and randomly divided into training and test sets in an 8:2 ratio. Four models were constructed and compared: gradient boosting regression tree (GBRT), random forest (RF), support vector regression (SVR), and a regularized linear model based on Lasso regression (LR). The hyperparameters of GBRT, RF, and SVR were optimized using Bayesian optimization and five-fold cross-validation [59], which efficiently explores the hyperparameter space and helps the models generalize to unseen data. Boosting and bagging are two classic algorithms in ensemble learning [60]. GBRT is a boosting-based algorithm in which each new tree is fitted to the residuals of the preceding trees, so that the model residuals decrease iteratively [61]. RF, a classic application of the bagging algorithm, combines multiple decision trees to improve accuracy and generalization [62]. Support vector machine (SVM) is a supervised learning model that maps low-dimensional input data into a high-dimensional space for linear separability, with SVR being the application of SVM to regression. For the SVR model, input features were standardized to maintain consistency in dimensions and units [63].
These methods were selected for their complementary strengths. GBRT iteratively reduces residual errors and handles non-linear relationships in the data. RF offers robust performance on high-dimensional datasets by averaging many decision trees through bagging. SVR captures non-linear relationships by mapping data into higher-dimensional spaces. Traditional multiple linear regression models are prone to overfitting when many variables are involved; Lasso regression mitigates this risk through L1 regularization, which shrinks the coefficients of less important variables to zero and thereby also aids feature selection [64]. The model results include the optimized hyperparameters (Table S2), predicted logRCF values (Table S3), and key performance indicators for model evaluation, as described in Section 2.4. The ML models were developed and validated using Jupyter Notebook (version 7.0.8), and the corresponding source code can be accessed in Text S2.
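The comparison of the four model families can be sketched with scikit-learn as follows. This is a hedged illustration on synthetic data, not the authors' code: the choice of scikit-learn is an assumption, default or hand-picked hyperparameters stand in for the Bayesian optimization step, and the feature matrix is randomly generated rather than the RCF dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the feature matrix and logRCF target (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 - 0.3 * X[:, 2]
     + rng.normal(scale=0.1, size=300))

# 8:2 random split into training and test sets, as in the text
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "GBRT": GradientBoostingRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    # SVR and Lasso are scale-sensitive, so inputs are standardized first
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.01)),
}

# Five-fold cross-validated R2 on the training set for each model
cv_r2 = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
    cv_r2[name] = scores.mean()
    print(f"{name}: mean five-fold CV R2 = {cv_r2[name]:.2f}")
```

On this toy target, which contains a quadratic term, the tree ensembles would be expected to outperform the linear model, mirroring the qualitative pattern reported in the study; the numerical scores have no bearing on the published results.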

2.4. Model Validation

The performance of the logRCF prediction models was evaluated using three metrics: the coefficient of determination (R2), the mean absolute error (MAE), and the root mean square error (RMSE). R2 measures the agreement between the observed and predicted logRCF, with values closer to 1 indicating a better fit [44]. MAE is the mean of the absolute differences between the observed and predicted logRCF values [65]. RMSE quantifies the overall discrepancy between the observed and predicted values, calculated as the square root of the mean of the squared differences [66].
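$$R^2 = 1 - \frac{\sum_{i} (\hat{y}_i - y_i)^2}{\sum_{i} (\bar{y} - y_i)^2}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$
where $\hat{y}_i$ represents the i-th predicted logRCF value, $y_i$ represents the logRCF value measured in the i-th experiment, $\bar{y}$ is the mean of the measured logRCF values, and $n$ represents the number of data points.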
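As a small worked example, the three metrics defined above can be computed directly from their definitions. The observed and predicted values below are invented for illustration and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, MAE, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((mean_true - t) ** 2 for t in y_true)          # total sum of squares
    r2 = 1 - ss_res / ss_tot
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


# Hypothetical observed and predicted logRCF values
observed = [0.2, 0.5, 0.9, 1.4]
predicted = [0.3, 0.4, 1.0, 1.2]
r2, mae, rmse = regression_metrics(observed, predicted)
print(f"R2 = {r2:.3f}, MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

Because squaring weights large residuals more heavily, RMSE is never smaller than MAE for the same set of predictions, which provides a quick consistency check on reported values.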

2.5. Model Interpretability

While machine learning models such as GBRT and RF demonstrate strong performance, the lack of transparency in how input parameters are mapped to output results poses challenges for model interpretation [67]. This study therefore applied several interpretation methods, built primarily on perturbation analysis, which involves randomly replacing certain sample feature values with alternative values. By perturbing the input data and examining how the model’s output changes, the predictive behavior and underlying mechanisms of the complex model can be better elucidated. The analysis addresses both global and local perturbations [68]. Global perturbations were analyzed using permutation feature importance (PFI), individual conditional expectation (ICE), and 3D interaction plots, while local perturbations were examined using SHAP analysis. PFI, a model-independent approach, shuffles the values of a feature and measures the resulting change in model performance, thereby identifying key features for prediction [69]. SHAP (Shapley additive explanations) facilitates both global and local interpretations by calculating the marginal contribution of each feature to the model output, providing valuable insights into model behavior [70]. The workflow of the machine learning model development and evaluation is shown in Figure 1.
$$Y_j = \sum_{i=1}^{n} \varphi_j(x_i)$$
where $Y_j$ is the average SHAP value of the j-th feature, $\varphi_j(x_i)$ is the SHAP value of the j-th feature of the i-th sample, and $n$ is the number of samples.
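The permutation feature importance procedure described in this section can be sketched in a model-agnostic way. The sketch below uses NumPy and a hand-written toy function in place of a fitted model, with the increase in mean squared error as the importance score; these are illustrative assumptions and do not reproduce the authors' implementation.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between feature j and the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((predict(X_perm) - y) ** 2) - baseline)
        importances[j] = np.mean(increases)
    return importances


def toy_model(X):
    """Hypothetical stand-in for a fitted model: ignores the third feature."""
    return 2.0 * X[:, 0] + 0.5 * X[:, 1]


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = toy_model(X)
pfi = permutation_importance(toy_model, X, y)
print(pfi)
```

A feature the model does not use receives an importance of zero, while features with larger effects on the prediction produce larger error increases when shuffled.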

3. Results and Discussion

3.1. t-Distributed Stochastic Neighbor Embedding (t-SNE) Plot of the RCF Dataset

Unsupervised learning was employed to explore the dataset [71], with high-dimensional data visualized through dimensionality reduction to analyze the RCF dataset. t-SNE was used to create a feature space with reduced dimensions, where similar samples are represented by nearby points and dissimilar samples by more distant points [72]. As depicted in Figure 2, three distinct clusters were observed: PBDEs (orange part), PAHs (green part), and emerging ACs of aryl OPEs and PAEs (yellow part). These clusters effectively highlight the similarities among data points related to soil, plants, and chemicals, suggesting that t-SNE dimensionality reduction effectively captures key information about ACs entering the roots of crops and plants. Furthermore, the random distribution of clusters indicates that ACs with diverse physical and chemical properties are uniformly distributed within the t-SNE space.

3.2. logRCF Prediction with Machine Learning Models

Compared to the linear model based on Lasso regression (R2 = 0.46, MAE = 0.33, RMSE = 0.24), the ML models GBRT, RF, and SVR demonstrated superior performance (Figure 3). Among these, the GBRT model exhibited the highest performance (R2 = 0.75, MAE = 0.11, RMSE = 0.22), followed by the RF model (R2 = 0.72, MAE = 0.12, RMSE = 0.24) and the SVR model (R2 = 0.63, MAE = 0.16, RMSE = 0.28). GBRT is widely used for predicting the behavior of environmental pollutants, including emission, degradation, and accumulation in organisms. Its operation resembles that of RF, with both being tree-based decision models [73,74,75]. However, unlike RF, GBRT employs a gradient descent method to approximate residuals using the negative gradient of the loss function from the previous model [76], which may explain its superior performance in this study. While SVR is effective with fewer features, it struggled with the complexities posed by the numerous variables in this analysis [77]. Its performance is particularly sensitive to hyperparameter selection, which may have contributed to its weaker results [78]. Based on these findings, the GBRT model was further employed to investigate the root accumulation of ACs. Table S2 presents the optimal hyperparameter values obtained from the hyperparameter searches for the three models.

3.3. Identification of Key Features and Their Influence on logRCF

In the PFI analysis shown in Figure 4, the top 15 important features, including SOM, plant root protein content, root lipid content, exposure time, culture mode, and molecular descriptors, significantly influence the accumulation of ACs in plant roots. Notably, approximately 67% of these features are molecular descriptors, underscoring the pivotal role of molecular structure in plant root absorption. Among them, GATSe8 is a Geary autocorrelation descriptor that captures the electronegativity distribution pattern within a molecule at a specific lag (lag 8). ACs often exhibit delocalized electron systems, which strongly influence internal electronegativity. By weighting Sanderson atomic electronegativities, GATSe8 reveals how these patterns correlate over specific distances within the molecule, providing key insights into the unique interactions characteristic of aromatic structures [79]. The high importance of the fr_bicyclic descriptor indicates that the presence of a double-ring structure significantly influences the absorption of ACs by plant roots in soil. chiChain.3 is a molecular descriptor that quantifies the degree of branching within a molecular chain. As a topological descriptor, it captures the complexity of molecular structures by emphasizing branching patterns [80].
SOM binds to ACs, thereby reducing their bioavailability and limiting their uptake by plant roots. However, under certain conditions, SOM can also enhance the absorption of specific ACs by altering their migration patterns. This dual functionality suggests that the role of SOM in pollutant accumulation is both complex and context-dependent. Protein content and lipid content also play significant roles. Proteins can selectively bind to particular ACs, thereby facilitating their accumulation in roots. In contrast, lipids exhibit strong adsorption capacity for hydrophobic ACs, such as PAHs, making it easier for these compounds to accumulate in lipid-rich root tissues. The importance of molecular descriptors underscores the critical influence of molecular structure on pollutant absorption. Descriptors such as GATSe8, chiChain.3, and fr_bicyclic reflect how the distribution of electronegativity and the complexity of molecular structures affect pollutants’ interactions with plant roots, ultimately influencing their uptake behavior.
Furthermore, the SHAP analysis provides additional insights into the specific contributions of these features to model predictions (Figure 5). The significant contributing features include lipid content, SOM, protein content, exposure time, cultivation mode, bcutp5, GATS8e, and xlogP. The high SHAP values associated with lipid content (represented by the red areas) suggest that roots with higher lipid concentrations are more likely to accumulate ACs, particularly lipophilic ones. SOM exhibits a combination of positive and negative SHAP values: at lower SOM levels, it promotes the migration and uptake of certain ACs, while at higher SOM concentrations, it restricts absorption by reducing ACs’ bioavailability. Similarly, SHAP values for protein content demonstrate both positive and negative effects, indicating that different ACs vary in their affinity for proteins. Some ACs enhance their absorption when bound to proteins, as transport mechanisms like H/phenanthrene co-transporters facilitate the uptake of low-molecular-weight PAHs into plant cells, thereby increasing their accumulation [81]. In contrast, the protein content in plant roots can inhibit certain compounds, such as aryl OPEs, from entering the plant’s vascular system by promoting their binding to the root cell wall, leading to reduced transport within the plant. Lastly, the SHAP values for exposure time indicate that longer exposure periods generally lead to higher accumulation, with mid-stage exposure showing particularly pronounced effects [82]. Each feature plays a multifaceted role in the model of pollutant accumulation in plant roots. These features not only directly impact pollutant absorption and transport but also indirectly regulate these processes by influencing plant and root growth, metabolic activity, and physicochemical environment.
Previous studies on RCF prediction have typically focused on traditional pollutants or specific plant species, which may limit the broad applicability of the model [35,83]. In this study, the developed ML model integrates both traditional and emerging ACs (e.g., aryl OPEs and PBDEs) across a wider range of edible plants, thereby expanding its applicability. Additionally, previous models, such as those used to predict the RCF of PFAS compounds or other persistent organic pollutants (e.g., PAHs and PCBs), have generally relied on a narrower set of molecular descriptors or specific environmental conditions, highlighting regular soil–plant system parameters such as SOM or root lipid content as key predictive features [44,84]. However, the ML model in this study incorporates a more diverse set of molecular descriptors (e.g., GATSe8, chiChain.3, fr_bicyclic, and xlogP), along with soil–plant properties (e.g., SOM content, plant species, and cultivation modes). As a result, more molecular features were identified as key influencing factors, allowing for a more comprehensive analysis of ACs’ behaviors in agricultural systems.
The ICE curve reveals the heterogeneous relationship between specific features and model predictions, illustrating how changes in these features affect the predictions for different individuals. Figure 6 highlights the average ICE plot for each feature, demonstrating varying degrees of dependency on logRCF values. The contribution and dependence of different variables on model predictions vary. In this model, certain variables, such as SOM and exposure time, exhibit strong dependencies within specific intervals, as evidenced by the more pronounced fluctuations in their curves in Figure 6. This indicates that these variables have a greater impact on the model’s prediction results, especially when SOM content is low and exposure time ranges from 40 to 80 days. SOM content emerges as a critical factor significantly influencing the absorption of ACs by plant roots. The figure indicates that RCF values decline notably as SOM content increases up to 10%, with a sharp drop observed between 4.9% and 5.7% SOM content, followed by stabilization. Some ACs have the ability to adsorb and bind to SOM, thereby affecting their uptake by plant roots [85,86,87]. Moreover, functional groups within SOM can interact with pollutants, influencing their mobility and transformation in soil, which indirectly impacts their uptake by plants [88].
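The construction of ICE curves and their average can be sketched as follows. The toy prediction function, the feature grid, and the data are hypothetical; the sketch only illustrates how one curve per sample is obtained by sweeping a single feature while holding the others fixed.

```python
import numpy as np


def ice_curves(predict, X, feature, grid):
    """Return an array of shape (n_samples, len(grid)), one curve per sample."""
    curves = np.empty((X.shape[0], len(grid)))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # set the feature to the grid value for every sample
        curves[:, k] = predict(X_mod)
    return curves


def toy_model(X):
    """Hypothetical model: prediction falls as feature 0 rises, scaled by feature 1."""
    return -0.1 * X[:, 0] * (1 + X[:, 1])


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))
grid = np.linspace(0, 10, 11)  # e.g. a feature swept from 0 to 10 (arbitrary units)
curves = ice_curves(toy_model, X, feature=0, grid=grid)
average_curve = curves.mean(axis=0)  # the averaged ICE (partial dependence) curve
print(average_curve)
```

The spread of the individual curves around the average indicates how strongly the effect of the swept feature depends on the other features of each sample.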
Lipid content also plays an important role in the accumulation of ACs. An increase of up to 1% in plant root lipid content is positively correlated with logRCF values. Due to the presence of essential lipophilic components such as proteins and lipids within plant cells, some lipophilic ACs like PAHs can easily permeate plant root cells and accumulate within them [89]. Protein content in plants also significantly influences the absorption of ACs by the root system. A notable positive trend in logRCF values is observed as protein content increases within the range of 1.4% to 1.7%. However, this trend reverses around 11%. This shift may be due to varying affinities of different ACs for proteins; emerging ACs often exhibit high protein affinity, resulting in a positive relationship with protein content [90]. In contrast, traditional ACs such as PAHs and PCBs tend to bind more readily with lipids, potentially leading to a decrease in logRCF values as protein content increases [32,91].
The molecular descriptor xlogP is critical for evaluating the lipophilicity of pollutants [92]. As the xlogP value of pollutants increases, their absorption into plant roots generally decreases. This trend is due to higher xlogP values indicating greater solubility in lipid phases and lower solubility in aqueous phases, thereby reducing their bioavailability to plants.
Plant exposure time is another crucial factor and, as illustrated in Figure 6, has dual effects. Initially, the absorption rate of ACs by plant roots tends to be higher during the early stages of exposure, driven by the large concentration gradient of pollutants. However, during long-term exposure, as plant roots grow and expand, a growth dilution effect may lead to decreased absorption rates. Studies have observed that the roots of maize (Zea mays L.) and peanut (Arachis hypogaea Linn.) rapidly absorb and accumulate PCBs and other ACs during the early vegetative growth stage. However, the concentrations of these compounds decline notably as plants progress to reproductive stages, likely due to reduced absorption capacity and growth-related dilution effects [39,93]. In contrast, the cultivation mode of the plants does not exhibit a significant trend in the ICE analysis. Plants cultivated under different conditions demonstrate varying physiological responses: potted plants typically exhibit more consistent physiological traits due to their stable environment [94], whereas outdoor plants must adapt to diverse environmental conditions, potentially affecting their ability to absorb and metabolize pollutants [95].
Among the methods used in this study, the GBRT-based machine learning model is identified as the best approach for accurately predicting logRCF values. The GBRT model excels at capturing complex, non-linear relationships between features and the target variable, offering high predictive performance. The t-SNE method plays a complementary role by providing visual insights into the dataset, allowing the exploration of clustering patterns and relationships that inform model development. Additionally, interpretation techniques such as SHAP and PFI enhance the interpretability of the machine learning models by clarifying how individual features influence predictions.
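To make the idea behind PFI concrete, the following is a minimal sketch of permutation feature importance written with the Python standard library only. It shuffles one feature column at a time and records the mean increase in RMSE relative to the unshuffled baseline. The model `toy_model`, its coefficients, and the synthetic data are hypothetical and stand in for the fitted GBRT model and the RCF dataset.

```python
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in RMSE when each column of X is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(r) for r in shuffled]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    # Hypothetical stand-in with columns [SOM, xlogP, noise]; NOT the study's GBRT.
    som, xlogp, _noise = row
    return 1.0 - 0.10 * som - 0.03 * xlogp


data_rng = random.Random(42)
X = [
    [data_rng.uniform(0, 10), data_rng.uniform(1, 8), data_rng.uniform(0, 1)]
    for _ in range(200)
]
y = [toy_model(row) for row in X]  # noise-free targets for a clean illustration

scores = permutation_importance(toy_model, X, y)  # one score per column
```

In this toy setting the SOM column receives the largest score, xlogP a smaller one, and the unused noise column a score of zero. For a fitted estimator, scikit-learn provides the same procedure as `sklearn.inspection.permutation_importance`.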

4. Conclusions

ACs present in soil can be absorbed by plant roots, posing threats to ecological environments and human health. This study investigates root absorption by considering soil properties, plant characteristics, cultivation methods, and the molecular structures of both traditional and emerging ACs. A GBRT ML model was developed on a dataset of 878 data points covering 55 ACs to predict their RCFs, demonstrating strong predictive performance (R2 = 0.75, MAE = 0.11, RMSE = 0.22). SOM, the protein content of plant roots, and the molecular descriptor xlogP were identified as crucial factors influencing RCF values. Notably, molecular descriptors related to the electronegativity distribution pattern (GATS8e) and double-ring structure (fr_bicyclic) of ACs were identified as key factors affecting RCFs for the first time. This study provides a comprehensive evaluation of AC root absorption in complex soil–plant systems and offers valuable insights into the plant uptake and transport processes of organic pollutants with aryl structures. Future work could enhance the performance of the prediction model by incorporating more advanced algorithms, such as deep learning, which may better capture complex environmental interactions. Additionally, expanding the dataset to include a wider range of molecular features and environmental variables could further improve the model’s predictive accuracy and generalizability.
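For readers who wish to reproduce the reported evaluation metrics, the sketch below gives standard definitions of R2, MAE, and RMSE, together with a simple index splitter for five-fold cross-validation. The helper `k_fold_indices` uses contiguous, unshuffled folds as a simplifying assumption; the splitting strategy actually used in this study may differ. The small `y_true` and `y_pred` lists are made-up numbers, not model output.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs with contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Made-up logRCF values, used only to exercise the metric functions.
y_true = [0.0, 0.5, 1.0, 1.5, 2.0]
y_pred = [0.1, 0.4, 1.0, 1.7, 1.9]
metrics = (r2_score(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))

# Fold sizes for a dataset of 878 points split five ways.
fold_sizes = [len(test_idx) for _, test_idx in k_fold_indices(878, k=5)]
```

In a cross-validated evaluation, the model is refitted on each `train_idx` subset and scored on the matching `test_idx` subset, and the metrics are then averaged or pooled across the five folds.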

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/toxics12100737/s1: Figure S1: Variable distribution of RCF dataset; Table S1: The dataset of absorption behaviors of aromatic contaminants in plant root [53,54,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111]; Table S2: The optimal hyperparameter values for three different model grid searches; Table S3: Predicted logRCF values from different models; Text S1: Selected molecular descriptors; Text S2: The code of machine learning.

Author Contributions

Conceptualization, Y.W. and H.S. (Hongwen Sun); data curation, Y.S.; formal analysis, H.S. (Huatai Song) and Z.G.; investigation, Q.Z. and J.X.; methodology, S.L. and Y.S.; software, S.L. and H.S. (Huatai Song); supervision, H.S. (Hongwen Sun); validation, Y.S. and M.G.; visualization, M.G.; writing—original draft, S.L.; writing—review and editing, Y.W. and H.S. (Hongwen Sun). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Major Scientific and Technological Innovation Project of Shandong Province (2021CXGC011206), the National Natural Science Foundation of China (U21A20291, 42377223, and 42307504), the Tianjin Natural Science Foundation (22JCYBJC00400 and 23JCQNJC01510), the Nankai University Experimental Teaching Reform Project-Student-Led Innovative Open Experiments (23NKSYKF07), the 111 Program, Ministry of Education, China (B17025), and the Fundamental Research Funds for the Central Universities, Nankai University (63231195).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors express their gratitude for the Nankai University Scholarship.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Vanwijnsberghe, S.; Peeters, C.; De Ridder, E.; Dumolin, C.; Wieme, A.D.; Boon, N.; Vandamme, P. Genomic Aromatic Compound Degradation Potential of Novel Paraburkholderia Species: Paraburkholderia domus sp. nov., Paraburkholderia haematera sp. nov. and Paraburkholderia nemoris sp. nov. Int. J. Mol. Sci. 2021, 22, 7003.
  2. Sakshi; Singh, S.K.; Haritash, A.K. Polycyclic aromatic hydrocarbons: Soil pollution and remediation. Int. J. Environ. Sci. Technol. 2019, 16, 6489–6512.
  3. Przybysz, A.; Nersisyan, G.; Gawronski, S.W. Removal of particulate matter and trace elements from ambient air by urban greenery in the winter season. Environ. Sci. Pollut. Res. 2019, 26, 473–482.
  4. Primbs, T.; Piekarz, A.; Wilson, G.; Schmedding, D.; Higginbotham, C.; Field, J.; Simonich, S.M. Influence of Asian and Western United States urban areas and fires on the atmospheric transport of polycyclic aromatic hydrocarbons, polychlorinated biphenyls, and fluorotelomer alcohols in the Western United States. Environ. Sci. Technol. 2008, 42, 6385–6391.
  5. Shaul, N.J.; Dodder, N.G.; Aluwihare, L.I.; Mackintosh, S.A.; Maruya, K.A.; Chivers, S.J.; Danil, K.; Weller, D.W.; Hoh, E. Nontargeted biomonitoring of halogenated organic compounds in two ecotypes of bottlenose dolphins (Tursiops truncatus) from the Southern California Bight. Environ. Sci. Technol. 2015, 49, 1328–1338.
  6. Wang, H.; Adamcakova-Dodd, A.; Flor, S.; Gosse, L.; Klenov, V.E.; Stolwijk, J.M.; Lehmler, H.-J.; Hornbuckle, K.C.; Ludewig, G.; Robertson, L.W.; et al. Comprehensive subchronic inhalation toxicity assessment of an indoor school air mixture of PCBs. Environ. Sci. Technol. 2020, 54, 15976–15985.
  7. Lin, Y. Biodegradation of aromatic pollutants by metalloenzymes: A structural-functional-environmental perspective. Coord. Chem. Rev. 2021, 434, 213774.
  8. You, Q.; Yan, K.; Yuan, Z.; Feng, D.; Wang, H.; Wu, L.; Xu, J. Polycyclic aromatic hydrocarbons (PAHs) pollution and risk assessment of soils at contaminated sites in China over the past two decades. J. Clean. Prod. 2024, 450, 141876.
  9. Li, Y.; Hou, F.; Shi, R.; Li, X.; Lan, J.; Zhao, Z. Contamination status, environmental factor and risk assessment of polychlorinated biphenyls and hexachlorobutadiene in greenhouse and open-field agricultural soils across China. Toxics 2023, 11, 941.
  10. Duran, R.; Cravo-Laureau, C. Role of environmental factors and microorganisms in determining the fate of polycyclic aromatic hydrocarbons in the marine environment. FEMS Microbiol. Rev. 2016, 40, 814–830.
  11. Keswani, C.; Dilnashin, H.; Birla, H.; Roy, P.; Tyagi, R.K.; Singh, D.; Rajput, V.D.; Minkina, T.; Singh, S.P. Global footprints of organochlorine pesticides: A pan-global survey. Environ. Geochem. Health 2022, 44, 149–177.
  12. Li, Z.; Panton, S.; Marshall, L.; Fernandes, A.; Rose, M.; Smith, F.; Holmes, M. Spatial analysis of polybrominated diphenylethers (PBDEs) and polybrominated biphenyls (PBBs) in fish collected from UK and proximate marine waters. Chemosphere 2018, 195, 727–734.
  13. Preston, E.V.; McClean, M.D.; Henn, B.C.; Stapleton, H.M.; Braverman, L.E.; Pearce, E.N.; Makey, C.M.; Webster, T.F. Associations between urinary diphenyl phosphate and thyroid function. Environ. Int. 2017, 101, 158–164.
  14. Lan, Y.; Gao, X.; Xu, H.; Li, M. 20 years of polybrominated diphenyl ethers on toxicity assessments. Water Res. 2024, 249, 121007.
  15. Ohoro, C.R.; Adeniji, A.O.; Okoh, A.I.; Okoh, O.O. Polybrominated diphenyl ethers in the environmental systems: A review. J. Environ. Health Sci. Eng. 2021, 19, 1229–1247.
  16. Fasano, E.; Cirillo, T. Plasticizers and bisphenol as food contaminants: Sources and human risk. Curr. Anal. Chem. 2018, 14, 296–305.
  17. Net, S.; Sempéré, R.; Delmont, A.; Paluselli, A.; Ouddane, B. Occurrence, Fate, Behavior and Ecotoxicological State of Phthalates in Different Environmental Matrices. Environ. Sci. Technol. 2015, 49, 4019–4035.
  18. Chen, W.; Chi, C.; Zhou, C.; Xia, M.; Ronda, C.; Shen, X. Analysis of the influencing factors of PAEs volatilization from typical plastic products. J. Environ. Sci. 2018, 66, 61–70.
  19. He, L.; Gielen, G.; Bolan, N.S.; Zhang, X.; Qin, H.; Huang, H.; Wang, H. Contamination and remediation of phthalic acid esters in agricultural soils in China: A review. Agron. Sustain. Dev. 2015, 35, 519–534.
  20. Lao, J.-Y.; Huang, G.; Wu, R.; Liang, W.; Xu, S.; Luo, Q.; Zhang, K.; Jing, L.; Jin, L.; Ruan, Y.; et al. Aggravating Pollution of Emerging Aryl Organophosphate Esters in Urban Estuarine Sediments of South China. Environ. Sci. Technol. 2024, 58, 13415–13425.
  21. Liu, X.; Dong, Z.; Baccolo, G.; Gao, W.; Li, Q.; Wei, T.; Qin, X. Distribution, composition and risk assessment of PAHs and PCBs in cryospheric watersheds of the eastern Tibetan Plateau. Sci. Total Environ. 2023, 890, 164234.
  22. Li, N.; Li, J.; Zhang, Q.; Gao, S.; Quan, X.; Liu, P.; Xu, C. Effects of endocrine disrupting chemicals in host health: Three-way interactions between environmental exposure, host phenotypic responses, and gut microbiota. Environ. Pollut. 2021, 271, 116387.
  23. Ansari, I.; El-Kady, M.M.; El Din Mahmoud, A.; Arora, C.; Verma, A.; Rajarathinam, R.; Singh, P.; Verma, D.K.; Mittal, J. Persistent pesticides: Accumulation, health risk assessment, management and remediation: An overview. Desalination Water Treat. 2024, 317, 100274.
  24. Wu, Z.; He, C.; Han, W.; Song, J.; Li, H.; Zhang, Y.; Jing, X.; Wu, W. Exposure pathways, levels and toxicity of polybrominated diphenyl ethers in humans: A review. Environ. Res. 2020, 187, 109531.
  25. Zhang, Y.; Yang, Y.; Tao, Y.; Guo, X.; Cui, Y.; Li, Z. Phthalates (PAEs) and reproductive toxicity: Hypothalamic-pituitary-gonadal (HPG) axis aspects. J. Hazard. Mater. 2023, 459, 132182.
  26. Behl, M.; Hsieh, J.-H.; Shafer, T.J.; Mundy, W.R.; Rice, J.R.; Boyd, W.A.; Freedman, J.H.; Hunter, E.S.; Jarema, K.A.; Padilla, S.; et al. Use of alternative assays to identify and prioritize organophosphorus flame retardants for potential developmental and neurotoxicity. Neurotoxicol. Teratol. 2015, 52, 181–193.
  27. He, W.; Ding, J.; Gao, N.; Zhu, L.; Zhu, L.; Feng, J. Elucidating the toxicity mechanisms of organophosphate esters by adverse outcome pathway network. Arch. Toxicol. 2024, 98, 233–250.
  28. Subramanian, S.; Schnoor, J.L.; Van Aken, B. Effects of Polychlorinated Biphenyls (PCBs) and their hydroxylated metabolites (OH-PCBs) on Arabidopsis thaliana. Environ. Sci. Technol. 2017, 51, 7263–7270.
  29. Zhang, Y.; Ding, J.; Shen, G.; Zhong, J.; Wang, C.; Wei, S.; Chen, C.; Chen, Y.; Lu, Y.; Shen, H.; et al. Dietary and inhalation exposure to polycyclic aromatic hydrocarbons and urinary excretion of monohydroxy metabolites—A controlled case study in Beijing, China. Environ. Pollut. 2014, 184, 515–522.
  30. Yang, C.-Y.; Chang, M.-l.; Wu, S.C.; Shih, Y.-H. Partition uptake of a brominated diphenyl ether by the edible plant root of white radish (Raphanus sativus L.). Environ. Pollut. 2017, 223, 178–184.
  31. Liu, Q.; Wang, X.; Yang, R.; Yang, L.; Sun, B.; Zhu, L. Uptake kinetics, accumulation, and long-distance transport of organophosphate esters in plants: Impacts of chemical and plant properties. Environ. Sci. Technol. 2019, 53, 4940–4947.
  32. Sverdrup, L.E.; Nielsen, T.; Krogh, P.H. Soil ecotoxicity of polycyclic aromatic hydrocarbons in relation to soil sorption, lipophilicity, and water solubility. Environ. Sci. Technol. 2002, 36, 2429–2435.
  33. Liu, P.-Y.; Du, G.-D.; Zhao, Y.-X.; Mu, Y.-S.; Zhang, A.-Q.; Qin, Z.-F.; Zhang, X.-Y.; Yan, S.-S.; Li, Y.; Wei, R.-G.; et al. Bioaccumulation, maternal transfer and elimination of polybrominated diphenyl ethers in wild frogs. Chemosphere 2011, 84, 972–978.
  34. Wang, Y.; Zhang, Z.; Xu, Y.; Rodgers, T.F.M.; Ablimit, M.; Li, J.; Tan, F. Identifying the contributions of root and foliage gaseous/particle uptakes to indoor plants for phthalates, OPFRs and PAHs. Sci. Total Environ. 2023, 883, 163644.
  35. Namiki, S.; Otani, T.; Motoki, Y.; Seike, N.; Iwafune, T. Differential uptake and translocation of organic chemicals by several plant species from soil. J. Pestic. Sci. 2018, 43, 96–107.
  36. McLaughlin, M.; Smolders, E.; Merckx, R. Soil-root interface: Physicochemical processes. Soil Chem. Ecosyst. Health 1998, 52, 233–277.
  37. Li, Y.; Chiou, C.T.; Li, H.; Schnoor, J.L. Improved prediction of the bioconcentration factors of organic contaminants from soils into plant/crop roots by related physicochemical parameters. Environ. Int. 2019, 126, 46–53.
  38. Dobslaw, D.; Woiski, C.; Kiel, M.; Kuch, B.; Breuer, J. Plant uptake, translocation and metabolism of PBDEs in plants of food and feed industry: A review. Rev. Environ. Sci. Bio/Technol. 2021, 20, 75–142.
  39. Terzaghi, E.; Raspa, G.; Zanardini, E.; Morosini, C.; Anelli, S.; Armiraglio, S.; Di Guardo, A. Life cycle exposure of plants considerably affects root uptake of PCBs: Role of growth strategies and dissolved/particulate organic carbon variability. J. Hazard. Mater. 2022, 421, 126826.
  40. Strawn, D.G. Sorption Mechanisms of Chemicals in Soils. Soil Syst. 2021, 5, 13.
  41. Sun, J.; Wu, X.; Gan, J. Uptake and Metabolism of Phthalate Esters by Edible Plants. Environ. Sci. Technol. 2015, 49, 8471–8478.
  42. Wan, W.; Huang, H.; Lv, J.; Han, R.; Zhang, S. Uptake, Translocation, and Biotransformation of Organophosphorus Esters in Wheat (Triticum aestivum L.). Environ. Sci. Technol. 2017, 51, 13649–13658.
  43. Balliu, A.; Zheng, Y.; Sallaku, G.; Fernández, J.A.; Gruda, N.S.; Tuzel, Y. Environmental and Cultivation Factors Affect the Morphology, Architecture and Performance of Root Systems in Soilless Grown Plants. Horticulturae 2021, 7, 243.
  44. Gao, F.; Shen, Y.; Brett Sallach, J.; Li, H.; Zhang, W.; Li, Y.; Liu, C. Predicting crop root concentration factors of organic contaminants with machine learning models. J. Hazard. Mater. 2022, 424, 127437.
  45. Martin, I.; Collins, C.; Fryer, M. Evaluation of models for predicting plant uptake of chemicals from soil. Sci. Rep. 2006, SC050021/SR.
  46. Yang, C.-Y.; Wu, S.C.; Lee, C.-C.; Shih, Y.-h. Translocation of polybrominated diphenyl ethers from field-contaminated soils to an edible plant. J. Hazard. Mater. 2018, 351, 215–223.
  47. Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260.
  48. Park, S.; Chu, L.C.; Fishman, E.K.; Yuille, A.L.; Vogelstein, B.; Kinzler, K.W.; Horton, K.M.; Hruban, R.H.; Zinreich, E.S.; Fouladi, D.F.; et al. Annotated normal CT data of the abdomen for deep learning: Challenges and strategies for implementation. Diagn. Interv. Imaging 2020, 101, 35–44.
  49. Liu, X.; Lu, D.; Zhang, A.; Liu, Q.; Jiang, G. Data-driven machine learning in environmental pollution: Gains and problems. Environ. Sci. Technol. 2022, 56, 2124–2133.
  50. Feng, X.; Pan, L.; Xu, T.; Jing, J.; Zhang, H. Dynamic modeling of famoxadone and oxathiapiprolin residue on cucumber and Chinese cabbage based on tomato and lettuce archetypes. J. Hazard. Mater. 2019, 375, 70–77.
  51. Trapp, S.; Matthies, M. Generic one-compartment model for uptake of organic chemicals by foliar vegetation. Environ. Sci. Technol. 1995, 29, 2333–2338.
  52. Xiang, L.; Qiu, J.; Chen, Q.-Q.; Yu, P.-F.; Liu, B.-L.; Zhao, H.-M.; Li, Y.-W.; Feng, N.-X.; Cai, Q.-Y.; Mo, C.-H.; et al. Development, evaluation, and application of machine learning models for accurate prediction of root uptake of per- and polyfluoroalkyl substances. Environ. Sci. Technol. 2023, 57, 18317–18328.
  53. Wang, Y.; Li, J.; Xu, Y.; Rodgers, T.F.M.; Bao, M.; Tan, F. Uptake, translocation, bioaccumulation, and bioavailability of organophosphate esters in rice paddy and maize fields. J. Hazard. Mater. 2023, 446, 130640.
  54. Long, S.; Hamilton, P.B.; Fu, B.; Xu, J.; Han, L.; Suo, X.; Lai, Y.; Shen, G.; Xu, F.; Li, B. Bioaccumulation and emission of organophosphate esters in plants affecting the atmosphere’s phosphorus cycle. Environ. Pollut. 2023, 318, 120803.
  55. Zissimos, A.M.; Abraham, M.H.; Klamt, A.; Eckert, F.; Wood, J. A comparison between the two general sets of linear free energy descriptors of Abraham and Klamt. J. Chem. Inf. Comput. Sci. 2002, 42, 1320–1331.
  56. von Hellfeld, R.; Gade, C.; Vargesson, N.; Hastings, A. Considerations for future quantitative structure-activity relationship (QSAR) modelling for heavy metals—A case study of mercury. Toxicology 2023, 499, 153661.
  57. Dong, J.; Cao, D.-S.; Miao, H.-Y.; Liu, S.; Deng, B.-C.; Yun, Y.-H.; Wang, N.-N.; Lu, A.-P.; Zeng, W.-B.; Chen, A.F. ChemDes: An integrated web-based platform for molecular descriptor and fingerprint computation. J. Cheminform. 2015, 7, 60.
  58. Hu, Y.; Dou, X.; Li, J.; Li, F. Impervious surfaces alter soil bacterial communities in urban areas: A case study in Beijing, China. Front. Microbiol. 2018, 9, 226.
  59. Shahhosseini, M.; Hu, G.; Pham, H. Optimizing ensemble weights and hyperparameters of machine learning models for regression problems. Mach. Learn. Appl. 2022, 7, 100251.
  60. Wang, S.; Li, D.; Petrick, N.; Sahiner, B.; Linguraru, M.G.; Summers, R.M. Optimizing area under the ROC curve using semi-supervised learning. Pattern Recognit. 2015, 48, 276–287.
  61. Alhakeem, Z.M.; Jebur, Y.M.; Henedy, S.N.; Imran, H.; Bernardo, L.F.A.; Hussein, H.M. Prediction of ecofriendly concrete compressive strength using gradient boosting regression tree combined with GridSearchCV hyperparameter-optimization techniques. Materials 2022, 15, 7432.
  62. Kulkarni, V.; Sinha, P. Random forest classifiers: A survey and future research directions. Int. J. Adv. Comput. 2013, 36, 1144–1153.
  63. Zhang, F.; O’Donnell, L.J. Support vector regression. In Machine Learning; Academic Press: Cambridge, MA, USA, 2020; pp. 123–140.
  64. Wang, S.; Chen, Y.; Cui, Z.; Lin, L.; Zong, Y. Diabetes risk analysis based on machine learning LASSO regression model. J. Theory Pract. Eng. Sci. 2024, 4, 58–64.
  65. Zheng, H.L.; An, S.Y.; Qiao, B.J.; Guan, P.; Huang, D.S.; Wu, W. A data-driven interpretable ensemble framework based on tree models for forecasting the occurrence of COVID-19 in the USA. Environ. Sci. Pollut. Res. 2023, 30, 13648–13659.
  66. Otoom, M.; Otoum, N.; Alzubaidi, M.A.; Etoom, Y.; Banihani, R. An IoT-based framework for early identification and monitoring of COVID-19 cases. Biomed. Signal Process. 2020, 62, 102149.
  67. Allen, A.E.A.; Tkatchenko, A. Machine learning of material properties: Predictive and interpretable multilinear models. Sci. Adv. 2022, 8, eabm7185.
  68. Robnik-Šikonja, M.; Bohanec, M. Perturbation-based explanations of prediction models. In Human and Machine Learning: Visible, Explainable, Trustworthy and Transparent; Springer: Berlin/Heidelberg, Germany, 2018; pp. 159–175.
  69. Perez-Lebel, A.; Varoquaux, G.; Le Morvan, M.; Josse, J.; Poline, J.B. Benchmarking missing-values approaches for predictive models on health databases. Gigascience 2022, 11, giac013.
  70. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30 (NIPS 2017); Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2017; Volume 30, pp. 4768–4777.
  71. Orsenigo, C.; Vercellis, C. Kernel ridge regression for out-of-sample mapping in supervised manifold learning. Expert Syst. Appl. 2012, 39, 7757–7762.
  72. Jaffari, Z.H.; Abbas, A.; Kim, C.M.; Shin, J.; Kwak, J.; Son, C.; Lee, Y.-G.; Kim, S.; Chon, K.; Cho, K.H. Transformer-based deep learning models for adsorption capacity prediction of heavy metal ions toward biochar-based adsorbents. J. Hazard. Mater. 2024, 462, 132773.
  73. Pan, Y.; Chen, S.; Qiao, F.; Ukkusuri, S.V.; Tang, K. Estimation of real-driving emissions for buses fueled with liquefied natural gas based on gradient boosted regression trees. Sci. Total Environ. 2019, 660, 741–750.
  74. Esmaeili, A.; Kiadeh, S.P.H.; Pirbazari, A.E.; Saraei, F.E.K.; Pirbazari, A.E.; Derakhshesh, A.; Tabatabai-Yazdi, F.-S. CdS nanocrystallites sensitized ZnO nanosheets for visible light induced sonophotocatalytic/photocatalytic degradation of tetracycline: From experimental results to a generalized model based on machine learning methods. Chemosphere 2023, 332, 138852.
  75. Wang, R.; Cai, X. Biota-sediment accumulation factor models of organic chemicals in benthic invertebrates with gradient boosting regression tree. Asian J. Ecotoxicol. 2023, 18, 22–33.
  76. Elith, J.; Leathwick, J.R.; Hastie, T. A working guide to boosted regression trees. J. Anim. Ecol. 2008, 77, 802–813.
  77. Luo, H.-X.; Dai, S.-P.; Li, M.-F.; Liu, E.-P.; Zheng, Q.; Hu, Y.-Y.; Yi, X.-P. Comparison of machine learning algorithms for mapping mango plantations based on Gaofen-1 imagery. J. Integr. Agric. 2020, 19, 2815–2828.
  78. Radzi, S.F.M.; Karim, M.K.A.; Saripan, M.I.; Rahman, M.A.A.; Isa, I.N.C.; Ibahim, M.J. Hyperparameter tuning and pipeline optimization via grid search method and tree-based AutoML in breast cancer prediction. J. Pers. Med. 2021, 11, 978.
  79. Hassan, G.S.; Georgey, H.H.; Mohammed, E.Z.; George, R.F.; Mahmoud, W.R.; Omar, F.A. Mechanistic selectivity investigation and 2D-QSAR study of some new antiproliferative pyrazoles and pyrazolopyridines as potential CDK2 inhibitors. Eur. J. Med. Chem. 2021, 218, 113389.
  80. Randic, M. Characterization of molecular branching. J. Am. Chem. Soc. 1975, 97, 6609–6615.
  81. Molina, L.; Segura, A. Biochemical and Metabolic Plant Responses toward Polycyclic Aromatic Hydrocarbons and Heavy Metals Present in Atmospheric Pollution. Plants 2021, 10, 2305.
  82. Zhang, Q.; Yao, Y.; Wang, Y.; Zhang, Q.; Cheng, Z.; Li, Y.; Yang, X.; Wang, L.; Sun, H. Plant accumulation and transformation of brominated and organophosphate flame retardants: A review. Environ. Pollut. 2021, 288, 117742.
  83. Hao, H.; Li, P.; Jiao, W.; Ge, D.; Hu, C.; Li, J.; Lv, Y.; Chen, W. Ensemble learning-based applied research on heavy metals prediction in a soil-rice system. Sci. Total Environ. 2023, 898, 165456.
  84. Bagheri, M.; Al-jabery, K.; Wunsch, D.; Burken, J.G. Examining plant uptake and translocation of emerging contaminants using machine learning: Implications to food security. Sci. Total Environ. 2020, 698, 133999.
  85. Gabriele, I.; Race, M.; Papirio, S.; Esposito, G. Phytoremediation of pyrene-contaminated soils: A critical review of the key factors affecting the fate of pyrene. J. Environ. Manag. 2021, 293, 112805.
  86. Terzaghi, E.; Zanardini, E.; Morosini, C.; Raspa, G.; Borin, S.; Mapelli, F.; Vergani, L.; Di Guardo, A. Rhizoremediation half-lives of PCBs: Role of congener composition, organic carbon forms, bioavailability, microbial activity, plant species and soil conditions, on the prediction of fate and persistence in soil. Sci. Total Environ. 2018, 612, 544–560.
  87. Zhao, F.; Ping, H.; Liu, J.; Zhao, T.; Wang, Y.; Cui, G.; Ha, X.; Ma, Z.; Li, C. Occurrence, potential sources, and ecological risks of traditional and novel organophosphate esters in facility agriculture soils: A case study in Beijing, China. Sci. Total Environ. 2024, 923, 171456.
  88. Lipczynska-Kochany, E. Humic substances, their microbial interactions and effects on biological transformations of organic pollutants in water and soil: A review. Chemosphere 2018, 202, 420–437.
  89. Gao, Y.; Zhu, L. Plant uptake, accumulation and translocation of phenanthrene and pyrene in soils. Chemosphere 2004, 55, 1169–1178.
  90. Xie, Z.; Zhang, X.; Xie, Y.; Liu, F.; Sun, B.; Liu, W.; Wu, J.; Wu, Y. Bioaccumulation and potential endocrine disruption risk of legacy and emerging organophosphate esters in cetaceans from the northern South China Sea. Environ. Sci. Technol. 2024, 58, 4368–4380.
  91. Matthews, H.; Dedrick, R. Pharmacokinetics of PCBs. Annu. Rev. Pharmacol. Toxicol. 1984, 24, 85–103.
  92. Slawik, T.; Skibinski, R.; Paw, B.; Dzialo, G. Reversed-phase TLC study of the lipophilicity of some 3-hydroxy-1,2-benzisoxazoles substituted in the benzene ring. Acta Chromatogr. 2009, 21, 251–258.
  93. Fan, Y.; Chen, S.-J.; Li, Q.-Q.; Zeng, Y.; Yan, X.; Mai, B.-X. Uptake of halogenated organic compounds (HOCs) into peanut and corn during the whole life cycle grown in an agricultural field. Environ. Pollut. 2020, 263, 114400.
  94. Poorter, H.; Bühler, J.; van Dusschoten, D.; Climent, J.; Postma, J.A. Pot size matters: A meta-analysis of the effects of rooting volume on plant growth. Funct. Plant Biol. 2012, 39, 839–850.
  95. Zhu, J. Abiotic stress signaling and responses in plants. Cell 2016, 167, 313–324.
  96. Gao, Y.; Zhu, L.; Ling, W. Application of the partition-limited model for plant uptake of organic chemicals from soil and water. Sci. Total Environ. 2005, 336, 171–182.
  97. Tao, Y.; Zhang, S.; Zhu, Y.-g.; Christie, P. Uptake and acropetal translocation of polycyclic aromatic hydrocarbons by wheat (Triticum aestivum L.) grown in field-contaminated soil. Environ. Sci. Technol. 2009, 43, 3556–3560.
  98. Kipopoulou, A.M.; Manoli, E.; Samara, C. Bioconcentration of polycyclic aromatic hydrocarbons in vegetables grown in an industrial area. Environ. Pollut. 1999, 106, 369–380.
  99. Huang, H.; Zhang, S.; Christie, P. Plant uptake and dissipation of PBDEs in the soils of electronic waste recycling sites. Environ. Pollut. 2011, 159, 238–243.
  100. Harris, C.R.; Sans, W.W. Absorption of organochlorine insecticide residues from agricultural soils by root crops. J. Agric. Food Chem. 1967, 15, 861–863.
  101. Beestman, G.B.; Keeney, D.R.; Chesters, G. Dieldrin uptake by corn as affected by soil properties. Agron. J. 1969, 61, 247–250.
  102. Gonzalez, M.; Miglioranza, K.S.B.; de Moreno, J.E.A.; Moreno, V.J. Organochlorine pesticide residues in leek (Allium porrum) crops grown on untreated soils from an agricultural environment. J. Agric. Food Chem. 2003, 51, 5024–5029.
  103. Mikes, O.; Cupr, P.; Trapp, S.; Klanova, J. Uptake of polychlorinated biphenyls and organochlorine pesticides from soil and air into radishes (Raphanus sativus). Environ. Pollut. 2009, 157, 488–496.
  104. Zhao, H.-M.; Du, H.; Xiang, L.; Chen, Y.-L.; Lu, L.-A.; Li, Y.-W.; Li, H.; Cai, Q.-Y.; Mo, C.-H. Variations in phthalate ester (PAE) accumulation and their formation mechanism in Chinese flowering cabbage (Brassica parachinensis L.) cultivars grown on PAE-contaminated soils. Environ. Pollut. 2015, 206, 95–103.
  105. Wang, S.; Liu, Y.; Zhang, F.; Jin, K.; Liu, H.; Zhai, L. Methane emissions sources and impact mechanisms altered by the shift from rice-wheat to rice-crayfish rotation. J. Clean. Prod. 2024, 434, 139968.
  106. Sasaki, R. Characteristics and seedling establishment of rice nursling seedlings. JARQ 2004, 38, 7–13.
  107. Pei, Z.; Fan, Y.; Wu, B. Drought Monitoring of Spring Maize in the Songnen Plain Using Multi-Source Remote Sensing Data. Atmosphere 2023, 14, 1614.
  108. Zhang, F.; Duan, J.; Jiang, Z.; Jin, Y.; Zhang, Y. Drought characteristics of spring maize during the whole growth period in Songnen Plain. Chinese J. Ecol. 2024, 1, 13.
  109. Wang, D.; Xi, Y.; Shi, X.-Y.; Zhong, Y.-J.; Guo, C.-L.; Han, Y.-N.; Li, F.-M. Effect of plastic film mulching and film residues on phthalate esters concentrations in soil and plants, and its risk assessment. Environ. Pollut. 2021, 286, 117546.
  110. Zeng, L.-J.; Huang, Y.-H.; Lu, H.; Geng, J.; Zhao, H.-M.; Xiang, L.; Li, H.; Li, Y.-W.; Mo, C.-H.; Cai, Q.-Y.; et al. Uptake pathways of phthalates (PAEs) into Chinese flowering cabbage grown in plastic greenhouses and lowering PAE accumulation by spraying PAE-degrading bacterial strain. Sci. Total Environ. 2022, 815, 152854.
  111. Grava, J.; Raisanen, K.A. Growth and nutrient accumulation and distribution in wild rice. Agron. J. 1978, 70, 1077–1081.
Figure 1. Workflow of model construction and evaluation.
Figure 2. t-SNE visualization of RCF incorporating chemical, soil, and plant characteristics.
Figure 3. Performance of GBRT, RF, SVR, and LR models for predicting RCFs of traditional and emerging ACs.
Figure 4. Permutation feature importance of the GBRT model for predicting RCFs of traditional and emerging ACs.
Figure 5. SHAP analysis of the GBRT model for predicting RCFs of traditional and emerging ACs.
Figure 6. ICE analysis of key features of SOM, lipid content, protein content, xlogP, exposure time, and cultivation mode in the GBRT model for predicting RCFs of traditional and emerging ACs.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, S.; Shen, Y.; Gao, M.; Song, H.; Ge, Z.; Zhang, Q.; Xu, J.; Wang, Y.; Sun, H. Machine Learning Models for Predicting Bioavailability of Traditional and Emerging Aromatic Contaminants in Plant Roots. Toxics 2024, 12, 737. https://doi.org/10.3390/toxics12100737
