Article

Understanding Polymers Through Transfer Learning and Explainable AI

by Luis A. Miccio 1,2,3
1 Donostia International Physics Center, P. M. de Lardizábal 4, 20018 San Sebastián, Spain
2 Institute of Materials Science and Technology (INTEMA), National Research Council (CONICET), Colón 10850, Mar del Plata 7600, Argentina
3 Departamento Polímeros y Materiales Avanzados: Física, Química y Tecnología, University of the Basque Country (UPV/EHU), P. Manuel Lardizábal 3, 20018 San Sebastián, Spain
Appl. Sci. 2024, 14(22), 10413; https://doi.org/10.3390/app142210413
Submission received: 23 September 2024 / Revised: 7 November 2024 / Accepted: 8 November 2024 / Published: 12 November 2024
(This article belongs to the Special Issue Applications of Machine Learning with White-Boxing)

Abstract

In this work, we study the use of artificial intelligence (AI) models, focusing in particular on transfer learning and interpretability, to predict polymer properties. Given the challenges imposed by data scarcity in polymer science, transfer learning offers a promising solution by reusing the features learnt by models pre-trained on other datasets. We conducted a comparative analysis of direct modelling and transfer learning-based approaches using a dataset of polyacrylate glass transition temperatures as a proof-of-concept study. The AI models used tokenized SMILES strings to represent polymer structures, with convolutional neural networks processing these representations to predict the glass transition temperature (Tg). To enhance model interpretability, Shapley value analysis was employed to assess the contribution of specific chemical groups to the predictions. The results indicate that while transfer learning provides robust predictive capabilities, direct modelling on polymer-specific data offers superior performance, particularly in capturing the complex interactions influencing Tg. This work highlights the importance of model interpretability and the limitations of applying molecular-level models to polymer systems.

1. Introduction

In the rapidly advancing field of materials science, the integration of artificial intelligence (AI) has brought unprecedented opportunities for predicting material properties with remarkable accuracy. However, as AI models become more complex, their decision-making processes often resemble “black boxes”, where the relationship between input features and corresponding predictions remains obscure to users [1,2]. This opacity has raised significant concerns regarding the interpretability and trustworthiness of AI-driven predictions. Consequently, the demand for “white boxing”, i.e., the ability to explain and understand the internal workings of these models, has become a central theme in AI development. Enhanced interpretability not only improves model reliability but also facilitates the extraction of valuable scientific insights, bridging the gap between empirical data and theoretical understanding [3,4,5,6,7]. Several studies have focused on balancing complexity with interpretability [8,9], studying the effect of transparency on scientific discovery [10,11], surveying explainable AI methods [12], modelling polymerization [13], or exploring approaches that enhance trust in AI systems [14,15]. Traditional materials discovery and design methods rely heavily on empirical observations and the intuition of human experts, often requiring time-consuming and costly experimental validation. In this regard, several AI approaches have been employed, introducing new ways of using SMILES strings [16], enhancing the accuracy of predictions [17,18,19], employing dimensionality reduction techniques [20,21], or comparing approaches [22]. Among all these, recent advancements in explainable AI techniques, such as feature importance analysis and visualization tools, appear to be a promising way of demystifying these models, making their predictions more transparent and their results more actionable [23,24].
One of the major challenges in applying AI to materials science, and particularly to polymer science, is data scarcity. High-quality, experimentally validated datasets are often limited, particularly in very specialized or niche domains. This scarcity can hinder the ability of AI models to learn and generalize effectively to new data, leading to inaccurate predictions or even complete failure to capture the underlying physics of a problem. To mitigate this issue, transfer learning has emerged as a powerful tool [25]. By fine-tuning models pre-trained on large, related datasets, transfer learning enables the adaptation of AI models to new tasks using relatively small amounts of data, thus improving prediction accuracy and model robustness. This approach has shown great promise in fields where data collection is difficult, costly, or time-consuming, providing a viable path forward in the face of limited data [26]. For example, the prediction of the glass transition temperature (Tg) of polymers is a complex task that exemplifies the challenges of applying AI to materials science. Tg is a critical thermal property that marks the transition of a polymer from a rigid, glassy state to a more flexible, rubbery state [27,28]. Accurate prediction of Tg is essential for the design and application of polymers in various industries, including packaging, automotive, and electronics. Traditional methods for predicting Tg often rely on empirical models that require extensive experimental data, which can be difficult to obtain for new or complex polymers. AI models offer a more flexible approach by learning the underlying relationships between polymer structure and Tg directly from data [29]. For instance, fully connected [30], convolutional [31], or graph [32] neural networks can efficiently capture structure-property relationships.
In this study, we focus on investigating the limitations of transfer learning when applied to the specific problem of predicting polymer properties. In particular, we utilize models pre-trained on datasets of glass transition temperatures of pure molecules and apply them to the more complex task of predicting the Tg of long polymer chains. While transfer learning offers a pathway to overcome data scarcity, the difference in the fundamental physics of the two problems presents unique challenges that may not be fully addressed by models trained on simpler molecular structures. This work offers new perspectives for polymer science by showing how transfer learning and white-boxing can enhance our understanding of polymer properties, using Tg as a case study. Looking forward, our future work will explore the latent spaces generated by models trained with different molecular fingerprints, incorporating additional material properties. By comparing the latent spaces created from various properties and material classes, we could gain deeper insights into structure-property relationships, potentially leading to the design of new materials with optimised multifunctional performance across diverse applications.

2. Methods

In this section, we describe the origin and characteristics of the two datasets employed in this work for training and testing artificial neural networks capable of predicting the glass transition temperature of polymers, even under conditions of data scarcity. We also detail the data preprocessing steps, as well as the architecture and hyperparameter tuning of two distinct modelling approaches: (a) a model trained directly on polymer-specific data (referred to as “Direct”), and (b) a transfer learning-based model (referred to as “TL-based”), which is fine-tuned from a pre-trained model initially optimized for single-molecule systems.

2.1. Datasets

For this study, we utilized two datasets:
Polyacrylates Dataset: This dataset comprises structural representations of atactic polyacrylates, all of which are above the molecular weight saturation point (above 100,000 MU) [5]. The dataset includes SMILES (Simplified Molecular Input Line Entry System) [33,34] strings encoding the polymer structures, along with their experimentally determined glass transition temperatures (see Table S1). The polyacrylates dataset offers a very homogeneous and representative set of complex structures, ideal for evaluating the capabilities of AI and transfer learning in polymer science. Since the backbone of all samples in the dataset is the same, the main differences arise from the pendant chains, which makes this dataset an ideal probe of the limits of our transfer learning approach.
Molecular Glass Formers Dataset: This dataset consists of structural representations of molecular glass formers [6], also encoded as SMILES strings, accompanied by their corresponding experimental Tg values.
Both datasets are essential to our comparative analysis, enabling the direct training of polymer-specific models as well as the fine-tuning of transfer learning models.

2.2. Data Treatment

To prepare the data for machine learning, several preprocessing steps were performed to convert SMILES strings into a format fully compatible with neural networks.
Canonicalization of SMILES and feature extraction: The RDKit library was employed to convert SMILES strings to their canonical form, ensuring that each chemical structure is represented in a standardized and consistent manner across the dataset [34]. This process facilitated compatibility and uniformity during training. In addition to canonicalizing the SMILES strings, chemical features such as the number of atoms, hydrogen bond donors, and hydrogen bond acceptors were extracted.
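As an illustration, a minimal sketch of this step using RDKit could look as follows; the function and dictionary keys are our own naming, not those of the original code:

```python
from rdkit import Chem
from rdkit.Chem import Lipinski

def canonicalize_and_featurize(smiles: str):
    """Canonical SMILES plus the simple descriptors used in the deviation analysis."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # skip malformed SMILES
    return {
        "canonical_smiles": Chem.MolToSmiles(mol, canonical=True),
        "n_atoms": mol.GetNumAtoms(),                  # heavy atoms by default
        "n_h_donors": Lipinski.NumHDonors(mol),        # hydrogen-bond donors
        "n_h_acceptors": Lipinski.NumHAcceptors(mol),  # hydrogen-bond acceptors
    }

print(canonicalize_and_featurize("C=CC(=O)OC(C)(C)C"))  # tert-butyl acrylate monomer
```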
Creation of dense array inputs: SMILES strings were tokenized [35] at the character level by assigning unique tokens to atomic symbols and bonding relationships within the SMILES notation. Due to the variability in the SMILES string lengths, sequences were padded to a uniform length (we used the maximum length across datasets) by adding zeros at the end. This step ensured that all input data had consistent dimensions when fed into the neural networks.
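A minimal tokenizer sketch consistent with this description is shown below; note that a full implementation would map two-character atomic symbols (e.g., Cl, Br) to single tokens via a small lookup table, which we omit here for brevity:

```python
import numpy as np

def build_vocab(smiles_list):
    """Assign a unique integer token to every character; 0 is reserved for padding."""
    chars = sorted({c for s in smiles_list for c in s})
    return {c: i + 1 for i, c in enumerate(chars)}

def tokenize_and_pad(smiles_list, vocab, max_len):
    """Character-level tokenization followed by zero-padding to a uniform length."""
    arr = np.zeros((len(smiles_list), max_len), dtype=np.int32)
    for row, s in enumerate(smiles_list):
        for col, c in enumerate(s[:max_len]):
            arr[row, col] = vocab[c]
    return arr

smiles = ["C=CC(=O)OC", "C=CC(=O)OC(C)(C)C"]
vocab = build_vocab(smiles)
X = tokenize_and_pad(smiles, vocab, max_len=max(len(s) for s in smiles))
```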

2.3. ANN Architecture

Our neural network architecture was designed to predict the Tg of polymers using tokenized SMILES strings. We employed both direct and TL-based models to study prediction accuracy, particularly in situations where data are scarce. Figure 1 shows a schematic picture of the typical architecture of our direct models, where the shadowed area represents the “black box” in the neural network.
Embedding model: The embedding model in the diagram is composed of an embedding layer and several 1D convolutional layers [36,37]. In the first step, the tokenized SMILES sequences are embedded into a high-dimensional space (dimension equal to 128) by an embedding layer, which captures the contextual relationships between tokens. This embedding layer transforms the integer arrays into dense vector representations, facilitating the learning of complex patterns without the need for sparse matrices [31]. Following embedding, the model employs one-dimensional convolutional layers (Conv1D) to extract spatial hierarchies of features from the embedded sequences. Each Conv1D layer is followed by a Leaky ReLU activation function to handle non-linearities and mitigate vanishing gradients and dead neurons. These layers are particularly well suited to processing sequential data such as SMILES strings, even in comparison with one-hot encoding methods [31].
Fully Connected Layers: The output from the convolutional layers in the embedding model is passed to fully connected (dense) layers, allowing the model to learn specific non-linear relationships. The final fully connected layer’s activations are used as fingerprints, which capture the essential chemical features relevant to the Tg prediction.
Output Layer: The final layer consists of a single neuron with a linear activation function, outputting a continuous value corresponding to the Tg.
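A hedged Keras sketch of such an architecture is given below; the number of convolutional layers, filter counts, kernel sizes, dense widths, and the pooling layer are illustrative placeholders, since only the 128-dimensional embedding is specified above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_direct_model(vocab_size, max_len, dense_units=64):
    inp = layers.Input(shape=(max_len,), dtype="int32")
    # Embed integer tokens into a 128-dimensional space, as stated in the text
    x = layers.Embedding(input_dim=vocab_size + 1, output_dim=128)(inp)
    for filters in (64, 128):                  # stacked Conv1D feature extractors
        x = layers.Conv1D(filters, kernel_size=3, padding="same")(x)
        x = layers.LeakyReLU()(x)              # mitigates vanishing gradients / dead neurons
    x = layers.GlobalMaxPooling1D()(x)         # pooling choice is our assumption
    x = layers.Dense(dense_units)(x)
    x = layers.LeakyReLU()(x)
    fingerprint = layers.Dense(32, name="fingerprint")(x)    # activations used as fingerprint
    out = layers.Dense(1, activation="linear")(fingerprint)  # continuous Tg output
    return models.Model(inp, out)
```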

2.4. Training and Optimization

The models are compiled using the Adam optimizer [38], with parameters such as the learning rate, beta values, and decay configured for optimal performance. Training is conducted over several epochs with an adjustable batch size, and an early stopping callback is implemented to prevent overfitting by monitoring the validation loss (validation mean absolute percentage error, MAPE). To identify the best-performing models, we performed grid training, exploring various combinations of learning rates, numbers of neurons, and batch sizes (from full-batch gradient descent to batches of 10 samples). This systematic approach allowed us to compare different models based on their performance on the Tg prediction task, ensuring robust and reliable outcomes.
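A sketch of this grid training loop is shown below; the grid values, epoch budget, and patience are illustrative, and X_train, y_train, X_val, y_val, vocab_size, and max_len are assumed to come from the preprocessing and architecture sketches above:

```python
import itertools
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

learning_rates = [1e-3, 1e-4]
neuron_counts = [32, 64]
batch_sizes = [len(X_train), 10]  # from full-batch gradient descent down to batches of 10

best = None
for lr, neurons, batch in itertools.product(learning_rates, neuron_counts, batch_sizes):
    model = build_direct_model(vocab_size, max_len, dense_units=neurons)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="mean_absolute_percentage_error")  # validation MAPE is monitored
    history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                        epochs=500, batch_size=batch, verbose=0,
                        callbacks=[EarlyStopping(monitor="val_loss", patience=20,
                                                 restore_best_weights=True)])
    score = min(history.history["val_loss"])
    if best is None or score < best[0]:
        best = (score, lr, neurons, batch)
```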

2.5. Transfer Learning and Comparison

In our transfer learning approach, a model pre-trained on the glass transition of molecular glass formers (i.e., individual molecules) [26] was fine-tuned to predict the Tg of acrylates. This approach uses the learned representations [20] from the pre-trained model (the “monomer fingerprint”), allowing the network to focus on the specific relationships between polymer structure and Tg without needing to relearn basic chemical interpretations. Figure 2 shows a schematic picture of the transfer learning-based models (in this case, the black box is located in the fine-tuning stage).
We assume that the fingerprints generated by the pre-trained model encapsulate significant chemical features and relevant information from the molecular glass formers. These fingerprints, although initially trained on a different task, provide a strong foundation for the Tg prediction model. By transferring this knowledge, we enhance the model’s ability to generalize from limited polymer data, leading to improved prediction accuracy even in the face of data scarcity. Training and optimization of the transfer learning model followed the same procedure as the individual models, ensuring consistency and maximizing predictive accuracy across the Tg prediction task.
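A minimal sketch of this fine-tuning setup is given below, assuming the pre-trained network exposes a layer named "fingerprint" as in the earlier architecture sketch; the file name and head width are hypothetical:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load the model pre-trained on molecular glass formers (file name is hypothetical)
pretrained = tf.keras.models.load_model("molecular_tg_model.keras")
# Reuse everything up to the fingerprint layer as a fixed feature extractor
fingerprint_model = models.Model(pretrained.input,
                                 pretrained.get_layer("fingerprint").output)
fingerprint_model.trainable = False  # freeze the pre-trained "monomer fingerprint"

inp = layers.Input(shape=(max_len,), dtype="int32")
fp = fingerprint_model(inp)
x = layers.Dense(16)(fp)            # small head: roughly a tenth of the direct model's parameters
x = layers.LeakyReLU()(x)
out = layers.Dense(1, activation="linear")(x)
tl_model = models.Model(inp, out)   # trained with the same procedure as the direct models
```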

2.6. Shapley Value Analysis for Model Interpretability

To further understand the contribution of specific chemical groups to the Tg prediction, we employed Shapley value analysis, a game-theoretic approach that provides insights into the importance of each feature in a model’s prediction. Shapley values quantify the contribution of each input feature (in this case, chemical groups represented in the SMILES strings) to the model’s output, allowing us to identify which parts of the polymer structure have the most significant impact on Tg.
The use of Shapley values is particularly advantageous due to its model-agnostic nature. This means that Shapley values can be applied to any machine learning model, regardless of its internal workings, making it a versatile tool for interpretability [39,40,41,42]. By analysing the Shapley values, we can dissect the predictions made by our neural networks, offering a transparent understanding of how different structural components of a polymer contribute to its glass transition temperature. In our approach, Shapley value analysis was applied to the chemical structures defined by SMILES strings, focusing on the marginal contribution of each symbol to the final prediction. This method allowed us to generate a visual representation of the influence of individual chemical features on the predicted Tg, thereby making the model’s decision-making process more transparent. Furthermore, we normalized the Shapley values to facilitate the quantitative comparison of the influence of specific chemical groups across different models. This comparative analysis not only enhances the interpretability of our models but also yields valuable chemical insights that can guide future efforts in material design and optimization.
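As a simplified illustration of this analysis, a model-agnostic sketch using the shap library's KernelExplainer is shown below; the background size and the normalization by the sum of absolute values are our own choices for the sketch, not the paper's exact protocol:

```python
import numpy as np
import shap

# Model-agnostic Shapley estimates over tokenized SMILES positions
background = X_train[:50]  # small background sample keeps KernelExplainer tractable
explainer = shap.KernelExplainer(lambda x: tl_model.predict(x).flatten(), background)
shap_values = explainer.shap_values(X_test[:1])  # one value per SMILES position

vals = np.asarray(shap_values).reshape(-1)
normalized = vals / np.abs(vals).sum()  # normalization for cross-model comparison
for pos, contribution in enumerate(normalized):
    if X_test[0, pos] != 0:             # ignore padding tokens
        print(f"token position {pos}: {contribution:+.3f}")
```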

3. Results and Discussion

3.1. Direct Modelling of the System

The models trained directly on the polyacrylates dataset (Direct) demonstrated excellent performance, even when a substantial portion of the dataset (up to 50%) was allocated as an external test set. Given the relatively small total size of the dataset and the complexity of the features being learned, this result is particularly impressive.
Figure 3 illustrates the “predicted vs. experimental” glass transition temperature plots for the best-performing model (a summary of all models’ results is presented in Figures S1 and S2) under two different data-split scenarios. The neural network effectively captures the relationship between the chemical structure and Tg across the studied temperature range, even when trained with only 50% of the data (only 67 samples remain for training after accounting for a 30% validation set). Particularly noteworthy is the mean relative deviation in this external test set, which remains well below 10% (7.1%), closely aligning with the training and validation sets’ performance (6.3% and 7.5%, respectively).
Beyond these relatively low error metrics, our analysis has two main objectives: first, to identify the substructures most relevant to the network during training and prediction; second, to explore the limitations of transfer learning when utilizing a model that was pre-trained on single molecules. This requires a closer examination of outliers and chemical structures that are not well generalized by the direct model (whether due to underlying physical processes, such as nanophase separations, variations in sample preparation protocols, or simply underrepresented chemical features). To address this, we analysed the correlation of the observed deviations with basic chemical descriptors such as the number of atoms, hydrogen bond acceptors, and hydrogen bond donors in the monomer structure. This analysis provides insights into the model’s precision under specific chemical conditions, and offers a qualitative, rather than quantitative, perspective on the model’s limitations. Figure 4 illustrates the observed absolute (Pred − Real) differences in Tg as functions of these parameters for all samples from the 50% test case (we focus on it for greater statistical significance). In Figure 4a, the effect of the number of atoms is shown, while Figure 4b focuses on the number of H acceptors. The red lines represent the best linear fits, with the red shaded areas indicating the confidence intervals for these fits, and the black dashed lines marking the zero difference. The deviations appear to be evenly distributed across samples with different numbers of atoms, suggesting that the direct model’s performance is not significantly affected by the monomer’s raw length. However, slight deviations can be observed for monomers with fewer or more than four H acceptors, reflecting the model’s difficulty in learning intermolecular forces (dipolar or H bonds) from such a limited set of training samples.
The analysis of samples with no H donors is particularly interesting, as these cannot form hydrogen bonds and therefore exhibit weaker intermolecular forces (only dipolar or van der Waals interactions are possible). As shown in Figure 4c, the observed deviations are approximately normally distributed around zero (μ = −1.5 K and σ = 30.5 K). These findings will be further discussed in the context of the transfer learning-based model.
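A short sketch of this deviation analysis is given below; y_pred, y_true, n_acceptors, and n_donors are assumed NumPy arrays built from the test-set predictions and the descriptors extracted earlier:

```python
import numpy as np
from scipy import stats

errors = y_pred - y_true  # (Pred − Real) Tg differences, in kelvin
# Linear trend of the deviations vs. the number of H-bond acceptors (the "red line" fits)
fit = stats.linregress(n_acceptors, errors)
# Normal statistics for the H-donor-free subset, as in Figure 4c
no_donor = errors[n_donors == 0]
mu, sigma = no_donor.mean(), no_donor.std()
print(f"slope = {fit.slope:.2f} K per acceptor, mu = {mu:.1f} K, sigma = {sigma:.1f} K")
```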

3.2. Modelling of the System via Transfer Learning from a Pre-Trained Model

The TL-based models also demonstrated strong performance. Thanks to the pre-trained model, these models add only a fraction of the trainable parameters of their direct counterparts (approximately one-tenth) without compromising accuracy. We observed that increasing the number of neurons, and consequently the number of internal model parameters, did not improve performance. Figure 5 shows the predicted vs. experimental Tg plots for the best model at each data scarcity scenario (a summary of the grid training of the different models is presented in the Supplementary Materials, see Figures S3 and S4). The results suggest that the neural network successfully captured the relationship between chemical structure and Tg across the studied temperature range, even when trained with only half of the available data. In this case, the mean relative deviation in the external control group was 11.3%, comparable to the corresponding training and validation sets’ performance during the training process (11.6% and 13.3%, respectively).
As in the previous case, we analysed the deviations as a function of the number of atoms and the number of H acceptors. Figure 6a shows the effect of the number of atoms, while Figure 6b depicts the effect of the number of H acceptors. The red lines represent the best linear fits, the shaded areas indicate confidence intervals, and the black dashed lines highlight the zero-error value. The observed deviations are not as evenly distributed across samples with varying numbers of atoms, suggesting that the performance of the TL-based model is slightly influenced by the monomer’s length and/or complexity. Moreover, the model shows a stronger dependency on the number of H acceptors, with a larger spread of values at low numbers of acceptors (see, for example, the range of observed deviations at n = 2 in Figure 6b). This observation aligns with the expectation that a model trained on a fingerprint obtained from molecules could underperform in weak-intermolecular-force scenarios, given that the glass transition in molecules depends strongly on these forces. This hypothesis is further supported by the results for samples with no H donors (and therefore weaker intermolecular forces), which show a significantly larger average deviation than that observed in the direct model, with a deviation mean of −15.5 K (and σ = 43.6 K, see Figure 6c).

3.3. Comparison of Both Approaches

The differences in performance between the direct modelling and transfer learning-based approaches can be attributed to the inherent limitations of molecular-based model fingerprinting. To fully understand these limitations, it is crucial to consider the underlying physics of the glass transition process in linear polymers, particularly polyacrylates. In these polymers, the glass transition can be rationalized as a chain relaxation within a characteristic length, the Kuhn length [43,44]. Within this characteristic region, which is approximately 1.7 nm for PMMA, for example [43], several factors such as segment stiffness (in turn influenced by the chemical groups present), covalent bonding, entanglements with neighbouring chains, and intermolecular forces play critical roles in determining the glass transition temperature [43].
Figure 7a presents a schematic representation of these phenomena, where chemical groups are represented by balls connected by covalent bonds. The dashed circles indicate the characteristic length within which relaxation occurs (in turn illustrated by a red arrow), and the dashed blue lines represent the intermolecular forces at play. The neural network must implicitly learn the contributions of these various factors, which could be summarized as: (a) the chemical structure as encoded by the SMILES strings, (b) the stiffness, intramolecular forces and entanglement effects of the polymer chains, and (c) the intermolecular forces at play. In contrast, the glass transition process in molecular systems is not confined to a single polymer chain but occurs across multiple molecules, forming a sort of physical network (Figure 7b). As a result, chain stiffness and entanglements play a less significant role in these systems [43], and pre-trained models (which are based on molecular data) capture less information about these phenomena. Consequently, these aspects must be learned during the transfer learning process, provided that the fingerprint data allows for it. The puzzle in Figure 7c schematically represents the complete physics of the system (in this case the Tg in atactic polyacrylates above molecular weight saturation), with the missing piece representing the irreducible error (due to factors such as limited data, variations in measurement techniques, and errors in sample preparation) and the missing physical phenomena in the fingerprint (e.g., the above-mentioned polymer chain effects).
Given this interpretation, the superior performance of the direct model can be explained by its ability to directly learn the relevant chemical features from the dataset of polyacrylate monomers. In this particular example, although both models show good performance, this diminishes the relative advantage of using a transfer learning approach while providing an excellent example of the importance of the underlying physics.
While deviation metrics provide a quantitative measure of performance, they offer only a superficial understanding of the models’ ability to generalize across different scenarios. Even comparisons based on training robustness during hyperparameter tuning do not fully explain the differences in the models’ ability to generalize. For instance, a comparison of the predicted versus experimental results (presented in Figures S2 and S4) highlights significant differences between the direct model and the TL-based model. As shown, the relative error density in the TL-based model’s plots is far more dispersed compared to the direct model. This wider spread indicates a higher degree of variability and suggests that the TL-based model struggles to generalize as effectively across different data points. In contrast, the direct model exhibits a more concentrated error distribution, implying less sensitivity to the selected hyperparameters.
To better understand the underlying factors that influence these generalization behaviours, it is important to analyse the contributions of individual input features. We used Shapley value analysis to further test and validate our understanding of the models’ inner workings. Figure 8 compares how specific chemical features are weighted by the direct and TL-based models when predicting Tg, with relative deviations shown above each sample. To understand the limitations of the TL-based model, we selected examples with the poorest performances (i.e., those with the largest deviations). As shown, all samples possess short side chains and mainly nonpolar groups. Notably, most of these examples feature a tert-butyl group that is heterogeneously weighted by the TL-based model, while it is weighted more homogeneously by the direct model, which, incidentally, performs better in these cases. A similar trend is observed across the rest of the structure, with the TL-based model displaying more heterogeneous behaviour compared to the direct model (for example, in the methacrylate or carbonyl groups). These results reinforce our hypothesis that models based on molecular glass formers tend to underestimate intra-chain effects while overestimating intermolecular forces, indicating that they do not generalize efficiently to polymer chains.
Conversely, analysing examples where the TL-based model exhibits low deviations can also shed light on its strengths and limitations. In the example shown in Figure 9, both models slightly overestimate the Tg, predicting values of 317 K (direct) and 314 K (TL-based), compared to the actual value of 310 K. Analysing their Shapley contributions, we observe that in scenarios involving larger side chains (where the side chain structure significantly influences the polymer’s Tg) the TL-based model shows improved performance. Specifically, the linear segment within the side chain (highlighted as A) decreases the predicted glass transition, as expected for a more mobile structure. On the other hand, the stiffer structure containing two phenyl groups (labelled as B) contributes to increasing the predicted Tg. Notably, this contribution to the Tg prediction is positive in both models, though it is twice as large in the direct model (53% of the total contributions, compared to 46% in the TL-based model). These results suggest that the TL-based model may have a better “understanding” of the underlying physics when the side chain’s effect is significant, making the scenario more similar to the molecular glass former case.
Comparable observations can be made for other examples where the TL-based model performs well. Figure 10 illustrates the case of other relatively complex side chains where the TL-based model shows deviations of less than 2%. In the first example (labelled as 1), the phenyl group (A) and the carboxyl-containing segment (B) tend to increase the predicted value, likely due to their stiffness and their influence on dipolar forces among molecules, respectively. In both cases, the contribution is similar, accounting for about 25% of the prediction. In the second example, the linear chain (labelled A) does not significantly increase the predicted Tg, as expected for a relatively mobile linear chain. Most of the increase is attributed to the phenyl groups at the ends (labelled as B).
A comparison of results of the direct and transfer learning models is shown in Table 1. Shapley value analysis provides a valuable complementary perspective on how different chemical features influence Tg, offering insights beyond simple numerical correlation. This level of interpretability is crucial for validating the models’ predictions and understanding the limitations of the transfer learning methods, especially under data scarcity conditions. The analysis not only enhances the models’ interpretability but also offers critical insights into the molecular features that govern Tg. Moving forward, the perspectives provided by this study open the door to explore other critical polymer properties beyond Tg. While experimental validation remains necessary, these insights can serve as a valuable tool for selecting promising candidates for further testing. This would enable the rational design of multifunctional, next-generation materials optimized for advanced industrial applications, including sustainable energy solutions and green technologies.

4. Conclusions

In this study, we explored the effectiveness of direct modelling and transfer learning approaches for predicting the glass transition temperature of polyacrylates. The results demonstrated that both approaches yield accurate predictions; however, direct modelling on polymer-specific data generally outperformed the transfer learning models. The transfer learning models, although less accurate, showed significant potential, especially for more complex systems, where their ability to leverage pre-trained knowledge can be exploited.
Shapley value analysis provided crucial insights into the inner workings of the models, allowing us to understand the influence of different chemical groups on the predicted Tg values. This analysis revealed that, while transfer learning models successfully incorporated knowledge from molecular systems, they struggled to fully account for polymer-specific phenomena such as chain stiffness and entanglements.
Our findings highlight the value of combining AI techniques with explainability tools in materials science, facilitating a deeper understanding of how chemical structures influence polymer properties. While this approach cannot replace experimental validation, it has significant potential to reduce costs by offering valuable insights and optimizing the selection of candidates for testing. Future work will focus on minimizing bias in transfer learning models by incorporating a broader range of pre-trained models to capture more aspects of the underlying physics, thereby enhancing the applicability of AI in polymer science.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app142210413/s1. Table S1: acrylates dataset; Figure S1: grid training of direct models; Figure S2: predicted vs. experimental results of direct models; Figure S3: grid training of TL-based models; Figure S4: predicted vs. experimental results of TL-based models.

Funding

We gratefully acknowledge the financial support from PID2023-146348NB-I00, funded by MCIN/AEI/10.13039/50110001103, and IT1566-22, funded by the Basque Government.

Data Availability Statement

The data that support the findings of this study are available within the article and in the Supporting Information file (SI).

Acknowledgments

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPU used for this research.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Rai, A. Explainable AI: From black box to glass box. J. Acad. Mark. Sci. 2020, 48, 137–141. [Google Scholar] [CrossRef]
  2. Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cogn. Comput. 2024, 16, 45–74. [Google Scholar] [CrossRef]
  3. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef]
  4. Hu, J.; Li, Z.; Lin, J.; Zhang, L. Prediction and Interpretability of Glass Transition Temperature of Homopolymers by Data-Augmented Graph Convolutional Neural Networks. ACS Appl. Mater. Interfaces 2023, 15, 54006–54017. [Google Scholar] [CrossRef]
  5. Miccio, L.A.; Schwartz, G.A. Localizing and quantifying the intra-monomer contributions to the glass transition temperature using artificial neural networks. Polymer 2020, 203, 122786. [Google Scholar] [CrossRef]
  6. Borredon, C.; Miccio, L.A.; Cerveny, S.; Schwartz, G.A. Characterising the glass transition temperature-structure relationship through a recurrent neural network. J. Non-Cryst. Solids X 2023, 18, 100185. [Google Scholar] [CrossRef]
  7. Liu, J.; Wu, Y.; Lin, Z.; Peng, L.; Chu, Q.; Tang, Y.; Zhang, W. Visual analytics of an interpretable prediction model for the glass transition temperature of fluoroelastomers. Mater. Today Commun. 2024, 40, 110155. [Google Scholar] [CrossRef]
  8. Montavon, G.; Samek, W.; Müller, K.R. Methods for interpreting and understanding deep neural networks. Digit. Signal Process. 2018, 73, 1–15. [Google Scholar] [CrossRef]
  9. Oviedo, F.; Ferres, J.L.; Buonassisi, T.; Butler, K.T. Interpretable and explainable machine learning for materials science and chemistry. Acc. Mater. Res. 2022, 3, 597–607. [Google Scholar] [CrossRef]
  10. Roscher, R.; Bohn, B.; Duarte, M.F.; Garcke, J. Explainable machine learning for scientific insights and discoveries. IEEE Access 2020, 8, 42200–42216. [Google Scholar] [CrossRef]
  11. Zhong, X.; Gallagher, B.; Liu, S.; Kailkhura, B.; Hiszpanski, A.; Han, T.Y.-J. Explainable machine learning in materials science. npj Comput. Mater. 2022, 8, 204. [Google Scholar] [CrossRef]
  12. Adadi, A.; Berrada, M. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 2018, 6, 52138–52160. [Google Scholar] [CrossRef]
  13. Fiosina, J.; Sievers, P.; Drache, M.; Beuermann, S. Polymer reaction engineering meets explainable machine learning. Comput. Chem. Eng. 2023, 177, 108356. [Google Scholar] [CrossRef]
  14. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
  15. Vilone, G.; Longo, L. Notions of explainability and evaluation approaches for explainable artificial intelligence. Inf. Fusion 2021, 76, 89–106. [Google Scholar] [CrossRef]
  16. Chen, G.; Tao, L.; Li, Y. Predicting Polymers’ Glass Transition Temperature by a Chemical Language Processing Model. Polymers 2021, 13, 1898. [Google Scholar] [CrossRef]
  17. Nguyen, T.; Bavarian, M. A Machine Learning Framework for Predicting the Glass Transition Temperature of Homopolymers. Ind. Eng. Chem. Res. 2022, 61, 12690–12698. [Google Scholar] [CrossRef]
  18. Cassar, D.R.; de Carvalho, A.C.; Zanotto, E.D. Predicting glass transition temperatures using neural networks. Acta Mater. 2018, 159, 249–256. [Google Scholar] [CrossRef]
  19. Mattioni, B.E.; Jurs, P.C. Prediction of Glass Transition Temperatures from Monomer and Repeat Unit Structure Using Computational Neural Networks. J. Chem. Inf. Comput. Sci. 2002, 42, 232–240. [Google Scholar] [CrossRef]
  20. Miccio, L.A.; Schwartz, G.A. Mapping Chemical Structure-Glass Transition Temperature Relationship through Artificial Intelligence. Macromolecules 2021, 54, 1811–1817. [Google Scholar] [CrossRef]
  21. Mysona, J.A.; Nealey, P.F.; de Pablo, J.J. Machine Learning Models and Dimensionality Reduction for Prediction of Polymer Properties. Macromolecules 2024, 57, 1988–1997. [Google Scholar] [CrossRef]
  22. Hou, F.; Wu, Z.; Hu, Z.; Xiao, Z.; Wang, L.; Zhang, X.; Li, G. Comparison Study on the Prediction of Multiple Molecular Properties by Various Neural Networks. J. Phys. Chem. A 2018, 122, 9128–9134. [Google Scholar] [CrossRef] [PubMed]
  23. Miccio, L.A.; Borredon, C.; Schwartz, G.A. A glimpse inside materials: Polymer structure—Glass transition temperature relationship as observed by a trained artificial intelligence. Comput. Mater. Sci. 2024, 236, 112863. [Google Scholar] [CrossRef]
  24. Uddin, M.J.; Fan, J. Interpretable Machine Learning Framework to Predict the Glass Transition Temperature of Polymers. Polymers 2024, 16, 1049. [Google Scholar] [CrossRef]
  25. Audus, D.J.; De Pablo, J.J. Polymer Informatics: Opportunities and Challenges. ACS Macro Lett. 2017, 6, 1078–1082. [Google Scholar] [CrossRef]
  26. Borredon, C.; Miccio, L.A.; Schwartz, G.A. Transfer learning-driven artificial intelligence model for glass transition temperature estimation of molecular glass formers mixtures. Comput. Mater. Sci. 2024, 238, 112931. [Google Scholar] [CrossRef]
  27. Gibbs, J.H. Nature of the Glass Transition in Polymers. J. Chem. Phys. 2004, 25, 185. [Google Scholar] [CrossRef]
  28. Pugar, J.A.; Childs, C.M.; Huang, C.; Haider, K.W.; Washburn, N.R. Elucidating the Physicochemical Basis of the Glass Transition Temperature in Linear Polyurethane Elastomers with Machine Learning. J. Phys. Chem. B 2020, 124, 9722–9733. [Google Scholar] [CrossRef]
  29. Zhang, Y.; Xu, X. Machine learning glass transition temperature of polymers. Heliyon 2020, 6, e05055. [Google Scholar] [CrossRef]
  30. Joyce, S.; Osguthorpe, D.; Padget, J.; Price, G. Neural network prediction of glass-transition temperatures from monomer structure. J. Chem. Soc. Faraday Trans. 1995, 91, 2491–2496. [Google Scholar] [CrossRef]
  31. Miccio, L.A.; Schwartz, G.A. From chemical structure to quantitative polymer properties prediction through convolutional neural networks. Polymer 2020, 193, 122341. [Google Scholar] [CrossRef]
  32. Volgin, I.V.; Batyr, P.A.; Matseevich, A.V.; Dobrovskiy, A.Y.; Andreeva, M.V.; Nazarychev, V.M.; Goikhman, M.Y.; Vizilter, Y.V.; Askadskii, A.A.; Lyulin, S.V. Machine Learning with Enormous ‘synthetic’ Data Sets: Predicting Glass Transition Temperature of Polyimides Using Graph Convolutional Neural Networks. ACS Omega 2022, 7, 43678–43691. [Google Scholar] [CrossRef] [PubMed]
  33. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36. [Google Scholar] [CrossRef]
  34. O’Boyle, N.M. Towards a Universal SMILES representation—A standard method to generate canonical SMILES based on the InChI. J. Cheminform. 2012, 4, 22. [Google Scholar] [CrossRef]
  35. Ucak, U.V.; Ashyrmamatov, I.; Lee, J. Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization. J. Cheminform. 2023, 15, 55. [Google Scholar] [CrossRef]
  36. Wu, H.; Gu, X. Max-Pooling Dropout for Regularization of Convolutional Neural Networks. arXiv 2015, arXiv:1512.01400. [Google Scholar]
  37. LeCun, Y.; Kavukcuoglu, K.; Farabet, C. Convolutional networks and applications in vision. In Proceedings of the 2010 IEEE International Symposium on Circuits and Systems, Paris, France, 30 May–2 June 2010; pp. 253–256. [Google Scholar] [CrossRef]
  38. Kingma, D.P.; Ba, J.L. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  39. Sundararajan, M.; Najmi, A. The many shapley values for model explanation. In Proceedings of the 37th International Conference on Machine Learning ICML 2020, Online, 13–18 July 2020; PartF168147-12. pp. 9210–9220. [Google Scholar]
  40. Anjum, M.; Khan, K.; Ahmad, W.; Ahmad, A.; Amin, M.N.; Nafees, A. New SHapley Additive ExPlanations (SHAP) Approach to Evaluate the Raw Materials Interactions of Steel-Fiber-Reinforced Concrete. Materials 2022, 15, 6261. [Google Scholar] [CrossRef]
  41. Liu, T.; Barnard, A. Shapley Based Residual Decomposition for Instance Analysis. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
  42. Barnard, A.S.; Fox, B.L. Importance of Structural Features and the Influence of Individual Structures of Graphene Oxide Using Shapley Value Analysis. Chem. Mater. 2023, 35, 8840–8856. [Google Scholar] [CrossRef]
  43. Rubinstein, M.; Colby, R.H. Polymer Physics; Oxford University Press (OUP): Oxford, UK, 2003. [Google Scholar] [CrossRef]
  44. Qin, J. Similarity of Polymer Packing in Melts Is Dictated by N̄. Macromolecules 2024, 57, 1885–1892. [Google Scholar] [CrossRef]
Figure 1. Architecture of the neural network used for predicting the glass transition temperature of polymers. The model includes an embedding layer that converts tokenized SMILES strings into dense vector representations, followed by multiple 1D convolutional layers for feature extraction. The output of these layers is passed through fully connected layers, where the final activations are used as fingerprints. The final layer outputs a continuous value corresponding to the Tg. The shadowed area represents the “black box” within the model, highlighting the complexity and need for interpretability techniques such as Shapley value analysis.
Figure 2. Schematic representation of the transfer learning-based model used for Tg prediction. The model is pre-trained on a molecular glass formers dataset and fine-tuned on a polymer-specific dataset (polyacrylates). The pre-trained model generates fingerprints based on learned molecular features, which are then used as input for the Tg prediction task. The black box indicates the fine-tuning process, emphasizing the transfer of knowledge from molecular systems to more complex polymer structures.
Figure 3. Predicted vs. experimental Tg plots for the best-performing direct model at each data scarcity scenario: (30%) train on 93 samples, validate on 41 samples, and test on 58 samples; (50%) train on 67 samples, validate on 29 samples, and test on 93 samples. Dashed lines are an arbitrary guide indicating the 25 and 50 K difference regions (grey and red, respectively). The black arrows indicate the samples for which the model shows large deviations.
Figure 4. Plots showing the correlation between the observed Tg prediction deviations and molecular descriptors, including the number of atoms (#atoms) (a), the number of hydrogen bond acceptors (#acc) (b), and the observed difference distribution for samples with no H donors (c). The red lines represent the best linear fits, with shaded areas indicating confidence intervals. The deviations are well approximated by a normal distribution, providing insights into the model’s ability to generalize across different polymer structures.
Figure 5. Predicted vs. experimental Tg plots for the best-performing TL-based model at each data scarcity scenario: (30%) train on 93 samples, validate on 41 samples, and test on 58 samples; (50%) train on 67 samples, validate on 29 samples, and test on 93 samples. Dashed lines are an arbitrary guide indicating the 25 and 50 K difference regions (grey and red, respectively). The black arrows indicate the samples for which the model shows large deviations. The results highlight the transfer learning model’s effectiveness, though with slightly higher deviations compared to direct modelling.
Figure 6. Plots showing the correlation between the observed Tg prediction deviations and molecular descriptors, including the number of atoms (#atoms) (a), the number of hydrogen bond acceptors (#acc) (b), and the observed difference distribution for samples with no H donors (c). The red lines represent the best linear fits, with shaded areas indicating confidence intervals. The deviations are well approximated by a normal distribution, providing insights into the model’s ability to generalize across different polymer structures.
Figure 7. (a) A schematic of the glass transition process in polyacrylates, showing the relaxation of polymer chains within a characteristic length (Kuhn length, represented by dashed circles) where several factors, such as segment stiffness, covalent bonds, entanglements, and intermolecular forces, play crucial roles. (b) A schematic of the glass transition process in molecular systems, where the transition occurs across multiple molecules, with less emphasis on chain stiffness and entanglements. (c) The puzzle analogy representing the complete physics of the glass transition process, with the missing piece indicating irreducible error and missing physical phenomena in the molecular-based fingerprint.
Figure 8. Comparison of Shapley value contributions between the direct and transfer learning-based models for samples with the largest deviations. The figure highlights the weighting of specific chemical groups, such as tert-butyl and methacrylate groups, where the transfer learning model shows more heterogeneous behaviour compared to the direct model. The results suggest that the transfer learning model underestimates the influence of short nonpolar chains, leading to poorer generalization to polymer chains.
Figure 9. Shapley value contributions for an example where both the direct and transfer learning models slightly overestimate Tg. The analysis shows how the linear segment within the side chain (A) decreases the predicted Tg, while stiffer structures, such as phenyl groups, increase it (B). The direct model exhibits a better understanding of the impact of side chain structures on Tg, particularly when these structures significantly influence the polymer’s thermal behaviour.
Figure 10. Shapley value contributions for two examples with complex side chains where the transfer learning model shows low deviations. The phenyl group and oxygen-containing segments (B) are shown to increase the predicted Tg due to their stiffness and influence on dipolar forces. The analysis highlights the transfer learning model’s ability to accurately predict Tg in polymers with more complex side chain structures, despite its general limitations.
Table 1. Comparison of results of the direct and transfer learning models.

Aspect | Direct Model | Transfer Learning Model
Training data | Trained directly on the polymer-specific dataset; requires larger polymer-specific datasets. | Pre-trained on a molecular dataset and fine-tuned on the polymer dataset; performs well even with a small dataset due to the transfer of pre-trained knowledge.
Prediction accuracy | Higher accuracy for polymer-specific phenomena. | Comparable accuracy, but lower for certain polymer-specific interactions; shows a biased output for short-side-chain samples.
Generalizability | Limited to polyacrylate (or chemically similar) data; struggles under data scarcity conditions. | Good generalization due to a fingerprint that carries pre-trained knowledge; can still work under data scarcity conditions. When faced with new structures, it must adapt previous knowledge during fine-tuning and, as a result, can show bias towards underrepresented samples under data scarcity conditions.
Handling of chain interactions | Effectively captures polymer chain stiffness, entanglements, and intra-chain effects. | Underestimates effects of weak nonpolar interactions and intra-chain phenomena.
Interpretability | Shapley values indicate consistent contributions. | Shows heterogeneous Shapley contributions for short pendant chains.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
