1. Introduction
The emergence of bioplastics [
1] represents a significant step toward sustainable and environmentally friendly materials [
2]. In the present decade, research driven by sustainable development goals on pollution and fossil fuel depletion has become a promising area. Computational methods, such as machine learning [
3] and molecular dynamics simulations [
4], are reported to shorten the discovery and optimization of bioplastics by more than a decade [
5]. Recent reviews have emphasized the transformative role of AI in accelerating the discovery of sustainable polymers. Tran et al. [
6] highlighted how AI-driven frameworks are enabling the design of functional and recyclable polymers by integrating property prediction, synthesis planning, and sustainability metrics into a unified pipeline. In addition to traditional ML approaches, large language models (LLMs) have recently been adapted to polymer science. Gupta et al. [
7] demonstrated that fine-tuned LLMs can predict key thermal properties of polymers using natural language inputs, offering a new paradigm for polymer informatics that reduces the need for handcrafted descriptors.
The potential of bioplastics to reduce the carbon footprint and foster a circular economy is substantial, despite challenges such as higher costs. The growing demand for sustainable plastics has intensified research in bioplastics, particularly in biodegradable polyesters produced via ring-opening polymerization (ROP) of lactones and macrolactones [
8]. Beyond conventional polyesters, recent work by Deng et al. [
9] introduced a multilevel descriptor framework for high-throughput screening of elastomers. This approach enables the design of polymers with tailored mechanical properties, which is particularly relevant for bioplastics intended for flexible or biomedical applications.
Biodegradable polyesters, including polyhydroxybutyrate (PHB), polylactic acid (PLA) [10], polycaprolactone (PCL), and polyhydroxyalkanoates (PHAs), are emerging as promising materials for environmental sustainability. Their applications span packaging, energy, and biomedical sectors, contributing to sustainability goals [11].
Artificial intelligence (AI), and machine learning (ML) models in particular, offers a powerful approach for navigating complex polymeric structure–property relationships and optimizing polymer design, thereby accelerating bioplastics discovery. In polymer science, the applications of AI span from the polymerization process (synthesis) to property prediction (discovery). ML applications for polymeric materials have been explored by Malashin et al. [
12], who discussed the influence of ML approaches on data analysis, predictive modeling, and polymer design. As such, ML algorithms can predict the properties of untested polymers and guide optimal design based on the existing ones. Several studies have developed ML models trained on large datasets containing experimental data related to parameters such as polymer branching, melting temperature, glass transition temperature, molecular weight, polydispersity, and processing conditions, all of which govern polymer properties. Hence, ML is emerging as a powerful tool for optimizing bioplastic synthesis and predicting polymer properties. ML is transforming bioplastics production by aiding in selecting optimal catalysts and biomass feedstocks using data-driven approaches that minimize trial-and-error experimentation. ML techniques have dynamically advanced the evolving landscape of green polymer discovery for diverse applications in packaging, energy, and healthcare, supporting sustainable development goals.
However, the accelerated discovery of sustainable polymers through AI-based models remains underexplored in the scientific literature. This review addresses gaps in the state-of-the-art literature by including case studies and key findings, and discusses the following major aspects of bio-based polymer informatics:
ML for efficient bioplastic synthesis, focusing on retrosynthesis and catalyst design for the ROP process.
The profound impact of polymer informatics on understanding structure–property relationships in bioplastics, with insights from open-source data repositories.
The role of generative AI, molecular descriptors, and explainable AI (XAI) in polymer research, based on advances in polymer informatics.
AI-driven design of catalysts, property prediction, and generative modeling in polymer research.
This review also outlines the roles of XAI and generative AI in polymers and offers an insight into molecular descriptors, including string representations in AI models and SHAP (SHapley Additive exPlanations) integrated with ML for polymers. It presents an advanced outlook for researchers on the accelerated discovery and green synthesis of sustainable polymers through AI techniques, focusing on innovations that support sustainability and the development of novel green polymers, to meet the demands of the different evolving markets, such as energy and healthcare. By leveraging computational models, researchers can explore complex relationships amongst molecular structures and material properties, enabling more efficient design and optimization of biopolymer-based materials.
It is worth highlighting that this review offers a comprehensive and unprecedented examination of the cross-cutting integration of artificial intelligence throughout the entire bioplastic design lifecycle, spanning property prediction, molecular synthesis, and experimental validation. Unlike prior works focused narrowly on isolated development stages, this article contextualizes the full pipeline of AI-enabled discovery. Notably, it underscores the transformative relevance of generative and inverse design models—such as PolyID, PolyBART, PolyCL, and transformer-based frameworks—that can generate candidate polymer structures directly from target property profiles, marking a paradigm shift in materials science. This review further highlights the pioneering application of language models to polymer chemistry, demonstrating how representations like SMILES, SELFIES, and PSELFIES, in tandem with large language models, can predict thermal and mechanical properties without reliance on handcrafted descriptors. Importantly, the integration of explainable AI techniques, including SHAP, is discussed in depth, emphasizing their role in ensuring interpretability and fostering adoption in industrial and regulatory contexts. The emerging use of AI for catalyst prediction and design in ROP processes—featuring tools such as the Chemical Markdown Language (CMDL) and data-assisted retrosynthesis—receives dedicated attention, reflecting a rapidly advancing but underexplored research frontier. Additionally, this review provides an extensive survey of open repositories like PI1M, PolyInfo, and the Materials Project, clarifying their critical contributions to the reproducibility and scalability of machine learning models. Finally, this article explores innovative applications of AI-designed bioplastics beyond conventional domains, including electronic skins and nanomedicine, further underscoring the expansive impact and novelty of these methodologies.
Figure 1 shows a mind map of the AI tools for bioplastics design reviewed in this article.
This review article is organized as follows: it begins with an overview of the application of AI and ML methodologies in bioplastics design and synthesis and the evolution of polymer informatics. The next section covers ML integrated with SHAP and describes the key findings. Then, a brief explanation is provided on generative models for the inverse design of polymers and the use of molecular representations of polymeric structures in ML models. PolyID, an AI-based model for bioplastics, is then reviewed. The last section focuses on ML in polymerizations and catalyst prediction for ROP to produce bioplastics, a novel and still nascent research area.
2. Evolution of Polymer Informatics in Bioplastics
Polymer informatics, that is, the integration of AI into polymer research, is a rapidly expanding field in industries where sustainable polymers are targeted for specific applications [
5]. The integration of AI-driven computational tools with experimental validation can significantly reduce research costs and time, fostering innovation in polymer science, as reported in
Table 1. However, the application of AI in retrosynthesis, which involves integrating ML models to predict the ease of synthesis of a target polymer and plan its (retro)synthetic steps during the polymerization process, is still in its infancy. Historically, the vision of computer-aided polymer design was outlined by Adams and Murray-Rust [
13], who proposed semantic web technologies for representing polymer structures and reactions. This early work laid the foundation for modern polymer informatics platforms by emphasizing the need for standardized, machine-readable representations.
Boosting methods such as Gradient Boosting, XGBoost, AdaBoost, LightGBM, and CatBoost have emerged as powerful tools in computational polymer science. Their integration is especially relevant for property prediction tasks with nonlinear behavior, where they achieve high accuracy, and their applications range from sustainability to advanced manufacturing and multifunctional materials. An ensemble ML framework that combines boosting models can reduce computational costs while retaining high precision and interpretability, which makes such frameworks applicable to bioplastics design [
14]. To support interoperability and data reuse, recent efforts, such as CRIPT (Community Resource for Innovation in Polymer Technology), have introduced extensible ontologies for polymer data. As described by Jackson et al. [
15], CRIPT enables standardized metadata and provenance tracking, which are essential for reproducible AI-driven polymer research. Park et al. [
16] provided a significant platform for the AI-driven design of catalysts and polymers. In this platform, machine-understandable polymer representations were developed using a data-driven approach. These molecular representations, produced through Natural Language Processing (NLP) techniques [
17], are used to manage the large chemical and physical structural variability, as well as the diverse monomeric units in the chain structure of the polymers at multiple scales.
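The boosting idea mentioned above can be illustrated with a minimal sketch. It is not taken from the cited work or from any of the named libraries: it is a hand-written gradient boosting loop over one-dimensional decision stumps, fitted to a synthetic nonlinear target (y = x²) that stands in for a polymer property. All function names and the data are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on x that best fits the residuals with two constants."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Fit stumps sequentially, each one to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if x <= t else rm) for t, lm, rm in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Synthetic, purely illustrative data: a nonlinear target that a constant cannot fit.
xs = [float(i) for i in range(10)]
ys = [x ** 2 for x in xs]
baseline = (sum(ys) / len(ys), 0.0, [])   # constant model, no stumps
boosted = gradient_boost(xs, ys)
```

Comparing `mse(boosted, xs, ys)` with `mse(baseline, xs, ys)` shows how the sequence of weak learners captures the nonlinearity that a single constant misses. Production libraries add regularization, multi-feature trees, and far more efficient split finding.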
Traditional approaches to material discovery often involve time-consuming experiments and complex trial-and-error methodologies, including catalytic polymerization reactions. A comparison of AI-integrated and traditional approaches in polymer science is given in Table 2. Pre-trained models are applied to the polymer data available in data repositories, making it possible to predict a range of structure–property relationships and the activity of the catalysts [16].
As previously mentioned, bioplastics and, in particular, biodegradable polyesters have gained significant attention due to their wide applicability in industrial domains such as the energy and health sectors, with the added benefit of environmental sustainability. Screening promising polymer candidates with specific characteristics remains a bottleneck, and the integration of AI addresses this gap by enabling data-driven insights, predictive modeling, and accelerated biopolymer screening processes [
18]. Cutting-edge generative AI [
19] and XAI [
20] are now employed for the inverse design of polymers [
21,
22] with desirable and specific properties.
In the case of sustainable applications focusing on bioplastics, very few benchmarking data repositories are available. Recent advances in the open-source data repositories for these materials are explored in this article. Thus, biopolymer informatics paves the way for a sustainable future by enabling the discovery of tailored green polymers and facilitating the easier manufacture of bioplastics that benefits society.
Figure 2 shows an AI-ML pipeline for accelerated discovery of bioplastics.
Applications of Bioplastics
Polymer informatics has rapidly evolved as a multidisciplinary field that integrates AI and ML into polymer science, enabling accelerated discovery, design, and optimization of sustainable materials. In the context of bioplastics, this evolution is particularly impactful due to the urgent need for eco-friendly alternatives to petroleum-based polymers.
Beyond packaging and biomedical applications, bioplastics are now being explored in a wide range of advanced and emerging sectors, including the following:
Healthcare and Biomedical Devices. Biocompatible and bio-mimicable polymers are also used in “electronic skin” sensors that attach directly to human skin and acquire real-time, multi-dimensional data on the physiological state of the human body. Epidermal electronic skins, also known as artificial e-skins, act as an interface with the human skin surface and are widely studied in the domain of “AI in health care” [
23]. Very recently, Yang and coworkers [
24] have described a human-friendly biocompatible skin electronic system based on a natural polymer. This material acts as a skin interface for measuring human electrical signals and chemical secretions in the human body. The bio-mimicable nature of the polymer imparts a fast and controllable separation without any skin damage to achieve multi-dimensional real-time measurement. The researchers developed an ML algorithm based on an artificial neural network that can accurately distinguish various physiological states of the human body and provide an effective solution for all-weather health monitoring, responding to measured multi-modal big data [
16]. This combination of skin electronics with ML will be crucial for the future of personalized smart medicine solutions. In this context, biomaterials and bioplastics have a key role in the design and development of the next generation of nanobots able to interact with and deliver across the blood-brain barrier without toxicity, damage, or malfunction [
25].
Energy Storage and Electronics. Bioplastics are being investigated for use in flexible electronics, supercapacitor membranes, and biodegradable batteries. Their tunable dielectric properties and mechanical flexibility make them suitable for wearable and transient electronics. Polymer informatics aids in screening materials with optimal conductivity and thermal stability [26,27,28].
Automotive and Aerospace. The automotive and aerospace industries are increasingly exploring bioplastics as sustainable alternatives to conventional materials. These sectors demand materials with high thermal resistance, mechanical strength, lightweight properties, and recyclability. AI-driven polymer informatics tools, such as PolyID [
29] and CMDL [
16], have enabled the design of bioplastics tailored for these stringent requirements. For instance, generative models and virtual forward synthesis frameworks [
26,
28] have been used to identify bioplastics with performance parity to engineering plastics, while also ensuring synthetic accessibility and environmental sustainability. In the aerospace sector, where weight reduction is critical, bioplastics designed through multitask deep neural networks [
30] and graph-based representations [
31] have shown promise in replacing petroleum-based composites. Moreover, the integration of XAI and uncertainty quantification frameworks, such as POINT [
32], ensures that the designed materials meet safety and performance standards. These advancements underscore the potential of AI-enhanced bioplastic design in revolutionizing material selection for next-generation vehicles and aircraft.
Agriculture and Controlled Release Systems. Bioplastics are used in mulch films, seed coatings, and slow-release fertilizers. ML models help tailor degradation rates and mechanical strength to specific environmental conditions, improving crop yield and reducing plastic pollution in soil [29,33,34].
Construction and 3D Printing. In the construction sector, bioplastics offer potential as sustainable alternatives for insulation, coatings, and structural composites. Machine learning models have been applied to predict mechanical properties, such as tensile strength and modulus, enabling the selection of bioplastics suitable for load-bearing and thermal insulation applications [
2,
30,
35]. The integration of AI-driven design tools facilitates the development of bioplastics with enhanced durability and environmental resistance, supporting green building initiatives. In additive manufacturing, bioplastics like PLA are widely used for 3D printing of architectural models, customized tools, and biocompatible prosthetics. ML algorithms optimize printability, layer adhesion, and structural integrity, enabling rapid prototyping and sustainable fabrication.
Environmental Applications. Bioplastics are being developed for marine packaging, fishing gear, and sensor housings that degrade safely in aquatic environments. Polymer informatics supports the design of materials with controlled hydrolytic degradation and minimal ecotoxicity. The role of bioplastics in the circular economy is underscored by their potential for recyclability and biodegradability. Generative design frameworks and retrosynthesis planning tools have been developed to create bioplastics that are not only high-performing but also easy to recycle or compost [
4]. Polymer informatics platforms, such as PolyID and CMDL, enable the integration of life cycle assessment metrics into the design process, ensuring that new materials align with circular economy principles.
The integration of these applications in the evolution of polymer informatics underlines the versatility and social relevance of bioplastics. The use of AI not only accelerates material discovery but also ensures that bioplastics meet the complex requirements of diverse industries, reinforcing their role in a circular and sustainable economy.
Figure 3 reports a schematic summary of the different applications of AI-guided bioplastics.
3. AI-ML in Polymer Design
The growing need for new smart and eco-friendly polymers poses a challenge to traditional technologies. As mentioned in the previous sections, biopolymer informatics has emerged as a viable strategy, utilizing ML and deep learning (DL) models, combined with high-throughput computing, to generate predictive models for target properties. These models can then be employed in high-throughput screening processes, accelerating the design and discovery of polymers [
19]. Recent advances in polymer representation have introduced Pseudo-Polymer SELFIES (PSELFIES), a robust encoding scheme that enables generative models to explore polymer design space more effectively. Savit et al. [
36] demonstrated that PSELFIES can be integrated into transformer-based architectures to generate novel polymers with targeted properties.
In particular, sustainable polymer design using an ML-based approach follows three key steps. The first one involves identifying the chemical space, achieved by converting the polymeric structures into computationally readable representations using fingerprinting techniques. Next, ML models are applied to predict the properties and are fine-tuned with experimental data to target their performance [
20]. The final step involves developing a benchmarked ML model trained on a dataset and subsequently evaluating it with experimental data. A notable example of this approach is presented by Kuenneth et al. [
30], who developed a multitask deep neural network framework trained on a large dataset of polymer structures and properties. Their model successfully identified bioplastic candidates with performance comparable to conventional plastics, demonstrating the potential of multitask learning for simultaneous optimization of multiple target properties in sustainable polymer design. To address limitations in data availability, Li et al. [
37] developed a data-augmented ML framework for the inverse design of homopolymers with targeted glass transition temperatures. Their approach integrates synthetic data generation to overcome data scarcity, a common limitation in bioplastic catalyst design.
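The three-step workflow described above (representation, property model, evaluation) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method of any cited study: the fingerprint is a crude character n-gram count rather than a chemical fingerprint, the model is a similarity-weighted nearest-neighbour estimate, the repeat-unit strings are illustrative, and the target is a synthetic quantity computed from the string itself so that no experimental values are implied.

```python
from collections import Counter


def fingerprint(smiles, n=2):
    """Step 1: count character n-grams (a crude stand-in for a chemical fingerprint)."""
    return Counter(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def similarity(fp_a, fp_b):
    """Tanimoto-style similarity between two count fingerprints."""
    shared = sum((fp_a & fp_b).values())   # element-wise minimum
    total = sum((fp_a | fp_b).values())    # element-wise maximum
    return shared / total if total else 0.0


def predict(query, train, k=2):
    """Step 2: similarity-weighted k-nearest-neighbour estimate of the target."""
    fp_q = fingerprint(query)
    scored = sorted(((similarity(fp_q, fingerprint(s)), y) for s, y in train),
                    reverse=True)[:k]
    weight = sum(s for s, _ in scored)
    if weight == 0:
        return sum(y for _, y in scored) / len(scored)
    return sum(s * y for s, y in scored) / weight


def synthetic_target(smiles):
    """Synthetic stand-in for a measured property: fraction of 'O' characters."""
    return smiles.count("O") / len(smiles)


# Illustrative repeat-unit strings with [*] marking the chain ends (assumed notation).
repeat_units = [
    "[*]CC[*]",
    "[*]OCC(=O)[*]",
    "[*]OC(C)C(=O)[*]",
    "[*]OC(C)CC(=O)[*]",
    "[*]OCCCCCC(=O)[*]",
]
data = [(s, synthetic_target(s)) for s in repeat_units]

# Step 3: hold out one structure and measure the error on it.
train, held_out = data[:-1], data[-1:]
mae = sum(abs(predict(s, train) - y) for s, y in held_out) / len(held_out)
```

In a real study the fingerprint would come from a cheminformatics toolkit, the targets from curated experimental data, and the evaluation from cross-validation rather than a single held-out structure.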
ML methods enable machines to simulate human decision-making and optimization processes. An ML algorithm trained on extensive datasets extracts insights from the data to recommend the polymer candidates that best match the design objectives. Although ML has emerged as an effective predictive tool for polymer material design through efficient structure–property modeling in the physico-chemical domain, a data-driven approach for complex bioplastics and their synthesis is still largely unexplored.
The ML framework pipeline for polymer design is shown in Figure 4. This pipeline consists of several stages, including data collection, preprocessing, featurization, model training and validation, and deployment. This integrated approach facilitates polymer discovery, prediction, and optimization, accelerating research and driving green innovation in industrial polymer production.
In a complementary approach, Atasi et al. [
26] combined machine learning with genetic algorithms to design recyclable polymers with targeted properties. Their framework integrates virtual forward synthesis and sustainability scoring, enabling the generation of over a million candidate structures and identifying promising recyclable alternatives to conventional plastics.
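The combination of a property model with a genetic algorithm can be sketched as follows. This is a minimal, hypothetical example and not the framework of the cited study: the building blocks are arbitrary fragments, and `surrogate_score` is a hand-written placeholder for what would in practice be a trained ML property predictor combined with a sustainability score.

```python
import random

# Hypothetical building blocks; a real system would use validated monomers.
BLOCKS = ["CC", "OCC", "C(=O)O", "c1ccccc1", "C(C)O"]


def surrogate_score(candidate):
    """Placeholder for a trained property model plus a sustainability score."""
    text = "".join(candidate)
    ester_links = text.count("C(=O)O")
    aromatic_rings = text.count("c1ccccc1")
    return 2.0 * ester_links + 1.0 * aromatic_rings - 0.05 * len(text)


def mutate(candidate, rng):
    """Replace one randomly chosen block."""
    child = list(candidate)
    child[rng.randrange(len(child))] = rng.choice(BLOCKS)
    return child


def crossover(a, b, rng):
    """Single-point crossover between two parents of equal length."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(generations=30, pop_size=20, length=4, seed=0):
    rng = random.Random(seed)
    population = [[rng.choice(BLOCKS) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=surrogate_score, reverse=True)
        history.append(surrogate_score(population[0]))
        parents = population[: pop_size // 2]   # elitist selection keeps the top half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    population.sort(key=surrogate_score, reverse=True)
    return population[0], history


best, history = evolve()
```

Because the top half of each generation is carried over unchanged, the best score recorded in `history` never decreases. Swapping `surrogate_score` for a learned model is the step that couples the search to ML-based property prediction.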
Generative models based on DL have gained a significant role in polymer research owing to their capacity for “de novo” polymer design. A gap in the literature, however, is the application of these models to the inverse design of polymers. “De novo” polymer design has been recognized as a promising method to accelerate the discovery of novel structural and multifunctional polymers. Yue et al. [18] explored “de novo” polymer design using deep generative models such as the Variational Autoencoder (VAE), Objective-Reinforced Generative Adversarial Networks (ORGAN), the Adversarial Autoencoder (AAE), character-level recurrent neural networks (Char-RNN), GraphINVENT, and REINVENT. These models were trained on real polymers using reinforcement learning methods targeting the generation of hypothetical extreme temperature-resistant polymers. More recently, Zhou et al. [
38] proposed PolyCL, a contrastive learning framework that leverages both explicit and implicit augmentations to learn robust polymer representations. This method enhances generalization in downstream tasks, such as property prediction and inverse design, particularly in low-data regimes where traditional supervised models may struggle. In a complementary study, Vogel and Weber [
39] developed a generative framework for the inverse design of copolymers that explicitly incorporates stoichiometry and chain architecture. This approach enables the generation of chemically valid and synthetically accessible copolymers with tailored properties, addressing a key limitation in many existing generative models. Additionally, Bilodeau et al. [
40] provide a comprehensive review of recent advances and challenges in generative models for molecular discovery, highlighting their relevance for polymer design and the integration of these models into practical discovery pipelines.
Although polymer informatics is an emerging field with significant potential for accelerated discovery and development, and despite its success in molecular design, it faces challenges related to validation. Generative modeling in polymers requires extensive, well-benchmarked data repositories, such as PI1M, PolyDat, and PolyInfo. A key limitation of ML models is that generated structures are often not viable, which has motivated the development of frameworks like Polymer Genome [
41]. An important contribution in this direction is the Open Macromolecular Genome (OMG) introduced by Kim et al. [
42], which provides a large, curated dataset of synthetically accessible polymers compatible with known polymerization reactions. This resource enables generative models to propose realistic polymer candidates while ensuring synthetic feasibility, bridging the gap between virtual design and experimental realization. Predicting a target property in a polymer is typically an interpolative process, involving the computation of molecular fingerprints or “descriptors” followed by correlating these descriptors with machine-readable molecular representations or experimental design variables. A recent contribution in this direction is polyBART, introduced by Savit et al. [
36], which leverages a transformer-based architecture and a novel polymer-specific representation (PSELFIES) to enable both property prediction and generative design of polymers. This model demonstrates the potential of adapting large language models to the polymer domain by treating polymer structures as a chemical language.
Data-driven approaches enable accelerated material discovery through high-throughput ML tools, including for bioplastics. Molecular descriptors used in polymer property prediction rely on static featurization kernels to map chemical spaces. With “end-to-end” learning, the featurization step is instead learned jointly with the prediction task, so that feature extraction and prediction occur simultaneously, which can lead to high accuracy in polymer property prediction. A recent work by Aldeghi and Coley [
43] introduced a graph-based representation of molecular ensembles that captures the stochastic nature of polymer structures. Their approach enables accurate property prediction by modeling polymers as distributions over molecular graphs, rather than as single deterministic structures, which is particularly valuable for capturing the diversity inherent in polymer systems. Moreover, Aleb et al. [
44] recently reported a novel transformer model that maps chemical representations of polymers, namely SMILES strings, to their glass transition temperatures. By relating chemical language to properties, they provided a deeper understanding and exploration of chemical space. Language-to-property transformer models that directly relate monomer structures to properties across various domains are a potentially powerful tool for polymer design. To address the limited data available for such models, Zhang and Yang [
45] proposed a multimodal architecture that combines molecular structure embeddings with language-based representations. Their model, PolyLLMem, improves property prediction accuracy in low-data regimes, making it particularly suitable for bioplastics with limited experimental datasets. Thus, open-source repositories, pretrained models, and polymer structure representations such as SMILES, BigSMILES, SMARTS, and p-SMILES have greatly advanced the use of ML models in polymer science. A notable example is polyBERT, introduced by Kuenneth and Ramprasad [
46], which adapts transformer-based language models to polymer sequences. This model enables ultrafast property prediction by learning directly from polymer SMILES representations, eliminating the need for handcrafted descriptors and accelerating the design process.
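Treating polymer structures as a chemical language starts with splitting a polymer SMILES string into tokens that a language model can consume. The sketch below is a simplified, assumed tokenizer written for illustration only; it is not the tokenizer used by polyBERT or any other cited model, and its pattern covers only a common subset of SMILES syntax, with `[*]` marking the repeat-unit end points.

```python
import re

TOKEN_PATTERN = re.compile(
    r"\[[^\]]+\]"          # bracket atoms and the [*] end-point marker
    r"|Br|Cl"              # two-letter organic-subset atoms
    r"|[BCNOSPFI]"         # one-letter aliphatic atoms
    r"|[bcnosp]"           # aromatic atoms
    r"|%\d{2}|\d"          # ring-closure labels
    r"|[=#\-+\\/().:~@]"   # bonds, branches, and other symbols
)


def tokenize(psmiles):
    """Split a polymer SMILES string into tokens, rejecting unknown characters."""
    tokens = TOKEN_PATTERN.findall(psmiles)
    if "".join(tokens) != psmiles:
        raise ValueError(f"untokenizable characters in {psmiles!r}")
    return tokens


def encode(tokens, vocab):
    """Map tokens to integer ids, growing the vocabulary as new tokens appear."""
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]


vocab = {}
tokens = tokenize("[*]OC(C)C(=O)[*]")   # a PLA-like repeat unit, for illustration
ids = encode(tokens, vocab)
```

Here `tokens` is `['[*]', 'O', 'C', '(', 'C', ')', 'C', '(', '=', 'O', ')', '[*]']` and `ids` is `[0, 1, 2, 3, 2, 4, 2, 3, 5, 1, 4, 0]`. Integer sequences of this kind are what a transformer embeds and processes in place of handcrafted descriptors.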
In the fields of energy, sustainability, and medical applications, researchers are advancing in biopolymer informatics with the advent of generative AI and XAI to design novel materials and analyze their structure–property relationships. The design space for bioplastics is vast, needing high-throughput computational screening methods to develop novel polymers and identify efficient synthesis pathways. Mulrennan et al. [
47] combined in-process temperature and pressure data acquired from sensors and near-infrared (NIR) spectroscopic data with multivariate regression models to predict the mechanical properties of an extruded bioresorbable polymer based on PLA, which was used for medical implants and drug delivery. The fusion of NIR and sensor data (conventional properties) is required for a robust model in predicting processing conditions. XAI helps make ML models more interpretable by providing insights into their decision-making processes and bridging the gap between human understanding and model complexity. XAI in bioplastics design and discovery is used to delineate the relationship between reactants, processing conditions, catalysts, and the end properties. Rizwan et al. [
48] have contributed towards incorporating XAI into ML models designed to investigate biomaterials. The SHAP technique is used within XAI to make ML model predictions more interpretable. The key insights, objectives, and models in polymer informatics are summarized in Table 3.
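The attribution idea behind SHAP can be made concrete with an exact Shapley value computation on a small model. The sketch below enumerates all feature coalitions directly, which is feasible only for a handful of features; the SHAP library relies on approximations and model-specific algorithms instead. The toy model, its feature names, and the input values are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical property model with one interaction term."""
    ester_fraction, chain_length, crystallinity = features
    return 3.0 * ester_fraction + 0.5 * chain_length + 2.0 * ester_fraction * crystallinity


x = [0.4, 6.0, 0.5]          # illustrative input, not measured data
baseline = [0.0, 0.0, 0.0]   # reference point the attributions are measured against
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction at `x` and at the baseline (4.6 here), and the interaction term is split equally between the two features involved, giving roughly 1.4, 3.0, and 0.2. This additivity is what lets SHAP plots decompose a single prediction into per-feature contributions.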
4. PolyID—Artificial Intelligence for Bioplastics
As discussed in the previous section, extensive research and development efforts have been dedicated to creating computational tools capable of performing property modelling and prediction for polymers, ranging from DL models to large language models and transformers. Wilson et al. [
29] introduced PolyID as a domain-specific graph neural network framework designed to predict key polymer properties, such as glass transition temperature, mechanical strength, and biodegradability. The model integrates stereochemical information and has been validated both computationally and experimentally, demonstrating its utility in identifying sustainable alternatives to conventional plastics. PolyID [
29] is the first ML prediction tool to incorporate the stereochemistry of polymers, a crucial factor in bioplastics design. PolyID, based on a message-passing graph neural network model, was developed at the National Renewable Energy Laboratory (NREL) [
49]. The model is reported to predict the mechanical, thermal, barrier, and biodegradability properties of various homopolymers and copolymers with high accuracy. Complementary to PolyID, Gurnani et al. [
50] proposed a scalable multitask graph neural network framework that learns polymer representations directly from repeat units. This scalable approach enables simultaneous prediction of multiple properties and supports high-throughput screening of large polymer libraries. By using monomer structures as input, the model can be integrated with network generation tools to efficiently explore the bioplastics design solution space and identify sustainable alternatives to synthetic polymers. In parallel, Antoniuk et al. [
51] proposed a periodic graph-based representation for polymers, capturing their repeating structural motifs. This approach enables accurate property prediction by modeling the periodicity inherent in polymer chains, complementing PolyID’s stereochemistry-aware framework.
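The message-passing principle behind these graph-based models can be sketched without any learned parameters. The example below is a simplified illustration and not the PolyID architecture or any cited implementation: real message-passing networks use trained message and update functions, whereas here each atom simply adds the mean of its neighbours' features to its own. The extra edge that links the two ends of the repeat unit is an assumed way of mimicking a periodic polymer graph.

```python
def message_passing_round(node_features, edges):
    """One round: each node adds the mean of its neighbours' features to its own."""
    neighbours = {i: [] for i in range(len(node_features))}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = []
    for i, feat in enumerate(node_features):
        if neighbours[i]:
            mean = [
                sum(node_features[j][k] for j in neighbours[i]) / len(neighbours[i])
                for k in range(len(feat))
            ]
        else:
            mean = [0.0] * len(feat)
        updated.append([f + m for f, m in zip(feat, mean)])
    return updated


def readout(node_features):
    """Sum-pool node features into one graph-level vector."""
    return [sum(column) for column in zip(*node_features)]


# Heavy atoms of an -O-CH2-C(=O)- repeat unit with one-hot features [is_C, is_O].
atoms = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # O, C, C, O (carbonyl)
bonds = [(0, 1), (1, 2), (2, 3)]
periodic_bond = [(2, 0)]   # links the chain end back to the start (assumed encoding)

hidden = message_passing_round(atoms, bonds + periodic_bond)
graph_vector = readout(hidden)
```

After one round each atom's vector already reflects its chemical environment, and the pooled `graph_vector` is the kind of learned-from-structure representation that a downstream layer would map to properties such as the glass transition temperature.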
Wilson et al. [
29] developed PolyID, an ML model specifically designed for polymer property prediction. PolyID utilizes an end-to-end learning approach with a multioutput, message-passing neural network, incorporating experimental validation. To ensure the use of relevant training data, an intuitive and interpretable method was developed. This method facilitated the screening of bioplastics generated by transformer-based polymer models, leading to the discovery of five potential replacements for poly(ethylene terephthalate) (PET). Additionally, they explored how quantitative structure–property relationship (QSPR) models can be leveraged using the developed message-passing network. Through experimental validation and end-to-end machine learning methods, Wilson et al. demonstrated that the discovery of novel bioplastics can be significantly accelerated [
29]. In parallel, Ma et al. [
32] introduced the POINT benchmark, which integrates uncertainty quantification, interpretability, and synthesizability into polymer property prediction workflows. This framework complements tools like PolyID by providing standardized evaluation protocols that enhance the robustness and reproducibility of AI-driven polymer informatics.
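Uncertainty quantification of the kind mentioned above is often approached with model ensembles. The sketch below is one simple, assumed variant and not the POINT protocol: it fits a bootstrap ensemble of one-dimensional linear models to synthetic data and reports the spread of their predictions as an uncertainty estimate. The descriptor values, the noise terms, and all function names are illustrative.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs, ys = zip(*points)
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def bootstrap_ensemble(points, n_models=50, seed=0):
    """Fit each ensemble member on a resample of the data drawn with replacement."""
    rng = random.Random(seed)
    return [fit_line([rng.choice(points) for _ in points]) for _ in range(n_models)]


def predict_with_uncertainty(models, x):
    """Return the ensemble mean and the standard deviation across members."""
    preds = [slope * x + intercept for slope, intercept in models]
    return statistics.fmean(preds), statistics.stdev(preds)


# Synthetic data: a linear trend plus fixed, made-up noise terms.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]
points = [(float(x), 2.0 * x + 1.0 + e) for x, e in zip(range(8), noise)]

models = bootstrap_ensemble(points)
mean_near, std_near = predict_with_uncertainty(models, 3.5)    # inside the data range
mean_far, std_far = predict_with_uncertainty(models, 20.0)     # far outside it
```

The spread is small near the centre of the training data and grows under extrapolation, which is the behaviour that makes such estimates useful for flagging candidate polymers that lie outside a model's domain of applicability.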
In bioplastics manufacturing, ML can predict the printability and functionality of materials for clinically relevant applications such as tissue engineering, which underscores the transformative role of AI in 3D-printed bioimplants [
52]. The structures of bioplastics and their representations are complex and diverse, and conventional ML methods such as linear regression were unable to capture the nonlinear variations that characterize polymers. Integrating simulation models with physical and chemical properties now makes it possible to capture this nonlinear behavior. In biochemical processes, labeled data are often scarce. However, techniques such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) can generate synthetic data from small experimental datasets, helping to mitigate overfitting. Algorithms such as neural networks, Support Vector Machines, and Random Forests are employed to handle data with complex molecular structural interactions, thereby improving the prediction of bioplastics properties.
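The limitation of a purely linear model can be seen on a toy example. The sketch below fits an ordinary least-squares line to a synthetic, purely quadratic response, once using the raw descriptor and once using a nonlinear feature of it. The data and the `fit_line` helper are illustrative assumptions and do not come from any polymer dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sum of squared errors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]  # synthetic response that depends nonlinearly on x

# Straight line on the raw descriptor: it cannot follow the curvature.
_, _, sse_linear = fit_line(xs, ys)

# Same estimator on a nonlinear feature (x squared): the error vanishes.
_, _, sse_nonlinear = fit_line([x ** 2 for x in xs], ys)
```

Neural networks, kernel methods, and tree ensembles can be viewed as ways of learning such nonlinear features from data instead of specifying them by hand.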
Table 4 lists the data repositories available for polymers.
Marti et al. [
53] predicted the glass transition temperature of biopolymers by combining molecular dynamics (MD) simulations with ML. They showed that the MD simulations converged to experimental behavior for a series of characteristics, and they discussed the difficulty of compiling data from heterogeneous sources for the ML model. To combine MD and ML in the context of bioplastics design, they compiled a dataset of 58 homopolymers and then carried out polymer simulations. The transition temperatures obtained from the simulations were compared with experimental observations and reproduced a similar trend.
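One common way to read a glass transition temperature from simulated density-temperature data is to fit one straight line to the glassy branch and another to the rubbery branch and to take their intersection. The sketch below assumes this bilinear protocol. The source does not state which procedure was used in [53], and the data, the function names, and the `min_points` parameter are synthetic or hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, sum of squared errors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def estimate_tg(temps, densities, min_points=3):
    """Try every split into a low-T and a high-T branch, keep the pair of
    lines with the smallest total error, and return their intersection."""
    best = None
    for k in range(min_points, len(temps) - min_points + 1):
        a1, b1, e1 = fit_line(temps[:k], densities[:k])
        a2, b2, e2 = fit_line(temps[k:], densities[k:])
        if b1 == b2:
            continue  # parallel lines never intersect
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, (a2 - a1) / (b1 - b2))
    return best[1]

# Synthetic density (g/cm^3) versus temperature (K) with a kink at 350 K.
temps = [200.0 + 25.0 * i for i in range(13)]
densities = [1.20 - 2e-4 * t if t < 350 else 1.13 - 6e-4 * (t - 350) for t in temps]

tg = estimate_tg(temps, densities)
```

With noisy simulation output, the estimate depends on the temperature window and cooling rate, which is one reason simulated and experimental values are usually compared as trends.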
5. AI-Driven Design of Catalysts for ROP of Bioplastics
Integrating AI into the synthesis of novel bioplastics with targeted properties, and into the optimization of their production for environmental sustainability, opens a new avenue for innovation in synthesis-oriented bioplastics informatics. Combining machine learning with ring-opening polymerization (ROP) experiments can significantly shorten research timelines for the synthesis of novel bioplastics. ML also provides more precise and consistent control over every stage of production. ML-assisted bioplastic discovery can further support quality control by using predictive models to flag notable deviations in processing conditions [
52].
Park et al. [
16] implemented Chemical Markdown Language (CMDL) within the IBM Materials Notebook, which provides an extensible and flexible platform to represent and merge disparate experimental data types. CMDL makes it possible to use historical experimental data to fine-tune Regression Transformer (RT) models that generate molecular designs. The datasets developed through CMDL advanced RT models for the design of ROP catalysts and of polymer microstructures, including copolymers, thereby validating the architecture. The CMDL-tuned model generated polymer chemical structures bearing functional groups for experimental verification. This approach paved the way toward a predictive generative model for ROP. Complementary to CMDL, SMiPoly offers a rule-based approach to generate synthesizable polymer libraries, facilitating the exploration of reaction pathways and retrosynthetic planning [
54]. In parallel, Chen et al. [
55] developed a data-assisted retrosynthesis planning framework tailored for polymers. Their approach leverages curated reaction templates and similarity-based matching to propose feasible synthetic routes, offering a valuable tool for guiding AI-driven polymer design and synthesis.
Ferrari and co-workers [
56] predicted polymerization reactions using transfer learning and chemical language models. Computational polymer discovery still lacks comprehensive data-driven analyses of reaction pathways and stability assessments through retrosynthesis. To address this gap, the researchers applied transformer-based chemical language models to polymerization tasks and used transfer learning to predict forward and retrosynthesis reactions for vinyl copolymers.
A predicted polymerization reaction can only be validated when the specific atoms that form the bonds between monomers during polymerization are accurately identified. However, modeling these atomic linkages remains an unresolved challenge in the current computational literature. Another obstacle in AI-assisted polymer discovery is determining the synthesizability of thermodynamically stable polymers. ML-based prediction of polymerization reactions requires identifying and labeling the head and tail linkages of the repeating monomers. For this purpose, Ferrari et al. [
56] adopted two distinct strategies: first the M2P (Monomers to Polymers) tool, and second a Python tool for Head-Tail Assignment (HTA). The HTA algorithm identified members of the polyvinyl class with 100% accuracy.
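What labeling head and tail linkages means in practice can be shown with a toy example for vinyl monomers. The sketch below is plain string manipulation on SMILES written in the form `C=C` followed by the substituents of the second carbon. It is neither the HTA algorithm nor the M2P tool, and the function name `vinyl_repeat_unit` is hypothetical.

```python
def vinyl_repeat_unit(monomer_smiles):
    """Toy head/tail labelling for monomers written as 'C=C<substituents>'.

    The double bond is opened and each former vinyl carbon receives one
    linkage point, marked with the wildcard atom [*].
    """
    if not monomer_smiles.startswith("C=C"):
        raise ValueError("this sketch only handles SMILES starting with 'C=C'")
    substituents = monomer_smiles[3:]
    if not substituents:
        return "[*]CC[*]"  # ethylene
    return "[*]CC([*])" + substituents

styrene_unit = vinyl_repeat_unit("C=Cc1ccccc1")
mma_unit = vinyl_repeat_unit("C=C(C)C(=O)OC")
ethylene_unit = vinyl_repeat_unit("C=C")
```

A real assignment tool has to work on the molecular graph, because the same monomer can be written as many different SMILES strings and because non-vinyl chemistries have other reactive sites.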
5.1. Molecular Transformer
A molecular transformer casts polymerization reaction prediction as a language modeling task that operates on text-based molecular representations of the reactants (SMILES strings). The architecture handles both forward polymerization reactions and retrosynthesis prediction by fine-tuning pretrained models. This enables a transfer learning approach in which the models are trained on textual representations of molecules. Chemical reactions, including polymerizations, are encoded as reaction SMILES and processed with NLP techniques. The reaction SMILES cover reactants, reagents, solvents, catalysts, and products.
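How a reaction SMILES becomes input for a language model can be sketched as follows. The example splits a reaction string of the form `reactants>agents>products` into its parts and tokenizes a SMILES string with a regular expression. The regular expression is a simplified pattern written for this illustration, and the caprolactone ring-opening example with a benzyl alcohol initiator is an assumed toy reaction, not an entry from the datasets discussed here.

```python
import re

# Simplified SMILES tokenizer: bracket atoms, two-letter halogens, two-digit
# ring closures, single-letter atoms, ring digits, then bond/branch symbols.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+\\/().:~@*$>]")

def tokenize(smiles):
    """Split a SMILES or reaction SMILES string into model tokens."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens

def split_reaction(reaction_smiles):
    """Split 'reactants>agents>products' into lists of molecules."""
    reactants, agents, products = reaction_smiles.split(">")
    return {
        "reactants": reactants.split("."),
        "agents": agents.split(".") if agents else [],
        "products": products.split("."),
    }

# Assumed toy ROP: caprolactone, benzyl alcohol initiator, PCL repeat unit.
reaction = "O=C1CCCCCO1>OCc1ccccc1>[*]OCCCCCC([*])=O"
parts = split_reaction(reaction)
monomer_tokens = tokenize(parts["reactants"][0])
product_tokens = tokenize(parts["products"][0])
```

Keeping bracket atoms such as `[*]` as single tokens lets the model treat polymer linkage points as one symbol.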
This approach builds on an extended version of the transformer-based language model, adapted to polymers. Adapting molecular language transformers to retrosynthesis is a niche area of computational tool development aimed at automating reaction pathway design for biopolymers. The main limitations of molecular transformer models in retrosynthesis are the limited range of polymerization categories covered and the size of the training data available for building the prediction model. Extending the model to diverse polymer classes in the transfer learning stage should enhance the accuracy of its predictions.
Recent developments, such as TransPolymer, have extended transformer-based architectures specifically for polymer property prediction. Xu et al. [
57] introduced a chemically aware tokenizer and demonstrated that pretraining on polymer-specific sequences significantly improves the accuracy of property predictions, making it a valuable complement to general-purpose molecular transformers.
5.2. Translation Task in ROP
Forward and retro-reaction predictions were modeled as a translation task that converts reactant-reagent SMILES strings into product SMILES. In the context of ring-opening polymerization (ROP), the SMILES strings are divided into a source, comprising the reactants and reagents, and a target, representing the products. Dataset splitting takes into account target variations for the same source input. According to Ferrari et al., because the HTA and M2P datasets yield diverse outcomes for identical source instances, the datasets were split into 90% for training, 5% for validation, and 5% for testing [
56]. In a related effort, Schwaller et al. [
58] proposed a hypergraph-based retrosynthesis planner that integrates transformer models with multistep reaction prediction. Although initially developed for small molecules, this architecture has been adapted for polymerization pathways, offering improved accuracy in predicting feasible synthetic routes for complex monomer systems.
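The source/target framing and the 90/5/5 split described in this section can be sketched in Python. The example assumes that all targets belonging to the same source are kept in the same partition, so that a source seen in training never reappears in validation or testing. This grouping rule is an interpretation of the text, and the function names and placeholder strings are hypothetical.

```python
import random
from collections import defaultdict

def to_translation_pair(reaction_smiles):
    """Turn 'reactants>agents>products' into a (source, target) pair."""
    reactants, agents, products = reaction_smiles.split(">")
    source = f"{reactants}>{agents}" if agents else reactants
    return source, products

def grouped_split(pairs, fractions=(0.90, 0.05, 0.05), seed=0):
    """Split (source, target) pairs by source so that variants of one
    source never straddle train, validation, and test."""
    groups = defaultdict(list)
    for source, target in pairs:
        groups[source].append(target)
    sources = sorted(groups)
    random.Random(seed).shuffle(sources)
    n_train = round(fractions[0] * len(sources))
    n_valid = round(fractions[1] * len(sources))
    cuts = (sources[:n_train],
            sources[n_train:n_train + n_valid],
            sources[n_train + n_valid:])
    return [[(s, t) for s in part for t in groups[s]] for part in cuts]

example_pair = to_translation_pair("O=C1CCCCCO1>OCc1ccccc1>[*]OCCCCCC([*])=O")

# Placeholder data: 100 sources, each with two alternative targets.
pairs = [(f"S{i}", f"T{i}{v}") for i in range(100) for v in "ab"]
train, valid, test = grouped_split(pairs)
```

For the retro direction, the same pairs can be reused with source and target swapped.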
6. Conclusions
This review presents a state-of-the-art overview of recent advances in artificial intelligence, particularly machine learning and deep learning, for the accelerated discovery of bioplastics. Biopolymer informatics emphasizes the transformative potential of data-driven approaches in the development of sustainable polymers.
We highlight PolyID, a machine learning-based message-passing graph neural network tool that provides a robust framework for the accelerated discovery of bioplastics, supported by benchmarked data repositories. This review details AI techniques for predicting structure–property relationships, optimizing synthesis pathways, and enhancing the development of bioplastics.
Furthermore, ML-driven methods in catalyst design and polymerization processes for bioplastics are discussed, offering a valuable resource for emerging researchers and academicians in the field. These advancements are facilitated by the growing availability of publicly accessible databases and open-source polymer informatics tools.
This review serves as a comprehensive reference, providing critical insights into current AI models for polymerization, molecular descriptor generation, structure–property prediction, and the use of benchmarked datasets in the context of sustainable polymers. Overall, this study underscores the significant potential of AI in advancing the design and discovery of green polymers, contributing to a more sustainable future. AI innovation in bioplastics is gaining increasing attention and represents a crucial step forward in addressing societal and environmental challenges through advanced materials science.