Machine Learning for the Optimization of the Bioplastics Design

Ashok, Neelesh; Garcia-Diaz, Pilar; Mosquera, Marta E. G.; Sessini, Valentina

doi:10.3390/macromol5030038

Open Access Review

¹ Amrita School of Artificial Intelligence, Amrita Vishwa Vidyapeetham, Coimbatore 641112, India

² Departamento de Teoría de la Señal y Comunicaciones, Escuela Politécnica Superior, Universidad de Alcalá, Campus Universitario, 28871 Alcalá de Henares, Madrid, Spain

³ Departamento de Química Orgánica y Química Inorgánica, Instituto de Investigación en Química “Andrés M. del Río” (IQAR), Universidad de Alcalá, Campus Universitario, 28871 Alcalá de Henares, Madrid, Spain

* Authors to whom correspondence should be addressed.

Macromol 2025, 5(3), 38; https://doi.org/10.3390/macromol5030038

Submission received: 12 June 2025 / Revised: 15 July 2025 / Accepted: 10 August 2025 / Published: 14 August 2025

(This article belongs to the Special Issue Sustainable Processes to Multifunctional Bioplastics and Biocomposites)

Abstract

Biodegradable polyesters have gained attention due to their sustainability benefits, considering the escalating environmental challenges posed by synthetic polymers. Advances in artificial intelligence (AI), including machine learning (ML) and deep learning (DL), are expected to significantly accelerate research in polymer science. This review article explores “bio” polymer informatics by harnessing insights from the AI techniques used to predict structure–property relationships and to optimize the synthesis of bioplastics. This review also discusses PolyID, a machine learning-based tool that employs message-passing graph neural networks to provide a framework capable of accelerating the discovery of bioplastics. An extensive literature review is conducted on explainable AI (XAI) and generative AI techniques, as well as on benchmarking data repositories in polymer science. The current state of the art in ML methods for ring-opening polymerizations and the synthesizability of biodegradable polyesters is also presented. This review offers in-depth insight into, and comprehensive knowledge of, current AI-based models for polymerizations, molecular descriptors, structure–property relationships, predictive modeling, and open-source benchmarked datasets for sustainable polymers. This study serves as a reference and provides critical insights into the capabilities of AI for the accelerated design and discovery of green polymers aimed at achieving a sustainable future.

Keywords:

bioplastics; polyesters; polymer informatics; machine learning; generative AI; XAI

1. Introduction

The emergence of bioplastics [1] represents a significant step toward sustainable and environmentally friendly materials [2]. In the present decade, sustainable development goals addressing pollution and fossil fuel depletion have become a promising area of research. Computational methods, such as machine learning [3] and molecular dynamics simulations [4], are accelerating the discovery and optimization of bioplastics by over a decade [5]. Recent reviews have emphasized the transformative role of AI in accelerating the discovery of sustainable polymers. Tran et al. [6] highlighted how AI-driven frameworks are enabling the design of functional and recyclable polymers by integrating property prediction, synthesis planning, and sustainability metrics into a unified pipeline. In addition to traditional ML approaches, large language models (LLMs) have recently been adapted to polymer science. Gupta et al. [7] demonstrated that fine-tuned LLMs can predict key thermal properties of polymers using natural language inputs, offering a new paradigm for polymer informatics that reduces the need for handcrafted descriptors.

The potential of bioplastics to reduce the carbon footprint and foster a circular economy is substantial, despite challenges such as higher costs. The growing demand for sustainable plastics has intensified research in bioplastics, particularly in biodegradable polyesters produced via ring-opening polymerization (ROP) of lactones and macrolactones [8]. Beyond conventional polyesters, recent work by Deng et al. [9] introduced a multilevel descriptor framework for high-throughput screening of elastomers. This approach enables the design of polymers with tailored mechanical properties, which is particularly relevant for bioplastics intended for flexible or biomedical applications.

Biodegradable polyesters, including polyhydroxybutyrate (PHB), polylactic acid (PLA) [10], polycaprolactone (PCL), and polyhydroxyalkanoates (PHAs), are emerging as promising materials for environmental sustainability. Their applications span packaging, energy, and biomedical sectors, contributing to sustainability goals [11].

Artificial intelligence (AI), and machine learning (ML) in particular, offers a powerful approach for navigating complex polymeric structure–property relationships and optimizing polymer design, thereby accelerating bioplastics discovery. The applications of AI in polymer science span from the polymerization process (synthesis) to property prediction (discovery). ML applications for polymeric materials have been explored by Malashin et al. [12], who discussed the influence of ML approaches on data analysis, predictive modeling, and polymer design. ML algorithms can predict the properties of untested polymers and guide optimal design based on existing ones. Several studies have developed ML models trained on large datasets containing experimental data on parameters such as polymer branching, melting temperature, glass transition temperature, molecular weight, polydispersity, and processing conditions, all of which govern polymer properties. Hence, ML is emerging as a powerful tool for optimizing bioplastic synthesis and predicting polymer properties. It is also transforming bioplastics production by aiding the selection of optimal catalysts and biomass feedstocks through data-driven approaches that minimize trial-and-error experimentation. ML techniques have thus advanced green polymer discovery for diverse applications in packaging, energy, and healthcare, supporting sustainable development goals.

However, the accelerated discovery of sustainable polymers through AI-based models remains underexplored in the scientific literature. This review addresses gaps in the state-of-the-art literature by including case studies and key findings, and discusses the following major aspects of bio-based polymer informatics:

ML for efficient bioplastic synthesis, focusing on retrosynthesis and catalyst design for the ROP process.
The profound impact of polymer informatics on understanding structure–property relationships in bioplastics, with insights from open-source data repositories.
The role of generative AI, molecular descriptors, and explainable AI (XAI) in polymer research, based on advances in polymer informatics.
AI-driven design of catalysts, property prediction, and generative modeling in polymer research.

This review also outlines the roles of XAI and generative AI in polymers and offers an insight into molecular descriptors, including string representations in AI models and SHAP (SHapley Additive exPlanations) integrated with ML for polymers. It presents an advanced outlook for researchers on the accelerated discovery and green synthesis of sustainable polymers through AI techniques, focusing on innovations that support sustainability and the development of novel green polymers, to meet the demands of the different evolving markets, such as energy and healthcare. By leveraging computational models, researchers can explore complex relationships amongst molecular structures and material properties, enabling more efficient design and optimization of biopolymer-based materials.

It is worth highlighting that this review offers a comprehensive and unprecedented examination of the cross-cutting integration of artificial intelligence throughout the entire bioplastic design lifecycle, spanning property prediction, molecular synthesis, and experimental validation. Unlike prior works focused narrowly on isolated development stages, this article contextualizes the full pipeline of AI-enabled discovery. Notably, it underscores the transformative relevance of generative and inverse design models—such as PolyID, PolyBART, PolyCL, and transformer-based frameworks—that can generate candidate polymer structures directly from target property profiles, marking a paradigm shift in materials science. This review further highlights the pioneering application of language models to polymer chemistry, demonstrating how representations like SMILES, SELFIES, and PSELFIES, in tandem with large language models, can predict thermal and mechanical properties without reliance on handcrafted descriptors. Importantly, the integration of explainable AI techniques, including SHAP, is discussed in depth, emphasizing their role in ensuring interpretability and fostering adoption in industrial and regulatory contexts. The emerging use of AI for catalyst prediction and design in ROP processes—featuring tools such as the Chemical Markdown Language (CMDL) and data-assisted retrosynthesis—receives dedicated attention, reflecting a rapidly advancing but underexplored research frontier. Additionally, this review provides an extensive survey of open repositories like PI1M, PolyInfo, and the Materials Project, clarifying their critical contributions to the reproducibility and scalability of machine learning models. Finally, this article explores innovative applications of AI-designed bioplastics beyond conventional domains, including electronic skins and nanomedicine, further underscoring the expansive impact and novelty of these methodologies. 
Figure 1 presents a mind map of all the AI tools reviewed in this article that are used in bioplastics design.

This review article is organized as follows. It begins with an overview of the application of AI and ML methodologies in bioplastics design and synthesis and of the evolution of polymer informatics. The next section covers ML integrated with SHAP and describes the key findings. A brief explanation is then provided of generative models for the inverse design of polymers and of the use of molecular representations of polymeric structures in ML models. PolyID, an AI-based model for bioplastics, is also reviewed. The last section focuses on ML in polymerizations and catalyst prediction for ROP to produce bioplastics, a novel and still nascent research area.

2. Evolution of Polymer Informatics in Bioplastics

Polymer informatics, that is, the integration of AI into polymer research, is a rapidly expanding field in industries where sustainable polymers are targeted for specific applications [5]. The integration of AI-driven computational tools with experimental validation can significantly reduce research costs and time, fostering innovation in polymer science, as reported in Table 1. However, the application of AI in retrosynthesis, which involves integrating ML models to predict the ease of synthesis of a target polymer and plan its (retro)synthetic steps during the polymerization process, is still in its infancy. Historically, the vision of computer-aided polymer design was outlined by Adams and Murray-Rust [13], who proposed semantic web technologies for representing polymer structures and reactions. This early work laid the foundation for modern polymer informatics platforms by emphasizing the need for standardized, machine-readable representations.

Numerous boosting methods, such as Gradient Boosting, XGBoost, AdaBoost, LightGBM, and CatBoost, have emerged as powerful tools in computational polymer science. The authors of [14] emphasized the relevance of integrating boosting methods in polymer science, especially for the accurate prediction of properties with nonlinear behavior. The applications range from sustainability to advanced manufacturing and multifunctional materials. An ensemble ML-based framework that combines boosting models reduces computational costs while retaining high precision and interpretability, which highlights the applicability of such frameworks to bioplastics design [14]. To support interoperability and data reuse, recent efforts, such as CRIPT (Community Resource for Innovation in Polymer Technology), have introduced extensible ontologies for polymer data. As described by Jackson et al. [15], CRIPT enables standardized metadata and provenance tracking, which are essential for reproducible AI-driven polymer research. Park et al. [16] provided a significant platform for AI-driven catalyst and polymer design. In this model, machine-understandable polymer representations have been developed with the support of a data-driven approach. These molecular representations, produced through Natural Language Processing (NLP) techniques [17], are used to manage the large chemical and physical structural variability, as well as the diverse monomeric units in the chain structure of the polymers at multiple scales.
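
To make the boosting idea concrete, the following is a minimal pure-Python sketch of gradient boosting with one-split decision stumps on a single descriptor. The descriptor, the property values, and the function names (`fit_stump`, `gradient_boost`) are hypothetical placeholders and are not taken from [14]; libraries such as XGBoost, LightGBM, and CatBoost implement far more elaborate versions of the same residual-fitting loop.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error on the residuals.
    Assumes xs contains at least two distinct values."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Squared-loss gradient boosting: each stump is fitted to the current residuals."""
    base = mean(ys)
    stumps, preds, history = [], [base] * len(ys), []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
        history.append(mean([(y - p) ** 2 for y, p in zip(ys, preds)]))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, history

# Toy data: a hypothetical structural descriptor and a placeholder property (not measurements)
descriptor = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
target = [10.0, 12.0, 15.0, 30.0, 32.0, 31.0, 50.0, 55.0]

model, history = gradient_boost(descriptor, target)
print("training MSE, first and last round:", history[0], history[-1])
print("prediction at 0.45:", model(0.45))
```

The step-like toy data are deliberately nonlinear: a single straight line fits them poorly, whereas the summed stumps approximate the jumps, which is the kind of behavior the boosting methods above are credited with capturing.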

Traditional approaches to material discovery often involve time-consuming experiments and complex trial-and-error methodologies, including catalytic polymerization reactions. A comparative study of AI-integrated and scientific approaches in polymer science is tabulated in Table 2. Pre-trained models are applied to the polymer data available in data repositories, making it possible to predict a range of structure–property relationships and the activity of the catalysts [16].

As previously mentioned, bioplastics and, in particular, biodegradable polyesters have gained significant attention due to their wide applicability in industrial domains involving energy and health sectors, with the added benefit of environmental sustainability. The use of AI-based models in polymers addresses the gap in screening promising polymer candidates with specific characteristics. In particular, the integration of AI provides a transformative solution by enabling data-driven insights, predictive modeling, and accelerated biopolymer screening processes [18]. Cutting-edge generative AI [19] and XAI [20] are now employed for the inverse design of polymers [21,22] with desirable and specific properties.

In the case of sustainable applications focusing on bioplastics, very few benchmarking data repositories are available. Recent advances in the open-source data repositories for these materials are explored in this article. Thus, biopolymer informatics paves the way for a sustainable future by enabling the discovery of tailored green polymers and facilitating the easier manufacture of bioplastics that benefits society. Figure 2 shows an AI-ML pipeline for accelerated discovery of bioplastics.

Applications of Bioplastics

Polymer informatics has rapidly evolved as a multidisciplinary field that integrates AI and ML into polymer science, enabling accelerated discovery, design, and optimization of sustainable materials. In the context of bioplastics, this evolution is particularly impactful due to the urgent need for eco-friendly alternatives to petroleum-based polymers.

Beyond packaging and biomedical applications, bioplastics are now being explored in a wide range of advanced and emerging sectors, including the following:

Healthcare and Biomedical Devices. Biocompatible and bio-mimicable polymers are also used for “electronic skin” sensors that are attached directly to human skin to acquire real-time, multi-dimensional information on the physiological state of the human body. Epidermal electronic skins, known as artificial e-skins, act as an interface with the human skin surface and are widely studied in the domain of “AI in health care” [23]. Very recently, Yang and coworkers [24] described a human-friendly, biocompatible skin electronic system based on a natural polymer. This material acts as a skin interface for measuring electrical signals and chemical secretions of the human body. The bio-mimicable nature of the polymer allows fast and controllable separation without skin damage, enabling multi-dimensional real-time measurement. The researchers developed an ML algorithm based on an artificial neural network that can accurately distinguish various physiological states of the human body from the measured multi-modal big data, providing an effective solution for all-weather health monitoring [16]. This combination of skin electronics with ML will be crucial for the future of personalized smart medicine solutions. In this context, biomaterials and bioplastics have a key role in the design and development of the next generation of nanobots able to interact and deliver across the blood-brain barrier without toxicity, damage, or malfunction [25].

Energy Storage and Electronics. Bioplastics are being investigated for use in flexible electronics, supercapacitor membranes, and biodegradable batteries. Their tunable dielectric properties and mechanical flexibility make them suitable for wearable and transient electronics. Polymer informatics aids in screening materials with optimal conductivity and thermal stability [26,27,28].

Automotive and Aerospace. The automotive and aerospace industries are increasingly exploring bioplastics as sustainable alternatives to conventional materials. These sectors demand materials with high thermal resistance, mechanical strength, lightweight properties, and recyclability. AI-driven polymer informatics tools, such as PolyID [29] and CMDL [16], have enabled the design of bioplastics tailored for these stringent requirements. For instance, generative models and virtual forward synthesis frameworks [26,28] have been used to identify bioplastics with performance parity to engineering plastics, while also ensuring synthetic accessibility and environmental sustainability. In the aerospace sector, where weight reduction is critical, bioplastics designed through multitask deep neural networks [30] and graph-based representations [31] have shown promise in replacing petroleum-based composites. Moreover, the integration of XAI and uncertainty quantification frameworks, such as POINT [32], ensures that the designed materials meet safety and performance standards. These advancements underscore the potential of AI-enhanced bioplastic design in revolutionizing material selection for next-generation vehicles and aircraft.

Agriculture and Controlled Release Systems. Bioplastics are used in mulch films, seed coatings, and slow-release fertilizers. ML models help tailor degradation rates and mechanical strength to specific environmental conditions, improving crop yield and reducing plastic pollution in soil [29,33,34].

Construction and 3D Printing. In the construction sector, bioplastics offer potential as sustainable alternatives for insulation, coatings, and structural composites. Machine learning models have been applied to predict mechanical properties, such as tensile strength and modulus, enabling the selection of bioplastics suitable for load-bearing and thermal insulation applications [2,30,35]. The integration of AI-driven design tools facilitates the development of bioplastics with enhanced durability and environmental resistance, supporting green building initiatives. In additive manufacturing, bioplastics like PLA are widely used for 3D printing of architectural models, customized tools, and biocompatible prosthetics. ML algorithms optimize printability, layer adhesion, and structural integrity, enabling rapid prototyping and sustainable fabrication.

Environmental Applications. Bioplastics are being developed for marine packaging, fishing gear, and sensor housings that degrade safely in aquatic environments. Polymer informatics supports the design of materials with controlled hydrolytic degradation and minimal ecotoxicity. The role of bioplastics in the circular economy is underscored by their potential for recyclability and biodegradability. Generative design frameworks and retrosynthesis planning tools have been developed to create bioplastics that are not only high-performing but also easy to recycle or compost [4]. Polymer informatics platforms, such as PolyID and CMDL, enable the integration of life cycle assessment metrics into the design process, ensuring that new materials align with circular economy principles.

The integration of these applications in the evolution of polymer informatics underlines the versatility and social relevance of bioplastics. The use of AI not only accelerates material discovery but also ensures that bioplastics meet the complex requirements of diverse industries, reinforcing their role in a circular and sustainable economy. Figure 3 reports a schematic summary of the different applications of AI-guided bioplastics.

3. AI-ML in Polymer Design

The growing need for new smart and eco-friendly polymers poses a challenge to traditional technologies. As mentioned in the previous sections, biopolymer informatics has emerged as a viable strategy, utilizing ML and deep learning (DL) models, combined with high-throughput computing, to generate predictive models for target properties. These models can then be employed in high-throughput screening processes, accelerating the design and discovery of polymers [19]. Recent advances in polymer representation have introduced Pseudo-Polymer SELFIES (PSELFIES), a robust encoding scheme that enables generative models to explore polymer design space more effectively. Savit et al. [36] demonstrated that PSELFIES can be integrated into transformer-based architectures to generate novel polymers with targeted properties.

In particular, sustainable polymer design using an ML-based approach follows three key steps. The first one involves identifying the chemical space, achieved by converting the polymeric structures into computationally readable representations using fingerprinting techniques. Next, ML models are applied to predict the properties and are fine-tuned with experimental data to target their performance [20]. The final step involves developing a benchmarked ML model trained on a dataset and subsequently evaluating it with experimental data. A notable example of this approach is presented by Kuenneth et al. [30], who developed a multitask deep neural network framework trained on a large dataset of polymer structures and properties. Their model successfully identified bioplastic candidates with performance comparable to conventional plastics, demonstrating the potential of multitask learning for simultaneous optimization of multiple target properties in sustainable polymer design. To address limitations in data availability, Li et al. [37] developed a data-augmented ML framework for the inverse design of homopolymers with targeted glass transition temperatures. Their approach integrates synthetic data generation to overcome data scarcity, a common limitation in bioplastic catalyst design.
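
The three steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the character k-mer "fingerprint" is a crude stand-in for real substructure fingerprints, the similarity-weighted nearest-neighbor predictor stands in for a trained ML model, and the property values attached to the repeat-unit SMILES strings are arbitrary placeholders rather than measurements. All function names are hypothetical.

```python
def kmer_fingerprint(smiles, k=3):
    """Step 1: represent a repeat unit as the set of its overlapping k-character substrings."""
    return {smiles[i:i + k] for i in range(len(smiles) - k + 1)}

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def predict(query, training, k=2):
    """Step 2: similarity-weighted average over the k most similar training polymers."""
    fq = kmer_fingerprint(query)
    scored = sorted(
        ((tanimoto(fq, kmer_fingerprint(s)), y) for s, y in training), reverse=True
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:  # nothing similar: fall back to a plain average
        return sum(y for _, y in scored) / len(scored)
    return sum(sim * y for sim, y in scored) / total

def leave_one_out_mae(data, k=2):
    """Step 3: evaluate by predicting each polymer from all the others."""
    errors = []
    for i, (smiles, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        errors.append(abs(predict(smiles, rest, k) - y))
    return sum(errors) / len(errors)

# Repeat units of four aliphatic polyesters; the numbers are placeholders, not data
training = [
    ("[*]OCC(=O)[*]", 1.0),        # poly(glycolic acid)
    ("[*]OC(C)C(=O)[*]", 2.0),     # poly(lactic acid)
    ("[*]OC(C)CC(=O)[*]", 3.0),    # poly(3-hydroxybutyrate)
    ("[*]OCCCCCC(=O)[*]", 4.0),    # polycaprolactone
]

print("leave-one-out MAE:", leave_one_out_mae(training))
print("prediction for a new unit:", predict("[*]OC(CC)CC(=O)[*]", training))
```

In practice, step 1 would use chemically meaningful fingerprints or learned embeddings, and step 3 would compare predictions against held-out experimental measurements.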

ML methods enable machines to simulate human decision-making and optimization processes. An ML algorithm trained on extensive datasets extracts valuable insights from the data to recommend the most suitable polymer candidates that align with the design objectives. Although ML has emerged as an effective predictive tool that facilitates polymer material design through efficient structure–property modeling in the physico-chemical domain, a data-driven approach for complex bioplastics and their synthesis remains largely unexplored.

The ML framework pipeline for polymer design is shown in Figure 4. This pipeline consists of several stages, including data collection, preprocessing, featurization, model training and validation, and deployment. This integrated approach facilitates polymer discovery, prediction, and optimization, accelerating research and driving green innovation in industrial polymer production.

In a complementary approach, Atasi et al. [26] combined machine learning with genetic algorithms to design recyclable polymers with targeted properties. Their framework integrates virtual forward synthesis and sustainability scoring, enabling the generation of over a million candidate structures and identifying promising recyclable alternatives to conventional plastics.
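
A minimal sketch of how a genetic algorithm can be coupled to a property model is given below. It is not the framework of [26]: the fragment library, the per-fragment scores that stand in for an ML property predictor and a sustainability score, and the function names are all hypothetical, and virtual forward synthesis is not modeled.

```python
import random

FRAGMENTS = ["A", "B", "C", "D"]  # hypothetical building blocks
# Hypothetical per-fragment contributions standing in for trained surrogate models
PROPERTY = {"A": 1.0, "B": 0.4, "C": 0.7, "D": 0.1}
RECYCLABILITY = {"A": 0.2, "B": 0.9, "C": 0.6, "D": 1.0}

def fitness(candidate, target=0.6):
    """Reward recyclability and penalise deviation from the target property value."""
    prop = sum(PROPERTY[f] for f in candidate) / len(candidate)
    recyc = sum(RECYCLABILITY[f] for f in candidate) / len(candidate)
    return recyc - abs(prop - target)

def evolve(pop_size=20, length=6, generations=15, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    population = [[rng.choice(FRAGMENTS) for _ in range(length)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # truncation selection; the best always survive
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)      # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [rng.choice(FRAGMENTS) if rng.random() < mutation_rate else f
                     for f in child]            # random point mutations
            children.append(child)
        population = parents + children
    population.sort(key=fitness, reverse=True)
    best_history.append(fitness(population[0]))
    return population[0], best_history

best, history = evolve()
print("best candidate:", "".join(best), "fitness:", history[-1])
```

Because the top half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. In a real workflow the `fitness` function would call trained property models, and candidates would be checked for synthetic accessibility.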

Generative models based on the DL approach have attained a significant role in polymer research owing to their capacity for “de novo” polymer design. The gap in the literature is the application of these models for the inverse design of polymers. “De novo” polymer design has been recognized as a promising method to expedite the discovery of novel structural and multifunctional polymers. Yue et al. [18] explored “de novo” polymer design using deep generative models such as the Variational Autoencoder (VAE), Objective-Reinforced Generative Adversarial Networks (ORGAN), Adversarial Autoencoder (AAE), character-level recurrent neural networks (Char-RNN), GraphINVENT, and REINVENT. These models were trained on real polymers using reinforcement learning methods targeting the generation of hypothetical extreme temperature-resistant polymers. More recently, Zhou et al. [38] proposed PolyCL, a contrastive learning framework that leverages both explicit and implicit augmentations to learn robust polymer representations. This method enhances generalization in downstream tasks, such as property prediction and inverse design, particularly in low-data regimes where traditional supervised models may struggle. In a complementary study, Vogel and Weber [39] developed a generative framework for the inverse design of copolymers that explicitly incorporates stoichiometry and chain architecture. This approach enables the generation of chemically valid and synthetically accessible copolymers with tailored properties, addressing a key limitation in many existing generative models. Additionally, Bilodeau et al. [40] provide a comprehensive review of recent advances and challenges in generative models for molecular discovery, highlighting their relevance for polymer design and the integration of these models into practical discovery pipelines.
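
The contrastive learning idea can be illustrated with the InfoNCE loss, a widely used contrastive objective. Whether PolyCL uses exactly this loss is not stated here, so the sketch below should be read as a generic example: the embeddings are hand-written two-dimensional vectors standing in for two augmented "views" of the same polymers, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchor i should be closer to positive i than to any other positive."""
    losses = []
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Hypothetical embeddings of three polymers (view 1) and of their augmented copies (view 2)
view_1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_2 = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]

aligned_loss = info_nce(view_1, view_2)
# Pair each anchor with the wrong augmented copy to see the loss increase
shuffled_loss = info_nce(view_1, view_2[1:] + view_2[:1])
print("aligned:", aligned_loss, "shuffled:", shuffled_loss)
```

Minimizing this loss pulls the two views of the same polymer together and pushes different polymers apart, which is how a representation can be learned without property labels.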

Although polymer informatics is an emerging field with significant potential for accelerated discovery and development, it faces challenges related to validation, despite its success in molecular design. Generative modeling in polymers requires extensive, well-benchmarked data repositories, such as PI1M, PolyDat, and PolyInfo. A key limitation of ML models is the lack of viability in generated structures, leading to the development of frameworks like Polymer Genome [41]. An important contribution in this direction is the Open Macromolecular Genome (OMG) introduced by Kim et al. [42], which provides a large, curated dataset of synthetically accessible polymers compatible with known polymerization reactions. This resource enables generative models to propose realistic polymer candidates while ensuring synthetic feasibility, bridging the gap between virtual design and experimental realization. Predicting a target property in a polymer is typically an interpolative process, involving the computation of molecular fingerprints or “descriptors” followed by correlating these descriptors with machine-readable molecular representations or experimental design variables. A recent contribution in this direction is polyBART, introduced by Savit et al. [36], which leverages a transformer-based architecture and a novel polymer-specific representation (PSELFIES) to enable both property prediction and generative design of polymers. This model demonstrates the potential of adapting large language models to the polymer domain by treating polymer structures as a chemical language.

Data-driven approaches enable accelerated material discovery through high-throughput ML tools, including for bioplastics. Molecular descriptors used in polymer property prediction traditionally rely on static featurization kernels to map chemical spaces. With “end-to-end” learning, featurization is instead learned jointly with the prediction task, enabling simultaneous feature extraction and prediction and leading to high accuracy in polymer property prediction. A recent work by Aldeghi and Coley [43] introduced a graph-based representation of molecular ensembles that captures the stochastic nature of polymer structures. Their approach enables accurate property prediction by modeling polymers as distributions over molecular graphs, rather than as single deterministic structures, which is particularly valuable for capturing the diversity inherent in polymer systems. Moreover, Aleb et al. [44] recently reported a novel transformer model that maps chemical representations of polymers, namely SMILES strings, to their glass transition temperatures. They provided a deeper understanding and exploration of chemical space by relating chemical language to properties. Language-to-property transformer models that directly relate monomer structures to properties across various domains could provide a powerful tool for polymer design. To address the challenge of limited data, Zhang and Yang [45] proposed a multimodal architecture that combines molecular structure embeddings with language-based representations. Their model, PolyLLMem, improves property prediction accuracy in low-data regimes, making it particularly suitable for bioplastics with limited experimental datasets. Thus, open-source repositories, pretrained models, and polymer structure representations such as SMILES, BigSMILES, SMARTS, and p-SMILES have greatly accelerated the adoption of ML models in polymer science. A notable example is polyBERT, introduced by Kuenneth and Ramprasad [46], which adapts transformer-based language models to polymer sequences. This model enables ultrafast property prediction by learning directly from polymer SMILES representations, eliminating the need for handcrafted descriptors and accelerating the design process.
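
Treating polymer SMILES as a "chemical language" starts with tokenization. The sketch below shows one simple way this could be done with a regular expression, followed by mapping tokens to integer IDs of fixed length, as a transformer would require. The pattern covers only a small subset of SMILES syntax, and the tokenizers actually used by polyBERT or the other models discussed here may differ; the function names and the special tokens `<pad>` and `<unk>` are illustrative assumptions.

```python
import re

# Bracket atoms such as [*] first, then two-letter halogens, then single symbols
TOKEN_PATTERN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|[()=#\-+\\/:.~%]|\d|\*"
)

def tokenize(smiles):
    """Split a (polymer) SMILES string into tokens; reject anything the pattern misses."""
    tokens = TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in {smiles!r}")
    return tokens

def build_vocab(corpus):
    """Assign an integer ID to every token seen in the corpus."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for smiles in corpus:
        for tok in tokenize(smiles):
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a fixed-length list of token IDs (truncate or pad)."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokenize(smiles)][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

corpus = ["[*]OC(C)C(=O)[*]", "[*]OCCCCCC(=O)[*]"]  # repeat units of PLA and PCL
vocab = build_vocab(corpus)
print(tokenize("[*]OC(C)C(=O)[*]"))
print(encode("[*]CC(Cl)[*]", vocab, max_len=10))  # "Cl" is unseen and maps to <unk>
```

The `[*]` tokens mark the points where the repeat unit connects to its neighbors, which is what distinguishes a polymer SMILES from that of a small molecule.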

In the fields of energy, sustainability, and medical applications, researchers are advancing in biopolymer informatics with the advent of generative AI and XAI to design novel materials and analyze their structure–property relationships. The design space for bioplastics is vast, needing high-throughput computational screening methods to develop novel polymers and identify efficient synthesis pathways. Mulrennan et al. [47] combined in-process temperature and pressure data acquired from sensors and near-infrared (NIR) spectroscopic data with multivariate regression models to predict the mechanical properties of an extruded bioresorbable polymer based on PLA, which was used for medical implants and drug delivery. The fusion of NIR and sensor data (conventional properties) is required for a robust model in predicting processing conditions. XAI helps make ML models more interpretable by providing insights into their decision-making processes and bridging the gap between human understanding and model complexity. XAI in bioplastics design and discovery is used to delineate the relationship between reactants, processing conditions, catalysts, and the end properties. Rizwan et al. [48] have contributed towards incorporating XAI into ML models designed to investigate biomaterials. The SHAP technique is used with XAI to make ML models more efficient. The key insights, objectives, and models in polymer informatics are summarized in Table 3.
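
SHAP attributes a prediction to the input features using Shapley values from cooperative game theory. For a handful of features these can be computed exactly by enumerating all feature coalitions, as in the minimal sketch below. The surrogate model and its three inputs (labeled here as catalyst loading, temperature, and time) are hypothetical; practical SHAP implementations rely on approximations because the exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for model(x) relative to model(baseline)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their actual value; the rest stay at the baseline
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(features):
    """Hypothetical surrogate: a linear term plus an interaction between two inputs."""
    catalyst_loading, temperature, time = features
    return 2.0 * catalyst_loading + temperature * time

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print("attributions:", phi)
print("sum of attributions:", sum(phi), "model difference:", toy_model(x) - toy_model(baseline))
```

Two properties are visible in this toy case: the attributions add up to the difference between the prediction and the baseline prediction, and the interaction term is shared equally between the two features that produce it.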

4. PolyID—Artificial Intelligence for Bioplastics

As discussed in the previous section, extensive research and development efforts have been dedicated to creating computational tools capable of performing property modelling and prediction for polymers, ranging from DL models to large language models and transformers. Wilson et al. [29] introduced PolyID as a domain-specific graph neural network framework designed to predict key polymer properties, such as glass transition temperature, mechanical strength, and biodegradability. The model integrates stereochemical information and has been validated both computationally and experimentally, demonstrating its utility in identifying sustainable alternatives to conventional plastics. PolyID [29] is the first ML prediction tool to incorporate the stereochemistry of polymers, which is a crucial factor in bioplastics design. PolyID, based on a message-passing graph neural network model, was developed at the National Renewable Energy Laboratory (NREL) [49]. The model is reported to predict the mechanical, thermal, barrier, and biodegradability properties of various homopolymers and copolymers with high accuracy. Complementary to PolyID, Gurnani et al. [50] proposed a scalable multitask graph neural network framework that learns polymer representations directly from repeat units. This scalable approach enables simultaneous prediction of multiple properties and supports high-throughput screening of large polymer libraries. By using monomer structures as input, the model can be integrated with network generation tools to efficiently explore the bioplastics design solution space and identify sustainable alternatives to synthetic polymers. In parallel, Antoniuk et al. [51] proposed a periodic graph-based representation for polymers, capturing their repeating structural motifs. This approach enables accurate property prediction by modeling the periodicity inherent in polymer chains, complementing PolyID’s stereochemistry-aware framework.
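
The two ideas in this paragraph, message passing over a molecular graph and a periodic representation of the repeat unit, can be combined in a small sketch. It is a deliberately stripped-down illustration: real message-passing networks such as PolyID apply learned weight matrices and nonlinearities at every step, whereas here neighbors are simply summed. The atom features, the graph, and the function names are assumptions made for the example.

```python
def message_passing(node_features, edges, n_rounds=2):
    """Sum-aggregation message passing: each node adds its neighbours' states to its own."""
    neighbours = {i: [] for i in range(len(node_features))}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    h = [list(f) for f in node_features]
    for _ in range(n_rounds):
        h = [
            [h[i][d] + sum(h[j][d] for j in neighbours[i]) for d in range(len(h[i]))]
            for i in range(len(h))
        ]
    return h

def readout(h):
    """Mean-pool the node states into a single graph-level vector."""
    return [sum(col) / len(h) for col in zip(*h)]

# Repeat unit of poly(glycolic acid), -O-CH2-C(=O)-, with one-hot [is_carbon, is_oxygen] features
atoms = [[0, 1], [1, 0], [1, 0], [0, 1]]   # 0: ester O, 1: CH2, 2: carbonyl C, 3: carbonyl O
bonds = [(0, 1), (1, 2), (2, 3)]
periodic_bonds = bonds + [(2, 0)]          # carbonyl C bonded to the ester O of the next unit

open_chain = message_passing(atoms, bonds, n_rounds=1)
periodic = message_passing(atoms, periodic_bonds, n_rounds=1)
print("ester O state, open fragment: ", open_chain[0])
print("ester O state, periodic graph:", periodic[0])
print("graph-level vector:", readout(periodic))
```

Adding the periodic bond changes what the ester oxygen "sees": in the open fragment it has a single carbon neighbor, while in the periodic graph it also receives a message from the carbonyl carbon, as it would in the actual chain.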

PolyID uses an end-to-end learning approach with a multi-output message-passing neural network and is supported by experimental validation [29]. To ensure the use of relevant training data, the authors developed an intuitive and interpretable method. This method supported the screening of computationally generated bioplastic candidates, leading to the identification of five potential replacements for poly(ethylene terephthalate) (PET). The authors also explored how quantitative structure–property relationship (QSPR) models can be leveraged using the developed message-passing network. Through experimental validation and end-to-end machine learning, Wilson et al. demonstrated that the discovery of novel bioplastics can be significantly accelerated [29]. In parallel, Ma et al. [32] introduced the POINT benchmark, which integrates uncertainty quantification, interpretability, and synthesizability into polymer property prediction workflows. This framework complements tools like PolyID by providing standardized evaluation protocols that enhance the robustness and reproducibility of AI-driven polymer informatics.

ML is also relevant to bioplastics manufacturing, where it can be used to predict the printability and functionality of materials for clinically relevant applications such as tissue engineering, for example in 3D-printed bioimplants [52]. The structures of bioplastics and their representations are complex and diverse, and conventional methods such as linear regression cannot capture the nonlinear variations that characterize polymers. Integrating simulation models with physical and chemical properties now makes it possible to capture this nonlinear behavior. In biochemical processes, labeled data are often scarce. Techniques such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) can generate synthetic data from small experimental datasets, which helps to mitigate overfitting. Algorithms such as neural networks, Support Vector Machines, and Random Forests are employed to handle data with complex molecular structural interactions, thereby improving the prediction of bioplastic properties.

Table 4 lists the data repositories available for polymers.

Martí et al. [53] predicted the glass transition temperature of biopolymers by combining molecular dynamics (MD) simulations with ML in the context of bioplastics design. They compiled a dataset of 58 homopolymers and ran polymer simulations for them. The transition temperatures obtained from the simulations reproduced the trend seen in experimental observations, and the MD simulations converged toward experimental behavior for a series of characteristics. The authors also discussed the difficulty of compiling data from heterogeneous sources for the ML model.

5. AI-Driven Design of Catalysts for ROP of Bioplastics

Integrating AI into the synthesis of novel bioplastics with targeted properties, and into the optimization of their production for environmental sustainability, opens a new route for innovation in synthesis-oriented bioplastics informatics. Combining machine learning with ring-opening polymerization (ROP) experiments has the potential to significantly shorten research timelines for the synthesis of novel bioplastics. ML can also provide more precise and consistent control over the entire production process. In addition, predictive models can support quality control by flagging notable deviations in processing conditions [52].

Park et al. [16] implemented Chemical Markdown Language (CMDL) within the IBM Materials Notebook, providing an extensible and flexible platform for representing and merging disparate experimental data types. CMDL makes historical experimental data usable for fine-tuning Regression Transformer (RT) models that generate molecular designs. The datasets developed through CMDL supported RT models for the design of ROP catalysts and of polymer microstructures, including copolymers, thereby validating the architecture. The CMDL-tuned model proposes functional groups in the polymer chemical structure for experimental verification. This approach paved the way for predictive generative models in ROP. Complementary to CMDL, SMiPoly offers a rule-based approach to generating synthesizable polymer libraries, facilitating the exploration of reaction pathways and retrosynthetic planning [54]. In parallel, Chen et al. [55] developed a data-assisted retrosynthesis planning framework tailored for polymers. Their approach leverages curated reaction templates and similarity-based matching to propose feasible synthetic routes, offering a valuable tool for guiding AI-driven polymer design and synthesis.

Ferrari and co-workers [56] predicted polymerization reactions using transfer learning and chemical language models. Computational polymer discovery still lacks comprehensive data-driven analyses of reaction pathways and stability assessments through retrosynthesis. Against this background, the authors applied transformer-based chemical language models to polymerization tasks, using transfer learning to predict forward and retrosynthetic reactions for vinyl copolymers.

A predicted polymerization reaction is valid only when the specific atoms involved in the bonding between monomers throughout the polymerization steps are accurately identified. However, modeling these atomic linkages remains an unresolved challenge in the current computational literature. Another obstacle in AI-driven polymer discovery is determining the synthesizability of thermodynamically stable polymers. In ML-based prediction of polymerization reactions, the head and tail linkages of the repeating monomers must be identified and labeled. For this purpose, Ferrari et al. [56] adopted two distinct strategies: the M2P (Monomers to Polymers) tool and a Python tool for Head-Tail Assignment, known as HTA. The HTA algorithm was reported to identify members of the polyvinyl class with 100% accuracy.
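What labeling head and tail linkages means in practice can be shown with a deliberately simple sketch. It is neither the HTA nor the M2P code, and it makes a strong assumption: the vinyl monomer SMILES is written with the double bond first (`C=C...`), so the first carbon is the unsubstituted CH2 (taken here as the tail) and the second carbon carries the substituent (taken here as the head). The function name and the use of the wildcard atom `[*]` to mark chain ends are illustrative choices.

```python
# Toy head/tail labelling for vinyl monomers (illustrative; not the HTA or M2P code).
# Assumption: the monomer SMILES starts with the vinyl group, "C=C...".

def vinyl_repeat_unit(monomer_smiles):
    """Open the C=C bond and mark both chain ends with the wildcard atom [*]."""
    prefix = "C=C"
    if not monomer_smiles.startswith(prefix) or len(monomer_smiles) == len(prefix):
        raise ValueError("expected a substituted vinyl monomer written as 'C=C...'")
    substituents = monomer_smiles[len(prefix):]
    tail = "[*]C"       # chain end attached to the CH2 carbon
    head = "C([*])"     # chain end attached to the substituted carbon
    return tail + head + substituents


styrene_unit = vinyl_repeat_unit("C=Cc1ccccc1")     # styrene
mma_unit = vinyl_repeat_unit("C=C(C)C(=O)OC")       # methyl methacrylate
```

A string rule of this kind fails as soon as the monomer is written in a different atom order, which is one reason dedicated tools that operate on the molecular graph are needed for this step.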

5.1. Molecular Transformer

A molecular transformer casts polymerization reaction prediction as a language modeling task based on text-based representations of the reactants (SMILES strings). The architecture handles both forward polymerization reactions and retrosynthesis prediction by fine-tuning pretrained models. This is a transfer learning approach in which the models are trained on textual representations of molecules. Chemical reactions or polymerizations are encoded as reaction SMILES and processed with NLP techniques. The reaction SMILES cover reactants, reagents, solvents, catalysts, and products.
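The reaction SMILES encoding mentioned above follows the general convention `reactants>agents>products`, with individual molecules inside each field separated by a dot. The sketch below builds and parses such a string in plain Python. The helper names are hypothetical, and the example reaction (ε-caprolactone with benzyl alcohol as initiator, toluene as solvent, and a repeat unit written with `[*]` end markers) is an illustrative assumption rather than an entry from the cited datasets.

```python
# Reaction SMILES: "reactants>agents>products"; molecules in a field are joined by ".".

def build_reaction_smiles(reactants, agents, products):
    """Join three lists of SMILES strings into one reaction SMILES."""
    return ">".join(".".join(group) for group in (reactants, agents, products))


def parse_reaction_smiles(reaction_smiles):
    """Split a reaction SMILES back into [reactants, agents, products]."""
    fields = reaction_smiles.split(">")
    if len(fields) != 3:
        raise ValueError("a reaction SMILES needs exactly two '>' separators")
    return [field.split(".") if field else [] for field in fields]


rxn = build_reaction_smiles(
    reactants=["O=C1CCCCCO1", "OCc1ccccc1"],   # monomer and initiator (assumed example)
    agents=["Cc1ccccc1"],                      # solvent (assumed example)
    products=["[*]OCCCCCC(=O)[*]"],            # repeat unit with wildcard chain ends
)
reactants, agents, products = parse_reaction_smiles(rxn)
```

Splitting on dots treats every fragment as a separate molecule, so multi-component species such as salts would need extra handling in a real pipeline.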

This approach builds on an extended version of transformer-based language models adapted to polymers. Adapting molecular transformer models to retrosynthesis is a niche area of computational tool development aimed at automating reaction pathway prediction for biopolymers. The main limitations of molecular transformer models in retrosynthesis are the range of polymerization categories that can be covered and the size of the training data available for building the prediction model. Extending the model to more diverse polymer classes in the transfer learning stage is expected to improve prediction accuracy.

Recent developments, such as TransPolymer, have extended transformer-based architectures specifically to polymer property prediction. Xu et al. [57] introduced a chemically aware tokenizer and demonstrated that pretraining on polymer-specific sequences significantly improves the accuracy of property predictions, making TransPolymer a valuable complement to general-purpose molecular transformers.
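Tokenization is the step that turns a SMILES string into the sequence a transformer actually reads. The following sketch shows a generic regular-expression tokenizer of the kind commonly used for chemical language models. It is not the TransPolymer tokenizer: the pattern is a simplified assumption that keeps bracket atoms and two-letter halogens together and splits everything else character by character.

```python
import re

# Simplified SMILES token pattern (an assumption, not the TransPolymer tokenizer).
# Order matters: bracket atoms and two-letter elements are tried before single letters.
TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%[0-9]{2}|[0-9]|[()=#\-+/\\.:~@>*$])"
)


def tokenize(smiles):
    """Split a SMILES or reaction SMILES string into chemically meaningful tokens."""
    tokens = TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        # Some character was skipped by the pattern, so the string is not covered.
        raise ValueError("unrecognised characters in SMILES: " + smiles)
    return tokens


tokens = tokenize("[*]CC([*])Cl")
```

Keeping `[*]` and `Cl` as single tokens prevents the model from seeing a chain-end marker or a chlorine atom as unrelated characters, which is the basic motivation for chemistry-aware tokenization.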

5.2. Translation Task in ROP

Forward and retro-reaction predictions were modeled as a translation task that converts reactant-reagent SMILES strings into product SMILES. In the context of ring-opening polymerization (ROP), the SMILES strings are divided into a source, comprising the reactants and reagents, and a target, representing the products. Dataset splitting is based on target variations for the same source input. According to Ferrari et al. [56], because the HTA and M2P datasets yield different outcomes for identical source instances, the datasets were split into 90% for training, 5% for validation, and 5% for testing. In a related effort, Schwaller et al. [58] proposed a hypergraph-based retrosynthesis planner that integrates transformer models with multistep reaction prediction. Although initially developed for small molecules, this architecture has been adapted for polymerization pathways, offering improved accuracy in predicting feasible synthetic routes for complex monomer systems.
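The source/target framing and the 90/5/5 split can be sketched as follows. The sketch rests on one assumption about the splitting rule: all targets that share the same source are kept in the same subset, so that a source seen during training never reappears in validation or testing. The function names, the fixed seed, and the rounding of subset sizes are illustrative choices and not details taken from the cited work.

```python
import random


def split_source_target(reaction_smiles):
    """Forward task: source = reactants and reagents, target = products."""
    reactants, agents, products = reaction_smiles.split(">")
    return reactants + ">" + agents, products


def split_by_source(reactions, fractions=(0.90, 0.05, 0.05), seed=0):
    """Return (train, valid, test) lists of (source, target) pairs, grouped by source."""
    groups = {}
    for rxn in reactions:
        source, target = split_source_target(rxn)
        groups.setdefault(source, []).append(target)

    sources = sorted(groups)                 # deterministic order before shuffling
    random.Random(seed).shuffle(sources)

    n_train = round(fractions[0] * len(sources))
    n_valid = round(fractions[1] * len(sources))
    parts = (
        sources[:n_train],
        sources[n_train:n_train + n_valid],
        sources[n_train + n_valid:],         # remainder goes to the test set
    )
    return tuple([(s, t) for s in part for t in groups[s]] for part in parts)
```

For the retrosynthesis direction the same records can be reused with source and target swapped, which leaves the grouping logic unchanged.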

6. Conclusions

This review presents a state-of-the-art overview of recent advances in artificial intelligence, particularly machine learning and deep learning, for the accelerated discovery of bioplastics. Biopolymer informatics emphasizes the transformative potential of data-driven approaches in the development of sustainable polymers.

We highlight PolyID, a machine learning-based message-passing graph neural network tool that provides a robust framework for the accelerated discovery of bioplastics, supported by benchmarked data repositories. This review details AI techniques for predicting structure–property relationships, optimizing synthesis pathways, and enhancing the development of bioplastics.

Furthermore, ML-driven methods in catalyst design and polymerization processes for bioplastics are discussed, offering a valuable resource for emerging researchers and academicians in the field. These advancements are facilitated by the growing availability of publicly accessible databases and open-source polymer informatics tools.

This review serves as a comprehensive reference, providing critical insights into current AI models for polymerization, molecular descriptor generation, structure–property prediction, and the use of benchmarked datasets in the context of sustainable polymers. Overall, this study underscores the significant potential of AI in advancing the design and discovery of green polymers, contributing to a more sustainable future. AI innovation in bioplastics is gaining increasing attention and represents a crucial step forward in addressing societal and environmental challenges through advanced materials science.

Author Contributions

Ideation, N.A. and V.S.; literature search, N.A. and V.S.; writing, N.A.; revision, V.S., P.G.-D. and M.E.G.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Ministerio de Ciencia e Innovación (Spain) and the European Community (grant number: RYC2021-033921-I). The DIGIMATER-CM project “Digital strategies for the autonomous discovery of materials for engineering applications” (reference TEC-2024/TEC-102) is funded through the call for grants for R&D projects carried out in collaboration between research groups belonging to the Universities and Research Bodies of the Community of Madrid in the modality of programs of R&D activities in technologies.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sessini, V.; Ghosh, S.; Mosquera, M.E. (Eds.) Biopolymers: Synthesis, Properties, and Emerging Applications; Elsevier: Amsterdam, The Netherlands, 2023.
2. Sessini, V.; Ghosh, S.; Mosquera, M.E.G. Introduction to biopolymer synthesis, properties, and emerging applications. In Biopolymers Synthesis, Properties, and Emerging Applications; Elsevier: Amsterdam, The Netherlands, 2023; pp. 1–20.
3. Yan, C.; Li, G. The rise of machine learning in polymer discovery. Adv. Intell. Syst. 2023, 5, 2200243.
4. Peng, S.; Geng, Y.; Li, Z.; Heydari, S.F.; Shahgholi, M. Investigating the effects of temperature on thermal and mechanical properties of polyurethane/polycaprolactone/graphene oxide nanocomposites: Focusing on creating a smart polymer nanocomposite via molecular dynamics method. Mol. Phys. 2025, 123, e2351164.
5. Ashok, N.; Soman, K.; Samanta, M.; Sruthi, M.; Poornachandran, P.; Devi V, G.S.; Sukumar, N. Polymer and Nanocomposite Informatics: Recent Applications of Artificial Intelligence and Data Repositories. In Advanced Machine Learning with Evolutionary and Metaheuristic Techniques; Springer: Berlin/Heidelberg, Germany, 2024; pp. 297–322.
6. Tran, H.; Gurnani, R.; Kim, C.; Pilania, G.; Kwon, H.-K.; Lively, R.P.; Ramprasad, R. Design of functional and sustainable polymers assisted by artificial intelligence. Nat. Rev. Mater. 2024, 12, 866–886.
7. Gupta, S.; Mahmood, A.; Shukla, S.; Ramprasad, R. Benchmarking Large Language Models for Polymer Property Predictions. arXiv 2025, arXiv:2506.02129.
8. Sessini, V.; Salaris, V.; Oliver-Cuenca, V.; Tercjak, A.; Fiori, S.; López, D.; Kenny, J.M.; Peponi, L. Thermally-Activated Shape Memory Behavior of Biodegradable Blends Based on Plasticized PLA and Thermoplastic Starch. Polymers 2024, 16, 1107.
9. Deng, S.; Chen, C.; Li, K.; Chen, X.; Xia, K.; Li, S. Structure-Based Multilevel Descriptors for High-throughput Screening of Elastomers. J. Phys. Chem. B 2023, 127, 10077–10087.
10. Arbelaiz Garmendia, A.; Landa, B.; Peña Rodríguez, C. The Preparation and Characterization of Poly (lactic Acid)/Poly (ε-caprolactone) Polymer Blends: The Effect of Bisphenol A Diglycidyl Ether Addition as a Compatibilizer. J. Manuf. Mater. Process. 2025, 9, 38.
11. Sessini, V.; Navarro-Baena, I.; Arrieta, M.P.; Dominici, F.; López, D.; Torre, L.; Kenny, J.M.; Dubois, P.; Raquez, J.-M.; Peponi, L. Effect of the addition of polyester-grafted-cellulose nanocrystals on the shape memory properties of biodegradable PLA/PCL nanocomposites. Polym. Degrad. Stab. 2018, 152, 126–138.
12. Malashin, I.; Tynchenko, V.; Gantimurov, A.; Nelyub, V.; Borodulin, A. Boosting-Based Machine Learning Applications in Polymer Science: A Review. Polymers 2025, 17, 499.
13. Adams, N.; Murray-Rust, P. Engineering polymer informatics: Towards the computer-aided design of polymers. Macromol. Rapid Commun. 2008, 29, 615–632.
14. Zhao, Y.; Chen, Z.; Jian, X. A High-Generalizability Machine Learning Framework for Analyzing the Homogenized Properties of Short Fiber-Reinforced Polymer Composites. Polymers 2023, 15, 3962.
15. Walsh, D.J.; Zou, W.; Schneider, L.; Mello, R.; Deagen, M.E.; Mysona, J.; Lin, T.-S.; de Pablo, J.J.; Jensen, K.F.; Audus, D.J.; et al. Community Resource for Innovation in Polymer Technology (CRIPT): A Scalable Polymer Material Data Structure. ACS Cent. Sci. 2023, 9, 330–338.
16. Park, N.H.; Manica, M.; Born, J.; Hedrick, J.L.; Erdmann, T.; Zubarev, D.Y.; Adell-Mill, N.; Arrechea, P.L. Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language. Nat. Commun. 2023, 14, 3686.
17. Gopalakrishnan, A.; Soman, K.; Rajendran, S.; Raj, K.K. Efficient Text Analysis: A BERT-Based Approach to Named Entity Recognition (NER) and Classification for Malayalam Language. Int. J. Inf. Technol. 2025, 1–7.
18. Yue, T.; Tao, L.; Varshney, V.; Li, Y. Benchmarking study of deep generative models for inverse polymer design. Digit. Discov. 2025, 4, 910–926.
19. Schneider, L.; Walsh, D.; Olsen, B.; de Pablo, J. Generative BigSMILES: An extension for polymer informatics, computer simulations & ML/AI. Digit. Discov. 2024, 3, 51–61.
20. Miccio, L.A. Understanding Polymers Through Transfer Learning and Explainable AI. Appl. Sci. 2024, 14, 10413.
21. Sattari, K.; Xie, Y.; Lin, J. Data-driven algorithms for inverse design of polymers. Soft Matter 2021, 17, 7607–7622.
22. Zheng, Y.; Thakolkaran, P.; Biswal, A.K.; Smith, J.A.; Lu, Z.; Zheng, S.; Nguyen, B.H.; Kumar, S.; Vashisth, A. AI-Guided Inverse Design and Discovery of Recyclable Vitrimeric Polymers. Adv. Sci. 2025, 12, 2411385.
23. Dananjaya, S.; Chevali, V.; Dear, J.; Potluri, P.; Abeykoon, C. 3D printing of biodegradable polymers and their composites–Current state-of-the-art, properties, applications, and machine learning for potential future applications. Prog. Mater. Sci. 2024, 146, 101336.
24. Yang, H.; Zhu, Z.; Ni, S.; Wang, X.; Nie, Y.; Tao, C.; Zou, D.; Jiang, W.; Zhao, Y.; Zhou, Z. Silk fibroin-based bioelectronic devices for high-sensitivity, stable, and prolonged in vivo recording. Biosens. Bioelectron. 2025, 267, 116853.
25. Singh, A.V.; Chandrasekar, V.; Janapareddy, P.; Mathews, D.E.; Laux, P.; Luch, A.; Yang, Y.; Garcia-Canibano, B.; Balakrishnan, S.; Abinahed, J. Emerging application of nanorobotics and artificial intelligence to cross the BBB: Advances in design, controlled maneuvering, and targeting of the barriers. ACS Chem. Neurosci. 2021, 12, 1835–1853.
26. Atasi, C.; Kern, J.; Ramprasad, R. Design of Recyclable Plastics with Machine Learning and Genetic Algorithm. J. Chem. Inf. Model. 2024, 64, 9249–9259.
27. Ganti, S.V.S.; Wölfel, L.; Kuenneth, C. AI-Driven Discovery of High Performance Polymer Electrodes for Next-Generation Batteries. J. Polym. Sci. 2025, 1–9.
28. Kern, J.; Su, Y.-L.; Gutekunst, W.; Ramprasad, R. An informatics framework for the design of sustainable, chemically recyclable, synthetically accessible, and durable polymers. Npj Comput. Mater. 2025, 11, 182.
29. Wilson, A.N.; St John, P.C.; Marin, D.H.; Hoyt, C.B.; Rognerud, E.G.; Nimlos, M.R.; Cywar, R.M.; Rorrer, N.A.; Shebek, K.M.; Broadbelt, L.J.; et al. PolyID: Artificial Intelligence for Discovering Performance-Advantaged and Sustainable Polymers. Macromolecules 2023, 56, 8547–8557.
30. Kuenneth, C.; Lalonde, J.; Marrone, B.L.; Iverson, C.N.; Ramprasad, R.; Pilania, G. Bioplastic design using multitask deep neural networks. Commun. Mater. 2022, 3, 96.
31. Petersen, S.R.; Kohan Marzagão, D.; Gregory, G.L.; Huang, Y.; Clifton, D.A.; Williams, C.K.; Siviour, C.R. Property Prediction of Bio-Derived Block Copolymer Thermoplastic Elastomers Using Graph Kernel Methods. Angew. Chem. Int. Ed. 2025, 64, e202411097.
32. Ma, R.; Luo, T. PI1M: A Benchmark Database for Polymer Informatics. J. Chem. Inf. Model. 2020, 60, 4684–4690.
33. Bejagam, K.K.; Lalonde, J.; Iverson, C.N.; Marrone, B.L.; Pilania, G. Machine Learning for Melting Temperature Predictions and Design in Polyhydroxyalkanoate-Based Biopolymers. J. Phys. Chem. B 2022, 126, 934–945.
34. Nanda, S.; Patra, B.R.; Patel, R.; Bakos, J.; Dalai, A.K. Innovations in applications and prospects of bioplastics and biopolymers: A review. Environ. Chem. Lett. 2022, 20, 379–395.
35. Pilania, G.; Iverson, C.N.; Lookman, T.; Marrone, B.L. Machine-Learning-Based Predictive Modeling of Glass Transition Temperatures: A Case of Polyhydroxyalkanoate Homopolymers and Copolymers. J. Chem. Inf. Model. 2019, 59, 5013–5025.
36. Savit, A.; Sahu, H.; Shukla, S.; Xiong, W.; Ramprasad, R. polyBART: A Chemical Linguist for Polymer Property Prediction and Generative Design. arXiv 2025, arXiv:2506.04233.
37. Li, Z.; Yang, T.; Zhang, L. Data-augmented machine learning for inverse design of homopolymers with targeted glass transition temperature. Polym. Int. 2025.
38. Zhou, J.; Yang, Y.; Mroz, A.M.; Jelfs, K.E. PolyCL: Contrastive learning for polymer representation learning via explicit and implicit augmentations. Digit. Discov. 2025, 4, 149–160.
39. Vogel, G.; Weber, J.M. Inverse design of copolymers including stoichiometry and chain architecture. Chem. Sci. 2025, 16, 1161–1178.
40. Bilodeau, C.; Jin, W.; Jaakkola, T.; Barzilay, R.; Jensen, K.F. Generative models for molecular discovery: Recent advances and challenges. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2022, 12, e1608.
41. Kim, C.; Chandrasekaran, A.; Huan, T.D.; Das, D.; Ramprasad, R. Polymer genome: A data-powered polymer informatics platform for property predictions. J. Phys. Chem. C 2018, 122, 17575–17585.
42. Kim, S.; Schroeder, C.M.; Jackson, N.E. Open macromolecular genome: Generative design of synthetically accessible polymers. ACS Polym. Au 2023, 3, 318–330.
43. Aldeghi, M.; Coley, C.W. A Graph Representation of Molecular Ensembles for Polymer Property Prediction. Chem. Sci. 2022, 13, 13545–13555.
44. Aleb, N.; Abu-Thabit, N.Y. TransTg: A new transformer model for predicting glass transition temperature of polymers from monomers’ molecular structures. Neural Comput. Appl. 2025, 37, 2733–2746.
45. Zhang, T.; Yang, D.B. Multimodal Machine Learning with Large Language Embedding Model for Polymer Property Prediction. arXiv 2025, arXiv:2503.22962.
46. Kuenneth, C.; Ramprasad, R. polyBERT: A Chemical Language Model to Enable Fully Machine-Driven Ultrafast Polymer Informatics. Nat. Commun. 2022, 13, 7271.
47. Mulrennan, K.; Munir, N.; Creedon, L.; Donovan, J.; Lyons, J.G.; McAfee, M. NIR-based intelligent sensing of product yield stress for high-value bioresorbable polymer processing. Sensors 2022, 22, 2835.
48. Rizwan, M.; Khan, M.A.; Baig, K.S. Explainable Artificial Intelligence to Predict and Optimize Delignification of a Lignocellulosic Material by Ozone Transport. Available online: https://www.researchgate.net/publication/387271560_Explainable_Artificial_Intelligence_to_Predict_and_Optimize_Delignification_of_a_Lignocellulosic_Material_by_Ozone_Transport (accessed on 24 January 2025).
49. Grace, R. Scientists Using AI to Accelerate Materials Discovery. Plast. Eng. 2023. Gale Academic OneFile. Available online: https://go.gale.com/ps/i.do?p=AONE&u=anon~145462b4&id=GALE%7CA779352591&v=2.1&it=r&sid=googleScholar&asid=add263f5 (accessed on 14 July 2025).
50. Gurnani, R.; Kuenneth, C.; Toland, A.; Ramprasad, R. Polymer informatics at scale with multitask graph neural networks. Chem. Mater. 2023, 35, 1560–1567.
51. Antoniuk, E.R.; Li, P.; Kailkhura, B.; Hiszpanski, A.M. Representing polymers as periodic graphs with learned descriptors for accurate polymer property predictions. J. Chem. Inf. Model. 2022, 62, 5435–5445.
52. Malashin, I.; Martysyuk, D.; Tynchenko, V.; Gantimurov, A.; Semikolenov, A.; Nelyub, V.; Borodulin, A. Machine Learning-Based Process Optimization in Biopolymer Manufacturing: A Review. Polymers 2024, 16, 3368.
53. Martí, D.; Pétuya, R.; Bosoni, E.; Dublanchet, A.-C.; Mohr, S.; Léonforte, F. Predicting the Glass Transition Temperature of Biopolymers via High-Throughput Molecular Dynamics Simulations and Machine Learning. ACS Appl. Polym. Mater. 2024, 6, 4449–4461.
54. Ohno, M.; Hayashi, Y.; Zhang, Q.; Kaneko, Y.; Yoshida, R. SMiPoly: Generation of a Synthesizable Polymer Virtual Library Using Rule-Based Polymerization Reactions. J. Chem. Inf. Model. 2023, 63, 5539–5548.
55. Chen, L.; Kern, J.; Lightstone, J.P.; Ramprasad, R. Data-assisted polymer retrosynthesis planning. Appl. Phys. Rev. 2021, 8, 031405.
56. Ferrari, B.S.; Manica, M.; Giro, R.; Laino, T.; Steiner, M.B. Predicting polymerization reactions via transfer learning using chemical language models. Npj Comput. Mater. 2024, 10, 119.
57. Xu, C.; Wang, Y.; Barati Farimani, A. TransPolymer: A Transformer-based language model for polymer property predictions. Npj Comput. Mater. 2023, 9, 64.
58. Schwaller, P.; Petraglia, R.; Zullo, V.; Nair, V.H.; Haeuselmann, R.A.; Pisoni, R.; Bekas, C.; Iuliano, A.; Laino, T. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chem. Sci. 2020, 11, 3316–3325.

Figure 1. Ecosystem of AI tools in bioplastics design.

Figure 2. AI/ML pipeline for accelerated discovery of bioplastics.

Figure 3. Illustrated summary of the different applications of AI-guided bioplastics.

Figure 4. ML framework pipeline for polymers [5].

Table 1. ML models in polymer science.

Sl No.	ML Model	Objective	AI Techniques Used	Key Findings
1	PolyBERT	Polymer property predictions from molecular representations	Transformer models (BERT, NLP)	Achieves high accuracy in predicting polymer properties and learns polymer structure–property relations from chemical language
2	PolyNet	Prediction of glass transition temperature (Tg)	Graph neural networks (GNNs)	Structure-based GNN representations outperformed traditional methods
3	PI1M Dataset	Polymer informatics platform (benchmarked data repositories)	Ensemble ML models like Random Forest, XGBoost, MLP	The initial large-scale benchmarking data repository to predict dielectric modulus, Tg, etc., with good accuracy
4	Polymer Retrosynthesis Model	Polymerization pathway predictions forward and retrosynthesis	Transformer infused with transfer learning	Enables forward and retrosynthesis prediction for novel polymer inverse design

Table 2. Comparative study of AI-integrated approach with traditional methods.

Aspect	AI-Integrated Approach	Traditional Scientific Method
Data Handling and Accuracy (Advantage for AI-integrated approach)	AI/ML can handle high-dimensional datasets with good accuracy	Limited to manual analysis
Property Prediction (Advantage for AI-integrated approach)	Modeling the behavior of novel bioplastics is possible	Needs high-throughput experiments and manual labor
Interpretability (Drawback for AI-integrated approach)	Lack of explainability and validations viz. black-box in deep learning	Known physics and chemistry with experimental results
Dependency on Data (Drawback for AI-integrated approach)	Labeled datasets are an important requisite for an AI model	Based on analytical and experimental approach and a less data-intensive method
Ethical concerns for polymer product design (Drawback for AI-integrated approach)	Vigilant validation is required with experimental values to avoid false positives/negatives	The protocols for design are well established and as per standards

Table 3. Main goals, models used, and key insights of AI in polymer design.

Approach	Main Goal	Models Used	Key Insights
De Novo Polymer Design	Novel polymer design	Graph-based models	Models trained on experimental polymer data generated from extreme temperature resistant hypothetical polymers
Inverse Polymer Design	Generate polymer structures from target properties	Generative DL models	Retrosynthesis of polymeric design
Tokenization and Descriptors	Property prediction using molecular fingerprints	Featurization of chemical representations	Accurate prediction of structure–property relations in polymers
Transformer-Based Models	Relate polymer chemical structure to end property using chemical language	Transformer models relate SMILES to properties	Fine-tuned with pretrained transformer models
Generative AI in Biopolymer Informatics	Discover bioplastics with target functionality for energy/medical uses	High-throughput screening, generative AI, XAI	Helps identify new biopolymers and efficient synthesis routes
Explainable AI (XAI)	Enhances model interpretability in polymeric design	LIME, SHAP	Human interface with ML predictions, especially in bio-mimetic fields

Table 4. Data Repositories for Polymers.

Database and URL	Polymers	Class	Available Properties
BiopolymersDB https://www.ifbb-hannover.de/en/biopolymer-database.html (accessed on 2 February 2025)	2000+	Biopolymers (proteins, DNA)	Sequences and structure predictions
Materials Project https://materialsproject.org/ (accessed on 5 February 2025)	More than 1,000,000	Synthetic and biopolymers	Benchmarked data repository for polymer informatics
NIST Synthetic Polymer MALDI Recipes Database https://www.nist.gov/srd/related-data-products-and-links/curated-data-collections (accessed on 4 February 2025)	1250	Petroleum-based polymers	MALDI mass spectra, vibrational and electronic energy data
NanoMine http://materialsmine.org/nm (accessed on 4 February 2025)	500+	Polymer nanocomposites	Physico-chemical properties
Polymer Property Predictor https://pppdb.uchicago.edu/ (accessed on 5 February 2025)	1500+	Any polymers	Flory–Huggins χ parameters and glass transition temperatures of polymers
NanoPolyDB https://brinsonlab.pratt.duke.edu/research/mgp/Nanomine (accessed on 10 February 2025)	500+	Polymer nanocomposites	Nanoparticle dispersion and interface property modeling
PolySynthDB https://tonejs.github.io/docs/r12/PolySynth (accessed on 10 February 2025) and https://pypi.org/project/polysynth/ (accessed on 10 February 2025)	1500+	Traditional polymers	Polymerization conditions, polymer structure, and monomer synthesis
PI1M https://github.com/st-su/PI1M (accessed on 11 February 2025)	More than 1,000,000	AI-based dataset	Polymer informatics data repository
Protein Data Bank (PDB) http://www.rcsb.org/ (accessed on 11 February 2025)	84,535 unique protein sequences	Proteins, DNA, and protein-DNA complexes	Chemical similarity and visualization search tool
PoLyInfo http://www.polyinfo.org/ (accessed on 12 February 2025)	16,600+ homopolymers, synthetic polymers	Mechanical, thermal, and chemical properties	Property comparison and data analysis

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

