Article

A Plastic Classification Model Based on Simulated Data

1
Technologie Campus Grafenau, Technische Hochschule Deggendorf, 94481 Grafenau, Germany
2
Sesotec GmbH, Regener Straße 130, 94513 Schönberg, Germany
*
Author to whom correspondence should be addressed.
Recycling 2025, 10(2), 65; https://doi.org/10.3390/recycling10020065
Submission received: 3 March 2025 / Revised: 28 March 2025 / Accepted: 5 April 2025 / Published: 8 April 2025

Abstract

Plastic recycling holds significant potential to reduce global carbon emissions. Despite advances in recycling technologies, challenges such as limited data availability, contamination in sorted materials, and the complexity of real-world material flows continue to hinder progress. This study addresses these issues by introducing a novel approach to plastic classification, leveraging simulated spectral data to reduce reliance on large datasets and improve classification accuracy. Using near-infrared spectroscopy and deep learning models, the framework integrates data augmentation techniques and spectral simulation to augment datasets with synthetic spectra based on a data sample of 25 plastic granules. The proposed classification framework achieves excellent recall and robust balanced accuracy for both binary and multi-target polymer classification with minimal data input (only 50 spectra per class). Thus, the measurement effort is drastically reduced while maintaining an equally high model accuracy. The model significantly outperforms conventional unsupervised approaches. By overcoming the limitations of supervised learning models, the proposed framework provides a scalable and efficient solution for plastics recycling.

1. Introduction

Plastic recycling offers significant potential for reducing global carbon emissions, given the widespread use of plastics and the fact that approximately 5% of global carbon emissions are attributable to their production and use [1]. In 2019 alone, global greenhouse gas emissions from the plastic lifecycle were estimated at 0.86 million tons (Mt) of carbon dioxide equivalents, with projections suggesting that this figure could rise to 2.80 Mt of carbon dioxide equivalents by 2050 [2]. However, only 4% of the carbon footprint associated with plastics is linked to end-of-life processes such as recycling, incineration, and landfilling, while 96% stems from their production [3]. Realizing this potential for plastics recycling will require significant progress and investment in recycling technologies and infrastructure. The growing interest in recycling is reflected in the significant rise in publications (from 647 to 2641), and likewise in patents, related to plastic waste recycling over the decade from 2011 to 2021 [4].
Recycling involves the conversion of plastic waste into secondary materials that can be reintroduced into the production cycle, either for reuse in similar applications or for the development of new components and products with equivalent or superior functionality [5]. Before plastics are reprocessed, the various sorting and separation stages constitute an important step in the processing loop [5], as the purity of the sorted materials directly affects the quality and properties of the recovered material.
Flake sorting is a separation stage that takes place toward the end of this process, further reducing contamination from the stream that may have bypassed earlier stages. The equipment is designed to classify plastics by size, with most systems capable of handling particles as small as 1 mm, although this threshold may vary depending on the specific technology used. The identification of plastic types is accomplished through the use of near-infrared (NIR), X-ray fluorescence (XRF), and/or visible (VIS) technologies [6]. Throughout this paper, we focus on this part of the process.
With the advent of Deep Learning (DL), the use of NIR in combination with convolutional neural networks (CNNs) has emerged as a highly effective tool for the accurate classification of plastics, as captured in several comprehensive reviews [6,7,8,9] that underline the integral importance of DL models in the field of sensor-based plastic waste detection and sorting. Traditionally, these models rely on extensive data collection followed by complex model building processes, which include data pre-processing, feature extraction, and model training. At present, a lack of high-quality, publicly available datasets poses a significant challenge to advancing further research in this area [10]. Recent research, including Yang et al. [9] and Neo et al. [11], highlights the need to establish and develop polymer spectral databases. In fact, several studies [9,10] identify data limitations as one of the most significant barriers to research into novel DL techniques in the recycling process.
To address data scarcity, data augmentation is a powerful technique used in Machine Learning (ML) and DL to artificially increase the size and diversity of a dataset. The first experiments with spectroscopic data were conducted as early as 1998, when Conlin et al. [12] applied a data augmentation approach to an industrial dataset comprising spectroscopic measurements. The primary objective was to develop a robust calibration model by aggregating multiple partial least squares regression (PLS) models derived from datasets whose variability was enhanced through the deliberate addition of Gaussian noise. Gracia Moisés et al. [13] experimented with the inclusion of slightly modified copies of existing data to tackle the challenge of generating robust and balanced datasets for classification tasks. Similarly, Zhang et al. [14] utilized a deep convolutional generative adversarial network (DCGAN) to augment their spectral database for predicting the oil content of individual maize kernels. Chen et al. [15] also implemented virtual sample generation to improve the spectral diagnostic accuracy for cancer detection. Other research groups are also exploring the use of GANs [16] and DCGANs [17,18]. In their comprehensive review of data augmentation techniques for ML applied to optical spectroscopy datasets, Gracia Moisés et al. [13] presented a variety of generation techniques encompassing approaches based on DL and non-DL algorithms.
Despite these promising results, these investigations mainly focus on “pure” data augmentation but fail to extend the sample space during model training to include materials that are so far unknown and unrecorded. For supervised models, we identify this as a general problem, in addition to data scarcity, and as an open research gap. Under the assumption of convenient input data and sufficiently high explanatory power in the independent variables with respect to the target value to be predicted, an accurate and tailored (the term “tailored” implies in this context that the model assigns an input instance to one of the learned classes) classification model can be achieved. However, this model will make a prediction regardless of whether the true label of the input instance is an element of the learned set of classes. We define an unknown sample as a spectrum belonging to a plastic or other physical material that is not included in the training data of the model. Under fully controlled (laboratory) conditions where the true labels of the materials are known, this may not be a problem. However, in real-world applications, the material flow is rarely known with absolute certainty. Even if the plastic waste has already been pre-sorted in the first stage (sorting plants), inaccuracies in these processes could potentially cause unwanted input data for the specialized processing plants, impacting the purity of the resulting recyclates [19] (cf. [20] for a schematic structure of a sensor-based material flow characterization (SBMC)).
To overcome this issue, unsupervised techniques can determine whether the object spectra under investigation stem from a pre-defined sample space. But with increasing variability in the data and more target classes, these procedures lack accuracy, as demonstrated in Section 2. Another possibility is to expand the data space to include all potential interfering materials, enabling the model to distinguish between unwanted and wanted classes. But this requires a large number of measurements and knowledge of these materials. Any detection of new disruptive materials would result in a re-estimation of the model.
This paper presents a novel approach to plastics classification that differs from conventional methods by reducing the reliance on large and extensive datasets. Our proposed model leverages simple simulation techniques to generate synthetic spectral data, allowing for the accurate classification of target polymers. We generate synthetic observations for modeling, reducing the need for direct measurements while improving material classification, especially in unknown flows. Additionally, we apply augmentation methods to minimize the number of target class spectra that must be measured and to ensure robust DL model training. Using NIR spectra of 25 different plastic granulates (PGs), we start with the examination of simple classification tasks (one target granule) and extend to multi-target classification. Performance is assessed against a benchmark using established classification metrics. To the best of our knowledge, no other research has followed a similar approach in the context of plastic recycling.
The rest of this paper is structured as follows: Section 2 and Section 3 report and discuss the results obtained. The data collection process and the model pipeline are outlined in Section 4. Furthermore, all data processing and simulation steps are described in detail. The paper ends with a short conclusion (Section 5).

2. Results

In this study, we designed an experimental setup to evaluate the performance of our classification framework in two general settings: binary classification and multi-target classification. The binary classification task is defined as the separation of one target class from all other materials, while multi-target classification is the extension to several target classes. For each run, we fit a CNN using a fixed set of training data consisting of the original target spectra acquired by measurement and used for training (ORG-Train), 20,000 replica spectra (REP) and 20,000 synthetic spectra (SYN) from the specific spectra simulation (S-SIM) per polymer class, and 40,000 SYN from the unspecific spectra simulation (U-SIM) (cf. Section 4.4). This combination of spectra ensured diversity and provided a sufficiently balanced dataset for model training with a minimum ratio of 20% target class spectra. We used raw spectra as input data because CNNs generally show strong performance on unprocessed data, as outlined by Jernelv et al. [21], except when baseline correction is required. In the scope of spectral data, Neo et al. [19] also state that the use of a CNN does not necessarily require pre-processing due to its feature extraction capabilities.
To prevent overfitting, the CNN was trained with early stopping: training ended after 2 consecutive epochs without improvement, or at the latest after 200 epochs, using a batch size of 64. These hyperparameters were chosen based on preliminary analyses in which different parameter settings were tested. To boost the performance of the CNN, we assigned a spectrum to a target class only if the CNN predicted a class probability higher than 90%. This threshold is somewhat arbitrary, but pre-tests suggested that 0.90 provides a convenient performance boost for all metrics.
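The 90% decision rule can be illustrated with a minimal sketch. It assumes that the classifier returns one probability per target class for each spectrum; the function name, the `NON_TARGET` label, and the example values are hypothetical and not taken from the original pipeline.

```python
import numpy as np

NON_TARGET = -1  # hypothetical label for spectra rejected as non-target


def assign_with_threshold(probs, threshold=0.90):
    """Return the most probable target class per spectrum, or NON_TARGET
    when the winning probability does not exceed the threshold."""
    probs = np.asarray(probs, dtype=float)
    best = probs.argmax(axis=1)
    confident = probs.max(axis=1) > threshold
    return np.where(confident, best, NON_TARGET)


# Three spectra, two target classes (columns); only the first is confident.
example = np.array([[0.95, 0.03], [0.60, 0.30], [0.05, 0.88]])
labels = assign_with_threshold(example)
```

Raising the threshold trades recall for precision, which is why the value would normally be checked in pre-tests, as described above.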
Testing of the model with independent and convenient test data allows for a correct assessment of the model’s performance. We assumed an unknown polymer material stream consisting of original target spectra acquired by measurement used for validation (ORG-Val) and original non-target spectra acquired by measurement (ORG-Test-NT), forming the validation set. All ORG-Test-NT spectra were labeled as one common class (non-target).
To prevent data leakage and to ensure the robustness of the model evaluation, the dataset was systematically split into distinct training/testing and validation groups. Specifically, polycarbonate (PC) spectra were partitioned by performing 100 independent draws to avoid random effects. In each draw, the spectra of 10 unique PC plastic granulate (PG) groups were randomly selected for the training and testing set, while the remaining 3 PC PG groups were part of the validation set. Additionally, spectra from other polymer classes (polybutylene terephthalate (PBT), polyethylene (PE), and polymethylmethacrylate (PMMA)) were also consistently split into complementary training and validation sets to maintain class balance and avoid overlapping instances. Polymers with fewer than two subgroups, as well as blends, were used for validation only. Performance evaluation was conducted using the metrics introduced in Section 4.5.
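A group-wise split of this kind might be sketched as follows. The sketch assumes that every spectrum carries the identifier of its granulate group and uses 13 PC groups with 50 spectra each, which matches the 10 + 3 split and the number of spectra per PG given in the text; the function and variable names are hypothetical.

```python
import numpy as np


def draw_group_split(group_ids, n_train_groups, rng):
    """Assign whole granulate groups to training/testing or validation,
    so that no group contributes spectra to both sets."""
    groups = np.unique(group_ids)
    train_groups = rng.choice(groups, size=n_train_groups, replace=False)
    train_mask = np.isin(group_ids, train_groups)
    return train_mask, ~train_mask


# Assumed layout: 13 PC granulate groups with 50 spectra each.
pc_group_ids = np.repeat(np.arange(13), 50)
rng = np.random.default_rng(seed=0)

# 100 independent draws, each with 10 training and 3 validation groups.
splits = [draw_group_split(pc_group_ids, 10, rng) for _ in range(100)]
train_mask, val_mask = splits[0]
```

Splitting by group rather than by individual spectrum is what keeps spectra of the same granulate out of both sets at once.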

2.1. Binary Classification

The results of the binary classification are reported in Table 1 and reflect the median values over the draws. The findings demonstrate the superior performance of the proposed model in comparison to the benchmark evaluated across four polymer classes: PBT, PC, PE, and PMMA. A significant insight is the consistently high recall values achieved by the proposed model, with perfect recall observed for PBT, PE, and PMMA, as well as 0.9667 for PC, indicating its ability to correctly identify nearly all instances of the target polymers. In addition, the proposed model achieved strong balanced accuracy, particularly for PC (0.8618), PMMA (0.8082), and PE (0.9856). The precision values, however, remained comparatively low (e.g., 0.0802 for PBT and 0.1302 for PMMA), directly affecting the F1-scores and indicating room for improvement in reducing false positives.
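How perfect recall can coexist with low precision under class imbalance can be shown with a small worked example. It assumes the standard definitions of precision, recall, F1-score, and balanced accuracy (the paper's own definitions are given in Section 4.5); the counts are purely hypothetical and are not results from this study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    balanced_accuracy = (recall + specificity) / 2
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts: all 150 target spectra are found, but 850 of the
# 9850 non-target spectra are also assigned to the target class.
metrics = binary_metrics(tp=150, fp=850, fn=0, tn=9000)
```

With these counts, recall is perfect and balanced accuracy stays high, while precision and therefore the F1-score are pulled down by the false positives. Since the reported values are medians over the draws, metrics recomputed from other reported medians need not match the table exactly.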
In stark contrast, the benchmark model exhibited severe instability and collapse for most classes. While it achieved perfect recall for PC, this was coupled with a low balanced accuracy, suggesting that the constructed hull included all validation data and leading to misclassifications of almost all other polymer classes. For PBT, PE, and PMMA, the benchmark’s recall dropped to 0.0000, demonstrating its failure to correctly identify these polymers.
Overall, these results highlight the robustness of the proposed model, particularly its ability to avoid false negatives—a crucial factor for material identification tasks—while “common unsupervised approaches” suffered from instability, bias toward certain classes, and reduced generalization performance. Despite the promising results, the relatively moderate precision observed in some classes emphasizes the need for further refinements to reduce false positive rates and enhance the overall reliability of the model.

2.2. Multi-Target Classification

The primary focus of this research is on multi-target classification due to its real-world relevance in recycling plants. By enabling the simultaneous identification and categorization of multiple polymer types, we are improving the separation process, ensuring higher purity levels and minimizing material loss.
We defined the four polymers studied in Section 2.1 as material classes based on both our literature review and industrial application. Neo et al. [11] identified three to five classes as relevant groupings, while Kroell et al. [7] highlighted five polymer types as commonly studied material categories. Additionally, Naderi Kalali et al. [22] provided evidence that the majority of plastic waste is composed of up to five different polymers. Finally, a review of the inventory of commercially available flake sorters [6] indicates that these systems are primarily designed to handle five or fewer polymer classes (including PC, PE, and PMMA), aligning with the experimental focus. To mitigate random effects caused by train–test splits, we again performed 100 independent draws and reported the median values. We refrained from the presentation of the benchmark results, since this approach collapsed completely, predicting almost all spectra as PC regardless of their true class.
The confusion matrix (cf. Figure 1) reveals a robust performance of the proposed model, as demonstrated by a balanced accuracy of 0.9022 and a recall of 0.9732 across the polymer classes, with perfect recall for PBT, PC, and PMMA. This high recall confirms previous results (cf. Section 2.1), emphasizing the correct identification of nearly all instances of the target polymers. The precision was notably lower, at a moderate level of 0.6694, and only PC featured precision issues. Closer examination of the confusion matrix shows that a substantial portion of the misclassifications to PC originated from chemically similar polymer blends (PC + ABS and PC + PET). These blends exhibit spectral characteristics that closely resemble pure PC, causing the model to confuse them with the PC class. This observation suggests that the model is not failing but rather behaving in accordance with the inherent spectral similarities. From an industrial perspective, if these blends were redefined as part of the PC class, the apparent misclassifications would turn into correct identifications, improving the measured precision.
The tradeoff between recall and precision is also captured in an F1-score of 0.7393, indicating a good balance between precision and recall but also highlighting areas for improvement, particularly in terms of classification precision. In conclusion, the findings of the multi-target classification experiments illustrate the strong performance and practical applicability of the proposed model for polymer identification. The failure of the benchmark underlines the importance of the developed methodology, which successfully avoids the instability and oversimplification of the common unsupervised alternative.

3. Discussion

This study evaluated the performance of a proposed classification framework under binary and multi-target classification settings. The classification framework exploits randomly generated and simulated spectra, resulting in the need for only 50 partly measured spectra per class for model training, and addresses the issue of insufficient sample sizes, particularly in the context of DL models. To mitigate data leakage and random effects caused by train–test splits, we used different PG groups for training and validation and repeated the experiments 100 times with randomly drawn samples.
The binary classification results underscore the effectiveness and robustness of the proposed model in accurately identifying polymer classes. The model consistently achieved excellent recall values, with perfect detection rates for PBT, PE, and PMMA, as well as a near-perfect recall for PC. The balanced accuracy scores, particularly for PC (0.8618), PMMA (0.8082), and PE (0.9856), further confirm the model’s reliable performance across classes, even under class imbalance. However, the relatively low precision values in certain classes—notably for PBT and PMMA—highlight that while the model is highly sensitive, it remains susceptible to false positives, indicating an opportunity for further refinement. The varying performance may be attributed to the similarity in spectral properties among certain polymers, making them more difficult to distinguish compared to groups with more distinct attributes. By comparison, the benchmark model completely collapsed for most classes, failing to detect PBT, PE, and PMMA altogether and instead overfitting to PC. This stark contrast demonstrates the limitations of standard unsupervised approaches and reinforces the strength and necessity of the tailored method presented in this work.
The primary focus of this research is multi-target classification, which is vital for improving recycling plant processes, specifically for polymer sorting and separation tasks. The approach allows for the simultaneous identification of multiple polymer types, addressing the complexities of efficiently sorting diverse materials and improving the purity of recycled products without the need for an extensive spectral database and high measurement effort.
The findings of the multi-target classification experiments emphasize the robustness and practical relevance of the proposed model for real-world polymer identification in recycling environments. The model demonstrated consistently high recall (0.9732 overall, with perfect recall for PBT, PC, and PMMA), ensuring that nearly all instances of target polymers were correctly identified. Although the precision remained moderate, particularly for PC (0.6694), deeper analysis reveals that most misclassifications arose from blends like PC + ABS and PC + PET. Given their spectral similarity to pure PC, these results reflect chemical reality rather than model deficiency. The model achieved a balanced accuracy of 0.9022 and an F1-score of 0.7393, demonstrating its ability to maintain a good balance between recall and precision. In contrast, the benchmark model collapsed completely, highlighting the superiority and necessity of the proposed approach over unsupervised alternatives.
The robustness of the model emphasizes its potential for efficient multi-target classification in recycling applications, encouraging further investigation into its applications. Despite the use of relatively simple techniques for spectra simulation, the model achieved very good performance across key metrics. This outcome highlights the efficacy of the approach, even when relying on straightforward simulation methods.
One of the most significant implications of these results is the potential to reduce measurement effort while maintaining equally high model accuracy. By requiring only the spectra of the target classes, the approach eliminates the requirement for large datasets, streamlining the data acquisition process and significantly lowering associated costs and time requirements. Even for simple deep learning tasks, the number of samples needed is often in the thousands, while our framework requires only 50 spectra. This approach simplifies the training process by removing the necessity of obtaining comprehensive spectra from non-target materials. This efficiency is particularly beneficial in scenarios where data collection is resource-intensive or constrained by logistical challenges.
Finally, the inherent flexibility of the method allows it to adapt to diverse conditions and configurations, broadening its applicability across various classification tasks. In conclusion, the combination of simplicity, efficiency, and flexibility positions this method as a promising approach for future research and practical implementation in multi-target classification. Its strong performance, even under simplified conditions, highlights its utility in efficiently and effectively addressing complex classification challenges.
While the proposed approach shows promising results, several limitations must be acknowledged. Despite efforts to enrich spectral variance, the research was conducted under controlled laboratory conditions which may not fully capture the complexity of real-world scenarios. However, the expectation of increased spectral variability per target class at the plant scale could offer more favorable outcomes. In such environments, the natural diversity of materials and operating conditions may better align with the methodology’s requirements, enhancing its applicability and robustness, since increased variance could result in the collapse of unsupervised approaches, potentially limiting their effectiveness in distinguishing between closely related classes.
Moreover, the interaction of synthetic spectra, particularly in multi-target classification, poses a significant challenge. While simulated spectra may be favorable for the classification of one target, they could potentially hinder the correct classification of another target. This interaction underscores the need for deeper analysis and careful design of synthetic spectra to ensure balanced performance across all target classes. Addressing this point will require advanced methodologies and rigorous validation to mitigate the risk of misclassification.
Furthermore, precision remains a challenging metric. The moderate precision levels suggest that the model struggles with a tradeoff between the sensitivity and the accuracy of positive predictions. Closer examination of the simulation parameters has yielded interesting results, suggesting potential avenues for further optimization. Refinements in parameter tuning and algorithm design may help to enhance precision without compromising other performance metrics.
To illustrate the influence of the modelling parameters, we visualize the F1-score for two PGs as a function of width and depth for the SYN simulation using the U-SIM algorithm (see Figure 2). Instead of choosing random values within the interval, we operate with fixed values to demonstrate the effect on model performance.
The results for PET-1 (Figure 2a) display a smooth and continuous surface, with a prominent peak in the F1-score at higher width values and intermediate depth values. The smoother transitions across the surface suggest that PET-1 is less sensitive to variations in width and depth, as there were fewer abrupt changes in its F1-score. In contrast, the surface plot for PS-8 (Figure 2b) was more irregular, with multiple localized peaks and valleys. While PS-8 also achieved high F1-scores in certain regions, its performance was more sensitive to parameter variations. The sharp peaks and valleys indicate that PS-8 requires precise tuning of width and depth to reach optimal performance, and its performance can degrade quickly when parameters deviate from the optimal range. Overall, the figures highlight the sensitivity of the model performance to modeling parameters and deliver valuable insights for selecting or improving models depending on the specific requirements of the application.
In summary, the methodology shows potential but requires further study under real-world conditions. Future research should enhance spectral variance, address SYN interactions, and refine model parameters to improve precision, aiming to bridge the gap between laboratory results and practical application. Nevertheless, the proposed method requires significantly less data compared to traditional models while still maintaining high accuracy. This flexibility not only reduces the computational burden but also opens up new avenues for real-time, dynamic material classification in industrial applications.

4. Materials and Methods

This section is dedicated to the data acquisition and preparation, modeling approach, and methods applied.

4.1. Data Acquisition and Plastics

The data were collected in-house under laboratory conditions during previous work [23]. For a more detailed description, we refer to Kulko et al. [23]. The experimental setup integrated a NIR (Nir22) spectrometer from Dr. Licht GmbH [24]. This spectrometer uses an InGaAs line sensor with 128 pixels and a grating monochromator, spanning 877–1902 nm, with a resolution of <16 nm FWHM and SNR > 5000 without temperature control and compensation.
We utilized 25 distinct polymer granules (PGs) as samples, as illustrated in Figure 3, which provides an overview of the sample set. The specific granules referred to in this article are denoted by a combination of their polymer type and a numerical label. The samples encompass a range of polymers, including PE, PMMA, polyethylene terephthalate (PET), PC, polystyrene (PS), PBT, and acrylonitrile butadiene styrene (ABS), as well as co-polymers such as PC-PET and PC-ABS. We recorded 50 absorbance sample spectra for each PG at room temperature, with each sample spectrum being an average of 10 spectra. It is important to note that in addition to the absorption of radiation by molecules, the spectrometer also observes other material properties and effects of the experimental setup. These phenomena have already been discussed in [23].
The samples exhibited variability in several attributes, including color, transparency (opaque or transparent), and shape (ellipsoidal or cylindrical). Despite these variations, all polymer granules maintained a consistent diameter of approximately 2–3 mm, roughly matching the size of flakes. As commercially available flake sorters are generally designed to identify and sort at the polymer type level [6], we merged PG classes belonging to the same polymer type. To enhance the dataset for testing purposes, we added spectra of several diverse materials such as steel (cf. Table 2). The group “reflective” denotes faulty spectra with high reflectivity. The study was carried out at pixel level, and the wavelength range was cut to 877–1693 nm during data preparation.

4.2. ML/DL-Based Classification Model

The established procedure for the building of ML/DL-based classification models in the context of plastics sorting [25,26], amongst others, starts with the collection of raw data gathered from sensors such as cameras or spectrometers.
Lubongo et al. [6] divide the material identification process using ML/DL techniques into three key stages: data processing and feature extraction, algorithm selection, and performance evaluation. Pre-processing of spectral data, including baseline correction and dimensionality reduction (e.g., principal component analysis (PCA)), reduces computational demand. Classification algorithms then analyze the processed data, defining optimal decision boundaries to assign categories or classes. These algorithms are categorized as supervised, unsupervised, semi-supervised, and reinforcement learning models. Supervised learning requires pre-labeled data for training and predicts outcomes based on prior knowledge, while unsupervised learning identifies patterns in untagged datasets. Semi-supervised models combine both approaches, leveraging small labeled datasets alongside large unlabeled ones to improve accuracy. In waste management, supervised learning and neural networks are particularly effective for classification tasks (cf. [8]).
Finally, performance evaluation identifies the most effective model for accurate predictions. This is often done by splitting the dataset into training and test data to avoid overfitting. In a more sophisticated approach, the dataset is split into three parts (training, test, and validation data). In combination with the test data, the training data are exploited for model fitting, including hyperparameter determination, whereas the test data are used to evaluate the different model versions. The performance of the final model is measured solely through validation using both appropriate quantitative metrics and qualitative techniques such as analysis of misclassifications.

4.3. Experimental Setup

As the classification algorithm, we applied a CNN because of its superior performance compared to common ML tools (e.g., PLS-DA, SVM, Logistic Regression), as demonstrated in various systematic literature reviews (cf. [6,7]). We adopted the network architecture proposed by Ng et al. [27] because this architecture has already achieved promising results in the classification of plastics, as elaborated by Neo et al. [11] and Kulko et al. [23].
Our overarching goal was to construct a highly accurate classification model based on a limited number of samples and a large unknown space. In order to achieve a high degree of tractability and industrial acceptance, we used simple mathematical and statistical methods in the configured pipeline rather than a black box approach, even if the classifier applied was of black box character. In the following, we present the key elements of our novel approach; the methods applied are described in the subsequent sections. To avoid confusion, we distinguish between five types of spectra: ORG-Train, original target spectra acquired by measurement and used for testing during training (ORG-Test), ORG-Val, REP, and SYN. Since we validate the presented approach using measured data only, to avoid bias due to simulation and independence effects, a further subdivision into train and test is not necessary for the latter two.
Starting with the most straightforward challenge, a binary classification, the classifier only has to differentiate between two classes of wanted (e.g., PET) and unwanted plastic. We focus on the use case of identifying a specific target polymer in an unknown material flow. All other ORG-Test-NT will be used for validation of prediction results.
Given the limited data base of ORG-Train (cf. Section 4.1), data augmentation was required to generate a sufficiently large number of data instances for the target classes. We combined a PCA together with a multi-variate normal distribution to enrich the dataset with REPs (cf. Section 4.4.1. Other non-DL enrichment techniques are also possible (such as SMOTE [28]), but we prefer the proposed algorithm due to its lower complexity. The entire data flow is sketched in Figure 4.

4.4. Data Augmentation

In simple terms, the classification task is to recognize and distinguish the individual characteristics of a particular plastic from all other spectra. In the case of a binary classification, the classifier distinguishes between one trained target class and a second class containing all samples that do not belong to the desired material. Therefore, the shape and design of the spectra used for training the second class are arbitrary as long as they fulfill certain general conditions (identical wavelength range and number of pixels, similar scale, etc.) and differ from the target class. To comply with these requirements, we adopted a two-fold approach.
Sample space extension:
We simulated SYN by accumulating multiple randomly generated Gaussian-shaped peaks across a given wavelength range (cf. Section 4.4.2), with the aim of enabling the model to reject contaminants, such as wood, from the plastic classification (U-SIM).
Sample space refinement:
To teach the classifier to discriminate between various plastics, we mixed the SYN with REPs from the target class to create plastic spectra that are specific yet different from the target class (S-SIM). This simulation setting allows a reasonable number of SYNs of any design to be generated with little effort and in a very short time.
The binary classification task can be extended to a multi-target classification by adding the further target classes to the training data and taking all target classes into account in the simulation of SYNs.
In the following, we discuss the simulation results in detail. We then describe the underlying algorithms used to generate REP, U-SIM, and S-SIM.

4.4.1. Generation of REP

The generation of REP is based on PCA: the first three principal components are extracted from the set of target class spectra (ORG-Train). The number k of principal components is directly affected by the given data and is selected to capture the most significant variance within the data.
Based on the computed PCA scores, we generate random numbers using a multivariate normal distribution to enlarge the dataset by imitating score values. Even though the PCA scores are uncorrelated by design, because the original variables are transformed into a new set of orthogonal (uncorrelated) components [29], we select a joint distribution function so that all scores are drawn from a single random number generator. Lastly, the randomly generated score values are inverse-transformed into the original feature space. In this approach, it is crucial that the ORG-Train dataset has the highest possible spectral variability in order to comprehensively cover the sample space of the target class.
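The replication steps (PCA, sampling of scores from a multivariate normal distribution, inverse transformation) can be sketched as follows. This is an illustration under assumptions rather than the implementation used in this study: PCA is computed with a NumPy singular value decomposition instead of a library PCA class, the function name `generate_rep` is hypothetical, and the demo spectra are random placeholders, not measured data.

```python
import numpy as np

def generate_rep(org_train, n_rep, k=3, seed=0):
    """Create n_rep replica spectra from measured spectra (rows of org_train)."""
    rng = np.random.default_rng(seed)
    mean = org_train.mean(axis=0)
    centered = org_train - mean
    # PCA via SVD: the rows of vt are the principal axes, ordered by variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]                        # shape (k, n_pixels)
    scores = centered @ components.T           # shape (n_samples, k)
    # Fit one joint normal distribution to the scores and draw new scores
    score_mean = scores.mean(axis=0)
    score_cov = np.cov(scores, rowvar=False)   # shape (k, k)
    new_scores = rng.multivariate_normal(score_mean, score_cov, size=n_rep)
    # Inverse transform back into the original feature (pixel) space
    return new_scores @ components + mean

# Placeholder data: 50 hypothetical spectra with 200 pixels each
demo_rng = np.random.default_rng(1)
base = np.sin(np.linspace(0.0, 3.0, 200))
org_train_demo = base + 0.05 * demo_rng.normal(size=(50, 200))
rep_demo = generate_rep(org_train_demo, n_rep=500)
```

Because the replicas are reconstructed from only k components, they can reproduce no more variability than ORG-Train contains along those components, which is why the spectral variability of ORG-Train matters.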

4.4.2. Simulation of SYN

We proceed with the simulation of SYN and illustrate the procedure using a single simulated spectrum as an example. The SYN originate from two processes: U-SIM and S-SIM. Starting with U-SIM, the computational framework simulates spectra by summing multiple Gaussian-shaped peaks over a specified wavelength range, adding noise to emulate experimental conditions, and normalizing the resulting spectrum. We refer to this as the spectrum generation process; its core is a Gaussian peak generator.
We chose this type of kernel because Gaussian peaks closely approximate real-world spectral features. The number of peaks n_peaks per generated spectrum is chosen randomly from a pre-defined interval. Each peak is specified by its center C, width W, and depth D. The center wavelength is drawn from the set of pixels, while the values for width and depth are determined by uniform sampling over a range of values. The set of pixels and the units for W (in nm) and D are given by the spectrometer properties. Gaussian noise is added to mimic real measurement variability. Lastly, the spectrum S is normalized to its maximum value to ensure consistent scaling, and the spectrum is inverted as 1 − S. The inversion reflects absorption-style reflectance behavior. Since we work with raw absorbance spectra (ORG-Train and ORG-Test), this step is optional.
The range of values for all parameters was chosen empirically without any optimization. We discuss optimization in Section 3. The entire generation process for a given number n of SYNs is depicted as pseudo-code in Algorithm A1 in Appendix A.1, as implemented in software. This technique allows simulations to be tailored to specific experimental scenarios.
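A Python sketch of the spectrum generation process may help to make the steps concrete. It follows the structure of Algorithm A1 under several assumptions that the text does not settle: the second argument of Normal(0, 0.005) is read as a standard deviation, centers are drawn with replacement, widths and depths are drawn as continuous values, and the wavelength grid (900–1700 nm, 256 pixels) is a placeholder rather than the spectrometer configuration.

```python
import numpy as np

def generate_peak(wl, center, width, depth):
    """Gaussian-shaped peak evaluated on the wavelength grid wl."""
    return depth * np.exp(-0.5 * ((wl - center) / width) ** 2)

def generate_spectrum(wl, rng):
    """One unspecific synthetic spectrum (U-SIM)."""
    n_peaks = rng.integers(1, 51)                  # integer in [1, 50]
    centers = rng.choice(wl, size=n_peaks)         # centers taken from the pixel grid
    widths = rng.uniform(5.0, 100.0, size=n_peaks)
    depths = rng.uniform(0.1, 1.0, size=n_peaks)
    s = np.zeros_like(wl, dtype=float)
    for c, w, d in zip(centers, widths, depths):
        s += generate_peak(wl, c, w, d)            # accumulate the peaks
    s += rng.normal(0.0, 0.005, size=wl.shape)     # measurement-like noise (std assumed)
    s /= s.max()                                   # normalize to the maximum value
    return 1.0 - s                                 # invert

def generate_spectra(wl, n, seed=0):
    """Stack n synthetic spectra into an (n, n_pixels) array."""
    rng = np.random.default_rng(seed)
    return np.vstack([generate_spectrum(wl, rng) for _ in range(n)])

wl_demo = np.linspace(900.0, 1700.0, 256)          # hypothetical grid in nm
usim_demo = generate_spectra(wl_demo, n=10)
```

After normalization and inversion, every spectrum has a minimum of exactly 0, which gives all U-SIM spectra a common scale regardless of how many peaks were drawn.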
As outlined in Section 4.3, the S-SIM process is based on the mixing of unspecific SYN and REP (see Algorithm A2 in Appendix A.1). We used blending weights to control the contribution of the two components to the new mixture. Furthermore, we scaled the mixed spectra by a factor of 2 and shifted the blended data m to preserve a consistent mean relative to the input data. In line with Algorithm A1, the range of values of all parameters was selected empirically without any optimization.
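The blending step can be sketched in Python as follows. The sketch rests on assumptions that should be checked against the original implementation: the second weight is taken as w2 = 0.8 − w1, the mean adjustment is applied per spectrum, one intercept is drawn per spectrum, and the function names `add_intercept` and `add_dirt` merely mirror the names in Algorithm A2. How the two functions are chained is not stated in the text, so they are shown independently; the demo arrays are random placeholders.

```python
import numpy as np

def add_intercept(syn, rng):
    """Shift each synthetic spectrum by a random offset in [0, 1]."""
    intercept = rng.uniform(0.0, 1.0, size=(syn.shape[0], 1))
    return syn + intercept

def add_dirt(rep, syn, n, seed=0):
    """Blend n random REP spectra with n random SYN spectra (S-SIM)."""
    rng = np.random.default_rng(seed)
    s1 = rep[rng.integers(0, rep.shape[0], size=n)]    # target-class replicas
    s2 = syn[rng.integers(0, syn.shape[0], size=n)]    # unspecific spectra
    w1 = rng.uniform(0.2, 0.8, size=(n, 1))            # blending weight per spectrum
    w2 = 0.8 - w1                                      # assumed reading of Algorithm A2
    m = (s1 * w1 + s2 * w2) * 2.0                      # blend, then scale by 2
    # Shift so that each mixture keeps the mean of its replica
    shift = s1.mean(axis=1, keepdims=True) - m.mean(axis=1, keepdims=True)
    return m + shift

demo_rng = np.random.default_rng(2)
rep_rows = np.tile(np.linspace(0.2, 0.6, 128), (5, 1))  # 5 identical placeholder replicas
syn_rows = demo_rng.uniform(0.0, 1.0, size=(20, 128))   # placeholder unspecific spectra
ssim_demo = add_dirt(rep_rows, add_intercept(syn_rows, demo_rng), n=8)
```

With this reading, the weights sum to 0.8 before scaling, so the factor of 2 raises the overall amplitude before the mean shift pulls each mixture back to the level of its replica.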
The presented figures illustrate the different spectral data categories employed to train the classification model for PC. Visualizations for further polymers are included in Appendix B.1. Figure 5a shows the original spectra measured from PC samples, characterized by well-defined absorption peaks corresponding to the functional groups present in polycarbonate, such as carbonyl (C=O), aromatic C–H, and C–O–C linkages [30]. The low intra-class variability and consistent spectral features confirm the reliability of this dataset as a chemical reference for model calibration. Figure 5b presents REP, synthetically generated to expand the target class while preserving the key spectral characteristics observed in the measured PC data. These replicas demonstrate controlled variability in peak intensity and slight baseline shifts, ensuring that the model can generalize across typical experimental deviations without compromising chemical fidelity.
In contrast, Figure 5c displays S-SIM, created by mixing SYNs with REPs. These data refine the sample space, introducing subtle variations that challenge the model to distinguish between closely related polymers. The simulated spectra in this category maintain recognizable PC spectral features but exhibit controlled distortions and peak variability. This ensures the model’s robustness in differentiating PC from polymers with overlapping spectral characteristics. Figure 5d represents U-SIM, which are intentionally designed with high variability and randomly distributed Gaussian peaks. These spectra do not reflect polycarbonate signatures but rather serve as negative examples, enabling the model to effectively reject contaminants and materials unrelated to the target class.
This data generation strategy is designed to address both the need for sufficient target class representation and the ability to discriminate against out-of-class materials. However, the effectiveness and reliability of simulated data depend on the extent to which these simulations accurately preserve the underlying polymer properties. Hence, the accuracy of simulated data plays a pivotal role in determining the predictive reliability of machine learning models. To ensure robust model development and precise assessment of model performance, we validated the prediction results using independent experimental datasets (cf. Section 2), which is crucial to ensure that the characteristic functional group patterns and spectral responses are faithfully replicated.

4.5. Metrics

To assess classification performance, we report the scores of several metrics. To measure overall performance, we computed balanced accuracy to avoid inflated performance estimates on imbalanced datasets. Balanced accuracy is defined as the average of the recall obtained for each class. In the binary case, the balanced accuracy is calculated as the arithmetic mean of the true positive rate and the true negative rate [31]:

$$\text{balanced accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)$$

where TP, FN, TN, and FP denote the numbers of true positives, false negatives, true negatives, and false positives, respectively. If the classifier performs equally well on both classes, this metric reduces to the conventional accuracy, which is the number of correct predictions divided by the total number of predictions. The balanced accuracy score ranges from 0 to 1.
Furthermore, we recorded the metrics recall, precision, and F1-score. Recall provides an insight into the percentage of polymers correctly recovered for each class. To maximize the market value of plastic waste, precision should take precedence to ensure the production of high-quality recycled plastic. Recall, however, remains critical to prevent potentially recyclable polymers from being downcycled. It is also essential that recall is high enough to recover a significant amount of recyclable plastic, thereby ensuring the economic viability of the recycling process [19].
In the context of binary classification, we only presented the results for the target class for these two metrics. Otherwise, we calculated the score for each class and found their average weighted by the number of true instances for each class. The F1-score represents the harmonic mean of precision and recall, effectively balancing the tradeoff between these two metrics [32]. The relative contribution of precision and recall to the F1-score is weighted equally.
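The metrics for the binary case can be written out from the confusion counts as in the following sketch. The counts in the example are hypothetical and were chosen only so that precision and recall match the PE row of Table 1 (precision 0.5, recall 1.0); returning 0 when a denominator is zero is an assumed convention, not one stated in the text.

```python
def recall(tp, fn):
    """Share of target-class samples that were recovered (true positive rate)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp, fp):
    """Share of predicted target-class samples that really belong to the class."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision and recall, both weighted equally."""
    return 2.0 * p * r / (p + r) if (p + r) else 0.0

def balanced_accuracy(tp, fn, tn, fp):
    """Arithmetic mean of true positive rate and true negative rate."""
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return 0.5 * (recall(tp, fn) + tnr)

# Hypothetical confusion counts for one target class
tp, fn, fp, tn = 50, 0, 50, 950
r_demo = recall(tp, fn)
p_demo = precision(tp, fp)
f1_demo = f1_score(p_demo, r_demo)
bacc_demo = balanced_accuracy(tp, fn, tn, fp)
```

The example shows why a perfect recall can coexist with a moderate F1-score: every target sample is found, but half of the samples assigned to the target class are false positives.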

4.6. Benchmark and Software

To challenge the proposed model, we used an unsupervised approach based on geometric computations in 3D space. For the binary classification task, we extracted the first three principal components (cf. Section 4.4.1) from the training data belonging to the target class. In contrast to the DL classifier (cf. Section 2), we applied pre-processing with a Savitzky–Golay filter [33] to optimize performance and thus obtain a reliable benchmark. Subsequently, we constructed the convex hull of the set of PCA scores. The convex hull is the smallest convex polyhedron that encloses all the given points [34]. We continued with a Delaunay triangulation, which subdivides the convex hull into non-overlapping tetrahedra such that no point lies inside the circumsphere of any tetrahedron. For more information on the method and computation techniques applied, we refer to Barber et al. [34].
The decision whether a given spectrum stems from the target class depends on whether the PCA-transformed point lies inside or outside the convex hull defined by the Delaunay triangulation. In the setting of multi-target classification, the same procedure is applied by performing a Delaunay triangulation for each target class. If multiple hulls include a point, the class with the closest hull center, measured by the Euclidean distance, is selected.
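The inside/outside decision for a single target class can be sketched with SciPy. This is an illustration, not the benchmark code of this study: the PCA scores are random placeholders, the Savitzky–Golay pre-processing is omitted, and the function names are hypothetical. `scipy.spatial.Delaunay.find_simplex` returns -1 for points that lie outside the triangulation, which is what the membership test relies on.

```python
import numpy as np
from scipy.spatial import Delaunay

def fit_hull(scores):
    """Delaunay triangulation of the 3D PCA scores of one target class."""
    return Delaunay(scores)

def in_hull(tri, points):
    """True for every point that falls inside the triangulated convex hull."""
    return tri.find_simplex(np.atleast_2d(points)) >= 0

demo_rng = np.random.default_rng(0)
target_scores = demo_rng.normal(size=(50, 3))        # placeholder scores, not measured data
tri = fit_hull(target_scores)

inside = in_hull(tri, target_scores.mean(axis=0))    # centroid of the scores
outside = in_hull(tri, np.array([10.0, 10.0, 10.0])) # far away from all scores
```

For the multi-target case, one triangulation per class would be fitted and, if several hulls contain a point, the distance to each hull center would decide; taking the mean of the class scores as the hull center would be one possible choice, which the text does not specify.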
Custom Python scripts were used for data processing, model generation, and visualization, with the following versions: Python 3.8 with the packages pandas 2.2.3 [35], numpy 1.26.4 [36], scikit-learn 1.5.2 [37], scipy 1.14.1 [38], and matplotlib 3.9.2 [39]. To build the one-dimensional convolutional neural network (1D-CNN), we used the Keras library, which is part of TensorFlow 2.17.0. We used the Adam optimizer with a learning rate of 0.0002. The neural networks were trained using categorical cross-entropy as the loss function, which requires a one-hot representation of the labels. All computations were carried out on an Apple M3 Pro chip with an 8-core CPU and a 10-core GPU (Apple Inc., Cupertino, CA, USA).
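The relation between one-hot labels and the categorical cross-entropy can be illustrated with a few lines of NumPy. This is a didactic sketch and not the Keras implementation; the class indices, the number of classes, and the predicted probabilities are made up for the example.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Turn integer class labels into one-hot rows."""
    return np.eye(n_classes)[labels]

def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0)      # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

labels = np.array([0, 2, 1])                # hypothetical class indices
y_true = one_hot(labels, n_classes=3)

uniform = np.full((3, 3), 1.0 / 3.0)        # an uninformed prediction
loss_uniform = categorical_cross_entropy(y_true, uniform)
loss_perfect = categorical_cross_entropy(y_true, y_true)
```

An uninformed prediction over three classes yields a loss of ln 3, while a perfect prediction yields a loss of approximately zero; training drives the loss from the former toward the latter.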

5. Conclusions

Plastic recycling holds significant potential to reduce global carbon emissions, yet this potential can only be realized by overcoming key challenges in current recycling practices. High data requirements remain a major barrier to advancing research and improving recycling systems. In this work, we demonstrated the successful training of DL models using synthetic spectra. With only 50 partly measured spectra, we expanded the sample space by generating synthetic data through Gaussian-shaped peaks, enabling the model to distinguish plastics from contaminants like wood. Additionally, we refined the sample space by combining synthetic data with representative spectra from the target class, creating distinct yet class-specific plastic spectra. This approach allows for the rapid and efficient generation of synthetic data with customizable designs.
The proposed model exhibited strong performance in both binary and multi-target polymer classification tasks. In multi-target classification, which is critical for recycling processes, the model accurately identified multiple polymer types without relying on extensive spectral databases, achieving a balanced accuracy of 0.9022 and a recall of 0.9732.
This study establishes a basis for the development of highly accurate classification models in plastics recycling. By addressing limitations in data availability and model scalability, this approach aligns with environmental and technological goals. Its simplicity and effectiveness offer the potential to enhance material recovery rates and reduce contamination, thereby supporting more sustainable recycling practices. Future work will focus on scaling this approach for industrial applications and further optimizing machine learning techniques. Incorporating GANs could unlock the full potential of current DL advancements, enhancing the realism and diversity of synthetic spectra. The future of plastic recycling hinges on sustained investment in infrastructure, research, and data-driven technologies. By advancing sorting methods and leveraging cutting-edge machine learning, the industry can progress toward a circular economy in which plastics are continuously reused and repurposed.

Author Contributions

Writing—original draft, A.P.; Writing—review and editing, R.-D.K., A.H., and B.E. All authors contributed substantial work at every stage of this publication. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Bavarian Ministry of Economic Affairs, Regional Development and Energy within the project “KICK - KI-Fusionsmodelle zur Identifikation und Charakterisierung von Kunststoffmischungen” (Grant No. DIK0565).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

Andreas Hanus is from Company Sesotec GmbH. All authors declare no conflicts of interest.

Appendix A

Appendix A.1

Algorithm A1 U-SIM process
Input: wavelength array $\lambda$, number of spectra $n$
Output: spectra $R$
function GeneratePeak($\lambda$, $C$, $W$, $D$)
    $P \leftarrow D \cdot \exp\left(-\frac{1}{2}\left(\frac{\lambda - C}{W}\right)^{2}\right)$
    return $P$
end function
function GenerateSpectrum($\lambda$)
    Initialize $S \leftarrow 0$    ▹ Initialize spectrum
    Randomly choose number of peaks $n_{\text{peaks}} \in [1, 50]$, restricted to $n_{\text{peaks}} \in \mathbb{N}^{+}$
    Generate random parameters:
        $C \leftarrow$ Randomly select $n_{\text{peaks}}$ centers from $\lambda$
        $W \leftarrow$ Uniformly sample $n_{\text{peaks}}$ widths in $[5, 100]$
        $D \leftarrow$ Uniformly sample $n_{\text{peaks}}$ depths in $[0.1, 1.0]$
        restricted to $W$ and $D \in \mathbb{N}^{+}$
    Add peaks:
    for $i = 1$ to $n_{\text{peaks}}$ do
        $S \leftarrow S +$ GeneratePeak($\lambda$, $C_i$, $W_i$, $D_i$)
    end for
    Add noise: $S \leftarrow S + \text{Normal}(0, 0.005)$
    Normalize: $S \leftarrow S / \max(S)$
    Invert: $S \leftarrow 1 - S$
    return $S$
end function
function GenerateSpectra($\lambda$, $n$)
    Initialize matrix $R \leftarrow \text{zeros}(n, |\lambda|)$
    for $i = 1$ to $n$ do
        $R[i, :] \leftarrow$ GenerateSpectrum($\lambda$)
    end for
    return DataFrame $R$
end function
Algorithm A2 S-SIM process
Input: REP, SYN, number of spectra $n$
Output: spectra $R$
function AddIntercept($SYN$, $n$)
    $Intercept \leftarrow$ Uniformly sample $n$ values in $[0, 1]$
    $P \leftarrow Intercept + SYN$
    return $P$
end function
function AddDirt($REP$, $SYN$, $n$)
    Generate random spectra/parameters:
        $s_1 \leftarrow$ Randomly sample $n$ spectra from $REP$
        $s_2 \leftarrow$ Randomly sample $n$ spectra from $SYN$
        $w_1 \leftarrow$ Uniformly sample $n$ weights in $[0.2, 0.8]$
        $w_2 \leftarrow 0.8 - w_1$
    Blend: $m \leftarrow s_1 \cdot w_1 + s_2 \cdot w_2$
    Scale: $m \leftarrow m \cdot 2$
    Adjust: $s \leftarrow m + (\text{mean}(s_1) - \text{mean}(m))$
    return $s$
end function

Appendix B

Appendix B.1

Figure A1. Comparison of spectral data categories used in the study for PMMA.
Figure A2. Comparison of spectral data categories used in the study for PET.

References

  1. Karali, N.; Khanna, N.; Shah, N. Climate Impact of Primary Plastic Production. Lawrence Berkeley National Laboratory, Report #: LBNL-2001585. 2024. Available online: https://escholarship.org/uc/item/12s624vf (accessed on 21 March 2025).
  2. Zheng, J.; Suh, S. Strategies to reduce the global carbon footprint of plastics. Nat. Clim. Change 2019, 9, 374–378.
  3. Cabernard, L.; Pfister, S.; Oberschelp, C.; Hellweg, S. Growing environmental footprint of plastics driven by coal combustion. Nat. Sustain. 2021, 5, 139–148.
  4. Salahuddin, U.; Sun, J.; Zhu, C.; Wu, M.; Zhao, B.; Gao, P. Plastic Recycling: A Review on Life Cycle, Methods, Misconceptions, and Techno-Economic Analysis. Adv. Sustain. Syst. 2023, 7, 2200471.
  5. Nayanathara Thathsarani Pilapitiya, P.; Ratnayake, A.S. The world of plastic waste: A review. Clean. Mater. 2024, 11, 100220.
  6. Lubongo, C.; Bin Daej, M.A.A.; Alexandridis, P. Recent Developments in Technology for Sorting Plastic for Recycling: The Emergence of Artificial Intelligence and the Rise of the Robots. Recycling 2024, 9, 59.
  7. Kroell, N.; Chen, X.; Greiff, K.; Feil, A. Optical sensors and machine learning algorithms in sensor-based material flow characterization for mechanical recycling processes: A systematic literature review. Waste Manag. 2022, 149, 259–290.
  8. Ramos, E.; Lopes, A.G.; Mendonça, F. Application of Machine Learning in Plastic Waste Detection and Classification: A Systematic Review. Processes 2024, 12, 1632.
  9. Yang, J.; Xu, Y.P.; Chen, P.; Li, J.Y.; Liu, D.; Chu, X.L. Combining spectroscopy and machine learning for rapid identification of plastic waste: Recent developments and future prospects. J. Clean. Prod. 2023, 431, 139771.
  10. Wu, T.W.; Zhang, H.; Peng, W.; Lü, F.; He, P.J. Applications of convolutional neural networks for intelligent waste identification and recycling: A review. Resour. Conserv. Recycl. 2023, 190, 106813.
  11. Neo, E.R.K.; Low, J.S.C.; Goodship, V.; Debattista, K. Deep learning for chemometric analysis of plastic spectral data from infrared and Raman databases. Resour. Conserv. Recycl. 2023, 188, 106718.
  12. Conlin, A.; Martin, E.; Morris, A. Data augmentation: An alternative approach to the analysis of spectroscopic data. Chemom. Intell. Lab. Syst. 1998, 44, 161–173.
  13. Gracia Moisés, A.; Vitoria Pascual, I.; Imas González, J.J.; Ruiz Zamarreño, C. Data Augmentation Techniques for Machine Learning Applied to Optical Spectroscopy Datasets in Agrifood Applications: A Comprehensive Review. Sensors 2023, 23, 8562.
  14. Zhang, L.; Wang, Y.; Wei, Y.; An, D. Near-infrared hyperspectral imaging technology combined with deep convolutional generative adversarial network to predict oil content of single maize kernel. Food Chem. 2022, 370, 131047.
  15. Chen, H.; Tan, C.; Lin, Z.; Chen, M.; Cheng, B. Applying virtual sample generation and ensemble modeling for improving the spectral diagnosis of cancer. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2024, 318, 124518.
  16. Kim, Y.; Lee, W. Distributed Raman Spectrum Data Augmentation System Using Federated Learning with Deep Generative Models. Sensors 2022, 22, 9900.
  17. Hao, Y.; Li, X.; Zhang, C. Improving prediction model robustness with virtual sample construction for near-infrared spectra analysis. Anal. Chim. Acta 2023, 1279, 341763.
  18. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 6999–7019.
  19. Neo, E.R.K.; Yeo, Z.; Low, J.S.C.; Goodship, V.; Debattista, K. A review on chemometric techniques with infrared, Raman and laser-induced breakdown spectroscopy for sorting plastic waste in the recycling industry. Resour. Conserv. Recycl. 2022, 180, 106217.
  20. Kroell, N.; Chen, X.; Küppers, B.; Lorenzo, J.; Maghmoumi, A.; Schlaak, M.; Thor, E.; Nordmann, C.; Greiff, K. Near-infrared-based determination of mass-based material flow compositions in mechanical recycling of post-consumer plastics: Technical feasibility enables novel applications. Resour. Conserv. Recycl. 2023, 191, 106873.
  21. Jernelv, I.L.; Hjelme, D.R.; Matsuura, Y.; Aksnes, A. Convolutional neural networks for classification and regression analysis of one-dimensional spectral data. arXiv 2020, arXiv:2005.07530.
  22. Naderi Kalali, E.; Lotfian, S.; Entezar Shabestari, M.; Khayatzadeh, S.; Zhao, C.; Yazdani Nezhad, H. A critical review of the current progress of plastic waste recycling technology in structural materials. Curr. Opin. Green Sustain. Chem. 2023, 40, 100763.
  23. Kulko, R.D.; Pletl, A.; Hanus, A.; Elser, B. Detection of Plastic Granules and Their Mixtures. Sensors 2023, 23, 3441.
  24. Dr. Licht GmbH. Messtechnik und Chemometrische Auswertung. Available online: https://dr-licht.de (accessed on 1 September 2022).
  25. Carrera, B.; Piñol, V.L.; Mata, J.B.; Kim, K. A machine learning based classification models for plastic recycling using different wavelength range spectrums. J. Clean. Prod. 2022, 374, 133883.
  26. Cucuzza, P.; Serranti, S.; Capobianco, G.; Bonifazi, G. Multi-level color classification of post-consumer plastic packaging flakes by hyperspectral imaging for optimizing the recycling process. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2023, 302, 123157.
  27. Ng, W.; Minasny, B.; Montazerolghaem, M.; Padarian, J.; Ferguson, R.; Bailey, S.; McBratney, A.B. Convolutional neural network for simultaneous prediction of several soil properties using visible/near-infrared, mid-infrared, and their combined spectra. Geoderma 2019, 352, 251–267.
  28. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
  29. Trochimczyk, J.; Chayes, F. Some properties of principal component scores. J. Int. Assoc. Math. Geol. 1978, 10, 43–52.
  30. Smith, B. Infrared Spectral Interpretation: A Systematic Approach; CRC Press: Boca Raton, FL, USA, 2018.
  31. Kelleher, J.D. Fundamentals of Machine Learning for Predictive Data Analytics; The MIT Press: Cambridge, MA, USA, 2015; pp. 557–563.
  32. Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2011, arXiv:2010.16061.
  33. Press, W.H.; Teukolsky, S.A. Savitzky-Golay Smoothing Filters. Comput. Phys. 1990, 4, 669–672.
  34. Barber, C.B.; Dobkin, D.P.; Huhdanpaa, H. The quickhull algorithm for convex hulls. ACM Trans. Math. Softw. 1996, 22, 469–483.
  35. McKinney, W. Data Structures for Statistical Computing in Python. SciPy 2024, 445, 51–56.
  36. Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020, 585, 357–362.
  37. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  38. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272.
  39. Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9, 90–95.
Figure 1. Classification results in form of a confusion matrix illustrating the classification performance of the proposed model across the classes PBT, PC, PE, PMMA, and non-target material.
Figure 2. F1-score for binary classification as a function of width and depth for the SYN simulation using the U-SIM algorithm.
Figure 3. Visualization of the 25 different plastic granules studied, including the specification of polymer type.
Figure 4. The data pipeline begins with measurement setup and data acquisition, followed by splitting the full dataset into target and non-target materials. Target data are further divided into distinct training (ORG-Train), testing (ORG-Test), and validation (ORG-Val) sets. Synthetic data (U-SIM and S-SIM) and replicas (REP) are generated from ORG-Train data and used alongside ORG-Train and ORG-Test sets for model fitting. The trained model is finally validated on ORG-Val and non-target data.
Figure 5. Comparison of spectral data categories used in the study for PC.
Table 1. Results of binary classification.
| Polymer | Model B-Accuracy | Model Recall | Model Precision | Model F1-Score | Benchmark B-Accuracy | Benchmark Recall | Benchmark Precision | Benchmark F1-Score |
|---|---|---|---|---|---|---|---|---|
| PBT | 0.6706 | 1.0000 | 0.0418 | 0.0802 | 0.4592 | 0.0000 | 0.0000 | 0.0000 |
| PC | 0.8618 | 0.9667 | 0.3239 | 0.4856 | 0.5008 | 1.0000 | 0.1120 | 0.2015 |
| PE | 0.9856 | 1.0000 | 0.5000 | 0.6667 | 0.5000 | 0.0000 | 0.0000 | 0.0000 |
| PMMA | 0.8082 | 1.0000 | 0.0696 | 0.1302 | 0.4920 | 0.0000 | 0.0000 | 0.0000 |
Table 2. Number of spectra for each polymer type and further materials used for model testing.
| Material | Number of Samples |
|---|---|
| ABS | 50 |
| PBT | 100 |
| PC | 650 |
| PC + ABS | 100 |
| PC + PET | 50 |
| PE | 100 |
| PET | 50 |
| PMMA | 100 |
| PS | 50 |
| premium steel | 126 |
| photo cardboard | 90 |
| carbohydrate | 34 |
| unknown plastic a | 37 |
| unknown plastic b | 45 |
| unknown organic | 55 |
| reflective | 204 |
Share and Cite

MDPI and ACS Style

Pletl, A.; Kulko, R.-D.; Hanus, A.; Elser, B. A Plastic Classification Model Based on Simulated Data. Recycling 2025, 10, 65. https://doi.org/10.3390/recycling10020065
