*Article* **Enhancement of Image Classification Using Transfer Learning and GAN-Based Synthetic Data Augmentation**

**Subhajit Chatterjee 1, Debapriya Hazra 1, Yung-Cheol Byun 1,\* and Yong-Woon Kim 2**


**Abstract:** Plastic bottle recycling plays a crucial role in environmental protection. To classify plastic bottles on a conveyor belt, the position and background of each bottle should be consistent. Manual detection of plastic bottles is time consuming and prone to human error, so automatic classification using deep learning techniques can deliver more accurate results and reduce cost. Achieving good results with a deep learning (DL) model, however, requires a large volume of training data. We therefore propose a GAN-based model to generate synthetic images similar to the original ones. To improve image synthesis quality with less training time and a lower chance of mode collapse, we propose a modified lightweight-GAN model, which consists of a generator and a discriminator with an auto-encoding feature that captures the essential parts of the input image and encourages the generator to cover a wide range of the real data distribution. A newly designed weighted average ensemble model based on two pre-trained models, InceptionV3 and Xception, then classifies transparent plastic bottles with an improved accuracy of 99.06%.

**Keywords:** deep learning; generative adversarial networks; image classification; transfer learning; plastic bottle

**MSC:** 68U10

#### **1. Introduction**

Due to their flexibility in terms of cost, light weight, processing, and ease of carrying, plastic bottles are among the most widely used materials in daily life and industry. Every day, tons of plastic bottles are dumped as waste, and the toxic, hazardous materials in this trash pollute the environment day by day [1]. An essential strategy for dealing with this issue is recycling. Recycled plastic bottles can be used further in new products, automobiles, textiles, etc. Plastic bottle recycling has recently emerged as a significant part of the plastic bottle industry, potentially saving fossil fuels while simultaneously lowering greenhouse gas emissions [2].

The recycling task involves considerable labor cost, and a deep learning (DL) approach helps by automatically classifying waste plastic bottles for recycling [3]. Much research has been conducted on cost-effective PET (polyethylene terephthalate) bottle classifiers. PET bottles can be divided into several categories based on chemical resin, transparency, and color [4], and they have the highest recycling value compared to other plastic bottles. South Korea's Ministry of Environment announced on 5 February that it would start a pilot project that month for the separate collection and disposal of transparent plastic bottles, phased in individually across five regions: Busan, Cheonan (Chungnam), Gimhae (Gyeongnam), Jeju, and Seogwipo. One of the changes will require companies to use bottle labels that are easy to remove. Legislative changes will also bring system reforms to make recycling more convenient.


Plastic bottles with easy-to-tear labels are produced in Japan. Designed to protect the environment from plastic pollution, this initiative promotes the growth and innovation of industry and human life through a comprehensive transformation of the production, use, and recycling of plastic bottles. PET bottles must be colorless and unlabeled to be completely recycled; only transparent plastic bottles without labels can be crushed into thin plastic flakes, and these materials can then be utilized to create new plastic items.

Plastics are an inextricable aspect of human life, particularly in countries experiencing rapid economic growth. Drinking water bottles and beverage bottles are two of the most common plastic applications in everyday life. Plastic bottles must be separated into recyclable and non-recyclable categories to improve plastic bottle waste management. Recycling is a process of rebirth: discarded plastic bottles are recycled into high-quality consumer goods. Recycled clear plastic bottles have been resurrected as garments, eco-friendly purses, and cosmetic bottles, among other high-quality items. Previously, all discarded plastic bottles used to make garments and other products in South Korea were imported from abroad, and only 10% of the old plastic bottles collected in the community were recycled into high-quality consumer goods. Another point to consider is that the production of plastic emits a substantial quantity of greenhouse gases, which contributes to global warming. Because recycling reduces crude oil and energy consumption, greenhouse gas emissions, such as carbon dioxide, also decrease significantly. Transparent plastic bottles are mainly used to make fiber materials for clothing; polar fleece, a polyester material that has lately gained popularity, is a notable example. However, the foreign matter found in waste bottles collected in South Korea throughout the disposal and compressing procedure raises concerns about their suitability for recycling. According to the application requirements, the sorting equipment only needs to pick out transparent plastic bottles during sorting, so correct bottle classification is crucial in a machine vision-based sorting system.

This paper proposes a GAN-based model, modified lightweight-GAN, to generate synthetic images using a small dataset containing real plastic bottle images. The main contributions are as follows:

• A modified lightweight-GAN model, consisting of a generator and a discriminator with an auto-encoding feature, that generates realistic synthetic plastic bottle images with less training time and a lower chance of mode collapse.
• A weighted average ensemble model, IncepX-Ensemble, built on the pre-trained InceptionV3 and Xception models and tuned with a grid search, for classifying transparent plastic bottles.
• An experimental evaluation showing that training with a combination of original and synthetic images improves classification accuracy to 99.06%.

#### **2. Related Works**

Deep learning with a small training dataset leads to overfitting. The capacity of data augmentation to improve generalization has been examined by expanding the training data of deep neural networks. Instead of traditional data augmentation techniques, a GAN can generate more stable and realistic images.

A computer-aided machine learning-based plastic bottle classification technique was proposed in [5], where the authors performed feature extraction for the classification task and achieved 80% accuracy. The same authors also proposed classification with a region-of-interest segmentation technique on a two-class PET and non-PET plastic bottle dataset, again achieving 80% accuracy [6]. Ref. [7] proposed an automated SVM-based classification of plastic bottles for recycling purposes and achieved 97.3% accuracy with the best computation time. A real-time application was designed for plastic bottle identification, and the proposed system achieved an accuracy of 97% [8]. An artificial intelligence-based plastic bottle color classification system was proposed in [11] and achieved 94.14% accuracy. Wang et al. [12] proposed the recycling of used plastic bottles based on a support vector machine algorithm, reaching 94.7% accuracy.

Generative adversarial networks are an advanced technique for data augmentation; for example, a semi-supervised cycleGAN can be used to augment training data. Hazra et al. proposed generating synthetic images of bone marrow cells using a GAN and classifying them with a transfer learning model [9]; the proposed model achieved 95% precision and 96% recall. The authors of [10] proposed an inception-cycleGAN model to classify COVID-19 X-ray images and achieved 94.2% accuracy. In [13], for the well-studied task of medical image classification, the researchers applied data augmentation using a GAN and used three transfer learning models to overcome training-time constraints, achieving 86.45% accuracy with the InceptionV3 model. Srivastav et al. [14] proposed generating synthetic images with a GAN to improve the diagnosis of pneumonia from chest X-ray classification and achieved 94.5% accuracy.

Waste management and waste classification are essential issues for the environment and human health. Recycling is one of the most basic forms of waste management, and we need to identify the particular waste that can be recycled. Because few datasets for waste classification are publicly available, Alsabei et al. [15] proposed a model that classifies waste using pre-trained models and applied a GAN approach to generate data. In [16], an intelligent system for waste sorting using a skip-connection-based model was proposed, and the novel model achieved 95% accuracy. Pio et al. [17] hypothesized that combining a transfer learning approach with the developed metabolic features would deliver a considerable improvement in reconstruction accuracy. A new combined methodology was proposed for a higher recognition rate and robustness in enhancing low-resolution video [18], where a GAN and transfer learning are used to handle license plate image recognition in various challenging situations. Mohammed et al. [19] suggested an ensemble classifier that decreases both the space and time complexity of the generated ensemble members while improving prediction time and maintaining significant accuracy.

#### **3. Dataset**

In our experiment, we collected plastic bottle images from industry in South Korea; this is not a publicly available dataset. We intend to build models that correctly classify plastic bottle images before deploying them in a plastic bottle recycling machine. The precise detection and identification of plastic bottles, which depends on both precision and cost, is the most significant challenge when designing a recycling machine that prevents fraud.

Few plastic bottle datasets are publicly available. Trashnet [20] is a trash classification dataset that includes plastic bottle images. Each image in the PET bottle dataset contains only one object, a plastic bottle, against a plain background. Such images are easily perceived by the human eye but not by a recycling machine, as there are no other objects in the image that could provide additional information.

Our dataset, named the PET-bottle dataset, has six classes and a total of 1667 plastic bottle images. We divided the images according to design and bottle specification: three classes are uniquely named Bottle\_ShapeA, Bottle\_ShapeB, and Bottle\_ShapeC, and the other three classes are called Masinda, Pepsi, and Samdasoo, respectively. Plastic bottles that have no label but have black caps are named Bottle\_ShapeA. Plastic bottles with a design on the body and a white cap but no label are named Bottle\_ShapeB. Plastic bottles without any design or label but with a red cap are designated Bottle\_ShapeC. Masinda is a drinking water bottle company; its class depicts a company label and a sky-blue cap. Pepsi is a well-known soft drink manufacturer; its class represents a label with the company logo and a black cap. Jeju Samdasoo is a mineral water brand developed by the Jeju Province Development Corporation; its class depicts a label with the company logo and a white cap. Details of the original dataset are given in Table 1. The Sl number column assigns a numerical value, from 0 to 5, to each of the six classes; the class name column lists the six classes used in our experiment; and the images per class column gives the number of images in each class.

It is noticeable that the dataset is small and the classes are imbalanced, with most images labeled as the Samdasoo class. Training a deep neural network to categorize this unbalanced dataset into six categories would over-fit the data.


**Table 1.** Detailed specification of original dataset.

#### **4. Methodology**

The proposed method is discussed in this section; Figure 1 depicts its block diagram. Our proposed method can be divided into five parts. The first block (a) shows an overview of the original dataset with the class labels. In the second block (b), synthetic images are generated using the modified lightweight-GAN model for data augmentation. The third block (c) is traditional data augmentation based on basic image manipulation techniques. In the fourth block (d), pre-trained ImageNet models are fine-tuned on our dataset for plastic bottle classification. The last block (e) covers the evaluation metrics for classification. A detailed explanation is given in the following subsections.

**Figure 1.** Workflow of the proposed framework. (**a**) shows the overview of the original dataset with the class label; (**b**) synthetic images are generated using a modified lightweight-GAN model for data augmentation; (**c**) is traditional data augmentation based on basic image manipulation techniques; (**d**) pre-trained ImageNet model is fine-tuned on our dataset for plastic bottle classification; (**e**) is the evaluation metrics for classification.

#### *4.1. Original Dataset Description*

Our dataset contains 1667 images of plastic bottles, segmented into six classes according to the bottle specification details.

#### *4.2. Synthetic Image Generation Using Modified Lightweight-GAN Model*

Recently, researchers have focused on combining GANs with other models or techniques that allow for superior data reconstruction. We introduce a new approach in our model, using convolution layers compatible with high-resolution images for both the generator (G) and the discriminator (D). The basic GAN architecture for the generator and discriminator is graphically depicted in Figure 2. The model structures of G and D and a description of the component layers are shown in Figures 3 and 4.

#### 4.2.1. Generative Adversarial Networks

The generative adversarial network (GAN) was developed by Goodfellow et al. in 2014 [21]. This intriguing invention has been gaining interest in various machine learning fields. A GAN consists of two interacting neural networks: a generator (*G*) and a discriminator (*D*). The generator network is trained to map points in the latent space to new data instances. The discriminator network is trained to distinguish actual images from the plausible images produced by the generator network. Eventually, the generator produces images that resemble the actual training samples. During training, the generator is updated based on the discriminator's predictions so that it produces better images, while the discriminator improves its ability to distinguish between actual and fake images. The discriminator loss is determined by the difference between the real and counterfeit labels, where the label specifies whether an image is artificial or real. The general diagram of a GAN is shown in Figure 2.

**Figure 2.** Generative adversarial networks architecture.

The main objective of GAN training can be framed as a two-player min–max game, defined by

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{\mathbf{x} \sim P_d(\mathbf{x})}[\log D(\mathbf{x})] + \mathbb{E}_{rnv \sim P_{rnv}(rnv)}[\log(1 - D(G(rnv)))] \tag{1}$$

The discriminator and the generator play a min–max game with the value function *V*(*D*, *G*). The discriminator tries to maximize *V*(*D*, *G*), its reward, while the generator tries to minimize it, in other words, to reduce the discriminator's reward.

The generator tries to minimize this value function, while the discriminator tries to maximize it. In a GAN, the generator receives a random noise variable *rnv* drawn from the distribution *Prnv*(*rnv*) and generates samples *G*(*rnv*), while the discriminator receives the original input data *x*. *D*(*x*) is the discriminator's estimate of the probability that a real instance *x* from the data distribution *Pd* is real, and *D*(*G*(*rnv*)) is the discriminator's estimate of the probability that a fake instance is real. The generator tries to create almost perfect images to fool the discriminator. In contrast, the discriminator tries to improve its performance in distinguishing real and fake samples until the samples generated by the generator cannot be distinguished from real data samples.
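To make the adversarial objective concrete, the following is a minimal TensorFlow sketch of one training step under Equation (1). The function and variable names are our illustrative choices, and the modified lightweight-GAN actually trains with the hinge loss of Section 4.2.3 rather than this vanilla objective.

```python
import tensorflow as tf

# Binary cross-entropy on raw scores implements the log terms in Equation (1).
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def train_step(generator, discriminator, g_opt, d_opt, real_images, latent_dim=100):
    # Sample the random noise variable rnv ~ P_rnv(rnv) from Equation (1).
    rnv = tf.random.normal([tf.shape(real_images)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(rnv, training=True)
        d_real = discriminator(real_images, training=True)
        d_fake = discriminator(fake_images, training=True)
        # D maximizes log D(x) + log(1 - D(G(rnv))), i.e., minimizes this BCE sum.
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
        # G tries to fool D; the common non-saturating form is used here.
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss
```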

#### 4.2.2. Generator Network

The generator needs a sufficiently deep network to synthesize good high-resolution images. A deeper network means more convolution layers and more training time for up-sampling. Considering the GPU available for training, we first fed in the original image data resized to 128 × 128 × 3. The images were scaled to [−1, 1] pixel values to match the generator's output range, which is required because the generator uses the tanh activation function. The generator network takes a 100 × 1 noise vector as input and generates fake samples. We used four convolution layers with ReLU activation to create high-quality synthesized images. Figure 3 illustrates the generator model architecture.

**Figure 3.** The architecture of the generator.
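As an illustration, a minimal Keras sketch of such a generator is given below. The 100-dimensional input, the four up-sampling convolutions with ReLU, and the 128 × 128 × 3 tanh output follow the description above, while the filter counts and kernel sizes are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    # 100-dim noise vector -> 128 x 128 x 3 image in [-1, 1] (tanh output).
    return tf.keras.Sequential([
        layers.InputLayer(input_shape=(latent_dim,)),
        layers.Dense(8 * 8 * 256),
        layers.Reshape((8, 8, 256)),
        # Four up-sampling convolution blocks with ReLU: 8 -> 16 -> 32 -> 64 -> 128.
        layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(16, 4, strides=2, padding="same", activation="relu"),
        # tanh keeps pixel values in [-1, 1], matching the input scaling above.
        layers.Conv2D(3, 3, padding="same", activation="tanh"),
    ])
```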

#### 4.2.3. Discriminator Network

Following the assumption that the information learned by the encoder and the discriminator overlaps, we partially amalgamated the encoder into the discriminator [22]. The main objective of the encoder is to learn a representation feature, whereas the discriminator aims to discover discriminating features.

$$\mathcal{L}^{\text{pixel}}_{\text{recons}} = \mathbb{E}_{q \sim D_{\text{encoder}}(\mathbf{x}),\; \mathbf{x} \sim I_{\text{real}}}\left[\|\kappa(q) - \tau(\mathbf{x})\|\right] \tag{2}$$

where *q* is the discriminator's intermediate feature map, the function *κ* processes *q*, and the function *τ* represents the processing of a sample *x* drawn from the real images *Ireal*.

Figure 4 illustrates the discriminator model architecture. Firstly, we resize the original image to produce the input *I*. Then, the main part of our discriminator acts as an encoder to extract a good image feature map, from which the decoder produces a reconstruction *I*′. The decoder consists of four convolutional layers that reconstruct the image at 128 × 128. Finally, the discriminator and decoder are trained together to minimize the reconstruction loss by matching *I*′ to *I*. This auto-encoding technique is a common strategy for self-supervised learning that has been shown to improve model robustness and generalization capabilities [23–25].

**Figure 4.** The architecture of the discriminator.

Recently, generative models have focused on combining new strategies with the GAN model. In many approaches, the authors combined a GAN and a VAE to generate good images [22]. In contrast, our proposed model is a pure GAN with a significantly simpler generator and discriminator plus an auto-encoding function. The auto-encoding training is used exclusively for discriminator regularization and does not involve the generator [26].

Here, a hinge adversarial loss is used for the GAN, incorporating SVM-style margins: only real and fake samples falling within the margins contribute to the loss. Artificial samples outside the margins, which partially incorporate false local patterns, are ignored in the generator training stage [27,28].

$$\mathcal{L}_{D} = -\mathbb{E}_{\mathbf{x} \sim I_{\text{real}}}[\min(0, -1 + D(\mathbf{x}))] - \mathbb{E}_{z \sim P(z)}[\min(0, -1 - D(G(z)))] + \mathcal{L}^{\text{pixel}}_{\text{recons}} \tag{3}$$

$$\mathcal{L}_{G} = -\mathbb{E}_{z \sim P(z)}[D(G(z))] \tag{4}$$
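A minimal sketch of these losses follows, assuming `d_real` and `d_fake` are raw (un-activated) discriminator scores. The norm in Equation (2) is shown as L1, which is our assumption since the text writes a generic norm.

```python
import tensorflow as tf

def pixel_recons_loss(decoded, target):
    # Equation (2): distance between the decoder's reconstruction of the
    # discriminator's feature map and the (resized) real image.
    return tf.reduce_mean(tf.abs(decoded - target))

def discriminator_loss(d_real, d_fake, recons):
    # Equation (3): hinge loss with SVM-style margins, plus the pixel
    # reconstruction term that regularizes the discriminator.
    real_term = -tf.reduce_mean(tf.minimum(0.0, -1.0 + d_real))
    fake_term = -tf.reduce_mean(tf.minimum(0.0, -1.0 - d_fake))
    return real_term + fake_term + recons

def generator_loss(d_fake):
    # Equation (4): the generator raises the discriminator's score on fakes.
    return -tf.reduce_mean(d_fake)
```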

#### *4.3. Traditional Data Augmentation Techniques*

In this section, we describe traditional data augmentation based on basic image manipulation techniques [29]. We also consider the issues raised by limited datasets and how data expansion can serve as an oversampling solution for class imbalance [30]. Class imbalance describes a dataset with a biased ratio of majority to minority samples.

• Flipping:

There are two types of flipping used for image transformation; horizontal flipping is more common than vertical flipping. This augmentation is one of the simplest to employ and has been shown to be effective on various datasets.

$$
\begin{bmatrix} p' \\ q' \\ 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} p \\ q \\ 1 \end{bmatrix} \tag{5}
$$

$$p' = -p, \quad q' = q \tag{6}$$

Horizontal flipping formulas are depicted in Equations (5) and (6).

$$
\begin{bmatrix} p' \\ q' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} p \\ q \\ 1 \end{bmatrix} \tag{7}
$$

$$p' = p, \quad q' = -q \tag{8}$$

Vertical flipping formulas are depicted in Equations (7) and (8).

• Rotation:

For rotation augmentations, the image is rotated right or left about an axis by an angle between 0 and 360 degrees. The rotation degree parameter significantly impacts the safety of rotation augmentations. Pixels outside the rotated area are filled with 0, and the rotation formula is given in Equation (9).

$$R = \begin{bmatrix} \cos(q) & \sin(q) & 0 \\ -\sin(q) & \cos(q) & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{9}$$

where *q* specifies the angle of rotation.

• Translation:

To avoid positional bias in the data, shifting the image left, right, up, or down is a valuable adjustment, forcing the neural network to look everywhere in the image to capture the object. Pixel values of the translated image remain within the [0–255] range.

$$t = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ t\_x & t\_y & 1 \end{bmatrix} \tag{10}$$

where in Equation (10), *tx* specifies the displacement along the *x* axis, and *ty* specifies the displacement along the *y* axis.

• Noise added:

Noise injection is an interesting augmentation technique that adds a matrix of random values, usually drawn from a Gaussian distribution, to the image. With this stochastic data expansion, the neural network sees the same image in slightly different versions. The difference can be seen as adding noise to the data sample, letting the neural network learn generalized features rather than overfitting the dataset.
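For illustration, the following sketch implements the four augmentations above with TensorFlow image ops. The library choice and the parameter values are our assumptions, and `tf.roll` wraps pixels around rather than zero-filling the vacated region.

```python
import tensorflow as tf

def traditional_augmentations(image):
    # image: float32 tensor of shape (H, W, 3) with values in [0, 1].
    flipped_h = tf.image.flip_left_right(image)                # Equations (5)-(6)
    flipped_v = tf.image.flip_up_down(image)                   # Equations (7)-(8)
    rotated = tf.image.rot90(image, k=1)                       # 90-degree case of Equation (9)
    translated = tf.roll(image, shift=[10, 10], axis=[0, 1])   # Equation (10); wraps around
    noisy = image + tf.random.normal(tf.shape(image), stddev=0.05)  # Gaussian noise injection
    return flipped_h, flipped_v, rotated, translated, noisy
```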

#### *4.4. Transfer Learning*

Transfer learning (TL) techniques are used to improve the performance of machine learning algorithms by reusing labeled data. TL learns from one or more source tasks and applies that knowledge to enhance learning in related target tasks; it has been studied extensively as a machine learning process for problem solving. TL starts from models that have already been trained on large datasets and retrains several layers of the model on a small training set. The initial layers of the pre-trained network are changed only if necessary, while the final layers are fine-tuned so the model can learn the characteristics of the new dataset [31]. According to the new task, already-trained models are retrained with a smaller new dataset and the model weights are modified, so the parameters of the new network are not built from scratch. DL algorithms can achieve high performance on many problems, but they need a lot of data at training time.

As a result, it can be helpful to reuse pre-trained models for similar tasks. We used two pre-trained models, InceptionV3 and Xception. The models, pre-trained on the ImageNet dataset [32], are fine-tuned on the PET bottle dataset. The most common method of fine-tuning is to delete the last fully connected layer of the pre-trained CNN model and replace it with a new fully connected layer whose size equals the number of classes in our dataset; the PET bottle dataset contains six categories. Finally, the suggested method meets the goal of providing excellent classification results with a small dataset.
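A hedged Keras sketch of this fine-tuning recipe is shown below. The global average pooling layer and the categorical cross-entropy loss are our assumptions for the six-class head, while the optimizer and learning rate follow the hyperparameters reported in Section 5.1.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Load InceptionV3 pre-trained on ImageNet, without its 1000-class head.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))

# Replace the deleted head with a new fully connected layer sized to the
# six PET bottle classes.
model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(6, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```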

#### 4.4.1. InceptionV3 and Xception

The pre-trained network models InceptionV3 and Xception were trained on millions of images from the ImageNet dataset. The InceptionV3 [33] and Xception [34] networks include 48 and 71 layers, respectively, and require a 299 × 299 × 3-pixel input image. The structures of InceptionV3 and Xception are shown in Figures 5 and 6. Inception addresses computational cost and efficiency by using asymmetric filters and bottlenecks and by replacing large filters with smaller ones. Xception is simpler and more efficient: by handling cross-channel and spatial correlations independently, it provides more specific and efficient outcomes. The Xception model also uses depthwise separable convolutions and cardinality to develop better abstractions.

**Figure 5.** InceptionV3 model architecture.

**Figure 6.** Xception model architecture.

#### 4.4.2. Ensemble Learning

Ensemble learning is a way of combining multiple models to benefit in terms of computation and performance. The results of an ensemble of deep neural networks are generally superior to those of a single model. Average ensemble learning, with the same weight allocated to each model, was used in this study.

$$P = \frac{\sum_{i=1}^{N} M_i}{N} \tag{11}$$

where, in Equation (11), *Mi* is the probability of model *i*, and *N* is the total number of models.

DL models with different architectures and complexities do not produce the same results. Therefore, it is convenient to assign larger weights to the better-performing model; in this way, the maximum output can be extracted from each model. The challenge is to find the correct combination of model weights. We used the grid search technique to solve this challenge, as shown in Figure 7. A total of 1000 weight combinations were tried, and the search procedure continued until all combinations had been checked. The approach finally provided the ideal weight combination that maximizes our chosen evaluation metric.

**Figure 7.** Grid search method for finding the weights.
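A minimal sketch of the weighted-average grid search for two base models follows. Evaluating 1000 evenly spaced weights matches the number of combinations reported above, while the accuracy criterion and the variable names are our assumptions.

```python
import numpy as np

def grid_search_weights(probs_a, probs_b, y_true, n_steps=1000):
    # probs_a, probs_b: (n_samples, n_classes) softmax outputs of the two
    # base models; y_true: integer class labels of the validation set.
    best_w, best_acc = 0.0, 0.0
    for w in np.linspace(0.0, 1.0, n_steps):  # 1000 weight combinations
        ensemble = w * probs_a + (1.0 - w) * probs_b  # weighted average
        acc = np.mean(np.argmax(ensemble, axis=1) == y_true)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```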

#### *4.5. Evaluation Metrics*

The performance of our model was evaluated using accuracy, precision, recall, and the F1-score, based on the confusion matrix and its four indicators: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).

Accuracy is calculated by dividing the number of true positives and true negatives by the total number of instances. Precision is the fraction of predicted positive classes that are actually positive. Recall is derived by dividing the true positives by the total number of actual positives. The F1-score is the harmonic mean of precision and recall. Equations (12)–(15) show the accuracy, precision, recall, and F1-score calculations.

$$Accuracy = \frac{(TP + TN)}{(TP + TN + FP + FN)}\tag{12}$$

$$Precision = \frac{TP}{(TP + FP)}\tag{13}$$

$$Recall = \frac{TP}{(TP + FN)}\tag{14}$$

$$F1\text{-score} = 2 \times \frac{Precision \times Recall}{(Precision + Recall)}\tag{15}$$
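For reference, these metrics can be computed with scikit-learn as in the sketch below. The macro averaging over the six classes is our assumption, since the text does not state the averaging strategy; the label arrays are toy placeholders.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 2, 3, 4, 5, 0, 1]  # ground-truth class labels (toy example)
y_pred = [0, 1, 2, 3, 4, 5, 1, 1]  # model predictions (toy example)

acc = accuracy_score(y_true, y_pred)                          # Equation (12)
pre = precision_score(y_true, y_pred, average="macro")        # Equation (13)
rec = recall_score(y_true, y_pred, average="macro")           # Equation (14)
f1 = f1_score(y_true, y_pred, average="macro")                # Equation (15)
```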

#### **5. Results**

#### *5.1. Experimental Setup*

In the first part of the experiments in this study, the modified lightweight-GAN model was trained for 500 epochs and generated synthetic images of PET bottles for each of the six categories. The weights of the generator and discriminator models were updated after each epoch to produce synthetic images as close as possible to the actual images. After network training, the PET bottle dataset has 4200 images, including the original images, synthetic images generated by the modified lightweight-GAN model, and images from traditional augmentation methods. In the second series of experiments, the pre-trained InceptionV3 and Xception models were trained using the original training set and a combination of the training set and the generated plastic bottle images. Later, we employed a weighted average ensemble, the IncepX-Ensemble model, to enhance classification performance. Samples of real plastic bottle images and synthetic images generated by the modified lightweight-GAN model are shown in Figure 8. For the training hyperparameter settings, we used binary cross-entropy as the cost function, a learning rate of 0.0001, and Adam as the optimizer, with 100 epochs and a batch size of 32 for every model.

**Figure 8.** Original plastic bottle images and synthetic plastic bottle images generated by modified lightweight-GAN.

Our dataset of 4200 images includes the original plastic bottle images and the images generated by the GAN model. We split this dataset into training, validation, and testing sets. The training set is given to the machine learning model to analyze and learn features; the validation set is a sample of data held out from model training and is used to estimate the model's performance while tuning its hyperparameters; the test set is not used for training and serves to determine whether the model's hypothesis is correct. In the experiment, we first divided the dataset into 60% for training and 40% for holdout test data. The holdout data were then further split into 10% of the full dataset for validation (25% of the holdout data) and 30% for testing (75% of the holdout data). Details of the experimental dataset are given in Table 2.

**Table 2.** Details of the dataset after data augmentation using both augmentation techniques.
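The split described above can be reproduced as in the following sketch, where the arrays are placeholders standing in for the 4200 images and their labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(4200, 128, 128, 3)    # placeholder for the 4200 images
y = np.random.randint(0, 6, size=4200)   # placeholder labels for six classes

# 60% training, 40% holdout.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.40, stratify=y, random_state=42)

# The holdout is split 25%/75% into validation and test sets, i.e., 10% and
# 30% of the full dataset, respectively.
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.75, stratify=y_hold, random_state=42)
```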


#### *5.2. Performance Metrics of GAN*

We used two metrics, the inception score (IS) and the Frechet inception distance (FID), to measure the quality of the generated images, as shown in Table 3.



**Table 3.** Quantitative comparison on our dataset—inception score (IS), Frechet inception distance (FID).

#### *5.3. Implementation Details*

The specifications used for performing the experiments are given in Table 4. We used the Windows operating system with a single GPU and 32 GB of RAM. We trained our model on TensorFlow version 2.6.0, CUDA Toolkit version 11.2, and cuDNN version 8.1.


**Table 4.** System components and specification.

#### *5.4. Classification Performance Details*

In Table 5, we show the performance of the pre-trained models InceptionV3 [37] and Xception [38] and our ensemble model IncepX-Ensemble, in order to determine how well the classifiers can identify plastic bottle types after being trained with both original and synthetic data. The results show that the accuracy of the models is enhanced when synthetic data generated by GAN models are used in training. Among all the compared models, our proposed IncepX-Ensemble model produced the best accuracy of 99.06%.

**Table 5.** Comparison of IncepX-Ensemble with other existing models.


Acc, Pre, Rec, and F1 refer to accuracy, precision, recall, and F1-score, respectively.

We also assessed the performance of classification models trained on the original data alone versus a mix of actual and synthetic data. We employed two different combinations of augmentation procedures for the plastic bottle images: Augmentation-1 employs the modified lightweight-GAN to produce synthetic data, while Augmentation-2 uses flipping, rotation, translation, and noise addition. We kept the total number of images the same in each case to ensure a fair comparison.

In Table 6, we show the performance of the transfer learning models with the traditional augmentation techniques. We also examined classification performance using the original data combined with synthetic images generated by our model, which produces better-quality images and performs better. We can notice that in the case of noise addition, accuracy is fairly low because of overfitting.


**Table 6.** Accuracy, precision, recall, and F1-score of different classification models using traditional augmentation methods and a combination of original and synthetic data.

Acc, Pre, Rec, and F1 refer to accuracy, precision, recall, and F1-score, respectively.

We evaluated our IncepX-Ensemble model on the ImageNet dataset in Table 7. We first trained the models with the original ImageNet data and tested them with actual data; the models can easily be adapted to support fine-tuning for classification tasks. We again used 60% of the dataset for training and 40% for testing, with the holdout data further split into 75% for testing and 25% for validation. The performance of the classification models using synthetic data, augmented data, and a mix of original and synthetic data was then determined using the same procedure. The images created by our proposed modified lightweight-GAN model are of higher quality, and all of the findings show that it performs quantitatively better than existing GAN models.

**Table 7.** Evaluation of our proposed model on the ImageNet dataset.


Acc, Pre, Rec, and F1 refer to accuracy, precision, recall, and F1-score, respectively.

#### **6. Conclusions**

Our aim is to develop an application-based system that automatically detects plastic bottle images. The proposed approach is simple: to overcome the small and imbalanced dataset, we first applied a modified lightweight-GAN to generate synthetic images of plastic bottles, and we then developed a transfer learning-based model, IncepX-Ensemble, to classify different plastic bottle images. The modified lightweight-GAN was used for data augmentation to enhance the dataset, and the proposed transfer learning-based model was trained and evaluated using both original and generated images. Finally, we designed a weighted average ensemble model named IncepX-Ensemble, tuning the influence of the base models with the grid search technique. The two transfer learning models show excellent performance individually, though in some cases they fail to classify plastic bottles correctly; to obtain improved performance, we therefore combined transfer learning with the weighted average technique to boost application performance. The obtained results indicate the algorithm's efficacy, with 99.06% accuracy. Future work may validate the proposed model on more diverse big data to evaluate recycling performance. We plan to use the model we developed to explore other datasets and waste management applications in the future, and we hope that this will play a positive role in plastic bottle waste management and environmental protection.

**Author Contributions:** Conceptualization, S.C. and D.H.; Formal analysis, S.C.; Funding acquisition, Y.-C.B.; Methodology, S.C. and D.H.; Writing—review and editing, S.C.; Investigation, Y.-C.B.; Resources, Y.-C.B.; Project administration, Y.-C.B. and Y.-W.K.; Supervision, Y.-C.B. and Y.-W.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was financially supported by the Ministry of SMEs and Startups (MSS), Korea, under the "Startup growth technology development program (R&D, S3125114)".

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

DL	Deep learning
GAN	Generative adversarial network
PET	Polyethylene terephthalate
TL	Transfer learning
SVM	Support vector machine
CNN	Convolutional neural network
IS	Inception score
FID	Frechet inception distance
TP/TN	True positive/true negative
FP/FN	False positive/false negative

#### **References**

