A Hybrid Model for Household Waste Sorting (HWS) Based on an Ensemble of Convolutional Neural Networks

Wu, Nengkai; Wang, Gui; Jia, Dongyao

doi:10.3390/su16156500

by Nengkai Wu ¹,†, Gui Wang ²,† and Dongyao Jia ¹,*

¹ School of Automation and Intelligence, Beijing Jiaotong University, Beijing 100044, China
² China Construction Second Engineering Bureau Ltd., Beijing 101121, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Sustainability 2024, 16(15), 6500; https://doi.org/10.3390/su16156500

Submission received: 20 June 2024 / Revised: 11 July 2024 / Accepted: 12 July 2024 / Published: 30 July 2024

Abstract:

The exponential increase in waste generation is a significant global challenge with serious implications. Addressing this issue necessitates the enhancement of waste management processes. This study introduces a method that improves waste separation by integrating learning models at various levels. The method begins with the creation of image features as a new matrix using the Multi-Scale Local Binary Pattern (MLBP) technique. This technique optimally represents features and patterns across different scales. Following this, an ensemble model at the first level merges two Convolutional Neural Network (CNN) models, with each model performing the detection operation independently. A second-level CNN model is then employed to obtain the final output. This model uses the information from the first-level models and combines these features to perform a more accurate detection. The study’s novelty lies in the use of a second-level CNN model in the proposed ensemble system for fusing the results obtained from the first level, replacing conventional methods such as voting and averaging. Additionally, the study employs an MLBP feature selection approach for a more accurate description of the HW image features. It uses the Simulated Annealing (SA) algorithm for fine-tuning the hyperparameters of the CNN models, thereby optimizing the system’s performance. Based on the accuracy metric, the proposed method achieved an accuracy of 99.01% on the TrashNet dataset and 99.41% on the HGCD dataset. These results indicate a minimum improvement of 0.48% and 0.36%, respectively, compared to the other methods evaluated in this study.

Keywords:

Household Waste Sorting (HWS); Multi-Scale Local Binary Pattern (MLBP); Convolutional Neural Network (CNN); deep learning; Simulated Annealing (SA)

1. Introduction

Waste production is a global crisis. The rate of waste production is increasing alarmingly with population growth, industrial advancement, and rising living standards [1]. Research indicates that the global generation of urban solid waste presently exceeds two billion tons annually, with projections suggesting a surge of 70%, amounting to approximately 3.4 billion tons by the year 2050 [2]. The implications are far-reaching: they affect human health through the emergence and spread of diseases; the environment through pollution of water, soil, and air; ecosystems through reduced biodiversity; and the economy, for instance when water pollution reduces agricultural output and creates serious economic problems [3].

The resolution of this crisis requires efficient waste management. Nowadays, many studies are being conducted to increase waste management efficiency and ultimately reduce the mentioned harmful effects [4].

Of all the factors that prevent sustainable development, the increasing rate of waste production is one of the most alarming. Inadequate waste disposal measures cause many problems, such as air, water, and soil pollution caused by toxic materials from decaying waste. Also, improper disposal affects the natural resources since the recyclable materials are dumped in the landfills. Furthermore, it creates breeding grounds for disease-carrying pests, posing health risks to humans. One of the most important fields that can help to make the world less damaging to the environment is the development of improved waste sorting systems. In order to fulfill this important need, this study proposes a new model for Household Waste Sorting (HWS).

Waste materials can be classified into four primary categories based on their sources. The first category, household waste, includes a variety of items commonly discarded in residential settings, such as kitchen waste, paper, plastic, glass, food materials, and old clothes. The second category, industrial waste, comes from industrial processes such as mining, construction, and manufacturing, including production residues, chemicals, rubber, scrap metal, and construction waste. The third category, hospital waste, consists of waste generated in healthcare facilities, including medical materials, disposable covers, and disinfectants. The final category, special waste, includes specific types of waste, such as electronic waste, vehicle waste, and large construction waste [5]. Due to the severe harmful effects of industrial, hospital, and special wastes, these categories have been the focus of extensive research in waste management.

Traditional waste management solutions are typically employed in the context of household waste. This often involves manual waste sorting to determine whether the waste can be converted into energy, composted, recycled, or sent to a landfill. However, this manual process has several drawbacks. It is not only costly and time-consuming, but it also poses a threat to human health due to potential exposure to harmful substances. Furthermore, the accuracy of manual sorting can be quite low, leading to inefficiencies in the waste management process [6].

Recent research has proposed using artificial intelligence to solve this sub-problem of waste management. The proposed technique is to use computer vision to segregate waste images into different categories. Compared to the traditional solution, it has higher accuracy, faster speed, lower cost, and eliminates the health risk to human labor.

In this study, a hybrid model based on Convolutional Neural Network (CNN) is proposed for HWS. The proposed method attempts to improve the existing challenges for automatic waste separation by combining learning models at different levels. This method includes three main steps.

Step 1: Image features are created as a new matrix through the Multi-Scale Local Binary Pattern (MLBP) technique, which can optimally display characteristics and patterns at different scales.

Step 2: An ensemble model at the first level merges two CNN models, each performing the detection operation separately.

Step 3: The final output is obtained through a second-level CNN model. This CNN model uses the knowledge gained by the first-level models and combines these features to perform a more accurate detection.

This research makes several key contributions to the field of HWS:

The introduction of a hierarchical ensemble model for accurate detection of various types of Household Waste (HW).
The utilization of local and global features of HW images to enhance detection accuracy.
The application of the Simulated Annealing (SA) algorithm for optimizing the learning components of the hierarchical ensemble system.
The provision of a generalizable method for detecting various types of HW.

Additionally, the novel aspects of this research include:

The use of a Convolutional Neural Network (CNN) model in the second level of the proposed ensemble system for fusing the obtained results of the first level. Instead of using conventional methods such as voting and averaging, a CNN model is used to fuse the results of the first-level models.
The utilization of an MLBP feature selection approach for a more precise description of the HW image features.
The application of the SA algorithm for fine-tuning the hyperparameters of the CNN models and optimizing the system’s performance.

The rest of the paper is organized as follows:

Section 2 reviews the research conducted to improve the HWS process using artificial intelligence. Section 3 describes the proposed method. Section 4 presents the research results and compares them with similar methods. Section 5 provides the conclusion.

2. Research Background

This section reviews recent research and highlights areas that require further investigation and refinement to advance waste classification systems using computer vision. More robust and practical solutions can be presented by examining research gaps.

Focusing on municipal solid waste management, Ruiz et al. investigated automatic image-based waste classification. The authors compared different CNN architectures (VGG, Inception, and ResNet) using the TrashNet dataset. Their work emphasized the potential of computer vision in waste management [7].

A new framework for waste classification based on deep transfer learning was proposed in [8], which facilitates automatic waste classification and recycling. This framework focuses on transfer learning but does not address the challenges of domain adaptation when transferring models across different waste environments.

The authors of [9] introduced a new multi-branch channel expansion network to classify waste images. By combining different CNN architectures, including VGG, Inception, and ResNet, the proposed model achieved improved accuracy in identifying different types of waste. Although the multi-branch channel expansion network improves classification accuracy, the study did not fully address the effect of imbalanced datasets.

In [10], X-DenseNet was proposed for garbage classification using visual images; this paper did not address the interpretability of the model. Methods can be explored to clarify the decision-making process, particularly for regulatory compliance and public trust.

Focusing on office waste, the authors in [11] used transfer learning with the Inception-V3 model. By fine-tuning the pre-trained model, the authors achieved effective waste classification. This study did not examine the effect of different lighting conditions; the robustness of the model under different lighting scenarios remains to be examined.

Wang et al. proposed an intelligent urban waste management system that integrates deep learning techniques and Internet of Things (IoT) devices. While the proposed system integrates deep learning and IoT for efficient waste management, the waste sorting process itself could be explored in more detail to increase efficiency [12].

Fu et al. presented an intelligent waste classification system that combines deep learning techniques with an embedded Linux system. The system classifies waste items using deep neural networks and contributes to efficient waste management practices [13].

The authors in [14] reviewed deep learning-based image recognition technology for garbage classification. Their research examined different neural network architectures and their effectiveness in identifying different types of waste, which helps in automating waste sorting processes.

Chen et al. [15] improved the ShuffleNet V2 architecture for garbage classification. Their improved model achieves satisfactory accuracy in identifying different categories of waste, which supports efficient waste sorting and recycling.

Zhao et al. proposed an intelligent waste classification system using MobileNetV3-Large architecture. This study did not examine the impact of dataset diversity. Generalizing the model for different categories of waste can be investigated [16].

In [17], a Garbage Classification Network (GCNet) was introduced that combines EfficientNetv2, Vision Transformer, and DenseNet for garbage image recognition. Although data augmentation is used to expand the dataset, this paper does not address the specific challenges posed by garbage images, such as inter-class similarity and complex backgrounds.

The authors in [18] proposed a Depth-Wise Separable Convolution Attention Module (DSCAM) to capture the inherent relationships in the garbage image features. It focuses on important information while ignoring interference. This method works better than classical approaches. In this paper, the trade-off between model complexity and accuracy when using DSCAM was not investigated.

Li et al. [19] focused on classifying garbage images in the context of human-robot interaction. They proposed a fusion feature representation model that combines different bases of waste classification. This method increases the classification accuracy. The paper did not investigate the effect of different lighting conditions or complex backgrounds on the classification of garbage images.

While existing research shows promise in using computer vision for household garbage classification, there are limitations. Models often struggle with generalizability to diverse waste and environments, imbalanced datasets with underrepresented waste types, and lack interpretability for regulatory purposes. Additionally, real-world challenges like varying lighting are not always considered. Ensemble methods combining multiple CNNs offer a potential solution. By leveraging the strengths of different CNNs, ensembles could achieve better generalizability, handle imbalanced datasets more effectively, potentially offer greater interpretability, and be made more robust to real-world variations through techniques like data augmentation. In the current paper, HWS is performed through a two-level weighted aggregation of three CNN models. At the first level, two CNN models form an ensemble system for output approximation. Each CNN model strives to perform the waste type detection operation based on different inputs. After determining the output of each of the models, at the second level, the proposed hybrid model employs a CNN to fuse the knowledge gained by the first-level models and determine the final output.

3. Research Methodology

Beyond powerful processing techniques, HWS through machine vision requires a rich dataset that covers a wide spectrum of patterns. Accordingly, this section first describes the datasets used to meet this requirement and then details the proposed HWS model built on them.

3.1. Dataset

The present study examines the HWS system using two separate data sources. The first data source is the TrashNet dataset [20], which is recognized as one of the most commonly used datasets in this field. This collection includes 2527 images with an RGB color system that classifies types of waste into six categories:

Plastic (482 samples)
Glass (501 samples)
Metal (410 samples)
Paper (594 samples)
Cardboard (403 samples)
Organic materials (137 samples)

All images have dimensions of 512 by 384 pixels and are recorded under identical lighting conditions with a white background. Figure 1 displays some sample images from this dataset.

The second dataset utilized in this study is HGCD [21]. This dataset was designed and collected to overcome the limitations of previous similar datasets, such as a small number of samples and a limited number of classes.

HGCD includes 15,150 images that classify HW into 12 categories: (1) Battery, (2) Fabric, (3) Plastic, (4) Biological waste, (5) Green glass, (6) White glass, (7) Brown glass, (8) Metal, (9) Paper, (10) Shoes, (11) Cardboard, and (12) Organic materials. All images in this dataset are in RGB color and were recorded under various lighting conditions. The image dimensions also vary. The diversity of shooting conditions and the larger number of target categories make waste sorting on this dataset more challenging than on the TrashNet dataset. In this study, the proposed method is evaluated on each of the two datasets separately.

3.2. Proposed Method

The proposed method identifies types of HW through a two-level weighted ensemble of three CNN models. At the first level, two CNN models form an ensemble system for approximating the output. Each of the CNN models forming this ensemble system strives to perform the waste type detection operation based on different inputs. After determining the output of each of the models forming this ensemble system, at the second level, a proposed composite model uses a CNN to merge the knowledge acquired by the first-level models and determine the final output. Accordingly, the proposed method can be divided into the following three steps:

Image representation;
Local detection based on first-level CNN models;
Determination of the final output based on the aggregation of results by the second-level CNN.

The structure of the proposed method is depicted in Figure 2. This method attempts to create an efficient system for accurately identifying types of HW through the collaboration of three CNN models. Each of these learning models processes a specific set of image-descriptive features to perform the learning and detection operation locally and then globally based on it. The first two CNN models form an ensemble system for the initial approximation of the target category. In this ensemble system, the first CNN model accepts the input image in the RGB color system as its input, while the second CNN model performs image analysis through the features extracted by the MLBP strategy. Each of these CNN models independently learns the patterns of HW detection in RGB images and MLBP matrices and strives to utilize the acquired knowledge to detect the type of waste in new samples.

As depicted in Figure 2, each of the CNN models in the first-level ensemble system describes the type of waste for each input sample in the form of a posterior probability vector. At the second level, the proposed composite model utilizes an independent CNN to combine these vectors and determine the system’s final output based on them. This CNN model, in addition to the posterior probability vectors produced by the two first-level CNN models, receives the feature maps extracted through them as input and performs the final system output detection based on the integration of these four feature sets. Replacing common strategies in ensemble systems (such as voting and averaging) with a CNN model for the problem of detecting types of HW has its advantages. The diversity of target categories and the performance difference of learning models make it impossible to effectively use the common strategies mentioned for this problem. Employing a high-level learning model to combine the results of the lower-level CNN models can effectively overcome these limitations. As shown in Figure 2, each CNN uses the SA algorithm to adjust its hyperparameters. In this case, the SA algorithm determines the optimal values for the hyperparameters of the convolution and pooling layers of each CNN model, such that based on the resulting configuration, the model’s validation error can be minimized.

3.2.1. Image Representation

The proposed method begins by creating the necessary inputs for each of the CNN models in the first-level ensemble system. This ensemble system includes 2 CNN models, each receiving image features in a different format. The first CNN model (shown as CNN_RGB in Figure 2) receives the input image without any changes in a matrix format with dimensions 100 × 100. The second model (CNN_MLBP) is fed through a matrix organized based on MLBP.

Local binary patterns are one of the most efficient techniques for texture processing applications, and they can appropriately describe texture characteristics in an image. Despite the widespread use of the LBP technique, this method has two major limitations that cast doubt on its effectiveness in object type detection. Firstly, the LBP model is highly sensitive to the presence of noise in images, and this limitation is of higher importance in natural images (non-laboratory conditions). Secondly, the LBP model describes texture patterns based on pixel neighborhood features, and for this reason, this strategy fails to extract more global object features. One solution to simultaneously overcome these limitations is to use LBP features with wider neighborhood radii. Through LBP features with a wider radius, it is possible to not only expand the domain of extracted features but also reduce the detrimental effect of noise on local features. However, it should be noted that applying LBP for N different neighborhood radius values results in N times the volume of extracted features, a significant volume of which is repetitive and, in addition to slowing down processing, can lead to a decrease in detection accuracy. In the proposed method to overcome this problem, a weighted combination of LBP with different radii is used in the form of MLBP. For this purpose, from each input image, 8 LBP matrices are extracted with radius values from 1 to 8, and then the resulting LBP matrices are fused using the following relationship:

F_{MLBP} = \sum_{r=1}^{8} \frac{LBP_{r}}{r^{2}}    (1)

where LBP_{r} represents the feature matrix extracted by LBP for a neighborhood radius r. Based on this relationship, LBP feature matrices for different radii are merged, and the contribution of the LBP features decreases with the inverse square of the radius (1/r²). It should be noted that for all LBP radius values, the number of neighboring pixels is set to 8. The resulting MLBP feature matrix is used as input for the CNN_MLBP model in the proposed ensemble system.
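As a rough illustration of Equation (1), the following minimal sketch computes an MLBP matrix for a grayscale image stored as a list of rows. Several details are assumptions made only for this example and are not specified in the paper: the function names lbp and mlbp are hypothetical, neighbors are sampled at the nearest pixel rather than by interpolation, coordinates outside the image are clamped to the border, and a neighbor contributes a 1 bit when its value is greater than or equal to the center pixel.

```python
import math


def lbp(image, radius, neighbors=8):
    """Basic LBP code map for one radius.

    Assumptions for this sketch: nearest-pixel sampling on the circle and
    clamping of out-of-range coordinates to the image border.
    """
    h, w = len(image), len(image[0])
    # Integer offsets of the sampling points on a circle of the given radius.
    offsets = []
    for p in range(neighbors):
        angle = 2.0 * math.pi * p / neighbors
        dy = int(round(-radius * math.sin(angle)))
        dx = int(round(radius * math.cos(angle)))
        offsets.append((dy, dx))

    codes = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                ny = min(max(y + dy, 0), h - 1)
                nx = min(max(x + dx, 0), w - 1)
                if image[ny][nx] >= center:
                    code |= 1 << bit
            codes[y][x] = code
    return codes


def mlbp(image, max_radius=8):
    """Equation (1): sum over r = 1..max_radius of LBP_r / r^2."""
    h, w = len(image), len(image[0])
    fused = [[0.0] * w for _ in range(h)]
    for r in range(1, max_radius + 1):
        codes = lbp(image, r)
        weight = 1.0 / (r * r)
        for y in range(h):
            for x in range(w):
                fused[y][x] += weight * codes[y][x]
    return fused
```

With this weighting, the radius-1 pattern contributes with weight 1, while the radius-8 pattern contributes with weight 1/64, so fine local texture dominates and wider neighborhoods act as a smaller correction.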

3.2.2. Local Detection Based on First-Level CNN Models

RGB and MLBP matrices are utilized as inputs to the CNN_RGB and CNN_MLBP models in the proposed ensemble system, respectively. The CNN models used in this ensemble system have a similar architecture in terms of the order and number of layers, but the configuration of the hyperparameters of each layer in these models is different from each other. The structure of the layers in the proposed CNNs is depicted in Figure 3.

As per Figure 3, each CNN model in the proposed ensemble system consists of 4 consecutive convolution blocks. In each block, after feature maps are extracted by 2D convolution layers, ReLU functions are used as activation functions and pass only positive values to the next layers. Each convolution block ends with a pooling layer, which reduces the dimensions of the feature maps. At the end of each CNN model, two consecutive fully connected layers are utilized. The feature maps are converted into a compact vector format by the first fully connected layer, and the likelihood that each instance belongs to one of the target categories is calculated by the next fully connected layer, whose dimensions are equal to the number of target classes. The stride of the convolution and pooling layers in the proposed CNN structure is set to 1 in both directions. Additionally, the dimensions of each pooling layer are determined by the width/length of the convolution layer that precedes it.

It is crucial to fine-tune a CNN model’s parameter values. However, a deep CNN requires more consideration during its development due to its larger number of components and complexities as compared to earlier artificial neural networks. It appears unfeasible to find an ideal setting for systems that have multiple parameters using conventional methods due to the complexity and time-consuming nature of tuning these parameters. For this purpose, each CNN’s hyperparameters are adjusted in this study using SA. This section’s remaining content explains how to use the SA method to tune each CNN’s parameter values. Three sets of parameters are present in the CNN_RGB and CNN_MLBP models in the suggested system:

Convolution layers C1 through C4: This set of parameters comprises the width and length of the convolution filters and the number of filters. To restrict the search space, the filter length and width are constrained to be equal; empirical findings suggest that square convolution filters can produce acceptable outcomes. Based on the dimensions of the input instances, the width/length parameter can take an integer value in the range [3, 9]. The number of filters can take an integer value in the range [8, 128] in steps of 8.
Pooling function type for layers P1–P4: The max, global, or average functions are the options available for each layer of pooling P1–P4.
Number of neurons in layer FC1: This parameter is defined as an integer in the range of [30, 100].

Various combinations for setting the mentioned parameters lead to a broad problem space, making it time-consuming to determine the optimal state among these states. In the introduced solution, SA is employed to tune these hyperparameters. Next, we first explain the structure of the solution vector in the SA algorithm, and after stating how to evaluate fitness, the steps to discover the best configuration by this mechanism are presented.

There are thirteen tunable hyperparameters (also known as optimization variables). Accordingly, each solution in SA is represented as a numeric vector of length 13. Eight variables set the hyperparameters of layers C1–C4: four correspond to the number of filters and four determine the filter length/width. Four further variables indicate the pooling function type of layers P1–P4, each taking one of the following values: 0 = max, 1 = global, or 2 = average. The final variable in the solution vector indicates the number of neurons in FC1.
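The encoding described above can be sketched as follows. This is only an illustration: the names random_solution and decode are hypothetical, and the ordering of the first twelve variables inside the vector (filter counts, then filter sizes, then pooling types) is an assumption, since the text only states that the final variable holds the number of FC1 neurons.

```python
import random

FILTER_SIZES = list(range(3, 10))       # filter width/length in [3, 9]
FILTER_COUNTS = list(range(8, 129, 8))  # number of filters in [8, 128], step 8
POOL_TYPES = {0: "max", 1: "global", 2: "average"}
FC1_MIN, FC1_MAX = 30, 100              # neurons in FC1


def random_solution(rng):
    """Draw one 13-variable solution vector (assumed ordering, see lead-in)."""
    counts = [rng.choice(FILTER_COUNTS) for _ in range(4)]
    sizes = [rng.choice(FILTER_SIZES) for _ in range(4)]
    pools = [rng.choice(sorted(POOL_TYPES)) for _ in range(4)]
    fc1 = rng.randint(FC1_MIN, FC1_MAX)
    return counts + sizes + pools + [fc1]


def decode(solution):
    """Turn a solution vector into a readable CNN configuration."""
    if len(solution) != 13:
        raise ValueError("expected 13 optimization variables")
    counts, sizes = solution[0:4], solution[4:8]
    pools, fc1 = solution[8:12], solution[12]
    if any(c not in FILTER_COUNTS for c in counts):
        raise ValueError("filter count outside [8, 128] with step 8")
    if any(s not in FILTER_SIZES for s in sizes):
        raise ValueError("filter size outside [3, 9]")
    if not FC1_MIN <= fc1 <= FC1_MAX:
        raise ValueError("FC1 size outside [30, 100]")
    return {
        "conv": [{"filters": c, "kernel_size": s} for c, s in zip(counts, sizes)],
        "pool": [POOL_TYPES[p] for p in pools],
        "fc1_neurons": fc1,
    }
```

For example, decode([8, 16, 32, 64, 3, 3, 5, 7, 0, 2, 0, 1, 50]) describes four convolution layers with 8, 16, 32, and 64 filters, pooling types max, average, max, and global, and 50 neurons in FC1.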

The CNN is first configured using the settings specified by the solution vector in order to assess each solution’s fitness. Next, using 20% of the training instances, the tuned CNN is trained. Lastly, the validation error ratio is applied to assess the solution’s fitness:

Fitness = \frac{V}{N}    (2)

where V is the number of instances for which the predicted label differs from the actual label, and N is the total number of validation instances. In the suggested strategy, the SA algorithm aims to provide a CNN topology that minimizes the fitness value. Considering the explained content, the steps of searching for the optimal configuration by SA are as follows:

1. Initialization:
   ○ Define the initial solution (candidate configuration) for the optimization problem.
   ○ Set a high initial temperature that allows for exploration of the search space.
   ○ Define a cooling schedule that determines how the temperature will decrease over time.
2. Iteration Loop:
   ○ This loop continues until a stopping criterion (zero fitness or a certain number of iterations) is met.
3. Generate Neighboring Solution:
   ○ Create a slight modification (perturbation) of the current solution. This could involve swapping elements, adding/removing elements, or making small adjustments to existing values.
4. Calculate Energy Difference:
   ○ Evaluate the difference in objective function value (Equation (2)) between the current solution and the neighboring solution.
5. Acceptance Probability:
   ○ Use the Metropolis criterion to determine whether to accept the neighboring solution. This involves a probability function based on the temperature and the fitness difference.
      ▪ If the neighboring solution improves the objective function (negative difference), it is always accepted.
      ▪ If the neighboring solution worsens the objective function (positive difference), it is still accepted with a probability that decreases as the temperature cools.
6. Update Solution:
   ○ If the neighboring solution is accepted, it becomes the current solution.
7. Decrease Temperature:
   ○ Apply the cooling schedule to decrease the temperature.
8. Repeat:
   ○ Go back to step 2 and continue iterating until the stopping criterion is met.

After executing the above steps, the determined hyperparameters in the best solution are used in the CNN, and the resulting configured CNN is trained using all instances.
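The search steps listed above can be sketched in a few lines. This is a generic sketch under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, the temperature follows T_k = T_0 / ln(1 + k) as one common form of logarithmic cooling, worse solutions are accepted with probability exp(-Δ / T), and the best solution seen so far is tracked separately. Because training a CNN is out of scope here, toy_fitness is a stand-in that reuses the error ratio of Equation (2) on a made-up target vector.

```python
import math
import random


def error_ratio(predicted, actual):
    """Equation (2): misclassified validation instances V divided by N."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def simulated_annealing(initial, neighbor, fitness, t0=1.0, max_iters=150, rng=None):
    """Minimize `fitness` starting from `initial`; returns (best, best_fitness)."""
    rng = rng or random.Random()
    current, current_fit = initial, fitness(initial)
    best, best_fit = current, current_fit
    for k in range(1, max_iters + 1):
        if best_fit == 0:  # stopping criterion: zero fitness
            break
        temperature = t0 / math.log(1 + k)  # assumed logarithmic cooling
        candidate = neighbor(current, rng)
        candidate_fit = fitness(candidate)
        delta = candidate_fit - current_fit
        # Metropolis criterion: always accept improvements, otherwise accept
        # with a probability that shrinks as the temperature cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_fit = candidate, candidate_fit
            if current_fit < best_fit:
                best, best_fit = current, current_fit
    return best, best_fit


def toy_fitness(solution, target=(5, 5, 5, 5)):
    """Hypothetical stand-in for training and validating a CNN."""
    return error_ratio(solution, target)


def toy_neighbor(solution, rng):
    """Perturb one randomly chosen variable by +/-1, clamped to [0, 9]."""
    i = rng.randrange(len(solution))
    changed = list(solution)
    changed[i] = min(9, max(0, changed[i] + rng.choice((-1, 1))))
    return changed
```

In the setting described in the text, fitness would configure a CNN from the solution vector, train it, and return its validation error ratio, which makes each fitness evaluation by far the most expensive step of the loop.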

3.2.3. Determining the Final Output Based on Aggregating Results by the Second-Level CNN

Solving multi-class problems like detecting types of HW based on ensemble learning strategies often comes with two main challenges. Firstly, this problem in most application scenarios involves multiple target classes. In this case, to use ensemble models, the number of classifiers needs to be increased so that the classifiers can reach a consensus in determining the label of samples, which results in a significant increase in computational load. Secondly, in problems like waste type detection, samples of different classes may have considerable similarities to each other. In this case, the classification models in an ensemble system can reflect striking performance differences. These two features have made it impossible to effectively use basic strategies like majority voting or averaging in this problem. For this reason, the introduced model utilizes a machine learning model to combine the outputs of deep learning models instead of using classic ensemble strategies like majority voting. This strategy increases the flexibility of the proposed ensemble model in combining local decision-making of partial classifiers. On the other hand, hierarchically integrating the knowledge gained by learning models can lead to the formation of a more powerful detection system. The proposed ensemble model utilizes a CNN model to combine the outputs of different models and detect the type of HW. This CNN model receives four categories of features as input and detects the type of HW by integrating them into a matrix. These inputs are:

Posterior Probability Vector of the CNN_RGB Model: This feature set is described as a vector whose length is equal to the number of target classes and is obtained from the output layer of the CNN_RGB model. Each numerical value in this vector corresponds to one of the target classes in the problem and indicates the probability of the input sample belonging to that target class.
Posterior Probability Vector of the CNN_MLBP Model: Similarly, this vector is obtained through the output layer of the CNN_MLBP model and reflects the knowledge gained by this learning model in distinguishing types of waste based on the MLBP matrices of images.
Features Extracted from the MLBP Matrix: This feature set indicates a set of features that the CNN_MLBP model uses to describe each of its input samples, which are obtained through the activation weights of the FC1 layer in the CNN_MLBP model.
Features Extracted from the RGB Matrix: Similarly, this feature set is obtained through the activation weights of the FC1 layer in the CNN_RGB model. These features are used to describe the input samples based on the RGB matrix of the images.

To present these features to the CNN_L2 model in the proposed combination system, all features are first concatenated into a 1 × N vector. Then, this vector is transformed into a matrix with dimensions ⌈√N⌉ × ⌈√N⌉, and the remaining entries in this matrix are filled with zero. The resulting matrix forms the input to the CNN. The CNN_L2 is configured using these instances and is trained so that it can perform the process of detecting types of HW more accurately. It should be noted that the configuration of the layers in this model is similar to the CNN_RGB and CNN_MLBP models, and SA is used in a similar way to configure this model.
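The construction of the CNN_L2 input can be illustrated with a short sketch. The function name to_square_matrix is hypothetical, and both the concatenation order and the row-major filling of the matrix are assumptions; the text only specifies that the four feature sets are concatenated into a 1 × N vector, reshaped to ⌈√N⌉ × ⌈√N⌉, and zero-padded.

```python
import math


def to_square_matrix(prob_rgb, prob_mlbp, feat_mlbp, feat_rgb):
    """Concatenate the four feature sets and zero-pad them into a square matrix."""
    vector = list(prob_rgb) + list(prob_mlbp) + list(feat_mlbp) + list(feat_rgb)
    n = len(vector)
    # ceil(sqrt(n)) computed with integer arithmetic to avoid rounding errors.
    side = math.isqrt(n)
    if side * side < n:
        side += 1
    padded = vector + [0.0] * (side * side - n)
    # Row-major layout (an assumption of this sketch).
    return [padded[i * side:(i + 1) * side] for i in range(side)]
```

As a worked example with made-up sizes, six target classes and FC1 layers of 40 and 50 neurons give N = 6 + 6 + 40 + 50 = 102, so the matrix is 11 × 11 and the last 19 entries are zeros.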

4. Results and Discussion

To assess the performance of the proposed method, we conducted experiments using MATLAB 2020a, the results of which are presented in this section. To improve the generalizability of the model, 10-fold cross-validation was used. That is, the data were divided into 10 folds of 10% each; in each iteration, one fold (10% of the samples) was held out for testing and the remaining data were used for training. The same SA parameter settings were used to optimize each of the three CNN models. The total number of iterations was 150, and the population size parameter was set to 100. The cooling schedule is logarithmic, and the acceptance probability function is of the Boltzmann type.
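The cross-validation protocol described above can be sketched as follows. The function name k_fold_indices is hypothetical, and shuffling with a fixed seed is an assumption; the text does not state whether the folds were shuffled or stratified by class.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

For the 2527 TrashNet images, this produces test folds of 252 or 253 images each, and every image is used for testing exactly once.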

The SA algorithm optimizes the configurable parameters of each CNN model, and the optimized parameters are then used to set up the networks for each dataset.

The resulting configurations are presented in Table 1, Table 2 and Table 3, one table per CNN model. The values shown come from a single cross-validation iteration. Because the configuration is tuned on the training data of each iteration, the selected parameters may differ from one iteration to the next.
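The SA settings described above (150 iterations, logarithmic cooling, Boltzmann-type acceptance) can be illustrated with a small single-state sketch. Several things here are assumptions rather than details from the paper: the cooling formula T_k = T0 / ln(1 + k), the acceptance rule exp(-Δ/T), the initial temperature, and the toy cost function that stands in for the validation error of a CNN. The population size of 100 is not modelled.

```python
import math
import random

def log_temperature(t0, k):
    """Logarithmic cooling (assumed form): T_k = t0 / ln(1 + k), for k >= 1."""
    return t0 / math.log(1.0 + k)

def boltzmann_accept(delta, temperature, rng):
    """Always accept improvements; accept a worse move with probability exp(-delta / T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)

def simulated_annealing(cost, neighbor, start, iterations=150, t0=1.0, seed=0):
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for k in range(1, iterations + 1):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        if boltzmann_accept(candidate_cost - current_cost, log_temperature(t0, k), rng):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost

def toy_cost(cfg):
    """Hypothetical stand-in for 'validation error of a CNN with this configuration'."""
    kernel, filters = cfg
    return (kernel - 5) ** 2 + ((filters - 24) / 8) ** 2

def toy_neighbor(cfg, rng):
    """Perturb kernel size and filter count within fixed, illustrative bounds."""
    kernel, filters = cfg
    kernel = min(9, max(3, kernel + rng.choice([-1, 0, 1])))
    filters = min(64, max(8, filters + rng.choice([-8, 0, 8])))
    return kernel, filters

best_cfg, best_cost = simulated_annealing(toy_cost, toy_neighbor, start=(9, 8))
print(best_cfg, best_cost)
```

In a real run, `cost` would train a network with the candidate layer settings (kernel sizes, filter counts, pooling types, FC1 width) and return its validation error, which makes each SA iteration far more expensive than in this toy.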

In the following, the proposed method is compared with two single-level models (CNN_RGB and CNN_MLBP) and three similar methods (GCNet, GScamKL-Net, and DSCAM) on two datasets, TrashNet and HGCD. The CNN_RGB model receives images in RGB format and classifies them at a single level. Similarly, the CNN_MLBP model receives images in MLBP matrix format and classifies them at a single level. The classification results are presented in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9.

Figure 4 provides a comparative analysis based on the average accuracy of the methods under consideration.

As depicted in Figure 4, the proposed method achieves a higher average accuracy on both datasets than the two single-level models, CNN_RGB and CNN_MLBP. For the TrashNet dataset, the proposed method achieved an accuracy of 99.01%, which is 0.47 percentage points higher than the next best method, DSCAM (98.54%), and also exceeds CNN_RGB (95.88%), CNN_MLBP (96.08%), GCNet (97.55%), and GScamKL-Net (98.14%).

Similarly, for the HGCD dataset, the proposed method achieved an accuracy of 99.41%, which is 0.36 percentage points higher than the next best method, DSCAM (99.05%), and surpasses CNN_RGB (95.27%), CNN_MLBP (96.01%), GCNet (97.46%), and GScamKL-Net (97.78%). These results indicate that the introduced model consistently achieves the highest accuracy among the compared methods, making it an effective solution for HWS.

The Confusion Matrix (CM) of the introduced model, in comparison with two models (CNN_MLBP and CNN_RGB), as well as the examined methods GCNet and DSCAM for the TrashNet dataset, is presented in Figure 5.

Figure 5 demonstrates the superior accuracy of the proposed method in detecting all types of waste. Specifically, the proposed method correctly identified 495 out of 496 instances of glass waste, achieved perfect accuracy with 550 correct classifications for paper, made only one misclassification among 399 instances of cardboard, accurately sorted 478 out of 479 instances of plastic waste, correctly identified 405 out of 406 instances of metal waste, and made only one error in 137 instances of trash. These figures underscore the proposed method’s superior capability in accurately detecting all types of waste, outperforming the other models in each category. The enhanced accuracy is particularly notable in the categories with the highest number of instances, such as paper and glass, where the proposed method nearly achieved perfect classification. This level of precision is critical for effective waste management and recycling processes.

Figure 6 illustrates the CM of the introduced model and DSCAM on the HGCD dataset. Given the large number of target classes in the HGCD dataset, the confusion matrices of the remaining methods have been omitted.

Based on the numerical results provided in the confusion matrix in Figure 6, the proposed method demonstrates superior performance in classifying each type of waste. Specifically, it correctly identified 1252 out of 1255 instances of paper waste, made only one misclassification among 1253 instances of cardboard, and correctly sorted all 1291 instances of biological waste. For plastic waste, the method achieved perfect accuracy, with 1295 correct classifications out of 1295 instances. It also performed well on metal waste, correctly classifying 1294 out of 1300 instances. Even in the glass category, the proposed method made only a few errors, correctly classifying 1291 out of 1300 instances. Lastly, for the cloth category, the proposed method outperformed DSCAM by correctly classifying 1295 items out of 1300, compared to DSCAM’s 1278 correct classifications.

Figure 6 underscores the proposed method’s superior capability in accurately detecting all types of waste, outperforming DSCAM in each category. The enhanced accuracy is particularly notable in the categories with the highest number of instances, such as paper and plastic, where the proposed method nearly achieved perfect classification. This level of precision is critical for effective waste management and recycling processes. Therefore, the proposed method can be considered more effective for waste classification tasks than DSCAM.

Figure 7 displays the average precision, recall, and F-Measure metrics for the two datasets, TrashNet and HGCD. This comparison is made against three examined methods (GCNet, GScamKL-Net, and DSCAM) and two single-level models, CNN_RGB and CNN_MLBP.

According to Figure 7, the proposed method achieves higher average precision, recall, and F-Measure than the single-level models CNN_RGB and CNN_MLBP on both datasets. This result indicates that the use of CNN_L2 at the second level is effective in improving performance.

Also, the average precision, recall, and F-Measure values of the proposed method on both TrashNet and HGCD are higher than those of the three similar methods under review. Higher recall means that fewer instances of each class are missed, and higher precision means that fewer samples are wrongly assigned to a class. Together, these raise the F-Measure, which is the harmonic mean of precision and recall. These results indicate that the proposed method generally performs better.
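To make the relationship between these metrics concrete, the sketch below derives them from a confusion matrix. The 3-class matrix is invented for illustration and is not taken from Figure 5 or Figure 6; macro-averaging (an unweighted mean over classes) is an assumption, since the averaging scheme is not specified here.

```python
import numpy as np

def macro_metrics(cm):
    """cm[i, j] = number of samples of true class i predicted as class j.
    Returns (accuracy, macro precision, macro recall, macro F-measure)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: TP + FP per class
    actual = cm.sum(axis=1)     # row sums: TP + FN per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f_measure = np.divide(2 * precision * recall, denom,
                          out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision.mean(), recall.mean(), f_measure.mean()

# Hypothetical confusion matrix with 50 samples per class.
toy_cm = [[48, 1, 1],
          [2, 45, 3],
          [0, 2, 48]]
acc, prec, rec, f1 = macro_metrics(toy_cm)
print(acc, prec, rec, f1)
```

In this toy case 141 of 150 samples lie on the diagonal, so accuracy and macro recall are both 0.94, while precision differs slightly because the column sums are unequal.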

Figure 8 provides a comprehensive performance comparison of the proposed method against two single-level models, CNN_RGB and CNN_MLBP, and three similar methods, GCNet, GScamKL-Net, and DSCAM. This comparison is conducted across two datasets, TrashNet and HGCD, and encompasses all the classes within the datasets. The performance evaluation is based on the precision, recall, and F-Measure metrics.

As illustrated by the surface plots in Figure 8, our proposed method exhibits superior performance in HWS. It consistently achieves higher values across various metrics for each class within the datasets. This method’s enhanced precision in detecting different classes, coupled with a reduced number of classification errors, contributes to a higher F-Measure metric, a harmonized measure of precision and recall. This improved accuracy is pivotal for the effective classification and sorting of waste, thereby positioning our method as a robust solution for HWS challenges. The results further underscore the method’s superior performance in addressing the complexities of HWS.

Figure 9 represents the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) value for the proposed method, alongside those of the methods it is compared with across the TrashNet and HGCD datasets. The ROC curves and AUC values serve as critical indicators of performance in classification tasks.

According to Figure 9, the proposed method’s ROC curve lies closer to the top left corner for both datasets. This indicates a higher True Positive Rate (TPR) at a given False Positive Rate (FPR): the method identifies more positive cases correctly while raising fewer false alarms. The AUC summarizes the ROC curve in a single value, and the higher AUC values of the proposed method signify better overall separation between the classes. In conclusion, the proposed method demonstrates a robust capability in HWS, outperforming the other models by achieving a higher TPR while maintaining a lower FPR, as evidenced by its higher AUC values. This analysis confirms the proposed method as a more effective solution for HWS challenges.
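As a rough illustration of how an ROC curve and its AUC are obtained, the sketch below sweeps a threshold over the scores of a single class treated as "positive" against all others. The one-vs-rest treatment, the toy labels and scores, and the function names are assumptions; the sketch also assumes no tied scores and at least one positive and one negative sample.

```python
import numpy as np

def roc_curve_binary(y_true, scores):
    """Return (fpr, tpr) arrays obtained by lowering the threshold one sample at a time."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted)    # true positives accepted so far
    fps = np.cumsum(~y_sorted)   # false positives accepted so far
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical posterior scores for one class (1 = sample belongs to the class).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_curve_binary(labels, scores)
print(auc_trapezoid(fpr, tpr))
```

Here 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9; a curve that hugs the top left corner corresponds to an AUC close to 1.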

Table 4 presents a detailed performance analysis of our proposed method in comparison to the two single-level models, CNN_RGB and CNN_MLBP, and three comparable methods, GCNet, GScamKL-Net, and DSCAM. The evaluation is based on the TrashNet and HGCD datasets. Accuracy (in percent) and F-measure, recall, and precision are tabulated for each method, providing a clear snapshot of their efficacy in HWS.

Based on Table 4, the proposed method performs better than the single-level models CNN_MLBP and CNN_RGB, with an accuracy of 99.0107% on the TrashNet dataset and 99.4125% on the HGCD dataset. The proposed method also performs better than the three methods under comparison (GCNet, GScamKL-Net, and DSCAM).
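The accuracy margins implied by Table 4 can be checked with a few lines of arithmetic. The accuracy values below are copied from Table 4; the dictionary layout and the helper name `margins` are illustrative choices.

```python
# Accuracy (%) per method, copied from Table 4.
accuracy = {
    "TrashNet": {"Proposed": 99.0107, "CNN_RGB": 95.8844, "CNN_MLBP": 96.0823,
                 "GCNet": 97.5465, "GScamKL-Net": 98.1401, "DSCAM": 98.5358},
    "HGCD": {"Proposed": 99.4125, "CNN_RGB": 95.2739, "CNN_MLBP": 96.0066,
             "GCNet": 97.4587, "GScamKL-Net": 97.7822, "DSCAM": 99.0495},
}

def margins(scores, reference="Proposed"):
    """Difference in percentage points between the reference method and each other method."""
    ref = scores[reference]
    return {name: round(ref - value, 2)
            for name, value in scores.items() if name != reference}

for dataset, scores in accuracy.items():
    print(dataset, margins(scores))
```

The smallest margin on each dataset is the one over DSCAM: 0.47 percentage points on TrashNet and 0.36 on HGCD.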

5. Conclusions

This study introduced a novel ensemble model based on CNNs for HWS. The performance of the model was evaluated using two datasets, TrashNet and HGCD, and compared with two single-level models, CNN_RGB and CNN_MLBP, as well as three similar methods, GCNet, GScamKL-Net, and DSCAM. The findings indicate that the proposed method enhances the accuracy of automatic waste sorting. Specifically, for the TrashNet dataset, the proposed method improves accuracy by approximately 3.1 percentage points over CNN_RGB, 2.9 over CNN_MLBP, 1.5 over GCNet, 0.9 over GScamKL-Net, and 0.5 over DSCAM. For the HGCD dataset, the proposed method improves accuracy by roughly 4.1 percentage points over CNN_RGB, 3.4 over CNN_MLBP, 2 over GCNet, 1.6 over GScamKL-Net, and 0.4 over DSCAM. These figures highlight the performance gain provided by the proposed method in comparison to the other models.

However, the research also identified certain limitations that could be addressed in future work. One of these is the increased complexity of the proposed model compared to conventional models. This complexity arises from the use of two-level aggregation that employs multiple deep-learning models for detection. While this model improves accuracy, it requires more computational resources. Additionally, the use of the SA algorithm, which pertains to the training phase and does not affect the testing phase, also contributes to the model’s complexity. Future work could focus on improving computational performance to address this issue.

The proposed method currently has some limitations in its application to unsorted municipal waste streams. The datasets used (TrashNet and HGCD) are themselves limiting: the lighting conditions in the images are controlled, there is very little overlap between objects, and the waste samples are not contaminated. In unsorted municipal waste, factors such as crushing, contamination, and variation in the size and shape of objects affect the model’s ability to perform well. Therefore, in future studies, we intend to focus on ways to incorporate the proposed model into robotics and waste management equipment, which would also make it possible to study the real-world utility of these methods outside laboratory settings. The model could, for example, be used to control robotic sorting arms in facilities or smart recycling bins designed for pre-sorting waste.

Additionally, the datasets employed in this study consist of a few waste categories with little variation. This limits the applicability of our model to real-world waste streams that contain a wide range of material types, sizes, and conditions. In future work, this limitation can be overcome by broadening the scope of the data and improving its quality to increase the model’s robustness. This includes integrating the existing databases (such as TrashNet and HGCD), defining new classes for different waste objects, and including real-world scenarios such as broken glass, mixed organic and inorganic waste, and variations in size and decomposition stage.

Author Contributions

Conceptualization, N.W. and G.W.; methodology, N.W.; software, G.W.; validation, D.J. and N.W.; formal analysis, N.W.; investigation, N.W. and D.J.; resources, N.W.; data curation, G.W. and D.J.; writing—original draft preparation, N.W. and G.W.; writing—review and editing, N.W. and D.J.; visualization, G.W.; supervision, N.W.; project administration, N.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data generated or analyzed during this study are included in this published article.

Conflicts of Interest

Author Gui Wang was employed by the company China Construction Second Engineering Bureau Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Behera, R.; Adhikary, L. Review on cultured meat: Ethical alternative to animal industrial farming. Food Res. 2023, 7, 42–51.
Alnezami, S.; Lamaa, G.; Pereira, M.F.C.; Kurda, R.; de Brito, J.; Silva, R.V. A sustainable treatment method to use municipal solid waste incinerator bottom ash as cement replacement. Constr. Build. Mater. 2024, 423, 135855.
Hajam, Y.A.; Kumar, R.; Kumar, A. Environmental waste management strategies and vermi transformation for sustainable development. Environ. Chall. 2023, 13, 100747.
Hoy, Z.X.; Phuang, Z.X.; Farooque, A.A.; Van Fan, Y.; Woon, K.S. Municipal solid waste management for low-carbon transition: A systematic review of artificial neural network applications for trend prediction. Environ. Pollut. 2024, 423, 123386.
Hayat, P. Integration of advanced technologies in urban waste management. In Advancements in Urban Environmental Studies: Application of Geospatial Technology and Artificial Intelligence in Urban Studies; Springer: Berlin/Heidelberg, Germany, 2023; pp. 397–418.
Vinti, G.; Bauza, V.; Clasen, T.; Tudor, T.; Zurbrügg, C.; Vaccari, M. Health risks of solid waste management practices in rural Ghana: A semi-quantitative approach toward a solid waste safety plan. Environ. Res. 2023, 216, 114728.
Ruiz, V.; Sánchez, Á.; Vélez, J.F.; Raducanu, B. Automatic image-based waste classification. In Proceedings of the From Bioinspired Systems and Biomedical Applications to Machine Learning: 8th International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2019, Almería, Spain, 3–7 June 2019; Proceedings, Part II 8. pp. 422–431.
Vo, A.H.; Vo, M.T.; Le, T. A novel framework for trash classification using deep transfer learning. IEEE Access 2019, 7, 178631–178639.
Shi, C.; Xia, R.; Wang, L. A novel multi-branch channel expansion network for garbage image classification. IEEE Access 2020, 8, 154436–154452.
Meng, S.; Zhang, N.; Ren, Y. X-DenseNet: Deep learning for garbage classification based on visual images. Proc. J. Phys. Conf. Ser. 2020, 1575, 012139.
Feng, J.-W.; Tang, X.-Y. Office garbage intelligent classification based on inception-v3 transfer learning model. Proc. J. Phys. Conf. Ser. 2020, 1487, 012008.
Wang, C.; Qin, J.; Qu, C.; Ran, X.; Liu, C.; Chen, B. A smart municipal waste management system based on deep-learning and Internet of Things. Waste Manag. 2021, 135, 20–29.
Fu, B.; Li, S.; Wei, J.; Li, Q.; Wang, Q.; Tu, J. A novel intelligent garbage classification system based on deep learning and an embedded linux system. IEEE Access 2021, 9, 131134–131146.
Guo, Q.; Shi, Y.; Wang, S. Research on deep learning image recognition technology in garbage classification. In Proceedings of the 2021 Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS), Shenyang, China, 22–24 January 2021; pp. 92–96.
Chen, Z.; Yang, J.; Chen, L.; Jiao, H. Garbage classification system based on improved ShuffleNet v2. Resour. Conserv. Recycl. 2022, 178, 106090.
Zhao, Q.; Xiong, C.; Liu, K. Design and Implementation of Garbage Classification System Based on Convolutional Neural Network. In Proceedings of the International Conference on 5G for Future Wireless Networks, Beijing, China, 21–23 April 2017; pp. 182–193.
Liu, W.; Ouyang, H.; Liu, Q.; Cai, S.; Wang, C.; Xie, J.; Hu, W. Image recognition for garbage classification based on transfer learning and model fusion. Math. Probl. Eng. 2022, 2022, 1–12.
Liu, F.; Xu, H.; Qi, M.; Liu, D.; Wang, J.; Kong, J. Depth-wise separable convolution attention module for garbage image classification. Sustainability 2022, 14, 3099.
Li, X.; Li, T.; Li, S.; Tian, B.; Ju, J.; Liu, T.; Liu, H. Learning fusion feature representation for garbage image classification model in human–robot interaction. Infrared Phys. Technol. 2023, 128, 104457.
Yang, M.; Thung, G. Classification of trash for recyclability status. In CS229 Project Report; Stanford University: Stanford, CA, USA, 2016; Volume 2016, p. 3.
HGCD: Household Garbage Classification Dataset. 2022. Available online: https://www.kaggle.com/datasets/mostafaabla/garbage-classification (accessed on 11 April 2023).

Figure 1. Some samples from the TrashNet collection for each category.

Figure 2. Diagram of the proposed method for HWS.

Figure 3. The architecture of the proposed CNN models.

Figure 4. Mean accuracy of the presented model compared to other methods in HWS for two datasets: (a) TrashNet and (b) HGCD.

Figure 5. CM of the introduced model and other models in classifying samples from the TrashNet dataset.

Figure 6. CM of the (a) introduced model and (b) DSCAM method in classifying samples from the HGCD dataset.

Figure 7. Average of precision, recall, and F-Measure metrics for datasets (a) TrashNet and (b) HGCD.

Figure 8. Performance comparison of different methods in HWS based on precision (first row), recall (second row), and F-Measure (third row) for every class in the two datasets TrashNet (left column) and HGCD (right column).

Figure 9. ROC curves resulting from HWS for datasets (a) TrashNet and (b) HGCD.

Table 1. Determined configuration for each layer of the CNN_RGB model through the TrashNet and HGCD datasets.

Layer	TrashNet Parameter Setting	HGCD Parameter Setting
Convolution1 (W × H × N)	9 × 9 × 8	9 × 9 × 8
Pooling1	Max	Average
Convolution2 (W × H × N)	5 × 5 × 24	8 × 8 × 16
Pooling2	Average	Max
Convolution3 (W × H × N)	4 × 4 × 48	6 × 6 × 28
Pooling3	Average	Max
Convolution4 (W × H × N)	3 × 3 × 64	4 × 4 × 32
Pooling4	Global	Max
FC1	50	40

Table 2. Determined configuration for each layer of the CNN_MLBP model through the TrashNet and HGCD datasets.

Layer	TrashNet Parameter Setting	HGCD Parameter Setting
Convolution1 (W × H × N)	7 × 7 × 8	8 × 8 × 8
Pooling1	Max	Average
Convolution2 (W × H × N)	5 × 5 × 16	6 × 6 × 16
Pooling2	Average	Average
Convolution3 (W × H × N)	5 × 5 × 32	4 × 4 × 16
Pooling3	Max	Average
Convolution4 (W × H × N)	3 × 3 × 40	3 × 3 × 32
Pooling4	Max	Global
FC1	35	30

Table 3. Determined configuration for each layer of the CNN_L2 model through the TrashNet and HGCD datasets.

Layer	TrashNet Parameter Setting	HGCD Parameter Setting
Convolution1 (W × H × N)	6 × 6 × 16	7 × 7 × 8
Pooling1	Max	Average
Convolution2 (W × H × N)	5 × 5 × 24	5 × 5 × 16
Pooling2	Max	Max
Convolution3 (W × H × N)	3 × 3 × 24	5 × 5 × 24
Pooling3	Average	Max
Convolution4 (W × H × N)	3 × 3 × 32	3 × 3 × 24
Pooling4	Max	Global
FC1	30	30

Table 4. Performance comparison of the introduced model and other algorithms.

Dataset	Method	Accuracy (%)	F-Measure	Recall	Precision
TrashNet	Proposed	99.0107	0.989	0.9902	0.9879
	CNN_RGB	95.8844	0.9503	0.957	0.9447
	CNN_MLBP	96.0823	0.9557	0.965	0.9483
	GCNet	97.5465	0.9734	0.9775	0.9697
	GScamKL-Net	98.1401	0.9802	0.983	0.9775
	DSCAM	98.5358	0.9843	0.9859	0.9827
HGCD	Proposed	99.4125	0.9941	0.9941	0.9941
	CNN_RGB	95.2739	0.9526	0.9529	0.9523
	CNN_MLBP	96.0066	0.9599	0.96	0.9598
	GCNet	97.4587	0.9745	0.9747	0.9744
	GScamKL-Net	97.7822	0.9776	0.978	0.9773
	DSCAM	99.0495	0.9906	0.9906	0.9906

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

