### *4.1. Dataset*

We created a dataset for training the waste classifier by collecting the most frequent waste items found in our university campus' bins and surveying students about the most common garbage objects thrown into the trash. We collected about 65 different waste items, which were inserted into the smart waste bin (SWB) for data collection. Since waste objects are not always in their pristine form when thrown away but are often dirty, distorted, torn, or crumpled, each item was inserted multiple times into the SWB. Each time, we changed the position of the object inside the waste disposal unit (WDU) and applied physical deformations to modify its shape. From the initial 65 items, we collected 3125 data observations, each composed of one image acquired by the camera sensor and a vector of measurements collected by the other, scalar sensors. Finally, we grouped objects of the same type together: as an example, all the observations of beverages in aluminium cans (e.g., Coke, Fanta Orange, Red Bull) are grouped in the class metal can. After this operation, the final dataset is composed of 40 classes, each with roughly 80 observations. As a last step, each item in the dataset is labelled with one of the five classes of trash: glass, paper, plastic, metal, and unsorted. We obtained 7 objects in the glass class, 9 in the paper class, 13 in the plastic class, 4 in the metal class and 7 in the unsorted class (see Table 1 for a complete list). Figure 8 shows a sample of the pictures used for the training dataset, taken by the bin's camera, while Table 2 reports the summary statistics for the data gathered by the scalar sensors, divided by class. As one can see from Table 2, the scalar sensors capture characteristics of specific materials, such as the conductivity of metals or the different transparency of paper and plastic. The complete dataset is made publicly available at https://tinyurl.com/SWB-dataset (accessed on 22 May 2021).
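To make the structure of an observation concrete, the sketch below shows one possible record layout; the field names and types are illustrative and do not necessarily match the published dataset.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    """One data-collection event inside the waste disposal unit (WDU)."""
    image_path: str             # picture acquired by the camera sensor
    inductive: int              # binary output of the inductive sensor (0/1)
    capacitive: int             # binary output of the capacitive sensor (0/1)
    photoelectric: List[float]  # real-valued readings of the photoelectric sensors
    object_class: str           # one of the 40 object classes, e.g., "metal can"
    material: str               # glass, paper, plastic, metal, or unsorted
```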


**Table 1.** List of objects contained in each waste class of the dataset.

**Figure 8.** A sample of the pictures used as training dataset. The objects were acquired by the waste sensing module directly on the white shelf of the waste disposal unit.

**Table 2.** Summary table reporting the per-class average and standard deviation obtained by the Inductive Sensor (IS), Capacitive Sensor (CS) and Photoelectric Sensor (PS).


### *4.2. Waste Classification*

We observe that the types of data returned by the different sensors vary considerably: the inductive and capacitive sensors return a binary value, the photoelectric sensors return a real value, and the camera produces an image. In the following, we first derive two different classification models, leveraging either the scalar sensor data or the images from the camera. Then, we explore two strategies to effectively fuse all this information into a single classification algorithm, which differ in terms of *when* data integration happens: at learning time or at prediction time.

### 4.2.1. Classification from Scalar Data

As a first step, we trained a classifier leveraging only the data retrieved by the scalar sensors. As a preprocessing step, each sensor's data was normalized to have zero mean and unit variance. We split the available data into train and test subsets according to stratified *k*-fold cross-validation with *k* = 5. The training data was given as input to a logistic regression classifier, using the object materials as labels. The performance of the resulting 5-class model is evaluated on the test folds, and Table 3 reports the results in the form of a confusion matrix computed over all test folds. As one can see from the table, waste classification from the scalar sensors alone already provides a good starting point, with an average accuracy of about 89.6%. Some classes have very high recognition accuracy: indeed, the glass, metal and paper classes are recognized with accuracy higher than 95%, given the unique properties of the materials and the way they interact with the available sensors (i.e., the inductive and capacitive sensors).
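A minimal scikit-learn sketch of this pipeline is shown below; the exact regularization settings and random seed are assumptions, as the paper does not specify them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

def evaluate_scalar_classifier(X: np.ndarray, y: np.ndarray) -> float:
    """X: (n_samples, n_sensors) scalar sensor readings; y: material labels."""
    accuracies = []
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, y):
        # Normalize to zero mean and unit variance (fit on the training fold only)
        scaler = StandardScaler().fit(X[train_idx])
        clf = LogisticRegression(max_iter=1000)
        clf.fit(scaler.transform(X[train_idx]), y[train_idx])
        accuracies.append(clf.score(scaler.transform(X[test_idx]), y[test_idx]))
    return float(np.mean(accuracies))
```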


**Table 3.** Test confusion matrix obtained for sensor-based classification.

### 4.2.2. Classification from Images

As a second step, we built an image-based waste classifier. We base our approach on a Convolutional Neural Network (CNN) classifier, given its proven effectiveness in image classification tasks. Training a CNN classifier from scratch while avoiding overfitting generally requires a massive amount of training images. Due to the relatively small size of our dataset, we rely on *transfer learning*: we start from a CNN image classifier pre-trained on the ImageNet dataset [44] and re-train only its last layers on our dataset in order to specialize it to the task of classifying trash. Since each CNN layer learns filters of increasing complexity, the earlier layers learn to detect basic features such as edges, corners, and textures, intermediate layers detect more complex patterns and object parts, and the final layers detect whole objects. Therefore, fine-tuning the last layers on our dataset while keeping the previous layers frozen enables us to reach an accurate model without needing a huge image dataset as input. Several pre-trained CNN models, differing in structure (number of layers, number of neurons per layer, etc.), are already available: in order to select the one that best fits our purposes, we performed fine-tuning and studied the resulting model accuracy as well as the Single Forward Pass (SFP) time (that is, the time it takes for the CNN to process an image and return the classification result). The following CNN models were considered for comparison: NASNet-A-Mobile, MobileNet-v2, MobileNet-v3-large, MobileNet-v3-small, ResNet-18, ResNet-34, ResNet-50, ResNet-101, GoogLeNet, ShuffleNet-v2-1.0, SqueezeNet-v1.1 and Inception-v3. Due to the large variability in the appearance of objects within each waste material class, tests were performed in the following way: we first split the dataset into train and test folds according to the same 5-fold cross-validation procedure used for the scalar sensor-based classifier. This time, however, we trained the CNNs using the object labels rather than the material labels. At inference time, we mapped each object back to its material class. All tests were performed on an Intel Core i7-6700HQ CPU, equipped with an NVIDIA GeForce GTX 950M GPU and 16 GB RAM. The accuracies obtained are illustrated in Figure 9: we selected the ResNet-18 model as the best compromise between accuracy and SFP time. Table 4 shows the confusion matrix obtained with the fine-tuned ResNet-18 model. As one can see, the average accuracy reaches about 93%, with no material class exceeding 95% accuracy.
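As an illustration, a minimal PyTorch sketch of the ResNet-18 fine-tuning described above might look as follows; which layers to unfreeze and the optimizer settings are assumptions, since the paper only states that the last layers were re-trained.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a ResNet-18 pre-trained on ImageNet
model = models.resnet18(pretrained=True)

# Freeze all layers ...
for param in model.parameters():
    param.requires_grad = False

# ... then replace the final fully connected layer with a 40-way output,
# one node per object class (material labels are recovered afterwards)
model.fc = nn.Linear(model.fc.in_features, 40)

# Optionally unfreeze the last residual block as well (an assumption)
for param in model.layer4.parameters():
    param.requires_grad = True

# Only the unfrozen parameters are passed to the optimizer
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```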

**Figure 9.** Material class accuracy vs. mean forward pass time.


**Table 4.** Test confusion matrix obtained for image-based classification.

### 4.2.3. Hybrid Classification

Looking at the results obtained classifying waste with scalar sensor or image data, it is clear that each method has its pros and cons. Scalar sensor data outperforms image-based classification for some materials (e.g., metal), while image-based classification obtains more uniform results across classes. In the following, we propose two different strategies to exploit the best features of the two approaches.

1. *Integration at prediction time:* a first approach consists of running the two classifiers in parallel and then taking a decision based on the lowest (training) classification error (Figure 10). Let *y* be the output of a classifier, taking a qualitative value in *C* = {glass, paper, plastic, metal, unsorted}, i.e., the output class. For each classifier, we compute the a posteriori misclassification error probability *P*(*x* ≠ *C* | *y* = *C*), *x* being the true class. To do this, we use Bayes' theorem:

$$P(x \neq C \mid y = C) = \frac{P(y = C \mid x \neq C)\,P(x \neq C)}{P(y = C)},\tag{1}$$

where *P*(*y* = *C* | *x* ≠ *C*), *P*(*x* ≠ *C*) and *P*(*y* = *C*) are the likelihood of misclassification for class *C*, the prior probability of the true class differing from *C*, and the prior output probability, respectively. We estimated these quantities from the (training) confusion matrix of each classifier. In case the two classifiers agree on the output class, the method obviously returns the common class *C*; in case they disagree, the class *C* with the lowest misclassification error is selected.

As an example, let *ys* = plastic and *yi* = glass be the outputs of the sensor-based and image-based classifiers, respectively. Assuming the values contained in Tables 3 and 4 as the probabilities learnt during training, we have:

$$P(x \neq \text{plastic} \mid y_s = \text{plastic}) = \frac{0.1038 \times 0.503}{0.675} = 4.65\%,\tag{2}$$

while

$$P(x \neq \text{glass} \mid y_i = \text{glass}) = \frac{0.1337 \times 0.179}{0.825} = 2.9\%.\tag{3}$$

The system will therefore select *yi* as the final class, since its associated error is lower.
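A sketch of this decision rule is given below; it assumes the per-classifier confusion matrices (as in Tables 3 and 4) have already been turned into the three probabilities of Equation (1) during training.

```python
def posterior_error(likelihood: float, prior_not_c: float, prior_output: float) -> float:
    """P(x != C | y = C) via Bayes' theorem, Equation (1)."""
    return likelihood * prior_not_c / prior_output

def fuse_predictions(y_s: str, stats_s: dict, y_i: str, stats_i: dict) -> str:
    """Select the class whose classifier has the lower misclassification error.

    stats_s / stats_i map each class C to the tuple
    (P(y = C | x != C), P(x != C), P(y = C)) estimated from the
    training confusion matrix of the respective classifier.
    """
    if y_s == y_i:                      # classifiers agree: nothing to decide
        return y_s
    err_s = posterior_error(*stats_s[y_s])
    err_i = posterior_error(*stats_i[y_i])
    return y_s if err_s < err_i else y_i
```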

**Figure 10.** Integration at prediction time.

2. *Integration at learning time:* a second approach is to train a new classifier whose input features come from all available sensors. To do this, we note that the last layer of the fine-tuned CNN consists of 40 nodes, where each node outputs a value between 0 and 1 representing the probability that the input image belongs to one of the 40 object classes. We treat these values as new features, which are fed to a regularized logistic regression classifier together with the scalar sensor measurements (Figure 11). The classifier is again trained according to *k*-fold cross-validation, using the waste materials as ground-truth labels.
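A minimal sketch of this feature-level fusion is shown below, assuming `cnn_probs` holds the 40 soft-max outputs of the fine-tuned CNN and `scalar` the normalized sensor readings; the regularization strength is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_hybrid(cnn_probs: np.ndarray, scalar: np.ndarray, materials: np.ndarray):
    """cnn_probs: (n_samples, 40) CNN output probabilities (one per object class);
    scalar: (n_samples, n_sensors) normalized scalar sensor readings;
    materials: (n_samples,) ground-truth material labels."""
    X = np.hstack([cnn_probs, scalar])              # concatenate both feature sets
    clf = LogisticRegression(C=1.0, max_iter=1000)  # L2-regularized by default
    clf.fit(X, materials)
    return clf
```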

**Figure 11.** Integration at learning time.

The results obtained on the test set for the two strategies are reported in Tables 5 and 6, for the integration at prediction time and at learning time, respectively. As one can see, both approaches improve performance compared to solely using the scalar sensor-based or image-based approaches. In particular, integration at prediction time obtains an accuracy of 96.12%, while the best result is obtained with the integration at learning time approach (97.37%). This is particularly promising, especially to cope with specific waste objects such as shattered glass. In this case (fortunately rare, according to our survey), relying solely on image-based recognition would be very difficult, given the high variance associated with images of glass fragments. Indeed, also using the scalar sensors may greatly improve the recognition accuracy.

**Table 5.** Test confusion matrix obtained for hybrid classification with integration at prediction time.

**Table 6.** Test confusion matrix obtained for hybrid classification with integration at learning time.


### *4.3. Waste Classifier Location*

The hybrid model with integration at learning time was exported and tested in three different scenarios, differing in where the classification takes place: locally on the bin's Raspberry Pi, on a cloud server, or on a Multi-access Edge Computing (MEC) server.


For every scenario, we measured the total recognition time of a waste item and the overall energy consumption of the SWB.

### 4.3.1. Recognition Time

The total recognition time is composed of the CNN execution time and the picture transfer time from the Raspberry Pi to the server (image acquisition time is assumed constant and thus discarded). For the local scenario, since the picture is processed internally on the Raspberry Pi, the total time equals the execution time of the CNN, which is around 3 s. For the cloud and MEC server scenarios, the total time also includes the transfer time of the picture from the bin to the server. In these cases, the Image Acquisition Module of the Raspberry Pi takes a picture of the trash and sends it to the cloud or MEC server using the Secure Copy Protocol (SCP); then, the server feeds the picture to the CNN, and the resulting label, along with the confidence level, is sent back to the SWB as an MQTT publish message. The time measurements are summarized in Table 7.
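The sketch below illustrates this flow on the bin side, using the system `scp` command for the transfer and paho-mqtt for the result; the host name, paths, topic, and message format are hypothetical.

```python
import json
import subprocess
import paho.mqtt.client as mqtt

SERVER = "mec.example.org"          # hypothetical MEC/cloud host
RESULT_TOPIC = "swb/bin01/result"   # hypothetical MQTT topic

def on_message(client, userdata, msg):
    # Hypothetical payload, e.g., {"label": "metal", "confidence": 0.97}
    result = json.loads(msg.payload)
    print("Predicted class:", result["label"], "confidence:", result["confidence"])

# Transfer the picture to the server over SCP
subprocess.run(["scp", "/tmp/waste.jpg", f"pi@{SERVER}:/data/incoming/"], check=True)

# Wait for the classification result published back by the server
client = mqtt.Client()              # paho-mqtt 1.x client API
client.on_message = on_message
client.connect(SERVER)
client.subscribe(RESULT_TOPIC)
client.loop_forever()
```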

**Table 7.** Total waste recognition time.


As one can see, the total time on the Raspberry Pi, taking over 3 s, is 5–6 times longer than in the other scenarios due to the low computational power available. Since the cloud and the MEC server have equivalent hardware specifications, the CNN recognition time is identical on the two machines. However, as the MEC server is located closer to the Smart Waste Bin than the cloud server, the data transfer time is greatly reduced. For this reason, we can see a clear improvement for the MEC approach in the Average Total Time. In any case, note that the total time is well below the average time of 5.3 s spent with traditional bins, as estimated from the requirement analysis. This means that the use of the SWB speeds up the average interaction with a human, while also reducing waste misplacement.

### 4.3.2. Energy Consumption

To measure the energy consumption of the SWB, we used an Adafruit INA219 High Side DC Current Sensor wired to an Arduino Uno and connected in series to the Raspberry Pi of the SWB. The method approximates the integral of power over the execution time, i.e., the sum of the instantaneous power samples taken by the sensor unit, multiplied by the sampling interval, as illustrated in Figure 12. Even though the unit takes measurements continuously, the samples are discrete, so the exact energy consumption is not measured but estimated. As one can see in Table 8, the energy consumption reflects the total recognition time. In particular, using the MEC, we can save up to 15% of energy compared to the cloud version.
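A minimal sketch of this estimation is shown below, assuming a time-ordered list of (timestamp, power) samples read from the INA219; here the trapezoidal rule is used as one way to approximate the integral.

```python
from typing import List, Tuple

def estimate_energy(samples: List[Tuple[float, float]]) -> float:
    """Approximate the integral of power over time from discrete samples.

    samples: (t, p) pairs, t in seconds, p in watts, ordered by time.
    Returns the estimated energy in joules (trapezoidal rule).
    """
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy
```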

**Figure 12.** Energy measurement during one normal operation cycle.

**Table 8.** Energy consumption of the Raspberry Pi when the waste classifier is run locally, on the MEC or on the cloud server.


### **5. Management Application**

The smart waste bin collects not only data related to waste classification but also a multitude of heterogeneous information, such as usage time and frequency, bag filling levels, and emptying times. Such additional metadata may be of enormous value for optimizing the waste collection task in a university campus, as well as in larger scenarios such as a city. For these reasons, all the information collected by the bin (working status, filling level for each waste class, etc.) is periodically transmitted to a management server, hosted remotely, which stores the data for advanced uses. In the following, we provide a brief description of such a management server: to fully test the functionalities offered, we also provide a Smart Waste Bin simulator (SWB-sim), which allows simulating a multitude of SWB instances, therefore providing enough data.

### *5.1. Smart Waste Bin Simulator (SWB-Sim)*

To overcome the practical issues of physically realizing multiple prototypes, we propose a simulator that virtually creates thousands of bins with different usage profiles, such as the frequency of interaction with people and the distribution of waste produced. The simulation software is written in Python and replicates an arbitrary number of Smart Waste Bin devices in a simulated environment with adjustable parameters. At program startup, a user-specified number of Smart Waste Bin devices are simulated, placed in an area of interest either randomly or in specific positions. Then, the simulation system runs the waste-generation engine. We leveraged Python's capabilities to generate a discrete-time simulation scenario that can run either in real time or in a sped-up fashion. Each bin's waste level at a certain time is modelled as a normal distribution, according to [45]. Indeed, the amount of waste deposited by each person in a bin can be represented as a stochastic variable; therefore, according to the central limit theorem, the sum of many stochastic variables of arbitrary probability distributions approaches a normal distribution. The simulator provides five template distributions for each bin, corresponding to different usage profiles from very low to very high usage, which in turn control the mean and variance of the associated normal distribution. Moreover, in order to adhere to real-world constraints, the simulator takes into consideration specific environment characteristics such as university closing times, holidays, and the most expected waste type. Periodically, the data generated by each bin in the simulator is transmitted to a remote server via MQTT, using the same message format as the prototype. The smart waste bin simulation system is available at: https://tinyurl.com/SWB-sim (accessed on 22 May 2021).
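The core of such a waste-generation engine can be sketched as follows; the profile parameters and the MQTT topic are illustrative and not the exact values used by SWB-sim.

```python
import json
import random
import paho.mqtt.client as mqtt

# Illustrative usage profiles: (mean, std) of the waste added per time step
PROFILES = {
    "very_low": (0.5, 0.2), "low": (1.0, 0.4), "medium": (2.0, 0.8),
    "high": (4.0, 1.5), "very_high": (8.0, 3.0),
}

class SimulatedBin:
    def __init__(self, bin_id: str, profile: str, capacity: float = 100.0):
        self.bin_id = bin_id
        self.mean, self.std = PROFILES[profile]
        self.capacity = capacity
        self.level = 0.0

    def step(self):
        # Each discrete time step, the deposited waste is drawn from a
        # normal distribution whose parameters depend on the usage profile
        deposit = max(0.0, random.gauss(self.mean, self.std))
        self.level = min(self.capacity, self.level + deposit)

    def report(self, client: mqtt.Client):
        payload = json.dumps({"id": self.bin_id, "fill": self.level / self.capacity})
        client.publish(f"swb/{self.bin_id}/status", payload)  # hypothetical topic
```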

### *5.2. Management Server*

All data produced and transmitted by the SWB, real or simulated, are received by a server application running on a public server. The main tasks of the management server are the following:


**Figure 13.** Smart waste bin backend dashboard implemented with Node-RED. The map shows the bins' positions as well as a graphical summary of the fill levels: red when at least one class is above 70%; yellow when at least one class is above 40%; green when all classes are at or below 40%.
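For reference, the marker-color rule described in the caption can be sketched as follows; this is a minimal illustration, as the actual dashboard implements the logic in Node-RED.

```python
from typing import Iterable

def marker_color(fill_levels: Iterable[float]) -> str:
    """Map per-class fill levels (fractions in [0, 1]) to a map marker color."""
    levels = list(fill_levels)
    if any(level > 0.7 for level in levels):
        return "red"
    if any(level > 0.4 for level in levels):
        return "yellow"
    return "green"
```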
