Article

Towards Indoor Suctionable Object Classification and Recycling: Developing a Lightweight AI Model for Robot Vacuum Cleaners

School of Architecture, College of Arts and Media, Southern Illinois University Carbondale, Carbondale, IL 62901, USA
Appl. Sci. 2023, 13(18), 10031; https://doi.org/10.3390/app131810031
Submission received: 31 July 2023 / Revised: 31 August 2023 / Accepted: 5 September 2023 / Published: 6 September 2023

Abstract

Robot vacuum cleaners have gained widespread popularity as household appliances. One significant challenge in enhancing their functionality is identifying and classifying small indoor objects suitable for safe suctioning and recycling during cleaning operations. However, research in this area faces several difficulties: the lack of a comprehensive dataset, size variation, limited visual features, occlusion and clutter, varying lighting conditions, the need for real-time processing, and edge computing. In this paper, I address these challenges by investigating a lightweight AI model specifically tailored for robot vacuum cleaners. First, I assembled a diverse dataset containing 23,042 ground-view perspective images captured by robot vacuum cleaners. Then, I examined state-of-the-art AI models from the existing literature and carefully selected three high-performance models (Xception, DenseNet121, and MobileNet) as candidates. Subsequently, I simplified these three models to reduce their computational complexity and overall size. To compress the model size further, I applied post-training weight quantization to the simplified models. In this way, the proposed lightweight AI model strikes a balance between object classification accuracy and computational complexity, enabling real-time processing on resource-constrained robot vacuum cleaner platforms. I thoroughly evaluated the proposed AI model on the diverse dataset, demonstrating its feasibility and practical applicability. The experimental results show that, within a small memory budget of 0.7 MB, the best model is L-w Xception 1 with a width factor of 0.25, which achieves an object classification accuracy of 84.37%. Compared with the most accurate state-of-the-art model in the literature, the proposed model attains a 350-fold memory size reduction while incurring only a slight decrease in classification accuracy of approximately 4.54%.

1. Introduction

Thanks to the integration of artificial intelligence (AI), modern robot vacuum cleaners can autonomously clean and mop indoor floors. However, due to their limited suction power, robot vacuum cleaners may struggle with larger or heavier objects, such as large food spills, bulky items, or scattered toys. Therefore, during the cleaning process, when a robot vacuum cleaner encounters an object, it performs a fast binary classification to determine whether the object is cleanable or non-cleanable [1,2,3,4,5]. Cleanable objects typically consist of dust, paper scraps, and lightweight debris, all of which can be effectively removed by the vacuum’s suction mechanism. Non-cleanable objects, on the other hand, are items that a robot vacuum cleaner cannot handle and that may cause malfunctions if suctioned, such as large furniture, electronic devices, or toys. The utilization of AI techniques ensures that the robot vacuum cleaner effectively cleans all room areas while adjusting its navigation path to avoid getting stuck or causing damage.
At present, robot vacuum cleaners have limited object recognition capabilities, often restricted to simple binary classification (cleanable vs. non-cleanable) and lacking the ability to accurately identify specific types of small debris. However, there is growing anticipation that future robot vacuum cleaners will become more advanced, exhibiting enhanced intelligence and the ability to identify and collect various types of small objects [6,7,8], including items like cat food, dog food, green beans, mung beans, and rice. These advancements would empower robot vacuum cleaners to pick up and recycle small solid objects such as pet food, thereby making significant contributions to maintaining cleanliness and hygiene in indoor environments. For example, pet food scattered on the floor is a common occurrence resulting from natural feeding behaviors: cats and dogs use their paws or mouths to push food out of their bowls or accidentally spill it while eating. If the pet food on the floor is dry and free from contamination, an advanced robot vacuum cleaner may collect and recycle it. Although it is hard to estimate the exact amount of pet food wasted annually, it is widely believed to be substantial. Recycling wasted pet food thus contributes to environmental conservation, resource efficiency, cost savings, and sustainable practices, making it a beneficial and responsible approach to waste management.
Due to current technological limitations, robot vacuums have relatively limited sensing and object classification capabilities [9]. Prior studies demonstrated the capability of picking up trash based on binary classification. Specifically, current robots can only perform simple classifications based on predefined rules or sensor feedback to determine which objects can be suctioned and which should be avoided or left for human handling. This simplistic object classification restricts the robot vacuum’s ability to operate effectively in complex environments. With ongoing technological advancements and the development of machine learning algorithms, however, future robot vacuums are expected to possess more advanced sensing and classification capabilities, enabling them to accurately identify and handle various types of objects: not only performing their primary cleaning function, but also identifying, classifying, and recycling indoor trash objects. This development holds the potential to revolutionize how people manage waste within their homes. Once the robot has identified and classified the trash objects, it can take appropriate actions, such as segregating the trash into separate compartments within the robot or depositing it into containers designated for recycling, employing suction, grabbing, or other mechanical means to handle the objects efficiently. Overall, this research can contribute to better recycling efforts, reducing the burden on landfills, and promoting sustainable practices.
As mentioned above, indoor suctionable object classification and recycling refers to the process of identifying and categorizing objects in indoor environments that are suitable for suctioning or removal by a vacuum cleaner, and subsequently recycling or appropriately disposing of them. At present, this research faces several challenges [10]:
Absence of a Comprehensive Dataset: The lack of a comprehensive and diverse data collection poses a challenge in achieving accurate indoor suctionable object classification. Significant effort is required to collect a wide range of ground-view perspective images captured by robot vacuum cleaners, encompassing various object types, lighting conditions, and cluttered environments [11].
Edge Computing: Due to constraints on size, weight, and power consumption, the computational resources in robot vacuum cleaners are limited compared to traditional computing devices like computers or smartphones [4,5]. Hence, vacuum cleaner manufacturers usually optimize these resources (for example, a small CPU, micro-controller, and memory) to strike a balance between performance, efficiency, and cost-effectiveness, enabling the robot vacuum cleaners to perform their primary cleaning tasks effectively. More advanced AI tasks, such as multi-class object recognition, can be offloaded to powerful cloud servers, where data are processed and the results sent back to the robot vacuum cleaner for execution. However, edge computing inside vacuum cleaners offers several benefits over cloud-based computing, namely low latency, reduced dependence on internet connectivity, privacy and security, real-time decision making, and cost savings [12]. First, edge computing processes data locally on the robot vacuum cleaner itself, reducing the time taken to send data to a cloud server and receive the results. This low latency is crucial for real-time tasks like object classification and obstacle avoidance, enabling the robot to respond quickly to changing environmental conditions. Second, robot vacuum cleaners equipped with edge computing capabilities can operate autonomously even with limited or no internet connectivity, ensuring uninterrupted functionality regardless of internet outages or network disruptions. Third, edge computing keeps sensitive data and processing tasks within the device itself, reducing the need to send data to external servers; this enhances data privacy and security, because there is less exposure to potential information leaks or unauthorized access [13]. Fourth, local processing enables robot vacuum cleaners to make critical decisions in real time without relying on cloud services, which is especially important for tasks like immediate navigation adjustments or emergency responses to objects. Fifth, with edge computing, there is less reliance on expensive cloud-based services and high-speed internet connections, resulting in potential cost savings for users [14].
Limited Visual Features: Small suctionable objects often lack distinctive visual features that can aid in classification. Unlike larger objects with prominent characteristics, small objects may appear similar in color or shape, making it difficult for AI algorithms to differentiate between types. In addition, infrared (IR) sensors encounter difficulties on dark flooring or carpets, because dark surfaces return little of the infrared signal. This issue affects all robotic vacuum models on the current market that incorporate IR sensor technology: dark surface coloration reduces their capacity to capture visual features.
Occlusion and Clutter: In indoor environments, small objects can be partially or fully occluded by other objects or hidden within cluttered backgrounds. Such occlusion and clutter pose challenges for object recognition algorithms, which may struggle to detect and classify small objects accurately when they are not fully visible.
Real-Time Processing: Real-time processing demands quick and efficient execution, which is challenging on resource-constrained devices like robotic vacuum cleaners that have a limited memory footprint, power budget, and computational resources. Sophisticated models that achieve high accuracy in small object classification tend to be computationally expensive and, hence, power hungry. Striking a balance between model complexity and accuracy is crucial to meet the real-time processing demands and energy budget for running AI on these edge devices.
Size Variation: Small suctionable objects can vary significantly in size, shape, and texture. For example, cat food pellets or kibble come in various sizes and shapes, from small and round to larger, irregular pieces. This diversity makes it hard to develop an effective classification model that accurately identifies all types of small suctionable objects. The variation in appearance and characteristics requires robust AI algorithms capable of handling different object attributes.
Varying Lighting Conditions: Adequate lighting is essential for the visibility of small objects. Under good lighting conditions, small objects are more easily distinguishable from the background, and their features stand out with better contrast. Under poor lighting conditions (for example, harsh shadows), however, it is challenging for AI models to discern relevant features. Conversely, strong sources of light, such as direct sunlight or artificial light, can cause highlights and glare on small objects; these reflections may overwhelm the objects’ features, leading to incorrect predictions from AI models.
The seven research challenges above also constitute the limitations of the study and form the central focus of this paper. The main contributions and innovations of this paper are as follows. First, I gathered videos and pictures taken by the built-in cameras of robot vacuum cleaners and used them to create a diverse dataset for small suctionable indoor object classification. The dataset encompasses 23,042 ground-view perspective images captured in dining rooms, kitchens, and living rooms in various houses and under different lighting conditions. This approach ensures data diversity in complex indoor scenes. Moreover, the dataset contains ample image data (at least 2000 images in each of the eight categories: rice, cat/dog food, sunflower seed shell, red bean, soybean, green bean, mung bean, and millet) with roughly balanced classes, making it suitable for effective training and classification. Second, I examined state-of-the-art AI models, including Xception, Inception-V3, LeNet-5, AlexNet, VGG-16, ResNet, DenseNet, EfficientNet, ShuffleNet, NasNetMobile, and MobileNet, and carefully chose the top three models among them as candidates. Then, I simplified these three candidate models and applied weight quantization to minimize memory usage and computational complexity, ensuring their deployability on resource-constrained robot vacuum cleaner systems. Through these model simplification and memory reduction techniques, the memory sizes of all three candidate models were substantially decreased; the best model was compressed to only 0.7 MB, comfortably fitting within the memory budget limitations of robot vacuum cleaners. According to the experimental results, the proposed AI model achieved an object classification accuracy of approximately 84.37%, only 4.54% lower than the most accurate existing AI model in the literature, while reducing memory usage by 350 times (from 244.9 MB down to 0.69 MB).
The rest of this paper is organized as follows. Section 2 introduces an overview of the related work concerning state-of-the-art deep AI models. Section 3 details the preparation of the diverse dataset and outlines the experimental setup. Section 4 describes the proposed lightweight AI model development procedures, including the state-of-the-art model examination, model simplification, and weight quantization. Section 5 provides the experimental setup, the results, and a comprehensive comparison with the state-of-the-art works in the literature. Section 6 concludes the paper and discusses potential avenues for future research.

2. Related Work

To the best of our knowledge, there has been no prior research on indoor small suctionable object classification for robot vacuum cleaners, and the majority of the existing literature focuses on outdoor trash or garbage classification [14,15,16,17,18,19,20]. In this section, I conduct a review of the state-of-the-art AI models employed for outdoor garbage classification. By studying these works, I gained valuable insights that aided in designing specialized AI models for indoor suctionable small object classification, specifically tailored for robot vacuum cleaners.
The authors of [15] introduced a novel robot system whose primary objective is to automate the collection of litter and garbage from grassy areas. The system leverages deep learning techniques to enable the robot to identify and classify various types of objects commonly found as litter in outdoor environments, including bottles, cans, cartons, plastic bags, and waste paper. Throughout the research and experimentation, the ResNet34 model demonstrated remarkable performance, achieving an accuracy rate of up to 96% in real-world scenarios. ResNet34 showed impressive capabilities in learning complex features from images, making it promising for the challenging task of small suctionable object classification in indoor environments. In ref. [16], the researchers introduced an intelligent waste material classification system built on the ResNet50 convolutional neural network to simplify and optimize the waste classification procedure. The system categorizes waste into distinct groups, including glass, metal, paper, and plastic. During testing on the trash image dataset, the proposed system attained an impressive accuracy of 87%. In ref. [17], the researchers introduced a novel approach for waste object detection in unstructured environments using Mask R-CNN with a ResNet101 backbone. The proposed method effectively categorizes waste materials such as plastic, paper, glass, and metal, and the experimental findings revealed a reported accuracy of 97% for the Mask R-CNN model.
In ref. [18], using a dataset consisting of 2527 trash images, the MobileNet model was trained to classify common trash categories, including glass, paper, cardboard, plastic, and metal, among others. In addition, the authors used the Gzip compression algorithm to compress the weights of the trained MobileNet models. Gzip is a widely used data compression method that reduces the size of the model by encoding the weights in a more compact form; unlike 8-bit weight quantization, it has no direct impact on the precision of the model’s parameters. The authors of [18] reported a test accuracy of 87.2% for the MobileNet model. In ref. [14], the authors used three popular AI models (MobileNetV2, NasNetMobile, and EfficientNetB0) for edge computing towards waste object classification. The dataset consists of 10 waste categories, encompassing items like food bowls, food boxes, glass bottles, metal cans, plastic bags, plastic bottles, snack wraps, plastic cutlery, and tetrapacks. In comparison to the studies in refs. [15,16,17,18], the three selected models achieved comparable test accuracy while consuming less memory.
In ref. [19], the authors proposed using an enhanced version of the ShuffleNetV2 architecture for garbage classification. The authors explain how these enhancements lead to an improved accuracy and computational efficiency, making the model well suited for garbage classification applications. The experimental results show the test accuracy to be about 98% while the number of model parameters is only 1.3 million. In ref. [20], the authors developed a unique dataset and conducted a comparative analysis of various deep learning methods for waste classification. The newly constructed trash dataset contains more than 8100 images encompassing seven distinct categories of objects. The authors showed that the most effective model is EfficientNetB3, which achieved the highest test accuracy of 92.87%.
In ref. [21], the authors used the Inception-V3 model for classifying office garbage (such as cans, bottles, milk boxes, paper cups, paper, and batteries). The Inception-V3 model, developed by Google, builds on the innovative concept of using multiple filter sizes in parallel to capture information at various scales within the input image. This enables Inception-V3 to achieve remarkable accuracy in object classification while efficiently handling the trade-off between depth and computational resources. The experimental results show a test accuracy of 95.33% with around 21.8 million model parameters. In ref. [22], the authors used simplified Xception models for garbage image classification. By removing part of the network layers, lightweight Xception models were developed with a smaller number of trainable parameters. This makes them well suited to tasks that require high efficiency, such as real-time or resource-constrained applications, while still benefiting from the architectural advancements of the Xception family. As a result, the reported object classification accuracy reached 94.34%.
From the above-mentioned discussion, it is evident that these existing research works [14,15,16,17,18,19,20,21,22] do not fully meet the demand for lightweight AI models catering to the classification of small suctionable objects in resource-limited robot vacuum cleaner systems. To date, two primary obstacles have impeded the progress in this area. The first obstacle refers to the lack of a comprehensive and diverse dataset, specifically focusing on ground-view videos and images collected from robot vacuum cleaners. For training a robust AI model for small suctionable object classification, it is crucial to have a large and varied dataset that encompasses different environmental conditions, lighting variations, object sizes, and cluttered scenes typically encountered by robot vacuum cleaners during their operation. However, such a dataset tailored for indoor cleaning scenarios involving robot vacuum cleaners is not readily available. To overcome this obstacle, researchers need to devote considerable efforts to collect and create such a dataset. It is important to note that the study [23] offers a dataset comprising 14 categories, but these categories, including bed, sofa, cabinet, stool, table, close stool, trashcan, slippers, wire, socks, carpet, book, feces, and curtain, do not encompass small suctionable objects. As a result, the dataset provided in ref. [23] is not suitable for small object classification and recycling purposes. Another obstacle lies in the fact that, even though some of these models are considered lightweight, their memory requirements still exceed the limitations of low-cost CPUs or micro-controllers. For instance, micro-controllers like the STM32F7 series typically offer a memory capacity of only 512 KB or 1 MB [24], which is insufficient to accommodate these AI models. Hence, the critical objective of this paper was to devise lightweight AI models tailored for small suctionable object classification and recycling, ensuring both a high test accuracy and minimal memory footprint.

3. Dataset Preparation and Experimental Setup

In order to detect and recognize objects during cleaning operations, robot vacuum cleaners rely on integrated front-facing cameras to capture ground-view perspective videos. Due to the unique ground-view perspective and uncommon object categories (for example, soybean, cat/dog food, and millet), I could not find any appropriate public dataset that fits our needs in this research. None of the existing datasets in refs. [14,15,16,17,18,19,20,21,22,23] contain categories of indoor small suctionable objects, so they were unusable in this work. Therefore, I had to create and prepare a dataset. Note that it might be possible to use a non-ground-view dataset for ground-view object detection, but its performance depends on the similarity between the two domains and the availability of ground-view data for fine-tuning. If the domains differ significantly, the inference performance of AI models may be limited, and additional adaptation techniques may be required.
Creating a dataset for AI model training is not an easy task, since it involves collecting and organizing a representative set of data samples. The quality of a dataset plays a crucial role in the inference performance and generalization capability of trained AI models; the accuracy of a trained model depends directly on the quality of its training data. Training data contain input samples and annotations: AI models learn from the annotated training data so that they can later be applied to new, unlabeled samples for inference. In this study, I gathered and compiled video clips from a variety of robot vacuum cleaners in diverse buildings. Since raw data can be contaminated and damaged in various ways, I pre-processed the collected video data to ensure a uniform image format for AI model training. When generating the dataset, data augmentation techniques were employed to artificially expand its diversity and size.
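As a concrete illustration of this augmentation step, the sketch below assembles a simple pipeline from Keras preprocessing layers. The exact transformations used in this study are not enumerated above, so the specific layers and parameter values here are illustrative assumptions (RandomBrightness requires a recent TensorFlow release).

```python
import tensorflow as tf

# Hypothetical augmentation pipeline; the exact transformations used in this
# study are not listed, so these layers and ranges are illustrative choices.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),   # mirror ground-view frames
    tf.keras.layers.RandomRotation(0.05),       # small in-plane rotations
    tf.keras.layers.RandomBrightness(0.2),      # mimic varying lighting
    tf.keras.layers.RandomContrast(0.2),        # mimic glare and shadows
])

# Apply to a batch of resized frames; training=True enables the randomness.
images = tf.random.uniform((8, 144, 256, 3))
augmented = augment(images, training=True)
```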
The complexity of an AI model is directly influenced by the input image size [25,26]. As the dimensions of the input images grow, the number of model parameters increases to accommodate the additional information. This is necessary because larger images often contain more intricate details that demand a more complex model for accurate capture and processing. Conversely, reducing the size of input images can simplify the model by decreasing the number of parameters and computations needed. However, this reduction comes at the cost of potential information loss and diminished accuracy, as finer details might be overlooked. The original size of each raw image is 1280 × 720, a high resolution that preserves fine detail but demands more computational resources, such as processing power and memory [9]. Therefore, I scaled the raw images down to 256 × 144, maintaining the 16:9 aspect ratio. In this way, AI models can be trained more quickly, leading to faster convergence and a shorter training time. In this work, the 23,042 raw images were divided into training, validation, and testing subsets containing 15,855, 3339, and 3391 samples, respectively. The training set was used to train the AI models, the validation set helped in tuning hyper-parameters and preventing over-fitting, and the test set was used to evaluate the final inference performance. This research selected eight categories of common recyclable indoor small objects. Since robot vacuum cleaners are not well equipped to handle larger or heavier objects, the suctionable object categories in the dataset had to be small items. As shown in Table 1, the chosen categories were red bean, millet, soybean, mung bean, sunflower seed shell, cat/dog food, rice, and green bean. Several image examples from the dataset are depicted in Figure 1, while Figure 2 presents a statistical overview of the dataset. As shown in Figure 1, the dataset includes a variety of images with dark floors. Figure 2 shows slightly imbalanced classification categories. The motivation behind gathering more samples for the “cat/dog food” category stems primarily from the significant interest shown by owners of robotic vacuum cleaners: through the identification and separation of pet-related items, these owners would like to guarantee the proper collection and recycling of pet food. This strategy improves the effectiveness of recycling initiatives, reduces the risk of contamination, and aligns with the broader objective of decreasing the environmental impact of pet care. The increased number of samples in the cat/dog food category aimed to enhance the AI models’ ability to identify and discern the distinct characteristics of this category.
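For readers who wish to reproduce the preprocessing described above, a minimal sketch is given below. The directory layout, file format, and helper names are assumptions; only the target resolution (256 × 144) and the eight categories follow the text.

```python
import tensorflow as tf

# Assumed layout: dataset/train/<category>/<frame>.jpg
CATEGORIES = ["red_bean", "millet", "soybean", "mung_bean",
              "sunflower_seed_shell", "cat_dog_food", "rice", "green_bean"]
train_paths = tf.io.gfile.glob("dataset/train/*/*.jpg")
train_labels = [CATEGORIES.index(p.split("/")[-2]) for p in train_paths]

def load_and_resize(path, label):
    # Decode a frame and downscale it from 1280x720 to 256x144 (both 16:9),
    # normalizing pixel values to [0, 1].
    raw = tf.io.read_file(path)
    img = tf.io.decode_jpeg(raw, channels=3)
    img = tf.image.resize(img, (144, 256))   # (height, width)
    return img / 255.0, label

train_ds = (tf.data.Dataset.from_tensor_slices((train_paths, train_labels))
            .map(load_and_resize, num_parallel_calls=tf.data.AUTOTUNE)
            .shuffle(2048)
            .batch(32)
            .prefetch(tf.data.AUTOTUNE))
# The validation and test subsets are built from their folders the same way.
```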

4. Proposed Lightweight AI Model Development Procedures

The process of creating the optimal lightweight AI model for robot vacuum cleaners is depicted in Figure 3. Initially, I collected original images or videos from robot vacuum cleaners in various testing environments, establishing a diverse dataset. Subsequently, I investigated state-of-the-art AI models, such as ResNet, DenseNet, EfficientNet, MobileNet, ShuffleNet, among others, to identify three top candidate models. I then simplified these selected models and quantized their weights to further minimize the computational complexity and memory requirements. This stepwise approach ultimately led to the generation of the best lightweight AI model.

4.1. State-of-the-Art AI Model Examination

The investigation of state-of-the-art AI models is a crucial step in selecting the best model for a specific task. In this work, I examined several popular AI architectures and their variations, including Xception, Inception-V3, LeNet-5, AlexNet, VGG-16, ResNet, DenseNet, EfficientNet, MobileNet, and ShuffleNet, to identify the three most promising models for our requirements. I evaluated the classification performance and memory requirement of each model on the created dataset. Once this evaluation was complete, I ranked the models based on their performance and selected the three that best aligned with our specific needs. Note that the top models vary depending on the specific task and dataset, so it was essential to tailor the selection process to our particular needs.
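The sketch below illustrates this examine-and-rank step for three of the architectures studied; `train_ds`, `val_ds`, and `test_ds` are the datasets prepared in Section 3, and the training settings are abbreviated relative to Section 5.1.

```python
import tensorflow as tf

# A subset of the examined architectures, instantiated from scratch for the
# 256x144 ground-view frames and the eight object categories.
CANDIDATES = {
    "Xception":    tf.keras.applications.Xception,
    "DenseNet121": tf.keras.applications.DenseNet121,
    "MobileNet":   tf.keras.applications.MobileNet,
}

scores = {}
for name, build in CANDIDATES.items():
    model = build(weights=None, input_shape=(144, 256, 3), classes=8)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=150, verbose=0)
    _, test_acc = model.evaluate(test_ds, verbose=0)
    scores[name] = (test_acc, model.count_params())

# Rank by test accuracy; parameter counts hint at the memory footprint.
for name, (acc, n) in sorted(scores.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: {acc:.4f} test accuracy, {n:,} parameters")
```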

4.2. AI Model Simplification

Simplified ResNet: ResNet is a series of deep convolutional neural network architectures that introduced the concept of residual learning. The key idea in ResNet is the use of skip connections or shortcuts that bypass one or more layers, allowing information from previous layers to directly flow to subsequent layers. The ResNet family includes several variants, each distinguished by the number of layers. The most common variants are ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. The numbers indicate the total number of layers in the network, including both convolutional layers and fully connected layers. Table 2 lists the number of layers, parameters, and the memory size of the family of ResNet models. As the depth of the model increases, there is a substantial rise in the number of parameters and memory size. Consequently, I had the option to modify the number of layers in ResNet models to achieve simplification.
Simplified DenseNet: DenseNet is a widely used convolutional neural network architecture known for its effectiveness in object classification tasks. One of the key components that distinguishes DenseNet from other architectures is its dense connectivity pattern, where each layer is connected to every other layer in a feed-forward manner. This connectivity promotes feature reuse, enables efficient information flow, and contributes to better parameter efficiency. As shown in Table 3, a critical parameter in DenseNet is the growth rate, which determines the number of additional feature maps each layer contributes. For example, in the original DenseNet121 model, the growth rate is set to 32, meaning that each layer within a dense block adds 32 feature maps to the concatenated outputs of the previous layers, significantly increasing the number of parameters and computation. By reducing the growth rate below 32, a DenseNet121 model with a simpler architecture and fewer feature maps per layer is obtained. Such simplified DenseNet121 models are especially attractive when working with limited computational resources or aiming for a more lightweight architecture. For example, with the growth rate reduced to 8, each layer in a dense block adds only eight feature maps, cutting the total number of parameters and making the model more memory-efficient and faster to train and run. At the same time, the simplified models still retain the benefits of dense connectivity and multi-level feature fusion. This simplification strikes a balance between model complexity and performance, making DenseNet121 with a lower growth rate a viable option for hardware-limited robot vacuum cleaner applications.
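Because tf.keras.applications.DenseNet121 does not expose the growth rate, simplifying along this axis requires rebuilding the dense blocks. The sketch below is a minimal DenseNet-style block with a configurable growth rate, following the bottleneck design of the original DenseNet paper; the surrounding stem is abbreviated.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers, growth_rate):
    """Each layer emits `growth_rate` new feature maps and is concatenated
    with all earlier outputs (DenseNet-BC style, with 1x1 bottlenecks)."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.ReLU()(y)
        y = layers.Conv2D(4 * growth_rate, 1, use_bias=False)(y)  # bottleneck
        y = layers.BatchNormalization()(y)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding="same", use_bias=False)(y)
        x = layers.Concatenate()([x, y])
    return x

# A growth rate of 8 instead of the original 32 shrinks every dense block.
inputs = tf.keras.Input((144, 256, 3))
x = layers.Conv2D(16, 7, strides=2, padding="same", use_bias=False)(inputs)
x = dense_block(x, num_layers=6, growth_rate=8)
```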
Simplified EfficientNet: EfficientNetB0 through EfficientNetB7 form a family of convolutional neural network architectures that uniformly scale the depth, width, and input resolution of the network to balance model size against classification performance. While the family achieves state-of-the-art accuracy with relatively few parameters compared to other models, it can still be computationally expensive for hardware-constrained applications. The “B” in EfficientNetB0 to EfficientNetB7 refers to the baseline architecture, which starts with a relatively small depth, width, and image resolution; the number after “B” indicates the level of compound scaling applied to that baseline. Higher numbers represent larger and more computationally intensive models, which generally achieve better performance but require more resources. Therefore, moving from EfficientNetB0 to EfficientNetB7, the models become larger and more powerful, capable of handling more complex object classification tasks, but they also require more memory and computational resources for training and inference. Table 4 lists the number of parameters, memory size, and scaling factor for each EfficientNetB model. By tuning the scaling factor, the model size can be customized to achieve simplification.
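The compound scaling behind the “B” levels can be made explicit. In the sketch below, the constants are the values published for EfficientNet; treating the level number directly as the compound coefficient phi is an approximation, since the released B1–B7 models use hand-tuned variants of these factors.

```python
# EfficientNet compound scaling: depth, width, and input resolution grow as
# alpha**phi, beta**phi, and gamma**phi, with alpha * beta**2 * gamma**2 ~= 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # published EfficientNet constants

def scaled_factors(phi, base_resolution=224):
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    return depth_mult, width_mult, resolution

for phi in range(8):   # roughly B0 ... B7
    d, w, r = scaled_factors(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r}px")
```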
Simplified ShuffleNet: ShuffleNet is a family of AI architectures designed to strike a balance between model accuracy and computational efficiency. ShuffleNet utilizes the channel shuffle operation to enable information exchange across groups of feature channels, enhancing the representational capacity. It employs depth-wise separable convolutions to reduce the computation and model size. ShuffleNetV2 introduces the concept of residual connections within ShuffleNet units to further improve information flow. ShuffleNetV3 introduces a new ShuffleNet unit design called “Ghost Unit,” which includes a “ghost” branch to enhance information flow without significantly increasing computations. Table 5 lists the number of parameters and memory size of each ShuffleNet model. Each ShuffleNet model provides different trade-offs between model size, computational efficiency, and accuracy, making them suitable for various applications with varying resource constraints.
Simplified MobileNet: MobileNet is a family of neural network architectures that provide efficient and lightweight solutions on resource-constrained devices. One of the key design principles of MobileNet is the use of depth-wise separable convolutions, which significantly reduce the computation and model size while preserving the accuracy. The architecture of MobileNet is governed by a parameter called “alpha”, which controls the width of the network. The width of the network determines the number of channels in each layer and, consequently, the number of parameters and computation involved in the model. As shown in Table 6, by adjusting the alpha parameter, different versions of MobileNet can be created, each optimized for a specific balance between model size and accuracy.
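Unlike the growth rate above, the width multiplier is exposed directly by Keras as the `alpha` argument, so simplified MobileNet variants can be instantiated in a few lines; the input shape and class count below follow Section 3.

```python
import tensorflow as tf

# alpha scales the channel count of every layer; alpha=0.25 keeps only a
# quarter of the baseline width.
for alpha in (1.0, 0.5, 0.25):
    m = tf.keras.applications.MobileNet(input_shape=(144, 256, 3),
                                        alpha=alpha, weights=None, classes=8)
    print(f"alpha={alpha}: {m.count_params():,} parameters")
```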
Simplified Xception: Xception is a neural network architecture that supports lightweight solutions on resource-constrained devices. The original Xception model consists of a 9-layer structure repeated eight times in a linear stack. The authors of ref. [22] suggest removing seven of the repeated structures to build an L-w Xception 1 model containing only 9.5 million parameters. To simplify this L-w Xception 1 model further, I propose controlling the width of the network. The width determines the number of channels in each layer and, consequently, the number of parameters and the computation involved. As shown in Table 7, by adjusting the width, different versions of the L-w Xception 1 model can be created for specific model sizes.
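Keras does not expose a width factor for Xception, so the width-scaled, trimmed variant has to be built by hand. The sketch below conveys the idea with a heavily abbreviated entry flow; the exact trimmed topology of L-w Xception 1 (which layers survive the removal of the seven repeated structures) is an assumption here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def lw_xception(width=0.25, num_classes=8):
    """Width-scaled Xception-style sketch: every channel count is multiplied
    by `width`, floored at 8 channels."""
    c = lambda n: max(8, int(n * width))
    inputs = tf.keras.Input((144, 256, 3))
    x = layers.Conv2D(c(32), 3, strides=2, padding="same",
                      activation="relu")(inputs)
    x = layers.SeparableConv2D(c(128), 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
    x = layers.SeparableConv2D(c(728), 3, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = lw_xception(width=0.25)
model.summary()   # compare parameter counts across width factors
```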

4.3. Post-Training Weight Quantization on the Selected AI Models

State-of-the-art AI models, such as ResNet, DenseNet, EfficientNet, MobileNet, and ShuffleNet, are typically trained using 32-bit floating-point numbers to represent their weights. These floating-point numbers offer high precision and allow for accurate computations during training; however, they are memory-intensive and computationally expensive when deployed on devices with limited resources. Post-training weight quantization is a technique used to reduce the memory footprint and computational requirements of deep learning models after they have been trained. The process represents the trained weight parameters in a lower-precision format than the original floating-point representation [27,28]. By doing so, the model can be deployed more efficiently on resource-constrained devices, such as edge devices and micro-controller systems, without sacrificing much performance. In this work, I used 8-bit integer weights to replace 32-bit floating-point weights. Specifically, I used the TensorFlow Lite converter (tf.lite.TFLiteConverter) [29] to convert trained TensorFlow models into TensorFlow Lite models. TensorFlow Lite is a framework specifically designed for deploying machine learning models on resource-constrained devices; one of its key features is support for quantization, which allows models to be converted into a lower-precision format to reduce memory size and improve inference speed. It is worth noting that weight quantization may lead to a slight drop in classification accuracy due to the loss of information caused by the reduced weight precision. The time required for post-training weight quantization depends mainly on two factors: model complexity and the quantization method. More complex models with more parameters usually take longer to quantize due to the increased computational demands, and different quantization techniques (e.g., uniform quantization, dynamic quantization) have varying computational requirements, with some involving more complex calculations and therefore longer quantization times. In this paper, since I performed 8-bit uniform quantization and the selected AI models do not possess a huge number of parameters, the time required for post-training weight quantization was very short, typically no more than 1 min.
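Concretely, the conversion amounts to a few lines with the TensorFlow Lite converter. In the sketch below, `Optimize.DEFAULT` applies dynamic-range quantization, which stores the weights as 8-bit integers; `model` is assumed to be a trained Keras model from the previous steps, and the output filename is arbitrary.

```python
import tensorflow as tf

# Post-training 8-bit weight quantization via the TensorFlow Lite converter.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # 8-bit weight storage
tflite_bytes = converter.convert()

with open("lw_xception1_w025.tflite", "wb") as f:
    f.write(tflite_bytes)
print(f"Quantized model size: {len(tflite_bytes) / 1e6:.2f} MB")
```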

5. Experimental Results and Discussion

5.1. Experimental Setup

In this study, the AI models of interest were programmed using Python, Keras [30], and the TensorFlow platform [31]. To obtain statistical results, the AI models were trained multiple times, each with a distinct random parameter initialization. To expedite both training and evaluation, I performed the experiments on a CUDA-enabled GPU machine equipped with an NVIDIA TITAN Xp, supported by NVIDIA’s Compute Unified Device Architecture (CUDA) technology [32]. All AI models were trained using the Adam optimizer [33], since Adam has fewer hyper-parameters to tune than other optimizers, making it easier to set up and less sensitive to hyper-parameter choices. Yet, due to its adaptive learning rate mechanism, Adam sometimes causes the learning rates to fluctuate during training. To mitigate this effect, weight decay was incorporated into the Adam optimizer by adding a regularization term to the loss function. This regularization term encourages smaller weights, helping prevent over-fitting and improving generalization. The training process spanned 150 epochs, a value determined empirically to achieve satisfactory convergence. To account for computational resource constraints, I adopted a uniform mini-batch size of 32 for all AI models during training. The number of steps per epoch determines how many times the trainable parameters are updated during one epoch of training and is determined by the size of the training dataset and the batch size. For example, with 15,855 training samples and a batch size of 32, the steps per epoch rounds to 495.
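A training configuration mirroring this setup is sketched below. AdamW, i.e., Adam with decoupled weight decay (available in recent TensorFlow releases), is one way to realize “Adam plus a weight-decay regularization term”; the learning rate and decay coefficient are assumptions, while the epoch count, batch size, and steps per epoch follow the text.

```python
import tensorflow as tf

# Adam with weight decay; the coefficients here are illustrative.
optimizer = tf.keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=1e-4)

model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

steps_per_epoch = 15855 // 32   # 495 parameter updates per epoch
model.fit(train_ds, validation_data=val_ds,
          epochs=150, steps_per_epoch=steps_per_epoch)
```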

5.2. Results of State-of-the-Art AI Model Examination

The outcomes of investigating the state-of-the-art AI models are presented in Table 8, organized by test accuracy from highest to lowest. Xception achieved the highest test accuracy of 88.91%, with a memory size of 244.9 MB. Figure 4 visualizes the trade-off between classification accuracy and memory footprint in Table 8. These AI models evidently exhibit a broad spectrum of accuracy and memory size variations. In this figure, Xception, EfficientNet, MobileNet, and DenseNet121 demonstrate much higher accuracies. It is not unusual for certain AI models to produce better outcomes despite having fewer parameters; this is attributable to their architecture, efficient design, and optimization techniques, which allow them to effectively capture and utilize information from the training data. As a result, these models achieve better performance with fewer parameters than their larger, less optimized counterparts (for example, AlexNet and VGG-16 in this study). As discussed in Section 4.2, since EfficientNetB0 could not be simplified further, Xception, MobileNet, and DenseNet121 were chosen as the foremost candidate models for the subsequent phase of model simplification.
Figure 5 and Figure 6 plot the loss and classification accuracy on both the training and validation sets for DenseNet121, MobileNet, EfficientNetB1, and Xception. The purpose of the validation set is to evaluate the performance of a trained AI model and assess its generalization ability. The validation loss clearly started at a relatively high value and gradually diminished; by 150 epochs, it had stabilized, indicating that the AI model fits well without over-fitting. Moreover, the training accuracy and validation accuracy ultimately settled at approximately 0.9 and 0.85, respectively. Two primary factors contributed to the pronounced fluctuations in the validation curves compared to the training curves. First, the limited number of samples in the validation set played a significant role: given that the validation set consisted of just 5760 samples spread across eight classification categories, pronounced fluctuations in validation performance were not unexpected, because a limited number of validation samples may not adequately represent the overall data distribution, causing the measured performance to vary significantly with the specific samples in the set. Second, the fluctuation of the validation curves was also influenced by the small batch size of 32 used in this work. Smaller batches introduce increased noise into the optimization process, which, in turn, results in more notable variations in validation performance, because the specific attributes of each mini-batch impact the model’s parameters, magnifying the observed fluctuations.

5.3. Results of Simplified AI Models

The outcomes of investigating the simplified AI models are presented in Table 9, which summarizes the experimental results for the L-w Xception 1 model with three different widths, the DenseNet121 model with three different growth rates, and the MobileNet model with three different alpha values. Through model simplification, the required memory size was reduced to less than 29 MB while still retaining a test accuracy of over 76%. Figure 7 visually presents the trade-off between classification accuracy and memory footprint for Table 8 and Table 9, with the results distinguished by markers of distinct colors: the blue markers represent the original AI models without simplification in Table 8, while the purple markers correspond to the simplified AI models in Table 9.
The experimental results for the simplified DenseNet121 models with a growth rate of 10 and the simplified L-w Xception 1 model with a width factor of 0.25 are depicted in Figure 8 and Figure 9, respectively. These figures illustrate the loss and classification accuracy on both the training and validation sets. The consistent decrease and relative stability of the validation loss throughout the training process indicate that these models are effectively learning meaningful patterns from the data and are likely to generalize well to new, unseen scenarios.

5.4. Results of Simplified AI Models with Post-Training Weight Quantization

Table 10 presents the number of parameters, memory size, and test accuracy of the simplified state-of-the-art AI models after applying the 8-bit weight quantization technique. Comparing the results in Table 9 with those in Table 10, it is evident that the memory sizes in Table 10 were reduced to just a few megabytes (MB). As an illustration, the quantized simplified L-w Xception 1 model with a width factor of 0.25 achieved a remarkable test accuracy of 84.73% while utilizing only 0.69 MB of memory. Therefore, this lightweight model shows great promise for deployment on micro-controllers with a memory budget of 1 MB. In comparison to the most accurate state-of-the-art model in the literature (the Xception model in Table 8), this proposed model (the quantized simplified L-w Xception 1 model with a width factor of 0.25 in Table 10) achieved a remarkable memory size reduction of 350 times, while experiencing only a minor decrease in classification accuracy of approximately 4.54%.
In the study of object classification, a confusion matrix is a tabular representation frequently used to assess the performance of a classification model. Figures 10 and 11 summarize the models’ predictions and their agreement with the ground-truth labels for each category, for the original Xception model (Figure 10) and the newly proposed L-w Xception 1 model with a width factor of 0.25 (Figure 11). As shown in Figure 10, even though the Xception model requires a large memory space, it predicted the categories of rice, mung bean, and millet very well, while its classification accuracy for the cat/dog food category was 79%. Figure 11, in turn, demonstrates excellent predictions for rice, sunflower seed shell, and mung bean, likely owing to their distinctive features: these categories possess unique and distinguishable visual characteristics that the L-w Xception 1 model with a width factor of 0.25 recognizes easily, enabling accurate predictions, as the model can confidently identify patterns specific to these categories. Regarding cat/dog food, Figure 11 shows an accuracy of 83%; this improvement implies that the simplified L-w Xception 1 model has an enhanced capability for identifying pet food with a much smaller memory size. Both models exhibit a weakness in accurately predicting soybean, possibly because its yellow color can closely resemble the color of the floor or the surrounding environment. This similarity in color makes it challenging for the models to distinguish soybean from the background, especially when the floor is light-colored or white.
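The per-category agreement shown in Figures 10 and 11 can be tabulated as sketched below; `model` and `test_ds` are assumed from the earlier sketches, and scikit-learn is used only for the matrix itself.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Collect ground-truth labels and model predictions over the test set.
y_true, y_pred = [], []
for images, labels in test_ds:
    probs = model.predict(images, verbose=0)
    y_pred.extend(np.argmax(probs, axis=1))
    y_true.extend(labels.numpy())

# Rows are ground-truth categories, columns are predictions; normalizing each
# row gives per-class accuracies (e.g., 0.83 for cat/dog food in Figure 11).
cm = confusion_matrix(y_true, y_pred, normalize="true")
print(np.round(cm, 2))
```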

5.5. Comprehensive Results Comparison

In Figure 12, the classification accuracy is plotted against the memory footprint for all AI models considered in this study, including state-of-the-art models, simplified models, and quantized simplified models. Model simplification and weight quantization evidently yield a considerable reduction in memory requirements while incurring only a minor drop in classification accuracy. By employing the proposed techniques, the memory footprint can be reduced to only 0.69 MB while maintaining a classification accuracy of 84.73%, making it viable to meet the deployment requirements of resource-constrained robot vacuum cleaners.

6. Conclusions and Future Work

In this study, I examined various state-of-the-art AI models, including Xception, ResNet, DenseNet, EfficientNet, MobileNet, NasNetMobile, and ShuffleNet, to identify the top-performing candidates. Through experimental investigation, I selected Xception, MobileNet, and DenseNet121 as the most promising models for the model simplification efforts. I then simplified these models by adjusting their alpha values, width factors, and growth rates, effectively reducing their memory footprint while striving to maintain classification accuracy. The results are remarkable, demonstrating the potential for these simplified models to be deployed on resource-constrained devices. Furthermore, I employed post-training weight quantization to further enhance the memory efficiency of the models. The quantized simplified models achieved significant memory reductions, making them ideal candidates for applications with strict memory constraints. The experimental results demonstrate that model simplification, coupled with weight quantization, strikes a favorable balance between memory size and classification accuracy: the proposed techniques achieve memory reductions of up to 350 times while incurring an accuracy drop of only 4.54%. With such substantial memory savings, the proposed models are highly promising for deployment in the micro-controllers of resource-constrained robot vacuum cleaners.
In future work, I will delve into low-bit-width weight quantization, such as 4-bit, 2-bit, or even 1-bit integers. I will also explore mixed-precision quantization approaches, in which different layers or parts of the model are quantized to different precision levels. The primary goal is to further reduce the memory and computation requirements while preserving model accuracy to the greatest possible extent.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Huang, Q.; Lu, C.; Chen, K. Smart Building Applications and Information System Hardware Co-Design. In Big Data Analytics for Sensor-Network Collected Intelligence; Elsevier: Amsterdam, The Netherlands, 2017; pp. 225–240. [Google Scholar]
  2. Jayaram, R.; Dandge, R. Optimizing Cleaning Efficiency of Robotic Vacuum Cleaner. TATA ELXSI Report. Available online: https://www.tataelxsi.com/ (accessed on 13 July 2023).
  3. Soori, M.; Arezoo, B.; Dastres, R. Artificial Intelligence, Machine Learning and Deep Learning in Advanced Robotics, a Review. Cogn. Robot. 2023, 3, 54–70. [Google Scholar] [CrossRef]
  4. Huang, Q. Weight-Quantized SqueezeNet for Resource-Constrained Robot Vacuums for Indoor Obstacle Classification. AI 2022, 3, 180–193. [Google Scholar] [CrossRef]
  5. Huang, Q.; Tang, Z. High-Performance and Lightweight AI Model for Robot Vacuum Cleaners with Low Bitwidth Strong Non-Uniform Quantization. AI 2023, 4, 531–550. [Google Scholar] [CrossRef]
  6. Amitha, S.; Raj, P.N.; Sonika, H.P.; Urs, S.; Tejashwini, B.; Kulkarni, S.A.; Jha, V. Segregated Waste Collector with Robotic Vacuum Cleaner using Internet of Things. In Proceedings of the IEEE International Symposium on Sustainable Energy, Signal Processing and Cyber Security, Gunupur Odisha, India, 16–17 December 2020; pp. 1–5. [Google Scholar] [CrossRef]
  7. Calaiaro, J. AI-Guided Robots Are Ready to Sort your Recyclables. IEEE Spectrum 2022. [Google Scholar]
  8. Gusson, M. Robotic Vacuum Cleaner Designed for Circular Economy. Available online: https://umu.diva-portal.org/smash/get/diva2:1577399/FULLTEXT01.pdf (accessed on 13 July 2023).
  9. Neuman, S.; Plancher, B.; Duisterhof, B.P.; Krishnan, S.; Banbury, C.; Mazumder, M.; Prakash, S.; Jabbour, J.; Faust, A.; de Croon, G.C.H.E.; et al. Tiny Robot Learning: Challenges and Directions for Machine Learning in Resource-Constrained Robots. In Proceedings of the IEEE 4th International Conference on Artificial Intelligence Circuits and Systems, Incheon, Republic of Korea, 13–15 June 2022; pp. 296–299. [Google Scholar]
  10. Bao, L.; Lv, C. Ecovacs Robotics: The AI Robotic Vacuum Cleaner Powered by TensorFlow. 2020. Available online: https://blog.tensorflow.org/2020/01/ecovacs-robotics-ai-robotic-vacuum.html (accessed on 13 February 2022).
  11. Althnian, A.; AlSaeed, D.; Al-Baity, H.; Samha, A.; Dris, A.B.; Alzakari, N.; Abou Elwafa, A.; Kurdi, H. Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain. Appl. Sci. 2021, 11, 796. [Google Scholar] [CrossRef]
  12. Groshev, M.; Baldoni, G.; Cominardi, L.; de la Oliva, A.; Gazda, R. Edge robotics: Are we ready? An experimental evaluation of current vision and future directions. Digit. Commun. Netw. 2023, 9, 166–174. [Google Scholar] [CrossRef]
  13. Sami, S.; Dai, Y.; Tan, S.R.X.; Roy, N.; Han, J. Spying with Your Robot Vacuum Cleaner: Eavesdropping via Lidar Sensors. In Proceedings of the 18th Conference on Embedded Networked Sensor Systems, Virtual, 16–19 November 2020; pp. 354–367. [Google Scholar]
  14. Schneider, M.; Amann, R.; Mitsantisuk, C. Waste object classification with AI on the edge accelerators. In Proceedings of the IEEE International Conference on Mechatronics, Takamatsu, Japan, 8–11 August 2021; pp. 1–6. [Google Scholar]
  15. Bai, J.; Lian, S.; Liu, Z.; Wang, K.; Liu, D. Deep Learning Based Robot for Automatically Picking up Garbage on the Grass. IEEE Trans. Consum. Electron. 2018, 64, 382–389. [Google Scholar] [CrossRef]
  16. Adedeji, O.; Wang, Z. Intelligent Waste Classification System Using Deep Learning Convolutional Neural Network. Procedia Manuf. 2019, 35, 607–612. [Google Scholar] [CrossRef]
  17. Aarthi, R.; Rishma, G. A Vision Based Approach to Localize Waste Objects and Geometric Features Exaction for Robotic Manipulation. Procedia Comput. Sci. 2023, 218, 1342–1352. [Google Scholar] [CrossRef]
  18. Stephenn, L.; Cabatuan, M.K.; Sybingco, E.; Dadios, E.P.; Calilung, E.J. Common Garbage Classification Using Mobilenet. In Proceedings of the IEEE 10th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management, Baguio City, Philippines, 29 November–2 December 2018; pp. 1–4. [Google Scholar]
  19. Chen, Z.; Yang, J.; Chen, L.; Jiao, H. Garbage classification system based on improved ShuffleNet v2. Resour. Conserv. Recycl. 2022, 178, 106090. [Google Scholar] [CrossRef]
  20. Masand, A.; Chauhan, S.; Jangid, M.; Kumar, R.; Roy, S. Scrapnet: An Efficient Approach to Trash Classification. IEEE Access 2021, 9, 130947–130958. [Google Scholar] [CrossRef]
  21. Feng, J.; Tang, X. Office Garbage Intelligent Classification Based on Inception-V3 Transfer Learning Model. J. Phys. Conf. Ser. 2020, 1487, 012008. [Google Scholar] [CrossRef]
  22. Shi, C.; Xia, R.; Wang, L. A Novel Multi-Branch Channel Expansion Network for Garbage Image Classification. IEEE Access 2020, 8, 154436–154452. [Google Scholar] [CrossRef]
  23. Lv, Y.; Fang, Y.; Chi, W.; Chen, G.; Sun, L. Object Detection for Sweeping Robots in Home Scenes (ODSR-IHS): A Novel Benchmark Dataset. IEEE Access 2021, 9, 17820–17828. [Google Scholar] [CrossRef]
  24. STM32F7 Series. Available online: www.st.com/en/microcontrollers-microprocessors/stm32f7-series.html (accessed on 30 July 2023).
  25. Rukundo, O. Effects of Image Size on Deep Learning. Electronics 2023, 12, 985. [Google Scholar] [CrossRef]
  26. Talebi, H.; Milanfar, P. Learning to Resize Images for Computer Vision Tasks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 497–506. [Google Scholar]
  27. Kulkarni, U.; Hosamani, A.S.; Masur, A.S.; Hegde, S.; Vernekar, G.R.; Chandana, K.S. A Survey on Quantization Methods for Optimization of Deep Neural Networks. In Proceedings of the International Conference on Automation, Computing and Renewable Systems (ICACRS), Pudukkottai, India, 13–15 December 2022; pp. 827–834. [Google Scholar]
  28. Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural Networks for Efficient Integer-arithmetic-only Inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2704–2713. [Google Scholar]
  29. TensorFlow Model Conversion Overview. Available online: https://www.tensorflow.org/lite/models/convert (accessed on 30 July 2023).
  30. Chollet, F. Keras. 2015. Available online: https://github.com/fchollet/keras (accessed on 30 July 2023).
  31. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, GA, USA, 2–4 November 2016; pp. 265–283. [Google Scholar]
32. NVIDIA CUDA Toolkit. Available online: https://developer.nvidia.com/cuda-toolkit (accessed on 30 July 2023).
33. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Figure 1. Illustrative samples from each category in the generated dataset.
Figure 2. A statistical overview of the dataset generated in this study.
Figure 3. The design procedure used to generate the best lightweight AI model.
Figure 4. Performance comparison of test classification accuracy vs. memory footprint in state-of-the-art AI models in the literature.
Figure 5. DenseNet121 and MobileNet, respectively.
Figure 6. EfficientNetB1 and Xception, respectively.
Figure 7. Performance comparison of test classification accuracy vs. memory footprint of proposed simplified AI models and state-of-the-art AI models in the literature.
Figure 8. DenseNet121 with a growth rate of 10.
Figure 9. L-w Xception 1 with a width factor of 0.25.
Figure 10. Confusion matrix of the original Xception model.
Figure 11. Confusion matrix of the quantized L-w Xception 1 model with a width factor of 0.25.
Figure 12. Performance comparison of test classification accuracy vs. memory footprint of our proposed quantized simplified lightweight AI models and state-of-the-art models in the literature. Stars, squares, and circles represent quantized simplified models, simplified models, and original models, respectively.
Table 1. Our dataset comprises eight distinct categories of indoor suctionable objects intended for classification and recycling purposes.

Object Category | The Number of Samples in Our Dataset
rice | 2158
cat/dog food | 5606
sunflower seed shell | 3414
red bean | 2205
soybean | 2305
green bean | 2179
cat litter | 3014
millet | 2161
Table 2. The number of layers, parameters, and memory size of the family of ResNet models.

ResNet Model | The Number of Layers | The Number of Parameters | Memory Size
ResNet18 | 18 | 11,194,312 | 131.4 MB
ResNet34 | 34 | 21,318,728 | 250.3 MB
ResNet50 | 50 | 23,604,104 | 277.1 MB
ResNet101 | 101 | 42,674,568 | 501.0 MB
ResNet152 | 152 | 179,521,992 | 2104.8 MB
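For context on where such counts come from: the ResNet50 figure above is consistent with a stock Keras [30] backbone topped by global average pooling and an eight-way softmax head (one output per object category in Table 1). A minimal sketch, assuming that construction:

```python
import tensorflow as tf

# A minimal sketch, assuming the ResNet50 row in Table 2 corresponds to
# a stock Keras backbone plus global average pooling and an 8-way
# softmax head (one output per object category in Table 1).
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights=None, input_shape=(224, 224, 3), pooling="avg"
)
head = tf.keras.layers.Dense(8, activation="softmax")(backbone.output)
model = tf.keras.Model(backbone.input, head)

# 23,587,712 (backbone) + 2048 * 8 + 8 (head) = 23,604,104 parameters,
# matching the ResNet50 row above.
print(model.count_params())
```

The same backbone-plus-eight-way-head construction appears to reproduce the EfficientNetB0 and EfficientNetB1 rows in Table 4 as well. Note also that the reported memory sizes work out to roughly 12 bytes per parameter rather than the 4 bytes of a raw float32 weight, so they presumably include optimizer state or other serialization overhead in the saved model files.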
Table 3. The number of parameters and memory size of the DenseNet121 model at different growth rates.

DenseNet121 Model | The Number of Parameters | Memory Size
Growth Rate 4 | 169,086 | 3.9 MB
Growth Rate 8 | 527,976 | 8.0 MB
Growth Rate 10 | 784,833 | 11.0 MB
Growth Rate 12 | 1,094,694 | 14.6 MB
Growth Rate 16 | 1,869,240 | 23.6 MB
Growth Rate 32 | 7,045,704 | 83.9 MB
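The growth rate k sets how many feature maps each layer inside a dense block contributes; because every layer consumes the concatenation of all preceding outputs, the parameter count falls steeply as k shrinks, as the drop from 7.0 M (k = 32, the DenseNet121 default) to 169 K (k = 4) above shows. Below is a minimal sketch of the standard DenseNet-BC bottleneck block, assuming the simplified models keep the original connectivity pattern:

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers, growth_rate):
    """One DenseNet-BC dense block: each layer emits `growth_rate` new
    feature maps and is concatenated onto all earlier ones."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation("relu")(y)
        # 1x1 bottleneck, conventionally 4x the growth rate
        y = layers.Conv2D(4 * growth_rate, 1, use_bias=False)(y)
        y = layers.BatchNormalization()(y)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(growth_rate, 3, padding="same", use_bias=False)(y)
        x = layers.Concatenate()([x, y])
    return x
```

Shrinking k from 32 to 10 yields the 784,833-parameter variant carried forward into Tables 9 and 10.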
Table 4. The number of parameters, memory size, and scaling factor of EfficientNet models.

EfficientNet Model | The Number of Parameters | Memory Size | Scaling Factor
EfficientNetB0 | 4,059,819 | 48.3 MB | minimum
EfficientNetB1 | 6,585,487 | 78.2 MB | 1.0
EfficientNetB2 | 7,779,841 | 92.2 MB | 1.1
EfficientNetB3 | 10,795,831 | 127.5 MB | 1.2
EfficientNetB4 | 17,688,167 | 208.4 MB | 1.4
EfficientNetB5 | 28,529,919 | 335.6 MB | 1.6
EfficientNetB6 | 40,978,583 | 481.4 MB | 1.8
EfficientNetB7 | 64,118,175 | 752.5 MB | 2.0
Table 5. The number of parameters and memory size of ShuffleNet models.

ShuffleNet Model | The Number of Parameters | Memory Size
ShuffleNetV2 | 755,004 | 9.0 MB
ShuffleNet | 1,482,368 | 17.5 MB
ShuffleNetV3 | 1,991,372 | 23.5 MB
Table 6. The number of parameters and memory size of MobileNet models.

MobileNet Model | The Number of Parameters | Memory Size
alpha = 0.5 | 833,640 | 10.4 MB
alpha = 0.53 | 930,435 | 11.3 MB
alpha = 0.75 | 1,839,128 | 21.9 MB
alpha = 1 | 3,237,064 | 38.2 MB
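MobileNet's width multiplier alpha thins every layer by the same fraction, so the parameter count shrinks roughly quadratically as alpha decreases. A sketch of the sweep, again assuming an eight-way classification head; with weights=None, Keras also accepts non-standard multipliers such as the 0.53 used here:

```python
import tensorflow as tf

# Width-multiplier sweep for MobileNet, assuming an 8-way softmax head.
# At alpha = 1.0 this gives 3,228,864 (backbone) + 1024 * 8 + 8 (head)
# = 3,237,064 parameters, matching the last row of Table 6.
for alpha in (0.5, 0.53, 0.75, 1.0):
    base = tf.keras.applications.MobileNet(
        input_shape=(224, 224, 3), alpha=alpha, weights=None,
        include_top=False, pooling="avg",
    )
    out = tf.keras.layers.Dense(8, activation="softmax")(base.output)
    model = tf.keras.Model(base.input, out)
    print(f"alpha = {alpha}: {model.count_params():,} parameters")
```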
Table 7. The number of parameters and memory size of L-w Xception 1 models.

L-w Xception 1 Model | The Number of Parameters | Memory Size
width = 1 | 9,549,464 | 112.1 MB
width = 0.5 | 2,421,120 | 28.6 MB
width = 0.25 | 622,160 | 7.6 MB
width = 0.17 | 355,794 | 4.5 MB
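Unlike MobileNet, the stock Keras Xception implementation exposes no width argument, so a width factor here presumably means rebuilding the network with every filter count scaled by that fraction. The following is a hypothetical illustration of one separable-convolution unit at reduced width; the scaled() helper and its floor of 8 channels are illustrative assumptions, not the exact L-w Xception 1 design:

```python
import tensorflow as tf
from tensorflow.keras import layers

def scaled(filters, width):
    # Illustrative helper: multiply every filter count by the width
    # factor, with a floor of 8 channels.
    return max(8, int(filters * width))

def sep_unit(x, filters, width=0.25):
    # One Xception-style separable-convolution unit at reduced width.
    x = layers.SeparableConv2D(
        scaled(filters, width), 3, padding="same", use_bias=False
    )(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)
```

Because the dominant pointwise convolutions scale on both their input and output channels, parameters drop roughly quadratically with the width factor, from 9.5 M at width = 1 to 622 K at width = 0.25, consistent with the table.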
Table 8. The number of parameters, memory size, and test accuracy of state-of-the-art AI models employed for indoor small suctionable object classification, as reported in the literature.

State-of-the-Art AI Model | The Number of Parameters | Memory Size | Test Accuracy
Xception [22] | 20,877,872 | 244.9 MB | 88.91%
L-w Xception 1 [22] | 9,549,464 | 112.1 MB | 86.52%
EfficientNetB1 | 6,585,487 | 78.2 MB | 86.08%
DenseNet121 [23] | 7,045,704 | 83.9 MB | 85.07%
EfficientNetB0 [18] | 4,059,819 | 48.3 MB | 84.47%
EfficientNetB3 [19] | 10,795,831 | 127.5 MB | 84.19%
MobileNet [17] | 3,237,064 | 38.2 MB | 83.66%
Inception-V3 [21] | 21,819,176 | 256.8 MB | 82.90%
ResNet50 [15] | 23,604,104 | 277.1 MB | 79.51%
ShuffleNetV3 | 1,991,372 | 23.5 MB | 77.94%
ShuffleNet | 1,482,368 | 17.5 MB | 76.85%
ResNet34 [13] | 21,318,728 | 250.3 MB | 76.29%
ShuffleNetV2 [14] | 755,004 | 9.0 MB | 75.23%
DenseNet169 [23] | 12,656,200 | 149.8 MB | 75.11%
ResNet101 [16] | 42,674,568 | 501.0 MB | 72.90%
MobileNetV2 [18] | 2,268,232 | 27.1 MB | 71.90%
NasNetMobile [18] | 4,278,172 | 53.3 MB | 71.64%
LeNet-5 | 4,059,372 | 47.6 MB | 58.33%
AlexNet | 26,822,280 | 314.4 MB | 24.21%
VGG-16 | 98,641,736 | 1156.1 MB | 23.80%
Table 9. Test accuracy and memory usage for simplified state-of-the-art AI models.

Simplified AI Model | The Number of Parameters | Memory Size | Test Accuracy
L-w Xception 1 (width = 0.5) | 2,421,120 | 28.6 MB | 87.76%
L-w Xception 1 (width = 0.25) | 622,160 | 7.6 MB | 84.78%
L-w Xception 1 (width = 0.17) | 355,794 | 4.5 MB | 76.53%
DenseNet121 (growth rate = 4) | 169,086 | 3.9 MB | 78.33%
DenseNet121 (growth rate = 8) | 527,976 | 8.0 MB | 81.28%
DenseNet121 (growth rate = 10) | 784,833 | 11.0 MB | 83.43%
MobileNet (alpha = 0.5) | 833,640 | 10.4 MB | 76.66%
MobileNet (alpha = 0.53) | 930,435 | 11.3 MB | 78.62%
MobileNet (alpha = 0.75) | 1,839,128 | 21.9 MB | 80.62%
Table 10. Test accuracy and memory usage for simplified state-of-the-art AI models with 8-bit weight quantization.

Quantized Simplified AI Model | The Number of Parameters | Memory Size | Test Accuracy
L-w Xception 1 (width = 0.5) | 2,421,120 | 2.51 MB | 87.80%
L-w Xception 1 (width = 0.25) | 622,160 | 0.69 MB | 84.73%
L-w Xception 1 (width = 0.17) | 355,794 | 0.42 MB | 76.76%
DenseNet121 (growth rate = 4) | 169,086 | 0.4 MB | 78.30%
DenseNet121 (growth rate = 8) | 527,976 | 0.7 MB | 81.20%
DenseNet121 (growth rate = 10) | 784,833 | 1.0 MB | 83.36%
MobileNet (alpha = 0.5) | 833,640 | 0.9 MB | 76.61%
MobileNet (alpha = 0.53) | 930,435 | 1.0 MB | 78.75%
MobileNet (alpha = 0.75) | 1,839,128 | 1.9 MB | 80.52%
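The step from Table 9 to Table 10 is post-training weight quantization: storing each weight as an 8-bit integer brings the footprint down to roughly one byte per parameter (e.g., 622,160 parameters → 0.69 MB). A minimal sketch using the TensorFlow Lite converter [29], assuming dynamic-range quantization; the model path is a hypothetical placeholder:

```python
import tensorflow as tf

# Minimal sketch of post-training dynamic-range (8-bit weight)
# quantization with the TensorFlow Lite converter [29].
# "lw_xception1_w025.h5" is a hypothetical path to the trained,
# simplified model.
model = tf.keras.models.load_model("lw_xception1_w025.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # store weights as int8
tflite_bytes = converter.convert()

with open("lw_xception1_w025_quant.tflite", "wb") as f:
    f.write(tflite_bytes)
```

Dynamic-range quantization of this kind converts the weights offline while leaving activations in floating point, which is consistent with the accuracy column barely moving relative to Table 9.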
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
