1. Introduction
The deployment of machine learning models on mobile platforms has opened up a multitude of opportunities across sectors including economics, the environment, healthcare, education, and, notably, agriculture. In agriculture, these applications hold particular significance, offering transformative benefits to farmers, who form one of the backbones of our society. By equipping farmers with state-of-the-art technologies, such applications provide unprecedented access to timely and actionable insights.
Insects have long posed one of the greatest dangers to crop yields at harvest time, as evidenced by the rapid rise in the use of agricultural pesticides. However, due to evolution and climate change-related factors, many dangerous insect species have become resistant to pesticides and now pose an even greater threat [1]. Therefore, throughout the years, smart insect monitoring systems [2,3] and insect detection and classification models [4,5,6] have been used to tackle this problem by helping farmers and smart IoT-based systems identify and target these insects to protect crops.
Traditional methods of insect detection and pest management rely heavily on manual observation or cloud-based classification systems, which often introduce delays, inaccuracies, and privacy concerns. These methods are also impractical in remote or rural areas with limited or unreliable network connectivity, thus making rapid pest identification and timely intervention difficult.
Further worsening the problem, climate change and accelerated insect evolution have led to the emergence of resistant insect populations, complicating traditional pesticide-based management strategies. Existing digital insect monitoring systems often leverage heavy computational models which, while accurate, are unsuitable for direct deployment on resource-constrained mobile and edge devices commonly used by farmers. Such models either require continuous internet connectivity or are computationally prohibitive for real-time on-device analysis, thus failing to deliver the speed, efficiency, and convenience urgently needed in practical agricultural scenarios.
Therefore, there is a clear and pressing need for optimized, edge-deployable machine learning models that are capable of accurate and real-time insect classification directly on mobile devices. The existing literature largely focuses on cloud-based or resource-intensive deep learning architectures, neglecting the challenges associated with the lightweight deployment and quantization strategies that are critical for mobile scenarios. Moreover, datasets utilized by many existing studies often either lack specificity relevant to local pest populations or are excessively generalized, resulting in reduced model accuracy and applicability in localized agricultural contexts.
Several labeled and annotated insect image datasets are publicly available. Popular examples include IP102 [7], which contains over 75,000 images across 102 insect classes, a highly specialized insect dataset [8], and BIOSCAN-1M [9], which contains over 1 million images. However, these large datasets may not always represent the species or environmental conditions relevant to a particular agricultural context. Consequently, more targeted datasets have begun to emerge, such as the Dangerous Farm Insects Dataset, focusing specifically on pests that pose significant risks to crop yields. These smaller, highly specific datasets present new opportunities and challenges for model development, especially when balanced against the computational constraints and on-device resource limitations inherent to mobile and edge platforms.
To date, relatively few studies have leveraged the Dangerous Farm Insects Dataset. One notable example is [10], where researchers employed transfer learning approaches using pre-trained deep learning architectures (e.g., ResNet, MobileNet, VGG) to identify the most effective model for classifying dangerous farm insects. Despite the dataset’s limited size, ref. [10] demonstrated that models like Xception could achieve a validation accuracy of about 77.7%. However, the test performance remained around 70%, suggesting that there is room for methodological improvements and a need for techniques that better generalize from limited training samples while also maintaining efficient inference.
In this context, the present study aims to advance the state of the art in on-device insect classification by focusing on advanced classification models that are inherently well-suited for mobile deployment. Specifically, we investigate architectures such as MobileViT and EfficientNet, known for their balance of model accuracy and computational efficiency on constrained hardware platforms. While prior works like [10] provide a valuable benchmark for model performance on the Dangerous Farm Insects Dataset, our approach strives to go further by optimizing both accuracy and efficiency through a series of quantization strategies. These include Post-Training Quantization, Quantization-Aware Training, and representative dataset selection, all of which are evaluated for their impact on model performance and resource utilization. Beyond evaluating these techniques on our primary dataset, we also validate the approach on a larger dataset (IP102), demonstrating that our method generalizes and scales effectively.
To illustrate the practical implications of these advancements, we also develop a proof-of-concept mobile application using TensorFlow Lite. This application performs real-time insect classification directly on the device, enabling farmers and field workers to rapidly identify pests without relying on cloud-based services or continuous network connectivity. By improving both the effectiveness and the deployment feasibility of insect classification tools, this work contributes a generalizable methodology for building high-performance, resource-efficient ML models suited to the agricultural domain and beyond.
2. Literature Review
The ubiquity of mobile devices has motivated the search for machine learning models that can be rapidly prototyped, deployed, and run within the constraints of these platforms. TensorFlow Lite (TFLite), the lightweight variant of Google’s TensorFlow framework designed for easy integration into mobile applications, serves as the bridge between machine learning model development and mobile applications [11]. Developed for mobile and embedded platforms such as smartphones, tablets, and other portable devices, TensorFlow Lite enables developers to add intelligent capabilities to their mobile applications.
TensorFlow Lite addresses the challenges of mobile environments, such as limited computing resources [12], power constraints, and the requirement for real-time processing [13]. Through methods like model quantization [14], optimization [15], and hardware acceleration, TensorFlow Lite allows complex machine learning algorithms and neural networks to run efficiently on devices. This shortens response times and protects user privacy, since computations are performed locally on the device itself, while still delivering responsive user experiences. From image recognition and natural language processing to gesture recognition and predictive analytics, TensorFlow Lite equips developers with the tools to create applications that leverage machine learning capabilities at the device level.
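As an illustration of this bridging role, the minimal Python sketch below converts a trained Keras model into the TFLite format that a mobile application can bundle; the model file names are placeholders rather than artifacts from our pipeline.

```python
import tensorflow as tf

# Load a trained Keras model (placeholder path used purely for illustration).
model = tf.keras.models.load_model("insect_classifier.keras")

# Convert the model to the TensorFlow Lite flat-buffer format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the .tflite artifact so it can be bundled with a mobile application.
with open("insect_classifier.tflite", "wb") as f:
    f.write(tflite_model)
```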
There are some interesting applications of TFLite in different domains where the models have been used for a variety of tasks. For instance, ref. [16] introduces an IoT-based solution for real-time flash flood detection and alerting using TensorFlow Lite. It addresses the limitations of traditional flood warning methods by emphasizing the need for timely alerts to facilitate an effective public response. The proposed solution employs computer vision techniques with video cameras to monitor water levels. Implemented on low-powered Raspberry Pi devices, the system offers scalability and adaptability for deployment in flood-prone regions. Performance evaluation identifies TensorFlow Lite with an SSD-MobileNet-v2-Quantized model as the optimal configuration for achieving high detection accuracy and efficiency in IoT environments.
Some interesting applications have also been seen in image classification. The work in [17] introduces “AgroAId”, a mobile app system utilizing deep learning and TensorFlow Lite for the visual classification of plant species and diseases, achieving 99% accuracy and providing spatiotemporal analytics on regional and seasonal disease trends. The work in [18] presents a mobile application for on-edge medical diagnosis of lung diseases using TensorFlow Lite. The study experimented with a total of 18 models, which were created using various quantization techniques, including post-training quantization, integer quantization, and Quantization-Aware Training. Quantization-Aware Training resulted in a 75.59% reduction in model size, with MobileNetV2 offering the best performance-to-size ratio, showing only a 4.1% accuracy loss.
Additionally, ref. [19] explores the use of vision transformers (ViTs) for on-edge medical diagnostics using the Kvasir-Capsule image classification dataset of gastrointestinal diseases. The study applied TensorFlow Lite quantization techniques, such as post-training float-16 (F16) quantization and Quantization-Aware Training (QAT), to reduce model sizes while maintaining performance. The study concluded that MobileViT_V2_175, with its F16 quantization and 27.47 MB size, offered the best balance between performance and efficiency.
Moreover, ref. [20] proposed “Leboh”, an Android mobile application utilizing TFLite’s EfficientNet-Lite model for waste classification, achieving an accuracy of 95.39% during model evaluation and 82.5% during user testing. Likewise, ref. [21] introduced a mobile application, built on a distinctive dataset, for oil palm fresh fruit ripeness classification, achieving a high test accuracy of 89.3% with a 96 ms inference time per image using EfficientNetB0 optimized through transfer learning, 9-angle crop data augmentation, and float16 quantization.
Furthermore, with advances in computer vision, object detection applications have made their way into the TinyML domain and have been implemented using TFLite and MobileNetV2. For example, ref. [22] presents an object detection and classification system for urban actors, using TFLite with a re-trained Single Shot Detector (SSD) model. Another interesting use case is [23], which introduces a mobile-based application for traffic sign recognition, leveraging TFLite and transfer learning techniques to train a Single Shot MultiBox Detector (SSD) MobileNet V2 model, with the quantized model demonstrating four times faster detection compared to the original float model.
There are many other interesting applications in video classification [24], audio classification [25,26], and natural language processing that this paper does not delve into but that are worth exploring to gain a broader perspective on the potential of TFLite within the TinyML space.
In recent years, the integration of modern machine learning techniques has greatly advanced insect image classification and detection across a variety of contexts, from agricultural fields to forest ecosystems. Building on the versatility and real-time capabilities of TFLite demonstrated in previous applications, this literature review now shifts focus to insect classification. By synthesizing key findings from studies in this domain, we explore the datasets, algorithms, and methodologies employed, emphasizing the potential for mobile-based, real-time insect detection solutions to address existing challenges and enhance performance in the field.
Many of the existing works performed experiments on their own custom-built datasets. Ref. [27] developed a dataset of 225 images, representing nine common orders and sub-orders of insect species, with 25 specimen images in each. They utilized artificial neural networks (ANNs) and support vector machine (SVM) algorithms, reaching an accuracy of 93% with the SVM model.
Similarly, ref. [28] developed their own dataset, consisting of 60 samples of 24 common pest species found in-field, resulting in a 1440-image dataset. To enhance classification accuracy for field crop insects, they developed recognition systems utilizing techniques such as multiple-task sparse representation and multiple-kernel learning (MKL). By leveraging the unique features of insect images, these techniques significantly improved recognition performance, contributing to a classification result of 90.4%.
The work in [29] uses both of the datasets developed in [27,28], consisting of nine and twenty-four classes, respectively. Their methodology involved employing various machine learning techniques, such as artificial neural networks (ANN), support vector machine (SVM), k-nearest neighbors (KNN), naive Bayes (NB), and convolutional neural network (CNN) models. The study also proposed an insect pest detection algorithm involving foreground extraction and contour identification, contributing to the classification of insects across complex backgrounds. The evaluation of classification models was enhanced through nine-fold cross-validation, resulting in the highest classification rates of 91.5% and 90% for the nine and the twenty-four class insects, respectively, achieved using the CNN model.
Additionally, ref. [5] uses the dataset of 24 classes developed by [28], along with additional images sourced from the internet to enhance generalization. The study implemented an improved network architecture based on VGG19, known for its progressive learning of image features. The model was compared to existing methods such as SSD and Fast R-CNN. Notably, the proposed methodology achieved a mean Average Precision (mAP) of 0.8922, surpassing both SSD and Fast R-CNN in performance.
Other studies that opted to develop their own dataset include [30], where a dataset of 29 k images covering 30 insect species was developed. ResNet transfer learning techniques were applied to this dataset to classify forest insects, and the system achieved an average insect classification accuracy of 94%. To facilitate the classification process, the researchers developed an application that allows users to capture, edit, and transfer insect images.
Likewise, ref. [31] also utilized their own dataset, which comprised images of twenty classes of paddy field insect pests sourced from Google Images and photographs taken by the Faculty of Agriculture, University of Jaffna, Sri Lanka. A framework was developed to classify the images of paddy field insect pests using gradient-based features through the bag-of-words approach. The classification process involved several steps, including the identification of regions of interest and their representation as scale-invariant feature transform (SIFT) or speeded-up robust features (SURF) descriptors. Subsequently, codebooks were constructed to map these descriptors into fixed-length vectors in the histogram space. The feature histograms were then subjected to multi-class classification using support vector machines (SVMs). Notably, the combination of the Histogram of Oriented Gradients (HOG) descriptors with SURF features yielded approximately 90% accuracy in classification.
The work in [32] employs a cascade architecture to identify Lepidoptera species from their images, combining deep convolutional neural networks (DCNNs) with support vector machines (SVMs). The dataset utilized in this research consisted of 1301 Lepidoptera images from 22 species. The architecture used part of the DCNN as a feature extractor, followed by SVMs serving as the insect classifiers. The proposed cascade architecture achieved a reported accuracy of 100% on the testing dataset.
Recent developments in automated insect identification further emphasize the feasibility and effectiveness of deploying machine learning-based methods for practical agricultural monitoring. The work in [33] demonstrated the use of convolutional neural networks (CNNs) to classify economically important insect species, such as the Mediterranean fruit fly (Ceratitis capitata) and the olive fruit fly (Bactrocera oleae), in real time, even when insects are freely moving and changing postures. Their study highlighted the significant improvement in accuracy (93%) that is achievable using methods that go beyond traditional static-image-based approaches, emphasizing the potential for real-time, automated monitoring and precise pest management interventions. Similarly, the work in [34] leveraged optical sensors combined with machine learning to accurately identify flying insects in agricultural fields, achieving over 80% classification accuracy. This approach facilitated the timely and spatially optimized application of insecticides, significantly reducing unnecessary pesticide usage and supporting environmentally sustainable pest management. Collectively, these advancements illustrate the growing potential and practical utility of integrated sensor and machine learning technologies in precision agriculture. Lastly, the work in [35] proposed a workflow for holistic insect monitoring. The approach involves the use of large-scale DNA barcoding (megabarcoding) to classify species, validate them with morphology, and use specimen images to train AI for identification and trait analysis.
In summary, TFLite’s adaptability for mobile and on-device machine learning has been demonstrated across various fields, and this flexibility is especially promising for insect classification. While previous research has utilized machine learning models for insect detection, reliance on cloud-based inference systems often introduces latency and privacy concerns. By applying TFLite for real-time, on-device insect classification, our study bridges this gap, offering a mobile-based solution that leverages the benefits of low-latency inference, efficient power usage, and privacy preservation. This approach not only enhances performance in the field, but also makes insect detection more accessible and scalable for practical applications such as wildlife monitoring and agricultural pest control.
In recent years, smart trap technology has emerged as a complementary approach to traditional deep learning-based insect classification. Ref. [36] developed an autonomous smart trap for the precision monitoring of hematophagous flies on cattle that integrates high-resolution imaging, environmental sensing, and on-device convolutional neural network processing. Their system achieved an overall classification accuracy of 96%, demonstrating a robust performance under field conditions, with low power consumption and real-time data transmission capabilities. This work not only validates the practical utility of edge-deployed AI in pest monitoring, but it also serves as a valuable benchmark for integrating sensor data with deep learning methodologies. By comparing our quantization-driven optimization approach to such real-world applications, we further underscore the relevance of our methods for achieving efficient and accurate insect detection in resource-constrained agricultural environments.
Furthermore, ref. [37] provides a comprehensive review of artificial intelligence applications in integrated pest management, systematically examining how AI methodologies, when integrated with IoT-based systems, can enhance decision-making in pest control. This work, along with [38], highlights the potential for developing intelligent, real-time pest management systems that reduce the reliance on chemical treatments while promoting sustainable agriculture. By contextualizing our research within these broader AI-driven strategies, our approach to quantization-driven optimization for edge deployment is further validated as a critical step toward efficient, real-world pest monitoring.
6. Results and Discussion
To our knowledge, the only prior study using the Dangerous Farm Insects Dataset is reported in [10]. Given the limited number of images available, the researchers of [10] implemented a transfer learning approach, utilizing pre-trained deep learning models to capture features of field pests more effectively. The method leveraged ResNet, MobileNet, and VGG, with the goal of identifying the most suitable architecture for classifying dangerous farm insects.
Specifically, ref. [10] explored the use of models including ResNet-50V2, MobileNetV2, and Xception. While ResNet-50V2 showed potential in the initial testing, signs of overfitting suggested it might benefit from further refinement through hyperparameter tuning. ResNet-152V2 showed comparable behavior but did not offer notable performance gains over ResNet-50V2. MobileNetV2 maintained a stable performance during both training and validation, though the testing results revealed some limitations. In contrast, Xception provided significant performance improvements, especially in testing, positioning it as a leading candidate for the classification task. After hyperparameter optimization, Xception achieved a validation accuracy of 77.7%, outperforming baseline models. Although the study does not explicitly report test accuracy, a visual inspection of their results indicates a test classification accuracy of around 70%.
As for our proposed solutions, after fine-tuning our pre-trained models and utilizing augmentation, we achieved higher classification accuracy than the work reported in [10]. This is partially because we used newer models that are better suited to image classification, such as MobileViT and EfficientNetV2. Additionally, the authors in [10] did not implement any specific augmentation techniques, which further distinguishes our approach. The results of our proposed solution are given below, in Table 3.
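For reference, the sketch below outlines a typical transfer-learning setup for EfficientNetV2B2 of the kind described above; the input resolution, dropout rate, learning rate, and dataset objects (train_ds, val_ds) are illustrative assumptions rather than our exact configuration.

```python
import tensorflow as tf

NUM_CLASSES = 15                      # classes in the Dangerous Farm Insects Dataset
IMG_SIZE = (260, 260)                 # illustrative input resolution for EfficientNetV2B2

# Pre-trained backbone with ImageNet weights; the original classification head is removed.
base = tf.keras.applications.EfficientNetV2B2(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,), pooling="avg")
base.trainable = True                 # fine-tune the whole backbone

inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
x = base(inputs)
x = tf.keras.layers.Dropout(0.3)(x)   # illustrative regularization
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=30)   # train_ds/val_ds are hypothetical
```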
The results in Table 3 demonstrate that our proposed models, MobileViT and EfficientNetV2B2, achieved a superior performance compared to the baseline models used in previous works [10]. MobileNetV2, a baseline model, achieved a validation accuracy of 72.3%, but its test accuracy was not reported in [10], suggesting limited applicability of this lightweight architecture to the complex features present in our insect dataset. Xception, another baseline model, improved upon MobileNetV2, with a validation accuracy of 77.7% and an approximate test accuracy of 70%, though its lower test accuracy may indicate susceptibility to overfitting, likely due to the absence of data augmentation in the original approach. In contrast, our MobileViT model, fine-tuned and supported by targeted augmentation techniques, achieved a validation accuracy of 82.6% and a test accuracy of 73.4%, highlighting the benefits of MobileViT’s transformer-based design for capturing intricate insect image patterns. However, the highest test performance was achieved with our EfficientNetV2B2 model, which attained validation and test accuracies of 80.9% and 77.8%, respectively. The model’s compound scaling method allowed for an optimal balance between model size and depth, resulting in robust generalization across unseen data. These findings underscore the efficacy of combining advanced architectures with data augmentation to enhance model generalizability and address dataset limitations, positioning EfficientNetV2B2 as an optimal choice for on-device insect classification.
The ablation study summarized in Table 4 illustrates the incremental impact of individual augmentation techniques on our EfficientNetV2B2 model’s accuracy. Without any augmentation, the model achieved a baseline accuracy of 74.5%, indicating a limited generalization capability. Horizontal and vertical flips individually improved the accuracy slightly, highlighting the benefit of these realistic transformations. Rotation augmentations showed the highest individual performance gain, reflecting the value of rotational invariance for insect images. Adjustments in brightness also contributed positively by simulating varying environmental lighting conditions. The combination of all augmentation techniques led to the highest model accuracy, underscoring the complementary benefits of applying multiple targeted data augmentation strategies to enhance model generalization and robustness.
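The augmentation operations evaluated in the ablation can be expressed as a small Keras preprocessing pipeline, sketched below; the specific rotation and brightness factors are illustrative assumptions, not the exact values used in our experiments.

```python
import tensorflow as tf

# Illustrative augmentation block combining the transformations from the ablation study.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),  # horizontal and vertical flips
    tf.keras.layers.RandomRotation(0.1),                    # mild rotations (fraction of a full turn)
    tf.keras.layers.RandomBrightness(0.2),                  # simulate lighting variation
])

# Applied only to the training split; validation and test images are left untouched.
def prepare(ds, training=False):
    if training:
        ds = ds.map(lambda x, y: (augment(x, training=True), y),
                    num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)
```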
It is important to recognize that while augmentation can significantly enhance generalization, certain augmentation methods, when applied excessively or without careful consideration, may negatively impact performance. Excessive rotations or aggressive image transformations can inadvertently distort critical morphological features essential for insect classification, such as wing patterns, antennae positions, or body contours. These subtle but distinct visual characteristics, which the model relies upon to differentiate between similar insect classes, become difficult to discern when excessively manipulated. Therefore, augmentations must be thoughtfully balanced and applied strategically, preserving critical visual features to maintain high model accuracy and reliable generalization.
More relevant to this work, and as outlined in Section 4, we applied various quantization techniques to convert the full models into TensorFlow Lite formats optimized for edge deployment. Table 5 presents the impact of these quantization techniques on model size, classification accuracy, and inference times. The results illustrate the trade-offs associated with each quantization method and provide insights into achieving efficient, on-device inference while maintaining high classification accuracy.
In our experiments, the inference speed measurements reported in Table 5 were obtained using the TensorFlow Lite runtime integrated within our mobile application, developed with the Flutter framework. These results were recorded on a Nothing Phone (2), manufactured by Nothing Technology Limited in London, UK, sourced from a UAE retail shop, and running Android 15. This device is powered by the Qualcomm Snapdragon 8+ Gen 1 chipset built on a 4 nm process, featuring an octa-core CPU configuration with one Cortex-X2 core at 3.19 GHz, three Cortex-A710 cores at 2.75 GHz, and four Cortex-A510 efficiency cores at 2.0 GHz. The system is further supported by an Adreno 730 GPU, clocked at 900 MHz, and 12 GB of LPDDR5 RAM running at 3200 MHz, ensuring a flagship-level performance for on-device machine learning inference. This detailed hardware profile reflects a typical high-performance mobile environment and reinforces the practical relevance of our deployment results in real-world agricultural settings.
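The reported timings come from the TFLite runtime inside the Flutter application; for completeness, the Python-side sketch below shows how an equivalent single-image inference can be timed with the TFLite interpreter, for example as a desktop sanity check. The model path is a placeholder.

```python
import time
import numpy as np
import tensorflow as tf

# Load the quantized model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="insect_classifier.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Time a single inference on a dummy image of the expected shape and dtype.
image = np.random.random_sample(inp["shape"]).astype(inp["dtype"])
start = time.perf_counter()
interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
probs = interpreter.get_tensor(out["index"])
print(f"inference time: {time.perf_counter() - start:.3f} s")
```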
In addition to the performance metrics presented, our mobile application has been developed using the Flutter framework, ensuring cross-platform compatibility across both Android and iOS devices. In our experiments, the system was deployed on a Nothing Phone (2) running Android 15, which is equipped with a high-resolution camera that meets the standard pixel specifications required for accurate insect classification. The camera’s resolution and sensor quality are sufficient to capture the fine details needed for a reliable model performance in real-world agricultural settings. Although our current evaluation focused on Android deployment, the Flutter-based development guarantees that our approach is equally compatible with iOS.
Table 5 presents the impact of various quantization techniques on model size, inference time, and test accuracy for MobileViT and EfficientNetV2B2, demonstrating the trade-offs in efficiency and accuracy that are critical for edge deployment. The base MobileViT model, known for its small and lightweight architecture, achieved the best inference time at 0.01 s per image, with a compact size of 22 MB and a test accuracy of 73.4%. However, after applying Post-Training Quantization, the inference time increased to 0.34 s per image, while the accuracy dropped to 66.1%. This performance degradation can be attributed to MobileViT’s current lack of support in Keras as a pre-trained model, which limits our solution to Post-Training Quantization. This limitation also prevents the implementation of Quantization-Aware Training (QAT) and Representative Data Quantization for the MobileViT model.
Given these constraints, EfficientNetV2B2 emerged as the primary choice for deployment, primarily because its native Keras support allows for a comprehensive and flexible implementation of various quantization strategies. As shown in Table 3, the baseline EfficientNetV2B2 model achieved a 77.8% test accuracy at a size of 33 MB and with an inference time of 0.24 s per image. Applying Post-Training Quantization (PTQ), a method that simply converts the trained weights and activations into low-precision formats after training, is particularly effective here. Because PTQ does not alter the model’s learned parameters during training and leverages well-optimized integer arithmetic kernels, the model’s representational quality remains virtually intact. This approach yields a 9.6 MB model that retains the full 77.8% accuracy and reduces the inference time to 0.20 s per image, demonstrating that PTQ can achieve sufficient compression with minimal trade-offs in performance.
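A minimal sketch of this PTQ conversion is given below, assuming `model` holds the trained EfficientNetV2B2 Keras model; the output file name is a placeholder.

```python
import tensorflow as tf

# Post-Training Quantization: compress the already-trained model without retraining.
converter = tf.lite.TFLiteConverter.from_keras_model(model)  # `model`: trained EfficientNetV2B2
converter.optimizations = [tf.lite.Optimize.DEFAULT]         # enable default weight quantization
ptq_tflite = converter.convert()

with open("efficientnetv2b2_ptq.tflite", "wb") as f:
    f.write(ptq_tflite)
```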
By contrast, Quantization-Aware Training (QAT) integrates quantization operations into the training loop itself. Although this technique theoretically helps the model adapt to the noise and reduced precision of quantization, the final quantization parameters may not perfectly match the actual data distribution at inference time. This mismatch can lead to suboptimal results, as seen in our QAT variant with a 74.9% accuracy, 22.1 MB size, and a 0.26 s inference time. Representative data quantization, which adjusts quantization parameters based on a small dataset representative of the real-world input, helps mitigate this shortcoming of QAT. By tailoring the scaling factors to the actual operating conditions, Representative Data Quantization yields a better-aligned 10.4 MB model, with a 77.2% accuracy and a 0.23 s inference time, which is closer to PTQ’s performance, but with the added benefit of data-driven fine-tuning. This improvement highlights how leveraging representative data can “correct” QAT’s residual misalignments, providing a stronger balance between efficiency and accuracy. In essence, while PTQ offers a straightforward, near-optimal compression strategy, and QAT attempts to build quantization robustness during training, Representative Data Quantization steps in to refine the quantization parameters post hoc, ensuring that the final deployed model operates under conditions more closely matching those it will encounter in practice.
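The two remaining strategies can be sketched as follows, assuming `model`, `train_ds`, `val_ds`, and a small calibration split `calib_ds` are available from the training pipeline; note that full-model QAT may require per-layer annotation for some architectures, so this is an illustrative outline rather than our exact implementation.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# --- Quantization-Aware Training: insert fake-quantization ops and briefly fine-tune ---
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# qat_model.fit(train_ds, validation_data=val_ds, epochs=3)   # short adaptation phase

# --- Representative Data Quantization: calibrate scaling factors on real samples ---
def representative_data():
    for images, _ in calib_ds.take(100):       # `calib_ds`: small, representative subset
        yield [tf.cast(images, tf.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
rep_tflite = converter.convert()
```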
Notably, our test accuracy on TensorFlow Lite using the post-training quantized EfficientNetV2B2 model is effectively equivalent to the base model validation accuracy reported in the literature. This finding highlights the effectiveness of our optimized approach, enabling the deployment of a lightweight yet accurate model for real-time insect classification in resource-constrained environments.
To assess the energy efficiency of our mobile insect classification application, we conducted experiments using the same Nothing Phone (2) (4700 mAh battery) running Android 15. The test was performed under controlled conditions, with the phone fully charged, screen brightness fixed, and minimal background activity, to ensure consistent measurements. During a five-minute session (300 s), the application processed 80 image classifications sequentially, consuming a total of 34.6 mAh. This corresponds to an average energy consumption of approximately 0.43 mAh per inference. Converting this energy usage using a nominal battery voltage of 3.85 V, we estimate an active power draw of 1.60 W. These results are summarized in Table 6 and demonstrate that our system maintains low power consumption, a critical factor for prolonged edge deployment in resource-constrained agricultural environments.
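The per-inference charge and average power figures follow directly from the measured quantities; the short calculation below reproduces them.

```python
# Measured values from the five-minute on-device session described above.
total_charge_mah = 34.6       # battery charge consumed during the session
num_inferences = 80           # image classifications performed
duration_s = 300              # session length in seconds
battery_voltage_v = 3.85      # nominal battery voltage

charge_per_inference = total_charge_mah / num_inferences    # ~0.43 mAh per inference
energy_wh = (total_charge_mah / 1000) * battery_voltage_v   # ~0.133 Wh consumed
avg_power_w = energy_wh * 3600 / duration_s                 # ~1.60 W average draw

print(f"{charge_per_inference:.2f} mAh/inference, {avg_power_w:.2f} W average power")
```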
To further evaluate the real-world performance of our insect classification application, we manually measured the end-to-end latency on the Nothing Phone (2). Using a stopwatch, we recorded the total time from user input to output for each of 20 separate classification instances. The measurements revealed an average latency of 2.19 s, with a standard deviation of 1.19 s, a median latency of 1.58 s, and a range spanning from 1.00 to 3.81 s. These latency results, summarized in Table 7, offer additional insight into the system’s responsiveness under typical operating conditions and confirm that the application meets the requirements for practical on-device deployment.
The confusion matrix generated using the TensorFlow Lite model of EfficientNetV2B2 with Post-Training Quantization is displayed in Figure 6.
Having integrated the quantized EfficientNetV2B2 model using Post-Training Quantization into our proposed mobile application, we conducted a series of tests to evaluate the model’s real-world performance in classifying insect species. For each of the fifteen classes, five random images were selected from the test set, resulting in a total of seventy-five tests. The model’s predictions were recorded, and the results were compiled into a confusion matrix, which provides a detailed view of the model’s accuracy and any misclassifications across the classes. This matrix highlights the model’s accuracy, with the majority of predictions aligning with the ground truth, though some misclassifications exist between certain insect species with similar visual characteristics.
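For illustration, a confusion matrix of this kind can be compiled from the logged ground-truth and predicted class indices as sketched below; the randomly generated labels stand in for the actual on-device test log and are purely hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

NUM_CLASSES, IMAGES_PER_CLASS = 15, 5

# Placeholder labels: in practice, y_true and y_pred are the class indices
# recorded during the 75 on-device tests (5 images per class).
rng = np.random.default_rng(0)
y_true = np.repeat(np.arange(NUM_CLASSES), IMAGES_PER_CLASS)
y_pred = np.where(rng.random(y_true.size) < 0.77, y_true,
                  rng.integers(0, NUM_CLASSES, y_true.size))

cm = confusion_matrix(y_true, y_pred, labels=np.arange(NUM_CLASSES))
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(per_class_accuracy.round(2))
```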
The model accurately classified 77.0% of the test images across 15 insect classes, showing good performance in categories like Africanized Honeybees, Fruit Flies, and Spider Mites. Such classes have high counts along the diagonal of Figure 6. An in-depth analysis of our confusion matrix reveals several specific misclassification patterns that warrant further investigation. For example, the model correctly classified only four out of eleven samples of Fall Armyworms (Row 9), resulting in an accuracy of approximately 36%. Notably, six samples of Fall Armyworms were misclassified as Armyworms and one as Western Corn Rootworms, indicating that the model frequently confuses these classes, likely due to their morphological similarities or overlapping features in the training data. Similarly, for Armyworms (Row 2), while five samples were correctly classified, there were additional errors, with one instance misclassified as Corn Earworms and four as Fall Armyworms. Furthermore, Corn Borers were correctly identified in eight cases but were confused with Corn Earworms (two samples) and Fall Armyworms (one sample).
These patterns suggest that certain insect species with similar visual characteristics are challenging for the model to distinguish. Given this, a potential strategy for future improvement could involve merging or re-evaluating visually similar classes, such as Armyworms and Fall Armyworms, if domain knowledge supports that their distinctions are ambiguous even to experts. Such refinements may help enhance model robustness and overall classification accuracy. The performance of the deployed mobile model was further evaluated using a confusion matrix based on user testing, as shown in Figure 7.
To further validate the robustness and generalizability of our solution, we applied it to the IP102 dataset [7]. As mentioned in Section 3, this larger, more diverse benchmark is widely used for insect pest classification, making it ideal for this validation. With over 75,000 images across 102 insect pest classes, the dataset exhibited considerable variation in the number of samples per class. While some classes contained thousands of images, others were severely underrepresented, containing only a few dozen samples. This imbalance posed a risk of biasing the predictive model towards the overrepresented classes, potentially leading to poor generalization on minority classes. Addressing this imbalance was critical to ensuring the fairness and robustness of the trained model.
To mitigate this issue, we adopted a systematic dataset balancing strategy, aimed at equalizing the number of samples per class while maintaining diversity. Initially, we determined a target of 300 samples per class, balancing the trade-off between dataset size and class uniformity. For classes with more than 300 images, we employed under-sampling by randomly selecting 300 images, ensuring a representative subset was retained. For underrepresented classes, a combination of oversampling and augmentation techniques was used. Augmentation involved generating synthetic images through transformations such as rotation, zooming, flipping, brightness adjustments, and color shifts. These transformations helped create diverse variations of existing images, improving the model’s ability to generalize to unseen data. Additionally, for classes with very few samples, existing images were duplicated to reach the target sample size, ensuring no class remained underrepresented.
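The balancing step can be summarized by the sketch below, which assumes a hypothetical ip102/&lt;class_name&gt;/*.jpg directory layout; in the actual pipeline, duplicated images from under-represented classes are additionally passed through the augmentation transforms.

```python
import random
from pathlib import Path

TARGET = 300   # target number of images per class, as described above

def balance_class(image_paths, target=TARGET):
    """Return a balanced list of image paths for a single class."""
    if len(image_paths) >= target:
        # Under-sample over-represented classes with a random subset.
        return random.sample(image_paths, target)
    # Over-sample under-represented classes by duplication (augmentation applied later).
    return image_paths + random.choices(image_paths, k=target - len(image_paths))

data_root = Path("ip102")   # hypothetical layout: ip102/<class_name>/*.jpg
if data_root.is_dir():
    balanced = {cls.name: balance_class(sorted(str(p) for p in cls.glob("*.jpg")))
                for cls in data_root.iterdir() if cls.is_dir()}
```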
We then followed the methodology outlined in Section 5, splitting the dataset and augmenting the training set before merging it with the original training dataset. Using this combined dataset, we trained an EfficientNetV2B2 model in floating-point precision, achieving an accuracy of 59.6%. Subsequently, we applied the three quantization strategies evaluated in this study: Post-Training Quantization (PTQ), Quantization-Aware Training (QAT), and Representative Data Quantization. The results obtained from these models are presented in Table 8, alongside a comparison with the results from related works.
To contextualize these results, we compared our quantized models to two baseline studies that utilized InceptionNetV3 [43] and ResNet [44]. We specifically selected these models because, like our own, they do not incorporate specialized hybrid strategies tailored to this dataset. As shown in Table 8, InceptionNetV3 and ResNet achieved accuracies of 57.08% and 55.24%, respectively. Our EfficientNetV2B2 model surpassed both baselines, even before quantization, reaching 59.6% accuracy. More importantly, after applying quantization, our solution maintained or closely approached this accuracy while reducing the model size, and thus the computational footprint.
Among the three quantization strategies evaluated (Post-Training Quantization, Quantization-Aware Training, and Representative Data Quantization), PTQ stands out once again, as it did on our primary dataset. PTQ reduced the model size from 33 MB to 9.6 MB without any loss in accuracy, sustaining a 59.6% test accuracy. Representative Data Quantization also preserved the original accuracy of 59.6% with a slightly larger size (10.4 MB), while QAT achieved 58.3% accuracy at 22.1 MB. Although QAT proved useful, it did not offer the same degree of accuracy retention as PTQ or Representative Data Quantization.
These results on IP102 affirm the scalability and versatility of our approach. By replicating the improved efficiency and robust accuracy previously demonstrated on a smaller dataset, we confirm that our methodology is not limited to a single context. Instead, it remains effective on a more complex and extensive dataset, reinforcing its potential for widespread application in resource-constrained, real-world agricultural environments.
Our analysis revealed a complex relationship between augmentation strategies and the characteristics of the Dangerous Farm Insects Dataset. Models trained exclusively on heavily augmented data performed worse than those trained on the original dataset. This counterproductive effect was likely due to the dataset’s small size and the subtle visual distinctions between insect classes, where excessive transformations, such as large rotations or aggressive brightness adjustments, inadvertently distorted critical features that define class boundaries.
These findings highlight the importance of domain-specific augmentation design. The application of more restrained transformations, such as brightness adjustments and flips, aligns with the dataset’s real-world variability while preserving critical visual cues. This result emphasizes that augmentation strategies must be adapted to the dataset’s specific properties to maximize their effectiveness, particularly for small, specialized datasets like ours.