Article

Intelligently Counting Agricultural Pests by Integrating SAM with FamNet

Jiajun Qing, Xiaoling Deng, Yubin Lan and Jidong Xian

1 College of Electronic Engineering (College of Artificial Intelligence), South China Agricultural University, Guangzhou 510642, China
2 National Center for International Collaboration Research on Precision Agricultural Aviation Pesticide Spraying Technology, Guangzhou 510642, China
3 Guangdong Engineering Technology Research Center of Smart Agriculture, Guangzhou 510642, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(13), 5520; https://doi.org/10.3390/app14135520
Submission received: 20 March 2024 / Revised: 2 June 2024 / Accepted: 5 June 2024 / Published: 25 June 2024

Abstract

The utilization of large pretrained models (LPMs) based on the Transformer architecture has emerged as a prominent research area in various fields, owing to their robust computational capabilities. However, how LPMs can be effectively employed in the agricultural domain remains to be explored. This research aims to enhance agricultural pest counting with limited samples by leveraging the strong generalization performance of LPMs. Extensive experiments revealed that in few-shot counting tasks, complex agricultural scenes with varying lighting and environmental conditions can significantly impede counting accuracy; accurately counting pests under diverse conditions with limited samples therefore remains challenging. To address this issue, the present research proposes an approach that integrates the outstanding class-agnostic segmentation performance of the segment anything model (SAM) with a counting network. Moreover, by using a straightforward TopK matching algorithm to propagate accurate labels, and by drawing inspiration from the GPT model while incorporating a forgetting mechanism, a more robust model is achieved. This approach transforms the problem of matching instances across different scenes into one of matching similar instances within a single image. Experimental results demonstrate that the proposed method improves the accuracy of the FamNet baseline model by 69.17% on the dataset used in this study. The synergy between large models and agricultural scenes warrants further exploration.

1. Introduction

Agricultural pests and diseases have significant implications for the economic returns and long-term development of the agricultural industry. These pests not only hinder plant growth but also serve as a conduit for the spread of numerous diseases. The global farming community has expressed concerns about the prevalence of insect pests in modern agriculture. Insect pests, such as citrus psyllids, pose a significant problem as they are the primary carriers of citrus Huanglongbing, causing yellowing and defoliation of citrus leaves and ultimately leading to reduced yields [1]. Other crops, including lychees, are also plagued by pests and diseases, such as stink bugs [2], blackback beetles, aphids, scale insects, and ladybugs.
By quantifying the pest population, the assessment of crop growth can be enhanced, and the correlation between pest abundance and factors such as pesticides, temperature, water, and fertilizer can be investigated. These efforts will contribute to the establishment of a comprehensive intelligent monitoring system for agricultural pests, which will provide guidance for effective crop pest management, ultimately reducing losses and increasing yields.
Advancements in intelligent algorithms have notably improved the monitoring of plant pests and diseases, as discussed by Karar et al. [3]. A significant contribution in this field was made by Li et al. [4], who proposed an enhanced deep learning network for the counting and localization of agricultural pests. Further, Ferentinos [5] trained a model on a vast open database of 87,848 images covering 25 plant species and 58 plant-disease combinations, achieving a 99.53% identification accuracy.
Okuyama et al. [6] utilized deep learning techniques to quantify pest populations in agricultural settings, providing insights into the correlation between population dynamics and various factors such as seasonality, climate, and temperature over time. Wen et al. [7] highlighted the effectiveness of Pest-YOLO, a model that has shown promising results in detecting and counting dense micro-pests on a large scale, addressing the challenge of micro-pest detection. Tetila et al. [8] developed an automated system for the early diagnosis of insect pest infestations in soybean fields, allowing for the assessment of infection levels and optimizing pesticide application, which in turn reduces costs and environmental impact.
Despite the success of these studies with specific datasets, they depend heavily on extensive image data collection, which is resource intensive. Addressing this challenge, Ranjan et al. [9] introduced FamNet in their CVPR 2021 paper “Learning To Count Everything”, offering a few-shot solution for object counting that achieves excellent results with limited data. However, their approach was limited to counting within the same image and environmental context, and it does not account for scenarios where the exemplar samples and the instances to be counted come from different environments or scenes. This research aims to capitalize on the robust generalization capabilities of large-scale models to enhance agricultural pest counting, overcoming the limitations of previous studies and reducing the reliance on extensive image datasets.
The rapid advancement in computing power has made it possible to train language models with hundreds of billions of parameters using the Transformer architecture [10]. These models exhibit exceptional performance in various natural language processing tasks, such as inference, summarization, translation, and generating human-like text. Notably, companies like OpenAI have developed their own large-scale language models, such as the GPT series (GPT-4, GPT-3.5, and GPT-3) [11], which have gained widespread recognition in artificial intelligence fields, including medical care and pedagogy [12]. In previous research, an attempt was made to incorporate GPT into the generation of guidance for agricultural pest and disease management. Subsequently, the potential of large-scale pretrained models was recognized not only in the field of language but also in other domains. For example, Meta AI recently released the segment anything model (SAM), which has garnered attention for its impressive performance in class-agnostic segmentation. SAM can accurately generate masks and segment different instances in an image, even when the instances are unknown [13].
However, the application of such large-scale models like SAM in agriculture remains unexplored. This field holds immense potential but has yet to be fully developed. The enormous parameter count of SAM gives it strong generalization capability, laying the foundation for a unified agricultural model. In a recent study, the performance of SAM was evaluated in diverse real-world segmentation scenarios, including natural images, agriculture, manufacturing, remote sensing, and medical scenes [15]. The findings revealed that SAM demonstrates remarkable generalization capabilities in common scenes, such as natural images, but that its effectiveness is limited in low-contrast scenes and it requires strong prior knowledge in complex scenes. An initial assessment of SAM on the counting task, including a comparison with baseline few-shot counting methods, was carried out in the study “Can SAM count anything? An empirical study on SAM counting” [14]. Its findings revealed that SAM is less effective than the state-of-the-art baselines when dealing with small and crowded objects. As a result, further enhancements to SAM are imperative to overcome these limitations, particularly in specific scenarios.
This study proposes a scheme that utilizes SAM to assist FamNet in achieving better counting results. SAM’s robust generalization capability can help FamNet handle variations in different environments and lighting conditions. FamNet, in turn, can alleviate the low-precision segmentation issue of SAM in dense scenes. By combining these two approaches as proposed in this study, better solutions can be achieved for counting different crop pests and diseases. This method reduces the need to collect pest datasets in smart agriculture and enables unsupervised pest density statistics through the introduction of external databases. With the continuous improvement in large-scale models and the expansion of quantifiable agricultural data, more models can be seamlessly integrated, forming the cornerstone of an integrated intelligent detection system for agricultural water, fertilizer, and pests and diseases. It is worth noting that large-scale models also have limitations, which will be analyzed in Section 4.

2. Materials and Methods

The experimental data in this study comprise a collection of RGB images that capture various diseases and pests from different angles and distances. To analyze and infer the abundant feature information in agricultural images, a class-agnostic segmentation model, the segment anything model (SAM), is utilized. SAM demonstrates strong contextual reasoning and generalization capabilities, allowing images to be partitioned into segments. Additionally, FamNet, used here in the five-shot setting, serves as the benchmark counting model. Throughout this manuscript, “Learning To Count Everything” refers to the seminal paper [9], while FamNet denotes the counting network presented therein.
In the proposed approach, agricultural images are initially fed into the unsupervised SAM network, which performs segmentation based on different semantic contents and produces vector features representing the different semantic regions. These features are then processed and matched for label assignment against the sample vectors, which undergo the same processing. To mitigate differences in environments or lighting conditions, query instances are generated within a single image to achieve better counting results. On this basis, a flexible design utilizing SAM is proposed that avoids the segmentation issues SAM encounters in common dense agricultural scenes and compensates for the weak generalization of counting models trained with scarce samples. Furthermore, to enable the model to learn from historical detection records, similar to GPT’s knowledge acquisition from dialogue history, the proposed model feeds the results back into a knowledge base at the final layer [16]. This facilitates the automatic expansion of the external data with detection records, which benefits future detections. The network architecture is illustrated in Figure 1.

2.1. Dataset

The dataset employed in this study encompasses a collection of RGB images depicting a variety of plant diseases and pests from multiple perspectives and distances. It comprises 803 RGB images in total: 544 images of citrus psyllids, 26 images of lychee stink bugs, 57 images of aphids, 55 images of blackback beetles, 56 images of scale insects, and 65 images of ladybugs. It is important to note that these images were used exclusively for model evaluation and were not part of the training dataset.
The citrus psyllid RGB images were collected from the fruit orchards in the East Campus of South China Agricultural University and the orchards of the College of Engineering over a period from 25 June 2020 to 24 August 2020. To account for the variability in natural light conditions and to reduce the long-tail effect in image recognition, images were deliberately captured during three distinct time periods each day: morning (9:00–11:00), midday (14:00–15:00), and late afternoon (17:00–18:00). This approach ensured a diverse dataset that better represents the range of environmental conditions that citrus psyllids may encounter.
A Huawei Honor 20 handheld camera, manufactured in China, was employed to photograph adult citrus psyllids on citrus trees from a range of 4 to 6 cm. The initial collection yielded 2024 field images of citrus psyllids with varying resolutions: 3024 × 4032, 2592 × 1944, and 1200 × 1600 pixels. To enrich the dataset, an additional 117 images, contributed by citrus experts and farmers from diverse regions across the nation, were incorporated. This supplementation culminated in a final dataset of 544 original citrus psyllid images, as illustrated in Figure 2.
In addition to the citrus psyllid RGB images, the dataset was supplemented with 259 images of the remaining pest categories (lychee stink bugs, aphids, blackback beetles, scale insects, and ladybugs) collected using web crawling techniques.
To address the issue of insufficient training data leading to overfitting, common data augmentation techniques were applied to the images [17]. These included random adjustments of contrast and brightness [18]; the addition of noise drawn from Gaussian and gamma distributions; contrast limited adaptive histogram equalization (CLAHE) [19]; median blur; horizontal and vertical flipping; and transposition. The final result is shown in Figure 3. Before each experiment, a few random images were selected from the dataset as prior knowledge, while the rest of the dataset served as the test set.
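As a concrete illustration, the augmentation operations listed above can be composed with the albumentations library; the paper does not name its tooling or parameter values, so the probabilities and the image path below are assumptions.

```python
import cv2
import albumentations as A

# One possible composition of the augmentations described above; all
# probabilities are illustrative, not the authors' settings.
transform = A.Compose([
    A.RandomBrightnessContrast(p=0.5),   # random contrast/brightness shifts
    A.GaussNoise(p=0.3),                 # additive Gaussian noise
    A.RandomGamma(p=0.3),                # gamma perturbation
    A.CLAHE(p=0.3),                      # contrast limited adaptive histogram equalization
    A.MedianBlur(blur_limit=5, p=0.3),   # median blur
    A.HorizontalFlip(p=0.5),             # horizontal flipping
    A.VerticalFlip(p=0.5),               # vertical flipping
    A.Transpose(p=0.5),                  # transpose (swap image axes)
])

image = cv2.imread("psyllid_sample.jpg")        # hypothetical input image
augmented = transform(image=image)["image"]     # augmented copy for training
```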

2.2. Employing SAM to Assist in Unsupervised Model Counting

SAM, a visual meta-model proposed by Meta AI, has achieved remarkable performance in class-agnostic segmentation by utilizing contextual attention. This study observes that few-shot counting essentially amounts to querying, in a class-agnostic manner, image features that resemble the distribution of the counting exemplars. The main workflow of the model is therefore as follows: first, the powerful generalization capability of the SAM meta-model is utilized for semantic segmentation; then, matching is performed on the semantic features to identify the instances that require counting in the image; finally, once the instances are obtained, counting is performed with the few-shot counting model, as in previous approaches. The following section describes in detail how SAM is used to extract image vector features:
Step 1: Semantic segmentation using SAM. Because SAM combines an image encoder, a prompt encoder, and a mask decoder trained on a large-scale dataset, it supports three segmentation schemes: point prompts, box prompts, and automatically generated mask prompts. To achieve an unmanned, automated system, this study adopts the automatically generated masks for the initial segmentation of the images in the first step and then converts the results into vector form in the second step.
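For reference, the automatic mask generation described in step 1 can be invoked through Meta AI's released segment-anything package roughly as follows; the checkpoint file and image path are placeholders for whatever a deployment actually uses.

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load a SAM backbone and build the automatic mask generator (no manual prompts).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# SAM expects RGB input; each returned dict holds a binary mask plus metadata.
image = cv2.cvtColor(cv2.imread("orchard_image.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)

# Crop each segmented region for vectorization in step 2 (bbox is in XYWH format).
crops = []
for m in masks:
    x, y, w, h = m["bbox"]
    crops.append(image[y:y + h, x:x + w])
```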
Step 2: Similar to the popular pairing of LangChain with large models such as GPT, this study employs vectorization to perform semantic similarity sorting over the model’s knowledge and memory. Specifically, a small amount of sample data is stored in vectorized form as external knowledge for the model. Different vectorization and indexing methods were tested for storing the samples as knowledge vectors, including FLAT [20], HNSW [21], and SwAV [22]. After experimenting with these methods, SwAV was ultimately chosen as the vectorization approach. The specific process is illustrated in Figure 4.
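A minimal sketch of this vectorization step is given below, pairing SwAV features (via the public torch.hub release) with Faiss, whose flat and HNSW index classes correspond to the FLAT and HNSW options compared above; the preprocessing and index parameters are assumptions rather than the authors' exact configuration.

```python
import faiss
import numpy as np
import torch
from torchvision import transforms

# SwAV-pretrained ResNet-50 from the public release; drop the classifier head
# so the 2048-d backbone features serve as the knowledge vectors.
swav = torch.hub.load("facebookresearch/swav:main", "resnet50")
swav.fc = torch.nn.Identity()
swav.eval()

prep = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(crop):
    """Map an HxWx3 uint8 crop to a unit-normalized SwAV feature vector."""
    feat = swav(prep(crop).unsqueeze(0))
    return (feat / feat.norm(dim=-1, keepdim=True)).squeeze(0).numpy()

dim = 2048
index_flat = faiss.IndexFlatIP(dim)                                    # exact (FLAT) search
index_hnsw = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)  # approximate HNSW

# `crops` stands in for the sample regions produced in step 1.
crops = [np.zeros((64, 64, 3), dtype=np.uint8)]  # placeholder so the sketch runs
vecs = np.stack([embed(c) for c in crops]).astype("float32")
index_flat.add(vecs)
```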

2.3. The Vector TopK Matching Technique

In previous studies on few-shot counting tasks, the task is often defined as a key-value problem. Specifically, given a key representing an instance to be queried in an image, the task is to find the value, which corresponds to the number of instances to be counted in the image. Researchers commonly use the pest instances in the test set as keys to query the test set images for matching regions. However, due to the limited number of samples in the dataset and test set, this approach often leads to long-tail effects and poor domain adaptation caused by environmental differences. As shown in Figure 5, when using the same image of a woodlouse, satisfactory results can be achieved. However, when using woodlice from other scenes as counting instances, the accuracy significantly decreases. This is because changes in the environment, such as variations in lighting angles, blur the features of the queried instances.
To address this issue, this study reformulates the query: instead of directly using instances from outside the test set as keys to query values in the test images, the external instances are first matched to key vectors within the test image, and those in-image key vectors are then used to query the values in the test set. To this end, a simple approach based on the minimum cosine distance between vectors is chosen to compensate for the data differences between the visual segmentation model and the counting model. The vector similarity matching aims to match the known pest feature vectors to the K most similar feature vectors among the block image vectors segmented and processed by SAM. Specifically, the distances between all vectors obtained after SAM segmentation and the small number of samples in the vector database are calculated, and the average of these distances determines the final similarity. The final results of the network are shown in Figure 6.
The hyperparameter K in the TopK tag matching process was set to a fixed value of 10 after conducting preliminary experiments. This value was chosen as it offered an optimal trade-off between the diversity of the tag matches and the computational performance of our model. Further exploration of the impact of varying K on the model’s performance is a potential area for future research.
Additionally, this study explores another approach: for each segmented feature vector, the similarity to its top K most similar sample vectors in the vector database is calculated, and the region is selected whenever the average of these similarities reaches a preset threshold.
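Both matching rules reduce to simple operations over unit-normalized vectors, where the inner product equals cosine similarity. The sketch below is an assumed reading of the two criteria: rule 1 ranks regions by their average similarity to all samples and keeps the top K = 10, while rule 2 keeps any region whose average similarity to its K most similar samples exceeds a threshold; the 0.8 threshold is illustrative, not a value from the paper.

```python
import numpy as np

def match_top(sample_vecs, region_vecs, k=10):
    """Rule 1: average each region's similarity over all samples, keep the K best."""
    sims = region_vecs @ sample_vecs.T          # regions x samples cosine matrix
    avg = sims.mean(axis=1)                     # average similarity per region
    return np.argsort(avg)[::-1][:k]            # indices of the K best-matching regions

def match_threshold(sample_vecs, region_vecs, k=10, tau=0.8):
    """Rule 2: keep regions whose mean similarity to their top-K samples exceeds tau."""
    sims = region_vecs @ sample_vecs.T
    topk = np.sort(sims, axis=1)[:, -k:]        # each region's K most similar samples
    return np.where(topk.mean(axis=1) >= tau)[0]
```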

2.4. The Self-Prompting Structure

Recent advancements in the popular GPT model show that GPT can extract features from conversation data and use past dialogue messages as prompts for new rounds of conversation, which enhances the quality of the subsequent dialogue. Inspired by this, the present study incorporates a feedback loop after successful vector matching: the detected instance with the highest similarity to the database is used as a prompt for the next round of detection. This automatic expansion of the sample set helps improve the accuracy of subsequent detections. Two prompt expansion methods are proposed in this article. The first, referred to as “Prompt based on Top”, conducts a TopK image matching and adds only the closest-matching image to the prompt knowledge. The second, known as “Prompt based on threshold”, also conducts a TopK image matching, but adds all the images obtained to the prompt knowledge. The key distinction between the two methods lies in how many results from previous tests are carried over as prior knowledge for subsequent tests.
To prevent the excessive accumulation of prompt knowledge, this study sets a limit on the prompt vectors. If the limit is exceeded, the earliest vector feature in the entire set is forgotten. The simplified structure is illustrated in Figure 7.
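The bounded prompt store with first-in, first-out forgetting can be expressed in a few lines; the capacity of 50 vectors below is an illustrative assumption, as the paper does not state the limit.

```python
from collections import deque

import numpy as np

class PromptMemory:
    """Bounded store of prompt vectors; the oldest entry is forgotten when full."""

    def __init__(self, capacity=50):
        self.store = deque(maxlen=capacity)   # deque drops the earliest item on overflow

    def add(self, vec):
        self.store.append(np.asarray(vec, dtype="float32"))

    def as_matrix(self):
        return np.stack(self.store) if self.store else np.empty((0, 0), dtype="float32")

memory = PromptMemory()
memory.add(np.random.rand(2048))              # feed back the best-matched feature
```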

3. Experimental Results

The hardware configuration for the experiments was an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz with 256 GB of memory and two NVIDIA A100-PCIe graphics cards, each with 80 GB of graphics memory. The software environment was the Ubuntu 16.04 LTS 64-bit operating system with CUDA 11.1, cuDNN v8.0.4, and PyTorch 1.8.0.

3.1. Evaluation of Different TopK Matching Approaches

In the experiment evaluating the effectiveness of the large model, this study first tested the performance difference between the two TopK matching approaches designed above. Approach 1 matched the test set based on the average highest matching rate, while Approach 2 matched multiple images based on a matching-rate threshold.
In this evaluation, focused solely on the TopK matching approach, it was found that although matching multiple test set instances to a single image as the ground-truth label improved the chance of obtaining correct labels, it also restricted counting to a single image feature. Consequently, matching multiple images based on a predefined threshold achieved better precision than using a single image. Matching accuracy was measured as the mean absolute error (MAE).
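Here MAE follows the standard definition over the N test images, where c_i is the ground-truth count and ĉ_i the predicted count for image i:

\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| c_i - \hat{c}_i \right|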
The final results of the evaluation experiment are shown in Table 1:

3.2. Evaluation Experiment on Different CAPM Architectures

To evaluate the effectiveness of using SAM as an auxiliary tool for FamNet, FamNet was used as the baseline model, and four system configurations proposed in this paper were tested. System 1 was the basic system that directly performed segmentation and used the TopK threshold matching approach. System 2 stored all detection results as historical knowledge vectors for the next round of predictions. System 3 stored only the detection result with the highest accuracy as a historical knowledge vector for the next round of predictions. System 4 did the same as System 3 but additionally included a forgetting mechanism.
In the performance evaluation of the CAPM architectures, this study found that storing the detection results as vectors to prompt the next prediction improved accuracy to some extent. However, since FamNet is a few-shot model, it may not require excessive prompts; too many prompts can actually degrade performance. When multiple instances were selected using TopK and matched based on a threshold, adding multiple prompts at once led to a faster decline in performance. Although a forgetting mechanism was added in System 4, performance still declined in later rounds; the study speculates that this is because the simple forgetting mechanism caused the system to forget correct prompt knowledge, raising the proportion of incorrect prompts. The specific experimental results are shown in Table 2 and Figure 8. To provide a comprehensive view of the experiments conducted in this study, additional experimental visualizations can be found in Appendix A.

4. Discussion and Conclusions

4.1. Conclusions

This study makes several contributions. Firstly, it proposes a counting system in which the segment anything model (SAM) assists FamNet. By exploring the limitations of FamNet in different environments and adding a SAM auxiliary layer, preliminary semantic feature extraction is performed; this leverages SAM’s powerful unsupervised semantic segmentation capability to enhance the small counting model’s perception of object features. Secondly, it introduces TopK vector matching to refine the counting process, transforming it into a two-stage process that addresses domain adaptation issues when the samples and test instances are of the same type but come from different scenes. This helps handle the long-tail effect in the data and reduces the impact of lighting and other scene factors. Finally, inspired by GPT’s use of historical dialogue to improve performance, this research proposes an architecture that enhances performance through prompt learning during use. This architecture helps the model automatically expand rare datasets and allows convenient manual adjustment of model prompts. Overall, this work contributes to the application of large models in downstream agricultural tasks.

4.2. Discussion

Although the SAM auxiliary counting system proposed in this study effectively harnesses the powerful generalization performance of large models, it still faces limitations and challenges. Firstly, it heavily relies on the accuracy of the large model. While it benefits from the strong domain adaptation capability provided by the large model, it also faces issues encountered by large models, including the problem of model hallucination and failure in specific sample scenarios. These limitations will restrict the future development of the current model. Secondly, the introduction of large models comes at a cost, which means higher computational resources and increased device costs. This will impact the practical implementation of the model. Finding ways to use large models in a more lightweight manner, utilizing acceptable resources to achieve better results, remains a topic for further exploration. Furthermore, the data currently processed by the model still remain at the level of two-dimensional images. Although the model has a large number of parameters, it is still limited by the lack of three-dimensional information in two-dimensional images. Introducing three-dimensional information is expected to significantly improve the upper limit of the current model. Lastly, the proposed method in the early stage of low-sample counting can lead to a significant decrease in subsequent detection accuracy if incorrect samples are stored in the knowledge base as prompts. Even if a forgetting mechanism is introduced to stabilize the quantity of prompts, excessive expansion due to model prompts can still cause a decrease in performance.

Author Contributions

Conceptualization, J.Q. and X.D.; methodology, J.Q., X.D., Y.L. and J.X.; software, J.Q.; validation, J.Q., X.D. and Y.L.; formal analysis, X.D., Y.L. and J.X.; investigation, X.D., Y.L. and J.X.; resources, J.Q.; data curation, X.D., Y.L. and J.X.; writing—original draft preparation, J.Q.; writing—review and editing, X.D.; visualization, J.Q.; supervision, J.Q.; project administration, J.Q., X.D., Y.L. and J.X.; funding acquisition, X.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Key-Area Research and Development Program of Guangdong Province, grant number 2023B0202090001; National Natural Science Foundation of China, grant number 32371984; Key-Area Research and Development Program of Guangdong Province, grant number 2019B020214003; Laboratory of Lingnan Modern Agriculture Project, grant number NT2021009; Key-Area Research and Development Program of Guangzhou, grant number 202103000090; Key-Areas of Artificial Intelligence in General Colleges and Universities of Guangdong Province, grant number 2019KZDZX1012; The leading talents of Guangdong province program, grant number 2016LJ06G689; China Agriculture Research System, grant number CARS-15-23 and The 111 Project, grant number D18019.

Data Availability Statement

Data sets for the study period can be obtained from the corresponding author upon reasonable request. These data include pest and disease information for six species. However, the availability of these data is restricted: they were used under a license agreement for the current study and are therefore not publicly accessible.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A

Figure A1. The results of the initial model under different rounds.
Figure A2. Results of initial model + SAM in different rounds.
Figure A3. Results of initial model + SAM + Forget in different rounds.
Figure A4. The results of the initial model under different rounds.
Figure A5. The results of the initial model under different rounds.
Figure A6. Results of initial model + SAM + Forget in different rounds.
Applsci 14 05520 g0a6

References

1. Godefroid, M. Species distribution models predicting climate suitability for the psyllid Trioza erytreae, vector of citrus greening disease. Crop Prot. 2023, 168, 106228.
2. Boopathi, T.; Singh, S.B.; Manju, T.; Ramakrishna, Y.; Akoijam, R.S.; Chowdhury, S.; Singh, N.H.; Ngachan, S.V. Development of temporal modeling for forecasting and prediction of the incidence of lychee, Tessaratoma papillosa (Hemiptera: Tessaratomidae), using time-series (ARIMA) analysis. J. Insect Sci. 2015, 15, 55.
3. Karar, M.E.; Alsunaydi, F.; Albusaymi, S.; Alotaibi, S. A new mobile application of agricultural pests recognition using deep learning in cloud computing system. Alex. Eng. J. 2021, 60, 4423–4432.
4. Li, W.; Chen, P.; Wang, B.; Xie, C. Automatic localization and count of agricultural crop pests based on an improved deep learning pipeline. Sci. Rep. 2019, 9, 7024.
5. Ferentinos, K.P. Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 2018, 145, 311–318.
6. Okuyama, T.; Yang, E.C.; Chen, C.P.; Lin, T.S.; Chuang, C.L.; Jiang, J.A. Using automated monitoring systems to uncover pest population dynamics in agricultural fields. Agric. Syst. 2011, 104, 666–670.
7. Wen, C.; Chen, H.; Ma, Z.; Zhang, T.; Yang, C.; Su, H.; Chen, H. Pest-YOLO: A model for large-scale multi-class dense and tiny pest detection and counting. Front. Plant Sci. 2022, 13, 973985.
8. Tetila, E.C.; Machado, B.B.; Menezes, G.V.; de Souza Belete, N.A.; Astolfi, G.; Pistori, H. A deep-learning approach for automatic counting of soybean insect pests. IEEE Geosci. Remote Sens. Lett. 2019, 17, 1837–1841.
9. Ranjan, V.; Sharma, U.; Nguyen, T.; Hoai, M. Learning to count everything. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3394–3403.
10. Gerasimenko, N.A.; Chernyavsky, A.S.; Nikiforova, M.A. ruSciBERT: A Transformer Language Model for Obtaining Semantic Embeddings of Scientific Texts in Russian. Dokl. Math. 2022, 106, S95–S96.
11. Mathimani, T.; Mallick, N. A comprehensive review on harvesting of microalgae for biodiesel–key challenges and future directions. Renew. Sustain. Energy Rev. 2018, 91, 1103–1120.
12. Kung, T.H.; Cheatham, M.; Medenilla, A.; Sillos, C.; De Leon, L.; Elepaño, C.; Madriaga, M.; Aggabao, R.; Diaz-Candido, G.; Maningo, J.; et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLoS Digit. Health 2023, 2, e0000198.
13. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. arXiv 2023, arXiv:2304.02643.
14. Ma, Z.; Hong, X.; Shangguan, Q. Can SAM count anything? An empirical study on SAM counting. arXiv 2023, arXiv:2304.10817.
15. Ji, W.; Li, J.; Bi, Q.; Li, W.; Cheng, L. Segment anything is not always perfect: An investigation of SAM on different real-world applications. arXiv 2023, arXiv:2304.05750.
16. Pesaru, A.; Gill, T.S.; Tangella, A.R. AI assistant for document management using Lang Chain and Pinecone. Int. Res. J. Mod. Eng. Technol. Sci. 2023. Available online: https://www.doi.org/10.56726/IRJMETS42630 (accessed on 19 March 2024).
17. Bow, S.T. Pattern Recognition and Image Preprocessing; CRC Press: Boca Raton, FL, USA, 2002.
18. Morris, R. Developments of a water-maze procedure for studying spatial learning in the rat. J. Neurosci. Methods 1984, 11, 47–60.
19. Pizer, S.M.; Amburn, E.P.; Austin, J.D.; Cromartie, R.; Geselowitz, A.; Greer, T.; ter Haar Romeny, B.; Zimmerman, J.B.; Zuiderveld, K. Adaptive histogram equalization and its variations. Comput. Vis. Graph. Image Process. 1987, 39, 355–368.
20. Gieseker, D. Flat vector bundles and the fundamental group in non-zero characteristics. Ann. Della Sc. Norm. Super. Pisa-Cl. Sci. 1975, 2, 1–31.
21. Malkov, Y.A.; Yashunin, D.A. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 42, 824–836.
22. Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised learning of visual features by contrasting cluster assignments. Adv. Neural Inf. Process. Syst. 2020, 33, 9912–9924.
Figure 1. The complete network architecture diagram.
Figure 2. Images of citrus psyllids.
Figure 3. Images of different pests.
Figure 4. SAM conducts the preliminary data processing stage.
Figure 5. Choose samples beyond the frame of the image as counting samples.
Figure 6. Identifying the most similar sample through matching.
Figure 7. Implementation of a new loop for self-data expansion and knowledge forgetting.
Figure 8. Comparative experimental results between four different architectures.
Table 1. Comparative experimental results between three different architectures.

Architecture                                      MAE
FamNet                                            8.63
SAM + matching based on top + FamNet              5.12
SAM + matching based on threshold + FamNet        4.21
Table 2. Comparative experimental results between four different architectures (MAE).

Architecture                                                                      0 Epochs   5 Epochs   25 Epochs
SAM + Matching based on threshold + FamNet                                          4.21       4.21       4.21
SAM + Matching based on threshold + Prompt based on threshold + FamNet              4.21       2.67      11.64
SAM + Matching based on threshold + Prompt based on Top + FamNet                    4.21       3.90       4.46
SAM + Matching based on threshold + Prompt based on threshold + FamNet + Forget     4.21       2.67       3.65