Article

Research on the Agricultural Pest Identification Mechanism Based on an Intelligent Algorithm

1 School of Electronic Information Engineering, Zhuhai College of Science and Technology, No. 8, Anji East Road, Jinwan District, Zhuhai 519041, China
2 College of Pharmacy and Food Science, Zhuhai College of Science and Technology, No. 8, Anji East Road, Jinwan District, Zhuhai 519041, China
3 School of Mechanical Engineering, Zhuhai College of Science and Technology, No. 8, Anji East Road, Jinwan District, Zhuhai 519041, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Agriculture 2023, 13(10), 1878; https://doi.org/10.3390/agriculture13101878
Submission received: 24 August 2023 / Revised: 17 September 2023 / Accepted: 22 September 2023 / Published: 26 September 2023
(This article belongs to the Special Issue Agricultural Automation in Smart Farming)

Abstract

The use of Internet of Things (IoT) technology for real-time monitoring of agricultural pests is an unavoidable trend in the future of intelligent agriculture. This paper aims to address the difficulty of deploying models at the edge of pest monitoring visual systems and their low recognition accuracy. To this end, a lightweight GCSS-YOLOv5s algorithm is proposed. Firstly, we introduce the lightweight network GhostNet, use the Ghostconv module to replace traditional convolution, and construct the C3Ghost module based on the CSP structure, drastically reducing the number of model parameters. Secondly, during feature fusion, we introduce the content-aware reassembly of features (CARAFE) lightweight upsampling operator to enhance the integration of pest features by reducing the impact of redundant features after fusion. Then, we adopt SIoU as the bounding box regression loss function, which improves the convergence speed and detection accuracy of the model. Finally, traditional non-maximum suppression (NMS) is replaced with Soft-NMS to improve the model's ability to recognize overlapping pests. According to the experimental results, the mean average precision (mAP) of the GCSS-YOLOv5s model reaches 90.5%, achieved with a 44% reduction in the number of parameters and a 7.4 G reduction in computation compared to the original model. The method significantly reduces the model's resource requirements while maintaining accuracy, offering a theoretical foundation and technical reference for the future of intelligent pest monitoring.

1. Introduction

Agriculture is an essential component of human development, encompassing not just food supply but also economic, social, cultural, and environmental dimensions. The growth of plants in agricultural cultivation is influenced by factors such as climatic conditions, soil properties, and human activities. In subtropical regions, the favorable climate creates optimal conditions for the growth and reproduction of pests. Pests significantly impede plant growth by gnawing on leaves, stems, fruits, and other parts, and some pests also carry pathogens that can infect plants and damage overall crop health. According to survey data, major pests and diseases of wheat, rice, maize, potato, and other staple grain crops occurred at heavy levels in 2022, with an estimated national occurrence of 2.026 billion mu-times, an increase of 13.8% and 10.1% over the 2021 and 2016–2020 averages, respectively, threatening more than 70% of grain crop production areas [1]. The economic losses caused by pests in Fujian Province alone amounted to 31.348 billion yuan, of which direct economic losses were 22.174 billion yuan, prevention and control costs 8.424 billion yuan, and indirect economic losses 750 million yuan [2]. Therefore, precise prevention and control of pests is of great research importance.
To ensure the sustainability and stability of agricultural production and to reduce the harm caused by pests to plants, scientists in various countries have proposed control measures such as zonal management and precision chemical control. Monitoring pests to identify species and obtain population information is the basis for precise prevention and control. Currently, the most common monitoring method is to estimate pest species and numbers by manual visual inspection; however, this approach is easily affected by subjective factors, covers only a small area, is inefficient, and cannot capture pest population dynamics in real time, so prevention and control are often delayed because pest information arrives late. Advances in science and technology have brought a new solution: pest monitoring based on Internet of Things (IoT) technology, which uses wireless sensors, data collection, and edge computing to monitor the farmland environment and pest activities in real time and provides farmers with accurate information for more sophisticated pest management [3]. In [4], a pest detection system was developed using the IoT; it uses infrared sensors to count trapped pests and a pest monitoring application that lets users query detection results, achieving a counting accuracy of 93.52%, but it cannot identify pest species. The authors of [5] developed a real-time insect trap monitoring sensor based on ZigBee and GPRS technology to observe the spread of insect pests; the sensor could issue early warnings of spreading trends, but the accuracy of pest species identification was low. In [6], the authors evaluated the monitoring effect of an IoT system for automatic measurement and control of agricultural and forestry pests on four major lepidopteran pests; the results demonstrated that, once the automatic image recognition and counting process is perfected, the system has good prospects for popularization and application. Despite this progress in IoT-based pest monitoring, identifying, classifying, and counting pests during monitoring still need to be addressed.
Computer vision and deep learning techniques have emerged recently and can efficiently address the pest identification problem. In [7], YOLOv5 is used to accurately identify Cnaphalocrocis medinalis Guenee and Chilo suppressalis. In [8], computer vision and convolutional neural networks are employed to identify corn pests in cold northeast China, constructing a network model for corn pest identification and providing experience for future research. In [9], the traditional SSD model is enhanced by substituting the original multiscale feature maps with feature pyramids that have stronger characterization ability, resulting in a higher identification rate for smaller rice pests. These studies establish a foundation for pest recognition research, but balancing recognition accuracy and model size remains challenging: accuracy improvements often increase the number of parameters and the computational overhead, making it burdensome to deploy algorithmic models at the edge of resource-constrained IoT systems. Thus, maintaining detection accuracy while obtaining a model with fewer parameters and lower memory consumption is the key to fulfilling the pest detection task.
In summary, this paper presents a lightweight pest detection algorithmic model that addresses the imbalance between the accuracy of recognizing common pests on edge devices and the resources required for model deployment. This paper is organized as follows: Section 2 focuses on the operational flow of the IoT-based pest detection system and the specific improvement of the pest identification algorithm model. Section 3 describes the pest dataset for this experiment, illustrates the experimental results of the improved method in this paper, and compares it with other lightweight models. The last section summarizes the work of this paper and describes the prospects of smart IoT in monitoring agricultural pests in the future.

2. Methods

2.1. IoT Monitoring Pest Processes

Conducting pest surveys by personally visiting fields and forests wastes growers' labor and time, and results estimated by visual inspection are often insufficient to reflect the pest situation accurately. A pest monitoring system based on IoT technology allows growers to monitor the pest situation in real time from home; its workflow is shown in Figure 1. First, growers install pest traps, containing pest attractants and trapping lights, in the agricultural growing area; the lights turn on automatically at night to attract pests. Pests attracted by the light source fall through the impact screen to the bottom of the trap box, where infrared lamps kill them. A camera placed in the pest collection box photographs the trapped pests; the system then recognizes and counts the pests in the images using the intelligent algorithm designed in this study, and the results are statistically analyzed to obtain the current pest situation in the environment. The results are uploaded to a remote pest monitoring platform, where growers can view the analysis in real time, realizing automatic pest monitoring and forecasting. With real-time pest information, growers can select control measures according to the actual situation and eliminate the pests effectively.

2.2. Research on Detection Methods

Standard pest recognition models consume large amounts of resources and are difficult to deploy in IoT systems, which can prevent the system from functioning correctly; this study therefore needs to design an intelligent algorithm with high recognition accuracy and low resource consumption. With the rapid development of deep learning, more and more object detection models have achieved significant results across tasks. Deep-learning-based object detection models are generally divided into single-stage and two-stage approaches: single-stage models are represented by SSD [10], YOLO [11,12,13,14,15,16], and RetinaNet [17], and two-stage models by R-CNN [18], Fast R-CNN [19], Faster R-CNN [20], and Mask R-CNN [21]. Since the edge end of pest monitoring must meet both high-accuracy and high-speed demands, and the YOLO algorithm performs well in real-time applications, this paper selects the YOLO series as the research object. The core idea of the YOLO series is to cast object detection as a regression problem: a single network divides the image into grid cells and predicts both the category and the location of each target. Because the computational resources of pest monitoring devices at the edge of the IoT system are limited, the YOLOv5s algorithm, which offers relatively high accuracy with few parameters, is selected as the baseline model for improvement in this paper.

2.2.1. Baseline Model YOLOv5s

The network structure of YOLOv5s is shown in Figure 2 and can be divided into four parts: Input, Backbone, Neck, and Prediction. The Input layer receives the images to be detected and performs preprocessing such as data normalization, data augmentation, and image resizing, converting each image into a format the neural network can process before passing it to the subsequent layers. The Backbone is the core of the network and is mainly responsible for feature extraction; borrowing the design idea of CSPNet, the feature map of the base layer is split into two parts that are then merged through a cross-stage hierarchical structure, yielding the CSPDarknet53 structure. CSPDarknet53 is a deep residual network that gradually extracts features from an image through a series of convolution, pooling, and activation layers and can efficiently learn high-level feature representations. The Neck organizes the features extracted by the backbone and uses PANet to fuse multiscale feature information so that the network can detect targets of different sizes. The Prediction part uses convolutional and fully connected layers, combined with activation functions, to classify target categories and regress bounding boxes, producing the final predictions.

2.2.2. Introducing GhostNet to Design Lightweight Networks

Traditional convolution modules have many parameters and heavy computation, which poses a huge challenge for hardware devices, often exceeds memory limits, slows training and inference in deep networks, and makes them difficult to deploy on resource-constrained edge devices. GhostNet [22] is a lightweight network proposed in 2020 to address the redundancy caused by similar feature maps: it obtains additional feature information through cheap operations, achieving efficient feature extraction in a lightweight network. The core contribution of GhostNet is the Ghost module, whose main idea is to split the traditional convolution into two steps. First, a simple convolution is applied to the input feature map to obtain part of the output feature maps; a cheap operation is then applied to this part to obtain the remaining feature maps; the two parts are concatenated, and finally the dimensionality is adjusted. This design allows the Ghost module to capture the image features adequately while improving computational efficiency. The flow of ordinary convolution and Ghost convolution is shown in Figure 3, where Φ represents the cheap operation [23].
Ordinary convolution slides a convolution kernel over the input feature map to obtain the output feature map, as shown in Equation (1):

Y = X \ast f + b \quad (1)

where Y ∈ ℝ^(h′ × w′ × n) is the output feature map with height h′, width w′, and n channels; X ∈ ℝ^(c × h × w) is the input feature map with c channels, height h, and width w; f is the convolution kernel of size k × k; ∗ denotes the convolution operation; and b denotes the bias term. The computational cost of ordinary convolution is therefore c × k × k × n × h′ × w′, which becomes large when c and n are large. Unlike the traditional convolution module, the Ghost convolution module generates 'ghost' feature maps from a few intrinsic feature maps using inexpensive linear transformations to achieve efficient feature extraction. Omitting the bias term, the intrinsic feature maps Y′ are generated as shown in Equation (2), where Y′ ∈ ℝ^(h′ × w′ × m), f′ ∈ ℝ^(c × k × k × m), m denotes the number of intrinsic feature map channels, and f′ is the k × k convolution kernel [24]:

Y' = X \ast f' \quad (2)
Assuming that the linear transformation uses a convolution kernel of size d × d and that the number of transformations is s, the ghost feature maps y_ij are obtained by the linear transformations in Equation (3), where y′_i is the ith intrinsic feature map and Φ_{i,j} denotes the jth linear transformation applied to it:

y_{ij} = \Phi_{i,j}(y'_i), \quad i = 1, \ldots, m, \; j = 1, \ldots, s \quad (3)
This transformation yields n = m × s feature maps Y = [y_{11}, y_{12}, …, y_{ms}]. The parameter compression ratio r_c of traditional convolution relative to Ghost convolution is then:

r_c = \frac{n \cdot c \cdot k \cdot k}{\frac{n}{s} \cdot c \cdot k \cdot k + (s - 1) \cdot \frac{n}{s} \cdot d \cdot d} \approx \frac{s \cdot c}{s + c - 1} \approx s \quad (4)

Similarly, the computation speed-up ratio r_s is:

r_s = \frac{n \cdot h' \cdot w' \cdot c \cdot k \cdot k}{\frac{n}{s} \cdot h' \cdot w' \cdot c \cdot k \cdot k + (s - 1) \cdot \frac{n}{s} \cdot h' \cdot w' \cdot d \cdot d} \approx s \quad (5)
The parameter compression and computational acceleration of the Ghost module are therefore both governed by the number of transformations s. To balance speed and accuracy, half of the output feature maps are generated by cheap operations, which provides a useful acceleration while largely maintaining detection accuracy. On this basis, the Ghostconv module and the C3Ghost module are constructed, as illustrated in Figure 4, where DwConv denotes a depthwise separable convolution. The Ghostbottleneck module builds a deeper network structure with two stride settings, 1 and 2.
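As a concrete illustration of the Ghost convolution described above, the following PyTorch sketch builds a Ghostconv-style block in which a primary convolution produces half of the output channels and a cheap depthwise convolution generates the remaining 'ghost' maps. It is an illustrative sketch rather than the released implementation; the class name, kernel sizes, and activation choice are assumptions.

```python
import torch
import torch.nn as nn

class GhostConvSketch(nn.Module):
    """Ghost convolution sketch: primary conv -> cheap depthwise conv -> concat.
    With s = 2, half of the output channels come from the cheap operation."""
    def __init__(self, c_in, c_out, k=1, stride=1, d=5):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Sequential(                       # ordinary convolution
            nn.Conv2d(c_in, c_half, k, stride, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.cheap = nn.Sequential(                         # cheap linear transform: depthwise d x d conv
            nn.Conv2d(c_half, c_half, d, 1, d // 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
    def forward(self, x):
        y = self.primary(x)                                 # intrinsic feature maps
        return torch.cat([y, self.cheap(y)], dim=1)         # intrinsic + ghost feature maps

# Quick shape check: 64 -> 128 channels on a 160 x 160 feature map
x = torch.randn(1, 64, 160, 160)
print(GhostConvSketch(64, 128)(x).shape)   # torch.Size([1, 128, 160, 160])
```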

2.2.3. CARAFE: A Lightweight Upsampling Operator

The upsampling operation expands low-resolution feature maps to higher resolutions and combines feature maps of various resolutions, helping the network capture multiscale information at different levels, enlarging the receptive field, strengthening the representation capability, and ultimately improving the model's ability to locate targets. The commonly used upsampling methods are interpolation and deconvolution. Interpolation-based upsampling is divided into two types, nearest-neighbor and bilinear; however, nearest-neighbor interpolation causes block artifacts and cannot handle fine texture details, while bilinear interpolation smooths the image during upsampling, losing key details and blurring edges, so both perform poorly in image reconstruction and semantic segmentation tasks. Deconvolution-based upsampling requires many additional learnable parameters, leading to a more complex network model that is prone to overfitting and unsuitable for lightweight models [25]. To address these issues, this study adopts the CARAFE [26] lightweight upsampling operator to replace the original upsampling structure of the network, enlarging the model's receptive field and adapting to the input content to improve performance. Please refer to Figure 5 for the specific flowchart.
As shown in the figure above, the CARAFE upsampling operator comprises two primary components: kernel prediction and content-aware reassembly. The kernel prediction module consists of three parts: a channel compressor, a content encoder, and a kernel normalizer. A feature map of size H × W × C is first compressed by the channel compressor to H × W × C_m using a 1 × 1 convolution; it is then sent to the content encoder, which predicts the upsampling kernels with a convolution layer of size k_encoder × k_encoder, producing reassembly kernels with the shape σH × σW × k_up². Finally, the kernel normalizer applies a softmax function to each predicted reassembly kernel so that its weights are normalized. In the content-aware reassembly module, the k_up × k_up region centered on the source position corresponding to each output location is extracted and combined with the reassembly kernel W_l′ through a dot product, which preserves detail and improves semantic accuracy [27].
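To make the two-branch CARAFE pipeline concrete, the sketch below predicts a k_up × k_up reassembly kernel for every output pixel and then computes the content-aware weighted sum over the corresponding input neighborhood. This is a simplified illustration under our own assumptions about layer sizes (compressed channel width, encoder kernel), not the reference implementation from [26].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CarafeUpsample(nn.Module):
    """Simplified CARAFE upsampler: kernel prediction + content-aware reassembly."""
    def __init__(self, c, c_mid=64, scale=2, k_up=5, k_enc=3):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.comp = nn.Conv2d(c, c_mid, 1)                        # channel compressor (1 x 1 conv)
        self.enc = nn.Conv2d(c_mid, (scale * k_up) ** 2, k_enc,
                             padding=k_enc // 2)                   # content encoder
    def forward(self, x):
        n, c, h, w = x.shape
        # 1) kernel prediction: one k_up * k_up kernel per output pixel
        kernel = self.enc(self.comp(x))                            # (n, s^2*k^2, h, w)
        kernel = F.pixel_shuffle(kernel, self.scale)               # (n, k^2, sh, sw)
        kernel = F.softmax(kernel, dim=1)                          # kernel normalizer
        # 2) gather k_up * k_up neighborhoods of the input feature map
        x_unf = F.unfold(x, self.k_up, padding=self.k_up // 2)     # (n, c*k^2, h*w)
        x_unf = x_unf.view(n, c * self.k_up ** 2, h, w)
        x_unf = F.interpolate(x_unf, scale_factor=self.scale, mode='nearest')
        x_unf = x_unf.view(n, c, self.k_up ** 2, h * self.scale, w * self.scale)
        # 3) content-aware reassembly: weighted sum of each neighborhood
        return (x_unf * kernel.unsqueeze(1)).sum(dim=2)

# Example: upsample a 64-channel feature map from 40 x 40 to 80 x 80
print(CarafeUpsample(64)(torch.randn(1, 64, 40, 40)).shape)   # torch.Size([1, 64, 80, 80])
```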

2.2.4. Loss Function Optimization

The loss function measures the error between the model prediction and the true label; during training it is continuously minimized to reduce this difference and improve the model's performance. In YOLOv5s, the default bounding-box localization loss is CIoU [28], which considers the intersection area, the centroid distance, and the diagonal distance of the predicted and ground-truth boxes, making it a comprehensive and accurate loss for object detection. However, it does not consider the directional mismatch between the ground-truth and predicted boxes, which can slow convergence and reduce detection accuracy. To solve this problem, SIoU [29] introduces the vector angle as a penalty term in the loss function so that predicted boxes match the ground-truth boxes more closely, improving accuracy and speeding up training. Figure 6 illustrates the angle cost of SIoU.
The symbol b represents the center coordinates of the predicted box and b^gt the center coordinates of the ground-truth box. The symbol σ represents the distance between the two centers. The symbols c_w and c_h represent the horizontal and vertical distances between the ground-truth box and the predicted box within their smallest enclosing rectangle, and α represents the angle between the line connecting the two centers and the horizontal direction. Below are the formulas for the angle cost Λ and the distance cost Δ, as defined in SIoU:
\Lambda = 1 - 2\sin^2\left(\arcsin(x) - \frac{\pi}{4}\right) \quad (6)

x = \frac{c_h}{\sigma} = \sin\alpha \quad (7)

\sigma = \sqrt{\left(b_{cx}^{gt} - b_{cx}\right)^2 + \left(b_{cy}^{gt} - b_{cy}\right)^2} \quad (8)

c_h = \max\left(b_{cy}^{gt}, b_{cy}\right) - \min\left(b_{cy}^{gt}, b_{cy}\right) \quad (9)

\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_t}\right) \quad (10)

where:

\rho_x = \left(\frac{b_{cx}^{gt} - b_{cx}}{c_w}\right)^2, \quad \rho_y = \left(\frac{b_{cy}^{gt} - b_{cy}}{c_h}\right)^2, \quad \gamma = 2 - \Lambda \quad (11)
Here ρ_x and ρ_y denote the distance losses in the horizontal and vertical directions, respectively. When α approaches 0, the regression is performed directly along the horizontal direction. The shape cost of SIoU is then defined as:
\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta} \quad (12)

\omega_w = \frac{\left|w - w^{gt}\right|}{\max\left(w, w^{gt}\right)}, \quad \omega_h = \frac{\left|h - h^{gt}\right|}{\max\left(h, h^{gt}\right)} \quad (13)
where w and w^gt denote the widths of the predicted and ground-truth boxes and h and h^gt their heights, and θ, set to 4, controls the weight of the shape cost. The final SIoU loss function is:
L_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2} \quad (14)
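The following PyTorch sketch strings Equations (6)–(14) together into a single loss for boxes given in (x1, y1, x2, y2) format. It is our illustrative reading of the formulas above (variable names and the eps stabilizer are assumptions), not the exact YOLOv5 implementation.

```python
import math
import torch

def siou_loss(pred, target, theta=4, eps=1e-7):
    """Sketch of the SIoU loss (Equations (6)-(14)) for (x1, y1, x2, y2) boxes."""
    # box widths, heights, and center coordinates
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    cx1, cy1 = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cx2, cy2 = (target[..., 0] + target[..., 2]) / 2, (target[..., 1] + target[..., 3]) / 2
    # IoU term
    iw = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(min=0)
    ih = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(min=0)
    inter = iw * ih
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # width/height of the smallest enclosing box
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    # angle cost, Eqs. (6)-(8)
    sigma = torch.sqrt((cx2 - cx1) ** 2 + (cy2 - cy1) ** 2) + eps
    sin_alpha = (cy2 - cy1).abs() / sigma
    lam = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2
    # distance cost, Eqs. (10)-(11)
    gamma = 2 - lam
    rho_x = ((cx2 - cx1) / (cw + eps)) ** 2
    rho_y = ((cy2 - cy1) / (ch + eps)) ** 2
    delta = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))
    # shape cost, Eqs. (12)-(13)
    omega_w = (w1 - w2).abs() / torch.max(w1, w2).clamp(min=eps)
    omega_h = (h1 - h2).abs() / torch.max(h1, h2).clamp(min=eps)
    omega = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta
    # final loss, Eq. (14)
    return 1 - iou + (delta + omega) / 2

# Example: loss for one predicted box against one ground-truth box
print(siou_loss(torch.tensor([10., 10., 50., 60.]), torch.tensor([12., 14., 48., 58.])))
```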

2.2.5. Soft-NMS Implementation

Non-maximum suppression (NMS) is a frequently used post-processing technique in object detection that selects the most appropriate box from several overlapping candidate boxes, improving the quality and accuracy of the detection results. Conventional NMS keeps candidate boxes whose IoU with the highest-scoring box M is below a threshold N_t and discards those above it, as shown in Equation (15):
s_i = \begin{cases} s_i, & IoU(M, b_i) < N_t \\ 0, & IoU(M, b_i) \ge N_t \end{cases} \quad (15)
This approach incorrectly rejects detections in dense target areas or under partial occlusion, reducing detection accuracy. In contrast, Soft-NMS [30] does not simply discard candidate boxes whose IoU exceeds the threshold. Instead, it sorts all candidate boxes by score in descending order and reduces their confidence scores with an attenuation function, usually a Gaussian or linear function whose attenuation grows as the Intersection over Union (IoU) increases, thereby weakening the influence of redundant detections. In this paper, a Gaussian attenuation function is used, calculated as follows:
s_i = s_i \, e^{-\frac{IoU(M, b_i)^2}{\sigma}}, \quad \forall b_i \notin D \quad (16)
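A minimal Gaussian Soft-NMS sketch following Equation (16) is given below. It assumes torchvision is available for the IoU computation, and the default sigma and score threshold are illustrative values rather than the settings used in this study.

```python
import torch
from torchvision.ops import box_iou

def soft_nms_gaussian(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS following Eq. (16): overlapping boxes have their scores
    decayed by exp(-IoU^2 / sigma) instead of being discarded outright.
    boxes: (N, 4) tensor in (x1, y1, x2, y2); scores: (N,). Returns kept indices."""
    scores = scores.clone()
    idxs = torch.arange(boxes.size(0))
    keep = []
    while idxs.numel() > 0:
        # select the remaining box with the highest (possibly decayed) score
        top = int(torch.argmax(scores[idxs]))
        m = int(idxs[top])
        keep.append(m)
        idxs = torch.cat([idxs[:top], idxs[top + 1:]])
        if idxs.numel() == 0:
            break
        ious = box_iou(boxes[m].unsqueeze(0), boxes[idxs]).squeeze(0)
        scores[idxs] = scores[idxs] * torch.exp(-(ious ** 2) / sigma)  # Gaussian decay
        idxs = idxs[scores[idxs] > score_thresh]                       # drop near-zero boxes
    return torch.tensor(keep, dtype=torch.long)
```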

2.2.6. GCSS-YOLOv5s

Combining the above improvements, this paper proposes the GCSS-YOLOv5s model. First, the standard convolution (CBS) modules in the original network are replaced by Ghostconv and the C3 modules by C3Ghost modules, greatly reducing the number of parameters and the computational overhead. Second, CARAFE upsampling replaces the original nearest-neighbor interpolation upsampling, enlarging the receptive field and helping the model capture multiscale information about pest characteristics. Third, the CIoU loss function is replaced by the SIoU loss, improving the convergence speed and detection accuracy of the model. Finally, Soft-NMS is adopted as the post-processing technique, enhancing the model's ability to recognize overlapping pests. Figure 7 shows the structure of the improved model.

2.3. Evaluation Indicators

To objectively evaluate the effectiveness of the network improvements, this study adopts the following five evaluation metrics. In the formulas below, TP is the number of positive samples predicted as positive, FN the number of positive samples predicted as negative, FP the number of negative samples predicted as positive, AP(c) the average precision of category c, and N(classes) the number of categories in the multiclass task.
1. Precision is the proportion of samples predicted as positive that are true positives:

Precision = \frac{TP}{TP + FP} \quad (17)

2. Recall is the proportion of true positive samples that are correctly predicted as positive:

Recall = \frac{TP}{TP + FN} \quad (18)

3. The mean average precision (mAP) is the mean of the average precisions over all categories:

mAP = \frac{\sum_{c \in classes} AP(c)}{N(classes)} \quad (19)

4. Params reflects the memory occupied by the model parameters.
5. FLOPs reflects the computational cost of the model.
When measuring detection accuracy, Equations (17) and (18) show that precision and recall cannot both be maximized at the same time, so the mean average precision (mAP) is used as a more meaningful indicator of model accuracy. The mAP value measures how accurately pests are identified: the higher it is, the more accurate the statistical analysis of pest information will be. A low accuracy rate leads to errors in the IoT system's pest statistics, which can mislead and bias growers' decisions, hinder subsequent targeted pest management, and increase unnecessary costs.
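As a small worked illustration of Equations (17)–(19), the snippet below computes precision, recall, and mAP from hypothetical per-class counts and AP values; all numbers are made up for demonstration and are not results from this study.

```python
def precision_recall(tp, fp, fn):
    """Eqs. (17)-(18): precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def mean_average_precision(ap_per_class):
    """Eq. (19): mAP is the mean of the per-class average precision values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Hypothetical counts for one class: 90 true positives, 10 false positives, 15 false negatives
print(precision_recall(tp=90, fp=10, fn=15))      # (0.9, 0.857...)

# Hypothetical per-class AP values for three pest classes
print(mean_average_precision({"class_a": 0.95, "class_b": 0.90, "class_c": 0.85}))  # 0.9
```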

3. Experimental Results and Analysis

3.1. The Agricultural Pest Dataset

Insects of the stem borer moth family, the chafer family, and the mole cricket family are among the most common and most damaging agricultural pests. In China, these insects were projected to occur over areas of 310 million, 220 million, and 80 million mu, respectively, in 2022 [31], so monitoring and controlling them is consistent with the current state of agricultural development. In this study, a total of 6626 images of agricultural pests covering nine species from the abovementioned groups were acquired; the pests were captured with light traps when constructing the dataset. The number of instances of each pest species is shown in Table 1.
Rectangular bounding boxes were drawn around the pests in each image using the open-source image annotation tool LabelImg, and a category label was assigned to each box to support the learning process of the algorithmic model. The dataset is divided into three parts: training, validation, and test sets, with 4786, 530, and 1326 images, respectively. The training set is used to extract data features and build the model; the validation set is used to tune hyperparameters, the network structure, and other aspects of the model to improve its generalization ability and robustness; and the test set is the final dataset used to verify the model's accuracy before release and to evaluate its generalization ability. The training set is the central component of algorithmic model learning. Figure 8 shows a statistical plot of the height-to-width ratio of the pest bounding boxes relative to the entire image; darker colors indicate more pest targets at that location.

3.2. Experimental Environment

The hardware environment of the improved model is as follows: the CPU is an Intel Xeon E5-2680 v3 (12 cores, 24 threads, 2.5 GHz) and the GPU is an NVIDIA GeForce RTX 3090 with 24 GB of video memory. The software environment is PyTorch 1.12.1, CUDA 11.6, and Python 3.9. The experimental parameters are shown in Table 2.

3.3. Ablation Experiment

To verify each improved module's impact on the model, an ablation experiment was designed. The detailed results are presented in Table 3, where "√" indicates that the corresponding module is added and "-" indicates that it is not.
(1) The first group represents the experimental results without any improvement and serves as the baseline control group. Its mAP reaches 91.2%, indicating that the baseline model already performs well on agricultural pest detection. However, the model has 7,034,398 parameters, requires 15.8 G floating-point operations (FLOPs), and its weight file is 13.82 MB, a computational load that makes it challenging to deploy on edge devices.
(2) The second group introduces GhostNet, reducing the number of parameters by 47.4% compared with the original model; FLOPs drop by 7.7 G and the weight file shrinks by 6.29 MB, significantly reducing the model's size. Because the model is lighter, some loss of accuracy is inevitable, and the mAP of the second group decreases by 4.1%. The third, fourth, and fifth groups add CARAFE, SIoU, and Soft-NMS to the original model, respectively. The CARAFE upsampling operator adds a small number of parameters, while SIoU and Soft-NMS do not change the parameter count or computation of the model.
(3) Groups six, seven, and eight introduce the CARAFE, SIoU, and Soft-NMS modules after reducing the model size with GhostNet. Compared with the second group, the mAP improves by 0.5% with CARAFE and by 1.1% with SIoU.
(4) The ninth to thirteenth groups examine the interplay of these modules when applied together in one network; the experiments show that combining the modules does not cause conflicts but instead improves the detection accuracy of the model.
(5) The fourteenth group combines all of the modules. Its mAP reaches 90.6%, a 0.6% decrease relative to the original model; however, the significant resource savings offset this small loss: the improved model has 44% fewer parameters, 7.4 G fewer FLOPs, and a 6.01 MB smaller weight file, effectively reducing the computational load for training and inference.
Figure 9 shows the P–R curves of the enhanced model. The detection AP values of seven pests exceed 90%, and those of Cnaphalocrocis medinalis Guenee, Chilo suppressalis, Naranga aenescens Moore, and Cockchafer even exceed 95%, demonstrating the effectiveness of the improved model for pest detection.

3.4. Comparison Experiment

To demonstrate the superiority of the GCSS-YOLOv5s algorithm, a comparison experiment was designed to evaluate the performance of current lightweight YOLO-series models in detecting pests; the detailed results are presented in Table 4. YOLOv5m has the highest accuracy, with an mAP of 91.8%, significantly higher than those of YOLOv5n and YOLOv7-tiny, but it requires 20,885,262 parameters and 48.0 G FLOPs, trading heavy computational resources for accuracy and making it almost impractical to deploy on edge devices. Regarding resource consumption, YOLOv5n has the smallest number of parameters, FLOPs, and model weight size, but its mAP is low and it produces many false and missed detections, which could easily mislead growers and consequently affect the management of agricultural pests. The GCSS-YOLOv5s algorithm optimizes the model structure to combine the advantages of each model: its mAP is 6.5% and 1.7% higher than those of YOLOv5n and YOLOv7-tiny, respectively. Compared with YOLOv3-tiny, which achieves the same accuracy, the number of parameters is reduced by 54.7%, FLOPs by 4.5 G, and the model weight by 8.86 MB. With slightly lower precision than YOLOv5m, the model requires 81.1% fewer parameters and 39.6 G fewer FLOPs, and its weight file is 32.53 MB smaller. The enhanced model therefore improves computational resource utilization while maintaining reasonable precision.

3.5. Visualization Analysis and Discussion

To better assess the performance of the improved detection algorithm, this paper shows pest detection results under varying scenarios, as depicted in Figure 10. Note that some of the small black insects in the images are mosquitoes and flies, which are not classified as pests. The results demonstrate that the algorithm exhibits strong robustness: even with multitarget scale changes, the improved algorithm remains stable while accurately localizing and classifying pests, and even when a target is partially occluded, the algorithm can still identify it accurately owing to its strong feature extraction capability.
To explore the improvement of the algorithm further, this paper uses the confusion matrix to analyze the model's performance; it provides detailed information about the classification results and aids in understanding the model's behavior. Figure 11 shows that seven pest categories are classified with more than 90% accuracy, whereas Corn borer and Gryllotalpa spps. have lower accuracy: there is a 29% probability of misclassifying Corn borer as Chilo suppressalis and a 21% probability of Gryllotalpa spps. being misclassified as background, leading to missed detections. Our analysis suggests that Corn borer and Gryllotalpa spps. contributed only a limited amount of data to model training, so the model does not have enough examples to learn their features and tends to predict the categories with more samples; increasing the amount of data for these classes should improve the model. Overall, the improved model has far-reaching implications for resource efficiency and scalability in pest detection applications; although its accuracy is 0.6% lower than that of the original model, this small performance drop is reasonable and acceptable relative to the large reduction in parameters and computational resources.

4. Conclusions

The rapid development of IoT and computer vision technologies provides an effective means for pest control and prediction. This study significantly reduces the number of model parameters and the computational resources required by lightweighting the YOLOv5s model, exploring the possibility of efficient pest detection in resource-limited situations. Based on the original model, the GhostNet module is first introduced to greatly reduce the number of parameters and the computation of the model; second, the original upsampling operator is replaced by the CARAFE upsampling operator and the CIoU loss function by SIoU, enhancing the feature extraction and integration ability of the model and improving its convergence speed; finally, Soft-NMS is used for post-processing to enhance the detection of overlapping pests. Ideally, the mAP of the pest identification model should exceed 90% so that the acceptability and credibility of the pest monitoring system can be maintained. The final experiments yield an mAP of 90.5%, with 44% fewer parameters and 7.4 G less computation than the original model while maintaining high accuracy. In the context of limited computational resources and high real-time requirements, a slight reduction in accuracy is often acceptable, especially in the pursuit of efficient performance, large-scale deployment, and practical application; the resource savings brought by a lightweight model far outweigh the tiny loss of detection accuracy. The intelligent algorithm designed in this paper is well suited to pest monitoring tasks in IoT systems and helps growers perform long-term trend analysis and optimized decision making. By analyzing these data, growers can better understand pest ecology and activity patterns; rather than applying generic insecticide treatments, they can target specific pest types and populations for eradication, leading to more efficient resource allocation.
The future development of intelligent pest monitoring systems in agriculture is promising, and the algorithm designed in this paper can further balance resource conservation and model accuracy through channel pruning, model distillation, and feature extraction optimization to achieve more accurate and efficient pest monitoring and management. This will help reduce agricultural yield loss and improve the quality of agricultural products, thus contributing positively to sustainable agriculture, food safety and ecological protection.

Author Contributions

Methodology, Q.X.; validation, Q.X.; formal analysis, W.Z. and Z.C.; resources, Z.C.; data curation, Q.X. and W.Z.; writing—original draft, Q.X. and Y.H.; writing—review and editing, W.Z.; visualization, Y.H.; supervision, F.M.; project administration, F.M. and L.W.; funding acquisition, F.M. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangdong Science and Technology Innovation Strategy Fund Project (pdjh2023b0739), the National Student Innovation and Entrepreneurship Training Programme Project (202313684009), and the Guangdong Student Innovation and Entrepreneurship Training Programme Project (S202313684027X).

Institutional Review Board Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Dynamic information. Forecasting the occurrence trend of major crop pests and diseases in China in 2022. China Plant Prot. J. 2022, 42, 107–108.
2. Zheng, S.N.; Wei, W. Assessment of the Economic Losses Caused by Four Important Species of Fruit Fly Pests in Fujian Province. Chin. J. Biol. Control. 2019, 35, 209–216.
3. Zhang, L.; Dabipi, I.K.; Brown, W.L., Jr. Internet of Things applications for agriculture. In Internet of Things A to Z: Technologies and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2018; pp. 507–528.
4. Qiu, R.Z.; Zhao, J.; Chi, M.X.; Liang, Y.; Chen, S.X.; Weng, Q.Y. Design and Field Tests of Intelligent Pest Monitoring System based on Internet of Things. Fujian J. Agric. 2020, 35, 235–242.
5. Gao, L. Design of Orchard Pest Information Monitoring System Based on IoT Technology. Master's Thesis, Anhui University, Hefei, China, 2018.
6. Zhang, Z.D.; Wang, L.M.; Qi, C.; Sun, Z.N.; Yang, L.L.; Liu, Y. Monitoring effects of the automatic test and control system of plant diseases and insects (ATCSP) on four main Lepidoptera pests in Jilin province. Plant Prot. 2021, 47, 217–221.
7. Liang, Y.; Qiu, R.Z.; Li, Z.P.; Chen, S.X.; Zhang, Z.; Zhao, J. Identification Method of Major Rice Pests Based on YOLOv5 and Multi-source Datasets. J. Agric. Mach. 2022, 53, 250–258.
8. Chen, F.; Gu, J.T.; Li, Y.L.; Peng, X.X.; Han, T.J. A method for identifying corn pests in cold northeast China based on machine vision and convolutional neural network. Jiangsu Agric. Sci. 2020, 48, 237–244.
9. She, H.; Wu, L.; San, L.Q. Improved Rice Pest Recognition Based on SSD Network Model. J. Zhengzhou Univ. 2020, 52, 49–54.
10. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
11. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
12. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
13. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
14. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
15. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
16. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
17. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
18. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
19. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
20. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
21. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
22. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
23. Liu, X.; Cai, L.C.; Chen, B.J.; Chen, K.; Gao, X.; Duan, S.S. Lightweight end-to-end mobile phone detection method based on YOLOv5. Electron. Meas. Technol. 2023, 46, 188–196.
24. HuangFu, J.Y.; Meng, Q.; Meng, L.C.; Xie, Y.P. YOLOv5 Traffic Object Detection Based on GhostNet and Attention Mechanism. Comput. Syst. Appl. 2023, 32, 149–160.
25. Fan, T.H.; Gu, J.N.; Wang, W.B.; Zuo, Y.; Ji, C.; Hou, Z.H.; Lu, B.Y.; Dong, J.Y. A lightweight honeysuckle identification method based on improved YOLOv5s. J. Agric. Eng. 2023, 39, 192–200.
26. Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. CARAFE: Content-aware reassembly of features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3007–3016.
27. Bai, R.; Xu, Y.; Wang, B.; Zhang, W.W. Road pothole detection algorithm based on improved YOLOv5s. Comput. Mod. 2023, 6, 69–75.
28. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
29. Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740.
30. Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS: Improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5561–5569.
31. Jiang, X.L.; Chen, T.E.; Wang, C.; Li, S.Q.; Zhang, H.M.; Zhao, C.J. Survey of Deep Learning Algorithms for Agricultural Pest Detection. Comput. Eng. Appl. 2023, 59, 30–44. Available online: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2205-0604 (accessed on 24 June 2023).
Figure 1. IoT Pest Monitoring System Operation Process.
Figure 2. YOLOv5s network structure diagram.
Figure 3. Diagram of normal convolution and the Ghost convolution process.
Figure 4. Ghost module network structure.
Figure 5. Flowchart of CARAFE upsampling.
Figure 6. SIoU angle cost calculation.
Figure 7. The network architecture of GCSS-YOLOv5s.
Figure 8. Statistics of the relative height-to-width ratio of the pest bounding box.
Figure 9. P–R graph.
Figure 10. (a–f) Effectiveness of the improved algorithm for pest detection.
Figure 11. Confusion matrix.
Table 1. Pest population table.

| Index | Pest Type (Latin Name) | Pest Type (Common English Name) | Number of Instances |
| --- | --- | --- | --- |
| 1 | Cnaphalocrocis medinalis Guenee | Rice leaf folder | 1675 |
| 2 | Chilo suppressalis | Asiatic rice borer | 4305 |
| 3 | Sesamia inferens | Pink borer | 1323 |
| 4 | Chasminodes atrata | Black and white moths | 272 |
| 5 | Naranga aenescens Moore | Rice stem borers | 506 |
| 6 | Ostrinia nubilalis | Corn borer | 91 |
| 7 | Stiphotia candida Staudinger | Snow moth | 46 |
| 8 | Gryllotalpa spps. | Mole cricket | 67 |
| 9 | Cheirotonus macleayi | Cockchafer | 603 |
Table 2. Experimental parameter settings.

| Training Parameter | Parameter Value |
| --- | --- |
| initial learning rate | 0.01 |
| final learning rate | 0.01 |
| SGD momentum | 0.937 |
| optimizer weight decay | 0.0005 |
| batch size | 8 |
| epochs | 150 |
| warmup epochs | 3.0 |
| warmup initial momentum | 0.8 |
Table 3. Results of ablation experiments.

| Number | GhostNet | CARAFE | SIoU | Soft-NMS | P (%) | R (%) | mAP (%) | Params | FLOPs (G) | Weights (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N1 | - | - | - | - | 92.80 | 82.50 | 91.20 | 7,034,398 | 15.8 | 13.82 |
| N2 | √ | - | - | - | 82.60 | 82.30 | 87.10 | 3,697,302 | 8.1 | 7.53 |
| N3 | - | √ | - | - | 88.20 | 83.90 | 90.90 | 7,174,502 | 16.1 | 14.10 |
| N4 | - | - | √ | - | 90.90 | 85.20 | 91.70 | 7,034,398 | 15.8 | 13.82 |
| N5 | - | - | - | √ | 92.80 | 82.50 | 90.80 | 7,034,398 | 15.8 | 13.82 |
| N6 | √ | √ | - | - | 88.90 | 78.80 | 87.60 | 3,837,406 | 8.4 | 7.81 |
| N7 | √ | - | √ | - | 87.20 | 81.00 | 88.20 | 3,697,302 | 8.1 | 7.53 |
| N8 | √ | - | - | √ | 82.70 | 82.30 | 87.10 | 3,697,302 | 8.1 | 13.82 |
| N9 | - | √ | √ | - | 85.50 | 87.30 | 90.00 | 7,174,502 | 16.1 | 14.10 |
| N10 | - | - | √ | √ | 91.00 | 84.90 | 91.30 | 7,034,398 | 15.8 | 13.82 |
| N11 | √ | - | √ | √ | 87.20 | 81.00 | 88.00 | 3,697,302 | 8.1 | 7.53 |
| N12 | √ | √ | √ | - | 89.30 | 84.70 | 90.40 | 3,837,406 | 8.4 | 7.81 |
| N13 | √ | √ | - | √ | 88.90 | 78.80 | 87.60 | 3,837,406 | 8.4 | 7.81 |
| N14 | √ | √ | √ | √ | 89.30 | 84.70 | 90.60 | 3,937,406 | 8.4 | 7.81 |
Table 4. Comparative experimental results.

| Model | P (%) | R (%) | mAP (%) | Params | FLOPs (G) | Weights (MB) |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv3-tiny | 89.50 | 85.60 | 90.60 | 8,685,172 | 12.9 | 16.67 |
| YOLOv5n | 78.00 | 81.10 | 84.10 | 1,771,342 | 4.2 | 3.76 |
| YOLOv5s | 92.80 | 82.50 | 91.20 | 7,034,398 | 15.8 | 13.82 |
| YOLOv5m | 90.00 | 86.30 | 91.80 | 20,885,262 | 48.0 | 40.34 |
| YOLOv7-tiny | 84.90 | 82.20 | 88.90 | 6,029,244 | 13.1 | 11.76 |
| GCSS-YOLOv5s | 89.30 | 84.70 | 90.60 | 3,937,406 | 8.4 | 7.81 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xiao, Q.; Zheng, W.; He, Y.; Chen, Z.; Meng, F.; Wu, L. Research on the Agricultural Pest Identification Mechanism Based on an Intelligent Algorithm. Agriculture 2023, 13, 1878. https://doi.org/10.3390/agriculture13101878
