Article

A Lightweight Rice Pest Detection Algorithm Using Improved Attention Mechanism and YOLOv8

Jianjun Yin, Pengfei Huang, Deqin Xiao and Bin Zhang
1 College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
2 Key Laboratory of Smart Agricultural Technology in Tropical South China, Ministry of Agriculture and Rural Affairs, Guangzhou 510642, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(7), 1052; https://doi.org/10.3390/agriculture14071052
Submission received: 26 May 2024 / Revised: 26 June 2024 / Accepted: 26 June 2024 / Published: 29 June 2024
(This article belongs to the Section Crop Protection, Diseases, Pests and Weeds)

Abstract

Intelligent pest detection algorithms can effectively detect and recognize agricultural pests, providing important guidance for field pest control. However, existing recognition models suffer from shortcomings such as limited accuracy or large numbers of parameters. Therefore, this study proposes a lightweight and accurate rice pest detection algorithm based on an improved YOLOv8. Firstly, a Multi-branch Convolutional Block Attention Module (M-CBAM) is constructed in the YOLOv8 network to enhance the feature extraction capability for pest targets and yield better detection results. Secondly, the Minimum Points Distance Intersection over Union (MPDIoU) is introduced as the bounding box loss metric, enabling faster model convergence and improved detection results. Lastly, lightweight Ghost convolution modules are utilized to significantly reduce model parameters while maintaining detection performance. The experimental results demonstrate that the proposed method outperforms other detection models, with improvements observed in all evaluation metrics compared to the baseline model. On the test set, the method achieves a detection mean average precision of 95.8% and an F1-score of 94.6% with only 2.15 M parameters, meeting the requirements of both accuracy and a lightweight design. These findings validate the efficacy of the approach and provide concrete solutions and technical references for intelligent pest detection.

1. Introduction

Pests can greatly reduce the yield and quality of rice crops [1,2]. Rice pest control reduces crop losses and is directly related to the healthy development of the grain industry. Timely and accurate pest control is therefore a key measure for reducing losses and increasing yields, and accurate, effective identification of crop pest species is a critical first step in agricultural pest control. Knowing which pests are present in the field, and in what numbers, is critical for farmers to choose the most timely and effective control methods. Traditional pest identification relies on manual identification by technicians or experienced farmers, which is subjective and laborious, making large-scale field application difficult [3,4,5]. It is therefore necessary to develop efficient, low-cost intelligent pest detection methods that can identify crop pest categories quickly and accurately.
With the continuous improvement in computing capabilities and the ongoing development of machine learning technologies, an increasing amount of research applies deep learning techniques to the detection of agricultural pests [6]. For example, Guo et al. [7] improved the YOLOv4 algorithm with a more powerful feature extraction network for locating and classifying vegetable pests on yellow sticky traps. Wang et al. [8] proposed a handheld mobile pest detection system based on a convolutional neural network to automatically identify and count rice planthoppers in the wild. Dong et al. [9] designed a Multi-Level Spatial Pyramid Pooling structure, introduced an attention mechanism, and optimized the upsampling process; the proposed algorithm achieved a pest detection accuracy of 90.7%. Sun et al. [10] used YOLOv8l as the baseline model, adopted an asymptotic feature pyramid network, improved the C2f structure to reduce model parameters, and introduced attention to enhance feature extraction; in their experiments, the model parameters were reduced by 55.26% and the mAP increased by 1%. Chen et al. [11] fine-tuned the YOLOv4 algorithm and deployed it on a small camera-equipped vehicle that drove over grain piles while identifying the red flour beetle and the rice weevil on the surface, reaching an average detection accuracy of up to 97.55%. Chen et al. [12] built a lightweight network based on MobileNet-V2 and embedded an attention mechanism to identify crop pest categories in field scenarios.
Zhang et al. [13] proposed a pest classification method, PCNet, based on lightweight CNNs and embedded attention mechanisms; its accuracy and model size meet the requirements for pest identification on devices with limited resources. Sanghavi et al. [14] introduced a new convolutional layer to reduce parameter redundancy in CNN models and proposed a Hunger Games Search-based deep convolutional neural network (HGS-DCNN) model to classify crop pest images. Hu et al. [15] added a Global Context attention mechanism to the network and introduced the BiFPN network to improve feature fusion; the proposed algorithm improved the mAP by 5.4% compared to YOLOv5. Li et al. [16] added an attention mechanism focused on pest objects to YOLOv5 and improved the bounding box loss function to reduce redundant calculations; in their experiments, the average accuracy for identifying 12 pests reached 96.51% with a shorter detection time.
Many studies have demonstrated that advanced deep learning technologies provide a new solution for intelligent pest detection in modern agriculture [17,18,19] and greatly improve detection efficiency. Large and complex models have commonly been employed, which can fit training images effectively and achieve the desired detection accuracy. However, these models often have a large number of parameters and consume substantial computing resources. Agricultural applications typically occur outdoors, where available resources and computing power are limited, leading to longer detection times that can impair the usefulness of the results. Therefore, this research develops a lightweight detection algorithm based on YOLOv8 and an improved attention mechanism to address these issues, aiming to achieve precise detection of rice pests.

2. Materials and Methods

2.1. Data Collection and Processing

Intelligent pest collection and imaging devices have been utilized in various studies to automatically acquire pest image data in the field [17,20,21,22]. Due to their ability to conveniently and continuously acquire field pest image information, they have become an important tool for intelligent pest detection. In this study, an intelligent pest image collection device, as shown in Figure 1a, was used. This device was set up in the field and operated continuously, with the top trapping light attracting and clustering pests. Pests that were lured and killed would drop onto the white background board in the pest collection box. At the same time, the camera inside the pest collection box would automatically capture front-facing images of the pests under the white background board at regular intervals and transmit the collected images to a server via network communication for subsequent pest detection and statistical analysis.
The device was placed in rice fields to capture common rice pests. Approximately 40,000 images were collected from 2022 to 2023. Each image was taken by the camera facing the white background board on which the pests lay, with slight differences in lighting between day and night. After removing images that contained no pests and low-quality images that were partially blurred and unusable, we selected the categories with a large number of pest targets as training data for the proposed detection algorithm. Through manual screening, a total of 2500 images containing 8 classes of target pests were selected for the experiment, including Cnaphalocrocis medinalis, Chilo suppressalis, Sesamia inferens, Hymenia recurvalis Fabricius, Dytiscidae, Naranga aenescens Moore, Gryllotalpa spp., and Physopelta gutta. Figure 1b shows some examples of pest targets. Among the 2500 selected pest images, each pest category contained 800 objects, which were labeled by experienced agricultural personnel using the data labeling software LabelImg (v1.8.6).
During the image data preprocessing stage, we doubled the size of the image dataset to enlarge the data volume and improve the model’s generalization performance. For each image, one augmentation operation (flipping or rotating) was selected with a certain probability, yielding a final augmented pest dataset of 5000 images. Following a 6:2:2 ratio, 3000 images were allocated to the training set, 1000 to the validation set, and 1000 to the test set. Because each image may contain multiple pest objects, we tried to ensure that the distribution of object numbers per category was relatively uniform across the different sets during the random split; a simple sketch of this procedure is given below. The distribution of the number of pest objects for each category in the different sets is shown in Figure 2.
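As an illustration, the following Python sketch mirrors the augmentation and 6:2:2 split described above. The file paths, the 0.5 selection probability, and the 90° rotation angle are assumptions chosen for illustration rather than the exact settings used in this study, and the corresponding bounding box labels would need to be transformed in the same way.

```python
import random
from pathlib import Path

from PIL import Image

random.seed(0)

src_dir = Path("pest_images")            # hypothetical folder with the original 2500 images
out_dir = Path("pest_images_augmented")  # hypothetical folder for the augmented copies
out_dir.mkdir(exist_ok=True)

originals = sorted(src_dir.glob("*.jpg"))
for img_path in originals:
    img = Image.open(img_path)
    # Pick one operation per image: horizontal flip or 90-degree rotation.
    if random.random() < 0.5:
        aug = img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
    else:
        aug = img.rotate(90, expand=True)
    aug.save(out_dir / f"{img_path.stem}_aug.jpg")

# Random 6:2:2 split over the originals plus their augmented copies.
all_imgs = originals + sorted(out_dir.glob("*.jpg"))
random.shuffle(all_imgs)
n = len(all_imgs)
train = all_imgs[: int(0.6 * n)]
val = all_imgs[int(0.6 * n): int(0.8 * n)]
test = all_imgs[int(0.8 * n):]
print(len(train), len(val), len(test))
```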

2.2. YOLOv8 Algorithm

The YOLO series algorithm is a single-stage detection model capable of obtaining network parameters through training on datasets, thereby achieving automatic extraction of object features. In recent years, it has garnered considerable attention due to its fast and accurate object detection capabilities [23]. After continuous research and development, it has evolved to the latest YOLOv8 version, which can provide an effective solution to complex computer vision tasks [24].
YOLOv8 [25] was improved on the basis of YOLOv5 and comprises key components such as the backbone network, neck network, and detection head. The backbone network is responsible for gradually extracting feature information from the input image. YOLOv8 utilizes a CSPDarknet53 feature extractor similar to YOLOv5 for its backbone, further optimizing its structure, and introduces a new C2f module with richer gradient flow. The C2f module, a cross-stage partial bottleneck with two convolutions, combines high-level features with contextual information to enhance detection accuracy. The most significant structural changes occur in the head, which decouples the classification branch from the bounding box regression branch and switches from anchor-based to anchor-free detection. Regarding loss calculation, YOLOv8 abandons the previous IoU matching or unilateral proportion allocation method and adopts the Task-Aligned Assigner matching method proposed in TOOD. Furthermore, it derives the bounding box loss from the CIoU and Distribution Focal Loss (DFL) loss functions to improve target detection performance [24]. The overall detection architecture of YOLOv8 is depicted in Figure 3. Compared with previous versions of the YOLO series and other target detection algorithms, YOLOv8 offers higher detection accuracy and faster detection speed.

2.3. Proposed Algorithm

YOLOv8 comprises versions such as n, s, m, l, and x, which vary based on the model size. The primary distinction among them lies in the number of feature extraction modules and convolution kernels at specific locations in the network. A larger version of the model possesses stronger feature extraction capabilities and typically yields better detection results. However, its significantly increased parameters and computational costs also pose challenges in specific application scenarios, particularly for pest detection in agriculture.
After experimental evaluation, to balance computational costs and detection efficiency, this paper selected the YOLOv8n model, which was the most lightweight but slightly less accurate, as the baseline model. The aim was to improve it by reducing the number of model parameters while enhancing detection performance. Specifically, the structure and principles of CBAM were analyzed, leading to the design of a Multi-branch Convolutional Block Attention Module (M-CBAM). This module introduced a new multi-branch channel attention mechanism to further extract effective pest-related feature information, which was integrated into the three information flows between the backbone and neck. Additionally, the original bounding box loss metric CIoU of YOLOv8 was reevaluated, and the Minimum Points Distance Intersection over Union (MPDIoU) was adopted to replace the original metric, ensuring model stability and better convergence. Finally, the lightweight Ghost convolution module was integrated into the model to reduce the number of model parameters and computations, resulting in a lighter and faster detection model. The detailed architecture of the proposed algorithm is depicted in Figure 4.

2.3.1. Multi-Branch Convolutional Block Attention Module (M-CBAM)

Using the attention mechanism enables the model to focus on important features of the input image and suppress unnecessary ones, thereby enhancing the network’s recognition ability. The lightweight and effective Convolutional Block Attention Module (CBAM) attention mechanism module has recently received a lot of attention in the field of agricultural pest detection [16,26,27]. CBAM attention emphasizes the fine-grained features of objects, enabling the network to acquire better feature maps and enhance the accuracy of model detection. Figure 5 illustrates the structural diagram of CBAM, which comprises a channel attention module and a spatial attention module. The input feature map passes through those modules sequentially, adjusting and producing refined object features.
Channel attention captures channel dependencies in the feature map, focusing on channels with higher weights to enhance the representation of important features. In the channel attention module, the input feature map F undergoes both max pooling and average pooling, resulting in two one-dimensional vectors. These vectors are then passed through a Multilayer Perceptron (MLP), and the combined outputs go through a Sigmoid function to generate the channel attention weights M_c ∈ R^(C×1×1), where C is the number of channels. Finally, M_c is multiplied by F to obtain the channel-refined feature map F′.
The spatial attention module generates spatial attention maps by considering the relationships between spatial locations in the input feature map, focusing on the spatial positions of meaningful information. First, max pooling and average pooling are applied to F′, and the pooled features are concatenated into a two-channel map. This map is passed through a convolutional layer, followed by the Sigmoid function, to obtain the spatial attention map M_s ∈ R^(1×H×W). Finally, M_s is multiplied by F′ to obtain the spatially refined feature map F″.
In summary, the process by which CBAM generates attention can be expressed as follows:
F′ = M_c(F) ⊗ F    (1)
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))    (2)
F″ = M_s(F′) ⊗ F′    (3)
M_s(F) = σ(f^(7×7)([AvgPool(F); MaxPool(F)])) = σ(f^(7×7)([F_avg^s; F_max^s]))    (4)
where the superscripts c and s refer to the channel and spatial attention modules, respectively; MaxPool and AvgPool are the max pooling and average pooling operations, respectively; F_max and F_avg denote the corresponding pooled outputs; and W_0 and W_1 are the weights of the MLP. ⊗ indicates element-wise multiplication, σ is the Sigmoid activation function, and f^(7×7) denotes a convolution with a 7 × 7 kernel.
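For readers who prefer code, the following PyTorch sketch implements the standard CBAM described by Equations (1)–(4). The channel reduction ratio of 16 and the use of 1 × 1 convolutions as the shared MLP are common conventions and should be read as assumptions, not as settings reported in this paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Channel attention (Equation (2)): max-pooled and average-pooled descriptors
    # pass through a shared MLP and are merged by a Sigmoid.
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),  # W0
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),  # W1
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    # Spatial attention (Equation (4)): channel-wise average and max maps are
    # concatenated and passed through a 7x7 convolution.
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = self.ca(x) * x     # F' = M_c(F) ⊗ F   (Equation (1))
        return self.sa(x) * x  # F'' = M_s(F') ⊗ F'  (Equation (3))
```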
The channel attention module combines the two features obtained by the max pooling and average pooling operations, which retain part of the feature information of the image. To further extract useful features, we improved the channel attention module and designed a Multi-branch Convolutional Block Attention Module (M-CBAM) based on this concept, which could enhance useful information by fusing max pooling features and average pooling features. The schematic diagram of the improved channel attention module is shown in Figure 6. Specifically, a third branch was added to the channel attention module and improved into a multi-branch channel attention module. The third branch combined the results of the max pooling and average pooling operations and then sent them to the MLP calculation to generate the corresponding third channel attention vector, thereby further enhancing the effect of the attention mechanism.
In the multi-branch channel attention module, the first branch obtained features from max pooling, the second branch obtained features from average pooling, and the proposed third branch combined features from both max pooling and average pooling. Subsequently, the feature information from the three branches was fed into the shared MLP to process and generate the three corresponding feature vectors. After summing them, the Sigmoid function was applied to obtain channel weights. The above process can be represented as follows:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)) + MLP(AvgPool(F) + MaxPool(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)) + W_1(W_0(F_avg^c + F_max^c)))    (5)
where F_avg^c + F_max^c denotes the element-wise addition of the average pooling and max pooling outputs, σ represents the Sigmoid activation function, and W_0 and W_1 are the weights of the MLP.
The improved channel attention module therefore contains three branches. The third branch combines the information obtained by the max pooling and average pooling operations so that the pooled information is fully utilized, and the shared MLP processes this additional branch so that, after training, the relationship between the two kinds of pooled information can be fitted more closely. Because the new branch reuses the pooling results of the original branches, it enriches the feature information without a significant increase in computational cost. By fusing the channel information of the three branches, the improved channel attention module pays more attention to the salient features that distinguish pest categories and enhances the ability to identify pest characteristics.
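A minimal PyTorch sketch of the proposed multi-branch channel attention is given below, assuming the third branch reuses the same shared MLP as the other two branches; the reduction ratio and the exact arrangement of the additional fully connected layer are illustrative assumptions, and the spatial attention part of M-CBAM remains as in the original CBAM.

```python
import torch
import torch.nn as nn

class MultiBranchChannelAttention(nn.Module):
    # Three-branch channel attention (Equation (5)): the shared MLP processes the
    # average-pooled descriptor, the max-pooled descriptor, and their element-wise
    # sum, and the three outputs are added before the Sigmoid.
    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio is an assumption
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),  # W0
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),  # W1
        )

    def forward(self, x):
        avg = torch.mean(x, dim=(2, 3), keepdim=True)  # F_avg^c
        mx = torch.amax(x, dim=(2, 3), keepdim=True)   # F_max^c
        weights = torch.sigmoid(self.mlp(avg) + self.mlp(mx) + self.mlp(avg + mx))
        return weights * x

# Example: refine a backbone feature map before it is passed to the neck.
feat = torch.randn(1, 256, 40, 40)
print(MultiBranchChannelAttention(256)(feat).shape)  # torch.Size([1, 256, 40, 40])
```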
In object detection tasks, the integration of attention modules at different locations may have different effects on model performance. As shown in Figure 4, we added the improved attention module between the backbone output and each neck input. Specifically, for the feature maps at different levels extracted by the backbone network, the proposed M-CBAM was applied to each feature map stream before it was input into the neck network and then sent to the neck network for the next step of feature interaction. Applying attention to the feature maps that had been gradually extracted could make the model further focus on important information and then send it to the neck network for effective feature fusion to enhance the representation ability of the model.

2.3.2. Minimum Points Distance Intersection over Union (MPDIoU)

The IoU metric is commonly used in object detection algorithms to evaluate the degree of overlap between anchor boxes. The higher the IoU value, the greater the overlap between the predicted box and the ground truth box, indicating a more accurate model prediction. The YOLOv8 algorithm used CIoU as the bounding box loss function. The CIoU loss function took into account factors such as the overlap area between the predicted box and the ground truth box, the distance from the center point, and the aspect ratio. As shown in the schematic diagram of CIoU in Figure 7a, the green box represents the predicted box, and the blue rectangle represents the ground truth box. The calculation of CIoU can be expressed by the following formula:
CIoU = IoU − ρ²(C_gt, C_prd)/d² − αV    (6)
V = (4/π²)(arctan(w_gt/h_gt) − arctan(w_prd/h_prd))²    (7)
α = V/((1 − IoU) + V)    (8)
where C_gt and C_prd are the center points of the ground truth box and the predicted box, respectively, and ρ²(C_gt, C_prd) is the squared Euclidean distance between them. d is the diagonal length of the smallest box covering both the ground truth box and the predicted box. α is the weight function used to balance the scale, and V measures the similarity between the aspect ratios of the predicted box and the ground truth box. w_gt and h_gt are the width and height of the ground truth box, and w_prd and h_prd are the width and height of the predicted box.
Although the CIoU bounding box loss metric considered multiple factors to improve the effectiveness of bounding box regression, it lost its effectiveness when the predicted bounding boxes and ground truth bounding boxes had different width and height values but the same aspect ratio. This could affect the training convergence speed and model detection accuracy. Additionally, geometric factors such as distance and aspect ratio would intensify the penalty for low-quality samples, thereby reducing the generalization performance of the model [28,29].
After evaluating commonly used IoU loss functions, including CIoU, Siliang and Yong [30] proposed a new bounding box similarity metric, MPDIoU, based on the minimum points distance. MPDIoU trains the model by minimizing the distances between the top-left and bottom-right corner points of the predicted bounding box and those of the ground truth bounding box. The schematic diagram of MPDIoU is shown in Figure 7b. It fully exploits the geometric relationship between bounding boxes and covers all relevant factors considered in existing bounding box loss measures, such as overlapping or non-overlapping area, center point distance, and width and height deviations, while simplifying the calculation and improving model performance. MPDIoU is calculated as follows:
d_1² = (x_1^prd − x_1^gt)² + (y_1^prd − y_1^gt)²    (9)
d_2² = (x_2^prd − x_2^gt)² + (y_2^prd − y_2^gt)²    (10)
MPDIoU = IoU − d_1²/(w² + h²) − d_2²/(w² + h²)    (11)
where w and h represent the width and height of the input image. (x_1^gt, y_1^gt) and (x_2^gt, y_2^gt) are the coordinates of the top-left and bottom-right corners of the ground truth bounding box, while (x_1^prd, y_1^prd) and (x_2^prd, y_2^prd) are the corresponding corners of the predicted bounding box. In Equation (9), d_1 represents the distance between the top-left corners of the two bounding boxes, as indicated by the red dashed line in Figure 7b. Similarly, d_2 represents the distance between the bottom-right corners of the two bounding boxes. Combining d_1 and d_2 with w and h allows a more comprehensive quantification of the difference between the two bounding boxes: a predicted bounding box that is closer to the ground truth box signifies a more accurate prediction. Hence, during training, MPDIoU compels each predicted bounding box to approach its ground truth bounding box, thereby enhancing the detection performance of the model. In general, MPDIoU simplifies the similarity comparison between two bounding boxes and adapts better to bounding box regression with simpler operations.
To verify the effectiveness of MPDIoU on the dataset used in this paper, we trained the YOLOv8 model with both the original CIoU and the MPDIoU loss and compared the changes in mAP during training. Figure 8 illustrates the mAP curves. The mAP of the original CIoU exhibits large fluctuations in the initial stage, whereas the mAP of MPDIoU converges more steadily, eventually reaching 97.67% and surpassing CIoU. This indicates that the MPDIoU loss metric offers better stability, facilitating faster model convergence and improved detection performance.
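To make the metric concrete, the following function sketches the MPDIoU computation of Equations (9)–(11) for boxes in (x1, y1, x2, y2) format; the tensor-based signature and the small epsilon added for numerical stability are implementation assumptions, and the training loss would be 1 − MPDIoU.

```python
import torch

def mpdiou(pred: torch.Tensor, gt: torch.Tensor, img_w: float, img_h: float) -> torch.Tensor:
    """MPDIoU for boxes given as (x1, y1, x2, y2); pred and gt have shape (N, 4)."""
    eps = 1e-7
    # Intersection over Union.
    ix1 = torch.max(pred[..., 0], gt[..., 0])
    iy1 = torch.max(pred[..., 1], gt[..., 1])
    ix2 = torch.min(pred[..., 2], gt[..., 2])
    iy2 = torch.min(pred[..., 3], gt[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_g = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    iou = inter / (area_p + area_g - inter + eps)
    # Squared distances between matching corners (Equations (9) and (10)).
    d1 = (pred[..., 0] - gt[..., 0]) ** 2 + (pred[..., 1] - gt[..., 1]) ** 2
    d2 = (pred[..., 2] - gt[..., 2]) ** 2 + (pred[..., 3] - gt[..., 3]) ** 2
    hw2 = img_w ** 2 + img_h ** 2
    return iou - d1 / hw2 - d2 / hw2  # Equation (11)
```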

2.3.3. Ghost Module

Deep convolutional neural networks usually require a large number of convolution operations when extracting target features and consequently generate many similar redundant feature maps [31,32]. To achieve higher recognition accuracy, the commonly used larger models often require a substantial number of model parameters, which leads to a dramatic increase in computational costs and a slowdown in inference speed. However, Ghost convolution [33] offers a solution by employing more cost-effective operations to mitigate some of the redundant and complex calculation processes. Specifically, it utilizes lower-cost linear convolution operations to replace a portion of the original convolution operations, thereby achieving a lightweight construction of feature maps. Ghost convolution employs these low-cost linear convolution operations while maintaining properties akin to traditional convolution, such as learnability and input dependence, and can also be optimized through backpropagation. Figure 9 depicts the schematic diagram of both the traditional convolution module and the Ghost convolution module.
Figure 9a depicts a schematic diagram of the traditional convolution operation. Its computational cost is determined by the size of the input data and the dimensions of the filters, which is usually large, and the convolved output often contains similar, redundant feature maps. These similar outputs can be grouped: one feature map in each group is selected as the intrinsic map, and the remaining feature maps are regarded as redundant parts, termed ghosts of the feature map. Since the intrinsic and ghost maps are largely similar, Ghost convolution uses cheaper operations to generate these redundant parts while keeping the spatial size of the output feature map unchanged, thereby reducing the floating-point operations and parameters needed to produce them. This process can be expressed by Equation (12):
y_ij = Φ_(i,j)(y_i),  i = 1, …, m,  j = 1, …, s    (12)
Intrinsic maps are transformed into ghost maps through a series of cost-effective Simple Linear (SL) operations, where y_i is the i-th intrinsic feature map generated by the ordinary convolution and Φ_(i,j) denotes the j-th linear operation (except the last one) that generates the j-th ghost feature map y_ij. The last operation Φ_(i,s) is the identity mapping, which preserves the intrinsic feature maps, as shown in Figure 9b. Finally, the intrinsic feature maps and ghost feature maps are concatenated so that the number of output feature maps remains unchanged. Compared with ordinary convolution, the SL operations effectively reduce computational consumption. The calculation process of Ghost convolution is shown in Figure 9b.
This article employs the lightweight GhostConv module to replace part of the original Conv modules, extracting pest features with fewer parameters while maintaining comparable accuracy. We replaced the original Conv modules inside the C2f module with GhostConv, yielding the C2f_Ghost module, and used it in place of some of the original C2f modules in the model. The structure of the improved C2f_Ghost module is depicted in Figure 10.
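The following PyTorch sketch illustrates the GhostConv idea described above: an ordinary convolution produces the intrinsic maps, and a cheap depthwise convolution produces the ghost maps, which are then concatenated. The 1:1 split ratio and the 5 × 5 cheap-operation kernel follow the original GhostNet paper and are assumptions about the configuration used here rather than a verbatim copy of this model's implementation.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    # Ghost convolution: half of the output channels are intrinsic maps from a
    # standard convolution, and the other half are ghost maps generated by a
    # cheap depthwise convolution applied to the intrinsic maps.
    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1):
        super().__init__()
        c_hidden = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_hidden, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_hidden),
            nn.SiLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(c_hidden, c_hidden, 5, 1, 2, groups=c_hidden, bias=False),  # depthwise "SL" operation
            nn.BatchNorm2d(c_hidden),
            nn.SiLU(inplace=True),
        )

    def forward(self, x):
        y = self.primary(x)                      # intrinsic feature maps
        return torch.cat([y, self.cheap(y)], 1)  # intrinsic + ghost feature maps

# Example: 64 input channels -> 128 output channels with the same spatial size.
x = torch.randn(1, 64, 80, 80)
print(GhostConv(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```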

3. Results

3.1. Experimental Environment

The experiments were conducted on the Windows 10 operating system with 32 GB of RAM, an Intel(R) Core(TM) i7-10700 CPU, and a single NVIDIA RTX 3060 GPU for training and testing, using PyTorch 2.0.1 as the deep learning framework. To speed up training, a transfer learning approach was adopted, initializing the network with pre-trained YOLOv8 weights. During training, the learning rate was set to 0.01, the momentum to 0.937, the weight decay to 0.0005, the input image size to 640 × 640, the batch size to 32, and the number of training epochs to 300.
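For reference, a hedged sketch of this training setup using the Ultralytics API is shown below; "rice_pests.yaml" is a hypothetical dataset configuration file, and the M-CBAM, MPDIoU, and Ghost modifications would require a customized model definition that is not shown here.

```python
from ultralytics import YOLO

# Transfer learning from the pre-trained YOLOv8n weights with the
# hyperparameters listed above; the dataset YAML path is hypothetical.
model = YOLO("yolov8n.pt")
model.train(
    data="rice_pests.yaml",
    epochs=300,
    imgsz=640,
    batch=32,
    lr0=0.01,
    momentum=0.937,
    weight_decay=0.0005,
)
```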

3.2. Evaluation Metrics

To evaluate the performance and effectiveness of the proposed method, we used precision (P), recall (R), mean average precision (mAP), F1-score, model parameters, and frames per second (FPS) as evaluation indicators. mAP was calculated as the average of the average precision (AP), which was obtained by calculating the area under the precision–recall (P-R) curve. The F1-score was calculated as the harmonic mean of precision and recall. Both the mAP and F1-score were used as comprehensive evaluation indicators for model detection accuracy. FPS was used to measure the algorithm’s detection efficiency and to evaluate whether the model met real-time requirements. By using these evaluation indicators, we comprehensively evaluated the performance of the proposed algorithm in pest detection tasks, reflecting its performance in different aspects. The calculation formulas for the relevant evaluation indicators are as follows:
P = TP/(TP + FP) × 100%    (13)
R = TP/(TP + FN) × 100%    (14)
AP = ∫_0^1 P(R) dR    (15)
mAP = (1/n) Σ_(i=1)^n AP_i    (16)
F1-score = 2 × P × R/(P + R)    (17)
where P and R are the precision and recall, respectively. TP is the number of positive samples correctly predicted as positive, FP is the number of negative samples incorrectly predicted as positive, and FN is the number of positive samples incorrectly predicted as negative. AP is the area under the precision–recall (P-R) curve, mAP is the average AP over all categories, and the F1-score is the harmonic mean of P and R.
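As a simple illustration of these definitions, the following snippet computes precision, recall, and F1-score from counts and approximates AP as the area under a precision–recall curve; it is a toy sketch, not the evaluation code used in the experiments.

```python
import numpy as np

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    # Equations (13), (14), and (17).
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    # Equation (15): area under the P-R curve, approximated here with the
    # trapezoidal rule over recall values sorted in ascending order.
    order = np.argsort(recall)
    r, p = recall[order], precision[order]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

# mAP (Equation (16)) is then the mean of the per-class AP values.
print(precision_recall_f1(tp=90, fp=5, fn=10))
```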

3.3. Ablation Experiments

This paper used YOLOv8 as the baseline model, and improvements were made in terms of enhancing accuracy and reducing model parameters. In this section, our ablation experiments are presented. They were conducted to explain the interaction between various model components and to evaluate the effectiveness and feasibility of the proposed pest target detection algorithm. The ablation experiments were performed on the same test set of pest images, where each proposed improvement method or a combination of two improvement methods was added to the baseline model and compared with the final model for testing and evaluation. Table 1 shows the comparative results of the ablation experiments of the proposed algorithm and the combination of improved methods, showcasing the performance of different improved methods and their combinations on the test set. The results demonstrate that different improvement methods improved the detection performance of the baseline model.
The ablation experiment results in Table 1 show the details of the performance improvement of the proposed method over the baseline algorithm; a check mark (✓) indicates that the corresponding improvement was used. Among the individual improvements, adding the M-CBAM module produced the largest gain over the baseline model, improving the mAP and F1-score by 1.8% and 1.2%, respectively, while increasing the model parameters by only 0.01 M, which is negligible. Adding the Ghost module reduced the model parameters from 3.01 M to 2.14 M, a reduction of approximately 29%, with only a slight decrease in detection performance, showing that the Ghost module can effectively reduce model parameters and computational cost without materially affecting detection accuracy. Combinations of two improvements showed corresponding gains: the model combining the M-CBAM module and the MPDIoU loss performed best on the test set, with the mAP reaching 96%, 2.8% higher than the baseline, and the F1-score reaching 94.9%, 2% higher than the baseline. Finally, after the parameter reduction introduced by the Ghost module, the full algorithm shows a slightly lower mAP of 95.8%, which is still 2.6% higher than the baseline; its F1-score is 1.7% higher than the baseline, and its parameters are 29% lower. The ablation experiments therefore verify the effectiveness of the three proposed improvements for both accuracy and lightweight design.

3.4. Comparison with Other Detection Models

To objectively verify the effect of the proposed algorithm, we conducted comparative experiments against advanced object detection algorithms on the same dataset and in the same environment, including Faster R-CNN [34], SSD [35], YOLOv7 [36], YOLOv5 [37], Swin-Transformer [38], RT-DETR [39], and NanoDet [40]. Table 2 shows the comparison in terms of mAP, F1-score, model parameters, and FPS. The mAP of the proposed algorithm was 11.9%, 12.8%, 8.6%, 3.6%, 4.9%, 1.1%, and 32% higher than Faster R-CNN, SSD, YOLOv7, YOLOv5, Swin-Transformer, RT-DETR, and NanoDet, respectively. The F1-score and FPS also improved compared with the other algorithms, and our algorithm achieved the best overall performance. It is worth noting that the most lightweight detection algorithm, NanoDet, has a compact size of only 1.17 M and reaches 60 FPS, but its detection performance on our dataset is not ideal. The proposed algorithm has only 2.15 M parameters, a significant reduction compared with most of the other algorithms, demonstrating its lightweight design. In general, compared with other mainstream target detection algorithms, the proposed algorithm performs better in terms of both accuracy and model size.

3.5. Analysis of the Detection Results

In this section, 1000 pest images from the test set were input into the trained network model to verify the detection performance and generalization ability of the model. Figure 11 compares the average precision of the baseline algorithm YOLOv8n and the proposed method for each pest class. The figure shows that, compared with YOLOv8n, the proposed algorithm improved the detection accuracy of all pest classes to varying degrees, and the improvement was even more pronounced for classes on which YOLOv8n performed relatively poorly. For example, the average detection precision of CM increased from 90.8% to 95.5%, CS increased from 89% to 93.8%, and NAM increased from 90.2% to 93.8%. These results verify the effectiveness of the proposed method in improving model performance.
To better illustrate the improvement in pest detection performance and its practical application, we selected part of the test results for display. In Figure 12, the first row shows the detection results of the baseline YOLOv8 algorithm, and the second row shows the detection results of the proposed algorithm. The baseline model misses detections in the first row of Figure 12c,d, possibly because the targets overlap and the baseline model cannot separate them, or because two adjoining pest objects are recognized as a single object. In contrast, in the second row, the proposed algorithm accurately extracts and identifies the features, correctly locating each object and identifying its category. In Figure 12e, the baseline model again misses a detection; the proposed algorithm in the second row detects the pest object at that location, but assigns an incorrect category. This shortcoming is a direction for future work.

4. Discussion

4.1. Model Performance

This study proposed an improved algorithm based on YOLOv8 for lightweight and precise recognition of rice pests. The algorithm aimed to balance the lightweight design requirement with detection accuracy for the pest detection task. The results of the ablation experiments indicate that the proposed Multi-branch CBAM effectively enhanced the focus on pest feature information, with a noticeable improvement in detection accuracy. The introduction of the MPDIoU loss function also optimized the model’s learning process, leading to an enhancement in detection performance to some extent. Additionally, the Ghost convolution module reduced redundant convolutions and decreased certain model parameters. The algorithm proposed in this study was compared and analyzed against other typical detection algorithms. The results showed that our algorithm achieved a detection mean average precision of 95.8% and an F1-score of 94.6%, with model parameters of only 2.15 M and an FPS of 48. This indicated a higher comprehensive detection performance, enabling lightweight and accurate rice pest detection.

4.2. Limitations and Future Work

Although the proposed method achieved good results on the dataset, there are still issues worthy of further discussion and improvement. On the one hand, the improved algorithm still has areas that need refinement under certain complex conditions. For example, in cases of overlapping or neighboring targets, the model may fail to capture their distinguishing features without sufficient learning, leading to the identification of multiple targets as a single one. To address this issue, one could potentially enhance the model’s ability to handle such situations by augmenting the dataset with more examples of these cases. Additionally, more sophisticated algorithms based on Transformer structures [38,41], which excel at capturing complex features and obtaining finer feature region representations, might also solve this problem. On the other hand, some similar pest categories become more difficult to distinguish after imaging due to factors such as shooting angles and lighting conditions affecting detection efficiency. For this category ambiguity issue, employing a multi-label prediction strategy [42] might offer a solution.
In the future, we plan to explore the application of this algorithm to more scenarios and datasets containing a wider variety of pest species to test and improve its generalization performance. Simultaneously, we will actively integrate the proposed algorithm into practical applications, facilitating rapid and accurate pest identification and counting, thereby providing informational support for pest control efforts.

5. Conclusions

An economical and efficient pest detection algorithm can save resources and maximize performance with limited resources, making it suitable for practical applications where performance and cost are key considerations. In this study, we developed a rice pest detection algorithm based on an improved YOLOv8. In this algorithm, we constructed a Multi-branch CBAM to further focus on the prominent features of pests. Additionally, we introduced MPDIoU to optimize the model and used lightweight Ghost convolutions to reduce the number of model parameters. This resulted in a lightweight and accurate rice pest detection algorithm, achieving a mean average precision (mAP) of 97.3%, which is a 3.2% improvement over the baseline model. The model’s parameter count is only 2.15 M, a 28.5% reduction compared to the baseline model. This method can assist in the automatic detection and counting of rice pests, providing more accurate and effective decision-making data for crop pest control efforts.

Author Contributions

Conceptualization, J.Y. and P.H.; Methodology, P.H.; Validation, P.H.; Formal analysis, P.H. and J.Y.; Data curation, P.H.; Writing—original draft, P.H.; Writing—review and editing, J.Y., P.H. and B.Z.; Visualization, P.H. and B.Z.; Project administration, D.X. and J.Y.; Resources, D.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Planning Project of Guangzhou Air, space and ground integrated intelligent planting monitoring and pest early warning system (grant number 202206010116) and in part by the Key-Area Research and Development Program of Guangdong Province (grant numbers 2019B020217003 and 2019B020214002).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chou, C.; Hadi, B.A.R.; Chiba, S.; Sato, I.; Choi, I.-R.; Tanaka, T. An Entomopathogenic Fungus and a Natural Extract Benefit Rice (Oryza sativa) by Suppressing Populations of Insect Pests While Keeping High Populations of Their Natural Enemies. Biol. Control 2022, 165, 104793. [Google Scholar] [CrossRef]
  2. Lou, Y.-G.; Zhang, G.-R.; Zhang, W.-Q.; Hu, Y.; Zhang, J. Biological Control of Rice Insect Pests in China. Biol. Control 2013, 67, 8–20. [Google Scholar] [CrossRef]
  3. Chodey, M.D.; Noorullah Shariff, C. Hybrid Deep Learning Model for In-Field Pest Detection on Real-Time Field Monitoring. J. Plant Dis. Prot. 2022, 129, 635–650. [Google Scholar] [CrossRef]
  4. Li, W.; Wang, D.; Li, M.; Gao, Y.; Wu, J.; Yang, X. Field Detection of Tiny Pests from Sticky Trap Images Using Deep Learning in Agricultural Greenhouse. Comput. Electron. Agric. 2021, 183, 106048. [Google Scholar] [CrossRef]
  5. Sun, Y.; Liu, X.; Yuan, M.; Ren, L.; Wang, J.; Chen, Z. Automatic In-Trap Pest Detection Using Deep Learning for Pheromone-Based Dendroctonus Valens Monitoring. Biosyst. Eng. 2018, 176, 140–150. [Google Scholar] [CrossRef]
  6. Kamilaris, A.; Prenafeta-Boldú, F.X. Deep Learning in Agriculture: A Survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar] [CrossRef]
  7. Guo, Q.; Wang, C.; Xiao, D.; Huang, Q. Automatic Monitoring of Flying Vegetable Insect Pests Using an RGB Camera and YOLO-SIP Detector. Precis. Agric. 2023, 24, 436–457. [Google Scholar] [CrossRef]
  8. Wang, F.; Wang, R.; Xie, C.; Zhang, J.; Li, R.; Liu, L. Convolutional Neural Network Based Automatic Pest Monitoring System Using Hand-Held Mobile Image Analysis towards Non-Site-Specific Wild Environment. Comput. Electron. Agric. 2021, 187, 106268. [Google Scholar] [CrossRef]
  9. Dong, Q.; Sun, L.; Han, T.; Cai, M.; Gao, C. PestLite: A Novel YOLO-Based Deep Learning Technique for Crop Pest Detection. Agriculture 2024, 14, 228. [Google Scholar] [CrossRef]
  10. Sun, D.; Zhang, K.; Zhong, H.; Xie, J.; Xue, X.; Yan, M.; Wu, W.; Li, J. Efficient Tobacco Pest Detection in Complex Environments Using an Enhanced YOLOv8 Model. Agriculture 2024, 14, 353. [Google Scholar] [CrossRef]
  11. Chen, C.; Liang, Y.; Zhou, L.; Tang, X.; Dai, M. An Automatic Inspection System for Pest Detection in Granaries Using YOLOv4. Comput. Electron. Agric. 2022, 201, 107302. [Google Scholar] [CrossRef]
  12. Chen, J.; Chen, W.; Zeb, A.; Zhang, D.; Nanehkaran, Y.A. Crop Pest Recognition Using Attention-Embedded Lightweight Network under Field Conditions. Appl. Entomol. Zool. 2021, 56, 427–442. [Google Scholar] [CrossRef]
  13. Zheng, T.; Yang, X.; Lv, J.; Li, M.; Wang, S.; Li, W. An Efficient Mobile Model for Insect Image Classification in the Field Pest Management. Eng. Sci. Technol. Int. J. 2023, 39, 101335. [Google Scholar] [CrossRef]
  14. Sanghavi, V.B.; Bhadka, H.; Dubey, V. Hunger Games Search Based Deep Convolutional Neural Network for Crop Pest Identification and Classification with Transfer Learning. Evol. Syst. 2023, 14, 649–671. [Google Scholar] [CrossRef]
  15. Hu, Y.; Deng, X.; Lan, Y.; Chen, X.; Long, Y.; Liu, C. Detection of Rice Pests Based on Self-Attention Mechanism and Multi-Scale Feature Fusion. Insects 2023, 14, 280. [Google Scholar] [CrossRef]
  16. Li, K.; Wang, J.; Jalil, H.; Wang, H. A Fast and Lightweight Detection Algorithm for Passion Fruit Pests Based on Improved YOLOv5. Comput. Electron. Agric. 2023, 204, 107534. [Google Scholar] [CrossRef]
  17. Li, W.; Zheng, T.; Yang, Z.; Li, M.; Sun, C.; Yang, X. Classification and Detection of Insects from Field Images Using Deep Learning for Smart Pest Management: A Systematic Review. Ecol. Inform. 2021, 66, 101460. [Google Scholar] [CrossRef]
  18. Saleem, M.H.; Potgieter, J.; Arif, K.M. Automation in Agriculture by Machine and Deep Learning Techniques: A Review of Recent Developments. Precis. Agric. 2021, 22, 2053–2091. [Google Scholar] [CrossRef]
  19. Kiobia, D.O.; Mwitta, C.J.; Fue, K.G.; Schmidt, J.M.; Riley, D.G.; Rains, G.C. A Review of Successes and Impeding Challenges of IoT-Based Insect Pest Detection Systems for Estimating Agroecosystem Health and Productivity of Cotton. Sensors 2023, 23, 4127. [Google Scholar] [CrossRef]
  20. Wang, Q.-J.; Zhang, S.-Y.; Dong, S.-F.; Zhang, G.-C.; Yang, J.; Li, R.; Wang, H.-Q. Pest24: A Large-Scale Very Small Object Data Set of Agricultural Pests for Multi-Target Detection. Comput. Electron. Agric. 2020, 175, 105585. [Google Scholar] [CrossRef]
  21. Yao, Q.; Feng, J.; Tang, J.; Xu, W.; Zhu, X.; Yang, B.; Lü, J.; Xie, Y.; Yao, B.; Wu, S.; et al. Development of an Automatic Monitoring System for Rice Light-Trap Pests Based on Machine Vision. J. Integr. Agric. 2020, 19, 2500–2513. [Google Scholar] [CrossRef]
  22. Xiao, Q.; Zheng, W.; He, Y.; Chen, Z.; Meng, F.; Wu, L. Research on the Agricultural Pest Identification Mechanism Based on an Intelligent Algorithm. Agriculture 2023, 13, 1878. [Google Scholar] [CrossRef]
  23. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  24. Terven, J.; Cordova-Esparza, D. A Comprehensive Review of YOLO: From YOLOv1 and Beyond. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
  25. Jocher, G.; Chaurasia, A.; Qiu, J. YOLO by Ultralytics, Version 8; Github: San Francisco, CA, USA, 2023.
  26. Kang, J.; Zhang, W.; Xia, Y.; Liu, W. A Study on Maize Leaf Pest and Disease Detection Model Based on Attention and Multi-Scale Features. Appl. Sci. 2023, 13, 10441. [Google Scholar] [CrossRef]
  27. Lu, Y.; Wu, X.; Liu, P.; Li, H.; Liu, W. Rice Disease Identification Method Based on Improved CNN-BiGRU. Artif. Intell. Agric. 2023, 9, 100–109. [Google Scholar] [CrossRef]
  28. Shen, Y.; Zhang, F.; Liu, D.; Pu, W.; Zhang, Q. Manhattan-Distance IOU Loss for Fast and Accurate Bounding Box Regression and Object Detection. Neurocomputing 2022, 500, 99–114. [Google Scholar] [CrossRef]
  29. Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and Efficient IOU Loss for Accurate Bounding Box Regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
  30. Siliang, M.; Yong, X. MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression. arXiv 2023, arXiv:2307.07662. [Google Scholar]
  31. Dong, X.; Yan, S.; Duan, C. A Lightweight Vehicles Detection Network Model Based on YOLOv5. Eng. Appl. Artif. Intell. 2022, 113, 104914. [Google Scholar] [CrossRef]
  32. Li, S.; Zhang, S.; Xue, J.; Sun, H. Lightweight Target Detection for the Field Flat Jujube Based on Improved YOLOv5. Comput. Electron. Agric. 2022, 202, 107391. [Google Scholar] [CrossRef]
  33. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1577–1586. [Google Scholar]
  34. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  35. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9905, pp. 21–37. ISBN 978-3-319-46447-3. [Google Scholar]
  36. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar]
  37. Jocher, G. YOLOv5 by Ultralytics, version 7; GitHub: San Francisco, CA, USA, 2020.
  38. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. arXiv 2021, arXiv:2103.14030. [Google Scholar]
  39. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 3 April 2024. [Google Scholar]
  40. RangiLyu. NanoDet-Plus: Super Fast and High Accuracy Lightweight Anchor-Free Object Detection Model; GitHub: San Francisco, CA, USA, 2021. [Google Scholar]
  41. Zhang, L.; Chen, K.; Zheng, L.; Liao, X.; Lu, F.; Li, Y.; Cui, Y.; Wu, Y.; Song, Y.; Yan, S. Enhancing Fruit Fly Detection in Complex Backgrounds Using Transformer Architecture with Step Attention Mechanism. Agriculture 2024, 14, 490. [Google Scholar] [CrossRef]
  42. Guo, Q.; Wang, C.; Xiao, D.; Huang, Q. A Novel Multi-Label Pest Image Classifier Using the Modified Swin Transformer and Soft Binary Cross Entropy Loss. Eng. Appl. Artif. Intell. 2023, 126, 107060. [Google Scholar] [CrossRef]
Figure 1. Pest image acquisition. (a) Automatic equipment used for collecting pest images; and (b) examples of acquired pest images.
Figure 2. Distribution of the number of pest objects in each category in different sets.
Figure 3. Overall detection architecture of YOLOv8, which consists of the input part, backbone, neck, and detection head. The backbone is the CSPDarknet structure, the neck network is the PANet structure, and the detection head is the anchor-free decoupled head, which outputs classification scores (H × W × C) and bounding box predictions (H × W × 4, shown by the red dotted line), respectively.
Figure 4. Detailed architecture of the proposed improved YOLOv8. C2f_Ghost is a module that replaces the convolution in C2f with Ghost convolution. M-CBAM is the proposed Multi-branch Convolutional Block Attention Module.
Figure 5. Structure diagram of CBAM.
Figure 6. Improved channel attention module. A third branch is added that combines max pooling and average pooling information, and the corresponding third vector is generated through shared MLP.
Figure 7. Schematic of bounding box loss metric. (a) Schematic of CIoU. The green box is the predicted box, the blue box is the ground truth box, and the orange box is the smallest box surrounding the predicted box and the ground truth box. (b) Schematic of MPDIoU. The green box and blue box are the predicted box and the ground truth box, respectively, and the orange box represents the image input to the calculation.
Figure 8. Comparison of mAP changes during training. The blue line is YOLOv8 with MPDIoU, and the orange line is YOLOv8 with CIoU. When the model converges, the performance of MPDIoU used in this article is better than that of CIoU, and it can achieve a higher mAP.
Figure 9. Comparison of feature maps generated by traditional convolution and Ghost convolution. (a) Traditional convolution and (b) Ghost convolution.
Figure 10. Detailed structure of the improved C2f_Ghost module.
Figure 11. Detailed comparison of the per-class average precision of the proposed algorithm and the baseline algorithm YOLOv8n in pest detection.
Figure 12. Comparison of some inference results. The first row shows the detection results of the baseline model, and the second row shows the detection results of the proposed algorithm. In (a,b), both the baseline YOLOv8 algorithm and the proposed algorithm correctly identified the targets. In (c–e), the baseline algorithm in the first row exhibited missed detections, whereas the proposed algorithm in the second row correctly located and identified the targets. In (e), however, the proposed algorithm’s detection result included an incorrect classification.
Table 1. Ablation experiment performance comparison results.
| Algorithms | M-CBAM | MPDIoU | Ghost Module | mAP (%) | F1-Score (%) | Parameters (M) |
| YOLOv8     |        |        |              | 93.2    | 92.9         | 3.01           |
| Algorithm1 | ✓      |        |              | 95      | 94.1         | 3.02           |
| Algorithm2 |        | ✓      |              | 94.4    | 93.2         | 3.01           |
| Algorithm3 |        |        | ✓            | 93.1    | 92.7         | 2.14           |
| Algorithm4 |        | ✓      | ✓            | 94      | 93.1         | 2.14           |
| Algorithm5 | ✓      |        | ✓            | 94.3    | 93.5         | 2.15           |
| Algorithm6 | ✓      | ✓      |              | 96      | 94.9         | 3.02           |
| Ours       | ✓      | ✓      | ✓            | 95.8    | 94.6         | 2.15           |
Table 2. Experimental comparison of several common algorithms.
| Algorithms       | mAP (%) | F1-Score (%) | Parameters (M) | FPS (f/s) |
| Faster R-CNN     | 83.9    | 84.8         | 41.16          | 19.8      |
| SSD              | 83      | 83.4         | 25.42          | 26.3      |
| YOLOv7           | 87.2    | 89.1         | 37.23          | 37.8      |
| YOLOv5s          | 92.2    | 91.3         | 12.6           | 39.3      |
| Swin-Transformer | 90.9    | 87.3         | 44.78          | 20        |
| RT-DETR          | 94.7    | 83.6         | 20.18          | 36        |
| NanoDet          | 63.8    | 75.2         | 1.17           | 60        |
| Ours             | 95.8    | 94.6         | 2.15           | 48        |