Article

Microscopic Insect Pest Detection in Tea Plantations: Improved YOLOv8 Model Based on Deep Learning

1 College of Tea Science, Yunnan Agricultural University, Kunming 650201, China
2 Yunnan Organic Tea Industry Intelligent Engineering Research Center, Kunming 650201, China
3 College of Mechanical and Electrical Engineering, Wuhan Donghu University, Wuhan 430212, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(10), 1739; https://doi.org/10.3390/agriculture14101739
Submission received: 26 August 2024 / Revised: 18 September 2024 / Accepted: 30 September 2024 / Published: 2 October 2024
(This article belongs to the Section Digital Agriculture)

Abstract

Pest infestations in tea gardens are one of the common issues encountered during tea cultivation. This study introduces an improved YOLOv8 network model for the detection of tea pests to facilitate the rapid and accurate identification of early-stage micro-pests, addressing challenges such as small datasets and the difficulty of extracting phenotypic features of target pests in tea pest detection. Based on the original YOLOv8 network framework, this study adopts the SIoU optimized loss function to enhance the model’s learning ability for pest samples. AKConv is introduced to replace certain network structures, enhancing feature extraction capabilities and reducing the number of model parameters. A Vision Transformer with Bi-Level Routing Attention is embedded to provide the model with more flexible computation allocation and improve its ability to capture target position information. Experimental results show that the improved YOLOv8 network achieves a detection accuracy of 98.16% for tea pest detection, a 2.62% improvement over the original YOLOv8 network. Compared with the YOLOv10, YOLOv9, YOLOv7, Faster RCNN, and SSD models, the improved YOLOv8 network increases the mAP value by 3.12%, 4.34%, 5.44%, 16.54%, and 11.29%, respectively, enabling fast and accurate identification of early-stage micro-pests in tea gardens. The proposed model provides a viable research method and an important reference for identifying micro-pests in tea, and offers an effective pathway for the high-quality development of Yunnan’s ecological tea industry and the healthy growth of the tea industry as a whole.

1. Introduction

Yunnan, as a symbolic production area of China’s tea industry, boasts a unique geographical environment, climate conditions, and soil characteristics that provide ideal conditions for tea growth [1]. These natural advantages not only give birth to high-quality tea but also make Yunnan tea highly reputed in domestic and international markets [2]. However, the complex cultivation environment of tea gardens often leads to frequent and severe pest infestations [3], which not only damage the yield and quality of tea but also affect the healthy growth of tea plants. Pests pose a serious threat to the tea industry by destroying leaves, reducing nutrient absorption, and lowering photosynthesis efficiency [4,5]. Traditional pest management methods rely on the experience and intuitive judgment of tea farmers, which can only identify pests of ordinary size and are unable to identify smaller pests, thus being inefficient and prone to misjudgment; they are not effective in addressing pest problems [6]. Therefore, the use of modern scientific and technological means for intelligent detection and identification of tea garden pests is of great significance for improving prevention and control efficiency, ensuring the quality and yield of tea, and promoting the sustainable development of the tea industry [7].
In the context of digital agriculture, the application of machine learning classification models and artificial intelligence technology has become particularly important. For instance, the YOLOv8 (You Only Look Once version 8) architecture has been proven effective in optimizing the phenotypic detection of tomato plants, enhancing model performance through data balancing strategies, and the model integrates the SE (Squeeze-and-Excitation) block attention module to strengthen the recognition ability for the research categories [8]. Moreover, intelligent management of pest control involves not only precise detection technologies but also in-depth data analyses and real-time responses to achieve automation and intelligence in crop-growth management, thereby enabling real-time monitoring for precision planting, plant phenotypic monitoring, load estimation, intelligent harvesting, and intelligent management [9].
In recent years, with the development of deep learning and object detection algorithms, the automatic detection and recognition of tea pests using image processing and deep neural networks have become a hot topic [10]. Currently, representative two-stage object detection methods such as Faster-RCNN (Faster Region-Based Convolutional Neural Network) [11,12] and representative one-stage object detection methods such as SSD (Single Shot MultiBox Detector) [13], YOLO (You Only Look Once) [14,15,16,17] are widely applied in the detection and recognition of targets such as crop diseases and pests. Researchers have continuously optimized these algorithm models and attempted to apply them to the study of crop disease and pest classification, detection, and identification. Fuentes et al. [18] utilized various deep network architectures and deep learning feature extraction methods to design a detection network suitable for tomato disease and pest detection, adapting to the complex environmental scenes surrounding plants. Additionally, Dai et al. [19] introduced Swin Transformer and Transformer mechanisms into the YOLOv5m (You Only Look Once version 5 medium) network, improving the robustness and effectiveness of pest detection.
To address the challenges of detecting small objects, the main approach currently is to improve upon the object detection network model. These improvement methods include multi-scale feature fusion [20], super-resolution techniques [21], context information learning [22], and attention mechanisms [23]. Liu et al. [24] proposed the Selective Spatial Attention Module (SSAM) and the Multi-branch Parallel Feature Pyramid Network (MPFPN) structure, which extract more informative features of small objects while reducing background noise, thereby making the detection of small objects more effective. Chen et al. [25] introduced the DW-YOLO model, which optimizes residual blocks and enhances feature extraction capabilities to improve the detection of both small and large-scale objects, albeit at the cost of increased model complexity. Existing pest image collections already contain small pest targets whose scale and feature presentation meet the definition of small objects, yet there remains room for improvement in how current object detection algorithms handle such targets [26,27]. As the depth of the network increases, the loss of edge features may occur, and the visibility of pest targets is also limited by vegetation and foliage, which can impact the model’s feature extraction capabilities and accuracy [28,29]. Despite significant breakthroughs in object detection algorithms, the research on small object detection still faces numerous challenges. These challenges include:
  • The visualization features of small objects are not prominent, and they provide limited information. Due to the low resolution of the images, it becomes challenging to detect small objects accurately.
  • The detection of pest infestation in small objects relies on computationally intensive algorithms and extensive data processing, resulting in significant computational requirements and high costs.
  • Current pest identification methods primarily focus on easily recognizable and larger-sized pests. However, there are challenges in studying tiny pests due to their minimal visual differences and significant variations in appearance during different growth stages. These issues contribute to the low accuracy in identifying small pest infestations in tea leaves.
This study proposes an improved YOLOv8 network model for the detection of tea garden pests that are difficult to identify with the naked eye. The model is enhanced to improve the detection speed and accuracy of tiny pests on tea leaves. To accelerate model convergence, simplify the computation process, and accurately locate target positions, the original loss function is replaced with the SIoU (SCYLLA Intersection over Union) loss. Additionally, AKConv (Alterable Kernel Convolution) is employed to improve the accuracy of target detection while reducing model parameters and computational overhead. Meanwhile, BiFormer (Vision Transformer with Bi-Level Routing Attention) is embedded to strengthen recognition of targets with partially damaged bodies, making the model’s computational allocation and target perception more flexible. The method proposed in this study provides a practical research approach and an important reference for solving the problem of microscopic pest identification in tea leaves, offering an effective way to support the high-quality development and healthy growth of the tea industry in Yunnan.

2. Material and Methods

2.1. Image Collection

The images used in this study were collected from the ecological tea garden in Chuxiong Yi Autonomous Prefecture, Yunnan Province (24.79° N, 100.79° E), as described in Figure 1A. To overcome the issue of difficult capture of tiny pests during the image collection phase, in addition to capturing pest images on tea leaves, this study also hung yellow sticky traps (Beijing EKOM Biotech Co., Ltd., Beijing, China) on the tea trees, as shown in Figure 1B, to attract pests. When the yellow sticky traps had attracted a large number of pests, the image collection equipment, as depicted in Figure 1C, was used to capture the images. The macro lens (Shenzhen Meishengke Trading Co., Ltd., Shenzhen, China) has a magnification of 200×, a lens structure of 4 elements in 4 groups, a multilayer coating, and an input of 5 V/1 A. This macro lens was used to obtain high-resolution images to facilitate the study’s capture of the phenotypic characteristics of tiny pests. To ensure the accuracy of the model, this study employed devices such as the iPhone 14 Pro Max (Apple Inc., Cupertino, CA, USA) and the Redmi K50 (Xiaomi Inc., Beijing, China) for data collection, thereby enhancing the robustness and generalization capability of the recognition model, enabling it to adapt to various shooting conditions and equipment. As shown in Figure 1, this illustrates the method of image collection.

2.2. Image Preprocessing and Dataset Split

A total of 1346 original images were collected in this study, with a resolution of 7952 × 5340 pixels and in .JPG format. Among these original images, four pest categories were identified: Xyleborus fornicatus Eichhoff [30], Empoasca pirisuga Matsumura [31], Arboridia apicalis [32], and Toxoptera aurantii [33]. From the original images, a subset of higher-quality images was selected, consisting of 189 images of Xyleborus fornicatus, 221 images of Empoasca pirisuga, 168 images of Arboridia apicalis, and 225 images of Toxoptera aurantii, which formed the initial dataset.
To enhance the model’s learning and generalization capabilities, as well as to strengthen its robustness, this study applies image enhancement techniques to the original images, as shown in Figure 2A [34]. These techniques include brightness adjustment [35], contrast enhancement [36], and random distortion processing [37], as depicted in Figure 2B, aiming to simulate various lighting conditions, enhance detail information, and increase the diversity of the dataset. Specifically, brightness adjustment is achieved by increasing and decreasing the brightness by 1.4 times and 0.6 times, respectively, enabling the model to adapt to different lighting environments. Contrast enhancement further improves the clarity of details in the images by increasing and decreasing the contrast by 1.4 times and 0.6 times, respectively. Additionally, this study introduces random distortion processing, which effectively increases the diversity of the dataset by randomly rotating, scaling, and translating the images. This enhances the model’s ability to adapt to different postures and positional changes, strengthens the model’s recognition capability for subtle changes in pest images, and ensures that the model maintains a high recognition accuracy when facing complex real-world scenarios. After processing the initial dataset images with image enhancement techniques, a dataset containing 6442 images was formed. Subsequently, images without pest targets and those with more than 80% missing targets were removed, resulting in a final dataset of 6198 images selected to ensure data quality and diversity.
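As a concrete illustration of the augmentation described above, the following minimal Python sketch applies the 1.4× and 0.6× brightness and contrast adjustments together with a random rotation, scaling, and translation. The distortion ranges and file handling are assumptions for illustration rather than the authors’ exact settings, and in a real detection pipeline the bounding-box labels would need to be transformed consistently with the images.

```python
import random
from PIL import Image, ImageEnhance

def augment(img: Image.Image):
    """Return augmented variants of one pest image (illustrative parameters)."""
    variants = []
    for factor in (1.4, 0.6):                                  # brightness adjustment
        variants.append(ImageEnhance.Brightness(img).enhance(factor))
    for factor in (1.4, 0.6):                                  # contrast enhancement
        variants.append(ImageEnhance.Contrast(img).enhance(factor))

    # random distortion: rotate, scale, and translate (ranges are assumptions)
    angle = random.uniform(-15, 15)
    scale = random.uniform(0.9, 1.1)
    dx, dy = random.randint(-20, 20), random.randint(-20, 20)
    w, h = img.size
    distorted = img.rotate(angle, resample=Image.BILINEAR)
    distorted = distorted.resize((int(w * scale), int(h * scale)), Image.BILINEAR)
    distorted = distorted.transform((w, h), Image.AFFINE, (1, 0, dx, 0, 1, dy),
                                    resample=Image.BILINEAR)   # translation
    variants.append(distorted)
    return variants
```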
In this study, to ensure the objectivity and validity of the experimental results and to enhance the reliability of the research, a 5-fold cross-validation method, as illustrated in Figure 3, was employed to assess the model performance on the image dataset [38]. The entire dataset was randomly divided into five equal subsets. In each round of cross-validation, one subset was selected as the validation set, while the remaining four subsets were combined to serve as the training set for model training. This process was repeated for five rounds, with a different subset used as the validation set in each round. After completing all five rounds of cross-validation, the average of the performance evaluation metrics of the model on the validation set in each round was calculated, which served as the metric for the model’s comprehensive performance. Furthermore, 20% of the data were reserved as an independent test set to evaluate the model’s generalization capability after the final model selection [39]. This method fully utilizes the dataset for training and validating the model while reducing the impact of randomness on the model performance assessment, thereby more reliably evaluating the model’s generalization capability.
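A minimal sketch of this split protocol is shown below, assuming a flat folder of images and a hypothetical train_and_validate helper that returns the validation metric; it first holds out 20% of the data as an independent test set and then averages the metric over five folds.

```python
import glob

from sklearn.model_selection import KFold, train_test_split

def train_and_validate(train_set, val_set):
    """Placeholder: train the detector on train_set and return its validation mAP."""
    return 0.0  # replace with actual training and evaluation

# Hold out 20% as an independent test set, then run 5-fold cross-validation on the rest.
image_paths = sorted(glob.glob("dataset/images/*.jpg"))        # hypothetical layout
trainval, test = train_test_split(image_paths, test_size=0.2, random_state=42)

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(trainval)):
    train_set = [trainval[i] for i in train_idx]
    val_set = [trainval[i] for i in val_idx]
    score = train_and_validate(train_set, val_set)
    fold_scores.append(score)
    print(f"fold {fold}: mAP = {score:.4f}")

print("mean mAP over 5 folds:", sum(fold_scores) / len(fold_scores))
```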

2.3. YOLOv8 Network Improvement

YOLOv8 is a deep learning-based object detection algorithm known for its efficient computational capabilities and flexible deployment. It can handle input images of various sizes and achieve faster detection speeds while maintaining accuracy [40]. The network adopts a CSPDarknet-style backbone composed of multiple convolutional and pooling layers for feature extraction, together with a decoupled detection head that generates object bounding boxes and class probabilities [41]. Although the YOLOv8 network model is capable of accurately detecting and recognizing objects, supporting multi-class object detection and real-time tracking, it faces challenges in achieving the precise detection of early-stage, small-scale pest infestations in tea gardens and struggles with small datasets and extracting target phenotypic features in tea pest detection. Therefore, this study proposes improvements to the YOLOv8 network framework to address these issues. The study employed the SIoU loss function to replace the original loss function, aiding the model in better learning to locate pests accurately. The calculation takes into account the distance between the centers of the bounding boxes, as well as the ratio of the predicted box area to the real box area, providing a more comprehensive assessment of the quality of the predicted box. AKConv was introduced to replace parts of the network structure, thereby enhancing the feature extraction capability and reducing the number of model parameters. Additionally, BiFormer was embedded, equipping the model with more flexible computational allocation and an enhanced ability to obtain target location information, thereby improving the model’s perceptual performance for targets. These iterative improvements achieve efficient and precise detection. The improved YOLOv8 network structure is shown in Figure 4.

2.3.1. Improvement of the Loss Function

In the context of tea pest identification and detection, the original YOLOv8 model may encounter challenges due to factors such as target overlap, occlusion, and missing body parts, which can lead to suboptimal recognition performance. To address this issue, this study replaces the CIoU (Complete Intersection over Union) loss function, used in the original model, with the SIoU loss function that comprehensively considers regression metrics and handles direction mismatch. This substitution aims to enhance the model’s detection accuracy and robustness, optimize the training process, and accelerate convergence speed.
The SIoU loss function consists of four loss components: angle loss, distance loss, shape loss, and IoU loss [42]. The angle loss reduces the number of distance-related variables by adding an angle-aware component. During convergence, if α ≤ π/4, α is minimized first; otherwise, β is minimized, where α and β are complementary angles (α + β = π/2). Based on this, the prediction is first driven towards alignment along either the X-axis or the Y-axis, causing the predicted box to converge continually towards the real box. The structure of the SIoU loss function is shown in Figure 5, where the green box represents the real box and the red box represents the predicted box.
The angle-aware component is defined as shown in Equation (1):
$$\Lambda = 1 - 2\sin^{2}\left(\arcsin(\sin\alpha) - \frac{\pi}{4}\right) \quad (1)$$
In the equation: α represents the horizontal angle between the box centers; $b_{c_{x}}^{gt}$ and $b_{c_{y}}^{gt}$ denote the center coordinates of the ground truth box; $b_{c_{x}}$ and $b_{c_{y}}$ represent the center coordinates of the predicted box.
The distance loss is computed using the formula shown in Equation (2).
$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_{t}}\right) \quad (2)$$
In the equation: $\rho_{x}$ represents the degree of offset between the predicted box and the ground truth box along the X-axis, where $\rho_{x} = \left(\frac{b_{c_{x}}^{gt} - b_{c_{x}}}{C_{w}}\right)^{2}$; $\rho_{y}$ represents the degree of offset along the Y-axis, where $\rho_{y} = \left(\frac{b_{c_{y}}^{gt} - b_{c_{y}}}{C_{h}}\right)^{2}$; $C_{w}$ and $C_{h}$ denote the width and height of the minimum enclosing box; and $\gamma = 2 - \Lambda$ is the distance weighting term. From the formula, it can be observed that as the horizontal angle α increases, the distance loss Δ also increases and the optimization becomes more difficult; conversely, as the horizontal angle α decreases, the distance loss Δ approaches the conventional case.
The calculation formula for shape loss is given in Equation (3).
$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_{t}}\right)^{\theta}, \quad \theta \in [2, 6], \qquad \omega_{w} = \frac{\left|w - w^{gt}\right|}{\max\left(w, w^{gt}\right)}, \quad \omega_{h} = \frac{\left|h - h^{gt}\right|}{\max\left(h, h^{gt}\right)} \quad (3)$$
In the equation: θ is a parameter that controls the weight of the shape cost; $w$ and $h$ denote the width and height of the predicted box; $w^{gt}$ and $h^{gt}$ represent the width and height of the ground truth box.
Finally, the SIoU loss function can be obtained as shown in Equation (4).
$$L_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2} \quad (4)$$
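To make Equations (1)–(4) concrete, the following PyTorch sketch computes the SIoU loss for batches of (cx, cy, w, h) boxes. It is a simplified reference implementation of the formulas above rather than the exact code used in the model, and θ = 4 is one choice from the stated [2, 6] range.

```python
import math

import torch

def siou_loss(pred, target, theta=4.0, eps=1e-7):
    """SIoU loss for [N, 4] boxes given as (cx, cy, w, h); sketch of Eqs. (1)-(4)."""
    # corner coordinates and IoU
    px1, py1 = pred[:, 0] - pred[:, 2] / 2, pred[:, 1] - pred[:, 3] / 2
    px2, py2 = pred[:, 0] + pred[:, 2] / 2, pred[:, 1] + pred[:, 3] / 2
    tx1, ty1 = target[:, 0] - target[:, 2] / 2, target[:, 1] - target[:, 3] / 2
    tx2, ty2 = target[:, 0] + target[:, 2] / 2, target[:, 1] + target[:, 3] / 2
    inter = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0) * \
            (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    union = pred[:, 2] * pred[:, 3] + target[:, 2] * target[:, 3] - inter + eps
    iou = inter / union

    # angle cost, Eq. (1): Lambda = 1 - 2 sin^2(arcsin(sin(alpha)) - pi/4)
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)        # enclosing box width  (C_w)
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)        # enclosing box height (C_h)
    dx = target[:, 0] - pred[:, 0]
    dy = target[:, 1] - pred[:, 1]
    sigma = torch.sqrt(dx ** 2 + dy ** 2) + eps           # center-to-center distance
    sin_alpha = (torch.abs(dy) / sigma).clamp(-1 + eps, 1 - eps)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2

    # distance cost, Eq. (2), with gamma = 2 - Lambda
    gamma = 2 - angle
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # shape cost, Eq. (3)
    omega_w = (pred[:, 2] - target[:, 2]).abs() / torch.max(pred[:, 2], target[:, 2])
    omega_h = (pred[:, 3] - target[:, 3]).abs() / torch.max(pred[:, 3], target[:, 3])
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta

    # Eq. (4): L_SIoU = 1 - IoU + (Delta + Omega) / 2
    return 1 - iou + (dist + shape) / 2
```

During training, the per-box values would typically be averaged over the batch, e.g. siou_loss(pred_boxes, gt_boxes).mean().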

2.3.2. AKConv

Traditional convolution operations have achieved significant successes in the field of deep learning. However, they also have certain limitations. Firstly, the sampling shape and size of traditional convolutions are fixed, limiting the convolution operation to a local window, and preventing the capture of information from other locations. Secondly, the size of the convolution kernel is fixed as a square of size k × k . As the size increases, the number of parameters and computational complexity dramatically increase, which is disadvantageous for constructing lightweight models [43]. In order to overcome these limitations, this study has introduced AKConv, which adapts to various image features by dynamically adjusting the shape of the convolution kernel, thereby enhancing the model’s adaptability and efficiency. This approach not only improves model performance but also reduces the number of model parameters. The specific structure is illustrated in Figure 6 below.
The core advantage of AKConv lies in the dynamic adaptability of its convolutional kernel. Specifically, AKConv first performs convolution operations on the input image using an initial sampling shape. This initial sampling shape is preset based on the features of the input image to ensure that the convolution operation can effectively capture key information. Subsequently, the shape of the convolution kernel is dynamically adjusted according to the learned offset to adapt to changes in image features. This is followed by resampling and processing with the activation function SiLU (Sigmoid Linear Unit), ultimately yielding the output result [44]. This dynamic adjustment mechanism allows AKConv to flexibly adapt to targets of different sizes and shapes, thereby achieving higher precision in the feature extraction process.
Furthermore, the design of AKConv enables it to automatically adjust the sampling shape of the convolution kernel based on the density of pests and the phenotypic features of the targets, thereby optimizing the number of parameters and computational efficiency. AKConv’s adaptive sampling shape first determines the initial sampling positions of the convolution kernel through its coordinate generation algorithm, which can dynamically change according to the features and targets in the image. Subsequently, to better adapt to the size and shape changes in the targets in the image, AKConv adjusts the sampling positions of the convolution kernel according to the characteristics of the targets. Finally, the feature map is resampled based on the adjusted sampling shape to achieve more precise feature extraction. Its adaptive sampling shape is shown in Figure 7.
In traditional convolutional operations, as the size of the convolution kernel increases, the number of parameters and computational load grow at a quadratic rate, which can lead to inefficiencies in resource-constrained environments. In contrast, AKConv, through its innovative kernel design, achieves linear growth in the number of parameters, significantly reducing the computational burden of the model. This design not only improves the operational efficiency of the model but also makes AKConv suitable for a variety of hardware environments, including mobile devices and embedded systems. Compared to traditional convolution, AKConv also has the ability to adapt to local feature changes at different locations. It uses offset adjustments to adapt the position of the convolution kernel during sampling, thereby better accommodating the non-rigid deformation, occlusion, and complex background of targets. This provides stronger impetus for subsequent pest detection. A schematic diagram of this adjustment process is shown in Figure 8.
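The following PyTorch sketch illustrates the core idea described above: a small convolution predicts per-location offsets for N sampling points, the feature map is resampled at the shifted positions, and a 1×1 convolution fuses the samples, so the parameter count grows linearly with N rather than quadratically with a kernel side length. It is a simplified stand-in for AKConv; the initial sampling layout, N = 5, and layer sizes are chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveKernelConv(nn.Module):
    """Simplified AKConv-style layer: N sampling points shifted by learned offsets."""

    def __init__(self, in_ch, out_ch, num_points=5):
        super().__init__()
        self.N = num_points
        # predicts a 2-D offset for each of the N sampling points at every location
        self.offset = nn.Conv2d(in_ch, 2 * num_points, kernel_size=3, padding=1)
        # fuses the N sampled feature maps; parameter count is linear in N
        self.fuse = nn.Conv2d(in_ch * num_points, out_ch, kernel_size=1)
        self.act = nn.SiLU()
        # initial (fixed) sampling layout: a small cross around each pixel (assumption)
        base = torch.tensor([[0, 0], [0, 1], [0, -1], [1, 0], [-1, 0]], dtype=torch.float32)
        self.register_buffer("base_offsets", base[:num_points])     # [N, 2] as (dy, dx)

    def forward(self, x):
        B, C, H, W = x.shape
        offsets = self.offset(x).view(B, self.N, 2, H, W)            # learned offsets

        ys, xs = torch.meshgrid(torch.arange(H, device=x.device),
                                torch.arange(W, device=x.device), indexing="ij")
        samples = []
        for i in range(self.N):
            # absolute sampling position = pixel position + base layout + learned offset
            py = ys + self.base_offsets[i, 0] + offsets[:, i, 0]      # [B, H, W]
            px = xs + self.base_offsets[i, 1] + offsets[:, i, 1]
            # normalise to [-1, 1] for grid_sample
            grid = torch.stack([2 * px / (W - 1) - 1, 2 * py / (H - 1) - 1], dim=-1)
            samples.append(F.grid_sample(x, grid, align_corners=True))
        sampled = torch.cat(samples, dim=1)                           # [B, N*C, H, W]
        return self.act(self.fuse(sampled))
```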

2.3.3. BiFormer

In visual transformers, attention mechanisms play a crucial role in capturing key image features, improving model performance, and addressing scalability issues. The multi-head self-attention mechanism effectively captures diverse-angle features, thereby enhancing model performance. In this study, a dynamic sparse attention mechanism based on dual-layer routing is employed to improve the model based on the original YOLOv8 architecture, enabling it to adapt to pest recognition [43]. By utilizing image segmentation and region feature extraction, attention is computed using dispersed key-value pairs, which are particularly suitable for handling occluded or complex regions. The dual-layer routing provides richer information to support task execution. The schematic diagram of its structure is illustrated in Figure 9.
Firstly, for the pest image, this module segments the feature map into S × S non-overlapping regions, each containing $\frac{HW}{S^{2}}$ feature vectors, where $H$ and $W$ are the height and width of the original feature map. The feature vectors of these regions are then transformed into $Q$, $K$, and $V$ through linear mapping, as in Equation (5), where $X^{r} \in \mathbb{R}^{S^{2}\times\frac{HW}{S^{2}}\times C}$ denotes the region-partitioned feature map and $W^{q}$, $W^{k}$, and $W^{v}$ are the projection weights for the Query, Key, and Value, respectively. To obtain an overall feature representation for each region, the region features are averaged to generate region-level queries and keys $Q^{r}, K^{r} \in \mathbb{R}^{S^{2}\times C}$. The correlation between regions is then captured by a region-level adjacency matrix $A^{r}$, computed by multiplying $Q^{r}$ with the transpose of $K^{r}$ as defined in Equation (6); $A^{r}$ represents the degree of correlation between different regions. To perform coarse-grained region-level routing, a routing index matrix $I^{r} \in \mathbb{N}^{S^{2}\times k}$ is introduced according to Equation (7); it stores the indices of the $k$ most strongly correlated regions for each region and prunes weaker connections. The key tensor $K$ and value tensor $V$ of the routed regions are then collected by a gather operation, as described in Equations (8) and (9), where $K^{g}$ and $V^{g}$ are the gathered key and value tensors and $I^{r}$ is the routing index matrix. Finally, fine-grained token-to-token attention is applied to $Q$, $K^{g}$, and $V^{g}$ to generate the output feature map $O$ according to Equation (10), where $\mathrm{LCE}(V)$ denotes the local context enhancement term.
$$Q = X^{r}W^{q}, \quad K = X^{r}W^{k}, \quad V = X^{r}W^{v} \quad (5)$$
$$A^{r} = Q^{r}\left(K^{r}\right)^{T} \quad (6)$$
$$I^{r} = \mathrm{topkIndex}\left(A^{r}\right) \quad (7)$$
$$K^{g} = \mathrm{gather}\left(K, I^{r}\right) \quad (8)$$
$$V^{g} = \mathrm{gather}\left(V, I^{r}\right) \quad (9)$$
$$O = \mathrm{Attention}\left(Q, K^{g}, V^{g}\right) + \mathrm{LCE}(V) \quad (10)$$
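A simplified PyTorch sketch of Equations (5)–(10) is given below: features are split into S × S regions, region-level affinities select the top-k most related regions, their keys and values are gathered, and fine-grained attention plus a depth-wise convolution (standing in for LCE) produces the output. The region count, top-k value, and single-head formulation are illustrative simplifications rather than the exact BiFormer implementation.

```python
import torch
import torch.nn as nn

class BiLevelRoutingAttention(nn.Module):
    """Single-head sketch of bi-level routing attention (cf. Eqs. (5)-(10))."""

    def __init__(self, dim, num_regions=7, topk=4):
        super().__init__()
        self.S, self.k = num_regions, topk
        self.qkv = nn.Linear(dim, 3 * dim)                        # W_q, W_k, W_v in one projection
        self.lce = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # local context enhancement LCE(V)
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: [B, H, W, C], with H and W divisible by S
        B, H, W, C = x.shape
        S, k = self.S, self.k
        hr, wr = H // S, W // S                                   # tokens per region side

        q, key, v = self.qkv(x).chunk(3, dim=-1)                  # Eq. (5)

        def to_regions(t):                                        # [B, H, W, C] -> [B, S^2, hr*wr, C]
            t = t.reshape(B, S, hr, S, wr, C).permute(0, 1, 3, 2, 4, 5)
            return t.reshape(B, S * S, hr * wr, C)

        qr, kr, vr = map(to_regions, (q, key, v))

        # region-level queries/keys by average pooling, then affinity A^r (Eq. (6))
        affinity = qr.mean(dim=2) @ kr.mean(dim=2).transpose(-1, -2)
        idx = affinity.topk(k, dim=-1).indices                    # routing index I^r (Eq. (7))

        # gather keys and values of the top-k routed regions (Eqs. (8)-(9))
        b = torch.arange(B, device=x.device)[:, None, None]
        kg = kr[b, idx].reshape(B, S * S, k * hr * wr, C)
        vg = vr[b, idx].reshape(B, S * S, k * hr * wr, C)

        # fine-grained token-to-token attention within the routed regions (Eq. (10))
        attn = (qr @ kg.transpose(-1, -2)) * self.scale
        out = attn.softmax(dim=-1) @ vg                           # [B, S^2, hr*wr, C]

        # restore the spatial layout and add the local context enhancement term
        out = out.reshape(B, S, S, hr, wr, C).permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        lce = self.lce(v.permute(0, 3, 1, 2)).permute(0, 2, 3, 1)
        return out + lce
```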

3. Results and Analysis

3.1. Datasets

In this study, image enhancement techniques were used to expand the dataset to 6442 images, and after screening, a final selection of 6198 images was made to construct an image dataset containing four types of pests: Xyleborus fornicatus Eichhoff, Empoasca pirisuga Matsumura, Arboridia apicalis, and Toxoptera aurantii. To ensure the accuracy of the dataset annotations, the open-source annotation software X-AnyLabeling (https://github.com/CVHub520/X-AnyLabeling, accessed on 25 August 2024) was used to manually annotate the pests in the images. The visualization analysis of the annotation results is shown in Figure 10, where the bounding boxes vary in size, but most aspect ratios are distributed between 0.04 and 0.4, reflecting the richness and detection difficulty of small pest targets. Figure 10A shows the distribution of the number of pest labels in the dataset, with a total of 7104 labels. Figure 10B displays the aspect ratios of the individual label boxes after standardizing the x and y coordinates of the labels. Figure 10C provides a detailed description of the distribution of x and y coordinates in the image, while Figure 10D shows the distribution of the label aspect ratios. Figure 10E further provides detailed information on the distribution of labels in the original dataset, deepening the understanding of the dataset structure. Through in-depth analysis of the dataset, the detection algorithm can be optimized to improve the detection accuracy of small pests, thereby playing a key role in agricultural production.
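As an illustration of how the label statistics summarized in Figure 10 can be reproduced, the sketch below reads YOLO-format annotation files (class cx cy w h, normalised) and plots the box width/height distribution; the directory layout is a hypothetical assumption.

```python
import glob

import matplotlib.pyplot as plt

widths, heights = [], []
for label_file in glob.glob("dataset/labels/*.txt"):        # hypothetical label folder
    with open(label_file) as f:
        for line in f:
            _, cx, cy, w, h = map(float, line.split())      # YOLO format: class cx cy w h
            widths.append(w)
            heights.append(h)

plt.scatter(widths, heights, s=2)
plt.xlabel("normalised box width")
plt.ylabel("normalised box height")
plt.title("Bounding-box size distribution (cf. Figure 10)")
plt.show()
```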

3.2. Experimental Environment and Parameter Settings

To validate the identification and detection effectiveness of the improved YOLOv8 network model on pests, comparative experiments were conducted with YOLOv10, YOLOv9, original YOLOv8, YOLOv7, Faster-RCNN, and SSD network models. To ensure the rigor and validity of the experimental training for each model, the same experimental platform and software versions were utilized, as illustrated in Table 1.
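For reference, comparative runs of this kind are typically launched through the Ultralytics Python API along the lines of the hedged sketch below; the model and dataset configuration files (a modified "yolov8-sab.yaml" and "tea_pests.yaml") are assumptions standing in for the authors' actual configuration.

```python
from ultralytics import YOLO

# Load a (hypothetical) modified architecture definition and train on the pest dataset.
model = YOLO("yolov8-sab.yaml")           # assumed custom YAML describing the improved network
model.train(
    data="tea_pests.yaml",                # assumed dataset config listing the four pest classes
    epochs=240,                           # matches the training length reported in Section 3.4
    imgsz=640,
    batch=16,
)
metrics = model.val()                     # precision, recall, and mAP on the validation split
print(metrics.box.map50)                  # mAP at IoU 0.5
```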
To validate the effectiveness of the improved method in this study, higher-level classification metrics Precision [45] and Recall [46] were calculated using the four elements of the binary confusion matrix. Additionally, the F1 score [47], AP (Average Precision) [48], and mAP [49] were introduced as measurement evaluation standards for assessing the performance of the classification target detection model. The calculation formulas are as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\%$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\%$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
$$AP = \int_{0}^{1} \mathrm{Precision}(\mathrm{Recall})\,\mathrm{d}\mathrm{Recall}$$
$$mAP = \frac{1}{M}\sum_{i=1}^{M} AP(i)$$
In the formulas: TP (True Positive) is the number of pest targets that the model correctly detects as pests; FP (False Positive) is the number of non-pest regions that the model incorrectly detects as pests; FN (False Negative) is the number of pest targets that the model misses, i.e., predicts as non-pests; M is the number of pest categories.
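A small worked example of the Precision, Recall, and F1 formulas above, using illustrative counts rather than results from the paper:

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(p, r):
    return 2 * p * r / (p + r)

# Illustrative counts: 95 pests detected correctly, 3 false alarms, 2 pests missed.
tp, fp, fn = 95, 3, 2
p, r = precision(tp, fp), recall(tp, fn)
print(f"Precision = {p:.2%}, Recall = {r:.2%}, F1 = {f1(p, r):.2%}")
```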

3.3. Ablation Experiment

This study improved the YOLOv8 network model and conducted statistical analysis on the results of each improvement to validate the effectiveness of the improved model in pest recognition. The results of the analysis are shown in Table 2.
In Table 2, S, A, and B represent the experimental results of the original YOLOv8 model after the incorporation of the SIoU, AKConv, and BiFormer improvements, respectively. The results show that the SIoU improvement can enhance the Precision, Recall, and mAP50 of the original YOLOv8 model by 2.14%, 1.46%, and 0.63%, respectively. The AKConv improvement reduced the number of Parameters by 9.21% compared to the original YOLOv8 model and decreased the model size by 0.9 G. Compared to the original model, the BiFormer improvement increased Precision and mAP50 by 1.89% and 1.18%, respectively, integrating the attention mechanism into the original model without increasing the computational load and effectively improving various metrics. After the overall improvement, the YOLOv8-SAB model outperformed the original model in all metrics; Parameters were reduced by 8.66%, and the model size was decreased by 0.2 G. Its Precision, Recall, mAP50, and mAP50-95 were increased by 4.15%, 2.50%, 2.62%, and 18.86%, respectively. The significant improvement in mAP50-95 indicates that the model’s average precision within the IoU threshold range from 50% to 95% is superior to that of the other ablation experiment models, demonstrating the YOLOv8-SAB model’s high robustness when the predicted box is required to overlap closely with the real box. Therefore, the superior structure of the YOLOv8-SAB model is more effective in handling complex scenes and better accomplishes target detection tasks from the perspective of smart agricultural equipment.
In this study, Grad-CAM (Gradient-weighted Class Activation Mapping) technology was utilized to calculate the gradients of the decision layers of various models in the ablation experiment, aiming to evaluate their classification decision-making capabilities in image recognition tasks. Through the generated heatmaps, this study visually revealed the key discriminative areas identified by the models in correctly classified samples. As shown in Figure 11, the YOLOv8-SAB model’s heatmap corresponds significantly better with the real pest areas in the detection of positive samples than other models in the ablation experiment. To comprehensively assess the performance of the YOLOv8-SAB model, this study further conducted a Grad-CAM analysis of the model’s performance on negative samples. The analysis results indicate that although the model shows some bias in recognizing certain target features, its accuracy in locating real pest areas remains high. Compared to other models in the ablation experiment, the YOLOv8-SAB model’s Grad-CAM heatmaps still perform exceptionally well, with a higher degree of conformity to the real pest areas.
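The Grad-CAM visualisations described above can be generated with a short hook-based routine such as the sketch below; the choice of target layer and the scalar scoring function are assumptions that depend on the detector's implementation, so this is illustrative rather than the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, score_fn):
    """Return an [H, W] heatmap in [0, 1] for one image tensor of shape [1, 3, H, W]."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["value"] = output.detach()

    def bwd_hook(_, grad_in, grad_out):
        gradients["value"] = grad_out[0].detach()

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    model.zero_grad()
    output = model(image)
    score = score_fn(output)          # scalar, e.g. summed confidence of the pest class (assumed)
    score.backward()

    h1.remove()
    h2.remove()

    acts = activations["value"]       # [1, C, h, w] feature maps of the target layer
    grads = gradients["value"]        # [1, C, h, w] gradients of the score w.r.t. those maps
    weights = grads.mean(dim=(2, 3), keepdim=True)                 # channel importance
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))        # weighted combination
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)       # normalise for display
    return cam[0, 0]
```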

3.4. Loss Function Analysis

The loss function is one of the important criteria for evaluating model performance, as it quantifies the difference between the predicted values and the actual results. As shown in Figure 12, it can be observed that the YOLOv8-SAB model in this study exhibits a rapid decrease in loss during the training process, followed by a slower rate of decrease after the 40th epoch. After 240 epochs of training, the loss function curve gradually stabilizes and converges, indicating a well-behaved training process without overfitting. In comparison to the original YOLOv8, although the YOLOv8-SAB model has a slightly higher convergence value for the dfl_loss (distribution focal loss) in both the training and validation sets, it achieves good performance in terms of the mean box_loss (bounding box loss) and cls_loss (classification loss) values. Additionally, it demonstrates a smoother training process without oscillations.

3.5. Model Performance Analysis

As shown in Figure 13, the YOLOv8-SAB model proposed in this study achieves a Precision of 96.85%, Recall of 98.05%, and a balanced F1 score of 97.45%. Compared to the original YOLOv8 model, the YOLOv8-SAB model exhibits a 4.15% increase in Precision, a 2.50% improvement in Recall, and a 3.35% increase in the balanced F1 score. These results indicate that the YOLOv8-SAB model proposed in this study significantly outperforms the original YOLOv8 model in terms of detection performance, especially in terms of the notable improvement in Precision. The network model can efficiently and accurately identify and detect pests, demonstrating superior effectiveness and performance in practical applications.

3.6. Model Comparative Experiment

In this study, the YOLOv8-SAB network model’s AP for four different pests and its overall mAP were compared with the YOLOv10, YOLOv9, YOLOv8, YOLOv7, Faster-RCNN, and SSD network models. According to the data in Table 3, the AP values of the YOLOv8-SAB network model for the four different pests are 98.16%, 98.32%, 98.06%, and 98.03%, and its mAP reaches 98.16%, which is a 2.62% increase compared to the YOLOv8 network. Compared to the YOLOv10, YOLOv9, YOLOv7, Faster RCNN, and SSD models, the mAP of the YOLOv8-SAB network model increased by 3.12%, 4.34%, 5.44%, 16.54%, and 11.29%, respectively. This indicates that the YOLOv8-SAB network model is more accurate in predicting the location of targets, has better recognition effects for pests of different categories, and possesses higher confidence and reliability.

3.6.1. Model Detection Experiment

In this study, targeting the four types of pests Xyleborus fornicatus Eichhoff, Empoasca pirisuga Matsumura, Arboridia apicalis, and Toxoptera aurantii, efforts were made to ensure that the validation dataset differed significantly from the original dataset in terms of lighting conditions, pest numbers, background contexts, and the completeness of the pest bodies during external validation, in order to test the model’s generalization capabilities. The YOLOv8-SAB, YOLOv10, YOLOv9, YOLOv8, YOLOv7, Faster-RCNN, and SSD models were used to detect single and multiple targets. By comparing the detection results of different models, the performance of the YOLOv8-SAB model in this study was verified. The external validation dataset was collected from the tea garden at the back of Yunnan Agricultural University, and the experimental environment and parameter settings during the external validation were consistent with those of the training and testing platform. The specific final detection results of each model are compared as shown in Figure 14.
The results show that all seven models were able to detect single target pests with both complete and incomplete bodies under normal and low light conditions. Among the models, YOLOv8-SAB still performed the best in terms of confidence in image recognition, followed by YOLOv10 and YOLOv9, with the original YOLOv8 closely behind. Under the influence of two key variables, light intensity and the completeness of the pest bodies, the confidence of all models decreased to some extent, especially when these two conditions were present simultaneously, indicating that light intensity and body completeness have a significant impact on model detection. Among all models tested, YOLOv8-SAB stood out under dual-variable conditions, effectively reducing the problems of pest localization bias and repeated detection, with an average confidence level increased by more than 3%, 4%, and 2% compared to YOLOv10, YOLOv9, and YOLOv8, respectively. Additionally, in the comparison of different models, only YOLOv8-SAB and YOLOv10 were able to successfully detect multi-target pests with complete and incomplete bodies under normal and low light conditions, but YOLOv8-SAB had a higher confidence level and greater accuracy than the YOLOv10 model. In contrast, while models such as YOLOv8, YOLOv7, Faster-RCNN, and SSD could also perform detection, they experienced a decrease in confidence and exhibited varying degrees of omissions and misidentifications. In summary, the YOLOv8-SAB model significantly outperformed other models in pest detection, and the YOLOv10 and YOLOv9 models also showed considerable potential, providing a new direction for future research in the field of pest detection.

3.6.2. External Validation Comparison

In this study, compared to the original dataset, the pest samples in the external validation dataset are more diverse in appearance, including individuals at different growth stages and levels of completeness. In addition, the images in the validation dataset were captured under various lighting conditions, including cloudy days, rainy days, and low-light environments during the early morning and evening, which contrasts with the images in the original dataset that were mainly captured under normal lighting conditions. The number of pests in the dataset images also varies. These differences provide additional challenges for the model’s generalization ability. According to the results shown in Table 4, the external validation parameters of the model in this study include Precision, Recall, mAP values, and the average score F1. The YOLOv8-SAB model in this study uses the SIoU loss function as an alternative to the original loss function, introduces AKConv to replace some parts of the original model’s network structure, and also adds BiFormer. The results show that the Precision of the YOLOv8-SAB model has increased by 8.12%, 7.40%, 4.52%, and 15.98% compared to YOLOv10, YOLOv9, YOLOv8, and YOLOv7, respectively. The Recall has improved by 13.54%, 6.59%, 3.01%, and 9.68%, respectively, and the mAP values have increased by 3.04%, 4.27%, 2.98%, and 5.56%, respectively. The balanced score F1 has increased by 10.87%, 7.00%, 3.78%, and 13%, respectively. The validation results of this study show that despite the differences in pest appearance, lighting, and background between the external validation dataset and the original dataset, the YOLOv8-SAB model still shows high accuracy and robustness, further proving the effectiveness and practicality of the model. In summary, the YOLOv8-SAB model proposed in this study is significantly superior to other models in pest detection, and the YOLOv10 and YOLOv9 models also show considerable potential, providing a new direction for future research in the field of pest detection. This conclusion provides strong support for the deployment of intelligent recognition on agricultural drones, irrigation systems, and mobile edge devices.

4. Discussion

Pest infestations in tea gardens are one of the common issues in the process of tea cultivation. To achieve rapid and accurate identification of early-stage minor pests in tea gardens, this study proposes an improved YOLOv8 network model for tea pest detection, addressing issues such as small datasets and difficulty in extracting target pest phenotypic features in the tea pest detection process. The model significantly enhances detection accuracy and robustness by introducing advanced technologies such as the SIoU loss function, AKConv, and BiFormer. Experimental results show that the improved model has achieved a precision rate of 98.16% in tea pest detection tasks, which is a 2.62% increase compared to the original YOLOv8 network, demonstrating its efficiency and accuracy in tea pest detection. This improvement significantly enhances the model’s ability to recognize minor pests, which is of great significance for ensuring the healthy development of the tea industry, especially in the context of Yunnan’s ecological tea industry.
The improved YOLOv8 network model in this study not only enhances the model’s learning ability for small target pest samples in tea gardens but also improves the acquisition of target location information, thereby enhancing the model’s perceptual performance for targets. Compared with existing studies, Solimani et al. [8] proposed a lightweight YOLOv8n-ShuffleNetv2-Ghost-SE model that achieved an average precision of 91.4% and a recall rate of 82.6%, whereas this study improves the average precision and recall rate by 6.76% and 15.45%, respectively. Fuentes et al. [18] proposed a tomato pest and disease detection network that combines VGG and ResNet as deep feature extractors; however, that network reached an average precision of only 85.98%, which is 12.18% lower than that of this study, indicating higher detection accuracy in this research. Dai et al. [19] introduced the Swin Transformer mechanism and an improved feature fusion strategy into the YOLOv5m model, achieving a recall rate of 93.1%, an F1 score of 94.38%, and an average precision of 96.4%, which this study exceeds by 4.95%, 3.07%, and 1.76%, respectively. He et al. [30] proposed a tea garden pest recognition method based on an improved YOLOv7 network that used the MPDIoU optimized loss function to improve the model’s convergence speed and simplify the calculation process, applied spatial and channel reconstruction convolution to reduce feature redundancy, model complexity, and computational cost, and introduced a dual-route attention vision transformer to enhance the model’s computational distribution flexibility and content-aware capability; their improved YOLOv7 model increased Precision, Recall, F1, and mAP over the original YOLOv7 by 5.68%, 5.14%, 5.41%, and 2.58%, respectively. By contrast, the improved YOLOv8 model in this study effectively increases accuracy while reducing the number of parameters. Compared to the original YOLOv8 model, the improved YOLOv8 model detected the four types of tea pests with AP values of 98.16%, 98.32%, 98.06%, and 98.03%, increases of 2.53%, 2.76%, 2.69%, and 2.43%, respectively, with an overall mAP increase of 2.62%. After external data validation, the improved YOLOv8 network model’s Precision was increased by 8.12%, 7.40%, 4.52%, and 15.98% compared to YOLOv10, YOLOv9, YOLOv8, and YOLOv7, respectively; Recall was increased by 13.54%, 6.59%, 3.01%, and 9.68%, respectively; mAP values were increased by 3.04%, 4.27%, 2.98%, and 5.56%, respectively; and the balanced F1 score was increased by 10.87%, 7.00%, 3.78%, and 13.00%, respectively. Yang et al. [50] proposed a tea garden pest detection model based on an improved YOLOv7-Tiny algorithm; after adding BiFormer to the original YOLOv7 model, their model’s mAP0.5 increased from 88.6% to 91.6%. In this study, after adding BiFormer to the original YOLOv8 model, the model’s mAP0.5, Precision, and Recall were 96.72%, 94.59%, and 95.54%, respectively, significantly higher than those of the YOLOv7-Tiny model for tea garden pest detection; the YOLOv7-Tiny model achieved an average accuracy of only 93.23%, which is 4.93% lower than the pest detection accuracy in this study. Overall, the improved YOLOv8 network model in this study shows stronger performance in feature extraction and target localization for tea pest detection, achieving higher detection accuracy through its specifically optimized loss function and network structure.
In addition, the introduction of AKConv and BiFormer gives the model stronger feature extraction and target localization capabilities, aspects that have received comparatively little attention in existing studies.
Although this study has performed well in the identification and detection of specific small targets, such as tiny tea pests, there are still certain limitations. First, the model’s performance has mainly been tested under the specific environmental conditions of Yunnan tea gardens, and its performance in other regions and under different environmental conditions has not been fully verified, so environmental adaptability remains a limitation. Second, the types of pests studied are relatively limited, and the model’s recognition ability and accuracy for a wider range of pest types still need further testing and optimization. Furthermore, the limited size of the training dataset may constrain the model’s generalization capability when it comes to identifying new types of pests. To enhance the model’s generalization, it is recommended to conduct more in-depth data augmentation on the dataset in this study. Specifically, the diversity of the dataset can be enriched by using techniques such as generative adversarial networks, feature fusion, oversampling, and undersampling, thereby enhancing the model’s recognition ability for different pest types. In practice, the model’s performance may also be affected by environmental changes, the diversity of pest types, and dataset biases. Future work can therefore build on these points to further promote the development of tea pest detection technology and provide scientific and technological support for the sustainable development of the tea industry.

5. Conclusions

This study proposed an improved YOLOv8 network method aimed at key issues in the tea pest detection process, such as small datasets and difficulty in extracting target pest phenotypic features. This method is based on the YOLOv8 network framework and enhances the model’s learning ability for pest samples by replacing the original loss function with the SIoU loss function. At the same time, AKConv is introduced to replace some parts of the original model’s network structure to enhance feature extraction capabilities and reduce the number of model parameters. Additionally, BiFormer is added to give the model more flexible computational allocation and improve its ability to obtain target location information, thereby enhancing the model’s perceptual performance for targets. Experimental results show that the improved model has achieved a precision rate of 98.16% in tea pest detection tasks, which is a 2.62% increase in detection accuracy compared to the original YOLOv8 network. These improvements strengthen the model’s ability to recognize small target pest samples and to obtain target location information. The improved YOLOv8 model, while reducing the model size by 0.2 G, has increased its Precision, Recall, and mAP by 4.15%, 2.50%, and 2.62%, respectively, compared to the original model, demonstrating its efficiency and accuracy in tea pest detection.
Although this study has achieved significant results in the identification and detection of specific small target pests on tea, it may face some challenges in practical applications. For example, the model’s performance may be affected by environmental changes, the diversity of pest species, and dataset biases. Future directions for this study can explore the following areas. First, conduct model testing across regions and multiple environmental conditions to evaluate and improve the model’s adaptability and generalization capabilities. Second, increase the variety of pest samples to improve the model’s recognition ability and accuracy for different pests. In addition, explore more efficient model optimization strategies to reduce the demand for computing resources and adapt to the needs of edge computing and Internet of Things devices while maintaining or improving detection accuracy. Finally, deploy the model to actual tea garden monitoring systems to verify its performance under real-world conditions and explore the integration and application with smart devices such as agricultural drones and irrigation systems to achieve real-time monitoring and management of tea pests.

Author Contributions

Z.W. conceived the overall design and conceptualization of the study, drafted the manuscript, and completed the final revision; S.Z. and L.C. were responsible for the establishment, simulation, and data analysis of the network model; W.W. and H.W. undertook the collection of the dataset and external experimental verification; X.L. and Z.F. managed the manuscript editing and review revisions; B.W. provided financial support, project management, and review revisions. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (32060702), the Science and Technology Mission of Smart Tea Industry in Menghai County, Yunnan Province (202304BI090013), the Yunnan Tea Industry Artificial Intelligence and Big Data Application Innovation Team (202405AS350025), the research and development and demonstration project of data sensing technology and equipment for high-altitude mountain smart agriculture (202302AEO9002001).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data and image dataset presented in this study are available from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, W.; Zhang, Q.; Fan, Y.; Cheng, Z.; Lu, X.; Luo, B.; Long, C. Traditional management of ancient pu'er teagardens in jingmai mountains in yunnan of china, a designated globally important agricultural heritage systems site. J. Ethnobiol. Ethnomed. 2023, 19, 26. [Google Scholar] [CrossRef] [PubMed]
  2. Long, P.; Su, S.; Han, Z.; Granato, D.; Hu, W.; Ke, J.; Zhang, L. The effects of tea plant age on the color, taste, and chemical characteristics of yunnan congou black tea by multi-spectral omics insight. Food Chem. X 2024, 21, 101190. [Google Scholar] [CrossRef] [PubMed]
  3. Liu, M.; Zhang, Z.; Liu, X.; Li, M.; Shi, L. Trend analysis of coverage variation in pinus yunnanensis franch. Forests under the influence of pests and abiotic factors. Forests 2022, 13, 412. [Google Scholar]
  4. Wang, Y.; Xu, R.; Bai, D.; Lin, H. Integrated learning-based pest and disease detection method for tea leaves. Forests 2023, 14, 1012. [Google Scholar] [CrossRef]
  5. Drew, L. The growth of tea. Nature 2019, 566, S2. [Google Scholar] [CrossRef]
  6. Radhakrishnan, B. Pests and their management in tea. In Trends in Horticultural Entomology; Springer: Singapore, 2022; pp. 1489–1511. [Google Scholar]
  7. Wei, Y.; Wen, Y.; Huang, X.; Ma, P.; Wang, L.; Pan, Y.; Wei, X. The dawn of intelligent technologies in tea industry. Trends Food Sci. Technol. 2024, 144, 104337. [Google Scholar] [CrossRef]
  8. Solimani, F.; Cardellicchio, A.; Dimauro, G.; Petrozza, A.; Summerer, S.; Cellini, F.; Renò, V. Optimizing tomato plant phenotyping detection: Boosting YOLOv8 architecture to tackle data complexity. Comput. Electron. Agric. 2024, 218, 108728. [Google Scholar] [CrossRef]
  9. Ma, B.; Hua, Z.; Wen, Y.; Deng, H.; Zhao, Y.; Pu, L.; Song, H. Using an improved lightweight YOLOv8 model for real-time detection of multi-stage apple fruit in complex orchard environments. Artif. Intell. Agric. 2024, 11, 70–82. [Google Scholar] [CrossRef]
  10. Soeb, M.J.A.; Jubayer, M.F.; Tarin, T.A.; Al Mamun, M.R.; Ruhad, F.M.; Parven, A.; Meftaul, I.M. Tea leaf disease detection and identification based on YOLOv7 (YOLO-T). Sci. Rep. 2023, 13, 6078. [Google Scholar] [CrossRef]
  11. Hou, J.; Yang, C.; He, Y.; Hou, B. Detecting diseases in apple tree leaves using FPN–ISResNet–Faster RCNN. Eur. J. Remote Sens. 2023, 56, 2186955. [Google Scholar] [CrossRef]
  12. Kundur, N.C.; Mallikarjuna, P.B. Insect pest image detection and classification using deep learning. Int. J. Adv. Comput. Sci. Appl 2022, 13, 411–421. [Google Scholar] [CrossRef]
  13. Faisal, M.S.A.B. A pest monitoring system for agriculture using deep learning. Res. Prog. Mech. Manuf. Eng. 2021, 2, 1023–1034. [Google Scholar]
  14. Li, J.; Li, J.; Zhao, X.; Su, X.; Wu, W. Lightweight detection networks for tea bud on complex agricultural environment via improved yolo v4. Comput. Electron. Agric 2023, 211, 107955. [Google Scholar] [CrossRef]
  15. Li, Y.; Ma, R.; Zhang, R.; Cheng, Y.; Dong, C. A tea buds counting method based on yolov5 and kalman filter tracking algorithm. Plant Phenomics 2023, 5, 30. [Google Scholar] [CrossRef]
  16. Liu, J.; Wang, X. Tomato diseases and pest detection based on improved yolo v3 convolutional neural network. Front. Plant Sci 2020, 11, 898. [Google Scholar] [CrossRef]
Figure 1. Image acquisition mode. (A) Ecological tea garden in Chuxiong Yi Autonomous Prefecture, Yunnan Province; (B) yellow sticky traps; (C) image acquisition device.
Figure 2. Image enhancement results. (A) Original image; (B) dataset image after applying image enhancement techniques.
Figure 3. 5-fold cross-validation.
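For readers who wish to reproduce the data partitioning illustrated in Figure 3, the sketch below shows one way to generate 5-fold cross-validation splits over an image list. The dataset path and the use of scikit-learn's KFold are illustrative assumptions, not the authors' exact pipeline.

# Illustrative 5-fold cross-validation split (assumes scikit-learn; the dataset path is hypothetical).
from pathlib import Path
from sklearn.model_selection import KFold

image_paths = sorted(Path("datasets/tea_pests/images").glob("*.jpg"))  # hypothetical dataset location
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(kfold.split(image_paths), start=1):
    train_files = [image_paths[i] for i in train_idx]
    val_files = [image_paths[i] for i in val_idx]
    # Each fold uses four fifths of the images for training and one fifth for validation.
    print(f"Fold {fold}: {len(train_files)} training images, {len(val_files)} validation images")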
Figure 4. Improved YOLOv8 network structure diagram.
Figure 5. SIoU structure diagram.
Figure 6. AKConv structure diagram.
Figure 7. Adaptive initial sampling shapes.
Figure 8. Offsets adjust the sampling shape.
Figure 9. BiFormer structure diagram.
Figure 10. Visual analysis of pest annotation files.
Figure 11. Grad-CAM heatmaps for ablation experiments.
Figure 12. Comparison of loss function variation curves.
Figure 13. Curves depicting the variations of Precision, Recall, and F1 score.
Figure 14. Comparison of detection results among different models.
Table 1. Environmental configuration and parameter settings.
Configuration Parameter | Configuration Item
Operating system | Windows 10
CPU | Intel(R) Core(TM) i7-11700
Internal memory | 2933 MHz DDR4 ECC
Solid-state drive | M.2 1 TB PCIe NVMe Class 50
GPU | NVIDIA RTX A6000
Programming language | Python 3.9
Development environment | PyCharm 2019
CUDA | Version 12.2
Epochs | 1000
Batch size | 128
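As a rough illustration of how the training settings in Table 1 (1000 epochs, batch size 128) could be applied, the following sketch uses the Ultralytics YOLOv8 Python API. The dataset YAML path and device index are assumptions, and the improved YOLOv8-SAB network (SIoU, AKConv, BiFormer) would require a custom model definition rather than the stock yolov8n weights shown here.

# Minimal training sketch with the hyperparameters from Table 1 (paths are hypothetical).
from ultralytics import YOLO

# Load a baseline YOLOv8 model; the YOLOv8-SAB variant would need a custom model YAML.
model = YOLO("yolov8n.pt")

model.train(
    data="tea_pests.yaml",  # hypothetical dataset configuration file
    epochs=1000,            # as listed in Table 1
    batch=128,              # as listed in Table 1
    device=0,               # single NVIDIA RTX A6000 GPU
)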
Table 2. Comparative results of ablation experiments.
Model | SIoU | AKConv | BiFormer | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) | Parameters | FPS (ms) | GFLOPs
YOLOv8 | × | × | × | 92.70 | 95.55 | 95.54 | 22.5 | 3,157,200 | 38.9 | 8.9
YOLOv8-S | √ | × | × | 94.84 | 97.01 | 96.17 | 22.5 | 3,157,200 | 37.4 | 8.9
YOLOv8-A | × | √ | × | 92.63 | 93.86 | 95.84 | 24.3 | 2,866,356 | 35.4 | 8.0
YOLOv8-B | × | × | √ | 94.59 | 95.54 | 96.72 | 23.6 | 3,028,908 | 42.0 | 8.9
YOLOv8-SA | √ | √ | × | 94.41 | 97.19 | 97.05 | 24.3 | 2,866,356 | 40.8 | 8.0
YOLOv8-SB | √ | × | √ | 95.62 | 97.72 | 97.21 | 23.6 | 3,028,908 | 41.1 | 8.9
YOLOv8-AB | × | √ | √ | 94.87 | 96.21 | 97.51 | 25.4 | 2,883,636 | 39.1 | 8.7
YOLOv8-SAB | √ | √ | √ | 96.85 | 98.05 | 98.16 | 25.4 | 2,883,636 | 42.4 | 8.7
Note: √ indicates the corresponding module is used; × indicates it is not. S denotes the SIoU improvement, A the AKConv improvement, and B the BiFormer improvement.
Table 3. Comparison of AP values for different models.
Pest Names | AP (%)
 | YOLOv8-SAB | YOLOv10 | YOLOv9 | YOLOv8 | YOLOv7 | Faster-RCNN | SSD
Xyleborus fornicatus Eichhoff | 98.16 | 94.76 | 93.67 | 95.63 | 92.76 | 81.84 | 87.06
Empoasca pirisuga Matsumura | 98.32 | 94.88 | 94.13 | 95.56 | 92.68 | 81.67 | 86.93
Arboridia apicalis | 98.06 | 95.02 | 93.54 | 95.37 | 92.81 | 81.36 | 86.54
Toxoptera aurantii | 98.03 | 95.50 | 93.94 | 95.60 | 92.63 | 81.61 | 86.95
Table 4. Comparison of external validation parameters among different models.
Model | Precision (%) | Recall (%) | mAP (%) | F1 (%)
YOLOv8-SAB | 96.74 | 98.15 | 98.21 | 97.44
YOLOv10 | 88.62 | 84.61 | 95.17 | 86.57
YOLOv9 | 89.34 | 91.56 | 93.94 | 90.44
YOLOv8 | 92.22 | 95.14 | 95.23 | 93.66
YOLOv7 | 80.76 | 88.47 | 92.65 | 84.44
Faster-RCNN | 73.54 | 79.76 | 78.42 | 76.52
SSD | 84.61 | 78.11 | 85.19 | 81.23
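For reference, the evaluation metrics reported in Tables 2–4 follow the standard detection definitions; the formulas below restate them in terms of true positives (TP), false positives (FP), and false negatives (FN), with AP taken as the area under the precision–recall curve and mAP averaged over the N pest classes.

\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},
\]
\[
\mathrm{AP} = \int_{0}^{1} P(R)\, dR, \qquad
\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i .
\]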
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
