1. Introduction
Tea tree is an important economic crop worldwide and is widely favored by consumers. In many countries, tea production serves as a pillar industry [1]. The environmental stability and biodiversity of tea plantations make tea trees vulnerable to various pests, which often affect the yield and quality of the tea, resulting in serious economic losses for tea farmers [2]. The control methods for different pests vary. Accurate and rigorous identification of tea tree pests is therefore crucial for implementing appropriate pest management strategies, ensuring the healthy growth of tea trees, and producing high-quality tea.
With the advancement of artificial intelligence technology, machine learning and deep learning have gradually replaced manual inspection as important approaches to crop pest detection and identification [3]. Early pest identification models were developed by combining image processing and machine learning techniques, enabling the extraction of features for pest detection [4]. Deng et al. [5] employed an image saliency technique to extract regions of interest from pest images and combined it with a Support Vector Machine (SVM) to identify tea tree pests, achieving a recognition rate of 85.5%. Lu et al. [6] proposed an innovative semi-automatic detection model by analyzing the morphological differences among various locust species; the model integrates image segmentation, feature extraction, and SVM classification to accurately identify both locust species and their developmental stages. Yang et al. [7] introduced a novel image processing method that combines ensemble learning with multi-feature fusion across two color spaces, enabling precise recognition and counting of greenhouse pests. Although these pest identification methods achieve high accuracy, they still face several challenges. Their performance relies heavily on manual feature extraction, which can result in the loss of critical pest details. Moreover, these methods struggle to adapt to diverse environments and a wide range of pest categories, highlighting the need for further research to improve model generalization and robustness.
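For context, the sketch below illustrates the classic handcrafted-feature pipeline that these early methods follow: compute a fixed descriptor (HOG here) from each pest image and train an SVM classifier on it. The descriptor choice, image size, and kernel settings are our illustrative assumptions, not the exact configurations of the cited studies.

```python
# Minimal sketch of a handcrafted-feature pest classifier: HOG descriptors
# from cropped pest images feed an SVM. All hyperparameters here are
# illustrative assumptions, not the settings of the cited works.
import cv2
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def hog_feature(img_bgr, size=(128, 128)):
    """Resize a pest image crop and compute a single HOG descriptor."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, size)
    # winSize == image size, so compute() yields one descriptor per image.
    hog = cv2.HOGDescriptor(size, (32, 32), (16, 16), (16, 16), 9)
    return hog.compute(gray).ravel()

def train_svm(images, labels):
    """images: list of BGR arrays; labels: integer class ids."""
    X = np.stack([hog_feature(im) for im in images])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.2, random_state=0)
    clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
    print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
    return clf
```

Because every step before the classifier is hand-designed, any pest detail the descriptor fails to encode is permanently lost, which is exactly the limitation noted above.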
The performance of a pest identification model relies on the features extracted from images. However, traditional models, which rely on manually extracted features, depend on subjective judgment and experience, making them unsuitable for real-world pest identification applications due to their limited performance and generalization capabilities. In recent years, deep learning and computer vision techniques have been widely applied in the field of image recognition [8,9,10,11,12,13,14,15]. Pest identification models developed using deep learning methods, which leverage convolutional mechanisms for feature extraction, have significantly improved both accuracy and robustness [16,17,18]. For instance, Liu et al. [19] used ensemble algorithms to integrate enhanced CNN models such as VGG16 and Inception-ResNet-v2, building a crop disease and pest recognition model with improved accuracy. Liu et al. [20] further enhanced the YOLOv4 model by integrating triple attention mechanisms and a focal loss function. While this model achieved a recognition accuracy of 95.2% on a self-built tomato pest dataset, it struggled to recognize small pests in complex backgrounds and dense plant scenes. This limitation was largely attributed to the dataset's simplistic background and insufficient consideration of pest size diversity.
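As background on the focal loss mentioned above: it down-weights well-classified examples so that training concentrates on hard ones, such as small or occluded pests. A minimal binary form is sketched below; the alpha and gamma values are the commonly used defaults, not necessarily those of the cited work.

```python
# Minimal focal loss sketch (binary/objectness form). The modulating
# factor (1 - p_t)^gamma suppresses the loss of easy examples; alpha
# balances positive and negative samples.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits, targets: tensors of the same shape; targets in {0, 1}."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)          # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```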
To address the challenge of fine-grained agricultural pest recognition in images, Li et al. [21] optimized the rotation invariance of CNN models through data augmentation strategies. This improvement addressed the poor recognition performance caused by multi-scale and variable pest postures in the field, resulting in high recognition accuracy for four types of rice pests in natural environments. Sun et al. [22] expanded the shallow feature range of the YOLOv5s model, effectively addressing the issue of small pests being missed or misidentified in high-density environments. Zhu et al. [23] proposed a method to refine multi-scale fusion features across different dimensions, enhancing feature expression and eliminating conflicting information between different pest characteristics; this approach significantly improved the accuracy of soybean pest detection in complex environments. Tang et al. [24] introduced the ECA attention mechanism and a transformer encoder, combined with a novel cross-stage feature fusion approach, to overcome the limitations of real-time detection for small-scale pests. Hu et al. [25] introduced a hybrid architecture of Transformer and multi-scale attention mechanisms into the YOLOX model, which significantly boosted the detection of small target pests. These studies have enhanced the models' ability to extract multi-scale features, effectively capturing critical characteristics of small pests and thus reducing the rate of missed and false detections.
Although significant advancements have been made in pest recognition algorithms based on deep neural networks, practical agricultural applications demand models that maintain high precision while being easy to deploy. Current research into lightweight pest recognition methods has primarily focused on two areas: reducing training costs to improve deployment efficiency and designing lightweight modules to minimize computational resource consumption. Gan et al. [26] leveraged transfer learning and attention mechanisms to improve the EfficientNet model, achieving efficient pest recognition with an accuracy of 69.5% while maintaining a sufficiently lightweight structure. Min and Wei [27] developed a high-precision, lightweight real-time detection model for Tephritidae pests, with a size of only 2.4 MB, by integrating innovative Multicat and C2flite modules into YOLOv8 and optimizing the number and size of the detection heads. Liang et al. [28] proposed a lightweight model, GBW-YOLOv5, which reduced the model size by 66.7% and successfully met the stringent real-time requirements for multi-scale cotton pest detection in complex field environments. These studies demonstrate that deep learning-based pest recognition methods offer substantial advantages over traditional methods that rely on manual feature extraction.
However, most datasets used in existing research are limited in background complexity, as they are typically captured in laboratory environments with relatively simple backgrounds. This constraint reduces a model's effectiveness when applied to complex real-world images. Moreover, accurately identifying pests presents additional challenges due to inter-species similarity, intra-species diversity, variable pest postures, and the complex backgrounds of real tea gardens, which include elements such as leaves, tree branches, and soil [29,30,31]. Together, these factors test the robustness and accuracy of image recognition algorithms. To address the recognition challenges and detection omissions caused by complex backgrounds, we propose a lightweight model called TTPRNet, designed to effectively capture pest details at various scales and to ensure precise, rapid pest identification in images with complex backgrounds.
Figure 1 shows the overall framework of the tea tree pest recognition model proposed in this paper. The main contributions of this paper are as follows: First, we propose a lightweight model, TTPRNet, capable of accurately and efficiently identifying multi-scale tea tree pests in complex environments, including varying light conditions and vegetation densities. Second, a novel network structure is designed by replacing the traditional ELAN structure in the CSPDarknet53 backbone of the YOLOv7-tiny model with a parallel network composed of ConvNeXt and ELAN. This parallel structure extends the model's receptive field and effectively prevents feature loss, thereby enhancing overall model performance. Third, the model performs well in detecting multi-scale pests and effectively identifies pests with high inter-specific similarity, such as those from the same family but different species. Fourth, the model not only improves the accuracy and speed of pest recognition but also integrates a pest counting feature, providing a basis for pest control decisions.
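To make the second contribution concrete, the sketch below shows one plausible way to run a ConvNeXt block in parallel with a simplified ELAN branch and fuse the two outputs. The branch depths, channel widths, and the concat-plus-1x1 fusion are our assumptions for illustration; the exact layout is defined by the architecture described in this paper.

```python
# Hedged sketch of a parallel ConvNeXt + ELAN unit. Branch widths and the
# concat-then-1x1 fusion are illustrative assumptions; the paper's
# architecture figure is authoritative.
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Standard ConvNeXt block: depthwise 7x7, LayerNorm, inverted MLP."""
    def __init__(self, c):
        super().__init__()
        self.dw = nn.Conv2d(c, c, 7, padding=3, groups=c)  # large receptive field
        self.norm = nn.LayerNorm(c)
        self.pw1 = nn.Linear(c, 4 * c)
        self.pw2 = nn.Linear(4 * c, c)
        self.act = nn.GELU()

    def forward(self, x):
        y = self.dw(x).permute(0, 2, 3, 1)                 # NCHW -> NHWC
        y = self.pw2(self.act(self.pw1(self.norm(y))))
        return x + y.permute(0, 3, 1, 2)                   # residual, back to NCHW

class SimpleELAN(nn.Module):
    """Reduced ELAN branch: stacked 3x3 convs whose intermediate outputs
    are concatenated and fused by a 1x1 conv."""
    def __init__(self, c):
        super().__init__()
        h = c // 2
        self.c1 = nn.Conv2d(c, h, 1)
        self.c2 = nn.Conv2d(h, h, 3, padding=1)
        self.c3 = nn.Conv2d(h, h, 3, padding=1)
        self.fuse = nn.Conv2d(3 * h, c, 1)
        self.act = nn.SiLU()

    def forward(self, x):
        a = self.act(self.c1(x))
        b = self.act(self.c2(a))
        d = self.act(self.c3(b))
        return self.act(self.fuse(torch.cat([a, b, d], dim=1)))

class ParallelConvNeXtELAN(nn.Module):
    """Run both branches on the same input; fuse by concat + 1x1 conv, so
    local (ELAN) and global (ConvNeXt) features are both preserved."""
    def __init__(self, c):
        super().__init__()
        self.convnext = ConvNeXtBlock(c)
        self.elan = SimpleELAN(c)
        self.merge = nn.Conv2d(2 * c, c, 1)

    def forward(self, x):
        return self.merge(torch.cat([self.convnext(x), self.elan(x)], dim=1))
```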
4. Discussion
Detecting tea tree pests against the complex backgrounds of natural environments is difficult, so accurately extracting pest features becomes critical. Some recent studies have focused on improving a model's multi-scale feature extraction capability to reduce background interference. Three recent models in the field of pest recognition were selected for comparison in this study. Qiang et al. [42] used a dual backbone and fused deep and shallow features to improve the recognition performance of the SSD model; although it achieved a mAP of 86.01% on a citrus pest dataset, the method exhibits recognition errors when facing similar pests. Zhao et al. [43] integrated the CBAM attention module into the YOLOv7 model to suppress distracting background information, allowing the model to focus more effectively on the pest region. In this study, this method achieved a mAP of 90.3%, a 1.5% improvement over the original model, but it still falls short of the CA attention module adopted in our study. Xu et al. [44] enhanced the model's ability to capture multi-scale pests by employing convolutional kernels of different sizes, coupled with the Inception module to extract features at various scales in parallel; their experiments on a rice pest dataset yielded a mAP of 91.4%, but real-time applications were not considered.
By continuously optimizing the model architecture, our study improves the model's anti-interference ability under complex backgrounds and achieves a balance between accuracy and detection speed. As shown in Table 12, the TTPRNet model achieves a mAP of 92.8%; the mAPs of the three models discussed above are 6.79, 4.6, and 1.4 percentage points lower, respectively. Additionally, the TTPRNet model shows a slight advantage in lightweight performance when comparing FPS and single-image detection time.
Additionally, we selected an image containing pests from the same genus but different species to compare the detection performance among the models. The recognition results are shown in Figure 14.
Figure 14 presents the detection results of the ten models for three different scarabs, highlighting the TTPRNet model's superior accuracy in detection and bounding box prediction compared to the other models. The EfficientDet model performs poorly, failing to recognize the target object. In contrast, the CenterNet, YOLOXs, and YOLOX-tiny models were able to identify the scarabs but still suffered missed detections. Further observation reveals that the SSD, YOLOv5s, YOLOv7-tiny, YOLOv7, and YOLOv8n models made misclassifications during recognition, incorrectly labeling the Miridiba sinensis on the right side of the image as Holotrichia parallela. The SSD model not only misclassified the target but also exhibited omissions. The YOLOv8n model encountered more serious issues, misclassifying both Miridiba sinensis and Anomala corpulenta Motschulsky as Holotrichia parallela. Notably, the YOLOv5m model exhibited a double error: it detected Miridiba sinensis but misclassified it as Holotrichia parallela, and it additionally misidentified a background plant area as Apolygus lucorum. The YOLOv8s model, on the other hand, successfully detected Miridiba sinensis but still experienced missed detections. Among all evaluated models, only TTPRNet correctly identified all targets without false or missed detections. Additionally, it displayed both the category and count of the pests in the upper left corner of the resulting figure, with one each of Anomala corpulenta Motschulsky, Holotrichia parallela, and Miridiba sinensis, demonstrating its accuracy and reliability in target detection.
While the TTPRNet model demonstrates significant performance advantages in tea tree pest recognition, there remains room for improvement, particularly in precision (P) relative to some other models. This limitation in P may be attributed to two factors: category imbalance in the dataset and the IoU threshold settings.
Firstly, our constructed dataset exhibits significant variability in sample numbers across categories, with some categories containing over 300 samples while others have only around 100. This imbalance poses challenges to the model's generalization ability. Secondly, while the dataset comprises images of various tea tree pests, the total number of images per pest category remains relatively limited; future efforts should focus on expanding the dataset with more diverse and extensive image samples. Finally, considering the diverse environmental conditions encountered in real-world applications, such as backlighting and adverse weather, future research should also include pest images captured under these non-ideal conditions. Integrating such images will enhance the robustness of the pest identification model, ensuring consistent pest recognition and categorization even under variable natural conditions, thereby improving the accuracy and reliability of pest monitoring and control.
Despite the limitations of the current study, the proposed model has demonstrated high recognition accuracy, fast detection speed, and low parameter requirements, indicating its potential for deployment on mobile devices. These features underscore the innovative and technologically advanced nature of the research and highlight its practical value in real-world applications, particularly in pest control for tea plantations. Rapid and accurate pest recognition on mobile devices can help tea farmers implement timely control measures, reducing the impact of pests on tea yield and quality and thereby promoting the sustainability of agricultural production.
5. Conclusions
In this study, we proposed a novel target recognition model named TTPRNet, designed to meet the need for accurate pest identification in complex tea garden environments. The model significantly enhances the ability to capture global information by incorporating the ConvNeXt architecture into the backbone, expanding the receptive field and improving performance in complex scenes. To further boost feature extraction and reduce background interference, the CA attention module was fused into the backbone's output feature layer, which markedly improved the model's recognition accuracy in complex tea garden scenes.
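For reference, the sketch below follows the published Coordinate Attention (CA) design (Hou et al., 2021), which the backbone output layer adopts: features are pooled separately along the height and width axes so the attention weights retain positional information. The reduction ratio r is our assumption.

```python
# Hedged sketch of the Coordinate Attention (CA) module, after
# Hou et al. (2021); the reduction ratio r is an illustrative assumption.
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    def __init__(self, c, r=16):
        super().__init__()
        m = max(8, c // r)
        self.conv1 = nn.Conv2d(c, m, 1)
        self.bn = nn.BatchNorm2d(m)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(m, c, 1)
        self.conv_w = nn.Conv2d(m, c, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Direction-aware pooling: one descriptor per row and per column.
        x_h = x.mean(dim=3, keepdim=True)                      # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                  # row weights
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # column weights
        return x * a_h * a_w           # position-aware channel reweighting
```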
Additionally, replacing the ordinary convolutions in the neck with GSConv effectively reduced redundant information and improved feature extraction efficiency. For bounding box regression, we employed a loss function that fuses CIoU with NWD at equal weight, which accelerated network convergence and improved localization accuracy. The experimental results demonstrate that the model achieved a mAP of 92.8% and 184.6 FPS in the pest detection tasks, significantly enhancing both recognition efficiency and accuracy compared with existing algorithms.
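A minimal sketch of the equal-weight CIoU and NWD fusion follows, assuming xyxy box coordinates, torchvision's complete_box_iou_loss for the CIoU term, and the normalization constant C = 12.8 from the original NWD paper; the exact implementation in this study may differ.

```python
# Hedged sketch of an equal-weight CIoU + NWD regression loss.
# Boxes are (x1, y1, x2, y2); C is dataset-dependent and assumed here.
import torch
from torchvision.ops import complete_box_iou_loss

def nwd(pred, target, C=12.8):
    """Normalized Wasserstein distance between boxes modeled as 2D
    Gaussians (Wang et al., 2021); returns a similarity in (0, 1]."""
    pc = (pred[:, :2] + pred[:, 2:]) / 2        # predicted centers
    tc = (target[:, :2] + target[:, 2:]) / 2    # target centers
    pwh = (pred[:, 2:] - pred[:, :2]) / 2       # half widths / heights
    twh = (target[:, 2:] - target[:, :2]) / 2
    w2 = ((pc - tc) ** 2).sum(1) + ((pwh - twh) ** 2).sum(1)
    return torch.exp(-torch.sqrt(w2) / C)

def box_loss(pred, target, beta=0.5):
    """Equal-weight fusion: beta * CIoU loss + (1 - beta) * NWD loss."""
    l_ciou = complete_box_iou_loss(pred, target, reduction="none")
    l_nwd = 1.0 - nwd(pred, target)
    return (beta * l_ciou + (1 - beta) * l_nwd).mean()
```

Because NWD stays smooth even when boxes barely overlap, blending it with CIoU is what makes the regression signal informative for very small pests.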
This study provides an efficient and accurate method for detecting pests in tea gardens, which is significant for developing scientific pest control strategies and promoting sustainable tea garden development. The proposed pest recognition model can effectively assist in the monitoring and control of tea pests, providing robust support for the success of tea plantations and contributing to the sustainable growth of the tea industry.