1. Introduction
China is the world’s largest producer and consumer of tea, cultivates a wide range of tea varieties, and ranks among the world’s leading tea suppliers [1,2]. As living standards have improved, tea, which is rich in active compounds, has become an increasingly popular beverage [3]. According to statistics, the tea plantation area in China exceeds 3.27 million hectares, with an annual output of over 3.1 million tons, accounting for more than 40% of world tea production and ranking first in the world [4]. Tea bud picking, however, currently faces a challenge: it is performed mainly by hand, which is inefficient and costly. Traditional tea-picking machines cannot accurately locate tea buds and may break branches and leaves, which not only affects the subsequent growth of the tea trees but also reduces the overall quality of the picked buds [5,6]. Accurate detection of tender buds is therefore the foundation of automated picking of premium tea. However, the wide variety of tea leaves, the visual similarity between tea buds and the surrounding foliage, and the small size of the targets, which makes them easy to occlude, all make tea bud localization and harvesting challenging. Moreover, tea bud detection methods can be transferred to similar agricultural scenarios, which increases their practical value.
Traditional image-processing research distinguished tea buds from the background mainly by features such as color differences, texture, and shape. Zhang et al. [7] performed adaptive thresholding on the blue component of the image and combined it with the green component to form a new grayscale image; the contrast between tender buds and leaves was then enhanced by a linear transformation, and a watershed segmentation algorithm was used to locate and identify the buds. Huang et al. [8] used a partial differential equation model to filter image noise and segmented the preprocessed image with the watershed and OTSU algorithms. Shao et al. [9] applied histogram equalization to tea bud images and then performed k-means clustering on the saturation component in the HSI color model to obtain the best value of k. Karunasena et al. [10] trained a cascade classifier using gradient histograms of positive and negative samples of tender shoots of different lengths. Li et al. [11] extracted RGB images of tea buds, used the LBP/C algorithm to describe the texture and shape of the tender buds, and combined it with a support vector machine to train a tea bud recognition and localization model. However, traditional machine learning algorithms rely heavily on human intervention during feature extraction, which makes it easy to overlook informative features.
In recent years, deep learning has developed and iterated continuously, and semantic segmentation and object detection algorithms have been applied to detect and recognize crops in related agricultural fields. Chen et al. [12] used Faster R-CNN to detect one-bud-and-two-leaf regions in the image and then used a trained fully convolutional network to identify those regions and locate the picking point. Yan et al. [13] proposed a tea bud picking point localization method based on Mask R-CNN, which segments and locates tea buds by training a model to predict picking point positions. Xu et al. [14] combined YOLOv3, which offers fast detection, with DenseNet201, which offers high-precision classification, to achieve precise localization of tea buds. Li et al. [15] embedded an attention module in the YOLOv4 network and redefined the penalty term using the SIoU loss function, improving detection accuracy while reducing model size and lowering the deployment cost and difficulty of the robot vision module. Shuai et al. [16] used Bottleneck Transformers as residual modules in the YOLOv5 network, combined with CARAFE and attention mechanisms, to build long-range dependencies on tea bud feature maps. Xie et al. [17] proposed the Tea-YOLOv8s model, which combines deformable convolutions, attention mechanisms, and improved Spatial Pyramid Pooling to raise detection precision. At present, most tea bud detection methods are based on unmodified baseline models, and their datasets are mostly collected in experimental fields, which limits their applicability to real tea-planting environments. Deep learning-based tea bud detection still suffers from drawbacks such as low detection efficiency, large model size, limited accuracy, and difficulty in transferring to actual tea-picking scenarios.
On the one hand, most of the above studies are based on tea gardens with flat terrain, where the growth environment of the tea trees is similar and bud characteristics are uniform, so the resulting models detect buds poorly in complex environments. Unlike those datasets, the dataset in this paper was collected in mountainous areas, where lighting and occlusion cause significant differences in tea tree growth; a model trained on this dataset is therefore better able to handle complex scenes. On the other hand, to improve the precision, recall, and mean average precision of the model, this paper addresses the focus blur of the shooting equipment and the varying sizes of tea buds, while ensuring that the model remains easy to deploy on picking equipment. The specific contributions are as follows:
(1) The dataset was collected in mountainous areas, covering spring tea under different growth conditions on both sunny and rainy days. One bud, one bud with one leaf, and one bud with two leaves were annotated according to the size of the tender buds.
(2) SimSPPF is introduced into the backbone network, replacing the serial connections with SPP-style parallel connections. Additionally, the activation function is replaced with the simpler ReLU to improve the computational efficiency of the model and facilitate deployment on picking equipment.
(3) To address focus blur in the shooting equipment, a BiFPN bidirectional connection mechanism is used in the neck network to strengthen the network’s feature representation and improve the model’s accuracy in identifying tea buds with blurred focus.
(4) Because of the terrain in which tea trees are planted, some tea bud targets are very small. Replacing the conventional convolution modules in the neck with ODConv allows the model to attend to more dimensions of information, improving the detection accuracy of small tea bud targets.
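As background for contribution (2): SPP-style modules concatenate max-pooled copies of a feature map at several window sizes, and for stride-1 max pooling the serial and parallel arrangements are mathematically equivalent (two serial k = 5 pools equal one k = 9 pool), differing only in cost. A minimal 1D sketch in plain Python, not the actual YOLOv5 implementation:

```python
def relu(x):
    # SimSPPF swaps the SiLU activation for the cheaper ReLU
    return max(0.0, x)

def max_pool(xs, k):
    # stride-1 max pooling; the window is clipped at the edges ("same" output length)
    r = k // 2
    n = len(xs)
    return [max(xs[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]

def spp_like(xs, sizes=(5, 9, 13)):
    # parallel SPP branch: pool the same input at several window sizes,
    # then concatenate the branches (identity branch first) and activate
    pooled = [xs] + [max_pool(xs, k) for k in sizes]
    return [relu(v) for branch in pooled for v in branch]
```

Because sliding maxima compose, `max_pool(max_pool(x, 5), 5)` equals `max_pool(x, 9)`, which is why the serial (SPPF-style) and parallel (SPP-style) arrangements yield the same features.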
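For contribution (3), the fusion rule that BiFPN adds on top of its bidirectional connections is "fast normalized fusion": each incoming feature map receives a learnable scalar weight, the weights are passed through ReLU and normalized, and the maps are summed. A minimal sketch over 1D feature vectors (the weight values are illustrative, not taken from the trained model):

```python
def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    # ReLU keeps each learnable weight non-negative
    w = [max(0.0, wi) for wi in raw_weights]
    total = sum(w) + eps  # eps avoids division by zero
    n = len(features[0])
    # weighted sum of the incoming feature maps, normalized by the weights
    return [sum(wi * f[j] for wi, f in zip(w, features)) / total
            for j in range(n)]
```

With equal weights this reduces to averaging, while a strongly negative raw weight is clipped to zero, so an uninformative input (such as a badly blurred scale) can be suppressed entirely.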
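For contribution (4), ODConv learns attentions along four dimensions of the convolution kernel (spatial position, input channel, output channel, and kernel number) and multiplies them into a weighted combination of candidate kernels. A toy sketch of two of those dimensions, kernel-number and spatial attention, in plain Python (shapes and values are illustrative only):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of logits
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

def odconv_kernel(candidates, kernel_logits, spatial_att):
    # kernel-number attention: softmax over the candidate kernels
    alphas = softmax(kernel_logits)
    k = len(candidates[0])
    # aggregate the candidates into a single dynamic kernel
    agg = [[sum(a * c[i][j] for a, c in zip(alphas, candidates))
            for j in range(k)] for i in range(k)]
    # spatial attention: rescale each kernel position independently
    return [[agg[i][j] * spatial_att[i][j] for j in range(k)] for i in range(k)]
```

In the full module, the input- and output-channel attentions scale the kernel’s channel dimensions in the same multiplicative way, and all four attentions are predicted from the input feature map, so the effective kernel adapts to each image.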
4. Conclusions
For picking equipment, a tea bud recognition model must consider not only inference speed but also accuracy. When the model is deployed to mobile devices, limitations such as low hardware computing power and constrained performance are inevitable, so the accuracy and speed of tea bud detection must be balanced. To adapt to real-world tea-picking scenarios, this study applies a series of optimizations to YOLOv5. The improved model achieves a precision of 84.5%, a recall of 74.1%, and an mAP of 83.7%, with a model size of only 14.9 MB, making it convenient to deploy on mobile devices with limited computing power and storage space. Compared with other mainstream models, the improved model requires less computation, has a smaller size, and achieves higher detection accuracy. The dataset in this study contains tea bud images from various environments, including different lighting conditions and shooting angles, which improves the model’s robustness. This article proposes a new optimization direction for tea bud detection algorithms and achieves good performance in the experimental setting, providing support for the development of intelligent tea picking. Follow-up work will continue to optimize the premium tea picking model and further improve detection speed and accuracy.
In future research, we will rely on the Anhui Province Forest Crop Intelligent Equipment Engineering Platform to collect tea bud data for different tea tree varieties and seasons, deepening the research topic. At the same time, we will track additional evaluation metrics to guard against imbalanced datasets. First, we will determine the picking periods for green tea, white tea, and black tea throughout the year and collect data on the corresponding leaves during those periods. The optimal model will then be determined comprehensively from multiple evaluation metrics such as P, R, mAP, and F1, while keeping the model lightweight enough to achieve strong generalization. Finally, as our platform is used in real tea-picking scenarios, the model will be adjusted dynamically according to conditions.
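For reference, the F1 metric mentioned above is the harmonic mean of precision and recall, so it can be derived directly from the P and R values a model reports. A small sketch (the check uses the standard definition; applied to this paper’s reported P = 84.5% and R = 74.1%, it gives roughly F1 ≈ 0.79):

```python
def f1_score(precision, recall):
    # harmonic mean of precision and recall (both given as fractions in [0, 1])
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```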