**1. Introduction**

China is the world's most important tea producer, with a tea plantation area of 305.9 million hectares and an annual output of 260.9 million tons, accounting for 62.6% and 50.1% of the global tea plantation area and output, respectively [1]. Famous and excellent tea is popular with consumers because of its high drinking value and economic value [2]. High-quality tea usually has strict requirements on tenderness and on the number of leaves and buds, and these requirements differ between grades; famous and excellent tea is usually picked as one bud and one leaf only [3]. Picking famous and excellent tea is strongly seasonal, has a short picking cycle and is labor-intensive. With the rapid development of the tea industry, the conflict between the time-critical nature of picking and the shortage of labor for manual picking has become increasingly prominent [4]. In recent years, some picking equipment for famous and excellent tea has been used in tea gardens, but it has shortcomings such as imprecise mechanical picking and poor quality of the mechanically picked tea [5]. In the complex environment of a tea garden, rapid and accurate vision-based detection of the young leaves of famous and excellent tea is therefore the key task for realizing automatic picking.

**Citation:** Wang, Y.; Xiao, M.; Wang, S.; Jiang, Q.; Wang, X.; Zhang, Y. Detection of Famous Tea Buds Based on Improved YOLOv7 Network. *Agriculture* **2023**, *13*, 1190. https://doi.org/10.3390/agriculture13061190

Academic Editors: Hongbin Pu and Filipe Neves Dos Santos

Received: 21 April 2023 Revised: 19 May 2023 Accepted: 2 June 2023 Published: 3 June 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Research on bud detection for famous and excellent tea falls into two main approaches. The first is segmentation based on the physical characteristics of famous and excellent tea [6–10], which takes shape, color, texture and other physical characteristics as the basis for identifying and segmenting young leaves. Traditional methods such as threshold segmentation and watershed segmentation are then used to separate and extract the tender leaves from the complex environment. This approach is strongly affected by the environment and has a narrow scope of application. The second is detection based on neural networks [11–14]. A weight model is obtained by training on a labeled dataset of famous and excellent tea and is then used to detect the tender buds. This approach is now widely used in agriculture and agronomy.

YOLO (you only look once) is a target detection algorithm with high precision and high efficiency that directly predicts the location and attributes of targets in the whole image. Marco Sozzi et al. [15] applied target detection to yield prediction for white grapes and compared the detection performance of YOLOv3, YOLOv3-tiny, YOLOv4, YOLOv4-tiny, YOLOv5x and YOLOv5s. They found that the YOLOv5x model, when bunch occlusion was taken into account, estimated the number of bunches per plant with an average error of 13.3% per vine. The YOLOv4-tiny model offered a better combination of accuracy and speed and should be considered for real-time grape yield estimation, while the YOLOv3 model was affected by a compensation between false positives and false negatives that decreased the RMSE. Angelo Cardellicchio et al. [16] used YOLOv5 models to detect phenotypic traits of tomato plants. Training used a challenging dataset acquired during a stress experiment on multiple tomato genotypes, with input images that were difficult in terms of object size, similarity between objects and color. The results demonstrated that the models achieved relatively high scores in identifying nodes, fruit and flowers. Dandan Wang et al. [17] developed an accurate apple fruitlet detection method with a small model size based on a channel-pruned YOLOv5s deep learning algorithm. The experimental results showed that the channel-pruned YOLOv5s model detected apple fruitlets effectively under different conditions. The recall, precision, F1 score and false detection rate were 87.6%, 95.8%, 91.5% and 4.2%, respectively; the average detection time was 8 ms per image; and the model size was only 1.4 MB. The method can help growers optimize their orchard management. Compared with traditional methods based on physical characteristics, deep learning algorithms offer high identification accuracy, strong robustness and low sensitivity to environmental factors, so they are well suited to the detection of famous and excellent tea.
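The recall, precision, F1 score and false detection rate quoted for the channel-pruned YOLOv5s model [17] are linked by simple formulas. The following is a minimal sketch that checks the reported figures for consistency, assuming the standard definition of F1 as the harmonic mean of precision and recall, and assuming that the false detection rate equals 1 − precision (the source does not state how [17] defines it). The function names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_detection_rate(precision: float) -> float:
    """Share of detections that are wrong; assumed here to equal 1 - precision."""
    return 1.0 - precision


# Values reported for the channel-pruned YOLOv5s model [17].
precision, recall = 0.958, 0.876

f1 = f1_score(precision, recall)
fdr = false_detection_rate(precision)

print(f"F1 = {f1:.3f}")                     # F1 = 0.915
print(f"False detection rate = {fdr:.3f}")  # False detection rate = 0.042
```

Under these assumptions, both derived values match the reported 91.5% and 4.2%.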

However, as YOLO attracted more research attention, Wu et al. [18] found that it suffers from imprecise bounding-box positioning and has difficulty distinguishing overlapping objects. The buds of famous and excellent tea are small and densely distributed, so their detection faces the same problems, which attention mechanisms can effectively address. An attention module computes a weight and multiplies it with the input, so that the network focuses on important information with high weight and ignores irrelevant information with low weight. It directly establishes the dependency between input and output without recurrence, which increases parallelism, greatly improves running speed and allows the weights to be adjusted automatically. Because important information can be selected in different situations, the mechanism has higher scalability and robustness. Attention mechanisms have achieved good results in the detection of famous and excellent tea and in other agricultural fields, and they have been widely used to optimize detection models. Liu Tianzhen et al. [19] added SE Block to the YOLOv3 network; compared with the original YOLOv3 model, the F1 score increased by 2.38 percentage points and the mAP by 4.78 percentage points. Yang et al. [20] applied CBAM Block to wheat detection, and the results showed that the model could effectively overcome field environmental noise and accurately detect and count wheat ears with different density distributions. The average accuracy of wheat ear detection increased to 94%, 96.04% and 93.11%, respectively. Liu Yingying et al. [21] compared the effects of the SE, CBAM and ECA attention modules in the YOLOv5 network for posture detection of meat geese and showed that YOLOv5+ECA was more stable and better suited to posture detection in complex farm scenes. Fang Mengrui et al. [22] added the CBAM module to YOLOv4-tiny and adopted a bidirectional feature pyramid network (BiFPN) to integrate feature information at different scales. The F1 score of the improved YOLOv4-tiny-tea model was 12.11, 11.66 and 6.76 percentage points higher than that of the YOLOv3, YOLOv4 and YOLOv5l network models, respectively. Fu et al. [23] introduced the channel attention-asymmetric spatial pyramid pooling (CA-ASPP) module to improve the detection of weak pod targets. The precision of the improved YOLOv5 model increased by about 6%, and the precision of the pod count in the 200-soybean population reached 88.14%. Bao et al. [11] proposed an improved AX-RetinaNet target detection and recognition network for the automatic detection and recognition of tea diseases in natural scene images. AX-RetinaNet used an improved X-module for multi-scale feature fusion and added SE Block to the network. Compared with the original network, the mAP, recall rate and recognition accuracy increased by nearly 4%, 4% and nearly 1.5%, respectively. However, it was also found that adding an attention mechanism had the opposite effect for some networks, such as SSD and EfficientNet.
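The weighting step described above (compute a weight, then multiply it with the input) can be illustrated with a squeeze-and-excitation (SE) style channel attention. The sketch below is a toy, dependency-free version written for illustration; it is not the implementation used in any of the cited works. The function names and the hand-picked layer weights `w_reduce` and `w_expand` are assumptions; in a real network these weights are learned during training.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(vector, weights, activation):
    """Fully connected layer without bias: one output per row of `weights`."""
    return [activation(sum(w * v for w, v in zip(row, vector))) for row in weights]


def se_attention(feature_map, w_reduce, w_expand):
    """Apply SE-style channel attention to a feature map.

    feature_map: list of channels, each a 2D list (height x width) of floats.
    w_reduce, w_expand: hypothetical weights of the two small dense layers.
    Returns the re-weighted feature map and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: bottleneck (ReLU) then expansion (sigmoid) yields weights in (0, 1).
    hidden = dense(squeezed, w_reduce, relu)
    channel_weights = dense(hidden, w_expand, sigmoid)
    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [
        [[value * weight for value in row] for row in channel]
        for channel, weight in zip(feature_map, channel_weights)
    ]
    return scaled, channel_weights


# Toy input: two 2x2 channels; the first has a much stronger response.
features = [
    [[4.0, 4.0], [4.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
w_reduce = [[1.0, -1.0]]    # 2 channels -> 1 hidden unit (hand-picked, not trained)
w_expand = [[1.0], [-1.0]]  # 1 hidden unit -> 2 channel weights

scaled, weights = se_attention(features, w_reduce, w_expand)
print([round(w, 3) for w in weights])  # [0.971, 0.029]
```

With these toy weights, the strongly responding channel is kept almost unchanged while the weak channel is suppressed, which is the "high weight for important information, low weight for irrelevant information" behavior the paragraph describes.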

Previous research and experiments show that SE Block [24], CBAM Block [25], ECA Block [26] and CA Block [27] improve detection to different degrees for different crops, and that the improvement depends on where the block is placed in the model. However, no study has compared the effects of these four attention modules at different positions in the YOLOv7 network on metrics such as the recognition accuracy and recall rate for famous and excellent tea.

Therefore, this study focused on how SE Block, ECA Block, CBAM Block and CA Block, placed at different positions in the YOLOv7 network, affect the recognition accuracy, recall rate and F1 score for the detection of famous and excellent tea. The aim was to select, by comparison, the network best suited to detecting famous and excellent tea.
