Asparagus is recognized as a healthy vegetable with high nutritional value, ranking among the world’s “Top 10” famous vegetables and enjoying a reputation as the king of vegetables in the international market [1,2]. In 2020, China’s asparagus planting area and output ranked first in the world, at 1.501 million ha and 8.613 million t, accounting for 90.6% and 88.1% of the world’s totals, respectively [3]. The tender stems of asparagus are its edible parts. However, selective harvesting requires considerable manual labor because the tender shoots are inconsistent in growth direction and maturity. Asparagus production currently faces a series of problems, such as high labor intensity, high operating costs, a low degree of mechanization, and low production efficiency, which seriously restrict the sustainable development of the asparagus industry [4]. Therefore, realizing intelligent machine harvesting of asparagus has become an urgent need for promoting its industrial development.
Recognition and detection technology is considered the key to realizing the intelligent machine harvesting of asparagus. Sakai et al. [5] used a laser sensor to locate and detect asparagus, achieving a recognition success rate of 75%. Peebles et al. [6,7] compared the sensor technologies used for asparagus harvesting and investigated a method for determining the position of the ridge surface in the asparagus harvesting scenario. Kennedy et al. [8] proposed the concept of perceptual channels based on multiple cameras to localize green asparagus. Peebles et al. [9] compared the detection performance of the Faster R-CNN and SSD algorithms on asparagus and determined that Faster R-CNN, with an F1 score of 0.73, is more suitable for asparagus detection. Leu et al. [10] developed a robot for the selective harvesting of green asparagus, which uses RGB-D cameras to obtain 3D information on the asparagus and ridge surface and localizes asparagus through clustering. To date, only a few studies on the recognition and detection of asparagus have been conducted worldwide, and two major problems in this field must be addressed. First, the success rate of asparagus recognition is low; the model algorithms need further investigation, and the technical scheme for recognition and detection must be improved. Second, current research on asparagus recognition and detection targets the simple scenario of spring asparagus (which has no stalks in that season) and has not been applied to the complex environments of summer and autumn. In summer and autumn, the stems and leaves of asparagus are clustered, and the planting environment is more complicated: asparagus shoots and stems are similar in shape and color, heavy stacking occurs, and the shoots are similar in color to leaves and weeds, many of which occlude them or blend into the background. Moreover, asparagus occupies only a small proportion of the whole scene, so its detection becomes a small-target detection problem. These factors make asparagus recognition and detection technology difficult to develop.
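For context, the F1 score reported for Faster R-CNN is the harmonic mean of detection precision and recall. A minimal sketch; the precision and recall values below are hypothetical numbers chosen only to illustrate how an F1 of about 0.73 can arise:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: penalizes a detector that
    trades many false positives for recall, or vice versa."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical illustration: precision 0.78 and recall 0.69 give F1 of about 0.73
print(round(f1_score(0.78, 0.69), 2))  # → 0.73
```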
In recent years, research on visual recognition and detection technology for crops has received considerable attention from scholars worldwide [11,12]. Related results have been applied in agricultural production and can provide meaningful references and solutions for asparagus recognition and detection. Owing to their fast detection speed and high accuracy, YOLO-series algorithms [13,14,15,16] have been applied by many scholars to the recognition and detection of apples, citrus, winter jujubes, bananas, popular teas, cucumbers, corn seedlings, and flower buds, as well as pests and diseases. Zhao et al. [17] used the YOLOv3 deep convolutional neural network to enable apple-picking robots to identify and locate fruits around the clock under conditions such as occlusion, adhesion, and bagging and under varying lighting. Lv et al. [18] proposed a method for identifying apple growth shapes based on an improved YOLOv5 algorithm so that a harvesting robot can adopt different harvesting attitudes. Xiong et al. [19] proposed the multiscale convolutional neural network Des-YOLOv3, which recognizes and detects ripe citrus in complex environments at night. Liu et al. [20] proposed a recognition method based on an improved YOLOv3 that achieves fast and accurate recognition of winter jujube in natural scenarios. Fu et al. [21] applied YOLOv4 to recognize banana bunches and stems in natural environments. Xu et al. [22] compared the target detection performance of different backbone feature extraction networks based on YOLOv3 and selected DenseNet201 as the backbone to recognize and detect popular teas. Bai et al. [23] integrated the U-Net, YOLO, and SSD networks to detect cucumbers in complex natural environments. Quan et al. [24] used the YOLOv4 convolutional neural network to accurately identify corn seedlings and farmland weeds in complex field environments. Li et al. [25] used an improved YOLOv4 to recognize kiwifruit flowers and buds in preparation for automatic machine pollination. Qi et al. [26] proposed an improved YOLOv5 algorithm based on an attention mechanism to detect tomato pests and diseases. Zhang et al. [27] added ECA, the hard-swish activation function, and the focal loss function to YOLOX to detect cotton diseases and pests. Zhang et al. [28] proposed a new method, CBAM + ASFF-YOLOXs, for guiding agronomic operations based on identifying the key growth stages of lettuce. Nan et al. [29] used an NSGA-II-pruned YOLOv5l to detect green peppers quickly in field environments. Xu et al. [30] proposed a novel high-precision and lightweight YOLOv4, which improved citrus fruit detection accuracy for a picking robot in complex scenes and enabled real-time picking operations. Fan et al. [31] realized real-time defect detection for apple sorting using NIR cameras with a pruning-based YOLOv4 network. Thus far, no research on the YOLO algorithm for asparagus recognition and detection has been reported worldwide, and the recognition accuracy of the algorithm must be further improved for different application scenarios.
In summary, this study proposed an improved YOLOv5 algorithm for the recognition and detection of green asparagus in complex environments. The coordinate attention (CA) mechanism was added to the backbone feature extraction network [32]; this improvement enhances the feature learning capability and strengthens attention to the feature information of asparagus. The PANet was replaced with a BiFPN [33,34,35], which strengthens feature fusion, reduces the loss of feature information, and improves the detection performance of the model. Meanwhile, comparative experiments and verification of different algorithms were conducted.
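The CA mechanism factorizes attention into two direction-aware steps: the feature map is average-pooled separately along the width and height axes, passed through a shared channel-squeezing transform, and split into two sigmoid gates that reweight the input. A simplified NumPy sketch of this forward pass, not the paper's implementation: the weight matrices `w1`, `w_h`, and `w_w` are hypothetical stand-ins for the learned 1×1 convolutions, and the batch normalization and h-swish activation of the original module are omitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w1, w_h, w_w):
    """Simplified coordinate attention on a (C, H, W) feature map.

    w1  : (Cr, C) shared transform squeezing channels to Cr
    w_h : (C, Cr) transform producing the height-wise gate
    w_w : (C, Cr) transform producing the width-wise gate
    """
    C, H, W = x.shape
    # Direction-aware average pooling: along W -> (C, H), along H -> (C, W)
    pool_h = x.mean(axis=2)
    pool_w = x.mean(axis=1)
    # Concatenate along the spatial axis and apply the shared transform
    y = np.concatenate([pool_h, pool_w], axis=1)   # (C, H + W)
    y = np.maximum(w1 @ y, 0.0)                    # ReLU, (Cr, H + W)
    # Split back and produce per-direction attention gates in (0, 1)
    a_h = sigmoid(w_h @ y[:, :H])                  # (C, H)
    a_w = sigmoid(w_w @ y[:, H:])                  # (C, W)
    # Reweight the input, broadcasting each gate over the other axis
    return x * a_h[:, :, None] * a_w[:, None, :]

# Example: reweight a random (C=8, H=4, W=6) map with a squeeze to Cr=2
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 6))
out = coordinate_attention(
    x,
    w1=0.1 * rng.standard_normal((2, 8)),
    w_h=0.1 * rng.standard_normal((8, 2)),
    w_w=0.1 * rng.standard_normal((8, 2)),
)
assert out.shape == x.shape  # attention only reweights; shape is preserved
```

Because both gates lie in (0, 1), the module can only attenuate responses, steering the backbone toward asparagus-like positions along each axis without changing the feature map's dimensions.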
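A BiFPN fuses multiscale feature levels through a learnable weighted sum rather than a plain addition; in its "fast normalized fusion" form, each input level receives a non-negative scalar weight normalized by the sum of all weights. A minimal sketch under that assumption (it presumes the feature maps have already been resized to a common resolution, which is omitted here):

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style fusion: each input level gets a learnable scalar weight,
    clipped to be non-negative and normalized by the sum of all weights,
    so the network learns how much each resolution contributes."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps w_i >= 0
    w = w / (w.sum() + eps)                                # normalize to ~1
    return sum(wi * f for wi, f in zip(w, features))

# Two feature maps already brought to the same resolution:
p_top = np.ones((2, 4, 4))    # e.g., upsampled coarser level
p_lat = np.zeros((2, 4, 4))   # lateral connection
fused = fast_normalized_fusion([p_top, p_lat], weights=[1.0, 1.0])
# equal weights -> (almost exactly) the element-wise average
```

Learning the weights lets the network downweight levels that carry little information about small targets such as asparagus shoots, which is the motivation for preferring this fusion over PANet's unweighted aggregation.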