1. Introduction
Cabbage is one of the main vegetables in China [1], with advantages such as high yield, ease of transportation, and durability in storage [2]. Among all the procedures involved in cabbage production, harvesting is the most labor-intensive. At present, the level of agricultural mechanization in China has improved significantly, and with the rise of artificial intelligence technology in recent years, unmanned and automated cabbage harvesting is becoming a future trend [3]. Recognizing cabbage heads is a crucial task in achieving automated, unmanned cabbage harvesting: knowing the location of each cabbage head enables quick and accurate identification and positioning, which enhances harvesting efficiency and reduces crop damage. Therefore, recognizing cabbage heads during the harvest period is of significant value for unmanned cabbage harvesting.
The current mechanical harvesting of cabbages relies heavily on A-B line navigation via GPS and BeiDou satellites [4]. While this method fulfills most cabbage harvesting needs, it still suffers from significant drawbacks such as missed cabbages and physical damage to the crops. Consequently, a more precise recognition and localization method for individual cabbages during the harvest period is needed to enhance harvesting accuracy and efficiency [5].
Traditional fruit detection technologies mainly segment fruit images [6] based on color and texture properties. However, the recognition rate of these methods is quite low in scenarios with little color difference, indistinct texture features, and complex backgrounds. With advances in computer vision, deep learning algorithms such as Faster R-CNN [7], SSD [8], and the YOLO series [9] have been applied to fruit detection and instance segmentation [10]. These deep learning-based methods extract and classify features from input images to identify and locate fruits.
Nils Lüling et al. [11] leveraged structured light and 3D reconstruction to generate depth information, then used the Mask R-CNN model to detect and segment cabbage heads and leaves, yielding precise measurements of fruit volume and leaf area; this offers significant technical support for automated cabbage harvesting. Jianwei Yan et al. [12] used the YOLOv3 model, augmented with a residual network, to detect Rosa roxburghii fruit in their dataset, achieving a recognition accuracy of 88.5% and a recall rate of 91.5%. Tianzhen Liu et al. [13] extended the YOLOv3 model by integrating SENet's SE block, strengthening the model's feature representation and thereby increasing the detection accuracy of jujube fruits. Hui Zhao et al. [14] enhanced the original YOLOv3 model by incorporating a spatial pyramid pooling (SPP) module, effectively fusing global and local feature contexts to improve both accuracy and recall for detecting small fruits. Fangzhen Gao et al. [15] simplified the convolutional layers and residual structures of the YOLOv3 backbone, raising the frames per second (FPS) from 152 in the baseline to 198 and thereby improving detection speed for tomato fruits in orchard environments. Wei Cheng et al. [16] modified the loss function of the original YOLOv3 to refine tomato yield estimation in greenhouse environments; their experiments demonstrated a 2.7% improvement in accuracy over the original model. Shenglian Lu et al. [17] employed the YOLOv4 network to identify fruits on branches by introducing CBAM and channel attention modules together with an improved spatial pyramid pooling method, providing technical support for harvesting fruits on branches. Shuai Ma et al. [18] replaced the convolutions preceding and following the SPP module of YOLOv4 with depthwise separable convolutions, reducing the storage footprint of the improved model by 44% compared with the original. Similarly, Ning Wang et al. [19] enhanced the YOLOv5s model by pre-training on the Cr channel of the color space and incorporating the Ghost module, significantly improving detection accuracy for cucumber fruits. Tongbin Huang et al. [20] improved the feature extraction capability of the network by introducing a CBAM attention module, improving the detection of small, occluded citrus fruit targets. Guangyu Hou [21], using the YOLOv7 model, introduced depthwise separable convolution layers to address the information loss for small targets caused by pooling across residual layers when detecting variously occluded cherry tomatoes. Furthermore, Guiling et al. [22] developed an apple instance segmentation model based on YOLACT for identifying and picking apples on branches. Changyuan Liu [23] utilized a depth camera to capture depth images of fruit trees and employed a depth-image-based spherical fruit recognition and localization algorithm, overcoming the difficulty that illumination poses to fruit recognition in traditional algorithms. Inkyu Sa et al. [24] used an unsupervised deep neural network to generate synthetic NIR images and accomplished the detection of 11 types of fruits and crops. The aforementioned research provides valuable references for the identification of cabbages during the harvest period.
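Two of the works above ([18,21]) substitute depthwise separable convolutions for standard ones to shrink the model. As a minimal illustrative sketch (not code from the cited studies), the parameter saving follows directly from the factorization of a k × k convolution into a per-channel depthwise convolution plus a 1 × 1 pointwise convolution:

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (biases omitted)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

# Example: a 3 x 3 layer with 256 input and 256 output channels.
standard = conv_params(256, 256, 3)           # 589,824 weights
separable = dw_separable_params(256, 256, 3)  # 67,840 weights
print(f"parameter reduction: {1 - separable / standard:.1%}")
```

The roughly order-of-magnitude reduction in weights at typical channel widths is what makes this substitution attractive for lightweight detectors; the 44% storage reduction reported in [18] reflects replacing only a subset of layers.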
Currently, cabbages are primarily grown in open fields, resulting in a certain degree of complexity in the growing environment. Factors such as fluctuating illumination levels, weed cover, and erratic changes in weather conditions profoundly impact these models. Such conditions create visual discrepancies that obscure the essential features of cabbage heads, making accurate detection difficult. In addition, weeds contribute to visual clutter, and changes in weather conditions can alter the appearance of cabbage heads, thereby affecting the effectiveness of the models.
In current research on cabbage head detection, several challenges remain, including poor model generalization, limited adaptability to complex environments, loss of information when subjects are occluded, inaccurate detection of small targets, and slow identification speed [25].
Addressing the aforementioned challenges, this paper presents an optimized YOLOv8n model for detecting ripened cabbage heads in complex environments, denoted as YOLOv8n-Cabbage. Focusing on enhancing detection accuracy and speed, the model incorporates several strategies, including an improvement in the backbone network, the introduction of the DyHead module, the optimization of the loss function, and the light-weighting treatment of the model. These refinements substantially improve the detection performance and robustness in recognizing ripened cabbage heads within complex environments, ensuring precise and efficient identification. The main contributions of this study are as follows:
Backbone Network Enhancement by ConvNeXt V2: The adoption of ConvNeXt V2 significantly augments the model’s proficiency in delineating distinctive features from cabbage head imagery, concurrently amplifying its learning efficacy in the face of intricate conditions.
Substitution of Detection Head by DyHead Modules: These modules enhance the model’s sensitivity to key feature extraction and improve its adaptability to various target features and transformations, thereby increasing the accuracy of object detection in multifaceted backgrounds.
Enhancement of Robustness by Slide Loss Function: The Slide Loss function enhances the robustness of the model, reduces background interference, and ultimately improves the overall performance of the model in complex environments. Additionally, it accelerates convergence and facilitates precise object localization.
Model Light-weighting: To guarantee the efficacy of the model, a light-weighting process was implemented, which markedly enhanced the model’s run-time efficiency and reduced the consumption of computational resources while maintaining the requisite degree of precision.
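The Slide Loss contribution above reweights training samples by their IoU around an adaptive threshold so that hard examples near the decision boundary are emphasized. A minimal sketch of the weighting function, following the Slide Loss formulation published with YOLO-FaceV2 (from which the function is commonly adopted; that this exact variant is used here is an assumption, and the threshold mu is taken as the mean IoU of all samples):

```python
import math

def slide_weight(iou: float, mu: float) -> float:
    """Slide weighting (YOLO-FaceV2 formulation, assumed variant):
    mu is the adaptive threshold, e.g. the mean IoU over all samples."""
    if iou <= mu - 0.1:
        return 1.0                 # easy negatives: unit weight
    elif iou < mu:
        return math.exp(1.0 - mu)  # hard samples near the boundary: boosted
    else:
        return math.exp(1.0 - iou) # positives: weight decays as IoU grows
```

For example, with mu = 0.5 a sample at IoU 0.45 receives weight e^0.5 ≈ 1.65, while a confident positive at IoU 0.9 receives only e^0.1 ≈ 1.11, concentrating the loss on ambiguous samples such as partially occluded cabbage heads.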
4. Conclusions and Discussion
This study addresses the prevailing challenges associated with the automated harvesting of cabbages, primarily the inability to perform real-time recognition. The YOLOv8n-Cabbage model introduced here provides a refined method for the precise identification and localization of cabbage heads during harvest, thereby enhancing the accuracy and efficiency of harvesting operations. To achieve real-time and accurate detection of cabbage heads, we constructed a specialized dataset and, building on the YOLOv8n model, enhanced the backbone network, integrated a dynamic detection head, substituted the loss function, and applied model lightweighting techniques. The utility of the model is evidenced by its compact size of just 4.8 MB, a precision of 91%, a recall of 87.2%, and a mAP50 of 94.5%. These metrics substantiate that the model presented here offers a feasible and innovative methodology for the automated production estimation and harvesting of cabbages.
Despite the considerable advances made toward the unmanned harvesting of cabbage, the limitations of the available data have prevented the current model from identifying and localizing all cabbage varieties. In future work, we will therefore expand the dataset to encompass a more diverse range of cabbage varieties and types, ensuring applicability across a wider range of cultivars. Moreover, we will endeavor to enhance the model's performance by optimizing the training parameters to increase its accuracy and efficacy.