1. Introduction
Cabbage is one of the main vegetables in China [1], with advantages such as high yield, ease of transportation, and durability in storage [2]. Among all the procedures involved in cabbage production, harvesting is the most labor-intensive. At present, the level of agricultural mechanization in China has improved significantly, and with the rise of artificial intelligence technology in recent years, unmanned and automated cabbage harvesting is becoming a future trend [3]. Recognizing cabbage heads is a crucial task in achieving automated, unmanned cabbage harvesting: knowing the location of each cabbage head enables quick and accurate identification and positioning, which enhances harvesting efficiency and reduces crop damage. Therefore, recognizing cabbage heads during the harvest period is of significant value for unmanned cabbage harvesting.
The current mechanical harvesting of cabbages relies heavily on A-B line navigation via GPS and BeiDou satellites [4]. While this method fulfills most cabbage harvesting needs, it still suffers from significant drawbacks such as missed cabbages and physical damage to the crops. Consequently, a more precise recognition and localization method for individual cabbages during the harvest period is needed to enhance harvesting accuracy and efficiency [5].
Traditional fruit detection technologies mainly segment fruit images [6] based on color and texture properties. However, the recognition rate of these methods is quite low in scenarios with little color difference, indistinct texture features, and complex backgrounds. With advances in computer vision, deep learning algorithms such as Faster R-CNN [7], SSD [8], and the YOLO series [9] have been applied to fruit detection and instance segmentation [10]. These deep learning-based methods extract and classify features from input images to identify and locate fruits.
Nils Lüling et al. [11] leveraged structured light and 3D reconstruction to generate depth information, then used the Mask R-CNN model to detect and segment cabbage heads and leaves, yielding precise measurements of fruit volume and leaf area; this offers significant technical support for automated cabbage harvesting. Jianwei Yan et al. [12] used the YOLOv3 model, augmented with a residual network, to detect Rosa roxburghii fruit in their dataset, achieving a recognition accuracy of 88.5% and a recall rate of 91.5%. Tianzhen Liu et al. [13] extended the YOLOv3 model by integrating SENet's SE block, strengthening the model's feature representation and thereby increasing the detection accuracy of jujube fruits. Hui Zhao et al. [14] enhanced the original YOLOv3 model by incorporating a spatial pyramid pooling (SPP) module, effectively fusing global and local feature contexts to improve both accuracy and recall for detecting small fruits. Fangzhen Gao et al. [15] simplified the convolutional layers and residual structures of the YOLOv3 backbone, raising the frames per second (FPS) from 152 in the baseline to 198 and thereby improving detection speed for tomato fruits in orchard environments. Wei Cheng et al. [16] modified the loss function of the original YOLOv3 to refine tomato yield estimation in greenhouse environments; their experiments demonstrated a 2.7% improvement in accuracy over the original model. Shenglian Lu et al. [17] employed the YOLOv4 network to identify fruits on branches by introducing CBAM and channel attention modules together with an improved spatial pyramid pooling method, providing technical support for harvesting fruits on branches. Shuai Ma et al. [18] replaced the convolutions preceding and following the SPP module of YOLOv4 with depthwise separable convolutions, reducing the storage footprint of the improved model by 44% compared with the original. Similarly, Ning Wang et al. [19] enhanced the YOLOv5s model by pre-training on the Cr channel of the color space and incorporating the Ghost module, significantly improving detection accuracy for cucumber fruits. Tongbin Huang et al. [20] improved the feature extraction capability of the network by introducing a CBAM attention module, improving the detection of small, occluded citrus fruit targets. Guangyu Hou [21], using the YOLOv7 model, introduced depthwise separable convolution layers to address the information loss for small targets caused by pooling across residual layers when detecting variously occluded cherry tomatoes. Furthermore, Guiling et al. [22] developed an apple instance segmentation model based on YOLACT for identifying and picking apples on branches. Changyuan Liu [23] utilized a depth camera to capture depth images of fruit trees and employed a depth-image-based spherical fruit recognition and localization algorithm, overcoming the difficulty that illumination poses to fruit recognition in traditional algorithms. Inkyu Sa et al. [24] used an unsupervised deep neural network to generate synthetic NIR images and accomplished the detection of 11 types of fruits and crops. The aforementioned research provides valuable references for the identification of cabbages during the harvest period.
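Two of the works above ([18,21]) substitute depthwise separable convolutions for standard ones to shrink the model. As a minimal illustrative sketch (not code from the cited studies), the parameter saving follows directly from the factorization of a k × k convolution into a per-channel depthwise convolution plus a 1 × 1 pointwise convolution:

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (biases omitted)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

# Example: a 3 x 3 layer with 256 input and 256 output channels.
standard = conv_params(256, 256, 3)           # 589,824 weights
separable = dw_separable_params(256, 256, 3)  # 67,840 weights
print(f"parameter reduction: {1 - separable / standard:.1%}")
```

The roughly order-of-magnitude reduction in weights at typical channel widths is what makes this substitution attractive for lightweight detectors; the 44% storage reduction reported in [18] reflects replacing only a subset of layers.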
Currently, cabbages are primarily grown in open fields, resulting in a certain degree of complexity in the growing environment. Factors such as fluctuating illumination levels, weed cover, and erratic changes in weather conditions profoundly impact these models. Such conditions create visual discrepancies that obscure the essential features of cabbage heads, making accurate detection difficult. In addition, weeds contribute to visual clutter, and changes in weather conditions can alter the appearance of cabbage heads, thereby affecting the effectiveness of the models.
In current research on cabbage head detection, several challenges remain, including poor model generalization, limited adaptability to complex environments, loss of information when subjects are occluded, inaccurate detection of small targets, and slow identification speed [25].
Addressing the aforementioned challenges, this paper presents an optimized YOLOv8n model for detecting ripened cabbage heads in complex environments, denoted as YOLOv8n-Cabbage. Focusing on enhancing detection accuracy and speed, the model incorporates several strategies, including an improvement in the backbone network, the introduction of the DyHead module, the optimization of the loss function, and the light-weighting treatment of the model. These refinements substantially improve the detection performance and robustness in recognizing ripened cabbage heads within complex environments, ensuring precise and efficient identification. The main contributions of this study are as follows:
Backbone Network Enhancement by ConvNeXt V2: The adoption of ConvNeXt V2 significantly augments the model’s proficiency in delineating distinctive features from cabbage head imagery, concurrently amplifying its learning efficacy in the face of intricate conditions.
Substitution of Detection Head by DyHead Modules: These modules enhance the model’s sensitivity to key feature extraction and improve its adaptability to various target features and transformations, thereby increasing the accuracy of object detection in multifaceted backgrounds.
Enhancement of Robustness by Slide Loss Function: The Slide Loss function enhances the robustness of the model, reduces background interference, and ultimately improves the overall performance of the model in complex environments. Additionally, it accelerates convergence and facilitates precise object localization.
Model Light-weighting: To guarantee the efficacy of the model, a light-weighting process was implemented, which markedly enhanced the model’s run-time efficiency and reduced the consumption of computational resources while maintaining the requisite degree of precision.
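The Slide Loss contribution above reweights training samples by their IoU around an adaptive threshold so that hard examples near the decision boundary are emphasized. A minimal sketch of the weighting function, following the Slide Loss formulation published with YOLO-FaceV2 (from which the function is commonly adopted; that this exact variant is used here is an assumption, and the threshold mu is taken as the mean IoU of all samples):

```python
import math

def slide_weight(iou: float, mu: float) -> float:
    """Slide weighting (YOLO-FaceV2 formulation, assumed variant):
    mu is the adaptive threshold, e.g. the mean IoU over all samples."""
    if iou <= mu - 0.1:
        return 1.0                 # easy negatives: unit weight
    elif iou < mu:
        return math.exp(1.0 - mu)  # hard samples near the boundary: boosted
    else:
        return math.exp(1.0 - iou) # positives: weight decays as IoU grows
```

For example, with mu = 0.5 a sample at IoU 0.45 receives weight e^0.5 ≈ 1.65, while a confident positive at IoU 0.9 receives only e^0.1 ≈ 1.11, concentrating the loss on ambiguous samples such as partially occluded cabbage heads.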
4. Conclusions and Discussion
This study addresses the prevailing challenges associated with the automated harvesting of cabbages, primarily the inability to perform real-time recognition. The YOLOv8n-Cabbage model introduced here provides a refined method for the precise identification and localization of cabbage heads during harvest, thereby enhancing the accuracy and efficiency of harvesting operations. To achieve real-time and accurate detection of cabbage heads, we constructed a specialized dataset and, building on the YOLOv8n model, enhanced the backbone network, integrated a dynamic detection head, substituted the loss function, and applied model lightweighting techniques. The utility of the model is evidenced by its compact size of just 4.8 MB, a precision of 91%, a recall of 87.2%, and a mAP50 of 94.5%. These metrics substantiate that the model presented here offers a feasible and innovative methodology for the automated production estimation and harvesting of cabbages.
Despite the considerable advances made toward the unmanned harvesting of cabbage, the limitations of the available data have prevented the current model from identifying and localizing all cabbage varieties. In future work, we will therefore expand the dataset to encompass a more diverse range of cabbage varieties and types, ensuring applicability across a wider range of cultivars. Moreover, we will endeavor to enhance the model's performance by optimizing the training parameters to increase its accuracy and efficacy.