1. Introduction
The new plum, a plum variety native to Europe, is increasingly recognized as a significant specialty cash crop in Xinjiang owing to its exceptional nutritional and economic value. Market demand for new plums is steadily increasing, and the crop is extensively cultivated across Xinjiang, whose production volume and planting area rank first nationwide. However, the harvesting and identification of new plums still rely predominantly on manual labor, which entails high costs and low efficiency. Moreover, because the new plum has a short maturation cycle, delays in identification and harvesting often result in substantial economic losses. Consequently, advancing intelligent harvesting technology for new plums, particularly establishing precise and efficient identification methods, is pivotal to realizing automated harvesting and crucial for the accelerated development of the new plum cultivation industry [1,2].
In recent years, rapid advances in artificial intelligence have significantly expanded the application of deep-learning-based target detection algorithms in crop fruit identification, overcoming the constraints of conventional detection techniques [3]. Deep-learning-based target detection models are generally classified into two main types: two-stage detection algorithms, including RCNN [4], Faster R-CNN [5], and Mask R-CNN [6]; and single-stage detection algorithms, such as SSD [7] and YOLO [8,9,10,11]. Sun et al. [12] used ResNet50 as the feature extraction network in the Faster R-CNN model and employed flexible non-maximum suppression to retain detection boxes, achieving a recognition accuracy of 90.7% for tomatoes. Du et al. [13] proposed the DSW-YOLO network model for recognizing strawberries in challenging environments, improving recognition accuracy by integrating DCNv3 into the ELAN module. Tian et al. [14] developed an enhanced YOLOv3 model for detecting apples across different growth stages, using DenseNet as the feature extraction network to improve detection accuracy in natural settings. Li et al. [15] introduced a grape detection model, YOLO-Grape, to address the reduced recognition accuracy caused by complex growing environments, branch and leaf shadows, and overlapping grape clusters. MacEachern et al. [16] employed the YOLOv4 model for blueberry ripeness detection and achieved high accuracy, although its computational complexity hindered deployment on mobile devices. Zhang et al. [17] optimized the YOLOv5 model for Hemerocallis citrina (yellow daylily) recognition by incorporating a compact neural network and a dual-attention mechanism, improving detection accuracy. Li et al. [18] improved the YOLOv7 backbone network by incorporating Swin-Transformer and ConvNeXt modules, enabling efficient detection of foreign fibers in seed cotton. Yang et al. [19] introduced an automated tomato detection method based on an enhanced YOLOv8s framework, which employs depthwise-separable convolution (DSConv) to reduce the model's computational complexity and improves detection accuracy in challenging environments through a dual-path attention gate (DPAG) module and a feature enhancement module (FEM).
Although previous studies have proposed various deep-learning-based fruit detection algorithms and made significant progress, the detection of new plums in natural orchard environments remains underexplored. Unlike many other fruit crops, new plums grow in clusters, with denser fruit distribution and smaller targets. The small fruit size, occlusion by leaves and branches, and uneven illumination in the orchard often lead to missed detections and inaccurate identification, making fast and accurate detection of new plums in real orchard environments a challenging task. To address these difficulties, this study proposes an improved method for detecting new plum targets based on the YOLOv8n model. By integrating the CA attention mechanism into the backbone network, incorporating the RFB module into the head network, and optimizing the loss function, the proposed model improves both detection accuracy and speed in the complex environment of new plum orchards. In contrast to existing crop fruit detection techniques, this study emphasizes the rapid and precise identification of new plums under orchard conditions characterized by uneven lighting, overlapping fruits, and shading from trunks and leaves. These improvements provide theoretical and technical support for the development of future new plum picking robots.
The remainder of this study is organized as follows: First, the source of the new plum dataset and the data collection process are described, followed by the dataset augmentation methods, data annotation, and dataset partitioning. Second, the model improvements employed in this study are presented, and each method's impact on the model is demonstrated. Finally, an ablation study and model comparisons are conducted to highlight the advantages of the proposed model.
2. Image Data Acquisition and Distribution Study
2.1. Experimental Data Sources
In this study, the new plum dataset was collected from a plantation in Qapqal County, Ili Kazakh Autonomous Prefecture. Data collection took place from August to September 2023, between 10:00 and 18:00, using an iPhone 13 (Apple Inc., Cupertino, CA, USA). A total of 2880 new plum images, each with a resolution of 4032 × 3024 pixels, were captured under varying illumination, orientation, and occlusion conditions, covering both single-target and multi-target scenes, as illustrated in Figure 1. The dataset includes 954 images of immature new plums, 978 of mature new plums, and 948 of diseased new plums, and was split into training and test sets at an 80:20 ratio.
2.2. Expansion and Labeling of the Dataset
Data augmentation was applied to the dataset to increase image diversity, enhance model robustness, and improve recognition performance in difficult orchard environments. The augmentation operations included flipping, mirroring, noise addition, and brightness adjustment, which expanded the new plum training set to 3100 images; representative augmented images are shown in Figure 2. The LabelImg annotation tool was then used to manually label the dataset with rectangular bounding boxes, and the labeled new plums were categorized into three classes (immature, mature, and diseased) according to the Xinjiang Uygur Autonomous Region standard.
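To illustrate these operations, the following is a minimal sketch of the four augmentation types using OpenCV and NumPy; the noise level and brightness factors are illustrative assumptions, not the exact values used in this study.

```python
import cv2
import numpy as np

def augment(image: np.ndarray) -> list[np.ndarray]:
    """Produces flipped, mirrored, noisy, and brightness-shifted variants."""
    variants = []
    variants.append(cv2.flip(image, 0))  # vertical flip
    variants.append(cv2.flip(image, 1))  # horizontal mirror
    # Additive Gaussian noise; sigma = 15 is an illustrative choice.
    noise = np.random.normal(0.0, 15.0, image.shape).astype(np.float32)
    variants.append(np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8))
    # Brightness adjustment by simple intensity scaling (darker and brighter).
    for factor in (0.6, 1.4):
        variants.append(np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8))
    return variants
```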
2.3. Construction of the YOLOv8n-CRS Model
The small size of new plum fruits and the dense canopy often result in many fruits being obscured by leaves and trunks, making it difficult to capture key feature information and increasing the risk of misdetection. Furthermore, fruit overlap is significant, particularly in areas of large overlap, which exacerbates missed detections. To address the detection challenges posed by trunk and leaf occlusion as well as fruit overlap, this study developed the YOLOv8n-CRS new plum target detection model. First, to strengthen the recognition of fruits occluded by trunks and foliage, the CA attention mechanism [20] was incorporated into the backbone network, improving the network's ability to capture essential feature information of new plums. Second, to enhance the identification of overlapping fruits, the RFB (Receptive Field Block) module [21] was incorporated into the model's neck layer; by leveraging information from multiple receptive fields, this module alleviates the detection problems caused by overlapping fruits and improves detection accuracy. Finally, the original CIOU [22] loss function was replaced with the SIOU [23] loss function to improve the alignment of predicted boxes with the ground truth. The enhanced model is depicted in Figure 3.
2.3.1. CA Attention Mechanism
The CA attention mechanism is flexible and lightweight. Its core principle is to assign attention weights to every channel of the input feature map, enhancing the representation of salient features while suppressing irrelevant information. This mechanism significantly boosts the model's ability to focus on essential features of new plums in the orchard's complex environment, reducing interference from background factors. The detailed architecture of this mechanism is shown in Figure 4.
The CA attention mechanism first pools the input feature map along the X and Y axes separately, generating a distinct feature map for each direction. These maps are then concatenated, and a 1 × 1 convolution reduces the channel dimension, producing a feature map f with C/r channels. The feature map f is processed through batch normalization followed by a nonlinear activation function and is then split back into separate feature maps along the X and Y axes. A 1 × 1 convolution restores the channel dimension of each map to the number of channels C of the input feature map. The attention weights $g^w$ and $g^h$ for the X and Y directions are then computed with the Sigmoid activation function. Finally, the input feature map is multiplicatively weighted by both, producing an output feature map that incorporates attention along both the X and Y axes.
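The following PyTorch sketch mirrors the steps above. It is a minimal reimplementation for illustration, assuming a Hardswish nonlinearity and a reduction ratio r as in the original CA paper, rather than the exact configuration used inside YOLOv8n-CRS.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """A minimal sketch of the CA attention mechanism (Hou et al., 2021)."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        # Directional pooling: one over width (keeps H), one over height (keeps W).
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # (B, C, 1, W)
        # Shared 1x1 conv reduces channels to C/r, followed by BN + nonlinearity.
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        # Per-direction 1x1 convs restore the channel count to C.
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Pool along each axis, then concatenate along the spatial dimension.
        fh = self.pool_h(x)                       # (B, C, H, 1)
        fw = self.pool_w(x).permute(0, 1, 3, 2)   # (B, C, W, 1)
        f = self.act(self.bn(self.conv1(torch.cat([fh, fw], dim=2))))
        # Split back into the two directions and compute attention weights.
        fh, fw = torch.split(f, [h, w], dim=2)
        gh = torch.sigmoid(self.conv_h(fh))                       # (B, C, H, 1)
        gw = torch.sigmoid(self.conv_w(fw.permute(0, 1, 3, 2)))   # (B, C, 1, W)
        # Multiplicative weighting along both axes.
        return x * gh * gw
```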
2.3.2. RFB Module
To tackle the model's difficulty in detecting overlapping fruits, this study integrates the RFB (Receptive Field Block) module into the neck layer. By providing a larger receptive field, this module reinforces the model's ability to capture features from overlapping fruits. The architecture of the RFB module is presented in Figure 5. Its overall design is inspired by the Inception [24] network, which informs both the module's architecture and its approach to feature extraction. First, a bottleneck structure with a 1 × 1 convolutional layer reduces computation. Next, an n × n convolutional layer is added, and a 5 × 5 convolution is replaced by two stacked 3 × 3 convolutions to reduce parameters and increase nonlinearity. Finally, the output feature maps, which differ in receptive field size, are concatenated and fused by a 1 × 1 convolution. The RFB module thus combines a multibranch convolutional layer with dilated convolutions: the multibranch layer uses kernels of different sizes to mimic receptive fields of various sizes, while the dilated convolutions capture multiscale contextual information by varying the dilation rate, enlarging the receptive field without adding extra parameters. As a result, the RFB module can efficiently exploit feature information from diverse receptive fields, strengthening feature extraction and improving the detection of overlapping new plum fruits.
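As a concrete illustration, here is a simplified PyTorch sketch of an RFB-style block under the assumptions above: three branches with dilation rates 1, 3, and 5, and two stacked 3 × 3 convolutions standing in for a 5 × 5 kernel. The branch widths and rates follow the general layout of the original RFB paper; the exact channel configuration of the deployed module is an assumption.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout, k=1, s=1, p=0, d=1):
    """Conv + BatchNorm + ReLU helper used by every branch."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, s, p, dilation=d, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class RFB(nn.Module):
    """Simplified RFB sketch: multi-branch convs feeding dilated 3x3 convs,
    concatenated and fused by a 1x1 conv, with a residual shortcut."""
    def __init__(self, cin: int, cout: int):
        super().__init__()
        mid = cout // 4
        # Branch 0: 1x1 bottleneck -> 3x3 conv (dilation 1).
        self.branch0 = nn.Sequential(
            conv_bn_relu(cin, mid, 1),
            conv_bn_relu(mid, mid, 3, p=1, d=1),
        )
        # Branch 1: 1x1 -> 3x3 -> dilated 3x3 (rate 3).
        self.branch1 = nn.Sequential(
            conv_bn_relu(cin, mid, 1),
            conv_bn_relu(mid, mid, 3, p=1),
            conv_bn_relu(mid, mid, 3, p=3, d=3),
        )
        # Branch 2: 1x1 -> two stacked 3x3 (a 5x5 substitute) -> dilated 3x3 (rate 5).
        self.branch2 = nn.Sequential(
            conv_bn_relu(cin, mid, 1),
            conv_bn_relu(mid, mid, 3, p=1),
            conv_bn_relu(mid, mid, 3, p=1),
            conv_bn_relu(mid, mid, 3, p=5, d=5),
        )
        # Fuse the concatenated branches back to cout channels.
        self.fuse = conv_bn_relu(3 * mid, cout, 1)
        self.shortcut = conv_bn_relu(cin, cout, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([self.branch0(x), self.branch1(x), self.branch2(x)], dim=1)
        return self.fuse(out) + self.shortcut(x)
```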
2.3.3. Loss Function Improvement
The YOLOv8n model uses the CIOU loss function, which accounts for the distance between the predicted and ground-truth boxes, the overlap area, and the aspect ratio, but overlooks the directional mismatch between them, resulting in slow and inefficient convergence. To resolve this, this study substitutes the SIOU loss function for CIOU. SIOU incorporates the vector angle between the ground-truth and predicted boxes, redefines the corresponding loss terms, and enhances the overlap between the predicted and ground-truth boxes, further accelerating convergence. The SIOU loss function is defined as follows:

$$L_{SIOU} = 1 - IoU + \frac{\Delta + \Omega}{2}$$

where IoU denotes the intersection-over-union between the predicted box $B$ and the ground-truth box $B^{GT}$, defined as follows:

$$IoU = \frac{\left|B \cap B^{GT}\right|}{\left|B \cup B^{GT}\right|}$$

$\Delta$ denotes the distance loss, defined as follows:

$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma \rho_t}\right)$$

included among these,

$$\gamma = 2 - \Lambda, \quad \rho_x = \left(\frac{b_{c_x}^{gt} - b_{c_x}}{c_w}\right)^2, \quad \rho_y = \left(\frac{b_{c_y}^{gt} - b_{c_y}}{c_h}\right)^2, \quad \Lambda = 1 - 2\sin^2\left(\arcsin\left(\frac{d_h}{\sigma}\right) - \frac{\pi}{4}\right)$$

Here, $(b_{c_x}, b_{c_y})$ and $(b_{c_x}^{gt}, b_{c_y}^{gt})$ are the center points of the predicted and ground-truth boxes, $c_w$ and $c_h$ are the width and height of the smallest box enclosing both, $\sigma$ is the distance between the two center points, $d_h$ is their vertical offset, and $\Lambda$ is the angle cost derived from the vector angle between the boxes.

$\Omega$ denotes the shape loss, defined as follows:

$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta}$$

included among these,

$$\omega_w = \frac{\left|w - w^{gt}\right|}{\max\left(w, w^{gt}\right)}, \quad \omega_h = \frac{\left|h - h^{gt}\right|}{\max\left(h, h^{gt}\right)}$$

In these formulas, $w$ and $h$, as well as $w^{gt}$ and $h^{gt}$, represent the width and height of the predicted and ground-truth boxes, respectively. The parameter $\theta$ controls the emphasis on the shape loss; to prevent overemphasizing shape, which would restrict the movement of the predicted box, $\theta$ typically takes a value between 2 and 6.
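To make these formulas concrete, the following is a minimal PyTorch sketch of the SIOU loss under the definitions above, with boxes in (x1, y1, x2, y2) format. It follows the published formulation rather than any particular YOLOv8 implementation, and the eps stabilizers are illustrative.

```python
import math
import torch

def siou_loss(pred: torch.Tensor, target: torch.Tensor,
              theta: float = 4.0, eps: float = 1e-7) -> torch.Tensor:
    """SIOU loss for boxes in (x1, y1, x2, y2) format; returns per-box loss."""
    # IoU term.
    inter_w = (torch.min(pred[..., 2], target[..., 2]) -
               torch.max(pred[..., 0], target[..., 0])).clamp(min=0)
    inter_h = (torch.min(pred[..., 3], target[..., 3]) -
               torch.max(pred[..., 1], target[..., 1])).clamp(min=0)
    inter = inter_w * inter_h
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # Center offsets and the smallest enclosing box.
    dx = (target[..., 0] + target[..., 2] - pred[..., 0] - pred[..., 2]) / 2
    dy = (target[..., 1] + target[..., 3] - pred[..., 1] - pred[..., 3]) / 2
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    # Angle cost: Lambda = 1 - 2 sin^2(arcsin(d_h / sigma) - pi/4).
    sigma = torch.sqrt(dx ** 2 + dy ** 2) + eps
    sin_alpha = (torch.abs(dy) / sigma).clamp(max=1.0)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2
    # Distance cost, attenuated by the angle term (gamma = 2 - Lambda).
    gamma = 2 - angle
    rho_x, rho_y = (dx / (cw + eps)) ** 2, (dy / (ch + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))
    # Shape cost with emphasis parameter theta (typically 2..6).
    omega_w = torch.abs(w1 - w2) / (torch.max(w1, w2) + eps)
    omega_h = torch.abs(h1 - h2) / (torch.max(h1, h2) + eps)
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta
    return 1 - iou + (dist + shape) / 2
```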
3. Test Environment and Evaluation Criteria
3.1. Test Environment and Parameter Setting
The experimental environment used in this study comprises Windows 11, 32 GB of RAM, an NVIDIA GeForce RTX 4080 GPU, a 13th Gen Intel(R) Core(TM) i7-13700KF processor, Python 3.10.14, PyTorch 1.12.1, and CUDA 12.0. The model input size is set to 640 × 640 with a batch size of 16. The initial learning rate is 0.01, and training runs for 200 epochs. Model parameters are optimized using stochastic gradient descent (SGD).
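For reference, these hyperparameters map onto the Ultralytics training API roughly as follows; the model and dataset YAML names are hypothetical placeholders, since the modified YOLOv8n-CRS configuration is not published here.

```python
from ultralytics import YOLO

# "yolov8n-crs.yaml" and "new_plum.yaml" are hypothetical placeholders for the
# modified architecture and the dataset configuration (paths + 3 class names).
model = YOLO("yolov8n-crs.yaml")
model.train(
    data="new_plum.yaml",
    imgsz=640,       # 640 x 640 input size
    batch=16,        # batch size
    epochs=200,      # training epochs
    lr0=0.01,        # initial learning rate
    optimizer="SGD", # stochastic gradient descent
)
```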
3.2. Criteria for Evaluating the Model
To effectively evaluate the model's performance, the following metrics were employed: recall (R), mean average precision (mAP), F1 score, model size, and detection time [25,26]. Specifically, mAP@0.5 denotes the model's mean average precision at an intersection-over-union (IoU) threshold of 0.5, whereas mAP@0.5:0.95 is the mean average precision averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
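As a small illustration of how these summary metrics relate, the sketch below computes F1 from precision and recall and reduces per-threshold AP values to mAP@0.5 and mAP@0.5:0.95; the array layout is an assumption for illustration.

```python
import numpy as np

def f1_score(precision: float, recall: float, eps: float = 1e-9) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall + eps)

def summarize_map(ap: np.ndarray) -> tuple[float, float]:
    """ap: shape (num_classes, 10), AP per class at IoU 0.50, 0.55, ..., 0.95."""
    map50 = float(ap[:, 0].mean())   # mAP@0.5: mean AP at IoU = 0.5
    map50_95 = float(ap.mean())      # mAP@0.5:0.95: mean over all 10 thresholds
    return map50, map50_95
```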
4. Test Results and Analysis
4.1. Loss Function Comparison Test
To evaluate the effect of the SIOU loss function on model performance, this study compared the MPDIOU [27], Shape-IoU [28], WIOU [29], and SIOU loss functions after integrating the CA attention mechanism and the RFB module. A comparative test was performed to verify the performance of these loss functions, with the results summarized in Table 1.
As shown in Table 1, the model was evaluated with four different loss functions, among which SIOU performed best, achieving mAP@0.5 and mAP@0.5:0.95 values of 96.1% and 87.1%, respectively. Compared to the MPDIOU loss function, the SIOU-based model improved both mAP@0.5 and mAP@0.5:0.95 by 0.8 percentage points and increased detection speed, with comparable F1 values. Compared to Shape-IoU, SIOU improved mAP@0.5 and mAP@0.5:0.95 by 0.9 and 0.3 percentage points, respectively, along with a 0.5 percentage point gain in F1. Compared to WIOU, SIOU improved mAP@0.5 and mAP@0.5:0.95 by 0.6 and 0.7 percentage points, respectively, and F1 by 0.7 percentage points, at the cost of a slight reduction in detection speed. This analysis shows that the SIOU-based model identifies new plum fruits more accurately than the other three loss functions. The improvement can be attributed to SIOU's inclusion of the vector angle between the predicted and ground-truth bounding boxes, which reduces regression loss, accelerates model convergence, and ultimately improves detection accuracy.
4.2. Ablation Test Performance Analysis
To assess the efficacy of the three proposed improvements for detecting new plums, six ablation experiments were designed with the basic YOLOv8n model (test 1) as the baseline. Tests 2, 3, and 4 added the CA attention mechanism, the RFB module, and the SIOU loss function individually.
The test results are shown in Table 2. Compared to the basic YOLOv8n model, adding the CA attention mechanism in test 2 improved mAP@0.5:0.95 and recall by 0.3 and 1.2 percentage points, respectively. This suggests that integrating the CA attention mechanism into the backbone network enhances its ability to emphasize both channel and spatial location information, improving feature extraction for new plums. In test 3, the RFB module alone was introduced, yielding an mAP@0.5 of 95.9%, with mAP@0.5:0.95 and recall increasing by 0.7 and 2.5 percentage points, respectively. These results demonstrate that the RFB module reduces feature information loss by extracting key features of new plum fruit across multiple receptive fields, significantly improving the detection of overlapping fruits. In test 4, the CIOU loss function in the original model was replaced with SIOU; detection speed rose to 128.2 frames per second, with accompanying improvements in mAP@0.5:0.95 and recall, indicating that SIOU boosts both detection accuracy and real-time speed. In test 5, the CA attention mechanism and the RFB module were added together; despite a minor reduction in detection speed, mAP@0.5, mAP@0.5:0.95, and recall increased by 0.6, 1.2, and 2.9 percentage points, respectively, relative to the baseline in test 1. In test 6, all three improvements were integrated, and the model achieved the highest mAP@0.5 and mAP@0.5:0.95 values of 96.1% and 87.1%, respectively, with recall increasing by 2.2 percentage points over test 1, although detection speed decreased due to the added modules. The changes in mAP@0.5 for the three classes of new plums and the loss curves before and after model improvement are shown in Figure 6. Taken together, the ablation tests show that integrating all three improvements yields the best detection performance while maintaining a detection speed suitable for real-time orchard applications, effectively confirming the positive impact of each enhancement.
4.3. Comparative Tests of Different Models
To better demonstrate the performance of the YOLOv8n-CRS model, a comparative analysis was performed against several mainstream object detection models, namely Faster R-CNN, YOLOv4, YOLOv5s, YOLOv7, and YOLOv8n, using the same dataset for all tests; the results are shown in Table 3.

The YOLOv8n-CRS model outperformed Faster R-CNN, with increases of 2.4 and 15.7 percentage points in mAP@0.5 and mAP@0.5:0.95, respectively, while also detecting significantly faster; its model size was only 6.3% of that of Faster R-CNN. Although the two-stage Faster R-CNN offers relatively high detection accuracy, its slow detection speed makes it unsuitable for real-time orchard harvesting. Compared to the other four one-stage detectors, YOLOv8n-CRS improved mAP@0.5 by 11.3, 6.4, 5.3, and 0.7 percentage points, respectively; mAP@0.5:0.95 by 17.8, 8.6, 5.7, and 1.2 percentage points; and F1 by 8.7, 5.4, 3.7, and 0.5 percentage points. With a size of only 6.9 MB, slightly larger than the original YOLOv8n but smaller than the other four models, it is well suited for deployment on portable devices. Despite the added modules, YOLOv8n-CRS maintains a detection speed of 88.5 frames per second, sufficient for real-time detection in the orchard environment. In conclusion, compared with the other five detectors, YOLOv8n-CRS achieved the highest mAP@0.5 and mAP@0.5:0.95 while retaining a compact model size and a detection speed that satisfies practical orchard needs, combining lightweight design with high detection accuracy. This makes it particularly well suited for the rapid and precise identification of new plums in natural orchard environments.
4.4. Analysis of Model Recognition Effect
To visually demonstrate the improvements of the YOLOv8n-CRS model, a comparative test was conducted between the YOLOv8n and YOLOv8n-CRS models on the test set, with the results presented in Figure 7. Red boxes mark immature new plums, pink boxes mark mature new plums, and blue boxes highlight missed fruit detections. The YOLOv8n-CRS model effectively detects new plum fruits across varying target counts with high confidence. Under severe leaf occlusion and significant fruit overlap, the YOLOv8n model fails to detect some fruits, whereas the YOLOv8n-CRS model substantially mitigates missed detections in these two complex scenarios. In conclusion, the YOLOv8n-CRS model exhibits strong recognition performance for new plum fruits in complex environments, making it well suited for real-world orchard detection applications.
5. Discussion
This study presents the YOLOv8n-CRS new plum fruit detection model, built on an enhanced YOLOv8n, and demonstrates its strong detection capability on the new plum dataset. Comparative assessments of various loss functions validated the advantage of incorporating the SIOU loss function, and ablation experiments validated the effectiveness of the CA attention mechanism, the RFB module, and the SIOU loss function on detection performance in complex orchard environments. Evaluations comparing YOLOv8n with YOLOv8n-CRS on the new plum dataset indicate that YOLOv8n-CRS handles leaf occlusion and fruit overlap more effectively, making it better suited for practical application in real new plum orchards.
The YOLOv8n-CRS model strikes an effective balance among accuracy, detection speed, and model size. Compared with contemporary mainstream target detection models, it delivers superior detection accuracy and faster detection speeds while preserving a more compact model size. As outlined in this study, the model is applicable to a diverse range of practical picking scenarios: for instance, it can be integrated into a new plum-picking robot to enable rapid and precise fruit detection, thereby improving picking efficiency. This not only boosts yield and reduces labor costs but also provides critical technical support for the development of smart agriculture. Additionally, by addressing challenges such as clustered fruit growth, uneven fruit distribution, and occlusion, this study offers valuable technical insights for detecting crops with similar growth characteristics. Nevertheless, the YOLOv8n-CRS model still faces challenges in varying orchard environments; for instance, under large fluctuations in lighting conditions its detection performance may degrade, indicating that future research should focus on further optimizing the model for these complex orchard environments.
6. Conclusions
To achieve rapid identification and detection of new plums in the complex orchard environment, this study enhanced the YOLOv8n-based object detection algorithm. A YOLOv8n-CRS detection model for new plums was proposed, and the primary conclusions are summarized as follows:
This study presented the YOLOv8n-CRS model, an advanced new plum detection framework built upon enhancements to YOLOv8n. First, the model incorporates the CA attention mechanism in the backbone to strengthen the extraction of essential new plum features and improve detection, particularly when fruits are occluded by branches and leaves. Second, it incorporates the RFB module in the neck layer, whose expanded receptive field further improves feature extraction from overlapping fruits. Third, the loss function is upgraded to SIOU, increasing the overlap between predicted boxes and the ground truth and thereby further improving accuracy. Together, these changes significantly enhance the model's detection performance in complex orchard environments while satisfying real-time monitoring requirements.
Compared to the Faster R-CNN, YOLOv4, YOLOv5s, and YOLOv7 models, the YOLOv8n-CRS model introduced in this study achieves the highest mean average precision, with 96.1% at mAP@0.5 and 87.1% at mAP@0.5:0.95, while also having the smallest model size at 6.9 MB. Additionally, its detection speed satisfies the real-time detection requirements for new plum fruits, demonstrating the best overall performance.
The enhanced YOLOv8n-CRS model shows significantly better detection performance on the new plum dataset. Compared to the baseline YOLOv8n model, recall, mAP@0.5, and mAP@0.5:0.95 increased by 2.2, 0.7, and 1.2 percentage points, respectively, and the model achieved a detection speed of 88.5 frames per second. This study therefore enables the swift and accurate identification of new plums in complex orchard environments, laying a solid foundation for the development of new plum-picking robots.
Author Contributions
Methodology, X.C., G.D. and X.F.; resources, Y.X.; writing—original draft preparation, X.C., G.D., X.F. and Y.X.; experimental guidance, writing—review and editing, X.Z., J.Z. and H.J.; supervision, X.Z. and J.Z.; funding acquisition, X.C. and X.F. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China, grant number 52465055; Beijing Natural Science Foundation Project under grant 6244056; Natural Science Foundation of Xinjiang Uygur Autonomous Region under grant 2023D01C189.
Data Availability Statement
All the new research data are included in this contribution.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Tang, Y.; Qi, S.; Zhu, L.; Zhuo, X.; Zhang, Y.; Meng, F. Obstacle avoidance motion in mobile robotics. J. Syst. Simul. 2024, 36, 1–26. [Google Scholar]
- Li, C.E.; Tang, Y.; Zou, X.; Zhang, P.; Lin, J.; Lian, G.; Pan, Y. A novel agricultural machinery intelligent design system based on integrating image processing and knowledge reasoning. Appl. Sci. 2022, 12, 7900. [Google Scholar] [CrossRef]
- Luo, L.; Liu, W.; Lu, Q.; Wang, J.; Wen, W.; Yan, D.; Tang, Y. Grape Berry Detection and Size Measurement Based on Edge Image Processing and Geometric Morphology. Machines 2021, 9, 233. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Wei, X. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023. [Google Scholar]
- Sun, J.; He, X.; Ge, X.; Wu, X.; Shen, J.; Song, Y. Detection of key organs in tomato based on deep migration learning in a complex background. Agriculture 2018, 8, 196. [Google Scholar] [CrossRef]
- Du, X.; Cheng, H.; Ma, Z.; Lu, W.; Wang, M.; Meng, Z.; Jang, C.; Hong, F. DSW-YOLO: A detection method for ground-planted strawberry fruits under different occlusion levels. Comput. Electron. Agric. 2023, 214, 108304. [Google Scholar] [CrossRef]
- Tian, Y.; Yang, G.; Wang, Z.; Wang, H.; Li, E.; Liang, Z. Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agric. 2019, 157, 417–426. [Google Scholar] [CrossRef]
- Li, H.; Li, C.; Li, G.; Chen, L. A real-time table grape detection method based on improved YOLOv4-tiny network in complex background. Biosyst. Eng. 2021, 212, 347–359. [Google Scholar] [CrossRef]
- MacEachern, C.B.; Esau, T.J.; Schumann, A.W.; Hennessy, P.J.; Zaman, Q.U. Detection of fruit maturity stage and yield estimation in wild blueberry using deep learning convolutional neural networks. Smart Agric. Technol. 2023, 3, 100099. [Google Scholar] [CrossRef]
- Zhang, L.; Wu, L.; Liu, Y. Hemerocallis citrina Baroni maturity detection method integrating lightweight neural network and dual attention mechanism. Electronics 2022, 11, 2743. [Google Scholar] [CrossRef]
- Li, Q.; Ma, W.; Li, H.; Zhang, X.; Zhang, R.; Zhou, W. Cotton-YOLO: Improved YOLOV7 for rapid detection of foreign fibers in seed cotton. Comput. Electron. Agric. 2024, 219, 108752. [Google Scholar] [CrossRef]
- Yang, G.; Wang, J.; Nie, Z.; Yang, H.; Yu, S. A lightweight YOLOv8 tomato detection algorithm combining feature enhancement and attention. Agronomy 2023, 13, 1824. [Google Scholar] [CrossRef]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), online, 19–25 June 2021. [Google Scholar]
- Liu, S.; Huang, D. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020. [Google Scholar]
- Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- He, P.; Zhao, S.; Pan, P.; Zhou, G.; Zhang, J. PDC-YOLO: A Network for Pig Detection under Complex Conditions for Counting Purposes. Agriculture 2024, 14, 1807. [Google Scholar] [CrossRef]
- Jiang, L.; Wang, Y.; Wu, C.; Wu, H. Fruit Distribution Density Estimation in YOLO-Detected Strawberry Images: A Kernel Density and Nearest Neighbor Analysis Approach. Agriculture 2024, 14, 1848. [Google Scholar] [CrossRef]
- Ma, S.; Xu, Y. MPDIoU: A loss for efficient and accurate bounding box regression. arXiv 2023, arXiv:2307.07662. [Google Scholar]
- Zhang, H.; Zhang, S.J. Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale. arXiv 2023, arXiv:2312.17663. [Google Scholar]
- Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. arXiv 2023, arXiv:2301.10051. [Google Scholar]