1. Introduction
The new plum, a plum variety native to Europe, is increasingly recognized as a significant specialty cash crop in Xinjiang owing to its exceptional nutritional and economic value. Market demand for new plums is steadily increasing, and the crop is extensively cultivated across Xinjiang, whose production volume and planting area rank first nationwide. However, the harvesting and identification of new plums still rely predominantly on manual labor, which entails high costs and low efficiency. Moreover, because the new plum has a short maturation cycle, delays in identification and harvesting often result in substantial economic losses. Consequently, advancing intelligent harvesting technology for new plums, particularly establishing precise and efficient identification methods, is pivotal to realizing automated harvesting and crucial for the accelerated development of the new plum cultivation industry [1,2].
In recent years, rapid advances in artificial intelligence have significantly expanded the application of deep-learning-based target detection algorithms in crop fruit identification, overcoming the constraints of conventional detection techniques [3]. Deep-learning-based target detection models are generally classified into two main types: two-stage detection algorithms, including RCNN [4], Faster R-CNN [5], and Mask R-CNN [6]; and single-stage detection algorithms, such as SSD [7] and YOLO [8,9,10,11]. Sun et al. [12] used ResNet50 as the feature extraction network in the Faster R-CNN model and employed flexible non-maximum suppression to retain detection boxes, achieving a recognition accuracy of 90.7% for tomatoes. Du et al. [13] proposed the DSW-YOLO network model for recognizing strawberries in challenging environments, improving recognition accuracy by integrating DCNv3 into the ELAN module. Tian et al. [14] developed an enhanced YOLOv3 model for detecting apples across different growth stages, using DenseNet as the feature extraction network to improve detection accuracy in natural settings. Li et al. [15] introduced a grape detection model, YOLO-Grape, to address the reduced recognition accuracy caused by complex growing environments, branch and leaf shadows, and overlapping grape clusters. MacEachern et al. [16] employed the YOLOv4 model for blueberry ripeness detection and achieved high accuracy, although its computational complexity hindered deployment on mobile devices. Zhang et al. [17] optimized the YOLOv5 model for Hemerocallis citrina (yellow daylily) recognition by incorporating a compact neural network and a dual-attention mechanism, improving detection accuracy. Li et al. [18] improved the YOLOv7 backbone network by incorporating Swin-Transformer and ConvNeXt modules, enabling efficient detection of foreign fibers in seed cotton. Yang et al. [19] introduced an automated tomato detection method based on an enhanced YOLOv8s framework, which employs depthwise-separable convolution (DSConv) to reduce the model's computational complexity and improves detection accuracy in challenging environments through a dual-path attention gate (DPAG) module and a feature enhancement module (FEM).
Although previous studies have proposed various deep-learning-based fruit detection algorithms and made significant progress, the detection of new plums in natural orchard environments remains underexplored. Unlike many other fruit crops, new plums grow in clusters, with denser fruit distribution and smaller targets. The small fruit size, occlusion by leaves and branches, and uneven illumination in the orchard often lead to missed detections and inaccurate identification, making fast and accurate detection of new plums in real orchard environments a challenging task. To address these difficulties, this study proposes an improved method for detecting new plum targets based on the YOLOv8n model. By integrating the CA attention mechanism into the backbone network, incorporating the RFB module into the head network, and optimizing the loss function, the proposed model improves both detection accuracy and speed in the complex environment of new plum orchards. In contrast to existing crop fruit detection techniques, this study emphasizes the rapid and precise identification of new plums under orchard conditions characterized by uneven lighting, overlapping fruits, and shading from trunks and leaves. These improvements provide theoretical and technical support for the development of future new plum picking robots.
The remainder of this study is organized as follows: First, the source of the new plum dataset and the data collection process are described, followed by the dataset augmentation methods, data annotation, and dataset partitioning. Second, the model improvements employed in this study are presented, and each method's impact on the model is demonstrated. Finally, an ablation study and model comparisons are conducted to highlight the advantages of the proposed model.
2. Image Data Acquisition and Distribution Study
2.1. Experimental Data Sources
In this study, the new plum dataset was collected from a plantation in Qapqal County, Ili Kazakh Autonomous Prefecture. Data collection took place from August to September 2023, between 10:00 and 18:00, using an iPhone 13 (Apple Inc., Cupertino, CA, USA). A total of 2880 new plum images, each with a resolution of 4032 × 3024 pixels, were captured under varying illumination, orientation, and occlusion conditions, covering both single-target and multi-target scenes, as illustrated in Figure 1. The dataset includes 954 images of immature new plums, 978 of mature new plums, and 948 of diseased new plums, and was split into training and test sets at an 80:20 ratio.
2.2. Expansion and Labeling of the Dataset
Data augmentation was applied to the dataset to increase image diversity, enhance model robustness, and improve recognition performance in difficult orchard environments. The augmentation operations included flipping, mirroring, noise addition, and brightness adjustment, which expanded the new plum training set to 3100 images; representative augmented images are shown in Figure 2. The LabelImg annotation tool was then used to manually label the dataset with rectangular bounding boxes, and the labeled new plums were categorized into three classes (immature, mature, and diseased) according to the Xinjiang Uygur Autonomous Region standard.
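To illustrate these operations, the following is a minimal sketch of the four augmentation types using OpenCV and NumPy; the noise level and brightness factors are illustrative assumptions, not the exact values used in this study.

```python
import cv2
import numpy as np

def augment(image: np.ndarray) -> list[np.ndarray]:
    """Produces flipped, mirrored, noisy, and brightness-shifted variants."""
    variants = []
    variants.append(cv2.flip(image, 0))  # vertical flip
    variants.append(cv2.flip(image, 1))  # horizontal mirror
    # Additive Gaussian noise; sigma = 15 is an illustrative choice.
    noise = np.random.normal(0.0, 15.0, image.shape).astype(np.float32)
    variants.append(np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8))
    # Brightness adjustment by simple intensity scaling (darker and brighter).
    for factor in (0.6, 1.4):
        variants.append(np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8))
    return variants
```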
2.3. Construction of the YOLOv8n-CRS Model
The small size of new plum fruits and the dense canopy often result in many fruits being obscured by leaves and trunks, making it difficult to capture key feature information and increasing the risk of misdetection. Furthermore, fruit overlap is significant, particularly in areas of large overlap, which exacerbates missed detections. To address the detection challenges posed by trunk and leaf occlusion as well as fruit overlap, this study developed the YOLOv8n-CRS new plum target detection model. First, to strengthen the recognition of fruits occluded by trunks and foliage, the CA attention mechanism [20] was incorporated into the backbone network, improving the network's ability to capture essential feature information of new plums. Second, to enhance the identification of overlapping fruits, the RFB (Receptive Field Block) module [21] was incorporated into the model's neck layer; by leveraging information from multiple receptive fields, this module alleviates the detection problems caused by overlapping fruits and improves detection accuracy. Finally, the original CIOU [22] loss function was replaced with the SIOU [23] loss function to improve the alignment of predicted boxes with the ground truth. The enhanced model is depicted in Figure 3.
2.3.1. CA Attention Mechanism
The CA attention mechanism is flexible and lightweight. Its core principle is to assign attention weights to every channel of the input feature map, enhancing the representation of salient features while suppressing irrelevant information. This mechanism significantly boosts the model's ability to focus on essential features of new plums in the orchard's complex environment, reducing interference from background factors. The detailed architecture of this mechanism is shown in Figure 4.
The CA attention mechanism first pools the input feature map along the X and Y axes separately, generating a distinct feature map for each direction. These maps are then concatenated, and a 1 × 1 convolution reduces the channel dimension, producing a feature map f with C/r channels. The feature map f is processed through batch normalization followed by a nonlinear activation function and is then split back into separate feature maps along the X and Y axes. A 1 × 1 convolution restores the channel dimension of each map to the number of channels C of the input feature map. The attention weights $g^w$ and $g^h$ for the X and Y directions are then computed with the Sigmoid activation function. Finally, the input feature map is multiplicatively weighted by both, producing an output feature map that incorporates attention along both the X and Y axes.
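The following PyTorch sketch mirrors the steps above. It is a minimal reimplementation for illustration, assuming a Hardswish nonlinearity and a reduction ratio r as in the original CA paper, rather than the exact configuration used inside YOLOv8n-CRS.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """A minimal sketch of the CA attention mechanism (Hou et al., 2021)."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        # Directional pooling: one over width (keeps H), one over height (keeps W).
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # (B, C, 1, W)
        # Shared 1x1 conv reduces channels to C/r, followed by BN + nonlinearity.
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        # Per-direction 1x1 convs restore the channel count to C.
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Pool along each axis, then concatenate along the spatial dimension.
        fh = self.pool_h(x)                       # (B, C, H, 1)
        fw = self.pool_w(x).permute(0, 1, 3, 2)   # (B, C, W, 1)
        f = self.act(self.bn(self.conv1(torch.cat([fh, fw], dim=2))))
        # Split back into the two directions and compute attention weights.
        fh, fw = torch.split(f, [h, w], dim=2)
        gh = torch.sigmoid(self.conv_h(fh))                       # (B, C, H, 1)
        gw = torch.sigmoid(self.conv_w(fw.permute(0, 1, 3, 2)))   # (B, C, 1, W)
        # Multiplicative weighting along both axes.
        return x * gh * gw
```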
2.3.2. RFB Module
To tackle the model's difficulty in detecting overlapping fruits, this study integrates the RFB (Receptive Field Block) module into the neck layer. By providing a larger receptive field, this module reinforces the model's ability to capture features from overlapping fruits. The architecture of the RFB module is presented in Figure 5. Its overall design is inspired by the Inception [24] network, which informs both the module's architecture and its approach to feature extraction. First, a bottleneck structure with a 1 × 1 convolutional layer reduces computation. Next, an n × n convolutional layer is added, and a 5 × 5 convolution is replaced by two stacked 3 × 3 convolutions to reduce parameters and increase nonlinearity. Finally, the output feature maps, which differ in receptive field size, are concatenated and fused by a 1 × 1 convolution. The RFB module thus combines a multibranch convolutional layer with dilated convolutions: the multibranch layer uses kernels of different sizes to mimic receptive fields of various sizes, while the dilated convolutions capture multiscale contextual information by varying the dilation rate, enlarging the receptive field without adding extra parameters. As a result, the RFB module can efficiently exploit feature information from diverse receptive fields, strengthening feature extraction and improving the detection of overlapping new plum fruits.
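As a concrete illustration, here is a simplified PyTorch sketch of an RFB-style block under the assumptions above: three branches with dilation rates 1, 3, and 5, and two stacked 3 × 3 convolutions standing in for a 5 × 5 kernel. The branch widths and rates follow the general layout of the original RFB paper; the exact channel configuration of the deployed module is an assumption.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout, k=1, s=1, p=0, d=1):
    """Conv + BatchNorm + ReLU helper used by every branch."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, s, p, dilation=d, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class RFB(nn.Module):
    """Simplified RFB sketch: multi-branch convs feeding dilated 3x3 convs,
    concatenated and fused by a 1x1 conv, with a residual shortcut."""
    def __init__(self, cin: int, cout: int):
        super().__init__()
        mid = cout // 4
        # Branch 0: 1x1 bottleneck -> 3x3 conv (dilation 1).
        self.branch0 = nn.Sequential(
            conv_bn_relu(cin, mid, 1),
            conv_bn_relu(mid, mid, 3, p=1, d=1),
        )
        # Branch 1: 1x1 -> 3x3 -> dilated 3x3 (rate 3).
        self.branch1 = nn.Sequential(
            conv_bn_relu(cin, mid, 1),
            conv_bn_relu(mid, mid, 3, p=1),
            conv_bn_relu(mid, mid, 3, p=3, d=3),
        )
        # Branch 2: 1x1 -> two stacked 3x3 (a 5x5 substitute) -> dilated 3x3 (rate 5).
        self.branch2 = nn.Sequential(
            conv_bn_relu(cin, mid, 1),
            conv_bn_relu(mid, mid, 3, p=1),
            conv_bn_relu(mid, mid, 3, p=1),
            conv_bn_relu(mid, mid, 3, p=5, d=5),
        )
        # Fuse the concatenated branches back to cout channels.
        self.fuse = conv_bn_relu(3 * mid, cout, 1)
        self.shortcut = conv_bn_relu(cin, cout, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([self.branch0(x), self.branch1(x), self.branch2(x)], dim=1)
        return self.fuse(out) + self.shortcut(x)
```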
2.3.3. Loss Function Improvement
The YOLOv8n model uses the CIOU loss function, which accounts for the distance between the predicted and ground-truth boxes, the overlap area, and the aspect ratio, but overlooks the directional mismatch between them, resulting in slow and inefficient convergence. To resolve this, this study substitutes the SIOU loss function for CIOU. SIOU incorporates the vector angle between the ground-truth and predicted boxes, redefines the corresponding loss terms, and enhances the overlap between the predicted and ground-truth boxes, further accelerating convergence. The SIOU loss function is defined as follows:

$$L_{SIOU} = 1 - IoU + \frac{\Delta + \Omega}{2}$$

where IoU denotes the intersection-over-union between the predicted box $B$ and the ground-truth box $B^{GT}$, defined as follows:

$$IoU = \frac{\left|B \cap B^{GT}\right|}{\left|B \cup B^{GT}\right|}$$

$\Delta$ denotes the distance loss, defined as follows:

$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma \rho_t}\right)$$

included among these,

$$\gamma = 2 - \Lambda, \quad \rho_x = \left(\frac{b_{c_x}^{gt} - b_{c_x}}{c_w}\right)^2, \quad \rho_y = \left(\frac{b_{c_y}^{gt} - b_{c_y}}{c_h}\right)^2, \quad \Lambda = 1 - 2\sin^2\left(\arcsin\left(\frac{d_h}{\sigma}\right) - \frac{\pi}{4}\right)$$

Here, $(b_{c_x}, b_{c_y})$ and $(b_{c_x}^{gt}, b_{c_y}^{gt})$ are the center points of the predicted and ground-truth boxes, $c_w$ and $c_h$ are the width and height of the smallest box enclosing both, $\sigma$ is the distance between the two center points, $d_h$ is their vertical offset, and $\Lambda$ is the angle cost derived from the vector angle between the boxes.

$\Omega$ denotes the shape loss, defined as follows:

$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta}$$

included among these,

$$\omega_w = \frac{\left|w - w^{gt}\right|}{\max\left(w, w^{gt}\right)}, \quad \omega_h = \frac{\left|h - h^{gt}\right|}{\max\left(h, h^{gt}\right)}$$

In these formulas, $w$ and $h$, as well as $w^{gt}$ and $h^{gt}$, represent the width and height of the predicted and ground-truth boxes, respectively. The parameter $\theta$ controls the emphasis on the shape loss; to prevent overemphasizing shape, which would restrict the movement of the predicted box, $\theta$ typically takes a value between 2 and 6.
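To make these formulas concrete, the following is a minimal PyTorch sketch of the SIOU loss under the definitions above, with boxes in (x1, y1, x2, y2) format. It follows the published formulation rather than any particular YOLOv8 implementation, and the eps stabilizers are illustrative.

```python
import math
import torch

def siou_loss(pred: torch.Tensor, target: torch.Tensor,
              theta: float = 4.0, eps: float = 1e-7) -> torch.Tensor:
    """SIOU loss for boxes in (x1, y1, x2, y2) format; returns per-box loss."""
    # IoU term.
    inter_w = (torch.min(pred[..., 2], target[..., 2]) -
               torch.max(pred[..., 0], target[..., 0])).clamp(min=0)
    inter_h = (torch.min(pred[..., 3], target[..., 3]) -
               torch.max(pred[..., 1], target[..., 1])).clamp(min=0)
    inter = inter_w * inter_h
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # Center offsets and the smallest enclosing box.
    dx = (target[..., 0] + target[..., 2] - pred[..., 0] - pred[..., 2]) / 2
    dy = (target[..., 1] + target[..., 3] - pred[..., 1] - pred[..., 3]) / 2
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    # Angle cost: Lambda = 1 - 2 sin^2(arcsin(d_h / sigma) - pi/4).
    sigma = torch.sqrt(dx ** 2 + dy ** 2) + eps
    sin_alpha = (torch.abs(dy) / sigma).clamp(max=1.0)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2
    # Distance cost, attenuated by the angle term (gamma = 2 - Lambda).
    gamma = 2 - angle
    rho_x, rho_y = (dx / (cw + eps)) ** 2, (dy / (ch + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))
    # Shape cost with emphasis parameter theta (typically 2..6).
    omega_w = torch.abs(w1 - w2) / (torch.max(w1, w2) + eps)
    omega_h = torch.abs(h1 - h2) / (torch.max(h1, h2) + eps)
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta
    return 1 - iou + (dist + shape) / 2
```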
3. Test Environment and Evaluation Criteria
3.1. Test Environment and Parameter Setting
The experimental environment used in this study comprises Windows 11, 32 GB of RAM, an NVIDIA GeForce RTX 4080 GPU, a 13th Gen Intel(R) Core(TM) i7-13700KF processor, Python 3.10.14, PyTorch 1.12.1, and CUDA 12.0. The model input size is set to 640 × 640 with a batch size of 16. The initial learning rate is 0.01, and training runs for 200 epochs. Model parameters are optimized using stochastic gradient descent (SGD).
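For reference, these hyperparameters map onto the Ultralytics training API roughly as follows; the model and dataset YAML names are hypothetical placeholders, since the modified YOLOv8n-CRS configuration is not published here.

```python
from ultralytics import YOLO

# "yolov8n-crs.yaml" and "new_plum.yaml" are hypothetical placeholders for the
# modified architecture and the dataset configuration (paths + 3 class names).
model = YOLO("yolov8n-crs.yaml")
model.train(
    data="new_plum.yaml",
    imgsz=640,       # 640 x 640 input size
    batch=16,        # batch size
    epochs=200,      # training epochs
    lr0=0.01,        # initial learning rate
    optimizer="SGD", # stochastic gradient descent
)
```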
3.2. Criteria for Evaluating the Model
To effectively evaluate the model's performance, the following metrics were employed: recall (R), mean average precision (mAP), F1 score, model size, and detection time [25,26]. Specifically, mAP@0.5 denotes the model's mean average precision at an intersection-over-union (IoU) threshold of 0.5, whereas mAP@0.5:0.95 is the mean average precision averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
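As a small illustration of how these summary metrics relate, the sketch below computes F1 from precision and recall and reduces per-threshold AP values to mAP@0.5 and mAP@0.5:0.95; the array layout is an assumption for illustration.

```python
import numpy as np

def f1_score(precision: float, recall: float, eps: float = 1e-9) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall + eps)

def summarize_map(ap: np.ndarray) -> tuple[float, float]:
    """ap: shape (num_classes, 10), AP per class at IoU 0.50, 0.55, ..., 0.95."""
    map50 = float(ap[:, 0].mean())   # mAP@0.5: mean AP at IoU = 0.5
    map50_95 = float(ap.mean())      # mAP@0.5:0.95: mean over all 10 thresholds
    return map50, map50_95
```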
4. Test Results and Analysis
4.1. Loss Function Comparison Test
To evaluate the effect of the SIOU loss function on model performance, this study compared the MPDIOU [27], Shape-IoU [28], WIOU [29], and SIOU loss functions after integrating the CA attention mechanism and the RFB module. A comparative test was performed to verify the performance of these loss functions, with the results summarized in Table 1.
As shown in Table 1, the model was evaluated with four different loss functions, among which SIOU performed best, achieving mAP@0.5 and mAP@0.5:0.95 values of 96.1% and 87.1%, respectively. Compared to the MPDIOU loss function, the SIOU-based model improved both mAP@0.5 and mAP@0.5:0.95 by 0.8 percentage points and increased detection speed, with comparable F1 values. Compared to Shape-IoU, SIOU improved mAP@0.5 and mAP@0.5:0.95 by 0.9 and 0.3 percentage points, respectively, along with a 0.5 percentage point gain in F1. Compared to WIOU, SIOU improved mAP@0.5 and mAP@0.5:0.95 by 0.6 and 0.7 percentage points, respectively, and F1 by 0.7 percentage points, at the cost of a slight reduction in detection speed. This analysis shows that the SIOU-based model identifies new plum fruits more accurately than the other three loss functions. The improvement can be attributed to SIOU's inclusion of the vector angle between the predicted and ground-truth bounding boxes, which reduces regression loss, accelerates model convergence, and ultimately improves detection accuracy.
4.2. Ablation Test Performance Analysis
To assess the efficacy of the three proposed improvements for detecting new plums, six ablation experiments were designed with the basic YOLOv8n model (test 1) as the baseline. Tests 2, 3, and 4 added the CA attention mechanism, the RFB module, and the SIOU loss function individually.
The test results are shown in Table 2. Compared to the basic YOLOv8n model, adding the CA attention mechanism in test 2 improved mAP@0.5:0.95 and recall by 0.3 and 1.2 percentage points, respectively. This suggests that integrating the CA attention mechanism into the backbone network enhances its ability to emphasize both channel and spatial location information, improving feature extraction for new plums. In test 3, the RFB module alone was introduced, yielding an mAP@0.5 of 95.9%, with mAP@0.5:0.95 and recall increasing by 0.7 and 2.5 percentage points, respectively. These results demonstrate that the RFB module reduces feature information loss by extracting key features of new plum fruit across multiple receptive fields, significantly improving the detection of overlapping fruits. In test 4, the CIOU loss function in the original model was replaced with SIOU; detection speed rose to 128.2 frames per second, with accompanying improvements in mAP@0.5:0.95 and recall, indicating that SIOU boosts both detection accuracy and real-time speed. In test 5, the CA attention mechanism and the RFB module were added together; despite a minor reduction in detection speed, mAP@0.5, mAP@0.5:0.95, and recall increased by 0.6, 1.2, and 2.9 percentage points, respectively, relative to the baseline in test 1. In test 6, all three improvements were integrated, and the model achieved the highest mAP@0.5 and mAP@0.5:0.95 values of 96.1% and 87.1%, respectively, with recall increasing by 2.2 percentage points over test 1, although detection speed decreased due to the added modules. The changes in mAP@0.5 for the three classes of new plums and the loss curves before and after model improvement are shown in Figure 6. Taken together, the ablation tests show that integrating all three improvements yields the best detection performance while maintaining a detection speed suitable for real-time orchard applications, effectively confirming the positive impact of each enhancement.
4.3. Comparative Tests of Different Models
To better demonstrate the performance of the YOLOv8n-CRS model, a comparative analysis was performed against several mainstream object detection models, namely Faster R-CNN, YOLOv4, YOLOv5s, YOLOv7, and YOLOv8n, using the same dataset for all tests; the results are shown in Table 3.

The YOLOv8n-CRS model outperformed Faster R-CNN, with increases of 2.4 and 15.7 percentage points in mAP@0.5 and mAP@0.5:0.95, respectively, while also detecting significantly faster; its model size was only 6.3% of that of Faster R-CNN. Although the two-stage Faster R-CNN offers relatively high detection accuracy, its slow detection speed makes it unsuitable for real-time orchard harvesting. Compared to the other four one-stage detectors, YOLOv8n-CRS improved mAP@0.5 by 11.3, 6.4, 5.3, and 0.7 percentage points, respectively; mAP@0.5:0.95 by 17.8, 8.6, 5.7, and 1.2 percentage points; and F1 by 8.7, 5.4, 3.7, and 0.5 percentage points. With a size of only 6.9 MB, slightly larger than the original YOLOv8n but smaller than the other four models, it is well suited for deployment on portable devices. Despite the added modules, YOLOv8n-CRS maintains a detection speed of 88.5 frames per second, sufficient for real-time detection in the orchard environment. In conclusion, compared with the other five detectors, YOLOv8n-CRS achieved the highest mAP@0.5 and mAP@0.5:0.95 while retaining a compact model size and a detection speed that satisfies practical orchard needs, combining lightweight design with high detection accuracy. This makes it particularly well suited for the rapid and precise identification of new plums in natural orchard environments.
4.4. Analysis of Model Recognition Effect
To visually demonstrate the improvements of the YOLOv8n-CRS model, a comparative test was conducted between the YOLOv8n and YOLOv8n-CRS models on the test set, with the results presented in Figure 7. Red boxes mark immature new plums, pink boxes mark mature new plums, and blue boxes highlight missed fruit detections. The YOLOv8n-CRS model effectively detects new plum fruits across varying target counts with high confidence. Under severe leaf occlusion and significant fruit overlap, the YOLOv8n model fails to detect some fruits, whereas the YOLOv8n-CRS model substantially mitigates missed detections in these two complex scenarios. In conclusion, the YOLOv8n-CRS model exhibits strong recognition performance for new plum fruits in complex environments, making it well suited for real-world orchard detection applications.
5. Discussion
This study presents the YOLOv8n-CRS new plum fruit detection model, built on an enhanced YOLOv8n, and demonstrates its strong detection capability on the new plum dataset. Comparative assessments of various loss functions validated the advantage of incorporating the SIOU loss function, and ablation experiments validated the effectiveness of the CA attention mechanism, the RFB module, and the SIOU loss function on detection performance in complex orchard environments. Evaluations comparing YOLOv8n with YOLOv8n-CRS on the new plum dataset indicate that YOLOv8n-CRS handles leaf occlusion and fruit overlap more effectively, making it better suited for practical application in real new plum orchards.
The YOLOv8n-CRS model strikes an effective balance among accuracy, detection speed, and model size. Compared with contemporary mainstream target detection models, it delivers superior detection accuracy and faster detection speeds while preserving a more compact model size. As outlined in this study, the model is applicable to a diverse range of practical picking scenarios: for instance, it can be integrated into a new plum-picking robot to enable rapid and precise fruit detection, thereby improving picking efficiency. This not only boosts yield and reduces labor costs but also provides critical technical support for the development of smart agriculture. Additionally, by addressing challenges such as clustered fruit growth, uneven fruit distribution, and occlusion, this study offers valuable technical insights for detecting crops with similar growth characteristics. Nevertheless, the YOLOv8n-CRS model still faces challenges in varying orchard environments; for instance, under large fluctuations in lighting conditions its detection performance may degrade, indicating that future research should focus on further optimizing the model for these complex orchard environments.
6. Conclusions
To achieve rapid identification and detection of new plums in the complex orchard environment, this study enhanced the YOLOv8n-based object detection algorithm. A YOLOv8n-CRS detection model for new plums was proposed, and the primary conclusions are summarized as follows:
This study presented the YOLOv8n-CRS model, an advanced new plum detection framework built upon enhancements to YOLOv8n. First, the model incorporates the CA attention mechanism in the backbone to strengthen the extraction of essential new plum features and improve detection, particularly when fruits are occluded by branches and leaves. Second, it incorporates the RFB module in the neck layer, whose expanded receptive field further improves feature extraction from overlapping fruits. Third, the loss function is upgraded to SIOU, increasing the overlap between predicted boxes and the ground truth and thereby further improving accuracy. Together, these changes significantly enhance the model's detection performance in complex orchard environments while satisfying real-time monitoring requirements.
Compared to the Faster R-CNN, YOLOv4, YOLOv5s, and YOLOv7 models, the YOLOv8n-CRS model introduced in this study achieves the highest mean average precision, with 96.1% at mAP@0.5 and 87.1% at mAP@0.5:0.95, while also having the smallest model size at 6.9 MB. Additionally, its detection speed satisfies the real-time detection requirements for new plum fruits, demonstrating the best overall performance.
The enhanced YOLOv8n-CRS model shows significantly better detection performance on the new plum dataset. Compared to the baseline YOLOv8n model, recall, mAP@0.5, and mAP@0.5:0.95 increased by 2.2, 0.7, and 1.2 percentage points, respectively, and the model achieved a detection speed of 88.5 frames per second. This study therefore enables the swift and accurate identification of new plums in complex orchard environments, laying a solid foundation for the development of new plum-picking robots.
Author Contributions
Methodology, X.C., G.D. and X.F.; resources, Y.X.; writing—original draft preparation, X.C., G.D., X.F. and Y.X.; experimental guidance, writing—review and editing, X.Z., J.Z. and H.J.; supervision, X.Z. and J.Z.; funding acquisition, X.C. and X.F. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China, grant number 52465055; Beijing Natural Science Foundation Project under grant 6244056; Natural Science Foundation of Xinjiang Uygur Autonomous Region under grant 2023D01C189.
Data Availability Statement
All the new research data are included in this contribution.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Tang, Y.; Qi, S.; Zhu, L.; Zhuo, X.; Zhang, Y.; Meng, F. Obstacle avoidance motion in mobile robotics. J. Syst. Simul. 2024, 36, 1–26. [Google Scholar]
- Li, C.E.; Tang, Y.; Zou, X.; Zhang, P.; Lin, J.; Lian, G.; Pan, Y. A novel agricultural machinery intelligent design system based on integrating image processing and knowledge reasoning. Appl. Sci. 2022, 12, 7900. [Google Scholar] [CrossRef]
- Luo, L.; Liu, W.; Lu, Q.; Wang, J.; Wen, W.; Yan, D.; Tang, Y. Grape Berry Detection and Size Measurement Based on Edge Image Processing and Geometric Morphology. Machines 2021, 9, 233. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Wei, X. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023. [Google Scholar]
- Sun, J.; He, X.; Ge, X.; Wu, X.; Shen, J.; Song, Y. Detection of key organs in tomato based on deep migration learning in a complex background. Agriculture 2018, 8, 196. [Google Scholar] [CrossRef]
- Du, X.; Cheng, H.; Ma, Z.; Lu, W.; Wang, M.; Meng, Z.; Jang, C.; Hong, F. DSW-YOLO: A detection method for ground-planted strawberry fruits under different occlusion levels. Comput. Electron. Agric. 2023, 214, 108304. [Google Scholar] [CrossRef]
- Tian, Y.; Yang, G.; Wang, Z.; Wang, H.; Li, E.; Liang, Z. Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agric. 2019, 157, 417–426. [Google Scholar] [CrossRef]
- Li, H.; Li, C.; Li, G.; Chen, L. A real-time table grape detection method based on improved YOLOv4-tiny network in complex background. Biosyst. Eng. 2021, 212, 347–359. [Google Scholar] [CrossRef]
- MacEachern, C.B.; Esau, T.J.; Schumann, A.W.; Hennessy, P.J.; Zaman, Q.U. Detection of fruit maturity stage and yield estimation in wild blueberry using deep learning convolutional neural networks. Smart Agric. Technol. 2023, 3, 100099. [Google Scholar] [CrossRef]
- Zhang, L.; Wu, L.; Liu, Y. Hemerocallis citrina Baroni maturity detection method integrating lightweight neural network and dual attention mechanism. Electronics 2022, 11, 2743. [Google Scholar] [CrossRef]
- Li, Q.; Ma, W.; Li, H.; Zhang, X.; Zhang, R.; Zhou, W. Cotton-YOLO: Improved YOLOV7 for rapid detection of foreign fibers in seed cotton. Comput. Electron. Agric. 2024, 219, 108752. [Google Scholar] [CrossRef]
- Yang, G.; Wang, J.; Nie, Z.; Yang, H.; Yu, S. A lightweight YOLOv8 tomato detection algorithm combining feature enhancement and attention. Agronomy 2023, 13, 1824. [Google Scholar] [CrossRef]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), online, 19–25 June 2021. [Google Scholar]
- Liu, S.; Huang, D. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020. [Google Scholar]
- Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- He, P.; Zhao, S.; Pan, P.; Zhou, G.; Zhang, J. PDC-YOLO: A Network for Pig Detection under Complex Conditions for Counting Purposes. Agriculture 2024, 14, 1807. [Google Scholar] [CrossRef]
- Jiang, L.; Wang, Y.; Wu, C.; Wu, H. Fruit Distribution Density Estimation in YOLO-Detected Strawberry Images: A Kernel Density and Nearest Neighbor Analysis Approach. Agriculture 2024, 14, 1848. [Google Scholar] [CrossRef]
- Ma, S.; Xu, Y. MPDIoU: A loss for efficient and accurate bounding box regression. arXiv 2023, arXiv:2307.07662. [Google Scholar]
- Zhang, H.; Zhang, S.J. Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale. arXiv 2023, arXiv:2312.17663. [Google Scholar]
- Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. arXiv 2023, arXiv:2301.10051. [Google Scholar]