1. Introduction
Accurate detection of red raspberry maturity is crucial for timely harvesting and for ensuring quality in agriculture. Traditional methods, which rely on features such as color histograms and shape descriptors, suffer from low robustness and accuracy in complex environments with lighting variations, background interference, and occlusion [1]. In recent years, deep learning, particularly convolutional neural networks (CNNs), has dramatically improved maturity detection, enhancing both accuracy and stability [2]. Early research focused on image processing and computer vision techniques. For instance, Kienzle et al. (2012) utilized principal component and cluster analysis to analyze mango maturity [1]. Mohammadi et al. (2015) attained 0.9024 accuracy in persimmon maturity detection with LDA and QDA classifiers [2]. Khoshnam et al. (2016) employed acoustic testing to assess melon maturity, demonstrating that frequency changes indicate maturation [3]. Namdari Gharaghani et al. (2020) utilized finite element modal analysis to detect orange maturity, achieving over 0.91 consistency with experimental data [4]. Wakchaure et al. (2024) developed an image-processing-based prototype for plantain maturity detection, automating manual classification [5], while Kumar Saha et al. (2024) estimated tomato maturity using dual-wavelength LiDAR data, achieving spatially resolved maturity classification [6].
The advent of deep learning, most notably CNNs, has precipitated a paradigm shift in fruit maturity detection. A comprehensive review of fruit classification, maturity detection, and grading methodologies was conducted by Reshm and Sreekumar (2018) [7]. Subsequently, Surya Kiran and Niranjana (2019) provided a synopsis of advances in various maturity detection technologies [8]. Pardede et al. (2021) enhanced the efficiency of maturity classification by integrating VGG16 transfer learning with multi-layer perceptron (MLP) blocks [9]. Momeny et al. (2022) integrated a deep CNN with Bayesian optimization to enhance the robustness of detecting black spot disease and maturity in oranges [10]. Chen et al. (2022) proposed a method combining visual saliency maps and CNNs for citrus fruit maturity detection [11]. Azadnia et al. (2023) used the Inception-V3 model with machine vision and deep learning to detect hawthorn fruit maturity with high precision [12]. Olisah et al. (2024) addressed the challenge of inconspicuous features in blackberry maturity detection by employing a multi-input CNN ensemble and optimizing the VGG16 model to enhance accuracy [13]. In a related study, Astuti et al. (2019) applied the K-nearest neighbor algorithm and image acquisition for oil palm fruit maturity detection [14]. Zhao and Chen (2021) utilized color information and an SVM model for wolfberry maturity detection [15]. Similarly, Kumar et al. (2022) developed a non-destructive tomato maturity model using reflectance data and chemometric analysis [16].
In recent years, YOLO (You Only Look Once) models have become increasingly prevalent in fruit maturity detection, owing to their efficiency and accuracy. Bonora et al. (2021) employed a YOLO-based CNN to detect maturity and physiological disorders in “Abbé Fétel” pears, yielding favorable classification results [17]. Li et al. (2022) developed a real-time sweet cherry maturity detection algorithm using YOLOX, improving accuracy in complex environments [18]. Xiao et al. (2023) introduced a lightweight method for blueberry maturity detection based on an enhanced YOLOv5n algorithm, incorporating ShuffleNet and CBAM modules for better feature fusion and high recall [19]. Xu et al. (2023) proposed the YOLO Jujube method for detecting jujube fruit maturity in natural environments [20]. In addition, Xia Hongmei et al. (2021) utilized a Faster R-CNN model with an attention mechanism and multi-scale fusion to detect hydroponically grown broccoli buds with 0.965 accuracy [21]. Xingxu Li et al. (2023) designed a cascaded visual inspection system that improved cherry tomato picking efficiency through target detection and feature discrimination [22]. Fengjun Chen et al. (2024) enhanced YOLOv7 to address occlusion in oil tea trees, boosting mAP to 0.946 [23]. Similarly, Ligang Wu et al. (2024) proposed YOLOv8-ABW, which integrates AIFI and BiFPN to improve the efficiency of yellow flower maturity detection [24]. Xu Tingting et al. (2024) introduced YOLOv7-RA, which combines ELAN_R3 and hybrid attention mechanisms to detect dragon fruit maturity [25]. Youwen Tian et al. (2024) optimized blueberry maturity detection with the MSC-YOLOv8 model, incorporating MobileNetV3 and CBAM modules [26]. Xuesong Jiang et al. (2024) reviewed deep learning advances in non-destructive forest fruit quality detection [27]. Liu Zhigang et al. developed a machine vision-based method for apple maturity detection using an RGB model with high accuracy [28]. Runchi Zhang et al. (2024) enhanced YOLOv8 for tomato counting, achieving 0.938 accuracy [29]. Li Ying et al. (2024) proposed a YOLOv8s model for citrus fruit maturity, improving mAP with an adaptive fusion head [30]. Using multi-scale feature fusion, Liu et al. (2024) optimized YOLOv5ns for apple maturity detection [31]. Sun et al. (2024) introduced a lightweight YOLO-FHLD method for date maturity detection, improving accuracy and model expressiveness [32]. Jing et al. (2024) proposed YOLO-IA for peaches, achieving high-precision detection with a progressive feature pyramid network [33]. Zhu et al. (2024) used YOLO-LM with Criss-Cross Attention to mitigate camellia fruit occlusion [34]. Ye et al. (2024) introduced CR-YOLOv9 for strawberry maturity detection, optimizing the network design for efficiency [35]. Zhai et al. (2024) proposed an attention mechanism and bidirectional feature pyramid network for blueberry maturity detection, achieving 0.888 accuracy and 0.882 recall [36].
The studies above demonstrate the efficacy of deep learning models in recognizing the ripeness of various fruits. Red raspberries, however, grow in widely spaced clusters in both greenhouse and outdoor field environments. In greenhouses, the light is more uniform and there is less shading between fruits, but the enclosed environment and higher humidity mean that water vapor or haze may compromise the clarity of the fruits and branches. Outdoor fields, by contrast, are subject to variations in light, wind, and temperature, and red raspberries frequently experience shading and overlap, which hinders accurate detection. Moreover, while deep learning models offer robust detection capability, their practical application is constrained by the computational capacity of field devices, which prevents optimal real-time detection. This study proposes an enhanced lightweight YOLOv11n model for red raspberry ripeness detection that addresses these limitations. The model’s primary strengths are its high accuracy, robustness, and computational efficiency. The specific strategies employed to achieve these objectives are as follows:
(1) The HCSA attention mechanism: the HCSA module proposed in this study combines halo attention, channel attention, and spatial attention, enhancing the model’s capacity to extract and represent features; it shows particular advantages in capturing spatial information and fine detail (a minimal sketch of this module follows the list).
(2) The dilation-wise residual module (DWR): this module strengthens feature extraction and multi-scale perception accuracy while improving computational efficiency, and its residual connections optimize the learning process to enhance both accuracy and real-time performance.
(3) The lightweight dynamic upsampling module (DySample): integrated into the network’s backbone and neck, this module enhances the extraction of multi-scale feature maps, reduces background noise through optimized spatial resolution, and improves sensitivity to fine detail changes.
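To make the HCSA design concrete, the PyTorch sketch below shows one plausible composition of the three attention types. The study does not publish this code, so the block size, halo width, reduction ratio, and composition order are our own illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of an HCSA-style block (hedged): halo attention followed by
# channel and spatial gates. All hyperparameters here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):  # SE/CBAM-style channel gate
    def __init__(self, c, r=8):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(inplace=True),
                                nn.Linear(c // r, c))
    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))) + self.fc(x.amax(dim=(2, 3)))
        return x * torch.sigmoid(w).view(b, c, 1, 1)

class SpatialAttention(nn.Module):  # CBAM-style spatial gate
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)
    def forward(self, x):
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.conv(s))

class HaloAttention(nn.Module):
    """Each non-overlapping block x block query window attends to a key/value
    window enlarged by `halo` pixels on every side (HaloNet-style)."""
    def __init__(self, c, block=4, halo=1):
        super().__init__()
        self.blk, self.halo, self.scale = block, halo, c ** -0.5
        self.q, self.kv = nn.Conv2d(c, c, 1), nn.Conv2d(c, 2 * c, 1)
    def forward(self, x):
        b, c, h, w = x.shape
        blk, win = self.blk, self.blk + 2 * self.halo
        assert h % blk == 0 and w % blk == 0, "pad input to a block multiple"
        q = F.unfold(self.q(x), blk, stride=blk)             # (b, c*blk^2, n)
        k, v = self.kv(x).chunk(2, dim=1)
        k = F.unfold(k, win, stride=blk, padding=self.halo)  # (b, c*win^2, n)
        v = F.unfold(v, win, stride=blk, padding=self.halo)
        n = q.shape[-1]
        q = q.view(b, c, -1, n).permute(0, 3, 2, 1)          # (b, n, blk^2, c)
        k = k.view(b, c, -1, n).permute(0, 3, 2, 1)
        v = v.view(b, c, -1, n).permute(0, 3, 2, 1)
        att = torch.softmax(q @ k.transpose(-1, -2) * self.scale, dim=-1)
        out = (att @ v).permute(0, 3, 2, 1).reshape(b, c * blk * blk, n)
        return F.fold(out, (h, w), blk, stride=blk)          # stitch blocks back

class HCSA(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.halo = HaloAttention(c)
        self.ca, self.sa = ChannelAttention(c), SpatialAttention()
    def forward(self, x):
        return self.sa(self.ca(x + self.halo(x)))  # residual halo branch, then gates
```

In a YOLO-style backbone such a block would typically wrap or replace an existing stage; it is shown stand-alone here only to illustrate the attention composition.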
3. Results and Analyses
3.1. Results and Analysis of the Ablation Experiment
In the red raspberry maturity detection task, the independent contributions and synergistic effects of the three modules (HCSA, DWR, and DySample) were evaluated through ablation experiments. To ensure the stability of the results, each model configuration was trained and evaluated in three independent runs, and the mean and standard deviation of each metric were calculated. The HCSA module enhances the ability to focus on key fruit features through halo convolution and channel-spatial attention, especially for feature extraction against complex backgrounds.
Table 1 shows the results of the ablation experiments for the different module configurations. Averaged over three independent runs, the HCSA module used alone achieves a precision of 0.935 and a mAP@0.5 of 0.928, with small run-to-run variation (standard deviation < 0.01), demonstrating the module’s stability and efficiency. Its F1-score of 0.883 indicates balanced performance in identifying unripe and ripe red raspberries. The HCSA module also strengthens the response in key regions of overlapping fruit, verifying its performance in complex backgrounds.
The DWR module does not match the DySample module’s proficiency in small-target detection (mAP@0.5:0.95 = 0.762), but its multi-scale dilated convolutions reduce computation while preserving a high mAP@0.5 (0.905), again with small run-to-run variation (standard deviation < 0.02). Its strength lies in optimizing the global feature distribution, which yields substantial gains when detecting fruits of diverse sizes.
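As a concrete illustration of this design, the sketch below implements a DWR-style block in PyTorch: parallel depthwise 3x3 convolutions with different dilation rates are fused by a pointwise convolution and wrapped in a residual connection. The dilation rates (1, 3, 5) and the depthwise/pointwise split are illustrative assumptions, not the paper's exact layout.

```python
# A minimal sketch of a dilation-wise residual (DWR) style block.
import torch
import torch.nn as nn

class DWR(nn.Module):
    def __init__(self, c, dilations=(1, 3, 5)):
        super().__init__()
        self.pre = nn.Sequential(nn.Conv2d(c, c, 3, padding=1, bias=False),
                                 nn.BatchNorm2d(c), nn.ReLU(inplace=True))
        # one depthwise 3x3 branch per dilation rate -> multi-scale receptive fields
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=d, dilation=d, groups=c, bias=False)
            for d in dilations)
        self.fuse = nn.Sequential(nn.Conv2d(c * len(dilations), c, 1, bias=False),
                                  nn.BatchNorm2d(c))
        self.act = nn.ReLU(inplace=True)
    def forward(self, x):
        y = self.pre(x)
        y = torch.cat([b(y) for b in self.branches], dim=1)
        return self.act(x + self.fuse(y))  # residual connection

x = torch.randn(1, 64, 80, 80)
print(DWR(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```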
The DySample module, with its dynamic upsampling mechanism, is particularly effective at detecting small objects, achieving a mAP@0.5:0.95 of 0.799 and a mAP@0.5 of 0.945, confirming its effectiveness in detecting ripe fruit. By generating adaptive sampling offsets, DySample increases the effective spatial resolution by a factor of 2.3, enabling the model to maintain stable performance in high-interference environments.
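The adaptive sampling-offset mechanism can be sketched as follows, loosely following the published DySample idea: a pointwise layer predicts per-pixel offsets that perturb a regular upsampling grid before resampling with grid_sample. The 2x factor, zero initialization, and 0.25 offset scale are illustrative choices, not necessarily this study's configuration.

```python
# A minimal sketch of DySample-style dynamic upsampling (hedged).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DySample(nn.Module):
    def __init__(self, c, scale=2):
        super().__init__()
        self.scale = scale
        # predicts (x, y) offsets for each of the scale*scale sub-positions
        self.offset = nn.Conv2d(c, 2 * scale * scale, 1)
        nn.init.zeros_(self.offset.weight)
        nn.init.zeros_(self.offset.bias)  # start from plain bilinear-like sampling
    def forward(self, x):
        b, c, h, w = x.shape
        s = self.scale
        off = self.offset(x) * 0.25           # small learned offsets
        off = F.pixel_shuffle(off, s)         # (b, 2, h*s, w*s)
        # regular sampling grid in normalized [-1, 1] coordinates
        ys = torch.linspace(-1, 1, h * s, device=x.device)
        xs = torch.linspace(-1, 1, w * s, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1).expand(b, -1, -1, -1)
        # add offsets (scaled to feature-map size) and resample the input
        off = off.permute(0, 2, 3, 1) / torch.tensor([w, h], device=x.device)
        return F.grid_sample(x, grid + off, align_corners=True)
```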
In further combination analysis, pairing HCSA with DWR raises mAP@0.5 to 0.931, 0.3 percentage points above HCSA alone, showing good complementarity. The full combination of the three modules (HCSA + DWR + DySample) achieves the best overall performance, with a precision of 0.922, mAP@0.5 of 0.934, mAP@0.5:0.95 of 0.798, and F1-score of 0.890, demonstrating the synergistic effect between the modules.
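As a quick arithmetic check on these combined-model figures, the recall implied by the reported precision and F1-score follows from the definition F1 = 2PR/(P + R); the snippet below solves for R:

```python
# Recall implied by the reported precision and F1-score of the full model
# (HCSA + DWR + DySample): F1 = 2PR / (P + R)  =>  R = F1 * P / (2P - F1).
P, F1 = 0.922, 0.890
R = F1 * P / (2 * P - F1)
print(f"implied recall = {R:.3f}")  # -> 0.860
```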
To verify the stability and statistical significance of these results, we performed t-tests on the results of each module configuration. The performance differences between configurations are statistically significant (p < 0.05), confirming that the module combination meaningfully improves model performance. The combined architecture forms a complete closed loop for feature processing: HCSA provides high-confidence region localization, DWR preserves the integrity of feature transmission, and DySample completes fine-grained reconstruction. In orchard environments with complicating factors such as leaf occlusion and reflective interference, the model’s overall detection accuracy (mAP@0.5 > 0.92) meets the technical requirements of commercial harvesting systems, and its modular design shows cross-species generalization potential, providing an extensible technical framework for agricultural visual inspection.
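A minimal sketch of this mean/standard-deviation and significance analysis is shown below. The three run values per configuration are placeholders, not the study's raw measurements, and Welch's variant is one reasonable choice when variances may differ.

```python
# Illustrative three-run comparison of one metric between two configurations.
from statistics import mean, stdev
from scipy import stats

baseline = [0.905, 0.908, 0.903]  # e.g. YOLOv11n mAP@0.5, three runs (hypothetical)
improved = [0.933, 0.935, 0.934]  # e.g. full model mAP@0.5, three runs (hypothetical)

print(f"baseline: {mean(baseline):.3f} +/- {stdev(baseline):.3f}")
print(f"improved: {mean(improved):.3f} +/- {stdev(improved):.3f}")
t, p = stats.ttest_ind(improved, baseline, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.4f}")  # p < 0.05 -> difference is significant
```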
3.2. Comparative Experimental Results and Visual Analysis
This study conducted a multi-dimensional performance evaluation of YOLOv3n, YOLOv5n, YOLOv6n, YOLOv9c, YOLOv10n, YOLOv11n, and our proposed improved model. As shown in Table 2 and Figure 7, a visual comparative analysis reveals significant disparities among the models when confronted with complex scenarios such as overlapping fruits, leaf occlusion, background interference, and uneven lighting.
The YOLOv3n model demonstrated consistent performance in fundamental scenarios and effectively handled scenes with sparse object distributions and minimal background interference, achieving a mAP@0.5 of 0.902 and an F1-score of 0.899. However, when confronted with more complex backgrounds or extremely uneven lighting, its capacity to identify occluded objects diminished considerably, suggesting that its spatial-context modeling in complex environments is constrained and that it struggles to capture object features effectively. YOLOv5n balances speed and accuracy through an enhanced feature pyramid structure; compared with YOLOv3n, it offers substantially faster inference, with mAP@0.5 increasing to 0.908. The model is notably robust in scenes with moderate to heavy leaf occlusion, maintaining a consistent F1-score. However, its detection confidence fluctuates substantially in scenes with significant lighting variation, suggesting that its sensitivity to lighting changes could be further optimized.
In the benchmark test, YOLOv6n demonstrated suboptimal performance, particularly in scenarios characterized by dense background interference and complex lighting conditions, as evidenced by a mAP@0.5 of 0.881, lower than that of YOLOv5n. Visual analysis revealed that YOLOv6n experienced a semantic information loss during the high-level feature fusion stage. This led to an inadequate response to small occlusions and edge targets, consequently affecting its detection performance in complex environments.
Conversely, YOLOv9c demonstrated notable efficacy in high-density fruit-overlap scenes, with an accuracy of 0.913 and a mAP@0.5 of 0.892. However, its recall was suboptimal, with an F1-score of 0.810, primarily because its high confidence threshold suppressed false positives at the cost of a substantially higher missed-detection rate for small objects; when the visible area of the fruit is small, the model’s detection ability is limited. YOLOv10n enhances its adaptability to complex environments through a sparse dynamic convolution strategy, maintaining a stable mAP@0.5 (0.908) in challenging scenarios. However, under drastic lighting conditions its color-space modeling proves insufficient, causing the F1-score to decline and highlighting the model’s weakness in such complex lighting scenarios. YOLOv11n exhibits a balanced overall performance, with a mAP@0.5 of 0.907, though its performance in complex scenes is marginally inferior to that of the other models. The experiments indicate that its confidence calibration mechanism is biased in extreme scenes, leading to inadequate detection stability.
In contrast, our enhanced model, which incorporates the HCSA attention mechanism, DWR multi-scale feature fusion, and DySample dynamic sampling, maintains a computational cost of 8.2 GFLOPs while raising mAP@0.5 to 0.934, a 2.9% improvement over YOLOv11n. The model shows notable advantages in complex scenes with dense occlusion and uneven lighting. The visualized results verify that the improved model accurately locates overlapping targets and suppresses background noise, confirming the soundness of the module design.
The dynamic curves in Figure 8 demonstrate that the enhanced model trains stably and efficiently, unlike the other models, which show variability or suboptimal performance in complex scenarios. These findings indicate that the model attains higher detection accuracy and better adaptability to complex scenes through optimized feature extraction, multi-scale feature fusion, and a lightweight design, providing a technically effective method for intelligent picking in red raspberry agriculture.
To evaluate the model improvement scheme more systematically, we use Gradient-weighted Class Activation Mapping (Grad-CAM) to visually compare the original YOLOv11n model with the improved YOLOv11n. As shown in Figure 9, in overlapping-fruit scenes the improved YOLOv11n accurately distinguishes and identifies overlapping fruits, and the heat map shows more precise target localization without blurred or misidentified targets. Under leaf occlusion, the improved YOLOv11n still detects occluded fruits through optimized feature extraction and multi-scale feature fusion, and the heat map shows a wider region of interest that covers partially occluded targets. Against background interference, the improved YOLOv11n effectively suppresses background noise, and the heat map shows a clear separation of fruit from background, enhancing detection accuracy in complex background scenes. Under uneven lighting, the improved YOLOv11n identifies fruit stably, and the heat map shows a more even distribution of attention, indicating that the model adapts better to lighting changes.
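For reference, a compact Grad-CAM routine of the kind used to produce such heat maps might look as follows. The layer choice and the score function are placeholders, since the study applies the technique to internal YOLOv11n feature maps whose exact handles are not given here.

```python
# A minimal Grad-CAM sketch: hooks capture activations and gradients on a
# chosen convolutional layer, and the gradient-weighted activation map is
# upsampled to image size.
import torch
import torch.nn.functional as F

def grad_cam(model, layer, image, score_fn):
    feats, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        score = score_fn(model(image))  # e.g. top detection confidence (scalar)
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()
    a, g = feats[0], grads[0]                        # (1, C, H, W) each
    weights = g.mean(dim=(2, 3), keepdim=True)       # GAP over the gradients
    cam = F.relu((weights * a).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```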
Confusion matrices illustrate a model’s classification performance, particularly across multiple categories, making misclassifications and missed detections easy to visualize, as shown in Figure 10. In the red raspberry ripeness detection task, the enhanced model shows clear advantages in identifying both unripe and ripe fruits. For unripe fruits, it correctly identifies 407 samples, versus 359 for the YOLOv11n model, performing more robustly in complex scenarios (e.g., overlapping fruits and background interference). This gain stems primarily from the optimized feature extraction and spatial information processing, which capture the subtle characteristics of unripe fruits more precisely. For ripe fruits, the enhanced model correctly identifies 291 samples against 274 for YOLOv11n, indicating a superior capacity for localizing and extracting ripe-fruit features, particularly color gradients and contour changes. In misclassification control, the enhanced model records 11 cases of unripe fruits mislabeled as ripe, slightly more than the eight recorded by YOLOv11n; however, while YOLOv11n holds a minor edge on the background category, the enhanced model substantially reduces confusion between background and fruit through its optimized background-noise suppression, raising overall detection accuracy. Overall, the enhanced model is more robust and accurate in the red raspberry ripeness detection task, distinguishes fruits of varying maturity levels effectively in complex orchard environments, and, by reducing the misclassification rate, provides more reliable technical support for automated picking systems in real-world applications.
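For completeness, per-class precision and recall are read off a confusion matrix as shown below. Only the diagonal counts (407 and 291) and the 11 unripe-to-ripe errors are quoted in the text, so the remaining entries in this 3x3 example are purely hypothetical placeholders.

```python
# Per-class metrics from a confusion matrix; rows = ground truth,
# cols = prediction, order: [unripe, ripe, background].
import numpy as np

# 407, 11 and 291 come from the text; every other entry is a made-up placeholder.
cm = np.array([[407,  11,  20],
               [  7, 291,  15],
               [ 14,  10,   0]])

tp = np.diag(cm).astype(float)
recall = tp / cm.sum(axis=1).clip(min=1)     # row-wise: of all true class i
precision = tp / cm.sum(axis=0).clip(min=1)  # column-wise: of all predicted i
for name, p, r in zip(["unripe", "ripe", "background"], precision, recall):
    print(f"{name:10s} precision={p:.3f} recall={r:.3f}")
```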
4. Discussion
The enhanced YOLOv11n model, as outlined in this study, exhibited notable advantages in the red raspberry ripeness detection task. However, several aspects necessitate further in-depth discussion.
4.1. Module Synergy and Performance Balance
The proposed detection framework achieves substantial gains on several pivotal indicators, particularly in extreme test scenarios. While maintaining a precision of 0.922 and a mAP@0.5 of 0.934, it improves on the leading comparative model by 1.5% and 2.9%, respectively. Incorporating the HCSA attention mechanism, the dilation-wise residual module, and the dynamic upsampling module enhances performance without a substantial decline in F1-score (0.890) while reducing the false-detection rate in complex scenarios. The visualized heat maps illustrate the framework’s ability to identify key morphological characteristics of red raspberries, including color-gradient areas and changes in contour curvature. Notably, even under substantial leaf occlusion, the model maintains a recognition confidence above 0.87, demonstrating a substantial gain in detection accuracy and robustness within complex environments. Prior studies that optimized YOLO models also deserve acknowledgment here; for instance, Xiao et al. (2023) enhanced YOLOv5n by integrating ShuffleNet and CBAM modules, substantially improving feature fusion for blueberry detection [19].
Nevertheless, the model still encounters difficulties in multi-scale feature fusion in particularly complex scenarios. Wu et al. (2024) integrated AIFI and BiFPN modules into YOLOv8 to improve yellow flower maturity detection, showing superiority in scenes with small occluded objects [24]; even so, YOLO models remain unstable in complex environments involving fruit occlusion and lighting changes.
To address these issues, the HCSA attention mechanism proposed in this study significantly improves detection in complex scenes by fusing halo attention, channel attention, and spatial attention, showing greater robustness especially under overlapping fruits and large lighting changes. The DySample module addresses the limitations of the YOLOv9c model in occluded scenes by dynamically adjusting the upsampling strategy, enhancing detection accuracy in complex environments. In summary, the collaborative integration of the HCSA, DWR, and DySample modules enhances the model’s adaptability to complex scenes, though each has trade-offs: HCSA sharpens the focus on key fruit features (e.g., color gradients and edge contours), but its computational complexity may affect real-time performance on low-power devices; DWR optimizes feature-extraction efficiency through multi-scale dilated convolutions, but still captures insufficient detail on small objects; and DySample effectively reduces background noise, but its reliance on high-resolution inputs may incur preprocessing costs.
4.2. Environmental Generalization and Robustness
Compared with the existing YOLOv5n and YOLOv9c models, the enhanced model achieves a better balance between accuracy and computational efficiency. YOLOv9c exhibits a high miss-detection rate for small objects due to its elevated confidence threshold; by incorporating dynamic sampling, our model reduces the miss-detection rate to 3.2%. Nevertheless, misdetections persist in complex environments, such as mistaking background for fruit, which could impact the operational efficiency of automated picking robots. To further mitigate such misjudgments, future research could explore near-infrared spectroscopy or depth information to assist classification.
The challenges of fruit occlusion and lighting changes are also faced by Chen et al. (2024) [23] and Zhu et al. (2024) [34] in their optimization of YOLOv7/YOLO-LM for camellia fruit detection. While existing methods have advanced occlusion handling, most still rely on static attention mechanisms, which limits their adaptability to lighting changes. The hybrid attention mechanism (HCSA) proposed in this study dynamically adjusts spatial and channel features, significantly improving the model’s robustness and adaptability in complex, unevenly lit environments. Together with the dynamic sampling technique, these innovations enhance the model’s adaptability and robustness and provide stronger support for balancing accuracy and efficiency in future practical applications.
4.3. Practical Application Challenges
Despite the model’s satisfactory performance in greenhouse and field environments, its stability under sudden changes in light (e.g., alternating intense light and shade) requires further verification. Although the experiments covered diverse weather conditions, tests under extreme weather (e.g., rain and fog) are absent, so the model’s generalization in these specific environments remains to be thoroughly evaluated. Light variations may alter the apparent characteristics of the fruit and thereby reduce detection accuracy, so future research should prioritize improving the model’s resilience to such conditions.
The impact of dynamic environmental factors also remains to be elucidated: wind may induce motion blur of the fruit, and leaves may partially or completely obstruct it, degrading recognition. Subsequent studies should therefore validate the model’s resilience through dynamic-scene simulations or real-world environmental testing.
Regarding real-time requirements, the model’s inference speed must match the robotic arm’s response speed; in intricate scenarios, however, the parallel processing of multiple targets may increase system latency, and the limited computing capability of edge devices could erode the benefits of the dynamic upsampling module. Future optimization should therefore improve algorithmic efficiency for low-power hardware.
Regarding cross-species generalization, the model exhibits only modest detection error on blueberries and strawberries, but these outcomes remain to be validated across diverse datasets; subsequent work could assess performance on other small berry crops through transfer learning or domain adaptation techniques. Finally, the existing dataset does not cover all stages of fruit growth (e.g., transitional maturity states), which may degrade performance at certain maturity stages; future research should collect more comprehensive data to improve the model’s adaptability.
4.4. Future Research Directions
The following research directions are recommended for future studies: dynamic weight allocation to adjust module weights according to environmental complexity; multimodal fusion, combining depth camera or LiDAR point cloud data to optimize the 3D spatial localization of fruits and reduce misclassification of overlapping fruit; and lightweight design, further compressing model parameters through knowledge distillation or Neural Architecture Search (NAS) to improve computational efficiency on edge computing devices. Despite the success of the enhanced model in this study, challenges remain in environmental dynamics and hardware adaptability during practical deployment. Future research should integrate multimodal sensing with adaptive optimization strategies to make intelligent agricultural detection more practical and pervasive.
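Of the directions above, knowledge distillation is the most directly codifiable. A minimal sketch of the standard soft-target objective is given below; the temperature T and mixing weight alpha are illustrative hyperparameters, and applying this to a detector would additionally require distilling box and objectness outputs rather than classification logits alone.

```python
# A minimal knowledge-distillation loss sketch: a compact student mimics the
# softened logits of a larger teacher while also fitting the hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T  # T^2 rescales gradient magnitude
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```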
5. Conclusions
This study proposes an enhanced lightweight network model based on YOLOv11n for detecting red raspberry fruit maturity. The model incorporates a hybrid attention mechanism (HCSA), a dilation-wise residual module (DWR), and a dynamic upsampling technique (DySample) on top of the original YOLOv11n. This study’s primary conclusions are as follows:
(1) Experimental findings demonstrate that incorporating the HCSA, DWR, and DySample modules enhances the model’s feature-extraction capacity and robustness, particularly in scenarios involving overlapping fruits, background interference, and lighting variations. On the test set the model attains a precision of 0.922, mAP@0.5 of 0.934, and mAP@0.5:0.95 of 0.798, improvements of 2.0%, 9.8%, and 3.7%, respectively, over the original YOLOv11n model, signifying an advance in both detection accuracy and adaptability.
(2) The effectiveness of each module was validated through ablation experiments, in which integrating the three modules into the original YOLOv11n yielded notable gains. The HCSA module provides high-confidence region localization, the DWR module preserves the integrity of feature transmission, and the DySample module performs fine-grained reconstruction. In orchard environments with complicating factors such as leaf occlusion and reflective interference, the model’s comprehensive detection accuracy (mAP@0.5 > 0.92) meets the technical requirements of a robotic harvesting system. Its modular design also shows cross-species generalization potential, providing an extensible technical framework for agricultural visual inspection.
(3) Comparative experiments demonstrated that the enhanced YOLOv11n model outperforms mainstream models such as YOLOv3n, YOLOv5n, YOLOv6n, YOLOv9c, and YOLOv10n in red raspberry maturity detection, with improvements of 3.2%, 2.6%, and 9.3% in mAP@0.5 over these models and substantial gains in mAP@0.5:0.95. Integrating the multi-scale attention mechanism, the dilation-wise residual module, and the dynamic sampling technique not only enhances detection accuracy but also significantly reduces the model’s computational complexity, underscoring its substantial practical application potential.
In summary, the improved YOLOv11n model proposed in this study provides an efficient and lightweight solution for red raspberry maturity detection, especially for precision-agriculture detection in complex environments. The model improves detection accuracy and real-time performance, providing new technical support for the development of intelligent picking by agricultural robots.