Article

Ripe-Detection: A Lightweight Method for Strawberry Ripeness Detection

by Helong Yu, Cheng Qian, Zhenyang Chen, Jing Chen and Yuxin Zhao
1 College of Information Technology, Jilin Agricultural University, Changchun 130118, China
2 Smart Agriculture Research Institute, Jilin Agricultural University, Changchun 130118, China
3 School of Municipal and Environmental Engineering, Jilin Jianzhu University, Changchun 130000, China
* Authors to whom correspondence should be addressed.
Agronomy 2025, 15(7), 1645; https://doi.org/10.3390/agronomy15071645
Submission received: 2 June 2025 / Revised: 3 July 2025 / Accepted: 5 July 2025 / Published: 6 July 2025
(This article belongs to the Section Precision and Digital Agriculture)

Abstract

Strawberry (Fragaria × ananassa), a nutrient-dense fruit with significant economic value in commercial cultivation, faces critical detection challenges in automated harvesting due to complex growth conditions such as foliage occlusion and variable illumination. To address these limitations, this study proposes Ripe-Detection, a novel lightweight object detection framework integrating three key innovations: a PEDblock detection head architecture with depth-adaptive feature learning capability, an ADown downsampling method for enhanced detail perception with reduced computational overhead, and BiFPN-based hierarchical feature fusion with learnable weighting mechanisms. Developed using a purpose-built dataset of 1021 annotated strawberry images (Fragaria × ananassa ‘Red Face’ and ‘Sachinoka’ varieties) from Changchun Xiaohongmao Plantation and augmented through targeted strategies to enhance model robustness, the framework demonstrates superior performance over existing lightweight detectors, achieving mAP50 improvements of 13.0%, 9.2%, and 3.9% against YOLOv7-tiny, YOLOv10n, and YOLOv11n, respectively. Remarkably, the architecture attains 96.4% mAP50 with only 1.3M parameters (57% reduction from baseline) and 4.4 GFLOPs (46% lower computation), simultaneously enhancing accuracy while significantly reducing resource requirements, thereby providing a robust technical foundation for automated ripeness assessment and precision harvesting in agricultural robotics.

1. Introduction

Strawberry (Fragaria × ananassa) is a perennial herbaceous plant of the genus Fragaria in the Rosaceae family. It is a crop rich in nutritional value and of high economic importance [1]. Strawberries serve as a vital source of essential nutrients for human health [2] while generating substantial economic returns for agricultural practitioners. Due to their soft texture, strawberry fruits lose their firmness rapidly and are highly susceptible to fungal infections [3]. This technical constraint necessitates precision-controlled operations in automated strawberry harvesting systems [4], requiring timely picking at two defined ripeness thresholds: the approaching ripeness stage and the ripeness phase. This quality preservation mechanism ensures strawberries maintain market-ready conditions post-harvest, facilitating both immediate local distribution and extended long-distance transportation, thereby enhancing economic returns through expanded market coverage. Therefore, accurate detection of strawberry ripeness has become a crucial task [5]. Traditional methods dependent on manual observation of color characteristics or component analysis [6] are plagued by low efficiency, strong subjectivity, and high costs, failing to meet the demands of large-scale agricultural production. With the continuous increase in agricultural production costs [7], particularly rising labor expenditures, developing automated non-destructive detection technologies has become crucial for enhancing agricultural productivity [8]. The application of computer vision technology for automated strawberry ripeness detection provides an efficient and precise methodological solution that maintains fruit integrity throughout the assessment process [9].
In recent years, many researchers have conducted studies in the field of agriculture [10,11,12]. The rapid development of computer technology has led to its widespread adoption in agricultural applications [13,14,15,16], including but not limited to disease and pest detection [17,18], crop yield estimation [19], and crop classification [20]. Numerous researchers have also conducted computer vision studies on strawberries. Mi et al. developed a strawberry ripeness identification method using RGB images combined with YOLOv9, achieving an average accuracy of 87.3% [21]. Wang et al. proposed the DSE-YOLO model for detecting and classifying strawberries at different growth stages [22]. The model employs a self-designed Detail-Semantics Enhancement (DSE) module to effectively extract strawberry features across developmental phases, enabling accurate detection. Zhang et al. developed RTSD-Net, a streamlined strawberry detection framework based on YOLOv4-Tiny [23]. By optimizing the baseline architecture through structural simplification, the model achieves accelerated computational performance while maintaining detection accuracy, making it particularly suitable for real-time monitoring in strawberry cultivation environments. Li et al. developed the DAC-YOLOv4 architecture for early-stage powdery mildew detection on strawberry leaves [24]. This framework integrates depthwise convolutional layers with hybrid attention mechanisms in both the backbone and neck components, providing an effective solution for timely plant disease identification in precision agriculture systems. Wang et al. proposed a transformer-based ripeness detection framework that establishes quantitative correlations between chromatic characteristics and developmental stages in strawberries [25]. This methodology leverages self-attention mechanisms to adaptively weight color proportion features across spectral channels, enabling precise ripeness assessment through fruit surface pigmentation analysis. Yang et al. developed LS-YOLOv8s for strawberry ripeness detection by implementing three critical modifications to the YOLOv8s framework [26]: integrating learnable parameter-based scaled normalization into the Swin Transformer's residual structures, deploying LW-Swin Transformer modules in TopDown Layer2 to optimize feature fusion efficiency, and incorporating multi-head attention mechanisms to strengthen generalization. To address strawberry detection under occlusion, Du et al. proposed an enhanced YOLOv7-based framework incorporating two key technical improvements [27]: Deformable Convolution v3 (DCNv3) in the ELAN module to strengthen feature representation of occluded targets, and Shuffle Attention mechanisms in the backbone network to optimize discriminative feature learning. These modifications collectively improve detection accuracy in real-world scenarios with partial fruit occlusion. Zheng et al. developed an enhanced FaceNet architecture optimized for occlusion handling in strawberry remote-sensing imagery [28]. The framework integrates clustering algorithms with feature embedding distances generated by FaceNet to establish spatial correspondences between individual strawberry plants.
This dual-processing approach achieves precise localization and enumeration of fruits and flowers even under mutual occlusion conditions, with the model attaining 96.98%, 99.09%, and 97.17% recognition accuracy for strawberry flowers, immature fruits, and mature fruits, respectively.
Small target detection in agricultural vision systems poses greater technical challenges than large-scale object recognition. Tao et al. developed the YOLOv5s-BiCE framework by implementing three critical enhancements to the baseline YOLOv5s model for strawberry ripeness identification [29]. The methodology replaces conventional upsampling with Content-Aware ReAssembly of FEatures (CARAFE) modules to improve feature reconstruction through dynamic kernel prediction and expanded receptive fields. Combined with a BiFormer dual attention mechanism and a multi-scale feature fusion architecture, the optimized network effectively reduces computational redundancy while enhancing small target discriminability. Further improvement in classification accuracy is achieved through Focal-EIoU loss optimization, which addresses sample imbalance. Experimental validation demonstrated a 2.8% increase in mean average precision and 7.4% higher accuracy compared to the original YOLOv5s model on strawberry ripeness datasets, confirming the framework's practical value in precision agriculture. Despite this extensive body of research, existing models struggle to balance computational cost with detection accuracy. Moreover, foliage occlusion of fruits in the strawberry growing environment, dynamic changes in illumination, varying weather, and differences in image sensor quality all affect model robustness [30]. To address these technical bottlenecks, this study proposes the Ripe-Detection algorithm and trains it on diverse data to improve its detection performance and robustness.
Ripe-Detection is a lightweight single-stage detection framework tailored for strawberry ripeness grading. Trained on an enhanced dataset constructed by high-fidelity simulation of the actual growing environment, the method demonstrates an excellent balance between computational efficiency (4.4 GFLOPs) and recognition accuracy (96.4% mAP50), providing reliable technical support for precision harvesting by agricultural robots.
We can summarize the main contributions of this study as follows:
(1) We propose a lightweight single-stage object detection algorithm (Ripe-Detection) for strawberry ripeness detection.
(2) We propose a new detection head structure named PEDblock and combine it with the ADown and BiFPN modules to enhance the detection capability of the model.
(3) We produce and use an enhanced dataset, constructed by high-fidelity simulation of the actual growing environment, to improve the detection performance and robustness of the model.

2. Materials and Methods

2.1. Image Dataset Acquisition

Strawberries are recognized for their rich nutritional value [31] and show great economic potential in agricultural production, especially the Fragaria × ananassa ‘Red Face’ and ‘Sachinoka’ varieties. The experimental materials for this study were collected in April 2024 from the Xiaohongmao strawberry plantation in Shuangyang District, Changchun City (125°34′2.86″ E, 43°43′3.94″ N), covering both varieties under greenhouse cultivation. Using images of two varieties enriches the diversity of features. Under the guidance of agricultural experts, the samples were classified into three phenological stages—ripe, nearly ripe, and unripe [21]—a classification designed to account for post-harvest physiological ripening during transport, thereby preventing economic losses due to over-ripening [32]. High-resolution images (3648 × 2736) were captured from multiple directions with an iPhone 12 camera (Apple Inc., Cupertino, CA, USA) at a distance of 5–30 cm from the target to ensure comprehensive feature representation and optimize model training [33].

2.2. Dataset Creation and Preprocessing

Using data augmentation to simulate more complex real-world scenarios reduces the model’s dependence on specific features and thereby improves its robustness and generalization. To this end, we carefully annotated 1021 images into three categories—unripe, nearly ripe, and ripe—represented in training by the labels “unripe”, “half-ripe”, and “ripe”, respectively. We divided the dataset into training, validation, and test sets at a ratio of 6:2:2. We then applied fog, brightness change, motion blur, and noise addition to expand the dataset to 5105 images, increasing the number of labels to 14,910. The distribution of images across enhancement methods and the label counts within the dataset are detailed in Table 1 and Table 2.
Figure 1 shows a subset of the dataset under five conditions: no enhancement, brightness change, motion blur, fog, and added noise, covering strawberry images with fruits at different ripeness levels, densities, and degrees of occlusion. Noise was added to simulate the increased imaging noise caused by camera shake or weak ambient light, helping to test the robustness of the model on low-quality images. The blur effect reflects the image distortion caused by motion or inaccurate focus and evaluates the model’s ability to recognize blurred objects. Fogging simulates foggy days or scenes with poor air quality to examine model performance when visual information is degraded. The brightness changes simulate bright-light and overcast environments, respectively, to test detection under different lighting conditions. Training on the enhanced dataset improves the robustness of strawberry ripeness detection across these situations and ensures more reliable performance in practical applications, as sketched below.
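To make the augmentation strategy concrete, the following is a minimal sketch of the four enhancement operations described above, using OpenCV and NumPy; the kernel sizes, blend factors, and noise level are illustrative assumptions, not the exact settings used to build the dataset.

```python
# Illustrative sketch of the four augmentations; parameter values are assumptions.
import cv2
import numpy as np

def adjust_brightness(img: np.ndarray, factor: float = 1.4) -> np.ndarray:
    """Simulate bright or overcast light by scaling pixel intensities."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def motion_blur(img: np.ndarray, ksize: int = 9) -> np.ndarray:
    """Simulate camera motion with a horizontal averaging kernel."""
    kernel = np.zeros((ksize, ksize), np.float32)
    kernel[ksize // 2, :] = 1.0 / ksize
    return cv2.filter2D(img, -1, kernel)

def add_fog(img: np.ndarray, strength: float = 0.5) -> np.ndarray:
    """Blend the image toward white to mimic haze or fog."""
    fog = np.full_like(img, 255)
    return cv2.addWeighted(img, 1.0 - strength, fog, strength, 0)

def add_gaussian_noise(img: np.ndarray, sigma: float = 15.0) -> np.ndarray:
    """Simulate sensor noise under camera shake or weak ambient light."""
    noise = np.random.normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```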

2.3. Model Construction

2.3.1. Ripe-Detection

The YOLO (You Only Look Once) family of algorithms has demonstrated superior detection accuracy and computational efficiency [34]. Building on these advantages, this study develops the ripeness detection model by systematically improving the YOLOv8n architecture. As shown in Figure 2, the model is enhanced with strategic modifications for feature preservation: the ADown downsampling method replaces the traditional downsampling in the feature extraction phase, reducing the number of parameters while preserving richer local feature information. This improves detail capture for small, low-contrast strawberry targets while optimizing model efficiency. Replacing the original neck with a repeated bi-directional weighted feature pyramid network further refines feature integration, enhancing multi-scale feature fusion while maintaining computational economy. To address the degradation of ripeness differentiation accuracy caused by size variation, a novel PEDblock mechanism is proposed to maintain anchor accuracy during dimension reduction. Experimental validation confirms that the model improves detection accuracy and operational robustness, is particularly effective in complex agricultural environments, and successfully balances accurate strawberry ripeness assessment with real-time processing.

2.3.2. ADown Module

Strawberries at the green ripening stage exhibit unique phenotypic characteristics characterized by chlorophyll-based surface pigmentation, dense cellular structure, and small fruit size. In cultivated environments, the high similarity between these immature fruits and the surrounding plant components (stems and leaves) complicates visual target discrimination. The ADown downsampling module alleviates this challenge by synergistically integrating convolutional operations with pooling through a two-branch architecture. This hybrid design achieves a balanced optimization between feature preservation and computational efficiency, performing resolution reduction while maintaining critical texture details required for small target recognition. Experimental results validate the module’s ability to enhance detection robustness in agricultural scenes with complex background interference.
The ADown module consists of a layered architecture combining pooling and convolutional layers [35], whose operational flow is detailed in Figure 3. The architecture first applies average pooling to the input feature maps, preserving global contextual information through spatial aggregation. The pooled feature map is then divided into two parallel processing streams along the channel dimension. The first stream passes its features directly through a convolutional layer to achieve dimensionality reduction while maintaining the integrity of the main features. The second stream applies successive max-pooling and convolution operations, where max-pooling extracts locally salient features by identifying regional activation maxima, followed by convolutional filtering for fine-grained feature abstraction. By processing these dual pathways independently, the module captures both macro-scale structural patterns and micro-scale discriminative details. The final stage concatenates the complementary feature representations of the two streams to synthesize multi-scale contextual information. This dual-stream mechanism effectively mitigates information attenuation during resolution reduction and especially improves feature discrimination for small-scale strawberry targets embedded in complex agricultural contexts.
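A PyTorch sketch of the two-branch ADown block described above, following the YOLOv9 design [35]: average pooling, channel split, a strided 3 × 3 convolution branch and a max-pool plus 1 × 1 convolution branch, then concatenation. The normalization and activation choices here are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBNAct(nn.Module):
    """Conv + BatchNorm + SiLU, the usual YOLO-style building block."""
    def __init__(self, c_in, c_out, k, s, p):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, p, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ADown(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.c = c_out // 2
        self.cv1 = ConvBNAct(c_in // 2, self.c, 3, 2, 1)  # strided conv branch
        self.cv2 = ConvBNAct(c_in // 2, self.c, 1, 1, 0)  # conv after max-pool

    def forward(self, x):
        # Average pooling first preserves global context before the split.
        x = F.avg_pool2d(x, kernel_size=2, stride=1, padding=0)
        x1, x2 = x.chunk(2, dim=1)                     # split along channels
        x1 = self.cv1(x1)                              # branch 1: strided conv
        x2 = F.max_pool2d(x2, 3, stride=2, padding=1)  # branch 2: local maxima
        x2 = self.cv2(x2)
        return torch.cat((x1, x2), dim=1)              # re-integrate streams
```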

2.3.3. BiFPN Module

Multiple downsampling operations lead to a stepwise reduction in feature map size, and single-layer convolutional architectures have inherent limitations in multi-scale feature extraction, especially in agricultural inspection scenarios that require fine ripeness grading. As shown in Figure 4a, the Path Aggregation Network (PANet) improves the feature pyramid network [36] through bottom-up path augmentation, which partially mitigates the problem of insufficient cross-scale feature fusion in FPNs, but the expansion of the network structure imposes a significant computational burden. In contrast, repeat-weighted bi-directional feature pyramid networks (BiFPNs) achieve topology optimization by pruning single-input edge nodes; such nodes exhibit limited feature fusion utility due to the lack of multi-source information integration [37]. Meanwhile, BiFPNs establish shortcut connections between input and output nodes at the same hierarchical level to enhance feature blending efficiency at a lower computational cost. Traditional feature pyramid methods usually adopt a homogenized weighting strategy in feature fusion, ignoring the differences in feature contributions of heterogeneous scale inputs. BiFPNs effectively overcome this shortcoming by dynamically calibrating feature importance through learnable channel attention weights [38]. The sixth-level fusion mechanism shown in Figure 4b presents this process specifically: two heterogeneous feature representations are integrated after adaptive weighting to ensure the accurate retention of key ripeness-related visual patterns in strawberry detection tasks.
$$P_6^{td} = \mathrm{Conv}\left(\frac{w_1 \cdot P_6^{in} + w_2 \cdot \mathrm{Resize}(P_7^{in})}{w_1 + w_2 + \epsilon}\right)$$

$$P_6^{out} = \mathrm{Conv}\left(\frac{w_1 \cdot P_6^{in} + w_2 \cdot P_6^{td} + w_3 \cdot \mathrm{Resize}(P_5^{out})}{w_1 + w_2 + w_3 + \epsilon}\right)$$

Here, $P_6^{td}$ is the intermediate feature of the sixth level on the top-down path, and $P_6^{out}$ is the output feature of the sixth level on the bottom-up path. All other levels are constructed similarly. Finally, BiFPN integrates bidirectional cross-scale connections with fast normalized fusion. This efficient multi-scale fusion effectively combines feature information from feature maps of different scales, and removing nodes with lower contributions keeps the parameter count under control.
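The fast normalized fusion in the equations above can be sketched as a small PyTorch module; the class name and the two-input usage shown in the comment are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fuse same-shaped feature maps with learnable normalized weights."""
    def __init__(self, n_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, feats):
        w = F.relu(self.w)              # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)    # fast normalized fusion
        return sum(wi * fi for wi, fi in zip(w, feats))

# Usage corresponding to the first equation, with hypothetical tensors:
# p6_td = conv(WeightedFusion(2)([p6_in, F.interpolate(p7_in, scale_factor=2)]))
```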

2.3.4. PEDBlock

Accurate object detection requires correct classification and high-quality localization [39], and the bounding box regression (BBR) loss function has a significant impact on performance. Targets of different sizes affect anchor box accuracy, and Powerful-IoU (PIoU) is an efficient IoU loss function that combines a target-size-adaptive penalty factor with a gradient adjustment function based on anchor box quality. PIoU effectively improves anchor box accuracy using a penalty factor P and a function that adapts to anchor box quality, as shown in Figure 5, where the grey box is the target box and the blue box is the predicted box.
The specific formula for its penalty factor is as follows:
$$P = \frac{1}{4}\left(\frac{d_{w1}}{w_{gt}} + \frac{d_{w2}}{w_{gt}} + \frac{d_{h1}}{h_{gt}} + \frac{d_{h2}}{h_{gt}}\right)$$
where $d_{w1}$, $d_{w2}$, $d_{h1}$, and $d_{h2}$ are the absolute distances between the corresponding edges of the predicted box and the target box, and $w_{gt}$ and $h_{gt}$ are the width and height of the target box, respectively. Notably, the penalty factor in PIoU differs from those in other loss functions: it does not cause the anchor box to inflate. As demonstrated in Figure 5, the PIoU loss uses only the edge lengths of the target box as the denominators of the penalty factor, so the denominator of P depends only on the size of the target box and is independent of the size of the anchor box and of the smallest enclosing box [40]. Unless the anchor box and the target box overlap completely, P never decays to 0. In addition, PIoU is adaptive to target size, and its adaptation function is calculated as follows:
$$f(x) = 1 - e^{-x^2}$$

$$PIoU = IoU - f(P), \quad -1 \le PIoU \le 1$$

$$L_{PIoU} = 1 - PIoU = L_{IoU} + f(P), \quad 0 \le L_{PIoU} \le 2$$
When the penalty factor P > 2, the anchor box gradient takes a small value, suppressing gradients from low-quality anchor boxes. When P is around 1, the anchor box has low overlap with the target box, and the gradient takes a larger value to allow faster regression. When P approaches 0, the anchor box is close to the target box, and as anchor quality improves the gradient takes progressively smaller values, allowing the anchor box to be steadily optimized and fully aligned with the target box. This adaptive mechanism effectively improves anchor box accuracy for targets of different sizes, alleviating the reduction in anchor accuracy caused by the differing sizes of strawberries at different ripeness levels.
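Putting the equations together, a sketch of the PIoU loss for axis-aligned boxes given as (x1, y1, x2, y2) tensors might look as follows; this is assembled for clarity from the formulas above, not taken from the authors' code.

```python
import torch

def piou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """PIoU loss for (N, 4) boxes in (x1, y1, x2, y2) format; assumes valid boxes."""
    # Standard IoU term
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(min=0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(min=0)
    inter = inter_w * inter_h
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Penalty factor P: edge distances normalized by the target size only,
    # so enlarging the anchor box cannot shrink the penalty.
    w_gt = target[:, 2] - target[:, 0]
    h_gt = target[:, 3] - target[:, 1]
    dw1 = (pred[:, 0] - target[:, 0]).abs()
    dw2 = (pred[:, 2] - target[:, 2]).abs()
    dh1 = (pred[:, 1] - target[:, 1]).abs()
    dh2 = (pred[:, 3] - target[:, 3]).abs()
    P = (dw1 / w_gt + dw2 / w_gt + dh1 / h_gt + dh2 / h_gt) / 4

    f = 1 - torch.exp(-P ** 2)      # size-adaptive adjustment f(P)
    return (1 - iou + f).mean()     # L_PIoU = 1 - PIoU = L_IoU + f(P)
```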
The number of parameters and the amount of computation are important factors affecting efficient operation on a device [41]. By applying PEDblock to the detection head, we aim to further reduce the model’s parameters and computation without reducing its detection capability. As shown in Figure 6, we use PIoU to replace the Complete Intersection over Union (CIoU) loss of the original model. The four Conv layers arranged in two parallel pairs are removed and replaced with two 3 × 3 grouped convolutions in series; the convolution outputs are then passed to the two Conv2d layers of the next layer, producing the detection output trained with the PIoU loss and the classification output trained with the CLS loss. The 3 × 3 grouped convolution reduces the total number of parameters required per convolutional layer by dividing the channels of the input feature maps into multiple groups and performing the convolution separately within each group, so that each convolution kernel interacts with only a subset of the input feature maps [42]. This structure makes the model more lightweight while still ensuring high detection accuracy.
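A minimal sketch of the serial grouped-convolution idea in the head described above; the channel counts, group number, and branch output widths are assumptions, since the paper specifies only the two serial 3 × 3 grouped convolutions feeding the regression (PIoU) and classification (CLS) branches.

```python
import torch.nn as nn

class PEDHead(nn.Module):
    def __init__(self, c_in: int, n_classes: int, reg_ch: int = 64, groups: int = 4):
        super().__init__()
        # c_in must be divisible by `groups`; each kernel sees only its group,
        # cutting the parameter count of each 3x3 layer by the group factor.
        self.stem = nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(c_in), nn.SiLU(),
            nn.Conv2d(c_in, c_in, 3, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(c_in), nn.SiLU(),
        )
        self.reg = nn.Conv2d(c_in, reg_ch, 1)     # box branch (PIoU loss)
        self.cls = nn.Conv2d(c_in, n_classes, 1)  # class branch (CLS loss)

    def forward(self, x):
        x = self.stem(x)
        return self.reg(x), self.cls(x)
```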

2.4. Algorithmic Parameter Settings and Environment Configuration for Experiments

The experimental server runs Windows 10, with PyTorch 2.1.2 as the deep learning framework under Python 3.8 and CUDA 11.2. The CPU is an Intel(R) Xeon(R) Gold 6246R @ 3.40 GHz, the GPU is an NVIDIA RTX 8000, and the system memory is 128 GB. The detailed hyperparameters of the experiment are shown in Table 3.

2.5. Evaluation Indicators

An object detection model requires evaluation metrics to assess its performance. To evaluate the Ripe-Detection model, this study uses precision, recall, average precision (AP and mAP), and F1 score. Together, these metrics enable a comprehensive evaluation of the Ripe-Detection model on the strawberry ripeness detection task. The relevant metrics are calculated as follows:
$$P = \frac{TP}{TP + FP} \times 100\%$$
where TP denotes true positives and FP denotes false positives.
$$R = \frac{TP}{TP + FN} \times 100\%$$
where FN denotes false negatives.
$$AP = \int_0^1 P(R)\,dR \times 100\%$$
where P is precision and R is recall.
$$mAP = \frac{\sum_{i=1}^{n} AP_i}{n}$$
where $AP_i$ denotes the average precision of the $i$-th category, and $n$ is the number of categories.
$$F1 = \frac{2 \times Recall \times Precision}{Recall + Precision}$$
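For reference, the metrics above can be computed as in the following sketch for a single class, assuming detections have already been matched to ground truth at a fixed IoU threshold (e.g. 0.5 for mAP50); the function names are illustrative.

```python
import numpy as np

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from matched detection counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(recalls: np.ndarray, precisions: np.ndarray) -> float:
    """AP as the area under the precision-recall curve (trapezoidal rule)."""
    order = np.argsort(recalls)
    return float(np.trapz(precisions[order], recalls[order]))

# mAP then averages the per-class APs: map50 = sum(ap_per_class) / num_classes
```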

3. Results

3.1. Ablation Experiment

To systematically validate the effectiveness of the proposed ripeness detection model, a comprehensive ablation study was conducted using a randomly sampled test set of 1021 images. Each structure was progressively integrated and analyzed against the baseline model. The same hardware configuration and hyperparameter settings were maintained across all experiments to control variables. As shown in Figure 7a, quantitative evaluation via the mAP50 metric indicates that the full model configuration provides a significant improvement in detection accuracy, with faster and smoother convergence than partial modifications. Compared to the baseline, our model reaches a peak validation accuracy of 95% at epoch 40, indicating enhanced optimization stability. As shown in Figure 7b, each added method further reduces the parameter count and GFLOPs; our model has only 1.3M parameters and 4.4 GFLOPs, outperforming the baseline.
As shown in Table 4, the ablation experiments show that the performance of our proposed ripeness detection model is significantly improved compared to the baseline. Relative to the original model, the Ripe-Detection framework shows a 6.8% improvement in precision, a 5.1% improvement in recall, a 5.9% improvement in F1 score, a 4.0% improvement in mAP50, and a 3.3% improvement in mAP50-95, demonstrating its excellent generalization ability and robustness in strawberry ripeness detection. Integrating the ADown module into the baseline architecture yielded a 3.1% improvement in precision, a 1.5% improvement in F1 score, and 0.4% and 0.3% improvements in mAP50 and mAP50-95, respectively, while reducing the model parameters by 0.3M and the computational load by 0.6 GFLOPs. This optimization enhanced strawberry ripeness detection accuracy through global feature preservation with average pooling and local detail extraction with the max-pooling/convolution hybrid, strengthening feature preservation for small, irregular targets.
With BiFPN in the feature fusion phase, the key metrics improve to 93.7% precision, 88.6% recall, 91.1% F1 score, 94.7% mAP50, and 88.1% mAP50-95, while the parameters and computation are reduced to 2.0M and 7.1 GFLOPs, respectively, through adaptive feature fusion weighting. Our proposed PEDblock architecture replaces the parallel pairs of Conv layers with two consecutive 3 × 3 grouped convolutions, resulting in a slight decrease in precision (88.7% to 88.0%) but a significant reduction in computation (8.2 GFLOPs to 5.6 GFLOPs), realizing the lightweight design of the model.
The synergistic combination of ADown and BiFPN achieves the highest accuracy in the ablation (96.7% mAP50 and 90.2% mAP50-95) but remains less lightweight (1.7M parameters, 6.6 GFLOPs). The full integration of all three modules (ADown, BiFPN, and PEDblock) demonstrates an excellent performance equilibrium: 95.5% precision, 92.8% recall, 94.1% F1 score, 96.4% mAP50, and 89.1% mAP50-95, with parameters and computation reduced to 43% (3.0M to 1.3M) and 54% (8.2 GFLOPs to 4.4 GFLOPs) of the baseline, respectively. Figure 8 confirms stable convergence on the training and validation sets, with the loss curves verifying consistent multi-scale detection capability. The strategic use of 3 × 3 grouped convolutions in the detection head optimizes the trade-off between accuracy and efficiency and establishes an effective framework for smart agriculture.

3.2. Comparison Experiments of Different Loss Functions

The design of the loss function has a direct impact on the model’s performance. A well-structured loss function can accelerate model convergence while improving detection accuracy and robustness. To verify the effectiveness of the PIoU loss function in the Ripe-Detection model, we conducted comparative experiments on related loss functions. Under controlled experimental conditions with consistent parameters, we systematically evaluated six mainstream loss functions: CIoU, DIoU, EIoU, GIoU, SIoU, and PIoU. As shown in Table 5, we comprehensively evaluated model performance using precision, recall, F1 score, mAP50, and mAP50-95. The results show that the GIoU loss function performs weakest on the classification task, with a precision of only 94.1%. It is worth noting that although the DIoU loss function achieves the same mAP50 as PIoU (96.4%), its recall (91.3%) is noticeably lower than that of PIoU (92.8%); this 1.5% difference directly affects the completeness of target detection. In addition, we compared the inference time of the different loss functions: EIoU and PIoU have the shortest inference time, and the remaining loss functions take slightly longer than our method. The combined analysis shows that PIoU excels on several key metrics, achieving the best values for precision (95.5%), recall (92.8%), and mAP50-95 (89.1%), demonstrating its dual ability to accurately locate strawberries and effectively classify ripeness.

3.3. Comparison Experiment

To comprehensively evaluate the performance of the ripeness detection model, we conducted comparative experiments with mainstream single-stage object detection networks, including YOLOv3-tiny, YOLOv5n, YOLOv7-tiny, the baseline, YOLOv10n, and YOLOv11n. As shown in Table 6, five key metrics (precision, recall, F1 score, mAP50, and mAP50-95) were used for performance evaluation, and model parameters and computational complexity were also analyzed. The results show that Ripe-Detection outperforms all other models on every evaluation metric. Notably, although YOLOv3-tiny (8.6M parameters) and YOLOv7-tiny (6.0M parameters) have higher parameter counts, they do not achieve higher mAP50 scores (87.6% and 83.4%, respectively), and their high computational costs (12.9 GFLOPs and 13.2 GFLOPs) severely limit practical applicability. Although the computational complexity of YOLOv5n is low (4.1 GFLOPs), its recall (83.7%) falls short of our model’s (92.8%) by as much as 9.1 percentage points. This lack of detection sensitivity would lead to a large number of missed detections in practice and cannot fulfill the operational requirements of strawberry ripeness detection.
The superior performance of Ripe-Detection can be attributed to improvements designed around the strawberry’s phenotypic characteristics and growing environment, such as the ADown downsampling method. As an efficient feature fusion module, BiFPN effectively fuses multi-scale features while reducing the number of parameters and the computation. In addition, PEDblock uses two 3 × 3 grouped convolutions, which effectively reduces model complexity and parameter count; at the same time, grouped convolution learns different levels of feature information in different groups, which helps improve detection accuracy. In terms of overall detection performance, Ripe-Detection is well suited to strawberry ripeness detection tasks across multiple environments.

3.4. Experimental Comparison Before and After Model Improvement

To assess whether our method yields significant improvements on each evaluation metric, we performed one-way ANOVA on the five key metrics using SPSS 23. All metrics show p < 0.01, indicating that our model improves significantly on all five. We plotted radar charts to visualize the differences more clearly. As shown in Figure 9, the ripeness detection model shows significant improvements in all five key performance metrics (mAP50, mAP50-95, precision, recall, and F1 score) compared to the baseline. Our model exhibits higher accuracy, confirming its effectiveness in reducing false alarms and missed detections. These results show that the proposed improvements not only raise detection accuracy but also enhance robustness under different environmental conditions by optimizing the feature fusion mechanism.
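The same significance check can be reproduced outside SPSS, e.g. with SciPy's one-way ANOVA; the per-run mAP50 values below are placeholders, not the paper's measurements.

```python
from scipy.stats import f_oneway

baseline_map50 = [92.4, 92.1, 92.6, 92.3, 92.5]  # hypothetical repeated runs
improved_map50 = [96.4, 96.2, 96.5, 96.3, 96.6]
stat, p = f_oneway(baseline_map50, improved_map50)
print(f"F = {stat:.2f}, p = {p:.4g}")  # p < 0.01 indicates a significant gain
```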
As shown in Table 7, after 100 training epochs, the improved model demonstrated significant advantages in detecting all strawberry ripeness levels. Compared with the baseline, absolute improvements of 6.8%, 5.1%, 5.9%, 4.0%, and 3.3% were achieved in the five core metrics of precision, recall, F1 score, mAP50, and mAP50-95, respectively, corresponding to relative improvement rates of 7.7%, 5.8%, 6.7%, 4.3%, and 3.8%. The gains are especially clear at the ripe stage: precision improves by 7.8 percentage points to 96.3%, recall improves by 4.8% to 95.9%, and mAP50 and mAP50-95 reach 98.3% and 91.7%, improvements of 4.6% and 3.8% over the baseline. The mAP50 of this method for ripe strawberries is as high as 98.3%, demonstrating its reliability in identifying ripe fruit.
For the more challenging task of detecting unripe strawberries, the improved model achieves gains in precision (+2.5%), recall (+4.7%), and mAP50-95 (+1.1%), with its F1 score improving by 3.7% (from 86.8% to 90.5%) over the baseline. The simultaneous improvements in recall (95.6%, +6.0%) and precision (96.0%, +10.2%) for the half-ripe stage confirm the strong robustness of the improved model under complex background disturbances. Notably, the model’s mAP50 for harvestable-stage (ripe and half-ripe) strawberries remains stable at or above 98%.
As shown in Figure 10, this study visually demonstrates the performance advantages of the improved model by comparing the detection results of Ripe-Detection and the baseline on the test set. The analysis shows that, compared to the baseline, the proposed model effectively reduces false detections, missed detections, and duplicate detections. Specifically, the baseline has notable defects in ripeness detection: it misclassifies ripe strawberries as half-ripe (row a), produces multiple detection errors in complex scenarios (rows b–c), and misclassifies background regions as targets (row d). Such defects would directly lower the operational efficiency of a picking robot, causing economic losses. In contrast, our model not only produces tighter bounding boxes but also achieves higher accuracy in classifying strawberry ripeness. The visualization results confirm that the improved model maintains stable detection performance under lighting variations and partial occlusion, fully satisfying the accuracy and robustness requirements of strawberry ripeness detection in production environments.

4. Discussion

Our proposed model adopts a lightweight design with a lower parameter count and model complexity while maintaining detection accuracy, allowing Ripe-Detection to achieve good results in strawberry ripeness detection tasks. Nevertheless, as shown in Figure 11, the model still occasionally recognizes background as a target, so further research remains necessary. Future work should focus on broader datasets, including sample images from different regions and varieties; such diversity would help the model generalize to a wider range of strawberry cultivars. The image data used in this experiment were collected from commercial strawberry greenhouses, and supplementing them with data from open-air cultivation could further enhance generalization. Attention should also be paid to the effect of images acquired in different growing setups (e.g., pot planting and strawberry planting frames) on the ripeness classification task. Addressing these issues will improve the generalization and applicability of the Ripe-Detection model, making it more effective on samples from a variety of conditions.

5. Conclusions

In this study, we applied ADown and BiFPN to the feature perception and feature fusion stages, respectively, and proposed PEDblock for the detection head. The resulting model for strawberry ripeness detection, Ripe-Detection, not only shows advantages in accuracy but also has a lower number of parameters and fewer GFLOPs. The significance of this work is that it provides a solution for automated strawberry ripeness detection, an essential basis for automated picking, and addresses the limitations of traditional manual classification and existing neural network-based models. With its high accuracy and robustness, Ripe-Detection improves the reliability of crop monitoring and management and contributes to the realization of precision agriculture.

Author Contributions

Conceptualization, H.Y. and Z.C.; methodology, H.Y., C.Q., Z.C., J.C., and Y.Z.; formal analysis, H.Y., C.Q., J.C., and Y.Z.; investigation, C.Q. and Z.C.; resources, C.Q. and J.C.; data curation, C.Q., Z.C., and Y.Z.; writing—original draft preparation, H.Y.; writing—review and editing, C.Q. and J.C.; visualization, Z.C., J.C., and Y.Z.; supervision, Z.C., J.C., and Y.Z.; project administration, H.Y. and Y.Z.; funding acquisition, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a major project of the Ministry of Agriculture and Rural Affairs, grant number NK202302020205.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

We thank all the authors for their support. The authors would like to thank all the reviewers who participated in the review of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, G.Q.; Jiao, L.; Chen, P.; Liu, K.; Wang, R.J.; Dong, S.F.; Kang, C.R. Spatial convolutional self-attention-based transformer module for strawberry disease identification under complex background. Comput. Electron. Agric. 2023, 212, 10. [Google Scholar] [CrossRef]
  2. Liu, H.; Wang, X.X.; Zhao, F.Y.; Yu, F.Y.; Lin, P.; Gan, Y.; Ren, X.F.; Chen, Y.M.; Tu, J. Upgrading swin-B transformer-based model for accurately identifying ripe strawberries by coupling task-aligned one-stage object detection mechanism. Comput. Electron. Agric. 2024, 218, 16. [Google Scholar] [CrossRef]
  3. Cybulska, J.; Drobek, M.; Panek, J.; Cruz-Rubio, J.M.; Kurzyna-Szklarek, M.; Zdunek, A.; Frąc, M. Changes of pectin structure and microbial community composition in strawberry fruit (Fragaria × ananassa Duch.) during cold storage. Food Chem. 2022, 381, 132151. [Google Scholar] [CrossRef] [PubMed]
  4. Parsa, S.; Debnath, B.; Khan, M.A.; Ghalamzan, A.E. Modular autonomous strawberry picking robotic system. J. Field Robot. 2024, 41, 2226–2246. [Google Scholar] [CrossRef]
  5. Yue, X.-Q.; Shang, Z.-Y.; Yang, J.-Y.; Huang, L.; Wang, Y.-Q. A smart data-driven rapid method to recognize the strawberry maturity. Inf. Process. Agric. 2020, 7, 575–584. [Google Scholar] [CrossRef]
  6. Vandendriessche, T.; Vermeir, S.; Martinez, C.M.; Hendrickx, Y.; Lammertyn, J.; Nicolaï, B.M.; Hertog, M. Effect of ripening and inter-cultivar differences on strawberry quality. LWT-Food Sci. Technol. 2013, 52, 62–70. [Google Scholar] [CrossRef]
  7. Hemathilake, D.; Gunathilake, D. Agricultural productivity and food supply to meet increased demands. In Future Foods; Elsevier: Amsterdam, The Netherlands, 2022; pp. 539–553. [Google Scholar]
  8. Islam, M.; Bijjahalli, S.; Fahey, T.; Gardi, A.; Sabatini, R.; Lamb, D.W. Destructive and non-destructive measurement approaches and the application of AI models in precision agriculture: A review. Precis. Agric. 2024, 25, 1127–1180. [Google Scholar] [CrossRef]
  9. Rizzo, M.; Marcuzzo, M.; Zangari, A.; Gasparetto, A.; Albarelli, A. Fruit ripeness classification: A survey. Artif. Intell. Agric. 2023, 7, 44–57. [Google Scholar] [CrossRef]
  10. Liu, X.; Wang, P.; Wang, S.; Liao, W.; Ouyang, M.; Lin, S.; Lin, R.; Sarris, P.F.; Michalopoulou, V.; Feng, X.; et al. The circular RNA circANK suppresses rice resistance to bacterial blight by inhibiting microRNA398b-mediated defense. Plant Cell 2025, 37, koaf082. [Google Scholar] [CrossRef]
  11. Gong, L.; Gao, B.; Sun, Y.; Zhang, W.; Lin, G.; Zhang, Z.; Li, Y.; Liu, C. preciseSLAM: Robust, Real-Time, LiDAR–Inertial–Ultrasonic Tightly-Coupled SLAM With Ultraprecise Positioning for Plant Factories. IEEE Trans. Ind. Inform. 2024, 20, 8818–8827. [Google Scholar] [CrossRef]
  12. Zhang, Z.; He, R.K.; Han, B.; Ren, S.Q.; Fan, J.H.; Wang, H.S.; Zhang, Y.L.; Ma, Z.C. Magnetically Switchable Adhesive Millirobots for Universal Manipulation in both Air and Water. Adv. Mater. 2025, 16, 2420045. [Google Scholar] [CrossRef] [PubMed]
  13. Dhanya, V.G.; Subeesh, A.; Kushwaha, N.L.; Vishwakarma, D.K.; Kumar, T.N.; Ritika, G.; Singh, A.N. Deep learning based computer vision approaches for smart agricultural applications. Artif. Intell. Agric. 2022, 6, 211–229. [Google Scholar] [CrossRef]
  14. Chen, Z.; Cai, Y.; Liu, Y.; Liang, Z.; Chen, H.; Ma, R.; Qi, L. Towards end-to-end rice row detection in paddy fields exploiting two-pathway instance segmentation. Comput. Electron. Agric. 2025, 231, 109963. [Google Scholar] [CrossRef]
  15. Pan, W.; Chen, J.; Lv, B.; Peng, L. Lightweight marine biodetection model based on improved YOLOv10. Alex. Eng. J. 2025, 119, 379–390. [Google Scholar] [CrossRef]
  16. Jiang, D.; Wang, H.; Li, T.; Gouda, M.A.; Zhou, B. Real-time tracker of chicken for poultry based on attention mechanism-enhanced YOLO-Chicken algorithm. Comput. Electron. Agric. 2025, 237, 110640. [Google Scholar] [CrossRef]
  17. Yu, H.L.; Liu, J.W.; Chen, C.C.; Heidari, A.A.; Zhang, Q.; Chen, H.L. Optimized deep residual network system for diagnosing tomato pests. Comput. Electron. Agric. 2022, 195, 18. [Google Scholar] [CrossRef]
  18. Chen, H.R.; Wen, C.J.; Zhang, L.; Ma, Z.Y.; Liu, T.Y.; Wang, G.Y.; Yu, H.L.; Yang, C.; Yuan, X.H.; Ren, J.F. Pest-PVT: A model for multi-class and dense pest detection and counting in field-scale environments. Comput. Electron. Agric. 2025, 230, 15. [Google Scholar] [CrossRef]
  19. Qi, Z.X.; Zhang, W.Q.; Yuan, T.; Rong, J.C.; Hua, W.J.; Zhang, Z.Q.; Deng, X.; Zhang, J.X.; Li, W. An improved framework based on tracking-by-detection for simultaneous estimation of yield and maturity level in cherry tomatoes. Measurement 2024, 226, 12. [Google Scholar] [CrossRef]
  20. Yu, H.L.; Chen, Z.Y.; Song, S.Z.; Qi, C.Y.; Liu, J.L.; Yang, C.L. Rapid and non-destructive classification of rice seeds with different flavors: An approach based on HPFasterNet. Front. Plant Sci. 2025, 15, 16. [Google Scholar] [CrossRef]
  21. Mi, Z.; Yan, W.Q. Strawberry ripeness detection using deep learning models. Big Data Cogn. Comput. 2024, 8, 92. [Google Scholar] [CrossRef]
  22. Wang, Y.; Yan, G.; Meng, Q.L.; Yao, T.; Han, J.F.; Zhang, B. DSE-YOLO: Detail semantics enhancement YOLO for multi-stage strawberry detection. Comput. Electron. Agric. 2022, 198, 8. [Google Scholar] [CrossRef]
  23. Zhang, Y.C.; Yu, J.Y.; Chen, Y.; Yang, W.; Zhang, W.B.; He, Y. Real-time strawberry detection using deep neural networks on embedded system (rtsd-net): An edge AI application. Comput. Electron. Agric. 2022, 192, 12. [Google Scholar] [CrossRef]
  24. Li, Y.; Wang, J.C.; Wu, H.R.; Yu, Y.; Sun, H.B.; Zhang, H. Detection of powdery mildew on strawberry leaves based on DAC-YOLOv4 model. Comput. Electron. Agric. 2022, 202, 12. [Google Scholar] [CrossRef]
  25. Wang, D.Z.; Wang, X.C.; Chen, Y.Y.; Wu, Y.; Zhang, X.L. Strawberry ripeness classification method in facility environment based on red color ratio of fruit rind. Comput. Electron. Agric. 2023, 214, 12. [Google Scholar] [CrossRef]
  26. Yang, S.Z.; Wang, W.; Gao, S.; Deng, Z.P. Strawberry ripeness detection based on YOLOv8 algorithm fused with LW-Swin Transformer. Comput. Electron. Agric. 2023, 215, 10. [Google Scholar] [CrossRef]
  27. Du, X.Q.; Cheng, H.C.; Ma, Z.H.; Lu, W.W.; Wang, M.X.; Meng, Z.C.; Jiang, C.J.; Hong, F.W. DSW-YOLO: A detection method for ground-planted strawberry fruits under different occlusion levels. Comput. Electron. Agric. 2023, 214, 13. [Google Scholar] [CrossRef]
  28. Zheng, C.W.; Liu, T.; Abd-Elrahman, A.; Whitaker, V.M.; Wilkinson, B. Object-Detection from Multi-View remote sensing Images: A case study of fruit and flower detection and counting on a central Florida strawberry farm. Int. J. Appl. Earth Obs. Geoinf. 2023, 123, 14. [Google Scholar] [CrossRef]
  29. Tao, Z.Q.; Li, K.; Rao, Y.; Li, W.; Zhu, J. Strawberry Maturity Recognition Based on Improved YOLOv5. Agronomy 2024, 14, 460. [Google Scholar] [CrossRef]
  30. Le Louëdec, J.; Cielniak, G. 3D shape sensing and deep learning-based segmentation of strawberries. Comput. Electron. Agric. 2021, 190, 106374. [Google Scholar] [CrossRef]
  31. Yu, Y.; Zhang, K.L.; Yang, L.; Zhang, D.X. Fruit detection for strawberry harvesting robot in non-structural environment based on Mask-RCNN. Comput. Electron. Agric. 2019, 163, 9. [Google Scholar] [CrossRef]
  32. Tang, C.; Chen, D.; Wang, X.; Ni, X.D.; Liu, Y.H.; Liu, Y.H.; Mao, X.; Wang, S.M. A fine recognition method of strawberry ripeness combining Mask R-CNN and region segmentation. Front. Plant Sci. 2023, 14, 18. [Google Scholar] [CrossRef] [PubMed]
  33. Wu, Z.Z.; Wang, X.F.; Zou, L.; Xu, L.X.; Li, X.L.; Weise, T. Hierarchical object detection for very high-resolution satellite images. Appl. Soft. Comput. 2021, 113, 16. [Google Scholar] [CrossRef]
  34. Varghese, R.; S, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6. [Google Scholar]
  35. Wang, C.-Y.; Yeh, I.H.; Mark Liao, H.-Y. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. In Proceedings of the Computer Vision–ECCV 2024; Springer: Cham, Switzerland, 2025; pp. 1–21. [Google Scholar]
  36. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
  37. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787. [Google Scholar]
  38. Zhang, Z.; Lu, X.; Cao, G.; Yang, Y.; Jiao, L.; Liu, F. ViT-YOLO:Transformer-Based YOLO for Object Detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, 11–17 October 2021; pp. 2799–2808. [Google Scholar]
  39. Chen, Z.; Chen, K.; Lin, W.; See, J.; Yu, H.; Ke, Y.; Yang, C. PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments. In Proceedings of the Computer Vision–ECCV 2020; Springer: Cham, Switzerland, 2020; pp. 195–211. [Google Scholar]
  40. Liu, C.; Wang, K.G.; Li, Q.; Zhao, F.Z.; Zhao, K.; Ma, H.T. Powerful-IoU: More straightforward and faster bounding box regression loss with a nonmonotonic focusing mechanism. Neural Netw. 2024, 170, 276–284. [Google Scholar] [CrossRef]
  41. Menghani, G. Efficient deep learning: A survey on making deep learning models smaller, faster, and better. ACM Comput. Surv. 2023, 55, 259. [Google Scholar] [CrossRef]
  42. Ioannou, Y.; Robertson, D.; Cipolla, R.; Criminisi, A. Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5977–5986. [Google Scholar]
Figure 1. Partial dataset presentation. Column (a) shows the unenhanced images, column (b) shows the enhanced image display after brightness change, column (c) shows the enhanced image display after motion blur, column (d) shows the enhanced image display after fogging, and column (e) shows the enhanced image display after random noise.
Figure 2. Flowchart of Ripe-Detection development. The part marked by the red box is the improved part of this model.
Figure 3. Structure of ADown downsampling module.
Figure 4. Feature pyramid networks: (a) PANet; (b) BiFPN.
Figure 5. Illustration of the PIoU loss $L_{PIoU}$. The grey box is the target box; the blue box is the predicted box.
Figure 6. Structure of PEDblock (PIoU-Efficient Detect).
Figure 7. Baseline is YOLOv8n. (a) Effect of each method on the mAP50 convergence curve. (b) Effect of each method on the number of parameters and GFLOPs.
Figure 8. Changes in Ripe-Detection training and validation losses and related evaluation indicator metrics with training cycles.
Figure 9. Significant difference analysis before and after improvement, where p < 0.01 labeled as ** indicates a significant difference.
Figure 10. Comparison of detection results between Ripe-Detection and the baseline. (a) Detection of closely spaced targets. (b) Detection of a single target. (c) Detection under background interference. (d) Detection with a large background area.
Figure 11. Ripe-Detection confusion matrix.
Table 1. Distribution of images within the dataset; a denotes the original image, b denotes the brightness change enhancement image, c denotes the motion blur enhancement image, d denotes the fogging state simulation enhancement image, and e denotes the addition of random noise enhancement image.
       a    b    c    d    e
Train  621  626  574  617  605
Val    204  194  222  201  200
Test   196  181  225  203  216
Table 2. Distribution of labels within the dataset.
Classes    Train  Val   Test  Total
Unripe     3005   1082  1053  5140
Half-ripe  2811   975   919   4705
Ripe       2985   1010  1070  5065
Table 3. Algorithmic parameter settings for the experiment.
Table 3. Algorithmic parameter settings for the experiment.
ParametersSetup
Epochs100
Batch Size32
OptimizerAdam
Initial Learning Rate0.01
Final Learning Rate0.01
Momentum0.937
Images640
Workers16
Mosaic0
Table 4. Ablation experiments. Baseline is YOLOv8n. The label √ indicates the method(s) used.
Baseline  ADown  BiFPN  PEDBlock  Precision (%)  Recall (%)  F1 (%)  mAP50 (%)  mAP50-95 (%)  Parameters (M)  GFLOPs
√         –      –      –         88.7           87.7        88.2    92.4       85.8          3.0             8.2
√         √      –      –         91.8           87.7        89.7    92.8       86.1          2.7             7.6
√         –      √      –         93.7           88.6        91.1    94.7       88.1          2.0             7.1
√         –      –      √         88.0           85.3        86.6    89.8       81.6          2.4             5.6
√         √      √      –         95.2           92.8        94.0    96.7       90.2          1.7             6.6
√         √      –      √         88.7           87.2        88.2    92.3       85.8          1.6             5.0
√         –      √      √         94.4           91.2        89.7    96.2       86.1          2.1             5.1
√         √      √      √         95.5           92.8        94.1    96.4       89.1          1.3             4.4
Table 5. Comparison results of different loss functions.
Table 5. Comparison results of different loss functions.
MethodPrecision
(%)
Recall
(%)
F1
(%)
mAP50
(%)
mAP50-95
(%)
Inference Times (ms)
GIoU94.191.592.895.888.31.1
DIoU94.791.393.096.488.71.1
CIoU95.192.793.996.188.71.1
EIoU94.69293.396.188.51.0
SIoU94.691.493.095.887.91.1
PIoU95.592.894.196.489.11.0
Table 6. Comparative experiments of different models.
Table 6. Comparative experiments of different models.
ModelPrecision
(%)
Recall
(%)
F1
(%)
mAP50 (%)mAP50-95
(%)
Parameters
(M)
GFLOPs
YOLOv3-tiny87.178.382.587.665.38.6M12.9
YOLOv5n90.383.786.990.975.61.7M4.1
YOLOv7-tiny82.776.179.383.462.06.0M13.2
Baseline88.787.788.292.485.83M8.2
YOLOv10n
YOLOv11n
Ripe-Detection
86.9
87.6
95.5
79.1
88.5
92.8
82.8
88.0
94.1
87.2
92.5
96.4
80.0
84.2
89.1
2.7M
2.5M
1.3M
8.2
6.3
4.4
Table 7. Experimental results of Ripe-Detection compared to Baseline.
Table 7. Experimental results of Ripe-Detection compared to Baseline.
LevelModelPrecision
(%)
Recall
(%)
F1
(%)
mAP50
(%)
mAP50-95
(%)
RipeBaseline88.591.189.893.787.9
Ripe-Detection96.395.996.198.391.7
Half-ripeBaseline85.889.687.792.487.2
Ripe-Detection96.095.695.898.092.0
UnripeBaseline91.882.386.891.082.4
Ripe-Detection94.387.090.592.983.5
ALLBaseline88.787.788.292.485.8
Ripe-Detection95.592.894.196.489.1