Article

YOLOv8-CBSE: An Enhanced Computer Vision Model for Detecting the Maturity of Chili Pepper in the Natural Environment

by Yane Ma 1 and Shujuan Zhang 2,*

1 School of Intelligent Engineering, Jinzhong College of Information, Jinzhong 030800, China
2 College of Agricultural Engineering, Shanxi Agricultural University, Jinzhong 030801, China
* Author to whom correspondence should be addressed.
Agronomy 2025, 15(3), 537; https://doi.org/10.3390/agronomy15030537
Submission received: 11 January 2025 / Revised: 17 February 2025 / Accepted: 21 February 2025 / Published: 23 February 2025

Abstract

In order to accurately detect the maturity of chili peppers under different lighting and natural environmental scenarios, in this study, we propose a lightweight maturity detection model, YOLOv8-CBSE, based on YOLOv8n. By replacing the C2f module in the original model with the designed C2CF module, the model integrates the advantages of convolutional neural networks and Transformer architecture, improving the model’s ability to extract local features and global information. Additionally, SRFD and DRFD modules are introduced to replace the original convolutional layers, effectively capturing features at different scales and enhancing the diversity and adaptability of the model through the feature fusion mechanism. To further improve detection accuracy, the EIoU loss function is used instead of the CIoU loss function to provide more comprehensive loss information. The results showed that the average precision (AP) of YOLOv8-CBSE for mature and immature chili peppers was 90.75% and 85.41%, respectively, with an F1 score of 81.69% and a mean average precision (mAP) of 88.08%. Compared with the original YOLOv8n, the F1 score and mAP of the improved model increased by 0.46% and 1.16%, respectively. The detection effect for chili pepper maturity under different scenarios was improved, which proves the robustness and adaptability of YOLOv8-CBSE. YOLOv8-CBSE also maintains a lightweight design with a model size of only 5.82 MB, enhancing its suitability for real-time applications on resource-constrained devices. This study provides an efficient and accurate method for detecting chili peppers in natural environments, which is of great significance for promoting intelligent and precise agricultural management.

1. Introduction

In the context of agricultural intelligence and precision agriculture, the real-time monitoring and accurate management of crop growth conditions have become key factors in improving agricultural production efficiency and quality [1,2]. The importance of real-time monitoring stems from the need for immediate decision-making in modern agricultural practices, especially in the context of automated harvesting systems. Modern agricultural robots are equipped with limited on-board computing resources and require efficient, real-time, lightweight detection algorithms to optimize harvesting operations. Chili pepper is an important cash crop that is widely planted and has a large market demand [3,4,5]. Accurate detection of maturity has a profound impact on ensuring the quality of agricultural products, reducing waste, and improving harvest efficiency.
In the field of maturity detection of chili pepper, the traditional method mainly relies on manual observation and empirical judgment. This method is not only time-consuming and labor-intensive but also prone to being influenced by subjective factors, leading to the inaccuracy and variability of detection results. In contrast, object detection algorithms based on deep learning exhibit a higher degree of automation and precision and can handle complex and changing natural conditions more effectively [6,7,8].
In recent years, with the rapid advancement of computer vision and deep learning technology, the application scope of these algorithms in the agricultural sector has expanded [9,10,11,12]. By analyzing image or video data, they enable the real-time monitoring and precise identification of crop growth conditions [13,14,15].
You Only Look Once (YOLO) series algorithms, as one of the most popular target detection algorithms at present, have achieved remarkable results in many fields due to their high efficiency and accuracy [16,17,18]. Chai et al. [19] explored the combination of YOLOv7 with augmented reality technology to detect and visualize the maturity of strawberries. The results showed that the mAP value and F1 value of the YOLOv7 model, using transfer learning, fine-tuning, and multi-scale training, were 0.89 and 0.92, respectively. The prediction accuracy for ripe, partially ripe, and unripe strawberries was 0.93, 0.81, and 0.94, respectively, and the maturity of each strawberry could be accurately identified. Jing et al. [20] created a large melon maturity dataset from a greenhouse based on melon maturity, and proposed a lightweight and efficient melon maturity detection method, MRD-YOLO, with an average precision of 97.4% and only 4.8 G FLOPs and 2.06 M parameters, providing a valuable reference for various types of melon maturity detection. Zhai et al. [21] proposed a YOLOv5-CA maturity detection model for blueberries based on BiFPN and CA attention mechanisms, with a recall rate of 88.2%, an accuracy of 88.8%, and a mAP of 91.1%. The detection accuracy for mature and immature blueberries was 0.93 and 0.87, respectively, providing a basic research foundation for automated blueberry harvesting.
Wu et al. [22] proposed a lightweight Cabbage-YOLO model to quickly and accurately detect the maturity of Chinese cabbage, and the mAP of the improved model reached 86.4%. Compared with the original YOLOv8-n model, the FLOPs, number of parameters, and weight of Cabbage-YOLO decreased by 35.9%, 47.2%, and 45.2%, respectively, while the mAP increased by 1.9%. The improved lightweight model can provide effective technical support for the promotion of intelligent management of cabbage. Fruit maturity is a major factor affecting the quality and yield of camellia oil. Zhu et al. [23] proposed an improved lightweight model, YOLO-LM, based on YOLOv7-tiny. The results showed that the accuracy rate, recall rate, and mAP of YOLO-LM reached 93.96%, 93.32%, and 93.18%, respectively, with a model size of 19.82 MB. By introducing an adaptive pooling scheme and a loss function, Sekharamantry et al. [24] further improved the YOLOv5 model used to detect outdoor apples. The values of precision, recall, and F1 score were 0.97, 0.99, and 0.98, respectively, providing references for apple picking.
However, maturity detection in natural environments faces universal challenges such as variable lighting conditions, leaf occlusion, and overlapping fruits. These challenges are particularly pronounced in chili peppers due to their dense growth patterns and color-dependent maturity indicators. While existing algorithms have made progress in general crop detection tasks, they exhibit limitations in handling these challenges for chili peppers. For instance, standard convolutional layers extract insufficient features from occluded fruits, and existing models often have high structural complexity. Therefore, this study aims to improve accuracy and robustness in the maturity detection of chili peppers in natural environments by enhancing the YOLOv8n model. To achieve this goal, a C2CF module is first designed to replace the C2f module in the original model’s backbone network, thereby enhancing the model’s feature extraction capability. Secondly, SRFD and DRFD modules are introduced to replace the original convolutional layers and capture features of different scales through a feature fusion mechanism, thereby enhancing feature diversity. Finally, the EIoU loss function is used to replace the original CIoU loss function, providing the model with more abundant loss information so that it can learn the position and shape of the target more accurately, thus improving detection accuracy. By enhancing the YOLOv8n model, this study aims to address the challenges of maturity detection in chili peppers in natural environments and provide strong support for precision agriculture management. The results are expected to promote the sustainable development of the pepper industry, reduce waste, improve production efficiency, and provide a useful reference for the maturity detection of other crops.
The structure of this paper is as follows: Section 2 describes the materials and methods of this study, including dataset construction and methodology. Section 3 presents the experimental results and their analysis and discusses the advantages and limitations of this study. Section 4 summarizes the research.

2. Materials and Methods

2.1. Construction of Dataset

The images used in this experiment were collected from a chili pepper planting base in Jishan County, Shanxi Province, China, using an iPhone camera between 23 August 2023 and 30 August 2023. The camera was set to an aperture of f/1.5, an ISO sensitivity of ISO-50, and a focal length of 6 mm, which contributed to the high quality of the captured images. The image format was JPG, with a resolution of 3023 × 3023 pixels. Each image typically included 1–2 chili pepper plants. To ensure the diversity and comprehensiveness of the data samples, the dataset included chili pepper images under different lighting, leaf occlusion, and fruit overlap conditions, as well as in dense environments, for a total of 1390 images. LabelImg software (version 1.8.6) was used to annotate the chili pepper images [25], which included two categories: mature (labelled “pickable”) and immature (labelled “unpickable”). The annotated images were saved in XML format. The collected images were divided into training, validation, and test sets at a ratio of 7:2:1, consisting of 973, 278, and 139 images, respectively. Parts of the datasets in different environments are shown in Figure 1. It should be noted that some plants are only partially captured in the images due to field-of-view limitations and the natural growth patterns of the chili peppers. Nevertheless, this did not compromise the validity of the dataset, as the annotations were meticulously performed to ensure accuracy.
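For readers who wish to reproduce the split, the following is a minimal Python sketch of a 7:2:1 random partition that copies each image together with its LabelImg XML file; the folder names ("images", "annotations") and the fixed seed are illustrative assumptions, not the authors’ actual tooling.

```python
import random
import shutil
from pathlib import Path

# Hypothetical layout: images/ holds the 1390 JPGs, annotations/ the
# LabelImg XML files with matching stems ("pickable" / "unpickable").
random.seed(42)
images = sorted(Path("images").glob("*.jpg"))
random.shuffle(images)

n = len(images)
n_train, n_val = int(0.7 * n), int(0.2 * n)  # 7:2:1 -> 973/278/139 for n = 1390
splits = {
    "train": images[:n_train],
    "val": images[n_train:n_train + n_val],
    "test": images[n_train + n_val:],
}

for split, files in splits.items():
    for img in files:
        xml = Path("annotations") / (img.stem + ".xml")
        for src, sub in ((img, "images"), (xml, "labels")):
            dst = Path(split) / sub
            dst.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst / src.name)  # keep image/label pairs aligned
```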

2.2. YOLOv8-CBSE Model Structure

This study focuses on maturity detection of chili peppers in natural environments, aiming to improve the accuracy and robustness of the model for detecting the maturity of chili peppers by enhancing the target detection algorithm. YOLOv8 has achieved remarkable results in many fields due to its efficient and accurate target detection capabilities. The network structure of YOLOv8 comprises an input layer, backbone network, neck network, and head network. YOLOv8 is an object detection algorithm based on deep learning. Its fundamental principle involves extracting features from input images using convolutional neural networks and utilizing these features to predict the location and category of objects. Additionally, YOLOv8 incorporates a lightweight design, which reduces model complexity and enhances computational efficiency. YOLOv8 consists of five versions: n, s, m, l, and x. In this study, YOLOv8n was selected as the base model for improvement. It not only inherits the efficient and accurate object detection capability of YOLOv8 but also further optimizes the model size and computational requirements. Compared with other YOLOv8 versions, YOLOv8n achieves a smaller model size and lower computational complexity by streamlining the network structure and reducing the number of parameters, making it particularly suitable for real-time detection tasks [14,26,27]. The input layer of YOLOv8n is responsible for receiving image data, and the backbone network is responsible for extracting deep features from the image. The neck network further fuses and processes these features to improve detection accuracy. Finally, the head network uses these features to predict the location and category of the target. The structure of YOLOv8n is shown in Figure 2.
Under natural conditions, detection of chili pepper maturity faces many challenges, such as changes in lighting, complex backgrounds, morphological diversity of chili pepper fruits, leaf occlusion, and fruit overlap. To effectively address these challenges, this study proposes the YOLOv8-CBSE model. Based on YOLOv8n, several improvements were made to enhance the accuracy and robustness of chili pepper maturity detection. The C2CF module, introduced for the first time in this study, combines the concepts of the C2f module and ConvFormer to replace the C2f module in the backbone network of the original model, aiming to more effectively extract the maturity characteristics of chili peppers. By incorporating a local feature extraction mechanism, the C2CF module can more accurately capture local details of the pepper fruit, which is especially important when dealing with complex backgrounds and lighting changes. Additionally, the module integrates global information modeling to help capture the overall shape and background information of chili pepper fruits, thereby improving the model’s feature extraction capabilities. Inspired by others’ research, this study introduced SRFD and DRFD modules to replace the convolutional layer of the original model to improve the detection effect of chili pepper maturity. The SRFD module focuses on capturing the fine characteristics of chili pepper fruit, which is crucial for detecting partially occluded or overlapping fruits. The DRFD module can extract deep features of chili pepper fruit, which is helpful for the model to accurately identify the target in complex scenes. The two modules work together to achieve multi-scale feature extraction and feature fusion, thereby improving the accuracy of the model in detecting chili pepper maturity. Additionally, the YOLOv8-CBSE model adopts the EIoU loss function to replace the original CIoU loss function. The EIoU loss function introduces the distance from the center point of the bounding box and the relative difference in aspect ratio as additional loss terms, providing more comprehensive loss information for the model. This helps the model learn the location and shape of chili peppers more accurately, further improving detection accuracy. The structure of the improved model YOLOv8-CBSE is shown in Figure 3.

2.2.1. C2CF Module

The ConvFormer Module is an innovative module that combines the advantages of convolutional neural networks (CNNs) and the Transformer architecture [28]. The module is designed to capture local features of images through convolution operations and to model global context information using the Transformer’s self-attention mechanism, thereby enhancing the feature extraction capability of the model.
In the ConvFormer module, the input feature map is first processed by a series of convolutional layers, which not only help to further capture image details but also provide rich local features for subsequent global information modeling. Subsequently, these local features are passed into the Transformer encoding part, where the self-attention mechanism can effectively capture the relationships between different regions in the image and establish global context information. This strategy of combining local features and global information makes ConvFormer modules excel at handling complex visual tasks. The ConvFormer module structure is shown in Figure 4.
YOLOv8’s C2f module may have certain limitations when dealing with complex scenes and subtle features. Therefore, this study replaces the Bottleneck of the C2f module with a ConvFormer Block, which is named the C2CF module, in order to improve the capability for maturity detection of chili peppers in a natural environment through local feature extraction and global information modeling of the ConvFormer module.
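As an illustration of the idea behind the C2CF module, the following PyTorch sketch combines a depthwise convolution for local feature extraction with multi-head self-attention for global context, in the spirit of the ConvFormer block described above. It is a minimal sketch of the concept; the authors’ exact C2CF layout follows Figure 4 and may differ in normalization, channel splitting, and expansion details.

```python
import torch
import torch.nn as nn

class ConvFormerBlock(nn.Module):
    """Illustrative ConvFormer-style block: a depthwise convolution captures
    local detail, and multi-head self-attention models global context."""
    def __init__(self, c, num_heads=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1, groups=c),  # depthwise: local features
            nn.BatchNorm2d(c),
            nn.SiLU(),
        )
        self.norm = nn.LayerNorm(c)
        self.attn = nn.MultiheadAttention(c, num_heads, batch_first=True)

    def forward(self, x):
        x = x + self.local(x)                 # residual local feature extraction
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)      # (B, H*W, C) token sequence
        t = self.norm(t)
        a, _ = self.attn(t, t, t)             # global context via self-attention
        return x + a.transpose(1, 2).reshape(b, c, h, w)

# Quick shape check on a dummy feature map
y = ConvFormerBlock(64)(torch.randn(1, 64, 32, 32))  # -> (1, 64, 32, 32)
```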

2.2.2. SRFD Module and DRFD Module

The SRFD module is mainly used to handle the preliminary stage of image feature extraction. Compared with traditional Conv, SRFD has a stronger feature extraction capability [29]. The SRFD module adopts a parallel downsampling path and a feature fusion mechanism. In the parallel downsampling path, SRFD uses different convolution kernel sizes and step sizes to capture features at different scales. The feature maps generated by these parallel paths are then input into a feature fusion layer, where they are fused through weighted summation or concatenation. This fusion strategy not only enhances the diversity of features but also improves the model’s adaptability to complex image content. Additionally, the SRFD module introduces an adaptive weight allocation mechanism to balance the contributions of different downsampling paths. This mechanism can dynamically adjust the weights of each path based on the characteristics of the input image and the requirements of the detection task, thereby further improving the performance of the model. The SRFD module structure is shown in Figure 5.
Compared with SRFD, the DRFD module focuses more on in-depth mining and the integration of complementary features from different downsampling paths. The DRFD module employs a more complex feature fusion mechanism, including multi-level feature extraction and fusion strategies. In the multi-level feature extraction stage, DRFD uses convolutional layers of different depths to capture feature information at different levels. This feature information is then input into a multi-level feature fusion layer, where it is fused recursively or iteratively. The multi-level feature fusion layer adopts advanced techniques such as pyramid structures or attention mechanisms to achieve fine-grained fusion of features. These techniques can capture the correlation between features and highlight important feature information, thereby improving the recognition accuracy and robustness of the model. Additionally, the DRFD module introduces an adaptive feature selection mechanism to screen out the most representative features. This mechanism can dynamically select the optimal feature combination based on the characteristics of the input image and the requirements of the detection task, thus further enhancing the performance of the model. The DRFD module structure is illustrated in Figure 6.
In the YOLOv8 model, we replaced the first two Conv layers with SRFD modules, aiming to capture richer and more robust features through their unique structure, so as to improve the model’s ability to identify different levels of maturity in chili peppers. In addition, all the other Conv layers were replaced with DRFD modules to enhance the understanding and analysis of chili pepper images under complex natural environments.
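To make the parallel-downsampling idea concrete, the sketch below runs several stride-2 convolutions with different kernel sizes in parallel and fuses them with learned softmax weights, mirroring the SRFD description above. This is an illustrative simplification under stated assumptions, not the exact SRFD/DRFD implementation of [29].

```python
import torch
import torch.nn as nn

class ParallelDownsample(nn.Module):
    """Sketch of the SRFD idea: parallel downsampling paths with different
    kernel sizes, fused by adaptive (learned) weights."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.paths = nn.ModuleList([
            nn.Conv2d(c_in, c_out, k, stride=2, padding=k // 2)
            for k in (3, 5, 7)                 # different receptive fields
        ])
        self.w = nn.Parameter(torch.ones(3))   # adaptive path weights

    def forward(self, x):
        w = torch.softmax(self.w, dim=0)       # normalize path contributions
        feats = [conv(x) for conv in self.paths]
        return sum(wi * f for wi, f in zip(w, feats))  # weighted fusion

# Halves spatial resolution while fusing multi-scale features
y = ParallelDownsample(3, 32)(torch.randn(1, 3, 64, 64))  # -> (1, 32, 32, 32)
```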

2.2.3. EIoU Loss Function

In the task of maturity detection of chili pepper under natural conditions, factors such as the shape, color, and growing environment affect the detection results. CIoU loss, the loss function used in the YOLOv8 algorithm to measure the difference between the predicted box and the ground truth box, performs well in most cases, but in some complex scenarios, its ability to capture information such as the distance between the center points of the bounding boxes and the relative difference in their widths and heights still needs to be improved. To overcome this limitation, the EIoU loss function is introduced in this study. The EIoU loss function introduces the distance between the center points of the bounding boxes and the relative difference between their widths and heights as additional loss terms [30,31]. Compared with the traditional IoU loss function, the EIoU loss function provides more abundant loss information, which helps the model learn the position and shape of the target more accurately. When the bounding boxes do not overlap, the EIoU loss function can still produce an effective gradient, avoiding the problem of stagnant gradient updates. By introducing additional measures, the EIoU loss function can more fully assess the difference between the predicted box and the ground truth box, thereby improving the detection accuracy of the model. EIoU is composed of three parts: the IoU loss (LIoU), the distance loss (Ldis), and the aspect ratio loss (Lasp). The formula is as follows:
$$L_{EIoU} = L_{IoU} + L_{dis} + L_{asp} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{w_c^2 + h_c^2} + \frac{\rho^2(w, w^{gt})}{w_c^2} + \frac{\rho^2(h, h^{gt})}{h_c^2}$$
Among them, LIoU, Ldis, and Lasp represent the IoU loss, center distance loss, and aspect ratio loss, respectively. IoU represents the intersection over union between the predicted box and the ground truth box. b and bgt represent the center points of the predicted box and the ground truth box, respectively; w and wgt represent their widths, and h and hgt represent their heights. wc and hc are the width and height of the minimum enclosing rectangle of the predicted and ground truth boxes, and ρ denotes the Euclidean distance between two points.
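Based on the formula and symbol definitions above, a direct PyTorch implementation of the EIoU loss might look as follows; boxes are assumed to be in (x1, y1, x2, y2) format, and the epsilon terms are added for numerical stability. This is a sketch, not the authors’ exact training code.

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU loss per the formula above; pred/target are (N, 4) boxes."""
    # Intersection and union for the IoU term
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box (w_c, h_c)
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # Center distance term: rho^2(b, b_gt) / (w_c^2 + h_c^2)
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    dist = ((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2) / (cw ** 2 + ch ** 2 + eps)

    # Width and height difference terms
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    asp = (wp - wt) ** 2 / (cw ** 2 + eps) + (hp - ht) ** 2 / (ch ** 2 + eps)

    return 1 - iou + dist + asp
```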

2.3. Test Environment

The specifications for the training environment of this study are as follows: the CPU was an Intel Core i7-13790F @ 2.10 GHz; the GPU was an NVIDIA GeForce RTX 4090 (NVIDIA, Santa Clara, CA, USA); the system memory was 24 GB; and PyTorch version 1.13 and Python version 3.8 were used. In this study, the Stochastic Gradient Descent (SGD) method was used to optimize network parameters. The initial learning rate of the SGD optimizer was set to 0.01, and momentum was set to 0.937. The weight decay was set to 0.0005, and training was conducted for 300 epochs.
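Assuming the Ultralytics training interface, the hyperparameters above translate into a call along the following lines; the model and data YAML file names are placeholders, since the custom YOLOv8-CBSE modules would require a modified model definition.

```python
from ultralytics import YOLO

# "yolov8-cbse.yaml" and "chili.yaml" are assumed names, not files shipped
# with Ultralytics; the hyperparameters mirror the training setup above.
model = YOLO("yolov8-cbse.yaml")
model.train(
    data="chili.yaml",      # dataset config (train/val paths, 2 classes)
    epochs=300,             # 300 training epochs
    optimizer="SGD",
    lr0=0.01,               # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,
)
```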

2.4. Evaluation Indices

In evaluating the quality of the chili pepper maturity detection model, this study selected precision (P), recall (R), F1 score, average precision (AP), mean average precision (mAP), model size, and floating-point operations (FLOPs) as evaluation metrics to ensure a comprehensive and accurate measurement of the model’s performance. Precision reflects the proportion of instances in which the model correctly predicts a positive sample out of all predicted positive samples. Recall measures the percentage of actual positive samples that the model correctly identifies. The F1 score is the harmonic mean of precision and recall and is used to comprehensively evaluate the model’s performance. Average precision and mean average precision further consider the model’s performance under different thresholds, reflecting its detection ability more comprehensively. Additionally, model size and FLOPs are used to evaluate the computational complexity and feasibility of the model in practical applications [32,33]. To ensure the robustness of the statistics, 5-fold cross-validation was performed on the combined training and validation data: the dataset was randomly divided into 5 subsets, and in each fold, four subsets were used for training and the remaining subset for validation. A paired t-test was used to evaluate the statistical difference between YOLOv8-CBSE and the baseline model. These evaluation indicators together constitute the key factors for measuring the quality of the model and ensuring the comprehensiveness and accuracy of the evaluation results. The formulas are as follows:
$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$F1 = \frac{2PR}{P + R}$$

$$AP = \int_0^1 P \, dR$$

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
where TP represents the number of actual positive samples correctly identified by the model as positive, FP represents the number of actual negative samples incorrectly identified by the model as positive, FN represents the number of actual positive samples incorrectly identified by the model as negative, and N represents the total number of categories, which is 2.
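As a concrete illustration, the following Python sketch computes these metrics; the AP helper approximates the integral with the trapezoidal rule, and the final lines reuse the per-class AP values reported in Section 3 to recover the mAP of 88.08%.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    """P, R, and F1 from the counts defined above."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def average_precision(precisions, recalls):
    """AP as the area under the P-R curve (trapezoidal approximation of
    the integral above); inputs are sorted by increasing recall."""
    return float(np.trapz(precisions, recalls))

# mAP over the two classes, using the APs reported for YOLOv8-CBSE
ap_per_class = [0.9075, 0.8541]                    # mature, immature
map_value = sum(ap_per_class) / len(ap_per_class)  # 0.8808
```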

3. Results and Discussion

3.1. C2CF Module Replacement of C2f Module in Different Positions Test

In order to verify the specific impact of replacing the C2f module with the C2CF module at different positions on model performance, in this study, we replaced the C2f module in the YOLOv8n backbone network with the C2CF module, named YOLOv8-CB. The C2f module in the YOLOv8n neck network was replaced with the C2CF module, named YOLOv8-CN. The C2f module in both the YOLOv8n backbone and neck networks was replaced with the C2CF module, named YOLOv8-CBN. The test results of replacing the C2f modules with the C2CF module at different positions are shown in Table 1.
As can be seen from Table 1, the precision of YOLOv8-CB is 2.92% and 2.95% lower than that of YOLOv8-CN and YOLOv8-CBN, respectively, but its recall increased to 79.54%, indicating that replacing the C2f module with the C2CF module in the backbone network may sacrifice precision to some extent. However, the improved recall enables the model to identify more chili pepper targets. In terms of average precision (AP) for the detection of mature and immature chili peppers, the YOLOv8-CB model achieved a significant improvement in the AP value for immature chili peppers, which was 0.90%, 0.63%, and 0.27% higher than those of YOLOv8n, YOLOv8-CN, and YOLOv8-CBN, respectively. Meanwhile, the AP value for mature chili peppers remained at a high level, decreasing by only 0.24% compared with YOLOv8n.
The mAP of YOLOv8-CB and YOLOv8-CBN increased by 0.31% and 0.33%, respectively, compared with that of YOLOv8n, while the mAP of YOLOv8-CN decreased by 0.12% compared with that of YOLOv8n, indicating that the application of the C2CF module to the neck network has a poor effect. YOLOv8-CB can effectively improve the model’s ability to identify immature chili peppers while maintaining the detection accuracy for mature chili peppers, thereby enhancing the overall detection performance of the model. In terms of model size and computation, YOLOv8-CB also demonstrates certain advantages. Compared with the original YOLOv8n, the model size of YOLOv8-CB is reduced by 0.23 MB and FLOPs are reduced by 0.3 G, which helps reduce the storage requirements of the model and improve operational efficiency. To sum up, YOLOv8-CB showed a good comprehensive effect; therefore, YOLOv8-CB was selected for subsequent improvements.

3.2. Loss Function Comparison Test

In order to verify the specific improvement effects of the EIoU loss function on the detection of chili pepper maturity and to analyze the impact of different loss functions on model performance, we replaced the loss functions in the YOLOv8 model with EIoU, DIoU, and ShapeIoU, and conducted comparative tests. The original YOLOv8 model used the CIoU loss function. The DIoU loss function focuses on optimizing the center point distance between the predicted box and the ground truth box [13], while ShapeIoU pays more attention to the shape similarity of the bounding boxes [34]. The test results are shown in Table 2.
As can be seen from Table 2, compared with the CIoU loss function, the EIoU loss function decreases precision by 1.07% but significantly improves recall, increasing it by 2.22%. This indicates that the EIoU loss function captures chili pepper targets more reliably and can better identify the maturity of chili peppers under natural conditions, reducing the occurrence of missed detections. In terms of F1 score, the EIoU loss function scores 0.67%, 1.00%, and 0.67% higher than the CIoU, DIoU, and ShapeIoU loss functions, respectively, which further proves its effectiveness in optimizing model performance. The mAP of the EIoU loss function is 87.18%, which is 0.26%, 0.46%, and 0.16% higher than those of the CIoU, DIoU, and ShapeIoU loss functions, respectively. This shows that the EIoU loss function improves the placement of the prediction box by introducing additional metrics such as the width and height penalty terms, thereby enhancing the detection accuracy and robustness of the model for chili pepper maturity detection.

3.3. Ablation Test

To verify the effectiveness of the proposed improvements, an ablation test was conducted in this study, and the results are shown in Table 3. In this study, the C2f module in the YOLOv8n backbone network was replaced by the C2CF module, resulting in YOLOv8-CB. The first two Conv layers in YOLOv8n were replaced by the SRFD module, and the other Conv layers were replaced by the DRFD module, resulting in YOLOv8-S. The model that replaces the CIoU loss function with the EIoU loss function in the original model is named YOLOv8-E. The fusion model of YOLOv8-CB and YOLOv8-S is named YOLOv8-CBS, and the fusion model of YOLOv8-S and YOLOv8-E is named YOLOv8-SE. The fusion model of YOLOv8-CB, YOLOv8-S, and YOLOv8-E is named YOLOv8-CBSE.
As can be seen from Table 3, compared with YOLOv8n, YOLOv8-S reduces the F1 score and AP in the mature category by 0.06% and 0.25%, respectively, while it increases the F1 score and AP in the immature category by 1.28% and 1.38%, respectively. YOLOv8-S improves the F1 score and mAP by 0.61% and 0.57%, respectively, over YOLOv8n. In addition, the model size of YOLOv8-S is only increased by 0.09 MB compared to YOLOv8n. This indicates that the introduction of SRFD and DRFD modules can improve the performance of the model in detecting the maturity of chili peppers in a natural environment without significantly increasing the complexity of the model.
The F1 scores for the mature and immature categories of YOLOv8-CBS, respectively, improved by 0.76% and 1.16% compared to those of YOLOv8-CB, and the F1 score for the mature category of YOLOv8-CBS improved by 0.70% compared to that of YOLOv8-S. In terms of AP, YOLOv8-CBS showed improvements of 0.49% and 0.50% in the mature category, and 0.72% and 0.21% in the immature category, respectively, compared to YOLOv8-CB and YOLOv8-S. Additionally, the mAP of YOLOv8-CBS also improved by 0.61% and 0.35% compared to YOLOv8-CB and YOLOv8-S, respectively. These results indicate that the integration of the C2CF module, SRFD module, and DRFD module further enhances the performance of the model. Compared to YOLOv8-S and YOLOv8-E, the mature AP of YOLOv8-SE increased by 1.10% and 0.88%, respectively, and the mAP of YOLOv8-SE increased by 0.42% and 0.73%, respectively. This verifies the effectiveness of the fusion of the SRFD, DRFD modules, and the EIoU loss function.
YOLOv8-CBSE achieved the highest mature-category AP and overall mAP, at 90.75% and 88.08%, respectively. Compared with YOLOv8-CBS, the AP of YOLOv8-CBSE in the mature category increased by 0.59%, and the mAP increased by 0.24%. Compared with YOLOv8-SE, the AP of the immature category increased by 0.35%, while the AP of the mature category remained largely unchanged. YOLOv8-CBSE outperforms the original YOLOv8n in various performance metrics, particularly in the F1 scores for the mature and immature categories, where YOLOv8-CBSE improves by 0.31% and 0.61%, respectively. The overall F1 score and mAP of YOLOv8-CBSE increased by 0.46% and 1.16%, respectively, while the model size was reduced by 0.15 MB. This indicates that, by incorporating the SRFD module, DRFD module, and EIoU loss function, and integrating the C2CF module, the improved model not only enhances the performance of chili pepper maturity detection in a natural environment but also reduces the model size.
The results of 5-fold cross-validation before and after model improvement are shown in Table 4. The 5-fold cross-validation results demonstrate that YOLOv8-CBSE achieves a mean mAP of 88.03% ± 0.43, significantly outperforming YOLOv8n (86.92% ± 0.46) under a paired t-test (t = 6.51, p = 0.0014 < 0.005). This confirms the statistical significance of the proposed improvements.
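The paired t-test on the fold-wise mAP values can be reproduced along the following lines with SciPy; the fold values are transcribed from Table 4, and because they are rounded to two decimals, the resulting statistic need not match the published t = 6.51 exactly.

```python
from scipy import stats

# Fold-wise mAP values (%) transcribed from Table 4
yolov8n     = [86.71, 86.26, 86.99, 87.25, 87.38]
yolov8_cbse = [87.72, 87.65, 87.86, 88.23, 88.71]

# Paired t-test: same folds, two models
t_stat, p_value = stats.ttest_rel(yolov8_cbse, yolov8n)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```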
In order to evaluate the robustness and consistency of the proposed model, a boxplot of the 5-fold cross-validation mAP values of YOLOv8n and YOLOv8-CBSE is shown in Figure 7. The mean mAP of YOLOv8-CBSE is 1.11% higher than that of YOLOv8n. The median mAP of YOLOv8-CBSE (87.86%) is 0.87% higher than that of YOLOv8n (86.99%), indicating that the performance of the improved model has been significantly enhanced. YOLOv8-CBSE had an interquartile range of 0.51%, while YOLOv8n had an interquartile range of 0.54%, indicating less variability between different data segments. These results confirm that YOLOv8-CBSE not only achieves higher accuracy but also maintains stable performance across diverse data partitions, reinforcing the statistical significance of the improvements.
In order to directly reflect the effectiveness of the improved model, the Grad-CAM method is adopted in this study to visualize the output layer [35,36], so as to deeply understand the decision-making process of the model under different natural environments. The results are shown in Figure 8. A heat map, as an effective visualization tool, intuitively represents the weight of a region in the output prediction process through the brightness of that region. The larger the area of the bright-colored region, the higher the attention the region receives in the model’s output prediction, and the greater its contribution to the final decision. In different scenarios, the heat maps of YOLOv8-CBSE showed that the model focused more attention on chili pepper targets and was more accurate. Compared with YOLOv8n, the improved model reduced background interference and could accurately capture the characteristics of chili pepper targets.
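For reference, the visualization pattern can be sketched with the pytorch-grad-cam package as below; a ResNet backbone and a random image stand in here purely for illustration, since applying Grad-CAM to a full detection model requires an additional output wrapper around the head.

```python
import numpy as np
import torch
from torchvision.models import resnet18
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

# Stand-in model and image; a detection backbone layer would be targeted
# in practice, which needs a wrapper around the YOLO outputs.
model = resnet18(weights=None).eval()
target_layers = [model.layer4[-1]]          # last convolutional stage

cam = GradCAM(model=model, target_layers=target_layers)
rgb_img = np.random.rand(224, 224, 3).astype(np.float32)   # float image in [0, 1]
input_tensor = torch.from_numpy(rgb_img).permute(2, 0, 1)[None]

grayscale_cam = cam(input_tensor=input_tensor)[0]          # HxW attention weights
overlay = show_cam_on_image(rgb_img, grayscale_cam, use_rgb=True)
```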
In this study, a comparative test of the model before and after the improvement was conducted, plotting the box loss curve and mAP curve, as shown in Figure 9. Firstly, from the perspective of the box loss curve, the loss value of the improved YOLOv8-CBSE model during training is consistently lower than that of the original YOLOv8n model. This result indicates that the improved model has higher accuracy in bounding box prediction and can locate the position of chili peppers more accurately. It is worth noting that both models converge after approximately 250 epochs. Secondly, from the perspective of the mAP curve, the mAP value of the improved YOLOv8-CBSE model is consistently better than that of the original model after about 100 epochs. The results demonstrate that the improved model has higher accuracy in detecting the maturity of chili peppers and can identify different levels of maturity more precisely. Additionally, the mAP curves of both models remain stable throughout the subsequent training process, further proving their stability and reliability. In summary, comparing the box loss and mAP curves before and after the improvement shows that the fast convergence and stable performance of the YOLOv8-CBSE model during training highlight its superiority in the maturity detection task of chili peppers under natural environments.

3.4. Comparison Test of Different Models

In order to objectively measure the effect of the improved model in this study, YOLOv8-CBSE was compared with the existing benchmark models YOLOv5s, YOLOv9-Tiny, YOLOv10s, and YOLOv11n, and the results are shown in Table 5. In terms of mature AP, YOLOv8-CBSE reached 90.75%, which is 1.39%, 0.33%, 1.43%, and 2.02% higher than those of YOLOv5s, YOLOv9-Tiny, YOLOv10s, and YOLOv11n, respectively. For immature AP, YOLOv8-CBSE is only 0.26% lower than YOLOv10s, and 1.10%, 2.84%, and 1.70% higher than YOLOv5s, YOLOv9-Tiny, and YOLOv11n, respectively. The F1 score of YOLOv8-CBSE is 1.22%, 0.41%, and 1.41% higher than those of YOLOv9-Tiny, YOLOv10s, and YOLOv11n, respectively, and the mAP of YOLOv8-CBSE is the highest. Compared with YOLOv5s, YOLOv9-Tiny, YOLOv10s, and YOLOv11n, the results are 1.24%, 1.59%, 0.58%, and 1.86% higher, respectively, which indicates that the improved model has higher stability and accuracy in the overall detection task. The model size of YOLOv8-CBSE is only 5.82 MB, which is 7.95 MB and 9.95 MB less than those of YOLOv5s and YOLOv10s, respectively, and only 1.42 MB and 0.59 MB higher than those of YOLOv9-Tiny and YOLOv11n, respectively. These results highlight the tradeoff between performance and efficiency in YOLOv8-CBSE, helping reduce computing resource consumption and improve operational efficiency in practical applications.
The target detection effects of YOLOv8-CBSE, YOLOv5s, YOLOv9-Tiny, YOLOv10s, and YOLOv11n on chili peppers in different natural environments are shown in Figure 10. All the models show good detection results under fair-light conditions. In the backlight environment, the detection performance of the other models decreases to different degrees due to the influence of light, while YOLOv8-CBSE maintains a high detection accuracy. In the case of fruit overlap and leaf occlusion, YOLOv8-CBSE also performs well, accurately identifying and detecting chili pepper targets that are overlapping or occluded, while the other models may produce false detections or missed detections. This further demonstrates the powerful detection capability of YOLOv8-CBSE in complex environments. Under dense conditions, YOLOv8-CBSE can accurately detect multiple chili pepper targets, and its detection boxes fit the chili pepper targets more closely, with higher confidence. In conclusion, YOLOv8-CBSE outperforms the other models in target detection of chili peppers under different natural environments, with improved detection accuracy, stability, and robustness, giving it unique advantages in the task of chili pepper maturity detection.

3.5. Discussion

In the field of pepper fruit detection, Paul et al. [37] used the YOLOv8s model to detect peppers in a greenhouse environment and achieved an mAP of 0.967 at an IoU threshold of 0.5, with a model size of 21.5 MB. Ma et al. [38] proposed a pepper target detection method based on an improved YOLOv8n model; they detected only mature red peppers and obtained an mAP of 96.3%. While these studies focused on specific types of peppers, this study addresses the broader problem of detecting chili peppers at different levels of maturity. The YOLOv8-CBSE model proposed in this study was tested under natural light and in more complex natural environments, achieving accurate detection of chili pepper maturity while maintaining a lightweight design (model size of only 5.82 MB). In particular, our model performed well in distinguishing between mature and immature chili peppers, with AP values of 90.75% and 85.41%, respectively. Its high recognition accuracy and robustness in detecting chili pepper maturity under natural light and various environmental conditions indicate the potential for wide-ranging applications. The application of this model will promote intelligent picking of chili peppers, improve picking efficiency, reduce damage rates during the picking process, and enhance the overall quality of chili peppers. Additionally, the successful development of this model provides new ideas and methods for target detection research on other crops in natural environments.

4. Conclusions

In this study, we aimed to address the challenge of maturity detection of chili peppers in natural environments by proposing an improved YOLOv8-based YOLOv8-CBSE model. The following are the main findings and contributions of this research:
Firstly, by introducing the C2CF module, SRFD module, and DRFD module, as well as adopting the EIoU loss function, the YOLOv8-CBSE model achieved significant improvements in terms of model size reduction, FLOP decrease, and performance enhancement. Specifically, compared to the original YOLOv8n, the YOLOv8-CB model reduced the model size by 0.23 MB and FLOPs by 0.3 G, while increasing the mAP by 0.31%. Additionally, replacing the CIoU loss function with EIoU resulted in increases in recall, F1 score, and mAP of 2.22%, 0.67%, and 0.26%, respectively.
Secondly, the effectiveness of each improvement was rigorously verified through ablation tests. The YOLOv8-CBSE model achieved the highest AP for pickable objects and the highest mAP, at 90.75% and 88.08%, respectively. Moreover, compared to YOLOv8n, YOLOv8-CBSE showed increases of 0.84% and 1.48% in AP for pickable and unpickable objects, respectively. When compared to other models such as YOLOv5s, YOLOv9-Tiny, YOLOv10s, and YOLOv11n, YOLOv8-CBSE exhibited the best overall performance.
Furthermore, the proposed YOLOv8-CBSE model exhibited high recognition accuracy and robustness in detecting chili pepper maturity in natural environments. In future studies, we will continue to explore and optimize the model, further improving its performance, and deploy it in actual agricultural production environments to evaluate its performance and feasibility in real-world scenarios.

Author Contributions

Conceptualization, Y.M.; methodology, Y.M.; software, Y.M.; validation, Y.M.; formal analysis, Y.M.; investigation, Y.M.; data curation, Y.M.; writing—original draft preparation, Y.M.; writing—review and editing, S.Z.; visualization, S.Z.; funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by The Key Research and Development Program of Shanxi Province (Project No: 201903D221027).

Data Availability Statement

The data are available from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mulakaledu, A.; Swathi, B.; Jadhav, M.M.; Shukri, S.M.; Bakka, V.; Jangir, P. Satellite Image–Based Ecosystem Monitoring with Sustainable Agriculture Analysis Using Machine Learning Model. Remote Sens. Earth Syst. Sci. 2024, 7, 764–773. [Google Scholar] [CrossRef]
  2. Tamrakar, N.; Karki, S.; Kang, M.Y.; Deb, N.C.; Arulmozhi, E.; Kang, D.Y.; Kook, J.; Kim, H.T. Lightweight Improved YOLOv5s-CGhostnet for Detection of Strawberry Maturity Levels and Counting. AgriEngineering 2024, 6, 962–978. [Google Scholar] [CrossRef]
  3. Ye, Z.; Shang, Z.; Li, M.; Zhang, X.; Ren, H.; Hu, X.; Yi, J. Effect of ripening and variety on the physiochemical quality and flavor of fermented Chinese chili pepper (Paojiao). Food Chem. 2022, 368, 130797. [Google Scholar] [CrossRef] [PubMed]
  4. Ashok Priyadarshan, A.M.; Shyamalamma, S.; Nagesha, S.N.; Anil, V.S.; Nirmala, K.S.; Shravya, K.J. Evaluation of Local Types of Bird’s Eye Chilli (Capsicum frutescens L.) of Plant and Fruit Morphological Characters. Int. J. Plant Soil. Sci. 2024, 36, 339–351. [Google Scholar] [CrossRef]
  5. Pinar, H.; Kaplan, M.; Karaman, K.; Ciftci, B. Assessment of interspecies (Capsicum annuum X Capsicum frutescens) recombinant inbreed lines (RIL) for fruit nutritional traits. J. Food Compos. Anal. 2023, 115, 104848. [Google Scholar] [CrossRef]
  6. Salim, E.; Suharjito. Hyperparameter optimization of YOLOv4 tiny for palm oil fresh fruit bunches maturity detection using genetics algorithms. Smart Agric. Technol. 2023, 6, 100364. [Google Scholar] [CrossRef]
  7. Badeka, E.; Karapatzak, E.; Karampatea, A.; Bouloumpasi, E.; Kalathas, I.; Lytridis, C.; Tziolas, E.; Tsakalidou, V.N.; Kaburlasos, V.G. A Deep Learning Approach for Precision Viticulture, Assessing Grape Maturity via YOLOv7. Sensors 2023, 23, 8126. [Google Scholar] [CrossRef]
  8. Nugroho, D.P.; Widiyanto, S.; Wardani, D.T. Comparison of Deep Learning-Based Object Classification Methods for Detecting Tomato Ripeness. Int. J. Fuzzy Log. Intell. Syst. 2022, 22, 223–232. [Google Scholar] [CrossRef]
  9. Khatun, T.; Nirob, M.A.S.; Bishshash, P.; Akter, M.; Uddin, M.S. A comprehensive dragon fruit image dataset for detecting the maturity and quality grading of dragon fruit. Data Brief 2024, 52, 109936. [Google Scholar] [CrossRef]
  10. Azadnia, R.; Fouladi, S.; Jahanbakhshi, A. Intelligent detection and waste control of hawthorn fruit based on ripening level using machine vision system and deep learning techniques. Results Eng. 2023, 17, 100891. [Google Scholar] [CrossRef]
  11. Begum, N.; Hazarika, M.K. Maturity detection of tomatoes using transfer learning. Meas. Food 2022, 7, 100038. [Google Scholar] [CrossRef]
  12. Khan, H.A.; Farooq, U.; Saleem, S.R.; Rehman, U.-u.; Tahir, M.N.; Iqbal, T.; Cheema, M.J.M.; Aslam, M.A.; Hussain, S. Design and development of machine vision robotic arm for vegetable crops in hydroponics. Smart Agric. Technol. 2024, 9, 100628. [Google Scholar] [CrossRef]
  13. Appe, S.N.; Arulselvi, G.; Balaji, G.N. CAM-YOLO: Tomato detection and classification based on improved YOLOv5 using combining attention mechanism. PeerJ Comput. Sci. 2023, 9, e1463. [Google Scholar] [CrossRef]
  14. Shaikh, I.M.; Akhtar, M.N.; Aabid, A.; Ahmed, O.S. Enhancing sustainability in the production of palm oil: Creative monitoring methods using YOLOv7 and YOLOv8 for effective plantation management. Biotechnol. Rep. 2024, 44, e00853. [Google Scholar] [CrossRef]
  15. de Almeida, G.P.S.; dos Santos, L.N.S.; da Silva Souza, L.R.; da Costa Gontijo, P.; de Oliveira, R.; Teixeira, M.C.; De Oliveira, M.; Teixeira, M.B.; do Carmo França, H.F. Performance Analysis of YOLO and Detectron2 Models for Detecting Corn and Soybean Pests Employing Customized Dataset. Agronomy 2024, 14, 2194. [Google Scholar] [CrossRef]
  16. Olisah, C.C.; Trewhella, B.; Li, B.; Smith, M.L.; Winstone, B.; Whitfield, E.C.; Fernández, F.F.; Duncalfe, H. Convolutional neural network ensemble learning for hyperspectral imaging-based blackberry fruit ripeness detection in uncontrolled farm environment. Eng. Appl. Artif. Intell. 2024, 132, 107945. [Google Scholar] [CrossRef]
  17. Bonora, A.; Bortolotti, G.; Bresilla, K.; Grappadelli, L.C.; Manfrini, L. A convolutional neural network approach to detecting fruit physiological disorders and maturity in ‘Abbé Fétel’ pears. Biosyst. Eng. 2021, 212, 264–272. [Google Scholar] [CrossRef]
  18. Bortolotti, G.; Piani, M.; Gullino, M.; Mengoli, D.; Franceschini, C.; Grappadelli, L.C.; Manfrini, L. A computer vision system for apple fruit sizing by means of low-cost depth camera and neural network application. Precis. Agric. 2024, 25, 2740–2757. [Google Scholar] [CrossRef]
  19. Chai, J.J.K.; Xu, J.-L.; O’Sullivan, C. Real-Time Detection of Strawberry Ripeness Using Augmented Reality and Deep Learning. Sensors 2023, 23, 7639. [Google Scholar] [CrossRef]
  20. Jing, X.; Wang, Y.; Li, D.; Pan, W. Melon ripeness detection by an improved object detection algorithm for resource constrained environments. Plant Methods 2024, 20, 1–17. [Google Scholar] [CrossRef]
  21. Zhai, X.; Zong, Z.; Xuan, K.; Zhang, R.; Shi, W.; Liu, H.; Han, Z.; Luan, T. Detection of maturity and counting of blueberry fruits based on attention mechanism and bi-directional feature pyramid network. J. Food Meas. Charact. 2024, 18, 6193–6208. [Google Scholar] [CrossRef]
  22. Wu, M.; Yuan, K.; Shui, Y.; Wang, Q.; Zhao, Z. A Lightweight Method for Ripeness Detection and Counting of Chinese Flowering Cabbage in the Natural Environment. Agronomy 2024, 14, 1835. [Google Scholar] [CrossRef]
  23. Zhu, X.; Chen, F.; Zheng, Y.; Chen, C.; Peng, X. Detection of Camellia oleifera fruit maturity in orchards based on modified lightweight YOLO. Comput. Electron. Agric. 2024, 226, 109471. [Google Scholar] [CrossRef]
  24. Sekharamantry, P.K.; Melgani, F.; Malacarne, J. Deep Learning-Based Apple Detection with Attention Module and Improved Loss Function in YOLO. Remote Sens. 2023, 15, 1516. [Google Scholar] [CrossRef]
  25. Sekharamantry, P.K.; Melgani, F.; Malacarne, J.; Ricci, R.; de Almeida Silva, R.; Marcato Junior, J. A Seamless Deep Learning Approach for Apple Detection, Depth Estimation, and Tracking Using YOLO Models Enhanced by Multi-Head Attention Mechanism. Computers 2024, 13, 83. [Google Scholar] [CrossRef]
  26. Dewi, C.; Bilaut, F.Y.; Christanto, H.J.; Dai, G. Deep Learning for the Classification of Rice Leaf Diseases Using YOLOv8. Math. Model. Eng. Probl. 2024, 11, 3025–3034. [Google Scholar] [CrossRef]
  27. Ramos, L.; Casas, E.; Bendek, E.; Romero, C.; Rivas-Echeverría, F. Hyperparameter optimization of YOLOv8 for smoke and wildfire detection: Implications for agricultural and environmental safety. Artif. Intell. Agric. 2024, 12, 109–126. [Google Scholar] [CrossRef]
  28. Yu, W.; Si, C.; Zhou, P.; Luo, M.; Feng, J.; Yan, S.; Wang, X. MetaFormer Baselines for Vision. IEEE Trans. Pattern Anal. Mach. Intell. 2024. [Google Scholar] [CrossRef]
  29. Lu, W.; Chen, S.-B.; Tang, J.; Ding, C.H.Q.; Luo, B. A Robust Feature Downsampling Module for Remote-Sensing Visual Tasks. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–12. [Google Scholar] [CrossRef]
  30. Magalhaes, S.A.; Castro, L.; Moreira, G.; Dos Santos, F.N.; Cunha, M.; Dias, J.; Moreira, A.P. Evaluating the Single-Shot MultiBox Detector and YOLO Deep Learning Models for the Detection of Tomatoes in a Greenhouse. Sensors 2021, 21, 3569. [Google Scholar] [CrossRef] [PubMed]
  31. Suleiman, S.H.; Faki, S.M.; Hemed, I.M.d. Economic Growth and Environmental Pollution in Brunei: ARDL Bounds Testing Approach to Cointegration. Asian J. Econ. Bus. Account. 2019, 10, 1–11. [Google Scholar] [CrossRef]
  32. Ali, M.; Yin, B.; Bilal, H.; Kumar, A.; Shaikh, A.M.; Rohra, A. Advanced efficient strategy for detection of dark objects based on spiking network with multi-box detection. Multimed. Tools Appl. 2023, 83, 36307–36327. [Google Scholar] [CrossRef]
  33. Mirhaji, H.; Soleymani, M.; Asakereh, A.; Abdanan Mehdizadeh, S. Fruit detection and load estimation of an orange orchard using the YOLO models through simple approaches in different imaging and illumination conditions. Comput. Electron. Agric. 2021, 191, 106533. [Google Scholar] [CrossRef]
  34. Huangfu, Y.; Huang, Z.; Yang, X.; Zhang, Y.; Li, W.; Shi, J.; Yang, L. HHS-RT-DETR: A Method for the Detection of Citrus Greening Disease. Agronomy 2024, 14, 2900. [Google Scholar] [CrossRef]
  35. Shimizu, T.; Nagata, F.; Arima, K.; Miki, K.; Kato, H.; Otsuka, A.; Watanabe, K.; Habib, M.K. Enhancing defective region visualization in industrial products using Grad-CAM and random masking data augmentation. Artif. Life Robot. 2023, 29, 62–69. [Google Scholar] [CrossRef]
  36. Avro, S.S.; Atikur Rahman, S.M.; Tseng, T.-L.; Fashiar Rahman, M. A deep learning framework for automated anomaly detection and localization in fused filament fabrication. Manuf. Lett. 2024, 41, 1526–1534. [Google Scholar] [CrossRef]
  37. Paul, A.; Machavaram, R.; Ambuj; Kumar, D.; Nagar, H. Smart solutions for capsicum Harvesting: Unleashing the power of YOLO for Detection, Segmentation, growth stage Classification, Counting, and real-time mobile identification. Comput. Electron. Agric. 2024, 219, 108832. [Google Scholar] [CrossRef]
  38. Ma, N.; Wu, Y.; Bo, Y.; Yan, H. Chili Pepper Object Detection Method Based on Improved YOLOv8n. Plants 2024, 13, 2402. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Images of chili pepper in different scenarios. (a) Fair-light. (b) Backlight. (c) Leaf occlusion. (d) Fruit overlap. (e) Dense.
Figure 2. The structure of YOLOv8n.
Figure 3. The structure of the improved model YOLOv8-CBSE.
Figure 4. ConvFormer module structure.
Figure 5. SRFD module structure.
Figure 6. DRFD module structure.
Figure 7. Box plot of YOLOv8n and YOLOv8-CBSE with 5-fold cross-validation of mAP values. The red triangle represents the mean and the purple line represents the median.
Figure 8. Heat map visualization results before and after model improvement. (a) Fair-light. (b) Backlight. (c) Leaf occlusion. (d) Fruit overlap. (e) Dense.
Figure 9. Box loss curve and mAP curve before and after model improvement. (a) Box loss curve. (b) mAP curve.
Figure 10. Results of maturity detection of chili pepper by different target detection models under different natural environments. (a) Fair-light. (b) Backlight. (c) Leaf occlusion. (d) Fruit overlap. (e) Dense.
Table 1. Results of C2CF replacement experiment.

| Model | Precision/% | Recall/% | AP/% (Mature) | AP/% (Immature) | mAP/% | Model Size/MB | FLOPs/G |
|---|---|---|---|---|---|---|---|
| YOLOv8n | 84.62 | 78.10 | 89.91 | 83.93 | 86.92 | 5.97 | 8.2 |
| YOLOv8-CB | 82.55 | 79.54 | 89.67 | 84.80 | 87.23 | 5.74 | 7.9 |
| YOLOv8-CN | 85.47 | 75.87 | 89.43 | 84.17 | 86.80 | 5.74 | 8.0 |
| YOLOv8-CBN | 85.50 | 76.99 | 89.97 | 84.53 | 87.25 | 5.50 | 7.7 |
Table 2. Comparative test of loss functions.

| Loss Function | Precision/% | Recall/% | F1/% | mAP/% |
|---|---|---|---|---|
| CIoU | 84.62 | 78.10 | 81.23 | 86.92 |
| EIoU | 83.55 | 80.32 | 81.90 | 87.18 |
| DIoU | 83.18 | 78.74 | 80.90 | 86.72 |
| ShapeIoU | 84.04 | 78.60 | 81.23 | 87.02 |
Table 3. Results of ablation test.

| Model | F1/% (Mature) | F1/% (Immature) | AP/% (Mature) | AP/% (Immature) | F1/% | mAP/% | Model Size/MB |
|---|---|---|---|---|---|---|---|
| YOLOv8n | 83.70 | 78.76 | 89.91 | 83.93 | 81.23 | 86.92 | 5.97 |
| YOLOv8-CB | 83.58 | 78.46 | 89.67 | 84.80 | 81.02 | 87.23 | 5.74 |
| YOLOv8-S | 83.64 | 80.04 | 89.66 | 85.31 | 81.84 | 87.49 | 6.06 |
| YOLOv8-E | 84.63 | 79.17 | 89.88 | 84.48 | 81.90 | 87.18 | 5.97 |
| YOLOv8-CBS | 84.34 | 79.62 | 90.16 | 85.52 | 81.98 | 87.84 | 5.82 |
| YOLOv8-SE | 84.11 | 79.16 | 90.76 | 85.06 | 81.64 | 87.91 | 6.06 |
| YOLOv8-CBSE | 84.01 | 79.37 | 90.75 | 85.41 | 81.69 | 88.08 | 5.82 |
Table 4. Comparison of 5-fold cross-validation results before and after model improvement.

| Model | Fold 1 mAP/% | Fold 2 mAP/% | Fold 3 mAP/% | Fold 4 mAP/% | Fold 5 mAP/% | Mean mAP/% | Standard Deviation |
|---|---|---|---|---|---|---|---|
| YOLOv8n | 86.71 | 86.26 | 86.99 | 87.25 | 87.38 | 86.92 | 0.46 |
| YOLOv8-CBSE | 87.72 | 87.65 | 87.86 | 88.23 | 88.71 | 88.03 | 0.43 |
Table 5. Comparison results of different models.

| Model | AP/% (Mature) | AP/% (Immature) | F1/% | mAP/% | Model Size/MB |
|---|---|---|---|---|---|
| YOLOv5s | 89.36 | 84.31 | 82.20 | 86.84 | 13.77 |
| YOLOv8-CBSE | 90.75 | 85.41 | 81.69 | 88.08 | 5.82 |
| YOLOv9-Tiny | 90.42 | 82.57 | 80.47 | 86.49 | 4.40 |
| YOLOv10s | 89.32 | 85.67 | 81.28 | 87.50 | 15.77 |
| YOLOv11n | 88.73 | 83.71 | 80.28 | 86.22 | 5.23 |

