Article

A Raisin Foreign Object Target Detection Method Based on Improved YOLOv8

Meng Ning, Hongrui Ma, Yuqian Wang, Liyang Cai and Yiliang Chen
1 School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China
2 Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Wuxi 214122, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 7295; https://doi.org/10.3390/app14167295
Submission received: 22 June 2024 / Revised: 31 July 2024 / Accepted: 13 August 2024 / Published: 19 August 2024

Abstract

During the drying and processing of raisins, the presence of foreign matter such as fruit stems, branches, stones, and plastics is a common issue. To address this, we propose an enhanced real-time detection approach leveraging an improved YOLOv8 model. This novel method integrates the multi-head self-attention mechanism (MHSA) from BoTNet into YOLOv8’s backbone. In the model’s neck layer, selected C2f modules have been strategically replaced with RFAConv modules. The model also adopts an EIoU loss function in place of the original CIoU. Our experiments reveal that the refined YOLOv8 boasts a precision of 94.5%, a recall rate of 89.9%, and an F1-score of 0.921, with a mAP reaching 96.2% at the 0.5 IoU threshold and 81.5% across the 0.5–0.95 IoU range. For this model, comprising 13,177,692 parameters, the average time required for detecting each image on a GPU is 7.8 milliseconds. Compared with several widely used detection models, the enhanced model excels in mAP0.5 and performs well in F1-score, parameter count, computational cost, and speed. This study validates the capability of the improved YOLOv8 model to perform real-time foreign object detection on raisin production lines with high efficacy.

1. Introduction

Amidst escalating global concerns over food safety [1], raisins—ubiquitous as a popular snack in everyday life—frequently encounter the challenge of foreign object contamination during their production [2]. Such contamination not only undermines the product’s market competitiveness but also poses a significant risk to consumer health [3]. Consequently, the advancement of a precise and efficient detection system for foreign objects in raisins is imperative for safeguarding food safety and bolstering product quality.
In conventional raisin production, the identification of foreign matter predominantly depends on manual visual inspection, a method that, despite its simplicity, is fraught with inefficiencies [4,5]. First, this approach necessitates substantial labor, escalating production expenses and proving impractical for large-scale operations [6,7]. Moreover, the efficacy of manual inspection is constrained by factors such as the inspector’s visual acuity, fatigue, and concentration span, which can result in both oversights and erroneous identifications [8]. In addition, manual inspection adapts poorly to rapidly changing production environments, and inspectors struggle to promptly spot foreign bodies that are small or similar in color to raisins.
In stark contrast, automated foreign object detection systems have significant advantages. First, automation greatly increases detection speed, enabling real-time monitoring and meeting the demands of high-throughput production lines [9]. Moreover, these systems, leveraging machine vision, refine their discernment of foreign object characteristics through extensive training datasets, thereby bolstering detection precision and dependability. Immune to human fatigue and lapses in concentration, they operate with unwavering consistency. Furthermore, by fine-tuning algorithmic parameters and training data, these automated systems accommodate diverse production settings and product varieties, managing a broad spectrum of foreign object detection challenges [10].
Numerous researchers have extensively investigated the automation of raisin grading and sorting. Raihen M.N. et al. [11] designed a new model for raisin classification, combining eleven machine learning and five AI methods to classify raisins based on size, shape, and texture, with the LightGBM model peaking at 98.40% accuracy. Backes A.R. et al. [12] optimized feature extraction for raisin evaluation through particle swarm algorithms, coupling with linear discriminant analysis (LDA) to reach the zenith of classification accuracy at 99.73%. Predominantly, these raisin detection studies are rooted in traditional machine learning, heavily dependent on feature engineering, and often constrained by oversimplified image backgrounds, limiting their detection efficacy and hindering industrial scalability.
As deep learning technology rapidly develops, an influx of researchers have delved into the detection of agricultural products utilizing this advanced technology. Gao F. et al. [13] introduced a deep-learning-based multi-class fruit detection approach for plant analysis, employing Faster R-CNN to discern apples under various occlusion scenarios. The model’s accuracy is 0.909 in the absence of obstructions, 0.899 when leaves are blocking the view, 0.858 when branches or wire mesh are obstructing, and 0.848 when other fruits are in the way. The average overall accuracy is 0.879. Tian Y. et al. [14] proposed an enhanced YOLO-V3 model for the real-time tracking of apples throughout different growth phases in orchards. By integrating DenseNet to refine the YOLO-V3 network’s lower-resolution feature layers, the YOLOV3-dense model achieved an average precision of 0.882, with a swift average detection time of 0.304 s per frame. Ma N. et al. [15] have developed a technique for detecting and quantifying wheat seeds using an enhanced YOLOv8 model, incorporating shared convolutional layers for a streamlined design with reduced parameters. The integration of a visual transformer equipped with a deformable attention mechanism into the C2f module aims to bolster the network’s feature extraction prowess and improve detection precision. The model demonstrated an average detection accuracy of 77.6% in scenarios with seed aggregation and occlusion, an inference time of 2.86 ms, and improved seed detection accuracy under conditions of stacking, adhesion, and occlusion. Presently, deep learning-based detection techniques are primarily categorized into two-stage object detection methods, exemplified by Faster R-CNN, and single-stage object detection methods, typified by YOLO [16]. Two-stage methods boast higher accuracy but compromise on real-time capabilities. Conversely, single-stage methods, albeit with marginally lower accuracy, excel in real-time performance and are extensively applied in industrial settings.
The main goal of this research is to create and deploy a system for detecting and identifying foreign objects in raisins, utilizing an enhanced version of the YOLOv8 algorithm. By deeply analyzing the shortcomings of the YOLOv8 algorithm in handling small target detection and complex background recognition, we propose an enhanced real-time detection approach leveraging an improved YOLOv8 model. Our major contributions are as follows:
  • Incorporated the multi-head self-attention mechanism (MHSA) from BoTNet into the backbone network of the YOLOv8 model, strengthening the model backbone’s global information capture capability to enhance the richness and completeness of feature extraction.
  • Replaced selected C2f modules in the neck layer with RFAConv modules, fine-tuning the feature extraction of different regions so that the model can dynamically generate receptive fields and better detect foreign bodies of different sizes.
  • The original network’s CIoU loss is substituted with EIoU, enhancing the efficiency and accuracy of raisin foreign body detection.
Through such algorithm optimization and model adjustments, the detection system’s accuracy and real-time performance have been improved with only a small increase in the number of parameters, making it more suitable for the raisin processing production environment. The subsequent sections of this paper are structured as follows: Section 2 primarily introduces the dataset construction process used for raisin foreign matter detection, presents the model employed in this paper’s detection method, and explains the significance of each module in the improved model for raisin detection. Section 3 outlines the experimental setup and parameter settings, detailing the model performance metrics for evaluation. It further presents the training outcomes and detection visualizations and underscores the model’s performance and the efficacy of the refined model through comparative studies with other models, ablation analyses, and experiments on various foreign matter compositions. Section 4 discusses and analyzes this research in the context of related studies by other researchers. Section 5 renders a thorough encapsulation of the entire research undertaking.

2. Materials and Methods

2.1. Platform Setup

This study employed a Dahua (Hangzhou, China) industrial camera, model A3504CG100, coupled with a Huaray (Hangzhou, China) industrial lens, model MH0820S, to achieve a resolution of 2592 × 1944 pixels, equating to approximately 5.04 megapixels. The platform’s frame was constructed from European Standard 3030 aluminum extrusions [17], which provide structural rigidity. Light shielding panels were affixed to the exterior, while an annular LED light strip (Hanglong Lighting, Shenzhen, China) was integrated within for uniform illumination, which is critical for maintaining image capture quality. The working distance between the platform and the camera lens was set to 200 mm and could be adjusted from 150 mm to 300 mm to accommodate varying imaging requirements, as depicted in Figure 1.

2.2. Sample Preparation and Image Collection

This study utilized raisins sourced from Turpan City, within the Xinjiang Uyghur Autonomous Region of China. The specific variety of raisin selected is the seedless white type. The study encompasses a range of foreign matter typically encountered during raisin production and processing, including both endogenous and exogenous contaminants such as fruit stems, branches, stones, and plastics. Images were collected using HALCON image acquisition software (23.11.0.0), encompassing a variety of image types: mixed images featuring single foreign objects with raisins, mixed images with multiple foreign objects and raisins, images depicting varying degrees of density, and images with differing numbers of detection targets. After removing low-quality images, a total of 823 images were retained, as illustrated in Figure 2. The LabelImg annotation tool was utilized to designate and label the foreign objects within the images. The samples prepared for the experiment were divided into two major groups. The first major group consisted of subgroups, each composed entirely of one of the four types of foreign matter. The second major group was formed by mixing 20 g of single or multiple foreign matter combinations into 180 g of pure raisins, including combinations of raisins with fruit stems, raisins with branches, raisins with stones, raisins with plastic, raisins with fruit stems and branches, raisins with stones and plastic, and raisins with all types of foreign matter combined. First, to ensure the randomness of the position of the raisins and foreign matter, the samples were placed in a plastic bag and shaken thoroughly before being poured out for each image capture. Second, to increase the diversity of the captured images and reduce material repetition, the camera was moved up and down to obtain images with different quantities of materials, and new materials were used after every 20 photos were taken.
To augment the dataset’s diversity and bolster the model’s generalization capacity, we implemented an array of data augmentation techniques. These encompassed scalable rotation, randomized adjustments to brightness and contrast, Gaussian blurring, stochastic cropping, fine-tuning of contrast, grid distortion, optical distortion, and channel shuffling. The application of these methods expanded our dataset to a comprehensive total of 2469 images, as illustrated in Figure 3. The dataset was split into training, validation, and test sets at ratios of 70%, 15%, and 15% respectively. The training set comprises a total of 1729 images, the validation set includes 370 images, and the test set also consists of 370 images. A priority was placed on achieving an equitable distribution of samples from each category across these subsets to ensure robust model evaluation and learning.
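To make the augmentation step concrete, the sketch below builds a comparable pipeline with the Albumentations library. The library choice, probabilities, and magnitudes are our own illustrative assumptions (the paper does not report its exact augmentation settings), and the file name, box, and class label are hypothetical.

```python
# A sketch of a comparable augmentation pipeline using Albumentations.
# The probabilities and magnitudes are assumptions for illustration;
# the paper does not report its exact augmentation settings.
import cv2
import albumentations as A

augment = A.Compose(
    [
        A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=30, p=0.5),  # scaling and rotation
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),    # brightness/contrast jitter
        A.GaussianBlur(blur_limit=(3, 5), p=0.2),                                       # Gaussian blurring
        A.RandomCrop(height=1800, width=2400, p=0.3),                                   # random cropping (smaller than the 2592 x 1944 source)
        A.GridDistortion(p=0.2),                                                        # grid distortion
        A.OpticalDistortion(p=0.2),                                                     # optical distortion
        A.ChannelShuffle(p=0.1),                                                        # channel shuffling
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),  # keep YOLO-format boxes consistent
)

image = cv2.imread("raisin_sample.jpg")          # hypothetical file name
bboxes = [[0.48, 0.52, 0.06, 0.04]]              # one YOLO-format box: cx, cy, w, h (normalized)
out = augment(image=image, bboxes=bboxes, class_labels=["fruit_stem"])
aug_image, aug_bboxes = out["image"], out["bboxes"]
```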

2.3. YOLOv8

YOLO’s fundamental concept is to approach object detection as a direct, end-to-end regression problem, with the algorithm involving a single neural network that predicts object bounding boxes and class probabilities directly from the input image. YOLOv8, introduced in 2023, builds on the YOLOv5 framework, significantly improving its performance. It integrates the capability to classify, detect, and segment objects within a single model. The comprehensive structure of YOLOv8 is illustrated in Figure 4.
In the backbone network algorithm, the initial convolutional layer has transitioned from the original 6 × 6 convolution to a 3 × 3 convolution. Additionally, the C3 module has been substituted with the C2f module. These modifications have decreased the complexity and parameter count of the model, resulting in a more streamlined design that improves the richness of gradient flow information. In the neck layer algorithm, the 1 × 1 convolutional reduction layer has been eliminated, with the C3 module being replaced by the C2f module. The detection head has been equipped with a decoupled architecture, which separates the classification and regression tasks. Additionally, the anchor-based approach has been replaced with an anchor-free design, which removes the anchor boxes, simplifying the model design and enhancing adaptability to targets of varying shapes and sizes. This approach diminishes the necessity for extensive hyperparameter adjustment while enhancing the model’s capacity for generalization. In the algorithm for the loss function, Binary Cross-Entropy (BCE) Loss serves as the metric for classification loss, whereas Complete Intersection over Union (CIoU) Loss, in conjunction with Distribution Focal Loss (DFL), is employed for regression loss. BCE Loss optimizes classification performance, CIoU Loss improves the accuracy of bounding box localization, and DFL models box boundaries as discrete distributions and focuses learning on values near the target locations, achieving comprehensive optimization for object detection tasks, particularly improving the detection capability for small objects and class imbalance issues. For the training strategy, disabling Mosaic data augmentation in the final 10 epochs has effectively improved accuracy.

2.4. Improvement

YOLOv8 is divided into five versions: x, l, m, s, and n, based on the model’s d (depth factor), w (width factor), and r (ratio). YOLOv8n is distinguished by its minimal depth and feature map width, serving as the starting point from which the other four models are expanded through further depth and width enhancements. Although more complex models have better detection capabilities, they also require higher computer performance. Additionally, their training and detection times are longer. Therefore, this study comprehensively considers detection performance and model complexity, choosing the YOLOv8s model, and making improvements based on it. The architecture of the enhanced model is illustrated in Figure 5.

2.4.1. Backbone Network

The YOLOv8 backbone uses the Conv module for basic feature extraction, the C2f module for lightweight feature extraction, and the SPPF module to obtain features of different sizes in the image. These three modules are integrated to derive features from the input image. However, when the foreign matter is similar in texture and color to the raisins, it is difficult to extract comprehensive feature information using the above methods. BoTNet, proposed by Srinivas A. et al. [18], is built on ResNet and replaces the 3 × 3 convolution in the ResNet bottleneck with multi-head self-attention (MHSA), which significantly improves the baseline in object detection and instance segmentation while reducing parameters and keeping latency low. The MHSA in BoTNet, as shown in Figure 6, differs from that in the standard Transformer. The model incorporates two-dimensional position encoding for both vertical and horizontal directions within the content-position segment, enabling it to gauge the spatial relationships between features at varying locations and thereby enhancing the correlation of information with its spatial context.
This study integrates BoTNet’s multi-head self-attention mechanism into the YOLOv8 framework, allowing YOLOv8 to merge convolutional processing with the attention mechanism. This reinforces the model’s backbone’s global information capture capability and adaptability to high-dimensional data, enhancing the richness and completeness of feature extraction.
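As a concrete illustration of this idea, the following PyTorch sketch implements a simplified BoTNet-style MHSA block: multi-head self-attention over a feature map with separate learnable position encodings for the vertical and horizontal directions (the content-position term). The tensor sizes, head count, and initialization are illustrative assumptions, not the exact configuration used in the improved model.

```python
# A simplified PyTorch sketch of a BoTNet-style MHSA block: multi-head
# self-attention over a feature map with learnable position encodings for the
# vertical and horizontal directions (the content-position term).
# Shapes and the 4-head setting are illustrative assumptions.
import torch
import torch.nn as nn

class MHSA2D(nn.Module):
    def __init__(self, dim: int, height: int, width: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=False)
        # Separate learnable encodings for the two spatial directions.
        self.rel_h = nn.Parameter(torch.randn(1, heads, dim // heads, height, 1))
        self.rel_w = nn.Parameter(torch.randn(1, heads, dim // heads, 1, width))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # Reshape each tensor to (batch, heads, head_dim, h*w).
        q, k, v = (t.reshape(b, self.heads, c // self.heads, h * w) for t in (q, k, v))
        pos = (self.rel_h + self.rel_w).reshape(1, self.heads, c // self.heads, h * w)
        content = torch.einsum("bhdi,bhdj->bhij", q, k)                            # content-content logits
        position = torch.einsum("bhdi,bhdj->bhij", q, pos.expand(b, -1, -1, -1))   # content-position logits
        attn = ((content + position) * self.scale).softmax(dim=-1)
        out = torch.einsum("bhij,bhdj->bhdi", attn, v)
        return out.reshape(b, c, h, w)

# Example: a 256-channel, 20 x 20 backbone feature map (assumed size).
feat = torch.randn(2, 256, 20, 20)
print(MHSA2D(256, 20, 20)(feat).shape)  # torch.Size([2, 256, 20, 20])
```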

2.4.2. Neck Layer

The YOLOv8 neck layer uses the upsample module to upsample low-resolution feature maps to a higher resolution, the C2f module to fuse features, and the Conv module to extract feature information. These modules combine multi-scale features from the backbone for detection by the detection head. However, detection performance for smaller targets remains poor. RFAConv is a new convolution operation proposed by Zhang X. et al. [19], and its specific structure is shown in Figure 7. Although existing spatial attention mechanisms effectively overcome the inherent parameter-sharing limitations of convolutional neural networks, this is limited to acting before the 1 × 1 convolution operation. When the spatial attention mechanism acts before the 3 × 3 convolution operation, there is an overlap of receptive fields, which leads to shared attention weights between sliding windows and cannot effectively resolve parameter-sharing issues. RFAConv performs group convolution (Group Conv) operations and average pooling (AvgPool) operations on the input feature map. The group convolution operation extracts receptive field spatial features from the input feature map, converting the receptive field into a collection of multiple receptive fields, each being an independent unit. The average pooling operation computes the mean value of each pixel within the input feature map, yielding a feature map that encapsulates global data. After average pooling, the global information feature map is divided into three groups, each applied with convolution kernels of different sizes. After normalization with the Softmax function, an attention weight map is obtained. RFAConv re-weights and adjusts the size of the receptive field spatial features and the attention map, and then outputs a feature map through a 3 × 3 convolution operation with a stride of 3 to ensure compatibility with subsequent network layers.
This research substitutes certain C2f modules within the YOLOv8 network’s intermediate layer with RFAConv modules, facilitating a more nuanced refinement of the feature extraction across various regions while maintaining a manageable parameter count. This adaptation empowers the model to adaptively configure its receptive fields, capturing more intricate details for petite foreign objects and broader contextual information for larger ones, thereby bolstering its proficiency in detecting a diverse range of foreign matter sizes.
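The sketch below gives a simplified PyTorch rendering of the core RFAConv idea: a grouped convolution expands each channel into k × k receptive-field features, pooled information produces attention weights that are softmax-normalized over the k × k window, the features are re-weighted and unfolded into a k-times larger map, and a final k × k convolution with stride k produces the output. It follows the structure of the original RFAConv paper in simplified form; the channel sizes and layer details are illustrative assumptions rather than the exact configuration used here.

```python
# A simplified PyTorch sketch of the RFAConv idea: grouped convolution expands
# each channel into k*k receptive-field features, pooled information produces
# softmax-normalized attention weights over the k*k window, the features are
# re-weighted and unfolded into a k-times larger map, and a k x k convolution
# with stride k restores the spatial size. Channel sizes are illustrative.
import torch
import torch.nn as nn

class RFAConvSketch(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        self.k = k
        # Receptive-field spatial features: k*k features per input channel.
        self.gen_feature = nn.Sequential(
            nn.Conv2d(c_in, c_in * k * k, k, padding=k // 2, groups=c_in, bias=False),
            nn.BatchNorm2d(c_in * k * k),
            nn.ReLU(inplace=True),
        )
        # Attention weights derived from average-pooled information.
        self.gen_weight = nn.Sequential(
            nn.AvgPool2d(kernel_size=k, stride=1, padding=k // 2),
            nn.Conv2d(c_in, c_in * k * k, 1, groups=c_in, bias=False),
        )
        # Final k x k convolution with stride k.
        self.fuse = nn.Conv2d(c_in, c_out, k, stride=k, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        k = self.k
        feat = self.gen_feature(x).reshape(b, c, k * k, h, w)
        attn = self.gen_weight(x).reshape(b, c, k * k, h, w).softmax(dim=2)
        weighted = feat * attn                                  # re-weighted receptive-field features
        # Unfold the k*k window into an (h*k, w*k) map, then reduce it.
        weighted = weighted.reshape(b, c, k, k, h, w).permute(0, 1, 4, 2, 5, 3)
        return self.fuse(weighted.reshape(b, c, h * k, w * k))

# Example: a 128-channel neck feature map (assumed size).
print(RFAConvSketch(128, 128)(torch.randn(1, 128, 40, 40)).shape)  # torch.Size([1, 128, 40, 40])
```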

2.4.3. Loss Function

Figure 8 illustrates the incorporation of CIoU loss into the regression loss of the YOLOv8 framework. The CIoU loss $L_{CIoU}$ is computed through Equations (1) to (5), with A representing the estimated box, B the actual box, and C the minimal rectangle encompassing both. The dimensions are denoted by width (w) and height (h), while $IoU$ measures the overlap between A and B. The centers’ separation is given by the Euclidean distance $\rho(b_A, b_B)$. The diagonal of C is c, and $R_{CIoU}$ adjusts for aspect ratio discrepancies. The coefficient $\alpha$ modulates the weight, and $v$ indicates the aspect ratio variance.

$L_{CIoU} = 1 - IoU + \dfrac{\rho^2(b_A, b_B)}{c^2} + R_{CIoU}$ (1)

$R_{CIoU} = \alpha v$ (2)

$IoU = \dfrac{|A \cap B|}{|A \cup B|}$ (3)

$\alpha = \dfrac{v}{1 - IoU + v}$ (4)

$v = \dfrac{4}{\pi^2} \left( \arctan \dfrac{w_B}{h_B} - \arctan \dfrac{w_A}{h_A} \right)^2$ (5)
Although the CIoU loss builds upon the DIoU loss [20] by incorporating a penalty for the aspect ratio discrepancy between the estimated and actual boxes, the $v$ component within the $R_{CIoU}$ term merely indicates the variance in aspect ratio, not the specific interplay of width and height for both boxes. In order to tackle this concern, Zhang Y. F. et al. [21] proposed the EIoU loss based on CIoU loss, as shown in Figure 9. The EIoU loss $L_{EIoU}$ is calculated with Equations (6) to (8), where $w_C$ is the width of the smallest bounding rectangle, and $h_C$ is its height. EIoU loss refines the CIoU loss by breaking down the aspect ratio penalty term $R_{CIoU}$ into separate penalties for the width and height discrepancies between the estimated and actual boxes. This not only resolves the previously mentioned issue but also improves the convergence rate and accuracy of the loss function. Therefore, using EIoU loss helps improve the efficiency and accuracy of raisin foreign matter detection.

$L_{EIoU} = 1 - IoU + \dfrac{\rho^2(b_A, b_B)}{c^2} + R_{EIoU}$ (6)

$c^2 = w_C^2 + h_C^2$ (7)

$R_{EIoU} = \dfrac{\rho^2(w_A, w_B)}{w_C^2} + \dfrac{\rho^2(h_A, h_B)}{h_C^2}$ (8)
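The EIoU loss in Equations (6) to (8) can be written directly in code. The sketch below is a standalone illustration for boxes in (x1, y1, x2, y2) format; it is not the vectorized implementation inside the YOLOv8 code base, and the sample boxes are invented.

```python
# A standalone sketch of the EIoU loss in Equations (6)-(8) for boxes given as
# (x1, y1, x2, y2). Written for illustration only; the sample boxes are invented.
import torch

def eiou_loss(box_a: torch.Tensor, box_b: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Widths, heights, and centers of the predicted (A) and ground-truth (B) boxes.
    w_a, h_a = box_a[..., 2] - box_a[..., 0], box_a[..., 3] - box_a[..., 1]
    w_b, h_b = box_b[..., 2] - box_b[..., 0], box_b[..., 3] - box_b[..., 1]
    cx_a, cy_a = (box_a[..., 0] + box_a[..., 2]) / 2, (box_a[..., 1] + box_a[..., 3]) / 2
    cx_b, cy_b = (box_b[..., 0] + box_b[..., 2]) / 2, (box_b[..., 1] + box_b[..., 3]) / 2

    # IoU term.
    inter_w = (torch.min(box_a[..., 2], box_b[..., 2]) - torch.max(box_a[..., 0], box_b[..., 0])).clamp(min=0)
    inter_h = (torch.min(box_a[..., 3], box_b[..., 3]) - torch.max(box_a[..., 1], box_b[..., 1])).clamp(min=0)
    inter = inter_w * inter_h
    iou = inter / (w_a * h_a + w_b * h_b - inter + eps)

    # Smallest enclosing box C: width, height, and squared diagonal.
    w_c = torch.max(box_a[..., 2], box_b[..., 2]) - torch.min(box_a[..., 0], box_b[..., 0])
    h_c = torch.max(box_a[..., 3], box_b[..., 3]) - torch.min(box_a[..., 1], box_b[..., 1])
    c2 = w_c ** 2 + h_c ** 2 + eps

    # Center-distance, width, and height penalty terms (Equations (6)-(8)).
    center = (cx_a - cx_b) ** 2 + (cy_a - cy_b) ** 2
    return 1 - iou + center / c2 + (w_a - w_b) ** 2 / (w_c ** 2 + eps) + (h_a - h_b) ** 2 / (h_c ** 2 + eps)

# Example: a prediction slightly offset from its ground-truth box.
pred = torch.tensor([[10.0, 10.0, 50.0, 40.0]])
gt = torch.tensor([[12.0, 11.0, 52.0, 42.0]])
print(eiou_loss(pred, gt))
```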

3. Results

3.1. Experimental Platform and Parameter Configuration

Table 1 displays the comprehensive setup and specifications of the training platform utilized for model development.
The image size for model training is 640 × 640, the training epochs are 300, the batch size is 32, the learning rate is 0.01, the SGD momentum is 0.937, the weight decay for the optimizer is 0.0005, the warm-up period is 3, the initial momentum during warm-up is 0.8, and the initial learning rate for biases during warm-up is 0.1.
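A minimal sketch of a training run with these settings, using the ultralytics training API, is shown below. The model YAML name and dataset file are hypothetical placeholders; the improved MHSA and RFAConv modules would need to be registered in the code base separately and are not shown.

```python
# A minimal sketch of a training run with the hyperparameters listed above,
# using the ultralytics API. "yolov8s-rfa-mhsa.yaml" and "raisin.yaml" are
# hypothetical names; the custom MHSA/RFAConv modules themselves are assumed
# to be registered elsewhere and are not shown here.
from ultralytics import YOLO

model = YOLO("yolov8s-rfa-mhsa.yaml")  # hypothetical YAML describing the improved model
model.train(
    data="raisin.yaml",      # hypothetical dataset config: train/val/test paths, 4 classes
    imgsz=640,
    epochs=300,
    batch=32,
    optimizer="SGD",
    lr0=0.01,
    momentum=0.937,
    weight_decay=0.0005,
    warmup_epochs=3,
    warmup_momentum=0.8,
    warmup_bias_lr=0.1,
)
```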

3.2. Performance Evaluation Metrics

For raisin foreign matter detection in scenarios with multiple types of foreign matter and dense objects, it is necessary to measure both the model’s accuracy and its speed. Therefore, this paper selects multi-faceted performance evaluation metrics to analyze the model. The primary assessment criteria encompass precision (P), recall (R), F1-score, mean average precision (mAP) across all categories, average precision (AP) for individual categories, the count of model parameters, floating-point operations (FLOPs), and the duration for image detection on the GPU platform (Time).
The formulas for each performance evaluation metric are shown in Equations (9) to (12), where k represents the number of classes. P indicates the ratio of true positives (TP) to the total predicted positives (TP and FP), while R reflects the ratio of true positives (TP) to the sum of true positives and false negatives (TP and FN). The F1 score is the harmonic average of precision and recall. mAP stands for the mean average precision across multiple classes, and AP denotes the area under the precision–recall curve. The count of model parameters signifies its size, FLOPs measure the computational complexity, and Time encompasses the sum of preprocessing, inference, and post-processing durations needed for the model to process an image with a batch size of 1.
$P = \dfrac{TP}{TP + FP}$ (9)

$R = \dfrac{TP}{TP + FN}$ (10)

$F1 = \dfrac{2 \times P \times R}{P + R}$ (11)

$mAP = \dfrac{1}{k} \sum_{i=1}^{k} AP_i$ (12)
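The small numeric example below illustrates Equations (9) to (12); the counts and per-class AP values are invented solely for illustration.

```python
# A small numeric illustration of Equations (9)-(12): precision, recall, and F1
# from raw counts, and mAP as the mean of per-class AP values. All numbers here
# are invented for the example.
tp, fp, fn = 180, 11, 20

precision = tp / (tp + fp)                           # Equation (9)
recall = tp / (tp + fn)                              # Equation (10)
f1 = 2 * precision * recall / (precision + recall)   # Equation (11)

ap_per_class = {"fruit stem": 0.93, "branch": 0.95, "stone": 0.98, "plastic": 0.99}
map_50 = sum(ap_per_class.values()) / len(ap_per_class)  # Equation (12)

print(f"P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}  mAP@0.5={map_50:.3f}")
```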

3.3. Model Training Results and Visualization

The confusion matrix, a widely utilized instrument for assessing classification model performance, offers a straightforward visualization of how well the model performs across various classes by comparing the actual labels with the predicted labels. The normalized confusion matrices for the original YOLOv8s model and the enhanced model are depicted in Figure 10a and Figure 10b, respectively, with the rows corresponding to the actual classes and the columns to the predicted classes. The figures reveal that the enhanced model exhibits improved detection capabilities across all categories, most noticeably in the detection of fruit stems.
The boundary box loss, classification loss, and feature point loss during training and validation of the improved model, as well as its precision, recall, average precision at a 0.5 IoU threshold, and average precision at a 0.5:0.95 IoU threshold are shown in Figure 11.
The visualization of the training results is shown in Figure 12. The recognition accuracy of the improved model is markedly higher when dealing with small objects under severe occlusion. This improvement reflects not only the model’s ability to capture fine details but also its ability to adapt to complex scenes. In addition, the improved model achieves higher overall recognition accuracy, further validating the effectiveness of our refinement approach. Overall, these findings clearly illustrate the enhanced performance of the refined model.

3.4. Comparison Experiments with Different Object Detection Algorithms

To more intuitively demonstrate the performance of the improved YOLOv8 model, comparative experiments were conducted using Faster R-CNN, SSD, YOLOv5m, YOLOv5s, YOLOv7, YOLOv7-tiny, YOLOv8x, YOLOv8l, YOLOv8m, YOLOv8s, and YOLOv8n, all based on deep learning models, using the same raisin dataset. The comparative experimental results of different models are shown in Table 2, and the experimental results of different models for different foreign matter classification accuracy are shown in Table 3.
The detection performance of various models is illustrated in Figure 13. The comparative experimental outcomes indicate that the refined YOLOv8 model performs strongly in precision, recall, and the composite metric F1-score; its precision is second only to the YOLOv7 model, and only by a small margin. In terms of the average precision at a 0.5 IoU threshold, it is only 0.4 percentage points lower than the YOLOv8x model and is 4.1%, 30.3%, 1.9%, 2.9%, 0.1%, 1.5%, 0.2%, 0.3%, 1.5%, and 2.7% higher than the other models, respectively. In terms of the average precision at a 0.5:0.95 IoU threshold, it is lower than the larger YOLOv8 models and the YOLOv7 model. In terms of the number of parameters, FLOPs, and detection time, the improved model also performs well. Regarding the classification accuracy for different types of foreign matter, the improved model achieves its most notable gain in detecting fruit stem foreign matter, where it is only 0.7 percentage points lower than the YOLOv8x model and is 9.0, 43.4, 5.0, 5.8, 2.8, 5.1, 1.7, 1.4, 3.1, and 6.9 percentage points higher than the other models, respectively.
In summary, the enhanced model strikes an optimal harmony between detection accuracy and processing speed, exhibiting notable advancements in detecting various types of raisin foreign matter while introducing a minimal increment in the parameter count.

3.5. Ablation Experiments

To further verify the interaction effects between the improved modules and the optimization effects of the improved modules on the entire model, eight ablation experiments were set up for three groups of modules, as shown in Table 4. These experiments systematically evaluate the impact of each module’s independent and combined work on the model’s performance.
When the MHSA module is introduced alone, the recall rate and the average precision at a 0.5 IoU threshold increase by 1.3% and 0.6%, respectively, while the precision rate and the average precision at higher IoU thresholds decrease by 0.3% and 1.0%, respectively. This indicates that the MHSA module enhances the model’s target recognition ability and the overall performance under more lenient criteria, but slightly increases the false detection rate. When the RFAConv module and the replacement with EIoU loss are introduced individually, the precision rate, recall rate, and average precision at both IoU thresholds all improve, indicating that the RFAConv module and EIoU loss make the model recognition more accurate and contribute significantly to enhancing the overall model effectiveness.
When the MHSA and RFAConv modules are introduced together, the precision improves by 1.6%, 1.9%, and 1.2% relative to the baseline model, the MHSA-only model, and the RFAConv-only model, respectively, indicating that the two modules act synergistically in improving the accuracy with which the model recognizes targets. When the MHSA module is combined with EIoU loss, the recall rate increases by 1.4%, 0.2%, and 0.9% relative to the baseline model, the MHSA-only model, and the EIoU-only model, respectively, while the precision rate decreases by 0.6%, 0.3%, and 1.4%. This suggests that the MHSA module enhances the model’s recognition capabilities but marginally elevates the false detection rate, whereas EIoU loss amplifies the benefits of the MHSA module while preserving the average precision.
When the RFAConv module is combined with EIoU loss, the precision rate improves by 1.9%, 1.5%, and 1.1% relative to the baseline model, the RFAConv-only model, and the EIoU-only model, respectively, the recall rate decreases slightly, and the average precision at both IoU thresholds improves. This suggests that the integration of the RFAConv module with EIoU loss contributes significantly to enhancing the model’s overall effectiveness.
Finally, the improved model, under the synergistic effect of multiple improved modules, has a significant improvement on indicators such as precision rate, recall rate, and average precision at both IoU thresholds compared to the baseline model.

3.6. Comparison Experiments of Different Foreign Matter Combinations

To verify the detection performance of the improved model for single or multiple types of foreign matter, seven groups of comparative experiments were set up for the four types of foreign matter, covering combinations of raisins with fruit stems, raisins with branches, raisins with stones, raisins with plastic, raisins with fruit stems and branches, raisins with stones and plastic, and raisins with all types of foreign matter combined. The detection results are shown in Figure 14. The outcomes demonstrate that the refined model shows robust detection capabilities both for single types of foreign matter and for combinations of multiple types, but foreign matter that is severely obscured remains more difficult to detect.

4. Discussion

The automated detection of agricultural products is currently one of the more important research areas, and for raisins, a considerable number of researchers have already implemented automated detection and classification using traditional machine learning techniques. Abbasgolipour M. et al. [22] engineered a machine vision-driven automatic sorting apparatus for raisins, comprising a conveyor belt, illumination module, control and processing unit, and a sorting mechanism. The system classifies raisins by initially segmenting them from the background in images and subsequently evaluating them based on hue, saturation, and intensity (HSI) color attributes, realizing a 93% classification accuracy. Mollazade K. et al. [23] enhanced the preprocessing and segmentation of four raisin types (green, green with stems, black, and black with stems), leveraging image processing and data mining to extract a robust feature set of 44, including 36 color and 8 shape features. These data were fed into an Artificial Neural Network (ANN) with a 7-6-4 topology, achieving a 96.33% classification accuracy. Wang S. et al. [24] introduced a classification approach that amalgamates morphological, color, and texture features from RGB images, yielding 74 features classified via five models, with the linear discriminant model leading in accuracy at 99%. Yu X. et al. [25] proposed a classification strategy integrating color and texture features, employing linear discriminant analysis, SIMCA, and LS-SVM to construct models that achieved a 95% accuracy rate in classifying four raisin types. Karimi N. et al. [26] utilized four extraction techniques to distill 146 texture features, subsequently refined by principal component analysis (PCA), and applied ANN and SVM classifiers to differentiate between high-quality and substandard raisins, with SVM demonstrating superior accuracy at 92.71%. Çınar İ. et al. [27] discerned seven morphological features and classified raisin varieties using logistic regression, multilayer perceptrons, and support vector machines, with SVM attaining the highest classification accuracy of 86.44%. Khojastehnazhand M. et al. [28] advanced a texture-feature-based machine vision technique for bulk raisin classification, utilizing a Support Vector Machine with GLRM features to achieve 85.55% accuracy in quality classification and 69.78% in differentiating raisins from foreign matter such as sawdust and thistles. Sahin O. et al. [29] compared six machine learning models in classifying Turkish raisins, with the Multi-Layer Perceptron outperforming others at 94.07% accuracy. Mohamed T.M. et al. [30] designed a raisin seed classification model, which employs feature normalization and principal component analysis for preprocessing, and then inputs the data into a fully connected neural network for the classification task, achieving an accuracy and F-Score exceeding 91%.
Current studies reveal that conventional machine learning techniques have reached an exceptional level of precision in raisin classification, surpassing industry grading benchmarks. However, these methods are mostly focused on the classification of the raisins themselves, neglecting the presence of endogenous and exogenous foreign matter during the drying, collection, and processing stages. Moreover, existing methods heavily rely on feature engineering, which often requires a significant amount of time and effort from researchers during the model-building process. Handling situations with material density and complex backgrounds is particularly challenging.
Deep learning is a new detection technology that has emerged in recent years, and a large number of researchers are currently using deep learning techniques to detect agricultural products. Parvathi S. et al. [31] developed a deep learning model leveraging Faster R-CNN and ResNet-50 to identify the ripeness stages of coconuts against intricate backgrounds, attaining a model accuracy of 0.894 and an average processing velocity of 3.124 s per image. Wang Z. et al. [32] put forth a tomato ripeness detection technique grounded in the MatDet model, an iteration of Faster R-CNN that employs ResNet-50 as its backbone, integrates RoIAlign for precise bounding box delineation, and utilizes PANet for multi-scale feature fusion. This model excelled in complex environments with branch occlusions, fruit superpositions, and variable lighting, reaching an average accuracy of 96.14% and effectively mitigating challenges like occlusion, overlapping, and lighting variations. Wang Z. et al. [33] also introduced a real-time apple stem/calyx recognition system founded on the YOLO-v5 algorithm, which, through the optimization of hyperparameters, transfer learning, and the refinement of model parameters and weight volumes using detection head search algorithms and pruning techniques, reduces the model parameters and weight volume by approximately 71% overall, enabling a real-time detection rate of 25.51 frames per second on a CPU with an accuracy of 93.89%. Mathew M. P. et al. [34] presented a disease detection methodology for sweet pepper plant leaves, capitalizing on the YOLOv5 model, which adeptly identified early bacterial spot infections with a compact model size, an average accuracy of 0.907, and an average processing speed of 20 ms per image. Yang S. et al. [35] proposed a strawberry ripeness detection strategy predicated on the LS-YOLOv8s model, an evolution of the YOLOv8 algorithm that assimilates the LW-Swin Transformer module and introduces novel random variables to govern contrast enhancement, thereby enhancing model performance. The LS-YOLOv8s model achieved a detection accuracy of 94.4% and a rate of 19.23 frames per second, with 51.93% of the parameters of the YOLOv8s model.
The current studies comprehensively showcase the applicability of deep learning in agricultural product detection, with the YOLO model exemplifying the efficacy of single-stage object detection approaches. Although its accuracy does not quite match that of traditional machine learning methods and two-stage object detection methods, it has significant advantages in detection speed and resource requirements while still meeting product-standard accuracy. In the detection of foreign matter in raisins, materials and foreign matter can overlap to some extent, and some foreign matter is very small, making detection even more challenging.
To address this, we propose an enhanced real-time detection approach leveraging an improved YOLOv8 model. This novel method integrates MHSA from BoTNet into YOLOv8’s backbone. In the model’s neck layer, selected C2f modules have been strategically replaced with RFAConv modules. The model also adopts an EIoU loss function in place of the original CIoU. Our experiments reveal that the refined YOLOv8 boasts a precision of 94.5%, a recall rate of 89.9%, and an F1-score of 0.921, with a mAP reaching 96.2% at the 0.5 IoU threshold and 81.5% across the 0.5–0.95 IoU range. For this model, comprising 13,177,692 parameters, the average time required for detecting each image on a GPU is 7.8 milliseconds. The model demonstrates good performance both in comparison with existing models and in the detection of samples composed of single or multiple foreign substances combined with raisins. Nonetheless, there remains scope for enhancing the research. First, although the model has shown some improvement in detecting raisins under conditions where they are obscured by foreign substances, there is still an issue of missed detection. Second, although the model has shown promising results in terms of accuracy and computational efficiency, further enhancements can be achieved through techniques such as pruning and knowledge distillation, which are also being explored in the domain of agricultural product detection. Third, the current model is only targeted at the seedless white variety of raisins from Turpan City, Xinjiang Uyghur Autonomous Region, China. In future research, the range of dried grape varieties can be expanded to increase the applicability of the model.

5. Conclusions

This study introduces a real-time detection approach for identifying foreign matter in mixed raisins, utilizing an enhanced version of YOLOv8. The proposed method integrates BoTNet’s multi-head self-attention (MHSA) into YOLOv8’s backbone, replaces selected C2f modules in the neck layer with RFAConv, switches from the original CIoU loss to EIoU loss, and uses a new raisin foreign matter dataset for training. In our experiments, the F1-score and mAP0.5 of the improved model increased by 1.21% and 1.48%, respectively, compared to the original model, and its comprehensive performance is close to that of the YOLOv8m model, which has 96.1% more parameters.
Comparing the average precision of the improved and original models across categories shows that, although detection performance has improved in every category, small or severely occluded foreign matter can still be missed or misjudged, especially fruit stems and branches. In comparison with various deep learning-based object detection algorithms, this approach delivers excellent detection outcomes with a reduced parameter count and higher speed, striking a sound balance between accuracy and efficiency. It is hardware-efficient and demonstrates robust practical utility.
Using this method for rapid detection of foreign matter in raisins and combining it with a removal platform can greatly reduce labor costs, reduce hardware costs, and promote the intelligent development of the raisin industry, which has a positive significance for the promotion of agricultural products. Currently, this method still has shortcomings in detecting dense small objects and occluded objects. In our ongoing efforts, we plan to delve into refining and streamlining the model for implementation on mobile and embedded systems.

Author Contributions

M.N.: conceptualization, methodology, writing—reviewing and editing. H.M.: data curation, writing—original draft preparation, visualization. Y.W.: supervision. L.C.: software. Y.C.: validation. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key R&D Program of China (2022YFD2100304), the National Natural Science Foundation of China (Grant No. 52275001), and the National Science Foundation for Young Scientists of China (Grant No. 51705201).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets we produced and the experimental data we obtained are not publicly available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mc Carthy, U.; Uysal, I.; Badia-Melis, R.; Mercier, S.; O’Donnell, C.; Ktenioudaki, A. Global food security–Issues, challenges and technological solutions. Trends Food Sci. Technol. 2018, 77, 11–20. [Google Scholar] [CrossRef]
  2. Christensen, L.P. Raisin Production Manual; University of California, Division of Agriculture and Natural Resources: Davis, CA, USA, 2000; Volume 3393. [Google Scholar]
  3. Payne, K.; O’Bryan, C.A.; Marcy, J.A.; Crandall, P.G. Detection and prevention of foreign material in food: A review. Heliyon 2023, 9, e19574. [Google Scholar] [CrossRef]
  4. Huxsoll, C.C.; Bolin, H.R.; Mackey, B.E. Near infrared analysis potential for grading raisin quality and moisture. J. Food Sci. 1995, 60, 176–180. [Google Scholar] [CrossRef]
  5. Satake, T.; Chang, S.; Omori, S.; Fujioka, O.; Sakata, O. Basic Study on Grading of Chinese Dried Green Raisin Using Image Information. Nogyo Shisetsu (J. Soc. Agric. Struct. Jpn.) 2003, 33, 217–224. [Google Scholar]
  6. Liu, X.X.; Xu, L.M.; Yuan, Q.C. Current status and development trends of raisin automatic grading technology. Agric. Equip. Veh. Eng. 2018, 56, 11–15. [Google Scholar]
  7. Abbas, H.M.T.; Shakoor, U.; Khan, M.J.; Ahmed, M.; Khurshid, K. Automated sorting and grading of agricultural products based on image processing. In Proceedings of the 2019 8th International Conference on Information and Communication Technologies (ICICT), Karachi, Pakistan, 16–17 November 2019; pp. 78–81. [Google Scholar]
  8. Chen, L. Research on the development of agricultural product quality detection technology. Shanxi Agric. Econ. 2017, 15, 43–44. [Google Scholar]
  9. Patel, K.K.; Kar, A.; Jha, S.N.; Khan, M.A. Machine vision system: A tool for quality inspection of food and agricultural products. J. Food Sci. Technol. 2012, 49, 123–141. [Google Scholar] [CrossRef]
  10. Jha, K.; Doshi, A.; Patel, P.; Shah, M. A comprehensive review on automation in agriculture using artificial intelligence. Artif. Intell. Agric. 2019, 2, 1–12. [Google Scholar] [CrossRef]
  11. Raihen, M.N.; Akter, S. Prediction modeling using deep learning for the classification of grape-type dried fruits. Int. J. Math. Comput. Eng. 2024, 2, 1–12. [Google Scholar] [CrossRef]
  12. Backes, A.R.; Khojastehnazhand, M. Optimizing a combination of texture features with partial swarm optimizer method for bulk raisin classification. Signal Image Video Process. 2024, 18, 1–8. [Google Scholar] [CrossRef]
  13. Gao, F.; Fu, L.; Zhang, X.; Majeed, Y.; Karkee, M.; Zhang, Q. Multi-class fruit-on-plant detection for apple in SNAP system using Faster R-CNN. Comput. Electron. Agric. 2020, 176, 105634. [Google Scholar] [CrossRef]
  14. Tian, Y.; Yang, G.; Wang, Z.; Wang, H.; Li, E.; Liang, Z. Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agric. 2019, 157, 417–426. [Google Scholar] [CrossRef]
  15. Ma, N.; Su, Y.; Yang, L.; Li, Z.; Yan, H. Wheat Seed Detection and Counting Method Based on Improved YOLOv8 Model. Sensors 2024, 24, 1654. [Google Scholar] [CrossRef] [PubMed]
  16. Tasnim, S.; Qi, W. Progress in Object Detection: An In-Depth Analysis of Methods and Use Cases. Eur. J. Electr. Eng. Comput. Sci. 2023, 7, 39–45. [Google Scholar] [CrossRef]
  17. BS EN 12020-2:2022-TC. Available online: https://knowledge.bsigroup.com/products/aluminium-and-aluminium-alloys-extruded-precision-profiles-in-alloys-en-aw-6060-and-en-aw-6063-tolerances-on-dimensions-and-form-3?version=tracked&tab=overview (accessed on 16 January 2023).
  18. Srinivas, A.; Lin, T.Y.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck transformers for visual recognition. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
  19. Zhang, X.; Liu, C.; Yang, D.; Song, T.; Ye, Y.; Li, K.; Song, Y. RFAConv: Innovating spatial attention and standard convolutional operation. arXiv 2023, arXiv:2304.03198. [Google Scholar]
  20. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the 2020 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020. [Google Scholar]
  21. Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
  22. Abbasgolipour, M.; Omid, M.; Keyhani, A.; Mohtasebi, S.S. Sorting raisins by machine vision system. Mod. Appl. Sci. 2010, 4, 49. [Google Scholar] [CrossRef]
  23. Mollazade, K.; Omid, M.; Arefi, A. Comparing data mining classifiers for grading raisins based on visual features. Comput. Electron. Agric. 2012, 84, 124–131. [Google Scholar] [CrossRef]
  24. Wang, S.; Liu, K.; Yu, X.; Wu, D.; He, Y. Application of hybrid image features for fast and non-invasive classification of raisin. J. Food Eng. 2012, 109, 531–537. [Google Scholar] [CrossRef]
  25. Yu, X.; Liu, K.; Wu, D.; He, Y. Raisin quality classification using least squares support vector machine (LSSVM) based on combined color and texture features. Food Bioprocess Technol. 2012, 5, 1552–1563. [Google Scholar] [CrossRef]
  26. Karimi, N.; Kondrood, R.R.; Alizadeh, T. An intelligent system for quality measurement of Golden Bleached raisins using two comparative machine learning algorithms. Measurement 2017, 107, 68–76. [Google Scholar] [CrossRef]
  27. Çınar, İ.; Koklu, M.; Taşdemir, Ş. Classification of raisin grains using machine vision and artificial intelligence methods. Gazi Muhendis. Bilim. Derg. 2020, 6, 200–209. [Google Scholar]
  28. Khojastehnazhand, M.; Ramezani, H. Machine vision system for classification of bulk raisins using texture features. J. Food Eng. 2020, 271, 109864. [Google Scholar] [CrossRef]
  29. Sahin, O. Raisin Grain Classification Using Machine Learning Models. In Proceedings of the 2023 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS 2023), Bhopal, India, 18–19 February 2023. [Google Scholar]
  30. Mohamed, T.M.; Abdulkader, S.N. A Novel Deep Learning Model For Raisin Grains Classification. J. Theor. Appl. Inf. Technol. 2023, 101, 21. [Google Scholar]
  31. Parvathi, S.; Selvi, S.T. Detection of maturity stages of coconuts in complex background using Faster R-CNN model. Biosyst. Eng. 2021, 202, 119–132. [Google Scholar] [CrossRef]
  32. Wang, Z.; Ling, Y.; Wang, X.; Meng, D.; Nie, L.; An, G.; Wang, X. An improved Faster R-CNN model for multi-object tomato maturity detection in complex scenarios. Ecol. Inform. 2022, 72, 101886. [Google Scholar] [CrossRef]
  33. Wang, Z.; Jin, L.; Wang, S.; Xu, H. Apple stem/calyx real-time recognition using YOLO-v5 algorithm for fruit automatic loading system. Postharvest Biol. Technol. 2022, 185, 111808. [Google Scholar] [CrossRef]
  34. Mathew, M.P.; Mahesh, T.Y. Leaf-based disease detection in bell pepper plant using YOLO v5. Signal Image Video Process. 2022, 16, 841–847. [Google Scholar] [CrossRef]
  35. Yang, S.; Wang, W.; Gao, S.; Deng, Z. Strawberry ripeness detection based on YOLOv8 algorithm fused with LW-Swin Transformer. Comput. Electron. Agric. 2023, 215, 108360. [Google Scholar] [CrossRef]
Figure 1. Image acquisition platform.
Figure 2. Raisin images to be detected.
Figure 3. Images after data augmentation.
Figure 4. Overall structure of YOLOv8.
Figure 5. Overall structure of the improved model.
Figure 6. MHSA structure in BoTNet.
Figure 7. RFAConv structure diagram.
Figure 8. CIoU loss.
Figure 9. EIoU loss.
Figure 10. Normalized confusion matrix.
Figure 11. Loss curves and precision curves.
Figure 12. Visualization of training results. The blue arrows highlight the improvements in the improved model compared to the original model.
Figure 13. The detection effects of different models.
Figure 14. The detection effects of different foreign matter combinations.
Table 1. Experimental platform configuration and parameters.

Parameter | Configuration
CPU | Intel(R) Xeon(R) Gold 6430
GPU | NVIDIA GeForce RTX 4090
GPU video memory | 24 GB
Operating system | Ubuntu 22.04
Training environment | PyTorch 2.1.0 + Python 3.10 + CUDA 12.1
Table 2. Comparison of experimental results of different models.

Model | P/% | R/% | F1-Score | mAP0.5/% | mAP0.5:0.95/% | Params | FLOPs/G | Time/ms
Faster R-CNN | 96.4 | 78.3 | 0.864 | 92.4 | 79.0 | 41,367,656 | 269.0 | 23.5
SSD | 83.7 | 60.9 | 0.705 | 73.7 | 51.9 | 13,437,006 | 30.5 | 6.3
YOLOv5m | 94.2 | 90.7 | 0.924 | 94.4 | 79.8 | 20,865,057 | 47.9 | 9.1
YOLOv5s | 94.4 | 88.0 | 0.911 | 93.5 | 75.1 | 7,020,913 | 14.4 | 6.5
YOLOv7 | 94.9 | 92.7 | 0.938 | 96.1 | 82.7 | 36,497,654 | 103.2 | 9.3
YOLOv7-tiny | 92.6 | 89.5 | 0.910 | 94.8 | 74.4 | 6,015,714 | 13.0 | 8.8
YOLOv8x | 92.8 | 93.3 | 0.931 | 96.6 | 84.8 | 68,127,420 | 257.4 | 10.3
YOLOv8l | 94.3 | 91.9 | 0.931 | 96.0 | 84.6 | 43,609,692 | 165.4 | 9.5
YOLOv8m | 93.1 | 91.4 | 0.922 | 95.9 | 83.8 | 25,842,076 | 78.7 | 8.0
YOLOv8s | 92.7 | 89.4 | 0.910 | 94.8 | 81.2 | 11,127,132 | 28.4 | 7.2
YOLOv8n | 93.2 | 86.8 | 0.898 | 93.7 | 77.7 | 3,006,428 | 8.1 | 6.3
Ours | 94.5 | 89.9 | 0.921 | 96.2 | 81.5 | 13,177,692 | 33.1 | 7.8
Table 3. Experimental results of different models for different foreign matter classification accuracy (%).

Model | Fruit Stem | Branch and Trunk | Stone | Plastic
Faster R-CNN | 84.3 | 90.6 | 95.4 | 97.3
SSD | 49.9 | 70.5 | 86.9 | 87.6
YOLOv5m | 88.3 | 94.5 | 96.4 | 98.3
YOLOv5s | 87.5 | 93.7 | 95.5 | 97.3
YOLOv7 | 90.5 | 96.7 | 98.1 | 99.2
YOLOv7-tiny | 88.2 | 95.6 | 97.4 | 98.1
YOLOv8x | 94.0 | 95.5 | 98.0 | 98.7
YOLOv8l | 91.6 | 95.0 | 98.3 | 98.9
YOLOv8m | 91.9 | 94.8 | 97.8 | 98.8
YOLOv8s | 90.2 | 93.1 | 97.6 | 98.3
YOLOv8n | 86.4 | 93.8 | 96.7 | 98.0
Ours | 93.3 | 94.7 | 98.1 | 98.8
Table 4. Ablation experiment results.

Baseline | +MHSA | +RFAConv | +EIoU | P/% | R/% | mAP0.5/% | mAP0.5:0.95/% | Params | FLOPs/G
YOLOv8s | | | | 92.7 | 89.4 | 94.8 | 81.2 | 11,127,132 | 28.4
YOLOv8s | + | | | 92.4 | 90.7 | 95.4 | 80.2 | 11,915,100 | 29.1
YOLOv8s | | + | | 93.1 | 89.7 | 95.3 | 81.7 | 12,389,724 | 32.5
YOLOv8s | | | + | 93.5 | 90.0 | 95.2 | 81.4 | 11,127,132 | 28.4
YOLOv8s | + | + | | 94.3 | 89.7 | 95.4 | 80.6 | 13,177,692 | 33.1
YOLOv8s | + | | + | 92.1 | 90.9 | 95.4 | 80.5 | 11,915,100 | 29.1
YOLOv8s | | + | + | 94.6 | 89.1 | 95.6 | 82.3 | 12,389,724 | 32.5
YOLOv8s | + | + | + | 94.5 | 89.9 | 96.2 | 81.5 | 13,177,692 | 33.1

“+” indicates the introduction of this method.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

