1. Introduction
Quality inspection in floriculture has been commonly conducted by humans; however, with the advancement of Industry 4.0, deep learning (DL) models have the potential to conduct real-time and remote inspection. Floriculture is a subdivision of horticulture that produces ornamental plants, flowers, and greenery [
1]. Over the years, floriculture has become one of the most profitable segments of the agricultural sector; the global flower market is valued at around 44 billion US dollars annually [
2,
3]. As in other developing countries such as India [
4,
5], Mexican floriculture is a crucial activity that generates over 250,000 direct jobs and nearly one million indirect jobs, of which nearly 60% are female workers [
2]. Handmade wreaths of preserved greenery are a type of product mainly destined for exportation, with quality and safety being prominent characteristics that must be checked at different production stages. Wreath manufacturers rely on visual inspection during the manufacturing process, in line with traditional horticulture inspection, in which plants are visually examined for signs of disease, pests, and other characteristics. However, this manual approach is a tiring activity prone to human error [
6] that can be time-consuming, subjective, and limited by the knowledge and skills of the individual inspector. Studies in the manufacturing sector have reported an accuracy of 80% to 85% for correctly rejecting precision-manufactured parts [
7]. Inspection error rates vary depending on many factors [
7], with values of 20% to 30% commonly identified in the literature [
8] and with variations depending on the inspection activity; for example, the error rate in inspections of highway bridges ranges from 19% to 48% [
9], while in metal casting inspection, it ranges from 17.8% to 29.8% [
10], indicating that the reliability and accuracy of visual inspection frequently prove insufficient [
11]. Despite the importance of visual inspection in floriculture, particularly in floral wreath manufacturing, there is scarce evidence regarding the accuracy level or error rates in this sector, which leads us to the first research question.
The growing interest in recent years in utilizing digital technologies in horticulture includes robots [
12], digital twins [
13], the Internet of Things [
14], artificial intelligence (AI) [
15], including machine learning [
16], and DL [
17]. Notably, the inherent dependence on visual inspection has fostered the increasing use of DL techniques due to their positive effects and promising contribution to improving visual inspections. DL has been deployed in several cases for a variety of purposes, including disease detection in fruits [
18], healthy flower detection [
4], flower recognition [
5], pest recognition [
17], surface defect detection [
19], and fruit grading [
20]. Despite these efforts, few studies on defect identification of floral wreaths have been conducted, particularly in developing countries; therefore, we proposed the following research question.
The increasing utilization of DL opens many possibilities for digitalizing the floriculture and horticulture sector, particularly for derived products such as floral wreaths. However, most studies have focused mainly on the advantages of DL and on the computational efficiency metrics of different architectures, and less on the actual accuracy of visual inspection, suggesting a research gap to be bridged. To the best of our knowledge, this research is the first to investigate the suitability of DL techniques within the inspection process of artisanal products, specifically those designed for ornamental purposes. By leveraging the power of AI for quality control and for reducing dependence on human inspection, particularly for handmade products in developing countries, this research marks a significant departure from traditional manual inspection methods. This novel contribution holds promise for enhancing the efficiency and reliability of inspection procedures, ultimately benefiting the economic growth and global competitiveness of these countries’ artisanal industries.
2. Related Work
The most challenging problem in computer vision is object detection [
21] due to the complexity of recognizing objects and localizing them within the image. With the continuous evolution of neural networks, different models have been developed to address different detection scenarios.
Table A1 (
Appendix A) summarizes the most common architectures for object detection in agriculture and horticulture applications. For inspection purposes, object detection algorithms, such as region-based classifiers, create a bounding box around the region of interest (ROI) and evaluate characteristics such as density and color, thus determining whether a defect is present. Broadly, standard methods are classified into single-stage and two-stage object detectors [
21]. The latter group includes Region-based Convolutional Neural Networks (R-CNNs), Fast R-CNN [
22], Faster R-CNN [
23], and Mask R-CNN [
24]. The former group includes YOLO (You Only Look Once) [
25] and single-shot multi-box detectors (SSDs) [
26]. Although two-stage detectors have demonstrated high accuracy, their detection speed is limited [
27]. One-stage detectors overcome these limitations; YOLO and SSD are very popular because they are faster than two-stage deep-learning object detectors and require less time for model training [
28].
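To illustrate how a one-stage detector produces bounding boxes around regions of interest, the following minimal sketch runs a pre-trained Ultralytics YOLOv8 model on a single image. It is not the pipeline used in this study; the package, checkpoint, image filename, and confidence threshold are illustrative assumptions.

```python
# A minimal sketch of one-stage detection with a pre-trained Ultralytics YOLOv8
# model; "wreath.jpg" and the confidence threshold are illustrative assumptions.
from ultralytics import YOLO  # pip install ultralytics

model = YOLO("yolov8n.pt")                        # small pre-trained checkpoint
results = model.predict("wreath.jpg", conf=0.25)  # run inference on one image

for box in results[0].boxes:                      # iterate detected bounding boxes
    x1, y1, x2, y2 = box.xyxy[0].tolist()         # corner coordinates of the ROI
    cls_id = int(box.cls[0])                      # predicted class index
    score = float(box.conf[0])                    # confidence score
    print(f"{results[0].names[cls_id]}: {score:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```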
Since its release, the YOLO network has been used in several cases, including grape detection [
29,
30]; potted flower detection [
31]; tomato, flower, and node detection [
32]; pineapple surface defect detection [
28]; sugarcane stem node recognition [
33]; and tomato growth period tracking [
34]. The YOLO network has stood out in average recognition speed and accuracy for occluded grapes compared with ResNet50 and SSD300 [
29].
ResNet has been a frequently used model among multi-stage networks due to its accuracy and the balance it achieves between accuracy and training cost [
35]. ResNet is based on stacked residual units consisting of convolution and pooling layers [
36]. There are different versions, including ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152, with the last three being reported as more accurate [
36]. In particular, ResNet50 stands out as a popular version, achieving the highest validation accuracy among several pre-trained models [
37] as well as the highest training accuracy [
38], thus outperforming similar networks such as ResNet101 [
39] and other network architectures including AlexNet, GoogLeNet, Inception v3, ResNet101, and SqueezeNet [
35]. In fact, among these architectures, ResNet50 struck the best balance between training cost and accuracy [
35].
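For illustration of the residual units mentioned above, the following sketch implements a basic residual block in PyTorch: two convolutions whose output is added to an identity (skip) connection. The layer sizes are illustrative and do not reproduce the exact ResNet50 bottleneck configuration.

```python
# A minimal sketch of a residual unit: two convolutions plus a skip connection.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                       # skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)   # residual addition

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```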
We used both one-stage and two-stage networks to identify the network most suitable for visual inspection purposes in horticultural contexts. Therefore, after evaluating multiple object detection architectures and considering the good results of previous studies [
28,
29,
30,
31,
33,
34], we used YOLO networks, particularly YOLOv4-tiny, YOLOv5, and YOLOv8. YOLOv4-tiny is a compressed version of YOLOv4 designed to be trained on machines with limited computing power [
40,
41]. YOLOv4, an improved version of YOLOv3, generates bounding-box coordinates and assigns probabilities to each category, converting the object detection task into a regression problem [
42]. Continuing with the YOLO family, Ultralytics proposed the YOLOv5 algorithm [
43], which is smaller and more convenient, enabling flexible deployment and more accurate detection [
44]. YOLOv8 represents the most recent iteration of the YOLO object detection model [
45,
46]. It incorporates numerous enhancements, including a novel neural network architecture that leverages both the Feature Pyramid Network (FPN) and the Path Aggregation Network (PAN) [
47], alongside the implementation of an improved labeling tool designed to streamline the annotation process.
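As an illustration of how such a model could be fine-tuned for this task, the sketch below trains YOLOv8 on a custom defect dataset using the Ultralytics API. It is not the exact configuration used in this study; "wreaths.yaml" is a hypothetical dataset definition, and the hyperparameters are assumptions.

```python
# A minimal sketch (assumed settings, not the authors' exact configuration) of
# fine-tuning YOLOv8 on a hypothetical custom dataset described by "wreaths.yaml".
from ultralytics import YOLO

model = YOLO("yolov8n.pt")        # start from pre-trained weights
model.train(
    data="wreaths.yaml",          # hypothetical dataset definition (train/val paths, class names)
    epochs=100,                   # illustrative; the study reports 50- and 100-epoch runs
    imgsz=640,                    # input resolution
    batch=16,
)
metrics = model.val()             # evaluate on the validation split
print(metrics.box.map50)          # mAP@0.5 after training
```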
Moreover, to provide a comprehensive comparison with the YOLO networks and due to the proven results in previous studies [
35,
37,
39], we also used ResNet50. This inclusion aims to enhance the validity and reliability of the findings, adding perspective to the evaluation. Furthermore, by considering multiple architectures, the evaluation aimed to strike the right balance between accuracy, speed, and resource efficiency.
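For illustration, a common way to apply ResNet50 to a small, task-specific dataset is transfer learning: freezing the pre-trained backbone and replacing the final layer. The sketch below shows this pattern with torchvision; it is an assumption for illustration, not the training script used in this study.

```python
# A minimal sketch of transfer learning with a pre-trained torchvision ResNet50,
# adapted to a binary defective/correct classification (illustrative, not the
# authors' script).
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # ImageNet weights
for param in model.parameters():
    param.requires_grad = False                   # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 2)     # new head: defective vs. correct
```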
5. Discussion
This study assessed human visual inspection and deep learning models to detect objects and classify defects in decorative wreaths. The results indicated similar performance between the inspectors and the models. When assessing five classes, inspectors showed an overall precision of 92.4%, just below the precision of 93.8% obtained with both YOLOv8 and YOLOv5. The accuracy obtained by the inspectors was 97%, while YOLOv5 exhibited an accuracy of 98.2% with 100 epochs, YOLOv8 obtained 93.9% with 50 epochs, and YOLOv4-tiny obtained 88.7% with 100 epochs. Concerning each class, both inspectors and algorithms presented mixed results. The inspectors obtained higher precision than the algorithms did when assessing empty and low-volume wreaths; however, YOLOv4-tiny showed higher precision than the inspectors did for brown and high-volume wreaths. Finally, for Ok wreaths, both inspectors and algorithms achieved a precision of 100%.
Regarding accuracy, the inspectors exhibited higher values than the algorithms did for the empty, low-volume, and Ok classes. However, for the brown and high-volume classes, YOLOv5 presented higher accuracy values.
When assessing two classes, the inspectors exhibited a high precision, similar to all three YOLO models, while ResNet50 showed slightly inferior values. Regarding accuracy, the inspectors achieved an overall inspection accuracy of 100% for both defective and correct wreaths, while the algorithms showed overall accuracies of 94.5% and 97.1%, respectively.
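To make explicit how such per-class precision and overall accuracy figures are defined, the small sketch below computes both from predicted and reference labels; the label vectors are hypothetical and not the study's data.

```python
# A small sketch of per-class precision and overall accuracy; the labels below
# are hypothetical examples, not the study's data.
from sklearn.metrics import accuracy_score, precision_score

classes = ["ok", "empty", "low-volume", "high-volume", "brown"]
y_true = ["ok", "empty", "brown", "ok", "low-volume", "high-volume"]
y_pred = ["ok", "empty", "brown", "ok", "empty", "high-volume"]

overall_accuracy = accuracy_score(y_true, y_pred)
per_class_precision = precision_score(y_true, y_pred, labels=classes,
                                      average=None, zero_division=0)
print(f"accuracy: {overall_accuracy:.3f}")
for name, p in zip(classes, per_class_precision):
    print(f"precision[{name}]: {p:.3f}")
```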
5.1. Deep Learning Models
The findings highlight the potential of YOLOv4-tiny, YOLOv5, and YOLOv8 in accurately detecting and categorizing specific quality criteria associated with decorative wreaths. These three models were trained with five classes, allowing for a more comprehensive assessment of the wreaths’ quality. YOLOv8 and YOLOv4-tiny achieved high precision across different classes. While YOLOv5 demonstrated a slightly lower precision when identifying all five classes, it presented higher accuracy and recall results. Thus, all three one-stage models are adequate options, indicating their ability to precisely and accurately identify the different quality criteria of wreaths beyond a binary classification, thus representing a potential advantage in agricultural inspection, where multiple parameters contribute to overall product quality. Regarding the assessment of two classes, all one-stage models exhibited higher precision and accuracy than ResNet50, thus confirming that YOLO models perform meaningfully better than two-stage object detectors in inference time and detection accuracy [
21], partly because two-stage detectors require a large dataset of labeled images per class to achieve reasonable performance [
52,
53].
All three single-stage models exhibited numerous precision, accuracy, or recall values greater than 99% when assessing five or two classes. Such metrics can suggest potential overfitting risks; however, these results are in line with previous studies that also took measures against overfitting when using YOLOv4 [
64,
65], YOLOv5 [
66,
67,
68], or YOLOv8 [
69]. Overfitting is a fundamental issue in supervised machine learning that prevents models from generalizing: they fit the observed training data well but perform poorly on unseen data in the testing sets [
70]. Frequent causes of overfitting are a lack of sufficient training data and an uneven class balance within the datasets [
71]; thus, overfitting is particularly common for models using small datasets [
48]. Therefore, similar to previous studies [
72,
73], we conducted different measures to prevent overfitting, including data augmentation [
48] and a 70%-30% training/testing ratio [
50]. In addition, to increase the robustness of the models, the images were captured using different wreaths for training and testing, at different times and on different days, and under different filters and lighting conditions.
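The sketch below illustrates one simple realization of the two overfitting countermeasures mentioned above: a 70/30 training/testing split and basic image augmentation. It is illustrative only; the stratified split, the torchvision transforms, and the placeholder file names are assumptions rather than the exact pipeline used in this study.

```python
# A minimal sketch of a 70/30 training/testing split plus simple augmentation
# (illustrative assumptions, not the study's exact pipeline).
from sklearn.model_selection import train_test_split
from torchvision import transforms

image_paths = [f"wreath_{i:03d}.jpg" for i in range(100)]   # placeholder image paths
labels = ["ok", "empty", "low", "high", "brown"] * 20       # placeholder class labels

train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.30, stratify=labels, random_state=42)

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # mimic lighting changes
    transforms.RandomRotation(degrees=15),
    transforms.ToTensor(),
])
```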
Although different deep learning models have been utilized to detect defects in different sectors [
74,
75], a few challenges persist due to the dependency of the performance on datasets [
76]. Despite utilizing a small dataset, our findings showed adequate performance when using the one-stage models (YOLOv4-tiny, YOLOv5, and YOLOv8). When large datasets are unattainable, pre-training [
77] and transfer learning [
78] are practical options for working with small datasets. Additionally, with multi-scale training, YOLO detects objects of different sizes better and offers an easy trade-off between performance and inference speed [
21]. Moreover, one-stage object detection models can work in real-time [
79], which is an advantage over two-stage models. YOLO is designed for real-time object detection, representing an adequate option for real-time inspection in the agricultural sector [
5,
54]. Among six different YOLO versions, YOLOv4-tiny presented the best combination of accuracy and speed and was considered suitable for real-time applications in a previous study [
80]. We used Google Colab, which produced adequate results and proved effective for running deep learning architectures on resource-constrained computers. Researchers and small enterprises can effectively train their models with minimal investment by leveraging the available resources, including powerful GPUs and ample memory. The pay-as-you-go approach offered by cloud services (including Watson Studio, Kaggle Kernels, Microsoft Azure Notebooks, and Codeanywhere) presents an accessible avenue for companies seeking to harness the potential of deep learning architectures without the need for expensive hardware upgrades.
5.2. Human Inspection
Our results indicated an overall human inspection accuracy of 97% (five classes) and 100% (two classes), which is greater than values reported in the manufacturing sector, where accuracy ranges from 80% to 85% [
7], or even more critical industries such as the inspection of aircraft engine blades, where the average appraiser agreement with the ground truth was reported to be 67.69% [
81], and 84% when operators were allowed to use their hands and apply their tactile sense [
82]. In this regard, high accuracy and precision values are related to different factors, including the complexity of the inspection process, the training level, and the experience of the inspectors. In this study, wreaths are not considered a critical product, the inspection process is relatively simple, and the inspectors averaged five years of experience; moreover, the task did not present problems common in the agricultural context, such as small object size and occlusion [
83]. Particularly, training is critical since a human inspector’s accuracy usually lies between 70 and 80% after a training period [
84]. Moreover, depending on the product, the inspection process might include purely visual inspection, another sensorial inspection such as tactile inspection, or a combination of both. Evidence from various industries suggests that a combination of visual and tactile inspection improves detection performance [
82,
85].
For the attribute agreement analysis (AAA), we computed kappa, a common statistic used to assess the effectiveness of attribute-based inspections [
86], which allowed us to assess the reliability of agreement between a fixed number of assessors and is more robust than the percent agreement of the AAA [
87]. The overall kappa value of 0.9046 indicated a high level of agreement between the inspectors’ assessments and the standard. Compared with other industries, this value is adequate, since the agreement acceptance limits for the aerospace industry consider values lower than 80% unacceptable [
88].
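For clarity about how such a statistic can be obtained, the following sketch computes Cohen's kappa between one inspector's assessments and the reference standard; the rating vectors are hypothetical, not the data collected in this study.

```python
# A small sketch of Cohen's kappa between an inspector and the reference standard
# (hypothetical ratings, not the study's data).
from sklearn.metrics import cohen_kappa_score

standard  = ["ok", "defective", "ok", "ok", "defective", "defective", "ok", "ok"]
inspector = ["ok", "defective", "ok", "defective", "defective", "defective", "ok", "ok"]

kappa = cohen_kappa_score(standard, inspector)
print(f"kappa = {kappa:.4f}")   # 1.0 = perfect agreement, 0 = chance-level agreement
```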
5.3. Inspection Challenges
Comparisons of the performance of human visual inspection and inspection using AI tools have shown different results depending on several factors, including the complexity of the inspection, the context, data availability, and the inspectors’ experience, among others. Despite the common finding that DL algorithms outperform human visual inspection [
89,
90], this is not always the case. Some studies reported mixed and similar results [
83,
91] or a lower model performance compared with human inspectors [
92]. This variety of results indicates a remaining gap toward fully automated inspection and a continued prevalence of human intervention, including cases where algorithms initiate the inspection and inspectors intervene for dubious items or items below a predefined threshold [
93,
94]. In all cases, DL models assist inspection activities by reducing human intervention, thus reducing physical and mental fatigue.
Processing and inspection time is another critical factor when comparing human visual inspection and AI-assisted inspection. In this study, the average human inspection time was 9.26 s per wreath, while the models averaged less than 1 s per wreath; this is in line with previous studies in which inspectors and algorithms exhibited similar performance except in detection speed, where algorithms are superior [
89,
91].
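A per-image inference time of the kind reported above can be estimated with a simple timing loop, as in the sketch below. The trained weights file and test image names are hypothetical, and this is an assumed setup rather than the timing protocol used in this study.

```python
# A minimal sketch of measuring average per-image inference time for a trained
# one-stage detector (hypothetical weights and images, assumed setup).
import time
from ultralytics import YOLO

model = YOLO("best.pt")                                    # hypothetical trained weights
images = [f"test_wreath_{i:02d}.jpg" for i in range(20)]   # hypothetical test images

start = time.perf_counter()
for path in images:
    model.predict(path, verbose=False)
elapsed = time.perf_counter() - start
print(f"average inference time: {elapsed / len(images):.3f} s per wreath")
```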
Combined visual and tactile inspection might enhance inspection performance, particularly in cases with restricted views or where surface inspection is required. However, visual and sensorial inspections might be complex for agricultural and floricultural applications. Detection in agricultural settings has particular features and is frequently more challenging than standard detection benchmarks [
95]. Field images might comprise several objects with high scale variance, occlusion of objects, and similarity to background structures [
83].
In our study, wreath images presented minimal scale variance. However, for inspections that require detecting objects at different scales and handling small objects effectively, feature extraction techniques such as the Feature Pyramid Network (FPN) [
96], which has been successfully utilized in other object detection models such as Faster R-CNN and RetinaNet, might be used. Additionally, the quantity of objects to inspect is critical: in datasets with several objects (dozens or hundreds), algorithms outperformed humans, while the opposite was observed in datasets where occlusion is prominent [
83].
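For illustration of the FPN idea referenced above, the sketch below combines multi-scale feature maps using the FPN module available in torchvision. The channel sizes and feature shapes are illustrative assumptions and are not tied to any model used in this study.

```python
# A minimal sketch of a Feature Pyramid Network combining multi-scale feature maps
# (illustrative channel sizes and shapes).
from collections import OrderedDict
import torch
from torchvision.ops import FeaturePyramidNetwork

fpn = FeaturePyramidNetwork(in_channels_list=[64, 128, 256], out_channels=256)

# Fake backbone outputs at three spatial resolutions (coarser as channels grow).
features = OrderedDict([
    ("c3", torch.randn(1, 64, 64, 64)),
    ("c4", torch.randn(1, 128, 32, 32)),
    ("c5", torch.randn(1, 256, 16, 16)),
])

outputs = fpn(features)  # each level now has 256 channels, enriched top-down
for name, tensor in outputs.items():
    print(name, tuple(tensor.shape))
```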
5.4. Limitations and Future Work
This study has several limitations. The results are based on a specific and small dataset and thus may not generalize to inspections of other ornamental products or other floricultural and agricultural contexts. Regarding visual inspection, we utilized a small sample of inspectors; therefore, subjectivity might have affected the results. In addition, we did not control environmental conditions (e.g., temperature and humidity), which might affect the inspection process. This study focused on assessing visual inspection and DL inspection when detecting five classes of wreaths, not on modifying the networks, which is another limitation. The dataset was adequately balanced for five classes; however, for the complementary analysis using two classes, an adequate balance was not possible due to material limitations. Thus, balancing the two classes remains a task for future research. In addition, this research did not explore sociotechnical factors that might have an impact on the results. Therefore, further analysis is necessary, including refinement of the models and modifications of the networks to improve their performance. Moreover, expanding the dataset to include complex backgrounds is also recommended for future research. Finally, more research is required to validate these results in various agricultural settings, including exploring additional datasets and environmental conditions, as well as employing advanced techniques to advance the utilization of deep learning models in agricultural inspection and quality assurance.
6. Conclusions
This study compared human visual inspection with deep learning models for inspecting decorative wreaths. The results indicated that the models performed similarly to humans in terms of precision and accuracy, highlighting the suitability of DL for enhancing quality inspection by leveraging the models’ ability to capture subtle details and quality flaws that the human eye might miss. Notably, one-stage models such as YOLOv4-tiny, YOLOv5, and YOLOv8 achieved similar or slightly superior performance compared with the inspectors in detecting quality flaws. They also outperformed a two-stage model, ResNet50, providing evidence of adequate performance with small datasets and suitability for real-time inspection. These findings have practical implications for quality control processes in floriculture and agriculture, aiding in identifying and mitigating material absence/excess and color-related issues.
Shifting the paradigm from predominant human-driven inspections to automated, human-supported inspections requires careful consideration. Implementation strategies should encompass a phased approach involving technology integration, training programs for human operators to transition into supportive roles, and validation of the automated systems’ performance against established benchmarks. Furthermore, ethical implications related to potential effects on labor should be thoughtfully addressed.
By assisting humans with digital technologies and automation for inspection purposes, organizations can embrace the full potential of Industry 4.0, making the inspection process more intelligent and reliable.