Article

Research on a Real-Time, High-Precision End-to-End Sorting System for Fresh-Cut Flowers

1 School of Electric & Electronic Engineering, Wuhan Polytechnic University, Wuhan 430023, China
2 School of Mathematics & Computer Science, Wuhan Polytechnic University, Wuhan 430023, China
* Authors to whom correspondence should be addressed.
Agriculture 2024, 14(9), 1532; https://doi.org/10.3390/agriculture14091532
Submission received: 17 July 2024 / Revised: 22 August 2024 / Accepted: 3 September 2024 / Published: 5 September 2024
(This article belongs to the Section Digital Agriculture)

Abstract

As quality of life rises, the demand for flowers has increased significantly, leading to higher expectations for the efficiency and speed of flower sorting systems. This paper presents a real-time, high-precision end-to-end method that completes the three key tasks of a sorting system: flower localization, flower classification, and flower grading. To improve the challenging task of maturity detection, red–green–blue depth (RGBD) images were captured. The multi-task and multi-dimension You Only Look Once (MTMD-YOLO) network was proposed to complete these three tasks in an end-to-end manner. The feature fusion was simplified to increase training speed, and the detection head and non-maximum suppression (NMS) were optimized for the dataset; this allowed a loss function for the grading task to be added so that each task could be trained separately. The results showed that the use of RGBD images and multi-task learning improved the mean average precision (mAP) of the flower grading task by 3.63% and 1.87%, respectively. The final mAP of the flower classification and grading tasks reached 98.19% and 97.81%, respectively. The method also achieved real-time speed on the embedded Jetson Orin NX, at 37 frames per second (FPS). Combined with a picking robot, this method provides essential technical support for determining automatic flower picking times.

1. Introduction

With the development of online shopping, an increasing number of people choose to buy fresh-cut flowers online, which presents higher requirements for the preservation period of fresh-cut flowers. However, the current flower sorting process is mainly carried out manually, which is subjective, tiring, and can reduce the preservation period of flowers. According to statistics, the processing loss of fresh-cut flowers after harvest can reach 31.88%, with the sorting loss accounting for 21.74% [1]. Therefore, a more automatic and accurate fresh-cut flower sorting method is required. Computer vision is a popular tool for object detection due to its ability to recognize objects from images. By collecting fresh-cut flower images and applying intelligent algorithms, computer vision can achieve accurate and efficient sorting, greatly reducing sorting loss and improving economic benefits.
Early research utilized machine learning methods in computer vision to extract features and recognize flowers. These methods included K-nearest neighbor [2], the random forest (RF) algorithm [3], stochastic gradient descent (SGD) [4], and support vector machines (SVMs) [5], all of which demonstrated satisfactory results in flower recognition. Some researchers combined methods to extract more features for recognition. For instance, Soleimanipour A. et al. [6] used principal component analysis (PCA), linear discriminant analysis (LDA), and an SVM to classify flowers, achieving an accuracy of 99.50%. Patel I. et al. [7] explored new morphological feature extraction methods and classified flowers using a multiple kernel learning SVM, reaching an accuracy of 76.92%. Although good results were achieved, these methods require manual extraction of features.
In recent years, deep learning based on convolutional neural networks (CNNs) has gained significant attention due to its ability to extract features automatically. Tian M et al. [8] utilized a CNN classification model and a softmax classifier to classify 17 types of flowers, achieving a precision of 92%. Anjani I A et al. [9] employed a CNN with dropout to streamline an automatic rose sales system, obtaining an accuracy of 96.33% on their test data. Cibuk M et al. [10] applied a hybrid classification method to flower classification, achieving an accuracy of 96.39%. The above studies used CNN networks and achieved good results, but the methods were not well suited to real-time operation.
Currently, the anchor-based framework has become a research hotspot in object detection. Based on whether classification and localization are realized directly, detectors can be divided into two-stage and one-stage algorithms. Two-stage algorithms mainly include the region-based convolutional neural network (R-CNN) series [11,12,13], while one-stage algorithms mainly include the You Only Look Once (YOLO) series [14,15,16] and the single-shot detection (SSD) series [17,18]. Building on the YOLO series, Krishna K P et al. [19] proposed a panoramic driving perception system that can simultaneously perform traffic target detection, drivable area segmentation, and lane detection. Gao Y L et al. [20] used the YOLOv8MS network to explore the automated cultivation of corn, achieving a mean average precision (mAP) of 89.6% and a multiple object tracking accuracy (MOTA) of 92.5%. One-stage object detection methods are simple and stable and provide the fast detection speed needed for flower sorting systems. However, they are less accurate for difficult-to-distinguish maturity levels in flower grading.
The addition of depth information through red–green–blue depth (RGBD) images enables better differentiation of difficult-to-distinguish maturities. Sun X et al. [21] proposed a flower quality grading method based on deep learning and depth information; for the diana rose, an RGBD-based improved InceptionV3 network was used for grading, achieving a grading accuracy of 98%. Fei Y et al. [22] classified the maturity of flame roses using depth information: traditional image segmentation was first conducted to obtain the edge information of the fresh-cut flowers, followed by bract segmentation, and the lightweight ShuffleNetV2 network was used for recognition, achieving a classification precision of 98% on the RGB flower dataset and 99% on the RGBD flower dataset. To enhance the efficiency and accuracy of the flower sorting system, this paper presents a real-time, high-precision end-to-end method. The main contributions of this paper are as follows:
Firstly, an RGBD flower sorting dataset was produced to improve the accuracy on difficult-to-distinguish maturities.
Secondly, the multi-task and multi-dimension You Only Look Once (MTMD-YOLO) network was constructed to realize the flower sorting system end to end: the feature fusion was simplified to increase training speed, the detection head and non-maximum suppression (NMS) [23] were improved to suit the flower sorting dataset, and a loss function for the maturity task was added to train each task separately.
Lastly, the MTMD-YOLO network was compared with the YOLO series networks, and its real-time performance on embedded hardware and its accuracy on difficult-to-distinguish maturities were validated.

2. Materials and Methods

2.1. Experimental Design

The implementation process of the flower sorting system is depicted in Figure 1. Initially, a depth camera captured images of four varieties of rose, acquiring both color and depth images. Subsequently, the original image was cropped to isolate the region of interest (ROI), and the cropped color image was fused with the depth image to produce the RGBD image. This image was then labeled to generate the flower sorting label containing localization, species, and maturity. Combining the RGBD images with their corresponding labels formed the RGBD flower sorting dataset. Following this, the MTMD-YOLO model was established; the dataset was used for both model training and validation, and the trained model was employed for the classification and grading of fresh-cut flowers. Ultimately, the MTMD-YOLO model yielded the prediction results.

2.2. RGBD Flower Sorting Dataset

2.2.1. Depth Image Acquisition

The data acquisition platform is shown in Figure 2; it is composed of five parts: a metal bracket, a depth camera, a background board, a robot arm, and the acquisition object. To obtain more detailed information about the roses, a depth camera (Intel RealSense D435) was used to capture depth images and RGB images. The robot arm was used to hold the flowers, and the four most popular rose varieties on the market, zhenai, anna, jinzhi, and weiguang, were selected as shooting objects, each with 40 roses, as shown in Figure 3; these varieties are representative of real garden flowers. A total of 2400 RGB images and 2400 depth images were collected over one week. The depth images convey pixel-to-camera distance information; they were collected at 16-bit depth and standardized using min–max normalization [24]. Subsequently, the depth data were compressed to 8 bits and stored as depth images.
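For reference, the following Python sketch shows one way the 16-bit depth frames could be min–max normalized and compressed to 8 bits before storage, as described above. The file names and the use of OpenCV are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import cv2

def depth_to_8bit(depth_raw: np.ndarray) -> np.ndarray:
    """Min-max normalize a 16-bit depth frame and compress it to 8 bits."""
    depth = depth_raw.astype(np.float32)
    d_min, d_max = depth.min(), depth.max()
    if d_max - d_min < 1e-6:                           # guard against a constant frame
        return np.zeros_like(depth, dtype=np.uint8)
    normalized = (depth - d_min) / (d_max - d_min)     # min-max normalization to [0, 1]
    return (normalized * 255).astype(np.uint8)         # compress to 8-bit

# Hypothetical usage: read a 16-bit depth PNG saved from the RealSense D435 and store an 8-bit copy.
depth_16 = cv2.imread("depth_0001.png", cv2.IMREAD_UNCHANGED)
cv2.imwrite("depth_0001_8bit.png", depth_to_8bit(depth_16))
```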
To simulate complex sample environments and increase the reliability of the system in real-world scenarios, the dataset was collected under different background colors, illumination conditions, and growth stages. Varied backgrounds simulated colorful real-world environments: the background board included three colors, orange, green, and blue, switched randomly during collection. The lighting conditions were varied to simulate low-light and direct-sunlight situations; they were classified into bright, natural, and dark, and flower images were captured using indoor lights, natural daylight, and curtains, respectively. Different growth stages were collected to simulate the natural growth patterns of flowers in the garden: the initial maturity of the roses was grade 1, and they gradually opened to reach grade 5.
According to the standard Product Quality Grade for Cut Flowers Auction (SB/T 11098.2–2014) [25], the roses were divided into five maturity grades; Sun X et al. [21] also used this grading method in their study. Figure 4 illustrates the five maturity grades using the anna rose as an example. Figure 4a shows Grade 1, in which the sepals were separated from each other but not yet opened from the petals. Figure 4b shows Grade 2, in which the sepals were completely open, with 3–5 petals open and separated at the top. Figure 4c shows Grade 3, in which more than 5 petals were open and separated at the top. Figure 4d shows Grade 4, in which about 50% of the petals were open from the top. Figure 4e shows Grade 5, in which more than 50% of the petals were open from the top.

2.2.2. Image Preprocessing

The image preprocessing steps are shown in Figure 5. Firstly, the original RGB image and depth image were cropped to a standardized size, from 1280 × 720 × 3 to 480 × 480 × 3 and from 1280 × 720 × 1 to 480 × 480 × 1, respectively. Then, the RGB image and the depth image were merged into an RGBD image by channel fusion; the image format was png, and the size was 480 × 480 × 4. The channel fusion method increased the input information without increasing the number of images, thus saving loading time. The flower sorting labels were then processed: labeling software was used to label the bounding box and species of each flower, and a script was used to add the maturity grade, finally yielding the RGBD images and the flower sorting label files. Finally, the dataset of 2400 fused RGBD images and sorting labels was expanded to 9600 RGBD images by rotating each image 90, 180, and 270 degrees clockwise. The 9600 RGBD images and corresponding sorting labels comprised the RGBD flower sorting dataset, with 7680 images (80%) allocated to the training set and 1920 images (20%) to the validation set.
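As an illustration of the preprocessing described above, the sketch below crops both images to a 480 × 480 region, stacks them into a four-channel RGBD array, and generates the three clockwise rotations used for augmentation. The ROI offset is a placeholder, and the corresponding rotation of the bounding-box labels is omitted for brevity.

```python
import cv2
import numpy as np

def fuse_rgbd(rgb_path: str, depth_path: str, x0: int = 400, y0: int = 120) -> np.ndarray:
    """Crop the RGB and 8-bit depth images to a 480 x 480 ROI and stack them into an RGBD array."""
    rgb = cv2.imread(rgb_path)                              # 1280 x 720 x 3
    depth = cv2.imread(depth_path, cv2.IMREAD_GRAYSCALE)    # 1280 x 720
    rgb_roi = rgb[y0:y0 + 480, x0:x0 + 480]
    depth_roi = depth[y0:y0 + 480, x0:x0 + 480]
    return np.dstack([rgb_roi, depth_roi])                  # 480 x 480 x 4

def clockwise_rotations(rgbd: np.ndarray):
    """Yield the original image plus 90-, 180-, and 270-degree clockwise rotations."""
    yield rgbd
    for k in (3, 2, 1):  # np.rot90 rotates counter-clockwise, so k = 3 equals 90 degrees clockwise
        yield np.rot90(rgbd, k)
```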

2.3. MTMD-YOLO Detection Model

The network architecture of MTMD-YOLO is shown in Figure 6 and includes five parts: the input, the CSPDarknet53 [14] backbone network, the neck network, the double-label detection head, and the output. The input end was responsible for feeding the information to be learned by the network, using the self-built RGBD flower sorting dataset. The backbone network was responsible for extracting features from the input images. The neck network was responsible for fusing the features generated by the backbone network; a simplified structure of the feature pyramid network (FPN) [26] and path aggregation network (PAN) [27] was used to improve detection accuracy and reduce model complexity. Finally, double-label detection heads at two scales and double-label NMS completed the three tasks of detection, classification, and grading of fresh-cut flowers, and the prediction information of the three tasks was output directly at the output end. The MTMD-YOLO network required no additional images and no additional convolutions; it completed the three tasks without increasing model complexity and could easily be trained end to end.

2.3.1. Feature Fusion

The neck structure played a crucial role in integrating the features derived from the backbone network. The original structure is shown in Figure 7a, where CUCC denotes the Conv, Upsample, Concat, and C3 modules, and CCC denotes the Conv, Concat, and C3 modules. After being down-sampled {8, 16, 32} times, the feature maps {P3, P4, P5} were obtained, with the P5 layer having the smallest resolution. After the neck, tensors at three scales, 80 × 80, 40 × 40, and 20 × 20, were passed to the detection heads. In order to lighten the network structure, the simplified structure shown in Figure 7b was proposed: the feature fusion of the P3, P4, and P5 layers was changed to feature fusion of only the P4 and P5 layers, and the detection heads were accordingly changed from three scales (80 × 80, 40 × 40, and 20 × 20) to two scales (40 × 40 and 20 × 20). The improved neck structure removed redundant feature layers and reduced the number of model parameters while maintaining accuracy.
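The following PyTorch sketch illustrates the simplified P4/P5-only fusion: one top-down pass and one bottom-up pass feeding the 40 × 40 and 20 × 20 heads. Plain 3 × 3 convolutions stand in for the C3 modules, and the channel sizes are assumptions rather than the network's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedNeck(nn.Module):
    """P4/P5-only feature fusion sketch (the P3 branch of the original PAN is removed)."""
    def __init__(self, c4: int = 256, c5: int = 512):
        super().__init__()
        self.reduce5 = nn.Conv2d(c5, c4, 1)                   # Conv before upsampling P5
        self.fuse4 = nn.Conv2d(c4 * 2, c4, 3, padding=1)      # stand-in for C3 after Concat (top-down)
        self.down4 = nn.Conv2d(c4, c4, 3, stride=2, padding=1)
        self.fuse5 = nn.Conv2d(c4 * 2, c5, 3, padding=1)      # stand-in for C3 after Concat (bottom-up)

    def forward(self, p4: torch.Tensor, p5: torch.Tensor):
        p5r = self.reduce5(p5)
        n4 = self.fuse4(torch.cat([p4, F.interpolate(p5r, scale_factor=2, mode="nearest")], dim=1))
        n5 = self.fuse5(torch.cat([self.down4(n4), p5r], dim=1))
        return n4, n5   # 40 x 40 and 20 x 20 inputs to the two detection heads

# For a 640 x 640 input: P4 is 40 x 40 and P5 is 20 x 20.
n4, n5 = SimplifiedNeck()(torch.randn(1, 256, 40, 40), torch.randn(1, 512, 20, 20))
```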

2.3.2. Double-Label Detection Head

Unlike an ordinary single-label network, the prediction information of the MTMD-YOLO network included location information, confidence information, a species prediction, and a maturity prediction. To enable the network to complete the grading task, a set of maturity information was added to the detection head. The tensor information of the double-label detection head is shown in Figure 8, where S is the number of species, M is the number of maturity grades, S = 4, M = 5, and Ni is the size of the two feature maps, with N4 = 40 and N5 = 20. The double-label detection head received the N4- and N5-scale feature maps from the neck network and output tensors of size 40 × 40 × 3 × (5 + S + M) and 20 × 20 × 3 × (5 + S + M). Three prior anchor boxes with different aspect ratios were assigned to each grid cell of the multi-scale feature maps; the content of each anchor box included four location values, a confidence value representing the probability that the prediction box contained an object, S species scores for the fresh-cut flower, and M maturity scores for the fresh-cut flower.
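A minimal sketch of the double-label head is given below: a 1 × 1 convolution maps each neck output to 3 × (5 + S + M) channels, which are then reshaped so that every anchor carries 4 box values, 1 confidence value, S species scores, and M maturity scores. The input channel counts are assumptions.

```python
import torch
import torch.nn as nn

S, M, ANCHORS = 4, 5, 3   # species, maturity grades, anchors per grid cell

class DoubleLabelHead(nn.Module):
    """Maps a neck feature map to the double-label prediction tensor."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.pred = nn.Conv2d(in_channels, ANCHORS * (5 + S + M), kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        # (batch, anchors, grid_h, grid_w, 4 box values + 1 confidence + S + M)
        return self.pred(x).view(b, ANCHORS, 5 + S + M, h, w).permute(0, 1, 3, 4, 2)

p40 = DoubleLabelHead(256)(torch.randn(1, 256, 40, 40))   # -> (1, 3, 40, 40, 14)
p20 = DoubleLabelHead(512)(torch.randn(1, 512, 20, 20))   # -> (1, 3, 20, 20, 14)
```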

2.3.3. Double-Label NMS

The crucial step in accomplishing the three tasks of localization, classification, and grading in the same network was the improvement of the NMS [23]. NMS is the post-processing of the prediction information, as shown in Figure 9. The NMS in YOLOv5, shown in Figure 9a, included four layers of filtering: the first layer, size filtering, removed oversized and undersized boxes; the second layer, confidence filtering, removed boxes with low confidence; the third layer, score filtering, filtered the class scores to obtain the predicted class; and the fourth layer, number filtering, kept the number of boxes within a range. The most accurate prediction boxes and predicted class were finally obtained. The NMS in MTMD-YOLO is shown in Figure 9b: the score filtering layer was divided into two branches to accomplish the prediction of the two tasks, and the branches were merged after filtering the maturity and species separately. Finally, the predicted species and maturity were obtained.
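The sketch below outlines the double-label post-processing: confidence filtering, separate species and maturity score branches, and a final NMS pass that caps the number of boxes. For simplicity it runs a single NMS keyed on the species score rather than filtering the two branches independently before merging, so the thresholds and merging details are assumptions.

```python
from torchvision.ops import nms

def double_label_nms(boxes, obj_conf, species_scores, maturity_scores,
                     conf_thr=0.25, iou_thr=0.45, max_det=300):
    """boxes: (N, 4) xyxy; obj_conf: (N,); species_scores: (N, S); maturity_scores: (N, M)."""
    keep = obj_conf > conf_thr                                            # confidence filtering
    boxes, obj_conf = boxes[keep], obj_conf[keep]
    species_scores, maturity_scores = species_scores[keep], maturity_scores[keep]

    sp_conf, sp_cls = (obj_conf[:, None] * species_scores).max(dim=1)     # species branch
    mat_conf, mat_cls = (obj_conf[:, None] * maturity_scores).max(dim=1)  # maturity branch

    order = nms(boxes, sp_conf, iou_thr)[:max_det]                        # NMS + number filtering
    return boxes[order], sp_cls[order], mat_cls[order], sp_conf[order], mat_conf[order]
```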

2.3.4. Loss Function of Multi-Task

Because the characteristics of different maturity grades were similar, maturity detection was more difficult than species detection. To train the species and maturity losses separately, a dedicated maturity loss term was added. The total loss $l_{all}$ was composed of the target confidence loss $l_{obj}$, the bounding box loss $l_{box}$, the fresh-cut flower species loss $l_{species}$, and the fresh-cut flower maturity loss $l_{maturity}$, where $a_1$, $a_2$, $a_3$, and $a_4$ were the coefficients used to balance the total loss. The formula is as follows:

$$l_{all} = a_1 l_{obj} + a_2 l_{box} + a_3 l_{species} + a_4 l_{maturity}$$

The coefficients were $a_1 = 1$, $a_2 = 0.05$, $a_3 = 0.5$, and $a_4 = 1.0$. As in the YOLOv5 network, the $l_{box}$ of the MTMD-YOLO network used the CIoU [28] loss. The CIoU loss takes into account the distance, overlap, size similarity, and aspect ratio between the predicted box and the ground-truth box, allowing the model to learn the characteristics of the target box more comprehensively. The calculation formula of CIoU is as follows:

$$CIoU = 1 - IoU + \frac{\rho^2(b_A, b_B)}{c^2} + \alpha v$$

$$\alpha = \frac{v}{1 - IoU + v}$$

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w_B}{h_B} - \arctan\frac{w_A}{h_A}\right)^2$$
In the formula, $\rho^2(b_A, b_B)$ represents the squared Euclidean distance between the center points of the prediction box and the ground-truth box, and $c$ is the diagonal length of the minimum enclosing rectangle of the two boxes. $v$ is a correction factor used to further adjust the loss function by considering the shape and orientation of the target box. $(w_B, h_B)$ and $(w_A, h_A)$ are the width and height of the ground-truth box and the predicted box, respectively. Binary cross-entropy loss (BCE Loss) [29] was used to compute $l_{obj}$, $l_{species}$, and $l_{maturity}$. The calculation formula for the BCE loss is as follows:

$$BCE = -\frac{1}{N}\sum_{i=1}^{N}\left[\hat{y}_i \log p(y_i) + (1 - \hat{y}_i)\log\left(1 - p(y_i)\right)\right]$$

$$y_i = \mathrm{Sigmoid}(x_i) = \frac{1}{1 + e^{-x_i}}$$

In the formula, $N$ represents the total number of classes, $x_i$ is the predicted value for the current class, $y_i$ is the probability of the current class obtained by the activation function, and $\hat{y}_i$ is the true value of the current class (0 or 1).
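A compact sketch of the four-term loss is shown below, using the coefficients given above. BCEWithLogitsLoss applies the sigmoid internally, matching the activation in the formulas above, and the CIoU term is assumed to be precomputed for the matched boxes (for example with torchvision.ops.complete_box_iou_loss); the target assignment of the actual training code is not reproduced here.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # applies the sigmoid used in the BCE formula above

def total_loss(pred_obj, tgt_obj, ciou_loss, pred_species, tgt_species,
               pred_maturity, tgt_maturity, a1=1.0, a2=0.05, a3=0.5, a4=1.0):
    """l_all = a1*l_obj + a2*l_box + a3*l_species + a4*l_maturity (per the total loss above)."""
    l_obj = bce(pred_obj, tgt_obj)                   # target confidence loss
    l_box = ciou_loss.mean()                         # bounding-box (CIoU) loss
    l_species = bce(pred_species, tgt_species)       # species classification loss
    l_maturity = bce(pred_maturity, tgt_maturity)    # maturity grading loss
    return a1 * l_obj + a2 * l_box + a3 * l_species + a4 * l_maturity
```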

2.4. Experiment Setting and Evaluation Indicators

In this work, all experiments used the Windows 10 operating system, 64 GB RAM, NVIDIA RTX A4000 graphics card; the deep learning framework used PyTorch 1.13.0, and the programming language was Python 3.8. The embedded devices utilized Jetson Orin NX.
The following evaluation indicators were used to evaluate the performance of the detection model: precision (P), recall (R), average precision (AP), mean average precision (mAP), F1 score (F1), frames per second (FPS), and the number of parameters (Params).
P is the ratio of the number of correctly predicted positive samples to the number of samples predicted as positive. The calculation formula is as follows:

$$P = \frac{TP}{TP + FP} \times 100\%$$

where $TP$ denotes true positives and $FP$ denotes false positives. R is the proportion of actual positive samples that are correctly predicted. The calculation formula is as follows:

$$R = \frac{TP}{TP + FN} \times 100\%$$

where $FN$ denotes false negatives. AP is the area under the P–R curve. The calculation formula of AP is as follows:

$$AP = \int_{0}^{1} P(R)\, dR$$
The mAP is the average of all categories of APs. The calculation formula of mAP is as follows:
$$mAP = \frac{1}{n}\sum_{j=1}^{n} AP_j$$

where $n$ is the number of categories and $AP_j$ represents the AP of category $j$. The F1 score is used to assess the balance between precision and recall:

$$F1 = \frac{2 \times P \times R}{P + R} \times 100\%$$

The detection speed is measured in FPS, which represents the number of images that can be processed per second. The complexity of the model is expressed by the number of parameters. The calculation formula is as follows:

$$Params = C_0 \times (C_i \times k_w \times k_h + 1)$$

where $C_0$ represents the number of output channels, $C_i$ represents the number of input channels, and $k_w$ and $k_h$ represent the width and height of the convolution kernel, respectively.
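The indicator definitions above can be computed directly; the sketch below gives simple reference implementations. The AP here is a plain trapezoidal integration of the P–R curve, whereas detection frameworks typically use an interpolated curve, so it should be treated as an approximation.

```python
import numpy as np

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 (in %) from TP, FP, and FN counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return 100 * p, 100 * r, 100 * f1

def average_precision(precisions: np.ndarray, recalls: np.ndarray) -> float:
    """Area under the P-R curve, approximated by the trapezoidal rule over recall."""
    order = np.argsort(recalls)
    p, r = precisions[order], recalls[order]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

def mean_ap(aps) -> float:
    """mAP: the mean of the per-category APs."""
    return float(np.mean(aps))

def conv_params(c_out: int, c_in: int, k_w: int, k_h: int) -> int:
    """Parameter count of a single convolution layer, including biases."""
    return c_out * (c_in * k_w * k_h + 1)
```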

3. Experimental Results and Analysis

3.1. Optimization Experiment

3.1.1. Feature Fusion Optimization

To compare the effects of the different feature layers selected by the PAN, three combinations, P3~P4, P4~P5, and P3~P5, were compared in terms of mAP and Params. The ↑ after an indicator means that a larger value is better, and the ↓ means that a smaller value is better. The comparison results are shown in Table 1. The results showed that the mAP of the P4~P5 combination was the highest, 7.67% and 3.49% higher than the P3~P4 and P3~P5 combinations, respectively. The number of parameters of the P3~P4 combination was the smallest, 4.56M and 5.13M smaller than the P4~P5 and P3~P5 combinations, respectively, but its accuracy was lower than the other two. The number of parameters of the P3~P5 combination was 0.57M higher than that of the P4~P5 combination, and its mAP was 3.49% lower. This was because the targets in this dataset were about 96–340 pixels, much larger than the anchor boxes assigned to the P3 layer (10–33 pixels). Moreover, the resolution of P3 was greater than that of P4 and P5, so removing it increased the speed of the network's operation. Therefore, the P3 layer was not suitable for this dataset and increased the false detection rate. Choosing the P4~P5 layers for feature fusion can thus increase the mAP of the model while also reducing the number of parameters.

3.1.2. Weight Optimization of the Loss Function

The weight of each task in the loss function represents the level of attention given to that task. Weight optimization experiments were conducted to balance the attention given to the maturity task: five weights for the maturity loss function, 0.5, 0.8, 1.0, 1.2, and 1.5, were compared in terms of mAP, as shown in Figure 10. The horizontal axis represents the number of training epochs, ranging from 100 to 200; the vertical axis represents mAP, and the curves of different colors represent the results of different weights. As the number of training epochs increased, mAP showed an increasing trend. The results showed that at 200 training epochs, the mAP with a weight of 1.0 was the highest, followed by 0.8, then 1.5 and 0.5, and the mAP with a weight of 1.2 was the lowest. Therefore, choosing the right weight leads to higher accuracy in the maturity task.

3.2. Experiments Contrast

3.2.1. Ablation Experiments

In order to verify the contribution of each module to the performance of the model, four improvements, feature fusion simplification, RGBD, multi-task, and loss function optimization, were evaluated on the two tasks of classification and grading. RGBD refers to the addition of depth information, and multi-task refers to the joint completion of the three flower sorting tasks: localization, classification, and grading. AP, AR, mAP, and speed were compared against the basic network, YOLOv5. The results of the ablation experiment for the fresh-cut flower classification task are shown in Table 2. In the classification task, the basic network already performed well, with the mAP reaching 97.17% and the speed reaching 76.49 FPS. The improved modules still increased the mAP, although the final speed after all improvements was slightly lower than that of the basic network. Compared to the basic network, after simplifying the feature fusion layer, mAP increased by 0.7%, AP reached 100%, and speed increased by 14.2 FPS. After adding RGBD, mAP increased by 0.3%, and the speed was still 4.73 FPS faster than the basic network. After adding multi-task, mAP increased by 0.1%, and the speed decreased by 3.6 FPS. After loss function optimization, the mAP remained unchanged and the speed was 3.42 FPS lower than that of the basic network.
The results of the ablation experiment for the fresh-cut flower grading task are shown in Table 3. In the grading task, due to the high similarity between different maturity grades, the basic network performed worse than in the classification task: AP was only 76.8%, AR only 85.1%, and mAP only 85.5%. After adding the four improvements, the final network also achieved good performance in this task. Compared to the basic network, after simplifying the feature fusion layer, AP increased by 7.2%, AR increased by 6.3%, and mAP increased by 6.1%, a substantial improvement, and the simplified feature layer increased the speed by 14.34 FPS. After adding RGBD, AP increased by 7.2%, AR increased by 2%, and mAP increased by 3.7%. This showed that adding depth information to the dataset helped capture more bud detail, thereby improving the accuracy of the grading task. After adding multi-task, AP increased by 7.1%, AR increased by 5.1%, and mAP increased by 1.9%. Due to the change in the NMS screening method, the detection accuracy was improved. After loss function optimization, AP increased by 1.3%, AR increased by 0.7%, and mAP increased by 0.7%. The detection speed was reduced by 9 FPS but still achieved real-time performance. This showed that loss function optimization strengthened the learning ability of the grading task and further improved the accuracy. The analysis of the results of the two tasks indicated that each module used in this paper improved the detection accuracy of the network. However, due to the added structure, the speed gradually decreased, although it still reached 73.07 FPS, 3.38 FPS lower than the basic network.
To verify the necessity of the depth data, the P, R, mAP, and F1 obtained with RGB and RGBD images were compared, as shown in Table 4. The results showed that in the classification task, using RGB, P was 99.98%, R was 100%, and the F1 score was 100%, and these high scores were maintained with RGBD. The mAP using RGB was 97.75%, and the mAP using RGBD was 98.13%, an increase of 0.38%. This indicated that the depth information had a certain enhancement effect on the classification task. For the grading task, when RGB was used, P was 83.95%, R was 91.38%, mAP was 91.61%, and F1 was 87.03%. When RGBD was used, all the indicators improved: P increased by 7.26%, R increased by 2.01%, mAP increased by 3.63%, and F1 increased by 5.12%. This showed that the addition of depth information played an important role in the grading task. The necessity of the depth information was thus verified.
In order to verify the overall impact of multi-task learning on the model, the F1, mAP, Params, and speed of the single-task and multi-task settings were compared, as shown in Table 5. The results showed that F1 was 100% and mAP was 98.13% for the single classification task, which was already a high level. F1 remained the same under multi-task, and mAP improved by 0.07%, indicating that the classification task gained a small optimization from multi-task learning. For the single grading task, F1 was 92.15% and mAP was 95.24%. Under multi-task, F1 improved by 6.86% and mAP improved by 1.87%, indicating that multi-task learning strengthened the grading task; this was due to the double screening of the NMS, which made the final retained samples more accurate. For a single task, the number of parameters was 6.45M and the speed was 81.22 FPS. The number of parameters increased by only 0.01M under multi-task, so the additional computational load was small. The speed drop of 8.21 FPS was due to the additional filtering in the improved NMS. Overall, although the speed of the multi-task model was lower than that of the single-task models, completing the tasks jointly did not add much computational load and achieved higher accuracy. The comprehensive ability of the multi-task approach was thus verified.

3.2.2. Contrast Experiments

In order to verify the effectiveness of the MTMD-YOLO network, the detection performance of SSD, RetinaNet, and four YOLO series networks (YOLOv5, YOLOv6, YOLOv7, and YOLOv8) was reproduced on the RGB flower species dataset. The five indicators of AP, AR, mAP, Params, and speed were compared, and the comparison results are shown in Table 6. The mAP of MTMD-YOLO on the species dataset was 98.19%, which was 1.02%, 1.37%, and 0.97% higher than that of YOLOv5, YOLOv6, and YOLOv7, and 0.76%, 1.51%, and 0.7% lower than that of SSD, RetinaNet, and YOLOv8, respectively. This was because the YOLOv8 network had 4.67M more parameters than MTMD-YOLO, which increased accuracy at the cost of speed. The number of parameters of MTMD-YOLO, 6.46M, was second only to RetinaNet, and the speed of MTMD-YOLO reached 73.07 FPS, which was 1.86 FPS slower than that of the YOLOv5 network, although its mAP was higher than that of YOLOv5. In the classification task, the RetinaNet model had the fewest parameters, which greatly improved its detection speed, and showed the best comprehensive performance, followed by the MTMD-YOLO network.
Figure 11 shows the histogram comparison of MTMD-YOLO and other excellent networks on the RGB flower species dataset. It can be seen that the MTMD-YOLO network achieved the best overall balance in the flower classification task, with the second highest accuracy after YOLOv8, the second smallest number of parameters after RetinaNet, and, among the YOLO series networks, the second highest speed after YOLOv5.
In order to verify the effectiveness of the MTMD-YOLO network, the detection performance of SSD, RetinaNet, and four YOLO series networks (YOLOv5, YOLOv6, YOLOv7, and YOLOv8) was also reproduced on the RGB flower maturity dataset. The five indicators of AP, AR, mAP, Params, and speed were compared, and the comparison results for maturity detection are shown in Table 7. The mAP of MTMD-YOLO on the maturity dataset was 97.81%, which was 19.09%, 14.07%, 12.36%, 15.46%, 14.30%, and 0.68% higher than SSD, RetinaNet, YOLOv5, YOLOv6, YOLOv7, and YOLOv8, respectively. The RetinaNet model had the smallest number of parameters, 4.02M, which greatly improved its detection speed. The number of parameters of MTMD-YOLO, 6.46M, was second only to RetinaNet, and its speed reached 73.07 FPS, 16.78 FPS and 1.86 FPS slower than the RetinaNet and YOLOv5 networks, respectively. Although RetinaNet performed well in the classification task, its simple model could not cope with the more complex maturity task, resulting in lower results. The detailed analysis showed that the MTMD-YOLO model performed best, achieving the best balance of speed and accuracy across both tasks. It also showed that the MTMD-YOLO network maintained high accuracy and speed while completing the flower sorting task end to end.
Figure 12 shows the histogram comparison of MTMD-YOLO and other excellent networks on the RGB flower maturity dataset. It can be seen that the MTMD-YOLO network achieved the best overall balance in the flower grading task, with the highest accuracy, the second smallest number of parameters after RetinaNet, and, among the YOLO series networks, the second highest speed after YOLOv5.
In order to verify the performance of the model on the embedded Jetson Orin NX, the MTMD-YOLO network was compared with RetinaNet, YOLOv5, and YOLOv8, as shown in Table 8. The results showed that RetinaNet had the fastest detection speed, reaching 45 FPS; its mAP reached 97.70% in the classification task and 83.74% in the grading task, the latter being 14.06% lower than that of the MTMD-YOLO network. The MTMD-YOLO network was second in speed, reaching 37 FPS; its mAP in the classification task was 98.15%, second only to YOLOv8, and its mAP in the grading task was the highest, reaching 97.80%. The mAP of YOLOv8 reached a high level, but its speed was only 29 FPS, which did not reach real-time speed. YOLOv5 achieved a grading mAP of only 86.12% and was 5 FPS slower than MTMD-YOLO. In general, MTMD-YOLO showed excellent performance on the hardware.

3.3. Detection Results in Challenging Conditions

3.3.1. Experiments on Difficult-to-Distinguish Maturity of Flower

In visible light (RGB), it can be difficult to distinguish whether the petals are open, potentially resulting in lower accuracy of maturity judgment. After adding RGBD, the accuracy on difficult-to-distinguish maturities was significantly improved. The comparison of visible light (RGB) and RGBD is shown in Figure 13. The results showed that, with RGB input, anna rose petals of grade 3 were categorized as grade 2, weiguang roses of grade 3 as grade 2, zhenai roses of grade 5 as grade 4, and jinzhi roses of grade 1 as grade 2. This was because visible light alone could not capture the upright state of the petals, whereas RGBD incorporated depth information about the petals and could judge the number of open petals, thus accurately judging the maturity of the flowers.

3.3.2. Detection Effects in Real-World Environments

Due to the limitations of the depth camera acquisition device in a real environment, this work used an experimental environment dataset to simulate the flower sorting effect; this dataset used a variety of lighting conditions and backgrounds to simulate real-world scenes. To verify the robustness of the model, ten samples were randomly selected for verification online and in the real world, and Figure 14 shows the sorting results in the natural environment. The results showed that when there was a single flower in an image, the system could accurately separate it from the background and correctly distinguish both species and maturity, indicating that the model can generalize to real environments. When there were multiple flowers in a single image, the species and maturity of some flowers could be accurately identified, but not all of them, indicating that the system's ability to recognize flowers needs to be improved when flowers are dense in the image. When flower cores were indistinct, inaccurate positioning and incorrect maturity predictions could occur; this was because flower maturity was mainly judged by the flower core, which places higher requirements on the acquisition angle. In order to verify the detection effect on other species, two additional varieties of flowers were selected for verification. When the flower species were not included in the dataset, the system still recognized the flowers and their maturity but identified them as species present in the dataset. Over the ten samples, the true positives (TP) of the classification task numbered 8 and the false positives (FP) 2; the TP of the grading task numbered 8 and the FP 2. In summary, the system had a certain generalization ability for real-world backgrounds, but it was limited by the acquisition angle and the number and species of flowers.

3.4. Innovations, Limitations, and Future Work

Previous studies have shown that deep learning technology has made progress in the field of flower species detection, but its application in flower maturity detection remains relatively limited. Existing methods have problems such as complex models, single-variety datasets, and complicated operation. For instance, Sun X et al. [21] proposed a flower quality grading method based on deep learning and depth information; four convolutional models, VGG16, ResNet18, MobileNetV2, and InceptionV3, were used to classify RGBD images, proving that depth information can effectively reflect the characteristics of flower buds. Also building on depth information, Fei Y et al. [22] realized a lightweight flower grading system based on the ShuffleNetV2 network, with an overall predicted classification speed of 0.020 s per flower; compared with the fresh-cut flower classifiers on the market, that system had a clear advantage in speed. Although the above models achieved good results in terms of speed and accuracy, they could only detect the maturity of a single flower variety, and the joint species and maturity detection of multiple varieties of fresh-cut flowers had not been addressed.
To address these issues, this study proposed an end-to-end flower sorting method based on depth information, which can simultaneously complete the flower localization, classification, and grading tasks. To improve the accuracy on difficult-to-distinguish maturities, an RGBD flower sorting dataset with RGBD images and double labels was produced. To realize the flower sorting system end to end, the MTMD-YOLO network was constructed. In the MTMD-YOLO network, the high-resolution P3 layer was removed to increase training speed; the detection head was extended with a maturity tensor to predict maturity information; the NMS filtered the prediction boxes of the classification and grading tasks separately and then merged them to obtain the predicted location, species, and maturity; and a loss function for the maturity task was added to train each task separately. Compared with SSD, RetinaNet, and four YOLO series detection networks (YOLOv5, YOLOv6, YOLOv7, and YOLOv8), and considering both accuracy and speed, this model had the best performance. F1 reached 100% in the classification task and 99.01% in the grading task; mAP reached 98.19% in the classification task and 97.81% in the grading task. On the embedded Jetson Orin NX, a speed of 37 FPS was achieved with comparable accuracy. This method effectively improved the efficiency of flower sorting: it only requires depth camera images as input and directly outputs the location, species, and maturity of the flowers. The end-to-end operation reduces learning costs and holds great significance for smart agriculture and actual deployment.
This study had several limitations. It verified the feasibility of sorting four common varieties of fresh-cut flowers, but the model has not yet been validated on other rose varieties. Considering that the class information learned by the model was built from the training set, the detection accuracy of the model on other flower species may be reduced, because those species were not present in the training set. In addition, the actual flower growing environment is dynamic and influenced by numerous unpredictable factors, such as leaf shading, severely insufficient or excessive light, acquisition angle, and complex backgrounds. In such a complex real-world environment, further evaluation, optimization, and enhancement of the robustness of the model are necessary.
This experiment provided an approach to sorting fresh-cut flowers and broadened the feasibility of multi-task network processing; the sorting task was preliminarily completed. In the future, this work will continue to expand the dataset, optimize the algorithm, and enhance the capability of the system under high-load conditions to better serve the flower sorting task. The robotic arm will be upgraded to collect more species of flowers in real-world environments and to further validate the MTMD-YOLO flower sorting model, with the aim of enhancing the robustness and generalizability of the model. In addition, this work will explore the application of MTMD-YOLO with federated learning and the Internet of Things (IoT) in real-world flower sorting, investigate how the system can be combined with client applications to carry out self-service flower sorting conveniently and quickly, and explore the application of MTMD-YOLO in large-scale flower sorting, thereby driving the development of smart agriculture.

4. Conclusions

This paper proposed a real-time, high-precision end-to-end system for flower sorting, addressing three key challenges: real-time operation on embedded devices, high precision in distinguishing difficult-to-determine maturity stages, and end-to-end processing of the flower localization, classification, and grading tasks. The MTMD-YOLO network was developed for an end-to-end flower sorting system. To improve the detection of difficult-to-distinguish maturities, an RGBD flower sorting dataset was constructed, and real-time capability was achieved by simplifying the feature fusion layer. For the final prediction and post-processing of flower sorting, the improvements included a double-label detection head, double-label NMS, and an optimized loss function to predict information across the three tasks. Experiments showed that the mAP of the MTMD-YOLO network reached 98.19% in the fresh-cut flower classification task and 97.81% in the fresh-cut flower grading task. The method achieved real-time speed (37 FPS) on the portable embedded Jetson Orin NX. Furthermore, the flower sorting system can be seamlessly integrated with mobile carts to fully leverage depth information and execute robotic automatic picking tasks efficiently.

Author Contributions

Conceptualization, W.L. and S.Z.; methodology, Z.D., W.L. and L.C.; software, Z.D.; validation, Z.D.; formal analysis, Z.D. and W.C.; investigation, Z.D. and W.L.; resources, Z.D., W.L. and C.Z.; data curation, Z.D., W.L. and L.C.; writing—original draft preparation, Z.D. and C.Z.; writing—review and editing, W.L. and S.Z.; visualization, Z.D. and W.C.; supervision, W.L., L.C. and S.Z.; funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by Hubei’s Key Project of Research and Development Program under Grant 2023BBB046, and Excellent young and middle-aged scientific and technological innovation teams in colleges and universities of Hubei Province under Grant T2021009, and NSFC-CAAC under Grant U1833119.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

Thanks to all of the authors cited in this article and the referees for their helpful comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Aman, M. Postharvest loss estimation of cut rose (Rosa hybrida) flower farms: Economic analysis in East Shoa Zone, Ethiopia. Int. J. Sustain. Econ. 2014, 6, 82–95. [Google Scholar] [CrossRef]
  2. Tiay, T.; Benyaphaichit, P.; Riyamongkol, P. Flower recognition system based on image processing. In Proceedings of the 2014 Third ICT International Student Project Conference (ICT-ISPC), Nakhonpathom, Thailand, 26–27 March 2014; pp. 99–102. [Google Scholar]
  3. Zawbaa, H.M.; Abbass, M.; Basha, S.H.; Hazman, M.; Hassenian, A.E. An automatic flower classification approach using machine learning algorithms. In Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Delhi, India, 24–27 September 2014; pp. 895–901. [Google Scholar]
  4. Albadarneh, A.A. Automated Flower Species Detection and Recognition from Digital Images; Princess Sumaya University for Technology: Amman, Jordan, 2016. [Google Scholar]
  5. Liu, W.; Rao, Y.; Fan, B.; Song, J.; Wang, Q. Flower classification using fusion descriptor and SVM. In Proceedings of the 2017 International Smart Cities Conference (ISC2), Wuxi, China, 14–17 September 2017; pp. 1–4. [Google Scholar]
  6. Soleimanipour, A.; Chegini, G.R.; Massah, J. Classification of Anthurium flowers using combination of PCA, LDA and support vector machine. Agric. Eng. Int. CIGR J. 2018, 20, 219–228. [Google Scholar]
  7. Patel, I.; Patel, S. Flower identification and classification using computer vision and machine learning techniques. Int. J. Eng. Adv. Technol. (IJEAT) 2019, 8, 277–285. [Google Scholar] [CrossRef]
  8. Tian, M.; Chen, H.; Wang, Q. Flower identification based on Deep Learning. J. Phys. Conf. Ser. 2019, 1237, 022060. [Google Scholar] [CrossRef]
  9. Anjani, I.A.; Pratiwi, Y.R.; Nurhuda, S.N.B. Implementation of deep learning using convolutional neural network algorithm for classification rose flower. J. Phys. Conf. Ser. 2021, 1842, 012002. [Google Scholar] [CrossRef]
  10. Cıbuk, M.; Budak, U.; Guo, Y.; Ince, M.C.; Sengur, A. Efficient deep features selections and classification for flower species recognition. Measurement 2019, 137, 7–13. [Google Scholar] [CrossRef]
  11. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  12. Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
  13. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  14. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  15. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  16. Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-time flying object detection with YOLOv8. arXiv 2023, arXiv:2305.09972. [Google Scholar]
  17. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. pp. 21–37. [Google Scholar]
  18. Fu, C.-Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. Dssd: Deconvolutional single shot detector. arXiv 2017, arXiv:1701.06659. [Google Scholar]
  19. Krishna, K.P.; Thomas, G.; Soumya, M.; Praneetha, K.; Imrana, A. You Only Look Once for Panoptic Driving Perception (YOLOP). EPRA Int. J. Multidiscip. Res. (IJMR) 2022, 8, 55–61. [Google Scholar]
  20. Gao, Y.; Li, Z.; Li, B.; Zhang, L. YOLOv8MS: Algorithm for Solving Difficulties in Multiple Object Tracking of Simulated Corn Combining Feature Fusion Network and Attention Mechanism. Agriculture 2024, 14, 907. [Google Scholar] [CrossRef]
  21. Sun, X.; Li, Z.; Zhu, T.; Ni, C. Four-dimension deep learning method for flower quality grading with depth information. Electronics 2021, 10, 2353. [Google Scholar] [CrossRef]
  22. Fei, Y.; Li, Z.; Zhu, T.; Ni, C. A lightweight attention-based Convolutional Neural Networks for fresh-cut flower classification. IEEE Access 2023, 11, 17283–17293. [Google Scholar] [CrossRef]
  23. Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; pp. 850–855. [Google Scholar]
  24. Patro, S.; Sahu, K.K. Normalization: A preprocessing stage. arXiv 2015, arXiv:1503.06462. [Google Scholar] [CrossRef]
  25. Quality Grade of Fresh Cut Flower Auction Products Part 2: Single Rose. 2014. Available online: https://hbba.sacinfo.org.cn/stdDetail/975d7254c55992f9797c99a36e366404 (accessed on 16 July 2024).
  26. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  27. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
  28. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Trans. Cybern. 2021, 52, 8574–8586. [Google Scholar] [CrossRef] [PubMed]
  29. Mao, A.; Mohri, M.; Zhong, Y. Cross-entropy loss functions: Theoretical analysis and applications. In Proceedings of the International Conference on Machine Learning, Hangzhou, China, 23–29 July 2023; pp. 23803–23828. [Google Scholar]
Figure 1. The implementation process of the flower sorting system.
Figure 2. Data acquisition platform.
Figure 3. Four rose samples.
Figure 4. Representative images of anna rose in the five grades.
Figure 5. Image preprocessing process.
Figure 6. MTMD-YOLO network structure.
Figure 7. Structures of the feature pyramids, where (a) is the path aggregation network (PANet) structure; (b) is the simplified path aggregation network (SPANet) structure.
Figure 8. Tensor information of the double-label detection head.
Figure 9. Double-label NMS preprocessing, where (a) is the NMS of YOLOv5; (b) is the NMS of MTMD-YOLO.
Figure 10. Comparison of the loss weights for different maturities; only the part with 100–200 training epochs is shown.
Figure 11. Histogram comparison of MTMD-YOLO and other excellent networks on the RGB flower species dataset.
Figure 12. Histogram comparison of MTMD-YOLO and other excellent networks on the RGB flower maturity dataset.
Figure 13. Comparison of RGB and RGBD on difficult-to-distinguish maturity flowers, where (a–d) are the results with RGB images; (e–h) are the results with RGBD images.
Figure 14. Flower sorting results in the natural environment.
Table 1. The comparison results of selecting different feature layers in PAN.

Method | mAP (%) ↑ | Params (M) ↓
P3~P4 | 90.14 | 1.90
P4~P5 | 97.81 | 6.46
P3~P5 | 94.32 | 7.03

Note: "↑" indicated that the larger the index, the better the effect; "↓" indicated that the smaller the index, the better the effect.
Table 2. Ablation experiment of the fresh-cut flower classification task.

Baseline | Feature Fusion | RGBD | Multi-Task | Loss Function Optimization | AP (%) ↑ | AR (%) ↑ | mAP (%) ↑ | Speed (FPS) ↑
✓ | - | - | - | - | 99.98 | 100 | 97.17 | 76.49
✓ | ✓ | - | - | - | 99.98 | 100 | 97.75 (+0.62) | 90.69
✓ | ✓ | ✓ | - | - | 99.98 | 100 | 98.13 (+0.38) | 81.22
✓ | ✓ | ✓ | ✓ | - | 99.98 | 100 | 98.19 (+0.06) | 73.01
✓ | ✓ | ✓ | ✓ | ✓ | 100 (+0.02) | 100 | 98.19 | 73.07

Note: "↑" indicated that the larger the index, the better the effect; "✓" indicates that the corresponding improvement was enabled (improvements were applied cumulatively, in the order described in the text).
Table 3. Ablation experiment of the fresh-cut flower grading task.

Baseline | Feature Fusion | RGBD | Multi-Task | Loss Function Optimization | AP (%) ↑ | AR (%) ↑ | mAP (%) ↑ | Speed (FPS) ↑
✓ | - | - | - | - | 75.68 | 86.18 | 85.45 | 76.45
✓ | ✓ | - | - | - | 83.95 (+8.27) | 91.38 (+5.20) | 91.61 (+6.16) | 90.79
✓ | ✓ | ✓ | - | - | 91.21 (+7.26) | 93.39 (+2.01) | 95.24 (+3.63) | 81.25
✓ | ✓ | ✓ | ✓ | - | 98.24 (+7.03) | 98.48 (+5.09) | 97.11 (+1.87) | 73.01
✓ | ✓ | ✓ | ✓ | ✓ | 99.57 (+1.33) | 99.17 (+0.69) | 97.81 (+0.70) | 73.07

Note: "↑" indicated that the larger the index, the better the effect; "✓" indicates that the corresponding improvement was enabled (improvements were applied cumulatively, in the order described in the text).
Table 4. Results of using RGB and RGBD images.

Indicators | P_classification (%) | R_classification (%) | mAP_classification (%) | F1_classification (%) | P_grading (%) | R_grading (%) | mAP_grading (%) | F1_grading (%)
RGB | 99.98 | 100 | 97.75 | 100 | 83.95 | 91.38 | 91.61 | 87.03
RGBD | 99.98 | 100 | 98.13 | 100 | 91.21 | 93.39 | 95.24 | 92.15
Table 5. Results of single task and multi-task.

Indicators | F1_classification (%) | mAP_classification (%) | Params_classification (M) | Speed_classification (FPS) | F1_grading (%) | mAP_grading (%) | Params_grading (M) | Speed_grading (FPS)
Single task | 100 | 98.13 | 6.45 | 81.22 | 92.15 | 95.24 | 6.45 | 81.25
Multi-task | 100 | 98.19 | 6.46 | 73.01 | 99.01 | 97.11 | 6.46 | 73.01
Table 6. Comparison of MTMD-YOLO and other excellent networks in the RGB flower species dataset.

Method | Size (Pixels) | AP (%) ↑ | AR (%) ↑ | mAP (%) ↑ | Params (M) ↓ | Speed (FPS) ↑
SSD | 300 × 300 | 96.05 | 93.52 | 98.95 | 6.96 | 42.86
RetinaNet | 600 × 600 | 99.71 | 96.34 | 99.70 | 4.02 | 89.85
YOLOv5 | 640 × 640 | 99.98 | 100 | 97.17 | 7.02 | 76.49
YOLOv6 | 640 × 640 | 96.87 | 98.14 | 96.82 | 18.50 | 56.47
YOLOv7 | 640 × 640 | 100 | 100 | 97.22 | 36.50 | 23.55
YOLOv8 | 640 × 640 | 99.99 | 100 | 98.86 | 11.13 | 60.85
MTMD-YOLO (This work) | 640 × 640 | 100 | 100 | 98.19 | 6.46 | 73.07

Note: "↑" indicated that the larger the index, the better the effect; "↓" indicated that the smaller the index, the better the effect.
Table 7. Comparison of MTMD-YOLO and other excellent networks in the RGB flower maturity dataset.

Method | Size (Pixels) | AP (%) ↑ | AR (%) ↑ | mAP (%) ↑ | Params (M) ↓ | Speed (FPS) ↑
SSD | 300 × 300 | 78.75 | 89.82 | 78.72 | 6.96 | 42.86
RetinaNet | 600 × 600 | 83.76 | 83.41 | 83.74 | 4.02 | 89.85
YOLOv5 | 640 × 640 | 75.68 | 86.18 | 85.45 | 7.02 | 76.45
YOLOv6 | 640 × 640 | 82.47 | 97.73 | 82.35 | 18.50 | 55.84
YOLOv7 | 640 × 640 | 75.78 | 84.32 | 83.51 | 36.50 | 24.24
YOLOv8 | 640 × 640 | 93.79 | 95.05 | 97.13 | 11.13 | 60.40
MTMD-YOLO (This work) | 640 × 640 | 99.57 | 99.17 | 97.81 | 6.46 | 73.07

Note: "↑" indicated that the larger the index, the better the effect; "↓" indicated that the smaller the index, the better the effect.
Table 8. Comparison of MTMD-YOLO and other excellent networks in hardware.

Method | mAP_classification (%) | mAP_grading (%) | Speed (FPS)
RetinaNet | 97.70 | 83.74 | 45
YOLOv5 | 97.18 | 86.12 | 32
YOLOv8 | 98.86 | 97.13 | 29
MTMD-YOLO (This work) | 98.15 | 97.80 | 37
