Article

Recognition and Detection of Persimmon in a Natural Environment Based on an Improved YOLOv5 Model

1 Anhui Provincial Engineering Laboratory of Intelligent Agricultural Machinery, School of Engineering, Anhui Agricultural University, Hefei 230036, China
2 Key Laboratory of Electric Drive and Control of Anhui Province, Anhui Polytechnic University, Wuhu 241000, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(4), 785; https://doi.org/10.3390/electronics12040785
Submission received: 26 December 2022 / Revised: 31 January 2023 / Accepted: 2 February 2023 / Published: 4 February 2023
(This article belongs to the Special Issue Artificial Intelligence and Sensors with Agricultural Applications)

Abstract

Accurate and rapid fruit recognition is a prerequisite for intelligent persimmon picking. Given the changing light and occlusion conditions in a natural environment, this study developed a detection method based on an improved YOLOv5 model. The approach has several critical steps: optimizing the loss function of the traditional YOLOv5, combining it with a centralized feature pyramid (CFP), integrating a convolutional block attention module (CBAM), and adding a small target detection layer. Images of ripe and unripe persimmons were collected from fruit trees, preprocessed to enhance their contrast, and then extended by means of image enhancement to increase the robustness of the network. To test the proposed method, detection and comparative experiments were conducted. In the detection experiments, persimmons in a natural environment were detected successfully using the proposed model, with the accuracy rate reaching 92.69%, the recall rate reaching 94.05%, and the average accuracy rate reaching 95.53%. Furthermore, in the comparison experiments, the proposed model performed better than the traditional YOLOv5 and single-shot multibox detector (SSD) models, improving the detection accuracy while reducing the missed detection and false detection rates. These findings provide a reference for the automatic picking of persimmons.

1. Introduction

China ranks first in the world in persimmon production. Large-scale persimmon planting, however, makes picking difficult: ripe persimmons are usually surrounded by many branches and leaves, and the fruit distribution is complex, so picking is time-consuming and laborious. At present, fruit picking is mostly manual and requires a large amount of labor.
Since the 1980s, technology based on vision and image processing has become increasingly advanced, and its application plays an important role in various fields [1]. Many scholars have started to combine target detection techniques with robots to improve the efficiency and accuracy of fruit harvesting [2,3,4].
Traditional machine learning has been widely applied to fruit detection [5,6]. Chaivivatrakul et al. [7] identified fruits by their differing textures; this method requires obvious texture differences between fruits and achieves low identification accuracy when the differences are insignificant. Zhao et al. [8] designed a KNN classifier to classify fruits' shape features, reaching an accuracy rate of 84%; however, this method relied too heavily on shape features and lacked detection accuracy. Tian et al. [9] proposed an optimized graph-based recognition algorithm in which the gradient information and RGB spatial information in the apple depth image are used to determine the spatial position of the target fruit. However, this algorithm exhibits poor robustness in complex environments, such as when the fruit are numerous or occlude each other.
With the prevalence of deep learning in target detection and advances in hardware, end-to-end identification has become an important approach [10,11,12,13]. Sa et al. [14] proposed a fruit detection method based on deep convolutional neural networks (CNNs); a transfer model trained on sweet pepper images of different colors reached a final detection accuracy of 83.8%. Based on YOLOv3, Liu et al. [15] proposed an improved detection model, YOLO-Tomato, with better learning and detection performance. Ghoury et al. [16] used SSD_MobileNet v1 and Faster R-CNN Inception v2 for transfer learning to detect grape diseases: Faster R-CNN Inception v2 reached an image recognition accuracy of 95.57% but with a long processing time, while SSD_MobileNet v1 had a short processing time but lower accuracy, between 52% and 80%. Liang et al. [17] combined the YOLOv3 and U-Net models to detect litchi fruit and stems; under normal, high, and low brightness conditions, the average accuracy was 99.57%, 96.78%, and 89.30%, respectively, showing high precision and robustness.
In automated picking, picking efficiency and precise positioning are critical. Therefore, this paper proposes the rapid identification of fruit based on an improved YOLOv5 network under natural occlusion and illumination conditions. Redmon et al. [18] proposed the You Only Look Once (YOLO) network model, which achieves target detection without a complex network framework and with a faster detection speed. Through continuous development, YOLOv3 [19], YOLOv4 [20], and other detection models were derived. As the fifth version of the YOLO series, YOLOv5 is characterized by a fast detection speed and strong practicability. In this paper, we propose a modified YOLOv5 model for persimmon detection. The main contributions of this paper are summarized as follows:
  • Combining YOLOv5 with a centralized feature pyramid (CFP) [21] so that the model focuses more on feature extraction, which gives it strong robustness and generalization ability.
  • Based on the traditional model, a convolutional block attention module (CBAM) [22] is integrated to improve the detection effect.
  • The GIoU_loss function is replaced by Alpha-IoU loss to improve the detection accuracy [23].
  • For the better detection of small targets, a small target detection layer (STDL) is added based on the structure of the YOLOv5 model.
The rest of the paper is structured as follows. Section 2 describes the image acquisition, image processing, and data enhancement. Section 3 presents the model improvements and the improved model structure. Section 4 covers the training, testing, and validation of the model on our datasets, together with heat map visualization of the test results. To verify the superiority of the proposed model, Section 5 reports comparison tests against other algorithms on the same datasets. Section 6 presents our conclusions.

2. Persimmon Datasets

2.1. Image Collection

First, persimmon images in a natural environment were collected, and low-resolution images were then filtered out to facilitate the training and evaluation of the subsequent datasets. The details of the image collection are as follows:
  • Persimmon categories: Unripe persimmon and ripe persimmon.
  • Collection environment: To avoid overfitting due to insufficient diversity of the sample data, the samples were collected under normal light during the day and weak light at night. Figure 1 shows the effect under different lighting. Meanwhile, scenes with different numbers of persimmons, different degrees of occlusion by branches and leaves, and persimmons at different distances were photographed to increase the diversity of the datasets. The pictures taken are shown in Figure 2.
  • Collection location and collection device: The persimmon datasets were collected from the NongCuiyuan experimental field of Anhui Agricultural University. The image acquisition equipment was a Basler industrial camera. The dataset acquisition device used in this paper is shown in Figure 3.
  • Image processing: The datasets were first manually annotated using LabelImg to minimize the impact of useless pixels in the images on the training datasets. In addition, digital image histogram equalization was used to enhance the contrast of the original datasets without changing the basic features of the images [24]; a minimal sketch of this step follows the list.
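The contrast enhancement step can be illustrated with a short sketch using OpenCV. Equalizing only the luminance channel is an assumption here; the paper does not state the exact color space used, and the file names are hypothetical:

```python
import cv2

def equalize_contrast(bgr_image):
    # Convert to YCrCb and equalize only the luminance channel so that the
    # basic color features of the image are preserved.
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

enhanced = equalize_contrast(cv2.imread("persimmon.jpg"))
cv2.imwrite("persimmon_eq.jpg", enhanced)
```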

2.2. Dataset Enhancement

Dataset enhancement is performed to prevent overfitting due to an insufficient number of samples. The enhancement is conducted by adding noise to the original images, rotating the images randomly, and dithering the color, as shown in Figure 4. In this paper, salt and pepper noise is added to the original images with a density of 0.3. When rotating an image, the angle is 90° or 180°. Color dithering generates new images and enriches the datasets by adjusting the saturation, brightness, and contrast of the original images. The dataset enhancement produces a total of 3126 images, of which 80% are used for training, 10% for validation, and 10% for testing.
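As an illustration, the sketch below applies the three operations described above with NumPy and Pillow. The noise density of 0.3 and the 90°/180° rotations come from the text; the color-dithering jitter ranges are assumptions, as the paper does not specify them:

```python
import random
import numpy as np
from PIL import Image, ImageEnhance

def add_salt_pepper(img_array, density=0.3):
    # Salt-and-pepper noise; the paper sets the density to 0.3.
    noisy = img_array.copy()
    mask = np.random.rand(*img_array.shape[:2])
    noisy[mask < density / 2] = 0          # pepper
    noisy[mask > 1 - density / 2] = 255    # salt
    return noisy

def augment(path):
    img = Image.open(path)
    rotated = img.rotate(random.choice([90, 180]), expand=True)
    # Color dithering: jitter saturation, brightness, and contrast
    # (the jitter ranges below are assumptions, not given in the paper).
    jittered = ImageEnhance.Color(img).enhance(random.uniform(0.7, 1.3))
    jittered = ImageEnhance.Brightness(jittered).enhance(random.uniform(0.7, 1.3))
    jittered = ImageEnhance.Contrast(jittered).enhance(random.uniform(0.7, 1.3))
    noisy = Image.fromarray(add_salt_pepper(np.array(img)))
    return rotated, jittered, noisy
```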

3. An Improved YOLOv5 Model

3.1. Mosaic Data Augmentation

Mosaic data augmentation randomly selects four images and scales, crops, and arranges them into one re-stitched image, as shown in Figure 5. YOLOv5 continues the Mosaic data augmentation method of the previous version, which greatly enriches the datasets and increases the robustness of the model. At the same time, the random scaling and cropping also improve the detection accuracy for small objects, thereby improving the overall detection effect.
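A minimal sketch of the stitching step is shown below; label handling and the full random scaling/cropping logic of the YOLOv5 implementation are omitted, and the gray fill value of 114 follows the common YOLOv5 convention rather than anything stated in this paper:

```python
import random
import numpy as np
import cv2

def mosaic(images, out_size=640):
    # Pick a random center point, then resize each of the four images into
    # one quadrant of the canvas and stitch them together.
    cx = random.randint(out_size // 4, 3 * out_size // 4)
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        canvas[y1:y2, x1:x2] = cv2.resize(img, (x2 - x1, y2 - y1))
    return canvas
```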

3.2. Multi-Scale Feature Extraction

The backbone network of YOLOv5 uses the CSPDarknet53 and Focus structures, which give YOLOv5 stronger feature extraction ability than YOLOv3 and YOLOv4 and higher recognition accuracy for partially occluded objects. The core operation of the Focus structure is slicing the image: it takes one value from every other pixel, so that no feature information is lost, and obtains four independent sub-maps. For a 640 × 640 pixel input, this slicing produces a 320 × 320 × 12 feature map; a flow chart of the slicing operation is shown in Figure 6. Finally, 32 convolution kernels are applied to produce a 320 × 320 × 32 feature map, and features are then extracted by the subsequent convolutional layers, making the feature extraction more sufficient.
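The slicing operation can be written compactly in PyTorch. The sketch below is a simplified Focus module; the actual YOLOv5 implementation uses a 3 × 3 convolution with batch normalization and SiLU activation, whereas a bare 1 × 1 convolution is used here for brevity:

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    # Sample every other pixel into four sub-maps, concatenate them along the
    # channel axis, then convolve with 32 kernels.
    def __init__(self, in_channels=3, out_channels=32):
        super().__init__()
        self.conv = nn.Conv2d(in_channels * 4, out_channels, kernel_size=1)

    def forward(self, x):  # x: (B, 3, 640, 640)
        sliced = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                            x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        # (B, 3, 640, 640) -> (B, 12, 320, 320) -> (B, 32, 320, 320)
        return self.conv(sliced)

print(Focus()(torch.randn(1, 3, 640, 640)).shape)  # torch.Size([1, 32, 320, 320])
```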

3.3. Loss Function Optimization

The YOLOv5 model adopts GIoU_loss as the bounding box loss function at the output end. As a loss calculation method for box prediction, GIoU_loss compares the predicted box with the ground-truth box to calculate the loss. The GIoU can be expressed as:
$$GIoU = IoU - \frac{|g \setminus (\theta \cup \beta)|}{|g|}$$
where $\theta$ is the predicted box, $\beta$ is the target box, and $g$ is the minimum bounding box that surrounds $\theta$ and $\beta$, as shown in Figure 7.
When $\theta$ and $\beta$ coincide, the formula is as follows:
$$|\theta \cap \beta| = |\theta \cup \beta|$$
and hence $GIoU = IoU = 1$.
When there is no overlap between $\theta$ and $\beta$, the formulas are as follows:
$$IoU = 0$$
$$GIoU = -1 + \frac{|\theta \cup \beta|}{|g|}$$
The loss function of the bounding box is CIoU_loss. Compared with IoU, CIoU_loss also considers the center distance between the ground-truth box and the predicted box ($d$ in the figure) and the diagonal distance of the smallest bounding box ($c$ in the figure). Here, $(w^{gt}, h^{gt})$ and $(w, h)$ denote the width and height of the ground-truth box and the predicted box, respectively, while $b$ and $b^{gt}$ denote the center points of the predicted box and the ground-truth box, as shown in Figure 8. CIoU_loss also works well in the case where IoU = 0 because the two boxes do not overlap.
The CIoU_loss can be expressed as:
$$R = 1 - IoU$$
$$L_{CIoU} = R + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \tau\upsilon$$
$$\upsilon = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}$$
where $\tau$ is a weighting function and $\upsilon$ measures the consistency of the aspect ratio.
The GIoU_loss handles well the case in which the predicted box does not coincide with the target box. However, when the predicted box and the target box have an inclusion relationship, GIoU_loss offers no advantage over plain IoU. Therefore, Alpha-IoU is introduced in this paper to optimize the loss function; it generalizes the existing IoU-based losses, including GIoU, DIoU, and CIoU. The power parameter $\alpha$ serves as a hyperparameter that adjusts the Alpha-IoU loss to different regression accuracies, thereby achieving a better target detection effect. The formulas are as follows:
$$L_{\alpha\text{-}IoU} = 1 - IoU^{\alpha}$$
$$L_{\alpha\text{-}CIoU} = 1 - IoU^{\alpha} + \frac{\rho^{2\alpha}(b, b^{gt})}{c^{2\alpha}} + (\beta\upsilon)^{\alpha}$$
$$L_{\alpha\text{-}GIoU} = 1 - IoU^{\alpha} + \left(\frac{|C \setminus (B \cup B^{gt})|}{|C|}\right)^{\alpha}$$
where $b$ represents the center point of the predicted box, $b^{gt}$ represents the center point of the ground-truth box, $B$ represents the predicted box, $B^{gt}$ represents the ground-truth box, and $C$ represents the minimum box surrounding $B$ and $B^{gt}$. In this paper, we name the YOLOv5 model with the optimized loss function YOLOv5-AIoU.
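The basic power form of the loss can be sketched as follows, for boxes in (x1, y1, x2, y2) format; $\alpha = 3$ is the default recommended by He et al. [23], and this sketch covers only $L_{\alpha\text{-}IoU}$, not the penalty terms of the CIoU/GIoU variants:

```python
import torch

def alpha_iou_loss(pred, target, alpha=3.0, eps=1e-7):
    # Intersection of predicted and ground-truth boxes, shape (N, 4).
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    # Union = area(pred) + area(target) - intersection.
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter + eps
    iou = inter / union
    return (1.0 - iou.pow(alpha)).mean()  # L = 1 - IoU^alpha
```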

3.4. Integration of the CBAM Attention Mechanism

The work of picking robots is inseparable from accurate target detection. While maintaining the detection speed of YOLOv5, it is also important to improve the detection accuracy. When detecting persimmon images in a natural environment, complex background information usually accompanies the persimmon image information. To improve the extraction of persimmon image features, we combine the dual-channel CBAM with the traditional YOLOv5 model, which makes the model pay more attention to the features of the target and identify fruit shape features more accurately, thereby improving the detection effect and reducing the false detection rate.
The YOLOv5 model mainly includes two parts: the Backbone and the Head. Both contain C3 modules. The C3 modules in the Backbone focus more on the location of objects in the image, while the C3 modules in the Head pay more attention to extracting object features. After the image passes through the preceding C3 modules, feature information is easily lost and the object recognition accuracy of the YOLOv5 model decreases. Therefore, in this paper, we add the CBAM after the C3 modules in the Backbone; the structure is shown in Figure 9.
As shown in Figure 9, the CBAM contains two sub-modules: the channel attention module (CAM) and the spatial attention module (SAM). First, the feature map is passed into the CAM, where global average pooling (GAP) and global maximum pooling (GMP) produce two descriptors that are fed into a shared fully connected multi-layer perceptron (MLP). The MLP outputs are combined by element-wise addition and activated by the sigmoid function, and element-wise multiplication with the input then yields feature maps adapted to the SAM. In the SAM, average pooling and maximum pooling are applied along the channel axis, the two resulting maps are concatenated and convolved, and a sigmoid activation yields the spatial attention feature. The YOLOv5 model combined with the CBAM is named AIoU-CBAM. A self-contained sketch of this block follows.
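The following PyTorch sketch implements the CBAM block described above; the reduction ratio of 16 and the 7 × 7 spatial kernel follow the defaults in Woo et al. [22] and are not specified in this paper:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(), nn.Linear(channels // reduction, channels))

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))            # GAP branch
        mx = self.mlp(x.amax(dim=(2, 3)))             # GMP branch
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)  # element-wise add + sigmoid
        return x * w                                  # element-wise multiply

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)             # channel-wise average pooling
        mx = x.amax(dim=1, keepdim=True)              # channel-wise max pooling
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class CBAM(nn.Module):
    # Channel attention followed by spatial attention, as in Woo et al. [22].
    def __init__(self, channels):
        super().__init__()
        self.cam = ChannelAttention(channels)
        self.sam = SpatialAttention()

    def forward(self, x):
        return self.sam(self.cam(x))
```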

3.5. Addition of a Small Target Detection Layer

In the persimmon-harvesting environment, some persimmons grow at the tops of the trees, farther from the camera, and therefore appear as smaller targets, making accurate detection difficult. The standard YOLOv5 detects targets larger than 8 × 8, 16 × 16, and 32 × 32 pixels. In this paper, we add an STDL to YOLOv5: an additional 160 × 160 feature detection map that can detect targets larger than 4 × 4 pixels, as shown in Figure 10 (see the calculation sketched below). The YOLOv5 model combined with the small target detection layer is named AIoU-CBAM-STDL.
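The relationship between the detection layers and the minimum detectable target size can be made concrete with a short calculation, assuming a 640 × 640 input; the stride-4 row corresponds to the added STDL:

```python
# Grid size = input_size / stride; each output head resolves targets down to
# roughly its stride in pixels. The stride-4 head is the added small target
# detection layer (160 x 160 grid, ~4 x 4 px targets).
input_size = 640
for stride in (32, 16, 8, 4):
    grid = input_size // stride
    print(f"stride {stride:2d}: {grid:3d} x {grid:3d} grid, "
          f"targets larger than about {stride} x {stride} px")
```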

3.6. Combined Centralized Feature Pyramid

Because target sizes are uncertain, a single feature scale cannot meet the detection accuracy requirement, which motivated the feature pyramid network approach. This approach assigns each target to a region of an appropriate size and extracts contextual information so that the target can be identified at different detection layers. However, the method is so computationally intensive that it performs poorly when a large number of images are input.
A CFP uses a lightweight MLP architecture to capture long-distance dependencies and a parallel learnable visual center mechanism to aggregate the locally critical regions of the input image. The lightweight MLP mainly consists of a depthwise convolution-based module and a channel MLP: the depthwise convolution module reduces computation and improves the feature extraction ability, while the channel MLP not only reduces computational difficulty but also meets the requirements of general vision tasks. A CFP has obvious advantages over a feature pyramid: it can capture global long-distance dependencies and can also differentiate feature representations of the target more efficiently. Combining YOLOv5 with a CFP therefore improves the feature extraction and target detection ability of the model. The structure of the proposed model is shown in Figure 11.
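As a rough sketch of the lightweight MLP idea, a simplification of the block in [21], the depthwise module and channel MLP can be written as follows; the full EVC design additionally includes normalization layers and the learnable visual center branch, which are omitted here:

```python
import torch
import torch.nn as nn

class LightweightMLP(nn.Module):
    # Simplified two-stage block: cheap spatial mixing via a depthwise
    # convolution, then per-location channel mixing via 1x1 convolutions.
    def __init__(self, channels, expansion=4):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, 1),
            nn.GELU(),
            nn.Conv2d(channels * expansion, channels, 1),
        )

    def forward(self, x):
        x = x + self.dw(x)               # depthwise module with residual
        return x + self.channel_mlp(x)   # channel MLP with residual
```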

4. Training of the Model

4.1. Experimental Setup

In this study, the experimental hardware uses an Intel (R) Core (TM) CPU and an NVIDIA GeForce GTX 1650 Ti graphics card. The software environment is PyTorch 1.10 with CUDA 11.3 on Windows 11.
The parameters of the YOLOv5 network are initialized before training. The image input size is 640 × 640 pixels, the batch size is 4, the maximum number of iterations is 300, the initial learning rate is 0.01, the cosine annealing hyperparameter is 0.2, and the weight decay coefficient is 0.0005. CIoU_loss and Alpha-IoU are used as the loss functions. To validate the effectiveness of this experimental approach, the improved YOLOv5 network is compared with the conventional YOLOv5 and SSD on the same datasets.
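For reference, the training settings listed above can be collected into a configuration dictionary. The key names below follow common YOLOv5 conventions and are assumptions, not quoted from the paper:

```python
# Training configuration used in this study, written in the style of a
# YOLOv5 hyperparameter file (key names assumed, values from the text).
hyp = {
    "imgsz": 640,            # input size: 640 x 640 pixels
    "batch_size": 4,
    "epochs": 300,           # maximum number of iterations
    "lr0": 0.01,             # initial learning rate
    "lrf": 0.2,              # cosine annealing hyperparameter
    "weight_decay": 0.0005,  # weight decay coefficient
}
```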

4.2. Detection Experiment

During detection, the model must not only classify each target, but also localize it correctly; accuracy alone does not directly reflect detection performance. Here, mAP_0.5 (mean average precision at IoU = 0.5) means that the average precision (AP) over all images is calculated for each category and then averaged over all categories.
After inputting the datasets into the proposed model for training, the mAP_0.5 results over 300 iterations are obtained with TensorBoard and visualized, as shown in Figure 12. In the first 50 iterations, the value of mAP_0.5 increases significantly; after about 140 iterations, it fluctuates only slightly, indicating that the detection model has stabilized.
When the detection network is used for training, validation, and testing, the metrics chosen for its performance are the mean average precision (mAP), precision (P), and recall (R). The mAP reflects the accuracy of the detection model; P is the ratio of correctly identified targets to all targets the model reports; and R is the ratio of correctly identified targets to all targets the algorithm should have found. The validation results are shown in Table 1, and the two ratios are stated precisely in the sketch below.
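The counts in the usage example are hypothetical and serve only to illustrate the two definitions:

```python
def precision_recall(tp, fp, fn):
    # P = TP / (TP + FP): fraction of reported detections that are correct.
    # R = TP / (TP + FN): fraction of ground-truth targets that are found.
    return tp / (tp + fp), tp / (tp + fn)

p, r = precision_recall(tp=94, fp=6, fn=8)  # hypothetical counts
print(f"P = {p:.2%}, R = {r:.2%}")          # P = 94.00%, R = 92.16%
```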

4.3. Comparative Experiments

To verify the superiority of the proposed loss function, the traditional YOLOv5 model and a YOLOv5 model with the EIoU_loss function (YOLOv5-EIoU) are compared with YOLOv5-AIoU on the same datasets. As shown in Table 2, the model with the Alpha-IoU loss function achieves a better detection effect than the other two models.
To verify the detection effect of the model proposed in this paper, it is compared with AIoU-CBAM, traditional YOLOv5, and SSD under the same basic environment. The results are shown in Table 3. The mAP values for unripe and ripe persimmons increase by 0.06% and 0.41%, respectively, compared with AIoU-CBAM; by 0.63% and 2.83% compared with traditional YOLOv5; and by 27.88% and 25.22% compared with the SSD model. The model proposed in this paper therefore has a greater ability to extract image features.
After training is completed, a trained weight file is selected and the datasets are input into YOLOv5. The Grad-CAM heat map visualization method is used to visualize the test results, which intuitively displays the fruit features in the images that YOLOv5 focuses on. As shown in Figure 13, after adding the CFP to the model, the detection layer focuses more on processing the target feature information.
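Grad-CAM can be reproduced with forward and backward hooks on the layer of interest. The sketch below is a generic implementation under those assumptions, not the exact visualization pipeline used in this study:

```python
import torch

def grad_cam(model, layer, x, score_fn):
    # Capture the layer's activations on the forward pass and the gradients
    # flowing back into it on the backward pass.
    feats, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = score_fn(model(x))  # scalar score for the class/box of interest
    model.zero_grad()
    score.backward()
    h1.remove()
    h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)  # GAP over the gradients
    cam = torch.relu((weights * feats[0]).sum(dim=1))  # weighted channel sum
    return cam / (cam.max() + 1e-8)                    # normalize to [0, 1]
```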

5. Model Test on Small Target Detection

To detect small persimmons in long-distance scenes, an STDL is added to the above-mentioned AIoU-CBAM. We designed a comparative experiment in which the improved detection model is compared with the other models. During real-time persimmon detection, the detection effect can be intuitively reflected by three metrics: the false detection rate (FDR), confidence, and the missed detection rate (MDR), as shown in Figure 14. A total of 1000 images were randomly chosen as the test set to measure the FDR and MDR of the different models, as shown in Table 4.
The FDR and MDR of the model proposed in this paper decrease by 0.5% and 0.8% compared with AIoU-CBAM-STDL, by 0.9% and 3.2% compared with AIoU-CBAM, by 2.0% and 6.8% compared with YOLOv5-AIoU, and by 1.4% and 11.9% compared with traditional YOLOv5.
In the scene with numerous persimmons on the left of Figure 14, the traditional YOLOv5 network misses some persimmons and makes false detections, as do AIoU-CBAM-STDL and AIoU-CBAM. By contrast, the model proposed in this paper identifies all the persimmons in the image, with high confidence in the identification results.
In the scene with fewer persimmons in the right image, the overlap between fruit is lower and the environment is simpler than in the left image. Even so, only the model proposed in this paper identifies all the persimmons with high confidence and without missed detections.
In summary, the model proposed in this paper achieves the best effect in persimmon detection, avoiding missed detections and false detections while maintaining accuracy.
To test the detection effect of the improved YOLOv5 in a complex environment, the persimmon images are classified into single fruit with occlusion, single fruit without occlusion, overlapping fruit with occlusion, and overlapping fruit without occlusion. A total of 115 single fruit images are selected, of which 40 show occlusion and 75 show no occlusion, and a total of 120 overlapping fruit images are selected, of which 70 show occlusion and 50 show no occlusion. The ratio between the number of successful identifications and the total number of samples is the identification rate, and the results are shown in Table 5. The persimmon identification rate exceeds 92% under both the occlusion and no-occlusion conditions; persimmon instance detections are shown in Figure 15.

6. Conclusions

In this paper, a new model based on an improved YOLOv5 is proposed for persimmon recognition and detection. Combining a CFP with YOLOv5 increases the feature generalization and robustness of the model; replacing GIoU_loss with the Alpha-IoU loss function improves the detection accuracy; and integrating a CBAM makes YOLOv5 more focused on image feature extraction. In addition, an STDL is added to reduce the probability of false detection and missed detection and to improve the detection accuracy for small fruit. The collected fruit images are augmented with data enhancement and digital image histogram equalization techniques to build datasets for training, validation, and testing. To verify that the improved model performs well under natural occlusion and illumination conditions, target detection is performed on images with partial occlusion and with both large and small numbers of persimmons; the improved model significantly reduces both the missed detections and false detections of traditional YOLOv5. To verify the superiority of the proposed loss function, YOLOv5-AIoU, YOLOv5-EIoU, and traditional YOLOv5 are tested on the same datasets: the detection accuracy of YOLOv5-AIoU is 0.77% higher than that of YOLOv5-EIoU and 1.51% higher than that of traditional YOLOv5, and its recall rate is 0.2% higher than that of YOLOv5-EIoU and 3.63% higher than that of traditional YOLOv5, so optimizing the loss function improves the detection capability to some extent. A comparison test is also conducted with traditional YOLOv5, AIoU-CBAM, and SSD. For the unripe and ripe persimmon detection results, the mAP values of the proposed model are 0.63% and 2.83% higher than those of traditional YOLOv5, 0.06% and 0.41% higher than those of AIoU-CBAM, and 27.88% and 25.22% higher than those of SSD. The recognition rates reach 97.3% and 92.5% for single fruit without and with occlusion, and 94% and 94.3% for overlapping fruit without and with occlusion. The model proposed in this paper thus has better robustness and higher recognition accuracy for persimmons in complex environments.
The improved method offers a reference for the mechanized and intelligent picking of persimmons in natural environments. In future studies, the detection accuracy for persimmons of different shapes will be further improved.

Author Contributions

Conceptualization, methodology, software, writing, review, and editing, Z.C.; investigation, experiments, and data curation, F.M.; review, D.Z. and B.L.; editing and supervision, Y.W. and W.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Opening Project of Key Laboratory of Electric Drive and Control of Anhui Province, Anhui Polytechnic University (No.: DQKJ202203) and University Science Research Project of Anhui Province (2022AH050872).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the editors and anonymous reviewers for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tripathi, M.K.; Maktedar, D.D. A role of computer vision in fruits and vegetables among various horticulture products of agriculture fields: A survey. Inf. Process. Agric. 2020, 7, 183–203. [Google Scholar] [CrossRef]
  2. Kapach, K.; Barnea, E.; Mairon, R.; Edan, Y.; Ben-Shahar, O. Computer vision for fruit harvesting robots-state of the art and challenges ahead. Int. J. Comput. Vis. Robot. 2012, 3, 4–34. [Google Scholar] [CrossRef]
  3. Bergerman, M.; Van Henten, E.; Billingsley, J.; Reid, J.; Mingcong, D. IEEE Robotics and Automation Society Technical Committee on Agricultural Robotics and Automation. IEEE Robot. Autom. Mag. 2013, 20, 20–23. [Google Scholar] [CrossRef]
  4. Bechar, A.; Vigneault, C. Agricultural robots for field operations: Concepts and components. Biosyst. Eng. 2016, 149, 94–111. [Google Scholar] [CrossRef]
  5. Wang, Z.F.; Jia, W.K.; Mou, S.H.; Hou, S.J.; Yin, X.; Ze, J. KDC: A Green Apple Segmentation Method. Spectrosc. Spectr. Anal. 2021, 41, 2980–2988. [Google Scholar]
  6. Jia, W.; Zhang, Y.; Lian, J.; Zheng, Y.; Zhao, D.; Li, C. Apple harvesting robot under information technology: A review. Int. J. Adv. Robot. Syst. 2020, 17, 1–16. [Google Scholar] [CrossRef]
  7. Chaivivatrakul, S.; Dailey, M.N. Texture-based fruit detection. Precis. Agric. 2014, 15, 662–683. [Google Scholar] [CrossRef]
  8. Zhao, Y.; Gong, L.; Zhou, B.; Huang, Y.; Liu, C. Detecting tomatoes in greenhouse scenes by combining AdaBoost classifier and colour analysis. Biosyst. Eng. 2016, 148, 127–137. [Google Scholar] [CrossRef]
  9. Tian, Y.; Duan, H.; Luo, R.; Zhang, Y.; Jia, W.; Lian, J.; Zheng, Y.; Ruan, C.; Li, C. Fast Recognition and Location of Target Fruit Based on Depth Information. IEEE Access 2019, 7, 170553–170563. [Google Scholar] [CrossRef]
  10. Koirala, A.; Walsh, K.B.; Wang, Z.; McCarthy, C. Deep learning-Method overview and review of use for fruit detection and yield estimation. Comput. Electron. Agric. 2019, 162, 219–234. [Google Scholar] [CrossRef]
  11. Sultana, F.; Sufian, A.; Dutta, P. Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey. Knowl. -Based Syst. 2020, 201–202, 106062. [Google Scholar] [CrossRef]
  12. Li, J.Q.; Liu, Z.M.; Li, C.; Zheng, Z.X. Improved Artificial Immune System Algorithm for Type-2 Fuzzy Flexible Job Shop Scheduling Problem. IEEE Trans. Fuzzy Syst. 2021, 29, 3234–3248. [Google Scholar] [CrossRef]
  13. Hou, S.; Zhou, S.; Liu, W.; Zheng, Y. Classifying advertising video by topicalizing high-level semantic concepts. Multimed. Tools Appl. 2018, 77, 25475–25511. [Google Scholar] [CrossRef]
  14. Sa, I.; Ge, Z.; Dayoub, F.; Upcroft, B.; Perez, T.; McCool, C. DeepFruits: A Fruit Detection System Using Deep Neural Networks. Sensors 2016, 16, 1222. [Google Scholar]
  15. Liu, G.; Nouaze, J.C.; Touko Mbouembe, P.L.; Kim, J.H. YOLO-Tomato: A Robust Algorithm for Tomato Detection Based on YOLOv3. Sensors 2020, 20, 2145. [Google Scholar] [CrossRef]
  16. Ghoury, S.; Sungur, C.; Durdu, A. Real-Time Diseases Detection of Grape and Grape Leaves using Faster R-CNN and SSD MobileNet Architectures. In Proceedings of the International Conference on Advanced Technologies, Computer Engineering and Science (ICATCES 2019), Antalya, Turkey, 26–28 April 2019. [Google Scholar]
  17. Liang, C.; Xiong, J.; Zheng, Z.; Zhong, Z.; Li, Z.; Chen, S.; Yang, Z. A visual detection method for nighttime litchi fruits and fruiting stems. Comput. Electron. Agric. 2020, 169, 105192. [Google Scholar] [CrossRef]
  18. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 27–30 June 2016. [Google Scholar]
  19. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  20. Bochkovskiy, A.; Wang, C.Y.; Liao, H. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  21. Quan, Y.; Zhang, D.; Zhang, L.; Tang, J. Centralized Feature Pyramid for Object Detection. arXiv 2022, arXiv:2210.02093. [Google Scholar]
  22. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
  23. He, J.; Erfani, S.; Ma, X.; Bailey, J.; Chi, Y.; Hua, X.S. Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression. Adv. Neural Inf. Process. Syst. 2021, 34, 20230–20242. [Google Scholar]
  24. Wang, Y.; Cai, J.; Zhang, D.; Chen, X.; Wang, Y. Nonlinear Correction for Fringe Projection Profilometry with Shifted-Phase Histogram Equalization. IEEE Trans. Instrum. Meas. 2022, 71, 5005509. [Google Scholar] [CrossRef]
Figure 1. Sampling under different light rays. (a) Normal light. (b) Weak light.
Figure 2. Actual persimmon images in a complex environment: (a) single fruit; (b) overlapping fruit; (c) occlusion; (d) no occlusion; (e) long-distance shooting; and (f) close-range shooting.
Figure 3. Dataset acquisition device.
Figure 4. The enhancement effect on a persimmon image: (a) original image; (b) sharpness reduction; (c) brightness enhancement; (d) brightness reduction; (e) added noise; and (f) flip.
Figure 5. Schematic diagram of the Mosaic data augmentation.
Figure 6. Flow chart of the slice operation.
Figure 7. Relative positions of $\theta$, $\beta$, and $g$.
Figure 8. The structure of the CIoU_loss.
Figure 9. YOLOv5 with added CBAM post-structure and CBAM structure.
Figure 10. Comparison of the YOLOv5 output layers before and after adding a small target detection layer. (a) YOLOv5 output layer. (b) Model output layer after adding a small target detection layer.
Figure 11. YOLOv5 improved model structure.
Figure 12. Visualization of the training results.
Figure 13. Visualization of the heat map of the YOLOv5 partial structural layer.
Figure 14. Comparison of persimmon detection with the different improved models.
Figure 15. Results plot of single and overlapping fruit detection: (a) single fruit occlusion; (b) single fruit no occlusion; (c) overlapping fruit occlusion; and (d) overlapping fruit no occlusion.
Table 1. Various indicators.

                 P        R        mAP
Training set     98.95%   88.18%   94.47%
Validation set   92.69%   94.05%   95.53%
Test set         94.26%   90.73%   93.18%
Table 2. Detection results with different loss functions.

Methods              P        R
YOLOv5-AIoU          92.07%   91.02%
YOLOv5-EIoU          91.30%   90.82%
Traditional YOLOv5   90.56%   87.39%
Table 3. Detection results of the four models.

Methods              mAP (Unripe Persimmon)   mAP (Ripe Persimmon)
Proposed model       98.73%                   98.03%
AIoU-CBAM            98.67%                   97.62%
Traditional YOLOv5   98.10%                   95.20%
SSD                  70.85%                   72.81%
Table 4. False detection and missed detection rates of each model.

Methods                  FDR    MDR
Traditional YOLOv5       5.7%   35.6%
YOLOv5-AIoU              6.3%   30.5%
AIoU-CBAM                5.2%   26.9%
AIoU-CBAM-STDL           4.8%   24.5%
Proposed in this paper   4.3%   23.7%
Table 5. Model recognition effect in a complex environment.

Parameter                        Single Fruit               Overlapping Fruit
                                 Occlusion   No Occlusion   Occlusion   No Occlusion
Number of pictures               40          75             70          50
Rate of identification (%)       92.5        97.3           94.3        94.0
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
