Peer-Review Record

Research on Multi-Scale Pest Detection and Identification Method in Granary Based on Improved YOLOv5

Agriculture 2023, 13(2), 364; https://doi.org/10.3390/agriculture13020364

by Jinyu Chu^1,2,3, Yane Li^1,2,3, Hailin Feng^1,2,3,*, Xiang Weng^4 and Yaoping Ruan^1,2,3

Reviewer 1:

Sudhakar Tummala

Reviewer 2:

David Hunter

Submission received: 13 January 2023 / Revised: 30 January 2023 / Accepted: 31 January 2023 / Published: 2 February 2023

(This article belongs to the Section Digital Agriculture)

Round 1

Reviewer 1 Report

The authors proposed a deep learning model based on improvements to Yolov5 for detection of granary pests. Overall, the manuscript reads well, and I have the following suggestions to improve.

1. The main contributions of the work have to be highlighted in the introduction section.

2. What is the resolution of the images before the mosaic data enhancement? How does the model perform when trained without image enhancement?

3. More details should be given for C3 and SPPF in the model architecture.

4. How did you optimize the hyperparameters in Table 3? Did you choose empirically?

5. Specificity values should be reported in the results.

6. For fair comparisons between yolov5 and improved yolov5, figure 11 should use the same images.

7. A thorough comparison with the previous works should be given in the form of a table.

8. How does the model perform on some unseen dataset available in the literature? This is required to understand the model generality.

9. What about using Yolov6 and v7 models? They are already improved versions.

10. What is the practical use of this work? It's not very clear. How do you use this in real scenarios?

11. The definitions of FP and FN in lines 245-247 are not clear. They should be defined clearly for the sake of readers.

12. Figure 7 is not clear, more details should be provided.

Author Response

Following the suggestions of the editor and the reviewers, we have made significant modifications to our first manuscript. We have supplemented the content of the article, corrected the grammar of the full text and polished the overall wording. The major corrections in the paper are listed as below:

A paragraph has been added to the introduction to highlight the main contributions of this work.
Answers to the questions raised are provided in the table below.
Additional explanations with more details are given for C3 and SPPF in the model architecture.
The commonly used evaluation metrics for target detection are Accuracy, Recall and mAP. The article has selected appropriate evaluation indicators to properly evaluate the model results.
A detailed explanation of some relevant parameters and definitions, such as FN, FP and TP, has been added.
In order to compare the effect of the model before and after improvement, both Figure 11 and Figure 12 show the detection results of the same image input before and after the improvement of the model.
The paper was thoroughly revised in presentation, language, citations, and references, etc. All the changes are marked up with the “Track Changes” function using MS Word in the revised manuscript.

Point 1: The main contributions of the work have to be highlighted in the introduction section.

Response 1:Thanks for the reviewer’s valuable comment.

The main contributions of this work include three points. First, we collected a dataset of seven common grain bin pests, containing images of mixed species in different backgrounds and environments. Second, we designed an improved yolov5 model, which can to some extent solve the problem of detecting and identifying multiple species of grain bin pests in complex backgrounds. The average accuracy of the improved model proposed in this study reaches 97.20%, and its mAP0.5 reaches 98.20%. The experimental results show that the detection, recognition, and generalization abilities of this model are higher than those of the above models.

We have added the main contributions of the work in the revised manuscript, as shown in the introduction section.

Point 2: What is the resolution of the images before the mosaic data enhancement? How does the model perform when trained without image enhancement?

Response 2: Thanks for the reviewer’s helpful suggestion. The dataset contains five types of images, with resolutions of 2188 x 2918 (cell phone), 5800 x 1200 (SLR), 3648 x 2746 (microscope), 1828 x 1219 (network), and 1200 x 800 (mixed species).

We have added resolution of the images before the mosaic data enhancement in the revised manuscript, as shown in the Dataset section.

The original yolov5 network trained without mosaic data enhancement achieves 87.33% mAP0.5 on this dataset.

We have added the effect of the model when trained without image enhancement in the revised manuscript, as shown in Table 5.

Table 5. Comparison of data enhancement results of the network before and after improvement

Model	Average accuracy (mAP_0.5)/%	Recall (R)/%
Yolov5s(no Mosaic)	87.33	81.57
Yolov5s	95.17	90.45
Improved Yolov5s(no Mosaic)	88.51	82.28
Improved Yolov5s	98.20	96.85

As can be seen from Table 5, Mosaic data augmentation yielded a significant improvement in mAP and Recall for both the original and the improved networks. For the improved algorithm, Mosaic data augmentation raised mAP0.5 from 88.51% to 98.20% (+9.69 percentage points) and Recall from 82.28% to 96.85% (+14.57 percentage points) relative to the same network trained without Mosaic. The experiments show that the Mosaic data enhancement approach enriches the data samples and improves the generalization of the model.
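The gains quoted for Table 5 can be checked with a few lines of arithmetic. The sketch below only re-derives the differences from the four rows of the table; the dictionary keys and the helper name `mosaic_gain` are illustrative and do not come from the manuscript.

```python
# Table 5 values copied from the response: (mAP_0.5 in %, Recall in %).
table5 = {
    "Yolov5s(no Mosaic)": (87.33, 81.57),
    "Yolov5s": (95.17, 90.45),
    "Improved Yolov5s(no Mosaic)": (88.51, 82.28),
    "Improved Yolov5s": (98.20, 96.85),
}


def mosaic_gain(model):
    """Return (mAP gain, Recall gain) in percentage points from adding Mosaic."""
    with_mosaic = table5[model]
    without_mosaic = table5[f"{model}(no Mosaic)"]
    return tuple(round(a - b, 2) for a, b in zip(with_mosaic, without_mosaic))


for name in ("Yolov5s", "Improved Yolov5s"):
    map_gain, recall_gain = mosaic_gain(name)
    print(f"{name}: mAP_0.5 +{map_gain:.2f}, Recall +{recall_gain:.2f}")
```

The differences are absolute (percentage points), not relative percentages.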

Point 3: More details should be given for C3 and SPPF in the model architecture.

Response 3:Thanks for the reviewer’s valuable comment.

The C3 module connects feature information and fuses features from earlier and later stages of the network. It contains three standard convolutional layers and several Bottleneck modules, which are the main modules for learning residual features. The C3 module first divides the feature map of the base layer into two parts and then merges them through the cross-stage hierarchy, which reduces the computational effort while maintaining accuracy. The Conv module applies convolution, batch normalization (BN), and an activation function to the input feature map to obtain the output features. SPPF is a spatial pyramid pooling module; it acts as an adaptive-size output stage that can convert feature maps of arbitrary size into feature vectors of fixed size. SPPF effectively avoids problems such as image distortion caused by cropping and scaling operations on image regions. The SPPF module applies max pooling and enlarges the receptive field, which addresses the fusion of repetitive features generated in convolution and improves the speed of the network in generating candidate frames.

We have added the details of the C3 and SPPF in the model architecture, as shown in the YOLOv5 Algorithm section.
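To make the pooling part of SPPF concrete, here is a minimal pure-Python sketch. It assumes the standard YOLOv5 SPPF layout (three chained stride-1 max pools with a 5 x 5 kernel); the response does not state kernel sizes, so treat that as an assumption. The 1 x 1 convolutions before and after the pooling stage are omitted, and the function names and toy feature map are illustrative only.

```python
def max_pool_same(grid, k):
    """Stride-1 max pooling over a k x k window. Windows are clipped at the
    borders, which is equivalent to padding the input with -inf."""
    r = k // 2
    h, w = len(grid), len(grid[0])
    return [
        [
            max(
                grid[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def sppf_branches(grid, k=5):
    """Return the four maps that SPPF would concatenate along the channel axis."""
    y1 = max_pool_same(grid, k)
    y2 = max_pool_same(y1, k)
    y3 = max_pool_same(y2, k)
    return [grid, y1, y2, y3]


# Toy single-channel 16 x 16 feature map (values are arbitrary).
feature_map = [[(7 * i + 13 * j) % 17 for j in range(16)] for i in range(16)]
branches = sppf_branches(feature_map)

# Chaining 5 x 5 pools reproduces single 9 x 9 and 13 x 13 pools.
same_as_spp = (
    branches[2] == max_pool_same(feature_map, 9)
    and branches[3] == max_pool_same(feature_map, 13)
)
print(same_as_spp)
```

Each successive pool widens the receptive field while the spatial size stays unchanged, which is why the chained form can replace separate large-kernel pools at lower cost.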

Point 4: How did you optimize the hyperparameters in Table 3? Did you choose empirically?

Response 4: Thanks for the reviewer’s helpful comment. The hyperparameters in Table 3 were adjusted and optimized by experiments. The model was first trained with the network's original parameters, and each parameter was then adjusted to achieve the best training results.

Point 5:Specificity values should be reported in the results.

Response 5:Thanks for the reviewer’s valuable suggestion. The commonly used evaluation metrics for target detection are Accuracy, Recall and mAP. The article has selected appropriate evaluation indicators to properly evaluate the model results.

Point 6: For fair comparisons between yolov5 and improved yolov5, figure 11 should use the same images.

Response 6: Thanks for the reviewer’s helpful comment. We have replaced figure 11 and figure 12 with the same images in the revised manuscript. Please see Figure 11 and Figure 12 in the attached revised draft.

Point 7: A thorough comparison with the previous works should be given in the form of a table.

Response 7:Thanks for the reviewer’s valuable suggestion.

Tables 4 and 5 show, in tabular form, the performance comparison between models and compare the improved model with previous work. The dataset used in this paper is one of its distinguishing features, and the various models are compared on this dataset.

Point 8: How does the model perform on some unseen dataset available in the literature? This is required to understand the model generality.

Response 8: Thanks for the reviewer’s helpful comment. There are almost no publicly available datasets for grain bin pests, and most of the datasets mentioned in the references were produced by the respective authors themselves. This study therefore uses its own dataset for testing and validation.

Point 9:What about using Yolov6 and v7 models? They are already improved versions.

Response 9:Thanks for the reviewer’s valuable suggestion. The present dataset was validated and tested on Yolov6 and v7 networks, respectively, and the mAP0.5 obtained from the experiments were 96.17% and 96.51%, respectively.

Please see Table 4 for more details.

Point 10: What is the practical use of this work? It's not very clear. How do you use this in real scenarios?

Response 10: Thanks for the reviewer’s helpful comment. The grain silo pest detection model proposed in this work can be used to detect and identify grain silo pests in real environments. Food is an important national issue, and protecting food security is key to solving livelihood problems. In a complex grain bin environment, the method proposed in this study can be used to monitor grain bin pests by carrying a camera into the grain bin environment and feeding real-time photos into the model for identification.

Point 11:The definitions of FP and FN in lines 245-247 are not clear. They should be defined clearly for the sake of readers.

Response 11:Thanks for the reviewer’s valuable suggestion.

TP: True Positive, the sample is judged as positive and is in fact positive. TP represents the number of targets detected by correct identification.

FP: False Positive, the sample is judged as positive, but in fact it is negative. FP is the number of missed and wrong detections.

FN: False Negative, the sample is judged as negative, but in fact it is positive. FN is the number of target objects detected as other kinds of objects.

We have added the definitions of FP and FN in the model architecture, as shown in the Evaluation indexes for detection model section.
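For readers who want to see how these counts feed into the reported metrics, the following minimal sketch uses the standard definitions Precision = TP / (TP + FP) and Recall = TP / (TP + FN). The counts in the example are hypothetical and are not taken from the manuscript.

```python
def precision(tp, fp):
    """Fraction of detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of ground-truth targets that are found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for one pest class (illustration only).
tp, fp, fn = 90, 10, 5
print(f"precision = {precision(tp, fp):.3f}")
print(f"recall    = {recall(tp, fn):.3f}")
```

Under these definitions, a spurious detection lowers Precision, while a target that is not detected lowers Recall.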

Point 12: Figure 7 is not clear, more details should be provided.

Response 12: Thanks for the reviewer’s helpful comment. In figure 7, P_i (i = 3, 4, ..., 7) represents each feature layer in the network structure, and feature information flows repeatedly between the nodes of each layer in both top-down and bottom-up directions. BiFPN treats each bidirectional path as one feature network layer and repeats the same layer several times to achieve higher-level feature fusion. As a result, BiFPN is more compact and its feature fusion is more efficient.

We have added the details of figure 7 in the revised manuscript, as shown in the Feature Pyramid Improvement section.
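The bidirectional flow described for Figure 7 can be sketched on toy data. The code below is an illustration under several assumptions that the response does not state: each level P3 to P7 is reduced to a single scalar, resampling and convolutions are omitted, fusion uses the fast normalized weighting from the original BiFPN design, and all fusion weights are fixed at 1 even though they are learnable in practice. The names `fuse` and `bifpn_layer` are hypothetical.

```python
def fuse(features, weights, eps=1e-4):
    """Fast normalized fusion: a weighted average with non-negative weights."""
    weights = [max(w, 0.0) for w in weights]
    total = sum(weights) + eps
    return sum(w * f for w, f in zip(weights, features)) / total


def bifpn_layer(p, w=1.0):
    """One bidirectional pass over levels p[0]..p[n-1] (P3..P7 when n == 5)."""
    n = len(p)
    # Top-down path: start at the coarsest level and move to the finest.
    td = list(p)
    for i in range(n - 2, -1, -1):
        td[i] = fuse([p[i], td[i + 1]], [w, w])
    # Bottom-up path: start at the finest level and move to the coarsest.
    out = list(td)
    for i in range(1, n):
        if i < n - 1:
            inputs = [p[i], td[i], out[i - 1]]  # skip link from the input
        else:
            inputs = [p[i], out[i - 1]]  # top level has no top-down node
        out[i] = fuse(inputs, [w] * len(inputs))
    return out


# Toy scalar "features" for P3..P7; the same layer is repeated three times.
levels = [1.0, 2.0, 3.0, 4.0, 5.0]
for _ in range(3):
    levels = bifpn_layer(levels)
print([round(v, 3) for v in levels])
```

Repeating the layer mixes information from all levels into every output, which is the sense in which stacking the bidirectional path gives higher-level fusion.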

The revised article is attached for your review.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments on Chu et al Pest ID in Granary

There is a lot of good work here, but the presentation of the results needs to be improved.

Abstract

Line 13: “Accurately detecting and identifying granary pests is important in effectively controlling the damage to a granary, ensuring….”

Line 18: “..using the YOLOv5 (You Only Look Once version 5) object detection algorithm incorporating the Bidirectional… …. Attention (ECA) Modules.”

Line 20: “..we compared the performance of different…”

Line 26: “..mechanism was 96.9%.”

Introduction

Line 41: “…granary pests, which can then lead to effective control of pests and reduced damage, scientifically and efficiently…”

Line 45: “….species and number of pests in granary bins, which has the disadvantage of lower efficiency and limited professional ability to identify granary pests.”

Line 49: “..are limited by lower detection sensitivity, long times spent, high labour costs and the inability to detect hidden pests.”

Line 51 and elsewhere: put a space between words and the reference number: “method [6]” in line 51, “methods [8]” in line 52, “90% [10]” in line 59, etc.

Line 55: “resulting in chemical methods being difficult to popularize.”

Line 65: “..features, and it is difficult to identify the marker….background resulting in low accuracy and slow speed of identification.”

Line 80: “..and tested them on the brown rice lice dataset..”

Line 89: “..there is not a good model available to achieve effective results..”

Line 104: “The flowchart of the methods used in this study is shown in Figure 1.” Also line 107: “Figure 1. General flowchart of the methods used in this study.” Use “study” not “paper” in line 110 and elsewhere, e.g. line 301.

Methods

Line 113-4: “Illustrations for the images of the various granary pests are shown….”

Line 116: “Figure 2. Illustrations for the images of the various granary pests.”

Note that by convention, the Genus name is capitalized but the species name is not, and the Genus species name is in italics: Rd = Rhizopertha dominica, So = Sitophilus oryzae, etc. Make these changes in Figure 2, Table 1, Table 5 and elsewhere in this paper.

Figures 4 & 5: need to be slightly larger to make the words easier to read.

Line 159: “..maximizes the pooling layer and …”

Line 163: “which further enhances the diversity…”

Line 170: “..it is difficult to obtain satisfactory results using the traditional detection…”

Line 215: “..prediction, non-maximum suppression (NMS) is needed…”

Line 225: “..large, and the distance between the two frames is also relatively large, it is classified as being the frames of two objects and will not be ???” Not be what?

Results and Analysis

Precision: correct detection of all detected targets. Recall: correct detection in +ve samples.

Line 283: “..The results show that the model…”

Line 292: “..the model established by the improved algorithm increased the detection from 87.6% to 99.0% of granary pests like the cereal beetle Rhizopertha dominica, which are often obscured due to complex backgrounds. This model is better able….”

Line 296: Use the species names for rust stealers and red flat cereal beetles to make it easier to see what you are referring to in Table 5.

Line 302: “..only one improvement method was added at each step to verify the improvement effect of each on the original algorithm.”

Lines 310-311: “The model established with improved algorithm had an improvement in AP_0.5 from 95.1 to 98.2% (+3.1) and an improvement in Recall from 90.45 to 96.85% (+6.40).” Do the same for the text in lines 306-14 to make it easier to see the amount of improvement.

Line 344: “Figure 11 shows illustrations of the recognition results..”

Line 347: “the model is not really able to detect….serious shading. The improved..”

Line 351: “Figure 11: Illustrations of the ….”

Conclusion

Line 363-64: “However, due to the limitations of the dataset, further study is needed that includes more pest species….”

References

Line 381: Why is reference 2 in ALL CAPITALS?

Author Response

Every grammatical error issue raised was carefully revised and formatting errors were addressed.
Revised the font of species names in Tables 1 and 5, and abbreviated species names in Table 5 for better presentation in the tables.
Enlarged the image size of Figure 4 and Figure 5 for easier reading.
The paper was modified thoroughly in presentation, language, citations, and references, etc. All the changes are marked with red words in the revised manuscript.

Point 1: Line 51 and elsewhere: put a space between words and the reference number: “method [6]” in line 51, “methods [8]” in line 52, “90% [10]” in line 59, etc.

Response 1:Thanks for the reviewer’s valuable comment. This issue has been modified in the revised manuscript.

Point 2: Line 104: “The flowchart of the methods used in this study is shown in Figure 1.” Also line 107: “Figure 1. General flowchart of the methods used in this study.” Use “study” not “paper” in line 110 and elsewhere, e.g. line 301.

Response 2:Thanks for the reviewer’s helpful comment. This issue has been modified in the revised manuscript.

Point 3: Line 51 and elsewhere: put a space between words and the reference number: “method [6]” in line 51, “methods [8]” in line 52, “90% [10]” in line 59, etc.

Response 3:Thanks for the reviewer’s valuable comment. This issue has been modified in the revised manuscript.

Point 4: Line 104: “The flowchart of the methods used in this study is shown in Figure 1.” Also line 107: “Figure 1. General flowchart of the methods used in this study.” Use “study” not “paper” in line 110 and elsewhere, e.g. line 301.

Response 4:Thanks for the reviewer’s helpful comment. This issue has been modified in the revised manuscript.

Point 5: Line 116: “Figure 2. Illustrations for the images of the various granary pests.”

Note that by convention, the Genus name is capitalized but the species name is not, and the Genus species name is in italics: Rd = Rhizopertha dominica, So = Sitophilus oryzae, etc. Make these changes in Figure 2, Table 1, Table 5 and elsewhere in this paper.

Response 5: Thanks for the reviewer’s valuable comment. This issue has been modified.

Point 6: Lines 310-311: “The model established with improved algorithm had an improvement in AP_0.5 from 95.1 to 98.2% (+3.1) and an improvement in Recall from 90.45 to 96.85% (+6.40).” Do the same for the text in lines 306-14 to make it easier to see the amount of improvement.

Response 6:Thanks for the reviewer’s helpful comment. This issue has been modified in the revised manuscript.

Point 7:References

Line 381: Why is reference 2 in ALL CAPITALS?

Response 7:Thanks for the reviewer’s valuable comment. The original title of this reference is in capitalized form.

Point 8:All other questions

Response 8:Thanks for the reviewer’s helpful comment. We have carefully read the grammar and presentation issues you have raised, and have revised them.

The revised article is attached for your review.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

No comments.

Reviewer 2 Report

Much improved presentation. Only a few minor changes: Editor: correct spacing between words: "However", line 73, "In", line 75 etc. Italic for species names in Figure 2, Labelling NOT Labelimg, line 181 "Figure 10. Illustrations for feature visualization of the granary pest recognition model" (Illustrations already corrected for Figure 11)
