Deep Learning-Based Apple Detection with Attention Module and Improved Loss Function in YOLO
Round 1
Reviewer 1 Report (Previous Reviewer 3)
The authors have addressed all my concerns from the previous revision; hence, the paper can be published.
Author Response
Dear Reviewer,
Thank you for your valuable review report and for recommending the paper for publication.
Best Regards
Dr. S. Praveen Kumar
Reviewer 2 Report (Previous Reviewer 4)
The manuscript entitled "Deep learning-based apple detection with attention module and improved loss function in YOLO" investigates the detection of apples using deep learning approaches (YOLOv5 architecture); the model was improved and achieved high accuracy. I recommend major revision, as there are some issues (see the specific comments below) that need to be handled before final acceptance.
Specific comments:
1. Lines 29-32. The references are in the wrong format.
2. Line 57. Wrong format.
3. Lines 60-66. The review of CNN studies for object detection should be thoroughly revised by adding the specific CNN model, the detailed object, and the accuracy.
4. Lines 105-108 can be removed.
5. Lines 113-124. The data collection should be described more clearly, especially the setting of UAV flight routes and overlaps, and the final resolution. How were the pictures radiometrically calibrated? Kindly refer to the following recommended references:
1) The use of unmanned aerial vehicles (UAVs) for remote sensing and mapping
2) Machine Learning-Based Approaches for Predicting SPAD Values of Maize Using Multi-Spectral Images
3) Overview and current status of remote sensing applications based on unmanned aerial vehicles (UAVs)
4) UAVs as remote sensing platforms in plant ecology: review of applications and challenges
6. Avoid undefined abbreviations at first use.
7. Improve all figures, as they are all blurry.
8. Line 182. Remove ‘as the final training set data’.
9. Line 208. Try to avoid repeating the same reference in one article.
Author Response
Please see the attachment for the cover letter followed by the responses to Reviewer comments.
Kindly review the attachment.
Author Response File: Author Response.doc
Round 2
Reviewer 2 Report (Previous Reviewer 4)
The manuscript has been much improved, and it can be accepted for publication.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
This paper presents a customised YOLOv5 model architecture for object detection, validated by a case study of apple detection. Despite the authors' efforts to illustrate the novelties of their model and test results, the manuscript contains serious presentation flaws and limited critical thinking. The poor attention to detail gives the impression that not enough time was allocated to preparing this manuscript. Below is a list of key detailed comments:
- This manuscript follows neither the basic guidelines in the instructions to authors nor the provided template. Examples include missing line numbers, acronyms that are mostly undefined in both the body text and the figure captions, and inconsistent definition of equations.
- The literature review starting at paragraph 2 of page 2 requires more work: the topic sentence is not supported by references. The review is also limited to listing a few previous works; it does not include an in-depth discussion of the contributions and limitations of former research or of the unresolved gaps. Without the identification of research gaps, it is quite challenging to assess whether this work is novel.
- The authors do not describe the core components of the YOLO architecture (i.e., a detailed description of Figs. 4, 5, and 6), which are the key contributions of this work. The authors should not assume that readers are already experts familiar with the YOLO CNN architecture. For instance, PANet is never introduced, but it is referred to as one of the components on page 6.
- The paragraph on page 5 contains vague sentences; it does not explain how the attention mechanism works or how the traditional model failed.
- Second paragraph, page 7: "In [23] authors reported the advantages of adaptive pooling for image segmentation tasks." What are these advantages?
- The authors claim that one of the motivations for using YOLOv5 and designing their custom CNN architecture is the computational power demanded by other architectures. However, the paper does not contain any key performance indicators regarding inference time or execution time on resource-constrained hardware.
- The manuscript does not include important data collection details such as the type of sensors used, the weather and illumination conditions, or whether the data were collected by handholding the sensor or from a robotic manipulator.
- This paper does not include the mean average precision (mAP) in the analysis of its results, which is a key metric for CNN-based object detection. What intersection over union (IoU) threshold was used here?
- The manuscript does not present a discussion of this work's contributions vs previous works. Without this, it is not possible to evaluate the actual contribution of this paper.
- Most of the acronyms used are defined in neither the body text nor the figures. Hence, it is challenging to follow the paper's discussion.
- The conclusions section is basically a repetition of the abstract. This section should highlight key insights, limitations, and future work.
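To make the IoU question above concrete, the following is a minimal sketch of how IoU between two boxes is commonly computed and thresholded. It is an illustration only and is not taken from the manuscript: the `(x1, y1, x2, y2)` corner format, the function names, and the default threshold of 0.5 are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a match if its IoU reaches the threshold.

    The 0.5 default is an assumed, commonly used value, not the paper's.
    """
    return iou(pred_box, gt_box) >= threshold
```

Because precision, recall, and mAP all depend on which predictions count as matches, the threshold needs to be reported alongside the results.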
Author Response
Dear Reviewer,
Thank you for your review comments. We have made sure that all of your comments are addressed and have modified the paper accordingly.
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
This paper proposes a method to detect apples using an attention module and improved loss function in YOLO. The paper basically uses YOLOv5 and the attention module, which has become a common detection tool for the object detection problem.
The purpose of the paper is quite similar to that of a previous paper entitled “ATSS Deep Learning-Based Approach to Detect Apple Fruits”, which was published in Remote Sensing in December 2020 by another research group for the purpose of detecting apples. The authors should cite that paper clearly and compare the detection performance, given that the purpose is the same.
The originality and strategy of the paper in using YOLOv5 are weak, but if the proposed loss function played an important role in the detection performance, the authors should explain why and how the proposed approach improved the detection performance with this loss function. A more detailed account of the contribution, including its basis, is required, although some results are shown in Figure 8, Figure 9, Table 1, and Table 2. The authors should clarify the improvement gained from using Eq. (3) and the reason why the proposed approach provides stronger detection under the illumination variations in Table 1. The attention module is simply added, and the contribution of each component to the improvement is not clear.
As a result, the authors need to describe the basic differences from the previous paper of December 2020 and state the advantages of the proposed approach. Overall, the originality and contribution of the new idea appear insufficient unless each contribution of the paper is confirmed as suggested.
Author Response
Dear Reviewer,
Thank you for your review comments. We have made sure that all of your comments are addressed and have modified the paper accordingly.
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
In this paper, the authors describe deep learning-based apple detection using an improved YOLOv5 deep learning architecture. I have major concerns regarding the content, as follows:
- It is unclear what the novelties are; they should be stated explicitly in the Introduction section.
- At the end of the Introduction section, the paper organization should be briefly explained.
- It is unclear how the authors chose the number of augmentation images per augmentation technique.
- Results should clearly indicate which augmentation technique gives the best performance.
- Figure 4 is very unclear, and should be explained in more detail. The figure itself should be annotated with (a),(b),(c) etc. for different workflows shown.
- The results should include k-fold validation.
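As an illustration of the k-fold validation requested above, the following is a minimal sketch of how a dataset can be split into k train/validation folds. It is not the authors' procedure: the function name `k_fold_indices`, the default of five folds, and the fixed seed are assumptions for the example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold validation.

    Each sample index appears in exactly one validation fold.
    """
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val
```

The detector would then be trained once per fold and the metric reported as the mean and spread across folds. For augmented datasets, splitting at the level of original images (before augmentation) avoids having copies of one image on both sides of a split.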
Author Response
Dear Reviewer,
Thank you for your review comments. We have made sure that all of your comments are addressed and have modified the paper accordingly.
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 4 Report
The authors present a deep learning-based scheme to detect apples, improving the YOLOv5 architecture by incorporating an adaptive pooling scheme and an attribute augmentation model. This model detects smaller objects and improves the feature quality to detect apples in complex backgrounds. However, this work lacks innovation in the method.
1. In the abstract, “a loss function is also incorporated to obtain the accurate bounding box which helps to minimize the detection accuracy”, minimize the detection accuracy? Not maximize?
2. The clarity of the figures needs to be improved.
3. Add references in the first half of the first paragraph of the introduction.
4. Data enhancement methods are described in detail, but the specific role of each method is not detailed.
5. The article lacks ablation experiments for attribute augmentation and attention mechanism.
6. The results figure in the article is not clear.
7. Table 2 shows the results of using illumination-varied and noisy images as input. These inputs are augmented images, not real captured images. Have the authors considered selecting real images with such features as inputs to test the effectiveness of the model?
8. The format of the article needs to be carefully revised.
9. There are no line numbers in the whole manuscript, which makes it difficult to read.
10. Avoid all unnecessary blanks.
Author Response
Dear Reviewer,
Thank you for your review comments. We have made sure that all of your comments are addressed and have modified the paper accordingly.
Please see the attachment.
Author Response File: Author Response.docx