Article
Peer-Review Record

Real-Time Detection of Mango Based on Improved YOLOv4

Electronics 2022, 11(23), 3853; https://doi.org/10.3390/electronics11233853
by Zhipeng Cao and Ruibo Yuan *
Submission received: 27 September 2022 / Revised: 4 November 2022 / Accepted: 5 November 2022 / Published: 23 November 2022
(This article belongs to the Topic Computer Vision and Image Processing)

Round 1

Reviewer 1 Report

This paper presents a credible improvement to the standard YOLOv4 for application to a mango-picking robot. The experimentation is rigorous and demonstrates the capabilities of the approach convincingly. However, I have two related major issues:

1) The motivation for the paper seems unsound. The authors draw attention to the “huge” computation of YOLOv4. However, I don’t see why the level of computation is a major problem here. According to Table 1, standard YOLOv4 can operate at 24.5 FPS. This seems easily fast enough for the application of fruit picking. Please clarify why a fast rate is required.

2) It should be made clear in the intro and conclusion whether the main contribution is a modification of YOLOv4 for one specific application (in which case, the paper should be termed an application paper) or whether the research is more fundamental than this (in which case, describe what other applications there might be for the method).

Other points:

· The middle part of the abstract is one huge sentence. Please break this down. Also, the abstract does not make clear what the output of the algorithm/network is.

· There are quite a few typesetting/formatting issues. E.g. no need to include author initials in the main text, spaces missing after square brackets, etc.

· All three of the points made near the start of section 2.1 are unclear and need to be re-written.

· The heading for Section 2.1 is superfluous.

· The last paragraph of Sec 2.1 is too wordy. Just say: “A total of 1700 images were captured for this paper, which were broken down into 10% each for testing and validation and 80% for training”.

· Explain why all the images are so dark.

· In Section 3.1, “continue to acquire more information” is not clear.

· Explain more about the motivation for some of the changes to the existing YOLO network. For example, Sec 3.2.2 mostly describes the current CSP_Darknet53, but includes little to justify the changes made. Similarly, in Sec 3.2.1 you need to clearly state what is meant by “width” in this context (i.e. width of WHAT?).

· The results section is thorough and compelling. However, there are too many words. Try to reduce the word count of Sec 4 by 50% by removing unnecessary descriptions of what can easily be read from the table directly.

· The caption to Fig 7 needs to describe the figure more fully.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

This is an interesting study that has broad applications in robotics and real-time image recognition and processing. However, there are a number of significant weaknesses.

(1) The novelty of the study is not sound as elaborated in the manuscript.

(2) The proposed model algorithms are not well elaborated in comparison with the existing ones.

(3) The methodology and experimental setup are not elaborated adequately.

(4) The manuscript text requires extensive editing. There are many unclear statements and improperly used symbols.

(5) The attained results are adequately elaborated.    

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The revisions have definitely improved the paper, but it is not yet at publication standard as it is hard to follow in places (e.g. the new caption to fig 7). When you have made the next revision, I suggest that you ask a colleague who is NOT familiar with the work to see if he/she understands it.

 

 

I'm afraid I am not satisfied by your answer to point 1 (about the motivation for high speed). I don't fully understand your response, and the explanation needs to be included in the paper. I could not see any red text about this.

 

Abstract:

 

Grammar: "The new and improved network model is named YOLOv4-LightC-CBAM, the improved network."

 

 

Sec 2:

State what model of "professional camera" was used.

 

I suggest the following:

1. In order to ensure the diversity of data, the database consisted of many different individual hanging fruits from different trees.

2. The shooting time was at night (STATE WHY). In order to ensure observable mango fruits, light (YOU MEAN A FLASH?) was used for illumination.

3. In order to obtain more characteristic information, the mango images should include the presence of complicating factors such as occlusion. The database should also be sufficiently large.

 

It's good that you simplified around line 106. But I would still mention the percentage of images kept back for training.

 

It's good that you added a caption to fig 7, but it's hard to understand.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors made some changes, but I don't see many changes with respect to the given comments.

- The notations given in Equations (1) and (2) have to match the ones in Equation (3). The notations given in Equation (4) must be explained and referred to in the text.

- Many acronyms are left unexplained.

- Many sentence structures need to be corrected.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report

You have done some work on the revised version but still, you need to revise the manuscript.

In Equation (4), p(r) and r need to be explained.

Author Response

In Equation (4), p(r) and r need to be explained.

Response: Thanks for your suggestion.

In the P-R curve, P is Precision on the vertical axis and R is Recall on the horizontal axis. In Equation (4), p(r) is the precision expressed as a function of recall along the P-R curve, and r is the recall value on the horizontal axis.
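
To illustrate this relation, here is a minimal sketch. It assumes Equation (4) is the usual average-precision integral, AP = ∫ p(r) dr over r from 0 to 1, which the record itself does not reproduce. The function name `average_precision` and the sample P-R points are hypothetical, and the trapezoidal rule is one possible approximation, not necessarily the one used in the paper. Detection benchmarks often use an interpolated precision instead.

```python
def average_precision(recall, precision):
    """Approximate the area under a P-R curve with the trapezoidal rule.

    recall    -- r values (horizontal axis), each in [0, 1]
    precision -- p(r) values (vertical axis), one per recall value
    """
    # Sort the points by recall so the integration runs left to right.
    points = sorted(zip(recall, precision))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        # Width of the recall interval times the mean precision over it.
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Hypothetical P-R points, for illustration only (not data from the paper).
recall = [0.0, 0.5, 1.0]
precision = [1.0, 0.8, 0.4]
ap = average_precision(recall, precision)
```

Here each pair (r, p(r)) is one point on the P-R curve, and the returned value is the area under the piecewise-linear curve through those points.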
