Article
Peer-Review Record

Deep Object Detection of Crop Weeds: Performance of YOLOv7 on a Real Case Dataset from UAV Images

Remote Sens. 2023, 15(2), 539; https://doi.org/10.3390/rs15020539
by Ignazio Gallo 1,*, Anwar Ur Rehman 1, Ramin Heidarian Dehkordi 2, Nicola Landro 1, Riccardo La Grassa 3 and Mirco Boschetti 2
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 30 November 2022 / Revised: 3 January 2023 / Accepted: 11 January 2023 / Published: 16 January 2023

Round 1

Reviewer 1 Report

In order to detect weeds, this paper has done some important work, including: (1) constructing a comprehensive chicory plant dataset and making it public; (2) investigating and evaluating the performance of the YOLOv7 object detector for weed identification and detection. The experiments prove the effectiveness of the dataset and the YOLOv7 method.

I think this work is interesting and of good value.

Author Response

We thank the reviewer for their constructive comment and positive evaluation of our work.
We have improved the paper following the suggestions of all reviewers and
tried to improve the English as well.

Reviewer 2 Report

This paper examines the possibility of detecting different types of weeds using various object-detector deep learning techniques. The paper introduces why weed detection is needed and what has been currently done. The authors then introduce the reader to the You-Only-Look-Once (YOLO) family of object detectors with emphasis on the new Yolov7. The authors have labeled a large dataset (CP), available on Roboflow, and have trained different models on the CP data. They further compared the results with those of models trained on a previously publicly available dataset, Lincoln Beet (LB). 

 

The paper presents some good results and can be considered for publication after some revisions.

 

General comments:

- It is far too long. Shorten the Abstract, introduction, and methodology.

- Parts of the methodology are explained in the Experiments section and must be changed. 

- Use hedging. “Computer Vision use Machine Learning for …” is simply wrong; computer vision can use machine learning. There are many other sentences in which you should use hedging.

- The identity of this paper is unclear. Is this a paper introducing a new dataset, or is it a paper that is comparing different deep learning models? If it is a data paper, perhaps this is the wrong journal. “We construct a comprehensive chicory plant dataset, and make it publicly available (…)”

- Your methodology is lacking. How and where did you implement the models? Did you use them as-is from, e.g., your reference [52]? Did you use some of Roboflow’s YOLO models? Or did you implement them yourself, and in that case in what framework: TensorFlow, PyTorch, or another language, e.g., Julia? It is not possible to recreate YOLOv7 from your article alone; you should therefore state where you obtained the models.

- What specific parameters did you use in Roboflow? What type of classes did you initially choose (Object Detection etc.)?  What data pipeline did you further pick in Roboflow?

- It is very difficult to know how well your model performs qualitatively. Other types of examples, like Figure 5, could be relevant.

 

Specific comments

- Change the title. The word Novel is redundant. If it is not novel, you should not publish it. Furthermore. 

- Line 100: "we cannot use already available trained models.” This is confusing as you are using already trained models yourself for, e.g., transfer learning. Refer to line 386 and line 396.

- You are comparing many different models, but only give us the recall and weed mAP. How many parameters does each model have, and what was the inference time? This is especially relevant when you emphasize speed.

- Line 407: You are referring to table 3, placed 3 pages later.

- Table 5: From what model?

- Line 429: Image augmentation gave worse results. Comment on this, since you state it generally increases, e.g., accuracy. In the discussion, you state that there is already augmentation in the YOLO model. Please elaborate. Is there always augmentation during inference?

- Table 1 is placed on page 7 and is referred to only on page 12. This is very confusing. 

- Line 307: You briefly discuss the pros/cons of multispectral vs. sharp RGB images. Are there any studies further discussing this?

- Line 356: "and many more”. What many more techniques were used?

- Figure 5: what are the classes of the predictions?

 

 

Language:

- Generally, the English grammar is good. However, the writing is not scientific and is very difficult to read.

Line 323: "To correctly annotate and label the high-quality UAV captured images, an image-editing application with the ability to quickly label hundreds of weeds objects in an image as well as an enhanced zoom function that concentrates on a specific region and the clarity of the weeds present there was needed”

The sentence is too long and confusing to read.

- Many smaller mistakes that must be corrected, e.g., “this study has illustrated the potentiality”, “Red-green-brown (RGB)”

Author Response

We thank the reviewer for their careful reading of the manuscript and their
constructive comments. Below is a detailed point-by-point answer to all comments.
This paper examines the possibility of detecting different types of weeds
using various object-detector deep-learning techniques. The paper introduces
why weed detection is needed and what has been currently done. The authors
then introduce the reader to the You-Only-Look-Once (YOLO) family of object
detectors with emphasis on the new Yolov7. The authors have labelled a large
dataset (CP), available on Roboflow, and have trained different models on the
CP data. They further compared the results with those of models trained on a
previously publicly available dataset, Lincoln Beet (LB).
The paper presents some good results and can be considered for publication
after some revisions.
General comments:
Q1)It is far too long. Shorten the Abstract, introduction, and methodology.
A) We agree with the reviewer, so we revised the text. The title has been
changed and the abstract reduced from 466 to 394 words (27 lines vs. the
previous 32). The introduction has been shortened from 101 to 89 lines.
Q2) Parts of the methodology are explained in the Experiments section
and must be changed.
A) We removed the "Experiments" section and included its content in the
"methodology" section and Supplementary Material document.
Q3) Use Hedging. “Computer Vision use Machine Learning for ”. . . This
is simply wrong. Computer vision can use Machine Learning for Machine
Learning. However, there are many sentences in which you should use
hedging.
A) We have corrected this mistake.
Q4) The identity of this paper is unclear. Is this a paper introducing a new dataset, or is it a paper that is comparing different deep learning
models? If it is a data paper, perhaps this is the wrong journal. “We
construct a comprehensive chicory plant dataset, and make it publicly
available (. . . ) “
A) The paper introduces a new dataset for an interesting problem, namely
weed detection, and at the same time compares different deep learning
models with one of the latest versions (v7) of the YOLO family of object
detectors. Since UAV technology is becoming very useful and affordable for
everyone, we are interested in experimenting with the YOLOv7 model for
weed detection on images acquired with a UAV.
Q5) Your methodology is lacking. How and where did you implement
the models? Did you use them as-is from, e.g., your reference [52]? Did
you use some of Roboflow’s YOLO models? Or did you implement them
yourself, and in that case in what framework: TensorFlow, PyTorch, or
another language, e.g., Julia? It is not possible to recreate YOLOv7 from
your article alone; you should therefore state where you obtained the models.
A) We used, as-is, the PyTorch implementation of YOLOv7 that anyone
can download from [52] (https://github.com/WongKinYiu/yolov7). We
have not used the Roboflow YOLO implementation. We explained this
point better in the paper, reporting the reference of the model we used.
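For reproducibility, a minimal sketch of how that repository's training script can be launched is given below. This is only an illustration: the dataset YAML path and run name are hypothetical placeholders, and the command-line flags follow the repository's public README, so they may differ between versions.

    # Minimal sketch: fine-tuning with the public YOLOv7 repository
    # (https://github.com/WongKinYiu/yolov7). Assumes the repository has been
    # cloned and its requirements installed. "data/cp_weeds.yaml" is a
    # hypothetical dataset description file, not part of the repository.
    import subprocess

    subprocess.run(
        [
            "python", "train.py",
            "--data", "data/cp_weeds.yaml",       # hypothetical dataset YAML (image paths + class names)
            "--cfg", "cfg/training/yolov7.yaml",  # model configuration shipped with the repository
            "--weights", "yolov7_training.pt",    # pre-trained weights for fine-tuning ('' trains from scratch)
            "--img", "640", "640",
            "--batch-size", "16",
            "--epochs", "100",
            "--name", "yolov7-cp-weeds",
        ],
        cwd="yolov7",  # path to the cloned repository
        check=True,
    )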
Q6) What specific parameters did you use in Roboflow? What type
of classes did you initially choose (Object Detection etc.)? What data
pipeline did you further pick in Roboflow?
A) We used the Roboflow tool only to prepare our CP (chicory plant) images
according to the YOLOv7 format. We uploaded all CP images to Roboflow
as a public dataset, used the object detection task, and annotated the objects
as ‘weed’. Regarding the data pipeline, Roboflow offers export formats for
different object detection models. First, to generate the CP dataset, we
exported all the labelled and annotated images in YOLOv7 format. Second,
we applied Roboflow data augmentation options such as flip, 90° rotation,
and mosaic, and exported the result as the augmented CP dataset. After
exporting these two datasets, we used a PyTorch-based YOLOv7 object
detector to perform experiments on our machines (details are given in
Section 5.2).
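For readers who want to reproduce this step programmatically, the sketch below shows one way to download a Roboflow-hosted dataset in YOLOv7 format with Roboflow's Python package. The API key, workspace, project slug, and version number are placeholders; the actual identifiers of the CP dataset are not listed here.

    # Minimal sketch: downloading a public Roboflow dataset in YOLOv7 format.
    # Requires `pip install roboflow`; the workspace/project/version values are
    # hypothetical placeholders, not the actual CP dataset identifiers.
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")
    project = rf.workspace("your-workspace").project("chicory-plant-weeds")
    dataset = project.version(1).download("yolov7")  # images plus YOLO-style label files

    print(dataset.location)  # local folder containing train/valid/test splits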
Q7) It is very difficult to know how well your model performs qualitatively.
Other types of examples, like Figure 5, could be relevant.
A) We have added a new figure containing the predicted bounding boxes
of the YOLOv7 model when applied to the LB dataset.
Specific comments
Q1) Change the title. The word Novel is redundant. If it is not novel, you
should not publish it. Furthermore.
A) We changed the title following your suggestion.
Q2) Line 100: "we cannot use already available trained models.” This
is confusing as you are using already trained models yourself for, e.g.,
transfer learning. Refer to line 386 and line 396.
A) Thanks for this comment. We edited line 100 to clarify why we are unable
to use another dataset to train our model to detect weeds in our images.
Q3) You are comparing many different models, but only give us the recall
and weed mAP. How many parameters does each model have, and what was
the inference time? This is especially relevant when you emphasize speed.
A) For YOLOvF, YOLOv3, and SSD we only have recall and mAP. We do
not have other metrics to add, because we used the mmdetection library for
the YOLOvF, SSD, and YOLOv3 implementations, and that library only
calculated and reported recall and mAP; this is why in Table 6 we only
compare recall and mAP between YOLOv7, YOLOv3, YOLOvF, and SSD.
Only for the YOLOv7 model did we compute recall, precision, and mAP on
the LB and CP datasets. Furthermore, we did not measure speed or frame
rate during our experiments.
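Should such figures be needed in a future revision, parameter counts and per-image inference time can be measured generically in PyTorch. The sketch below uses a torchvision SSD model purely as a stand-in, since the exact checkpoints are not specified here.

    # Minimal sketch: counting parameters and timing one forward pass in PyTorch.
    # The torchvision SSD model is only a stand-in detector for illustration.
    import time
    import torch
    import torchvision

    model = torchvision.models.detection.ssd300_vgg16().eval()

    num_params = sum(p.numel() for p in model.parameters())
    print(f"parameters: {num_params / 1e6:.1f} M")

    dummy = [torch.rand(3, 640, 640)]  # one dummy RGB image
    with torch.no_grad():
        model(dummy)                   # warm-up pass
        start = time.perf_counter()
        model(dummy)
        elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"inference time: {elapsed_ms:.1f} ms per image")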
Q4) Line 407: You are referring to table 3, placed 3 pages later.
A) We moved the table next to the text that refers to the table.
Q5) Table 5: From what model?
A) We have corrected this and referred to the model for Table 5 in the results
and in the Table 5 caption.
Q6) Line 429: Image augmentation gave worse results. comment on this
since you state it generally increases, e.g., accuracy. In the discussion, you
state that there already is an augmentation in the YOLO model. Please
elaborate. Is there always augmentation during inference?
A) Generally, data augmentation increases the dataset size and helps improve
accuracy on smaller datasets. However, YOLOv7 automatically incorporates
standard data augmentation techniques during training, which already
increases the effective training set size and yields more accurate results.
That is why further, non-tailored data augmentation may not contribute to
accuracy improvements. Dang et al. [2] also reported lower performance of
YOLOv5 when using custom data augmentation, as it likewise applies
standard data augmentation during training. In YOLOv7 there is always
augmentation during training; however, there is not always augmentation
during inference.
Q7) Table 1 is placed on page 7 and is referred to only on page 12. This
is very confusing.
A) We moved the table next to the text that refers to the table.
Q8) In line 307: You discuss briefly the pros/cons of spectral multispectral
vs. sharp RGB images. Are there any studies further discussing this?
A) We added some new references and related discussion in the section
"Chicory Plant (CP) dataset".
Q9) Line 356: "and many more”. What many more techniques were used?
A) "many more" refers to image augmentation techniques in general.
We have added a link to the image augmentation techniques available
in Roboflow.
Q10) Figure 5: what are the classes of the predictions?
A) We added class information to the caption of this figure.
Language:
Q1) Generally, the English grammar is good. However, it is not scientific
and is very difficult to read. Line 323: "To correctly annotate and label
the high-quality UAV captured images, an image-editing application with
the ability to quickly label hundreds of weeds objects in an image as well
as an enhanced zoom function that concentrates on a specific region and
the clarity of the weeds present there was needed”
The sentence is too long and confusing to read.
A) We have simplified the sentence.
Q2) Many smaller mistakes that must be corrected, e.g., “this study has
illustrated the potentiality”, “Red-green-brown (RGB)”
A) We corrected these mistakes, thank you.

 

Reviewer 3 Report

1. Line 194: What are the original image format and the targeted image format for YOLOv7?

2. Line 350: Are the augmented images already included in the dataset? I think it would be better to provide only the original images and leave the options of augmentation to the users.

3. Figure 6: Could you give some explanation of why the large bounding box is given in the prediction? Do you think it is necessary to add some constraints (which are easy to apply) to avoid such errors?

4. Table 3: Could you list the number of parameters for each model? 

5. What types of weeds are considered in the datasets? 

6. It would be better to justify why the datasets can be used as benchmarks, considering that the size of weeds can vary across growing stages.

7. It would be good if the authors could comment on whether your datasets can be used for pretraining weed detection models that are fine-tuned for other regions.

Author Response

We thank the reviewer for their careful reading of the manuscript and their
constructive comments. Below is a detailed point-by-point answer to all comments.


Q1) Line 194: What are the original image format and the targeted image
format for YOLOv7?
A) The images captured by the DJI Phantom 4 Pro UAV (Unmanned Aerial
Vehicle) were in RGB format with dimensions of 5472×3648 pixels. More
specifically, we used RGB images in JPG format in the CP dataset, and the
target image format for YOLOv7 was also RGB in JPG format after the
experimentation process.
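As context for the "YOLOv7 format" mentioned above: YOLO-family detectors expect, alongside each image, a plain-text label file with one normalized bounding box per line. A minimal sketch of that conversion is shown below; the pixel coordinates are made up purely for illustration.

    # Minimal sketch: converting a pixel-coordinate box to a YOLO label line.
    # YOLO label format: "<class> <x_center> <y_center> <width> <height>",
    # all normalized to [0, 1] by the image dimensions.
    def to_yolo_line(cls_id, x_min, y_min, x_max, y_max, img_w, img_h):
        x_c = (x_min + x_max) / 2.0 / img_w
        y_c = (y_min + y_max) / 2.0 / img_h
        w = (x_max - x_min) / img_w
        h = (y_max - y_min) / img_h
        return f"{cls_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

    # Illustrative weed box (class 0 = 'weed') on a 5472x3648 UAV frame.
    print(to_yolo_line(0, 1200, 800, 1350, 930, img_w=5472, img_h=3648))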
Q2) Line 350: Are the augmented images already included in the dataset?
I think it would be better to provide only the original images and leave
the options of augmentation to the users.
A) We generated two versions of the CP dataset: the normal CP dataset
and the augmented CP dataset. We provide the link to the CP dataset
which contains only original images without any augmented images. You
can find the link to the dataset in the conclusion section. Researchers can
download the CP dataset directly from Roboflow and use data augmentation techniques according to their needs.
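As an illustration of leaving augmentation to the user, one possible approach (using the Albumentations library, which is named here only as an example and is not part of the original work) is to apply flips and rotations while keeping YOLO-format boxes consistent:

    # Minimal sketch: user-side augmentation of an image with YOLO-format boxes,
    # using Albumentations (one possible library; not used in the paper itself).
    import albumentations as A
    import numpy as np

    transform = A.Compose(
        [
            A.HorizontalFlip(p=0.5),
            A.RandomRotate90(p=0.5),
        ],
        bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
    )

    image = np.zeros((640, 640, 3), dtype=np.uint8)  # stand-in for a CP image tile
    bboxes = [(0.5, 0.5, 0.05, 0.08)]                # one normalized weed box
    out = transform(image=image, bboxes=bboxes, class_labels=["weed"])
    print(out["bboxes"], out["class_labels"])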
Q3) Figure 6: Could you give some explanations about why the large
bounding box is given in the prediction. Do you think it is necessary to
give some constraints (which are easy to apply) to avoid such errors?
A) We explained in the "Results" section the motivation for the large
bounding boxes produced by the pre-trained model. After some experiments, we found that a constraint on the size of the bounding boxes is not a solution.
Q4) Table 3: Could you list the number of parameters for each model?
A) We used the YOLOv7 model for the CP and LB datasets. For comparison,
we also used the YOLOv3, YOLOvF, and SSD detectors. The important
parameters for YOLOv7 and YOLOv3 are already given in Table 2. The SSD
and YOLOvF parameters have been added to the paper in a new Table 7.
Q5) What types of weeds are considered in the datasets?
A) Mercurialis annua, a.k.a. French mercury, a weed species of the Euphorbiaceae (spurge) family, was monitored in the present study.
It is a rather common weed species within the cereal fields of European countries such as Belgium [1]. We added this information in the section "Chicory Plant (CP) dataset".
1. Jensen, P.K.; Bibard, V.; Czembor, E.; Dumitru, S.; Foucart, G.; Froud-Williams, R.J.; Jensen, J.E.; Saavedra, M.; Sattin, M.; Soukup, J.; Palou, A.T.; Thibord, J.-B.; Voegler, W.; Kudsk, P. Survey of weeds in maize crops in Europe.
2. Dang, F.; Chen, D.; Lu, Y.; Li, Z.; Zheng, Y. DeepCottonWeeds (DCW): A novel benchmark of YOLO object detectors for weed detection in cotton production systems. In 2022 ASABE Annual International Meeting (p. 1); American Society of Agricultural and Biological Engineers, 2022.
Q6) It would be better to justify why the datasets can be used as benchmarks,
considering that the size of weeds can vary across growing stages.
A) In the "Discussion" section we added the reasons why it is useful to
have a dataset like the proposed CP dataset.
Q7) It would be good if the authors could comment on whether your datasets can be used for pretraining weed detection models that are fine-tuned for other regions.
A) In the "Discussion" section we added the reasons why it is useful to
have a dataset like the proposed CP dataset that can be used to pre-train
detection models and use them in different regions.

Round 2

Reviewer 2 Report

The introduction and methodology are much better.  

The manuscript could still be more scientific and precise. However, it can still be accepted as it is.

What does the following sentence in line 300 mean: “Furthermore, RGB images offer a sharper representation of surface details”?

Compared to what, and sharper in what sense? Panchromatic images offer a higher S/N ratio at the same spatial resolution, but not the same spectral information.
