Peer-Review Record

Vineyard Zoning and Vine Detection Using Machine Learning in Unmanned Aerial Vehicle Imagery

Remote Sens. 2024, 16(3), 584; https://doi.org/10.3390/rs16030584

by Milan Gavrilović¹, Dušan Jovanović¹,*, Predrag Božović², Pavel Benka² and Miro Govedarica¹

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 10 January 2024 / Revised: 29 January 2024 / Accepted: 31 January 2024 / Published: 3 February 2024

(This article belongs to the Special Issue Remote Sensing in Viticulture II)

Round 1

Reviewer 1 Report (Previous Reviewer 2)

Comments and Suggestions for Authors

The manuscript presents a model for detecting and counting vineyards (both live and wilted) to subsequently create management zones, using RGB and NIR images captured by UAVs. For this purpose, algorithms such as YOLOv5 were employed for vineyard detection and k-means clustering was used to create the management zones. In addition to this type of remote sensing data, data from biochemical analyses on the leaves and petioles of the grapevines are also used.

Below, I outline the problems and flaws that I identified in each section.

Abstract

Lines 15-18: Break down the sentence for improved readability. Example: “Vineyard zoning, achieved through the application of the K-means algorithm, incorporates geospatial data such as the Normalized Difference Vegetation Index (NDVI) and the assessment of nitrogen, phosphorus and potassium content in leaf blades and petioles. This approach enables efficient resource management tailored to each zone's specific needs”.

1. Introduction

The introduction has become rather long. I suggest shortening it, summarizing as much as possible, while ensuring essential content is retained.

Lines 133 and 135: In cases where you are referring directly to the study, avoid writing the text in this way (“in a paper [34]” and “In the study [33]”). It is desirable to write the authors' names and, next to them, the reference number. Examples:

Line 133: “(…) identify vineyards, e.g., Comba et al. [34], proposed an unsupervised (…)”.
Line 135: “Jurado et al. [33] proposed an automatic method for grapevine trunk detection using point clouds.”

Line 153: “Numerous authors [57–61] proposed (…)”, not “propose”. Check the verb tenses throughout the text.

Line 167: The authors refer to “some papers” but cite only one. Rephrase the text or include more references.

Correct any similar occurrences throughout the document.

2. Material and methods

Table 1: In the first review I asked the authors to give the sensor information; however, plain text is sufficient. Remove the table and add the information to the text, for example:

“The research used a DJI Phantom P4 v2.0 drone equipped with a multispectral camera, MicaSense RedEdge-M. This sensor captures images in five bands: blue (475 nm), green (560 nm), red (668 nm), Red Edge (717 nm) and NIR (840 nm). To perform the vine (…)”.

Lines 212 and 213: Specify the exact dates or time frames for the two time points to provide a clearer understanding of the study timeline.

4. Discussion

As requested in the initial review, the authors have added more information to the discussion and have addressed the findings of studies along the same lines as their proposal. However, in my opinion, overall, the writing in this section should be improved. The language should be more refined, using a more scientific and formal English. Additionally, throughout the text, there are instances where paragraphs seem somewhat disconnected from preceding ones. This aspect should also be enhanced by ensuring a cohesive narrative with a clear introduction, development, and conclusion.

Lines 543-549: In my opinion, this full paragraph can be removed.

Lines 563-568: Rewrite this text in a more “scientific” way. Example:

Aerial imaging using UAV has emerged as an expeditious method for object capture, with increasing significance in contemporary remote sensing due to its cost-effectiveness and high-resolution capabilities. Nevertheless, a substantial limitation in vertical aerial imaging lies in the challenge of accurately discerning the underlying conditions beneath the canopy. In the absence of a plant, neighbouring plants can extend their shoots and leaves to fill the adjacent unoccupied space. Consequently, an essential consideration lies in the careful selection of the imaging period [32].

Lines 579-586: Same as previous comment. Example:

In the context of image object detection within the scope of this investigation, the fundamental principle characteristic of deep learning methodologies lies in the use of a pre-trained model for the classification of novel images. This involves the knowledge or proficiency acquired by a previously trained model when applied to new images, an approach particularly noticeable when neural networks, used on expansive datasets, are to be used in databases characterized by significantly reduced data volumes. The adoption of pre-trained models mitigates the extended training periods typically associated with neural networks [100]. It is noteworthy that the application of Transfer Learning is particularly uncommon within the agricultural domain [96]. An empirical validation advanced within this study establishes its viability and efficacy within this specific sector.

Perform the same type of rewriting in the remaining discussion.

Comments on the Quality of English Language

The level of English needs moderate improvement towards a more scientific and carefully constructed style, particularly in the Discussion section.

Author Response

First, we would like to thank the reviewer for the insightful comments, which indeed helped shape the manuscript. We have carefully and thoroughly revised the submission, and we believe that we have addressed all comments. To meet the required revision, we have taken all raised issues seriously, which led to modifications in the writing.

In the following text, we give our comments and answers to all the issues raised.

Comment Abstract: Lines 15-18: Break down the sentence for improved readability. Example: “Vineyard zoning, achieved through the application of the K-means algorithm, incorporates geospatial data such as the Normalized Difference Vegetation Index (NDVI) and the assessment of nitrogen, phosphorus and potassium content in leaf blades and petioles. This approach enables efficient resource management tailored to each zone's specific needs”.

Author’s response: The necessary modifications to the text have been implemented.

Comment Introduction

Comment 1: Lines 133 and 135: In cases where you are referring directly to the study, avoid writing the text in this way (“in a paper [34]” and “In the study [33]”). It is desirable to write the authors' names and, next to them, the reference number.

Author’s response: Corrected throughout the text, with examples of sentences before and after the correction provided below.

Before:

Several studies have explored 3D point clouds to identify vineyards, e.g., in a paper [34], an unsupervised algorithm for vineyard detection and evaluation of vine characteristics, based on 3D point cloud processing, was proposed. In the study [33], an automatic method for grapevine trunk detection using point clouds was proposed.

In the paper [35], an object-oriented method of image analysis for the evaluation of grapevine canopy was developed, applied to high-resolution digital models of the surface.

The study [96] proposes tree detection using high-performance deep learning suitable for real-time applications in robotics.

Also, in the study [97], the feature extraction problem in the vineyard context is solved using deep learning to detect grapevine trees with the YOLO algorithm.

In the study [90], 3D models were created for the vineyard during the winter season when only branches without vegetation were present, and based on this, the position of each tree was determined.

After:

Several studies have explored 3D point clouds to identify vineyards, e.g., Comba et al. [40], proposed an unsupervised algorithm for vineyard detection and evaluation of vine characteristics, based on 3D point cloud processing, was proposed. Jurado et al. [39], proposed an automatic method for grapevine trunk detection using point clouds was proposed.

de Castro et al. [41], proposed an object-oriented method of image analysis for the evaluation of grapevine canopy was developed, applied to high-resolution digital models of the surface.

Aguiar et al. [100] propose tree detection using high-performance deep learning suitable for real-time applications in robotics.

Also, Pinto de Aguiar et al. [101], proposed the feature extraction problem in the vineyard context is solved using deep learning to detect grapevine trees with the YOLO algorithm.

Moreno et al. [96], proposed 3D models were created for the vineyard during the winter season when only branches without vegetation were present, and based on this, the position of each tree was determined.

Comment 2: Line 153: “Numerous authors [57–61] proposed (…)”, not “propose”. Check the verb tenses throughout the text.

Author’s response: The necessary modifications to the text have been implemented.

Comment 3: Line 167: The authors refer to “some papers” but cite only one. Rephrase the text or include more references.

Author’s response: We fixed that. An example sentence before and after the correction is given below.

Before:

Also, although there are some papers [70] that deal with defining standardized methods for delineating zones and providing recommendations on which data to use in zone delimitation, the proposed methodology in this paper is valuable because it provides the locations of grapevines with high accuracy without the need for processing point clouds.

After:

Also, although there is a study in the literature [76] that deal with defining standardized methods for delineating zones and providing recommendations on which data to use in zone delimitation, the proposed methodology in this paper is valuable because it provides the locations of grapevines with high accuracy without the need for processing point clouds.

Comment Material and methods

Comment 1: Table 1: In the first review I asked the authors to give the sensor information; however, plain text is sufficient. Remove the table and add the information to the text.

Author’s response: In the new version of the paper, Table 1 has been removed, and the information from it has been incorporated into the preceding paragraph.

Comment 2: Lines 212 and 213: Specify the exact dates or time frames for the two time points to provide a clearer understanding of the study timeline.

Author’s response: As suggested, precise dates of the UAV images used have been added. The paragraph, now incorporating all the considered comments, appears as follows:

For this area, a series of studies were taken in 2020 and 2022 during different phases of the vine growth cycle. The research utilized a DJI Phantom P4 v2.0 drone equipped with a multispectral camera, MicaSense RedEdge-M. This sensor captures images in five bands: Blue (475 nm ± 20 nm), Green (560 nm ± 20 nm), Red (668 nm ± 10 nm), Red Edge (717 nm ± 10 nm) and Near Infrared (840 nm ± 40 nm) [77]. To perform the vine counting and locate wilted vines according to the proposed model, it was necessary to capture the vineyard at two different time points: before the start of the vegetative cycle (first half of April (02.04.2020. and 13.04.2022.)) and after the flowering stage (end of flowering and onset of veraison (21.08.2020. and 11.08.2022.)). The images taken before vegetative growth were later used to identify the grapevines (using shadows), a process that is difficult during and after the flowering stage due to leaf density. In both periods, the imaging covered the vineyard and its surroundings, and precisely these data from the surrounding vineyards were used to train a neural network, enabling automatic recognition of vines within the analysed vineyard.
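The NDVI used for zoning in this record is conventionally derived from two of the bands listed above, Red (668 nm) and Near Infrared (840 nm), as (NIR - Red) / (NIR + Red). The following is a minimal sketch of that calculation, not the authors' processing chain: the function names and the sample reflectance values are hypothetical, and the inputs are assumed to be calibrated reflectances.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for a single pixel.

    nir, red: reflectance values (assumed to be calibrated already).
    Returns 0.0 when both bands are zero to avoid division by zero.
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() element-wise to two equally sized 2D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical reflectances: one vigorous canopy pixel and one bare-soil pixel.
nir_band = [[0.50, 0.30]]
red_band = [[0.10, 0.20]]
print(ndvi_grid(nir_band, red_band))
```

Dense, healthy canopy tends to give values close to 1, while soil and shadow give values near zero, which is why the index is a common input for delineating management zones.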

Comment Discussion: As requested in the initial review, the authors have added more information to the discussion and have addressed the findings of studies along the same lines as their proposal. However, in my opinion, overall, the writing in this section should be improved. The language should be more refined, using a more scientific and formal English. Additionally, throughout the text, there are instances where paragraphs seem somewhat disconnected from preceding ones. This aspect should also be enhanced by ensuring a cohesive narrative with a clear introduction, development, and conclusion.

Author’s response: We have followed the reviewer's advice and corrected the writing to a more formal style, and by doing so, as well as by removing certain paragraphs, we have addressed the issue of disconnectedness between some paragraphs. All the corrections made in the discussion are visible in the new version of the paper.

Author Response File: Author Response.docx

Reviewer 2 Report (Previous Reviewer 3)

Comments and Suggestions for Authors

The paper appears to have been carefully revised; it now has a better structure and a much more solid presentation of results.

I have two minor comments:

1. Lines 66-80 need several references.

2. The explanation of why YOLOv5s is used is not convincing (lines 297-303). WoS will naturally show more studies, as it has possibly been used much more than the current versions. I think this part needs a short ablation study comparing detection and localization performance with, for example, YOLOv7 or YOLOv8.

Author Response

In the following text, we give our comments and answers to all the issues raised.

Comment 1: Lines 66-80 need several references.

Author’s response: We made a small oversight by omitting references in this paragraph. In the new version of the paper, this has been corrected by adding new references.

We have added six new publications related to the research topic to the reference section.

However, it's worth noting that satellite imagery still has its utility in this domain [7,8]. Depending on the spatial resolution of the images, it can be applied for management at different levels, such as the plot, row, or individual plant. Due to their coarser spatial resolution, satellite images are typically seen for vineyard-level management when distinguishing rows or individual plants is impractical [8,9]. In contrast to satellite remote sensors, UAVs offer several advantages, notably their ability to capture images with higher spatial resolution compared to satellites. This high spatial resolution enables the identification of fine details and features that are often indiscernible in satellite imagery [10]. This becomes particularly important when the pixel size is larger than the objects of interest, as is often the case in vineyards [11]. Consequently, mixed pixels emerge, where a single pixel encompasses various elements, including the above-ground sections of cultivated plants, weeds, soil, and shadows [12]. Given the narrow width of vine canopies, using images with resolutions exceeding 25 cm presents challenges related to the accurate classification of vine canopies, weeds, soil, and shadows [13].

Cogato, A.; Meggio, F.; Collins, C.; Marinello, F. Medium-Resolution Multispectral Data from Sentinel-2 to Assess the Damage and the Recovery Time of Late Frost on Vineyards. Remote Sensing 2020, 12, 1896, doi:10.3390/rs12111896.
Di Gennaro, S.F.; Dainelli, R.; Palliotti, A.; Toscano, P.; Matese, A. Sentinel-2 Validation for Spatial Variability Assessment in Overhead Trellis System Viticulture Versus UAV and Agronomic Data. Remote Sensing 2019, 11, 2573, doi:10.3390/rs11212573.
Giovos, R.; Tassopoulos, D.; Kalivas, D.; Lougkos, N.; Priovolou, A. Remote Sensing Vegetation Indices in Viticulture: A Critical Review. Agriculture 2021, 11, 457, doi:10.3390/agriculture11050457.
Atencia Payares, L.K.; Tarquis, A.M.; Hermoso Peralo, R.; Cano, J.; Cámara, J.; Nowack, J.; Gómez del Campo, M. Multispectral and Thermal Sensors Onboard UAVs for Heterogeneity in Merlot Vineyard Detection: Contribution to Zoning Maps. Remote Sensing 2023, 15, 4024, doi:10.3390/rs15164024.
de Castro, A.I.; Peña, J.M.; Torres-Sánchez, J.; Jiménez-Brenes, F.M.; Valencia-Gredilla, F.; Recasens, J.; López-Granados, F. Mapping Cynodon Dactylon Infesting Cover Crops with an Automatic Decision Tree-OBIA Procedure and UAV Imagery for Precision Viticulture. Remote Sensing 2020, 12, 56, doi:10.3390/rs12010056.
Meyers, J.M.; Dokoozlian, N.; Ryan, C.; Bioni, C.; Vanden Heuvel, J.E. A New, Satellite NDVI-Based Sampling Protocol for Grape Maturation Monitoring. Remote Sensing 2020, 12, 1159, doi:10.3390/rs12071159.

Comment 2: The explanation of why YOLOv5s is used is not convincing (lines 297-303). WoS will naturally show more studies, as it has possibly been used much more than the current versions. I think this part needs a short ablation study comparing detection and localization performance with, for example, YOLOv7 or YOLOv8.

Author’s response: We understand the reviewer's concern regarding the possibility of newer versions such as YOLOv7 or YOLOv8. However, we focused on YOLOv5 as it is a well-researched and widely accepted model, making it a strong and competitive option for this study. YOLOv5 has a well-established track record of successful applications in various domains, demonstrating its reliability and stability. Its widespread use and positive results reported in the literature support its effectiveness. Additionally, YOLOv5 has gained broad adoption and community support, making it more accessible and an easier choice. This is crucial to ensure that the model is easy to train and implement in different scenarios. Furthermore, YOLOv5 offers a range of model variants (YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x), allowing users to choose the model size that best suits their specific requirements. This flexibility is valuable for various applications and computational resources. We will consider the possibility of incorporating newer versions in future work for better performance comparison of detection and localization between different YOLO algorithm versions.

In the section “Detection of vines - You Only Look Once (YOLO) algorithm”, we have changed the paragraph starting at line 297:

The YOLO algorithm family consists of multiple models, with YOLOv5 being easy to train and perform good reliability and stability [82]. Web of Science shows that publications based on YOLOv5 had an absolute advantage and have been widely used in the past years [83]. The selection of the YOLOv5 model for our research is based on its proven simplicity and speed, as well as its widespread use in previous studies [5,29,82,83], confirming the relevance of this model in the research community. Additionally, YOLOv5 was chosen for its popularity and availability across a wide range of applications in both industry and academic circles. Therefore, YOLOv5 remains highly competitive and was utilized in this study. YOLOv5 is a popular deep learning framework that includes five network models of different sizes: YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x [84], representing different depths and widths of the network [83].

Sozzi, M.; Cantalamessa, S.; Cogato, A.; Kayad, A.; Marinello, F. Automatic Bunch Detection in White Grape Varieties Using YOLOv3, YOLOv4, and YOLOv5 Deep Learning Algorithms. Agronomy 2022, 12, 319, doi:10.3390/agronomy12020319.
Lu, S.; Liu, X.; He, Z.; Zhang, X.; Liu, W.; Karkee, M. Swin-Transformer-YOLOv5 for Real-Time Wine Grape Bunch Detection. Remote Sensing 2022, 14, 5853, doi:10.3390/rs14225853.
Liu, Z.; Gao, X.; Wan, Y.; Wang, J.; Lyu, H. An Improved YOLOv5 Method for Small Object Detection in UAV Capture Scenes. IEEE Access 2023, 11, 14365–14374, doi:10.1109/ACCESS.2023.3241005.
Sun, Z.; Li, P.; Meng, Q.; Sun, Y.; Bi, Y. An Improved YOLOv5 Method to Detect Tailings Ponds from High-Resolution Remote Sensing Images. Remote Sensing 2023, 15, 1796, doi:10.3390/rs15071796.

Round 2

Reviewer 1 Report (Previous Reviewer 2)

Comments and Suggestions for Authors

The authors have implemented all the requested modifications. In my opinion, the paper has improved both in content and structure, and is therefore ready for publication.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this article the authors propose a methodology for vineyard zoning and vine detection using machine learning. The authors argue that, to estimate yield accurately, it is important to detect the missing vines (those without canopy coverage).

Although the overall idea sounds interesting, I found many faults in the manuscript. The storyline is not entirely clear, and the authors have also reported additional results that are not linked to what is described in the abstract or to the goals stated in the introduction.

Why is it important to detect vine plants for yield estimation? What is your understanding of a vine plant? The vine plant is composed of the trunk and the canopy. It should be noted that the canopy is not necessarily aligned with the trunk.

Do you derive the yield from detecting grape bunches or from the canopy vegetation? Which yield are you referring to? Could it be plant vigor instead?

What is the value of detecting individual vine plants (trunk + canopy volume)?

In detail:

Line 18 – Do you refer to canopy/vegetation gaps?

Introduction – Please update the references in the introduction, using mainly references from the last 5 years. Moreover, throughout the article there are references that are more than 10 years old, e.g., 80 and 81. You have cited work by many authors that is more than 5 years old; these authors have published more recent work.

For instance:

https://www.sciencedirect.com/science/article/pii/S0168169923002892

https://www.sciencedirect.com/science/article/pii/S0168169923004970

https://www.sciencedirect.com/science/article/pii/S0168169923004398

https://www.sciencedirect.com/science/article/pii/S2772375521000058

Line 99 - Which yield are you referring to? Will you look at the grape bunches or at other biophysical parameters of the crop?

Line 121-135 - An important aspect of identifying individual grapevine plants is that it enables management at the plant level: for instance, assessing the yield per plant, or whether a specific vine plant (per ID) has a disease and to what extent. This would help the farmer with plant- or site-specific treatment. It should be better explained that there are several spatial resolutions of interest, each with advantages and disadvantages. You can determine yield or diseases at the patch level, row level, and plant level.

Line 142 - I don't agree with this statement; see, for instance, this work:

https://www.sciencedirect.com/science/article/pii/S1161030122002398

As far as I understand, there is a standard proposed in this study. The need for standard software and data-processing workflows should also be addressed in the discussion. You are seeking to solve a problem with an end-to-end solution that benefits viticulture management.

Line 148 - Usually the farmer/producer provides the information on where the vines are planted (coordinates of the stem). This is usually the case for medium to large production. What is not known is the canopy volume occupied by each plant. It could be that most authors focus on point clouds derived from RGB images. But what is the added value of your method with respect to previous ones? How many resources are being saved? When you generate the orthophotos, you will get the point cloud (PC) as well. Please add this point to the discussion section and take into consideration the work of:

https://www.mdpi.com/2504-446X/7/6/349

Line 160 – This paragraph should appear earlier, among the first paragraphs of the introduction. To provide context, please also take into consideration other applications, such as disease detection. You could refer to some of the papers that I have provided above. I would suggest one or two citations per application, with references from 2018 onwards.

Line 166 – Define management zone or predict yield or both. The need is not clear.

Line 187 - In the legend of this figure you mention training and validation, but at this stage the reader is not aware of which techniques will be used or what the purpose of training and validation is. Either explain this briefly in the body of the manuscript or indicate it in the caption.

Line 195 - How was the record done?

Line 196 - What specific protocol was used to determine whether a vine is dead or not? How do you assess it?

Figure 1 – There is a typo in your legend 'trening' should be training.

Line 226 - Why is this important, and where was it used in the proposed study?

Line 234 - What is a complex algorithm?

Line 237 - In your flowchart you present grape bunch detection. This is very different from what you have mentioned before and the goals you stated. What is the relationship between these grape bunches and the three classes that you are trying to identify?

Line 239 - Figure 2 does not say anything about detection of vines; instead, it says that you are detecting grape bunches.

Line 243 - Object detection is not a technology; it is rather an approach or method.

Line 255 – You mention grapevines, but you have mentioned grape bunches before. Please, be consistent.

Line 258 - How do you separate the vegetation?

Line 260-267 - You can keep this and the next paragraph shorter. YOLO is well known, and it will be enough to cite works where this algorithm was used, preferably within viticulture.

Figure 3 - Please complete the caption with further information. The reader should be able to interpret the image without referring to the text.

Line 272 - WOS is a library. Please rewrite this sentence. Many authors…

Line 301 - The first result gives you the position of the stem of the vine, which is very different from detecting the complete vine (trunk + corresponding canopy volume).

Line 335 – Why not use k-means on the RGB images to detect the vegetation gaps? You could also assess further geometric features, like the authors in:

https://ieeexplore.ieee.org/abstract/document/8755475

Could you please explain in the discussion why an approach such as the one proposed by these authors does not fit your solution?

Line 342 - K-means does not remove noise. Please rewrite. If you have noise, K-means will assign it to one of the defined classes.
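The behaviour described in this comment can be illustrated with a minimal sketch of plain k-means (Lloyd's algorithm) on scalar values. The sample NDVI values, the function name and the initial centres are hypothetical and are not taken from the manuscript; the value -0.90 stands in for a noisy reading.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Plain Lloyd's k-means on scalar values.

    values: list of floats; centroids: initial centres, one per cluster.
    Returns (labels, centroids). Every value receives a label.
    """
    centroids = list(centroids)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids


# Hypothetical NDVI samples: a low-vigour group, a high-vigour group,
# and one spurious reading (-0.90) standing in for noise.
ndvi_samples = [0.20, 0.25, 0.30, 0.70, 0.75, 0.80, -0.90]
labels, centres = kmeans_1d(ndvi_samples, centroids=[0.2, 0.8])
print(labels)   # the noisy sample still gets a cluster label
print(centres)  # and it pulls the centre of the cluster it joins
```

In this sketch the spurious value is not discarded: it is labelled as part of the low-vigour cluster and drags that cluster's centre downwards. Removing noise would therefore need a separate filtering step before or after clustering.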

Line 352 - This has been used many times. With K-means you could even determine whether there is vegetation or not, as shown by the authors of the article indicated above, or you could assess which plants are healthier than others. Why did you need to apply k-means to NDVI and not to RGB, as proposed by those authors? Please explain it here or in the discussion.

Line 361 - There are too many 'and' in this paragraph. Moreover, the authors have suddenly introduced the chemical analysis of leaves. It’s not clear at this stage for a reader what the links with the overall study are.

Line 365 – Not clear.

Line 371 – What is ’…’?

Line 382 - Please explain what 'accuracy' means within your study, as in the interpretation that you gave in lines 224-232.

Line 389 - Please explain what 'recall' means within your study.

Line 394 - Please explain what 'precision' means within your study.
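For orientation, the standard definitions of precision and recall in a detection setting can be written out as a short sketch. The counts below are hypothetical and the function name is illustrative. How the manuscript defines 'accuracy' is not stated in this record, so it is left out here.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 score from detection counts.

    tp: detections matched to a real vine (true positives)
    fp: detections with no matching vine (false positives)
    fn: real vines with no detection (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts, not results from the manuscript.
precision, recall, f1 = detection_metrics(tp=90, fp=10, fn=30)
print(precision, recall, f1)
```

Read this way, precision answers "how many of the detected vines are real?" and recall answers "how many of the real vines were detected?", which is the kind of study-specific interpretation the comments above ask for.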

Line 410 - How many samples were labelled? What was the percentage for training, validation, and test? Metrics on detecting the vine trunks? Hyperparameters used, etc. Further information about this model needs to be added to the results.

Line 419 - This is the combination of your whole approach. From which steps are these results? After combining the trunk detection with the vegetation segmentation? This needs further elaboration.

Line 435 – The NDVI could perfectly well detect whether there is vegetation or not. Moreover, if you don't want to use the NDVI, you can also use k-means as suggested before. The benefit is not clear. You could find gaps in the vegetation by using RGB, NDVI, or a combination of both.

Line 451 - Where does this belong? It is not clear where zones 1 and 2 are in the complete map. Integrate everything in one picture where we can see the different steps/results.

Line 453 - From this point onwards it is not clear what the link is between these results and the goals proposed. This looks like another study.

Line 473 - The discussion is quite short. Please develop it further and compare with previous related works where these approaches have been used. What were their results in comparison with yours, and why? Find links with other works and applications. The future work part does not say much. What are the limitations and foreseen improvements? Could you indicate some other works from the literature that you would like to build on?

Comments on the Quality of English Language

English spelling and grammar should be checked by a language editing professional.

Reviewer 2 Report

Comments and Suggestions for Authors

Despite the interesting idea (performing flights before the beginning of the vegetative cycle to enable vine detection through shadows) and the algorithm showing promising results, the manuscript presents several shortcomings: the abstract and introduction present confusing structures; the discussion is notably weak; the image quality is weak, among other issues. In terms of English language, in my opinion, the text should be reviewed. There are some sentences that are somewhat challenging to read, and certain terms/words used, in my view, do not align with the careful and formal English language that an article should have. Throughout the text, I identified some examples and suggested the necessary modifications. I marked these examples with *English improvement* at the beginning of the comment throughout this review document. However, the authors do not need to strictly follow the modifications that I will suggest. Feel free to use other words; however, please maintain the coherence of formal and well-structured English. Moreover, beyond my suggestions, I kindly request the authors to review the article and make an effort to improve this aspect and increase the readability.

Below, I outline the problems and flaws that I identified in each section.

Abstract

The structure of the abstract is a bit confusing and could be improved. It starts by highlighting the significance of precision viticulture, but without introducing the problem that it aims to address. Subsequently, the authors immediately present the research goal, and then "go back" to identify the problem before revisiting their research goal. This sequencing can cause some confusion for readers.

The authors should rewrite the abstract, following a structure similar to this:

(1) Introduce the topic and its importance briefly.

(2) Identify the specific problem within this topic that you aim to address.

(3) State your research goal and main thesis.

(4) Describe the methodology you used to address the identified problem.

(5) Highlight the main findings and conclusions of your work.

(6) Explain how your work contributes to the existing knowledge on this topic.

1. Introduction

In general, the introduction needs a complete restructuring. The abrupt shifts in topics between paragraphs with no apparent connection make it somewhat challenging to read. Additionally, the paragraphs are scattered throughout the introduction; the authors start discussing topic X in one paragraph, then switch to topic Y in the next, and later return to topic X. All of this needs to be revised. Try to keep topics within their respective paragraphs and avoid jumping between different topics. Furthermore, the introduction contains too many paragraphs. For example, from line 99 to line 120, there are four paragraphs that, in my opinion, are poorly organized. It's evident that there are topics between these paragraphs that should be within the same paragraph. I will conduct a detailed review of the topics in each paragraph to illustrate how the structure is somewhat confusing:

Paragraph 1 (line 30): Importance of viticulture in Serbia.

Paragraph 2 (line 36): Continuation of the importance of viticulture in Serbia with some more detailed data. (Consider merging with the previous paragraph - same topic)

Paragraph 3 (line 44): Importance of remote sensing and its applications.

Paragraph 4 (line 53): Definition of precision viticulture.

Paragraph 5 (line 63): Evolution of sensors, ending with a goal of precision viticulture. (Shouldn't this be in the previous paragraph?)

Paragraph 6 (line 69): Process for adopting precision viticulture. (Again, shouldn't it be in the previous paragraph?)

Paragraph 7 (line 75): Comparison between satellite and UAV platforms. I think it would make more sense to place this after paragraph 3.

Paragraph 8 (line 84): Back to the issues encountered in precision viticulture. Perhaps add this information to paragraph 3 where remote sensing importance is discussed.

Paragraph 9 (line 91): Continuation of the previous paragraph. They should be merged.

Paragraph 10 (line 99): Importance of identifying and counting missing vines in vineyards.

Paragraph 11 (line 108): Continuation of the previous topic. These paragraphs should be combined.

Paragraph 12 (line 111): Returning to remote sensing platforms used in vineyards. This topic has already been discussed. Integrate it into the appropriate paragraph.

Paragraph 13 (line 116): Motivation for practicing precision viticulture. (Shouldn't it follow paragraph 4 or even be included within it?)

Paragraph 14 (line 121): Methods for delimiting management zones for managing spatial variability.

Paragraph 15 (line 130): Continuation of the previous paragraph. Consider merging these two.

Paragraph 16 (line 137): Issues encountered in the methods for delimiting management zones. (Should also be in the same paragraph as the previous one)

Paragraph 17 (line 147): Studies detecting and counting vineyards.

Paragraph 18 (line 160): Again, we return to the topic of remote sensing in precision viticulture. This paragraph is out of context. Integrate it into the correct paragraph.

Paragraph 19 (line 164): Identification of the main goal of the work. However, it lacks an explanation of its primary contribution.

Paragraph 20 (line 169): Explanation of the article's structure. (Is this paragraph really necessary?)

Further flaws in the introduction:

Line 34 (*English improvement*): While "very difficult" is an appropriate description, you can increase accuracy by specifying the nature of the challenges. For example, you might replace "very difficult" with a more descriptive phrase, such as "facing significant challenges" or "in a state of crisis."

Line 42: When acronyms are introduced for the first time in the paper, they should be written out in full form. The full form of GIS should be written, with the acronym in parentheses: Geographic Information System (GIS).

Lines 44 – 62: Add references along these two paragraphs.

Lines 44 – 47 (*English improvement*): Rewrite this sentence, for instance: "Remote sensing in agricultural production is diverse, including detection of chlorophyll content in plants, assessment of plant health and water status, soil moisture measurement, weed and pest detection, creation of maps for selective spraying and fertilization, among others". (By the way, this sentence needs a reference.)

Lines 54 – 56: I disagree with this statement. The authors seem to imply that precision viticulture generally only studies variability in vineyard conditions, which is incorrect. This is just one of the issues that precision viticulture seeks to address; however, it is not the only one, nor the most significant (in my opinion). There are other issues that precision viticulture aims to address, such as resource efficiency, disease and pest management, optimal harvest timing, water management, among others. Alternatively, the authors should revise this statement to avoid suggesting that precision viticulture is exclusively focused on variability in vineyard conditions but rather that it is one of the topics addressed by this approach. They could even mention the other topics addressed by precision viticulture and then provide more detailed explanations of their intentions.

Lines 69 – 74: In my opinion, these topics can be presented in continuous text, and there is no need for enumeration in this style. However, this is only a personal preference that I believe would enhance the aesthetics of the introduction. If the authors prefer to keep it this way, there is no issue.

Lines 69 – 74 (*English improvement*): Make sure the format is consistent in the list of topics. The first topic proceeds with "Collection of data on vineyards", but subsequent steps can follow a similar pattern, such as "Interpretation of data" and "Development and implementation of a targeted management plan based on the analysis".

Lines 75 – 77: Add a citation.

Lines 77 – 78 (*English improvement*): This sentence could be revised for greater clarity. For instance, you could say, "This enables the detection of details and features that are typically not discernible in satellite imagery."

Lines 78 – 79 (*English improvement*): Consider replacing “are large in relation to the observed objects” with “are larger than the observed objects”.

Lines 79 – 81 (*English improvement*): This sentence might benefit from a more explicit explanation. You could say for instance: "As a result, mixed pixels occur, where a single pixel includes above-ground parts of the cultivated plants, weeds, soil, and shadows."

Lines 82 – 83 (*English improvement*): Instead of "problems related to the wrong classification," consider saying "challenges associated with incorrect classification."

Lines 87 – 90 (*English improvement*): Please consider rewriting it as “These challenges have urged the development of new technologies that use data from advanced sensors, including data from Unmanned Aerial Vehicles (UAVs) and the application of artificial intelligence. These advancements aim to enhance productivity, improve quality and boost economic competitiveness.” Furthermore, as the acronym UAV appears for the first time in the manuscript, it should be written in its full form.

Lines 91 – 93 (*English improvement*): Please consider rewriting it as "Over the years, issues such as diseases and mechanical damage have caused the loss of plants, resulting in a decrease in the initial number of vines per hectare. As a result, farmers experience a significant reduction in potential wine production."

Lines 95 – 98 (*English improvement*): Please consider rewriting it as “However, vertical aerial photography cannot capture below the canopy, and, when a plant is missing, neighbouring plants can expand their shoots and leaves to fill the nearby adjacent space [9]”.

Lines 116 – 117 (*English improvement*): Please consider rewriting it as “The widely adopted practice of uniform vineyard management results in lower productivity, inefficient resource use and adverse environmental effects [15–17]”

Lines 130 – 132 (*English improvement*): Please consider rewriting it as “Numerous authors [38–42] propose zoning through the simple categorization of values, such as vegetation indices, into a specific number of categories or classes, each containing an equal number of objects (pixels)”.

Lines 142 – 145 (*English improvement*): Please consider rewriting it as “However, in the literature, there is no established rule for choosing these locations, which are often selected randomly. Additionally, no standardized method has been employed for defining management zones, nor are there recommendations on which data should be used in the zone delineation process.”

Lines 160 – 163 (*English improvement*): Please consider rewriting it as "In the field of precision viticulture, remote sensing is used for numerous purposes, including yield estimation [51–55]. Additionally, computer image processing and artificial intelligence techniques are applied for detecting inflorescences [56,57], vines [2,8,58–61] and even cluster berry detection [62]."

2. Material and methods

Regarding the acquired remote sensing data and associated equipment, there is a lack of some information: What specific wavelengths does the MicaSense sensor capture? What is the resolution of each sensor? What was the ground sample distance (GSD) for each RGB and multispectral data? The authors should specify this information in this chapter.

Lines 177 – 180 (*English improvement*): Please consider rewriting it as “The study area includes the valleys of Fruška Gora Mountain, characterized by meadows and pastures, while its slopes are covered with orchards and vineyards. Some parts of the mountain rise to heights exceeding 300 meters above sea level (ASL) and are covered with dense deciduous forest.”

Lines 180 – 182: In my opinion, this information does not add significant value to this section. It should be removed. However, if the authors consider this information important, they may want to consider incorporating it into the introduction.

Lines 184 – 185: Same comment as the previous lines.

Line 189: “was planted in 1996

Lines 190 – 194 (*English improvement*): Please consider rewriting it as “Initially, the vineyard block consisted of 5880 vines (2940 locations with two grafts each). However, in the test area with 1.2 ha (marked in red in Figure 1), where the controls were conducted, there were a total of 2442 locations with two grafts each.”

Lines 194 – 197 (*English improvement*): Please consider rewriting it as “The actual presence of vines was determined manually by walking through the vineyard and recording the status of each vine. Three situations were observed: live vines, missing vines, and dead vines whose trunks are still present (wilted vines).”

Lines 200 – 202: Why did the authors choose these two dates? Was it intentional, coincidental, or because there are some advantages in terms of remote sensing to use these specific periods? The authors should provide justification for their choice.

Figure 1: Typo in the legend (Vineyards for trening). Moreover, the image quality of the legend should be improved.

Line 206: Write the full form of DSM and DTM.

Line 208: Write the full form of GNSS and RTK.

Line 209: Write the full form of GCP.

Line 212: Maintain the consistency of the text. Previously, it was written as “above-ground”.

Line 214: Consider replacing the word “drone” with “UAV” throughout the document.

Line 224: I know that these letters stand for blue, green, red, red edge and near-infrared, but this is not clear to readers from outside the field. Write out the full form of each letter.

Lines 226 – 231: Why were chemical analyses performed on the leaves? For what purpose? The authors should provide justification for these analyses, even if it's a brief introduction to what will be done with the results of the analyses. For example, they can mention that these analyses will be useful for a later phase of the algorithm for classifying management zones.

Figure 2: The image quality should be improved. It is difficult to read the text inside the boxes. Furthermore, the flowchart is not well-designed. The diamond-shaped box is a decision box, where the answer is "yes" or "no". In this case, the first decision is not "Selection of images." The decision at this stage should be: “Does the image contain vegetation?” If “yes”, it will proceed to the bottom part of the flowchart; otherwise, it will go to the top part. These "yes" and "no" should be written on their respective arrows. The same applies to the decision diamond in the middle part of the algorithm. I couldn't even understand what decision was made here. What causes the flowchart to have different paths at this stage? Additionally, I believe it would be aesthetically more appealing to have only one "end" instead of two.

Line 250: Write the full form of RCNN.

Line 253: Write the full form of YOLO and SSD.

Line 260: The reference should be beside the name of the authors. The text should be: "In 2016, Redmon and Farhadi [66] proposed the YOLO model, which is a one-stage network".

Line 261: The full form of YOLO should be given at the first appearance of the acronym in the document (Line 253).

Lines 262 – 263 (*English improvement*): Please consider rewriting it as “It distinguishes itself from other object detection algorithms by observing the image only once”.

Figure 3: Only in this image did I understand what the authors mean by images with or without vegetation (by checking both images). I always thought that when the authors referred to images with vegetation, it would be ground vegetation existing between rows, not the actual vineyard. Here, I understood the reason for the two distinct phases of image collection: before the vegetative growth begins and during the flowering stage. The images taken before vegetative growth were later used to identify the grapevines (using shadows), a process that is difficult during the flowering stage due to leaf density. However, this is not explained in the text. It should be. The rationale for choosing these two phases should be clarified in earlier chapters. Furthermore, in my opinion, the authors should not label the images as with or without vegetation because it can be misleading, as it happened to me. They should, for example, label them as images taken before the start of the vegetative cycle and images taken after flowering stage. I believe that in Figure 2, this distinction would also provide more clarity to the process.

Lines 295 – 300 (*English improvement*): Please consider rewriting it as “The training dataset, created in this way and used to train the neural network, enables the transfer and the detection of vines in another vineyard without the need of remarking or creating new training sets. Additionally, there is no requirement to retrain the network. When applying the previously trained model to a new dataset, results are obtained rapidly, with marked bounding boxes around each vine. The number of bounding boxes corresponds to the number of vines.”

Lines 302 – 304 (*English improvement*): Please consider rewriting it as “However, as previously mentioned, several events, such as diseases or mechanical damage, can lead to the drying of vine trees, thus reducing the number of living vines that will develop later”

Lines 307 – 308 (*English improvement*): Please consider rewriting it as “Then, the Normalized Difference Vegetation Index (NDVI) is calculated to differentiate between living and dead vines.”

Line 308: How was this threshold value estimated? Manually, or using a thresholding algorithm such as Otsu's method?
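To illustrate the kind of automatic thresholding referred to in this comment, the following is a minimal Python sketch of NDVI computation followed by Otsu's method. It is not the authors' procedure: the function names, the bin count and the reflectance values are all hypothetical and chosen only for illustration.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 when both bands are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def otsu_threshold(values, bins=64):
    """Return the threshold that maximises between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centres, hist))
    best_var, best_idx = -1.0, 0
    w0 = sum0 = 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centres[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        # Strict comparison: when several bins tie, the first one is kept
        if between > best_var:
            best_var, best_idx = between, i
    # The threshold is the upper edge of the last bin assigned to the lower class
    return lo + (best_idx + 1) * width


# Hypothetical (NIR, Red) reflectance pairs: three dead-vine and three live-vine pixels
pixels = [(0.30, 0.25), (0.28, 0.24), (0.32, 0.26),
          (0.60, 0.08), (0.55, 0.07), (0.65, 0.09)]
ndvi_values = [ndvi(n, r) for n, r in pixels]
threshold = otsu_threshold(ndvi_values)
live = [v > threshold for v in ndvi_values]
print(f"threshold = {threshold:.3f}", live)
```

Reporting whether the threshold came from such a data-driven method or from manual inspection would make the live/dead separation reproducible.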

Lines 312 – 314 (*English improvement*): Please consider rewriting it as “Initially, the vine identification process involves adjusting the boundary frames containing the identified shadows (Figure 7-a) to include the entire width of the vine row (Figure 7-b).”

Lines 315 – 317 (*English improvement*): Please consider rewriting it as “If those polygons are within the adjusted bounding box, it indicates the presence of a live vine. Conversely, if the boundary is empty, it indicates a dead vine.”

Figure 7: In this image, there is a missing legend explaining the meaning of: blue rectangle; red rectangle; red lines inside red rectangle and yellow dot.

Lines 322 – 324: Here, the authors identify the 'ideal phase' for capturing UAV images for the algorithm's effective operation in detecting live vines. However, it should also be emphasized that, for the algorithm to function optimally, the UAV images should be captured at specific times of the day, whether in the early or the late stage of the vegetative cycle. For instance, if the images were captured around solar noon, shadows would be almost non-existent, causing errors in the algorithm's operation. Moreover, is there also a 'maximal time' (a latest time of day) for these data to be captured? In other words, if the flight is performed later in the afternoon, when shadows have a larger and less dense projection, will the algorithm face challenges?
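The dependence of shadow length on sun elevation raised in this comment can be illustrated with basic trigonometry. The Python sketch below assumes flat ground and a vertical object; the function name, the 1.5 m height and the elevation angles are hypothetical and do not come from the manuscript.

```python
import math


def shadow_length(object_height_m, solar_elevation_deg):
    """Length of the shadow cast on flat ground by a vertical object."""
    if not 0 < solar_elevation_deg <= 90:
        raise ValueError("solar elevation must be in (0, 90] degrees")
    return object_height_m / math.tan(math.radians(solar_elevation_deg))


# Hypothetical 1.5 m vine trunk at a low, a medium and a high sun elevation
for elevation in (20, 45, 70):
    print(f"{elevation} deg -> {shadow_length(1.5, elevation):.2f} m")
```

Shadows shrink as the sun approaches the zenith and lengthen quickly at low elevations, so stating the flight time or sun elevation for each acquisition would help readers judge when the shadow-based detection is applicable.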

Line 336: The full form of UAV should be given at the first appearance of the acronym in the document (Line 89).

Lines 336 – 337 (*English improvement*): Please consider rewriting it as “Being an unsupervised learning method, it does not require labelled data for training.”

Lines 347 – 348 (*English improvement*): Please consider rewriting it as “The next step in the proposed model is to exclude the inter-rows, which are the spaces between rows in a field where non-vine vegetation or bare soil may exist."

Lines 352 – 354 (*English improvement*): Please consider rewriting it as “Although this point selection is logical, it has not been widely applied in the existing literature."

Lines 361 – 364 (*English improvement*): Please consider rewriting it as “Based on the newly created rasters representing NDVI without the influence of inter-row vegetation and rasters created from chemical analyses of leaves and stem content of nitrogen, phosphorus and potassium, management zones are defined using the K-means clustering algorithm (Figure 8)." Furthermore, the first time a table or image is referenced in the text, this citation should precede the appearance of the table or image itself. Thus, mention Figure 8 in the preceding text or place the image after this text.

Lines 367 – 369 (*English improvement*): Please consider rewriting it as “If spatial information is desired for grouping, the data concerning object locations must be appropriately adapted for algorithmic processing."

Lines 370 – 372 (*English improvement*): Please consider rewriting it as “When defining the management zones, the location information (coordinates) was also incorporated along with the previously calculated attributes.”

Lines 383 – 384 (*English improvement*): Please consider rewriting it as “In addition to visual assessment, numerical or quantitative evaluation plays an important role in assessing the accuracy of the obtained results.”

Formula 4: The word "recall" is misspelled
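For reference, the standard definitions of the detection metrics are given below, where TP, FP and FN are the numbers of true positive, false positive and false negative detections. This is a sketch of the usual forms; the numbering and exact notation of the manuscript's formulas are not reproduced here.

```latex
\begin{align}
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```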

3. Results

Lines 407 – 408 (*English improvement*): Please consider rewriting it as “The results show a high level of accuracy in prediction, with no instances of misidentification (pillars are distinguished from vines)”

Figure 9: The acronyms should be written in full form in the legend of the figures

Lines 411 – 412 (*English improvement*): Please consider rewriting it as “To fully test the vineyard detection algorithm, three separate combinations were tested.”

Lines 414 – 418 (*English improvement*): Please consider rewriting it as “The second combination consisted of training on the 2020 image and applying vineyard detection to the 2022 image. The third combination used the 2022 image both for training and for subsequently detecting vines within the same 2022 image. The results were obtained through the detection of living and dead vines using the algorithm described previously, in combination with field analysis (vine counting) in the vineyard.”

Line 421: Again, the table should be mentioned in the text before its first appearance.

Lines 424 – 427 (*English improvement*): Please consider rewriting it as “During results analysis, the data obtained from the applied vine detection were compared with reference data to visually represent the number of True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) objects.”

Lines 427 – 432: In this case, the authors do not need to specify the colours in the text since they are labelled in the figure with TP, TN, FP and FN alongside them. Here, it is sufficient to describe the meaning of TP, TN, FP, and FN. For instance: “Figure 10 illustrates these results as follows: TP represents correctly identified living vines; FP denotes instances where the algorithm mistakenly recognized non-existent vines in the field; FN represents vines that exist in the field but were not recognized by the model; and TN indicates vines that do not exist in the field and were correctly identified as wilted chocks.”

Figure 10: The image quality should be improved, and the legend made more visible.

Line 439: So, which of the combinations described earlier was used to create these images in Figure 11?

Figure 11: Insert the NDVI values legend into the image.

Lines 443 – 446 (*English improvement*): Please consider rewriting it as “In this study, weights were assigned by giving a value of 0.15 to the coordinates (0.15/2 for each coordinate) and distributing the remaining 0.85 (0.85/7 for each of the 7 attributes – NDVI, nitrogen, phosphorus and potassium for leaf blades and petioles) among the other attributes.”
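The weighting quoted in this comment can be made concrete with a short Python sketch. Only the 0.15 and 0.85 weights and the split into 2 coordinates and 7 attributes come from the quoted sentence; the min-max scaling, the square-root trick, the plain Lloyd's K-means and all sample values are assumptions made for illustration and may differ from what the authors implemented.

```python
import math

COORD_WEIGHT = 0.15  # shared by the two coordinates (quoted in the comment)
ATTR_WEIGHT = 0.85   # shared by the seven attributes (quoted in the comment)


def column_weights(n_coords=2, n_attrs=7):
    """Per-column weights: 0.15/2 for each coordinate, 0.85/7 for each attribute."""
    return [COORD_WEIGHT / n_coords] * n_coords + [ATTR_WEIGHT / n_attrs] * n_attrs


def scale_features(rows, weights):
    """Min-max scale each column to [0, 1], then multiply by sqrt(weight).

    After this step, the squared Euclidean distance used by K-means equals
    the weighted sum of squared per-column differences.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [
        [math.sqrt(w) * (v - lo) / span
         for v, lo, span, w in zip(row, lows, spans, weights)]
        for row in rows
    ]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm, seeded with the first k points for repeatability."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical rows: x, y, NDVI, then N, P, K for leaf blades and N, P, K for petioles
rows = [
    [0.0, 0.0, 0.35, 1.8, 0.15, 0.9, 0.6, 0.10, 1.5],
    [50.0, 40.0, 0.70, 2.6, 0.25, 1.4, 1.0, 0.20, 2.4],
    [5.0, 3.0, 0.38, 1.9, 0.16, 1.0, 0.7, 0.11, 1.6],
    [48.0, 42.0, 0.68, 2.5, 0.24, 1.3, 0.9, 0.19, 2.3],
]
labels = kmeans(scale_features(rows, column_weights()), k=2)
print(labels)
```

Stating how the columns were scaled before weighting matters, because unscaled coordinates in metres would otherwise dominate the distance regardless of the chosen weights.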

Lines 447 – 450 (*English improvement*): Please consider rewriting it as “This approach aims to create management zones in vineyards that preserve both spatial and spectral properties. The results of the grouping (zoning) approach with variable weights for coordinates and attributes are illustrated in Figure 12."

Line 465: To affirm that the two zones are statistically different, the authors should conduct a statistical analysis, such as ANOVA, for each variable.
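As an indication of what such a test involves, the sketch below computes the one-way ANOVA F statistic for a single variable in pure Python. The function name and the NDVI samples are hypothetical and are not data from the manuscript.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical NDVI samples from two management zones
zone_1 = [0.40, 0.42, 0.38, 0.41]
zone_2 = [0.60, 0.62, 0.58, 0.61]
f_stat, df_b, df_w = one_way_anova_f(zone_1, zone_2)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

In practice, `scipy.stats.f_oneway` returns the same statistic together with its p-value, and the test would be repeated for each variable (NDVI and the nitrogen, phosphorus and potassium contents).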

4. Discussion

The discussion requires significant improvement. In my opinion, the discussion of an article constitutes the most important part of the work; however, in this case, it is the least developed chapter. The authors should discuss other works in the literature with the same objective as the present article and make comparisons: the results, differences between methods employed, limitations encountered in each, among other factors. Furthermore, I believe that performing statistical analysis (e.g., ANOVA) on the data in Figure 13 would bring more discussion about the differences found between zones.

Lines 482 – 483: I disagree with the authors' statement that only RGB+NIR imagery of the vineyard was used. In addition to remote sensing data, according to the methodology presented here, biochemical analyses of vine leaves and petioles are required.

Lines 507 – 508: I did not understand what the authors are stating in this sentence.

References

The structure of the references does not conform to the journal's template. Below, I provide how they should be structured:

1. Author 1, A.B.; Author 2, C.D. Title of the article. Abbreviated Journal Name Year, Volume, page range.

2. Author 1, A.; Author 2, B. Title of the chapter. In Book Title, 2nd ed.; Editor 1, A., Editor 2, B., Eds.; Publisher: Publisher Location, Country, 2007; Volume 3, pp. 154–196.

3. Author 1, A.; Author 2, B. Book Title, 3rd ed.; Publisher: Publisher Location, Country, 2008; pp. 154–196.

4. Author 1, A.B.; Author 2, C. Title of Unpublished Work. Abbreviated Journal Name year, phrase indicating stage of publication (submitted; accepted; in press).

5. Author 1, A.B. (University, City, State, Country); Author 2, C. (Institute, City, State, Country). Personal communication, 2012.

6. Author 1, A.B.; Author 2, C.D.; Author 3, E.F. Title of Presentation. In Proceedings of the Name of the Conference, Location of Conference, Country, Date of Conference (Day Month Year).

7. Author 1, A.B. Title of Thesis. Level of Thesis, Degree-Granting University, Location of University, Date of Completion.

8. Title of Site. Available online: URL (accessed on Day Month Year).

Comments on the Quality of English Language

· The level of English needs significant improvement to a more scientific and carefully constructed style.

Reviewer 3 Report

Comments and Suggestions for Authors

I reviewed the paper and must say that it is not suitable for publication in its current form: no additional method is evaluated, and the experimental region is too small, which makes generalizability an important question.

1. The abstract should focus more on the applied methodology and provide more detailed quantitative results.

2. The Introduction has flow and reference problems. The first two paragraphs are better suited to the Study Area section, and the third and fourth paragraphs (lines 44-62) need several references.

3. In line 75, the authors should extend the history of satellite-image-based studies (with references) and then support how UAVs offer advantages. There are several studies that applied VHR satellite images (WorldView, etc.) and achieved successful results. And what about the limitations of UAVs (such as limited coverage, yielding very local studies)?

4. The authors should provide a strong paragraph on the aims and significance of the study before the last paragraph, which describes the structure.

5. In the legend of Figure 1, "trening" should be "training".

6. The methodology is presented superficially. It is not convincing why the authors used YOLOv5 instead of newer versions (the provided reference 68 does not compare YOLOv5 with YOLOv7 or YOLOv8); it is not clear why its shallowest version (5s) was used (is it just because it is the lightweight version?); and, lastly, the reasons for selecting these hyperparameters need to be explained. These points need important revisions.

7. Detection of management zones seems to be the initial step, after which detection of vines should be performed. If I am right, the sections are not in the proper order.

8. Moreover, what about the accuracy of the management zone detection?

9. In the keywords, please remove CNN and neural networks and add K-means.

10. In lines 518-519, I guess VHR images will be much closer to UAV imagery, so this should be considered.

11. The conclusion again refers superficially to the "proposed method" without explaining it.
