Article
Peer-Review Record

Training Artificial Intelligence Algorithms with Automatically Labelled UAV Data from Physics-Based Simulation Software

Appl. Sci. 2023, 13(1), 131; https://doi.org/10.3390/app13010131
by Jonathan Boone 1, Christopher Goodin 2, Lalitha Dabbiru 2, Christopher Hudson 2, Lucas Cagle 2,* and Daniel Carruth 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 28 November 2022 / Revised: 13 December 2022 / Accepted: 16 December 2022 / Published: 22 December 2022
(This article belongs to the Topic Artificial Intelligence in Sensors)

Round 1

Reviewer 1 Report (New Reviewer)

In this contribution, the authors developed software to automatically generate labeled image data for training and testing machine learning models. The study was well designed, carefully carried out, and scientifically sound; however, a few minor points need to be addressed before final publication in Applied Sciences.

1. The title is too broad – consider incorporating words such as “UAV”, “labeling”, and/or “software” into the title.

2. The reviewer could not find many details on the “physics-based simulations” – could the authors clarify further, e.g., the simulation details and key equations?

3. A. In Section 5 (object detection), please provide the hyperparameters of the model, the hyperparameter optimization procedure, the learning curve, etc.

B. Metrics other than “precision” are needed to evaluate the model. Was the 77.99% accuracy from the training set or the test set? Was there a validation set when tuning the model?

4. Minor issue: references are needed for CNN (line 52), RCNN (line 66), OIDN (line 259), and FPN (line 346).

Author Response

Thank you for your comments.  Please see the responses.

Author Response File: Author Response.docx

Reviewer 2 Report (New Reviewer)

Review Report: applsci-2095443 Title: Training Artificial Intelligence Algorithms with Physics-Based Simulations

Comments: I have carefully reviewed the paper and found the results interesting. Overall, the paper and its subject fit the journal's scope and will interest readers. Therefore, the manuscript can be accepted for publication after major revisions.

1. The abstract is hard to follow. Revise it by adding some of the key outcomes.

2. The introduction section needs serious changes. It must highlight what the problem is, why this technique is being implemented, and what the drawbacks of previous studies are.

3. Further, it is suggested to extend the literature review by adding recent work related to the presented study. The authors should consider the following works: “A memetic algorithm based on two_Arch2 for multi-depot heterogeneous-vehicle capacitated arc routing problem”, “Learning to Detect 3D Symmetry From Single-View RGB-D Images With Weak Supervision”, “Ore Image Classification Based on Improved CNN” (Computers & Electrical Engineering), “Multimodal Fusion Convolutional Neural Network With Cross-Attention Mechanism for Internal Defect Detection of Magnetic Tile”, “Deep Feature Interaction Network for Point Cloud Registration, With Applications to Optical Measurement of Blade Profiles”, “OCEAN Personality Model Construction Method Using a BP Neural Network”, “Modeling Relation Paths for Knowledge Graph Completion”, “Learning practically feasible policies for online 3D bin packing” (Science China), “Global and Local-Contrast Guides Content-Aware Fusion for RGB-D Saliency Prediction”, “ROSEFusion: random optimization for online dense reconstruction under fast camera motion”, “GRASS: generative recursive autoencoders for shape structures”, “2D/3D Multimode Medical Image Alignment Based on Spatial Histograms”, “Reconstruct Dynamic Soft-Tissue With Stereo Endoscope Based on a Single-Layer Network”, and “Dual-Graph Attention Convolution Network for 3-D Point Cloud Classification”.

4. Deep neural networks possess multiple hidden layers, so why have the authors adopted a CNN rather than using deep neural networks? This must be clearly stated.

5. The parameter settings for the ML techniques are not presented.

Author Response

Thank you for your comments.  Please see the responses.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report (New Reviewer)

The authors have done their level best to answer my comments. Therefore, the manuscript can be accepted for publication in this esteemed journal. 

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The paper introduces an interesting workflow that automatically generates labeled synthetic image data by sampling a real dataset. The data is then labelled through an Image Labeler. Finally, RetinaNet is trained with the data, and examples of plum and building classification are demonstrated. Overall, I think the paper is of high quality and publishable. There are still a few points that need to be addressed.

1. It seems inappropriate to include the workflow diagram and example results for plum and building classification in the Conclusions section. The authors could consider putting the workflow diagram at the beginning of the methodology section. As for the results, they should be described before the Conclusions section.

2. In the workflow, the MATLAB Image Labeler was used to label the data. Why would we want to use the trained RetinaNet for object detection if its performance is worse than the labeler's? The authors need to clarify this.

3. In the Conclusions section, the citation is missing for "For the object detection task of detecting buildings (Error! Reference source not found.), "

4. Figure 6 is too big; consider reformatting it.

5. Figures 11 and 12 are too vague; consider increasing the font size and resolution.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper presents a pipeline to automatically label synthetic images to train CNNs. However, the manuscript feels incomplete: several sections lack content, a detailed discussion comparing the proposed approach with previous works is missing, and the citation style and presentation of references are inconsistent. Section 5 (Conclusions) does not present any conclusions; instead, the authors keep expanding on methods and results.

Overall, it is unclear how to decide whether augmenting the training dataset to increase performance metrics is more convenient than discarding the chosen CNN architecture and selecting another one.

Detailed comments:

  • [Lines 54-55]: Please elaborate why CNNs are a very useful tool for machine learning applications. Perhaps a high-level diagram of CNNs could help readers understand a typical topology of them and how they excel in retrieving features from image rasters.
  • [Line 56]: Revise correct name of R-CNN acronym.
  • [Line 60-62]: Second sentence clause is not well connected to the first one.
  • [Lines 63-64]: Please expand on anchor boxes for better clarity for readers who are not experts in object detectors that use CNNs.
  • [Lines 76-78]: Vague sentence; it is not clear what makes a trained model sufficiently accurate to be deployed to production.
  • [Lines 86-87]: Discussion of previous research on automated image labelling techniques and their limitations is missing. Please expand.
  • [Line 99]: Missing reference to open-source library and MAVS.
  • [Line 104]: Define LiDAR acronym.
  • [Lines 103-105]: "cameras" is repeated in the list of emulated sensor systems. The sentence also suggests that ray-tracing is used for all the listed items, which might not be true at all.
  • [Lines 107-115]: Five consecutive sentences begin with "MAVS". Please avoid sounding repetitive by increasing the variety of words used at the beginning of each sentence.
  • [Line 153]: Incorrect citation to paper. Please review instructions for authors and select a single consistent citation style.
  • [Line 153-154]: It should be great to understand why this is the most important factor (or why other factors were not considered).
  • [Line 178]: It might be easier for the reader to refer to that section by its number instead.
  • [Line 230]: Please add details on the software used and the procedure to generate the orthomosaic. Other important details about the UAV campaign are missing. What were the overlap and sidelap of the collected rasters, the speed of the UAV, and the focal length of the RGB camera?
  • [Figure 7]: It is difficult to differentiate between a and b, c and d, and e and f. Please expand the caption.
  • [Figure 8]: A legend of colour-coded classes for the right-hand figure is missing.

Author Response

  • Point 1: [Lines 54-55]: Please elaborate why CNNs are a very useful tool for machine learning applications. Perhaps a high-level diagram of CNNs could help readers understand a typical topology of them and how they excel in retrieving features from image rasters.

 

Response 1: Great comment. The following narrative was added to clarify why CNNs are useful tools in machine learning applications: “The CNN architecture implicitly combines the benefits obtained by standard neural network training with the convolution operation to efficiently classify images. Further, being a neural network, the CNN is also scalable to large datasets, which is often the case when images are to be classified.” A schematic diagram of a CNN architecture was added as well.
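For readers who want a concrete picture of the convolution-plus-dense structure this response describes, a minimal sketch in PyTorch is shown below. The layer sizes, the 128×128 input, and the 4-class output are illustrative assumptions, not the network used in the manuscript.

```python
# Minimal CNN sketch: convolutional feature extraction followed by dense
# classification. All sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution extracts local features
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling halves spatial resolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, 64),                  # dense layer on flattened features
            nn.ReLU(),
            nn.Linear(64, num_classes),                   # per-class scores
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# One forward pass on a dummy batch of two 128x128 RGB images.
logits = SmallCNN()(torch.randn(2, 3, 128, 128))
print(logits.shape)  # torch.Size([2, 4])
```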

 

  • Point 2 [Line 56]: Revise correct name of R-CNN acronym.

 

Response 2:  Great comment.  The correction was made to identify the R-CNN acronym as “Region Based Convolutional Neural Network”.

Point 3: [Line 60-62]: Second sentence clause is not well connected to the first one.

Response 3:  Great comment.  The following revisions were made “Recent work focuses on one-stage detectors such as You Only Look Once (YOLO) [8] and single-shot multibox detector (SSD) [9]. RetinaNet [10] one-stage object detection models have demonstrated promising results over existing single stage detectors.”

  • Point 4: [Lines 63-64]: Please expand on anchor boxes for better clarity on readers that are not experts in object detectors using CNNs. 

Response 4: Great comment. The following revisions and clarifications were made: In this work, we implemented the RetinaNet framework because its design features an efficient feature pyramid and anchor boxes, which the model uses to predict the bounding box for an object. This aids in predicting the relative scale and aspect ratio of specific object classes. The model works well even with a limited training dataset and gives excellent detection accuracy.
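As a hedged illustration of the anchor-box and feature-pyramid design mentioned here, the sketch below runs one training step with torchvision's reference RetinaNet implementation (ResNet-50 FPN backbone). The two-class setup, dummy image, and single ground-truth box are placeholders for demonstration, not the authors' actual training configuration; a recent torchvision (>= 0.13) is assumed.

```python
# Sketch of one RetinaNet training step with torchvision's implementation.
# Class count, image, and box are placeholders, not the paper's setup.
import torch
from torchvision.models.detection import retinanet_resnet50_fpn

# RetinaNet pairs a ResNet-50 feature pyramid with per-level anchor boxes.
model = retinanet_resnet50_fpn(weights=None, weights_backbone=None, num_classes=2)
model.train()

images = [torch.rand(3, 512, 512)]                           # dummy RGB image
targets = [{
    "boxes": torch.tensor([[100.0, 100.0, 200.0, 220.0]]),   # [x1, y1, x2, y2]
    "labels": torch.tensor([1]),                             # foreground class id
}]

losses = model(images, targets)        # classification + box-regression losses
total = sum(losses.values())
total.backward()
print({k: float(v) for k, v in losses.items()})
```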

 

  • Point 5: [Lines 76-78]: Vague sentence; it is not clear what makes a trained model sufficiently accurate to be deployed to production.

Response 5: Great comment. The following revisions and clarifications were made: If the model is deemed sufficient to produce accurate predictions on new data, it may be deployed to production. However, if the model is deficient, it may be improved by enhancing the training dataset and by adjusting the parameters of the learning process. Different models will require different thresholds, determined on a case-by-case basis.

 

 

  • Point 6: [Lines 86-87]: Discussion of previous research on automated image labelling techniques and their limitations is missing. Please expand.

Response 6: Great comment. In this paper, we have generated simulated data using the MAVS simulator [11]. We have added the following sentences to the manuscript on the auto-labeling of this data.

“MAVS [11] automatically labels the training data during the simulation, avoiding the tedious and time-consuming process of hand-labeling data. Each object in the simulated scene was assigned a semantic label prior to the simulation.”
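The toy sketch below illustrates the auto-labelling idea in the quoted sentences: because the simulator places every object itself and each object carries a class assigned before the run, a per-pixel label mask can be emitted alongside every rendered frame with no manual labelling. This is not the MAVS API; the rectangle "renderer" and class IDs are purely hypothetical.

```python
# Toy illustration of simulation-time auto-labelling (not the MAVS API).
import numpy as np

CLASS_IDS = {"background": 0, "tree": 1, "building": 2}  # hypothetical label map

def render_frame(objects, size=(64, 64)):
    """Return an RGB frame and its matching semantic label mask."""
    rgb = np.zeros((*size, 3), dtype=np.uint8)
    mask = np.zeros(size, dtype=np.uint8)
    for (x0, y0, x1, y1), label, colour in objects:
        rgb[y0:y1, x0:x1] = colour                 # "render" the object
        mask[y0:y1, x0:x1] = CLASS_IDS[label]      # label was assigned pre-simulation
    return rgb, mask

# Two pre-labelled scene objects: bounding box, semantic class, RGB colour.
scene = [((5, 5, 20, 30), "tree", (20, 120, 20)),
         ((35, 10, 60, 40), "building", (150, 80, 60))]
frame, labels = render_frame(scene)
print(frame.shape, np.unique(labels))  # (64, 64, 3) [0 1 2]
```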

 

Point 7: [Line 99]: Missing reference to open-source library and MAVS.

Response 7: Great comment. Reference [11], which describes the MAVS library, has been added. Consequently, all references from [12] onward have been incremented by one.

 

  • Point 8: [Lines 104]: Define LiDAR acronym.

Response 8:  Great comment. The acronym was defined.

 

Point 9: [Lines 103-105]: "cameras" repeated in the list of emulated sensor systems. The sentence also suggests that ray-tracing is used for all the listed items, which might not be true at all.

Response 9: Great comment. The repeated word “cameras” has been removed. We have used ray-tracing for lidar, camera and GPS systems. We removed IMU from the listed items.

 

Point 10: [Lines 107-115]: Five consecutive sentences begin with "MAVS". Please avoid sounding repetitive by increasing the variety of words used at the beginning of each sentence.

Response 10: Great comment. The repetitiveness of the word “MAVS” has been corrected.

 

Point 11: [Line 153]: Incorrect citation to paper. Please review instructions for authors and select a single consistent citation style.

Response 11: Great comment. The citation was modified.

 

Point 12: [Line 153-154]: It should be great to understand why this is the most important factor (or why other factors were not considered).

Response 12: Great comment. The model described by the authors in [16] stated that the “most important factor is the plants ability to compete for sunlight”. This sentence is not required in this context, so we removed it.

 

 

Point 13: [Line 178]: it might be easier for the reader to refer to that section by its number instead.

Response 13: Great comment. Line 178 is now Line 190. The sentence has been corrected to “Mitigation techniques for slow image generation will be discussed in Section 3, Methods for Generating Randomized UAV Images.”

 

Point 14: [Line 230]: Please add details on software used and procedure to generate the orthomosaic. Other important details about the UAV campaign are missing. What was the overlap and sidelap of collected rasters, speed of the UAV, and focal length of RGB camera?

Response 14: Line 230 is now Line 235. Real images were acquired using a DJI Phantom 4 flown over the H. H. Leveck Animal Research Center, often referred to as South Farm. The data were acquired in July 2019 with the built-in Phantom 4 Pro RGB camera. The camera has a resolution of 4864 × 3648 pixels, a focal length of 8.8 mm, and a 70° field of view. The flight altitude was 122 meters (400 feet) and the flight speed was around 7.1 meters/second. A total of 119 images were analyzed for this project. The images consist primarily of open fields and pastureland bordered by both mature and scrub trees. In addition, the images contain some residential areas, barns and other farming structures, as well as paved and dirt roads. An orthomosaic created from all 119 images using Agisoft Metashape, with an average overlap and sidelap of 80%, is shown in Figure 6.
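As a quick sanity check on these flight parameters, the ground sampling distance they imply can be estimated from the altitude, field of view, and image width quoted in the response. The sketch below assumes the 70° value is the horizontal field of view across the 4864-pixel image width; that assumption is not stated in the response.

```python
# Back-of-the-envelope ground sampling distance (GSD) from the quoted flight
# parameters. Assumes the 70-degree FOV is horizontal (across 4864 pixels).
import math

altitude_m = 122.0      # flight altitude from the response
fov_deg = 70.0          # camera field of view (assumed horizontal)
image_width_px = 4864   # image width in pixels

footprint_m = 2 * altitude_m * math.tan(math.radians(fov_deg / 2))
gsd_cm = footprint_m / image_width_px * 100

print(f"footprint ~{footprint_m:.0f} m wide, GSD ~{gsd_cm:.1f} cm/pixel")
# footprint ~171 m wide, GSD ~3.5 cm/pixel
```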

 

Point 15: [Figure 7]: It is unclear to differentiate between a and b, c and d, and e and f. Please expand the caption.

Response 15: Great comment. We have expanded the Figure 8 caption as follows: “Figure 8: Sample Raw Images on the left (a, c, and e) and Denoised Images on the right (b, d, and f). Images (a,b) 1 ray per pixel, (c,d) 100 rays per pixel, and (e,f) 10,000 rays per pixel.”

 

Point 16: [Figure 8]: A legend of colour-coded classes for the right-hand figure is missing.

Response 16: Great comment. Figure 8 has been revised to Figure 9. A legend for the color-coded classes has been added to this figure.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

General Comments

This new version addresses many of the inline comments from revision 1.

However, general comments on the novelty and impact of the paper were not addressed. I would highly appreciate it if the authors indicated the line numbers of the inline text changes.

The manuscript still feels incomplete: several sections lack content, and a detailed discussion comparing the proposed approach with previous works is missing. Section 5 (Conclusions) does not present any conclusions; instead, the authors keep expanding on methods and results.

Overall, it is unclear how to decide whether augmenting the training dataset to increase performance metrics is more convenient than discarding the chosen CNN architecture and selecting another one.

Inline Comments:

  • [Lines 55-59]: These sentences are vague; they do not explain why CNNs are suitable for image classification. Use Fig. 1 to elaborate on this. Compared to ground-based images, aerial datasets are limited.
  • My previous point on other research works has not been addressed: "Point 6: [Lines 86-87]: Discussion of previous research on automated image labelling techniques and their limitations is missing. Please expand." Also, the MAVS acronym has not been defined yet. Without this discussion, it is challenging for the reader to understand the real novelty and impact of your research.
  • Section 3.1: SI units are inconsistent. In some phrases, authors use "mm", in others they use "meters". Please choose a single style to refer to the units.
  • In-text references to Figures are outdated in Section 3.2.4.
  • A conclusions section is missing from the paper; it should synthesise the primary research findings and future work.