Medium-Sized Lake Water Quality Parameters Retrieval Using Multispectral UAV Image and Machine Learning Algorithms: A Case Study of the Yuandang Lake, China
Round 1
Reviewer 1 Report
The manuscript titled “Medium-sized Lake Water Quality Parameters Retrieval Using Multispectral UAV Image and Machine Learning algorithms – A Case Study of the Yuandang Lake, China” deals with the collection of multispectral (MS) data from a UAV and its use in modelling water quality with machine learning algorithms for Yuandang Lake, China. The research is within the scope of the journal. The study is reported in sufficient detail to allow for its replicability and reproducibility.
The use of the term “Unmanned Aerial Vehicle” is being discouraged. It is advisable to use a gender-neutral term such as Uncrewed Aerial Systems or Unoccupied Aerial Systems for UAVs.
The authors need to emphasise the novelty of the study. Many previous studies have been conducted using UAV data and machine learning models, so what is new here?
Line 77- 79: “UAV spectral remote sensing technology is proving to be of great practical use by obtaining broad-range and high-frequency environmental data at a more economical cost to support precise water management and pollution control activities” needs to be substantiated by references. The authors can use the following for this, as well as for strengthening other aspects of the manuscript:
· https://www.hindawi.com/journals/ace/2023/3544724/
· Wu, D., Jiang, J., Wang, F., Luo, Y., Lei, X., Lai, C., ... & Xu, M. (2023). Retrieving Eutrophic Water in Highly Urbanized Area Coupling UAV Multispectral Data and Machine Learning Algorithms. Water, 15(2), 354.
· https://www.mdpi.com/2504-446X/5/3/84
The portion about UAV Multispectral Image Collection is nicely written and gives the fine details of the process.
There are common formatting mistakes in the manuscript. For example:
· Line 93: km² is written as km2.
· Line 527: GA_XGBoost should be replaced with
The Isolation Forest (IF) algorithm has the following limitations:
· Outlier detection: The IF algorithm is not designed to handle outliers that are close to each other in the feature space. These points may be classified as normal, as the algorithm is based on the principle that anomalies are more likely to be isolated.
· Training data: The IF algorithm requires a large number of training data points to work effectively. If the dataset is small, the algorithm may struggle to identify anomalies accurately.
· Unbalanced data: If the dataset is highly unbalanced, with a small number of anomalies compared to normal data points, the algorithm may struggle to identify the anomalies accurately.
Are the authors sure that these limitations do not apply in this study?
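The first limitation can be illustrated with a toy example. The following is a minimal sketch of the isolation principle on one-dimensional data, written in plain Python; it is not scikit-learn's `IsolationForest` and not the authors' pipeline, and all data values and function names are hypothetical. It compares the average number of random splits needed to isolate a lone outlier, an outlier sitting inside a tight group of outliers, and an ordinary point.

```python
import random


def path_length(point, data, rng, depth=0, max_depth=20):
    """Number of random splits needed to isolate `point` within 1-D `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that still contains the point.
    if point < split:
        side = [x for x in data if x < split]
    else:
        side = [x for x in data if x >= split]
    return path_length(point, side, rng, depth + 1, max_depth)


def mean_path_length(point, data, n_trees=200, seed=0):
    """Average isolation depth over many random trees (shorter = more anomalous)."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees


# Hypothetical data: 100 ordinary values, one lone outlier,
# and five outliers that lie very close to each other.
normal = [i * 0.01 for i in range(100)]
lone_outlier = 10.0
clustered_outliers = [20.0, 20.01, 20.02, 20.03, 20.04]
data = normal + [lone_outlier] + clustered_outliers

depth_lone = mean_path_length(lone_outlier, data)
depth_clustered = mean_path_length(clustered_outliers[2], data)
depth_normal = mean_path_length(normal[50], data)
```

In this sketch the lone outlier is isolated after fewer splits than the clustered one, so outliers that sit close together look less anomalous to an isolation-based score. Whether this matters for the manuscript depends on how the outliers in the authors' data are distributed.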
Why was a CNN selected for training? Convolutional Neural Networks (CNNs) are a popular type of deep learning algorithm commonly used for classification, but their limitation is the data requirement: a CNN requires a large amount of training data to achieve high accuracy. Without sufficient data, the model may overfit or underfit, leading to poor performance. The quality of the dataset also matters: the data should be diverse, representative of the underlying distribution, and free from errors or biases. Here there are 4k data points, but their diversity is likely to be limited.
The authors mention GA-XGBoost only much later in the manuscript. It should also be mentioned in the abstract and in Section 2.3.1.
The authors have clearly stated the limitations.
Figure 1 needs to be redrawn. Use SI units in the scale. Font sizes for the coordinate grids need to be uniform for both 1(a) and 1(b). Provide the scale for 1(c). Also provide good-resolution images for the first two maps.
Author Response
Please refer to the uploaded file.
Author Response File: Author Response.docx
Reviewer 2 Report
Format and Grammar
L22 | Spelling, 13 km²
L40 | “Since inland water bodies typically located near human populations, they are prone to pollution from intensive human activities and environmental changes.” – Isn’t it the other way around? Human populations are located at water bodies.
L64 | Grammar, eases
L93 | Spelling, 50 km²
L93 | Grammar, lakes
L109 | Spelling, 13 km²
Figure 1 | The axis sizes differ between a) and b). The font size of the axes in b) is too small. The axis labels in c) are of poor quality and their font size is too small.
Table 1 | Please include the index of band numbers B1 - B5. They are used in Table 2 but are not introduced to the reader.
L139 | Grammar, maybe “was set at a maximum altitude of 500 m”? ; Spelling, 500 m
L179 | The abbreviation USV is not introduced to the reader. Line 165 of the introduction also does not mention that a USV is used for data collection.
L177 to L184 | Did you collect all water samples from the 20 measuring sites in two 2.5-liter bottles, or do you have 20 samples?
L238 | It is difficult for the reader to link this section to the measurement methods in Section 2.2.2.
L264 | Grammar, bands
L269 | Wrong section numbering
L270 | Machine Learning Models
L298 | Neuronal Network Models
Figure 7 | There is no connection between the figure and the descriptions in the section. The abbreviations in the figure are not introduced.
Table 2 | The layout is not suitable. Maybe use bars between formula and index to mark the three sections of the table.
L351 | Spelling: R²
L382 to 389 | The types of classes introduced to the reader are not used in further analysis. Does the reader really need this information?
L402 | Which data do you use for the Pearson correlation analysis? Do you use the whole multispectral data?
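The question on L402 comes down to which value pairs enter the coefficient. As a minimal sketch, assuming the analysis pairs one band reflectance value with one measured concentration per sampling site, the Pearson coefficient could be computed as follows. The function name, variable names and all numbers are hypothetical and are not taken from the manuscript.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: one reflectance value and one measured
# concentration per sampling site (five sites shown for brevity).
band_reflectance = [0.10, 0.12, 0.15, 0.11, 0.18]
measured_concentration = [1.3, 1.2, 1.9, 1.0, 2.1]

r = pearson_r(band_reflectance, measured_concentration)
```

Under this assumption the sample size equals the number of matched sites. If the whole multispectral image were used instead, the pairing and the sample size would be different, which is why the manuscript should state it explicitly.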
Further Comments:
Interesting to read; however, some comments:
General:
The individual chapters of the paper (after chapter 1) would benefit from a short introduction at the beginning explaining the subsequent structure and argumentation of the text.
Section 2.3:
This section should be reworked. In general, the explanations lack some of the details needed to fully understand the topics, and those you provide do not exactly match the terms used in the literature. Furthermore, most of the section introduces the general topic but does not give the reader detailed information about your model architecture and the processing workflow.
Section 3.1:
There is no explanation of how the 20 measurement areas are transformed into the Max, Min, Mean and SD values in Table 3. You also do not make clear which type of in-situ water data collection from Section 2.2.2 belongs to Tables 3 and 4, so the reader is not guided and must go back to Section 2.2.2.
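To make the request concrete, the following is a minimal sketch of one way per-site measurements could be reduced to Max, Min, Mean and SD values. It assumes one measured value per site for a single parameter; the variable names and numbers are hypothetical and are not taken from Table 3.

```python
import statistics

# Hypothetical measurements of one water quality parameter,
# one value per sampling site (five sites shown for brevity).
site_values = [12.4, 15.1, 9.8, 20.3, 17.6]

summary = {
    "max": max(site_values),
    "min": min(site_values),
    "mean": statistics.mean(site_values),
    # Sample standard deviation (n - 1 in the denominator).
    "sd_sample": statistics.stdev(site_values),
    # Population standard deviation (n in the denominator).
    "sd_population": statistics.pstdev(site_values),
}
```

The two SD variants differ noticeably when there are only about 20 sites, so the manuscript should state which one is reported.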
Author Response
Please refer to the uploaded file.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
The authors have made the necessary changes and the quality has been improved.