Article
Peer-Review Record

Application and Evaluation of the AI-Powered Segment Anything Model (SAM) in Seafloor Mapping: A Case Study from Puck Lagoon, Poland

Remote Sens. 2024, 16(14), 2638; https://doi.org/10.3390/rs16142638
by Łukasz Janowski 1,* and Radosław Wróblewski 2,3
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 30 April 2024 / Revised: 11 June 2024 / Accepted: 17 July 2024 / Published: 18 July 2024
(This article belongs to the Special Issue Advanced Remote Sensing Technology in Geodesy, Surveying and Mapping)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors submitted a well-written and interesting manuscript dealing with seafloor mapping. The subject of this study is of great interest to readers and scientists working to find solutions to mitigate the effects of the changing climate, especially in the field of hydrology. The conclusions reported are supported by the results, and the methods are described. However, Section 2 (Materials and Methods) should be improved by dividing it into two subsections: the first mainly providing a description of the study area, and the second containing a description of the methodology adopted to conduct this study. For the Results section, it would be interesting to include the percentage of spatial distribution of the results for distinct types of bedforms obtained from the application of the three methods.

Comments on the Quality of English Language

Minor editing of the English language is required.

Author Response

Dear Reviewer 1,

Thank you for your constructive feedback on our manuscript. We appreciate the time and effort you put into reviewing our work.

We were pleased to hear that you found our manuscript well-written and interesting, and that you believe the subject of our study is of great interest to readers and scientists working in the field of hydrology.

In response to your comments, we improved Section 2 (Materials and Methods) by dividing it into two subsections. The first subsection now provides a detailed description of the study area, and the second subsection contains a comprehensive description of the methodology we adopted to conduct this study.

For the Results section, we included the percentage of spatial distribution of the results for distinct types of bedforms obtained from the application of the three methods, as you suggested. We believe this addition enhances the clarity and comprehensibility of our results.

We hope these revisions address your concerns, and we look forward to any further suggestions you may have.

Best regards,

Łukasz Janowski

Reviewer 2 Report

Comments and Suggestions for Authors

In the manuscript, SAM and MRS are combined for seafloor mapping. The results are scientifically meaningful, but the method should be described in more detail.

[1] There are too many abbreviations, which may make the manuscript difficult to understand.

[2] Section 1: There is existing literature about the Puck Lagoon. This literature should be further analyzed and synthesized.

[3] Section 2: Please present the method used in the study in detail. If possible, a detailed flow chart should be shown.

[4] Fig. 1: The subplot labels do not match (a) to (g).

[5] Section 3: 1) How are the training and prediction of SAM carried out in the study? 2) How are SAM and MRS combined, in detail? 3) Why is a classic method not used as a comparison with SAM + MRS?

[6] Fig. 2: The subplot labels do not match (a) to (i).

[7] Fig. 3: How are these segmentation results validated?

[8] Table 5: How is the accuracy assessment performed?

[9] References section: The format of the references should be normalized.

Author Response

Dear Reviewer 2,

Thank you for your valuable feedback on our manuscript. We appreciate the time and effort you put into reviewing our work.

In response to your comments:

[1] We reduced the number of abbreviations and provided a list of abbreviations in the Appendix to make the manuscript easier to understand. Where appropriate in the text, we used the full names of bedforms instead of their abbreviations.

[2] We analyzed and synthesized the literature about the Puck Lagoon in more detail. In response to Reviewer 1's comments, we incorporated Subsection 2.1, “Study Area”, where we included more literature about the Puck Lagoon. We have tried to do our best, but we would be grateful if you could suggest any other references that we may have inadvertently omitted.

[3] In Section 2, we provided a detailed description of the method used in the study, in particular explaining the working principles of SAM. We also included a detailed flow chart in Figure 2, as you suggested. The introduced text reads:

“SAM operates using a diverse range of input prompts. These prompts, which specify what to segment in an image, enable SAM to perform a broad spectrum of segmentation tasks without additional training [23,46]. One of the key strengths of SAM lies in its advanced training methodology. It has been trained on millions of images and masks, collected via a “data engine” that operates in a model-in-the-loop manner. This engine allows researchers to use SAM and its data to annotate images interactively and update the model. This iterative process has been repeated numerous times to enhance both the model and the dataset. The final dataset is quite extensive, comprising over 1.1 billion segmentation masks collected from approximately 11 million licensed and privacy-preserving images [47,48].

The output from SAM, in the form of masks, can be utilized as inputs to other AI systems. For instance, object masks can be tracked in videos, used in image editing applications, converted to 3D, or employed for creative tasks such as collaging. SAM has developed a comprehensive understanding of what constitutes an object. This understanding facilitates zero-shot generalization to unfamiliar objects and images, eliminating the need for additional training [18,49].”
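Purely as an illustration of this zero-shot usage, a minimal sketch with Meta's publicly released segment-anything package is shown below; the ViT-H checkpoint file name comes from Meta's release, while the input image path is a hypothetical placeholder, not the study's actual data file.

```python
# Minimal sketch: zero-shot mask generation with the segment-anything package
# (pip install segment-anything). The checkpoint is Meta's released ViT-H file;
# the image path is a hypothetical placeholder.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load the pre-trained SAM backbone; no task-specific training is needed.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")

# The generator seeds the image with a regular grid of point prompts.
mask_generator = SamAutomaticMaskGenerator(sam, points_per_side=32)

image = cv2.cvtColor(cv2.imread("bathymetry_rgb.tif"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts: 'segmentation', 'area', ...
print(f"{len(masks)} candidate segments found")
```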

[4] We corrected the subplot labels in Fig. 1 so that they match (a) to (g).

[5] Thank you very much for these questions. Below we provide explanations for each of them:

1) How are the training and prediction of SAM carried out in the study?

In the revised version of the article, we explained the working principles of SAM in detail. The algorithm does not require additional training because it has already been trained on millions of images and masks, collected via a “data engine” that operates in a model-in-the-loop manner. The final training dataset is quite extensive, comprising over 1.1 billion segmentation masks collected from approximately 11 million licensed and privacy-preserving images.

SAM is an image segmentation approach, so in general it does not produce meaningful information about the categories of the segmented objects (although, as mentioned above, a large database of meaningful objects was used to pre-train the algorithm, so the resulting segments should be very precise). Categorization of such distinct areas (segments) is typically performed in a (supervised) classification process, which consists of training and application (prediction) steps that assign a category to each segment. Therefore, the SAM algorithm itself does not contain a prediction step; this step is provided by the Random Forest machine learning supervised classifier, which is described in detail in Section 2 of the article.
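As a minimal sketch of this two-step idea (segment first, then classify the segments), the supervised step could look like the following; the feature names and input file are hypothetical illustrations, not the study's actual attributes, while the 70/30 split follows the text:

```python
# Sketch of the supervised step: SAM segments carry no categories, so a
# Random Forest assigns a bedform class to each image object. Feature names
# and the input file are hypothetical; the 70/30 split follows the text.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

segments = pd.read_csv("segment_features.csv")                # one row per image object
X = segments[["mean_depth", "mean_backscatter", "rugosity"]]  # assumed features
y = segments["bedform_class"]                                 # labels from control points

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)
print("validation accuracy:", rf.score(X_val, y_val))
```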

2) How are SAM and MRS combined, in detail?

We explained this methodological approach in Section 2, where we added an extensive description of how SAM and MRS were combined, as follows:
“The SAM algorithm creates a grid of initial points, or “seeds,” based on the “points_per_side” parameter. These seeds are used to segment images, but they may lack significance. To address this issue, we applied an alternative that gives more control over the seed creation process for object delineation by SAM. In this approach, we utilized the MRS output as seeds for SAM, combining MRS with SAM. By applying MRS before SAM, the seeds become more meaningful, leading to more precise object delineation in certain situations. The Scale parameter was set to 250 in the MRS algorithm to produce seeds that more accurately represent the objects in the benchmark dataset.”
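For illustration only, a minimal sketch of one way such seeding can be wired up is given below; it assumes `mrs_labels` is an integer raster of MRS object IDs (0 = background) saved beforehand, and the file names are hypothetical, not the authors' actual code:

```python
# Sketch of the seeding strategy: centroids of MRS objects replace SAM's
# regular points_per_side grid as point prompts. The checkpoint is Meta's
# released ViT-H file; `mrs_labels` and the file paths are assumptions.
import cv2
import numpy as np
from scipy import ndimage
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("bathymetry_rgb.tif"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

mrs_labels = np.load("mrs_segments.npy")   # hypothetical MRS output raster
ids = np.unique(mrs_labels)
ids = ids[ids != 0]                        # drop the background label

# One (row, col) centroid per MRS object becomes a foreground point prompt.
centroids = ndimage.center_of_mass(np.ones_like(mrs_labels), mrs_labels, ids)

object_masks = []
for cy, cx in centroids:
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[cx, cy]]),  # SAM expects (x, y) order
        point_labels=np.array([1]),         # 1 marks a foreground point
        multimask_output=False,
    )
    object_masks.append(masks[0])
```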

3) Why is a classic method not used as a comparison with SAM + MRS?

In the course of generating the results, we noticed that the SAM + MRS method did not give results as detailed as the manual interpretation we had prepared at the outset. We therefore initially felt that a qualitative comparison of all methods, as described in Section 3, would be sufficient. However, we agree that it is necessary to compare the methods quantitatively as well. Therefore, in the revised version of the article, we have added accuracy assessment results for the SAM + MRS solution, after application of the RF classification algorithm. Please find below the parts of the text that were added in the revised version of the article:
“Contrarily, Figure 4b depicts the outcome of applying the RF classification algorithm to the combined MRS and SAM results. It is evident that the majority of the area was categorized as ‘anthropogenic formations’. This was followed by ‘uneven seabed’, and finally ‘accumulations of organics’. Due to the significantly lower level of segmentation complexity compared to the MRS result, only 3 out of 21 classes were assigned to various image objects with differing levels of accuracy.”
“For comparison, we also conducted an accuracy assessment for the SAM + MRS + RF algorithm (Table 6). The results indicate a marginally higher overall accuracy compared to the previous findings. However, it is important to note that due to the reduced complexity of segmentation, the number of classes that emerged decreased to 5 out of 21. Furthermore, even though some classes were present in the validation ground-truth dataset, they were not classified at all. This led to false positive results, particularly in the ‘undulating seabed’ and ‘foreshore slope’ classes.

Table 7 presents the percentage of spatial distribution of different types of bedforms, as determined by three methods: manual delineation, MRS + RF, and SAM + MRS + RF. The manual delineation method is considered the reference point for the other two methods. The MRS + RF method shows a significant increase in the identification of ‘uneven seabed’ (from 1.26% to 30.90%) and ‘flat, even seabed with vegetation’ (from 5.04% to 21.13%) compared to the manual delineation. However, it significantly underperforms for ‘slightly undulating seabed’ (from 32.74% to 2.34%), ‘undulating seabed’ (from 11.71% to 1.63%), and ‘flat, even seabed’ (from 22.12% to 1.82%). The SAM + MRS + RF method only identifies two bedforms: ‘uneven seabed’ (19.60%) and ‘anthropogenic formations’ (77.97%). It significantly outperforms the manual delineation for ‘anthropogenic formations’ (from 0.87% to 77.97%) but underperforms for ‘uneven seabed’ (from 1.26% to 19.60%).

This analysis suggests that while the MRS + RF and SAM + MRS + RF methods can outperform manual delineation for certain bedforms, they also significantly underperform for others. This could be due to the different characteristics of the bedforms and the specific conditions under which the methods are applied. Considering the complexity of the task, it is clear that the MRS + RF method has a broader application as it was able to classify all 21 bedforms. This suggests that it might be a more general method, capable of handling a wider variety of bedforms.”
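As a small illustration of how per-class percentages like those in Table 7 can be derived from a classified raster, the following sketch uses a random placeholder array in place of the real bedform map (class codes per pixel, with -1 as an assumed nodata value):

```python
# Sketch: percentage of mapped area per bedform class from a classified raster.
# A random placeholder stands in for the real map so the sketch runs on its own.
import numpy as np

rng = np.random.default_rng(0)
classified = rng.integers(-1, 21, size=(500, 500))  # class codes, -1 = nodata

valid = classified[classified != -1]
codes, counts = np.unique(valid, return_counts=True)
for code, count in zip(codes, counts):
    print(f"class {code:2d}: {100 * count / valid.size:5.2f}% of mapped area")
```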

[6] We corrected the subplot labels in Fig. 2 so that they match (a) to (i).

[7] We argue that validating segmentation results alone, without category information, is not meaningful. Therefore, we applied the same Random Forest supervised classifier to all segmentation results in order to validate them quantitatively. An additional accuracy assessment table is included in the revised version of the article. Moreover, we compared the percentage of spatial distribution of the results for distinct types of bedforms obtained from the application of the three methods.

[8] We provided a detailed explanation of how we performed the accuracy assessment, including all accuracy statistics used in the analysis, in the Methods section. Please see the following:


“In the subsequent phase of the process, we employed a supervised classification technique utilizing the machine learning-based Random Forest (RF) classifier [55]. The manual delineation of bedforms served as the basis for generating 2100 random control points. We ensured these points were strategically and representatively placed, maintaining a minimum distance of 4 m from each other and 10 m from the class boundaries [8]. Following this, we divided the control points into training and validation samples, adhering to a 70/30 split [56]. The training points were utilized to execute the Random Forest classification, while the validation samples were used to assess the accuracy of the classification [57-59].

In the confusion matrix, the diagonal elements denote instances that have been correctly classified for each respective class, whereas the off-diagonal elements signify instances that have been misclassified. The Producer’s accuracy is a measure of the likelihood that a given ground truth class is accurately classified, while the User’s accuracy signifies the probability that a predicted class corresponds to that class in reality. The Kappa per class metric provides a measure of agreement between the predicted and actual class labels, adjusted for chance agreement [60]; this value ranges from -1, indicating total disagreement, to 1, indicating perfect agreement, with 0 suggesting an agreement equivalent to random classification. The Overall accuracy is calculated as the ratio of the total number of correct classifications to the total number of instances [57]. Lastly, the Kappa statistic measures the degree of agreement between the predicted and actual class labels, taking into account the possibility of chance agreement [60]: a Kappa value of 1 signifies perfect agreement, while a value of 0 suggests agreement no better than random classification.

This approach allowed us to maintain the integrity of the classification process while also providing a means for performance evaluation. A detailed workflow presenting all steps mentioned in this section is provided in Figure 2.”
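For reference, a minimal sketch of these statistics computed with scikit-learn is shown below; the two label arrays are small placeholders standing in for the held-out 30% validation samples and the RF predictions, so the sketch runs on its own:

```python
# Sketch: confusion matrix, producer's/user's accuracy, overall accuracy, kappa.
# y_val / y_pred are placeholders for the validation labels and RF predictions.
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, cohen_kappa_score

y_val = np.array(["flat", "uneven", "flat", "organics", "uneven", "flat"])
y_pred = np.array(["flat", "uneven", "organics", "organics", "flat", "flat"])

cm = confusion_matrix(y_val, y_pred)  # rows: ground truth, columns: predictions

producers = np.diag(cm) / cm.sum(axis=1)  # correct / total ground truth per class
users = np.diag(cm) / cm.sum(axis=0)      # correct / total predicted per class

print("confusion matrix:\n", cm)
print("producer's accuracy per class:", producers)
print("user's accuracy per class:", users)
print("overall accuracy:", accuracy_score(y_val, y_pred))
print("kappa:", cohen_kappa_score(y_val, y_pred))
```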

[9] In accordance with the Instructions for Authors, we used the EndNote software to normalize the format of the references in the References section. We double-checked all references in EndNote.

We hope these revisions address your concerns, and we look forward to any further suggestions you may have.

Best regards,

Łukasz Janowski

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript has been revised based on the previous comments and suggestions. I recommend that it be accepted for publication in Remote Sensing.
