Article
Peer-Review Record

Extracted Spectral Signatures from the Water Column as a Tool for the Prediction of the Structure of a Marine Microbial Community

J. Mar. Sci. Eng. 2024, 12(2), 286; https://doi.org/10.3390/jmse12020286
by Staša Puškarić 1,2,*, Mateo Sokač 1,3,4,5, Živana Ninčević 6, Danijela Šantić 6, Sanda Skejić 6, Tomislav Džoić 6, Heliodor Prelesnik 6 and Knut Yngve Børsheim 7
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 4 January 2024 / Revised: 29 January 2024 / Accepted: 2 February 2024 / Published: 5 February 2024
(This article belongs to the Section Marine Biology)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In the present article “Extracted spectral signatures from the water column as a tool for the characterization of the marine microbial community,” the authors offer a very interesting interpretation and attempt to describe the microbial structure at two different stations in the central Adriatic Sea, using “advanced Machine Learning and Artificial Intelligence techniques, specifically the Non-Negative Matrix Factorization (NMF) method, to analyze downward and upward light spectra collected by Hyperspectral Ocean Color Radiometer (HyperOCR, HOCR) sensors in the water column”.

 

Initially, I was enthusiastic to start reading the paper because the topic is very interesting and could offer a big advance in oceanographic studies, especially in investigating processes driven by the microbial community in the water column and possible relationships between abiotic and biotic variables. However, I soon recognized the difficulty of the subject and found some important “inaccuracies” in the terms used.

First of all, as a marine microbiologist, I cannot ignore a problem with the title, “Characterization of marine microbial community”. The title should be changed to “prediction of the structure of the microbial community”.

A characterization of the marine microbial community should offer a qualitative assessment of the microbial communities, using 16S and 18S biodiversity data for both the prokaryotic and eukaryotic compartments.

Here the authors present only a quantitative description of some compartments:

Autotrophic cells (Synechococcus, Prochlorococcus, and picoeukaryotes)

High nucleic acid content (HNA) bacteria, low nucleic acid content (LNA) bacteria, and heterotrophic nanoflagellates (HNF)

I am not sure that this kind of description is enough to obtain a full characterization of the marine microbial community without the 16S and 18S data mentioned above.

The approach proposed by the authors is very interesting and will offer enormous possibilities in the future, but would it be possible to include biodiversity data?

I am not very familiar with model prediction, and as a generalist reader I had some difficulty reading some parts of the paper.

Paragraph 3.2 is not sufficiently clear to me. Could you explain how it was possible to extract the 5 different signature curves? Are they the result of a combination of different spectra detected by the HOCR, or are they 5 "original" spectra detected at the two stations?

Is the "optimal number" the minimum number of signatures that can describe the communities? Could you better explain the concept of the optimal number?

 

TABLE 1

description of the acronyms is missing 

HNAN? 

HNAN?

PE etc

 

line 314-315

no correlation with some of the microbial groups identified in this work?

Line 323

V30? Is this the 30-meter depth at Stončica - Vis? Please be consistent with the sampling names.

 

FIG3 and FIG4

The graphs are not very clear. Could you place the r value and p-value outside the graph?

Is there a p-value cutoff for this correlation?

 

Line 389-391

all of them? or only signature S2 as stated in line 393?

Later on, should we look for a specific correlation between spectra S1-S5 and specific microbial groups: (a) %HNA bacteria, (b) bacteria, (c) heterotrophic nanoflagellates, (d) picoeukaryotes, (e) Prochlorococcus, and (f) Synechococcus?

I'm getting lost: what does “a minor association of depth” mean?

Does the model predict the abundance of different microbial categories/groups, or the effect of variables such as depth, light, etc., or both?

Line 439 – 443

Again, could you better explain "association with depth"?

Author Response

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below.

Comment 1. First of all, as a marine microbiologist, I cannot ignore a problem with the title, “Characterization of marine microbial community”. The title should be changed to “prediction of the structure of the microbial community”. A characterization of the marine microbial community should offer a qualitative assessment of the microbial communities, using 16S and 18S biodiversity data for both the prokaryotic and eukaryotic compartments.

Response 1. We changed the title as indicated in the comment. Unfortunately, we did not have the means to conduct the proposed sequencing analyses, although we are aware of these technologies. The reviewer is correct that this would contribute significantly to a better description, and we hope to be able to do it in the future. The issue is now discussed in the manuscript.

Amendments: Title, Page 15, line 524-531

Comment 2. Here the authors present only a quantitative description of some compartments:

Autotrophic cells (Synechococcus, Prochlorococcus, and picoeukaryotes)

High nucleic acid content (HNA) bacteria, low nucleic acid content (LNA) bacteria, and heterotrophic nanoflagellates (HNF). I am not sure that this kind of description is enough to obtain a full characterization of the marine microbial community without the 16S and 18S data mentioned above.

The approach proposed by the authors is very interesting and will offer enormous possibilities in the future, but would it be possible to include biodiversity data?

Response 2. As mentioned above, we are aware of this, and at this point we did the best we could. We certainly hope to account for it in the future; however, at this point our resources are limited to what is presented in our report.

Amendments: Page 15, line 527-531

Comment 3. I am not very familiar with model prediction, and as a generalist reader I had some difficulty reading some parts of the paper. Paragraph 3.2 is not sufficiently clear to me. Could you explain how it was possible to extract the 5 different signature curves? Are they the result of a combination of different spectra detected by the HOCR, or are they 5 "original" spectra detected at the two stations? Is the "optimal number" the minimum number of signatures that can describe the communities? Could you better explain the concept of the optimal number?

Response 3. We apologize for not explaining this clearly. We added more information about NMF in the introduction. NMF is a dimensionality reduction and feature extraction method that factorizes a single matrix into two matrices. Multiplying the two matrices approximates the original matrix, and the difference between the original and the reconstructed matrix is the reconstruction error, measured here with the Frobenius norm (the square root of the sum of squared differences). As stated in the text, the first matrix (H) represents “spectral signatures”, which are unique patterns in the data discovered by the method; they are therefore original spectra detected in and derived from the data. The second matrix (W) represents the weights of each curve (sample) towards each signature.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3312-5

To find the optimal number of signatures, we ran NMF multiple times with different numbers of signatures and measured the reconstruction error for each. This produced Figure 1a, which shows that the error stabilizes at five signatures. In unsupervised clustering, this is also called the elbow method.
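To make the factorization and the elbow search in Response 3 concrete, the following is a minimal sketch in plain Python. It is not the authors' code: it assumes the widely used multiplicative update rules for the Frobenius objective, and the function names, the iteration count, and the synthetic two-signature data are all hypothetical choices made for this illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Square root of the sum of squared differences between V and W*H."""
    wh = matmul(w, h)
    return math.sqrt(sum((vi - ri) ** 2
                         for v_row, r_row in zip(v, wh)
                         for vi, ri in zip(v_row, r_row)))


def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Factorize V (samples x wavelengths) into W (samples x k) and H (k x wavelengths)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update signatures: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # Update weights: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h


# Synthetic curves (not real HOCR data): 8 samples mixing 2 hypothetical signatures.
true_h = [[1.0, 0.8, 0.2, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.1, 0.6, 1.0, 0.7]]
true_w = [[1.0, 0.0], [0.8, 0.2], [0.6, 0.4], [0.5, 0.5],
          [0.3, 0.7], [0.1, 0.9], [0.0, 1.0], [0.9, 0.6]]
curves = matmul(true_w, true_h)

# Elbow search: refit for each candidate number of signatures and record the error.
errors = {}
for k in (1, 2, 3):
    w, h = nmf(curves, k)
    errors[k] = frobenius_error(curves, w, h)
    print(f"k={k}: reconstruction error = {errors[k]:.4f}")
```

In this toy setting the error should drop sharply once k reaches the number of signatures used to build the data and change little afterwards, which is the "elbow" that Figure 1a is described as showing at five signatures.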

Comment 4. TABLE 1

description of the acronyms is missing. HNAN? HNAN? PE etc

Response 4. We thank the reviewer for pointing out this mistake. The Table 1 legend is amended.

Comment 5. line 314-315 no correlation with some of the microbial groups identified in this work?

Response 5. We are not sure what the reviewer is pointing out; however, we do have a correlation matrix, correlation lines, and KDE estimates by depth for microbial groups and depth in supplementary figures A2, A3, and A4.

Comment 6. Line 323 V30? Is this the 30-meter depth at Stončica - Vis? Please be consistent with the sampling names.

Response 6. We thank the reviewer for pointing this out; however, we believe that it is clearly explained and referred to, e.g.:

“The signature S1 (curve peaks at 356 nm and 597 nm) is also correlated with Chlorophyll a, Prochlorococcus, and Picoeukaryotes (r = 0.48, P < 0.001; r = 0.38, P < 0.001; r = 0.57, P < 0.001) (Figure 2b, Kaštela upwelling panel) indicating different community structures at the surface compared to the lower parts of the water column. S1 showed the highest enrichment at 0 and 28 meters and the lowest at 10 m”, referring to the Kaštela site, where we sampled at 28 meters.

Comment 7. FIG3 and FIG4. The graph is not very clear, could you put R-value and P value out of the graph?

Response 7. We apologize for not stating this clearly in the text; however, Figure 3e and Figure 3f show the r value (correlation coefficient) and the correlation p-value on the graph.

In Figure 4, the points represent the true value of abundance (x-axis) and the predicted value of abundance (y-axis), not the linear regression line. In the ideal case, the points would fall on a clear diagonal line, indicating a perfect fit. The performance of the model is stated for each subpanel with the root mean square error (RMSE) metric alongside the model definition (formula). Using the formula, readers can see which signatures contribute to the prediction of abundance. These models were fitted on a small amount of data and are for exploratory purposes. In other words, we wanted to test which microbial abundance groups we could predict and which ones we could not. This tells us whether abundances are associated with the extracted signatures (light spectrum), depth, or both. We changed the figure legend to explain this clearly.

Comment 8. Do we have a cutoff of the pvalue for this correlation?

Response 8. We are not sure what the reviewer is pointing out for Figure 4, as the points represent the true value of abundance (x-axis) and the predicted value of abundance (y-axis), not the linear regression line. As for Figure 3, the Diatoms and Chlorophyll a show a significant correlation (r = 0.75, P = 0.02), whereas other microbial groups show no significant correlation.
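On the question of a p-value cutoff, the sketch below shows one way to compute a Pearson correlation and compare its p-value against a threshold. It is an illustration rather than the authors' procedure: the p-value comes from a permutation test (swapped in here to keep the example dependency-free, instead of the parametric test usually reported alongside r), the cutoff of 0.05 is a conventional assumption, and the data points are synthetic.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


ALPHA = 0.05  # assumed significance cutoff, not taken from the manuscript

# Synthetic values standing in for, e.g., a pigment concentration and a cell count.
x = [0.2, 0.5, 0.9, 1.1, 1.6, 2.0, 2.4, 2.9, 3.3]
y = [1.1, 1.9, 2.2, 3.0, 3.4, 4.5, 4.6, 5.8, 6.1]

r = pearson_r(x, y)
p = permutation_p_value(x, y)
print(f"r = {r:.2f}, p = {p:.4f}, significant at {ALPHA}: {p < ALPHA}")
```

With only a handful of depths per station, as in the study, the cutoff matters: a moderately large r can still fail to reach significance.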

Comment 9. Line 389-391 all of them? or only signature S2 as stated in line 393?

Response 9. The coefficients of the linear regression indicate the “importance” of each variable (S1-S5 and depth). Figure 4b showed a small RMSE, indicating a low error when predicting bacterial abundance, together with the largest coefficient associated with S2 and a coefficient of 0 associated with depth. This exploratory analysis indicated that the abundance of bacteria does not depend on depth but on the light spectrum, more specifically on a high-intensity, broad spectral peak around 503 nm. We changed the text to explain this in more detail.
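The reading of coefficients described in Response 9 can be sketched as follows. This is a hypothetical example, not the authors' model: it fits an ordinary least squares regression through the normal equations, uses only two signatures plus depth, and builds the abundance values synthetically so that they depend on the signatures but not on depth. Note that comparing coefficient magnitudes as "importance" is only meaningful when the predictors are on comparable scales, which is an assumption here.

```python
import math


def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, rows):
    """Apply the fitted intercept and coefficients to each row of predictors."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]


def rmse(true_values, predicted):
    """Root mean square error between true and predicted values."""
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(true_values, predicted))
                     / len(true_values))


# Hypothetical inputs: weights of two signatures and depth (m) for six samples.
rows = [(0.9, 0.1, 0.0), (0.7, 0.5, 5.0), (0.4, 0.2, 10.0),
        (0.3, 0.8, 20.0), (0.2, 0.4, 30.0), (0.1, 0.9, 50.0)]
# Synthetic abundance built to depend on the signatures but not on depth.
abundance = [1.0 + 0.5 * s1 + 3.0 * s2 + 0.0 * d for s1, s2, d in rows]

coefs = fit_linear(rows, abundance)
for name, c in zip(["intercept", "S1", "S2", "depth"], coefs):
    print(f"{name}: {c:+.3f}")
print("RMSE:", rmse(abundance, predict(coefs, rows)))
```

Because the synthetic abundance is an exact linear function of the inputs, the fit recovers a depth coefficient of essentially zero and a near-zero RMSE; real data would scatter around the diagonal of a true-versus-predicted plot such as Figure 4.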

Comment 10. Later on we should look for a specific correlation between spectra S1-S5 and specific microbial groups ((a) %HNA bacteria, (b) bacteria, (c) heterotrophic nanoflagellates, (d) picoeukaryotes, (e) Prochlorococcus, and (f) Synechococcus ?

Response 10. We thank the reviewer for the suggestion; however, as some of the species in the microbial groups are not abundant enough, we did not perform correlation analysis for all of them. Instead, we correlated signatures to microbial groups as depicted in Figure 2b.

Comment 11. I'm getting lost: what does “a minor association of depth” mean?

Response 11. We thank the reviewer for pointing out this error and agree that it should be changed. We changed the sentence to:
“The coefficient of 0.03 associated with depth signifies a marginal correlation between depth and count, as illustrated in Figure 4d.”

Page 13, line 446-447

Comment 12. Does the model predict the abundance of different microbial categories/groups, or the effect of variables such as depth, light, etc., or both?

Response 12. The model predicts the abundance of different microbial categories using our extracted light spectrum signatures and depth as input parameters. 

Comment 13. Line 439 – 443. Again, could you better explain "association with depth"?

Response 13. As we used linear regression for exploratory analysis, we interpreted the coefficients as an “association”, which was tested earlier in the paper with a correlation test (Figure A2). We changed the text accordingly:

“We noted a positive coefficient (0.36 for downwelling and 0.23 for upwelling) associated with depth (Figure 4e), aligning with the earlier discovery of a comprehensive positive correlation between Prochlorococcus count and depth (r= 0.95 P<0.001, Figure A2).” 

Page 13, line 450-453

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript submitted by Staša Puškarić et al. introduces a novel approach, powered by machine learning and AI algorithms, to characterize the complex marine microbial community. This is accomplished by analyzing spectral signals gathered using Hyperspectral Ocean Color Radiometer (HyperOCR, HOCR) sensors, which are then correlated with the abundance and community structure of phytoplankton. The proposed method can extract valuable information from intricate spectral signals and generate accurate predictions regarding the quantity and species of various marine microorganisms across different depths in the sea. This will undoubtedly benefit our readers in the field. However, there are several areas that require minor revisions to enhance the quality of the manuscript:

1. It would be beneficial to include more background information on the Non-Negative Matrix Factorization (NMF) method used in this study in the introduction. This would aid our readers in understanding its strengths, limitations, and the reasons why the authors chose this particular method for this study.

2. For Figures 2a, 3a, and 3b, could you specify the number of replicates measured for samples at each depth of each site?

3. Have the authors validated the models generated in Section 3.7 using new samples that were not used for model construction? Similarly, can the obtained models characterize the marine microbial community of other sites or seas?

4. On Page 5, lines 214-216, it appears that the data shows a linear correlation between the S1 extracted from the UW sensor, rather than the DW sensor, and the depth. Could you please clarify?

5. There seems to be a discrepancy between the peak wavelengths of spectral signatures in Table 1 and the corresponding text above the table.

6. There is no data presented to support the statements made in Section 3.3. Could you provide the necessary data or revise the section accordingly?

7. On Page 9, lines 310-311, the data suggests that S4 has almost no presence at 0 meters and no presence at 30 meters, but is present at other sampled depths at the Stončica - Vis station. Could you please clarify this point?

 

Author Response

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below:

Comment 1. It would be beneficial to include more background information on the Non-Negative Matrix Factorization (NMF) method used in this study in the introduction. This would aid our readers in understanding its strengths, limitations, and the reasons why the authors chose this particular method for this study.

Response 1. We apologize for not adding more details about NMF. The introduction section is now amended:

“NMF is a dimensionality reduction and feature extraction method that factorizes the input matrix into two matrices, which, when multiplied again, results in the original matrix (Wang and Zhang 2013). In our project, we factorize spectral curves from HyperOCR sensors with the intention of discovering latent patterns in the sensor data (Pauca et al. 2006). A similar approach is used when dealing with mutational data, where a simple count matrix is factorized into two matrices, discovering underlying biological mechanisms at play (Alexandrov et al. 2013).”

Page 2, line 51-57

A more detailed explanation of the two matrices is given in section 3.2:

“Factorization method (NMF) (Lee and Seung 1999; Pauca et al. 2006), which uses the input data (HOCR) curves in order to factorize them into two matrices. The first matrix (H) represents “spectral signatures”, which are unique patterns in the data discovered by the method. The second matrix (W) represents the weights of each curve (sample) towards each signature.”

Page 4, line 206-213

Comment 2. For Figures 2a, 3a, and 3b, could you specify the number of replicates measured for samples at each depth of each site?

Response 2. We thank the reviewer for the suggestion. Amendment on Pg 7, line 267

Comment 3. Have the authors validated the models generated in Section 3.7 using new samples that were not used for model construction? Similarly, can the obtained models characterize the marine microbial community of other sites or seas?

Response 3. As stated in the text in section 3.2  “Using our custom-built vertical profiler and prior to the experiment, we collected HOCR curves at different locations in the southern Adriatic Sea in the vicinity of the Island of Mljet, and at different time points of the day. This resulted in a dataset containing 5397 HOCR curves which we used as a training dataset for model development.” From the perspective of the NMF model, the entire experiment conducted at the Stončica-Vis and Kaštela-Bay locations is a validation of the model and proof that we can use it on other sites.

Amendment Page 15, line 550-556
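Response 3 describes signatures learned on one dataset and then used at other sites. One plausible way to do this, sketched below, is to hold the signature matrix H fixed and solve only for non-negative weights W on the new curves. This mechanism is an assumption made for illustration, since the record does not say how the authors applied the trained model; the function name, the two signatures, and the new curves are all hypothetical.

```python
import random


def project_onto_signatures(v, h, iters=300, seed=0, eps=1e-9):
    """Find non-negative weights W so that W*H approximates V, with H held fixed."""
    rng = random.Random(seed)
    k, m = len(h), len(h[0])
    # H H^T is shared by every curve, so compute it once.
    hht = [[sum(h[a][j] * h[b][j] for j in range(m)) for b in range(k)]
           for a in range(k)]
    weights = []
    for curve in v:
        vh = [sum(curve[j] * h[a][j] for j in range(m)) for a in range(k)]
        w = [rng.random() + 0.1 for _ in range(k)]
        for _ in range(iters):
            # Multiplicative update for one row: w <- w * (v H^T) / (w H H^T)
            whh = [sum(w[b] * hht[b][a] for b in range(k)) for a in range(k)]
            w = [w[a] * vh[a] / (whh[a] + eps) for a in range(k)]
        weights.append(w)
    return weights


# Hypothetical signatures, standing in for ones learned on a training dataset.
signatures = [[1.0, 0.8, 0.2, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.1, 0.6, 1.0, 0.7]]

# Synthetic "new site" curves built as known mixtures of the two signatures.
known_mix = [[0.3, 0.7], [0.8, 0.2]]
new_curves = [[sum(mix[a] * signatures[a][j] for a in range(2)) for j in range(6)]
              for mix in known_mix]

recovered = project_onto_signatures(new_curves, signatures)
for mix, w in zip(known_mix, recovered):
    print("known:", mix, "recovered:", [round(x, 3) for x in w])
```

If curves from a new site cannot be reconstructed well from the fixed signatures, that would suggest the training data did not cover the optical conditions at that site.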

Comment 4. On Page 5, lines 214-216, it appears that the data shows a linear correlation between the S1 extracted from the UW sensor, rather than the DW sensor, and the depth. Could you please clarify?

Response 4. We apologize for this typographical error. The S1 from the UW sensors shows a clear negative linear correlation with depth. We corrected the error in the text.

Comment 5. There seems to be a discrepancy between the peak wavelengths of spectral signatures in Table 1 and the corresponding text above the table.

Response 5. We thank the reviewer for catching this inconsistency. The text (page 5, paragraph 2), Table 1, and its legend have been amended.

Comment 6. There is no data presented to support the statements made in Section 3.3. Could you provide the necessary data or revise the section accordingly?

Response 6. We apologize for this mistake. The section is now revised.

Page 5, line 265-273

Comment 7. On Page 9, lines 310-311, the data suggests that S4 has almost no presence at 0 meters and no presence at 30 meters, but is present at other sampled depths at the Stončica - Vis station. Could you please clarify this point?

Response 7. S4 is characterized as a low-intensity, almost uniform spectral curve spanning a wide spectrum (350-580 nm). We also added an explanation in the results section and added more references:

“Vertical distribution of microorganisms is under significant influence of moving masses of seawater, besides the influence of nutrients and light. Therefore this could be the reason for difficulties in prediction of different groups of microorganisms within the marine water column”. Temperature and salinity profiles (i.e. Figure A1) indicate that this could be the factor in this study.

Page 10, line 337-339

Reviewer 3 Report

Comments and Suggestions for Authors

Dear authors,

1. Even if the use of ML and AI in connection with HyperOCR is at the beginning, the number of titles in the references is very low. The authors could use the "classic" HyperOCR data obtained by other authors in other marine systems to compare them with the data obtained in the present study;

Has HyperOCR been used in the study of other marine systems? What would be the advantages, limits and disadvantages of using ML and AI in such studies?

2. Introduction section. In our opinion, the authors should discuss more the use of HyperOCR in other marine systems and make comparisons with their study. The Introduction section is too short and less convincing, especially for a wider audience, less used to technical terms.

3. The Materials and methods section is detailed regarding the methods and techniques used for subsequent determinations. It is not clear to me how the light microscopy data were used because I only saw large groups in the description, not species or genera in connection with the flow cytometry data and hyperspectral curves.

What phytoplankton species have been identified and what is their significance for HyperOCR?

Only large groups are mentioned without reference to species and genera (as in figure 3).

4. Results and Discussions section - comparisons of the groups of microorganisms found in the study area with other research are missing. These comparisons would give greater value to the study because they would integrate the original results from this study into a wider scientific perspective.

Author Response

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below:

Comment 1. Even if the use of ML and AI in connection with HyperOCR is at the beginning, the number of titles in the references is very low. The authors could use the "classic" HyperOCR data obtained by other authors in other marine systems to compare them with the data obtained in the present study;

Response 1. We apologize for not stating this clearly in the text; however, we did not find a study with accessible public data that contains HyperOCR measurements, chlorophyll absorption, and cell counts from flow cytometry. To clarify this further, we expanded the introduction and added more references.

Page 1, Paragraph 1

Comment 2. Has HyperOCR been used in the study of other marine systems? What would be the advantages, limits and disadvantages of using ML and AI in such studies?

Response 2. The main advantage of ML and AI in such systems is that models with strong predictive power can replace the need for expensive on-site laboratories for microenvironment characterization. Sensor systems are also more convenient to deploy, and the data they collect can then be analyzed. For example, our NMF model outputs five distinct signatures, each of which represents a specific spectrum of light and a specific association with cell abundance.

Page 15, line 555, 556

Comment 3. Introduction section. In our opinion, the authors should discuss more the use of HyperOCR in other marine systems and make comparisons with their study. The Introduction section is too short and less convincing, especially for a wider audience, less used to technical terms.

Response 3. We agree with the reviewer and added additional information into the Introduction section to clarify this issue and added additional references. Thank you for bringing this up.  

Amendments on Page 1, lines 40-58

Comment 4. The Materials and methods section is detailed regarding the methods and techniques used for subsequent determinations. It is not clear to me how the light microscopy data were used because I only saw large groups in the description, not species or genera in connection with the flow cytometry data and hyperspectral curves.

What phytoplankton species have been identified and what is their significance for HyperOCR?

Only large groups are mentioned without reference to species and genera (as in figure 3).

Response 4. Abundances and detailed species distribution per station per depth are shown in Supplementary Data File/Phytoplankton. We did not find any species-specific correlation with the extracted signatures, which is why we did not include it in the results. Thank you for the comment; we included this issue in our discussion.

Page 12, line 398-401

Page 15, line 524-537

Comment 5. Results and Discussions section - comparisons of the groups of microorganisms found in the study area with other research are missing. These comparisons would give greater value to the study because they would integrate the original results from this study into a wider scientific perspective.

Response 5. In the Introduction and Discussion, we included work by others that is related and relevant to the area of research and to our findings.

Page 14, line 500-506

Page 15, line 507-523

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

As far as possible, all the suggestions I proposed have been taken into consideration.

The article can be published in this form.
