The paper presents a method for improving document design in MongoDB by reconstructing it and a tool to support the method, focusing on the data model. I think the research topic is interesting. However, the paper needs some information about the approach and a proper set of experiments.

The abstract can be improved. Reading it, I was unsure what the work was; only after reading the problems (P1, P2) in the Introduction did I understand what it really was. I suggest being more objective in the abstract and more incisive about the paper's goal.

At the beginning of the Introduction, the authors say RDBs have shortcomings, such as performance degradation. How is that possible? An RDB's internal structures are prepared for ACID, which leads to poor performance in some types and volumes of data.

The query in the first example, finding Smith, is not correct. The Mongo syntax for finding requires two documents, one for filtering and another for projection. You have too many documents in the find parameter.
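To illustrate the call shape this comment refers to, here is a minimal sketch in Python. It is an in-memory stand-in, not MongoDB's API: the `find` helper, the `employees` data, and the field names are all hypothetical, and the helper supports only equality filters and inclusion projections.

```python
# In-memory stand-in that mimics the two-argument shape of a find call.
# The helper, the sample documents, and the field names are hypothetical.

def find(docs, filter_doc=None, projection=None):
    """Return documents matching filter_doc (equality only), projected to the requested fields."""
    filter_doc = filter_doc or {}
    results = []
    for doc in docs:
        if all(doc.get(key) == value for key, value in filter_doc.items()):
            if projection:
                # Inclusion projection: keep fields flagged with a truthy value.
                keep = {key for key, flag in projection.items() if flag}
                # _id is kept unless the projection explicitly sets it to 0.
                if projection.get("_id", 1):
                    keep.add("_id")
                doc = {key: value for key, value in doc.items() if key in keep}
            results.append(doc)
    return results

employees = [
    {"_id": 1, "name": "Smith", "dept": "Sales"},
    {"_id": 2, "name": "Jones", "dept": "Research"},
]

# First argument: the filter document. Second argument: the projection document.
# The corresponding shell call would have the shape:
#   db.employees.find({ name: "Smith" }, { _id: 0, name: 1, dept: 1 })
smith = find(employees, {"name": "Smith"}, {"_id": 0, "name": 1, "dept": 1})
print(smith)
```

The point is only that the filter and the projection are two separate documents passed in that order; any further documents in the argument list do not belong there.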

Line 364 has a reference problem.

Considering the Reconstruction processes, based on Figure 1, some of the steps have bullet points, and others do not. If you use bullet points for one, use them for all. However, I suggest rewriting this part in a more textual fashion, without the items, just pure text.

Additionally, I think that pseudocode or a flowchart would be a good way to present the flow of the process.

The general approach to modeling data in MongoDB is to keep one eye on the query workload. All the structures need to be organized based on how data are accessed. With this in mind, the first thing that occurred to me when reading the paper was how query performance was affected before the reconstruction. The authors are clearly aware of that; I suggest placing the note from lines 402 and 403 in the Introduction as well, to clarify this point at the beginning of the text.

The experiments are a little weak. Basically, the experiment shows how data is transformed and explains the values of each step and algorithm. Of course, that is not wrong, but I would like to see the performance of the query before and after running the approach. It would also be valuable to know how much time is spent running the approach and the size of the data set. Testing bigger data sets would also be interesting, to see how the approach performs.
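One way such a before/after comparison could be taken is sketched below in Python. This is an assumption about how the measurement might be set up, not the authors' procedure: `median_runtime`, `query_before`, and `query_after` are hypothetical names, and the two query functions are placeholders for the same query issued against the original and the reconstructed collection.

```python
import statistics
import time

def median_runtime(run_query, repeats=5):
    """Median wall-clock seconds over several runs of a zero-argument callable."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Hypothetical placeholders: in a real experiment these would issue the same
# query against the original and the reconstructed collection.
def query_before():
    return sum(range(10_000))

def query_after():
    return sum(range(5_000))

before = median_runtime(query_before)
after = median_runtime(query_after)
print(f"before: {before:.6f} s, after: {after:.6f} s")
```

Reporting the median over repeated runs, together with the data set size, makes the comparison less sensitive to one-off caching or scheduling effects.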

Comments on the Quality of English Language

English is good; the paper is well written.

Author Response

We wish to express our appreciation to the editor and the reviewer of our previously submitted manuscript for their valuable comments. The comments have helped us significantly improve the manuscript. We have revised the previous manuscript based on the comments and suggestions. The revised parts are shown in boldface type.

1. The paper presents a method for improving document design in MongoDB by reconstructing it and a tool to support the method, focusing on the data model. I think the research topic is interesting. However, the paper needs some information about the approach and a proper set of experiments. The abstract can be improved. Reading it, I was unsure what the work was; after the problems (P1, P2) in the Introduction, I understood what it really was. I suggest being more objective in the abstract and more incisive about the paper’s goal.
⇒ We have replaced the abstract expressions in the abstract with more concrete ones, using phrases from the Introduction.

2. At the beginning of the Introduction, the authors say RDBs have shortcomings, such as performance degradation. How is that possible? An RDB’s internal structures are prepared for ACID, which leads to poor performance in some types and volumes of data.
⇒ We have added phrases describing the reasons for the RDB shortcomings at lines 23-27, p.1.

3. The query in the first example, finding Smith, is not correct. The Mongo syntax for finding requires two documents, one for filtering and another for projection. You have too many documents in the find parameter.
⇒ We have corrected the find query at lines 171-172, p.4.

4. Line 364 has a reference problem.
⇒ We have added the reference [6] at line 354, p.7 and at lines 643-644, p.18.

5. Considering the Reconstruction processes, based on Figure 1, some of the steps have bullet points, and others do not. If you use bullet points for one, use them for all. However, I suggest rewriting this part in a more textual fashion, without the items, just pure text. Additionally, I think that pseudocode or a flowchart would be a good way to present the flow of the process.
⇒ We have used bullet points in all steps to describe the reconstruction procedures between line 372, p.8 and line 413, p.9. We have not adopted pseudocode because the description would look complicated.

6. The general approach to modeling data in Mongo is to keep one eye on the query workload. All the structures need to be organized based on how data are accessed. With this in mind, the first thing that occurred to me when reading the paper was how query performance was affected before the reconstruction. The authors are clearly aware of that; I suggest placing the note from lines 402 and 403 in the Introduction as well, to clarify this point at the beginning of the text.
⇒ We measured performance before and after the reconstruction and have shown the results between line 539, p.16 and line 551, p.17.

7. The experiments are a little bit weak. Basically, the experiment shows how data is transformed and explains the values of each step and algorithm.
⇒ We have added the document schema before and after the reconstruction in Fig. 6, 12, and 15.

Of course, that is not wrong, but I would like to see the performance of the query before and after running the approach.
⇒ We measured performance before and after the reconstruction and have shown the results between line 539, p.16 and line 551, p.17.
Also, it will be valuable to know how much time is spent running the approach and the size of the data set. Testing bigger data sets is also interesting to see how the approach performs.

Reviewer 2 Report

Comments and Suggestions for Authors

In this paper, the authors propose a method for scanning the structure of MongoDB documents to recommend reconstruction candidates with respect to a set of features - to improve document design. Such features include the hierarchical and horizontal relationships between the different data items in a JSON-format MongoDB document. The proposed method utilizes natural language processing to detect these features and relationships and then recommend a reconstruction to the user.

The proposed solution is very promising and can be utilized in other NoSQL databases that follow similar schemas.

For better quality, I recommend that the authors provide clearer versions of Figures 2 and 4.

Author Response

We wish to express our appreciation to the editor and the reviewer of our previously submitted manuscript for their valuable comments. The comments have helped us significantly improve the manuscript. We have revised the previous manuscript based on the comments and suggestions. The revised parts are shown in boldface type.

1. In this paper, the authors propose a method for scanning the structure of MongoDB documents to recommend reconstruction candidates with respect to a set of features - to improve document design. Such features include the hierarchical and horizontal relationships between the different data items in a JSON-format MongoDB document. The proposed method utilizes natural language processing to detect these features and relationships and then recommend a reconstruction to the user.

The proposed solution is very promising and can be utilized in other NoSQL databases that follow similar schemas. For better quality, I recommend that the authors provide clearer versions of Figures 2 and 4.
⇒ We have replaced Fig. 1-5 with finer images.

Reviewer 3 Report

Comments and Suggestions for Authors

Content

----------

The primary focus of this paper lies in introducing a method for reconstructing MongoDB document design, along with the tools that facilitate its implementation. The proposed method reconstructs the data structure in MongoDB documents based on the distance between field names, using a method that measures word distances in natural language processing based on the hypothesis. Through the utilization of this tool, the experimental data yields significant outcomes, demonstrating a notable improvement in document name cohesion, from 0.82 to 0.31. This substantial decrease serves as conclusive evidence validating the effectiveness of the experimental method.

Comments

--------------

1. It is customary to explain technical terms when they first appear in a paper, as this helps readers follow the content. Therefore, it is advisable to include an explanation for the terms "RDB/has-a/LogDice..." so that readers have a clear understanding of their meaning and context within the paper.

2. The majority of the experimental images included in your paper are of substandard quality. Kindly endeavor to enhance their clarity and resolution, if feasible, to ensure their effectiveness in communicating your research findings.

4. The currently proposed methodology rests upon two hypotheses, necessitating future endeavors to meticulously assess their alignment with practical necessities. Should the empirical outcomes reveal a disconnect between these presuppositions and the realities of application, a reconsideration of their rationality would be imperative.

5. Please ensure that the symbols adhere to the grammatical rules and are spelled correctly, for example at line 164.

6. Kindly verify and confirm the validity of the document link provided, for example the link at line 106.

Evaluation

--------------

Given the above, I recommend major revision.

Comments on the Quality of English Language

Minor editing of English language required

Author Response

We wish to express our appreciation to the editor and the reviewer of our previously submitted manuscript for their valuable comments. The comments have helped us significantly improve the manuscript. We have revised the previous manuscript based on the comments and suggestions. The revised parts are shown in boldface type.

1. It is customary to explain technical terms when they first appear in a paper, as this helps readers follow the content. Therefore, it is advisable to include an explanation for the terms "RDB/has-a/LogDice..." so that readers have a clear understanding of their meaning and context within the paper.

⇒ We have replaced Fig. 1-5 with finer images.

3. The article stipulates that the developed tools are operationalized via a web API, yet the accessibility of this API remains ambiguous. If indeed it is available for public usage, I kindly request that it be explicitly indicated in the paper, thereby facilitating other researchers in replicating and validating the experiment.
⇒ The tool that we have developed is not yet publicly available. We will open the tool soon and provide an environment where anybody can use it. We have added this description at line 613, p.17.

5. Please ensure that the symbols adhere to the grammatical rules and are spelled correctly, for example at line 164. Kindly verify and confirm the validity of the document link provided, for example the link at line 106.
⇒ We have corrected the find query at lines 171-172, p.4.

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

Content

----------

Comments

--------------

1. As the link to document1 referenced in line 106 is currently inactive, you could consider citing it as a reference, as is commonly done for web pages that are updated over time. However, if access remains unattainable for other reasons, kindly explain why.

Evaluation

--------------

After the initial revision, the main problems I had raised with the paper have been addressed. The authors have fully responded to the changes proposed in the first review. There is still one small problem to be solved.

Given the above, I recommend minor revision.

Comments on the Quality of English Language

Minor editing of English language required

Author Response

1. The URL of the link to document1 was incorrect and has been corrected as follows:

https://www.mongodb.com/docs/manual/tutorial/model-embedded-one-to-one-relationships-between documents/

↓

https://www.mongodb.com/docs/manual/tutorial/model-embedded-one-to-one-relationships-between-documents/

Author Response File: Author Response.pdf

A MongoDB Document Reconstruction Support System Using Natural Language Processing
