Article
Peer-Review Record

QRB-tree Indexing: Optimized Spatial Index Expanding upon the QR-tree Index

ISPRS Int. J. Geo-Inf. 2021, 10(11), 727; https://doi.org/10.3390/ijgi10110727
by Jieqing Yu 1,2, Yi Wei 2,*, Qi Chu 2 and Lixin Wu 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 1 August 2021 / Revised: 6 October 2021 / Accepted: 11 October 2021 / Published: 27 October 2021

Round 1

Reviewer 1 Report

This paper presents a new framework to store and process spatial regions in big data systems. The authors base their proposal on the well-known QR-tree index method, extending it in a new methodology called QRB-index.

The main topic of this paper, in my opinion, is very interesting and has a good level of innovation. The paper is well structured, and the proposal is well introduced and explained with high accuracy. The experimental results are presented exhaustively and prove the effectiveness of the proposal.

I have only a few minor suggestions for the authors that, from my point of view, will improve the paper:

1) Section 2. Please clarify the basic concept of the R-tree. The proposed procedure (QRB-tree index) is well explained, but for the inexpert reader the explanation of the R-tree concept is insufficient. For example, in Figure 1, what do the "R*" leaves of an R-tree mean?

2) Section 2.1. First statement. "A QR tree index is composed of many R-tree indices; these R-tree indices fundamentally process the region queries. Since the volume of an index in a big data system might be very large, only a portion of the index, that related to the query region, can be loaded into memory for searching." To address the first suggestion, please consider moving these sentences into Section 2 and rewriting them.

3) Section 2.1. Second statement. "The R-tree loading procedure is typically implemented by a recursive process (see the pseudocode presented in algorithm I), where child nodes are recursively loaded from the disk into memory". Is there a reference in the literature that supports this statement? If so, please provide it; otherwise, justify why the recursive approach fits the R-tree loading procedure.

4) General. Please add a new figure that gives the reader a general overview of the proposal. That is, this figure should explain (in a summarized way) how the loading process, the insertion algorithm, and the search algorithm work.

5) If possible, please provide some information about the implementation of the proposal, that is, the programming language, software tools, etc. Ideally, also provide the source code of the implementation (for example, via a GitHub repo).

Author Response

Dear reviewer,
We adjusted the structure of the manuscript and had it re-edited by a native English speaker. Changes to the structure and language can be viewed in track-changes mode. Corrections made in response to the reviewers’ suggestions are highlighted with a yellow background.

Reviewer #1

This paper presents a new framework to store and process spatial regions in big data systems. The authors base their proposal on the well-known QR-tree index method, extending it in a new methodology called QRB-index. The main topic of this paper, in my opinion, is very interesting and has a good level of innovation. The paper is well structured, and the proposal is well introduced and explained with high accuracy. The experimental results are presented exhaustively and prove the effectiveness of the proposal.

Thanks for your comments.

I have only a few minor suggestions for the authors that, from my point of view, will improve the paper:

1) Section 2. Please clarify the basic concept of the R-tree. The proposed procedure (QRB-tree index) is well explained, but for the inexpert reader the explanation of the R-tree concept is insufficient. For example, in Figure 1, what do the "R*" leaves of an R-tree mean?

Thanks for your suggestions. We added an explanation of the R-tree in Section 2 (see lines 115 to 116) and replaced ‘R*’ with ‘M*’ to avoid any misunderstanding. The meaning of ‘M*’ is given in the caption of Figure 1 (see line 142).

2) Section 2.1. First statement. "A QR tree index is composed of many R-tree indices; these R-tree indices fundamentally process the region queries. Since the volume of an index in a big data system might be very large, only a portion of the index, that related to the query region, can be loaded into memory for searching." To address the first suggestion, please consider moving these sentences into Section 2 and rewriting them.

Thanks for your suggestions. We rewrote the first sentence of the statement and moved it into the first paragraph of Section 2 (see lines 116 to 118). The rest of the statement is unchanged, as we wish to emphasize that the R-trees in a QR-tree are loaded dynamically rather than in advance.

3) Section 2.1. Second statement. "The R-tree loading procedure is typically implemented by a recursive process (see the pseudocode presented in algorithm I), where child nodes are recursively loaded from the disk into memory". Is there a reference in the literature that supports this statement? If so, please provide it; otherwise, justify why the recursive approach fits the R-tree loading procedure.

Thanks for your suggestions. We added example references on the implementation (see lines 164 to 165).
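
For readers of this record, the recursive loading discussed in the comment above can be illustrated with a minimal sketch. It is not the authors' Algorithm I or their released code: the `Node` class, the `DISK` dictionary standing in for disk storage, and the `load_subtree` and `count_loaded` functions are hypothetical names, and the sketch assumes each stored node records its minimum bounding rectangle (MBR) and the ids of its children.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def intersects(a: Rect, b: Rect) -> bool:
    """Return True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    node_id: int
    mbr: Rect
    feature_ids: List[int] = field(default_factory=list)  # non-empty for leaves
    children: List["Node"] = field(default_factory=list)  # nodes loaded in memory


def load_subtree(disk: Dict[int, dict], node_id: int, query: Rect) -> Optional[Node]:
    """Recursively load only the nodes whose MBR intersects the query region."""
    record = disk[node_id]  # hypothetical stand-in for one disk read
    if not intersects(record["mbr"], query):
        return None  # prune: nothing below this node can match the query
    node = Node(node_id, record["mbr"], list(record.get("feature_ids", [])))
    for child_id in record.get("child_ids", []):
        child = load_subtree(disk, child_id, query)
        if child is not None:
            node.children.append(child)
    return node


def count_loaded(node: Optional[Node]) -> int:
    """Count the nodes that were brought into memory."""
    if node is None:
        return 0
    return 1 + sum(count_loaded(child) for child in node.children)


# Illustrative data only: a root with two leaves.
DISK = {
    0: {"mbr": (0, 0, 10, 10), "child_ids": [1, 2]},
    1: {"mbr": (0, 0, 4, 4), "feature_ids": [101, 102]},
    2: {"mbr": (6, 6, 10, 10), "feature_ids": [103]},
}

root = load_subtree(DISK, 0, (1, 1, 2, 2))
print(count_loaded(root))  # 2 of the 3 stored nodes are loaded
```

The recursion mirrors the tree structure, which is one reason a recursive formulation is a natural fit. For simplicity this sketch reads a child's record before testing its MBR, whereas an R-tree that keeps child MBRs in the parent entry could prune before reading the child.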

4) General. Please add a new figure that gives the reader a general overview of the proposal. That is, this figure should explain (in a summarized way) how the loading process, the insertion algorithm, and the search algorithm work.

Thanks for your suggestion. We added a new figure to explain the general idea of the QRB-tree index. Please kindly see Figure 2 in the revised manuscript.

5) If possible, please provide some information about the implementation of the proposal, that is, the programming language, software tools, etc. Ideally, also provide the source code of the implementation (for example, via a GitHub repo).

Thanks for your suggestion. The source code has been uploaded to a GitHub repository. The URL is given in the “Data and Code Availability Statement” section (see line 511).

Reviewer 2 Report

The paper shows an organized and strong knowledge of the most well-known non-distributed spatial indexes in the literature. However, the paper shows less confidence in and knowledge of distributed spatial indexes. The limitations of distributed spatial indexes and other competitors are not clear in this study; they have been neither investigated nor experimentally tested.

The paper claims that processing big data using a non-distributed spatial index remains sensible. It is not clear what the paper means. With big data, many research studies from both industry and academia have investigated and developed distributed spatial indexes to deal with big spatial data. The paper does not support its claim with solid evidence or a comprehensive comparison that shows the limitations and highlights the novelty of this research study.

I suggest that the paper should not motivate the need for the proposed QRB-tree index with big data. A scientific and fair comparison is one made between one non-distributed index and another. A more reasonable presentation would motivate the need for the introduced tree index and the type of applications that use it.

The experiment needs to include a comparison between other non-distributed indexing techniques and the proposed one.

**Strong points**

S1: A comprehensive overview of spatial indexes in the literature is given in the introduction.

**Weak points**

W1: The presentation of this paper needs improvement. Extensive English editing is needed. Kindly see (D1).

W2: The organization of the paper needs major improvement. For example, the flow of ideas between paragraphs is very hard to follow, and it is hard to understand what the paper is really trying to convey.

W3: The limitations of some competitor approaches are not clear.

W4: Appropriate related work is not referenced. In other words, strongly related work in the field of distributed spatial indexing is missing. References used

W5: The motivation of this work is not clear. Kindly see (D2) and (D3).

**Detailed/Technical Points**

D1: The presentation of the paper is hard to follow and read. For example, several ideas pop up within the same paragraph. In addition, the transitions between ideas are not smooth, and the ideas are not easy to connect with prior sentences.

D2: It is not clear whether the paper addresses spatial indexing in big data. For example, the paper states that interested readers who want to read more about spatial indexing can consult reference [19]. The referenced paper mainly surveys spatial indexes on the MapReduce platform. It is not clear whether the index proposed in this paper targets the same platform, especially as none of the previously mentioned indexes are implemented on big data platforms.

D3: The paper contradicts itself in a single statement (line 74, one solution.....). The paper claims that distributed spatial indexing results in much faster query processing, but that the performance is bounded by the number of computing nodes. This is not clear and needs to be explained.

Author Response

Dear reviewer,
We adjusted the structure of the manuscript and had it re-edited by a native English speaker. Changes to the structure and language can be viewed in track-changes mode. Corrections made in response to the reviewers’ suggestions are highlighted with a yellow background.

Reviewer #4

Comments and Suggestions for Authors

The paper shows an organized and strong knowledge of the most well-known non-distributed spatial indexes in the literature. However, the paper shows less confidence in and knowledge of distributed spatial indexes. The limitations of distributed spatial indexes and other competitors are not clear in this study; they have been neither investigated nor experimentally tested.

Thanks for your suggestions. The QRB-tree is a non-distributed spatial index; therefore, it seems unnecessary to test distributed spatial indexes. We did give a survey of non-distributed spatial indices in the second paragraph of Section 1, where the limitations of the grid-based and tree-based indices are presented (see lines 57 and 60). The missing part (i.e., the limitations) for the hybrid indices was added in the revised manuscript (see lines 78 to 80). The limitations of the distributed spatial indices were already provided but have now been revised according to comment “D3” (see lines 95 to 98).

The paper claims that processing big data using a non-distributed spatial index remains sensible. It is not clear what the paper means. With big data, many research studies from both industry and academia have investigated and developed distributed spatial indexes to deal with big spatial data. The paper does not support its claim with solid evidence or a comprehensive comparison that shows the limitations and highlights the novelty of this research study.

Thanks for your suggestions and comments. It was careless of us to make that claim. In practice, it is impossible to adopt a non-distributed index in a big data system. We removed the words “big data” to avoid any misunderstanding and revised the corresponding statements (see lines 89 to 90).

As the solution concepts and techniques are similar, the performance of a distributing spatial index is likely influenced by that of the nondistributing version. Therefore, the development of a more efficient nondistributing spatial index can likely help enhance the efficiency of the distributing spatial index. This paper proposes a nondistributing spatial index, named the QRB-tree index, with two optimizations of the QR-tree index. The proposed spatial index is compared with well-known nondistributing spatial indices, e.g., the index in PostGIS, the GeoHash index, and the QR-tree index. Results show that the proposed index outperforms the rest.

I suggest that the paper should not motivate the need for the proposed QRB-tree index with big data. A scientific and fair comparison is one made between one non-distributed index and another. A more reasonable presentation would motivate the need for the introduced tree index and the type of applications that use it.

Thanks for your suggestions and comments. Indeed, motivating the work with big data was not a good idea. We have revised the motivation of this paper; see the first sentence of the abstract (line 11) and the fourth paragraph of Section 1 (lines 89 to 91).

This manuscript is intended to develop a non-distributed spatial index. In the fourth section, we did compare our newly developed index with other non-distributed spatial indices, including the QR-tree index, GeoHash, and the index implemented by PostGIS. All the tests were run on the same dataset and on the same computer. We believe it is a fair comparison. Please let us know if there is any problem.

The experiment needs to include a comparison between other non-distributed indexing techniques and the proposed one.

Thanks for your suggestions and comments. All the indices used for comparison are non-distributed, as is the newly developed index in this manuscript. We have explicitly pointed this out (see line 398) to avoid any misunderstanding.

**Strong points**

S1: A comprehensive overview of spatial indexes in the literature is given in the introduction.

Thanks for your comments.

**Weak points**

W1: The presentation of this paper needs improvement. Extensive English editing is needed. Kindly see (D1).

Thanks for your suggestions and comments. We are sorry for this problem. The manuscript was in fact edited by American Journal Experts (AJE), a company that provides English editing services, before the review process. To make the manuscript more readable, we made some corrections and had it re-edited by AJE.

W2: The organization of the paper needs major improvement. For example, the flow of ideas between paragraphs is very hard to follow, and it is hard to understand what the paper is really trying to convey.

Thanks for your suggestions and comments. We have made improvements according to your advice and hope the paper is now easier to follow.

W3: The limitations of some competitor approaches are not clear.

Thanks for your suggestions and comments. We have addressed this issue. For details, please see our reply to the first comment.

W4: Appropriate related work is not referenced. In other words, strongly related work in the field of distributed spatial indexing is missing. References used

Thanks for your suggestions and comments. This manuscript aims to develop a non-distributed spatial index. Therefore, distributed spatial indices are not the focus of this manuscript. We discuss them in the introduction because a non-distributed spatial index is, most of the time, the basis of a distributed spatial index. There is no reason to compare the non-distributed spatial index with a distributed spatial index. Nonetheless, we updated two references regarding non-distributed spatial indices (see line 91 for references 38 and 39).

W5: The motivation of this work is not clear. Kindly see (D2) and (D3).

Thanks for your suggestions and comments. We have addressed this issue. For details, please see our reply to the second comment.

**Detailed/Technical Points**

D1: The presentation of the paper is hard to follow and read. For example, several ideas pop up within the same paragraph. In addition, the transitions between ideas are not smooth, and the ideas are not easy to connect with prior sentences.

Thanks for your suggestions and comments. We have made improvements according to your advice and hope the paper is now easier to follow.

D2: It is not clear whether the paper addresses spatial indexing in big data. For example, the paper states that interested readers who want to read more about spatial indexing can consult reference [19]. The referenced paper mainly surveys spatial indexes on the MapReduce platform. It is not clear whether the index proposed in this paper targets the same platform, especially as none of the previously mentioned indexes are implemented on big data platforms.

Thanks for your suggestions and comments. We have removed the “big data” scenario throughout the manuscript. The newly developed spatial index in this manuscript is not designed directly for big data scenarios, but for scenarios with a large number of features, e.g., 100 million features (see the simulated dataset in Section 3.1 for an example). Currently, distributed spatial indices are adopted in big data systems. The new (non-distributed) spatial index has the potential to become a distributed index with a slight modification, in the same way that the non-distributed QR-tree was turned into a distributed QR-tree in reference [44]; this will be part of our work in the near future (see lines 497 to 498).

Traditional spatial indices, as well as MapReduce spatial indices, are surveyed in reference [19] (now numbered [22]). Please see the second section of that reference for the survey of traditional spatial indices, which are in fact non-distributed spatial indices.

D3: The paper contradicts itself in a single statement (line 74, one solution.....). The paper claims that distributed spatial indexing results in much faster query processing, but that the performance is bounded by the number of computing nodes. This is not clear and needs to be explained.

Thanks for your suggestions and comments. We have revised it. Please see lines 94 to 95.

Reviewer 3 Report

The manuscript entitled "QRB-tree indexing: An optimised spatial index expanding upon the QR-tree index" presents the proposal and empirical validation of a new spatial index based on an improvement of the quad-tree R-tree index introducing the use of a bucket that speeds up the optimisation procedures. This improvement depends on the choice of the value l associated to the grid. The choice of this level is decisive to obtain better results.

The proposal is formally defined, justified and tested.

The reference literature used is relevant and up to date.

The method, logic and results support the contribution to the improvement of spatial searches in large databases.

Author Response

Dear reviewer,
We adjusted the structure of the manuscript and had it re-edited by a native English speaker. Changes to the structure and language can be viewed in track-changes mode. Corrections made in response to the reviewers’ suggestions are highlighted with a yellow background.

Reviewer #2

Comments and Suggestions for Authors

The manuscript entitled "QRB-tree indexing: An optimised spatial index expanding upon the QR-tree index" presents the proposal and empirical validation of a new spatial index based on an improvement of the quad-tree R-tree index introducing the use of a bucket that speeds up the optimisation procedures. This improvement depends on the choice of the value l associated to the grid. The choice of this level is decisive to obtain better results.

Thanks for your comments.

The proposal is formally defined, justified and tested.

Thanks for your comments.

The reference literature used is relevant and up to date.

Thanks for your comments.

The method, logic and results support the contribution to the improvement of spatial searches in large databases.

Thanks for your comments.

Author Response File: Author Response.pdf

Reviewer 4 Report

The theme of spatial indexing is very important for the processing of large datasets. Any new method that improves the performance of feature selection is a benefit for GIS teams.

As the QRB-tree indexing method presented in the article is newly designed, the article is difficult to read, and it would be very beneficial to improve this aspect.

An interesting point would be to test the method by comparing it with the indexing methods implemented in the most popular spatial databases (e.g., Oracle Spatial, PostGIS), which should be able to process such amounts of data as a daily routine.

Some comments to the text itself:

  • rows 109-110 - Double check the sentence: "We shall refer to the R-tree that is bound to a grid as the associated R-tree of the grid and the grid as the associated grid of the R-tree."
  • row 177 - The explanation of the term "QRB-tree index" appears too late in the text, even though the acronym is used from the beginning of the text.
  • rows 217-218 - I suppose there should be "non-corner features" instead of "non-features" in "Accordingly, we can sort these subordinate features into corner features and non-features."
  • rows 299-300 - Some indication of the scale of the rectangles would be useful. The reader cannot tell whether rectangle No. 4 in Figure 5(a) is 1 meter or 1 degree across: "Figure 5(a) illustrates the extents of the six rectangles, and Figure 5(b) gives a close-up view of the simulated dataset on the smallest rectangle." The same applies when speaking about "a small feature" on row 288.

The references are heavily oriented toward articles by authors from the same region.

Author Response

Dear reviewer,
We adjusted the structure of the manuscript and had it re-edited by a native English speaker. Changes to the structure and language can be viewed in track-changes mode. Corrections made in response to the reviewers’ suggestions are highlighted with a yellow background.

Reviewer #3

Comments and Suggestions for Authors

The theme of spatial indexing is very important for the processing of large datasets. Any new method that improves the performance of feature selection is a benefit for GIS teams.

Thanks for your comments.

As the QRB-tree indexing method presented in the article is newly designed, the article is difficult to read, and it would be very beneficial to improve this aspect.

Thanks for your suggestions and comments. We have made improvements according to your advice and had the manuscript re-edited by a native English speaker. We hope it is readable now.

An interesting point would be to test the method by comparing it with the indexing methods implemented in the most popular spatial databases (e.g., Oracle Spatial, PostGIS), which should be able to process such amounts of data as a daily routine.

Thanks for your suggestions. This article already includes a comparison with the implementation in PostGIS (see Figure 13). The GeoHash index (tGH(*) in Figure 13) clearly outperforms the implementation in PostGIS (tpg in Figure 13). Consequently, the rest of the article only compares the QRB-tree index with the GeoHash index and the QR-tree index.

Some comments to the text itself:

  • rows 109-110 - Double check the sentence: "We shall refer to the R-tree that is bound to a grid as the associated R-tree of the grid and the grid as the associated grid of the R-tree."

Thanks for your suggestions. We confirm that the definition here is correct.

  • row 177 - The explanation of the term "QRB-tree index" appears too late in the text, even though the acronym is used from the beginning of the text.

Thanks for your suggestions. We moved the explanation of the term "QRB-tree index" to the last paragraph of Section 1 (see line 102).

  • rows 217-218 - I suppose there should be "non-corner features" instead of "non-features" in "Accordingly, we can sort these subordinate features into corner features and non-features."

Thanks for your suggestions. We have corrected it (see line 231).

  • rows 299-300 - Some indication of the scale of the rectangles would be useful. The reader cannot tell whether rectangle No. 4 in Figure 5(a) is 1 meter or 1 degree across: "Figure 5(a) illustrates the extents of the six rectangles, and Figure 5(b) gives a close-up view of the simulated dataset on the smallest rectangle." The same applies when speaking about "a small feature" on row 288.

Thanks for your suggestions. We added a scale bar to the bottom of Figure 6 and revised “a small feature” to “a smaller feature” (see line 297).

The references are heavily oriented toward articles by authors from the same region.

Thanks for your suggestions. We have made some changes to the references according to your advice.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

I want to thank the authors for addressing all concerns raised and modifying the manuscript accordingly.

Overall, the presentation of the paper is much clearer now. Promoting the proposed technique as an optimization of a non-distributed index structure and removing "Big Data indexes" and motivations from the manuscript both highlight the novelty and the significance of this research study.

The revised version of the manuscript is much better now, and it gives more insight that there is no need to compare with big spatial indexes. 

Author Response

Reviewer #2

Comments and Suggestions for Authors

I want to thank the authors for addressing all concerns raised and modifying the manuscript accordingly.

Thanks for your comments.

Overall, the presentation of the paper is much clearer now. Promoting the proposed technique as an optimization of a non-distributed index structure and removing "Big Data indexes" and motivations from the manuscript both highlight the novelty and the significance of this research study.

Thanks for your comments.

The revised version of the manuscript is much better now, and it gives more insight that there is no need to compare with big spatial indexes. 

Thanks for your comments.
