Article
Peer-Review Record

A Framework Uniting Ontology-Based Geodata Integration and Geovisual Analytics

ISPRS Int. J. Geo-Inf. 2020, 9(8), 474; https://doi.org/10.3390/ijgi9080474
by Linfang Ding 1,2, Guohui Xiao 1,3,*, Diego Calvanese 1,3,4 and Liqiu Meng 2
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 10 May 2020 / Revised: 6 July 2020 / Accepted: 27 July 2020 / Published: 28 July 2020

Round 1

Reviewer 1 Report

The paper is well written and presents the results of a significant research effort. It is well organized and draws on and, more importantly, builds appropriately on previous work. 

My minor suggestions are to:

  • present previous work in the form of a table where the contributions and shortcomings would be indicated; that would help make a clearer case of how this study fills the knowledge gap
  • be more explicit about the relationship to existing standards, and the ones that could be developed based on this research
  • in the two case studies, clarify what the criteria are for evaluating the soundness of the proposed framework; the introduction suggests that it compensates for previous work which had 'ad hoc semantics' and 'limited expressivity'; what do these two mean? how are they concretely improved via this method in the traffic and meteorology cases? the 'value added' needs to be operationalized and used in the empirical trials to demonstrate the difference this new framework makes
  • the framework seems to be driven by induction - it works from the heterogeneous data to build a common ontology; usually ontologies are universal (to the extent possible for a domain) and work deductively (fitting or translating a particular to a general); could the author(s) speak to this? is this framework a de facto tool for making sense (ontologically) of a diverse data set? is the outcome unique for each set of data being integrated from various sources?
  • some of the abbreviations are not spelled out at the first mentioning; it would be good to do that even if the authors assume they are well known and widely used

Author Response

Thanks a lot for the suggestions. Below we list how we address them.

- Comment: present previous work in the form of a table where the contributions and shortcomings would be indicated; that would help make a clearer case of how this study fills the knowledge gap; be more explicit about the relationship to existing standards, and the ones that could be developed based on this research

- Answer: We have improved the discussion comparing with existing work in Sec 2. Specifically, we have added some discussion in each subsection of Sec 2. However, we did not manage to produce a table, because it would require a major effort to identify all the features that would make such a table meaningful, which is not feasible in this revision.


- Comment: in the two case studies, clarify what are the criteria for
evaluating the soundness of the proposed framework;

- Answer: We have added a new subsection 4.5 on evaluation, where we discuss
the criteria: (1) effectiveness of our approach in retrieving data
for analysis (2) feedback from the users and the outcome.


- Comment: introduction suggests that it compensates for previous work which had 'ad hoc semantics' and 'limited expressivity'; what do these two mean?

- Answer: We have moved this paragraph to Sec 2, where we have more space for the comparison. "Ad hoc semantics" and "limited expressivity" refer to not using a standard ontology or OWL reasoning.


- Comment: how are they concretely improved via this method in the traffic and meteorology case?

- Answer: Both weather and traffic data are modeled using the SSN and GeoSPARQL ontologies with extensions, and it becomes possible to answer queries and perform analysis over the combined domain using a common vocabulary. We hope the new evaluation subsection 4.5 addresses this issue.
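
To make this concrete, once both datasets share the SOSA/SSN and GeoSPARQL vocabulary, a cross-domain query of roughly the following shape becomes possible (a hedged sketch; the exact modelling in the paper's ontology may differ):

```sparql
PREFIX sosa: <http://www.w3.org/ns/sosa/>
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>

# Retrieve any observation (weather or traffic) together with
# the geometry of the feature it is about, using only the
# shared vocabulary rather than source-specific schemas.
SELECT ?obs ?result ?wkt WHERE {
  ?obs  a sosa:Observation ;
        sosa:hasSimpleResult      ?result ;
        sosa:hasFeatureOfInterest ?foi .
  ?foi  geo:hasGeometry ?geom .
  ?geom geo:asWKT       ?wkt .
}
```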


- Comment: the 'value added' needs to be operationalized and used in the empirical trials to demonstrate the difference this new framework makes.

- Answer: We have added a paragraph on the feedback from stakeholders. In particular, we describe two big industrial projects (IDEE, ODH) that use the current framework as their core technology.


- Comment: the framework seems to be driven by induction - it works from the heterogeneous data to build a common ontology; usually ontologies are universal (to the extent possible for a domain) and work deductively (fitting or translating a particular to a general); could the author(s) speak to this?

- Answer: This is a very good observation. In fact, both induction and deduction are used. When designing an OBDA specification, we start with the data sources and a few standard ontologies. On the one hand, we need to create mappings from the data to the ontology (inductively), and on the other hand, we use the ontology to find relevant data and to guide the mapping design and ontology extension (deductively). In other words, the process of constructing an OBDA specification is iterative, and we need to work in both directions (bottom-up and top-down). We have added some discussion in Sec 3.1.
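
As an illustration of the bottom-up (inductive) direction, a single mapping in W3C R2RML might look as follows; the table and column names are hypothetical, not taken from the paper:

```turtle
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .

# Map each row of a hypothetical "measurements" table to a
# virtual sosa:Observation; the ontology side (the deductive
# direction) then guides which further columns to expose.
<#ObservationMap> a rr:TriplesMap ;
  rr:logicalTable [ rr:tableName "measurements" ] ;
  rr:subjectMap [
    rr:template "http://example.org/obs/{id}" ;
    rr:class    sosa:Observation
  ] ;
  rr:predicateObjectMap [
    rr:predicate sosa:hasSimpleResult ;
    rr:objectMap [ rr:column "value" ]
  ] .
```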

- Comment: is this framework a de facto tool for making sense (ontologically)
of a diverse data set?

- Answer: We believe that our framework will become a de facto
tool. As we have added in the evaluation subsection, we have already
applied it to real industrial projects.

 

- Comment: is the outcome unique for each set of data that is being
integrated from various sources?

- Answer: Once the mapping is fixed, the outcome is unique.


- Comment: some of the abbreviations are not spelled out at the first mentioning; it would be good to do that even if the authors assume they are well known and widely used

- Answer: We have spelled out W3C, RDF, OWL, and OGC, when they first appear.

Reviewer 2 Report

The paper is very well written, nice and interesting to read. It presents an interesting application of ontologies and semantic technologies for the geographic information domain. The topic of the paper fits perfectly to the journal in general and the special issue it was submitted to. The application is well-described, at a high level, with examples and screenshots. 

However, the paper does not fulfil some of the requirements in the journal's call for papers. In particular, the ontology, data, and tool built do not seem to be available online, neither are they described at a level of technical detail to be possibly reconstructed, hence, the results of the paper are not really reproducible by others. My suggestion would be to publish at least the ontology and "cleaned" datasets online, but preferably also the demo software, so that others can use them to compare their approaches to this one. 

Further, there is no scientifically sound evaluation in the paper, hence, it is hard to say if the proposed approach actually works or not. The authors do establish that this tool/method CAN be used to integrate, and present data, but the claim that the "approach is effective for exploration and understanding" is unsubstantiated, since no experiment is conducted to study the effectiveness of the approach. Even an anecdotal comparison to some other way of integrating and using the same data sources would have helped here.

Finally, the actual novelty of the proposed approach is not obvious. The authors do discuss related work quite extensively, both in the introduction to the paper, and in section 2. However, in several of these sections it is not made clear how the proposed approach relates to this work, i.e. the authors do not clearly state how their work is different from the previous work, what is novel here, etc. I miss a detailed comparison to alternative approaches, that would establish the novelty, specifically for the geographic information domain. 

 

More in detail, I have the following comments/questions for specific parts of the paper:

The first two paragraphs of the introduction contain several quotes that I find unnecessary. The authors should try to describe the content of these in their own words instead. 

The example on lines 58-61 is not given any context, e.g. what is a "wellbore"? Please explain domain and example better.

Line 77: are there no more recent examples? These seem quite old (the most recent one is 14 years old).

Lines 88-89: So why are these approaches not sufficient? How do they compare to the one proposed here? It seems strange to state that approaches exist, and then in the next paragraph say that there is a gap to be filled.

Line 97 and other places in the introduction: The term "standard ontologies" is used here and there, but what does it mean? Ontologies proposed/endorsed by the W3C? Or other kinds of de facto standards? Or only those listed on lines 103-105? 

The sentence on lines 120-121 is quite vague, what does "the flexibility of discovering patterns with the support of computational power and human reasoning" actually mean?

On lines 172-175 OBDI is defined as an extension of OBDA where data is in multiple data sources that are queried in an integrated way. However, this term also seems to be used later in the paper for cases when only one source is queried. Hence, the difference between OBDA and OBDI is not so clear in the paper. 

The last paragraph of section 2.3 is a mix of systems/approaches targeting geographic data and more general approaches, e.g. for visual querying. I suggest that these are separated, and that the authors more clearly discuss how the general approaches would be applied to geographical data. 

One thing I miss completely in section 2.4 is stream reasoning and RDF stream processing, which is frequently used to analyse sensor data streams. Some stream reasoning approaches also use ontologies, and the RDF stream processing community has built a language (RSP-QL) on top of SPARQL for expressing patterns to detect. It seems a bit strange that none of this is mentioned, when rules are discussed at the end of section 2.4.

Another question that arises when reading section 2.4 is: is standard SPARQL really enough? This is something that is simply taken for granted by the authors, but not really discussed. Some of the stream reasoning approaches, for instance, use elaborate temporal reasoning for analysing sensor data. Why do you think that this is not needed in your case?

The semantics of Figure 1 is not clear. Are the boxes software components? Or just conceptual "steps" in a process? What do the arrows stand for? Dashed arrows? The dashed box?  

Section 3.1, first paragraph: I am not exactly sure how phase 1 fits together with the definition of OBDI that was given earlier. If OBDI means that sources are not actually integrated, but only queried through a virtual view, then why do you need phase 1 (the physical integration)? And if you do, then please explain better what kind of data cleaning means, and how that is done, e.g. in the context of this approach.

Line 276-277: What does "This process" refer to here?

Lines 278-282: I think that the mapping construction can be quite a complex process, so I am not so sure about describing it as a "lightweight" step.

Section 3.2, first paragraph: I am not sure it is obvious that all of these "standard" ontologies, such as SSN, consist of only concepts that are intuitive to the end user, e.g. a geologist or similar, as mentioned earlier. I know several concepts in SSN, for instance, that are quite hard to grasp, such as the "FeatureOfInterest", and the use of "Result" vs. the "hasSimpleResult" property. How did you arrive at the conclusion that such concepts are intuitive for the end user?

Overall, the whole section 3.2 also misses references. Does this mean that this part is not as founded on existing work as the other parts of the system? Or why are (almost) no related or previous work in visual analytics referenced here? In particular, I would like to see some motivation in existing work on the choice of visualisations that is explained in the last paragraph of 3.2.

Line 312: Here it is stated that "the queries naturally have a graphical representation". This is not obvious to me. What is that graphical representation? You mean that the simple graph patterns in the query have a natural graphical representation? Ok, I can agree with that. But what about filters? What about optionals? Or path expressions? I can see many cases where it is not obvious to me how to illustrate the query graphically. I think this statement deserves to be expanded upon, i.e. explaining what parts of the query, and/or what graphical representation you have used, in case there is previous literature on how to illustrate SPARQL graphically, which I am sure there is.

Section 4: please publish the ontology online so that it can be reviewed along with the paper, but also for others to reproduce and/or reuse this work. The same goes for any modified datasets, and it would also be useful to see the source code itself.  

Lines 352-353: So if you physically integrate this and store the data in the same database, then how is this OBDI? It is not according to how you defined it earlier. 

Line 359: What does "basic vocabularies" mean here? Since "vocabulary" is usually used as a synonym to a "light weight" ontology, it is a bit strange to call the classes of an ontology "vocabulary" - so if you mean the single elements, then why not "vocabulary elements" or "basic concepts"? I am also not sure what is meant by "basic" here - do you mean general/abstract? Or something else? Core concepts?

Please label the listing in section 4.2.1 so that it can be referenced in the text. Also it would be nice to include the base prefix/URI of the ontology somewhere. Further, the other prefixes are not explained until the end of that page, i.e. around line 393-394, while they are already used here. sosa: is also not explained as a prefix at all. 

In the same section, 4.2.1, the notion of a "grid" is introduced, however, this does not really correspond to my intuitive understanding of the term "grid". In the paper the term "grid" seems to refer to a single square delimited by the grid lines, while in my understanding the term "grid" is used for the whole thing, i.e. the set of vertical and horizontal lines that creates a number of such squares. This should be clarified in the paper. 

Figure 4 is quite messy. I understand that it is difficult to provide an illustration without crossing edges etc., however, at least you could make sure that every label is readable by not having elements displayed on top of each other. Also, I am not sure about the highlighting, what is the reader supposed to understand from that? Further, the semantics of this image is not entirely clear, especially in relation to the data illustrations later in the paper. I assume that the ovals represent an owl:Class? Then it is pretty clear that an rdfs:subClassOf arrow between two classes represents a subsumption relation between those classes. However, then there are similar looking arrows going between other classes, but that have labels not from the RDFS/OWL vocabularies. I assume then that these are properties defined in the ontology/a "standard" ontology? But what does it mean that they connect the classes in the image? Does it indicate domain and range restrictions? Or existential/universal restrictions on the classes? This needs to be specified in the paper, and preferably the arrows having different semantics should not look the same. 

Line 403 and 405: "are the answers" does not really make sense here. Could these queries only have one or two answers? I assume not, I rather assume that you mean that these are examples of possible answers for running the queries over a specific dataset? (Which one?)

What do you mean by "often" on line 412? It seems strange that you often have to change the mapping during an actual experiment?

The query in Figure 7b is a bit longer than the one in 7a, but not by that much. Is it really length that is important (as you say on line 417)? I would think that it is rather other factors, like what you state on line 420. However, this whole part seems quite speculative and would need a reference to support the claims, or be supported by an experiment in the paper. 

Would it be possible to find a better way to present the example in Figures 5, 6 and 7? It is quite hard to follow when all the figures contain multiple queries/data snippets/illustrations. Would it be possible to instead split the figures so that we can see one mapping + data snippet + query in the same figure? It would be much easier to follow then. 

Figure 6b again has an unclear semantics. In contrast to the ontology illustration, here it seems to be an illustration of (virtual) RDF triples. Nevertheless, the exact same kind of arrows are used here as in the ontology case, so looking at the two figures together becomes quite confusing. 

In section 4.3 it would be interesting to know what parts of the interface were hand-crafted, and what parts were generated from the ontology and/or some predefined SPARQL queries? So for instance, is the data access view generated automatically? If not, that means every new dataset added would also need a new set of interface views to be developed. However, if this is generated automatically, it seems much more useful, but this is not discussed in the paper. 

Similarly, I wonder how the SPARQL query view is generated? Is this simply an exact representation of the basic graph patterns? Or a more elaborate visualisation, e.g. showing some shortcuts/path representations of some expressions?

Further, it is not exactly clear how the system determines what to put into the statistical result view at the bottom right? What aggregates to use, and so forth? Is this part of the query? Or expressed in some other way?

Line 476: How do you know that this view is intuitive? Did you study that? Or are you basing this on other previous studies? 

Overall, it is not clear why this particular set of visualisations and analysis methods were chosen, and why. This needs to be better explained here, or earlier in the paper, and supported by appropriate references. 

In section 4.4., I am not sure I understand why you call it "spatial patterns". It seems to be data about spatial features, but the patterns themselves - how are they spatial? To me they seem to be about volume and speed of traffic, not about the spatial features of the location. 

In the conclusions section it should be made clear that there was actually no evaluation reported at all in the paper - simply some kind of feasibility example, that shows the system could be built. However, claims such as "The experiment confirmed out hypothesis" are completely unsupported by the rest of the paper and need to be removed. The very last line of section 5 in fact states as future work, what the authors claimed in the introduction and abstract of the paper that this paper should be about, i.e. studying the effects of the approach. However, this has apparently not been done, and should not be claimed, also c.f. my general comment about the paper, concerning evaluation. 

The following language issues/typos were also detected:

Line 55: their -> the

Lines 114-115: change the ref style,  names should be part of the sentence and not in parenthesis.

Line 195: model -> models

Line 198: "a systematical study" or "systematical studies"

Line 199, 202 and other places in the paper as well: ontology is usually inflected when used to refer to the computer science artefact, i.e. here either "an ontology" or "ontologies", not just "ontology".

Line 220: "In the transportation ..."

Line 331: "the RDF4J workbench"

Line 344: either "the formats of" or "in formats like"

Line 365: :WeatherSation -> :WeatherStation

Line 397: "ontology vocabulary" or "ontologies' vocabularies" 

Line 409: are not needed -> do not need

The reference list seems to be in order of appearance, which is not useful when an author-year reference style is applied, then the reference list needs to be ordered in alphabetical order of the first author's name. As it is now it is completely impossible to find anything in the list. 

Author Response

Thanks a lot for the very detailed and constructive review. We have significantly revised the manuscript following the suggestions. In particular, we have added a new section on the evaluation. The details are described below.

- Comment: However, the paper does not fulfil some of the requirements in the journal's call for papers. In particular, the ontology, data, and tool built do not seem to be available online, neither are they described at a level of technical detail to be possibly reconstructed, hence, the results of the paper are not really reproducible by others. My suggestion would be to publish at least the ontology and "cleaned" datasets online, but preferably also the demo software, so that others can use them to compare their approaches to this one.

- Answer: We have published the code, the datasets, and the documentation on Github, and have put the link in the paper. Readers can run it easily.

- Comment: Further, there is no scientifically sound evaluation in the paper, hence, it is hard to say if the proposed approach actually works or not. The authors do establish that this tool/method CAN be used to integrate, and present data, but the claim that the "approach is effective for exploration and understanding" is unsubstantiated, since no experiment is conducted to study the effectiveness of the approach. Even an anecdotal comparison to some other way of integrating and using the same data sources would have helped here.

- Answer: We have added a new subsection 4.5 on evaluation, where we discuss the criteria: (1) effectiveness of our approach in retrieving data for analysis; (2) feedback from the users and the outcome.

- Comment: Finally, the actual novelty of the proposed approach is not obvious. The authors do discuss related work quite extensively, both in the introduction to the paper, and in section 2. However, in several of these sections it is not made clear how the proposed approach relates to this work, i.e. the authors do not clearly state how their work is different from the previous work, what is novel here, etc. I miss a detailed comparison to alternative approaches, that would establish the novelty, specifically for the geographic information domain.

- Answer: We have improved the discussion comparing with existing works in
Sec 2. Specifically, we have added some discussions in each subsection of Sec 2.

More in detail, I have the following comments/questions for specific parts of the paper:

- Comment: The first two paragraphs of the introduction contain several quotes that I find unnecessary. The authors should try to describe the content of these in their own words instead.

- Answer: We could easily rephrase the statements in our own words, but we believe that quoting other authors conveys a stronger message, because it shows that we are precisely addressing their concerns.


- Comment: The example on lines 58-61 is not given any context, e.g. what is a "wellbore"? Please explain domain and example better.

- Answer: We have provided further explanation to this example in the Introduction.

- Comment: Line 77: are there no more recent examples? These seem quite old (the most recent one is 14 years old).

- Answer: We have cited several recent works.

- Comment: Lines 88-89: So why are these approaches not sufficient? How do they compare to the one proposed here? It seems strange to state that approaches exist, and then in the next paragraph say that there is a gap to be filled.

- Answer: We have moved the discussion of the comparison to existing works to the related work section. The introduction is now streamlined.

- Comment: Line 97 and other places in introduction: The term "standard ontologies" is used here and there, but what does it mean? Ontologies proposed/endorsed by by the W3C? Or other kinds of de facto standards? Or only those listed on lines 103-105?

- Answer: We have clarified in our paper that standard ontologies "are standardized by standards organizations, or de-facto standards used in certain domains."


- Comment: The sentence on lines 120-121 is quite vague, what does "the flexibility of discovering patterns with the support of computational power and human reasoning" actually mean?

- Answer: We have revised this sentence to: "the flexibility of
discovering patterns with the support of high-level query answering
and user interactions guided by intuitive visualizations."


- Comment: On lines 172-175 OBDI is defined as an extension of OBDA where data is in multiple data sources that are queried in an integrated way. However, this term also seems to be used later in the paper for cases when only one source is queried. Hence, the difference between OBDA and OBDI is not so clear in the paper.

- Answer: Indeed, the relationship between OBDA and OBDI was not clear. We have now clarified in Sec 2.1: "OBDI typically requires an additional step of setting up an (integrated) database so that one can issue SQL queries to multiple datasources at the same time. This can be done by either using a SQL federation engine, e.g., Denodo or Dremio, to connect to the existing databases, or using a more straightforward `physical integration' approach to import all the datasources into one database system. After this step, OBDI maintains the same conceptual architecture as OBDA."

- Comment: The last paragraph of section 2.3 is a mix of systems/approaches targeting geographic data and more general approaches, e.g. for visual querying. I suggest that these are separated, and that the authors more clearly discuss how the general approaches would be applied to geographical data.

- Answer: We have revised the paragraph to make the focus more clear. We have also added some discussion.

- Comment: One thing I miss completely in section 2.4 is stream reasoning and RDF stream processing, which is frequently used to analyse sensor data streams. Some stream reasoning approaches also use ontologies, and the RDF stream processing community has built a language (RSP-QL) on top of SPARQL for expressing patterns to detect. It seems a bit strange that none of this is mentioned, when rules are discussed at the end of section 2.4.

- Answer: Indeed RDF stream reasoning is quite relevant, although we don't address this aspect in this paper. We have added one paragraph at the end of section 2.4.


- Comment: Another question that arises when reading section 2.4 is: is standard SPARQL really enough? This is something that is simply taken for granted by the authors, but not really discussed. Some of the stream reasoning approaches, for instance, use elaborate temporal reasoning for analysing sensor data. Why do you think that this is not needed in your case?

- Answer: Indeed, standard SPARQL is not enough. In the current work, we essentially only use SPARQL queries to retrieve data and do the analysis as postprocessing. We have added some discussion: "Also, plain SPARQL is often not expressive enough to model complex temporal patterns. Brandt et al. [68] proposed an expressive rule language based on the Metric Temporal Logic. The current paper focuses only on static data retrieved by the classical SPARQL language, and the real-time aspect and more expressive temporal queries will be studied in our future work."
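
For example, restricting observations to a fixed time window is expressible in plain SPARQL, whereas recurring patterns across time points (e.g. congestion lasting at least ten minutes) are not; the sketch below uses standard SOSA terms but hypothetical dates:

```sparql
PREFIX sosa: <http://www.w3.org/ns/sosa/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

# A static time-window filter: within reach of plain SPARQL.
# Detecting patterns *across* time points would need a more
# expressive temporal language (cf. Brandt et al. [68]).
SELECT ?obs ?result WHERE {
  ?obs sosa:resultTime      ?t ;
       sosa:hasSimpleResult ?result .
  FILTER (?t >= "2019-01-01T00:00:00"^^xsd:dateTime &&
          ?t <  "2019-01-02T00:00:00"^^xsd:dateTime)
}
```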


- Comment: The semantics of Figure 1 is not clear. Are the boxes software components? Or just conceptual "steps" in a process? What do the arrows stand for? Dashed arrows? The dashed box?

- Answer: We have revised the figure significantly. We have removed most of the boxes. Now a box simply denotes a group of artifacts, and the arrows indicate information flow.


- Comment: Section 3.1, first paragraph: I am not exactly sure how phase 1 fits together with the definition of OBDI that was given earlier. If OBDI means that sources are not actually integrated, but only queried through a virtual view, then why do you need phase 1 (the physical integration)? And if you do, then please explain better what kind of data cleaning means, and how that is done, e.g. in the context of this approach.

- Answer: In this version we have explained the difference between OBDA and OBDI in Sec 2. It should now be clear that physical integration means the step of loading the data sources into one storage system. Regarding the data cleaning step, it means the necessary steps to make sure the data can be loaded into the database. For instance, when working with Excel files, we often need to remove some header lines with metainformation from the file.

- Comment: Line 276-277: What does "This process" refer to here?

- Answer: We have revised it into "This process of mapping and ontology construction".

- Comment: Lines 278-282: I think that the mapping construction can be quite a complex process, so I am not so sure about describing it as a "lightweight" step.

- Answer: Indeed, the whole process of mapping construction is not lightweight. We have rephrased this sentence as: "(re)iteration of the ontology/mapping construction step is much more lightweight than the materialization-based approach."

- Comment: Section 3.2, first paragraph: I am not sure it is obvious that all of these "standard" ontologies, such as SSN, consist of only concepts that are intuitive to the end user, e.g. a geologist or similar, as mentioned earlier. I know several concepts in SSN, for instance, that are quite hard to grasp, such as the "FeatureOfInterest", and the use of "Result" vs. the "hasSimpleResult" property. How did you arrive at the conclusion that such concepts are intuitive for the end user?

- Answer: Indeed, standard ontologies are not necessarily easy to understand. We have two replies to this comment: (1) The SSN ontology is not necessarily easy because the domain of sensors itself is not trivial. The authors have personally seen several real use cases in large companies with wrong modelling in this domain. However, it is intuitive for end users with a good understanding of the domain. Spending some effort understanding the SSN ontology will definitely pay off, because it is an abstracted model and can later be used for all kinds of sensor data sources, e.g. sensors in turbines, trains, and even observations of the universe. (2) Thanks to the user interface, users do not need to know all the concepts from the ontology, and the interface hides some of its complexities. E.g., "FeatureOfInterest" is not visible from the interface we developed.
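
The point can be seen in the standard SOSA/SSN observation pattern sketched below (the ex: individuals are invented for illustration): an interface can surface only the sensor and the value, while the intermediate FeatureOfInterest node stays hidden.

```turtle
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .

# Full SOSA pattern behind a single "station X measured 21.5"
# row in a user interface.
ex:obs1 a sosa:Observation ;
  sosa:madeBySensor         ex:thermometer1 ;
  sosa:hasFeatureOfInterest ex:stationArea1 ;   # hidden in the UI
  sosa:observedProperty     ex:airTemperature ;
  sosa:hasSimpleResult      "21.5"^^xsd:decimal .
```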

 

- Comment: Overall, the whole section 3.2 also misses references. Does this mean that this part is not as founded on existing work as the other parts of the system? Or why are (almost) no related or previous work in visual analytics referenced here? In particular, I would like to see some motivation in existing work on the choice of visualisations that is explained in the last paragraph of 3.2.

- Answer: We have improved this paragraph by adding several citations in the text to support our claims. Most of them have actually already been discussed in Sec 2. It is now clearer how they relate to our choices.


- Comment: Line 312: Here it is stated that "the queries naturally have a graphical representation". This is not obvious to me. What is that graphical representation? You mean that the simple graph patterns in the query have a natural graphical representation? Ok, I can agree with that. But what about filters? What about optionals? Or path expressions? I can see many cases where it is not obvious to me how to illustrate the query graphically. I think this statement deserves to be expanded upon, i.e. explaining what parts of the query, and/or what graphical representation you have used, in case there is previous literature on how to illustrate SPARQL graphically, which I am sure there is.

- Answer: Indeed, full SPARQL queries are not easy to
illustrate. In this work, we focus only on the visualization of
"basic graph patterns", which have natural graphical
representations.
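As an illustration (not taken from the paper), a basic graph pattern is simply a set of triple patterns, so it can be drawn directly as a node-edge diagram. The query below uses hypothetical SOSA/SSN-style names to show the idea:

```sparql
PREFIX sosa: <http://www.w3.org/ns/sosa/>

# Each triple pattern is one labelled edge of the diagram:
#   ?obs --rdf:type-------------> sosa:Observation
#   ?obs --sosa:madeBySensor----> ?sensor
#   ?obs --sosa:hasSimpleResult-> ?value
SELECT ?sensor ?value WHERE {
  ?obs a sosa:Observation ;
       sosa:madeBySensor ?sensor ;
       sosa:hasSimpleResult ?value .
}
```

FILTER, OPTIONAL, and property path expressions have no such direct node-edge reading, which is why the visualization is restricted to basic graph patterns.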

- Comment: Section 4: please publish the ontology online so that it can be reviewed along with the paper, but also for others to reproduce and/or reuse this work. The same goes for any modified datasets, and it would also be useful to see the source code itself.

- Answer: We have published the code, the datasets, and the documentation on Github, and have put the link in the paper.

- Comment: Lines 352-353: So if you physically integrate this and store the data in the same database, then how is this OBDI? It is not according to how you defined it earlier.

- Answer: In this version, as explained in Sec 2, OBDI typically
requires an additional step of setting up an (integrated) database
so that one can issue SQL queries over multiple data sources at the
same time. This can be done either by using a SQL federation engine,
e.g., Denodo or Dremio, to connect to the existing databases, or
by using a more straightforward "physical integration" approach that
imports all the data sources into one database system.
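As a hedged sketch of the federation option (the server name, credentials, and schema below are invented for illustration), PostgreSQL's postgres_fdw can expose a remote database as local tables, so a single SQL query can join local and remote sources:

```sql
-- Hypothetical setup: make a remote traffic database visible inside the
-- integrated database, so one SQL query can span both sources.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER traffic_src
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'traffic-db.example.org', dbname 'traffic');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER traffic_src
    OPTIONS (user 'reader', password 'secret');

-- Import every table of the remote public schema as local foreign tables.
CREATE SCHEMA traffic;
IMPORT FOREIGN SCHEMA public FROM SERVER traffic_src INTO traffic;
```

The "physical integration" alternative simply copies the source data into one database (e.g., via bulk import), trading freshness for simpler query planning.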

- Comment: Line 359: What does "basic vocabularies" mean here? Since "vocabulary" is usually used as a synonym to a "light weight" ontology, it is a bit strange to call the classes of an ontology "vocabulary" - so if you mean the single elements, then why not "vocabulary elements" or "basic concepts"? I am also not sure what is meant by "basic" here - do you mean general/abstract? Or something else? Core concepts?

- Answer: Indeed, by "basic vocabularies" we meant "core concepts", and
we have completely reworked this paragraph. It should be clear now.


- Please label the listing in section 4.2.1 so that it can be referenced in the text. Also it would be nice to include the base prefix/URI of the ontology somewhere. Further, the other prefixes are not explained until the end of that page, i.e. around line 393-394, while they are already used here. sosa: is also not explained as a prefix at all.

- Answer: We have removed the listing in Sec 4.2.1 because the new ontology figure conveys more information. The prefixes are given in Table 2.

- Comment: In the same section, 4.2.1, the notion of a "grid" is introduced, however, this does not really correspond to my intuitive understanding of the term "grid". In the paper the term "grid" seems to refer to a single square delimited by the grid lines, while in my understanding the term "grid" is used for the whole thing, i.e. the set of vertical and horizontal lines that creates a number of such squares. This should be clarified in the paper.

- Answer: Thanks for the suggestion. We have refined the
terminology. Now "grid" refers to the whole thing, and each square
is called a "cell". We have also updated the ontology.
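A minimal Turtle sketch of the revised terminology (the class and property names are our guesses for illustration, not necessarily those of the published ontology): the grid is the whole partition, and each square delimited by the grid lines is a cell belonging to it:

```turtle
@prefix :     <http://example.org/ontology#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Grid a owl:Class ;
    rdfs:comment "The whole partition: the set of all cells." .

:Cell a owl:Class ;
    rdfs:comment "A single square delimited by the grid lines." .

:hasCell a owl:ObjectProperty ;
    rdfs:domain :Grid ;
    rdfs:range  :Cell .
```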

- Comment: Figure 4 is quite messy. I understand that it is difficult to provide an illustration without crossing edges etc., however, at least you could make sure that every label is readable by not having elements displayed on top of each other. Also, I am not sure about the highlighting, what is the reader supposed to understand from that? Further, the semantics of this image is not entirely clear, especially in relation to the data illustrations later in the paper. I assume that the ovals represent an owl:Class? Then it is pretty clear that an rdfs:subClassOf arrow between two classes represents a subsumption relation between those classes. However, then there are similar looking arrows going between other classes, but that have labels not from the RDFS/OWL vocabularies. I assume then that these are properties defined in the ontology/a "standard" ontology? But what does it mean that they connect the classes in the image? Does it indicate domain and range restrictions? Or existential/universal restrictions on the classes? This needs to be specified in the paper, and preferably the arrows having different semantics should not look the same.

- Answer: This is a fair assessment. We wanted to show more of the structure of the ontology, including the relationships between classes. However, as the reviewer pointed out, this turned out to be too confusing. In this revision, we are less ambitious and have replaced this figure with screenshots from Protégé, which should not cause any confusion, although less information is shown.

- Comment: Line 403 and 405: "are the answers" does not really make sense here. Could these queries only have one or two answers? I assume not, I rather assume that you mean that these are examples of possible answers for running the queries over a specific dataset? (Which one?)

- Answer: Indeed, these are some answers over our sample data. We have reworked this part.

- Comment: What do you mean by "often" on line 412? It seems strange that you often have to change the mapping during an actual experiment?

- Answer: We have revised this paragraph to avoid the word
"often", which might be misleading. In fact, we meant that when
developing mappings, we can change and test them easily.

- Comment: The query in Figure 7b is a bit longer than the one in 7a, but not by that much. Is it really length that is important (as you say on line 417)? I would think that it is rather other factors, like what you state on line 420. However, this whole part seems quite speculative and would need a reference to support the claims, or be supported by an experiment in the paper.

- Answer: Indeed, size is not the only critical factor; the emphasis is also on understandability. For comparing the sizes of SPARQL and SQL queries, we have chosen three more meaningful examples in the new evaluation subsection 4.5.

- Comment: Would it be possible to find a better way to present the example in Figures 5, 6 and 7? It is quite hard to follow, when all the figures contain multiple queries/data snippets/illustrations. Would it be possible to instead split the figures so that we can see one mapping + data snippet + query in the same figure? It would be much easier to follow then.

- Answer: Thanks for the suggestions. We have merged the mapping, sample data snippet, and the generated RDF triples into the new Table 3. Now the correspondence should be crystal clear to the readers. The unified RDF graph is put into a separate Figure 5.

- Comment: Figure 6b again has an unclear semantics. In contrast to the ontology illustration, here it seems to be an illustration of (virtual) RDF triples. Nevertheless, the exact same kind of arrows are used here as in the ontology case, so looking at the two figures together becomes quite confusing.

- Answer: Since we have changed the ontology figure, the confusion is gone.


- Comment: In section 4.3 it would be interesting to know what parts of the interface was hand-crafted, and what parts were generated from the ontology and/or some predefined SPARQL queries? So for instance, is the data access view generated automatically? If not, that means every new dataset added would also need a new set of interface views to be developed. However, if this is generated automatically, it seems much more useful, but this is not discussed in the paper.

- Answer: Currently the data access view is hand-crafted. We agree that an automatic interface would be more useful. Indeed, we will develop an ontology-driven interface in the future. This should not be very challenging, since the current hand-crafted version already partly follows the structure of the ontology.


- Comment: Similarly, I wonder how the SPARQL query view is generated? Is this simply an exact representation of the basic graph patterns? Or a more elaborate visualization, e.g. showing some shortcuts/path representations of some expressions?

- Answer: The SPARQL query view is generated automatically. It is indeed an exact representation of the basic graph patterns.


- Comment: Further, it is not exactly clear how the system determines what to put into the statistical result view at the bottom right? What aggregates to use, and so forth? Is this part of the query? Or expressed in some other way?

- Answer: Currently it is hand-crafted. In the future, we want to make it customizable according to the type of the data we are working on.

- Comment: Line 476: How do you know that this view is intuitive? Did you study that? Or are you basing this on other previous studies? Overall, it is not clear why this particular set of visualisations and analysis methods were chosen, and why. This needs to be better explained here, or earlier in the paper, and supported by appropriate references.

- Answer: We have added some references to support our choices.


- Comment: In section 4.4., I am not sure I understand why you call it "spatial patterns". It seems to be data about spatial features, but the patterns themselves - how are they spatial? To me they seem to be about volume and speed of traffic, not about the spatial features of the location.

- Answer: Thanks. Indeed, comparing only two stations was not enough to
justify "spatial patterns". However, this can easily be generalized to
show the information of all the stations on the map, e.g., using dot size to represent traffic volume or speed proportionally. We plan to design and implement this part in the future. We have also added some
discussion of precipitation, where the spatial patterns are clearer.


- Comment: In the conclusions section it should be made clear that there was actually no evaluation reported at all in the paper - simply some kind of feasibility example, that shows the system could be built. However, claims such as "The experiment confirmed our hypothesis" are completely unsupported by the rest of the paper and need to be removed. The very last line of section 5 in fact states as future work what the authors claimed in the introduction and abstract of the paper that this paper should be about, i.e. studying the effects of the approach. However, this has apparently not been done, and should not be claimed; cf. my general comment about the paper concerning evaluation.

- Answer: With the newly added evaluation section 4.5, the conclusion is now better supported. We have also revised it to make it more objective.

The following language issues/typos were also detected:

Line 55: their -> the

Fixed

Lines 114-115: change the ref style, names should be part of the sentence and not in parenthesis.

Number style now

Line 195: model -> models

Fixed

Line 198: "a systematical study" or "systematical studies"

Fixed

Line 199, 202 and other places in the paper as well: ontology is usually inflected when used to refer to the computer science artefact, i.e. here either "an ontology" or "ontologies", not just "ontology".

Fixed

Line 220: "In the transportation ..."

Fixed

Line 331: "the RDF4J workbench"

Fixed. We have changed it to "using Ontop"

Line 344: either "the formats of" or "in formats like"

Fixed

Line 365: :WeatherSation -> :WeatherStation

We have removed this example of the turtle syntax.

Line 397: "ontology vocabulary" or "ontologies' vocabularies"

Fixed

Line 409: are not needed -> do not need

Fixed

The reference list seems to be in order of appearance, which is not useful when an author-year reference style is applied, then the reference list needs to be ordered in alphabetical order of the first author's name. As it is now it is completely impossible to find anything in the list.

We have changed it to the number format.

Reviewer 3 Report

This manuscript adopts a solution for spatial data integration and visualization that uses an ontological model and multiple geo-visualization techniques. This topic is quite interesting, and the contribution is easily observed: (1) an ontology-based data integration (OBDI) module with the specific data domain ontology knowledge integrated; (2) a geo-visual analytical (GeoVA) module using standard ontologies to graphically show the spatial correlations between a large volume of gathered sensor data. However, there still exist many flaws in the current version of the manuscript.

First, the interpretation of several parts of the experiment is too shallow. Take Fig. 15b and c as an example: I cannot find any meaningful explanation of the scatter points aligned along the vertical axis in the last paragraph on page 19.

Second, the rewriting or conversion process in the SPARQL-to-SQL translation could be better demonstrated by using sentence mapping or other techniques. Just pasting two equal parts between these two queries is a little weak to show their intrinsic ontological relationships in Fig. 7.

Third, why do the authors only use data from one experimental region? In my opinion, testing the module on more than two regions is common practice for field tests.

 

Author Response

Thanks a lot for the feedback. We have revised the paper accordingly. The details are provided below:

 

- Comment: First, the interpretation of several parts of the experiment is
too shallow. Take Fig. 15b and c as an example: I cannot find
any meaningful explanation of the scatter points aligned along the vertical
axis in the last paragraph on page 19.

- Answer: We have added more explanation of these figures to the
text. The scatter points aligned along the vertical axis show that
there are no strong correlations.
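As general background (not a claim from the paper), this reading of the scatter plots can be made precise with the Pearson coefficient: points stacked along the vertical axis mean one variable barely varies as the other changes, so the covariance terms in the numerator largely cancel and the coefficient stays near zero:

```latex
% Pearson correlation coefficient; |r| close to 0 indicates
% no (linear) correlation between the two observed variables
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
```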

- Comment: Second, the rewriting or conversion process in the SPARQL-to-SQL
translation could be better demonstrated by using sentence mapping or other
techniques. Just pasting two equal parts between these two queries
is a little weak to show their intrinsic ontological relationships
in Fig. 7.

- Answer: We have reworked this part. In Section 4.2, we have
explained better how the SPARQL-to-SQL translation works. In the new Section 4.5, we
have added an evaluation, where more examples are also provided.
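To sketch what such a translation looks like (the mapping, table, and names below are invented for illustration, not taken from the paper), a SPARQL triple pattern is unfolded into SQL by substituting the mapping that produces the matching triples:

```sparql
PREFIX : <http://example.org/ontology#>

# Hypothetical mapping:
#   :station/{id} a :MonitoringStation   <-   SELECT id FROM stations
SELECT ?s WHERE {
  ?s a :MonitoringStation .
}
# A SPARQL-to-SQL engine such as Ontop would unfold this, roughly, into:
#   SELECT 'http://example.org/data/station/' || id AS s
#   FROM stations
```

Showing the mapping alongside the two queries makes the correspondence between the shared parts explicit, rather than leaving the reader to match them by eye.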


- Comment: Third, why do the authors only use data from one experimental
region? In my opinion, testing the module on more than two regions is
common practice for field tests.

- Answer: Since we are using standard ontologies (SSN and GeoSPARQL),
the same principle can be applied directly in other regions by creating
suitable mappings and a proper extension of the ontology. In such
experiments, our experience is that using more datasets in one
region is actually more challenging than using more regions. In future
work, we would like to study more regions as well.

Round 2

Reviewer 2 Report

Thanks for the improved version of the paper, I enjoyed reading it, and overall I think this nicely describes an important use case of semantic technologies. I still have some doubts regarding the evaluation part, as I will explain below, but all my other objections have been addressed, including the publication of data and ontologies for reproducibility. The details of the application, and how exactly data is integrated and used by the interface is much more clear in this version. Indeed I think this can be a very interesting and useful paper for the readers of the journal. 

There are two final things that I would strongly recommend, before publishing the paper:

1) The "evaluation" that has been added to the paper is useful and adds an understanding of the potential merits of the approach, however, I still do not agree that the evaluation supports the claim that the approach is "effective for exploration and understanding" (line 14 and 94). Instead, the conclusions more accurately sums up what can actually be concluded, namely that this "framework is feasible for the exploration and understanding ..." I think that this is all that the authors can claim based on the current work. I also strongly suggest to rename sections 4.5 and 4.5.1, since this is not really a full evaluation. I could envision to call 4.5 "Preliminary studies" or "Establishing feasibility" or something similar, and 4.5.1 could be called "Exploring effectiveness" or something else, indicating that this is not actually a proper scientific evaluation of effectiveness. In particular since no users are involved, and the results are not compared to anything (but perhaps the "raw" SQL queries, but not even that in a systematic manner). So in summary, I think the content of the paper, including the "evaluation" section, is fine, but I think that the formulations, in the abstract and introduction in particular, and the section headings in section 4.5, need to be rephrased to match what is actually being done and what can be concluded from that. 

2) Some proofreading of especially the newly written parts, which contain some English grammar issues and several typos.

Author Response

Thanks a lot for your assessment and the additional suggestions. Your feedback was very useful for improving the manuscript. We have carefully addressed the new issues and proofread the whole manuscript again.

1) The "evaluation" that has been added to the paper is useful and adds an understanding of the potential merits of the approach, however, I still do not agree that the evaluation supports the claim that the approach is "effective for exploration and understanding" (line 14 and 94). Instead, the conclusions more accurately sums up what can actually be concluded, namely that this "framework is feasible for the exploration and understanding ..." I think that this is all that the authors can claim based on the current work. I also strongly suggest to rename sections 4.5 and 4.5.1, since this is not really a full evaluation. I could envision to call 4.5 "Preliminary studies" or "Establishing feasibility" or something similar, and 4.5.1 could be called "Exploring effectiveness" or something else, indicating that this is not actually a proper scientific evaluation of effectiveness. In particular since no users are involved, and the results are not compared to anything (but perhaps the "raw" SQL queries, but not even that in a systematic manner). So in summary, I think the content of the paper, including the "evaluation" section, is fine, but I think that the formulations, in the abstract and introduction in particular, and the section headings in section 4.5, need to be rephrased to match what is actually being done and what can be concluded from that.

Answer: Following your suggestions, we have made the changes to better
reflect the status of our experiment:

- We have changed "effective" to "feasible" in the abstract and introduction
- Now Sec 4.5 is "Preliminary Studies" and Sec 4.5.1 is "Exploring Effectiveness".

2) Some proofreading of especially the newly written parts, which contain some English grammar issues and several typos.

Answer: We have carefully proofread the whole manuscript.
