Article
Peer-Review Record

Towards an Ontology-Based Phenotypic Query Model

Appl. Sci. 2022, 12(10), 5214; https://doi.org/10.3390/app12105214
by Christoph Beger 1,2,3,*, Franz Matthies 1,3, Ralph Schäfermeier 1,3, Toralf Kirsten 1,3,4, Heinrich Herre 1 and Alexandr Uciteli 1,3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 13 April 2022 / Revised: 13 May 2022 / Accepted: 16 May 2022 / Published: 21 May 2022
(This article belongs to the Special Issue Data Science for Medical Informatics)

Round 1

Reviewer 1 Report

As per my previous report, I don't see any reason for not accepting the paper after the revision provided.

More holistically, in my humble opinion there is plenty of room for improvement, especially in terms of research background and references. 

Author Response

Dear reviewer,

Thank you very much for taking the time to do the review (again). We made the following changes to the manuscript:

  • improved English writing in some paragraphs
  • added a new paragraph to section 2.2. with a clearer reference to the original COP publication, where the reader can find more details and examples
  • updated caption of Figure 3
  • added description of the term ‘projection’ to section 2.4.
  • added description of the term ‘API’ to section 2.5.
  • added Figure 4 to section 2.5. with an overview of a possible architecture to enable modelling phenotypes in PheSOs, transformation to queries and retrieving result sets
  • added hint to section 2.6. that FAIR is composed of ‘findable’, ‘accessible’, ‘interoperable’ and ‘reusable’
  • reformatted the FHIR GET request example of section 3.1.1.
  • added new Table 1 to section 3.1.1. with example FHIR Search queries
  • added new paragraph about limitations to the end of section 4.

Best regards.

Reviewer 2 Report

This study proposes using ontologies to model phenotypic knowledge on patients or study data management systems. I appreciate the authors' time and patience in producing these results. Below are several comments on this work.

  1. Could you add more figures to display the results?
  2. In the Conclusion, the limitations of the study are not mentioned.
  3. The authors should proofread the English writing to improve the study.

Author Response

Dear reviewer,

Thank you very much for your report. We added an overview (Figure 4) to section 2.5. that should demonstrate how the transformation of our ontology-based phenotypic model to a query language can be implemented, and we added some more examples of SQL and FHIR Search-based queries. We also updated the discussion section and added a paragraph about the limitations of our study. In some sections, the English writing has been improved too.

Thank you very much for taking the time to do the review.

Best regards.

Reviewer 3 Report

This is one of the best written papers I've read in quite a while (in terms of English, and its overall clarity). I was disappointed that you leave out details on how to build the COP and write a PheSO. It seems to me that any researcher wanting to use your model will need far more detail to understand what is expected (scope, detail), how to represent the knowledge in the COP and how to then implement a PheSO. I could not determine from the paper just how much effort a researcher would have to go through to implement these. In your results section, it would be nice to show more examples, perhaps a table of specific queries and their translated forms in a given query language. I realize space is limited but it seems to me you spend too much time addressing background on the medical domain and not enough on implementation details. This might be my own bias being an AI person and not a medical person.

Otherwise, my comments below are just to help you revise the paper in modest ways to improve it. 

In the text leading up to figure 3, you might mention that the abbreviations used in the figure are further described in upcoming subsections, or alternatively place a key in the figure that spells out the abbreviations. 

On page 8, line 287, you talk about "in projection". It's unclear to me what this means: are you talking about a projection operation in a database query? This might need to be more concretely explained.

On page 8, line 296, spell out what API means (or more specifically what API capabilities are).  Readers may be unfamiliar with an API.

In section 2.6, specify what FAIR stands for.

The GET command shown on lines 413-414 runs into the left margin.  Try to reposition it even if it extends to a third line.

Again, a very well written paper and I think one that will be of interest.

Author Response

Dear reviewer,

Thank you very much for your kind words and also the very helpful suggestions to improve our manuscript.

We added some more examples on how phenotypic models can be translated into actual queries. For these examples, we used FHIR Search as query language (see section 3.1.1.). Also, there is a new paragraph at the end of section 2.2. that should direct the reader to our other publication with detailed information about COP and modelling details of PheSOs.

We also made some modifications to the text according to the comments you listed. Regarding your suggestion “In section 2.6, specify what FAIR stands for.”, we think that it is already sufficiently described in the first paragraph of section 2.6. We have added a small hint that the terms findability, accessibility, interoperability, and reusability correspond to the letters of FAIR.

Thank you very much for taking the time to do the review.

Best regards.

Detailed change list:

  • improved English writing in some paragraphs
  • added a new paragraph to section 2.2. with a clearer reference to the original COP publication, where the reader can find more details and examples
  • updated caption of Figure 3
  • added description of the term ‘projection’ to section 2.4.
  • added description of the term ‘API’ to section 2.5.
  • added Figure 4 to section 2.5. with an overview of a possible architecture to enable modelling phenotypes in PheSOs, transformation to queries and retrieving result sets
  • added hint to section 2.6. that FAIR is composed of ‘findable’, ‘accessible’, ‘interoperable’ and ‘reusable’
  • reformatted the FHIR GET request example of section 3.1.1.
  • added new Table 1 to section 3.1.1. with example FHIR Search queries
  • added new paragraph about limitations to the end of section 4.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The Authors present a method to develop phenotypic queries, based on a "standard" ontology (Core Ontology of Phenotypes). The motivations behind such a method and its tool are clear: to facilitate the use, sharing, and mapping of data in clinical studies.

The method relies on a layered process, moving from the most abstract to the most specific (i.e., it results in a specific-language query). 

Although the paper is well-written and its premises are solid, this reviewer has some major issues with this work. The main one is: why didn't the Authors rely on SPARQL, the RDF/OWL query language? SPARQL is a widely-used and well-known query language for triple data and can be used also in Ontology-based Data Access (OBDA) contexts.

In other words, Authors should explicitly clarify why there is the need for another, different, query language for ontologies. If source data are not available in an ontological form, then an API to get them into that form should be developed - and from there, data can be queried with SPARQL. 

Also, in Sect. 2.1 and 2.2 the role of PheSOs is unclear: are they task ontologies generated from COP? How are they generated and what happens if two or more domain experts model the same concepts in different ways? Are the PheSOs locally stored (I guess so)? In any case, when domain experts are involved in a knowledge engineering task, they should rely on some Ontology Engineering Methodology. 

No typos, correct grammar. Some sentences could be shorter.

I do not think this work is ready for publication yet. It remains very unclear why SPARQL (or a program able to convert a natural-language query, or a query in any other language, into a SPARQL query) could not help in solving the problems at hand.

Author Response

Dear Reviewer,

Thank you very much for your comments. You are absolutely right, SPARQL is an important and established query language and should be considered in the paper. Furthermore, we realised that we caused a misunderstanding by using the term 'abstract query language'. In fact, we are talking about an ontological model for phenotypic knowledge. We improved the paper accordingly.

Summary of changes to the submission:

  • The title was updated to better reflect the topic being about ontological modelling of phenotypic knowledge
  • Several occurrences of ‘query’ and ‘language’ were replaced with ‘knowledge’ and ‘model’ or were reformulated, figure 1 was updated accordingly
  • The introduction was extended with references to SPARQL and OBDA
  • We also added a hopefully clearer description to the introduction that the paper’s main focus is not on executing queries or searching in ontologies, but on modelling phenotypic knowledge
  • Section 2.1 was updated to make the role of Phenotype Specification Ontologies (PheSOs) clearer
  • SPARQL was added as another example of a target query language for which queries can be constructed from our ontological model
  • Some sentences were shortened for better readability
  • Minor grammar issues were fixed

Detailed answers to your comments:

The objective of our work is not to develop a new query language for ontologies, but to define the notion of 'phenotypic query', develop a classification of phenotypic queries and provide a methodology to specify the knowledge required for creating phenotypic queries. I.e., the paper is not about the specification and sharing of queries themselves, but about the knowledge required for creating queries. The focus is therefore not on the form of patient data (databases, triple stores, FHIR repositories or file formats) or the query languages used (SPARQL, SQL, etc.). Rather, it is about how complex phenotypic knowledge can be constructed (from precisely defined and reusable components), shared and semantically represented in a way comprehensible to domain experts (fully independently of the specific query language used).

The most important components of the phenotypic knowledge are so-called items (or as we call them, Unrestricted Single Phenotypes). These are characteristics of patients or study participants (e.g., age, sex, weight, height) for which data are available and can be queried. Such items have to be defined and shared in an elaborate and precise manner by specifying various metadata (e.g., datatype, measurement unit, codes from terminologies, such as LOINC or SNOMED, etc.). Further components of the phenotypic knowledge are mathematical formulas, relevant value ranges and also complex Boolean connections between classes or value ranges (see the simple example below or the more complex examples in the paper). Only if all data queries or data analyses are based on common knowledge can the transparency and comparability of the results be guaranteed. Ontologies are a very suitable way to model, represent and reuse such complex phenotypic knowledge and its parts. Query languages (such as SPARQL or SQL) or natural language are not suitable for this purpose. We therefore decided to use ontologies (PheSO) as the basic representation formalism for phenotypic knowledge. The knowledge specified in a PheSO can be used to build many different queries. Sometimes it is necessary to split a complex query into several simpler ones and then combine the individual query results in the software (e.g., using external tools or services). Such functionality can be implemented in an adapter according to our approach.

The PheSOs are developed collaboratively by a group of domain experts (e.g., physicians or biometricians) supported by appropriate tools, so that they do not need any knowledge of ontologies or query languages. The PheSOs should be developed and shared in a web application (repository). The PheSO has a structure that is specified by the COP. The COP defines exactly which types of classes (the PheSO classes are subclasses of the COP classes), properties and axioms may be used in the PheSO. IT specialists develop tools supporting the specification of phenotypic knowledge (PheSO) and the generation of queries in the desired query languages from this knowledge (some examples were mentioned in the paper). This enables a very clear division of labour. One can implement different translators/adapters from the ontological model to the respective query languages, or just one SPARQL translator but different mappings of the respective data sources to an ontological representation (e.g., with OBDA or direct transformation). It makes no difference to our approach.

In our projects (e.g., SMITH), we mostly deal with sensitive patient data. This data is managed in a protected area (e.g., in a hospital), mostly using an established medical standard for electronic health records (such as HL7 FHIR). Access to patient data is highly regulated and limited, e.g., in terms of the query languages (for FHIR, e.g., FHIR Search) and query types to be used. In some cases, no specific patient data may be requested, but only the number of patients meeting certain criteria. I.e., patient data must not be copied just anywhere (e.g., into a triple store) and may only be queried using FHIR Search. In this case, we therefore had to generate FHIR Search queries directly from the ontology and combine the results of individual queries in the software. Nevertheless, it is of course possible (according to our approach) to generate SPARQL queries from a PheSO by using an appropriate adapter or mapping tool.
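The idea of several adapters working from one model can be illustrated with a minimal Python sketch. It is not the implementation from the paper: the class `RestrictedPhenotype`, the functions `to_fhir_search` and `to_sql`, the example threshold of 100 kg, and the table and column names in the SQL string are all hypothetical and chosen only for illustration. The LOINC code for body weight is the one cited in this response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestrictedPhenotype:
    """Hypothetical stand-in for one restricted phenotype class of a PheSO."""
    name: str
    loinc_code: str   # terminology code identifying the item in the data source
    unit: str         # UCUM unit of the measured value
    min_value: float  # inclusive lower bound of the value range


def to_fhir_search(p: RestrictedPhenotype) -> str:
    """Render the phenotype as a FHIR Search query on Observation resources."""
    return (
        f"Observation?code=http://loinc.org|{p.loinc_code}"
        f"&value-quantity=ge{p.min_value:g}|http://unitsofmeasure.org|{p.unit}"
    )


def to_sql(p: RestrictedPhenotype) -> str:
    """Render the same phenotype as SQL against an assumed 'observation' table.

    String formatting is used for readability only; real code should use
    parameterised queries.
    """
    return (
        "SELECT patient_id FROM observation "
        f"WHERE code = '{p.loinc_code}' AND value >= {p.min_value:g}"
    )


# One model entry, two target query languages.
heavy = RestrictedPhenotype("Weight >= 100 kg", "29463-7", "kg", 100.0)
print(to_fhir_search(heavy))
print(to_sql(heavy))
```

The point of the sketch is the division of labour: the phenotype is specified once, and each adapter is a separate function that only knows its own target language.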

Simple example:

Let us consider a very simple example of a phenotypic query: 'Search for obese patients' (based on weight and height). It sounds simple, but it requires, among other things, knowledge of a mathematical formula (BMI), a BMI classification including the corresponding value ranges (e.g., https://www.euro.who.int/en/health-topics/disease-prevention/nutrition/a-healthy-lifestyle/body-mass-index-bmi) as well as codes from various terminologies (e.g., from LOINC, body weight: 29463-7, body height: 8302-2) to identify the required data in the electronic health record. According to our approach, a PheSO would be developed containing, among other things, the classes Height, Weight, BMI and Obese with corresponding attributes (data types, units of measurement, value range, formula and codes). Queries in the desired languages can then be easily generated from the PheSO.
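To make the obesity example concrete, the following is a minimal Python sketch under stated assumptions, not the PheSO representation itself. The names `PHENOTYPE_ITEMS`, `bmi`, `is_obese` and `item_search_query` are hypothetical. The threshold of 30 kg/m² is the WHO cut-off for obesity, and the two LOINC codes are those given in the example.

```python
# Items (unrestricted single phenotypes) with the metadata needed to find the data.
# The dictionary layout is an illustrative assumption, not the COP structure.
PHENOTYPE_ITEMS = {
    "Weight": {"loinc": "29463-7", "unit": "kg", "datatype": "decimal"},
    "Height": {"loinc": "8302-2", "unit": "cm", "datatype": "decimal"},
}

OBESE_BMI_THRESHOLD = 30.0  # kg/m^2, lower bound of the WHO "obese" range


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kg divided by squared height in metres."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def is_obese(weight_kg: float, height_cm: float) -> bool:
    """Restricted phenotype 'Obese': BMI at or above the threshold."""
    return bmi(weight_kg, height_cm) >= OBESE_BMI_THRESHOLD


def item_search_query() -> str:
    """FHIR Search query retrieving the observations needed for the formula.

    A comma between token values means logical OR in FHIR Search.
    """
    codes = ",".join(
        f"http://loinc.org|{item['loinc']}" for item in PHENOTYPE_ITEMS.values()
    )
    return f"Observation?code={codes}"


print(item_search_query())
print(round(bmi(95.0, 175.0), 2), is_obese(95.0, 175.0))
```

Because the BMI itself may not be stored in the record, the sketch fetches weight and height and evaluates the formula and the value range in software.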

 

Best Regards

Reviewer 2 Report

My most significant concern is directly related to the proposed approach for establishing a domain-specific query language.

Indeed, the support of formal query languages on ontological structures is barely mentioned (SPARQL, at the very end of the paper). The related statement:

We believe that our approach of modelling phenotype classes and automatically constructing phenotypic queries is more generic than the one proposed by Zhang et al., because we did not prepare a disease-specific ontology, but encourage researchers in building phenotype classes and enriching them with annotations on their own so that resulting PheSOs can be shared, are usable independently of the trial data management system (with the drawback that mappings need to be implemented), and are not restricted to a limited set of diseases.

sounds very ambiguous and doesn't provide any actual motivation for the approach proposed but rather recalls some typical functions supported by ontologies.

I would suggest first properly discussing formal querying of ontologies and its limitations; then discussing solutions in the literature (e.g., translation of natural-language queries into formal queries); and, finally, clearly presenting the proposed approach in context, with a proper justification.

Please note that abstractions may be effectively built on top of the formal query layer (normally SPARQL) and, indeed, the most popular automatic reasoners (e.g. JENA, PELLET and HermiT) provide a query wrapper to support post-reasoning queries as part of their architecture.

Author Response

Dear Reviewer,

Thank you very much for your comments. You are absolutely right, SPARQL is an important and established query language and should be considered in the paper. Furthermore, we realised that we caused a misunderstanding by using the term 'abstract query language'. In fact, we are talking about an ontological model for phenotypic knowledge. We improved the paper accordingly.

Summary of changes to the submission:

  • The title was updated to better reflect the topic being about ontological modelling of phenotypic knowledge
  • Several occurrences of ‘query’ and ‘language’ were replaced with ‘knowledge’ and ‘model’ or were reformulated, figure 1 was updated accordingly
  • The introduction was extended with references to SPARQL and OBDA
  • We also added a hopefully clearer description to the introduction that the paper’s main focus is not on executing queries or searching in ontologies, but on modelling phenotypic knowledge
  • Section 2.1 was updated to make the role of Phenotype Specification Ontologies (PheSOs) clearer
  • SPARQL was added as another example of a target query language for which queries can be constructed from our ontological model
  • Some sentences were shortened for better readability
  • Minor grammar issues were fixed

Detailed answers to your comments:

The objective of our work is not to develop a new query language for ontologies, but to define the notion of 'phenotypic query', develop a classification of phenotypic queries and provide a methodology to specify the knowledge required for creating phenotypic queries. I.e., the paper is not about the specification and sharing of queries themselves, but about the knowledge required for creating queries. The focus is therefore not on the form of patient data (databases, triple stores, FHIR repositories or file formats) or the query languages used (SPARQL, SQL, etc.). Rather, it is about how complex phenotypic knowledge can be constructed (from precisely defined and reusable components), shared and semantically represented in a way comprehensible to domain experts (fully independently of the specific query language used).

The most important components of the phenotypic knowledge are so-called items (or as we call them, Unrestricted Single Phenotypes). These are characteristics of patients or study participants (e.g., age, sex, weight, height) for which data are available and can be queried. Such items have to be defined and shared in an elaborate and precise manner by specifying various metadata (e.g., datatype, measurement unit, codes from terminologies, such as LOINC or SNOMED, etc.). Further components of the phenotypic knowledge are mathematical formulas, relevant value ranges and also complex Boolean connections between classes or value ranges (see the simple example below or the more complex examples in the paper). Only if all data queries or data analyses are based on common knowledge can the transparency and comparability of the results be guaranteed. Ontologies are a very suitable way to model, represent and reuse such complex phenotypic knowledge and its parts. Query languages (such as SPARQL or SQL) or natural language are not suitable for this purpose. We therefore decided to use ontologies (PheSO) as the basic representation formalism for phenotypic knowledge. The knowledge specified in a PheSO can be used to build many different queries. Sometimes it is necessary to split a complex query into several simpler ones and then combine the individual query results in the software (e.g., using external tools or services). Such functionality can be implemented in an adapter according to our approach.
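Splitting a complex query and combining the partial results in software can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the adapter described in the paper: the helper names `all_of` and `any_of` and the patient identifiers are hypothetical, and each set stands for the patient IDs returned by one simple sub-query.

```python
from typing import Set


def all_of(*result_sets: Set[str]) -> Set[str]:
    """Boolean AND: patients present in every partial result."""
    if not result_sets:
        return set()
    return set.intersection(*result_sets)


def any_of(*result_sets: Set[str]) -> Set[str]:
    """Boolean OR: patients present in at least one partial result."""
    return set().union(*result_sets)


# Hypothetical results of three simple sub-queries (patient identifiers).
criterion_a = {"p1", "p2", "p3"}
criterion_b = {"p2", "p3", "p4"}
criterion_c = {"p3", "p5"}

# Complex phenotype: A AND (B OR C), evaluated outside the query language.
matching = all_of(criterion_a, any_of(criterion_b, criterion_c))
print(sorted(matching))
```

When only a patient count may leave the protected area, the adapter would report `len(matching)` instead of the identifiers.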

The PheSOs are developed collaboratively by a group of domain experts (e.g., physicians or biometricians) supported by appropriate tools, so that they do not need any knowledge of ontologies or query languages. The PheSOs should be developed and shared in a web application (repository). The PheSO has a structure that is specified by the COP. The COP defines exactly which types of classes (the PheSO classes are subclasses of the COP classes), properties and axioms may be used in the PheSO. IT specialists develop tools supporting the specification of phenotypic knowledge (PheSO) and the generation of queries in the desired query languages from this knowledge (some examples were mentioned in the paper). This enables a very clear division of labour. One can implement different translators/adapters from the ontological model to the respective query languages, or just one SPARQL translator but different mappings of the respective data sources to an ontological representation (e.g., with OBDA or direct transformation). It makes no difference to our approach.

In our projects (e.g., SMITH), we mostly deal with sensitive patient data. This data is managed in a protected area (e.g., in a hospital), mostly using an established medical standard for electronic health records (such as HL7 FHIR). Access to patient data is highly regulated and limited, e.g., in terms of the query languages (for FHIR, e.g., FHIR Search) and query types to be used. In some cases, no specific patient data may be requested, but only the number of patients meeting certain criteria. I.e., patient data must not be copied just anywhere (e.g., into a triple store) and may only be queried using FHIR Search. In this case, we therefore had to generate FHIR Search queries directly from the ontology and combine the results of individual queries in the software. Nevertheless, it is of course possible (according to our approach) to generate SPARQL queries from a PheSO by using an appropriate adapter or mapping tool.

Simple example:

Let us consider a very simple example of a phenotypic query: 'Search for obese patients' (based on weight and height). It sounds simple, but it requires, among other things, knowledge of a mathematical formula (BMI), a BMI classification including the corresponding value ranges (e.g., https://www.euro.who.int/en/health-topics/disease-prevention/nutrition/a-healthy-lifestyle/body-mass-index-bmi) as well as codes from various terminologies (e.g., from LOINC, body weight: 29463-7, body height: 8302-2) to identify the required data in the electronic health record. According to our approach, a PheSO would be developed containing, among other things, the classes Height, Weight, BMI and Obese with corresponding attributes (data types, units of measurement, value range, formula and codes). Queries in the desired languages can then be easily generated from the PheSO.

 

Best Regards

Round 2

Reviewer 2 Report

I acknowledge a clear and, in my very humble opinion, successful attempt to improve the paper.

At this stage, I don't see any reason for not endorsing the publication of this paper.

Congratulations 
