Based on the prevailing challenges in improving the semantic interoperability of environmental geospatial data through ontology engineering, as discussed above, the following framework is proposed. The key components of the framework and the implementation steps that enhance the semantic linking of data are then discussed in detail.
The proposed framework introduces a novel approach to querying environmental geospatial data across both static (topographic maps with features, land use, historical climate data, etc.) and dynamic (weather, hydrological data, etc.) data sources. It bridges the gap between heterogeneous data formats and sources, allowing users to seamlessly access and query complex geospatial datasets without understanding the underlying data structure. Integrating and analyzing real-time dynamic data alongside traditional static datasets offers new insights into environmental changes and patterns. This approach not only enhances the cross-domain integration and accessibility of environmental geospatial data but also supports more informed decision-making where cross-domain analysis is crucial. The novelty of the framework lies in its capacity to unify diverse geospatial data formats from various sources and to query them in a semantically rich manner, providing a flexible and powerful tool for a wide range of decision-making scenarios.
Overview of the Framework
The framework can be divided into three components based on function. The first component is data storage: a database where all related environmental data are stored for later use by queries. The second is a virtual component, where a knowledge graph (KG) is created on the fly based on user queries. The third is a query composer with a result presentation interface, commonly known as a SPARQL endpoint, where the user can compose a query and obtain the results in tabular and visual format. The framework uses a custom-built SPARQL Query Interface (SQI) over the virtual knowledge graph, which acts as an abstraction layer representing integrated data from various sources in the form of a graph. Users can formulate queries on the interface without explicit knowledge of the underlying data structure.
Figure 3 represents the high-level overview of the framework’s architecture, where arrows indicate information flow.
This is achieved through the ontological representation of the underlying spatial relational data exposed by the ontology-based data integration module, which translates high-level queries into appropriate data source queries, retrieves the data, and transforms the results back into the terms defined by the ontology. Finally, the retrieved data are presented in a table, or in a table with a map visualization, depending on the non-spatial or spatial nature of the results.
The mapping process is a critical step in retrieving correct results. It is advisable to start with a small dataset, develop the initial ontology, and link the data items from this subset to the terms and concepts in the ontology. This aligns the data with the ontology's structure and semantics. The correctness of the mappings can then be verified by inspecting query answers and visualization results. The virtual nature of this framework offers significant advantages for scalable systems because it avoids materialization. In contrast to traditional data materialization methods, virtual integration of data under the ontology supports lightweight iterations, making ontology and mapping construction efficient and highly flexible. Flexibility, scalability, and efficiency are therefore the key advantages of the proposed framework.
The following section briefly discusses the key steps for the experimental implementation of the proposed framework. The implementation of this framework is based on five key steps.
Figure 4 shows the simplified high-level process diagram for experimental framework implementation.
Depending on the situation, steps one and two are interchangeable: if the database already exists, the ontology design can start from the database schema; otherwise, the ontology can be designed before the database implementation according to the user requirements. Next, mappings must be established to connect the ontology with the database. This component is the core strength of the system, as it enables on-the-fly access and seamless semantic retrieval of data from the underlying relational database, based on user requirements and with the help of a pre-defined domain-specific ontology. Finally, the Ontop tool is configured to integrate all components and launch the semantic query interface for composing queries and retrieving data. The process is iterative: the ontology must be extended and the mappings updated whenever new data are added to the database so that the latest information remains queryable. Further details are provided in the following section.
Step 1: Define Ontology: Starting from a basic understanding of the available data sources, an ontology can be designed using top-down or bottom-up methods [31]. This conceptual schema acts as the essential foundation of the framework and is formulated by identifying key concepts, properties, and spatial relationships within the domain. Instead of developing an ontology from scratch, the best practice is to reuse existing ontologies and extend them with custom elements to match the current domain requirements [31]. Using well-established ontologies has benefits such as standardization (being widely recognized and accepted in the geospatial community), time and cost-effectiveness, and interoperability with other systems and datasets. Therefore, the GeoSPARQL ontology by OGC (https://opengeospatial.github.io/ogc-geosparql/geosparql11/ (accessed on 1 July 2024)) was selected as the foundation and extended with custom elements to match the requirements of this framework and the experimental queries. The extended ontology was designed to integrate and manage geospatial and observational data for this project. It defines data properties for capturing specific measurements (e.g., temperature, water level, etc.), spatial coordinates, and identifiers, and it includes classes for different features such as lakes, buildings, agricultural plots, weather stations, and points of interest. Given that the use cases primarily focus on environmental geospatial data, GeoSPARQL is an ideal choice due to its minimalistic nature: it covers the fundamental aspects of geospatial data, providing datatypes and data properties to describe different geospatial data types and their characteristics, and object properties to express relationships between objects. This approach ensures that the ontology supports both spatial and non-spatial data, making it easier to integrate, query, and analyze geospatial information from multiple sources in the context of environmental management.
Figure 5 shows the part of the extended ontology (classes and data properties) developed in the Protégé application (https://protege.stanford.edu/ (accessed on 6 June 2024)). With the help of the Ontop for Protégé plugin, users can develop mappings to connect the database with the ontology, achieve seamless integration, and query the underlying data.
Data Properties: The ontology defines several datatype properties, i.e., attributes that relate class instances to data values. Key properties include the following:
:hasAirTemperature for recording air temperature values.
:hasWaterTemperature for recording water temperature values.
:hasLatitude and :hasLongitude for geospatial coordinates.
:hasResultTime for the timestamp of observations.
:hasID, :hasStationID, and :hasParaID for unique identifiers.
Classes: The ontology defines several classes representing different types of features and observations:
:Observation: Represents an observation event.
:agri_plots: Represents agricultural plots, subclass of geo:Feature.
:building: Represents buildings, subclass of geo:Feature.
:lake: Represents lakes, subclass of geo:Feature.
:poi: Represents points of interest, subclass of geo:Feature.
:weather_station: Represents weather stations, subclass of geo:Feature.
Relationships: The ontology establishes relationships between the classes and data properties to create a comprehensive model capturing the various aspects of the considered domain; an illustrative sketch is given below.
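To make this structure concrete, the following Turtle sketch shows how such an extended ontology might be declared. The base namespace (:) and the exact axioms are assumptions made for illustration; the class and property names follow the lists above, and the actual project ontology may contain further axioms.

```turtle
@prefix :     <http://example.org/envgeo/> .   # hypothetical base namespace
@prefix geo:  <http://www.opengis.net/ont/geosparql#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Feature classes extending GeoSPARQL's geo:Feature
:lake            a owl:Class ; rdfs:subClassOf geo:Feature .
:building        a owl:Class ; rdfs:subClassOf geo:Feature .
:agri_plots      a owl:Class ; rdfs:subClassOf geo:Feature .
:poi             a owl:Class ; rdfs:subClassOf geo:Feature .
:weather_station a owl:Class ; rdfs:subClassOf geo:Feature .
:Observation     a owl:Class .                 # observation events

# Datatype properties for measurements, coordinates, and identifiers
:hasAirTemperature   a owl:DatatypeProperty ; rdfs:domain :Observation ; rdfs:range xsd:decimal .
:hasWaterTemperature a owl:DatatypeProperty ; rdfs:domain :Observation ; rdfs:range xsd:decimal .
:hasResultTime       a owl:DatatypeProperty ; rdfs:domain :Observation ; rdfs:range xsd:dateTime .
:hasLatitude         a owl:DatatypeProperty ; rdfs:range xsd:double .
:hasLongitude        a owl:DatatypeProperty ; rdfs:range xsd:double .
:hasStationID        a owl:DatatypeProperty ; rdfs:range xsd:string .
```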
Step 2: Prepare the Relational Database: The relational database is created based on user requirements. For this purpose, it is essential to identify all relevant data sources and streams to be integrated, listing sources from databases, data warehouses, sensor networks, application programming interfaces (APIs), and other GIS data sources. In this case, PostgreSQL with the PostGIS extension was used because it is widely recognized for efficient storage and management of data, supports a wide range of data types and advanced querying functions, and integrates seamlessly with the Ontop system. For database administration and development, pgAdmin 4 (https://www.pgadmin.org/ (accessed on 15 April 2024)) was used.
When preparing the PostgreSQL database, users can follow the typical database-creation procedures of tools such as pgAdmin. However, some specific considerations and best practices ensure optimal integration with the OBDA system: defining clear data models (aligning the database schema with the ontology and, for spatial data, using the PostGIS extension with a proper coordinate reference system), ensuring that tables contain clear entity identifiers, avoiding unnecessary complexity in the database schema, using proper data types, and supporting real-time data. A sketch of these practices is shown below. Furthermore, PostgreSQL's support for database federation via Foreign Data Wrappers (FDWs) is a key consideration when extending the framework's usability to run SPARQL queries over different databases kept in isolated data silos, since many stakeholders maintain their own databases.
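As a minimal sketch of these practices (the table layout is an assumption modeled on the weather_station columns mentioned in Step 3), a PostGIS-enabled station table with a clear identifier and an explicit coordinate reference system could be created as follows:

```sql
-- Enable spatial support once per database (requires PostGIS to be installed)
CREATE EXTENSION IF NOT EXISTS postgis;

-- Weather station table with a clear entity identifier and simple data types
CREATE TABLE weather_station (
    id        integer PRIMARY KEY,      -- clear entity identifier
    stnavn    text NOT NULL,            -- station name
    longitude double precision,
    latitude  double precision,
    geom      geometry(Point, 4326)     -- PostGIS point geometry in EPSG:4326 (WGS 84)
);

-- Derive the geometry from the coordinate columns, keeping them consistent
UPDATE weather_station
SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
```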
Step 3: Define the Mappings: This step plays a key role in the framework's data retrieval process. The relevant mapping entities are designed to map data sources to the ontology; the aim is to define how concepts in the ontology relate to the data (tables, fields, records, and other spatial data). Mapping languages such as R2RML (RDB to RDF Mapping Language) [45] or the native Ontop mapping language [46] can be used; during query translation, this set of mappings allows SPARQL queries formulated in ontology terms to be rewritten into SQL queries that run over the underlying database. Ontop supports both mapping languages. In this experimental setup, the native Ontop mapping language was used, as it is easy to use and mappings can be formulated via a graphical user interface. It is worth mentioning that Ontop provides tools to convert either mapping language into the other [26]. In essence, these rules describe how to extract data from the relational database and represent it as RDF triples according to the ontology.
Figure 6 presents an example mapping developed for the framework.
Mapping Declaration: Associates data from a relational table (here, weather_station) with the ontology.
mappingId: A unique identifier for the mapping.
target: Specifies the RDF triples to be generated. It maps each variable in curly brackets from the relational table weather_station to the weather_station class and its properties.
source: Defines the SQL query that extracts the relevant data from the relational database. This specifies which table the data come from (weather_station), which columns are used (id, stnavn, longitude, latitude, etc.), and any joins or filters required to prepare the data for mapping. As a result, the query extracts seven columns (id, stnavn, longitude, latitude, stparam, stparakode, and stid) from the weather_station table in the predefined database, ensuring that the required attributes are available for subsequent semantic mapping and integration into the ontology; a sketch in the native mapping syntax follows.
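Complementing Figure 6, the following is an illustrative sketch of such an entry in the native Ontop mapping language (the IRI template and prefix are assumptions; the columns follow the source query described above). In an .obda file, entries of this form sit inside a [MappingDeclaration] block:

```
mappingId   MAP-weather-station
target      :weather_station/{id} a :weather_station ;
              :hasStationID {stid}^^xsd:string ;
              :hasLatitude  {latitude}^^xsd:double ;
              :hasLongitude {longitude}^^xsd:double .
source      SELECT id, stnavn, longitude, latitude, stparam, stparakode, stid
            FROM weather_station
```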
These mappings define how relational data are represented as RDF triples, making it possible to execute SPARQL queries over relational databases seamlessly. The triples produced by the mapping and ontology are not stored permanently but are accessed through SPARQL queries using SPARQL-to-SQL rewriting techniques. The development of mappings and ontologies is iterative; a good understanding of the data therefore smooths the system's improvement process.
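For instance, given a hypothetical row (id = 7, stid = 'ST07', latitude = 60.1, longitude = 10.2), the mapping sketched above would virtually expose triples such as the following (namespace as in the ontology sketch above):

```turtle
@prefix :    <http://example.org/envgeo/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/envgeo/weather_station/7> a :weather_station ;
    :hasStationID "ST07"^^xsd:string ;
    :hasLatitude  "60.1"^^xsd:double ;
    :hasLongitude "10.2"^^xsd:double .
```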
Step 4: Configure Ontop Settings: This step involves creating a configuration file with the database connection details and paths to the ontology and mapping files. Listing 4 presents example configuration settings for the Ontop system. After configuration, users can formulate queries using ontology terms, including GeoSPARQL functions; such queries abstract away the underlying data while supporting both spatial and non-spatial aspects.
Listing 4. Example configuration settings.
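As a minimal sketch of such settings (the database name and credentials below are placeholders; the actual values in Listing 4 may differ), a typical Ontop properties file for a PostgreSQL backend contains the JDBC connection details:

```properties
# JDBC connection to the PostgreSQL/PostGIS database (placeholder values)
jdbc.url      = jdbc:postgresql://localhost:5432/envgeo
jdbc.user     = postgres
jdbc.password = changeme
jdbc.driver   = org.postgresql.Driver
```

The ontology and mapping files are then passed to Ontop together with this properties file, for example when launching the local SPARQL endpoint with the Ontop CLI (file names assumed for illustration): ontop endpoint --ontology=env.ttl --mapping=env.obda --properties=env.properties.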
Step 5: SPARQL Query Interface (SQI): After configuring the Ontop system, users can compose SPARQL queries and run them against the relational database through the virtual RDF graph to retrieve data. For this purpose, a user-friendly query composer with a data retrieval and presentation interface is essential for retrieving and analyzing the stored data, and an easy-to-use interface motivates users to apply the application daily in different decision-making scenarios. Therefore, a simple user interface was developed to compose SPARQL queries and show the results in tabular and visual format (on a map), depending on the input query. This web application is built on Ontop's inbuilt local SPARQL endpoint, which connects the ontology to the relational database and enables the translation of SPARQL queries into SQL queries executed on the underlying database.
While SPARQL querying is indeed common in triple stores and VKGs, the way SPARQL is used within this framework is designed to address specific challenges and offer distinct advantages that go beyond what traditional methods provide. These unique features include the following.
Real-time, dynamic querying of both spatial and non-spatial data. Unlike static triple stores, this framework allows the near-real-time integration of heterogeneous data sources into a virtual knowledge graph (VKG), and SPARQL queries can be executed on demand across the integrated data without requiring materialization. Because the most up-to-date data are always available for querying, the system is more effective in real-world decision-making scenarios.
Integration of spatial data via GeoSPARQL. Many traditional tools offer SPARQL support, but this framework goes a step further by handling geospatial queries and presenting their results on a map, which is crucial for geospatial data. This makes it particularly suited to spatial data applications in addition to traditional non-spatial ones.
SQI allows users to run a wide range of SPARQL queries against the database and visualize the results on a map. The application is built using a combination of HTML, CSS, JavaScript, and Python, with several libraries and frameworks providing a seamless user experience. In this context, queries are divided into two types, namely non-spatial and spatial queries, for easy understanding.
Non-spatial queries are attribute-based queries in which users filter or aggregate data on non-spatial attributes. For example, users can run queries to extract specific land use types, count the buildings in a given area, or aggregate data by building height or land area; a sketch is given below.
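As an illustrative sketch (the prefix and property names follow the ontology extension above; the actual experimental queries may differ), a non-spatial aggregation over the virtual graph could average air temperature per station within a time window:

```sparql
PREFIX :    <http://example.org/envgeo/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Average air temperature per station since 1 June 2024 (illustrative)
SELECT ?station (AVG(?temp) AS ?avgTemp)
WHERE {
  ?obs a :Observation ;
       :hasStationID      ?station ;
       :hasAirTemperature ?temp ;
       :hasResultTime     ?time .
  FILTER (?time >= "2024-06-01T00:00:00"^^xsd:dateTime)
}
GROUP BY ?station
ORDER BY DESC(?avgTemp)
```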
Spatial queries demonstrate how users can use GeoSPARQL functions to perform advanced geospatial analyses, including spatial joins, proximity analysis, and area-based queries. The SPARQL Query Interface (SQI) supports these capabilities, enabling tasks such as retrieving buildings within a specified distance from a water body or calculating the area of flood-prone zones using elevation data. This functionality empowers users to execute complex spatial operations seamlessly through an intuitive interface, enhancing both usability and decision-making capabilities in geospatial data analysis.
Furthermore, users can run more complex queries that combine spatial and non-spatial elements, such as identifying flood-prone areas in urban environments where both building density and elevation data meet specific criteria; a combined query is sketched below.
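For example, a query combining a GeoSPARQL spatial filter with a non-spatial attribute filter could be sketched as follows. The class names follow the ontology above, while the :hasHeight property and the thresholds are hypothetical illustration values:

```sparql
PREFIX :     <http://example.org/envgeo/>
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>

# Buildings taller than 10 m located within 500 m of a lake (illustrative)
SELECT DISTINCT ?building
WHERE {
  ?building a :building ;
            :hasHeight ?height ;                 # hypothetical attribute
            geo:hasGeometry/geo:asWKT ?bGeom .
  ?lake a :lake ;
        geo:hasGeometry/geo:asWKT ?lGeom .
  FILTER (?height > 10)
  FILTER (geof:distance(?bGeom, ?lGeom, uom:metre) < 500)
}
```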
The SQI consists of three main components: a Query Composer for entering user-defined queries, a Result Table for presenting results, and a Geo Visualiser for visualizing results on a map.
Figure 7 shows the portal interface and its components. The source code is released on GitHub (https://github.com/sprana-web/SQI (accessed on 8 July 2024)).