Based on the prevailing challenges in improving the semantic interoperability of environmental geospatial data through ontology engineering, as discussed above, the following framework is proposed. The key components of the framework and the implementation steps that enhance the semantic linking of data are then discussed in detail.
The proposed framework introduces a novel approach to querying environmental geospatial data across both static (topographic maps with features, land use, historical climate data, etc.) and dynamic (weather, hydrological data, etc.) data sources. It bridges the gap between heterogeneous data formats and sources, allowing users to seamlessly access and query complex geospatial datasets without understanding the underlying data structure. Integrating and analyzing real-time dynamic data alongside traditional static datasets offers new insights into environmental changes and patterns. This approach not only enhances the cross-domain integration and accessibility of environmental geospatial data but also supports more informed decision-making where cross-domain analysis is crucial. The novelty of the framework lies in its capacity to unify diverse geospatial data formats from various sources and to query them in a semantically rich manner, providing a flexible and powerful tool for a wide range of decision-making scenarios.
Overview of the Framework
The framework can be divided into three components based on function. The first component is data storage: a database where all related environmental data are stored for later use by queries. The second is a virtual component, where a knowledge graph (KG) is created on the fly based on user queries. The third is a query composer with a result presentation interface, commonly known as a SPARQL endpoint, where the user can compose a query and obtain the results in tabular and visual format. The framework uses a custom-built SPARQL Query Interface (SQI) over the virtual knowledge graph, which acts as an abstraction layer representing integrated data from various sources in the form of a graph. Users can formulate queries on the interface without explicit knowledge of the underlying data structure.
Figure 3 represents the high-level overview of the framework’s architecture, where arrows indicate information flow.
This is achieved through the ontological representation of the underlying spatial relational data exposed by the ontology-based data integration module, which translates high-level queries into appropriate data source queries, retrieves the data, and transforms the results back into the terms defined by the ontology. Finally, the retrieved data are presented in a table, or in a table with a map visualization, depending on the non-spatial or spatial nature of the results.
The mapping process is a critical step in retrieving correct results. It is advisable to start with a small dataset, develop the initial ontology, and link the data items from this subset to the terms and concepts in the ontology. This aligns the data with the ontology's structure and semantics. The correctness of the mappings can then be verified by inspecting query answers and visualization results. The virtual nature of this framework offers significant advantages for scalable systems because it avoids materialization. In contrast to traditional data materialization methods, virtual integration of data under the ontology supports lightweight iterations, making ontology and mapping construction efficient and highly flexible. Flexibility, scalability, and efficiency are therefore the key advantages of the proposed framework.
The following section briefly discusses the key steps for the experimental implementation of the proposed framework. The implementation of this framework is based on five key steps.
Figure 4 shows the simplified high-level process diagram for experimental framework implementation.
Depending on the situation, steps one and two are interchangeable: if the database already exists, the ontology design can start from the database schema; otherwise, the ontology can be designed before the database implementation according to the user requirements. Next, mappings must be established to connect the ontology with the database. This component is the core strength of the system, as it enables on-the-fly access and seamless semantic retrieval of data from the underlying relational database, based on user requirements and with the help of a pre-defined domain-specific ontology. Finally, the Ontop tool is configured to integrate all components and launch the semantic query interface for composing queries and retrieving data. The process is iterative: the ontology must be extended and the mappings updated whenever new data are added to the database so that the latest information remains queryable. Further details are provided in the following section.
Step 1: Define Ontology: Starting from a basic understanding of the available data sources, an ontology can be designed using top-down or bottom-up methods [31]. This conceptual schema acts as the essential foundation of the framework and is formulated by identifying key concepts, properties, and spatial relationships within the domain. Instead of developing an ontology from scratch, the best practice is to reuse existing ontologies and extend them with custom elements to match the current domain requirements [31]. Using well-established ontologies has benefits such as standardization (being widely recognized and accepted in the geospatial community), time and cost-effectiveness, and interoperability with other systems and datasets. Therefore, the GeoSPARQL ontology by OGC (https://opengeospatial.github.io/ogc-geosparql/geosparql11/ (accessed on 1 July 2024)) was selected as the foundation and extended with custom elements to match the requirements of this framework and the experimental queries. The extended ontology was designed to integrate and manage geospatial and observational data for this project. It defines data properties for capturing specific measurements (e.g., temperature, water level, etc.), spatial coordinates, and identifiers, and it includes classes for different features such as lakes, buildings, agricultural plots, weather stations, and points of interest. Given that the use cases primarily focus on environmental geospatial data, GeoSPARQL is an ideal choice due to its minimalistic nature: it covers the fundamental aspects of geospatial data, providing datatypes and data properties to describe different geospatial data types and their characteristics, and object properties to express relationships between objects. This approach ensures that the ontology supports both spatial and non-spatial data, making it easier to integrate, query, and analyze geospatial information from multiple sources in the context of environmental management.
Figure 5 shows the part of the extended ontology (classes and data properties) developed in the Protégé application (https://protege.stanford.edu/ (accessed on 6 June 2024)). With the help of the Ontop for Protégé plugin, users can develop mappings to connect the database with the ontology, achieve seamless integration, and query the underlying data.
Data Properties: The ontology defines several datatype properties, i.e., attributes that relate class instances to data values. Key properties include the following:
:hasAirTemperature for recording air temperature values.
:hasWaterTemperature for recording water temperature values.
:hasLatitude and :hasLongitude for geospatial coordinates.
:hasResultTime for the timestamp of observations.
:hasID, :hasStationID, and :hasParaID for unique identifiers.
Classes: The ontology defines several classes representing different types of features and observations:
:Observation: Represents an observation event.
:agri_plots: Represents agricultural plots, subclass of geo:Feature.
:building: Represents buildings, subclass of geo:Feature.
:lake: Represents lakes, subclass of geo:Feature.
:poi: Represents points of interest, subclass of geo:Feature.
:weather_station: Represents weather stations, subclass of geo:Feature.
Relationships: The ontology establishes relationships between the classes and data properties to create a comprehensive model capturing the various aspects of the considered domain; an illustrative sketch is given below.
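To make this structure concrete, the following Turtle sketch shows how such an extended ontology might be declared. The base namespace (:) and the exact axioms are assumptions made for illustration; the class and property names follow the lists above, and the actual project ontology may contain further axioms.

```turtle
@prefix :     <http://example.org/envgeo/> .   # hypothetical base namespace
@prefix geo:  <http://www.opengis.net/ont/geosparql#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Feature classes extending GeoSPARQL's geo:Feature
:lake            a owl:Class ; rdfs:subClassOf geo:Feature .
:building        a owl:Class ; rdfs:subClassOf geo:Feature .
:agri_plots      a owl:Class ; rdfs:subClassOf geo:Feature .
:poi             a owl:Class ; rdfs:subClassOf geo:Feature .
:weather_station a owl:Class ; rdfs:subClassOf geo:Feature .
:Observation     a owl:Class .                 # observation events

# Datatype properties for measurements, coordinates, and identifiers
:hasAirTemperature   a owl:DatatypeProperty ; rdfs:domain :Observation ; rdfs:range xsd:decimal .
:hasWaterTemperature a owl:DatatypeProperty ; rdfs:domain :Observation ; rdfs:range xsd:decimal .
:hasResultTime       a owl:DatatypeProperty ; rdfs:domain :Observation ; rdfs:range xsd:dateTime .
:hasLatitude         a owl:DatatypeProperty ; rdfs:range xsd:double .
:hasLongitude        a owl:DatatypeProperty ; rdfs:range xsd:double .
:hasStationID        a owl:DatatypeProperty ; rdfs:range xsd:string .
```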
Step 2: Prepare the Relational Database: The relational database is created based on user requirements. For this purpose, it is essential to identify all relevant data sources and streams to be integrated, listing sources from databases, data warehouses, sensor networks, application programming interfaces (APIs), and other GIS data sources. In this case, PostgreSQL with the PostGIS extension was used because it is widely recognized for efficient storage and management of data, supports a wide range of data types and advanced querying functions, and integrates seamlessly with the Ontop system. For database administration and development, pgAdmin 4 (https://www.pgadmin.org/ (accessed on 15 April 2024)) was used.
When preparing the PostgreSQL database, users can follow the typical database-creation procedures of tools such as pgAdmin. However, some specific considerations and best practices ensure optimal integration with the OBDA system: defining clear data models (aligning the database schema with the ontology and, for spatial data, using the PostGIS extension with a proper coordinate reference system), ensuring that tables contain clear entity identifiers, avoiding unnecessary complexity in the database schema, using proper data types, and supporting real-time data. A sketch of these practices is shown below. Furthermore, PostgreSQL's support for database federation via Foreign Data Wrappers (FDWs) is a key consideration when extending the framework's usability to run SPARQL queries over different databases kept in isolated data silos, since many stakeholders maintain their own databases.
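As a minimal sketch of these practices (the table layout is an assumption modeled on the weather_station columns mentioned in Step 3), a PostGIS-enabled station table with a clear identifier and an explicit coordinate reference system could be created as follows:

```sql
-- Enable spatial support once per database (requires PostGIS to be installed)
CREATE EXTENSION IF NOT EXISTS postgis;

-- Weather station table with a clear entity identifier and simple data types
CREATE TABLE weather_station (
    id        integer PRIMARY KEY,      -- clear entity identifier
    stnavn    text NOT NULL,            -- station name
    longitude double precision,
    latitude  double precision,
    geom      geometry(Point, 4326)     -- PostGIS point geometry in EPSG:4326 (WGS 84)
);

-- Derive the geometry from the coordinate columns, keeping them consistent
UPDATE weather_station
SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
```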
Step 3: Define the Mappings: This step plays a key role in the framework's data retrieval process. The relevant mapping entities are designed to map data sources to the ontology; the aim is to define how concepts in the ontology relate to the data (tables, fields, records, and other spatial data). Mapping languages such as R2RML (RDB to RDF Mapping Language) [45] or the native Ontop mapping language [46] can be used; during query translation, this set of mappings allows SPARQL queries formulated in ontology terms to be rewritten into SQL queries that run over the underlying database. Ontop supports both mapping languages. In this experimental setup, the native Ontop mapping language was used, as it is easy to use and mappings can be formulated via a graphical user interface. It is worth mentioning that Ontop provides tools to convert either mapping language into the other [26]. In essence, these rules describe how to extract data from the relational database and represent it as RDF triples according to the ontology.
Figure 6 presents an example mapping developed for the framework.
Mapping Declaration: Associates data from a relational table (here, weather_station) with the ontology.
mappingId: A unique identifier for the mapping.
target: Specifies the RDF triples to be generated. It maps each variable in curly brackets from the relational table weather_station to the weather_station class and its properties.
source: Defines the SQL query that extracts the relevant data from the relational database. This specifies which table the data come from (weather_station), which columns are used (id, stnavn, longitude, latitude, etc.), and any joins or filters required to prepare the data for mapping. As a result, the query extracts seven columns (id, stnavn, longitude, latitude, stparam, stparakode, and stid) from the weather_station table in the predefined database, ensuring that the required attributes are available for subsequent semantic mapping and integration into the ontology; a sketch in the native mapping syntax follows.
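Complementing Figure 6, the following is an illustrative sketch of such an entry in the native Ontop mapping language (the IRI template and prefix are assumptions; the columns follow the source query described above). In an .obda file, entries of this form sit inside a [MappingDeclaration] block:

```
mappingId   MAP-weather-station
target      :weather_station/{id} a :weather_station ;
              :hasStationID {stid}^^xsd:string ;
              :hasLatitude  {latitude}^^xsd:double ;
              :hasLongitude {longitude}^^xsd:double .
source      SELECT id, stnavn, longitude, latitude, stparam, stparakode, stid
            FROM weather_station
```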
These mappings define how relational data are represented as RDF triples, making it possible to execute SPARQL queries over relational databases seamlessly. The triples produced by the mapping and ontology are not stored permanently but are accessed through SPARQL queries using SPARQL-to-SQL rewriting techniques. The development of mappings and ontologies is iterative; a good understanding of the data therefore smooths the system's improvement process.
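For instance, given a hypothetical row (id = 7, stid = 'ST07', latitude = 60.1, longitude = 10.2), the mapping sketched above would virtually expose triples such as the following (namespace as in the ontology sketch above):

```turtle
@prefix :    <http://example.org/envgeo/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/envgeo/weather_station/7> a :weather_station ;
    :hasStationID "ST07"^^xsd:string ;
    :hasLatitude  "60.1"^^xsd:double ;
    :hasLongitude "10.2"^^xsd:double .
```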
Step 4: Configure Ontop Settings: This step involves creating a configuration file with the database connection details and paths to the ontology and mapping files. Listing 4 presents example configuration settings for the Ontop system. After configuration, users can formulate queries using ontology terms, including GeoSPARQL functions; such queries abstract away the underlying data while supporting both spatial and non-spatial aspects.
Listing 4. Example configuration settings.
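As a minimal sketch of such settings (the database name and credentials below are placeholders; the actual values in Listing 4 may differ), a typical Ontop properties file for a PostgreSQL backend contains the JDBC connection details:

```properties
# JDBC connection to the PostgreSQL/PostGIS database (placeholder values)
jdbc.url      = jdbc:postgresql://localhost:5432/envgeo
jdbc.user     = postgres
jdbc.password = changeme
jdbc.driver   = org.postgresql.Driver
```

The ontology and mapping files are then passed to Ontop together with this properties file, for example when launching the local SPARQL endpoint with the Ontop CLI (file names assumed for illustration): ontop endpoint --ontology=env.ttl --mapping=env.obda --properties=env.properties.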
Step 5: SPARQL Query Interface (SQI): After configuring the Ontop system, users can compose SPARQL queries and run them against the relational database through the virtual RDF graph to retrieve data. For this purpose, a user-friendly query composer with a data retrieval and presentation interface is essential for retrieving and analyzing the stored data, and an easy-to-use interface motivates users to apply the application daily in different decision-making scenarios. Therefore, a simple user interface was developed to compose SPARQL queries and show the results in tabular and visual format (on a map), depending on the input query. This web application is built on Ontop's inbuilt local SPARQL endpoint, which connects the ontology to the relational database and enables the translation of SPARQL queries into SQL queries executed on the underlying database.
While SPARQL querying is indeed common in triple stores and VKGs, the way SPARQL is used within this framework is designed to address specific challenges and offer distinct advantages that go beyond what traditional methods provide. These unique features include the following.
Real-time, dynamic querying of both spatial and non-spatial data. Unlike static triple stores, this framework allows the near-real-time integration of heterogeneous data sources into a virtual knowledge graph (VKG), and SPARQL queries can be executed on demand across the integrated data without requiring materialization. Because the most up-to-date data are always available for querying, the system is more effective in real-world decision-making scenarios.
Integration of spatial data via GeoSPARQL. Many traditional tools offer SPARQL support, but this framework goes a step further by handling geospatial queries and presenting their results on a map, which is crucial for geospatial data. This makes it particularly suited to spatial data applications in addition to traditional non-spatial ones.
SQI allows users to run a wide range of SPARQL queries against the database and visualize the results on a map. The application is built using a combination of HTML, CSS, JavaScript, and Python, with several libraries and frameworks providing a seamless user experience. In this context, queries are divided into two types, namely non-spatial and spatial queries, for easy understanding.
Non-spatial queries are attribute-based queries in which users filter or aggregate data on non-spatial attributes. For example, users can run queries to extract specific land use types, count the buildings in a given area, or aggregate data by building height or land area; a sketch is given below.
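As an illustrative sketch (the prefix and property names follow the ontology extension above; the actual experimental queries may differ), a non-spatial aggregation over the virtual graph could average air temperature per station within a time window:

```sparql
PREFIX :    <http://example.org/envgeo/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Average air temperature per station since 1 June 2024 (illustrative)
SELECT ?station (AVG(?temp) AS ?avgTemp)
WHERE {
  ?obs a :Observation ;
       :hasStationID      ?station ;
       :hasAirTemperature ?temp ;
       :hasResultTime     ?time .
  FILTER (?time >= "2024-06-01T00:00:00"^^xsd:dateTime)
}
GROUP BY ?station
ORDER BY DESC(?avgTemp)
```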
Spatial queries demonstrate how users can use GeoSPARQL functions to perform advanced geospatial analyses, including spatial joins, proximity analysis, and area-based queries. The SPARQL Query Interface (SQI) supports these capabilities, enabling tasks such as retrieving buildings within a specified distance from a water body or calculating the area of flood-prone zones using elevation data. This functionality empowers users to execute complex spatial operations seamlessly through an intuitive interface, enhancing both usability and decision-making capabilities in geospatial data analysis.
Furthermore, users can run more complex queries that combine spatial and non-spatial elements, such as identifying flood-prone areas in urban environments where both building density and elevation data meet specific criteria; a combined query is sketched below.
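For example, a query combining a GeoSPARQL spatial filter with a non-spatial attribute filter could be sketched as follows. The class names follow the ontology above, while the :hasHeight property and the thresholds are hypothetical illustration values:

```sparql
PREFIX :     <http://example.org/envgeo/>
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>

# Buildings taller than 10 m located within 500 m of a lake (illustrative)
SELECT DISTINCT ?building
WHERE {
  ?building a :building ;
            :hasHeight ?height ;                 # hypothetical attribute
            geo:hasGeometry/geo:asWKT ?bGeom .
  ?lake a :lake ;
        geo:hasGeometry/geo:asWKT ?lGeom .
  FILTER (?height > 10)
  FILTER (geof:distance(?bGeom, ?lGeom, uom:metre) < 500)
}
```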
The SQI consists of three main components: a Query Composer for entering user-defined queries, a Result Table for presenting results, and a Geo Visualiser for visualizing results on a map.
Figure 7 shows the portal interface and its components. The source code is released on GitHub (https://github.com/sprana-web/SQI (accessed on 8 July 2024)).