Article

Ontology-Based Spatial Data Quality Assessment Framework

Department of Geomatics Engineering, Karadeniz Technical University, Trabzon 61000, Türkiye
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(21), 10045; https://doi.org/10.3390/app142110045
Submission received: 10 September 2024 / Revised: 25 October 2024 / Accepted: 30 October 2024 / Published: 4 November 2024
(This article belongs to the Special Issue Current Practice and Future Directions of Semantic Web Technologies)

Abstract

Spatial data play a critical role in various domains such as cadastre, environment, navigation, and transportation. Therefore, ensuring the quality of geospatial data is essential for obtaining reliable results and making accurate decisions. Typically, data are generated by institutions according to specifications, including application schemas, and can be shared through the National Spatial Data Infrastructure. The compliance of the produced data with the specifications must be assessed by institutions. Quality assessment is typically performed manually by domain experts or with proprietary software. The lack of a standards-based method for institutions to evaluate data quality leads to software dependency and hinders interoperability. The diversity of application domains makes an interoperable, reusable, extensible, and web-based quality assessment method necessary for institutions. Current solutions do not offer such a method, which results in high costs, including labor, time, and software expenses. This paper presents a novel framework that employs an ontology-based approach to overcome these drawbacks. The framework is primarily based on two types of ontologies and comprises several components. The ontology development component is responsible for formalizing the rules of specifications through a GUI. The ontology mapping component integrates a Specification Ontology containing domain-specific concepts with a Spatial Data Quality Ontology containing generic quality concepts, including rules expressed in the Semantic Web Rule Language; such rules are not included in existing data quality ontologies. This integration completes the framework, allowing the quality assessment component to effectively identify inconsistent data. Domain experts can create Specification Ontologies through the GUI, and the framework assesses spatial data against the Spatial Data Quality Ontology, generating quality reports and classifying errors.
The framework was tested on a 1/1000-scale base map of a province and effectively identified inconsistencies.

1. Introduction

Institutions produce spatial data and share them through the National Spatial Data Infrastructure (NSDI). Quality assessments of data produced by governmental and other entities frequently reveal flaws in data management, particularly those resulting from redundancy and from judgments made during the creation or manipulation of spatial data. The production of spatial data in institutions often fails to adhere to the specified requirements and specifications. In addition, these institutions, because of their differing purposes, have separate and sometimes conflicting policies, even though they may have common interests. Attempts at interoperability may fail due to human error, such as ad hoc de facto solutions. Inefficiency in management and delegation leads practitioners to devise their own temporary solutions to quickly “solve” the problems they encounter. Partially inputting a road detail where the road shares a common boundary with a building is an error encountered in base maps and is an example of such a quick, temporary “solution”. As a result of these problems, inconsistencies in the produced data are inevitable, which in turn reduces their overall quality.
Quality here refers to logical consistency, understood as the conformance of data to specifications. Specifications include rules with respect to data application schemas, including attribute and topological rules in a specific domain. “Contours must not cross each other” is a rule related to the contour feature in an application schema. Examples of inconsistencies in spatial data produced by an organization are intersecting contours, a road crossing over a building, or a missing attribute value for a feature where one is required. Failure to manage quality and overcome the associated problems has economic as well as technical consequences. In the U.S. alone, poor data quality costs millions of dollars per organization and billions of dollars overall [1,2].
In addition to domain experts and field personnel, institutions employ various experts who specialize in specific software. The structure of such software is rigid, and the quality assessment process is performed manually. Decision makers may not be able to make the best decisions, even if they are personally able to conceive them. This limitation is exacerbated by the lack of interaction between institutions, which increases data redundancy and inevitably reduces efficiency. Non-standardized quality assessment methods, whether based on proprietary software or performed manually, cause reusability, interoperability, and extensibility problems. Considering these issues, research into novel, extensible, reusable, interoperable, and web-based methods for evaluating the quality of geospatial data is required.
Currently, the technical backbone of the NSDI is web services, and it is evolving toward Semantic Web services. The fundamental building blocks of Semantic Web services are ontologies. An ontology can be defined at an abstraction level that allows easy reuse by new users with eventually different datasets at hand. Therefore, in this study, ontology-based methods for quality assessment have been researched in addition to the software-based methods already applied by institutions or described in the literature.
There is academic research based on ontologies [3,4,5,6,7,8], as well as software such as 1Spatial 1Integrate and ArcGIS Data Reviewer, that proposes solutions to various data quality assessment issues [9,10,11]. QGIS Geometry Checker is an open-source alternative (https://docs.qgis.org/3.34/en/docs/user_manual/plugins/core_plugins/plugins_geometry_checker.html, accessed on 1 September 2024). Even when the software is open source, it cannot easily be customized to institutional needs. The institution needs experts capable in one or more software packages and must adapt itself to that software. The software user has to know which operation to apply and what the input and output are. These proposed solutions exist alongside several traditional quality assessment software packages, most of which are proprietary [10,11] and have their own closed rule-based structures [10,11,12].
In the literature, there are several ontology-based studies for quality management. While there are various ontology-based solutions available for data quality management, it is important to note that some of these solutions primarily focus on specific domains and do not specifically address the challenges associated with spatial data quality management [3,4,5,6].
Geisler [3] proposes a data quality management system based on ontologies, especially for data streams. Functions for quality assessment are implemented as instances in the system with corresponding SPARQL query constructs. Debattista et al. [4] propose Luzzu, a linked data quality assessment framework that uses an ontology for its back end. It implements a domain-specific language, the Luzzu Quality Metric Language (LQML). The metrics applied with Luzzu are already classified for linked data, and users can invoke them with the associated LQML terms. Fürber [5] introduced the Semantic Web Information Quality Assessment Framework. The framework has a user wiki that enables users to define quality requirements based on predefined data quality dimensions specific to Semantic Web data. It implements a Data Quality Management vocabulary to formally represent user-defined quality requirements. Our system shares similarities with [3,4,5] in that they all allow users to define quality requirements based on predefined rules. However, these studies are all domain dependent, whereas our system focuses on spatial data-related quality dimensions and incorporates a user interface to define a Specification Ontology (SfO) for any spatial domain. The Spatial Data Quality Ontology (SDQO) contains Semantic Web Rule Language (SWRL) rules created for topo-semantic consistency assessment, which distinguishes our system from the mentioned studies. Zhu [6] describes an ontology-based quality assessment framework, the Semantic Framework for Data Quality Assessment (SemDQ), for data in a specific clinical domain. SemDQ is similar to the framework proposed in this study: two types of ontology were designed, one representing the data model and the other representing data quality criteria for quality dimensions. The main differences are the SfO creation in our system, which lets users define their own rules, and the support for spatial quality dimensions.
These existing solutions offer valuable approaches for managing data quality across diverse domains, but there remains a need for dedicated solutions tailored specifically to spatial data quality management. Studies based on ontologies have also been conducted, particularly for the management of spatial data quality [7,8,12,13,14,15,16,17,18,19].
The literature review suggests that ontology-based studies share common implementation aspects in spatial data quality management. For any domain, rules are created to assess the consistency of data with its specification rules. These rules are implemented with ontologies using recommended or standard rule languages, for instance, SWRL, the SPARQL Protocol and RDF Query Language (SPARQL), the Semantic Query-Enhanced Web Rule Language (SQWRL), and the Rule Interchange Format (RIF).
Mostafavi [7] is one of the pioneering researchers using ontologies for spatial data quality assessment. In that study, the National Topographic Data Base of Canada is assessed against the rules of its schema. Despite leveraging ontologies, the study adopts a closed-world approach using Prolog rules. While it successfully tests the logical consistency of data, it is not designed for reuse: the user can locate the data that do not follow the schema but cannot customize the rules, and any change to the schema necessitates a complete rewrite. In the Prolog part, inconsistency rules are implemented to identify sources of data inconsistencies. Wang et al. [8] used SWRL for spatial data quality assessment and drew attention to the need for further development of ontology models and rule description languages to support spatial rules. That research investigated data validation in situ, while collecting and entering data from the field using mobile devices. The studies [7,8] are domain dependent. Our study investigates the quality assessment of spatial data according to their specifications. In the central ontology, SWRL rules follow the Open-World Assumption (OWA). Reusability and robustness against ontological inconsistency, with identification of the root of the problem, are prime aspects of our study.
For semantic sensor networks and the related quality aspects, Degbelo [20] created a design pattern and used SWRL to infer quality results. A system for context-based spatial information retrieval using spatial functions and SWRL was proposed by [13]. It introduced a mechanism for querying real-time sensor data during rule execution, making Geographic Information Retrieval systems more adaptable to the dynamic nature of user contexts and environmental changes. It demonstrates how SWRL can be applied to model user preferences and retrieve context-aware geographical information. Although this study is not directly related to quality assessment, it is an example of the use of SWRL rules in context-based applications. Furthermore, there are studies using SWRL for quality assessment in health and music domains [6,21]. Varadharajulu et al. [22] proposed a system validating the acceptability of recommended street names against existing rules in the regulation using SWRL rules. The main goal of the study was to leverage SWRL to automate tasks related to spatial transactions, such as updating, managing, or validating geospatial data. The use of SWRL helps create flexible, rule-based systems that ensure the correct handling of spatial data in accordance with predefined rules. The system has rules only for the applied domain and for attribute tests. The system lacks support for topo-semantic rules and does not include a component that enables users to create rules without requiring specialized expertise, a capability that our system provides.
Xu and Cai [19] propose a system based on ontologies and SPARQL queries to check the spatial compliance of underground utilities with the required regulations. To achieve this, they designed four ontologies specific to underground utilities: a utility product ontology, a transportation object ontology, a geometry ontology, and a utility spatial rule ontology. These ontologies standardize the way data are represented, ensuring that heterogeneous data sources (geospatial utility data and textual regulations) are uniformly interpreted. The proposed system is domain specific but can be extended to different domains with remodeling. The specification rules for the domain are extracted with the help of a text and data converter; in our system, this is done in the SfO creation phase with a domain expert. Homburg [16] proposed an ontology-based framework to evaluate the “fitness for use” of thematic map data by modeling user requirements. The study proposes a framework containing ontologies for a thematic map, a requirement profile, and data quality. SWRL and SPARQL are likewise used to make inferences with the constraints modeled in the study. While that study focuses on the “fitness for use” quality element for spatial data, our study focuses on logical consistency and topo-semantic quality element assessment.
Mobasheri [14] presented a method to assess OSM data quality in terms of the completeness quality element, using SQWRL rules to support information extraction from the existing database. Its aim is to calculate and create non-existing spatial information from existing spatial data. For this purpose, the study creates three levels of ontologies and mappings between them: application ontologies (including a domain ontology and a task ontology), domain-independent ontologies, and linked data. The system focuses on data quality enhancement instead of data quality assessment, which is the focus of our framework.
For the agriculture domain, Nash [15] discussed the automation of specification rules. To accomplish this, Nash et al. [23] used RIF to implement spatial rules and proposed GeoRIF. In their approach, specification rules are modeled as RIF rules, whereas our study implements similar rules using SWRL. As part of their contribution, Nash and colleagues developed a reasoner specifically designed for RIF to enable the execution of these rules. Our framework provides a user interface to create specification rules as an SfO, which makes the framework proposed in our study reusable, unlike the framework proposed in Nash et al. [23].
Qiu et al. [18] designed a domain-specific system to assess the quality of spatial data for autonomous cars, proposing an ontology to identify inconsistent data according to the rules created for them. It introduces a workflow for ensuring the quality of lane-level high-definition digital maps, particularly relevant for autonomous driving systems. The authors propose an ontology-based approach for map data quality assurance, creating the Map Quality Violation Ontology (MQVO) to formalize map errors and guide the violation detection and resolution process. MQVO defines the types of errors (e.g., topological or geometric errors), their severity levels, and the affected map objects, providing a structure for handling errors in a systematic way. The system is domain dependent and does not provide a user interface to define rules as our system does.
This paper presents a framework that addresses the need for a reusable, interoperable, extensible, and web-based spatial data quality assessment. The framework is based on the development of ontologies, leveraging an open-world approach for formalization and reasoning. By adopting Open-World Assumptions and integrating ontologies, the framework aligns with the objective of creating solutions that are reusable, extensible, interoperable, and web-based. The insistence on closed-world approaches is another “elephant in the room” with significant economic costs [24]. The ontological framework is based on two facts:
  • There is a central regulatory system (the authority, the constitution) that is dynamic but stable (not rigid), an open-world approach. Changes are expected but not frequent. To model these needs, SDQO is designed. Data quality concepts and general spatial relation rules are the main scope of the designed ontology.
  • There are other regulations that have an inherent dependence on the central system. More such rules can be introduced into the system in the future without breaking any previous handling. These rules are modified more frequently and are open to interdependencies. The SfO is designed to conceptualize the rules of an institution.
Furthermore, the implementation considers what should be expected from domain experts; in particular, they are not expected to have expertise in Semantic Web technologies or the Open-World Assumption. Institutional ontologies (SfOs) are highly hierarchical. They are created and modified using a GUI designed so that domain experts, or any user, can create a Specification Ontology without Semantic Web expertise. A short survey showed that the interface is easy to learn. After optimization, the interface generates a CSV file, which is then used to modify the institutional ontology automatically.
Section 2 provides background on data quality elements and ontology concepts and presents a detailed explanation of each component of the framework. The usability of the framework is demonstrated with a case study in Section 3.

2. Materials and Methods

The first step of this study is to determine the types of data quality problems to be identified, by examining the basic spatial specifications of institutions, such as the Large-Scale Map and Map Information Production Regulation [25], the General Command of Mapping topographic data rules, and the INSPIRE Data Specification on Hydrography [26]. The second step involves devising two ontologies as the basic components of the proposed quality assessment framework, according to the determined quality problems. The final step is the technical implementation of the framework using Java-based (Java SE 11) and open-source software.

2.1. Data Quality Elements

ISO 19157:2023, which supersedes ISO 19157:2013, is the standard that defines the principles of spatial data quality [27,28]. It defines six “data quality elements”. “Logical Consistency”, one of the six, is the focus of this study. Domain consistency (attribute conformance) and topological consistency are two sub-elements of logical consistency [28]. Further data quality elements are also defined in the literature. Topo-semantic consistency is a component within the broader concept of topological consistency [29]: for two objects, it pertains to the accuracy of their topological relationship based on their semantics. In this study, the concept of topo-semantic consistency is preferred over the topological consistency defined in the standards, since it is conceptually more aligned with the study objective. Table 1 and Table 2 list the quality elements that the study focuses on. They are defined according to the most widely adopted rules in the specifications. The topo-semantic consistency element has several types that can be evaluated with the proposed framework. While “must not overlap” is used to determine whether features of the same class spatially overlap, “must be within”, “must not overlap with”, and “must not cross” are used for topo-semantic validations between features of two or more classes.
Domain consistency includes quality element types for finding attributes that have no value, a value that differs from a specified fixed value, or a value outside a defined range of values.
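The three domain-consistency checks above can be made concrete with a small validation function. This is an illustrative Python sketch only, not the framework's (ontology-based) implementation; the field names, example values, and message texts are assumptions.

```python
# Illustrative sketch: the framework performs these checks via ontology
# reasoning, not via this code. Field names and messages are assumptions.
def check_domain_consistency(feature, field, fixed=None, value_range=None):
    """Return a list of domain-consistency error messages for one field."""
    errors = []
    value = feature.get(field)
    if value is None:
        # Type 1: a required attribute has no value.
        errors.append(f"{field}: missing required value")
        return errors
    if fixed is not None and value != fixed:
        # Type 2: the value differs from a specified fixed value.
        errors.append(f"{field}: {value!r} differs from fixed value {fixed!r}")
    if value_range is not None and not (value_range[0] <= value <= value_range[1]):
        # Type 3: the value is outside a defined range.
        errors.append(f"{field}: {value!r} outside range {value_range}")
    return errors

building = {"floors": 120, "type": "residential"}
print(check_domain_consistency(building, "floors", value_range=(1, 100)))
print(check_domain_consistency(building, "height"))
```

In the framework itself, such violations would surface as quality results inferred by the reasoner rather than as returned strings.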

2.2. Ontology

Ontology is a widely recognized term that serves as a standard for domain modeling [30,31]. One of the primary advantages of ontology-based applications is their ability to promote interoperability and reusability of frameworks.
Quality management frameworks, which are designed to evaluate and represent quality results based on the rules of the application domain, can leverage these benefits. For rules in the Semantic Web context, this study utilizes the W3C SWRL [32] and SQWRL submissions.
OWL stands for the Web Ontology Language. In decidable ontologies, reasoning terminates with certainty; OWL DL is a dialect that guarantees decidability. Datalog is an alternative to Prolog that is more compatible with open-world reasoning and ontologies. SWRL combines OWL DL with a restricted subset of Datalog [32]. The Open-World Assumption is maintained in SWRL; therefore, no new clause changes previous truth values, and unknowns can become known.

2.3. Framework

The framework proposed in this study is depicted in Figure 1. The SfO creation component is responsible for formalizing the rules according to the organization’s application schema. The formalized rules for a domain can be stored and reused for further assessments. A GUI is designed for domain experts on the assumption that they may have no expertise in Semantic Web technologies. The rule formalization process with the GUI generates two files: an R2RML file to be used for data conversion and an SfO ontology for the institution. The data conversion component uses the R2RML mapping file to convert the spatial data to be validated into RDF. The ontology mapping component maps the generated SfO, SDQO, GeoSPARQL, and data ontologies. The rule-based mapping is established through Java programs with a GUI: mappings are first declared by the domain expert in a simple GUI and then converted to a CSV file. This CSV file is automatically reorganized and yields the ontologies and the R2RML mapping file. The mapping can also be edited. The data quality assessment component is mainly based on the SWRL rules created within the SDQO, and assessment is carried out by inference with the help of a SWRL-supporting reasoner; in this case, Openllet, an open-source version of Pellet [33], was chosen. The Java code uses the facilities of the SWRLAPI library as well. As a result of the quality assessment process, a final data quality report is produced.
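The CSV-to-ontology step described above can be sketched as follows. This is a hypothetical illustration in Python: the CSV column names and the emitted Turtle-like axiom lines are assumptions for this example, not the framework's actual CSV layout.

```python
import csv
import io

# Hypothetical CSV layout (column names are assumptions): each row
# declares that an SfO feature class takes part in an SDQO rule class.
def csv_to_subclass_axioms(csv_text):
    axioms = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # e.g. sfo:Contour becomes a subclass of sdqo:ClassCross1,
        # activating the SWRL rules attached to that rule class.
        axioms.append(
            f"sfo:{row['feature']} rdfs:subClassOf sdqo:{row['rule_class']} ."
        )
    return axioms

declarations = """feature,rule_class
Contour,ClassCross1
Building,ClassCross2
Lake,ClassCross2
"""
for axiom in csv_to_subclass_axioms(declarations):
    print(axiom)
```

In the actual framework, this reorganization is performed automatically by Java programs that also produce the R2RML mapping file.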

2.3.1. Data Conversion with R2RML

Currently, a significant portion of spatial data is stored in formats such as ESRI shapefiles. Therefore, to conduct quality assessments in the context of this study, it is necessary to employ tools or software that enable ontology-based data access or conversion. The system must do the following:
(1)
Support ESRI shapefiles and a range of spatial data formats as input.
(2)
Have associated Java library access.
(3)
Have GeoSPARQL compatibility, given its role as the primary vocabulary for geographic data within the framework.
(4)
Have the ability to convert attribute data associated with spatial data.
Several alternatives have been explored within the scope of this study, including GeoTriples, TripleGeo, DataMaster, and Ontop-spatial [34,35,36,37].
DataMaster (a Protégé plug-in) is outdated. TripleGeo, a tool for converting features from geospatial databases into RDF triples [35], provides limited compatibility with multiple vocabularies, which constrains its translations, and it does not support a range of attribute types. Ontop-spatial creates virtual RDF graphs for database access without materialization [36], but it supports only a limited set of geospatial databases (PostGIS, SpatiaLite, and Oracle Spatial). GeoTriples was selected for data conversion, as it meets all the requirements specified by the framework. GeoTriples is a tool designed to convert spatial data from various sources (including relational databases, shapefiles, and KML) into RDF graphs [34,38]. It utilizes R2RML (RDB to RDF Mapping Language) for data conversion. R2RML is used for creating custom mappings from relational databases to RDF graphs, which simplifies the process of attribute transfer.
As represented in Figure 2, R2RML, the W3C recommendation for a relational database to RDF conversion, uses RDF triples to define the mappings to RDF data [39].
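For illustration, a minimal R2RML triples map of the kind GeoTriples consumes might look as follows. The table name, column names, and the sfo namespace are assumptions for this sketch; in actual GeoTriples output, the geometry is typically modeled as a separate ogc:Geometry node rather than attached directly to the feature.

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix ogc: <http://www.opengis.net/ont/geosparql#> .
@prefix sfo: <http://example.org/sfo#> .

# Hypothetical mapping: rows of a "building" table become sfo:Building
# features with a WKT geometry literal (simplified for illustration).
<#BuildingMap> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "building" ] ;
    rr:subjectMap [
        rr:template "http://example.org/feature/building/{gid}" ;
        rr:class sfo:Building
    ] ;
    rr:predicateObjectMap [
        rr:predicate ogc:asWKT ;
        rr:objectMap [ rr:column "geom_wkt" ; rr:datatype ogc:wktLiteral ]
    ] .
```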

2.3.2. Ontology Development

This study employs two main types of ontologies: the SfO and the SDQO, as presented in Figure 3. The SDQO encompasses the essential rules and concepts pertaining to data quality assessment. The SfOs are designed as straightforward ontologies, with a focus on facilitating rule reuse among various datasets. Each institution can have one or more SfOs, tailored to specific requirements such as different scales. The SfOs import the SDQO, while the SDQO imports GeoSPARQL [40]. A data ontology is created by transforming an institution’s data from its proprietary format. To assess spatial data quality after SfO creation and data conversion, the domain expert imports the SDQO.

2.3.3. Spatial Data Quality Ontology

In a country, a multitude of institutions may create and handle the very same spatial data repeatedly, with questionable quality, to comply with the regulations of each institution and of central authorities. Institutions need to assess the quality of the data they create or query the quality of the data they receive. This is the main problem that we intend to solve with the help of the SDQO within the proposed framework.
For the design of the SDQO, several points are considered following a literature review, primarily focusing on concepts related to data quality and rule types within the spatial quality domain. Spatial and attribute rules classified as ‘forbidden’ and ‘necessary’ are identified based on the examined regulations. One example of a “necessary” type of spatial rule is that “Any building must be within at least one parcel”. “A road must not cross over a building” is an example of a “forbidden”-type rule. In addition to determining which rule types to use, we select questions for the ontology to answer. Some of the identified competency questions are as follows:
  • Which features have a particular set of data quality problems?
  • Which features have topo-semantic problems with the specified ones?
  • Which features have domain consistency problems?
  • What is the number of erroneous objects in the tested data?
For OWL ontologies, the top class is owl:Thing. Within the SDQO ontology, sdqo:DataQualityElement, sdqo:DataQualityResult, and ogc:SpatialObject are the direct subclasses. The ogc:SpatialObject class, the main class of GeoSPARQL, further branches into two subclasses: ogc:Geometry and ogc:Feature.
Within the SDQO ontology, the ogc:Feature class has three distinct direct subclasses, in addition to those imported from other ontologies.
These subclasses are sdqo:RestrictedFeature, sdqo:GeomClassifiedFeature, and sdqo:FixedRefFeature, as visualized in Figure 4.
An SWRL rule is utilized to automatically assign any feature to the appropriate subclass within sdqo:GeomClassifiedFeature, based on its dimension and the ogc:asWKT value of its geometry.
The sdqo:GeomClassifiedFeature class encompasses three subclasses: sdqo:CalcLine, sdqo:CalcPoly, and sdqo:CalcPoint. This information proves useful, for instance, in verifying the correct declaration of dimensions.
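The dimension-based classification described above can be sketched procedurally. In the framework this is done by an SWRL rule over ogc:asWKT values; the Python below is only an illustrative equivalent, with deliberately simplified WKT parsing.

```python
# Illustrative sketch of the geometry-classification step: assign a
# feature to CalcPoint / CalcLine / CalcPoly from the dimension implied
# by its WKT string. Class names mirror SDQO; the parsing is simplified.
WKT_DIMENSION = {
    "POINT": 0, "MULTIPOINT": 0,
    "LINESTRING": 1, "MULTILINESTRING": 1,
    "POLYGON": 2, "MULTIPOLYGON": 2,
}

def classify_by_wkt(wkt):
    # Take the geometry-type keyword before the first parenthesis.
    geom_type = wkt.strip().split("(")[0].strip().upper()
    dim = WKT_DIMENSION[geom_type]
    return {0: "sdqo:CalcPoint", 1: "sdqo:CalcLine", 2: "sdqo:CalcPoly"}[dim]

print(classify_by_wkt("POLYGON ((0 0, 4 0, 4 4, 0 0))"))
print(classify_by_wkt("LINESTRING (0 0, 1 1)"))
```

Comparing the inferred class with the feature's declared dimension is what allows the framework to verify correct dimension declarations.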
The sdqo:FixedRefFeature class includes OWL-named individuals that can serve as reference markers in rules. These reference individuals can be employed, for example, in attribute tests.
Under sdqo:RestrictedFeature, there exist four subclasses. The descriptions of these classes can be found in Table 3.
The sdqo:InterObjectPrRF class encompasses classes of features that have standardized spatial relations as object properties. This type of classification ensures generalized ontology modeling. Consequently, a domain-independent design is achieved.
sdqo:InterObjectPrRF has several pairs of classes, for example, sdqo:ClassCross1 and sdqo:ClassCross2. These classes are superclasses of the corresponding classes in the SfOs. For example, if the specification contains a rule such as “Contours must not cross buildings and lakes”, the rule is implemented in the SfO by creating the classes required for the “must not cross” rule. As the SfO imports the SDQO, it establishes the additional subclass relationships between the classes.
For instance, if sfo:Contour is a subclass of sfo:ClassCrf01 and sfo:Building, sfo:Lake, and its subclass sfo:PermanentLake are subclasses of sfo:ClassCrs01, the associated rules will infer that features of type sfo:Contour cross features of any of the other mentioned classes. Moreover, these rules will generate more detailed results regarding these occurrences, highlighting the violation of the specification rule stating that “contours must not cross over buildings and lakes”. Figure 5 shows the ontological correspondences of the “must not cross” rule.
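In human-readable SWRL syntax, the antecedent of such a “must not cross” rule could be sketched as below. The consequent shown is only illustrative (using the sdqo:hasMessage property described later in this section); the actual SDQO rules produce more detailed quality results.

```
sdqo:ClassCross1(?a) ^ sdqo:ClassCross2(?b) ^ ogc:sfCrosses(?a, ?b)
    -> sdqo:hasMessage(?a, "violates rule: must not cross")
```

Because sfo:Contour is declared a subclass of the ClassCross1-side class and sfo:Building and sfo:Lake of the ClassCross2-side class, this single generic rule covers all such specification rules without domain-specific rewriting.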
In SfOs, rules are reinterpreted as hierarchical schemes. Table 4 presents a hierarchical representation of a specification rule.
The sdqo:IntraDataPrRF class was utilized to establish constraints within a single class using datatype properties. In contrast, the sdqo:IntraObjectPrRF class is used to impose constraints within a single class using object properties. For example, the rule “Parcels must not overlap” represents this constraint.
The sdqo:DataQualityElements class has several subclasses, namely, sdqo:LogicalConsistency, sdqo:Completeness, and sdqo:GeometricAccuracy. The last one has sdqo:GeometryValidity as a subclass.
The sdqo:TopoSemanticConsistency class is a subclass of the sdqo:LogicalConsistency. In addition, sdqo:LogicalConsistency has the subclass sdqo:DomainConsistency.
The sdqo:Completeness class has two subclasses, sdqo:Commission and sdqo:Omission. Commissions can be determined using SDQO rules, which also provide quantitative information about omissions.
sdqo:errorCode is one of the datatype properties. It is defined to provide a code that can be used to identify the specific data quality problem with the data, while the sdqo:hasMessage property is used to associate results and processes with corresponding error messages. This enables a more informative and descriptive representation of the quality assessment results.
The sdqo:hasQueryString datatype property is for SPARQL query strings.
Within the SDQO ontology, DE-9IM-type object properties for Simple Features [41] like ogc:sfOverlaps are utilized to represent spatial relations. Intersection matrix masking is employed for calculating relations [42].
When the interiors do not intersect, it is either sfDisjoint (32 different masks) or sfTouches (224 different masks).
While almost all feature pairs are expected to have an sfDisjoint relation, regulations typically do not deal with such relations. Consequently, sfDisjoint cases are disregarded, and sfTouches cases are inspected separately.
sdqo:interiorIntersects is a symmetric object property for feature pairs whose interiors do intersect. It is a super property of the properties below and their negations since, except for sfTouches, an interior intersection is assumed. In the relevant cases the exteriors must intersect as well, so no object property is included for the intersection of exteriors; it would be redundant. Note that not every mask is geometrically possible for common simple features (for instance, TFFFTFFFF).
sdqo:nobb, sdqo:noib, sdqo:noie, and sdqo:nobe are the negations of sdqo:boundaryIntersects, sdqo:iandb, sdqo:iande, and sdqo:bande, respectively; sub-property relations are added as shown in Table 5.
In most cases, different masks yield different OGC relations, but there are exceptions. The mask “T*T***T**” indicates that, besides the interior intersection, the exterior of one feature intersects the interior of the other. Depending on the dimensions of the features, this can correspond to either ogc:sfCrosses or ogc:sfOverlaps. The mask “T*T***F**” can likewise indicate a crossing-type relation if the dimensions allow.
sdqo:crossesOrOverlaps is a super property that combines ogc:sfOverlaps and ogc:sfCrosses, which can share similar masks.
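As a small illustration of how such masks are evaluated, DE-9IM pattern matching can be sketched as follows. This is a simplified stand-in for the matching that JTS performs internally, and the matrices used below are synthetic:

```python
def matches(matrix, pattern):
    """Match a 9-character DE-9IM intersection matrix against a mask
    pattern: 'T' = any non-empty intersection, 'F' = empty, '*' = any,
    '0'/'1'/'2' = an intersection of exactly that dimension."""
    for cell, want in zip(matrix, pattern):
        if want == "*":
            continue
        if want == "T" and cell == "F":
            return False
        if want == "F" and cell != "F":
            return False
        if want in "012" and cell != want:
            return False
    return True
```

For a synthetic matrix such as "1F2101212", the shared mask "T*T***T**" matches regardless of whether the underlying relation is a crossing or an overlap, which is precisely why SDQO groups the two OGC relations under sdqo:crossesOrOverlaps.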
The object property sdqo:resultForData establishes a relationship between quality results and the corresponding data instances. It links the quality assessment results to the specific data they are associated with. Additionally, the object property sdqo:hasResult is used to relate data quality elements to the quality results they pertain to. It establishes a connection between the individual data quality elements (e.g., logical consistency) and the specific quality assessment results generated for them. To summarize, sdqo:resultForData links data with quality results, while sdqo:hasResult connects data quality elements with the corresponding results.
The created SDQO is designed to be extensible to a broad range of domains. Yildirim [43] extended SDQO for positional accuracy assessment of OSM data against General Command of Mapping Agency (GCM) 1/25,000 scaled topographical map data.

2.3.4. SfO and Graphical User Interface

SfO is designed to be user-friendly and manageable by domain experts who may not have expertise in Semantic Web Technologies. A set of GUIs has been developed to facilitate ontology creation and modification, as shown in Figure 6.
The GUIs allow domain experts to create and modify SfO based on the rules specific to their domain. Institutional specifications are often translated into hierarchical operations, such as adding or removing classes and subclass relations, which can be easily performed through the GUI.
When using the GUI, the domain expert begins by selecting the appropriate category for the name entered into the text field. The categories available for selection include Class, Datatype Property, Object Property, and Individuals. When dealing with geometric classes, the domain experts specify the geometry type and select the appropriate spatial relations, such as “Overlaps”, for the geometric classes. They can also define the restriction types, such as “Forbidden”, “Necessary”, or “Equivalent”.
For attribute-type rules, relations such as “Same”, “LessThan”, “LessThanEQ”, and “OneOf” are used to express the constraints.
For example, consider a specification stating that a feature in the “Contour” class, which has line geometry (c1), must not cross over features in the “Building”, “Lake”, and “Permanent Lake” classes (the latter a subclass of “Lake”), which have polygon geometry (c2). This specification rule can be represented as a CSV row, as follows:
“c1, c2, Forbidden, Crosses, Contour, Building|Lake|PermanentLake, timestamp_value”.
During the optimization stage, the collected CSV rows are processed and translated into basic operations of the SfO ontology. Several rules govern the translation of CSV rows, which are produced after selection through the GUI, into SfO ontologies. Assuming that the given CSV data row is not modified during optimization, it would be transformed as follows:
  • Create two top-level classes as direct subclasses of the appropriate SDQO classes, representing the crossing restriction. These classes can be automatically named based on their purpose and context. The first class is for features of lower dimensions. A sample description of the first class, named sfo:ClassCrf01, is shown in Table 6.
  • Create a class named “Contour” as a subclass of the first class created above.
  • Create classes named “Building”, “Lake”, and “Permanent Lake” as subclasses of the second class.
  • Incorporate timestamp values to indicate when regulatory changes take effect.
By implementing these operations in the SfO ontology, one effectively captures the crossing restriction, and its associated classes as represented in Table 6, allowing for proper representation and management of the spatial data quality constraints. The timestamp values provide temporal context and facilitate tracking changes or updates related to the regulation.
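The translation of an optimized CSV row into these basic operations can be sketched as follows. The operation tuples, the helper name csv_row_to_operations, and the numbering scheme are illustrative assumptions; only the class names such as sfo:ClassCrf01 and the SDQO superclasses follow the paper:

```python
# Hypothetical sketch: translating one optimized CSV rule row into basic
# SfO ontology operations. Operation tuples and helper names are illustrative.
SDQO_BASE = {"Crosses": "Cross"}  # sdqo:ClassCross1/2, as used in the paper

def csv_row_to_operations(row, rule_nr=1):
    fields = [f.strip() for f in row.split(",")]
    _dim1, _dim2, _restriction, relation, first, second, timestamp = fields
    base = SDQO_BASE.get(relation, relation)
    tag = base[:2]                                  # "Cr" for Cross(es)
    first_cls = f"sfo:Class{tag}f{rule_nr:02d}"     # lower-dimension side
    second_cls = f"sfo:Class{tag}s{rule_nr:02d}"
    ops = [
        # Two top-level classes under the matching SDQO restriction classes.
        ("create_subclass_of", first_cls, f"sdqo:Class{base}1"),
        ("create_subclass_of", second_cls, f"sdqo:Class{base}2"),
        # The restricted feature classes become their subclasses.
        ("create_subclass_of", f"sfo:{first}", first_cls),
    ]
    for cls in second.split("|"):
        ops.append(("create_subclass_of", f"sfo:{cls.strip()}", second_cls))
    # The timestamp annotation records when the regulatory change takes effect.
    ops.append(("annotate", first_cls, "schema:startTime", timestamp))
    return ops

row = "c1, c2, Forbidden, Crosses, Contour, Building|Lake|PermanentLake, timestamp_value"
ops = csv_row_to_operations(row)
```

Applying the resulting operations with an ontology library such as OWLAPI would then yield the class hierarchy of Table 6.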
After the optimization process, the CSV rows will be sorted and potentially recombined for improving the organization and efficiency of the SfO ontology. The order of entities within a CSV row is selected with consideration for this optimization step. By analyzing the relationships and dependencies between entities, the optimization process can identify patterns and group related constraints together, leading to a more streamlined and coherent ontology structure. The sorting and recombination of CSV rows help ensure that the SfO ontology is organized logically and efficiently, facilitating easier management and maintenance by domain experts.
The following notations are used in the CSV rows to indicate the types of entities involved in the constraints or relationships specified in the ontology. In the first two entries, “c0”, “c1”, “c2”, and “cc” denote different types of classes:
  • “c0” refers to a geometric class of dimension 0 (point).
  • “c1” refers to a geometric class of dimension 1 (line).
  • “c2” refers to a geometric class of dimension 2 (polygon).
  • “cc” refers to a general class, possibly non-geometric.
“d”, “o”, and “i” represent different types of entities:
  • “d” represents a datatype property.
  • “o” represents an object property.
  • “i” represents an individual.
When specifying restrictions of the type “distinct features in the same class” (e.g., “Buildings cannot overlap”), the second entity has to be left blank.
The third and fourth entries in the CSV line represent the restriction type (e.g., “Forbidden”) and the name of the relation, such as one of the OGC spatial relations. In other cases, relations such as “Same”, “LessThan”, “LessThanEQ”, and “OneOf” can be used.
The fifth and sixth entries in the CSV line contain the actual names of the classes, properties, or entities indicated by the first and second entities. In the optimized version, the sixth entry can contain multiple classes separated by the pipe sign (“|”), indicating an “OR” situation.
The last entry in the CSV line represents the timestamp. The ontology can be annotated with schema:startTime using that timestamp to indicate when the constraint or rule was introduced or updated.
The class hierarchy above for sfo:PermanentLake to be tested with the “mustNotCross” rule is shown in Figure 7. This hierarchy represents the organization of classes within the SfO ontology and provides a visual representation of the relationships between the classes involved in testing the “mustNotCross” rule.
Two points are essential in the reorganization process of the CSV:
(1)
Establish an order among SfO classes:
  • There is an ordering only if there is a re(gu)lation involving the classes.
  • The class with the smaller dimension is the smaller class; otherwise, lexical order applies.
(2)
The smaller class needs to be a single subclass, while larger classes can have siblings.
For example:
  • sfo01:ClassCrf07: “Crf07” stands for “Crosses, first, 07” (the Crosses relation, the smaller side, with six such classes created before it).
  • sfo01:ClassCrs07: “Crs07” stands for “Crosses, second, 07”.
  • sfo01:Road is the sole subclass of sfo01:ClassCrf07.
  • sfo01:Building and sfo01:Parcel are subclasses of sfo01:ClassCrs07.
Reorganize (Split, Reorder, Combine) CSV Rows Algorithm
Sample triple of CSV rows:
c2, c2, Forbidden, Overlaps, b1, b2, 8599.99
c1, c2, Forbidden, Crosses, class1|class5, class4|class3|class2, 8600
c1, c2, Forbidden, Crosses, class11, class13, 8600.01
Meaning:
b1 and b2 are of dimension 2 and they (features in these classes) cannot overlap
class1 and class5 are of dimension 1 and they cannot cross class4, class3, class2, which are of dimension 2
class11 of dimension 1 cannot cross class13 of dimension 2.
Splitting result:
c2, c2, Forbidden, Overlaps, b1, b2, 8599.99
c1, c2, Forbidden, Crosses, class1, class4, 8600
c1, c2, Forbidden, Crosses, class1, class3, 8600
c1, c2, Forbidden, Crosses, class1, class2, 8600
c1, c2, Forbidden, Crosses, class5, class4, 8600
c1, c2, Forbidden, Crosses, class5, class3, 8600
c1, c2, Forbidden, Crosses, class5, class2, 8600
c1, c2, Forbidden, Crosses, class11, class13, 8600.01
Reordering result:
c1, c2, Forbidden, Crosses, class1, class2, 8600
c1, c2, Forbidden, Crosses, class1, class3, 8600
c1, c2, Forbidden, Crosses, class1, class4, 8600
c1, c2, Forbidden, Crosses, class11, class13, 8600.01
c1, c2, Forbidden, Crosses, class5, class2, 8600
c1, c2, Forbidden, Crosses, class5, class3, 8600
c1, c2, Forbidden, Crosses, class5, class4, 8600
c2, c2, Forbidden, Overlaps, b1, b2, 8599.99
Combining result:
c1, c2, Forbidden, Crosses, class1, class2|class3|class4, 8600
c1, c2, Forbidden, Crosses, class11, class13, 8600.01
c1, c2, Forbidden, Crosses, class5, class2|class3|class4, 8600
c2, c2, Forbidden, Overlaps, b1, b2, 8599.99
The following Algorithm 1 covers the case in which the first two entries both start with “c” (the class-to-class case).
Algorithm 1. CSV optimization
Begin
        If the first two entries of a row are not in lexical order
                Switch the second and first entries, and the sixth and fifth entries
        End if

        Split rows first by the fifth column, then by the sixth column, using the function splitFurtherByColumn

                Begin function splitFurtherByColumn(csvText, columnNr, separator = “|”)
                        Initialize an empty list to store the resulting rows
                        Split the csvText into rows by newline
                        Loop through each row
                                Split the row into columns by comma
                                If the column at columnNr contains the separator
                                        Split that column by the separator
                                        For each value obtained, create a new row identical
                                        to the original except that the column at columnNr
                                        holds that single value
                                Else
                                        Add the original row to the results
                                End if
                        End loop
                        Return the modified rows as a single string
                End function

        Reorder the entire rows using lexical ordering
        Combine subsequent rows on the sixth column (joining values with pipes) if the first five entries are identical
                The timestamp of a combined row is the minimum of the originals
End
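A runnable sketch of Algorithm 1 for the class-to-class case might look as follows. It assumes, as in the sample above, that the timestamps are numeric; the function names are illustrative:

```python
def split_further_by_column(rows, column_nr, separator="|"):
    """Expand rows whose column `column_nr` contains `separator`
    into one row per value (the splitting step)."""
    result = []
    for row in rows:
        fields = [f.strip() for f in row.split(",")]
        if separator in fields[column_nr]:
            for value in fields[column_nr].split(separator):
                new_fields = list(fields)
                new_fields[column_nr] = value
                result.append(", ".join(new_fields))
        else:
            result.append(", ".join(fields))
    return result


def reorganize(rows):
    """Split-reorder-combine optimization of CSV rule rows."""
    normalized = []
    for row in rows:
        f = [x.strip() for x in row.split(",")]
        # Put the first two entries (and the matching class names in
        # fields 5/6) in lexical order; a blank second entry marks a
        # same-class rule and is left untouched.
        if f[1] and f[0] > f[1]:
            f[0], f[1] = f[1], f[0]
            f[4], f[5] = f[5], f[4]
        normalized.append(", ".join(f))
    rows = split_further_by_column(normalized, 4)   # split by fifth column
    rows = split_further_by_column(rows, 5)         # then by sixth column
    rows.sort()                                     # lexical reordering
    combined = []
    for row in rows:
        f = [x.strip() for x in row.split(",")]
        if combined:
            g = [x.strip() for x in combined[-1].split(",")]
            if g[:5] == f[:5]:                      # combine on sixth column
                g[5] = g[5] + "|" + f[5]
                g[6] = min(g[6], f[6], key=float)   # timestamp is the minimum
                combined[-1] = ", ".join(g)
                continue
        combined.append(row)
    return combined


sample_rows = [
    "c2, c2, Forbidden, Overlaps, b1, b2, 8599.99",
    "c1, c2, Forbidden, Crosses, class1|class5, class4|class3|class2, 8600",
    "c1, c2, Forbidden, Crosses, class11, class13, 8600.01",
]
optimized = reorganize(sample_rows)
```

Run on the sample triple of rows above, the sketch reproduces the combining result shown earlier.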
In general, SfOs tend to have fewer rules than SDQO. An SfO aims to translate more general relations into “is-a” relations within the scope of the ontology, where feasible. By identifying the feature pairs that cause errors, the SfO helps to pinpoint the specific data where violations occur. This approach simplifies the ontology structure and makes it more manageable for domain experts. The data are typically kept separate from SfOs (see Figure 8).

2.3.5. Rules for Quality Assessment

The role of SfOs for quality assessment involves the definition and enforcement of rules that ensure the adherence of spatial data to specified constraints and requirements. These rules are designed to evaluate the spatial data against the specifications and provide insights into the data quality. The rules can be defined and managed within the framework using a combination of spatial relations, constraints, and ontology modeling techniques. They are typically implemented in SDQO with SWRL. Basic rules are shown in Table 7. Rule 8 gives equivalent features.
The specification hierarchy classes within an SfO represent various levels of constraints and requirements for spatial data. These classes have superclasses that link to corresponding classes in the SDQO, thereby establishing a connection to quality assessment concepts and definitions. Their relationship with the SDQO classes serves as the basis for defining rules and constraints specific to the spatial domain.
SDQO utilizes spatial and non-spatial relationships between feature classes to define quality assessment rules. Spatial relations, such as “Overlaps”, “Crosses”, or “Contains” are defined using OGC standards [41].
The integration of data ontologies with SfOs, and therefore SDQO, allows for comprehensive quality assessment by combining spatial relations and constraints with actual data instances.
Ontology editing tools such as Protégé enable the manipulation of the class hierarchy in SfOs, which translates to the addition or modification of constraints.
The domain experts can easily sustain the integrity of the relationships between classes, ensuring that moving classes does not break the relationships established within the ontology.
The framework facilitates the identification of feature pairs causing errors, allowing for targeted quality improvement efforts. It allows institutional users to pinpoint and flag violations effectively.
A sample of the relevant class hierarchies is given below for the “Geometric classes with Forbidden relation” case, together with the associated SWRL rules. The subclass relation is denoted by the “<” sign.
“Inter-object Must Not” can easily be updated by updating the subclass relations of the specification classes or creating new classes with the appropriate subclass relations.
“If a and x, despite being in sdqo:ClassCross1 and sdqo:ClassCross2, respectively, still cross over each other, an error is present”.
  • Sample class hierarchy paths:
  • sfo:Road < sfo:ClassCrf07 < sdqo:ClassCross1 < sdqo:Cross < sdqo:InterObjectPrRF < sdqo:RestrictedFeature < ogc:Feature < ogc:SpatialObject < owl:Thing
  • sfo:Road < sfo:ClassCrf07 < [sdqo:subnrCross = 7]
  • sfo:Building < sfo:ClassCrs07 < sdqo:ClassCross2 < sdqo:Cross < sdqo:InterObjectPrRF < sdqo:RestrictedFeature < ogc:Feature < ogc:SpatialObject < owl:Thing
  • sfo:Building < sfo:ClassCrs07 < [sdqo:subnrCross = 7]
Associated SWRL rules (these need to be independent of any specific SfO; the order of atoms is irrelevant):
  • sdqo:ClassCross1(?a) ^ sdqo:ClassCross2(?x) ^ ogc:sfCrosses(?a, ?x) ^ sdqo:subnrCross(?a, ?n) ^ sdqo:subnrCross(?x, ?n) → sdqo:resultForData(sdqo:DQR_SWRLCrosses, ?a)
  • sdqo:ClassCross2(?x) ^ sdqo:subnrCross(?a, ?n) ^ swrlb:stringConcat(?aa, ?ida, “, ”, ?idx, “, ”) ^ sdqo:featureID(?x, ?idx) ^ sdqo:featureID(?a, ?ida) ^ sdqo:subnrCross(?x, ?n) ^ ogc:sfCrosses(?x, ?a) ^ sdqo:ClassCross1(?a) → sdqo:hasMessage(sdqo:DQR_SWRLCrosses, ?aa)

2.3.6. Assessment and Inference

Assessment is performed by SDQO and the related transactions. After the SfO is built for a domain and integrated with SDQO, the reasoner in the system, in this study Openllet, infers the quality results for the evaluated features, as illustrated in Figure 9. The error codes of inconsistent features are therefore included in the quality report that is produced. The JTS Topology Suite (JTS) calculates the intersection matrices and spatial relations between the individuals of classes, while OWLAPI is used for ontology management and the integration of results from JTS.
A Java program (subroutine) establishes geometric relations, such as crosses, between features in subclasses of classes like sdqo:ClassCross2. Being in such a subclass gives features a subclass number through data properties like sdqo:subnrCross, indicating the classes that must not cross. The features already carry sdqo:featureID (the same as the individual name).
A SWRL rule creates a sdqo:resultForData triple from sdqo:DQR_SWRLCrosses to the feature in the first class (e.g., road, where the second class is building; the first class is the “smaller” one by dimension or lexical order). sdqo:DQR_SWRLCrosses has the sdqo:errorCode value 1.
A SWRL rule associates each erroneous feature with sdqo:dataHasErrorWithCode using the error code. Another SWRL rule creates a sdqo:hasMessage triple from sdqo:DQR_SWRLCrosses to a string containing the IDs of the features. With the triple “sdqo:TC_Cross sdqo:hasResult sdqo:DQR_SWRLCrosses” and SWRL rules, another message associated with TC_Cross is created. A Java program iterates over OWL individuals such as sdqo:TC_Cross in the subclasses of sdqo:DataQualityElements.
The Openllet reasoner, an open-source continuation of Pellet, is used.

3. Discussion

3.1. Case Study

The case study revolves around the basemap of Trabzon province, which has been extracted and tailored to conform to the Turkish Large-Scale Map and Map Information Production Regulation. The dataset consists of building, cadastral parcel, and road layers, among others, for the quality assessment.
Figure 10 provides a visual depiction of the study area along with the specific layers that have been selected for implementation. Information on the layers used in the case study is summarized in Table 8 below.
The selected layers play a crucial role in the analysis and application of the regulations, enabling a comprehensive assessment of spatial data quality within Trabzon province. For the case study, the following topo-semantic rules were derived from the regulations:
  • Necessary-type rule, “Buildings must be within a cadastral parcel”.
  • Forbidden-type rule, “Roads must not cross over buildings”.
  • Forbidden-type rule, “Cadastral parcels must not self-overlap”.
  • Forbidden-type rule, “Buildings must not self-overlap”.
Figure 11 and Figure 12 show examples of inconsistent features in the case data that violate the given rules. Building data are represented in gray, cadastral parcels are depicted in yellow, and road data are shown with red lines. An example of the “Necessary”-type rule is depicted in Figure 11. In Figure 12a, a building that overlaps other buildings is repainted red (less opaque) for visualization. Similarly in Figure 12b, a building is repainted light yellow.
The given rules were used to create the SfO ontology. The selection and creation of an SfO follows these steps:
  • If the user, in this case, domain expert, is aware of the appropriate SfO for their domain, they can directly utilize that SfO.
  • If SfO is not found in the system, the user is prompted to create a new SfO. This process involves using the SfO creation interface.
The creation of an SfO requires domain experts with expertise in the specific domain and its specifications. For each SfO, three essential components are generated: a Turtle file containing the ontology, an R2RML mapping file for data transformation, and a post-reasoning complete CSV file that summarizes the ontology. The rules are implemented through the GUI as follows to create the SfO:
  • Buildings must be within a cadastral parcel: Necessary, sfWithin:
“c2, c2, Necessary, sfWithin, Building, Cadastral_Parcel, timestamp_value”
  • Buildings must not overlap: Forbidden, sfOverlaps, same class:
“c2, , Forbidden, sfOverlaps, Building, , timestamp_value”
  • Roads must not cross buildings: Forbidden, sfCrosses:
“c1, c2, Forbidden, sfCrosses, Road, Building, timestamp_value”
This process ensures that the resulting SfO aligns with the domain’s requirements and effectively represents the spatial data quality constraints and relations.
The SfOs are created with the GUI and can be modified with the GUI or an ontology editor. Removing a specification rule corresponds to removing the associated generic classes. Most changes in the specification correspond to changes in subclass relations.

3.1.1. Data Conversion to Ontology

R2RML mappings are generated automatically and can be stored for reuse in later implementations within the same institution. The R2RML mappings are used to convert data to RDF with GeoTriples. A snapshot of the building data in shapefile format, the R2RML file for the building data, and the resulting data ontology in RDF format can be seen in Figure 13.
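For illustration, a minimal R2RML triples map for the building layer might look like the following sketch; the namespaces, table name, and column names are hypothetical, and the GeoTriples-specific extensions for generating GeoSPARQL geometries are omitted:

```turtle
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix sfo:  <http://example.org/sfo#> .   # hypothetical SfO namespace
@prefix sdqo: <http://example.org/sdqo#> .  # hypothetical SDQO namespace

<#BuildingMap>
    rr:logicalTable [ rr:tableName "BUILDING" ] ;
    rr:subjectMap [
        rr:template "http://example.org/data/Building/{ID}" ;
        rr:class sfo:Building
    ] ;
    rr:predicateObjectMap [
        rr:predicate sdqo:featureID ;
        rr:objectMap [ rr:column "ID" ]
    ] .
```

Each row of the BUILDING table then becomes one sfo:Building individual whose sdqo:featureID value comes from its ID column, which is the property the SWRL rules use when composing error messages.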

3.1.2. Results of the Data Quality Assessment

After creating the SfO and converting the data into RDF, the SDQO ontology is imported by the SfO. Figure 14 shows a road feature (Yol8644) with its property values. This feature crosses two buildings, which is forbidden; therefore, an error with error code 1 (DQR_SWRLCrosses) is associated with it.
The results of the quality assessment are shown on the sample parts from tested data for visual comprehensibility. Erroneous features are represented in yellow in Figure 15, Figure 16, Figure 17 and Figure 18.
Out of 4395 roads, 70 (1.6%) cross over 100 of the 20,596 buildings (0.5%), as shown in Figure 15. Additionally, 587 parcels out of a total of 33,641 (1.7%) overlap with other parcels, as illustrated in Figure 16. Furthermore, 1030 buildings out of 20,596 (5%) overlap with other buildings, as shown in Figure 17. Finally, 212 buildings (1% of 20,596) are not entirely within a cadastral parcel, as presented in Figure 18. These are topo-semantic consistency errors. The framework also allows inspection of other ISO 19157 quality elements, such as completeness and attribute consistency.
The assessment results classified after inference in the resulting ontology are displayed in Figure 19 and Figure 20.
A comparison between the results of the quality assessment using the proposed method and manual spatial operations in the QGIS software (QGIS 3.28) was conducted to validate the effectiveness of the proposed method. To identify inconsistent features in QGIS, a series of spatial queries was designed. For instance, the “select by location” tool was used to implement the forbidden-type rule “roads must not cross over buildings”, thereby identifying roads crossing over buildings as inconsistent data. A sample dataset was created with intentional inconsistencies. The results generated by the proposed method are fully consistent with those obtained through QGIS.

4. Conclusions

In this study, a framework is developed for assessing the quality of spatial data, utilizing OWL with SWRL rules. The aim is to make quality assessment interoperable, extensible, reusable, and web-based, for institutions. The study focuses on logical consistency as a key quality element, which refers to how well a dataset adheres to its defined specifications and rules. It examines topo-semantic consistency and domain consistency, two subclasses of logical consistency that are commonly required quality elements in spatial datasets.
The framework consists of two types of ontologies: the main ontology, SDQO, and the specification ontologies, referred to as SfOs. The goal of designing SfOs in a manner consistent with SDQO is to create a reusable and domain-independent framework. SfOs are designed based on specifications or user requirements, predominantly employing class hierarchies that enable easy updates and manipulations by domain experts. A GUI facilitates rule formalization through the generation of SfO ontologies. Based on domain expert inputs, a CSV file is generated. This file then undergoes an optimization process, resulting in the creation of an SfO ontology.
SDQO is created to incorporate general classes necessary for spatial quality assessment. Specifications often involve spatial rules that define relations between classes. For example, a “building” must not overlap with “sea”, “lake”, their subclasses, or other buildings. These spatial rules are expressed in the specifications using distinct rules for each relation. Within the framework, common rules are optimized across different classes, taking spatial relations into account. This optimization allows for efficient evaluation in a single step. SfO and SDQO are mapped using Java libraries including OWLAPI. Utilizing relations, SDQO ontology rules are implemented, and subsequently, quality assessment is performed. A data quality report is automatically generated by the proposed framework.
The framework is tested with a case study on the base map of a province. The results show that all overlapping buildings, overlapping parcels, roads that cross over buildings, and buildings that are not entirely within a cadastral parcel are identified in the respective classes without causing ontology inconsistencies.
Necessary-type restrictions based on data properties pose challenges without querying, but these are overcome by asserting the same single value for a data property across all members of specific classes using SWRL. Some problems require a range of data property values, and this is addressed with the help of SQWRL.
For efficiency purposes, the study ignores disjoint geometries. Some regulations include disjoint cases, but they can typically be redefined without disjointness. The case study did not involve regulations related to touching objects; otherwise, a separate list of touching objects in the relevant classes would need to be collated. In the remaining cases, the interiors do intersect. With the Java JTS suite, one can check for interior intersections, mask them appropriately (DE-9IM type), and then reuse this information later without losing much efficiency.
Although languages such as C and Julia are typically faster, JTS, together with OWLAPI and the available reasoners, is the main reason Java was chosen as the programming language. The whole process is performed with Java code accessed through the GUI or Protégé. Protégé (itself a Java-based program) is used for editing the ontologies, typically by domain experts. It is bundled with plugins and libraries such as SWRLAPI.
For decidability reasons, feature IDs are added as property values in the data ontology and in SWRL. In the case study, the name and feature ID could be made the same.
More dynamicity and diversity in the input are left for future studies. One example is OpenStreetMap. Comparable open datasets tend to be crowdsourced, and the related issues are not part of the desired scope. Authoritative-type data can typically be converted to RDF with R2RML, and dynamicity in the data can be handled through the dynamicity of R2RML.
Current large language models are prone to “hallucinations”. They can be troublesome for studies that use crowdsourced data: authoritative data are few and can be contaminated by the many. There also does not seem to be a consistent pattern in the naming and numbering of the inputs and outputs, such as classes.
In conclusion, the proposed framework for spatial data quality assessment is based on OWL with SWRL rules and open-world reasoning. We emphasize the integration of SDQO and SfOs, the use of hierarchical structures in SfOs, and the optimization of common rules across different classes. By leveraging a range of techniques, a robust and flexible approach to spatial data quality assessment is provided. The SDQO ontology is extensible through the application of the open-world assumption. The framework can also be shared between different institutions, with one SfO per institution, reminiscent of spatial data infrastructures. For instance, with properties like sdqo:associated and SWRL rules, analysis of features common to classes in different SfOs can be performed. The hierarchical nature enables easy manipulation even by users who are not experts in ontologies or software. The open-world path is preserved as much as possible in the study; Prolog and most conventional software contradict the open-world nature, whereas laws and regulations in real life and in institutions tend to follow open-world logic, just like ontologies and the Semantic Web. Future work will also focus on implementing a SHACL-based system for quality assessment.

Author Contributions

C.Y.: methodology, conceptualization, analysis, investigation, software development, writing, visualization; Ç.C.: administration of the study, methodology, writing; D.Y.: software development, conceptualization, analysis, writing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are only available on request, due to privacy and source restrictions.

Acknowledgments

We would like to express our special gratitude to our colleague Gülten KARA for her help in the process.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Eckerson, W.W. Data Quality and the Bottom Line; The Data Warehouse Institute: Mississauga, ON, Canada, 2002; Volume 1, ISBN 1555582311. [Google Scholar]
  2. Michaels, S. How to Improve Your Data Quality Assessment Process—Featuring Laura Sebastian. Available online: https://www.gartner.com/smarterwithgartner/how-to-improve-your-data-quality (accessed on 3 September 2024).
  3. Geisler, S.; Quix, C.; Weber, S.; Jarke, M. Ontology-Based Data Quality Management for Data Streams. J. Data Inf. Qual. 2016, 7, 8. [Google Scholar] [CrossRef]
  4. Debattista, J.; Auer, S.; Lange, C. Luzzu—A Methodology and Framework for Linked Data Quality Assessment. J. Data Inf. Qual. 2016, 8, 4. [Google Scholar] [CrossRef]
  5. Fürber, C.; Hepp, M. SWIQA—A Semantic Web Information Quality Assessment Framework. In Proceedings of the 19th European Conference on Information Systems, ECIS 2011, Helsinki, Finland, 9–11 June 2011. [Google Scholar]
  6. Zhu, L. SemDQ: A Semantic Framework for Data Quality Assessment. Master’s Thesis, University of Waterloo, Waterloo, ON, Canada, 2014. [Google Scholar]
  7. Mostafavi, M.-A.; Edwards, G.; Jeansoulin, R. An Ontology-Based Method for Quality Assessment of Spatial Data Bases. In Proceedings of the Third International Symposium on Spatial Data Quality, Vienna, Austria, 15–17April 2004; Volume 1/28a, pp. 49–66. [Google Scholar]
  8. Wang, F.; Mäs, S.; Reinhardt, W.; Kandawasvika, A. Ontology Based Quality Assurance for Mobile Data Acquisition. In Proceedings of the 19th International Conference on Informatics for Environmental Protection: Networking Environmental Information, Brno, Czech Republic, 7–9 September 2005; pp. 1–8. [Google Scholar]
  9. Sanderson, M.; Ramage, S.; Van Linden, L. IDE Communities: Data Quality and Knowledge Sharing. In Proceedings of the 11th GSDI Conference, Rotterdam, The Netherlands, 15–19 June 2011. [Google Scholar]
  10. 1 Integrate. Available online: https://1spatial.com/products/1integrate/ (accessed on 3 September 2024).
  11. ESRI ArcGIS Data Reviewer. Available online: http://www.esri.com/software/arcgis/extensions/arcgis-data-reviewer (accessed on 29 August 2024).
  12. Mesterton, N.; Kivekäs, R. Towards Automating Spatial Data Quality Evaluation in the Finnish National Topographic Database. In Proceedings of the SDQ 2018: International Workshop on Spatial Data Quality, Valletta, Malta, 7 February 2018. [Google Scholar]
  13. Keßler, C.; Raubal, M.; Wosniok, C. Semantic Rules for Context-Aware Geographical Information Retrieval. In Lecture Notes in Computer Science (Including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2009; Volume 5741 LNCS, pp. 77–92. [Google Scholar]
  14. Mobasheri, A. A Rule-Based Spatial Reasoning Approach for OpenStreetMap Data Quality Enrichment; Case Study of Routing and Navigation. Sensors 2017, 17, 2498. [Google Scholar] [CrossRef] [PubMed]
  15. Nash, E.; Wiebensohn, J.; Nikkilä, R.; Vatsanidou, A.; Fountas, S.; Bill, R. Towards Automated Compliance Checking Based on a Formal Representation of Agricultural Production Standards. Comput. Electron. Agric. 2011, 78, 28–37. [Google Scholar] [CrossRef]
  16. Homburg, T. Connecting Semantic Situation Descriptions with Data Quality Evaluations—Towards a Framework of Automatic Thematic Map Evaluation. Information 2020, 11, 532. [Google Scholar] [CrossRef]
  17. Homburg, T.; Boochs, F. Situation-Dependent Data Quality Analysis for Geospatial Data Using Semantic Technologies. In Lecture Notes in Business Information Processing; Springer: Berlin/Heidelberg, Germany, 2019; Volume 339, pp. 566–578. [Google Scholar]
  18. Qiu, H.; Ayara, A.; Glimm, B. Ontology-Based Map Data Quality Assurance. In Lecture Notes in Computer Science (Including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2021; Volume 12731 LNCS, pp. 73–89. [Google Scholar]
  19. Xu, X.; Cai, H. Semantic Approach to Compliance Checking of Underground Utilities. Autom. Constr. 2020, 109, 103006. [Google Scholar] [CrossRef]
  20. Degbelo, A. Short Paper: An Ontology Design Pattern for Spatial Data Quality Characterization in the Semantic Sensor Web. In CEUR Workshop Proceedings; Citeseer: Princeton, NJ, USA, 2012; Volume 904, pp. 103–108. [Google Scholar]
  21. Cherfi, S.S.; Guillotel, C.; Hamdi, F.; Rigaux, P.; Travers, N. Ontology-Based Annotation of Music Scores. In Proceedings of the Knowledge Capture Conference, New York, NY, USA, 4 December 2017; pp. 1–4. [Google Scholar]
  22. Varadharajulu, P.; Arnold, L.; McMeekin, D.A.; West, G.; Moncrieff, S. SWRL Rule Development to Automate Spatial Transactions in Government. In Communications in Computer and Information Science; Springer: Berlin/Heidelberg, Germany, 2017; Volume 741, pp. 122–142. [Google Scholar]
  23. Nash, E.; Nikkilä, R.; Wiebensohn, J.; Walter, K.; Bill, R. Interchange of Geospatial Rules—Towards Georules Interchange Format (GeoRIF)? GIS Sci. 2011, 24, 82–94. [Google Scholar]
  24. Bergman, M. The Open World Assumption: Elephant in the Room. Available online: https://www.mkbergman.com/852/ (accessed on 3 September 2024).
  25. Turkish Official Gazette. Large-Scale Map and Map Information Production Regulation; Turkish Official Gazette: Ankara, Türkiye, 2005; Volume 25876. [Google Scholar]
  26. INSPIRE Maintenance and Implementation Group (MIG). INSPIRE Data Specification on Hydrography—Technical Guidelines. 2014, Volume D2.8.I.8. Available online: https://inspire-mif.github.io/technical-guidelines/data/hy/dataspecification_hy.pdf (accessed on 1 September 2024).
  27. ISO 19157; Geographic Information-Data Quality. International Organization for Standardization: Geneva, Switzerland, 2013.
  28. ISO 19157-1; Geographic information—Data Quality—Part 1: General Requirements. International Organization for Standardization: Geneva, Switzerland, 2023.
  29. Servigne, S.; Ubeda, T.; Puricelli, A.; Laurini, R. A Methodology for Spatial Consistency Improvement of Geographic Databases. Geoinformatica 2000, 4, 7–34. [Google Scholar] [CrossRef]
  30. Frank, A.U. Spatial Ontology: A Geographical Information Point of View. In Spatial and Temporal Reasoning; Springer: Berlin/Heidelberg, Germany, 1997; pp. 135–153. [Google Scholar]
  31. Gruber, T.R. Toward Principles for the Design of Ontologies Used for Knowledge Sharing. Int. J. Hum. Comput. Stud. 1995, 43, 907–928. [Google Scholar] [CrossRef]
  32. Horrocks, I.; Patel-Schneider, P.F.; Boley, H.; Tabet, S.; Grosof, B.; Dean, M. SWRL: A Semantic Web Rule Language Combining OWL and RuleML. 2004. Available online: http://www.w3.org/Submission/SWRL (accessed on 1 September 2024).
  33. Sirin, E.; Parsia, B.; Grau, B.C.; Kalyanpur, A.; Katz, Y. Pellet: A Practical Owl-Dl Reasoner. J. Web Semant. 2007, 5, 51–53. [Google Scholar] [CrossRef]
  34. Kyzirakos, K.; Vlachopoulos, I.; Savva, D.; Manegold, S.; Koubarakis, M. GeoTriples: A Tool for Publishing Geospatial Data as RDF Graphs Using R2RML Mappings. In Proceedings of the CEUR Workshop Proceedings, Riva del Garda, Italy, 21 October 2014; Volume 1401, pp. 33–44. [Google Scholar]
  35. Patroumpas, K.; Alexakis, M.; Giannopoulos, G.; Athanasiou, S. TripleGeo: An ETL Tool for Transforming Geospatial Data into RDF Triples. In Proceedings of the CEUR Workshop Proceedings, Athens, Greece, 28 March 2014; Volume 1133, pp. 275–278. [Google Scholar]
  36. Bereta, K.; Xiao, G.; Koubarakis, M. Ontop-Spatial: Ontop of Geospatial Databases. J. Web Semant. 2019, 58, 100514. [Google Scholar] [CrossRef]
  37. Nyulas, C.; O’Connor, M.; Tu, S. DataMaster—A Plug-in for Importing Schemas and Data from Relational Databases into Protégé. In Proceedings of the 10th International Protege Conference, Budapest, Hungary, 15–18 July 2007; pp. 1–3. [Google Scholar]
  38. Kyzirakos, K.; Savva, D.; Vlachopoulos, I.; Vasileiou, A.; Karalis, N.; Koubarakis, M.; Manegold, S. GeoTriples: Transforming Geospatial Data into RDF Graphs Using R2RML and RML Mappings. J. Web Semant. 2018, 52–53, 16–32. [Google Scholar] [CrossRef]
  39. Das, S.; Sundara, S.; Cyganiak, R. R2RML: RDB to RDF Mapping Language; W3C Recommendation, 27 September 2012. [Google Scholar]
  40. Perry, M.; Herring, J. OGC GeoSPARQL-A Geographic Query Language for RDF Data. Available online: http://www.opengis.net/doc/IS/geosparql/1.0 (accessed on 1 September 2024).
  41. Beddoe, D.; Cotton, P.; Uleman, R.; Johnson, S.; Herring, J.R. OpenGIS Simple Features Specification for SQL. OpenGIS Proj. Doc. 99 1999, 49, 49–99. [Google Scholar]
  42. Clementini, E.; Di Felice, P.; van Oosterom, P. A Small Set of Formal Topological Relationships Suitable for End-User Interaction. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 1993; Volume 692 LNCS, pp. 277–295. [Google Scholar]
  43. Yıldırım, D. Comparison of Crowd Source Data with Authorized Data and Assessment of Data Quality. Ph.D. Thesis, Karadeniz Technical University, Trabzon, Turkey.
Figure 1. Data Quality Assessment Framework, components, and interactions.
Figure 2. Spatial data to RDF conversion.
Figure 3. Ontology interaction.
Figure 4. Main classes of the SDQO ontology mapped to GeoSPARQL.
Figure 5. Ontological correspondence of the “Contours must not cross any lake and building” rule.
Figure 6. GUI to generate an SfO ontology.
Figure 7. A path of subclass relations in an SfO.
Figure 8. SDQO, SfO, and data.
Figure 9. SDQO, SfO, DO, and the SWRL rule applied to infer contours that cross buildings where this is not allowed.
Figure 10. Map of Trabzon city center, a section of the study area.
Figure 11. A building that is not within a cadastral parcel.
Figure 12. Examples of “Forbidden”-type rules in the use case data: (a) overlapping buildings and (b) roads crossing over buildings.
Figure 13. Data conversion.
Figure 14. A sample road feature (Yol8644).
Figure 15. Roads that cross buildings (erroneous roads and buildings are shown in yellow).
Figure 16. Overlapping cadastral parcels (light yellow or red).
Figure 17. Overlapping buildings (light yellow or red).
Figure 18. Buildings that are not within cadastral parcels (shown in red).
Figure 19. An excerpt from the resulting ontology for “overlapping buildings in the same layer”. Error code 2 represents “overlapping features”.
Figure 20. An excerpt from the resulting ontology for “road features that cross over buildings”. Error code 1 represents “crossing features”.
Table 1. Topo-semantic consistency elements in the study.

| Topo-Semantic Consistency Element | Rule | Example |
|---|---|---|
| Must not overlap (different features of the same class) | No feature in a particular class overlaps with any other feature within the same class. | Buildings must not overlap. |
| Must be within | Each feature in a class must be spatially within at least one feature in the other feature class. | A building must be within a cadastral parcel. |
| Must not overlap with | Features belonging to one class do not overlap with features in the other class. | Buildings must not overlap with the parcel. |
| Must not cross | Features in one class do not cross over features in the other class. | Roads must not cross over buildings. |
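The four topo-semantic rule types in Table 1 can be illustrated with a small sketch. This is not the framework's implementation (which evaluates SWRL rules over RDF); it is a simplified stand-in in which geometries are axis-aligned rectangles `(xmin, ymin, xmax, ymax)`, and the function names are invented for the example.

```python
# Illustrative sketch of Table 1's rule types over rectangle "geometries".
# `overlaps`, `within`, and the check_* helpers are hypothetical names,
# not part of the paper's framework.

def overlaps(a, b):
    """True if the interiors of two rectangles intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def within(a, b):
    """True if rectangle a lies entirely inside rectangle b."""
    return a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2] and a[3] <= b[3]

def check_must_not_overlap(features):
    """'Buildings must not overlap': flag interior-intersecting pairs in one class."""
    errors = []
    for i, a in enumerate(features):
        for b in features[i + 1:]:
            if overlaps(a, b):
                errors.append((a, b))
    return errors

def check_must_be_within(features, containers):
    """'A building must be within a cadastral parcel': flag uncontained features."""
    return [f for f in features if not any(within(f, c) for c in containers)]

buildings = [(0, 0, 2, 2), (1, 1, 3, 3), (10, 10, 11, 11)]
parcels = [(0, 0, 5, 5)]
print(check_must_not_overlap(buildings))         # the first two buildings overlap
print(check_must_be_within(buildings, parcels))  # the third is outside every parcel
```

Real geometries require full topological predicates (e.g., OGC Simple Features `sfOverlaps`, `sfWithin`), which the framework obtains through GeoSPARQL and SWRL reasoning rather than ad hoc code like this.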
Table 2. Domain consistency elements in the study.

| Domain Consistency Element | Rule | Example |
|---|---|---|
| No null value/non-empty | For a class, a given set of attributes of features cannot be empty. | Parcels have a non-empty PID value. |
| Constant value | A given attribute of features in a specified class has a constant value. | All main roads have the same code. |
| Value range | A given attribute of features in a class has values in a defined range. | Residential buildings can have at most five floors. |
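The three domain-consistency rule types in Table 2 amount to simple attribute predicates. As an illustration only (features modelled as plain dicts; attribute and function names are assumptions, not the framework's API), they might be checked as follows:

```python
# Hedged sketch of Table 2's domain-consistency checks.
# Each function returns the features that VIOLATE the rule.

def check_non_empty(features, attr):
    """'Parcels have a non-empty PID value'."""
    return [f for f in features if not f.get(attr)]

def check_constant(features, attr, value):
    """'All main roads have the same code'."""
    return [f for f in features if f.get(attr) != value]

def check_range(features, attr, lo, hi):
    """'Residential buildings can have at most five floors'."""
    return [f for f in features if not (lo <= f.get(attr, lo - 1) <= hi)]

parcels = [{"PID": "101"}, {"PID": ""}]
buildings = [{"floors": 3}, {"floors": 7}]
print(check_non_empty(parcels, "PID"))         # [{'PID': ''}]
print(check_range(buildings, "floors", 1, 5))  # [{'floors': 7}]
```

In the framework itself, these constraints are expressed as SWRL rules over the Specification Ontology rather than imperative checks.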
Table 3. The four direct subclasses of sdqo:RestrictedFeature.

| Class | Explanation | Example |
|---|---|---|
| sdqo:InterObjectPrRF | Features that are restricted in terms of relations between classes. | “Roads must not cross over buildings” |
| sdqo:IntraObjectPrRF | Features that are restricted in terms of relations within a class. | “Buildings must not overlap with other buildings” |
| sdqo:InterDataPrRF | Features that are restricted by attributes between classes. | “The prominence of a mountain is bigger than the height of a hill” |
| sdqo:IntraDataPrRF | Features that are restricted by attributes within a class. | “Different buildings have different IDs” |
Table 4. Hierarchical implementation of the specifications.

| Specification Rule | Implementation |
|---|---|
| The features of class A must not overlap with the features of class B. | Class A is a subclass of class C, which is an SfO class. Similarly, class B is a subclass of class D, also an SfO class. Both class C and class D are subclasses of certain SDQO classes that play a role in SWRL rules. These SWRL rules establish error properties related to erroneous features in classes A and B, as well as the forbidden overlaps relation between them. |
Table 5. Sub-properties of the object property sdqo:interiorIntersects.

| Object Property | Domain | Range | Explanation |
|---|---|---|---|
| sdqo:iandb | Spatial Object | Spatial Object | The interior of the first feature intersects the boundary of the second feature. |
| sdqo:iande | Spatial Object | Spatial Object | The interior of the first feature intersects the exterior of the second feature. |
| sdqo:bande | Spatial Object | Spatial Object | The boundary of the first feature intersects the exterior of the second feature. |
| sdqo:boundaryIntersects | Spatial Object | Spatial Object | The boundaries of the two features intersect. |
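The sub-properties in Table 5 correspond to individual cells of the DE-9IM intersection matrix, whose standard pattern-string order is II, IB, IE, BI, BB, BE, EI, EB, EE. As an illustration (the index mapping follows DE-9IM; the function itself is not part of the framework), a pattern string can be decoded into these properties:

```python
# Illustrative decoder from a DE-9IM pattern string to Table 5's
# sub-properties; a matrix cell is "non-empty" when its character is not 'F'.
# Index constants follow the standard DE-9IM ordering
# (II, IB, IE, BI, BB, BE, EI, EB, EE).

DE9IM_INDEX = {
    "iandb": 1,               # interior(a) ∩ boundary(b)
    "iande": 2,               # interior(a) ∩ exterior(b)
    "bande": 5,               # boundary(a) ∩ exterior(b)
    "boundaryIntersects": 4,  # boundary(a) ∩ boundary(b)
}

def holding_properties(pattern):
    """Return the sdqo sub-properties implied by a 9-character DE-9IM pattern."""
    assert len(pattern) == 9
    return sorted(p for p, i in DE9IM_INDEX.items() if pattern[i] != "F")

# Two overlapping polygons typically yield the pattern "212101212":
print(holding_properties("212101212"))
# → ['bande', 'boundaryIntersects', 'iandb', 'iande']
```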
Table 6. A top SfO class description in Turtle format.

sfo:ClassCrf01 rdf:type owl:Class ;
  owl:equivalentClass [
    rdf:type owl:Class ;
    owl:intersectionOf ( sdqo:ClassCross1
      [ rdf:type owl:Restriction ;
        owl:onProperty sdqo:subnrCross ;
        owl:hasValue 1 ] ) ] .
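The Turtle above defines sfo:ClassCrf01 as the intersection of sdqo:ClassCross1 and a hasValue restriction on sdqo:subnrCross. A simplified membership test mirroring that definition might look as follows; in the framework itself, this classification is inferred by an OWL reasoner such as Pellet, and the dict representation here is purely illustrative:

```python
# Simplified stand-in for OWL reasoning over the owl:intersectionOf
# definition: an individual belongs to sfo:ClassCrf01 iff it is asserted to
# be a sdqo:ClassCross1 AND its sdqo:subnrCross value equals 1.

def is_class_crf01(individual):
    return ("sdqo:ClassCross1" in individual.get("types", [])
            and individual.get("sdqo:subnrCross") == 1)

road = {"types": ["sdqo:ClassCross1"], "sdqo:subnrCross": 1}
lake = {"types": ["sdqo:ClassCross1"], "sdqo:subnrCross": 2}
print(is_class_crf01(road), is_class_crf01(lake))  # True False
```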
Table 7. A subset of rules in SDQO.

| No. | SWRL Expression |
|---|---|
| 1 | RestrictedFeature(?x) ^ hasGeometry(?x, ?g) ^ asWKT(?g, ?w) ^ swrlb:contains(?w, “POLYGON”) -> CalcPoly(?x) |
| 2 | iande(?x, ?a) ^ iande(?a, ?x) -> crossesOrOverlaps(?x, ?a) |
| 3 | crossesOrOverlaps(?x, ?y) ^ CalcPoly(?x) ^ CalcPoly(?y) -> sfOverlaps(?x, ?y) |
| 4 | crossesOrOverlaps(?x, ?y) ^ CalcLine(?x) ^ CalcPoly(?y) -> sfCrosses(?x, ?y) |
| 5 | noie(?x, ?a) ^ nobe(?x, ?a) -> sfWithin(?x, ?a) |
| 6 | swrlb:stringConcat(?aap, ?p, “ , “^^xsd:string, ?aa) ^ hasMessage(?r, ?aa) ^ hasProbName(?t, ?p) ^ hasResult(?t, ?r) -> elementHasMessage(?t, ?aap) |
| 7 | errorCode(?r, ?n) ^ resultForData(?r, ?a) -> dataHasErrorWithCode(?a, ?n) |
| 8 | noie(?x, ?a) ^ nobe(?x, ?a) ^ noie(?a, ?x) ^ nobe(?a, ?x) -> sfEquals(?x, ?a) |
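Rule 7 in Table 7 propagates an error code from a quality result to the data it describes. The forward-chaining step a SWRL engine performs for this rule can be sketched over plain (subject, predicate, object) triples; the identifiers in the example are invented, and a real rule engine would of course iterate all rules to a fixpoint:

```python
# Illustrative single forward-chaining step for rule 7:
#   errorCode(?r, ?n) ^ resultForData(?r, ?a) -> dataHasErrorWithCode(?a, ?n)

def apply_rule7(triples):
    """Return the dataHasErrorWithCode triples inferred from the input set."""
    codes = {s: o for s, p, o in triples if p == "errorCode"}
    inferred = set()
    for s, p, o in triples:
        if p == "resultForData" and s in codes:
            inferred.add((o, "dataHasErrorWithCode", codes[s]))
    return inferred

kb = {
    ("result1", "errorCode", 2),           # error code 2 = overlapping features
    ("result1", "resultForData", "Building42"),
}
print(apply_rule7(kb))  # {('Building42', 'dataHasErrorWithCode', 2)}
```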
Table 8. Overview of layers for the case study.

| Layer | Number of Features | Size |
|---|---|---|
| Road | 4395 | 2.3 MB |
| Building | 20,596 | 5.7 MB |
| Cadastral Parcel | 33,641 | 18.2 MB |

Share and Cite

MDPI and ACS Style

Yılmaz, C.; Cömert, Ç.; Yıldırım, D. Ontology-Based Spatial Data Quality Assessment Framework. Appl. Sci. 2024, 14, 10045. https://doi.org/10.3390/app142110045
