Ontology-Based Spatial Data Quality Assessment Framework
Abstract
1. Introduction
- A central regulatory system (the authority, the constitution) that is dynamic but stable (not rigid) and follows an open-world approach. Changes are expected but infrequent. SDQO is designed to model these needs; its main scope is data quality concepts and general spatial relation rules.
- Other regulations that inherently depend on the central system. New rules of this kind can be introduced in the future without breaking existing handling. These rules are modified more frequently and may depend on one another. SfO is designed to conceptualize the rules of an institution.
2. Materials and Methods
2.1. Data Quality Elements
2.2. Ontology
2.3. Framework
2.3.1. Data Conversion with R2RML
- (1) Support ESRI shapefiles and a range of other spatial data formats as input.
- (2) Provide access through an associated Java library.
- (3) Be compatible with GeoSPARQL, since it is the primary vocabulary for geographic data within the framework.
- (4) Be able to convert the attribute data associated with spatial data.
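To make requirements (3) and (4) concrete, the sketch below shows the kind of GeoSPARQL-shaped RDF a converter is expected to produce for a single feature: the feature typed by its layer class, a geometry node carrying a WKT literal, and the feature's attribute values. It is a minimal standard-library illustration, not an R2RML mapping and not the output format of any particular conversion tool. The `sdqo:`, `sfo:` and `data:` namespace IRIs, the `_geom` naming scheme, and the one-property-per-attribute convention are all assumptions made for illustration.

```python
"""Emit GeoSPARQL-shaped Turtle for one feature (illustrative, not an R2RML mapping)."""

# The GeoSPARQL IRI is the OGC namespace; the sdqo, sfo and data IRIs are placeholders.
PREFIXES = (
    "@prefix ogc: <http://www.opengis.net/ont/geosparql#> .\n"
    "@prefix sdqo: <http://example.org/sdqo#> .\n"
    "@prefix sfo: <http://example.org/sfo#> .\n"
    "@prefix data: <http://example.org/data/> .\n"
)


def turtle_literal(value):
    """Render a Python value as a Turtle literal; strings are escaped and quoted."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def feature_to_turtle(layer, fid, wkt, attributes):
    """Turtle for a feature typed by its layer class, with a WKT geometry and attributes."""
    feature = f"data:{layer}_{fid}"
    geometry = f"data:{layer}_{fid}_geom"  # hypothetical naming scheme
    lines = [
        f"{feature} a sfo:{layer}, ogc:Feature ;",
        f'    sdqo:featureID "{fid}" ;',
    ]
    for name, value in attributes.items():
        # Hypothetical: one datatype property per attribute column.
        lines.append(f"    data:{name} {turtle_literal(value)} ;")
    lines.append(f"    ogc:hasGeometry {geometry} .")
    lines.append(f'{geometry} ogc:asWKT "{wkt}"^^ogc:wktLiteral .')
    return PREFIXES + "\n".join(lines) + "\n"
```

For example, `feature_to_turtle("Building", 17, "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))", {"floors": 3})` yields a `data:Building_17` node typed as `sfo:Building` with a linked WKT geometry. Layer and attribute names are assumed to be valid Turtle local names, so a layer such as "Cadastral Parcel" would have to be written as `CadastralParcel`.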
2.3.2. Ontology Development
2.3.3. Spatial Data Quality Ontology
- Which features have a particular set of data quality problems?
- Which features have topo-semantic problems with specified features?
- Which features have domain consistency problems?
- How many erroneous objects are there in the tested data?
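Competency questions such as the first and the last one can be answered directly once assessment results are available as facts. The sketch below assumes, for illustration only, that the facts inferred by SWRL rule 7 in the rule table (`dataHasErrorWithCode`) have been exported as plain (subject, predicate, object) tuples; the feature names and error codes are made up. Within the framework, such questions would more naturally be posed against the inferred ontology itself, for example as SPARQL queries.

```python
from collections import defaultdict


def errors_by_feature(triples, predicate="dataHasErrorWithCode"):
    """Map each feature to the set of error codes attached to it by rule 7."""
    result = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            result[subject].add(obj)
    return dict(result)


def features_with_problems(triples, codes):
    """Competency question 1: features that carry every error code in `codes`."""
    wanted = set(codes)
    return sorted(f for f, found in errors_by_feature(triples).items() if wanted <= found)


def count_erroneous(triples):
    """Competency question 4: number of distinct features with at least one error."""
    return len(errors_by_feature(triples))
```

Counting distinct subjects rather than facts matters here: a feature that violates two rules is still one erroneous object.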
2.3.4. SfO and Graphical User Interface
- Create two top-level classes as direct subclasses of the appropriate SDQO classes, representing the crossing restriction. These classes can be automatically named based on their purpose and context. The first class is for features of lower dimensions. A sample description of the first class, named sfo:ClassCrf01, is shown in Table 6.
- Create a class named “Contour” as a subclass of the first class in step 1.
- Create classes named “Building”, “Lake”, and “Permanent Lake” as subclasses of the second class.
- Incorporate timestamp values to indicate when regulatory changes take effect.
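The class-creation steps above follow the pattern of sfo:ClassCrf01 in Table 6: an SfO class is defined as the intersection of an SDQO class with an owl:hasValue restriction on sdqo:subnrCross. The sketch below generates that Turtle for one crossing restriction number, assuming (as in the sample hierarchy paths for sfo:ClassCrf07 and sfo:ClassCrs07) that the lower-dimensional side specializes sdqo:ClassCross1 and the higher-dimensional side sdqo:ClassCross2. The function name is hypothetical, prefix declarations are omitted, and the timestamp step is not modeled.

```python
def crossing_restriction_turtle(n, lower_classes, higher_classes):
    """Turtle for SfO crossing restriction number n (illustrative sketch).

    Generates sfo:ClassCrfNN (lower-dimensional side, under sdqo:ClassCross1) and
    sfo:ClassCrsNN (higher-dimensional side, under sdqo:ClassCross2), each defined as
    the SDQO class intersected with owl:hasValue n on sdqo:subnrCross, and makes the
    given domain classes subclasses of them. Prefix declarations are omitted.
    """
    blocks = []
    sides = (
        ("Crf", "sdqo:ClassCross1", lower_classes),
        ("Crs", "sdqo:ClassCross2", higher_classes),
    )
    for suffix, sdqo_class, members in sides:
        sfo_class = f"sfo:Class{suffix}{n:02d}"
        blocks.append(
            f"{sfo_class} rdf:type owl:Class ;\n"
            "    owl:equivalentClass [ rdf:type owl:Class ;\n"
            f"        owl:intersectionOf ( {sdqo_class}\n"
            "            [ rdf:type owl:Restriction ;\n"
            "              owl:onProperty sdqo:subnrCross ;\n"
            f"              owl:hasValue {n} ] ) ] ."
        )
        for member in members:
            # Domain classes (e.g. sfo:Contour) become subclasses of the generated class.
            blocks.append(f"sfo:{member} rdfs:subClassOf {sfo_class} .")
    return "\n".join(blocks) + "\n"
```

For the contour example, `crossing_restriction_turtle(1, ["Contour"], ["Building", "Lake", "PermanentLake"])` produces sfo:ClassCrf01 and sfo:ClassCrs01 plus four subclass axioms; "Permanent Lake" is written without a space because Turtle local names cannot contain one.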
- “c0” refers to a geometric class of dimension 0 (point).
- “c1” refers to a geometric class of dimension 1 (line).
- “c2” refers to a geometric class of dimension 2 (polygon).
- “cc” refers to a general class, possibly non-geometric.
- “d” represents a datatype property.
- “o” represents an object property.
- “i” represents an individual.
- (1) Establish an order among SfO classes.
- (2) The smaller class must be a single subclass; larger classes can have siblings.
Algorithm 1. CSV optimization
Begin
    If the first two entries of a row are not in lexical order
        Swap the first and second entries, and the fifth and sixth entries
    End if
    Split rows first by the fifth column, then by the sixth column, using the function splitFurtherByColumn
    Begin function splitFurtherByColumn(csvText, columnNr, separator = "|")
        Initialize an empty list to store the resulting rows
        Split csvText into rows by newline
        Loop through each row
            Split the row into columns by comma
            If column columnNr contains the separator
                Split that column by the separator into splitValues
                For each value in splitValues, add a new row identical to the original except that column columnNr holds the value
            Else
                Add the original row to the results
            End if
        End loop
        Return the resulting rows as a single string
    End function
    Reorder all rows lexically
    Combine consecutive rows whose first five entries are identical by joining their sixth-column values with pipes; the combined row keeps the minimum timestamp
End
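Algorithm 1 can be sketched in Python as follows. Because the algorithm does not spell out the CSV layout, the sketch assumes one: each row has seven comma-separated fields, fields 1 and 2 are class names, fields 5 and 6 hold pipe-separated values belonging to fields 1 and 2 respectively (which is why they are swapped together), and field 7 is an ISO 8601 timestamp, so the lexically smallest value is also the earliest. The function names are ours, and `split_further_by_column` works on parsed rows rather than raw text.

```python
import csv
import io
from itertools import groupby


def split_further_by_column(rows, column_nr, separator="|"):
    """Expand every row whose column `column_nr` (1-based) holds several
    separator-delimited values into one row per value."""
    idx = column_nr - 1
    result = []
    for row in rows:
        if separator in row[idx]:
            for value in row[idx].split(separator):
                result.append(row[:idx] + [value] + row[idx + 1:])
        else:
            result.append(row)
    return result


def optimize_csv(csv_text):
    """Normalize, expand, sort, and recombine restriction rows (Algorithm 1 sketch).

    Assumed layout: 7 fields; fields 5 and 6 belong to fields 1 and 2,
    field 7 is an ISO 8601 timestamp.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    for row in rows:
        if row[0] > row[1]:  # first two entries not in lexical order
            row[0], row[1] = row[1], row[0]
            row[4], row[5] = row[5], row[4]
    rows = split_further_by_column(rows, 5)
    rows = split_further_by_column(rows, 6)
    rows.sort()  # lexical ordering of entire rows
    combined = []
    for key, group in groupby(rows, key=lambda r: r[:5]):
        group = list(group)
        sixth = "|".join(r[5] for r in group)  # combine by sixth column with pipes
        timestamp = min(r[6] for r in group)   # keep the earliest timestamp
        combined.append(key + [sixth, timestamp])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(combined)
    return out.getvalue()
```

Sorting before grouping matters: `itertools.groupby` only merges adjacent rows, which matches the "combine consecutive rows" step of the algorithm.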
2.3.5. Rules for Quality Assessment
- Sample class hierarchy paths:
- sfo:Road < sfo:ClassCrf07 < sdqo:ClassCross1 < sdqo:Cross < sdqo:InterObjectPrRF < sdqo:RestrictedFeature < ogc:Feature < ogc:SpatialObject < owl:Thing
- sfo:Road < sfo:ClassCrf07 < [ sdqo:subnrCross = 7]
- sfo:Building < sfo:ClassCrs07 < sdqo:ClassCross2 < sdqo:Cross < sdqo:InterObjectPrRF < sdqo:RestrictedFeature < ogc:Feature < ogc:SpatialObject < owl:Thing
- sfo:Building < sfo:ClassCrs07 < [ sdqo:subnrCross = 7]
- sdqo:ClassCross1(?a) ^ sdqo:ClassCross2(?x) ^ ogc:sfCrosses(?a, ?x) ^ sdqo:subnrCross(?a, ?n) ^ sdqo:subnrCross(?x, ?n) -> resultForData(DQR_SWRLCross, ?a)
- sdqo:ClassCross2(?x) ^ sdqo:subnrCross(?a, ?n) ^ swrlb:stringConcat(?aa, ?ida, " , ", ?idx, " , ") ^ sdqo:featureID(?x, ?idx) ^ sdqo:featureID(?a, ?ida) ^ sdqo:subnrCross(?x, ?n) ^ ogc:sfCrosses(?x, ?a) ^ sdqo:ClassCross1(?a) -> sdqo:hasMessage(DQR_SWRLCrosses, ?aa)
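Both rules pair a sdqo:ClassCross1 feature with a sdqo:ClassCross2 feature only when the two inherit the same sdqo:subnrCross value, so one pair of SDQO classes can serve any number of crossing restrictions. The sketch below mimics that matching in plain Python over precomputed facts; it is not a SWRL engine. The `Feature` record, the role encoding, and the crossing pairs are hypothetical inputs (in the framework, ogc:sfCrosses would be inferred by the spatial rules), crossing facts are given in the argument order of the first rule, and the message string copies the `?ida , ?idx , ` concatenation of the second rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    feature_id: str
    cross_role: int   # 1 for sdqo:ClassCross1 members, 2 for sdqo:ClassCross2
    subnr_cross: int  # inherited sdqo:subnrCross value, e.g. 7 for sfo:ClassCrf07/Crs07


def crossing_errors(features, crosses):
    """Return (flagged lower-dimensional ids, messages) for forbidden crossings.

    `crosses` holds (id_a, id_x) pairs meaning ogc:sfCrosses(a, x).
    """
    by_id = {f.feature_id: f for f in features}
    flagged, messages = set(), []
    for id_a, id_x in crosses:
        a, x = by_id[id_a], by_id[id_x]
        if a.cross_role == 1 and x.cross_role == 2 and a.subnr_cross == x.subnr_cross:
            flagged.add(a.feature_id)                               # like resultForData
            messages.append(f"{a.feature_id} , {x.feature_id} , ")  # like hasMessage
    return flagged, messages
```

A contour crossing a building is therefore not reported under restriction 7 unless both classes were also attached to the same restriction number, which is how independent SfO rules avoid interfering with each other.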
2.3.6. Assessment and Inference
3. Discussion
3.1. Case Study
The rules used for the case study are:
- Necessary-type rule: “Buildings must be within a cadastral parcel”.
- Forbidden-type rule: “Roads must not cross over buildings”.
- Forbidden-type rule: “Cadastral parcels must not self-overlap”.
- Forbidden-type rule: “Buildings must not self-overlap”.
- If the user (in this case, a domain expert) knows the appropriate SfO for their domain, they can use that SfO directly.
- If no suitable SfO is found in the system, the user is prompted to create a new one using the SfO creation interface.
- Buildings must be within a cadastral parcel: necessary, sfWithin.
- Buildings must not overlap: forbidden, sfOverlaps, same class.
- Roads must not cross buildings: forbidden, sfCrosses.
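The two rule types behave differently when evaluated: a forbidden rule flags every feature that takes part in the named relation, whereas a necessary rule flags every feature that lacks the relation with all features of the target class, which is a closed-world check (the absence of a fact counts as a violation). The sketch below illustrates both semantics on the case-study rules using hypothetical identifiers and precomputed relation facts; it does not reproduce the framework's own mechanism for handling necessary rules under OWL's open-world semantics.

```python
def forbidden_violations(relation_pairs, source_ids, target_ids, same_class=False):
    """Features taking part in a forbidden relation.

    For same-class rules (e.g. self-overlap) both features of a pair are flagged;
    otherwise only the source-class feature is, as in the crossing rule.
    """
    bad = set()
    for a, b in relation_pairs:
        if a in source_ids and b in target_ids and not (same_class and a == b):
            bad.update({a, b} if same_class else {a})
    return bad


def necessary_violations(relation_pairs, source_ids, target_ids):
    """Source-class features with no required relation to any target-class feature."""
    satisfied = {a for a, b in relation_pairs if a in source_ids and b in target_ids}
    return set(source_ids) - satisfied
```

With buildings {b1, b2, b3}, parcels {p1, p2}, within facts for b1 and b2 only, an overlap between b2 and b3, and road r1 crossing b3, the sketch reports b3 for the within rule, b2 and b3 for the self-overlap rule, and r1 for the crossing rule.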
3.1.1. Data Conversion to Ontology
3.1.2. Results of the Data Quality Assessment
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Topo-Semantic Consistency Elements | Rule | Example |
---|---|---|
Must not overlap (different features of the same class) | No feature in a particular class overlaps with any other feature within the same class. | Buildings must not overlap. |
Must be within | Each feature in a class must be spatially within at least one feature in the other feature class. | A building must be within a cadastral parcel. |
Must not overlap with | Features belonging to one class do not overlap with features in the other class. | Buildings must not overlap with the parcel. |
Must not cross | Features in one class do not cross over features in the other class. | Roads must not cross over buildings. |
Domain Consistency Elements | Rule | Example |
---|---|---|
No null value (non-empty) | For a class, a given set of attributes of features cannot be empty. | Parcels have a non-empty PID value. |
Constant value | A given attribute of features in a specified class has a constant value. | All main roads have the same code. |
Value range | A given attribute of features in a class has values in a defined range. | The residential buildings can have at most five floors. |
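The three domain consistency elements reduce to simple per-feature attribute checks. The sketch below evaluates them over features represented as Python dictionaries; the dictionary layout, the attribute names (`PID`, `code`, `floors`), the lower bound of one floor, and the choice to treat `None`, a missing key, or an empty string as an empty value are all assumptions for illustration.

```python
def non_empty_violations(features, attributes):
    """IDs of features with a missing or empty value in any listed attribute."""
    return [f["id"] for f in features
            if any(f.get(a) in (None, "") for a in attributes)]


def constant_value_violations(features, attribute, expected):
    """IDs of features whose attribute differs from the required constant."""
    return [f["id"] for f in features if f.get(attribute) != expected]


def value_range_violations(features, attribute, low, high):
    """IDs of features whose attribute is missing or outside [low, high]."""
    return [f["id"] for f in features
            if f.get(attribute) is None or not low <= f[attribute] <= high]
```

For the table's examples, `non_empty_violations(parcels, ["PID"])` lists parcels without a PID, and `value_range_violations(residential_buildings, "floors", 1, 5)` lists residential buildings with more than five floors.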
Class | Explanation | Example |
---|---|---|
sdqo:InterObjectPrRF | Features that are restricted in terms of relations between classes. | “Roads must not cross over buildings” |
sdqo:IntraObjectPrRF | Features that are restricted in terms of relations within a class. | “Buildings must not overlap with other buildings” |
sdqo:InterDataPrRF | Features that are restricted by attributes between classes. | “The prominence of a mountain is greater than the height of a hill” |
sdqo:IntraDataPrRF | Features that are restricted by attributes within a class. | “Different buildings have different IDs” |
Specification Rule | Implementation |
---|---|
The features of class A must not overlap with the features of class B. | Class A is a subclass of class C, which is an SfO class. Similarly, class B is a subclass of class D, also an SfO class. Both class C and class D are subclasses of certain SDQO classes that play a role in SWRL rules. These SWRL rules establish error properties related to erroneous features in classes A and B, as well as the forbidden overlaps relation between them. |
Object Property | Domain | Range | Explanation |
---|---|---|---|
sdqo:iandb | Spatial Object | Spatial Object | The interior of the first feature intersects the boundary of the second feature. |
sdqo:iande | Spatial Object | Spatial Object | The interior of the first feature intersects the exterior of the second feature. |
sdqo:bande | Spatial Object | Spatial Object | The boundary of the first feature intersects the exterior of the second feature. |
sdqo:boundaryIntersects | Spatial Object | Spatial Object | The boundaries of the two features intersect. |
…
sfo:ClassCrf01 rdf:type owl:Class ;
    owl:equivalentClass [ rdf:type owl:Class ;
        owl:intersectionOf ( sdqo:ClassCross1
            [ rdf:type owl:Restriction ;
              owl:onProperty sdqo:subnrCross ;
              owl:hasValue 1 ] ) ] .
…
No. | SWRL Expression |
---|---|
1 | RestrictedFeature(?x) ^ hasGeometry(?x, ?g) ^ asWKT(?g, ?w) ^ swrlb:contains(?w, "POLYGON") -> CalcPoly(?x) |
2 | iande(?x, ?a) ^ iande(?a, ?x) -> crossesOrOverlaps(?x, ?a) |
3 | crossesOrOverlaps(?x, ?y) ^ CalcPoly(?x) ^ CalcPoly(?y) -> sfOverlaps(?x, ?y) |
4 | crossesOrOverlaps(?x, ?y) ^ CalcLine(?x) ^ CalcPoly(?y) -> sfCrosses(?x, ?y) |
5 | noie(?x, ?a) ^ nobe(?x, ?a) -> sfWithin(?x, ?a) |
6 | swrlb:stringConcat(?aap, ?p, " , "^^xsd:string, ?aa) ^ hasMessage(?r, ?aa) ^ hasProbName(?t, ?p) ^ hasResult(?t, ?r) -> elementHasMessage(?t, ?aap) |
7 | errorCode(?r, ?n) ^ resultForData(?r, ?a) -> dataHasErrorWithCode(?a, ?n) |
8 | noie(?x, ?a) ^ nobe(?x, ?a) ^ noie(?a, ?x) ^ nobe(?a, ?x) -> sfEquals(?x, ?a) |
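The object properties iandb, iande, bande and boundaryIntersects correspond to cells of the DE-9IM intersection matrix, which makes rules 2 to 5 and rule 8 easy to trace by hand. The sketch below derives them from a nine-character DE-9IM string of the kind returned by relate operations in geometry libraries (for example, Shapely's `relate`). It assumes that `noie` and `nobe` are the negations of `iande` and `bande`, since neither is defined in the tables, and it passes geometry dimensions directly instead of testing WKT strings as rule 1 does. The matrices in the checks are written by hand for textbook configurations.

```python
# DE-9IM cell order: II, IB, IE, BI, BB, BE, EI, EB, EE (row = first geometry).
II, IB, IE, BI, BB, BE, EI, EB, EE = range(9)


def transpose(matrix):
    """DE-9IM string of relate(b, a) given that of relate(a, b)."""
    return "".join(matrix[3 * col + row] for row in range(3) for col in range(3))


def predicates(matrix):
    """sdqo-style predicates for relate(a, b); 'F' marks an empty intersection."""
    def hit(cell):
        return matrix[cell] != "F"
    return {
        "iandb": hit(IB),
        "iande": hit(IE),
        "bande": hit(BE),
        "boundaryIntersects": hit(BB),
        "noie": not hit(IE),  # assumed: negation of iande
        "nobe": not hit(BE),  # assumed: negation of bande
    }


def apply_rules(matrix, dims):
    """Apply rules 2, 3, 4, 5 and 8 to an ordered pair (x, a); dims = (dim x, dim a)."""
    xa, ax = predicates(matrix), predicates(transpose(matrix))
    facts = set()
    if xa["iande"] and ax["iande"]:                              # rule 2
        facts.add("crossesOrOverlaps")
        if dims == (2, 2):
            facts.add("sfOverlaps")                              # rule 3
        if dims == (1, 2):
            facts.add("sfCrosses")                               # rule 4
    if xa["noie"] and xa["nobe"]:                                # rule 5
        facts.add("sfWithin")
    if xa["noie"] and xa["nobe"] and ax["noie"] and ax["nobe"]:  # rule 8
        facts.add("sfEquals")
    return facts
```

Like rules 2 and 5 as written, the sketch never inspects the interior-interior cell. For two disjoint polygons (matrix FF2FF1212) it would therefore report crossesOrOverlaps, so it should only be applied to pairs whose interiors are already known to intersect.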
Layer | Number of Features | Size |
---|---|---|
Road | 4395 | 2.3 MB |
Building | 20,596 | 5.7 MB |
Cadastral Parcel | 33,641 | 18.2 MB |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yılmaz, C.; Cömert, Ç.; Yıldırım, D. Ontology-Based Spatial Data Quality Assessment Framework. Appl. Sci. 2024, 14, 10045. https://doi.org/10.3390/app142110045
Yılmaz, Cemre, Çetin Cömert, and Deniz Yıldırım. 2024. "Ontology-Based Spatial Data Quality Assessment Framework" Applied Sciences 14, no. 21: 10045. https://doi.org/10.3390/app142110045