1. Introduction
Recent advances in remote-sensing technologies and imaging devices have opened up new possibilities for digital recording of cultural heritage (CH) sites and collections. Indeed, these technologies allow for the production of very realistic digital replicas of CH sites and collections [
1]. The generated CH data is often geographically referenced, incorporating both location and time references, and the resulting geospatial data appears in a wide range of geospatial file formats. In turn, these geospatial data and their location and time references can be used to discover interesting connections and relationships among cultural heritage resources. Hence, geospatial data is a major component of the CH field [
2,
3].
Furthermore, the volume of digital cultural heritage data is growing significantly, as CH sites and CH collections are being digitized at exponentially accelerating rates by CH digitization projects, government bodies and CH institutions (galleries, libraries, archives, and museums) around the globe [
4,
5]. Accordingly, the ever-increasing growth of digital CH data has facilitated the development of cultural heritage repositories, which play a major role in digital preservation and dissemination, tourism, and CH research, among other areas [
6,
7]. An apt example of this is the Europeana portal, which provides metadata information about digital cultural heritage content from European countries [
8,
9]. However, there are many other, smaller repositories around the world in which data is published as raw dumps in various file formats lacking structure and semantics. Major technical challenges in these repositories include data integration and interoperability among distributed repositories at the national and regional levels. As a result, poorly linked CH data is fragmented across several national and regional repositories, backed by non-standardized search interfaces. These technical challenges limit users’ ability to contextualize information from distributed repositories [
10].
The semantic web is an extension of the existing World Wide Web that offers a set of best practices for publishing and interlinking structured data on the Web. In the semantic web, data is published in a machine-readable and processable format, its meaning is made explicit to machines through ontologies, and the resulting datasets are interlinked [
11]. The geospatial semantic web is an extension of the semantic web in which geospatial data has explicit meaning to machines, defined by geospatial ontologies. Geospatial data has distinct features, such as geometries, a coordinate reference system (CRS), and a topology of geometries, all of which require special attention when encoded into a machine-readable and processable format. In turn, the geospatial semantic web offers capabilities such as geographic information system (GIS)-based analysis, spatio-temporal queries, and interlinking with external data sources to provide geospatial context for a specific topic. These features are uncommon in the semantic web because it is not designed to deal with complex geospatial data [
12,
13]. Since the geospatial semantic web incorporates geospatial data as well as semantic web concepts, several new and existing CH repositories have adopted the geospatial semantic web concepts to deal with geospatial CH data and to overcome the aforementioned issues of interoperability and integration [
14,
15,
16]. Moreover, several web platforms based on these concepts have been developed to improve data integration, interoperability, long-term preservation, accessibility of digital CH data, such as WissKI
1, Arches
2, Omeka
3, etc.
As the geospatial semantic web is a newly emerging concept, only a limited number of surveys have been conducted. Li et al. [
17] reviewed the state of the geospatial semantic web, outlined challenges and opportunities of exploiting big geospatial data in the geospatial semantic web, and proposed future directions. Kuhn et al. [
18] and Nalepa and Furmańska [
19] explained major innovations in the field of geographic information science brought about by semantic web concepts. Faye et al. [
20] and Rohloff et al. [
21] surveyed triple store technologies for linked data. Battle and Kolas [
22] discussed the overall state of geospatial data in the semantic web, including the state of the GeoSPARQL query language in industry and research. Buccella et al. [
23] surveyed widely used approaches to integrate geospatial data using ontologies. Ballatore et al. [
24] reviewed geospatial data in open knowledge bases such as DBpedia, LinkedGeoData, GeoNames, and OpenStreetMap, particularly paying attention to the quality of geodata, and to crowdsourced data.
However, no conceptual survey of geospatial semantic web concepts has yet been conducted for a cultural heritage audience. A conceptual survey of these concepts pertinent to the cultural heritage field is, therefore, needed. Such a review equips CH professionals and practitioners with a useful overview of the necessary tools, and of the free and open-source software (FOSS) semantic web and geospatial semantic web platforms, that can be used to implement geospatial semantic web-based CH repositories.
Hence, the objectives of this article are as follows:
To review the state-of-the-art geospatial semantic web concepts pertinent to the CH field.
To propose a framework to turn geospatial CH data stored in a vector data model into machine-readable and processable RDF data for use in the geospatial semantic web, with a case study to demonstrate its applicability.
To outline key FOSS semantic web and geospatial semantic web-based purpose-built platforms for CH institutions.
To summarize leading cultural heritage projects employing geospatial semantic web concepts.
To discuss attributes of the geospatial semantic web concepts requiring more attention, which can generate new ideas and research questions for both the geospatial semantic web and CH fields.
The remainder of the article is structured as follows: Section 2 discusses web evolution and the state of the art in the geospatial semantic web, including cultural heritage domain ontologies. Section 3 proposes a framework for turning CH data into machine-readable and processable RDF data, with its subsequent storage and retrieval, and demonstrates the framework's applicability with a case study. Section 4 outlines key FOSS semantic web and geospatial semantic web-based purpose-built platforms for CH institutions. Section 5 presents leading CH projects that employ semantic web and geospatial semantic web concepts. Section 6 outlines challenges and technical limitations of the geospatial semantic web, followed by a concluding summary in Section 7.
2. Web Evolution and the Geospatial Semantic Web
The idea of the World Wide Web was first introduced by Tim Berners-Lee in 1989. It has since become the largest global information space for sharing information. Web technologies have witnessed a great deal of progress in the last two decades (see
Figure 1). The first generation of the web, also known as the read-only web, the web of documents, and Web 1.0, allows users only to search for and read information. Websites are based on static HTML pages represented as hypertext, in which elements can be linked to different resources. The hypertext transfer protocol (HTTP) is used as the means of communication between clients and servers [
25].
The second phase in the evolution of the web is Web 2.0, also referred to as a read-write web and web of people. It has several interpretations (see for example [
26]), but in general it can be understood as an idea describing the web as a user-centric and socially connected platform, where people interact and collaborate more than in its predecessor. A set of new technologies was introduced along with Web 2.0, such as AJAX (Asynchronous JavaScript and XML), Adobe Flex, and the Google Web Toolkit. These allowed the development of more dynamic, interactive websites and web applications, in which users can publish content to the web and modify existing content. The foundations of Web 2.0 are also demonstrated by well-known web applications including social networking platforms (e.g., Facebook), web-blogs (e.g., WordPress), Wikipedia, and media-sharing platforms (e.g., YouTube), among others [
27].
In recent years, a new extension to the web has started to emerge, named Web 3.0, which is the third phase in web evolution. Web 3.0 is also known as the semantic web and the web of data. The previously discussed Web 1.0 is a read-only web for sharing information, and Web 2.0 is a read-write web offering increased collaboration and social interaction among users. In the second phase, users can publish content to the web with hypertext links to other resources, and machines can display it. However, this content is only for human consumption; machines themselves do not have the ability to process and “understand” it. The semantic web is being developed to overcome these problems. It extends the core principles of Web 2.0 rather than replacing them, and adds semantics to the web by structuring web content and enabling software applications to process and “understand” the structured content. The semantic web incorporates several components standardized by the World Wide Web Consortium (W3C), such as the resource description framework (RDF) data model to represent web resources in a machine-readable format, ontologies to give data well-defined meaning, the SPARQL Protocol and RDF Query Language (SPARQL) to query data in the semantic web, and triple stores to store RDF data, to name a few (these are discussed in later sections) [
11]. This standardization provides a shared foundation that allows cultural heritage data to be published, interlinked, searched, and consumed more effectively than with Web 2.0.
However, as mentioned in the introduction, a problem with the semantic web arises when CH data is supplied as geospatial data in various geospatial file formats, along with a CRS and topology. Geospatial data management in the semantic web is therefore tackled by its extension, termed the geospatial semantic web, which brings together the concepts of the semantic web and geospatial data and adds a spatio-temporal dimension to the semantic web. CH data often includes geospatial data; for instance, current and historical places, CH objects and artifacts, and historical maps are typically associated with some type of locational and temporal reference. To encode geospatial CH data into a machine-readable and processable format, the geospatial semantic web offers distinct technical specifications, including:
An RDF data model;
Geospatial ontologies;
Geospatial query languages;
Spatio-temporal triple stores for storing geospatial RDF data.
In the next sub-sections, these specifications are discussed in detail.
2.1. Data Models
RDF is an abstract data model for representing web resources in the semantic web. It is designed to be read and, most importantly, understood by machines, providing better interoperability among computer applications. RDF structures data in the form of subject, predicate, and object, known as triples. The subject identifies the resource being described, the predicate is the property being described with respect to that resource, and the object is the value of the property. The subject and predicate are represented by uniform resource identifiers (URIs), while the object can be a URI or a string literal. An important feature of RDF is that a resource can be the subject of one triple and the object of another [
11]. For instance, consider “Curtin University Bentley campus is based in Perth city” and “Perth city is the largest city in Western Australia”. From a semantic web perspective, Perth city is the object of the first statement and the subject of the second. This allows RDF to find connections between triples, which is referred to as linked data. RDF is used in combination with ontologies to provide semantic information about the resources being described; ontologies and vocabularies are discussed in the next section. RDF data is not stored only in an XML file format; several alternative RDF serialization formats exist, and despite the differences in format the resulting triples are logically equivalent. Some of the most widely used RDF serialization formats are RDF/XML, N-Triples, JSON-LD, Turtle, and Notation3 (N3). For an explanation of RDF formats we recommend the article by Gandon et al. [
28].
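To make the triple structure concrete, the two statements above can be serialized in Turtle as follows; the namespace and the resource and property names are hypothetical and chosen only for illustration:

```turtle
@prefix ex: <http://example.com/> .

# "Curtin University Bentley campus is based in Perth city"
ex:CurtinUniversityBentleyCampus ex:isBasedIn ex:PerthCity .

# "Perth city is the largest city in Western Australia"
ex:PerthCity ex:isLargestCityOf ex:WesternAustralia .

# ex:PerthCity is the object of the first triple and the subject of the second,
# which is what allows the two statements to be linked.
```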
2.2. Geospatial Ontologies and Geospatial Semantic Web Query Languages
The term “ontology” originated in the field of philosophy, in which it refers to the subject of existence. In computer science and information science fields, an ontology is a data model, which is used to specify domain-based concepts and relationships between these concepts [
29]. There also exists a term called “vocabulary” in the semantic web, which is often referred to be equal to the term “ontology”. According to W3C
4, “There is no clear division between what is referred to as ‘vocabularies’ and ‘ontologies’. The trend is to use the word ‘ontology’ for more complex, and possibly quite formal collection of terms, whereas ‘vocabulary’ is used when such strict formalism is not necessarily used or only in a very loose sense.” Ontologies and vocabularies in the semantic web provide a structure for organizing implicit and explicit concepts and relationships. There are several reasons to use an ontology in the semantic web and the geospatial semantic web such as to analyze domain knowledge, or to separate domain knowledge from operational knowledge [
30]. However, the main reason to use ontologies in the semantic web and the geospatial semantic web is to improve data integration by tackling an issue of semantic heterogeneity.
5 Semantic heterogeneity is a term to describe a disagreement about the meaning and interpretation of data [
31]. Ontologies can be one of three types: domain ontologies, upper ontologies (also known as top-level ontologies), and hybrid ontologies. A domain ontology specifies concepts, terminologies, and thesauri associated with a certain domain. For instance, a geospatial ontology defines concepts and relationships for geospatial data; more precisely, spatio-temporal concepts and geospatial relations such as equals, disjoint, touches, within, and contains can be mapped to geospatial ontologies, which then facilitate shareable and reusable geospatial information. An upper ontology specifies concepts that can be used across a wide range of domains; it represents general things such as objects, properties, space, roles, functions, and relations, which can be found in many domains. A hybrid ontology is a combination of a domain ontology and an upper ontology [
32]. In the following sub-sections, the article discusses geospatial ontologies and a cultural heritage domain ontology, the CIDOC Conceptual Reference Model (CRM), in more detail. Furthermore, there are two query languages, SPARQL and GeoSPARQL, for querying RDF data in the semantic web and the geospatial semantic web, respectively. SPARQL is a W3C standard, while GeoSPARQL is a standard of the Open Geospatial Consortium (OGC). The next sub-sections cover these languages in detail as well.
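Returning to the ontology types above, the following Turtle sketch (with entirely hypothetical namespaces and class names) shows how a domain-ontology class can specialize a general concept from an upper ontology:

```turtle
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix upper: <http://example.com/upper#> .      # hypothetical upper ontology
@prefix ch:    <http://example.com/heritage#> .   # hypothetical CH domain ontology

# General concept from the upper ontology
upper:PhysicalObject a owl:Class .

# Domain-specific concept specializing the general one
ch:ArchaeologicalSite a owl:Class ;
    rdfs:subClassOf upper:PhysicalObject ;
    rdfs:label "Archaeological site" .
```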
2.2.1. GeoSPARQL Vocabulary and Query Language
In recent years, geospatial organizations, research groups, and triple store vendors have attempted to implement their own geospatial ontologies and strategies in order to deploy geospatial data in the semantic web. Unfortunately, this led to inconsistency: spatial RDF data properly represented by one organization became incompatible with spatial RDF data from another organization [
33]. To resolve this, a standardized geospatial ontology and query language termed GeoSPARQL has been proposed by the OGC, an international not-for-profit organization that develops quality open standards for the global geospatial community. GeoSPARQL is a small vocabulary for describing geospatial information in the geospatial semantic web, as well as a geospatial extension of the SPARQL query language. It can be combined with other domain ontologies to fulfill the needs of geospatial RDF data, and it provides a basic core for representing and accessing geospatial data published in a machine-readable and processable format in the geospatial semantic web [
34]. Published RDF data can be queried with spatial functions provided by GeoSPARQL to discover interesting connections and relationships among cultural heritage resources.
An example of a GeoSPARQL query is illustrated in
Figure 2. This GeoSPARQL query finds all performing arts centers (denoted here as PAC) located within 30,000 m of the place “The Perth Mint” in the city of Perth, Western Australia. It uses the GeoSPARQL “geof:distance” function to calculate the distance between each performing arts centre and “The Perth Mint”. In this example, it is assumed that the URIs for the data are located at
http://example.com/.
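A query of this kind could be sketched as follows; the resource, class, and property names under http://example.com/ are hypothetical, while the geo: and geof: namespaces are those defined by the GeoSPARQL standard, and the exact query shown in the figure may differ:

```sparql
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
PREFIX ex:   <http://example.com/>

SELECT ?pac ?name
WHERE {
  # Geometry of the reference place, "The Perth Mint" (hypothetical resource name)
  ex:ThePerthMint geo:hasGeometry/geo:asWKT ?mintWkt .

  # Candidate performing arts centres and their geometries (hypothetical class and property names)
  ?pac a ex:PerformingArtsCentre ;
       ex:name ?name ;
       geo:hasGeometry/geo:asWKT ?pacWkt .

  # Keep only the centres within 30,000 m of The Perth Mint
  FILTER (geof:distance(?pacWkt, ?mintWkt, uom:metre) <= 30000)
}
```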
2.2.2. CIDOC-Conceptual Reference Model (CRM) and CRMgeo
The CIDOC Conceptual Reference Model (CRM) is an ontology developed to define concepts and relationships in CH data. It has been developed by two working groups of CIDOC, namely the CIDOC Documentation and Standards Working Group and the CIDOC CRM SIG. The ontology is not developed for CH institutions in any one country or region; any CH institution in the world can benefit from using it. In addition, it has been an ISO standard since 2006. The main aim of the ontology is to provide semantic interoperability and integration among heterogeneous CH resources. It offers more than 80 classes and 130 properties, with which CH organizations can organize their datasets and define relationships among them. It follows an object-oriented data model and offers the possibility for extensions [
35].
As mentioned earlier, CH data is often accompanied by spatio-temporal references. Hence, many projects have attempted to integrate geoinformation with CIDOC-CRM. For instance, the AnnoMAD system integrated the Geography Markup Language (GML) with CIDOC-CRM to annotate spatial descriptions in free-text archeological data [
36], the CLAROS
6 project used basic spatial coordinates together with CIDOC-CRM. More recently, a formal geospatial extension to CIDOC-CRM termed CRMgeo has been developed. CRMgeo is integrated with the GeoSPARQL vocabulary to add spatio-temporal classes and properties to the ontology, allowing location- and time-related concepts and relationships of CH data to be encoded under CIDOC-CRM. It adheres to the formal definitions, encoding standards, and topological relations used in the GeoSPARQL vocabulary; thus the geospatial functionalities of GeoSPARQL can be fully utilized on CH data published under CRMgeo. The integration of CIDOC-CRM with the GeoSPARQL vocabulary is illustrated in
Figure 3. The prefix “SP” denotes CRMgeo classes, while “E” denotes CIDOC-CRM classes. GeoSPARQL terms are highlighted in blue: “Feature” is a subclass of “Spatial Object” and can have a “Geometry” in one of two serialization formats, well-known text (WKT) or GML [
37].
2.2.3. Geospatial Functions in SPARQL Query Language
The SPARQL Protocol and RDF Query Language is the query language of the semantic web and one of the key pieces of its technology stack. SPARQL itself is not designed to query geospatial data and does not provide a comprehensive set of geospatial query capabilities. However, some SPARQL query engines support simple geospatial functions such as “st_intersects” (intersection between two geometries), “st_within” (check whether geometry A is within geometry B), and others. If the task only requires simple geospatial queries on the semantic web, such as checking whether a CH site is located within a specified proximity or whether the geometries of two or more CH sites intersect, then SPARQL with these functions can handle it. The following example of a SPARQL query in
Figure 4 demonstrates the geospatial functionality of “st_intersects”, performed on the DBpedia semantic web endpoint
7 at
http://dbpedia.org/snorql/. DBpedia is a semantic web version of the Wikipedia project and allows Wikipedia content to be queried using SPARQL.
This query returns all places that lie within 10 km of Perth city in Western Australia.
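A query of this kind, written for the Virtuoso engine behind the DBpedia endpoint, could look roughly as follows; the coordinates used for Perth are approximate, the bif: namespace is Virtuoso-specific, and the exact form of the query in the figure may differ:

```sparql
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT DISTINCT ?place ?lat ?long
WHERE {
  ?place geo:lat ?lat ;
         geo:long ?long .
  # Virtuoso's built-in st_intersects with a third argument interpreted as a
  # radius in kilometres: keep places within 10 km of Perth (approx. coordinates).
  FILTER (bif:st_intersects (bif:st_point (?long, ?lat),
                             bif:st_point (115.8605, -31.9505),
                             10))
}
LIMIT 100
```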
The result of the query is shown in
Figure 5. The DBpedia semantic web endpoint provides results in JSON, XML and XML+XSLT formats.
2.3. Spatio-Temporal Triple Stores
Currently, there are two approaches to making RDF data available for use in the geospatial semantic web. The first is to use mapping languages that construct RDF data from relational database management systems (RDBMS), allowing data in an RDBMS to be accessed as if it were RDF data in a triple store. In this case, the RDF data is virtual, or read-only. The main advantage of this approach is that there is no need to duplicate the data and store it in an actual triple store. Some systems, such as Jena SDB and IBM DB2-RDF, ship with this RDBMS-to-RDF capability. If a database does not support it, additional tools such as D2RQ, R2RML, and Ultrawrap can be used. The second approach is to use a triple store, a database for storing and accessing RDF data. This article discusses the second approach in detail.
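To give a flavor of the first approach before moving on, the following is a minimal R2RML mapping sketch; the relational table heritage_places, its columns id and name, and the http://example.com/ namespace are hypothetical:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/> .

# Map each row of the (hypothetical) heritage_places table to an RDF resource
<#HeritagePlaceMapping>
    rr:logicalTable [ rr:tableName "heritage_places" ] ;
    rr:subjectMap [
        rr:template "http://example.com/place/{id}" ;
        rr:class ex:HeritagePlace
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:placeName ;
        rr:objectMap [ rr:column "name" ]
    ] .
```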
The semantic web stores RDF data in a different data model from SQL-based RDBMS or non-relational (NoSQL) database systems. In the semantic web, the terms RDF database, graph database, and triple store are used interchangeably to refer to storage systems for RDF data. There are various existing triple stores that can store RDF data, such as Blazegraph,
8 Apache Jena,
9 and others. However, for storing RDF data with spatio-temporal information, dedicated spatio-temporal triple stores should be used, such as Parliament, Oracle Spatial and Graph 12c, and Strabon. Hence, a feature matrix for a few of the well-known triple stores (see
Table 1) is provided. It can be seen from
Table 1 that all five spatio-temporal triple stores support the GeoSPARQL query language, and all except OpenLink Virtuoso support the GeoSPARQL geometry and topology extensions. Virtuoso supports the GeoSPARQL query language but, as regards geometry, it supports only point data, so lines and polygons are not available. All of them can handle multiple CRS, except uSeekM, which manages only WGS 84 (World Geodetic System 1984).
Parliament is a triple store and rule engine compliant with the RDF and OWL semantic web standards; however, it does not have an integrated query processor to accept semantic web queries. Hence, it is usually paired with Jena or Sesame to process queries, and thanks to third-party integrations it offers extra capabilities such as spatio-temporal support and numerical indexing [
38]. Regarding Parliament’s limitations, it is difficult for novice developers to configure, and, at the time of writing, it does not support Sesame version 2 or later versions.
Oracle Spatial and Graph 12c is the most advanced triple store for enterprise-class deployments at the time of writing. It provides management features including graph database access and analysis operations, which can involve geospatial analyses as well through a Groovy-based shell console. Furthermore, it offers a considerable number of advanced performance accelerators and other enhancements for the semantic web, in particular, optimized schema to store and index RDF data, fast retrieval of RDF data based on Oracle-text and Apache Lucene technologies and RDB-to-RDF conversion among others [
39]. A limitation of this triple store is its proprietary license, in addition to its complexity, which requires technical professionals to set it up and maintain it.
uSeekM is not a triple store but an extension library for triple stores based on Sesame. This extension, when paired with Sesame based triple stores, supports OpenGIS Simple Features including Point, Line, Polygon and GIS relations such as Intersects, Overlaps, and Crosses.
10 One of the advantages of using this extension is that it provides modern types of indexing, such as GeoHash, Quadtree, and a PostGIS-based indexer, which should result in better performance, particularly when searching and retrieving spatio-temporal data in the geospatial semantic web. On the other hand, a limitation of uSeekM is that it does not support multiple CRS, only WGS 84.
OpenLink Virtuoso is an engine which incorporates a web server, an RDF quad-store (storing graph, subject, predicate, and object tuples), a SPARQL processor, and OpenLink Data Spaces, a Linked Open Data-based collaboration platform (i.e., an application suite for a wiki, webmail, bookmarks, etc.) [
40].
Strabon is an open source RDF triple store based on the well-known Sesame RDF store. It extends Sesame with a query-processing engine and storage for its custom-built stRDF data model (for the representation of spatio-temporal RDF data) and the stSPARQL query language (an extension to SPARQL, similar to OGC GeoSPARQL) to query stRDF data [
41]. A major limitation of Strabon is that most OGC GeoSPARQL functionalities were not supported at the time this article was written.
3. Proposed Framework for the Geospatial Semantic Web
As mentioned before, CH data is often accompanied by spatio-temporal references, which come in various geospatial file formats. This section demonstrates a workflow for turning geospatial CH data in a vector file format into RDF data, with its subsequent storage and retrieval, as illustrated in
Figure 6.
Geospatial CH data in a vector data model, stored in spatio-temporal databases such as PostGIS and Oracle Spatial or in vector file formats such as Esri Shapefiles, Keyhole Markup Language (KML), or GML, can be converted into RDF data using tools such as GeoTriples
11 and TripleGeo.
12 These tools extract geospatial features from vector files, such as points, lines, and polygons, along with thematic attributes such as identifiers and names, and transform them into machine-readable and processable RDF data. Once the data is converted into RDF, it can be loaded into a spatio-temporal triple store for storage and retrieval. The triple store provides an endpoint designed for publishing and accessing RDF data, as well as for altering existing RDF data. Semantic web query languages such as GeoSPARQL and SPARQL allow RDF data to be queried from the endpoint, and the query results can be used by geospatial semantic web applications.
The following paragraphs present this framework with a case study. The TripleGeo tool can encode data represented only in a vector data model; hence, this is the main criterion for data suitable for the framework. The geometry and thematic attributes such as ID and name are other important data characteristics to take into account, but temporality has not been considered in this framework. For this case study, data about statutorily protected places in Perth city, Western Australia is used, available under a Creative Commons license on the webpage of the Heritage Council of Western Australia.
13 The geometry of the data is represented in a vector data model and stored in the Esri Shapefile format. It contains 2017 polygons; each polygon holds thematic attributes such as an FID (identifying number for each polygon), shape name (polygon), unique place number, place name, place location, and the length and area of the shape, as shown in
Figure 7. This data is then converted into RDF Turtle syntax using the aforementioned tool TripleGeo. The RDF file contains all the polygons included in the original file (a link for the file can be found in the “
Supplementary Materials” section of this article); however, a sample from the RDF file, namely the place named Peter Pan Statue, Queen’s Garden, is illustrated in
Figure 8. The prefixes in the RDF data are URI namespaces for the GeoSPARQL ontology, the XML Schema datatypes, and the URI resource of the file (
http://examplePerthWesternAustralia/URI#). A property “Georesource” in the RDF data identifies the polygon, which is “Geom_polygons_2172”. The geometry of the polygon is serialized as WKT and linked to it via the property “geo:asWKT”.
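A fragment of this kind, written in Turtle, could look roughly as follows; the polygon coordinates and the property names other than geo:asWKT are illustrative placeholders and may differ from the file actually generated by TripleGeo:

```turtle
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix ex:  <http://examplePerthWesternAustralia/URI#> .

ex:Geom_polygons_2172
    ex:Georesource "Geom_polygons_2172" ;
    ex:placeName   "Peter Pan Statue, Queen's Garden" ;
    # Polygon geometry serialized as WKT; coordinates here are illustrative only
    geo:asWKT "POLYGON ((115.8661 -31.9561, 115.8665 -31.9561, 115.8665 -31.9565, 115.8661 -31.9565, 115.8661 -31.9561))"^^geo:wktLiteral .
```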
Once the data is converted into RDF, it is loaded into the spatio-temporal triple store Strabon, which offers an endpoint, as shown in
Figure 9. This endpoint then allows the accessing of RDF data stored in the triple store.
Figure 10 illustrates the query response, which contains the previously discussed Peter Pan Statue, Queen’s Garden, as a polygon in SPARQL/JSON format. This endpoint can accept queries, and the results can be used to build geospatial semantic web applications. The variables s, p and o denote subject, predicate and object, respectively.
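For reference, a minimal query that binds these three variables, and would return rows like the one shown in the figure, is simply:

```sparql
# Return up to ten triples (subject, predicate, object) from the endpoint
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
```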
4. Free and Open Source Semantic Web and Geospatial Semantic Web-Based Purpose-Built Platforms for Cultural Heritage (CH) Institutions
In past years, some FOSS purpose-built platforms incorporating semantic web or geospatial semantic web concepts have been developed for CH institutions. As these platforms are FOSS, they are completely free to use, and their source code is also freely available and can be configured and extended without restriction to meet the needs of CH projects. Not only do these platforms offer great flexibility, they also decrease software costs for CH institutions.
Table 2 provides some information about these platforms such as supported semantic web and geospatial semantic web concepts, supported content types and visualizations. The following paragraphs discuss each one of them in more detail.
WissKI is a web-based virtual research environment (VRE) and content management system (CMS), which aims to become a Swiss army knife for scholars from any discipline who work with object-centric documentation. It has been developed by three scientific institutions, namely the Digital Humanities Research Group of the Department of Computer Science at the Friedrich Alexander University of Erlangen-Nuremberg (FAU), the Department of Museum Informatics at the Germanisches Nationalmuseum (GNM), and the Biodiversity Informatics Group at the Zoologisches Forschungsmuseum Alexander Koenig (ZFMK). It implements semantic web concepts to support scientific projects in CH institutions that collect, store, manage and communicate knowledge, and it provides long-term availability and interoperability of research outputs, as well as the identity of authorship and the authenticity of the information. People with a limited technical background can benefit from this easy-to-use platform to create enriched content for the World Wide Web, including for the semantic web and geospatial semantic web, presented in a Wikipedia-like style incorporating textual, visual and structured information. It supports two ways of creating content on the platform. The first way is to enter data using traditional form-based interfaces and the second is to perform an aggregation of data in free texts by utilizing web-based knowledge management approaches. In the latter case, a parser in the system recognizes named entities such as date and time phrases, names, and place names, and enters them to the system. Users of this platform can perform semantic text annotations by submitting free text, which is then analyzed to connect entities (e.g., names, places, dates) to the system’s knowledge base. WissKI is a modular extension of the well-known Drupal CMS. It is shipped with an ontology Erlangen CRM, which is an OWL-DL 1.0 (Web Ontology Language) implementation of CIDOC-CRM, however, any other ontology can also be imported to the system. A triple store called ARC
14 is used as RDF storage, offering a SPARQL endpoint for querying the data available in the knowledge base [
42].
Arches is a heritage inventory and management system developed through a collaboration between the Getty Conservation Institute and the World Monuments Fund. It is a geospatially enabled platform for the CH field to document immovable heritage such as CH buildings, CH landscapes, and archeological sites. The goals of the platform are to offer long-term preservation and interoperability for CH datasets. It employs CIDOC-CRM as an ontology to define concepts and relationships within datasets. These relationships can be visualized using the interactive graph exploration software Gephi.
15 International Core Data Standards for Archeological and Architectural Heritage (CDS) are used to define data fields in the platform. These fields can then be mapped to entity classes in CIDOC-CRM. Furthermore, it follows OGC standards which means the platform is compatible with desktop GIS applications. GIS features of the platform include a map-based visualization with comprehensive spatial queries, as well as drawing, importing and editing CH resource geometries. Results derived from the spatial query and editing of CH resource geometries can be exported into GIS formats (e.g., shapefile), and vocabulary concepts can be exported into the SKOS format. Another interesting feature of the Arches platform is its mobile app called “Arches Collector
16”, which allows the collecting of heritage data in the field. The collected data is then synchronized with the installed Arches platform. Through use of this feature, administrators of the platform can design projects by setting up participants, type of data to be collected, place of the data collection, and collection time. Approved users can then connect to the Arches platform, download the project, collect the data and save it to the Arches platform. Arches is developed using a Python-based Django
17 web framework together with other web libraries and frameworks (Require.js, Backbone.js, jQuery, Bootstrap). The Arches platform is fully customizable, which means other controlled vocabularies, as well as semantic web and related geospatial semantic web standards, can be implemented on top of the existing platform [
43].
ResearchSpace is a web-based platform that offers a collaborative environment for humanities and CH research projects. It has been developed by the British Museum and utilizes semantic web concepts. It includes work to integrate heterogeneous datasets without losing meaning or perspective. The platform uses CIDOC-CRM to define concepts and relationships in the datasets. The project also aims to implement a contextual search system and other tools to enhance collaboration in projects. It offers several types of instruments for exploring and visualizing complex datasets. For instance, the “Narratives” tool allows the creation of documents incorporating semantically defined entities and interactive visualizations. “Semantic Diagram” is another tool for exploring connections between objects, people, events, and places in the system. An image annotation tool is another interesting feature of the platform, which allows text information to be attached to a selected area of an image. ResearchSpace is based on “metaphacts”,
18 which is a knowledge graph platform.
19
Omeka is a content management system developed jointly by the Corporation for Digital Scholarship, the Roy Rosenzweig Center for History and New Media, and George Mason University. According to Cohen [
44] who introduced Omeka, “Omeka, [derives] from the Swahili word meaning ‘to display or layout goods or wares; to speak out; to spread out; to unpack.’”. It provides a web platform for CH institutions to publish and manage digital CH collections. Omeka offers three options for using the platform. The first option is “Omeka S”, a web-based platform for publishing CH data and for interlinking datasets with other online resources using semantic web concepts. The second option, “Omeka Classic”, offers a web-publishing platform but without the use of semantic web concepts. “Omeka.net” is the third option, which provides a hosted web-publishing service without involving semantic web concepts. In this article, “Omeka S” is discussed, as it includes publishing data for use in the semantic web and the geospatial semantic web. A feature distinguishing “Omeka S” from otherwise similar platforms is that it allows the management of multiple sites from a single installation. This can be helpful for organizations needing to run multiple sites and manage them from one platform. Not only does “Omeka S” provide an easy-to-use platform, it also offers the possibility of extending its functionality with modules. It is shipped with four pre-built ontologies, namely the Bibliographic Ontology, Dublin Core, Dublin Core Type, and Friend of a Friend; other ontologies can also be imported into the platform. In regard to visualization methods, it supports map, image, and 3D model visualizations. Omeka is written in the PHP programming language and requires a LAMP web stack to run, comprising Linux, the Apache HTTP Server, MySQL, and PHP.
20
Fedora (Flexible Extensible Digital Object Repository Architecture) is a robust and modular repository system, which includes integration with semantic web concepts. A number of universities, research institutions, government agencies, and CH organizations around the globe have contributed to the development of the framework, which is currently led by the Fedora Leadership Group. Fedora is fully customizable and is used by CH institutions, universities, and research institutions to store and manage digital content. Nevertheless, it is not a complete application covering everything from uploading data to the presentation of digital content; rather, it is a framework upon which institutions can build their own platforms. Middleware solutions such as Samvera
21 and Islandora
22 can also be used on top of it. Fedora supports defining relationships using ontologies and expressing them in RDF. It supports integration with external triple stores, so the resulting RDF triples can be stored in a plugged-in triple store. Fedora is written in the Java programming language and supports MySQL and PostgreSQL databases. Files in Fedora can be stored locally or in cloud storage solutions such as Amazon S3 [
45].
5. Applications of the Geospatial Semantic Web in CH Domain
Linked CH data can be accessed from various sources. These sources can be divided into three categories, namely data-aggregating platforms (e.g., Europeana, the Advanced Research Infrastructure for Archeological Dataset Networking in Europe (ARIADNE)), metadata connectivity platforms (e.g., Pelagios), and gazetteers or APIs (e.g., GeoNames). In the following sections, we discuss key projects in these categories that have employed geospatial semantic web concepts.
5.1. Pelagios Commons
One of the first steps in applying the geospatial semantic web in the CH domain, or at least a project in a closely aligned direction, is Pelagios, run by the Pelagios Commons consortium and funded by The Andrew W. Mellon Foundation.
The project provides a web-based infrastructure in which the Recogito tool is used to annotate text against images (i.e., ancient maps and images in digital books). Referenced places of the ancient world can be retrieved as URIs from the Pleiades Gazetteer of the Ancient World. Data in the system is stored in RDF format, and the vocabulary used to describe places is the Open Annotation Data Model
23. Furthermore, VoID (vocabulary of interlinked datasets)
24 is used to describe general metadata about places, including a general description, publisher details, and license information. The infrastructure does not store the data being referenced, as it is neither a data repository nor a data aggregator platform; rather, it stores metadata about places and refers to the URIs of places. As for the visualization of referenced ancient places, Pelagios offers a map-based visualization where users can search for ancient places of interest, examine the network visualization of ancient places, and extract relevant data. Pelagios also offers an HTTP API that allows third-party applications to send requests and consume raw data in different RDF formats. This provides an opportunity to take Linked Open Data from Pelagios and deploy it in other projects or to develop mashups and applications [
46].
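To illustrate the kind of dataset-level metadata that VoID (mentioned above) can express, the following is a minimal, hypothetical VoID description; the dataset URI, title, publisher, and endpoint are placeholders and do not describe the actual Pelagios data:

```turtle
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.com/> .

ex:AncientPlacesDataset a void:Dataset ;
    dcterms:title       "Annotated ancient places (example)" ;
    dcterms:description "Place annotations referring to gazetteer URIs." ;
    dcterms:publisher   ex:ExamplePublisher ;
    dcterms:license     <https://creativecommons.org/licenses/by/4.0/> ;
    void:sparqlEndpoint <http://example.com/sparql> .
```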
5.2. Advanced Research Infrastructure for Archeological Dataset Networking in Europe (ARIADNE) (Now ARIADNE PLUS)
ARIADNE is a European Union (EU) project funded by the Seventh Framework Programme of the European Commission. It was built upon a consortium of 24 EU partners as well as 15 associate partners. The main objective of the project was to develop an aggregator infrastructure that facilitates data connection, sharing, and searching among European archeological institutions, and hence achieves better interoperability across dispersed archeological data collections. To achieve this, the project developed the ARIADNE Catalogue Data Model, since archeological institutions in Europe store collections of data in different formats, languages, and metadata schemas.
This model stores metadata information about archeological data and follows a “who, what, where, when” paradigm. To describe and encode information (e.g., monuments, pottery, and excavations) related to the “what” pattern, they used the Art and Architecture Thesaurus (AAT)
25 of the Getty Research Institute. For the “where” pattern, spatial coordinates (longitude, latitude) in the WGS 84 CRS have been used, whereas the information and an encoding schema for the “when” pattern have been facilitated in collaboration with the PeriodO
26 temporal gazetteer of historical periods. The portal itself has been developed using the PHP-based Laravel MVC (model-view-controller) framework, conceptual classes for archeological data have been implemented with CIDOC-CRM, and the above-mentioned “who, where, what, when” paradigm information is mapped into the DCAT (Data Catalog)
27 vocabulary [
47].
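As a rough illustration of how such “who, what, where, when” facets can be expressed with the DCAT vocabulary and Dublin Core terms, the following Turtle sketch uses entirely hypothetical dataset, institution, and period resources rather than real ARIADNE catalogue entries:

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.com/> .

ex:ExcavationDataset a dcat:Dataset ;
    dcterms:title    "Excavation records (example)" ;
    dcterms:creator  ex:ExampleInstitute ;           # "who"
    dcat:keyword     "excavations", "pottery" ;      # "what"
    dcterms:spatial  "POINT (115.86 -31.95)" ;       # "where" (WGS 84 longitude/latitude, illustrative)
    dcterms:temporal ex:ExamplePeriod .              # "when" (in ARIADNE this would reference a PeriodO period)
```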
5.3. Pooling Activities, Resources, and Tools for Heritage E-Research Networking, Optimization, and Synergies (PARTHENOS)
PARTHENOS (Pooling Activities, Resources, and Tools for Heritage E-research Networking, Optimization, and Synergies) is an EU project funded by the EU Horizon 2020 programme. The main aim of the project is to build a framework that works as a bridge between existing EU digital humanities aggregator infrastructures, namely ARIADNE, CENDARI, CLARIN, CulturalItalia, DARIAH, EHRI, and TGIR. The partners argue that this framework will eventually facilitate an environment where humanists can find and use available humanities data resources: download and process the data, share digital tools to re-use data, build dynamic virtual environments to find specific resources relevant to their research area, run different computational services on the data, and share the results between collaborators. To accomplish this, they have been developing the Content Cloud Framework, comprising a common data model that describes the available data, services, and tools of all the involved infrastructures in a standardized manner. The data and tools are made available via user interfaces and SPARQL semantic endpoints [
48].
5.4. Geospatially Linked Data in Digital Gazetteers
Digital gazetteers can be loosely defined as geodatabases of place names, or toponyms. They usually include attributes and features for places such as the name of the place, location details (coordinates representing a point, line, or polygon), the type of place (region, country, etc.), and others [
49]. Digital gazetteers have paramount importance in providing access to location information that is used in research projects [
50], geographic context retrieval systems [
51], and location-based services, to name but a few. With the advancement of geospatial semantic web technologies, new digital geospatial semantic gazetteers have been evolving and existing non-semantic ones have been transformed into geospatial semantic ones. Geospatial semantic web gazetteers are rich sources of geo-linked open data that are readily available for consumption in the geospatial semantic web and related projects. Examples of gazetteers available as geo-linked open data include the Getty Thesaurus of Geographic Names (TGN), GeoNames, the Pleiades Gazetteer of the Ancient World, and so on.
TGN is based on the Getty vocabulary and includes geo-information (location, relationships, bibliography) related to historical places and, most importantly, focuses on places relevant to CH, art, and the humanities. Although it does not provide a map interface, the results returned from queries can be visualized on top of maps using open source mapping libraries [
52].
The GeoNames gazetteer is a world geographical database containing location information for all countries in different languages, and users can add new place names and edit existing ones. GeoNames has developed its own ontology to represent geospatial data in RDF for the geospatial semantic web, and there is a semantic query endpoint available that can return query results in RDF format.
28 In fact, all geospatial semantic digital gazetteers provide access to their linked data via a semantic query endpoint, which accepts geospatial semantic web queries, processes them, and returns a response. A notable advantage of the semantic web and the geospatial semantic web regarding querying is the ability to send federated queries, that is, to divide a query into subqueries and send them to several semantic query endpoints. This allows the easy integration of linked data from different semantic query endpoints in a standardized data model (RDF) [
53]. This is one of the powerful features of the semantic web and geospatial semantic web, and it facilitates an easier way to build applications with a suitable combination of heterogeneous data from various resources.
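Federated querying is supported in SPARQL 1.1 through the SERVICE keyword, which delegates part of a query to a remote endpoint. The following minimal sketch combines hypothetical local heritage data with labels fetched from the DBpedia endpoint; the local class, property, and resource names are placeholders:

```sparql
PREFIX ex:   <http://example.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?site ?dbpediaResource ?label
WHERE {
  # Local (hypothetical) heritage data linking each site to a DBpedia resource
  ?site a ex:HeritageSite ;
        ex:sameAsDBpedia ?dbpediaResource .

  # Sub-query delegated to the remote DBpedia endpoint
  SERVICE <http://dbpedia.org/sparql> {
    ?dbpediaResource rdfs:label ?label .
    FILTER (lang(?label) = "en")
  }
}
```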
7. Conclusions
Currently, in many CH repositories, data is published as raw dumps in different file formats lacking structure and semantics. Some of the biggest technical challenges in these repositories are data integration and interoperability among distributed repositories at the national and regional levels. Poorly linked CH data is fragmented across several national and regional repositories, backed by non-standardized search interfaces. These technical challenges limit users’ ability to contextualize information from distributed repositories. The geospatial semantic web is a new paradigm for publishing, interconnecting, and consuming data on the Web, and it is being adopted by many CH research projects and CH repositories as a solution. Furthermore, in recent years, a few FOSS semantic web and geospatial semantic web purpose-built platforms have been developed for CH institutions in order to ease the adoption of this technological progress.
This article provided a conceptual survey of the geospatial semantic web for a CH audience. Firstly, it discussed the state-of-the-art geospatial semantic web concepts pertinent to the cultural heritage field, including the standardized ontologies GeoSPARQL (OGC) and CIDOC-CRM (ISO), as well as CRMgeo, the geospatial extension of the CIDOC-CRM ontology. Furthermore, it compared the technical features of widely used geospatial triple stores to identify their advantages, disadvantages, and pitfalls. Secondly, it proposed a framework to turn geospatial cultural heritage data stored in a vector data model into machine-readable and processable RDF data for use in the geospatial semantic web, with a case study demonstrating its applicability. Thirdly, it outlined key FOSS semantic web and geospatial semantic web-based purpose-built platforms for CH institutions. Next, it summarized leading CH projects that have employed geospatial semantic web concepts. Finally, it discussed attributes of the geospatial semantic web requiring more attention, which may generate new ideas and research questions for both the geospatial semantic web and CH fields.
We strongly suggest that employing standardized geospatial semantic web concepts will make heterogeneous, multi-format, dispersed cultural heritage data more accessible, interoperable, and reusable than ever before. However, raster data representation and querying, 3D data representation and querying, and big geospatial data remain challenges for the geospatial semantic web, requiring both more thoughtful attention and more results-oriented research projects.