*Article* **Implicit, Formal, and Powerful Semantics in Geoinformation**

**Gloria Bordogna , Cristiano Fugazza \* , Paolo Tagliolato Acquaviva d'Aragona and Paola Carrara**

Institute for Electromagnetic Sensing of the Environment, National Research Council of Italy, via Bassini 15, I20133 Milan, Italy; bordogna.g@irea.cnr.it (G.B.); tagliolato.p@irea.cnr.it (P.T.A.d.); carrara.p@irea.cnr.it (P.C.) **\*** Correspondence: fugazza.c@irea.cnr.it

**Abstract:** Distinct, alternative forms of geosemantics, whose classification is often ill-defined, emerge in the management of geospatial information. This paper proposes a workflow to identify patterns in the different practices and methods dealing with geoinformation. From a meta-review of the state of the art in geosemantics, this paper first pinpoints "keywords" representing key concepts, challenges, methods, and technologies. Then, we illustrate several case studies, following the categorization into implicit, formal, and powerful (i.e., soft) semantics depending on the kind of their input. Finally, we associate the case studies with the previously identified keywords and compute their similarities in order to ascertain if distinguishing methodologies, techniques, and challenges can be related to the three distinct forms of semantics. The outcomes of the analysis shed some light on the diverse methods and technologies that are more suited to model and deal with specific forms of geosemantics.

**Keywords:** geosemantics; implicit semantics; formal semantics; powerful semantics

#### **1. Introduction**

Semantics is a cornerstone of state-of-the-art data management, regardless of the specific domain; without semantics, we would helplessly drown in a deluge of unintelligible Big Data. Leaving aside the enormous literature on this topic in the field of Linguistics and, even before that, in Philosophy, representing and managing semantics is frequently regarded as the solution to heterogeneity in data retrieval and exploitation in Computer Science (CS) [1–3]. This paper relates to a specific domain in the landscape of semantics-aware CS, i.e., geospatial information provided in the form of both data and metadata. This is a particularly challenging domain as the non-textual nature of most geospatial data means that the indexing practices of generalist search engines are ineffective; hence the need for semantics representation and management.

Both Sheth et al. [4] and Uschold [5] provide a coarse-grained categorization of semantics; the latter includes the following four categories: (i) implicit semantics, (ii) informally expressed semantics, (iii) formally expressed semantics for human consumption, and (iv) formally expressed semantics for machine processing. In practice, the first three levels fall in the first category defined in [4], which proposes the following classification: implicit, formal, and powerful (i.e., soft) semantics.


In our opinion, the second classification not only includes the first one but, at the same time, empowers the fourth level of the first by offering two distinct representation classifications, opening towards methods that better mimic the human soft and flexible approaches to reasoning and decision making. This is the main motivation for adopting this second classification method, exporting its concepts to the geospatial domain and reflecting them in the forthcoming Sections of this paper. Although there is apparently a broad spectrum of technologies that fall under the umbrella of each of these categories (in fact, Almeida et al. [6] elaborate on the notion of *semantic continuum*), we will discuss their common traits.

**Citation:** Bordogna, G.; Fugazza, C.; Tagliolato Acquaviva d'Aragona, P.; Carrara, P. Implicit, Formal, and Powerful Semantics in Geoinformation. *ISPRS Int. J. Geo-Inf.* **2021**, *10*, 330. https://doi.org/10.3390/ijgi10050330

Academic Editor: Wolfgang Kainz

Received: 12 March 2021 Accepted: 1 May 2021 Published: 13 May 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

As regards the high-level categorization of the forms of semantics, Gärdenfors [7] distinguishes between "symbolic", "associationist", and "conceptual", providing the latter with a spatial characterization. His *cognitive spaces* feature interesting analogies with notions that are typical of the geospatial domain (e.g., spatial intersection). Still, non-symbolic approaches are mostly contained in the category of implicit semantics according to the classification by Sheth.

The ultimate purpose of this work is to orient the reader with respect to the main issues, challenges, and possible solutions for addressing the different categories of semantics defined by Sheth in the domain of geoinformation. In a nutshell, this paper outlines which technologies are more appropriate to consider when tackling a given research problem. The importance of this topic for the geospatial community is attested by the increasing relevance of semantics as the "glue" between heterogeneous thematic domains and also across their individual workflows. On the one hand, inter-disciplinary interoperability requires mapping of the individual terminologies used for annotating data. On the other, effective discovery and provision of geospatial data requires fine-grained characterization of resources (i.e., semantic metadata) not only for data, but also for services, APIs, instruments, data providers, etc.

The hypothesis behind our work is that Sheth's three forms of semantics are also reflected in the geosemantics context. The objective of our paper is then to identify technologies, methodologies, challenges, and solutions that are distinctive for the implicit, the formal, and the powerful geosemantics in order to orient the reader in problem solving. To achieve this, by analyzing recent reviews and editorial papers on geosemantics, we first mine which are the main technologies, methodologies, research challenges, and solutions presented by the authors, regarding them as keywords (Section 2.3).

Subsequently, we perform a two-step analysis, first discussing selected case studies involving the management of implicit, formal, and powerful geosemantics. The case studies were chosen by taking into account both the semantic category of Sheth to which they belong (depending on the characteristics of their inputs) and the variety and representativeness of application domains as outlined in [8]. Specifically, the varied and most representative applications to which geomatics can be put include urban planning, disaster management, assessment of biodiversity, and land administration. We then associate the keywords with the case studies and assess whether Sheth's categories are characterized by distinguishing keywords, i.e., specific methods, technologies and solutions, thus allowing for a more distinctive clustering of the keywords with respect to what emerged from the meta-review in Section 2.3.

A contribution of this paper is also the methodological workflow we followed in order to characterize the forms of semantics in geoinformation with their preferred/elective approaches.

#### **2. Materials and Methods**

This Section is organized as follows: Section 2.1 details our aim and the workflow we followed to confirm our hypothesis. Section 2.2 explains the categorization of semantics in the main reference work [4] inspiring this paper; then, Section 2.3 presents a meta-analysis of the literature on geosemantics as discussed in recent surveys and review papers. Sections 2.4–2.6 present the case studies we selected according to the criteria expressed above.

#### *2.1. Workflow*

In this work, we aim at investigating whether the three forms of semantics by Sheth et al. [4] can be related to distinguishing methodologies, techniques, and knowledge sources among those found in the literature on geospatial information. This is by no means a foregone conclusion and these distinguishing methodologies may not be the same as in other contexts. In fact, the geospatial domain sometimes diverges from current trends because of its specificities (e.g., proposing service-oriented architectures as opposed to resource-oriented ones).

To this aim, we define the workflow whose main phases are depicted in Figure 1: Top-left, a meta-review of recent surveys of papers illustrating applications of geospatial information management is performed (Section 2.3). The meta-review allows for identifying topics, research challenges, and solutions; these are considered to be keywords and represented on the right side of Figure 2. On the top-right hand side, assuming as starting point of our analysis the aforementioned three forms of semantics (whose definitions are clarified in Section 2.2), we select and analyze several case studies, categorizing them according to these three forms of semantics on the basis of the characteristics of their inputs (Sections 2.4–2.6).

**Figure 1.** Depiction of the workflow followed.

Finally, in order to substantiate the hypothesis behind this work (that Sheth's categories are also reflected in the geosemantics context), the results yielded by the two previous independent phases are cross-referenced in order to compute a similarity matrix on the basis of the keywords associated with the case studies. Specifically, this is achieved by verifying that the intra-similarities (similarity degrees between pairs of case studies belonging to the same form of geosemantics) are greater than the inter-similarities (similarity degrees between pairs of case studies classified as different forms of geosemantics). The greater the intra-similarity with respect to the inter-similarity, the more distinctive the methods and technologies characterizing the three forms of geosemantics.
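The comparison between intra- and inter-similarities can be illustrated with a minimal sketch. The Jaccard coefficient is used here only as an assumed similarity measure (this Section does not specify one), and the case-study identifiers and keyword sets are hypothetical toy data.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity between two keyword sets (an assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def intra_inter_similarity(case_keywords, case_category):
    """Return (mean intra-category, mean inter-category) pairwise similarity."""
    intra, inter = [], []
    for c1, c2 in combinations(sorted(case_keywords), 2):
        s = jaccard(case_keywords[c1], case_keywords[c2])
        same_form = case_category[c1] == case_category[c2]
        (intra if same_form else inter).append(s)
    return mean(intra), mean(inter)


# Hypothetical case studies, keywords, and categories (illustration only).
case_keywords = {
    "A": {"gazetteers", "geoparsing", "clustering"},
    "B": {"gazetteers", "clustering", "social media"},
    "C": {"domain ontologies", "reasoning", "linked data"},
    "D": {"domain ontologies", "linked data", "gazetteers"},
}
case_category = {"A": "implicit", "B": "implicit", "C": "formal", "D": "formal"}

intra, inter = intra_inter_similarity(case_keywords, case_category)
print(f"intra={intra:.2f} inter={inter:.2f} distinctive={intra > inter}")
```

With these toy sets, the case studies of the same form share more keywords than those of different forms, which is the pattern the workflow looks for.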

#### *2.2. Three Shades of Semantics*

Looking at geosemantics through the lenses proposed by Sheth allows for categorizing in a minimal set of classes the broad (and ever-growing) landscape of topics (comprising both methods and technologies) that populate this domain. Otherwise, the implications of information source heterogeneity (as far as genre and nature are concerned), information multidimensionality, and domain knowledge dependency easily yield a multiplicity of classes that configures a semantic continuum *à la* Almeida [6]. Since data source heterogeneity, cross-domain interaction, data/process imperfection, and big data volumes are common traits in the geospatial domain, distinguishing between *implicit*, *formal*, and *powerful* semantics allows us to divide the presented case studies into three categories with clear-cut boundaries.

*Implicit* semantics refers to the kind that is implicit in data and that is not represented explicitly in any machine-processable syntax. It is typically related to concepts and relationships between them that are not represented in a formal way but are embedded in multimedia documents, i.e., their "meaning is conveyed based on a shared understanding derived from human consensus" [5]. These can be natural language documents, multispectral images, time series of measurements, video frame sequences, audio recordings, undocumented tabular data, etc. The main objective of extracting implicit semantics is to cope with the inherent ambiguity characterizing it. In fact, terms in texts, visual aspects in images, etc., can mean different things depending on both context and knowledge of people [9]. It should be noted that *implicit* does not mean lacking a knowledge-based underpinning but that the latter is not (or cannot be) given a formal representation, such as in the assessment by a domain expert.

In more general terms, we can state that semantics that is represented in some well-formed syntax (governed by syntax rules) is referred to as formal semantics. In 2001, Berners-Lee et al. [10] stated that "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation". As such, the Semantic Web (SW) is the most apparent embodiment of semantics in the field of Internet-mediated contents and applications. Here, the inflection we give to this term is that of *formal* semantics, specifically those provided by decidable fragments of First-Order Logic (FOL) [11]. In fact, "formal" is the category name that Sheth et al. give to this kind of semantics [4], analogous to "formal semantics for machine processing" in Uschold's categorization [5]. Explicit representations of formal semantics include knowledge graphs, ontologies, and the like.

Finally, Sheth et al. introduce the concept of *powerful* semantics, intended as formal semantics that is empowered with the ability to represent not only precise and well-defined concepts and relationships, but also imprecise and uncertain concepts and gradual relationships, whose meaning can be subjective, vague, and variable depending on several contextual conditions [12]. The ability of formal frameworks to represent and manage powerful semantics is indeed aimed at performing approximate and qualitative reasoning in order to discover implicit concepts and relationships, possibly uncertain and imprecise too, although accurate enough to be useful for solving the task at hand.

#### *2.3. A Meta-Analysis Perspective*

Timothy Tambassi, in his Preface to the book "The Philosophy of GIS" [13], pointed out that the literature on GIS is heterogeneous and scattered, primarily because of the multiple branches of knowledge that use, manage, and create geographic information. This is also true for geosemantics, whose literature configures a conceptual 'forest' of issues, topics, technologies, methodologies, challenges, and solutions where it is easy to lose orientation. To frame approaches in the field of geosemantics, we have taken into account some stimulating overview papers on this subject which appeared in the last decade and tried to examine and categorize the topics, research challenges, and solutions described. It is a meta-analysis exercise that considered the papers described in the following.

Kokla et al. [14] offer a comprehensive review of the contributions that represent progress in geospatial semantics since 2015; it focuses on two main topics, i.e., information modeling (ontologies and their development) and (latent) knowledge elicitation (from unstructured or semi-structured content, based in particular on textual contents). This paper reviews more than 150 works; among them are papers that present categorizations of methods and approaches to geosemantics, such as [15–19]. Other cited contributions report on the efforts for describing the methods at hand: [20–32]. Furthermore, in this review the reader can find many works that exemplify the former within a great number of applications; among these [33–55].

Hu [56] provides an overview and a review of important contributions dealing with six major research areas in geospatial semantics, i.e., "semantic interoperability and ontologies" [16,24,38,57–70], "digital gazetteers" [71–86], "Geographic Information Retrieval" [32,87–106], "geospatial Semantic Web and Linked Data" [43,107–114], "place semantics" [47,115–121], and "cognitive geographic concepts and qualitative reasoning" [70,119–124].

Janowicz et al. [125] provide a rich overview of the geosemantics landscape focusing on some selected topics that the authors deem of particular interest; the contributions reviewed are organized according to these. With respect to the question of what kinds of Geospatial Classes should be distinguished, they cite [16–18,63,65,66,68,126–129]; instead, the question of how to reference Geospatial Phenomena is supported by [113,123,130–132]. Discovering events and accounting for geographic change are addressed in [133–136]. Handling places and moving object trajectories is dealt with in [70,77,90,133,137–143]. The following papers are cited with reference to comparison, alignment, and translation of Geospatial Classes: [15,69,90,144–149]. Finally, the issues raised by processing, publishing, and retrieving geodata are tackled in [150–156].

The approach changes in [157]: Rather than reviewing papers dealing with projects and issues related to geosemantics, it reviews ideas rooted in cognitive science and linguistics for sketching their application to semantics of geographic information. It discusses notions from 1990 to 2010 and shows why and how these ideas have been productive for dealing with semantics.

We also considered a couple of papers that are not strictly reviews but, in our opinion, are worth including as they offer a landscape of trends and contributions in geosemantics. Janowicz et al. [112], an editorial paper on the Semantic Web, outlines the research field of geospatial semantics, highlights major research directions and trends, and takes a glance at future challenges. Another editorial paper [158] considers VGI (Volunteered Geographic Information) and claims that geospatial Linked Data and Knowledge Graphs, when used for implementing intelligent data search, can result in precise data-sharing services.

The least recent work we considered is [159], where the author observes that the main approaches to overcome semantic heterogeneity rely on ontologies that, having a priori definitions, are decontextualized. On the contrary, he affirms that semantics reconciliation needs to take into account context-based meanings, since "meaning and context are dynamically emergent from activity and interaction, determined in the moment and in the doing". He further highlights the limitations of representational approaches. In fact, the latter assume that context is stable, delimited information that can be known and encoded in just another information layer or another ontology in an information system. These are the reasons why this work encourages non-representational modelling formalisms to cope with semantic interoperability in sharing and integrating geographic information.

By analyzing the above overviews, we have extracted a list of terms that the authors pinpointed as topics of interest, research challenges, or solutions, which we regard as keywords. The correspondence between the keywords and the respective originating reviews can be found in the supplementary material. The keywords are listed on the right side of the diagram in Figure 2. The list is wide enough to suggest how large the playground offered by geosemantics is.

Still, this list may be biased, being based on authors' views and reviews of a rapidly evolving literature, and some terms can have overlapping meanings. For instance, more recent reviews, such as Kokla et al. [14], produced an increase in this term list, due to the emergence of mobile and social applications, IoT, AI, etc. in the last five years. These research fields introduced novel concepts, such as lightweight ontologies. This increase is also due to the paradigm shift, dating back to 2012 [20], from the general-purpose Web to communities and their specific perspectives, pushed in turn by the movement of Critical GIS [160]. With reference to the notion of Digital Earth, in [161] the authors solicited "a network of theories that fosters interoperability without giving up on semantic heterogeneity". As such, it is possible that more recent works may further populate the list in Figure 2.

In the papers we examined, the authors suggested a grouping of these keywords according to some categories, listed on the left side of the diagram. Some keywords can be related to multiple categories as they can be good suggestions in diverse application scenarios. As an example, the term "gazetteers" has been presented in some works as dealing with either "geospatial Semantic Web" or "elicitation of semantic information"; "domain ontologies" have been used in works coping with "geo-semantics formalization" and "semantic interoperability". On the other hand, there are categories that can be tackled with multiple strategies; for example, geosemantics issues falling under the category "cognitive geographic concepts" have been dealt with in projects on either "events-change discovery", "place-based GIS", or "qualitative reasoning".

Figure 2 makes it apparent that the categories on the left are not associated with distinguishing topics and solutions on the right, i.e., the reviews did not succeed in letting patterns emerge in the geosemantics "forest", thus making order in the diverse practices.

**Figure 2.** Diagram connecting keywords in geosemantics (**right**) and their categorization (**left**), as found in the reviews taken into consideration.

#### *2.4. Implicit Geosemantics*

In [14], the extraction of implicit *geo*semantics is named "elicitation of semantic information". Under this interpretation, the term is used in a broader sense to encompass processes aimed at making latent knowledge explicit from unstructured or semi-structured contents. These processes focus on eliciting a structured representation of information in various forms, such as semantic metadata, links to ontology concepts, collections of topics, geotagged maps and images, etc. Sources of implicit geosemantics are multimedia documents, in the form of unstructured and semi-structured textual documents, pictures taken from cameras, images from remote sensing, audio and video files. In most cases, metadata are available but are generally insufficient to represent and understand the contents.

Typically, unstructured texts, posts in social networks, and news streams may refer to geographic names in their contents to describe events, points of interest (POIs), and places. The discipline that extracts geographic contents from unstructured and semi-structured texts in order to index them and enable the evaluation of both content and spatial queries is Geographic Information Retrieval (GIR) [87]. Images are another potential source of geosemantic information. Photos may depict geographic places without explicitly mentioning their name or geolocation. With regard to video files, we can consider TV news reporting events relative to specific geographic areas. Finally, remote sensing images may contain representations of the status of the environment with respect to the occurrence of geo-temporal phenomena and events going on in a given area. The segmentation of images in order to extract geographic footprints of places and events can be performed by applying spatio-temporal analysis. The latter is primarily based on (i) domain experts' knowledge, (ii) statistical and machine learning approaches, or (iii) hybrid approaches combining the previous two [162].

Some important challenges of implicit geosemantics extraction within multimedia documents are related to three main objectives:


Basically, artificial intelligence approaches comprising different methodologies (such as soft computing, clustering, genetic algorithms, geostatistical analysis, neural networks, support vector machines, and the like) are applied to extract implicit semantics from multimedia documents. Knowledge bases are used to support the analysis: these may take the form of gazetteers, DBpedia (https://wiki.dbpedia.org/ accessed on 1 April 2021), generic and domain thesauri such as WordNet (https://wordnet.princeton.edu/ accessed on 1 April 2021), geo ontologies, and thematic geospatial information. In the following, we present some case studies focused on the above challenges which consider different genres of geographic contents (basically, objects, events and moving objects' trajectories) within distinct categories of multimedia documents (textual documents and social media posts).

A synoptic view of the four case studies dealing with an implicit form of semantics is reported in Table 1: Besides the identifier of the case study, its acronym, and a brief description, the table reports the type of input, the method it applies, the type of generated output, and its potential use. It can be noticed that the type of input is either unstructured textual documents or social media documents, a kind of data that typically contain the implicit form of semantics. It can be also noticed that the outputs contain more explicit geosemantics, constituted by geofootprints of documents, spatio-temporal clusters of events, trajectories, and georeferenced placenames.

As for the application domains that are covered by the case studies, that in Section 2.4.1 is related to retrieval of georeferenced information, providing urban planners with effective means for mining knowledge of territorial resources. The case study in Section 2.4.3 performs trajectory mining to support mobility planning for tourists. The case study in Section 2.4.2 shows that disaster management can be fostered by timely event detection and, finally, the case study in Section 2.4.4 is about geo-gazetteer creation from VGI, in support of land administration.

#### 2.4.1. From "Place" to "Space": Representing Uncertainty of Geoinformation within Texts to Support Geographic Information Retrieval

In [163], a GIR system was proposed that allows for extracting implicit geosemantics within contents of textual documents through the identification of *fuzzy geographic footprints*, i.e., the distinct locations on Earth referred to by documents.

The GIR model applies soft computing methods; specifically, the evaluation of multiple bipolar criteria [164,165], aggregated by means of a p-norm operator [166], to extract the fuzzy footprints of documents representing their geographic focus. In a nutshell, some criteria have a positive influence on the selection of geographic names within the text as footprints of a document (for example, when the initial character of the term is a capital letter, or when the term occurrence is close to positive anchor terms such as "street", "city", "nation", etc.). Others have a negative influence (for example, when the term is preceded by negative anchor terms such as "Sir", "Mr", "Mrs", etc.).
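To make the idea of bipolar criteria more concrete, the following sketch scores a candidate place name from positive and negative evidence. It is not the model of [163]: the power-mean form of the p-norm, the way the two poles are combined, and the criterion scores are all assumptions made for illustration.

```python
def p_norm(scores, p=2.0):
    """Power-mean aggregation of satisfaction degrees in [0, 1].

    One common reading of a p-norm operator (an assumption here):
    p = 1 gives the arithmetic mean; larger p approaches the maximum.
    """
    if not scores:
        return 0.0
    return (sum(s ** p for s in scores) / len(scores)) ** (1.0 / p)


def footprint_score(positive, negative, p=2.0):
    """Positive support discounted by negative support (assumed combination)."""
    return p_norm(positive, p) * (1.0 - p_norm(negative, p))


# Hypothetical degrees for the term "Nice" preceded by "city of":
# capitalized initial (1.0), near a positive anchor term (0.8),
# no negative anchor such as "Mr" before it (0.0).
score = footprint_score(positive=[1.0, 0.8], negative=[0.0])
print(round(score, 3))
```

A term with strong negative evidence, such as a capitalized word preceded by "Sir", would see its score pushed towards zero under this combination.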

The prototypical system has the classic structure of an Information Retrieval System (IRS) [163], consisting of two main components: the *Indexing Module* and the *Retrieval Module*. The Indexing Module has two main sub-modules: the *Full-Text Indexing* and the *GeoIndexing* sub-modules. The former performs full-text indexing of the documents to represent their significant contents, and generates the textual inverted index to enable content-based searches. Instead, the GeoIndexing sub-module identifies the fuzzy footprints of documents with the support of a knowledge base that comprises both a geo-ontology and a rule base that encodes the heuristic knowledge required to cope with geo/non-geo ambiguities during geoparsing, and with geo/geo ambiguities during geocoding. An example of geo/non-geo ambiguity is the case of a place name that also has a non-geographic meaning, such as "Nice" (France), "Crema" and "Brindisi" (Italy), and "Of" (Turkey). Instead, geo/geo ambiguities are due to distinct locations on Earth having the same place name (such as Rome, Paris, London, etc.). The disambiguation rules take into account both the geographic context, based on the shared assumption that "close places are more closely related than far places", and the textual context, based on the consideration that distinct geographic names appearing close in text are also closely related in geographic space. This way, place names within documents are associated with a fuzzy footprint in the geographic space, thus reconciling the two conceptualizations of geosemantics and enabling both content and spatial searches.
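The geographic-context heuristic for geo/geo ambiguities can be sketched as follows. This is an illustrative simplification, not the rule base of [163]: each candidate location is ranked by its total great-circle distance to the other places already resolved in the same document. The function names and the approximate coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def disambiguate(candidates, context_points):
    """Pick the candidate closest overall to the places found in the same text."""
    return min(
        candidates,
        key=lambda name: sum(haversine_km(candidates[name], c) for c in context_points),
    )


# Hypothetical gazetteer entries for the ambiguous name "Paris".
candidates = {
    "Paris, France": (48.8566, 2.3522),
    "Paris, Texas": (33.6609, -95.5555),
}
# The document also mentions Lyon, already resolved to these coordinates.
context = [(45.7640, 4.8357)]
print(disambiguate(candidates, context))
```

A fuzzy variant would keep all candidates and turn the distances into membership degrees instead of selecting a single winner.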



#### 2.4.2. Detecting Periodic/Episodic Events from Social Networks with Desired Spatio-Temporal Granularity

The paper [167] proposes an approach to discover events of interest from social media by modeling distinct spatio-temporal granularities. The main characteristic of this study is flexibility in detecting events characterized by either a hypothetical periodic or episodic timestamp, thus making it possible to confirm a priori knowledge of their possible geo-temporal regularities. Given a set of sources of spatio-temporal information, such as Twitter, the methodology first performs a focused crawling of the selected social media contents to collect candidate messages related to an event of interest; subsequently, the collected messages are analyzed by means of an original, density-based spatio-temporal clustering algorithm. The latter is defined by extending the DBSCAN algorithm to group messages densely located in the spatio-temporal domain. Its output is a set of spatio-temporal clusters with arbitrary shapes: these identify the areas on Earth where an event matching the keywords (i.e., the parameters used to filter the messages) occurred within a given time span, possibly with a given periodicity.
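The general idea of extending DBSCAN to the spatio-temporal domain can be illustrated with a minimal sketch. This is not the algorithm of [167]: it simply requires neighbors to be close in both space and time, uses planar coordinates, and ignores periodicity. All parameter names and the toy messages are assumptions.

```python
from math import hypot


def st_neighbors(points, i, eps_space, eps_time):
    """Indices of points within eps_space AND eps_time of point i (itself included)."""
    xi, yi, ti = points[i]
    return [
        j for j, (x, y, t) in enumerate(points)
        if hypot(x - xi, y - yi) <= eps_space and abs(t - ti) <= eps_time
    ]


def st_dbscan(points, eps_space, eps_time, min_pts):
    """Label each (x, y, t) point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = st_neighbors(points, i, eps_space, eps_time)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            expansion = st_neighbors(points, j, eps_space, eps_time)
            if len(expansion) >= min_pts:  # core point: keep growing the cluster
                queue.extend(expansion)
    return labels


# Hypothetical messages as (x, y, hour): four co-located within two hours,
# one at the same place days later, one far away.
messages = [(0, 0, 0), (0.1, 0, 1), (0, 0.1, 2), (0.1, 0.1, 1), (0, 0, 100), (5, 5, 1)]
print(st_dbscan(messages, eps_space=0.5, eps_time=5, min_pts=3))
```

Changing `eps_time` from hours to days, or `eps_space` from a neighborhood to a region, corresponds to analyzing the same messages at a different spatio-temporal granularity.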

The exploration is interactive and multi-granular, allowing analysts to customize not only the topics of interest, i.e., the category, but also the time period and the spatial density so as to fit different spatio-temporal scales. One can specify (i) a set of keywords of interest to filter the messages about an event or a topic (e.g., traffic jam, hurricane, landslide, football match), (ii) the desired granularity of the time period of analysis (such as each day, month, year) and (iii) the desired spatial granularity needed to form a cluster, defined by spatiotemporal density of messages. Each cluster generated by the algorithm can be identified by the list of the most representative keywords that were found in the messages of the cluster, thus representing the cluster's semantics. The use of thesauri [168] helps identify the more general terms expressing the meaning of the specific terms found in individual messages of the cluster. As far as the representation of the geographic footprint of each cluster is concerned, a convex hull can be computed from the geographic coordinates of the messages in each cluster to obtain a polygon representation of the geo-footprint.
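As an illustration of the last step, a polygonal geo-footprint can be derived from the coordinates of the messages in a cluster with a standard convex hull routine. The sketch below uses Andrew's monotone chain algorithm and treats coordinates as planar, which is an approximation that is only reasonable for small areas; the sample points are hypothetical.

```python
def convex_hull(points):
    """Convex hull of 2D points in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


# Hypothetical message coordinates of one cluster; (1, 1) lies inside the hull.
cluster_points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(cluster_points))
```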

#### 2.4.3. Discovering and Summarizing Moving Object Trajectories from Twitter

The work described in [169] proposes an approach to identify, track, and analyze popular tours of tourists visiting a Region Of Interest (ROI) based on the Tweets they publish.

The solution is constituted by two main suites of tools: the FollowMe suite for tourist identification and tracking and the TripsAnalysis suite for popular tour mining.

The FollowMe suite allows users to submit spatial queries to the Twitter API to find *hang* tweets, i.e., tweets posted in the area of the monitored airports. For each user identified by means of hang tweets, the FollowMe suite queries (through the Twitter API) his/her timeline, i.e., the history of tweets posted by the user, to obtain *tracked* tweets.

Given a ROI, trips that occur in the ROI are reconstructed and extracted by querying hang tweets and tracked tweets previously stored in the local database. Reconstructed trips are represented by a list of geographic coordinates, ordered according to message creation time, and are exported through the web service interface.

The TripsAnalysis suite performs the activities of knowledge discovery on trips collected by the FollowMe suite. A knowledge-based trajectory clustering method allows analyzing trips based on customizable semantics. The analyst can specify both the desired granularity and semantics of the analysis by providing a vector layer of geographic slots (geo-slots) of interest. These are drawn from external interoperable sources that the algorithm exploits to conflate the trips' points to ease their grouping. For example, it is possible to conflate and then analyse trips with respect to the visited municipalities, regions, countries, city neighborhoods, ZIP codes, etc. To this end, the algorithm first geo-partitions the trips, converting each ordered sequence of geographic coordinates into a conflated trip representation consisting of an ordered sequence of geo-slot identifiers, i.e., a string. This way, different geo-slot partitions provide different interpretations, scales, and semantics of the analysis.
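The conflation step can be sketched as follows, under simplifying assumptions that are not taken from [169]: geo-slots are modeled as axis-aligned bounding boxes with single-character identifiers, points outside every slot are skipped, and consecutive repetitions of the same slot are collapsed.

```python
def slot_of(point, slots):
    """Return the id of the first geo-slot containing the point, else None."""
    x, y = point
    for slot_id, (xmin, ymin, xmax, ymax) in slots.items():
        if xmin <= x < xmax and ymin <= y < ymax:
            return slot_id
    return None


def conflate(trip, slots):
    """Turn an ordered list of coordinates into a string of geo-slot ids."""
    sequence = []
    for point in trip:
        slot_id = slot_of(point, slots)
        # Assumption: skip unmatched points and collapse consecutive repeats.
        if slot_id is not None and (not sequence or sequence[-1] != slot_id):
            sequence.append(slot_id)
    return "".join(sequence)


# Hypothetical geo-slots (e.g., three municipalities) as bounding boxes.
slots = {"A": (0, 0, 1, 1), "B": (1, 0, 2, 1), "C": (0, 1, 1, 2)}
trip = [(0.2, 0.2), (0.5, 0.5), (1.5, 0.5), (0.5, 1.5)]
print(conflate(trip, slots))
```

Supplying a different `slots` layer (regions instead of municipalities, for instance) yields a different string for the same trip, which is how the choice of partition changes the scale of the analysis.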

The conflated trips can then be clustered by a complete-link hierarchical trajectory clustering algorithm based on string-similarity matching. Matching is applied to the concatenated identifiers of the geo-slots in the conflated trips' representation. Finally, popular tours can be identified by selecting a partition of the clusters' hierarchy, specifying either a threshold on the minimum desired similarity among the conflated trips within a popular tour or a minimum number of trips that a popular tour must contain.
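A compact way to see how complete-link clustering and the two selection criteria interact is the Python sketch below. The similarity measure (the ratio computed by `difflib.SequenceMatcher` from the standard library) and the function names are assumptions; the string-similarity function actually used in [169] is not specified here.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """String similarity in [0, 1]; an assumed stand-in for the original measure."""
    return SequenceMatcher(None, a, b).ratio()


def complete_link_clusters(trips, min_similarity):
    """Agglomerate conflated trips until no merge reaches `min_similarity`."""
    clusters = [[trip] for trip in trips]
    while len(clusters) > 1:
        best_pair, best_sim = None, -1.0
        for i, j in combinations(range(len(clusters)), 2):
            # Complete link: two clusters are as similar as their least similar pair.
            sim = min(similarity(a, b) for a in clusters[i] for b in clusters[j])
            if sim > best_sim:
                best_pair, best_sim = (i, j), sim
        if best_sim < min_similarity:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def popular_tours(clusters, min_trips):
    """Keep only the clusters that contain at least `min_trips` trips."""
    return [cluster for cluster in clusters if len(cluster) >= min_trips]
```

Stopping the merging at `min_similarity` corresponds to cutting the hierarchy at a similarity threshold, while `popular_tours` applies the alternative criterion based on the number of trips.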

#### 2.4.4. Creation of Geographic Gazetteers by Volunteered Geographic Information Analysis

Constructing geographic gazetteers is very costly in terms of human effort and, once created, they need to be constantly updated. The work [81] proposes to exploit data science for the extraction of semantic information on toponyms, places, and POIs from big geoinformation created by volunteers on the Web, specifically from geotagged Flickr pictures. The aim is to enrich and update current gazetteers by automatically creating digital gazetteers of georeferenced place names such as "city center", "shopping district", and POIs associated with keywords and geofootprints. The ultimate purpose is to support diverse applications, such as geographic information retrieval (GIR), digital library services, and systems using spatio-temporal knowledge. The geographic footprints are extracted from the GPS locations of Flickr pictures while place descriptions are distilled from their tags. Close GPS locations associated with similar textual descriptions created by distinct volunteers are assumed to identify the same place. These locations generally do not match perfectly but usually have a cluster structure in space. This led the authors to use a distance-decaying function to measure the membership of candidate point locations assigned to a place so as to present an intuitive user reputation model for trust evaluation.
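The idea of a distance-decaying membership can be sketched in Python as follows. The exponential form and the `scale_m` parameter are assumptions chosen for illustration; the specific function adopted in [81] may differ.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 points."""
    radius = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def place_membership(point, centre, scale_m=200.0):
    """Membership of `point` (lon, lat) to a place centred at `centre`.

    Assumed exponential decay: 1 at the centre, about 0.37 at `scale_m` meters.
    """
    distance = haversine_m(point[0], point[1], centre[0], centre[1])
    return math.exp(-distance / scale_m)
```

Points in the core of a spatial cluster obtain memberships close to 1, whereas outlying pictures with similar tags contribute progressively less to the footprint of the place.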

#### *2.5. Formal Geosemantics*

The use case in Section 2.5.1 exemplifies the transition from implicit to formal geosemantics for two reasons. On the one hand, it upholds ontologies as the formalization means, offering less constrained expressiveness to the modeling of geospatial entities; on the other, it tackles a research issue, that of next generation maps, that has roots in cartography and, as such, is typically bound to the interpretation of implicit information mediated by the domain expertise of end users. Most applications of semantics to geospatial information use "lowercase" semantics, such as that of SKOS vocabularies [170], which do not harness the full expressiveness of ontology languages; others mistake RDF encoding for semantics. Instead, it is important to keep in mind that far more expressive modeling criteria (ontology languages) and inference tools (reasoners) exist. Section 2.5.1 provides both a conceptual model for geo-entities and an exemplar implementation.

*Discovery*, in the sense of "retrieval of geospatial information", is largely dependent on metadata. In turn, semantic characterization of metadata is regarded as the primary means to achieve interoperability [171] in a domain that is otherwise fraught with heterogeneities [14,56]. Unleashing this potential typically amounts to relating metadata items to entities in the Web of Data (the Linked Open Data Cloud: https://lod-cloud.net/ accessed on 1 April 2021), such as terms from SKOS vocabularies, people and organizations in FOAF representations [172], etc. Whereas this step may not be strictly necessary for semantics-aware discovery [173], leveraging these categories of data structures can easily yield semantics-aware resource descriptions. The advantages of this practice are manifold. On the one hand, these data structures may greatly improve user experience in metadata production. On the other hand, traditional metadata can be enriched in order to enable smarter discovery criteria. This is the focus of Section 2.5.2.

Leaving aside the aforementioned virtuous data structures, there is a large corpus of web-accessible data structures that does not take advantage of ontologies expressed in OWL/OWL2, such as those mentioned above, or schema languages compatible with them (e.g., RDF Schema). As an example, consider the Microdata that is typically embedded in web pages or the XML/JSON data structures that are often used in the enactment of APIs. Section 2.5.3 proposes creation of "semantic twins" of JSON data structures to allow for transparently accessing heterogeneous data sources. It should be noted that although we already considered the JSON format in the previous Section, in this context the semantics underlying the JSON data (its implicit schema) is made explicit by the mapping to RDF, assuming an interpretation. Some of the (augmented) information contained in the RDF data structures could be fed back to the original JSON ones so as to realize a JSON-LD [174] representation of resources.

Finally, Section 2.5.4 describes a model for *semantic mediation* with the aim of improving geospatial discovery, e.g., by exploiting the smarter metadata originating from creation methodologies akin to those presented in Sections 2.5.2 and 2.5.3. In fact, it is apparent that discovery constitutes a "crucial first step" in the enactment of Spatial Data Infrastructures (SDIs) and nevertheless is "mostly neglected and approached following old paradigms" [112]. Besides harnessing the richer information entailed by semantic characterization of metadata, another key objective of this practice is to implement geospatial data management as a machine-processable API, thus fostering FAIR access to geospatial resources [175]. The rationale for this is that it makes little sense to strive for semantic characterization of metadata and not accomplish the last mile toward their full exploitation by automated agents. The synoptic view of the case studies analysed in relation to formal geosemantics is reported in Table 2.

#### 2.5.1. Holistic Map Representation with Geographic Scenarios

The work in [176] illustrates *Geographic Scenarios* [177], a notion developed on the basis of General System Theory [178] integrating spatial, process, and relational information related to geographical elements and georeferenced events. In contrast with reductionist approaches (such as those dividing geo-entities into themes), Geographic Scenarios propose a holistic view that should be better suited to represent hierarchical connections among geo-entities. Moreover, by favoring space over time, state-of-the-art GIS may fall short of portraying dynamic relationships and causalities.

Basing the conceptual framework of Geographic Scenarios on an ontology allows for expressing multi-hierarchy categorizations and fuzzy boundaries, portraying diverse and complex entities at different scales and dimensions. Geo-characterization is the process by means of which scenarios as well as their individual components are assigned properties and relationships not only on the basis of traditional notions, such as regionalization and classification, but also according to ecology and human-orientation (that are often regarded as mere thematic dimensions). Events are made first-class citizens in the ontological modeling of geographic scenarios, thus allowing attribution of dynamic relationships between geo-entities.

From a technical viewpoint, the realization that is presented combines relational data with ontology classes and properties by applying SWRL rules [179]; the resulting information is stored in a graph database for querying. Whereas the proposed example does not fully demonstrate the augmented capabilities of geographic scenarios, modularity of the possible semantic underpinning (the ontology) and the scalable solution for storage (a graph database) suggest more extensive implementations.



#### 2.5.2. Ex-Ante and Ex-Post Semantic Characterization of Metadata

In the last decade, our work group has been tasked with the development of the SDI for a national flagship project on marine research. The key approaches were (i) creation of a decentralized network of nodes providing data [180] and (ii) the extensive use of semantics-aware technologies in metadata management [181]. The latter entailed development of a metadata editor that could easily adapt to the ever-changing landscape of metadata formats and profiles [182].

Since no state-of-the-art tool allowed for this degree of flexibility, we decided to develop EDI, a brand new metadata editor [183]. Besides offering an extremely user-friendly interface for metadata provision, the tool allows for both compliance with any XML or text-based metadata format and pluggability of heterogeneous RDF-based resources (made available as SPARQL [184] endpoints) as the reference data sources for providing auto-completion functionalities. This feature allows for the integration of a broad range of third-party data structures (e.g., code lists, controlled vocabularies, gazetteers, and registries) in the Web of Data.

Field values can also be generated on demand, can duplicate the content of another field, and even use generic XPath functions in order to mix-and-match values taken from the output XML document. Finally, this output document can be fed into an arbitrary chain of XSLT transformations (e.g., to generate a text-based output, such as JSON). All these functionalities are governed by a *template*, expressed in XML, that regulates production of the output document, defines the external data sources to be accessed via SPARQL, etc. Please refer to [185] for a comprehensive description of the template language.

Addressing semantic augmentation of metadata at editing-time (i.e., ex-ante) leaves an enormous amount of resource descriptions not featuring this important characteristic. As a consequence of this, important capabilities enabled by semantically enriched metadata (e.g., multilingualism, query expansion) could not be implemented by geoportals in discovery workflows. Then, we started working on offline, ex-post semantic lift of metadata records and realized it was possible to employ templates the other way around to search traditional XML metadata for correspondences in RDF data sources. The resulting application, named Liftboy, is described in [186] and made available on GitHub (https://github.com/IREA-CNR-MI/liftboy-python accessed on 1 April 2021) in its newer, improved implementation.
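The principle of an ex-post semantic lift can be conveyed by the Python sketch below. It is not Liftboy's implementation and does not use EDI templates: the XML element name `keyword`, the `uri` attribute, the `example.org` URIs, and the in-memory `VOCABULARY` dictionary (standing in for a lookup against a SPARQL endpoint) are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an RDF data source queried via SPARQL:
# lower-cased labels mapped to concept URIs.
VOCABULARY = {
    "sea surface temperature": "http://example.org/vocab/sea-surface-temperature",
    "salinity": "http://example.org/vocab/salinity",
}


def lift_keywords(xml_text):
    """Annotate <keyword> elements with the URI of a matching concept, if any."""
    root = ET.fromstring(xml_text)
    for keyword in root.iter("keyword"):
        label = (keyword.text or "").strip().lower()
        uri = VOCABULARY.get(label)
        if uri is not None:
            keyword.set("uri", uri)  # unmatched keywords are left untouched
    return root
```

For instance, lifting `<metadata><keyword>Salinity</keyword></metadata>` attaches the salinity URI to the keyword, while free-text values with no correspondence remain as they were.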

As a final note, we want to stress the importance of semantic characterization of metadata. Typically, this is seen as a solution to semantic heterogeneity and an opportunity for applying query expansion in information retrieval (in [186] the authors provide examples for both of these). In our opinion, semantic metadata can serve a higher purpose, that of "normalizing" resource description by conflation into a kind of pointer instead of repeatedly duplicating metadata property values (such as keywords, names, e-mail addresses of people, etc.) that frequently lead to inconsistencies, a practice we named *metadata delegation* [187]. It would be easier if all references to a keyword provided by a well-known controlled vocabulary were tagged with a unique identifier for that term (the URI of a skos:Concept [170]), if all references to a researcher pointed to her FOAF record [172], creating a web of decentralized metadata.

#### 2.5.3. Exploiting Non-RDF Data Structures for Semantic Metadata Creation

This case study builds on a software tool named SPARQL-Generate (https://ci.mines-stetienne.fr/sparql-generate/ accessed on 1 April 2021) [188] that extends the syntax of SPARQL 1.1 [189] with constructs that allow for extracting data from heterogeneous data structures and generating RDF descriptions. The application to the geospatial domain we describe is production of metadata for *samples* (also called *specimens*) in the International Geo Sample Number (IGSN) format [190]. The target data structures are the entities made available by the European Long Term Ecological Research Network (eLTER) in its Sites and Data Registry (DEIMS-SDR) [191,192] (specifically, the entities representing *activities*, *sites*, and *sensors*).

We wanted to build on EDI, the metadata editor presented in the previous Section, but the originating sources are in JSON format and thus could not be directly integrated in the autocompletion functionalities provided by the former. We then decided to create RDF descriptions as signposts for the aforementioned entities and relate samples to them by plugging these RDF "semantic twins" into a custom EDI template. Then, the metadata maintainer can access the HTML5 interface generated by the EDI client and select the entities in the originating data structures via the many widgets made available by the software, drawing information from external data structures.
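To give a flavor of what a "semantic twin" looks like, the following Python sketch turns a JSON record into two N-Triples statements. It does not use SPARQL-Generate, and the JSON keys (`id`, `title`) and the class URI passed by the caller are hypothetical; only the `rdf:type` and `rdfs:label` property URIs are standard.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def semantic_twin(json_text, class_uri):
    """Return N-Triples lines describing the entity in a JSON record.

    Assumes the record carries its own URI under "id" and a name under "title".
    """
    record = json.loads(json_text)
    subject = record["id"]
    # Escape backslashes and double quotes for the N-Triples literal.
    title = record["title"].replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"<{subject}> <{RDF_TYPE}> <{class_uri}> .",
        f'<{subject}> <{RDFS_LABEL}> "{title}" .',
    ]
```

Once loaded into a triple store exposed as a SPARQL endpoint, descriptions of this kind can be queried by label, which is what an auto-completion widget requires.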

#### 2.5.4. Semantic Mediation for FAIR Access to Resources

This case study considers the articulation of geospatial discovery as a web API in order to make catalogs accessible by automated agents. One may argue that the Catalogue Service for the Web (CSW) by OGC [193] serves this purpose and, of course, when the automated agent knows where the endpoint is and which protocol to use, resource harvesting and search are straightforward. Still, when the agent only knows the homepage of the data provider and has no information on the applicable protocol, these operations may be difficult to achieve.

The problem (and the link to the subject of this paper, i.e., semantics) is that the Web, as experienced by human agents, is unlike web APIs in that there is a *semantic gap* to be bridged [194] before machines can fully participate. Overcoming this gap requires internalizing the key principles of REST (REpresentational State Transfer) as expressed by Roy Fielding in his Ph.D. dissertation [195]; specifically:


Please refer to Chapter 5 of the dissertation for an explanation of these. The attentive reader may already have spotted how the breadth of this research topic can be extended so as to encompass FAIR (Findable, Accessible, Interoperable, and Reusable) practices [175].

Since their inception, the FAIR principles have been deeply rooted in the notion of machine-actionability. Among the technologies for a machine-actionable Web, it is generally acknowledged that, despite the apparent differences, there is a broad overlap between REST principles and FAIR practices (FORCE11 Guiding Principles for Findable, Accessible, Interoperable and Re-usable Data Publishing: https://www.force11.org/fairprinciples accessed on 1 April 2021). In fact [196], the machine-actionable behaviors of REST match the requirements of (at least) the first three letters in "FAIR", as both resort to the specification of semantics for their enactment and both rely on resolvable identifiers.

In order to achieve machine-actionability for geospatial services, the European Plate Observing System research infrastructure [197,198] exploits Hydra [199], an RDF vocabulary that is capable of expressing the mechanics of APIs in a way that is both intelligible to automated agents and semantically rich. Please refer to the Hydra Core Vocabulary (https://www.hydra-cg.com/spec/latest/core accessed on 1 April 2021) for a more thorough description of the features of this formalism.
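The following Python sketch shows how an automated agent might read the operations advertised in a Hydra API documentation expressed as JSON-LD. The document is a hand-written toy example: the `example.org` identifiers and the `Dataset` class are hypothetical, and only the `hydra:` terms come from the Hydra Core Vocabulary. The traversal assumes this specific compacted shape and is not a general JSON-LD processor.

```python
# Toy API documentation; all example.org identifiers are hypothetical.
API_DOC = {
    "@context": {"hydra": "http://www.w3.org/ns/hydra/core#"},
    "@id": "http://example.org/api/doc",
    "@type": "hydra:ApiDocumentation",
    "hydra:supportedClass": [
        {
            "@id": "http://example.org/vocab#Dataset",
            "@type": "hydra:Class",
            "hydra:title": "Dataset",
            "hydra:supportedOperation": [
                {
                    "@type": "hydra:Operation",
                    "hydra:method": "GET",
                    "hydra:returns": {"@id": "http://example.org/vocab#Dataset"},
                }
            ],
        }
    ],
}


def list_operations(doc):
    """Return (class URI, HTTP method, returned class URI) for each operation."""
    operations = []
    for cls in doc.get("hydra:supportedClass", []):
        for op in cls.get("hydra:supportedOperation", []):
            returns = op.get("hydra:returns", {}).get("@id")
            operations.append((cls["@id"], op["hydra:method"], returns))
    return operations
```

An agent that only knows the entry point of a provider could, in principle, follow a link to such a description and learn which requests are meaningful without protocol-specific prior knowledge.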

The potential of this characterization of APIs is apparent. As an example, the search for processing services matching a given set of parameters, such as the Normalized Difference Vegetation Index (NDVI) for a specific bounding box, can greatly benefit from semantics-aware service description [200]. Moreover, automated workflow composition on the basis of more precisely defined inputs and outputs can be easier than with other technologies [201].

#### *2.6. Powerful Geosemantics*

There are concepts and relationships in the real world that are intrinsically imprecise and fuzzy, due to their gradual nature. This characteristic is particularly evident in the geographic context, in which natural entities and spatio-temporal phenomena are characterized by blurred and time-varying contours. For instance, it is impossible to encode in a classic ontology based on OWL vague concepts like "most streets in Naples center are very narrow", which involve some fuzziness for which a crisp definition does not make sense. What is the size of a street that makes it *"narrow"*? This is a matter of degree depending on a subjective interpretation and, certainly, there is no crisp transition between a street being *wide* and *narrow* that may be agreed upon by all observers. The term *most* means that there are exceptions, i.e., a few streets are wide, but it is hard to quantify a crisp percentage. Furthermore, there may be cases in which one needs to define a fuzzy concept hierarchy, a fuzzy taxonomy, in which a class is a specialization to a degree of several super classes, such as "In Italy churches, beside being (1) places of worship, are often (0.8) historical buildings". It may also be necessary to define fuzzy relationships between concepts, such as in "bell towers are very close to churches".
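A statement such as "most streets are narrow" can be given a degree of truth with a few lines of Python. The sketch follows one common reading of fuzzy quantified statements (the proportion of narrow streets is computed as the mean membership and then evaluated by the quantifier); the breakpoints of both membership functions are invented for illustration and carry no empirical meaning.

```python
def narrow(width_m):
    """Degree to which a street of the given width is narrow (assumed breakpoints)."""
    if width_m <= 3.0:
        return 1.0
    if width_m >= 6.0:
        return 0.0
    return (6.0 - width_m) / 3.0


def most(proportion):
    """Degree to which a proportion in [0, 1] counts as 'most' (assumed breakpoints)."""
    if proportion <= 0.3:
        return 0.0
    if proportion >= 0.8:
        return 1.0
    return (proportion - 0.3) / 0.5


def truth_most_narrow(widths_m):
    """Truth degree of 'most streets are narrow' for a list of street widths."""
    proportion = sum(narrow(w) for w in widths_m) / len(widths_m)
    return most(proportion)
```

A linguistic hedge such as "very" could be modeled, under the same assumptions, by squaring the membership returned by `narrow`.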

Another possible source of imperfection occurs when an ontology is used for quality assurance to tag observations, such as in Citizen Science (CS) projects. Such projects are at present a common practice to collect geospatial data in many domains, such as natural sciences, by involving volunteers to create georeferenced observations of objects of interest. A volunteer may not be completely sure about his/her observation, which is a case of epistemic uncertainty. This may happen because (s)he does not have adequate knowledge of the problem or because of deficiencies in the means of observation. This may also happen when the domain knowledge is precise.

Finally, there are more complex situations that may involve both ill-defined knowledge and epistemic uncertainty [202].

To cope with the above issues, powerful semantics approaches are needed which "extend" classic ontologies with the ability to represent and manage uncertainty and imprecision. To this end, the literature proposes soft ontologies [12]. In particular, there are three main groups defined on the basis of the probabilistic, the fuzzy, and the possibilistic or evidential frameworks. They have been adopted for extending propositional logic with probability, possibility, belief, or truth of a statement.

Fuzzy ontologies have been defined to model ill-defined knowledge with several purposes, depending on the kind of imperfection they need to represent and manage in the application [202]. Although a standard representation of a fuzzy ontology is still to come, many studies have fuzzified the existing Description Logics (DL) and have defined fuzzy DL reasoners. The most up-to-date and complete fuzzyDL ontology reasoner has been proposed in [203].

To model epistemic uncertainty, fuzzy ontologies have been defined within a possibilistic framework that deals with certainty and possibility degrees of truth, thus modeling the epistemic uncertainty characterizing experts' subjective knowledge and the evaluation of the certainty of this knowledge. Several possibilistic DL reasoners have been defined [204], which allow for representing and reasoning on uncertain statements such as "It is possible that this town is a Historic Area". In these reasoners, each concept, relation, and axiom is associated with a real value *u* in (0, 1] representing its certainty level.

Nevertheless, fuzzy ontologies do not allow modeling the time-varying nature of concepts and their context-dependent meaning. Specifically, most geographic concepts are represented by prototypes that vary with time: The prototypical modern city for an Italian person has changed over the centuries, and it is different for Chinese people. Fuzzy set theory cannot completely model how humans use concepts, in particular the fact that their meaning is influenced by context and states that vary with human knowledge in time. To this end, the framework known as state-context-property (SCOP) based on quantum mechanics [205] has been defined to map elements taken from operational foundations of quantum mechanics (like states, measurements, and observables) onto concepts and contexts.

In the following Subsections, we recap three case studies exploiting powerful semantics. Their synoptic view is reported in Table 3. They have been selected as representative of distinct application domains such as the creation of biodiversity observations (Section 2.6.1), remote sensing to aid disaster management (Section 2.6.2), and dynamic urban planning (Section 2.6.3). The first two of them exploit a fuzzy ontology encoding epistemic uncertainty of volunteers when creating georeferenced observations (i.e., VGI) and the vague and incomplete knowledge of experts when interpreting a phenomenon from remote sensing evidence, respectively. By representing epistemic uncertainty and vagueness of knowledge, it is possible to model the distinct quality of the results of a decision process.

The last case study illustrates the application of the SCOP framework to model retrieval of maps within a GIR with increasing precision, achieved by exploiting the varying states of knowledge of user needs.

#### 2.6.1. A Fuzzy Ontology to Support Volunteered Geographic Information Creation and Search

Within the *Space4agri* [206] project, agronomists surveyed agronomic fields by tagging the observed crops and their phenological growth stages based on an agronomic ontology [207]. In this process, texts or pictures were added to report a difficulty or doubt of the agronomists when selecting a phenological growth stage from the ontology. This is due to different reasons:


This suggested the need for extending the classic ontology-based reasoning by representing the epistemic uncertainty of the agronomists in creating VGI items (i.e., when selecting tags from the ontology [207]). Specifically, volunteers can create georeferenced annotations of crops they are observing in situ with the support of a fuzzy ontology. They are bound to select linguistic predicates, possibly fuzzy, to tag the observed crops and with each selected predicate they can associate a degree *d* in [0,1] representing the overall deficiency of their observation. This way, they can represent epistemic uncertainty due to both limitations of the means of observation (e.g., a far point of view, low resolution of the means of observation) and difficulty of precisely quantifying some properties of the observed crops. The linguistic predicates such as "crop has large leaf", "crop has long stamen", "crop has many branches" describe possibly fuzzy properties of the distinct kinds of crops: For example, a rice crop during its germination can appear with "elongated and thin branches" and "very small seeds". The semantics of these linguistic predicates can be defined by level-1 fuzzy sets (whose membership degrees are numeric in the range [0,1]). The fuzzy ontology can then explicitly represent linguistic concepts in both symbolic form (encoded by the linguistic terms "large", "long", "many") and quantitative form. The latter is expressed by the membership functions defined on the numeric domains of the properties: For example, "large" is defined with a membership function on numeric values in cm. In the fuzzy ontology, compatibility between linguistic predicates is represented by Level-2 fuzzy relations, i.e., fuzzy sets on multidimensional basic domains whose membership degrees are not numbers but linguistic values. 
Fuzzy relations between linguistic predicates are used to perform approximate reasoning in the fuzzy ontology to automatically classify the crops, possibly into distinct types with different membership degrees. The defect degrees are interpreted as minimum thresholds, i.e., uncertainty levels, on the compatibility degrees between the linguistic predicates so that the final membership to a type of crop is modified by epistemic uncertainty. When formulating queries to the database of georeferenced crop observations, for example, requesting to map "rice crop fields", the stored observations can be mapped onto different shades of color depending on their membership degrees to type "rice crop", thus accounting for both fuzziness and observation uncertainty.
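Two ingredients of this case study lend themselves to a short Python sketch: a level-1 fuzzy set defined by a membership function on a numeric domain, and the rendering of membership degrees as shades of color. The trapezoidal shape, its parameters, and the white-to-green palette are assumptions for illustration; the approximate reasoning over Level-2 fuzzy relations is not reproduced here.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def large_leaf(length_cm):
    """Membership of a leaf length (cm) to 'large'; the breakpoints are hypothetical."""
    return trapezoid(length_cm, 2.0, 4.0, 8.0, 10.0)


def shade(membership):
    """Map a membership degree in [0, 1] to a hex color from white (0) to green (1)."""
    level = round(255 * (1.0 - membership))
    return f"#{level:02x}ff{level:02x}"
```

On a map of "rice crop fields", observations with full membership would be drawn in saturated green and uncertain or partially matching ones in paler tones.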

*ISPRS Int. J. Geo-Inf.* **2021**, *10*, 330


#### 2.6.2. Fuzzy Ontology to Support Remote Sensing Image Interpretation

In remote sensing, Geographic Object-Based Image Analysis (GEOBIA) groups techniques aiming at segmenting and classifying objects and phenomena (represented by groups of pixels sharing common properties) in satellite images based on image analysis procedures that rely on a priori expert knowledge [35]. In recent years, the application of ontologies encoding experts' knowledge has been emerging [14]. Ontologies are used to associate some perceived concepts with their data representation [35]. A widely applied approach to detect the geographic footprint of environmental phenomena is to compute spectral index (SI) maps. SI values integrate reflectance measurements at different wavelengths into a synthetic feature that can highlight some perceived aspects of the phenomenon in each pixel. SI maps are then segmented to identify target phenomena, such as vegetation presence and vigor (biomass presence, Leaf Area Index, Chlorophyll content, etc.), bare soil condition, soil properties composition, burned areas, water presence, and so on. The segmentation consists of thresholding the pixel SI values by different thresholds specified in the ontology to define the different environmental phenomena.
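The crisp procedure described above can be sketched in Python using NDVI, a well-known spectral index computed as (NIR - Red) / (NIR + Red). The threshold value passed to `segment` is left to the caller, since in the approach described it would come from the ontology; the function names are illustrative.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel's reflectances."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def si_map(nir_band, red_band):
    """Compute the NDVI map from two bands given as lists of rows."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


def segment(index_map, threshold):
    """Crisp segmentation: True where the SI value reaches the threshold."""
    return [[value >= threshold for value in row] for row in index_map]
```

Every pixel is either inside or outside the phenomenon, which is precisely the behavior that becomes fragile when the threshold is reused on a different image.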

Nevertheless, using the same ontology to segment a given phenomenon such as "green areas" in a new image may cause inaccuracies with many omission and commission errors, since the value of the threshold must be tuned depending on several factors, such as the context and observation conditions. In fact, accurate calibration is needed to set a proper threshold for each study area. Thus, uncertainty and imprecision must be represented since this kind of knowledge is perceptual by its very nature [35]. These are the reasons why powerful semantics approaches are appealing. In fact, these techniques allow for explicit representation of perceptual characteristics of phenomena in images by means of fuzzy ontologies. Thus, they can cope with the limitations of both traditional GEOBIA solutions using ontologies and machine learning techniques requiring huge amounts of training data that are often unavailable.

In [162], an approach based on powerful semantics was proposed to map standing water areas from optical multispectral remote sensing images. Ill-defined knowledge of experts on the perceptual characteristics of standing water within optical images is represented by defining fuzzy sets on spectral indexes identified as features. The membership functions of these fuzzy sets relax the crisp segmentation thresholds defined in the vast literature on standing water mapping so as to tolerate imprecision and uncertainty. A fuzzy ontology is thus defined describing standing water in terms of fuzzy sets on spectral indexes. For each spatial unit with given values of spectral indexes, partial evidence degrees of standing water are computed by evaluating the membership degrees to the fuzzy sets in the fuzzy ontology. Finally, the partial evidence degrees in each spatial unit are combined by applying a fuzzy aggregation operator, learnt by a shallow machine learning algorithm trained on a small reference data set. Besides not requiring big training data, the approach offers the advantage of making explicit the criteria used to map standing water, making it possible to discover how many spectral indexes contributed to mapping standing water in each spatial unit, which ones, and to what extent. The fuzzy ontology with new fuzzy relationships between fuzzy concepts.
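The two steps, relaxing crisp thresholds into membership functions and aggregating the partial evidence degrees, can be sketched in Python as follows. The ramp-shaped membership, the use of an Ordered Weighted Averaging (OWA) operator, the index names, and the fixed weights are assumptions; in [162] the aggregation operator is learnt from reference data rather than set by hand.

```python
def ramp(value, low, high):
    """Relaxed threshold: 0 below `low`, 1 above `high`, linear in between."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def owa(degrees, weights):
    """Ordered Weighted Averaging: weights apply to degrees sorted in descending order."""
    if len(degrees) != len(weights):
        raise ValueError("one weight per degree is required")
    ordered = sorted(degrees, reverse=True)
    return sum(w * d for w, d in zip(weights, ordered))


def water_evidence(si_values, fuzzy_sets, weights):
    """Combine the partial evidence of standing water for one spatial unit.

    `fuzzy_sets` maps a (hypothetical) spectral index name to its (low, high) ramp.
    """
    degrees = [ramp(si_values[name], low, high) for name, (low, high) in fuzzy_sets.items()]
    return owa(degrees, weights)
```

Keeping the per-index degrees before aggregation is what allows inspecting which spectral indexes supported the final evidence in a given spatial unit.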

#### 2.6.3. State-Context-Property Framework to Model Human Interaction within a Geographic Information Retrieval System

According to [208], human-computer interaction is based on the exchange of words (or graphical tokens on maps) which are interpreted in the context of the conversation. The words used may originally have a broad meaning; through conversation the context becomes more precise and the concepts obtain more specific meanings. The authors present a proof of concept that shows the selection of several predetermined map types (e.g., street map, political map, map for hiking, ski routes) in a GIR by formalizing their approach in SCOP [205]. Specifically, SCOP is applied to predict an answer to the question: "Which map is appropriate for a given context?" where the context is understood as the intended purpose of the user.

A concept and a context serve as input parameters to the inference model that calculates the collapsed state and returns it. In this collapsed state, probability values for prototypes of the concept can be calculated. A use case is illustrated, in which a user states to a GIR query interface that she needs a map, without stating the kind of map. At this point, the concept "map" is in its ground state, where all maps have some non-zero probability of being relevant. The user then states the intended usage, that is, to go on a bicycling trip. Now the state of the concept "map" collapses into a bicycling map. The user interaction may continue to indicate the region where the trip is planned, and this new information further restricts the map to an area. The application of SCOP is still at an early stage; it needs further developments and investigations to be practically applied, but its potential is great as far as prototypical modeling of contexts and states is concerned.
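The behavior of this use case can be mimicked by a deliberately simplified Python sketch. It replaces the quantum-inspired SCOP formalism with a classical reweighting of prototype probabilities, so it only illustrates how a context narrows the state of a concept; the map types and the relevance weights are invented.

```python
def collapse(state, context):
    """Reweight prototype probabilities by a context and renormalize.

    `state` maps prototypes to probabilities; `context` maps prototypes to
    non-negative relevance weights (missing prototypes count as 0).
    This is a classical simplification, not the SCOP formalism itself.
    """
    weighted = {proto: p * context.get(proto, 0.0) for proto, p in state.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("the context excludes every prototype")
    return {proto: w / total for proto, w in weighted.items()}


# Ground state of the concept "map": every map type is equally possible.
GROUND_STATE = {"street": 0.25, "political": 0.25, "hiking": 0.25, "bicycling": 0.25}

# Hypothetical context "bicycling trip" expressed as relevance weights.
BICYCLING_TRIP = {"bicycling": 0.8, "hiking": 0.15, "street": 0.05}
```

Applying `collapse(GROUND_STATE, BICYCLING_TRIP)` concentrates most of the probability on the bicycling map; a further context, such as the region of the trip, could be applied to the result in the same way.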

#### **3. Results and Discussion**

To organize the material, we started from the notion of semantics as a function that maps the world of syntax onto the world of meaning, in analogy with the studies on denotational semantics [209]. Through this lens, we analyzed the presented case studies considering the original information they dealt with (syntactic objects with a certain amount of semantics), the meaning that is extracted and formalized (the new semantic objects), and the techniques that are applied to map the former onto the latter (the incremental semantic mapping function). This analysis of the case studies is presented in Table 4, where each row summarizes one of them.

The first two columns identify the case study by indicating the corresponding subsection and a short name. In the columns that follow, one can find information about the mapping of the input information onto the new semantic objects: Specifically, column 3 contains the description of the input information pertaining to the case study; column 4 provides the incremental semantic function that is used to map the original information with partial semantics onto the output with augmented meaning; finally, column 5 indicates the final information, i.e., the semantic domain of the case study. Column 6 indicates the delta between the input and output information; finally, column 7 enumerates the keywords, among those on the right side of Figure 2, that can be related to the case study: The more relevant keywords are in bold font and are assigned a weight w = 2 in the analysis that follows.

Here, complexity degree is intended as the level at which semantics is made explicit in either the input or the output data structures considered by the specific case study. Specifically, the complexity degree is an integer in the range 1–7, following the principle of indiscernibility of Miller [210]. The general criterion for attributing this value is that complexity lower than 4 accounts for objects presenting scarce or no machine understandable information about their meaning; values between 4 and 7 indicate that meaning is more and more machine understandable and processable. For instance, the simplest case is that of unstructured text (complexity = 1), such as in case study 3.1 where input is constituted by free text keywords. The degree increases when more information is added, such as in case studies 3.2 and 3.3 (complexity = 2), where input is enriched both by the presence of structure (JSON documents) and by geographic coordinates. When the previous information is further augmented, complexity increases (complexity = 3), such as in the output data of case study 3.1 where uncertainty degrees are added. The next step in explication of semantics may involve schema information or categorization of data (complexity = 4). Then, when relationships among the entities (topological, order, metric, broader/narrower) are taken into account, complexity increases to 5. Complexity is 6 when vague and uncertain concepts and relationships are represented. Finally, when information can be generated by approximate reasoning or has fully reached semantic interoperability, complexity is 7.

*ISPRS Int. J. Geo-Inf.* **2021**, *10*, 330


**Table 4.** Dimensions of case studies.

The four case studies presented in Section 2.4 share the same type of input geoinformation, which is essentially not explicit, being dispersed within unstructured and loosely structured texts. In the output geoinformation of these case studies, semantics is made explicit but not always in a standard, interoperable format; because of this, it may be difficult or even impossible to reuse the results in different contexts.

The first case study in Section 2.5 portrays a model exploiting semantics at its full potential, via ontologies. The second applies to semi-structured geoinformation in the form of metadata, possibly compliant with OGC standards. The third case study involves structured (JSON) and semi-structured (HTML) information that lacks the relations between the entities involved (e.g., between descriptions of sensors and the corresponding points of contact) and, in general, cannot be easily reused in a Web of Data context. Finally, the fourth case study applies to unstructured information intended for human agents (i.e., the specification of computer interaction protocols). For each of these, the output is information that can be shared and reused in an interoperable way by enabling querying and retrieval in a Linked Data perspective. The first two case studies in Section 2.6 involve explicit and rich geoinformation in the form of soft ontologies, while the last case study uses the SCOP formalism. All these case studies enable qualitative and approximate reasoning to deduce novel geoinformation automatically.

A preliminary observation is that the complexity of the inputs is lowest for the case studies in Section 2.4, medium for those in Section 2.5, and highest for the case studies in Section 2.6. The same holds for the outputs. More insights come from cross-referencing the case studies and the keywords listed on the right of Figure 2, yielding the representation in Figure 3. This figure illustrates the weighted associations between case studies and keywords: case studies within the same Section (i.e., associated with the same form of geosemantics) are characterized by shades of the same color (yellow for implicit, blue for formal, and grey for powerful geosemantics). On the x axis, the length of each bar represents the importance of the method or technique in the case study, while the hue-color pair uniquely identifies both the case study and the semantic category it belongs to. It can be visually noticed that the case studies classified in the same form of geosemantics are mostly associated with distinctive keywords. For example, the case studies in Section 2.6 (powerful geosemantics) are associated with "Non-representational formalisms", "Task ontologies", and "Qualitative reasoning". Nevertheless, some keywords (e.g., "Semantic enrichment/tagging/annotation") are associated with case studies classified in "adjacent" forms of semantics.

To confirm the conjecture suggested by Figure 3, i.e., that the three geosemantics forms are good categorizations for the keywords, we also computed the similarity measure known as the Jaccard coefficient between every pair of case studies on the basis of the aforementioned weighted keywords, as shown in Figure 4. The figure clearly shows that the intra-similarities (regarding pairs of case studies belonging to the same form of geosemantics, grouped within the colored rectangles) are greater than the inter-similarities between pairs of case studies classified as different forms of geosemantics (i.e., appearing outside the colored rectangles).
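As an illustration of such a computation, the following minimal Python sketch applies one common generalization of the Jaccard coefficient to weighted (fuzzy) sets, namely the ratio between the sum of the minimum weights and the sum of the maximum weights. The choice of this particular formula, the function name, and the keyword assignments in the example are assumptions made here for illustration; they do not reproduce the data behind Figure 4.

```python
from typing import Mapping


def weighted_jaccard(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Weighted Jaccard: sum of min weights over sum of max weights.

    A keyword missing from a case study is treated as having weight 0.
    """
    keywords = set(a) | set(b)
    shared = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keywords)
    total = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keywords)
    return shared / total if total else 0.0


# Hypothetical case studies: weight 2 for the more relevant (bold) keywords,
# weight 1 for the others.
case_a = {"Geographic Information Retrieval": 2, "Gazetteers": 1}
case_b = {"Geographic Information Retrieval": 1, "Gazetteers": 1,
          "Qualitative reasoning": 2}

print(weighted_jaccard(case_a, case_b))  # (1 + 1) / (2 + 1 + 2) = 0.4
```

Because the measure is a ratio, rescaling the weights to memberships in [0, 1] (e.g., 1.0 and 0.5 instead of 2 and 1) leaves the result unchanged.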

It can be noticed that every case study has its greatest similarity with another case study of the same geosemantics form. Only case studies in the yellow group share some inter-similarity with those of the blue group, which is in any case an order of magnitude lower than the intra-similarity. Specifically, for the case studies dealing with the implicit form of geosemantics, the overall intra-similarity, computed as the percentage of shared keywords among all the case studies of the same category, reaches 54.3%, while their overall inter-similarity with the case studies of the other two categories is only 1.7%. The case studies dealing with the formal form have an overall intra-similarity of 58% and an overall inter-similarity of 2.6%. Finally, the case studies dealing with the powerful form have an overall intra-similarity of 37% and an overall inter-similarity of only 0.9%. These findings confirm our hypothesis that the three forms of semantics are characterized by distinguishing techniques, methods, and knowledge sources in the geospatial domain.
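The contrast between intra- and inter-similarity can be sketched in Python as follows. Note that this sketch aggregates by averaging pairwise weighted Jaccard values, which is an assumption made for illustration: it is not necessarily the aggregation used to obtain the percentages reported above, and the toy keyword sets (`k1`–`k4`, case studies `A`–`D`) are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Mapping, Tuple

KeywordSet = Mapping[str, float]


def weighted_jaccard(a: KeywordSet, b: KeywordSet) -> float:
    """Sum of min weights over sum of max weights (0 for absent keywords)."""
    keywords = set(a) | set(b)
    shared = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keywords)
    total = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keywords)
    return shared / total if total else 0.0


def intra_inter(groups: Mapping[str, Mapping[str, KeywordSet]]
                ) -> Dict[str, Tuple[float, float]]:
    """For each form, return (mean intra-similarity, mean inter-similarity)."""
    result = {}
    for form, members in groups.items():
        inside = list(members.values())
        outside = [kw for other, ms in groups.items() if other != form
                   for kw in ms.values()]
        intra = [weighted_jaccard(x, y) for x, y in combinations(inside, 2)]
        inter = [weighted_jaccard(x, y) for x in inside for y in outside]
        result[form] = (mean(intra) if intra else 0.0,
                        mean(inter) if inter else 0.0)
    return result


# Toy data: two forms, two hypothetical case studies each.
groups = {
    "implicit": {"A": {"k1": 2, "k2": 1}, "B": {"k1": 1, "k2": 1}},
    "powerful": {"C": {"k3": 2}, "D": {"k3": 2, "k4": 1}},
}
summary = intra_inter(groups)
print(summary)
```

In this toy example the two groups share no keywords, so the inter-similarity is zero for both, as observed in the text for the implicit and powerful forms.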

**Figure 3.** Case studies and the keywords representing their main activities and technologies.


**Figure 4.** Jaccard similarity between case studies represented as fuzzy sets of keywords.

Besides revealing the distinguishing features of the geosemantics forms, this analysis also shows that case studies related to implicit and formal semantics have many activities in common, identified by the shared keywords "Thematic spatial and temporal perspectives", "semantic enrichment/tagging/annotation", "Gazetteers (GeoNames)/temporal gazetteers", and "Geographic Information Retrieval". Formal and powerful semantics share "Semantics-driven user interfaces/interaction paradigms/...", "ontology based information extraction", "Application ontologies", "Ontology for encoding", and "Ontology for modeling". This means that there is no clear-cut partition between the forms of semantics; rather, a "semantic continuum" is present, gently blending the groups when moving from implicit to powerful semantics. Conversely, the approaches related to powerful and implicit semantics share no keywords. These findings suggest that the ordering of categories introduced by Sheth [4] also emerges from our analysis in the context of geographic information.

Figure 5 provides an even more synoptic view of the relations between the keywords and the three forms of semantics, complementing Figure 2 with the findings described in this section. In fact, the figure shows that, once the keywords are grouped according to the forms of semantics associated with the case studies presented in this paper, they are much more clustered. This means that patterns emerge in the geosemantics "forest", thus bringing order to the diverse practices.

**Figure 5.** Comparison between the grouping of keywords in Figure 2 (on the right-hand side) and the grouping induced by the three forms of geosemantics (via the case studies), which makes the greater distinguishing power of the latter apparent.

Of course, this analysis can be enriched both by extending the meta-review to encompass more methodologies, techniques, and knowledge bases and by analyzing other case studies in the literature. Nevertheless, we think that this contribution has the merit of setting out a methodological workflow to characterize the forms of semantics in geoinformation and their preferred approaches.

#### **4. Conclusions**

This paper applied the categories of semantics defined by Sheth to the domain of geoinformation in order to orient the reader in problem solving. We first analyzed recent reviews and editorial papers on geosemantics, identifying the main technologies, methodologies, research challenges, and solutions presented by the authors. Then, we discussed selected case studies for the implicit, formal, and powerful geosemantics, respectively. The two-step analysis culminated in cross-referencing these two sources in order to confirm that the three forms of geosemantics are characterized by distinguishing techniques, methods, and knowledge sources.

This conjecture is supported by the Jaccard similarities computed between members of the same and of different categories of semantics (see Figure 4). It can also be visually assessed by looking at Figures 3 and 5. In the latter, it is also apparent that there are fringe keywords associated with "adjacent" categories (i.e., categories whose semantics is made explicit to a similar degree). This paper contributes to structuring the approaches to semantics in geoinformation, partitioning the semantic continuum suggested in [6] into discrete, distinguishing techniques and methods.

Further insight may come from assigning the papers considered in the meta-review (Section 2.3) to the three forms of semantics according to their associated keywords. Future work will also investigate scaling up the workflow by applying content representation methods used in information retrieval, which can automatically identify the keywords from the text of the reviewed literature.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/ijgi10050330/s1.

**Author Contributions:** All authors equally contributed to the writing and revising of this paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*ISPRS International Journal of Geo-Information* Editorial Office E-mail: ijgi@mdpi.com www.mdpi.com/journal/ijgi

www.mdpi.com ISBN 978-3-0365-6386-2