Article

Developing a Base Domain Ontology from Geoscience Report Collection to Aid in Information Retrieval towards Spatiotemporal and Topic Association

1 School of Computer Sciences, China University of Geosciences, Wuhan 430074, China
2 Key Laboratory of Mine Environmental Monitoring and Improving around Poyang Lake of Ministry of Natural Resources, East China University of Technology, Nanchang 330013, China
3 Key Laboratory of Geological Survey and Evaluation of Ministry of Education, China University of Geosciences, Wuhan 430074, China
4 Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen 518034, China
5 College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China
6 Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang 443002, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2024, 13(1), 14; https://doi.org/10.3390/ijgi13010014
Submission received: 2 October 2023 / Revised: 24 December 2023 / Accepted: 28 December 2023 / Published: 30 December 2023

Abstract

Efficient and precise retrieval of information from extensive geological databases is a central concern of geological information services. Conventional information retrieval methods rely primarily on keyword matching, which often overlooks the context and semantics of the keywords and therefore prevents the retrieval system from accurately understanding users' query requirements. To tackle this challenge, this study proposes an ontology-driven information retrieval framework for geological data that integrates spatiotemporal and topic associations. The framework encompasses the development of a geological domain ontology, the extraction of key information, the establishment of a multi-feature association and retrieval framework, and validation through a comprehensive case study. With the proposed framework, users can retrieve pertinent information actively and automatically, which simplifies information access, reduces the need to understand how the information is organized and how the software applications model it, and ultimately improves retrieval efficiency.

1. Introduction

In the field of geoscience, a very large amount of data has accumulated through geological surveys, and various thematic spatial, geological, and literature databases covering the entire country have gradually been established [1,2,3,4]. These datasets contain not only structured data based on the relational model and spatial data based on the geographic information system (GIS) model, but also unstructured data consisting of large amounts of text [5,6,7]. In these geological databases, unstructured data account for a significantly larger proportion than structured data, and they carry substantial geological information and knowledge [8,9,10,11,12]. At present, unstructured data are stored in different file formats, such as Word, Excel, other tables, and PDF, and are managed by file-based systems or relational databases [13,14,15]. However, because unstructured data are heterogeneous and fragmented, managing them with traditional file systems or relational databases makes querying, counting, and updating inefficient; it is difficult to use the explicit content effectively, and even more difficult to explore the richer knowledge implicitly contained in the data [16,17].
Big data research and related technologies provide effective support for the modernization and informatization of geological work and for the transformation from “digital geology” to “smart geology” [7,18]. The British Geological Survey and the United States Geological Survey have developed problem-driven big data research and application programs in response to societal needs, and scientists often adopt a data-driven scientific paradigm, using massive, real-time, synthetic geospatial data for data mining, knowledge discovery, modeling, hypothesis generation, and result validation [19,20]. China is also following the trend of the big data era by applying geological big data in areas such as digital mineral exploration, smart city construction, and disaster prevention and mitigation [21,22]. The variability, robustness, relevance, and composition of geological data are inherently influenced by temporal, spatial, and geological factors. Consequently, establishing associations among multiple features, particularly spatiotemporal and thematic attributes, has become a vital focus of geological big data retrieval and knowledge discovery. Retrieval techniques for structured geological data, such as spatial and attribute data, are relatively mature. For unstructured sources such as the vast quantities of geological survey reports and work records, however, retrieval still relies predominantly on keyword search, and research on content-oriented spatiotemporal features, subject knowledge extraction, and multi-feature association remains inadequate. This gap impedes the effective retrieval and use of the valuable geological information contained in unstructured geological data.
Conventional methods for retrieving geoscientific data, such as keyword and subject classification searches, rely primarily on keyword matching. This limits their ability to exploit implicit spatiotemporal relationships and to resolve the semantic heterogeneity that arises from differences in users' knowledge backgrounds and from the polysemy of natural language [16,17]. These methods also fall short in supporting spatiotemporal reasoning, multidisciplinary data association, intelligent recommendation, and the semantic discovery of spatiotemporal information. Furthermore, users who lack relevant knowledge or have unclear objectives often cannot express their true search intent through keywords, so they must expend substantial effort screening and evaluating results. Traditional search methods are therefore insufficient to meet the growing demand for both the quality and quantity of retrieval results, and a new approach is needed to navigate the vast volume of data and pinpoint target information.
Addressing these challenges requires interlinking knowledge and concepts, constructing a knowledge–semantic association network by making implicit relationships explicit and applying semantic reasoning, and providing a contextual environment for concepts. The aim is to turn search keywords and heterogeneous data from various domains, sources, and structures from mere strings into nodes in the semantic association network, so that data can be discovered, integrated, and used precisely through semantic matching. This in turn requires representing the semantics of conceptual objects and their relationships in a form that both humans and computers can read.
To address these challenges, this research presents a semantic retrieval framework that combines spatiotemporal and topic features with a geological domain ontology. The framework leverages geological named entities and locations within structured data to facilitate efficient retrieval. At the same time, multiple features, including geological time, location, and topic, are extracted from unstructured geological data sources. Building on these components, the framework establishes an indexing mechanism that associates multiple spatiotemporal and topic features, and a retrieval framework is then constructed that accounts for the interrelationships among these features. This approach offers a promising solution to the challenges of retrieving and organizing geological data across different dimensions. The main contributions of this article are summarized as follows:
(1) Using a top-down methodology, we construct a geological domain ontology library that comprehensively describes geological concepts, attributes, relationships, rules, and contextual instances. The library comprises 23 major categories and more than 50,000 terms.
(2) Based on this geological domain ontology, we propose a retrieval framework for geological data that emphasizes the spatiotemporal and topic dimensions. The framework extracts multiple features, including geological time, location, and topic, from unstructured data sources and establishes a geological data indexing mechanism that associates these temporal, spatial, and topic features. This indexing mechanism enables semantic search over geological big data.
(3) To validate the proposed spatiotemporal and topic retrieval framework, we conduct case-study analyses that compare the results of traditional keyword-based retrieval with those of ontology-based retrieval. The experimental results show that integrating a geological ontology significantly improves the completeness and accuracy of the retrieved data.
The remainder of this article is organized as follows. Section 2 reviews related work on geological data feature extraction and the indexing of geological data. Section 3 describes the research methodology. Section 4 describes the construction of the base domain ontology and the information retrieval framework based on multiple features and the domain ontology. Section 5 presents a case study of ontology-based spatiotemporal and topic-based information retrieval, and Section 6 summarizes the conclusions and discusses future research directions.

2. Related Work

Current research on geological data feature extraction tends to combine specialized domain ontologies with knowledge bases and machine learning techniques to resolve the semantic inconsistencies caused by layer-by-layer cognitive abstraction and by the variety of expressions. Geological domain ontology construction and information extraction methods are at the core of geological data feature extraction, and a large body of research has addressed geological domain ontologies. Perrin et al. [23] extended the GeoSciML (http://www.geosciml.org/, accessed on 2 May 2015) model with a separate geological chronology description, creating ontologies for geological time and for geological dating that describe the hierarchical relationships between geological time concepts. Ma et al. [24] implemented a graphical, interactive ontology construction tool for geological domains and applied it to the automatic annotation of geological maps. Hwang et al. [25] developed a geological spatiotemporal ontology model for rocks and geological ages and implemented spatiotemporal correlation retrieval for different rock types and geological ages. Wu et al. [26] defined the semantics of a geological ontology on a cloud computing platform and designed a standard geological information framework and a standard resource integration model. Borges et al. [27] proposed an ontology-driven approach that combines urban ontologies with geocoding services to identify and extract geocodes embedded in web text. Kergosien et al. [28] extracted time, place, and topic information from land-use planning documents for attitude analysis and opinion mining on land-use policy. Ballatore et al. [29] proposed a method for computing geo-semantic relevance and similarity for information extraction and retrieval. Wang et al. [30] extracted spatiotemporal and semantic information from a large number of unstructured web texts using an ontology of geohazard events. To achieve geographic information extraction and retrieval at different semantic granularities, Mata-Rivera et al. [31] designed three ontology-driven query matching layers: temporal, spatial, and social.
These advances in ontology construction and information extraction in the geological domain have laid a solid theoretical and methodological foundation for extracting geological data features. Nonetheless, current investigations focus primarily on extracting semantic features from geological data and largely overlook the potential of exploiting the correlation between geological and spatial data. While many studies emphasize the precision and recall of information extraction, few quantify the correlation among the extracted spatiotemporal and topic features. As a result, the extracted information does not adequately reflect the relevance of professional domain knowledge. There is therefore a need to extract spatiotemporal and topic-oriented features from geological data using domain-specific ontologies and knowledge, improving the overall quality of information extraction, and to investigate association network models for geological topics that make the extracted information more interpretable and applicable.
Regarding the indexing of large geological datasets, several scholars have investigated spatiotemporal databases. For example, Ke et al. [32] established an HBSTR tree based on a pyramid structure for four-dimensional spatiotemporal data. Wang et al. [33] proposed the RT-CAN index structure in a cloud architecture to support fast queries in different applications. Dittrich et al. [34] proposed an invasive index repository scheme to reduce the overhead of building distributed indices for big data. Wang et al. [35] introduced a learning-to-hash framework for building indices for big data.
Ontology-based semantic retrieval has become an active research topic [36,37,38,39,40]. In recent years, many research institutes have carried out in-depth studies on the theory, methods, and applications of ontology-based intelligent information retrieval systems [41,42,43,44,45,46]. These studies mainly focus on the following aspects of such systems: (1) ontology-based user query processing methods; (2) semantic annotation and indexing methods; (3) ontology-based information retrieval methods and models; (4) the acquisition and expansion of the ontological knowledge required by the retrieval system; (5) the application of ontological reasoning techniques in information retrieval; (6) the frameworks of ontology-based intelligent information retrieval systems; and (7) the applications of these systems. For example, Yoo et al. [47] proposed a hybrid query processing method that, with the help of a domain ontology, uses query rewriting for frequently changing knowledge and reasoning for non-changing knowledge, achieving effective information retrieval. Kallipolitis et al. [48] proposed a semantic retrieval method for world news information and obtained a high search rate with the help of a world news ontology and domain heuristic rules. Hourali et al. [49] proposed an intelligent information retrieval method based on a two-level uncertainty fuzzy ontology, introducing fuzzy logic into ontology construction to address the inability of general ontologies to adequately represent uncertain information in the domain. Lim et al. [50] proposed a product information retrieval method based on semantic annotation with a product family ontology; when a user query involves more than one aspect of a product, this method yields better retrieval results. Wiegand et al. [51] presented a task-based, semantic web approach to finding geospatial data, aiming to improve data discovery and facilitate the automatic retrieval of data sources. Sun et al. [52] proposed GeoDataOnt, a unified framework for a geospatial data ontology that establishes a semantic foundation for geospatial data integration and sharing. Liu et al. [53] proposed a method for retrieving geospatial data based on a knowledge graph constructed from heterogeneous geospatial data and encyclopedias.
Descriptive relationships in geological big data are complex: geological body description models built for different topics can describe the same geological survey area, and the same query may be expressed in different forms. Geological bodies differ in lithology, stratigraphy, and fault models; their geometric forms vary and their spatial distributions are uneven, and existing index models lack the organizational capacity to handle such complex description models. Volume is another important characteristic of geological body description information. Although effective spatiotemporal indices have been established that greatly improve query efficiency, most traditional indices are based on distance aggregation and do not consider geospatial containment, the relevance of geological topics, or the characteristics of domain knowledge, so they cannot directly serve geological big data queries.
Because the geological reports used in this paper are in Chinese, they require preprocessing and word segmentation; segmentation is performed with a deep learning-based model [2,3,54]. This research provides model support for analyzing interlinked geological data and helps to improve geological information association retrieval and knowledge discovery.
In this research, we build a geological knowledge ontology to formalize geological knowledge and investigate its links with geological bodies in terms of their types, states, connections, and geometric aspects. The ontology's goal is to make it easy for computer programs to find, query, and distribute geological information.
To limit the scope of demonstration and validation in this paper, we focus on a basic geological ontology to demonstrate the effectiveness of the developed ontology, because a basic geological ontology provides common concepts, terminology, and rules for other geological ontologies.

3. Research Methodology

Ontologies are used for geological knowledge modeling for several reasons. They can be shared and used to connect data from many knowledge domains; they support reasoning and consistency checking; and their classes and properties can intuitively express the concepts used in explicit geological knowledge and the semantic relationships between them.
Automated reasoning about geological requirements needs a computer-interpretable model of those requirements. We therefore investigate ontology-based semantic modeling of geological requirements to make geologists' creation of geological papers simpler and more effective. Figure 1 depicts the specific study tasks, which are explained in the following sections.

3.1. Defining the Purpose and the Scope of the Geological Ontology

The purpose of developing a geological ontology is not only to formalize the current geological knowledge, but also to support geological knowledge management and retrieval. An ontology should therefore support the integration of the knowledge with geological information models.
Most existing geological ontologies cover the concepts or terms of a particular field or discipline. The ontology constructed in this study is designed to support the query and retrieval of a large volume of geological report content, so it comprises a basic geological ontology, a spatial ontology, and a time ontology.

3.2. Ontology Capturing and Coding

The knowledge sources used to identify relevant concepts and code the geological ontology include books, journals, and geological reports. The reports comprise a large number of professional documents, including regional geological, mineral resource, remote sensing geological, hydrogeological, and engineering geological reports. To develop and maintain a meaningful, accurate, and minimally redundant ontology, an automated reasoner is used to examine and confirm its coherence. In the resulting OWL-based ontology, a Description Logic (DL) reasoner can perform a number of automated inference tasks, such as determining whether the ontology contains inconsistent (unsatisfiable) classes. Because manual consistency checking would be very time-consuming, automated checking is essential for evaluating the ontology's overall consistency. Interviews with subject-matter experts were also conducted to further assess the ontology content.
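The coherence check described above is carried out by a DL reasoner over the full OWL semantics. As a much simpler illustration of one kind of problem such a reasoner detects, the following Python sketch flags classes that inherit, through named subClassOf links, from two classes declared disjoint; such classes are unsatisfiable because they can have no instances. This is not a DL reasoner, the class names and the disjointness axiom are hypothetical, and the Tuff entry is a deliberately faulty modeling choice.

```python
# Hypothetical fragment: each class maps to its direct superclasses (subClassOf).
PARENTS = {
    "SedimentaryRock": ["Rock"],
    "IgneousRock": ["Rock"],
    "VolcanicRock": ["IgneousRock"],
    "Tuff": ["VolcanicRock", "SedimentaryRock"],  # deliberately faulty modeling
}
# Hypothetical DisjointClasses axiom.
DISJOINT = [("SedimentaryRock", "IgneousRock")]


def ancestors(cls, parents):
    """Return cls together with all of its (transitive) superclasses."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def unsatisfiable_classes(parents, disjoint_pairs):
    """Classes below both members of a disjoint pair; they can have no instances."""
    classes = set(parents) | {p for ps in parents.values() for p in ps}
    bad = set()
    for cls in classes:
        anc = ancestors(cls, parents)
        if any(a in anc and b in anc for a, b in disjoint_pairs):
            bad.add(cls)
    return bad
```

Calling `unsatisfiable_classes(PARENTS, DISJOINT)` flags only `Tuff`, which is the kind of report a modeler would then resolve, for example by removing one of the two superclass links.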

3.3. Semantic Web Rule Language Rule Development

The Semantic Web Rule Language (SWRL), which combines the OWL DL and OWL Lite sublanguages of OWL with a subset of the Rule Markup Language (RuleML), can be used to express rules as well as logic. Rule-based reasoning (RBR) expresses a domain expert's empirical knowledge as rules that link problems to solutions and then uses this knowledge to simulate the reasoning that experts apply when solving a problem. The knowledge graph represents basic geological correlation knowledge and supplies the facts for inference, while the SWRL rules specify how inference is performed. The geological knowledge graph and the corresponding SWRL rules are imported into the inference engine, which infers the knowledge implied by the graph; the inferred results that satisfy the rule conditions are returned to the decision maker, who makes decisions based on them. Rule-based reasoning is highly interpretable, and custom rules make it possible to mine implicit knowledge and thereby improve decision making. Industry experts can adjust existing SWRL rules or create new ones in Protégé [56], although those new to Protégé may require some additional training.
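To illustrate how an inference engine applies SWRL-style rules to knowledge-graph facts, here is a naive forward-chaining sketch in Python. It is not the rule engine used with Protégé; the predicates and the sample rule, written in SWRL syntax as `Stratum(?s) ^ containsFossil(?s, ?f) ^ indexFossilOf(?f, ?a) -> hasGeologicalAge(?s, ?a)`, are hypothetical examples. Facts are tuples of the form (predicate, argument, ...), and terms beginning with `?` are variables.

```python
def is_var(term):
    """SWRL-style variables are written with a leading '?'."""
    return term.startswith("?")


def match(pattern, fact, binding):
    """Unify one rule atom with one fact; return the extended binding, or None."""
    if len(pattern) != len(fact):
        return None
    extended = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if extended.get(p, f) != f:
                return None  # variable already bound to a different value
            extended[p] = f
        elif p != f:
            return None  # predicate or individual does not match
    return extended


def forward_chain(facts, rules):
    """Apply (body, head) rules repeatedly until no new facts can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            bindings = [{}]
            for atom in body:  # join the body atoms from left to right
                bindings = [
                    b2
                    for b in bindings
                    for fact in facts
                    if (b2 := match(atom, fact, b)) is not None
                ]
            for b in bindings:
                derived = tuple(b.get(term, term) for term in head)
                if derived not in facts:
                    facts.add(derived)
                    changed = True
    return facts


# Hypothetical rule and facts (Redlichia is used only as an illustrative index fossil).
RULES = [(
    [("Stratum", "?s"), ("containsFossil", "?s", "?f"), ("indexFossilOf", "?f", "?a")],
    ("hasGeologicalAge", "?s", "?a"),
)]
FACTS = {
    ("Stratum", "LayerA"),
    ("containsFossil", "LayerA", "Redlichia"),
    ("indexFossilOf", "Redlichia", "Cambrian"),
}
inferred = forward_chain(FACTS, RULES)
```

Because rules of this kind only add facts and never introduce new individuals, the loop reaches a fixed point. Re-evaluating every rule against the whole fact set on each pass is simple but slow for large graphs; production rule engines use incremental matching instead.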

3.4. Ontology Validation and Improvement

An ontology-based case study is developed to automatically search for spatiotemporal, geological age, and attribute information related to geological objects. Feedback from users of the application on the validity and precision of the new method is then used to refine the ontology further, and the end result is the final geological ontology. The system architecture of our ontology-based information retrieval application consists of an ontology editor, a reasoner, and a rule engine (see Figure 2).

4. The Base Domain Ontology Construction and Information Retrieval Framework Based on Multi-Feature and Domain Ontology

Geological data cover a wealth of topics and diverse forms of expression, which makes feature extraction challenging. Constructing geological domain topics at varying levels of granularity, extracting information from vast, high-dimensional unstructured geological data, and correlating the spatiotemporal and thematic features within the data are critical aspects of knowledge discovery. To address these issues, we integrate ontology theory into the content retrieval framework as an auxiliary solution.
Our ontology is based on the GeoCore ontology [57], which we expanded and supplemented to support semantic query and retrieval of massive geological data. GeoCore provides well-founded definitions of a restricted set of generic geological concepts that are accepted by geologists regardless of their level of expertise. It enables modelers to consider separately a geological object, the substance that constitutes it, the boundaries that delimit it, and the internal organization of the parts within it. The core ontology also allows the description of existentially dependent attributes of a geological object and of the geological process that produced it in a certain geological epoch. This small set of formally defined and documented concepts, combined with concepts from the Basic Formal Ontology (BFO) [58], serves as a foundation both for deriving more specialized geological concepts through subsumption and for merging existing domain ontologies within the geoscience domain.

4.1. Basic Ontology Construction

4.1.1. The Geological Ontology

The goal of building a geological ontology is to acquire, describe, and represent geological domain knowledge; compile a vocabulary of commonly accepted concepts; formally define concepts and the interrelationships between concepts at different levels; and establish a geological domain knowledge system based on common understanding (see Figure 3).
Geological ontology construction is a systematic process involving a huge amount of geological knowledge. Based on the requirements of geological report topic retrieval, we determine the concepts, instances, attributes, relationships, and rules involved in topic-related geological data and use ontology modeling software to construct an ontology model in the OWL file format. To ensure the accuracy, comprehensiveness, and feasibility of topic retrieval, ontology construction targets only topic knowledge related to basic geology, and the resulting model is extensible and updatable so that it can be continuously revised and improved in practical applications. Following the framework of the geological ontology, the corresponding ontology model is built in Protégé (https://protege.stanford.edu/, accessed on 4 September 2022): the hierarchy of geological data objects and their relationships is established in the Classes module; the relationship types and their hierarchy are defined under ObjectProperties, where relationship characteristics such as transitivity and symmetry are specified as required; and instances of the corresponding concepts are created under Individuals, with the relationships, properties, and rule constraints of each instance stated explicitly. Finally, Protégé's built-in reasoner is used to infer latent relations, and the ontology model is generated in OWL format.
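The transitivity and symmetry characteristics set under ObjectProperties are what allow a reasoner to derive relations that were never asserted. The following Python sketch saturates a set of (subject, property, object) assertions under those two characteristics. The property names `overlies` and `inContactWith` and the individuals are hypothetical illustrations, not part of the authors' ontology.

```python
def close_properties(assertions, transitive=(), symmetric=()):
    """Saturate (subject, property, object) triples under symmetry and transitivity."""
    triples = set(assertions)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in triples:
            if p in symmetric:
                new.add((o, p, s))  # p(a, b) implies p(b, a)
            if p in transitive:
                for s2, p2, o2 in triples:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))  # p(a, b) and p(b, c) imply p(a, c)
        if not new <= triples:
            triples |= new
            changed = True
    return triples


# Hypothetical assertions: stratigraphic superposition and a contact relation.
ASSERTED = [
    ("Q4", "overlies", "N2"),
    ("N2", "overlies", "E3"),
    ("FaultF1", "inContactWith", "GraniteG2"),
]
closed = close_properties(ASSERTED, transitive={"overlies"}, symmetric={"inContactWith"})
```

Here `closed` gains `("Q4", "overlies", "E3")` and `("GraniteG2", "inContactWith", "FaultF1")`, while `overlies` is never reversed because it is declared transitive but not symmetric.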
Defining more relationships and associated attributes during ontology modeling makes the knowledge system description more inclusive and strengthens ontology-based reasoning. However, this added capability comes at the cost of a heavier modeling workload, so relationship descriptions should be constructed and refined according to specific requirements in order to improve modeling efficiency and optimize retrieval performance.
In this study, we introduce a geological ontology and provide a formal description of the concept of geological entities, the interrelationship of geological activities, and the properties and patterns that characterize the geological domain [57,59]. A geological ontology for content retrieval and discovery systems will help to eliminate some of the differences in geological concepts and terminology, thus reaching a consensus on the conceptual understanding within the geological domain and giving clear definitions of the interrelationships between terms at different levels of the formal model [60,61,62].
The domain geoscience ontology is constructed using the open-source Protégé toolkit (https://protege.stanford.edu/, accessed on 4 September 2022), chosen for its flexible environment, which supports easy importing, editing, visualizing, and exporting of ontologies. Many established and widely recognized upper (top-level) ontologies are available in the literature, notably SUMO [63], DOLCE [64], BORO [65], UFO [66], GFO [67], and BFO [58], all of which have been extensively used and cited in the academic community. Within the geosciences, numerous ontologies have been developed to improve knowledge organization and contextualization. For instance, the SWEET ontology [68,69] emerged from NASA's efforts to manage an extensive volume of Earth science data. As a top-level ontology, it covers entities throughout the universe and provides a framework that encompasses and contextualizes geological features. Several domain-specific ontologies have also been curated for the geological sciences, including the fracture ontology by Zhong et al. [70], the geologic time scale ontology by Ma [71], the Structural Geo-Ontology introduced by Babaie et al. [72], and the Simple Lithology ontology. Unlike SWEET, these domain-specific ontologies offer detailed knowledge representation for specific areas of interest within the geological sciences. In this work, we extracted all of the geological concepts and reorganized them using four relationships: superClassOf, subClassOf, equivalentClassOf, and relatedClassOf.
These relationships enable the computer to recognize and understand the relationships among the geological concepts in the ontology, which is a prerequisite for knowledge reasoning (see Figure 4). In the geological content retrieval system, the geological ontology serves as the foundation, providing semantic and knowledge support for retrieval in the geological domain. Through the hierarchical relationships between concepts, a search term can be expanded to its subordinate concepts to discover more specific content; through the equivalence relationships, it can be expanded to its equivalents to discover synonymous and near-synonymous concepts and content; and through the correlation relationships, it can be expanded to associated concepts to discover further content related to the current search term in a geological context.
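The three expansion routes just described (subordinate concepts, equivalents, and related concepts) can be sketched as a small query-expansion function. The concept graph below is a hypothetical fragment, not the authors' ontology; in this sketch, subordinate concepts are followed transitively, while related concepts are added only one hop away and can be switched off.

```python
from collections import deque

# Hypothetical concept graph using the relationship types named in the text.
SUB_CLASSES = {  # concept -> direct subordinate concepts (inverse of subClassOf)
    "volcanic rock": ["basalt", "andesite", "rhyolite"],
    "basalt": ["olivine basalt"],
}
EQUIVALENTS = {"volcanic rock": ["extrusive rock"]}      # equivalentClassOf
RELATED = {"volcanic rock": ["volcanic eruption"]}       # relatedClassOf


def expand_query(term, include_related=True):
    """Expand a search term with its equivalents, all subordinate concepts, and related concepts."""
    expanded = {term, *EQUIVALENTS.get(term, [])}
    queue = deque(expanded)
    while queue:  # breadth-first walk down the concept hierarchy
        concept = queue.popleft()
        for child in SUB_CLASSES.get(concept, []):
            if child not in expanded:
                expanded.add(child)
                queue.append(child)
    if include_related:
        expanded.update(RELATED.get(term, []))
    return expanded
```

For example, `expand_query("volcanic rock")` returns the term itself, "extrusive rock", the three subclasses, "olivine basalt", and "volcanic eruption"; the expanded set would then be matched against the indexed document fragments instead of the single keyword.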
Our aim is to construct a basic geological ontology that covers as much fundamental geology as possible. For each category, we provide detailed descriptions and analyses of terminology, hierarchical relationships, object properties, and data properties. The upper-level classes of the domain-geology ontology describe the most abstract entities and comprise twenty classes, including Surveying and mapping for prospecting geological and mineral resources, Stratum, Geological history, Engineering geology, Paleobiota, Geological structure, Environmental geology, Geochemical exploration, Marine geology, Minerals geology, Exploring opening, Mineralogy and crystallography, Geological mapping, Hydrogeology, Geophysical exploration, Geological remote sensing exploration, Rock, Geological hazard, and Drilling engineering. For example, SedimentaryRock has a part-of relation with Rock and includes the subclass HydrogenicRock; HydrogenicRock includes the subclass ClasticRock, which in turn includes AlluvialConglomerate, BasalConglomerate, BoulderConglomerate, etc.
A partial view of the rock ontology is shown in Figure 4, and a partial view of the mineral geology ontology is demonstrated in Figure 5.

4.1.2. The Spatial Ontology

Geological studies inherently have a spatial dimension, and geological survey reports contain substantial content centered on spatial locations. A geological content retrieval system must therefore incorporate spatial information. During a geological survey, the survey area is typically selected according to a predefined plan, so the content of geological survey reports is commonly analyzed and described in relation to specific areas. Consequently, location is frequently used as a search term or filter in geological content retrieval. The text of geological survey reports contains a large number of place names, which reflect the spatial properties of geological document fragments. In the preliminary data processing stage, we extracted the place names of each geological document fragment and stored them in a separate field to support content discovery and filtering by place name during retrieval. Simple place-name filtering, however, has a limitation. For example, if we search for “volcanic rocks in Xinjiang”, the expected result is all volcanic rocks in Xinjiang; but if we simply use “Xinjiang” as a filter, fragments that belong to Xinjiang administratively but do not explicitly mention Xinjiang in the text will be filtered out. A fragment containing “Arjin volcanic rocks”, for instance, describes an area within Xinjiang, and “Arjin volcanic rocks” is semantically included in “volcanic rocks in Xinjiang”; however, because no semantic relationship links the two place names, the fragment is mistakenly filtered out. To address this issue, we developed a spatial ontology.
Several organizations have developed spatial ontologies, including GeoNames (https://www.geonames.org/, accessed on 1 December 2023), a comprehensive, free global gazetteer. The GeoNames Ontology serves as the schema layer of the GeoNames gazetteer, providing an ontological description of its terms. The gazetteer currently contains approximately 25 million geographical names corresponding to approximately 12 million geographical entities worldwide. Each entity is described by 19 fields, including a unique identifier, place names, latitude and longitude, and entity type. GeoNames built this database through crowdsourcing and publishes it in RDF format. However, owing to challenges inherent to crowdsourcing, such as knowledge errors and inconsistencies, and despite permission controls and other mitigation measures, the knowledge in GeoNames has not undergone systematic quality assessment.
The primary objective of constructing a spatial ontology is to define the hierarchical relationships among geographical entities so that computer systems can understand the inclusion relationships between administrative divisions. This enables the automatic discovery of subregions within a given region and extends place names to assist in content querying and filtering. In this research, we collected an extensive set of names of administrative regions in China, from the provincial level down to the street (village) level. A top-down approach was then used to establish relationships among provinces (autonomous regions), cities, counties, districts (towns), communities (townships), and streets (villages), as illustrated in Figure 6. Using the subregion-of relationship, the names of all administrative divisions below a given region can be expanded. During content retrieval, searching and filtering are performed with these expanded place names, which identifies geographically and semantically related content fragments and improves the recall of the retrieval system.
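The subregion-of expansion can be sketched in Python. The hierarchy below is a tiny illustrative fragment (Ruoqiang County lies in Bayingolin prefecture and Pishan County in Hotan prefecture, both in Xinjiang); the fragment records and the field name `places` are hypothetical stand-ins for the place-name field described in the data processing step.

```python
from __future__ import annotations

from collections import defaultdict

# Illustrative (subregion, parent) pairs only; the real spatial ontology covers
# every administrative unit from province down to street (village) level.
SUBREGION_OF = [
    ("Bayingolin", "Xinjiang"),
    ("Ruoqiang", "Bayingolin"),
    ("Hotan", "Xinjiang"),
    ("Pishan", "Hotan"),
    ("Wuhan", "Hubei"),
]

CHILDREN: dict[str, list[str]] = defaultdict(list)
for child, parent in SUBREGION_OF:
    CHILDREN[parent].append(child)


def expand_place(name: str) -> set[str]:
    """Return the region itself plus every region transitively inside it."""
    result = {name}
    stack = [name]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def filter_fragments(fragments: list[dict], place: str) -> list[dict]:
    """Keep fragments whose extracted place names fall inside `place`."""
    allowed = expand_place(place)
    return [f for f in fragments if allowed & set(f["places"])]


if __name__ == "__main__":
    fragments = [
        {"id": 1, "places": ["Ruoqiang"], "text": "Arjin volcanic rocks ..."},
        {"id": 2, "places": ["Wuhan"], "text": "..."},
    ]
    print([f["id"] for f in filter_fragments(fragments, "Xinjiang")])  # [1]
```

With expansion, a fragment tagged only with “Ruoqiang” passes a “Xinjiang” filter, which is the recall gain the spatial ontology is meant to provide.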

4.1.3. The Time Ontology

Geological time (geological age) involves two types of temporal entities, time points (e.g., the time at which a paleontological taxon emerged) and time periods (e.g., the Cenozoic Era), and exhibits various temporal characteristics, such as multiscale structure, fluctuation (periodicity), and uncertainty, as well as various temporal relations, such as separation and intersection.
In this paper, building on the time ontology of a general foundational ontology, we establish a geological age ontology model that takes the concept of geological time as its core and uses chronostratigraphy, biostratigraphy, lithostratigraphy, and related content as its application basis (see Figure 7). In this model, the base time ontology is the foundation layer, providing basic predicates and concept definitions for temporal relations, attribute descriptions, and so on. Geological ages and Global Boundary Stratotype Sections and Points (GSSPs) form the domain layer, providing the core temporal concepts and attributes of the ontology, such as Eon, Era, Period, Epoch, Age, and Chron, together with their temporal sequences and attributes, such as the temporal coordinates and geographic coordinates of the standard sections. Stratigraphic units, such as chronostratigraphic, biostratigraphic, and lithostratigraphic units, form the application layer, which is key to supporting broader geoscience data application services.
Temporal attributes include geological age, start time (e.g., the time of biological emergence), end time (e.g., the time of biological extinction), cycle period, error, etc. The semantic representation of temporal attributes mainly expresses temporal attribute values. A geological concept can have one or more time description objects, and each time description object can have a time type, a time direction, and a time value object expressed in units such as millions of years (Ma). Time value objects express specific time values and their errors through the predicates “hasValue” and “errorValue”.
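The time description structure above can be sketched as Python data classes. The class and field names are hypothetical mappings of the predicates hasValue and errorValue, the unit is assumed to be Ma, the encoding of the time direction as a string is an assumption, and the sample taxon and its ages are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class TimeValue:
    """Time value object carrying hasValue and errorValue (both in Ma)."""
    has_value: float
    error_value: float = 0.0

    def bounds(self) -> tuple[float, float]:
        """Youngest and oldest admissible ages given the stated error."""
        return (self.has_value - self.error_value, self.has_value + self.error_value)


@dataclass(frozen=True)
class TimeDescription:
    """One time description object attached to a geological concept."""
    time_type: str                      # e.g. "start" (emergence) or "end" (extinction)
    value: TimeValue
    direction: str = "before_present"   # assumed encoding of the time direction


@dataclass
class GeologicalConcept:
    name: str
    times: list[TimeDescription] = field(default_factory=list)

    def time_of(self, time_type: str) -> TimeValue | None:
        """Return the first time value of the requested type, if any."""
        for t in self.times:
            if t.time_type == time_type:
                return t.value
        return None


if __name__ == "__main__":
    taxon = GeologicalConcept("HypotheticalTaxon", [
        TimeDescription("start", TimeValue(100.0, 2.0)),
        TimeDescription("end", TimeValue(66.0)),
    ])
    print(taxon.time_of("start").bounds())  # (98.0, 102.0)
```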
Temporal relations include temporal topological relations and temporal attribute relations. Temporal topological relations describe the relations between time entities (time points, time periods, and composites of both) in terms of vertical inclusion, horizontal intersection (e.g., the diachronism of rock formations and biotas), and earlier/later order; they form the basis for studying the temporal correlation of different geological objects. A partial view of the geological time ontology is shown in Figure 8.
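The topological relations between time periods can be sketched as a small classifier over intervals expressed in Ma. The interval representation and relation names are assumptions; the boundary ages are approximate values from the ICS chronostratigraphic chart used only for illustration. Treating touching boundaries (such as the Cretaceous/Cenozoic boundary at 66.0 Ma) as “before” is a simplifying choice; an Allen-style interval algebra would distinguish “meets”.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time period as ages in Ma; start is older (larger) than end."""
    name: str
    start: float
    end: float


def relation(a: Interval, b: Interval) -> str:
    """Classify the temporal topological relation of a with respect to b."""
    if (a.start, a.end) == (b.start, b.end):
        return "equal"
    if a.end >= b.start:    # a finishes at or before b begins: a is earlier
        return "before"
    if b.end >= a.start:
        return "after"
    if a.start >= b.start and a.end <= b.end:
        return "contains"   # vertical inclusion
    if b.start >= a.start and b.end <= a.end:
        return "during"
    return "intersects"     # partial overlap, e.g. a diachronous unit


# Approximate boundary ages (Ma) from the ICS chart.
MESOZOIC = Interval("Mesozoic", 251.902, 66.0)
CRETACEOUS = Interval("Cretaceous", 145.0, 66.0)
CENOZOIC = Interval("Cenozoic", 66.0, 0.0)

if __name__ == "__main__":
    print(relation(MESOZOIC, CRETACEOUS))  # contains
```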

4.2. Geological Ontology Evaluation

Ontology assessment can be divided into two types: form-based (syntactic) evaluation and content-based (semantic) evaluation. Form-based evaluation checks whether the ontology is correctly expressed in terms of its form and syntax, which is necessary for automated reasoning; here, the Pellet reasoner was used to verify the consistency of the constructed ontology. Content-based evaluation, by contrast, is required to determine whether the ontology accurately covers the target domain.
This research examined two types of content-based evaluation: agreement-based and task-based. Agreement-based evaluation measures the proportion of agreement among experts regarding the ontology’s elements and structure. Task-based evaluation examines which domain tasks an ontology must serve and how effectively it supports them, assessing its fitness with respect to objectives, preconditions, postconditions, constraints, and alternatives. The ontology was first examined through interviews with domain experts and then tested in a task-based evaluation using the application built on it.
To evaluate the content and structure of the ontology, individual interviews were conducted with geological experts. The survey included 10 participants with a combined 189 years of practical work experience (see Table 1). The evaluation procedure had three parts. First, the taxonomy, relationships, and axioms of the geological ontology were presented to the participants. Next, an open discussion was held to explain the details of the ontology and encourage constructive feedback. Finally, after the interview, participants evaluated the ontology through an online survey.
Expert responses were recorded on a six-point Likert scale, with 1 being the most positive. The findings show that (1) participants rated the concepts used as “extremely familiar” to “familiar”; (2) participants considered the concepts and relationships used in the ontology to be “representative” of geological knowledge; (3) participants found navigating the ontology “simple”; and (4) participants “agreed” that the ontology covers the key concepts and relationships in geology. Selected survey results are shown in Table 2.

4.3. Information Extraction Based on the Domain Ontology

The use of ontologies rather than linearly structured lexicons or word lists in information extraction allows for the understanding of the extracted content at a semantic level. By associating ontology instances with the extracted content, the semantic annotation of the extracted content can be accomplished.
Domain ontology-based information extraction does not depend on document structure and can achieve high precision and recall, provided that the domain ontology is sufficiently comprehensive. At present, there is no widely accepted engineering method for ontology construction, so most ontologies are built manually. In this paper, we automatically acquire structured attribute information from spatial databases to form a gazetteer database and a geological body dictionary database, which are then used to expand the existing geological domain, topic, gazetteer, and geological time ontologies so that multiple features of geological information can be extracted, as shown in Figure 9.
Sentences are used as the granularity of semantic annotation, and explicit knowledge fragments are extracted from the text based on the domain ontology. Because a geological document may contain multiple topics, extracting knowledge fragments with a purely statistical approach based on paragraph weights may lose some fragments. This paper therefore uses sentences as the basic unit: paragraphs are segmented and extracted dynamically according to sentence coherence and relevance, yielding coherent paragraphs that are most relevant to each topic. Key sentences are then extracted as summaries using statistical methods and heuristic rules, guided by the document topic and concept similarity.
The specific procedure is as follows. First, the text is segmented into words, and prepositions, function words, and other terms irrelevant to the geological ontology are filtered out. The remaining key nouns, which represent the content, form a keyword feature vector for each sentence. Next, the semantic similarity between sentences is computed using the geological ontology, and the sentences are clustered according to these similarity values into a set of semantically independent classes. Finally, the most significant sentence in each class is extracted as its representative, and the representative sentences are combined to form the overall theme of the text.
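The three steps can be sketched in Python as follows. Several choices here are assumptions, not the paper’s method: a regular-expression tokenizer stands in for Chinese word segmentation, Jaccard overlap of ontology keywords stands in for ontology-based semantic similarity, the threshold of 0.3 is arbitrary, and clustering is a simple greedy scheme that compares each sentence with the first member of each cluster. The vocabulary is a hypothetical fragment.

```python
from __future__ import annotations

import re

# Hypothetical ontology vocabulary; the real system would use the concept
# labels (and synonyms) of the geological ontology.
ONTOLOGY_TERMS = {"granite", "intrusion", "copper", "deposit", "ore", "fault", "fold", "strata"}


def keywords(sentence: str) -> frozenset[str]:
    """Step 1: tokenize and keep only tokens that are ontology terms."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return frozenset(t for t in tokens if t in ONTOLOGY_TERMS)


def similarity(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard overlap, a stand-in for ontology-based semantic similarity."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def summarize(sentences: list[str], threshold: float = 0.3) -> list[str]:
    """Steps 2 and 3: cluster sentences, then keep one representative each."""
    vectors = [keywords(s) for s in sentences]
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        if not vec:
            continue  # no ontology concept: irrelevant to the geological topic
        for cluster in clusters:
            if similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    representatives = []
    for cluster in clusters:
        # Most central member (highest total similarity); ties go to the earliest.
        best = max(cluster, key=lambda k: (
            sum(similarity(vectors[k], vectors[j]) for j in cluster), -k))
        representatives.append(best)
    return [sentences[i] for i in sorted(representatives)]


if __name__ == "__main__":
    text = [
        "The copper deposit is hosted by a granite intrusion.",
        "Copper ore occurs along the granite intrusion contact.",
        "A fault cuts the folded strata.",
        "The weather was cold.",
    ]
    print(summarize(text))
```

The two copper sentences share three of five keywords (similarity 0.6) and fall into one cluster, the fault sentence forms its own cluster, and the sentence without ontology terms is dropped.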

4.4. Multi-feature Linked Geological Data Indexing Model

To realize the integrated query from space to topic and from topic to space for massive geological data, this paper constructs a topic-space index model by establishing the linkage between unstructured geological data and structured spatial data, as shown in Figure 10.
The first step is to build a topic index tree for the geological domain, guided by the geological domain ontology, with one index node for each geological topic. There are two types of nodes, leaf nodes and nonleaf nodes (other than the root), whose structures differ slightly. A nonleaf topic node contains its topic flag, pointers to its child nodes, and other information, whereas a leaf node contains, in addition to the topic flag, a pointer to an index table. A topic node at level i is a coarse-grained representation of the level-(i+1) nodes it points to; conversely, those level-(i+1) nodes are fine-grained representations of the level-i node. The index table maps each instance of a topic to its corresponding topic semantic index entry and can be stored as a hash index; querying this table returns the topic index entry corresponding to a topic instance.
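The two node types and the hash-based index table can be sketched with Python data classes. The tree content, the key format `inv:copper-001`, and the choice to flatten all leaf tables into one dictionary for instance lookup are illustrative assumptions; the dictionary plays the role of the hash index described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TopicNode:
    """Topic index tree node; field names are hypothetical."""
    topic: str                                                 # topic flag
    children: list[TopicNode] = field(default_factory=list)   # nonleaf nodes
    # Leaf nodes only: hash table from topic instance to inverted-index key.
    index_table: Optional[dict[str, str]] = None

    @property
    def is_leaf(self) -> bool:
        return self.index_table is not None


def build_instance_index(root: TopicNode) -> dict[str, tuple[list[str], str]]:
    """Flatten the leaf index tables into one hash map.

    Each topic instance maps to (topic path from coarse to fine, index key).
    """
    result: dict[str, tuple[list[str], str]] = {}

    def walk(node: TopicNode, path: list[str]) -> None:
        path = path + [node.topic]
        if node.is_leaf:
            for instance, key in node.index_table.items():
                result[instance] = (path, key)
        for child in node.children:
            walk(child, path)

    walk(root, [])
    return result


ROOT = TopicNode("Geology", children=[
    TopicNode("Minerals geology", children=[
        TopicNode("Copper deposit",
                  index_table={"Bashkurgan copper mine": "inv:copper-001"}),
    ]),
])

if __name__ == "__main__":
    print(build_instance_index(ROOT)["Bashkurgan copper mine"])
```

The path returned with each instance records the coarse-to-fine chain of topics, so a query can be widened by moving one level up the tree.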
In each topic instance, the “geological data location” points to an inverted index file, which records the ID of the geological data and the location of the paragraph in which the topic appears. Through the inverted index entry, the topic is located in the geological data, and the attribute information describing the topic is associated with the structured spatial data. The spatial location information in the structured spatial data is then used to link geological documents with spatial data, realizing “text to text”, “text to map”, and “map to text” retrieval and thus integrating queries from space to topic and from topic to space.
Each inverted index entry records, in addition to the document location where the topic appears, the association score between that location and the topic term. Entries ranked earlier in the inverted list have higher scores and stronger associations with the topic; entries ranked later are less relevant.
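A score-ordered inverted index of this kind, together with a simple “text to map” lookup, can be sketched in Python. The `Posting` fields, the document IDs, and the document-to-place mapping are hypothetical; re-sorting on every insertion keeps the sketch short, whereas a production index would typically sort once at build time.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    """One inverted-index entry (field names are hypothetical)."""
    doc_id: str       # ID of the geological data item
    paragraph: int    # paragraph in which the topic appears
    score: float      # association score between this location and the topic


class TopicInvertedIndex:
    def __init__(self) -> None:
        self._postings: dict[str, list[Posting]] = defaultdict(list)

    def add(self, topic: str, posting: Posting) -> None:
        items = self._postings[topic]
        items.append(posting)
        items.sort(key=lambda p: p.score, reverse=True)  # best match first

    def lookup(self, topic: str, k: int | None = None) -> list[Posting]:
        items = self._postings.get(topic, [])
        return list(items if k is None else items[:k])


def text_to_map(index: TopicInvertedIndex, topic: str,
                doc_locations: dict[str, str], k: int | None = None) -> list[str]:
    """Follow postings to the place linked with each document ("text to map")."""
    return [doc_locations[p.doc_id] for p in index.lookup(topic, k)
            if p.doc_id in doc_locations]


if __name__ == "__main__":
    index = TopicInvertedIndex()
    index.add("copper deposit", Posting("doc-1", 3, 0.4))
    index.add("copper deposit", Posting("doc-2", 1, 0.9))
    print([p.doc_id for p in index.lookup("copper deposit")])  # ['doc-2', 'doc-1']
```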

4.5. A Framework for Geological Data Retrieval That Considers Spatial and Topic Multicorrelations

In this paper, a spatially and thematically multilinked geological data retrieval framework is constructed to enable three parties (researchers, data providers, and geologists) to work collaboratively to complete retrieval tasks. The retrieval framework is shown in Figure 11 and consists of four modules:
(1)
Ontology design: geologists use the ontology editor to build domain ontologies and use the Semantic Web Rule Language (SWRL) to design retrieval workflows for different types of retrieval questions.
(2)
Ontology catalog: data service providers publish geological data maps and geological subject information services with semantic annotation in the corresponding domain ontologies.
(3)
User interface: geologists ask search questions and seek geological knowledge.
(4)
Ontology engine: the ontology engine parses the retrieval questions submitted by geologists and, through the topic reasoning function of the ontology, discovers matching map services to solve the retrieval questions using the retrieval workflow designed by geological experts.
As shown in Figure 12, the domain design includes the following ontologies: the question ontology, geological domain ontology, GIS ontology, web service ontology, question type ontology, GIS function ontology, and processing model ontology.
The question ontology defines the format of a query request in the geological information retrieval system. A query request is represented as <question type> <topic> <spatial relationship> <location>, where the question type (QT) is the type of question asked by the geologist (e.g., how, what, or where); the topic is a topic term from the geological domain ontology representing the subject of interest; the spatial relationship (SR) is a topological, directional, or distance relationship (e.g., in or near); and the location is the place where the topic appears. For example, the query “the distribution of iron ore mines around Wuhan” can be expressed as <where><iron mine><near><Wuhan>.
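The four-part request format can be parsed with a short Python function that maps a request onto the question type class used to select a workflow. The naming convention Where_Near follows the text; the regular expression, the validation sets (limited to the examples given above), and the function names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Allowed values follow the examples in the text; a real system would read
# them from the question ontology and question type ontology.
QUESTION_TYPES = {"how", "what", "where"}
SPATIAL_RELATIONS = {"in", "near"}
_QUERY = re.compile(r"\s*<([^<>]+)>\s*<([^<>]+)>\s*<([^<>]+)>\s*<([^<>]+)>\s*")


@dataclass(frozen=True)
class Question:
    question_type: str     # QT, e.g. "where"
    topic: str             # topic term from the geological domain ontology
    spatial_relation: str  # SR, e.g. "near"
    location: str


def parse_question(text: str) -> Question:
    """Parse <question type><topic><spatial relationship><location>."""
    match = _QUERY.fullmatch(text)
    if match is None:
        raise ValueError(f"not a four-part query: {text!r}")
    qt, topic, sr, location = (part.strip() for part in match.groups())
    if qt.lower() not in QUESTION_TYPES or sr.lower() not in SPATIAL_RELATIONS:
        raise ValueError(f"unsupported question type or relation: {qt!r}, {sr!r}")
    return Question(qt.lower(), topic, sr.lower(), location)


def workflow_name(question: Question) -> str:
    """Name of the question type class, e.g. 'Where_Near'."""
    return f"{question.question_type.capitalize()}_{question.spatial_relation.capitalize()}"


if __name__ == "__main__":
    q = parse_question("<where><iron mine><near><Wuhan>")
    print(q.topic, workflow_name(q))  # iron mine Where_Near
```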
The geological domain ontology uses topic terms (e.g., iron ore) to locate the corresponding class (e.g., the iron ore class). Because the geological ontology is linked to the web service ontology, this class can then be used to access the corresponding geological layer published by the data service provider.
The question type ontology defines the basic types of retrieval questions and assigns a retrieval workflow to each type. Introducing SWRL into the ontology greatly extends its reasoning capability, and SWRL rules are used to describe the retrieval workflows. For example, the following workflow can be constructed for a Where_Near-type retrieval question, such as <where><iron mine><near><Wuhan>.
(1)
Retrieval workflow for the Where_Near type:
$$\mathrm{Buffer}(?buffer) \wedge \mathrm{Overlay}(?overlay) \wedge \mathrm{hasGISFunction}(?type, ?buffer) \wedge \mathrm{hasGISFunction}(?type, ?overlay) \wedge \mathrm{hasNextGISFunction}(?buffer, ?overlay) \rightarrow \mathrm{WhereNear}(?type) \qquad (\mathrm{SWRL\_rule\_1})$$
(2)
Auxiliary search workflows of type Where_Near:
$$\mathrm{GISFunction}(?func1) \wedge \mathrm{Output}(?out1) \wedge \mathrm{hasOutput}(?func1, ?out1) \wedge \mathrm{GISFunction}(?func2) \wedge \mathrm{Input}(?in2) \wedge \mathrm{hasInput}(?func2, ?in2) \rightarrow \mathrm{hasNextGISFunction}(?func1, ?func2) \qquad (\mathrm{SWRL\_rule\_2})$$
(3)
For Where_Near-type search questions in the question ontology, the user interface module automatically constructs the corresponding SWRL rules:
$$\mathrm{Topic}(?topic) \wedge \mathrm{Location}(?location) \wedge \mathrm{hasGeologyData}(?problem, ?topic) \wedge \mathrm{hasGeologyData}(?problem, ?location) \wedge \mathrm{hasQuestionType}(?problem, ?QT) \wedge \mathrm{WhereNear}(?QT) \rightarrow \mathrm{Problem}(?problem) \qquad (\mathrm{SWRL\_rule\_3})$$
The GIS function ontology defines the various functions in the GIS, and the search workflow automatically calls the corresponding function in the GIS function ontology.
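The Buffer → Overlay chain that SWRL_rule_1 attaches to Where_Near questions can be mimicked in plain Python, where the output of the buffer step becomes the input of the overlay step. In this sketch, the buffer radius (100 km), the great-circle distance test, the approximate Wuhan coordinates, and the mine locations are all assumptions; a production system would call real GIS services discovered through the GIS function ontology rather than these stand-in functions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(p: tuple, q: tuple) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def buffer(center: tuple, radius_km: float):
    """GIS function 1: a circular buffer, returned as a membership test."""
    return lambda point: haversine_km(center, point) <= radius_km


def overlay(zone, features: dict) -> dict:
    """GIS function 2: keep the features that fall inside the zone."""
    return {name: pt for name, pt in features.items() if zone(pt)}


def run_where_near(location: tuple, features: dict, radius_km: float = 100.0) -> dict:
    """Where_Near workflow: the buffer's output is the overlay's input."""
    zone = buffer(location, radius_km)
    return overlay(zone, features)


WUHAN = (30.59, 114.30)               # approximate city-centre coordinates
MINES = {"mine_A": (30.20, 114.90),   # hypothetical iron mine locations
         "mine_B": (31.50, 116.50)}

if __name__ == "__main__":
    print(sorted(run_where_near(WUHAN, MINES)))  # ['mine_A']
```

Here mine_A lies roughly 72 km from the reference point and mine_B roughly 233 km, so only mine_A survives the overlay.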

5. Ontology-Based Spatiotemporal and Topic-Based Information Retrieval: A Case Study

5.1. Data Source

The National Geological Archives of China (NGAC) (https://www.ngac.cn, accessed on 1 December 2023) currently holds nearly 100,000 items of geological material obtained from long-term geological surveys, which fall into seven main categories. According to the classification statistics of the collection, there are 5709 items of regional geological survey material, 52,921 items of mineral exploration material, 106 items of marine geological report material, 11,089 items of geophysical, geochemical, and remote sensing geological survey material, 8015 items of hydrogeological, engineering geological, and environmental geological survey material, 13,259 geological scientific research reports, and 214 items of technical method research material.
In this paper, the publicly contributed geological materials in the NGAC were used as the basis, and the data from one computer were randomly selected as the retrieval data source (approximately 3800 items; index generation took 5 s). A comparison experiment of file name retrieval, keyword matching retrieval, and ontology-based association retrieval was carried out by tagging 800 representative geological data items according to the content items of the geological ontology. The workflow of the data search system is shown in Figure 13.

5.2. SWRL Rule Development

In this study, a rule-based inference approach is used to reason about the latent knowledge and semantics in the ontology. The Jena framework (https://jena.apache.org/, accessed on 12 October 2022) includes a rule-based inference engine with its own rule language for writing if/then rules, which are usually stored in strings or text files. Rules take the following form:
[Rule: (Triple1), (Triple 2), ..., (Triple m) → (Triple m+1)]
where the first m triples are the premises of the rule, the (m + 1)-th triple is its conclusion, and each triple has the form (subject, predicate, object), in which the subject and object are generally concepts and the predicate is generally a property (association).
Rule-based reasoning is used to derive these implicit facts and make them available to the computer for the subsequent steps. To infer all implicit relations in the graph, the following four inference rules are imported: transitivity of rdfs:subClassOf, inheritance of rdf:type along subclasses, symmetry of equivalence, and inheritance of rdf:type across equivalent classes (assuming the current ontology namespace is http://www.semanticweb.org/Sample#).
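@prefix sample: <http://www.semanticweb.org/Sample#>.
# Rule1: rdfs:subClassOf is transitive
[Rule1: (?classA rdfs:subClassOf ?classB) (?classB rdfs:subClassOf ?classC)
    -> (?classA rdfs:subClassOf ?classC)]
# Rule2: an instance of a class is also an instance of its superclasses
[Rule2: (?instance rdf:type ?classA) (?classA rdfs:subClassOf ?classB)
    -> (?instance rdf:type ?classB)]
# Rule3: sample:equivalentTo is symmetric
[Rule3: (?classA sample:equivalentTo ?classB)
    -> (?classB sample:equivalentTo ?classA)]
# Rule4: an instance of a class is also an instance of its equivalent classes
[Rule4: (?instance rdf:type ?classA) (?classA sample:equivalentTo ?classB)
    -> (?instance rdf:type ?classB)]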
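The effect of these four rules can be reproduced with a naive forward-chaining loop in Python that applies Rule1 to Rule4 repeatedly until a fixed point is reached. This is an illustrative sketch over string triples, not Jena’s rule engine; the class names and the specimen instance are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
EQUIV = "sample:equivalentTo"


def forward_chain(triples):
    """Apply Rule1-Rule4 repeatedly until no new triple is derived."""
    facts = set(triples)
    while True:
        derived = set()
        for s, p, o in facts:
            if p == EQUIV:
                derived.add((o, EQUIV, s))                  # Rule3
            for s2, p2, o2 in facts:
                if s2 != o:
                    continue                                # premises must chain on o
                if p == SUBCLASS and p2 == SUBCLASS:
                    derived.add((s, SUBCLASS, o2))          # Rule1
                elif p == RDF_TYPE and p2 in (SUBCLASS, EQUIV):
                    derived.add((s, RDF_TYPE, o2))          # Rule2 / Rule4
        derived -= facts
        if not derived:                                     # fixed point reached
            return facts
        facts |= derived


FACTS = {
    ("sample:HornblendeAndesite", SUBCLASS, "sample:NeutralVolcanicRock"),
    ("sample:NeutralVolcanicRock", SUBCLASS, "sample:VolcanicRock"),
    ("sample:VolcanicRock", EQUIV, "sample:ExtrusiveRock"),
    ("sample:specimen42", RDF_TYPE, "sample:HornblendeAndesite"),
}

if __name__ == "__main__":
    for triple in sorted(forward_chain(FACTS) - FACTS):
        print(triple)
```

The specimen asserted only as a hornblende andesite ends up typed as a neutral volcanic rock, a volcanic rock, and (through equivalence) an extrusive rock, which illustrates how the materialized facts grow.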
In this study, we also defined inference rule templates for different requirements to achieve concept scaling and semantic expansion; the main rules are shown in Table 3.

5.3. Search Results

Figure 14 illustrates the semantic retrieval process in detail. The process consists of two interrelated parts: (1) using the ontology itself as a knowledge base and database, the concepts and attribute contents of the ontology are retrieved to obtain the target concept instances, achieving knowledge retrieval and discovery; (2) association relationships are first established between concept instances and semantically annotated data resources (i.e., semantic indexing); the semantically associated concepts and the reasoning capability of the ontology are then used to extract the semantics of the retrieval conditions, and after SPARQL query expansion, the references to data resources in the semantic index are obtained, achieving semantic retrieval and discovery of data resources.
After the ontology inference model is obtained by reasoning with the above rules, SPARQL queries are executed against it; the results are returned in JSON format and displayed on the page through front-end logic.
Once the ontology and data have been integrated, all relationships are stored in RDF format. Efficient content retrieval then requires querying specific data from the large RDF dataset according to specified conditions. For this purpose, the World Wide Web Consortium (W3C) has defined SPARQL, a query language and protocol for RDF data. Fundamentally, a SPARQL query describes variables and their interrelationships, forming a graph pattern with variables, and SPARQL evaluation is based on graph pattern matching. A basic graph pattern typically comprises a set of triple patterns. A triple pattern resembles an RDF triple and consists of a subject, predicate, and object; unlike an RDF triple, its nodes may be variables in addition to URIs, blank nodes, and literals. As illustrated in Figure 15a, an RDF graph consists of various triples; a graph pattern can likewise be depicted as a directed graph (Figure 15b), in which each triple pattern is an edge (Figure 15c). The triple patterns within a graph pattern are usually connected through shared variables, so a basic graph pattern forms a connected directed graph. During matching, the basic graph pattern is matched against a subgraph of the RDF dataset, and each triple pattern is matched against a triple within that subgraph.
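Basic graph pattern matching can be sketched in Python: each triple pattern is matched against every triple, and partial solutions are joined on shared variables. This is a naive nested-loop evaluation for illustration only (no indexes, join ordering, or blank-node handling, all of which real triple stores provide); the `ex:` individuals and the `geo:locatedIn` predicate are hypothetical.

```python
def is_var(term: str) -> bool:
    return term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None       # variable already bound to another value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None           # constant (URI or literal) mismatch
    return extended


def match_bgp(patterns, triples):
    """Evaluate a basic graph pattern by joining triple patterns on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


TRIPLES = [
    ("ex:s1", "rdf:type", "geo:Andesite"),
    ("ex:s1", "geo:locatedIn", "ex:Ruoqiang"),
    ("ex:s2", "rdf:type", "geo:Basalt"),
    ("ex:s2", "geo:locatedIn", "ex:Pishan"),
]
PATTERN = [("?s", "rdf:type", "geo:Andesite"), ("?s", "geo:locatedIn", "?place")]

if __name__ == "__main__":
    print(match_bgp(PATTERN, TRIPLES))
```

The shared variable `?s` is what turns the two triple patterns into one connected pattern: only ex:s1 satisfies both edges.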
In this study, semantic and knowledge-based queries essentially improve recall by expanding the keywords through the ontology, constructing graph patterns with the expanded terms, and combining the results of the expanded graph pattern queries. For example, with “neutral volcanic rocks” (i.e., intermediate volcanic rocks) as the query keyword, the initial SPARQL query is
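# Class names are written without spaces, since prefixed names cannot contain them
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX geo: <http://www.semanticweb.org/Geology#>
SELECT ?instance
WHERE {
     ?instance rdf:type geo:NeutralVolcanicRocks .
}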
The first step is to scale and semantically extend the concept “neutral volcanic rocks” using the definitions and inferred relationships in the ontology; the system therefore first runs the following query to find the subclasses of “neutral volcanic rocks”.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo: <http://www.semanticweb.org/Geology#>
SELECT ?className
WHERE {
     ?className rdfs:subClassOf geo:NeutralVolcanicRocks .
}
After conceptual scaling, we find that ash andesite, basaltic trachyandesite, and hornblende andesite all belong to the category of neutral volcanic rocks in terms of both semantics and geological disciplinary relationships; the actual SPARQL query is therefore extended to
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX geo: <http://www.semanticweb.org/Geology#>
SELECT DISTINCT ?instance
WHERE {
     { ?instance rdf:type geo:NeutralVolcanicRocks }
     UNION { ?instance rdf:type geo:AshAndesite }
     UNION { ?instance rdf:type geo:BasalticTrachyandesite }
     UNION { ?instance rdf:type geo:HornblendeAndesite }
}
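An expanded query of this shape can be generated programmatically once the subclasses have been retrieved. The helper below is a hypothetical sketch that assembles one UNION branch per class; it validates class local names with a deliberately strict pattern (an assumption, narrower than what SPARQL allows) so that arbitrary text cannot be spliced into the query string.

```python
from __future__ import annotations

import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
_LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_expanded_query(namespace: str, classes: list[str]) -> str:
    """Build a SELECT query with one UNION branch per (expanded) class."""
    if not classes:
        raise ValueError("at least one class is required")
    for name in classes:
        if not _LOCAL_NAME.fullmatch(name):
            raise ValueError(f"unsafe class name: {name!r}")
    branches = "\n     UNION ".join(
        "{ ?instance rdf:type geo:%s }" % name for name in classes)
    return (
        f"PREFIX rdf: <{RDF_NS}>\n"
        f"PREFIX geo: <{namespace}>\n"
        "SELECT DISTINCT ?instance\n"
        "WHERE {\n"
        f"     {branches}\n"
        "}"
    )


if __name__ == "__main__":
    print(build_expanded_query(
        "http://www.semanticweb.org/Geology#",
        ["NeutralVolcanicRocks", "AshAndesite",
         "BasalticTrachyandesite", "HornblendeAndesite"]))
```

A SPARQL `VALUES ?cls { ... }` block combined with a single `?instance rdf:type ?cls` pattern would be an equivalent, more compact alternative to the UNION chain.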
Hence, query results are expanded by amplifying the search terms. Rule-based query answering, which relies on inference, is typically used to augment the search terms (see Table 4). In general, the dataset to be queried consists of a set of facts and a set of rules: the facts describe class relationships, and the rules express the underlying semantics. Queries are usually described as atomic formulas with variables, and the query result comprises all variable bindings derivable from the facts and rules. The crux of rule-based query answering is discovering latent facts that are not explicitly stated, which requires inference.
Two common techniques for deriving implicit facts are forward chaining and backward chaining. Forward chaining uses the available facts and rules to infer all derivable facts, turning implicit facts into explicit ones, so that complete variable bindings can be obtained directly when a query arrives. Its principle is simple: rules are applied iteratively to the existing facts to derive new facts until no new facts emerge. However, even a single run of forward chaining can be computationally expensive, and although all queries can be answered quickly once it has finished, the approach has two further drawbacks. First, the fact base may grow substantially after forward chaining, increasing storage requirements. Second, in practice, forward chaining cannot be run only once: as the knowledge base is updated, with new facts added and old facts deleted, it may need to be rerun many times, which is a considerable burden for large volumes of data. Backward chaining avoids this materialization by starting from the query goal and applying rules in reverse only as needed, at the cost of more work at query time.
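Backward chaining can be sketched for the common goal “find all instances of class C” under the four rules imported earlier (Rule1 to Rule4). Starting from the goal class, the sketch works backwards to every class whose instances would prove the goal, then collects the asserted instances of those classes; nothing is materialized. The facts are hypothetical, and this is an illustration rather than Jena’s backward engine.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
EQUIV = "sample:equivalentTo"


def instances_of(target, facts):
    """Answer the goal (?x rdf:type target) by chaining Rule1-Rule4 backwards."""
    proving = {target}        # classes whose instances satisfy the goal
    pending = [target]
    while pending:
        cls = pending.pop()
        for s, p, o in facts:
            if p in (SUBCLASS, EQUIV) and o == cls:
                candidate = s     # Rule2 (transitively via Rule1) or Rule4
            elif p == EQUIV and s == cls:
                candidate = o     # Rule4 combined with Rule3 (symmetry)
            else:
                continue
            if candidate not in proving:
                proving.add(candidate)
                pending.append(candidate)
    return {s for s, p, o in facts if p == RDF_TYPE and o in proving}


FACTS = {
    ("sample:HornblendeAndesite", SUBCLASS, "sample:NeutralVolcanicRock"),
    ("sample:NeutralVolcanicRock", SUBCLASS, "sample:VolcanicRock"),
    ("sample:VolcanicRock", EQUIV, "sample:ExtrusiveRock"),
    ("sample:specimen42", RDF_TYPE, "sample:HornblendeAndesite"),
}

if __name__ == "__main__":
    print(instances_of("sample:ExtrusiveRock", FACTS))  # {'sample:specimen42'}
```

Unlike forward chaining, the fact set is left unchanged, so updates to the knowledge base take effect at the next query without rerunning a full materialization.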

5.4. Matching Evaluation of Search Results

Using the “Bashkurgan copper mine” as an example, spatiotemporal and topic association search based on the geological ontology is performed by constructing the set of content associated with the “Bashkurgan copper mine” and retrieving the corresponding information. First, “Bashkurgan” and “copper mine” are obtained by word segmentation; the ontology is then used for association reasoning to obtain the corresponding association items, including spatial associations for Bashkurgan, semantic associations for copper mine, and rule associations for the corresponding deposit prediction and evaluation model. These association items form an association content set, which is submitted to a search engine to obtain the associated data in the database. For example, the spatial associations include “Arjinshan Group” and “Taxi Darshan Group” (topological associations), “Jiwei”, “Ruqiang”, and “Pishan” (adjacency associations), and “Xinjiang Arjinshan North Slope” (containment associations); the semantic associations include “Xinjiang Arjinshan”; and the rule associations include “silver deposits” and “copper-gold-silver” from the mineral deposit prediction and evaluation models. All of these associations are derived from the concept “Bashkurgan copper mine”. Examples of specific associations are shown in Figure 16.
When searching for “Bashkurgan copper mine” (25 relevant items were confirmed manually), the results were as follows. (1) The operating system’s built-in file name search returned 20 items, a completion rate of only 60%, and requires local operation on the computer. (2) The traditional keyword matching search (which searches metadata) returned 14 items through the search system, including 2 irrelevant items. These comprised 5 name-related items, such as “3D geological geochemical features and mineralization prediction”, and 7 metadata-related items, such as “geochemical anomalies” (whose metadata are annotated with Bashkurgan), giving an improved completion rate of 72.9%. (3) In addition to the 11 items describing the “Bashkurgan copper mine”, the inference-based search also returned data associated with phrases such as “Bashkurgan copper mine zone copper mine”, “Silver deposit prediction and evaluation model”, and “Copper”. The association search also guides the user toward data on associated deposits. The recall and precision of this search are 96% and 80%, respectively.
Examples from mineral and mineral geology were selected separately for comparative experiments between keyword-based and geological ontology-based search, and the statistical results are shown in Table 5.
Comparing the two search methods leads to the following conclusions. The search method based on the geological data ontology outperforms the string matching-based method in terms of data accuracy and completeness. Semantically annotated data provide more association information and improve the efficiency and feasibility of data mining. In addition, the ontology-based method can intelligently recommend data associated with the intended information, providing heuristic services that promote the sharing and reuse of geological data and reveal the potential value of existing data.

5.5. Validation: Result Evaluation

In this study, the precision, recall, and response time of the Lucene full-text index-based search and of the proposed semantic-based search were tested separately. For precision and recall, “volcanic rocks”, “metamorphic rocks”, and “sedimentary rocks” were used as search terms; the detailed test data are shown in Table 6.
Precision and recall are the two main indicators for evaluating the retrieval quality of a retrieval system. To present the data more intuitively, the key data in Table 6 are shown as histograms in Figure 17.
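For reference, precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The minimal sketch below computes both from sets of item ids; the id values are illustrative only. As a check, the counts below are the ones implied by the 96% recall and 80% precision reported for the “Bashkurgan copper mine” query over 25 relevant items (24 hits out of 30 retrieved items).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for collections of retrieved and relevant item ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative ids: 25 relevant items; 30 retrieved, of which 24 are relevant.
relevant_ids = set(range(25))
retrieved_ids = set(range(24)) | set(range(100, 106))
p, r = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={p:.2f}, recall={r:.2f}")
```

Query expansion tends to enlarge `retrieved`, which can raise `hits` (recall) while also admitting more irrelevant items (lower precision); this is the trade-off discussed for Figure 17.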
As depicted in Figure 17, the proposed semantic-based retrieval approach yields a notable improvement in the recall of retrieval results compared with Lucene full-text search, and the improvements are statistically significant. For precision, however, the effect of the proposed method is comparatively limited, and in certain test cases precision actually decreases. Analysis of the data in Table 6 provides the following insights.
Recall analysis: Before the search is executed, the semantic-based query undergoes semantic expansion, which extends the set of query terms. Compared with Lucene’s keyword-matching retrieval, the semantic-based approach therefore matches a broader range of terms, and its results are more comprehensive, in line with the objectives of this study. The specific data show that more fragments are retrieved per query, and the larger number of returned fragments raises the likelihood that relevant fragments are included, which increases recall.
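The expansion step can be illustrated with a small sketch. It assumes a toy ontology fragment held in dictionaries; the labels (“extrusive rock”, “volcanic eruption”, the subclass list) and the function names `expand_query` and `match` are hypothetical, and the real system performs expansion through ontology reasoning rather than Python lookups. The sketch shows why expansion raises recall: a document mentioning only “basalt” is missed by the literal term but caught once subclasses are added.

```python
from collections import deque

# Hypothetical ontology fragment (illustrative labels, not the authors' ontology).
SUBCLASSES = {  # concept -> direct subclasses
    "volcanic rock": ["basalt", "andesite", "rhyolite"],
    "basalt": ["alkali basalt"],
}
EQUIVALENTS = {"volcanic rock": ["extrusive rock"]}
RELATED = {"volcanic rock": ["volcanic eruption"]}


def expand_query(term, max_depth=2):
    """Return the term plus its equivalent, related, and subclass terms.

    Subclasses are collected breadth-first down to max_depth levels.
    """
    expanded = {term}
    expanded.update(EQUIVALENTS.get(term, []))
    expanded.update(RELATED.get(term, []))
    visited = {term}
    queue = deque([(term, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for child in SUBCLASSES.get(node, []):
            if child not in visited:
                visited.add(child)
                expanded.add(child)
                queue.append((child, depth + 1))
    return expanded


def match(documents, terms):
    """Return ids of documents whose text contains any of the terms."""
    return {doc_id for doc_id, text in documents.items()
            if any(t in text for t in terms)}


docs = {"r1": "report on volcanic rock outcrops",
        "r2": "basalt flow mapping",
        "r3": "sandstone sequence"}
print(match(docs, {"volcanic rock"}))                 # literal term only
print(match(docs, expand_query("volcanic rock")))     # expanded term set
```

Bounding the subclass walk with `max_depth` is one way to keep the expanded set, and hence the loss of precision, under control.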
Precision analysis: Lucene answers queries using the index it has built. Search terms are tokenized into individual words and otherwise processed, so the query results may not match the specified keywords exactly. For instance, when querying “China” with Lucene, the results may include instances containing terms such as “Chinese”, “people”, and “Chinese people”. Although this approach returns more results, it also increases the likelihood of retrieving irrelevant results, which reduces precision. In the semantic-based query process, by contrast, search terms are first segmented into individual words; each segmented word is then matched exactly against the ontology, and only words that match successfully are expanded within the ontology. As shown in Table 6, precision improves slightly for the search terms “metamorphic rock” and “sedimentary rock”. This is because the terms produced by segmentation, namely “metamorphic”, “metamorphic rock”, “sediment”, and “sedimentary rock”, are the only ones forwarded to the ontology, which benefits the overall search results. For the search term “volcanic rocks”, however, segmentation yields “volcanoes” and “volcanic rocks”, both of which exist as terms in the ontology. Consequently, “volcanoes” is also semantically expanded, which can bring in results about volcanoes rather than volcanic rocks and thereby lower precision for this query.
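The segmentation-then-exact-match gate described above can be sketched as follows. The sketch assumes Chinese-language queries, since the behaviour described (“volcanic rocks” also yielding “volcanoes”) is typical of fine-grained Chinese word segmentation; the dictionary, the ontology label set, and the exhaustive-substring segmentation strategy are hypothetical stand-ins, not the segmenter or ontology used in the study. Segments that are not ontology labels (here 中国, “China”) are dropped, while both 火山 (“volcano”) and 火山岩 (“volcanic rock”) pass the gate and would both be expanded.

```python
# Hypothetical segmentation dictionary; English glosses:
# 中国 China, 火山 volcano, 火山岩 volcanic rock, 变质 metamorphic, 变质岩 metamorphic rock
DICTIONARY = {"中国", "火山", "火山岩", "变质", "变质岩"}
# Hypothetical set of labels that exist as ontology terms (中国 is deliberately absent).
ONTOLOGY_LABELS = {"火山", "火山岩", "变质", "变质岩"}


def segment_all(text, dictionary, max_len=4):
    """Fine-grained segmentation: every substring (up to max_len chars) found in the dictionary."""
    terms = []
    for i in range(len(text)):
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            piece = text[i:j]
            if piece in dictionary and piece not in terms:
                terms.append(piece)
    return terms


def terms_to_expand(query):
    """Keep only segments that exactly match an ontology label; only these are expanded."""
    return [t for t in segment_all(query, DICTIONARY) if t in ONTOLOGY_LABELS]


print(segment_all("中国火山岩", DICTIONARY))  # all dictionary segments
print(terms_to_expand("中国火山岩"))          # segments forwarded to the ontology
```

A stricter variant could forward only the longest matching segment (火山岩) to avoid the extra “volcano” expansion, at the cost of some recall.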

6. Conclusions and Future Work

In the big data era, data retrieval systems play a critical role in the accurate discovery, comprehensive analysis, and effective sharing of geological data. However, existing keyword-based retrieval approaches have several limitations, including a laborious information acquisition process, suboptimal retrieval efficiency, and inadequate support for personalization. To address these challenges, this paper presents an information retrieval framework for geological data that incorporates spatiotemporal and topic multi-feature associations. By defining geological concepts, attributes, relationships, rules, and corresponding instances, we propose an approach for constructing a more comprehensive foundational geological ontology. Building on this ontology, we further propose an information extraction and multi-feature association data indexing model guided by domain knowledge. To validate the approach, we conducted a case study and experiments. The results show that integrating the geological ontology significantly improves data retrieval in terms of completeness, precision, and intelligent sharing of associated data.
Future research will focus on the following directions: (1) conducting extensive experiments to validate the proposed framework on a broader range of geological data, while refining and optimizing the framework; and (2) developing methods that support real-time querying and feedback when retrieving very large volumes of geological data.

Author Contributions

Experiment proposal, Qinjun Qiu and Zhong Xie; funding acquisition, Qinjun Qiu, Kai Ma and Liufeng Tao; preliminary research, Miao Tian and Zhenyang Hui; data collection, Miao Tian, Shuai Zheng and Junjie Liu; experimental design and analysis, Miao Tian and Qinjun Qiu; writing the original manuscript, Miao Tian; writing—review and editing, Miao Tian and Qinjun Qiu. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by the Open Fund of the Key Laboratory of Mine Environmental Monitoring and Improving around Poyang Lake of Ministry of Natural Resources (No. MEMI-2021-2022-06), the Opening Fund of the Key Laboratory of Geological Survey and Evaluation of Ministry of Education (No. GLAB 2023ZR01), the Fundamental Research Funds for the Central Universities, and the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2022-07-014).

Data Availability Statement

Data are contained within the article.

Acknowledgments

This study was financially supported by the Open Fund of the Key Laboratory of Mine Environmental Monitoring and Improving around Poyang Lake of Ministry of Natural Resources (No. MEMI-2021-2022-06), the Open Fund of the Key Laboratory of Geological Survey and Evaluation of the Ministry of Education (No. GLAB 2023ZR01), the Fundamental Research Funds for the Central Universities, and the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2022-07-014).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wu, L.; Xue, L.; Li, C.; Lv, X.; Chen, Z.; Jiang, B.; Guo, M.; Xie, Z. A Knowledge-Driven Geospatially Enabled Framework for Geological Big Data. ISPRS Int. J. Geo-Inf. 2017, 6, 166.
  2. Qiu, Q.; Xie, Z.; Wu, L. A cyclic self-learning Chinese word segmentation for the geoscience domain. Geomatica 2018, 72, 16–26.
  3. Qiu, Q.; Xie, Z.; Wu, L.; Li, W. DGeoSegmenter: A dictionary-based Chinese word segmenter for the geoscience domain. Comput. Geosci. 2018, 121, 1–11.
  4. Wang, B.; Wu, L.; Li, W.; Qiu, Q.; Xie, Z.; Liu, H.; Zhou, Y. A semi-automatic approach for generating geological profiles by integrating multi-source data. Ore Geol. Rev. 2021, 134, 104190.
  5. Guo, H. Big Earth data: A new frontier in Earth and information sciences. Big Earth Data 2017, 1, 4–20.
  6. Zhang, W.; Ching, J.; Goh, A.T.; Leung, A.Y. Big data and machine learning in geoscience and geoengineering: Introduction. Geosci. Front. 2020, 12, 327–329.
  7. Zhou, C.; Wang, H.; Wang, C.; Hou, Z.; Zheng, Z.; Shen, S.; Cheng, Q.; Feng, Z.; Wang, X.; Lv, H.; et al. Geoscience knowledge graph in the big data era. Sci. China Earth Sci. 2021, 64, 1105–1114.
  8. Qiu, Q.; Xie, Z.; Wu, L.; Tao, L.; Li, W. BiLSTM-CRF for geological named entity recognition from the geoscience literature. Earth Sci. Inform. 2019, 12, 565–579.
  9. Qiu, Q.; Xie, Z.; Wu, L.; Li, W. Geoscience keyphrase extraction algorithm using enhanced word embedding. Expert Syst. Appl. 2019, 125, 157–169.
  10. Qiu, Q.; Xie, Z.; Wu, L.; Tao, L. GNER: A Generative Model for Geological Named Entity Recognition without Labeled Data Using Deep Learning. Earth Space Sci. 2019, 6, 931–946.
  11. Li, W.; Ma, K.; Qiu, Q.; Wu, L.; Xie, Z.; Li, S.; Chen, S. Chinese Word Segmentation Based on Self-Learning Model and Geological Knowledge for the Geoscience Domain. Earth Space Sci. 2021, 8, e2021EA001673.
  12. Ma, K.; Tian, M.; Tan, Y.; Xie, X.; Qiu, Q. What is this article about? Generative summarization with the BERT model in the geosciences domain. Earth Sci. Inform. 2021, 15, 21–36.
  13. Holden, E.-J.; Liu, W.; Horrocks, T.; Wang, R.; Wedge, D.; Duuring, P.; Beardsmore, T. GeoDocA—Fast analysis of geological content in mineral exploration reports: A text mining approach. Ore Geol. Rev. 2019, 111, 102919.
  14. Enkhsaikhan, M.; Holden, E.-J.; Duuring, P.; Liu, W. Understanding ore-forming conditions using machine reading of text. Ore Geol. Rev. 2021, 135, 104200.
  15. Qiu, Q.; Tian, M.; Ma, K.; Tan, Y.J.; Tao, L.; Xie, Z. A question answering system based on mineral exploration ontology generation: A deep learning methodology. Ore Geol. Rev. 2023, 153, 105294.
  16. Li, W.; Wu, L.; Xie, Z.; Tao, L.; Zou, K.; Li, F.; Miao, J. Ontology-based question understanding with the constraint of Spatio-temporal geological knowledge. Earth Sci. Inform. 2019, 12, 599–613.
  17. Qiu, Q.; Xie, Z.; Wu, L.; Tao, L. Automatic spatiotemporal and semantic information extraction from unstructured geoscience reports using text mining techniques. Earth Sci. Inform. 2020, 13, 1393–1410.
  18. Ma, X. Knowledge graph construction and application in geosciences: A review. Comput. Geosci. 2022, 161, 105082.
  19. Wang, C.; Hazen, R.M.; Cheng, Q.; Stephenson, M.H.; Zhou, C.; Fox, P.; Shen, S.-Z.; Oberhänsli, R.; Hou, Z.; Ma, X.; et al. The Deep-Time Digital Earth program: Data-driven discovery in geosciences. Natl. Sci. Rev. 2021, 8, nwab027.
  20. Ma, X.; Ma, C.; Wang, C. A new structure for representing and tracking version information in a deep time knowledge graph. Comput. Geosci. 2020, 145, 104620.
  21. Wang, B.; Ma, K.; Wu, L.; Qiu, Q.; Xie, Z.; Tao, L. Visual analytics and information extraction of geological content for text-based mineral exploration reports. Ore Geol. Rev. 2022, 144, 104818.
  22. Qiu, Q.; Wang, B.; Ma, K.; Xie, Z. Geological profile-text information association model of mineral exploration reports for fast analysis of geological content. Ore Geol. Rev. 2022, 153, 105278.
  23. Perrin, M.; Mastella, L.S.; Morel, O.; Lorenzatti, A. Geological time formalization: An improved formal model for describing time successions and their correlation. Earth Sci. Inform. 2011, 4, 81–96.
  24. Ma, X.; Carranza, E.J.M.; Wu, C.; van der Meer, F.D. Ontology-aided annotation, visualization, and generalization of geological time-scale information from online geological map services. Comput. Geosci. 2012, 40, 107–119.
  25. Hwang, J.; Nam, K.W.; Ryu, K.H. Designing and implementing a geologic information system using a spatiotemporal ontology model for a geologic map of Korea. Comput. Geosci. 2012, 48, 173–186.
  26. Wu, L.; Xue, L.; Li, C.; Lv, X.; Chen, Z.; Guo, M.; Xie, Z. A Geospatial Information Grid Framework for Geological Survey. PLoS ONE 2015, 10, e0145312.
  27. Borges, K.A.V.; Davis, C.A.; Laender, A.H.F.; Medeiros, C.B. Ontology-driven discovery of geospatial evidence in web pages. GeoInformatica 2011, 15, 609–631.
  28. Kergosien, E.; Laval, B.; Roche, M.; Teisseire, M. Are Opinions Expressed in Land-Use Planning Documents? Int. J. Geogr. Inf. Sci. 2014, 28, 739–762.
  29. Ballatore, A.; Bertolotto, M.; Wilson, D.C. An evaluative baseline for geo-semantic relatedness and similarity. GeoInformatica 2014, 18, 747–767.
  30. Wang, W.; Stewart, K. Spatiotemporal and semantic information extraction from Web news reports about natural hazards. Comput. Environ. Urban Syst. 2015, 50, 30–40.
  31. Mata-Rivera, F.; Torres-Ruiz, M.; Guzmán, G.; Moreno-Ibarra, M.; Quintero, R. A collaborative learning approach for geographic information retrieval based on social networks. Comput. Hum. Behav. 2015, 51, 829–842.
  32. Ke, S.; Gong, J.; Li, S.; Zhu, Q.; Liu, X.; Zhang, Y. A Hybrid Spatio-Temporal Data Indexing Method for Trajectory Databases. Sensors 2014, 14, 12990–13005.
  33. Wang, J.; Wu, S.; Gao, H.; Li, J.; Ooi, B.C. Indexing Multi-Dimensional Data in a Cloud System. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, Indianapolis, IN, USA, 6–10 June 2010; ACM: New York, NY, USA, 2010; pp. 591–602.
  34. Dittrich, J.; Quiané-Ruiz, J.-A.; Richter, S.; Schuh, S.; Jindal, A.; Schad, J. Only aggressive elephants are fast elephants. Proc. VLDB Endow. 2012, 5, 1591–1602.
  35. Wang, J.; Liu, W.; Kumar, S.; Chang, S.-F. Learning to Hash for Indexing Big Data—A Survey. Proc. IEEE 2016, 104, 34–57.
  36. Kiryakov, A.; Popov, B.; Terziev, I.; Manov, D.; Ognyanoff, D. Semantic annotation, indexing, and retrieval. J. Web Semant. 2004, 2, 49–79.
  37. Klien, E.; Lutz, M.; Kuhn, W. Ontology-based discovery of geographic information services—An application in disaster management. Comput. Environ. Urban Syst. 2006, 30, 102–123.
  38. Lutz, M.; Klien, E. Ontology-based retrieval of geographic information. Int. J. Geogr. Inf. Sci. 2006, 20, 233–260.
  39. Gui, Z.; Yang, C.; Xia, J.; Liu, K.; Xu, C.; Li, J.; Lostritto, P. A performance, semantic and service quality-enhanced distributed search engine for improving geospatial resource discovery. Int. J. Geogr. Inf. Sci. 2013, 27, 1109–1132.
  40. Guo, M. The Application of Ontology in Semantic Discovery for GeoData Web Service. Commun. Netw. 2013, 5, 678–680.
  41. Han, L.S.; Finin, T.; Joshi, A. Schema-Free structured querying of DBpedia data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012), Maui, HI, USA, 29 October–2 November 2012; pp. 2090–2093.
  42. Rubin, D.L.; Flanders, A.; Kim, W.; Siddiqui, K.M.; Kahn, C.E. Ontology-Assisted Analysis of Web Queries to Determine the Knowledge Radiologists Seek. J. Digit. Imaging 2011, 24, 160–164.
  43. Zhuhadar, L.; Nasraoui, O.; Wyatt, R. Visual Ontology-Based Information Retrieval System. In Proceedings of the 2009 13th International Conference Information Visualisation, Barcelona, Spain, 15–17 July 2009; pp. 419–426.
  44. Zhuhadar, L.; Nasraoui, O.; Wyatt, R.; Romero, E. Multi-Language ontology-based search engine. In Proceedings of the 2010 Third International Conference on Advances in Computer-Human Interactions (ACHI 2010), Saint Maarten, Netherlands Antilles, 10–15 February 2010; pp. 13–18.
  45. Fernández, M.; Cantador, I.; López, V.; Vallet, D.; Castells, P.; Motta, E. Semantically enhanced Information Retrieval: An ontology-based approach. Web Semant. Sci. Serv. Agents World Wide Web 2011, 9, 434–452.
  46. Allocca, C.; D’aquin, M.; Motta, E. Impact of using relationships between ontologies to enhance the ontology search results. In Proceedings of the 9th International Conference on The Semantic Web: Research and Applications, Crete, Greece, 27–31 May 2012; pp. 453–468.
  47. Yoo, D. Hybrid query processing for personalized information retrieval on the Semantic Web. Knowl. Based Syst. 2012, 27, 211–218.
  48. Kallipolitis, L.; Karpis, V.; Karali, I. Semantic search in the World News domain using automatically extracted metadata files. Knowl.-Based Syst. 2012, 27, 38–50.
  49. Hourali, M.; Montazer, G.A. An Intelligent Information Retrieval Approach Based on Two Degrees of Uncertainty Fuzzy Ontology. Adv. Fuzzy Syst. 2011, 2011, 7.
  50. Lim, S.C.J.; Liu, Y.; Lee, W.B. Multi-facet product information search and retrieval using semantically annotated product family ontology. Inf. Process. Manag. 2010, 46, 479–493.
  51. Wiegand, N.; García, C. A Task-Based Ontology Approach to Automate Geospatial Data Retrieval. Trans. GIS 2007, 11, 355–376.
  52. Sun, K.; Zhu, Y.; Pan, P.; Hou, Z.; Wang, D.; Li, W.; Song, J. Geospatial data ontology: The semantic foundation of geospatial data integration and sharing. Big Earth Data 2019, 3, 269–296.
  53. Liu, J.; Liu, H.; Chen, X.; Guo, X.; Zhao, Q.; Li, J.; Kang, L.; Liu, J. A Heterogeneous Geospatial Data Retrieval Method Using Knowledge Graph. Sustainability 2021, 13, 2005.
  54. Lv, X.; Xie, Z.; Xu, D.; Jin, X.; Ma, K.; Tao, L.; Qiu, Q.; Pan, Y. Chinese Named Entity Recognition in the Geoscience Domain Based on BERT. Earth Space Sci. 2022, 9, e2021EA002166.
  55. Zhang, S.; Boukamp, F.; Teizer, J. Ontology-based semantic modeling of construction safety knowledge: Towards automated safety planning for job hazard analysis (JHA). Autom. Constr. 2015, 52, 29–41.
  56. Musen, M.A. The protégé project: A look back and a look forward. AI Matters 2015, 1, 4–12.
  57. Garcia, L.F.; Abel, M.; Perrin, M.; Alvarenga, R.d.S. The GeoCore ontology: A core ontology for general use in Geology. Comput. Geosci. 2019, 135, 104387.
  58. Arp, R.; Smith, B.; Spear, A.D. Building Ontologies with Basic Formal Ontology; MIT Press: Cambridge, MA, USA, 2015.
  59. Mantovani, A.; Piana, F.; Lombardo, V. Ontology-driven representation of knowledge for geological maps. Comput. Geosci. 2020, 139, 104446.
  60. Li, L.; Liu, Y.; Zhu, H.; Ying, S.; Luo, Q.; Luo, H.; Kuai, X.; Xia, H.; Shen, H. A bibliometric and visual analysis of global geo-ontology research. Comput. Geosci. 2017, 99, 1–8.
  61. Andrés, S.; Arvor, D.; Mougenot, I.; Libourel, T.; Durieux, L. Ontology-based classification of remote sensing images using spectral rules. Comput. Geosci. 2017, 102, 158–166.
  62. Wang, C.; Ma, X.; Chen, J. Ontology-driven data integration and visualization for exploring regional geologic time and paleontological information. Comput. Geosci. 2018, 115, 12–19.
  63. Niles, I.; Pease, A. Towards a standard upper ontology. In Proceedings of the International Conference on Formal Ontology in Information Systems-Volume 2001, Ogunquit, ME, USA, 17–19 October 2001; pp. 2–9.
  64. Gangemi, A.; Guarino, N.; Masolo, C.; Oltramari, A.; Schneider, L. Sweetening ontologies with DOLCE. In Proceedings of the International Conference on Knowledge Engineering and Knowledge Management, Sigüenza, Spain, 1–4 October 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 166–181.
  65. Partridge, C.; Stefanova, M. Building a Foundation for Ontologies of Organizations. In The Ontology and Modelling of Real Estate Transactions; Routledge: London, UK, 2003; pp. 141–149.
  66. Guizzardi, G. Ontological Foundations for Structural Conceptual Models. Ph.D. Thesis, University of Twente, Enschede, The Netherlands, 2005.
  67. Herre, H. General Formal Ontology (GFO): A foundational ontology for conceptual modelling. In Theory and Applications of Ontology: Computer Applications; Springer: Dordrecht, The Netherlands, 2010; pp. 297–345.
  68. Raskin, R.G.; Pan, M.J. Knowledge representation in the semantic web for Earth and environmental terminology (SWEET). Comput. Geosci. 2005, 31, 1119–1125.
  69. Raskin, R. Development of ontologies for earth system science. Geol. Soc. Am. Spec. Pap. 2006, 397, 195–199.
  70. Zhong, J.; Aydina, A.; McGuinness, D.L. Ontology of fractures. J. Struct. Geol. 2009, 31, 251–259.
  71. Ma, X.; Asch, K.; Laxton, J.L.; Richard, S.M.; Asato, C.G.; Carranza, E.J.M.; van der Meer, F.D.; Wu, C.; Duclaux, G.; Wakita, K. Data exchange facilitated. Nat. Geosci. 2011, 4, 814.
  72. Babaie, H.A.; Oldow, J.S.; Babaei, A.; Lallemant, H.G.A.; Watkinson, A.J. Designing a modular architecture for the structural geology ontology. Geoinform. Data Knowl. Geol. Soc. Am. Spec. Pap. 2006, 397, 269–282.
Figure 1. Research tasks in geological ontology development (modified from Zhang et al. [55]). Step 1 defines the purpose and scope of the ontology; Step 2 captures and codes the ontology; Step 3 develops SWRL rules; Step 4 validates and improves the ontology; Step 5 constructs the geological ontology.
Figure 2. System architecture of the ontology-based information retrieval application, which includes an ontology editor, a reasoner, and a rule engine.
Figure 3. Example of the geological ontology, comprising a concept layer and an instance layer.
Figure 4. A partial view of the rock ontology. Conglomerate includes Breccia, Basal conglomerate, Interformation conglomerate, Evaporite-solution breccia, Diagenetic conglomerate, Shore conglomerate, Alluvial conglomerate, Tillite, Catagenetic breccia, Orth conglomerate, Para conglomerate, etc.
Figure 5. A partial view of mineral geology ontology. Iron ore commercial type of mineral deposit includes Damiao_type iron deposit, Panzhihua_type iron deposit, Daye_type iron deposit, Handan_Xingtai_type iron deposit, Luohe_type iron deposit, Jingtieshan_type iron deposit, Maishan_type iron deposit, Fenghuangshan_type iron deposit, Nanshan_type iron deposit, Tiekuangshan_type iron deposit, etc.
Figure 6. A partial view of spatial ontology. In the illustration, different color and line combinations express different semantic relationships, where solid lines indicate strong associations, dashed lines indicate weaker associations, and different colors distinguish different types of associations.
Figure 7. Class relationship diagram of the geological age ontology, where arrows represent kind-of relationships.
Figure 8. A partial view of the geological time ontology, where coloured boxes represent instances.
Figure 9. The hierarchical structure of domain geology. The ontology here is selected from the basic geological ontology constructed for geological data query service to realize semantic query and retrieval of geological data. Arrows represent kind-of relationships.
Figure 10. Topic–spatially integrated query data indexing model. Topic, spatial, and temporal information are extracted from unstructured data, and entity attribute and location information are extracted from structured databases; these terms are combined into a structured knowledge base that serves query retrieval.
Figure 11. A framework for geological data retrieval that takes into account spatial and topic multicorrelations.
Figure 12. Ontology design of the search framework.
Figure 13. The workflow of the data search system.
Figure 14. The semantic retrieval process. (1) The ontology itself is used as a knowledge base and database: the concepts and attribute contents in the ontology are retrieved to obtain the target concept instances, achieving knowledge retrieval and discovery; (2) association relationships are first established between concept instances and semantically annotated data resources; the reasoning capability of the ontology is then used to extract the retrieval conditions semantically, and after SPARQL query extension, semantic retrieval and discovery of data resources are achieved.
Figure 15. SPARQL query pattern. (a) A basic graph of the rock ontology and instances; (b) A directed graph of rock; (c) the triple schema represented as an edge.
Figure 16. Geological ontology object representation—the case of Bashkurgan copper mine.
Figure 17. Comparison of results based on Lucene full-text indexing and semantic-based search.
Table 1. Geology professional participants.
Participant | Years of Experience | Job Title
--- | --- | ---
1 | 20 | Geological information supervisor
2 | 10 | Geological information supervisor
3 | 8 | Geological information supervisor
4 | 25 | Geological engineering supervisor
5 | 26 | Tectonic geologist
6 | 28 | Metallogenic geologist
7 | 16 | Engineering geologist
8 | 17 | Stratigraphic paleontologist
9 | 19 | Geological information supervisor
10 | 20 | Geological information supervisor
Table 2. Main survey results.
Question | Mean | Median | Standard Deviation | Result
--- | --- | --- | --- | ---
Are you familiar with the concepts used in the ontology? | 1.44 | 2 | 0.51 | Very familiar to familiar
Do you think the concepts and relations used in the ontology are representative? | 1.63 | 2 | 0.49 | Representative
How easy was it to understand and navigate through the ontology? | 1.81 | 2 | 0.53 | Easy
Does the ontology cover the main concepts and relations within the geoscience domain? | 1.90 | 2 | 0.51 | Agree
Table 3. Design of inference rules.
Use | Rule Expression
--- | ---
Discovery of ontological superordinate concepts in the geological field | @prefix geo: <http://www.semanticweb.org/Geology#> [Rule1: (?a rdfs:subClassOf ?b) -> (?b geo:broaderClassOf ?a)]
Discovery of ontological subordinate concepts in the geological field | @prefix geo: <http://www.semanticweb.org/Geology#> [Rule1: (?a rdfs:subClassOf ?b) (?b rdfs:subClassOf ?c) -> (?a rdfs:subClassOf ?c)]
Discovery of all equivalent concepts in the geological field ontology | @prefix geo: <http://www.semanticweb.org/Geology#> [Rule1: (?a geo:equivalentTerm ?b) -> (?b geo:equivalentTerm ?a)] [Rule2: (?a geo:equivalentTerm ?b) (?b geo:equivalentTerm ?c) -> (?a geo:equivalentTerm ?c)]
Discovery of all related concepts in the geological field ontology | @prefix geo: <http://www.semanticweb.org/Geology#> [Rule1: (?a geo:relateTerm ?b) -> (?b geo:relateTerm ?a)]
Discovery of ontological sibling concepts in the geological domain | @prefix geo: <http://www.semanticweb.org/Geology#> [Rule1: (?a rdfs:subClassOf ?b) (?c rdfs:subClassOf ?b) -> (?a geo:siblingTerm ?c)]
Discovery of synonyms | @prefix syn: <http://www.semanticweb.org/Synonym#> [Rule1: (?a syn:equivalentTo ?b) -> (?b syn:equivalentTo ?a)] [Rule2: (?a syn:equivalentTo ?b) (?b syn:equivalentTo ?c) -> (?a syn:equivalentTo ?c)]
Discovery of related words | @prefix syn: <http://www.semanticweb.org/Synonym#> [Rule1: (?a syn:relateTo ?b) -> (?b syn:relateTo ?a)] [Rule2: (?a syn:relateTo ?b) (?b syn:relateTo ?c) -> (?a syn:relateTo ?c)]
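To make the behaviour of the Table 3 rules concrete, the sketch below applies them by naive forward chaining over a set of (subject, predicate, object) triples until no new triple is derived. It is an illustration in plain Python, not the rule engine used in the study; the example concepts are hypothetical, and self-pairs from the sibling rule are skipped here, whereas the literal rule would also derive them.

```python
# Predicate groups taken from the rules in Table 3.
SYMMETRIC = {"geo:equivalentTerm", "geo:relateTerm", "syn:equivalentTo", "syn:relateTo"}
TRANSITIVE = {"rdfs:subClassOf", "geo:equivalentTerm", "syn:equivalentTo", "syn:relateTo"}


def forward_chain(triples):
    """Saturate a set of (subject, predicate, object) triples under the Table 3 rules."""
    facts = set(triples)
    while True:
        new = set()
        index = {}     # (subject, predicate) -> objects, used to join transitive rules
        children = {}  # superclass -> subclasses, used by the sibling rule
        for s, p, o in facts:
            index.setdefault((s, p), set()).add(o)
            if p == "rdfs:subClassOf":
                new.add((o, "geo:broaderClassOf", s))  # superordinate concept rule
                children.setdefault(o, set()).add(s)
            if p in SYMMETRIC:
                new.add((o, p, s))  # symmetry rules
        for s, p, o in facts:
            if p in TRANSITIVE:
                for o2 in index.get((o, p), ()):
                    new.add((s, p, o2))  # transitivity rules
        for subs in children.values():
            for a in subs:
                for c in subs:
                    if a != c:  # skip (a siblingTerm a), which the literal rule also yields
                        new.add((a, "geo:siblingTerm", c))
        if new <= facts:  # fixpoint: nothing new was derived
            return facts
        facts |= new


example = forward_chain({
    ("basalt", "rdfs:subClassOf", "volcanic rock"),
    ("andesite", "rdfs:subClassOf", "volcanic rock"),
    ("volcanic rock", "rdfs:subClassOf", "igneous rock"),
    ("volcanic rock", "geo:equivalentTerm", "extrusive rock"),
})
```

Because subClassOf is made transitive before the sibling rule is applied, the sibling rule also pairs a concept with classes one level up (for example, basalt and volcanic rock both end up under igneous rock); restricting siblings to direct subclasses would require keeping asserted and inferred subclass triples apart.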
Table 4. Rules for backward chain inference.
| Use | Rule Expression |
|---|---|
| Discover the superordinate concepts of “volcanic rocks” | `@prefix geo: <http://www.semanticweb.org/Geology#>` `[Rule1: (geo:VolcanicRocks rdfs:subClassOf ?b) -> (?b geo:broaderClassOf geo:VolcanicRocks)]` |
| Discover the subordinate concepts of “volcanic rocks” | `@prefix geo: <http://www.semanticweb.org/Geology#>` `[Rule1: (?a rdfs:subClassOf geo:VolcanicRocks) (?b rdfs:subClassOf ?a) -> (?b rdfs:subClassOf geo:VolcanicRocks)]` |
| Discover all equivalent concepts of “volcanic rocks” | `@prefix geo: <http://www.semanticweb.org/Geology#>` `[Rule1: (geo:VolcanicRocks geo:equivalentTerm ?b) -> (?b geo:equivalentTerm geo:VolcanicRocks)]` `[Rule2: (geo:VolcanicRocks geo:equivalentTerm ?b) (?b geo:equivalentTerm ?c) -> (geo:VolcanicRocks geo:equivalentTerm ?c)]` |
| Discover all concepts related to “volcanic rocks” | `@prefix geo: <http://www.semanticweb.org/Geology#>` `[Rule1: (geo:VolcanicRocks geo:relateTerm ?b) -> (?b geo:relateTerm geo:VolcanicRocks)]` |
| Discover the sibling concepts of “volcanic rocks” | `@prefix geo: <http://www.semanticweb.org/Geology#>` `[Rule1: (geo:VolcanicRocks rdfs:subClassOf ?b) (?c rdfs:subClassOf ?b) -> (geo:VolcanicRocks geo:siblingTerm ?c)]` |
| Discover synonyms for “query” | `@prefix syn: <http://www.semanticweb.org/Synonym#>` `[Rule1: (syn:query syn:equivalentTo ?b) -> (?b syn:equivalentTo syn:query)]` `[Rule2: (syn:query syn:equivalentTo ?b) (?b syn:equivalentTo ?c) -> (syn:query syn:equivalentTo ?c)]` |
| Discover related words for “query” | `@prefix syn: <http://www.semanticweb.org/Synonym#>` `[Rule1: (syn:query syn:relateTo ?b) -> (?b syn:relateTo syn:query)]` `[Rule2: (syn:query syn:relateTo ?b) (?b syn:relateTo ?c) -> (syn:query syn:relateTo ?c)]` |
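Each rule in Table 4 fixes one argument to the query concept, so a goal-directed (backward-chaining) engine only has to prove goals about that concept instead of materializing the whole closure. A minimal stdlib Python sketch of this goal-directed evaluation follows. It assumes hypothetical class names and, for the sibling query, considers direct superclasses only; it illustrates the idea and is not Jena's backward engine.

```python
from collections import deque


def broader_of(target, sub_pairs):
    """Goal (?b geo:broaderClassOf target): the direct superclasses of target."""
    return {parent for child, parent in sub_pairs if child == target}


def subclasses_of(target, sub_pairs):
    """Goal (?x rdfs:subClassOf target), following subClassOf transitively."""
    found, frontier = set(), deque([target])
    while frontier:
        current = frontier.popleft()
        for child, parent in sub_pairs:
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def siblings_of(target, sub_pairs):
    """Goal (target geo:siblingTerm ?c), using direct superclasses only (an assumption)."""
    parents = broader_of(target, sub_pairs)
    return {child for child, parent in sub_pairs if parent in parents and child != target}


def equivalents_of(target, eq_pairs):
    """Goal (target geo:equivalentTerm ?c) under symmetry and transitivity, excluding target."""
    neighbours = {}
    for a, b in eq_pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)  # symmetry
    found, frontier = set(), deque([target])
    while frontier:
        for nxt in neighbours.get(frontier.popleft(), ()):
            if nxt != target and nxt not in found:
                found.add(nxt)
                frontier.append(nxt)  # transitivity: keep following equivalences
    return found


# Hypothetical class names standing in for "volcanic rocks" and its neighbours.
sub_pairs = {
    ("Basalt", "VolcanicRocks"), ("Andesite", "VolcanicRocks"),
    ("OlivineBasalt", "Basalt"), ("VolcanicRocks", "IgneousRocks"),
    ("PlutonicRocks", "IgneousRocks"),
}
eq_pairs = {("VolcanicRocks", "ExtrusiveRocks"), ("ExtrusiveRocks", "EffusiveRocks")}

print(sorted(broader_of("VolcanicRocks", sub_pairs)))     # ['IgneousRocks']
print(sorted(subclasses_of("VolcanicRocks", sub_pairs)))  # ['Andesite', 'Basalt', 'OlivineBasalt']
print(sorted(siblings_of("VolcanicRocks", sub_pairs)))    # ['PlutonicRocks']
print(sorted(equivalents_of("VolcanicRocks", eq_pairs)))  # ['EffusiveRocks', 'ExtrusiveRocks']
```

The same `equivalents_of` pattern applies to the `syn:equivalentTo` and `syn:relateTo` queries for “query”, since both are declared symmetric and transitive.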
Table 5. Search results for geological ontologies based on keywords and spatiotemporal–topic associations.
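| No. | Search Terms | Linked Data in Data Sources | Keyword Search: Total Results | Keyword Search: Relevant Results | Keyword Search: Recall (%) | Keyword Search: Precision (%) | Ontology Search: Total Results | Ontology Search: Relevant Results | Ontology Search: Recall (%) | Ontology Search: Precision (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bashkurgan copper mine | 25 | 20 | 15 | 60 | 75 | 24 | 20 | 96 | 80 |
| 2 | Pyrite | 30 | 18 | 10 | 33.3 | 55.6 | 25 | 21 | 83.3 | 70 |
| 3 | Chalcopyrite | 11 | 6 | 3 | 27 | 50 | 8 | 6 | 72.7 | 54.5 |
| 4 | Copper polymetallic deposits | 15 | 8 | 4 | 26.7 | 50 | 12 | 10 | 80 | 66.7 |
| 5 | Zone V copper mine | 16 | 10 | 6 | 37.5 | 60 | 12 | 9 | 75 | 56.3 |

Ontology Search = geological ontology search based on spatial–topic association.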
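The keyword-search columns of Table 5 follow the usual definitions, assuming that the “Linked Data in Data Sources” column gives the number of relevant items available for each term: recall = relevant results / linked items, and precision = relevant results / total results. A minimal Python sketch of that arithmetic (the function names are introduced here for illustration) reproduces, for example, row 1 (15/25 = 60% and 15/20 = 75%) and row 2 (10/30 ≈ 33.3% and 10/18 ≈ 55.6%).

```python
def recall_pct(relevant_retrieved, relevant_available):
    """Recall (%): share of the relevant items in the data source that were retrieved."""
    return round(100 * relevant_retrieved / relevant_available, 1)


def precision_pct(relevant_retrieved, retrieved_total):
    """Precision (%): share of the retrieved results that are relevant."""
    return round(100 * relevant_retrieved / retrieved_total, 1)


# Keyword-search columns of Table 5, rows 1 and 2:
# (linked items in the data source, total results, relevant results)
rows = {"Bashkurgan copper mine": (25, 20, 15), "Pyrite": (30, 18, 10)}
for term, (linked, total, relevant) in rows.items():
    print(term, recall_pct(relevant, linked), precision_pct(relevant, total))
```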
Table 6. Test results based on Lucene full-text indexing and semantic-based search.
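| Search Terms | Search Mode | Number of Returns | Effective Number | Number of System-Related | Number of Systems | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|---|
| Volcanic rocks | Lucene | 51 | 41 | 48 | 100 | 74.47 | 85.42 |
| Volcanic rocks | Semantic | 67 | 40 | 48 | 100 | 68.66 | 83.33 |
| Metamorphic rocks | Lucene | 42 | 29 | 35 | 100 | 61.90 | 82.86 |
| Metamorphic rocks | Semantic | 40 | 30 | 38 | 100 | 72.50 | 78.95 |
| Sedimentary rocks | Lucene | 59 | 42 | 59 | 100 | 72.88 | 71.19 |
| Sedimentary rocks | Semantic | 74 | 51 | 77 | 100 | 75.68 | 66.23 |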
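Table 6 compares Lucene full-text retrieval with semantic retrieval. The basic mechanism behind the difference (matching a query against ontology-derived expansion terms in addition to the literal keyword) can be sketched with a toy stdlib Python example. Assumptions: plain substring matching stands in for Lucene, the documents are hypothetical, and the expansion table is a hand-written stand-in for terms that inference rules such as those in Tables 3 and 4 would produce; the sketch does not reproduce the numbers in Table 6.

```python
def keyword_search(docs, term):
    """Return ids of documents whose text contains the query term (case-insensitive)."""
    needle = term.lower()
    return {doc_id for doc_id, text in docs.items() if needle in text.lower()}


def semantic_search(docs, term, expansions):
    """Return ids of documents matching the term or any of its expansion terms."""
    hits = set()
    for t in [term, *expansions.get(term, [])]:
        hits |= keyword_search(docs, t)
    return hits


# Hypothetical report snippets (not from the paper's corpus).
docs = {
    1: "Volcanic rocks crop out along the northern ridge.",
    2: "The basalt flows overlie the Devonian strata.",
    3: "Andesite dikes cut the granite body.",
    4: "Granite hosts the copper mineralization.",
}
# Hypothetical expansion, e.g. subordinate concepts inferred for "volcanic rocks".
expansions = {"volcanic rocks": ["basalt", "andesite"]}

print(sorted(keyword_search(docs, "volcanic rocks")))               # [1]
print(sorted(semantic_search(docs, "volcanic rocks", expansions)))  # [1, 2, 3]
```

Expansion changes which documents are returned; whether precision and recall rise or fall then depends on how many of the extra matches are actually relevant, which is what the comparison in Table 6 measures.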
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Tao, L.; Ma, K.; Tian, M.; Hui, Z.; Zheng, S.; Liu, J.; Xie, Z.; Qiu, Q. Developing a Base Domain Ontology from Geoscience Report Collection to Aid in Information Retrieval towards Spatiotemporal and Topic Association. ISPRS Int. J. Geo-Inf. 2024, 13, 14. https://doi.org/10.3390/ijgi13010014

AMA Style

Tao L, Ma K, Tian M, Hui Z, Zheng S, Liu J, Xie Z, Qiu Q. Developing a Base Domain Ontology from Geoscience Report Collection to Aid in Information Retrieval towards Spatiotemporal and Topic Association. ISPRS International Journal of Geo-Information. 2024; 13(1):14. https://doi.org/10.3390/ijgi13010014

Chicago/Turabian Style

Tao, Liufeng, Kai Ma, Miao Tian, Zhenyang Hui, Shuai Zheng, Junjie Liu, Zhong Xie, and Qinjun Qiu. 2024. "Developing a Base Domain Ontology from Geoscience Report Collection to Aid in Information Retrieval towards Spatiotemporal and Topic Association" ISPRS International Journal of Geo-Information 13, no. 1: 14. https://doi.org/10.3390/ijgi13010014

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
