## **1. Introduction**

The emergence of efficient and sustainable technologies represents a major challenge for the 21st century. While renewable energy sources are increasingly replacing fossil fuels in order to reduce CO2 emissions, the influence of friction and wear on the energy efficiency of a wide range of technical processes has hardly reached public awareness. However, reducing friction and wear offers considerable potential for saving CO2 and resources. Holmberg and Erdemir [1] estimated that roughly 23% of the global primary energy is consumed to overcome friction and to repair or replace worn components in tribo-technical systems. The authors predicted that these energy losses could be reduced by up to 40% through tribological advances. Accordingly, companies and research institutions are focusing on new concepts, materials, lubricants, or surface technologies in a wide range of applications. This is also reflected in the continuously growing number of publications in the domain of tribology, which serve as inspiration, guidance, and benchmark for researchers and developers, but which are almost impossible to keep up with due to their vast quantity, complexity, and diversity. In this context, comprehensive databases in combination with machine learning (ML) and artificial intelligence (AI) approaches can support sorting through the complexity of patterns and identifying trends [2]. Therefore, they are increasingly employed in the analysis, design, optimization, or monitoring of tribological systems in various fields [3], ranging from composite materials [4], drive technology [5,6], manufacturing [7], and surface engineering [8,9] to lubricant formulation [10,11]. As pointed out by Marian and Tremmel [12], novel findings and additional value in the domain of tribology can especially be created by extracting knowledge from available literature and drawing higher-level conclusions. For example, Kurt and Oduncuoglu [13] trained artificial neural networks (ANNs) with data from literature to study the influence of normal load, sliding speed, and the type and weight fraction of various reinforcement phases within a polyethylene matrix on the resulting friction and wear behavior. Similarly, Vinoth and Datta [14] utilized 153 data sets from literature to predict mechanical properties of carbon nanotube or graphene reinforced polyethylene as a function of composition, particle size, and bulk properties by means of an ANN. Subsequently, multi-objective optimization by genetic algorithms and corresponding experimental validation demonstrated improved tribological properties compared to the references. Using 80 data sets from four-ball tests and 120 data sets from pin-on-disk experiments with varying base oils and friction modifiers as reported in literature, together with an ANN and a genetic algorithm, Bhaumik et al. [15] optimized the lubricant formulation and experimentally validated their results. The aforementioned studies indicate the potential of leveraging knowledge from the available literature. However, data acquisition and processing in the field of tribology are still largely manual, involving the review of publications and the extraction of relevant (most frequently textual/descriptive) information, which limits the generation of sophisticated and broad databases and thus the further use of ML/AI [12].

**Citation:** Kügler, P.; Marian, M.; Dorsch, R.; Schleich, B.; Wartzack, S. A Semantic Annotation Pipeline towards the Generation of Knowledge Graphs in Tribology. *Lubricants* **2022**, *10*, 18. https://doi.org/10.3390/lubricants10020018

Received: 14 December 2021 Accepted: 23 January 2022 Published: 25 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The high manual effort required to acquire and curate information and knowledge for further processing is not limited to the domain of tribology and is known as the "knowledge acquisition bottleneck" [16]. Although the latter has been discussed since the rise of expert systems in the 1980s [17], for instance for tribological design decisions [18] or failure diagnosis [19], to mention two examples from the tribological domain, knowledge acquisition and thus knowledge engineering are still largely manual and time-consuming tasks. Studer et al. [20] argue that knowledge engineering is a modeling activity, which goes beyond the simple transfer of directly accessible knowledge into an appropriate computer representation towards a model construction process [21]. Consequently, knowledge structuring and modeling play an important role in the knowledge acquisition process. Hoekstra [22] therefore refers to a "knowledge reengineering bottleneck", which highlights the general difficulty of continuously reusing existing generic and assertional knowledge. Assertional knowledge refers to data-level or object knowledge, while generic knowledge is schema-level, conceptual knowledge that is represented as a domain theory to structure the respective domain. This includes the choice of the vocabulary used to describe the domain and of a representation form to formalize the model. Chandrasegaran et al. [23], as well as Verhagen et al. [24], emphasized the importance of semantic interoperability for knowledge reuse and sharing, which is frequently addressed with ontological models represented in formal logic. According to Gruber [25], an ontology is an "explicit specification of a conceptualization". This means that an ontology can be used to explicitly define a domain model for sharing and reusing structured knowledge by humans and machines. In other domains, e.g., bioinformatics, ontologies are widely used for knowledge structuring, data integration, and decision support systems [26].
One successful example is the Gene Ontology (GO) [27], which provides a broadly accepted vocabulary for annotating gene product data from different databases and sources. Exploiting ontologies for accessing and reusing experimental knowledge has also been pursued in the domain of tribology. One example is the "OntoCommons" project (https://ontocommons.eu/industrial-domain-ontologies, accessed on 14 December 2021), where a tribological use case aimed at reducing efforts in tribological experiments by reusing existing knowledge. In this context, Esnaola-Gonzalez and Fernandez [28] argue that semantic technologies, and more specifically ontologies, offer a suitable representation for the vaguely documented results of experiments. Within the domain of materials science, the "European Materials Modelling Ontology" (EMMO, https://emmc.info/emmo-info/, accessed on 14 December 2021) provides a representational ontology based on materials modelling and characterization knowledge. Furthermore, we recently introduced the tribAIn ontology [29] for reusing knowledge from tribological experiments. This domain ontology was built to provide a common and machine-readable schema for structuring tribological experiments, with the intention of improving the reuse and shareability of testing results from different sources. Since this contribution relies on the tribAIn ontology, more detailed information is provided in Section 2.2. In addition to schema-level generic knowledge, assertional knowledge refers to specific knowledge objects, e.g., results from individual experiments. As mentioned before, assertional knowledge from experiments in the domain of tribology is usually published in natural language; thus, publications are a well-suited knowledge source for acquiring the current state of tribological findings. Dealing with natural language sources is usually problematic since natural language is ambiguous and unstructured. Moreover, textual descriptions may be incomplete in the sense of formal models.
Due to the time-consuming process of acquiring and structuring knowledge from textual sources in systematic literature studies or manual database construction, such knowledge bases are poorly suited for long-term reuse and continuous extension. A successful example of generating structured information from textual sources is the DBpedia project [30], which extracts structured data from Wikipedia content using templates and pattern matching techniques. The structured format then allows querying the vast content in a sophisticated way instead of searching articles by keywords and processing the information manually. In terms of the results from tribological experiments, publications, similar to Wikipedia, contain structured knowledge (e.g., operational parameters, wear rate, coefficient of friction, etc.) and unstructured knowledge (for example, interpretive descriptions and discussions of results). By extracting the information from text in a structured way, the knowledge can be queried, processed, and compared. Thus, one could query for tribological experiments on desired materials and testing conditions, for example dry-running pin-on-disk model tests with various reinforcement phases within composites or deposited coatings on the specimen surfaces.
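
The benefit of structured over keyword-based access can be illustrated with a minimal Python sketch. The record layout, the field names, the function `find_experiments`, and all values below are hypothetical placeholders chosen for illustration; they are neither taken from the tribAIn ontology nor from any of the cited experiments. In a knowledge graph, a comparable question would typically be posed as a SPARQL query rather than as a list filter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExperimentRecord:
    """Assertional knowledge: one structured test result (hypothetical layout)."""
    source: str                      # placeholder identifier of the publication
    test_type: str                   # e.g. "pin-on-disk"
    lubrication: str                 # e.g. "dry" or "oil"
    matrix: str                      # base material of the specimen
    reinforcement: Optional[str]     # reinforcement phase, None if unreinforced
    normal_load_n: float             # normal load in newtons
    coefficient_of_friction: float


# Made-up placeholder records, NOT data from any cited publication.
RECORDS: List[ExperimentRecord] = [
    ExperimentRecord("paper-A", "pin-on-disk", "dry", "polyethylene",
                     "carbon nanotubes", 10.0, 0.20),
    ExperimentRecord("paper-B", "pin-on-disk", "dry", "polyethylene",
                     None, 10.0, 0.30),
    ExperimentRecord("paper-C", "four-ball", "oil", "steel",
                     None, 400.0, 0.08),
]


def find_experiments(records: List[ExperimentRecord], test_type: str,
                     lubrication: str,
                     reinforced_only: bool = False) -> List[ExperimentRecord]:
    """Return matching records, sorted by ascending coefficient of friction."""
    hits = [r for r in records
            if r.test_type == test_type and r.lubrication == lubrication]
    if reinforced_only:
        hits = [r for r in hits if r.reinforcement is not None]
    return sorted(hits, key=lambda r: r.coefficient_of_friction)


if __name__ == "__main__":
    # Query: dry-running pin-on-disk tests, reinforced composites only.
    for rec in find_experiments(RECORDS, "pin-on-disk", "dry",
                                reinforced_only=True):
        print(rec.source, rec.reinforcement, rec.coefficient_of_friction)
```

Once the test conditions are explicit fields rather than free text, results from different sources can be filtered and compared directly, which a keyword search over articles cannot do.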

A large-scale employment of the aforementioned knowledge extraction approaches, however, requires strategies for (semi-)automatically streamlining data acquisition. Therefore, this contribution introduces a semantic annotation pipeline based upon natural language processing (NLP) methods in order to overcome the "knowledge reengineering bottleneck" in the domain of tribology. The motivation is mainly inspired by the current practice in biomedical research, where a massive growth in published research articles led to increasing attention for automated information extraction methods to support human researchers [31,32]. Given similar challenges, such as sharing research outcomes via natural language publications, semantic ambiguity, and the interdisciplinary nature of the domain, this contribution is a first attempt to apply (semi-)automatic knowledge acquisition techniques within the domain of tribology. Thus, while the methods used within this contribution have already shown potential for similar knowledge acquisition and structuring issues within the biomedical domain, this paper aims at their effective use in tribology. The contribution is structured as follows: First, the applied methods for the acquisition pipeline are introduced, containing a description of the underlying domain theory of tribological test methods as a generic schema as well as the relevant semantic web and NLP techniques, especially named entity recognition and question answering using the BERT language model [33]. The semantic annotation pipeline and the packages used for implementation are summarized in Section 3. Subsequently, the access level and performance of the pipeline are demonstrated in Section 4, including a description of the web user interface and a technical evaluation of the individual modules of the pipeline.
Finally, we discuss the potentials and limitations of the pipeline, as well as connections and outlooks to further approaches in Section 5.

## **2. Theory and Methods**

#### *2.1. Domain Theory from Tribology*

As mentioned before, generic knowledge builds a domain theory, which can be represented as a formal ontology. In terms of semantic annotation, the domain theory is used as the structured metadata with which the unstructured resource is enriched. Therefore, relevant concepts and relations from established methodologies of tribological testing are used to build the schema for the semantic annotation pipeline. Generally, a tribological system can be described by its system structure, its input and output variables, and their functional conversion within the open or closed system boundary [34] (Figure 1). The system structure consists of the relatively moving body and counter-body, which rub against each other and may be completely or partially separated by an intermediate medium (liquid or gaseous). Operational input variables, such as loads, kinematics, duration, and temperatures, can be summarized in the stress collective. Depending on the latter, as well as on any disturbance variables and the system structure, the body and counter-body physically and chemically interact at temporally and spatially varying locations. On the one hand, this results in loss variables such as friction and wear, which cause changes in the surface, loss of material, and energy dissipation. On the other hand, this results in the actual functional variables of the tribological system. The mechanisms and applications of tribology extend over several size scales. They range from processes on the nano- or micro-level in the fields of physics, chemistry, and materials science, such as the formation of boundary layers or the shearing of nanoparticle layers, to machine elements and assemblies as well as multiple tribological contacts in the engineering sciences in the micrometer to meter range, for example in rolling bearings or gears.
Accordingly, tribometry, i.e., tribological measuring and testing technology, covers all dimensional ranges of tribology in determining friction and wear parameters of tribological systems. The significance of various quantifiable measured variables, e.g., a friction coefficient averaged over time or a wear coefficient, usually depends on the underlying mechanisms, the measurement method, and the objective of the study. Given the function and structure of tribological systems, tribological testing can be divided into six categories according to the simplification of the system structure, the stress collective, or the environmental conditions. In field tests (category I), original and complete systems are tested under real operating and environmental conditions, whereas in test bench tests (II), they are tested under laboratory conditions with practice-oriented operating conditions. In aggregate (III) and component tests (IV), the scope is further reduced to the investigation of original aggregates or components. Specimen tests (V) are conducted with specimens that are similar to the components and subjected to similar stresses as in the target application. Finally, model tests (VI) involve fundamental analyses of friction and wear processes with simplified specimens under defined loads. Typical representatives of the latter are disk-on-disk, cylinder-on-cylinder, ball-on-disk, or pin-on-disk tribometer tests. The advantages of the individual test categories can be combined by a suitable test chain [34].
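
To make the system description and the six test categories more tangible, they can be written down as a small data model. The following Python sketch is only an illustration under stated assumptions: all class names, field names, and units are hypothetical and do not reproduce the tribAIn ontology or the schema actually used in the pipeline.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Optional


class TribologicalTestCategory(IntEnum):
    """The six test categories, ordered by increasing simplification."""
    FIELD_TEST = 1
    TEST_BENCH_TEST = 2
    AGGREGATE_TEST = 3
    COMPONENT_TEST = 4
    SPECIMEN_TEST = 5
    MODEL_TEST = 6

    @property
    def uses_original_parts(self) -> bool:
        # Categories I-IV investigate original systems, aggregates, or
        # components; V and VI use (simplified) specimens instead.
        return self <= TribologicalTestCategory.COMPONENT_TEST


@dataclass
class StressCollective:
    """Operational input variables (hypothetical field names and units)."""
    normal_load_n: float
    sliding_speed_m_s: float
    duration_s: float
    temperature_c: float


@dataclass
class TribologicalSystem:
    """System structure, inputs, and loss variables of one tested system."""
    body: str
    counter_body: str
    intermediate_medium: Optional[str]      # liquid or gaseous, if reported
    stress_collective: StressCollective
    category: TribologicalTestCategory
    # Loss variables such as friction and wear, keyed by a free-form name.
    loss_variables: Dict[str, float] = field(default_factory=dict)


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured data.
    system = TribologicalSystem(
        body="pin",
        counter_body="disk",
        intermediate_medium=None,
        stress_collective=StressCollective(10.0, 0.1, 3600.0, 25.0),
        category=TribologicalTestCategory.MODEL_TEST,
        loss_variables={"coefficient_of_friction": 0.3},
    )
    print(system.category.name, system.category.uses_original_parts)
```

Using an ordered enumeration for the categories keeps the degree of simplification comparable across experiments, so that, for instance, all tests on original parts can be selected with a single comparison.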
