**5. Discussion**

In the context of the "knowledge reengineering bottleneck", we introduced a semantic annotation pipeline that semi-automatically streamlines knowledge aggregation from publications in the domain of tribology. The inputs to the pipeline are publications on experimental investigations in tribology, in particular experiments of the category model test. The output is structured and linked data in the form of JSON files, which can be visualized as graphs (cf. Section 4.2). The pipeline is built on state-of-the-art language models and NLP techniques and was evaluated on five representative documents. Since NLP is not yet in common use within tribology, there are no datasets or standard documents for training and evaluating language models. This limits the significance of the performance test conducted within this contribution, since a gold standard accepted by the community is missing and the pipeline cannot be compared to similar projects. However, as we work with standard language models that have proven reliable within the NLP community, and as we conducted a first evaluation of our fine-tuned models by manually annotating five representative documents, some assertions can still be made about the current performance. The document extraction (module 1) showed reliable performance on differently structured and formatted publications, under the premise that the provided PDFs are not defective. This was substantiated by one tested PDF document that contains invisible overlays and therefore shows high deviations from the ground truth (GT) in comparison to the other documents. PDF extraction is always a critical step within NLP processes, as it depends on the quality of the PDF and the accessibility of the textual and other entities within it. This is one reason for the modular structure of the pipeline: the PDF extraction is only required if the input publications are provided as PDFs (which is a common format for textual documents).
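The structured and linked output mentioned above can be illustrated with a minimal sketch. All field names, entity labels, and values below are illustrative assumptions for one aggregated knowledge object, not the pipeline's actual schema:

```python
import json

# Hypothetical knowledge object aggregated from one publication; every key,
# label, and value here is an illustrative placeholder, not the actual
# schema produced by the pipeline.
knowledge_object = {
    "document": "example-publication-id",
    "experiment_category": "model test",
    "annotations": [
        {"span": "100Cr6 ball", "label": "BODY"},
        {"span": "steel disc", "label": "COUNTER_BODY"},
    ],
    "links": [
        # edges of this kind can be rendered as a graph (cf. Section 4.2)
        {"source": "100Cr6 ball", "relation": "in_contact_with",
         "target": "steel disc"},
    ],
}

serialized = json.dumps(knowledge_object, indent=2)
```

Keeping the annotations and the links in one serializable object is what allows the output of each module to be inspected, and visualized, before the next module consumes it.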
Since publications are nowadays frequently available online as well, textual data can be accessed more easily from HTML websites via an API where publishers provide such access. For now, however, PDF extraction remains a pragmatic approach to accessing the textual information in publications. The annotation process (module 2) is performed using the SpanBERT language model, which achieves remarkably high F1 scores. The NER model introduced within this publication is currently limited to publications on model tests (without claiming completeness), since those are well structured and mostly standardized. To the best of our knowledge, NER tagsets or language models themselves, as available for example in the domain of biomedicine (e.g., BioBERT), do not yet exist for the domain of tribology. In the future, the development and training of tribological language models can therefore improve the performance of NLP applications within the domain. Furthermore, knowledge object generation is only a first step towards named entity linking.
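As a rough illustration of the post-processing behind such an annotation module, token-level BIO tags from a fine-tuned NER model can be collapsed into labelled spans. This is a generic sketch under assumed tag names, not the pipeline's actual implementation:

```python
def bio_to_spans(tokens, tags):
    """Collapse token-level BIO tags (as emitted by a fine-tuned NER model
    such as SpanBERT) into (entity_text, label) spans.

    Illustrative helper: the tag names (e.g., B-BODY) are assumptions and
    the pipeline's actual post-processing may differ.
    """
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:  # close the previous entity before opening a new one
                spans.append((" ".join(current), label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(tok)  # continue the current entity
        else:
            if current:  # O tag or label mismatch ends the entity
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:  # flush a trailing entity
        spans.append((" ".join(current), label))
    return spans
```

For example, the tokens `["The", "100Cr6", "ball"]` with tags `["O", "B-BODY", "I-BODY"]` yield the single span `("100Cr6 ball", "BODY")`.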

We discussed the role of knowledge objects within semantic knowledge bases in Section 2.3. The knowledge objects here are aggregations of the annotations extracted from a document by the pipeline. However, successful semantic knowledge sharing is usually community driven within a domain (e.g., the OBO Foundry). An established knowledge graph within the domain of tribology containing knowledge objects can therefore be extended with aggregated objects from the annotation module, and already established knowledge objects can be enriched by the annotations. Thus, information about entities of interest in the domain of tribology can be acquired semi-automatically. Within the last module, we exploited QA to generate structured output from the unstructured, annotated data. The templates contain questions referring to the tribAIn ontology (e.g., questions about input parameters, the tribological system structure, and output parameters). Overall, the QA system gave plausible answers to the tested question templates. During evaluation, we recognized a frequently occurring error of the decision maker, which often could not differentiate between the properties of the body and the counter-body. On the one hand, this can be attributed to an insufficient differentiation within the textual description; on the other hand, to the question generation process within the QA system. The experiences with the QA module further led to two major insights in the context of extracting information from tribological publications. First, analyzing publications with a QA system can be exploited for quality checks and for improving the standardization of descriptions of experimental studies and outcomes. Question templates can thereby serve as a checklist of what a sufficient description of experimental studies and results should contain to enable understanding and reproduction of the results.
Second, and related to this, the question templates themselves have to be carefully designed to elicit an answer and aggregate structured data from texts. Analyzing the publication practices, and further the research practices, of tribologists can therefore yield interesting insights for improving knowledge and data aggregation within the domain. However, the pipeline is intended to be human supervised, since trust is a critical issue, especially within neural NLP processes that generate output without explanations of the process itself. This is the second reason for the modular architecture: the output of every module can be checked and adapted before continuing with the pipeline. This is especially important if automatic extraction is used to extend semantic knowledge bases or to aggregate structured data for further processing. Besides the quality and trustworthiness of the results, another important issue is computational cost. As mentioned in Section 4.3, the training of the language models took less than an hour (20 to 30 min) for each model. The low computational cost is due to the currently available pre-trained language models, which only need to be fine-tuned to be tailored to a specific domain. The execution of the annotation furthermore takes only a few seconds. The pipeline can therefore be considered very efficient compared to manual annotation.
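A checklist-style use of question templates, as suggested above, could look like the following sketch. The template wording and slot names are assumptions loosely oriented on tribAIn concepts, not the templates actually used in the pipeline:

```python
# Illustrative question templates; the wording and slot names are
# assumptions, not the paper's actual templates.
TEMPLATES = {
    "body": "What is the body of the tribological system?",
    "counter_body": "What is the counter-body of the tribological system?",
    "normal_force": "Which normal force was applied in the {category}?",
}

def build_questions(category: str = "model test") -> dict:
    """Instantiate the templates for one experiment category.

    Each question would then be passed to the QA model together with the
    publication text; slots that remain unanswered indicate gaps in the
    experimental description, i.e., the checklist use described above.
    """
    return {slot: q.format(category=category) for slot, q in TEMPLATES.items()}
```

Because the templates are plain data, they can be reviewed and extended by domain experts without touching the QA model itself, which fits the human-supervised, modular design of the pipeline.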
