#### *4.1. Web Application*

To test its capabilities, the pipeline can be accessed through a web interface (see Data Availability Statement). The three modules communicate with the GUI via a REST API. Publications have to be provided in PDF format and can be uploaded via drag and drop to the first module (1). As indicated in Figure 13, a manual control and adaptation step is integrated between the three separate modules. Thus, the acquisition of structure proceeds in a semi-automatic manner with human supervision.

**Figure 13.** Process of document annotation via the web-interface.
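The semi-automatic control flow, with a user check between the modules, can be illustrated with a minimal sketch. All function names, endpoints and fields here are invented placeholders, not the actual API of the pipeline:

```python
# Minimal sketch of the semi-automatic pipeline control flow.
# All function and field names are illustrative placeholders.

def extract_pdf(pdf_bytes):
    # Module (1): would call the PDF extraction service; stubbed here.
    return {"chapters": [{"paragraphs": [{"sentences": ["Wear was measured."]}]}]}

def annotate(doc):
    # Module (2): would perform NER and knowledge object generation; stubbed.
    doc["annotations"] = [{"text": "Wear", "label": "Phenomenon"}]
    return doc

def analyze(doc):
    # Module (3): would run the QA-based context analysis; stubbed.
    doc["answers"] = []
    return doc

def run_pipeline(pdf_bytes, user_check):
    """Chain the three modules, pausing for a manual check after each step."""
    doc = user_check(extract_pdf(pdf_bytes))  # check/adapt the segmentation
    doc = user_check(annotate(doc))           # proof the annotations
    return analyze(doc)

result = run_pipeline(b"%PDF-...", user_check=lambda d: d)  # identity check
```

The `user_check` callback stands in for the GUI step in which the user corrects the intermediate output before the pipeline is continued.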

The output of the first module comprises the extracted text and other entities (e.g., figures, tables, metadata) from the PDF. The text itself is split into chapters, paragraphs and sentences, which serve as input to the subsequent modules. The user check enables adaptation and correction of the automatically generated output of the module (e.g., adaptation of the paragraph separation). Subsequently, the pipeline can be continued via the GUI, and the JSON data produced by the first module is forwarded to the document annotation module (2), which performs NER and knowledge object generation. The output of the second module can likewise be checked by the user after the automatic annotations have been generated. The NER process performs the annotation itself, while the knowledge objects are the semantic output of the annotation pipeline (see Figure 5 in Section 2.3 for the role of knowledge objects within a semantic knowledge base). This check comprises, in particular, the proofing of the annotations and their aggregation into knowledge objects. Finally, the pipeline continues to the concluding context analysis via the document analysis module (3), which performs the QA and generates the structured data as the final output.
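The aggregation of NER mentions into knowledge objects can be sketched as follows. The JSON field names and labels are assumptions for illustration only, not the pipeline's actual schema:

```python
from collections import defaultdict

# Illustrative sketch: aggregating NER mentions (surface forms) into
# knowledge objects keyed by a normalized label. Field names are assumed.

mentions = [
    {"surface": "wear rate", "label": "WearRate"},
    {"surface": "Wear rate", "label": "WearRate"},
    {"surface": "steel",     "label": "Material"},
]

def aggregate(mentions):
    """Group mentions by label and count them per knowledge object."""
    objects = defaultdict(lambda: {"mentions": 0, "surfaces": set()})
    for m in mentions:
        ko = objects[m["label"]]
        ko["mentions"] += 1
        ko["surfaces"].add(m["surface"].lower())
    return dict(objects)

knowledge_objects = aggregate(mentions)
```

Here the two case variants of "wear rate" collapse into a single knowledge object with two mentions, which is the aggregation step the user verifies in the check.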

#### *4.2. Resulting Knowledge Graph*

The output of the document annotation module is a linked data structure containing the aggregated knowledge objects related to their mentions within the paragraphs and tables of the publication. The output can therefore be visualized as a knowledge graph containing the structured data annotated within the respective publication, as exemplified in Figure 14 for a representative publication [67]. The generated knowledge graph is a complex network of nodes and relationships. The size of each node corresponds to the number of mentions of the respective knowledge object within the publication: a knowledge object with only a single mention has the minimum size, and the size increases with the number of mentions. In this way, the most important knowledge objects of the publication can easily be identified within the graph.
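The mention-dependent node sizing described above can be sketched with a simple linear scaling. The constants and counts are invented for illustration; the actual visualization may use a different scaling function:

```python
from collections import Counter

# Sketch of deriving node sizes in the knowledge graph visualization from
# mention counts: a single mention yields the minimum size, and the size
# grows with the number of mentions. Constants are illustrative only.

mention_counts = Counter({"WearRate": 5, "Steel": 2, "Lubricant": 1})

MIN_SIZE = 10  # size of a node with exactly one mention (assumed)
SCALE = 4      # size increment per additional mention (assumed)

def node_size(count):
    return MIN_SIZE + SCALE * (count - 1)

sizes = {ko: node_size(n) for ko, n in mention_counts.items()}
```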

While the output graph of the document annotation module is particularly suited to identifying the main topics of the analyzed publication in the form of the most frequently mentioned knowledge objects, the output of the document analysis module is a graph generated from the question templates (Figure 15). This knowledge graph contains the identified and correctly classified answers (triangles) given by the decision maker. The triples are generated from the schema provided by the tribAIn ontology [29]. The excerpts of the knowledge graph in Figure 15 refer to the example questions introduced in Section 2.2 regarding the tested variables (1) and the calculated wear rate (2). The generated linked data, combined with an ontology, provides a formally and semantically unambiguous representation that can be queried, filtered and further processed.

**Figure 14.** Schematic visualization of the resulting knowledge graph from the document annotation module for the processed representative publication [67].
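To illustrate how the triple-based output can be queried and filtered, the following sketch matches patterns over (subject, predicate, object) tuples in plain Python. In practice, the linked data would be RDF queried with SPARQL against the tribAIn schema; all URIs, terms and values here are invented placeholders:

```python
# Sketch of pattern matching over generated triples. All prefixed names
# and the wear-rate value are illustrative placeholders, not tribAIn terms.

triples = [
    ("ex:Experiment1", "ex:testedVariable", "ex:SlidingSpeed"),
    ("ex:Experiment1", "ex:hasWearRate",    "1.2e-6 mm3/Nm"),
    ("ex:Experiment2", "ex:testedVariable", "ex:Load"),
]

def match(triples, s=None, p=None, o=None):
    """Return all triples matching the given pattern (None = wildcard),
    analogous to a basic SPARQL triple pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Example queries mirroring the tested-variable and wear-rate questions:
variables = match(triples, p="ex:testedVariable")
wear_rate = match(triples, s="ex:Experiment1", p="ex:hasWearRate")
```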
