*2.5. NLP Downstream Tasks*

Information Extraction (IE) is the task of obtaining structured data from unstructured information, e.g., information embedded in textual sources, by recognizing and extracting occurrences of concepts and the relationships among them [49,63]. IE is often used to build knowledge graphs from textual representations (e.g., DBpedia), since such graphs can be queried and are a common way of presenting information to users [49]. IE and semantic annotation (see Section 2.3) are often combined, since both share the subtask of NER. NER is a sequence-labeling task that recognizes words or phrases in textual data and tags them with categories such as "Person" (B-PER), "Location" (B-GEO) or "Organization" (B-ORG). A named entity can be anything that has a proper name and can thus be distinguished from other objects [49]. Consequently, NER is often based on a domain-specific vocabulary, e.g., in biomedicine [32,64]. Relation extraction is a further subtask of IE in the context of building knowledge graphs; it mainly deals with the extraction of binary relations, such as child-of or part-whole relationships, which are used within taxonomies, ontologies and knowledge graphs [49]. IE can also be used for template filling, i.e., recognizing the relevant pieces of information in unstructured sources and filling them into a pre-defined template of structured data (cf. Figure 5) [49].

Question Answering (QA) is an information retrieval task in which the query is a question posed in natural language and the response is an actual answer [63]. QA is often used in customer-service chatbots or in virtual speech assistants (e.g., Amazon Alexa or Apple Siri). It differs from classic retrieval operations in two ways: questions are asked in natural language instead of formal database queries, and a precise answer to the question is retrieved instead of a set of documents. Therefore, QA can be exploited for generating structured data and template filling.
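
To make the link between NER tags and template filling more concrete, the following minimal sketch groups tagged tokens into entities and then fills a small template. It rests on several assumptions that do not come from the text above: the tags are taken to follow a BIO-style scheme (B- opens an entity, I- continues it, O marks tokens outside any entity), the function names `decode_bio` and `fill_template` are hypothetical, and the slot names and the example sentence are invented for illustration. The tags are supplied by hand here; in practice they would come from a trained sequence-labeling model.

```python
from typing import Dict, List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumption: tags look like "B-PER", "I-PER" or "O".
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity that is currently being collected, if any.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag continues the open entity of the same type.
            current_tokens.append(token)
        else:
            # "O" or an I- tag without a matching open entity:
            # close whatever is open and skip this token.
            flush()
    flush()
    return entities


def fill_template(
    entities: List[Tuple[str, str]], slots: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Fill each template slot with the first entity of the mapped type."""
    template: Dict[str, Optional[str]] = {slot: None for slot in slots.values()}
    for text, entity_type in entities:
        slot = slots.get(entity_type)
        if slot is not None and template[slot] is None:
            template[slot] = text
    return template


# Hand-labeled example sentence (illustrative only).
tokens = ["Ada", "Lovelace", "worked", "in", "London", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-GEO", "O"]

# Hypothetical mapping from entity types to template slots.
slots = {"PER": "person", "GEO": "location", "ORG": "organization"}

entities = decode_bio(tokens, tags)
template = fill_template(entities, slots)
print(entities)
print(template)
```

In this sketch, a slot for which no entity is found stays `None`, which mirrors a template field left empty when the source text does not mention the corresponding information. Keeping only the first entity per slot is a simplification; a real template may allow several values per field or need relation extraction to decide which entity belongs where.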
