1. Introduction
Abstract contextual embeddings, such as WordNet synsets and distributed representations [
1], have proved useful for a number of natural language tasks. By providing a mapping between WordNet synsets and formal ontology concepts, we can expect to extend traditional natural language tasks, such as sentiment analysis or topic classification, to the domains of a particular formal ontology.
This paper presents a new method for upper-ontology-based semantic parsing using FrameNet [
2], WordNet [
3] and PropBank [
4] parsers. These parsers are based on sentence context distributed representations, and a system that integrates them into a single framework is proposed in this paper. According to the approach based on the automatic labeling of semantic roles [
5], a semantic parsing can be represented as a task of labeling sentence constituents with abstract semantic roles, such as
Agent,
Speaker,
Message, etc. Much of the work on semantic parsing is related to the disambiguation of sentence words and labeling the roles of predicates. This paper focuses on both approaches. The proposed system is used to disambiguate frame targets and WordNet synsets, then to identify frame roles and headwords in sentence constituencies that define these roles, and finally, a process is implemented that computes the identification of upper ontology concepts. This process selects only a few of the most abstract concepts that relate to physics engines in 3D graphics systems.
In order to understand this approach by example, consider the book The Hound of the Baskervilles and two sentences from the beginning and one from the middle of Chapter 1.
“Mr. Sherlock Holmes, who was usually very late in the mornings, save upon those not infrequent occasions when he was up all night, was seated at the breakfast table. I stood upon the hearth-rug and picked up the stick which our visitor had left behind him the night before.
…
He had risen and paced the room as he spoke.”
Natural language processing for text comprehension in context requires more than the classification of documents or word sense disambiguation. To understand these sentences, a semantic parsing system must include inference over a general ontology for analyzing successive utterances, as well as background knowledge for the identification of objects and pragmatic interpretation. In
Figure 1, we can see several output results from our deep semantic parsing system after it has parsed the whole book. It is worth noting that
Figure 1 shows only those predicate-role tuples that are relevant for the text-to-3D scene generation task [
6].
Creating 3D scenes using only natural language can be a challenging task. Animating 3D scenes using natural language adds an additional level of complexity. However, over the past 50 years, starting with the pioneering work on the SHRDLU system [
7], there have been many systems that attempt to manipulate computer graphics objects using natural language (see [
8] for a review of 26 systems). Many of these systems accept a few sentences as input and try to identify the physical objects that are relevant to the 3D scene. Systems such as SceneSeer [
9] leverage spatial knowledge priors to infer implicit constraints and resolve spatial relations.
This paper takes a rather different approach. It explicitly focuses on fiction books as the input to the system. There are two reasons for this emphasis. First, fiction presents a complex world and rich semantic information that can be understood only by analyzing long-distance relations between text phrases and by logical inference about salient objects in the scene. The sentences above serve as an example for our extended coreference resolution algorithm, which is able to identify that the word “I” in the sentence “I stood upon the hearth-rug…” refers to a physical object named “Dr. Watson” (more details are provided in the sections below).
The second reason for us to focus on fiction was the gamification of the annotation process. It is much more fun to interact with a program about the meaning of the text in your favorite book than annotating long boring documents drafted by some administrative office.
Figure 2 presents a conceptual model of the framework, and this paper focuses on a natural language processing component called SUMO SRL.
Early projects in text-based interfaces for 3D scene generation have addressed scenarios in which the input is a sentence and the output is a few geometric shapes. More recently, the use of machine learning has made it possible to generate more complex 3D scenes, but the input has remained close to a single sentence. The input in our system is more complex than a few sentences of text, but the system output is simplified even more than in earlier systems of text-to-3D generation.
Figure 2 shows that the 3D representation of the world in our framework is just a grid of cells. Each cell can hold a number of physical objects, the spatial properties of which are determined by the coordinates of the cell. This representation is not as simple as it looks; the design was inspired by several game development projects using the Unity 3D game engine [
10]. In Unity 3D, game objects can have collider components for the detection of physical collisions. The simplest colliders have primitive geometric types such as a box or a sphere. Thus, from the point of view of the physics engine, a visually very complex 3D world can be merely a collection of boxes. In the proposed natural language processing framework, the 3D world view is a set of collision boxes labeled with the name and type of the object.
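As a rough illustration, this grid-of-cells world view of labeled collision boxes might be modeled as follows; all names here are illustrative stand-ins, not the framework's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class CollisionBox:
    """A physical object as the physics engine sees it: a labeled box."""
    name: str                          # e.g. "table"
    obj_type: str                      # e.g. "Furniture" (a SUMO-style label)
    size: tuple = (1.0, 1.0, 1.0)      # dimensions of the wrapping box
    stationary: bool = True
    velocity: tuple = (0.0, 0.0, 0.0)  # only relevant if not stationary

@dataclass
class Cell:
    """One grid cell; its coordinates fix the spatial properties of its objects."""
    x: int
    y: int
    objects: list = field(default_factory=list)

class GridWorld:
    def __init__(self, width: int, height: int):
        self.cells = {(x, y): Cell(x, y)
                      for x in range(width) for y in range(height)}

    def place(self, x: int, y: int, box: CollisionBox) -> None:
        self.cells[(x, y)].objects.append(box)

world = GridWorld(4, 4)
world.place(1, 2, CollisionBox("table", "Furniture"))
world.place(1, 2, CollisionBox("Sherlock Holmes", "Human"))
```

From the physics engine's point of view, the visually complex scene reduces to these labeled boxes and their cell coordinates.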
Most natural language processing systems to date annotate sentences with semantic labels and do not consider the wider use of these annotations. The use of SUMO ontology concepts as semantic labels makes it possible to use the axioms of ontology for deeper semantic reasoning about context in the text.
Figure 2 shows that the role of the reasoning engine for the SUMO SRL label set is the Drools inference system [
11,
12]. Labels from the SUMO SRL natural language component are sent to the Drools inference system as JAVA objects. Additional inputs to Drools are the concepts and axioms of the SUMO upper ontology [
13], and the output is a set of actions telling the physics engine how to instantiate objects in the 3D world (more details about this framework’s implementation can be found on the project page). As mentioned above, this paper focuses on the SUMO SRL component, but in order to understand the meaning of the labels produced by this semantic parsing component, it is important to understand the whole text-to-3D processing framework.
The rest of this paper is organized as follows:
Section 2 describes the natural language processing framework that was used to identify the physical objects and processes described in the fiction books. The main novelty in this section is the proposed system architecture, which shows how to reuse existing off-the-shelf solutions and integrate them with the components developed during this research project.
Section 3 describes the various components of the tokenization of chapters, dialogues and paragraphs. Herein, we propose a simple probabilistic model for identifying the structural parts of a book.
Section 4 describes the various components of the FrameNet frame and role identification process. Here, we propose a novel approach to reuse PropBank annotations and propose an algorithm to augment those annotations with FrameNet and WordNet labels.
Section 5 presents an evaluation of the SUMO SRL method, and our final thoughts are given in the discussion section.
3. Tokenization of Chapters, Dialogues and Paragraphs
A natural language interface with a text-to-3D purpose must disambiguate object descriptions based on the scene layout and capture the semantics associated with the scene’s spatio-temporal constraints. Parsing the input textual description of a scene begins by identifying a scene template that contains a set of 3D objects and a set of constraints between these objects. Usually, the input for a text-to-3D parsing system is just a few sentences.
Here the system models a scene template in a way that is similar to the SceneSeer [
6,
9] approach, but the input to the system is an entire book that can contain many scenes. The identification of a scene change is a challenging task; at this stage of the research, the SUMO SRL system tries to identify scenes by analyzing the beginnings of chapters, the flow of dialogue and the starting positions of new paragraphs.
The main outcome of this section is simple algorithms for identifying chapters, paragraphs and dialogues. All of these algorithms use the maximum entropy approach and have the same structure. They take the text of a book as an input and learn to identify tokens that mark the beginning of a chapter, dialogue or paragraph in the text. These algorithms integrate techniques adapted from our previous work [
18] on text filtering and segment labeling in a three-step process.
First, the system tries to identify the text segments that mark possible chapter headings using regular expressions (see
Table 1). It employs a novel text segment scoring technique to efficiently find the best regular expression that gives the highest probability for all matched segments to be used as the chapter heading.
The formal definition of a chapter heading model begins with a set of variables C, Ri and Mi. Let i be the index of the regular expression Ri, and let Mi be the result of matching this regular expression against the text. The variable C is a boolean that marks the true segmentation of the text into headings and chapter bodies. The conditional probability

P(C|Ri, Mi, T) (1)

can be defined as the probability that Ri will be the true model for chapter headings in the book text T. There are several frameworks to obtain (1), but this paper uses the maximum entropy approach

P(C|Ri, Mi, T) = (1/Z) exp(Σj λj fj(C, Ri, Mi, T)), (2)

where Z is a normalizing constant, λj are learned weights and fj are the feature functions that model the relevant information about chapter heading segmentation. Then, the system makes a decision about Mi by choosing the match with the maximum probability value

Mi* = argmaxi P(C|Ri, Mi, T). (3)
The following list defines the set of feature functions used in equation 2 to segment chapter headings:
The scoring of a regular expression. As a scoring function, the system uses the a priori probability assigned to each Ri as a meta-parameter.
The number of regular expressions that give the same match result as the current one. This feature function aims to give a higher score when several regular expressions produce the same match.
The feature function returns 1 if there is a sequential numbering in chapter headings.
The feature function returns the value of the normal distribution of the length of the match result.
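The heading-model selection described by these feature functions might be sketched as follows; the patterns, priors, weights and length statistics are illustrative stand-ins for the real Table 1 regular expressions and the trained maximum entropy model:

```python
import math
import re

# Candidate heading patterns and a priori scores: illustrative stand-ins
# for the regular expressions of Table 1 and their meta-parameters.
PATTERNS = [r"^Chapter [IVXLC]+\.?$", r"^CHAPTER \d+\.?$", r"^\d+\.$"]
PRIORS = [0.5, 0.3, 0.2]

def features(i, text):
    """Feature functions f_j for regular expression R_i on book text T."""
    matches = re.findall(PATTERNS[i], text, re.MULTILINE)
    # How many patterns produce exactly the same match result.
    agree = sum(re.findall(p, text, re.MULTILINE) == matches for p in PATTERNS)
    mean_len = sum(map(len, matches)) / len(matches) if matches else 0.0
    # Gaussian score for heading length (mean 10, std 5: illustrative values).
    length_score = math.exp(-((mean_len - 10.0) ** 2) / (2 * 5.0 ** 2))
    return [PRIORS[i], float(agree), length_score]

def best_pattern(text, weights=(1.0, 0.5, 1.0)):
    """Maximum-entropy-style decision: normalized exp of weighted features."""
    scores = [math.exp(sum(w * f for w, f in zip(weights, features(i, text))))
              for i in range(len(PATTERNS))]
    z = sum(scores)                      # normalizer, so scores sum to one
    probs = [s / z for s in scores]
    return max(range(len(PATTERNS)), key=probs.__getitem__)

book = "CHAPTER 1.\nMr. Sherlock Holmes\n...\nCHAPTER 2.\nThe Curse\n..."
print(best_pattern(book))   # index of the CHAPTER-digit pattern
```

Here the second pattern wins because it matches both headings with a plausible heading length, while the other patterns match nothing.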
Then, the system uses a subset of regular expressions to find the best result for paragraph segmentation. The model is the same as for chapter segmentation; the only difference is in a different set of regular expressions.
After that, for each paragraph with the best score, the system labels each sentence as a dialogue or narrator. Again, it uses a set of regular expressions to identify if a sentence in a paragraph belongs to the dialogue or the narrator.
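The dialogue/narrator labeling step might be sketched as below; the single pattern is an illustrative stand-in for the actual set of regular expressions:

```python
import re

# Illustrative heuristic: a sentence that opens with a quotation mark
# (curly or straight) is treated as dialogue.
DIALOGUE_RE = re.compile(r'^\s*[“"]')

def label_sentence(sentence: str) -> str:
    """Label a sentence of a paragraph as 'dialogue' or 'narrator'."""
    return "dialogue" if DIALOGUE_RE.search(sentence) else "narrator"

print(label_sentence('“Well, Watson, what do you make of it?”'))   # dialogue
print(label_sentence("He had risen and paced the room as he spoke."))  # narrator
```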
This research project evaluated several different natural language processing frameworks and found that none of them has a subsystem for identifying chapters, dialogues or paragraphs. The simple probabilistic model proposed here can provide an important starting point for this topic.
4. Integration of PropBank, FrameNet and WordNet
PropBank and FrameNet are popular resources for semantic role labeling. The PropBank corpus has verbs annotated with sense frames, and it encodes semantic information about the verbs in the form of the possible semantic roles each frame can take. In this project, a rather simplified approach, based on the AllenNLP semantic role labeling system, is used, i.e., verbs are not sense-labeled at all, and all semantic information is conveyed in the role labels. Let us take the first sentence from
Figure 1 and look at the output of the AllenNLP semantic role labeling system for the two verbs “
was” and “
seated”.
[Mr. Sherlock Holmes](ARG1), [who](R-ARG1) <was> [usually](ARGM-TMP) [very late in the mornings](ARG2), [save upon those not infrequent occasions when he was up all night](ARGM-ADV), was seated at the breakfast table.
[Mr. Sherlock Holmes, who was usually very late in the mornings, save upon those not infrequent occasions when he was up all night](ARG1), was <seated> [at the breakfast table](ARG2).
From the first set of PropBank-style annotations, the only useful information for our system is the fact that there is a physical object named “Sherlock Holmes” of type “PERSON”. This is because our main goal is to identify words that mark physical objects in natural language sentences and use them to interact with the physics engine in a virtual environment. The SUMO SRL parsing system is not interested in any concept outside the domain of the physics engine. Thus, the system only needs to know a few parameters for each concept that it identifies as belonging to the 3D world domain: (1) the size of the box that can be wrapped around the object; (2) the coordinates of the center of this box; (3) whether the box is stationary and, if not, its initial velocity.
From the second set of PropBank annotations, the system needs to know that there is a second physical object named “table”, and the physical object “Sherlock Holmes” is located next to the “table” object. In addition, it needs to know that the <seated> predicate implies that the object “Sherlock Holmes” is not moving. The last requirement that the system must be able to implement is the ability to infer that (1) the physical object “table” implies, by default, the existence of the physical object “room”; (2) the physical object “room” is our scene, on which the physics engine acts as one of the components of the virtual environment.
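The default-existence requirement in point (1) might be sketched as a simple fixed-point expansion; the rule table below is a hypothetical stand-in, since the actual framework delegates this kind of inference to Drools and the SUMO axioms:

```python
# Hypothetical default rules: each object maps to the objects whose
# existence it implies by default (e.g. a table implies a room).
DEFAULT_IMPLIES = {
    "table": ["room"],
    "room": [],   # the room itself is the scene
}

def expand_scene(objects):
    """Add default-implied objects until a fixed point is reached."""
    scene = set(objects)
    changed = True
    while changed:
        changed = False
        for obj in list(scene):
            for implied in DEFAULT_IMPLIES.get(obj, []):
                if implied not in scene:
                    scene.add(implied)
                    changed = True
    return scene

print(sorted(expand_scene(["Sherlock Holmes", "table"])))
# ['Sherlock Holmes', 'room', 'table']
```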
Obviously, it is not possible to implement all of these functional requirements from PropBank annotations, and additional annotations are required for this. During this research project, the research team tried to explore various linguistic resources to achieve this goal. The solution presented in this section is based on the use of FrameNet and WordNet systems.
The Stanford named entities recognition system can identify that there is an object “Sherlock Holmes” of the type “PERSON”. Using the Stanford dependencies parser it is possible to determine that “Holmes” is the headword in a phrase tagged with the PropBank tag (ARG1), and “table” is the headword in a phrase with the tag (ARG2). Stanford parsers are off-the-shelf components that can be integrated into an existing natural language processing pipeline simply by using the command line in Linux or Windows operating systems. However, the question remains, is the object “table” a physical object, and if so, what are the spatial relationships between it and the rest of the scene objects?
Figure 1 shows that the proposed NLP system can complement the PropBank tag (ARG1) with the FrameNet tag (Agent) and the PropBank tag (ARG2) with the FrameNet tag (Location). In addition,
Figure 1 shows that the verb “
seated” is tagged with the FrameNet tag (Posture). The system can get this rich semantic information if it can get a FrameNet parser with a low error rate. There are several FrameNet parsers available as open source projects, but the project team was able to compile, test and integrate with other components of our system only the SEMAFOR [
19] parser. However, it found that SEMAFOR did not meet the recall and precision requirements for our project. For example, SEMAFOR could not parse the word “was” (the verb “
be” is not defined in FrameNet) in all sentences of the book and could not identify “
Sherlock Holmes” as the “Agent” for the predicate “
seated”.
In most research projects, the problem of frame semantic parsing is modeled in two stages: frame identification and argument identification. In FrameNet, frame identification is simply the disambiguation of frame words. For example, the verb “
stood” in the second sentence in
Figure 1 has five frames (
Posture;
Placing;
Change_posture;
Being_located; Occupy_rank;), and the FrameNet parser must decide which one to choose. The second stage is more difficult than the first, because in the argument identification process one must take into account all possible phrases in the sentence and all possible role labels in the frame. The novelty of this paper lies in both stages: the use of BERT embeddings at the frame identification stage and a PropBank augmentation statistical model at the argument identification stage.
The following describes an algorithm that uses BERT contextual embeddings as inputs and gives a probability distribution over possible semantic frames. A contextual word embedding is a distributed representation of semantics in which each word is represented as a vector in R^1024. For example, consider our example sentence “
I stood upon the hearth-rug and…”.
Table 2 shows some statistics for the verb “
stood” in the FrameNet corpus.
It is possible to represent the context of “
stood” as a 1024-dimensional vector using a language representation model called Bidirectional Encoder Representations from Transformers (we use BERT
LARGE (
L = 24, H = 1024, A = 16)) [
20]. The first stage of our algorithm: for all 237,161 sentences in the FrameNet corpus, the system extracts a 1024-dimensional vector for each word that targets a frame, and these vectors are stored in a database for later use at the inference stage.
During the inference process (see
Figure 6), the new predicate verb is mapped to the same 1024-dimensional space using BERT
LARGE. The process then selects and loads all the vectors from the FrameNet sample database, where each sample sentence has the same verb as our new verb. If we continue with our example sentence “
I stood upon the hearth-rug and…”, then it means that the system loads all 10 + 2 + 8 + 14 + 85 = 119 vectors for five frames, as is shown in
Table 2. At the last stage, the system uses the k-nearest neighbors algorithm to classify our new verb. These 119 vectors serve as training examples for k-NN.
An important note concerns how the system chooses the parameter k in the k-NN algorithm. It does so using the following simple steps: (1) it selects all frames that are triggered by the new verb (five frames in
Table 2 in our example for the verb “
stood”); (2) then, from the selected frames it chooses the frame with the least number of samples, and marks this number of samples as m (this will be the frame
Change_posture with a sample size of two, i.e., m = 2); (3) our parameter k will be
k = m.
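Under these assumptions, the inference stage might be sketched as follows; toy two-dimensional vectors stand in for the stored 1024-dimensional BERT vectors, and the distance metric is an assumption:

```python
from collections import Counter
import numpy as np

def identify_frame(query_vec, db_vecs, db_frames):
    """k-NN frame identification with k = m, the smallest per-frame sample count.

    db_vecs: (N, d) array of stored vectors for sentences whose target verb
    matches the query verb; db_frames: their frame labels.
    """
    counts = Counter(db_frames)
    k = min(counts.values())                 # k = m, per the rule above
    dists = np.linalg.norm(db_vecs - query_vec, axis=1)
    nearest = [db_frames[i] for i in np.argsort(dists)[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Toy stand-ins: three "Posture" examples near the origin, two
# "Being_located" examples far away, so k = min(3, 2) = 2.
db_vecs = np.array([[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
                    [9.9, 10.0], [10.0, 9.9]])
db_frames = ["Posture", "Posture", "Posture",
             "Being_located", "Being_located"]
print(identify_frame(np.array([0.05, 0.05]), db_vecs, db_frames))  # Posture
```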
The following describes the argument identification model that is used in frame-semantic parsing. It is assumed that the system already disambiguated the frame predicate (e.g., <Posture> in
Figure 6 for the word “
stood”). The novelty of this method lies in the fact that it is simple but at the same time provides state-of-the-art accuracy. The standard approach to identifying roles is to start by selecting a set of semantic roles from the frame lexicon and to supplement it with a null role. Then it is necessary to consider the set of spans that can potentially fill a semantic role.
The complexity of the task of assigning semantic roles can be estimated using the following argument. (1) If there is a sentence of n words, then there are 2^(n−1) possible segmentations; this is easy to see if you interpret the existence of a boundary between any two adjacent words as 1 and its non-existence as 0. (2) Take any segmentation and consider how many ways role labels can be assigned to its segments; to simplify the argument, an upper bound is the number of ways to assign m role labels to n words, which is m^n. (3) Then, there are O(2^(n−1) · m^n) possible role assignments.
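The bound on the number of role assignments can be made concrete with a short computation:

```python
def role_assignment_bound(n: int, m: int) -> int:
    """Upper bound O(2^(n-1) * m^n): segmentations times labelings for a
    sentence of n words and m candidate role labels (including null)."""
    return 2 ** (n - 1) * m ** n

# Even short sentences explode combinatorially:
print(role_assignment_bound(5, 3))    # 16 * 243 = 3888
print(role_assignment_bound(10, 5))   # 512 * 9765625 = 5000000000
```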
Many existing methods [
19] put hard constraints on the choice of possible spans to narrow down the set of possible role assignments, and then use ILP solvers to perform the final inference.
To understand the novelty of our approach, let us look at the data in
Table 3. This table shows the results after parsing the entire FrameNet corpus using the AllenNLP system, which assigns PropBank-style labels to sentence phrases. This table shows that the probability for a sentence phrase to have a FrameNet label “
Agent” given the AllenNLP tag
ARG1 is one (
P(Agent|ARG1) = 1). The same can be said for the argument
ARG2 (
P(Location |ARG2) = 1). In these cases, there is no need for any state space search algorithm, and the system can use a pattern-matching approach. In the case in which there is a need to disambiguate the AllenNLP labels, the system runs the following simple algorithm: (1) it obtains the embedding vector for the first word in the role phrase using the BERT
LARGE model; (2) then it uses the 1-NN algorithm to find the closest match between this vector and all the vectors from the FrameNet corpus that it parsed with BERT
LARGE; (3) a label from the closest vector is assigned to the new vector.
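A compressed sketch of this two-branch argument labeling, with a toy probability table in place of the Table 3 statistics and low-dimensional stand-ins for the BERT vectors:

```python
import numpy as np

def label_argument(pb_tag, first_word_vec, prob_table, db_vecs, db_labels):
    """FrameNet role for a phrase: pattern matching when the PropBank tag
    is unambiguous, otherwise 1-NN over stored first-word vectors."""
    for role, p in prob_table.get(pb_tag, {}).items():
        if p == 1.0:                 # unambiguous, e.g. P(Agent|ARG1) = 1
            return role
    # Ambiguous case: nearest stored vector decides the label.
    i = int(np.argmin(np.linalg.norm(db_vecs - first_word_vec, axis=1)))
    return db_labels[i]

# Toy statistics and vectors (illustrative, not the real Table 3 counts).
prob_table = {"ARG1": {"Agent": 1.0}, "ARG0": {"Agent": 0.7, "Speaker": 0.3}}
db_vecs = np.array([[0.0, 0.0], [5.0, 5.0]])
db_labels = ["Agent", "Speaker"]
print(label_argument("ARG1", None, prob_table, db_vecs, db_labels))
print(label_argument("ARG0", np.array([4.8, 5.1]), prob_table, db_vecs, db_labels))
```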
Figure 1 shows that there is one result in the first sentence that was not explained. The phrase “
at the breakfast table” is annotated by our PropBank-FrameNet parser as <
Location>, but the system needs to know what physical object can be used in the scene to represent this <
Location>. Using the Stanford dependency parser, the system can identify the headword “
table” and use it as the main word for an object in the scene. However, the question remains whether the word “
table” is a physical or abstract object and what properties can be used to describe the semantics for this object.
Word sense disambiguation (WSD) is an important task in the natural language processing pipeline; it assigns the correct sense to a word in a given context. Next, a method for generating upper ontology concept embeddings with full coverage of WordNet is presented. This method is very similar to the frame identification method proposed above. The method consists of five basic steps:
For the training dataset the system collected six standard WSD datasets: SemCor [
21], Senseval-2 [
22], Senseval-3 [
23], SemEval-07 [
24], SemEval-13 [
25] and SemEval-15 [
26].
The system extracts all sense descriptions from the WordNet database and adds them to the dataset that was created in step 1.
The SUMO ontology has a mapping to WordNet synsets. The system uses this mapping to annotate the WordNet corpus with SUMO ontology concept labels.
The system uses a mapping from FrameNet to the SUMO ontology to annotate the most frequent FrameNet frames and lexical units with SUMO concept labels.
The procedure for finding the embeddings of SUMO concepts is the same as for identifying frames.
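Taken together, the five steps amount to relabeling a WSD training corpus with SUMO concepts before computing the embeddings. A minimal sketch of the relabeling step (the mapping excerpt is illustrative, and “Entity” as the fallback concept is an assumption):

```python
# Tiny illustrative excerpt of the WordNet-sense -> SUMO-concept mapping;
# the two WordNet senses of "increase" collapse into one SUMO concept,
# and the concept names for "table" here are hypothetical.
WN_TO_SUMO = {
    "increase%2:30:00::": "Increasing",
    "increase%2:30:02::": "Increasing",
    "table%1:14:00::": "GroupOfPeople",
    "table%1:06:01::": "Table",
}

def relabel(corpus):
    """Replace WordNet sense labels with SUMO concept labels (step 3)."""
    return [(tokens, WN_TO_SUMO.get(sense, "Entity"))
            for tokens, sense in corpus]

corpus = [
    (["prices", "continued", "to", "increase"], "increase%2:30:00::"),
    (["taxes", "were", "increased"], "increase%2:30:02::"),
]
print({label for _, label in relabel(corpus)})   # {'Increasing'}
```

After relabeling, two WordNet senses that a classifier might confuse become a single class, which is exactly why the SUMO-labeled experiments below can only improve the F1 score.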
Figure 7 shows a small experiment that was conducted to demonstrate our approach for word sense disambiguation. Let us return to the first sentence of our example presented in
Figure 1 and analyze the word “
table”. The system selects from the training corpus all sentences with the word “
table” and calculates the embeddings for each instance of the word “
table”. It uses NLTK WordNet sense labels as the class label. Then it computes principal component analysis for 90% of the sentences from the training data and uses the rest as test data.
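The PCA projection used in this experiment can be sketched with a plain SVD; the synthetic clusters below stand in for the real 1024-dimensional BERT embeddings of “table” in context:

```python
import numpy as np

def pca_2d(vectors):
    """Project embedding vectors onto their first two principal components."""
    X = vectors - vectors.mean(axis=0)
    # SVD of the centered data: principal directions are the rows of vt.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T

# Synthetic stand-ins for contextual embeddings of two senses of "table".
rng = np.random.default_rng(0)
furniture = rng.normal(0.0, 0.1, (20, 8))    # cluster for table.n.02
data_table = rng.normal(1.0, 0.1, (20, 8))   # cluster for table.n.01
proj = pca_2d(np.vstack([furniture, data_table]))
print(proj.shape)   # (40, 2)
```

With well-separated senses, the two clusters end up on opposite sides of the first principal component, which is the effect visible in Figure 7a.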
Figure 7a shows all embedded examples for the word “
table” projected on two principal components. It can be seen that there are two clusters of data points. One (brown), marked with the label “
table.n.01”, represents a sense with the description: “
a set of data arranged in rows and columns”. The second largest group of senses is labeled “table.n.02” with the description: “
a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs”. These two senses can be perfectly separated, and what is more, the system does not need to store all 1024-dimensional embedding vectors in a database. It is enough to keep two principal components of the embedding space in order to have a perfect classification of these senses.
There is one more important conclusion that can be drawn from
Figure 7a using the WordNet dictionary—using hypernym relations, one can conclude that the meaning of “table.n.05” (red square) is an abstract thing. The description of this sense is as follows: “
a group of persons together in one place”, and the direct hypernym is “
social_group%1:14:00::”. Now let us observe which sentence is encoded with this red square: “
I felt the temblor begin and glanced at the table next to mine, smiled that guilty smile and we both mouthed the words, “Earth-quake!” together.” If the system wants to recreate this sentence as a 3D scene, then the word “
table” would represent a set of objects: people and a table as furniture. On the other hand, WordNet will suggest interpreting it as an abstract object and ignoring it for the scene generation task. There are two important lessons to be drawn from these examples. First, the system needs to use caution when using WordNet for a text-to-3D task, and second, the WSD error from the WordNet point of view does not mean an error in terms of a text-to-3D task.
Figure 7b shows another interesting point about the semantic meaning of word embeddings in context. It is shown here that the greatest separation between data points occurs when the system considers things to be divided into abstract and physical categories.
Figure 7c,d show embeddings of the word “
run”.
Figure 7c shows all senses from the WordNet corpus. It shows that in this case, the disambiguation is more difficult than for the word “
table”. On the other hand, if the system regroups senses for the word “
run” into two groups “
Motion” and “
NoMotion”, then the disambiguation of senses becomes more accurate.
5. Results
The harmonic mean between precision and recall (F1) is used in all experiments because it is the most commonly used metric in the WSD and SRL literature to measure classification results. The dataset consists of several well-known datasets plus the dataset that was created during this project. The following list shows all the experimental datasets:
Table 4 shows the results of our WordNet sense disambiguation system (BERT
1024 k-NN) for these datasets. The semantic analysis method presented in this paper parses only nouns and verbs, so the results are reported per part of speech. As benchmarks for this experiment, MFS [
27] and LMMS2348 (BERT) [
28] were used. LMMS2348 is similar to the method proposed in this paper, but it uses additional embedding vectors. A smaller vector size of the BERT embeddings was also used (BERT-Medium512 k-NN).
From this experiment, one can observe that the book dataset shows a higher F1 score than the Senseval dataset. This can be explained by the fact that the Senseval dataset is more diverse and was created to test word sense disambiguation on a large set of different words. The book dataset has a smaller vocabulary, which may explain some of the difference in the F1 score.
Another interesting conclusion from this experiment can be drawn by noting two facts: firstly, all embedding methods outperform the MFS approach by a significant margin, and secondly, the difference between embedding methods is within a few percentage points. This explains why the system chose BERT1024 k-NN as the final step implemented in the natural language processing pipeline. Even if the LMMS2348 (BERT) method performs slightly better, it takes up more than twice the memory space and more than quadruples the processing time in our implementation. WSD has a small part in this text-to-3D framework, and the system has to consider the performance of each component to obtain reasonable processing resources for the whole framework.
There is another reason why a difference of a few percentage points in the F1 score is not an important argument in favor of a more complex model. WordNet has over 100,000 synsets, and the system needs far fewer concepts to create a few simple objects in a 3D environment. An analysis of several word disambiguation results in which the system made a mistake showed that these results are not errors from the point of view of the upper ontology. To be more precise, the project team investigated the verb “
increase” when the system made a mistake and the MFS system disambiguated correctly. There are two senses of the verb “
increase” in the WordNet dictionary:
(1) “increase%2:30:00::”—(become bigger or greater in amount) (2) “increase%2:30:02::”—(make bigger or more). However, if one looks at the word “increase” in the SUMO ontology, one will find that both senses correspond to the concept of “increasing”. This suggests that for some natural language understanding tasks, upper ontologies may be more appropriate than fine-grained semantic dictionaries such as that of WordNet.
Table 5 shows the results of the experiment that was conducted to test this hypothesis.
The SUMO upper ontology is indexed with WordNet senses, and the system can use this index in all word sense disambiguation tasks by replacing WordNet senses with SUMO concepts. So, the project team took the dataset that was used in the experiment above (
Table 4) and replaced the WordNet senses with SUMO concept labels. The F1 score was expected to increase, and indeed
Table 5 shows a slight increase in the F1 score (columns “
Senseval All SUMO” and “
Book Dataset All SUMO”). The unexpected result of this experiment was that F1 increased by only a few percentage points, whereas a large margin was expected when the corpus is indexed using the concepts of the upper ontology. It was therefore a logical step to try the small subset of the most abstract SUMO concepts that is used to create objects in the text-to-3D task. The “
Senseval SUMO 3D” and “
Book Dataset SUMO 3D” columns show that even the MFS method can slightly improve the classification score because the system uses labels that combine most synsets into a few concepts.
A PropBank-style automatic and accurate shallow semantic parser can annotate text with a semantic argument structure, which can form the basis for additional semantic annotations. One of the novelties proposed in this paper is based on the observation that the AllenNLP semantic role labeling system, when presented with a sentence, is able to accurately identify each predicate in the sentence together with its semantic arguments, which our system then annotates with additional labels from the FrameNet system.
The following experiment demonstrates the usefulness of the PropBank role annotation approach for FrameNet-based SRL models. The hypothesis is that the PropBank semantic parser can considerably improve the structural detection of role spans, and the feature-based classifier can considerably improve the process of labeling the semantic roles of the FrameNet system.
Accordingly, this hypothesis is tested using the output of the SEMAFOR and SUMO SRL systems. The first system, SEMAFOR, is the baseline for the frame-semantic role labeling. Both systems receive as an input a random set of sentences from the book corpus and the FrameNet example corpus.
Table 6 summarizes our results with SUMO SRL and SEMAFOR using manually annotated frames and roles. The focus of this experiment is only on verbs as predicates. Compared to SEMAFOR, this table shows that SUMO SRL provides a gain of 12.5 points in F1 for the frame identification task and 5.1 points for the role identification task.
The last experiment that was conducted in this research project is related to the identification of physical objects in the scene.
So far, the results have been presented only for coreference resolution, which is treated as part of the process of identifying an object in a scene. Coreference resolution is the task of grouping mentions in the text that refer to the same underlying real-world object. Our baseline model is the end-to-end span-based neural model from the AllenNLP system, which implements the method described in [29].
Table 7 shows an improvement when the system uses the SUMO SRL heuristic method. This is not a big surprise, because SUMO SRL’s coreference resolution reuses the AllenNLP coreference method with different parameters. The parameters and heuristic rules are designed to resolve coreference errors in cases where the AllenNLP method fails. Nevertheless, this was an interesting experiment, because no other work has attempted to identify objects across an entire fiction book.
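As a simplified stand-in for such a heuristic layer (the actual rules and parameters differ), mentions can be merged into clusters by normalized string matching, so that “Mr. Sherlock Holmes” and “Sherlock Holmes” fall into one group:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CorefHeuristic {
    // Group mentions whose normalized strings match. This is a toy
    // illustration of a rule-based pass, not the system's actual rules,
    // which run on top of the AllenNLP neural model.
    public static Map<String, List<String>> cluster(List<String> mentions) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (String m : mentions) {
            // Normalize: lowercase and strip a leading title or article.
            String key = m.toLowerCase().replaceAll("^(mr\\.|the)\\s+", "");
            clusters.computeIfAbsent(key, k -> new ArrayList<>()).add(m);
        }
        return clusters;
    }
}
```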
6. Discussion
The following discusses the extent to which the SUMO SRL framework leverages semantic role labeling and the SUMO upper ontology for the semantic parsing of fiction. One of the important results of this research project is a dataset with annotations of frames and role assignments for the entire book The Hound of the Baskervilles. This dataset has been manually reviewed and edited. As far as we know, no previous attempt has been made to analyze an entire book and represent its content in a formal framework. We hope this dataset will encourage other researchers to focus more on this complex area of natural language processing.
The SUMO SRL system must understand the world around it in order to understand language. To do this, it uses common ontologies to explicitly express knowledge about the world. A writer cannot describe every detail of a scene, and our knowledge of general facts supplies these details for a better understanding of the language. Therefore, to develop useful systems that can interact with people about what is written in a book, one needs general ontologies to interpret language in context. The SUMO ontology is a collection of about 20,000 concepts linked into a logical theory with 70,000 axioms. The axioms are expressed in first-order logical form and impose constraints on the interpretation of the concepts. The SUMO upper ontology is one of the largest open-source ontologies, and experiments were performed in an attempt to answer the question of how useful upper ontologies are for semantic parsing.
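To illustrate the form of these axioms, the standard constraint linking the instance and subclass relations can be written schematically in first-order notation (simplified here from SUMO’s native SUO-KIF format) as:

```latex
\forall x \,\forall c_1 \,\forall c_2 :\;
\bigl( \mathit{instance}(x, c_1) \wedge \mathit{subclass}(c_1, c_2) \bigr)
\rightarrow \mathit{instance}(x, c_2)
```

Axioms of this kind let an inference engine conclude that an instance of a specific concept is also an instance of every more general concept above it in the hierarchy.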
The conclusion one can draw from our experiment using SUMO for semantic parsing is that an improvement of a few percentage points in accuracy is achievable when there is a requirement to identify physical objects in a scene. In addition, the experiment has shown that the SUMO axioms can be useful for identifying a scene from text. Another conclusion is that the number of axioms needed to better understand the content of the book must be much greater than what is currently available. These commonsense knowledge requirements can be addressed with a comprehensive upper ontology with a built-in inference engine. One well-known source of this type of knowledge is the Cyc system [30]. Cyc is a large knowledge base with a commonsense reasoning engine, but it is proprietary. SUMO is the best-known alternative to Cyc, and it is an open-source ontology. One of the goals of this project was to focus on the parsing of fiction and to introduce some gamification necessary for the further development of the SUMO ontology.
This paper introduced a new shallow semantic parser based on the PropBank semantic role annotation process. The idea was to test the hypothesis that it is sufficient to obtain role annotations from the PropBank parser and then annotate them with labels from the FrameNet, WordNet and SUMO systems in order to obtain the semantic information needed for the text-to-3D task. The research team developed a parser that augments PropBank labels with FrameNet, WordNet and SUMO labels. PropBank annotations are less domain-specific, and the label set is relatively small (in our corpus, over 90% of annotations are covered by only 12 labels). In contrast, the FrameNet label set exceeds 1000 labels and the WordNet label set exceeds 100,000. These labels allow us to refine PropBank annotations and prepare the parsing results for mapping natural language labels to a domain-specific ontology. The manually annotated corpus confirms that this is a sound approach.
It is important to notice that this approach allows one to filter out a significant part of the information that is not related to the text-to-3D task. If one takes the first sentence from the proposed example, the filtering process proceeds as follows: (1) the PropBank parser takes the verb phrase “was seated” and selects the word “seated” as the headword that marks the predicate under consideration; (2) the PropBank parser then labels the phrase “Mr. Sherlock Holmes, who was usually very late in the mornings, save upon those not infrequent occasions when he was up all night” as an argument of type <ARG1> and the phrase “at the breakfast table” as an argument of type <ARG2>, indicating that the system can compress two long phrases into two entities in a 3D scene.
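The compression step in this filtering process can be sketched as follows. The headword heuristic here (take the last token of the first capitalized run, else the last token of the span) is a naive stand-in for the system’s actual headword selection, shown only to make the span-to-entity reduction concrete:

```java
public class SceneCompressor {
    // Reduce a long PropBank argument span to a single entity name
    // for the 3D scene. Naive illustrative heuristic, not the
    // system's actual headword extraction.
    public static String toEntity(String argSpan) {
        String[] tokens = argSpan.replaceAll("[,.]", "").split("\\s+");
        String candidate = null;
        for (String t : tokens) {
            if (!t.isEmpty() && Character.isUpperCase(t.charAt(0))) {
                candidate = t; // extend the capitalized run: "Mr Sherlock Holmes"
            } else if (candidate != null) {
                return candidate; // run ended; return its last token
            }
        }
        return candidate != null ? candidate : tokens[tokens.length - 1];
    }
}
```

Applied to the example, the <ARG1> span collapses to “Holmes” and the <ARG2> span to “table”: two entities for the scene in place of two long phrases.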
The labeling process described so far gives us only shallow semantic information; that is, the system cannot make logical inferences using the axioms of the ontology or the logical relations expressed as its predicates. The SUMO ontology has a Sigma browser with built-in automated theorem-proving systems, in particular E and Vampire [31,32]. It is possible to use these theorem provers to reason about implicit knowledge in a scene, but in the proposed framework the project team decided to test the more programmer-friendly Drools system. The team transformed SUMO ontology concepts into Java classes and SUMO axioms into Drools rules.
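A fragment of this transformation might look as follows. The class hierarchy mirrors SUMO’s subclass relation (Entity, Physical and Artifact are real SUMO concepts; the intermediate classes between Physical and Artifact are omitted here), and the Drools rule shown in the comment is an illustrative sketch, not one of the system’s actual rules:

```java
public class SumoFacts {
    // SUMO concepts become Java fact classes; subclass becomes extends.
    public static class Entity { }
    public static class Physical extends Entity { }   // SUMO Physical
    public static class Artifact extends Physical { } // intermediate classes omitted

    /*
     * A SUMO axiom about artifacts could then become a Drools rule
     * over these facts (illustrative DRL; BoxShape is hypothetical):
     *
     * rule "box collision shape for artifacts"
     * when  $a : Artifact()
     * then  insert(new BoxShape($a));
     * end
     */
}
```

With this encoding, Java’s `instanceof` gives the subclass inference for free, and the Drools engine handles the rule-based part of the axioms.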
Many formal systems have been suggested to connect natural language with objects and movements in 3D scenes. The SHRDLU system was a starting point for this purpose. More recently, the Stanford Text2Scene system presented a text-to-3D scene generation solution that learns spatial knowledge from 3D scene data; its authors demonstrated that it is possible to infer unstated implicit constraints between objects in a room scene. This paper focuses on the rigid-body collision-detection subsystem of the 3D scene-modeling system’s physics engine. This subsystem simulates the motion of solid objects: it affects their position and orientation but does not deform them. The system uses box-like collision shapes, the simplest possible collision shape for an object, because our goal of building a natural language processing system that interacts with 3D modeling systems requires the physics engine to be simple.
The research team defined concept-to-physics generation as the task of taking semantic role labels that describe a scene in a book as an input, and generating a plausible 3D scene representation in Drools working memory in terms of Java objects as the output. More specifically, based on the labels from the NLP framework, the system instantiates objects in the Drools memory and then runs an inference process based on the Drools agenda-group workflow.
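Without the Drools runtime, the instantiation half of this task can be sketched in plain Java as follows; a list stands in for Drools working memory, the physical-object filter is a crude illustration, and the fixed box dimensions are placeholder values:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ConceptToPhysics {
    // Minimal scene fact: a SUMO-labeled object with a box collision shape.
    static class SceneObject {
        final String sumoConcept;
        final double[] boxHalfExtents; // box-like collision shape only
        SceneObject(String sumoConcept, double[] halfExtents) {
            this.sumoConcept = sumoConcept;
            this.boxHalfExtents = halfExtents;
        }
    }

    // One fact per physical-object label; process labels are skipped.
    // In the real framework these facts would be inserted into Drools
    // working memory before firing the agenda groups.
    public static List<SceneObject> instantiate(Map<String, String> entityToConcept) {
        List<SceneObject> memory = new ArrayList<>();
        for (Map.Entry<String, String> e : entityToConcept.entrySet()) {
            if (!e.getValue().equals("Process")) { // crude physical-object filter
                memory.add(new SceneObject(e.getValue(), new double[]{0.5, 0.5, 0.5}));
            }
        }
        return memory;
    }
}
```

The inference pass that follows (positioning objects, resolving collisions) is what the Drools agenda-group workflow performs over these facts.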
The presented experiments showed that, for WordNet sense disambiguation, the system is on par with the prior state of the art for verbs and nouns. Moreover, they showed that replacing WordNet synsets with a small set of upper-ontology concepts improves the accuracy of predicate identification. It may also be possible to improve the performance of the word embeddings using the noisy text [33] approach; this could be an interesting direction for future research on word embeddings. The proposed method focuses on embeddings of the English language, but it allows the extension of semantic parsing techniques to a multilingual domain using methods such as the universal semantic dictionary [34]. Second, we presented a new approach to the problem of identifying FrameNet roles, showing that performance can be improved and the task simplified by transforming FrameNet role identification into PropBank role labeling. Finally, we completed the task of identifying the objects in a scene using the upper ontology. This task cannot be compared with existing benchmarks, as it is new, but this study has shown that high accuracy can be achieved when the output is compared to the manually labeled data created during this project. The reported results will serve as a benchmark for future research projects.