Article

A Unified Knowledge Extraction Method Based on BERT and Handshaking Tagging Scheme

1 Department of Electrical and Computer Engineering, Faculty of Science and Technology, University of Macau, Macau 999078, China
2 State Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macau 999078, China
3 Key Laboratory of Medical Instrumentation and Pharmaceutical Technology of Fujian Province, Fuzhou 350116, China
4 AI Speech Co., Ltd., Building 14, Tengfei Science and Technology Park, No. 388, Xinping Street, Suzhou Industrial Park, Suzhou 215000, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2022, 12(13), 6543; https://doi.org/10.3390/app12136543
Submission received: 6 May 2022 / Revised: 2 June 2022 / Accepted: 16 June 2022 / Published: 28 June 2022


Featured Application

Knowledge extraction technology can be applied to many scenarios, such as factual knowledge graph construction. By using knowledge extraction technology, we can extract named entities, their relationships, attributes and concepts from massive unstructured text data. These factual knowledge graphs can be used in search engines and recommendation systems. Meanwhile, knowledge extraction technology can also be used to build vertical domain knowledge graphs, such as a medical knowledge graph that covers symptoms, diseases, drugs, surgery, treatment methods, etc. Through knowledge extraction technology, a large amount of medical knowledge can be extracted from the medical literature and electronic medical records to assist doctors with disease diagnosis in a CDSS. Knowledge extraction technology can also be used to analyze the dialogue between patients and doctors, extract patient information from medical dialogues, provide intelligent pre-consultation and guidance services and generate electronic medical records. In dialogue systems, knowledge extraction technology can be used for query intention understanding and slot extraction; in a ticket booking scenario, for example, it can extract customer booking information such as departure city, arrival city, time and preferences.

Abstract

In practical knowledge extraction systems, different applications have different entity classes and relationship schemas, so the generalization and migration ability of knowledge extraction is very important. By training a knowledge extraction model in a source domain and applying the model directly to an arbitrary target domain, open-domain knowledge extraction technology becomes crucial for mitigating these generalization and migration issues. Traditional knowledge extraction models can neither be directly transferred to new domains nor extract undefined relation types. To deal with the above issues, in this paper, we proposed an end-to-end Chinese open-domain knowledge extraction model, TPORE (Extract Open-domain Relations through Token Pair linking), which combines BERT with a handshaking tagging scheme. TPORE can alleviate the nested-entity and nested-relation issues. Additionally, a new loss function that conducts a pairwise comparison of target and non-target category scores to automatically balance their weights was adopted, and the experimental results indicate that this loss function brings both speed and performance improvements. Extensive experiments demonstrate that the proposed method significantly surpasses strong baselines. Specifically, our approach achieves new state-of-the-art results on Chinese open Relation Extraction (ORE) benchmarks (COER and SAOKE): on the COER dataset, F1 increased from 66.36% to 79.63%, and on the SpanSAOKE dataset, F1 increased from 46.0% to 54.91%. In the medical domain, our method obtains performance close to that of the SOTA methods on the CMeIE and CMeEE datasets.

1. Introduction

With the rapid development of the web, the Internet is flooded with texts of all kinds, including news, chat conversations, professional books, etc. It would be of great value if we could discover valuable knowledge from these large volumes of web data. Knowledge extraction enables us to extract valuable knowledge from large text corpora quickly and accurately. It includes two subtasks, named entity recognition and relation extraction, which are the two core tasks that output structured representations in the form of relational tuples (Entity1, RelationWords, Entity2). For example, from the sentence "Yao Ming and Yi Jianlian were born in Shang Hai and Guang Dong respectively", we can extract the knowledge tuples {"Subject": "Yao Ming", "Predicate": "born in", "Object": "Shang Hai"} and {"Subject": "Yi Jianlian", "Predicate": "born in", "Object": "Guang Dong"}. These tuples can be used to construct a knowledge graph, which is further used in question answering or recommendation systems.
Although many knowledge extraction systems have achieved reasonable performance, especially on English data, traditional knowledge extraction systems have two main drawbacks. First, some systems are built as a knowledge extraction pipeline: the named entities are extracted first, and then the relationships are determined through relationship classification. This pipeline leads to error accumulation and hurts the final end-to-end performance. Second, some systems are only suitable for closed-domain knowledge extraction, where the entity types and relationships are fixed. Such systems cannot meet the requirements of open-domain knowledge extraction, where relations are massive and not defined in advance.
For Chinese knowledge extraction systems, development has been slower than for English ones for two reasons. First, there is a lack of annotated Chinese corpora; a large-scale training corpus is especially needed when we use a neural network model. In addition, the Chinese language is more sophisticated and flexible than English in morphology, syntax and grammar, which makes open-domain knowledge extraction more difficult [1]. There are more entity-nesting and relation-nesting issues in Chinese sentences. Knowledge extraction from Chinese biomedical text is even more challenging because the sentence structure is complicated and most sentences contain multiple relationships. According to a statistical analysis of 3000 sentences from Chinese biomedical text in [2], about 66% of the sentences bear multiple relations.
Traditional Chinese knowledge extraction systems are based on the results of Dependency Parsing (DP) [3] combined with rules to extract relation tuples. With the development of deep learning, especially the successful application of pre-trained language models such as BERT [4,5] in natural language understanding, more and more researchers have begun to adopt deep neural network models for knowledge extraction tasks. For fixed-domain relation extraction, deep neural network models have achieved good results.
In this paper, we studied how to extract open knowledge from unstructured Chinese text. Given a Chinese text fragment, our task is to extract entities and their relationships. Entity types and relationship types are not predefined. The formal problem definition is given in Section 3.
Generally speaking, we faced two challenges. The first is the nested-entity and nested-relation issue, which we alleviate with a handshaking tagging scheme. Handshake tagging is very flexible: it treats the predicate like the subject and object entities and thus handles the unfixed relationships of the open domain. The second is a label imbalance issue. The naive multi-label classification method uses Sigmoid as the activation function, turns the problem into d binary classification problems and uses the sum of binary cross-entropy (BCE) as the loss. In our model, the entity mentions are arranged and combined, and each position may carry multiple labels; as a result, most positions in the matrix have label 0 and only a few have label 1, which causes an imbalance issue. During training, the traditional BCE loss tends to predict all labels as 0 in the first few rounds, so that almost no relationships can be extracted in the early stage of training. Therefore, we adopted a new loss function that conducts a pairwise comparison of target and non-target category scores to automatically balance their weights, and the experimental results indicate that the new loss function not only greatly improves the convergence speed in the early stage but also slightly improves the final results.
We proposed an end-to-end Chinese knowledge extraction model leveraging BERT and a handshaking tagging scheme [6]. The proposed model can alleviate the nested-entity and nested-relation issues in complex Chinese sentences; additionally, with the help of the pre-trained model, its migration ability is improved.
The major contributions of this paper are: (i) Our proposed method can handle named entity recognition, fixed-domain and open-domain relation extraction simultaneously. (ii) We adopted a new loss function, defined in Formula (5). Instead of turning multi-label classification into multiple binary classification problems, it conducts a pairwise comparison of target and non-target category scores to automatically balance their weights, which alleviates the imbalance issue of multiple binary classifications and brings speed and performance improvements. (iii) Experiments on open-domain data (COER [1] and SpanSAOKE, derived from SAOKE [7]) show that our method improves significantly over strong baselines: on the COER dataset, F1 increased from 66.36% to 79.63%, and on the SpanSAOKE dataset, F1 increased from 46.0% to 54.91%. Our method also obtains performance close to that of the SOTA methods on the CMeEE [8] and CMeIE [9] datasets.

2. Related Work

In the beginning, knowledge extraction mainly focused on fixed-domain relation extraction. With the accumulation of web data, more and more researchers have begun to extract knowledge from open domains, and open relation extraction has become the mainstream of research. Another trend is the shift from supervised learning to unsupervised or distantly supervised approaches. Recently, with the development of deep learning, more and more neural network methods, such as pre-trained language models, have been used in knowledge extraction systems.
The existing knowledge extraction methods can be divided into two main categories:
  • Fixed-domain Relation Extraction;
  • Open-domain Relation Extraction.
In the following sections, each category is reviewed in detail.

2.1. Fixed-Domain Relation Extraction (FRE)

Fixed-domain relation extraction, also known as traditional relation extraction, requires defining the entity types and the relationships between entities in advance. Once the types of entities and relations are determined, no new entities or relations are extracted. Traditional knowledge extraction systems use predefined extraction rules or annotated data to learn extractors, mainly based on feature engineering and statistical supervised machine learning methods [10,11,12,13,14,15,16,17]; these methods need a lot of annotated data. Other work is based on unsupervised methods [18,19] that cluster representative relationship words and use them as triggers to extract relationships. Generally, FRE adopts two kinds of methods: the pipelined method and the joint model.
Pipelined method: The pipelined method transforms the task into two subtasks, named entity recognition (NER) [20] and relation classification (RC) [21]. Mintz et al. [22] proposed an alternative paradigm called Distant Supervision to train a text classifier with a large unlabeled corpus. Zhong et al. [23] used two independent encoders to learn different contextual representations of entities and relations, respectively, and fused entity category information at the input layer of the relational model.
Joint model: Pipelined models are often considered inferior to joint models because breaking the relation extraction task into subtasks can lead to error propagation, so relation extraction work based on joint models has gradually expanded. Zheng et al. [24] proposed a novel tagging scheme that converts the joint extraction task into a sequence labeling problem. Wang et al. [6] used a handshaking tagging scheme that aligns the boundary tokens of entity pairs under each relation type.
However, it is difficult to cover all relational facts in a predefined way. Once the domain data change, one not only needs to redefine the entities and relations but also to manually define new extraction rules or re-label new training data. As such, these systems rely heavily on human intervention.

2.2. Open Domain Relation Extraction (ORE)

In order to reduce the manual effort required by FRE, Banko et al. [25] introduced a new extraction paradigm, open-domain relation extraction [26], which extracts relations in open domains independently of a predefined schema and transfers well to new domains.
Systems based on self-supervised learning include TextRunner [27] and WOE [28]. TextRunner uses a deep grammar parser to automatically extract triples from a web corpus: it first learns a Bayesian classifier, then generates all candidate triples for the input sentence, retains the high-confidence results through the classifier and finally filters out unqualified results by counting the frequency of triples in the text. PGCORE [29] proposed Pointer-Generator Networks to extract open-domain relations end-to-end, which outperforms rule-based methods, but it only considers the case where there is a single triple in the sentence. SpanOIE [30] first finds the predicates in the sentence, then takes the predicate and the sentence as input and outputs the argument pairs that belong to this predicate. However, most neural OpenIE systems, including PGCORE and SpanOIE, cannot extract appositive relations, and none of the above models can perfectly handle a complex sentence containing multiple groups of entities and relations. CasRel [31] first extracts the subject and then the corresponding object for each relation type separately, which solves various overlapping problems; however, it has low computational efficiency because the number of subjects varies during prediction, so the batch size can only be set to 1. PRGC [32] uses sequence labeling combined with a token-pair global correspondence matrix, which is more robust, but it relies on pre-steps such as potential-relation judgment, and these pre-steps need to prioritize recall. PRGC claims to solve the sparsity problem in handshake extraction; however, its optimization is limited to fixed relation fields: PRGC must first identify which relations are contained in the given text and then add the relation type as a feature to the subsequent extraction, which limits its application to open relation extraction. Similarly, PURE [23] also needs to conduct relation classification. By optimizing the decoding labels, OneRel [33] reduces redundant information and generates deeper interactions; its one-stage decoding is more direct and efficient, which reduces error propagation. However, OneRel still suffers from the common problems of the global information matrix: relational redundancy and too many negative samples.

3. Problem Definition

Open-domain relation extraction is one of the main tasks for building a knowledge graph. It aims to extract triples from unstructured text. Different from FRE, all the values in the triples extracted by ORE come from the text. Therefore, ORE can be seen as a combination of the named entity recognition task and entity pairing task.
Let $X$ denote an input sentence and $S = \{s_1, s_2, \dots, s_{|S|}\}$ denote the set of subjects extracted from $X$; the predicate set $P$ and the object set $O$ follow the same definition. Therefore, there are $d$ different combinations of triples, where $d = |S|\,|P|\,|O|$ and $|S|$, $|P|$, $|O|$ are the sizes of the subject, predicate and object sets, respectively.
For each combined triple $t \in S \times P \times O$, the ORE task is to determine whether $t$ is a correct pairing result, and the objective function of the model is given in Formula (1):
$$\theta^{*} = \arg\min_{\theta} \left( - \sum_{t \in S \times P \times O} \log P\{G(t) \mid S, P, O, X\} - \log P\{S, P, O \mid X\} \right) \quad (1)$$
where $\theta$ denotes the parameters of the whole model and $G(t)$ indicates whether triple $t$ is correctly classified. Therefore, the task of the model is not only to ensure that the relational elements it identifies are correct but also to ensure that they are correctly paired. For example, in the Yao Ming/Yi Jianlian sentence from Section 1, $|S| = 2$, $|P| = 1$ and $|O| = 2$ yield $d = 4$ candidate triples, of which only two are correct pairings.

4. Our Approach

Take the sentence in Figure 1 as an example: "Yao Ming became a basketball player in Shang Hai in 1998". We can extract the knowledge tuple {"Subject": "Yao Ming", "Predicate": "became", "Object": "a basketball player", "InPlace": "Shang Hai", "OnDate": "1998"}. A complete open knowledge tuple may have more than three elements; in addition to the most basic object, there are other object types such as time, location, etc.

4.1. Encoder Framework

Given a sentence $S = (w_1, w_2, \dots, w_n)$, we use BERT (bert-base-chinese) [4] to encode the input sentence. BERT was adopted because it has had a great impact and achieved results comparable to more advanced models in TPLinker [6], PURE [23], CasRel [31] and PRGC [32]. After obtaining the sentence representations $H = (h_1, h_2, \dots, h_n)$, we create the span matrix, an $n \times n \times d$ tensor, where $n$ is the length of the sentence and $d$ is the number of labels. We define the representation of position $(i, j)$ in this matrix as $s_{ij}$:
$$s_{ij} = h_i \oplus h_j \oplus (h_i \cdot h_j) \oplus (h_i - h_j) \quad (2)$$
where $\oplus$ denotes concatenation. For each $s_{ij}$, we applied a Feed-Forward Neural Network and a Sigmoid activation function to create scores for each label at each position:
$$\mathrm{FFN}(s_{ij}) = W_2 \cdot \mathrm{ReLU}(W_1 \cdot s_{ij} + b_1) + b_2 \quad (3)$$
$$Score_d = \mathrm{Sigmoid}(\mathrm{FFN}(s_{ij})) \quad (4)$$
where $W_1$, $W_2$ are parameter matrices and $b_1$, $b_2$ are bias vectors. $Score_d$ is a $d$-dimensional vector corresponding to the scores of the $d$ labels.
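To make the scoring module concrete, the following PyTorch sketch implements Formulas (2)-(4). It is a minimal illustration under our own assumptions (the class name, batch layout and the use of HuggingFace transformers are not from the paper); the layer sizes follow Table 2.

```python
import torch
import torch.nn as nn

class SpanScorer(nn.Module):
    """Builds the n x n span matrix of Formula (2) and scores d labels per
    position with the FFN of Formulas (3) and (4)."""

    def __init__(self, num_labels: int, hidden: int = 768):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(hidden * 4, hidden * 6),  # W1, b1 (input 768 x 4, hidden 768 x 6, Table 2)
            nn.ReLU(),
            nn.Linear(hidden * 6, num_labels),  # W2, b2
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n, hidden) token representations from BERT.
        b, n, dim = h.shape
        hi = h.unsqueeze(2).expand(b, n, n, dim)           # h_i, constant along columns
        hj = h.unsqueeze(1).expand(b, n, n, dim)           # h_j, constant along rows
        s = torch.cat([hi, hj, hi * hj, hi - hj], dim=-1)  # Formula (2)
        return self.ffn(s)                                 # (batch, n, n, d) raw label scores

# Usage sketch (assuming HuggingFace bert-base-chinese):
# from transformers import BertModel, BertTokenizerFast
# tok = BertTokenizerFast.from_pretrained("bert-base-chinese")
# enc = BertModel.from_pretrained("bert-base-chinese")
# h = enc(**tok("姚明出生于上海", return_tensors="pt")).last_hidden_state
# logits = SpanScorer(num_labels=d)(h)  # Sigmoid + threshold applied at decoding
```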
Assuming the numbers of NER categories, FRE relation categories and ORE object types are $x$, $y$ and $z$, respectively, the total number of labels is $d = x + 2y + (2 + z) + 4$ (or $6$). Here, $x$ means that one label is generated for each NER category. The term $2y$ arises because an FRE relation can only be confirmed when the heads and the tails of the two entities shake hands successfully at the same time, so we generate two labels per FRE relation category, one for the head handshake and one for the tail handshake. The term $2 + z$ counts the ORE element types, namely subject, predicate and the $z$ object types. The final $4$ (or $6$) depends on whether the ORE task performs two or three handshakes. The specific details of handshaking decoding are introduced in Section 4.2; a small worked example of the label count follows.
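For concreteness, the label count can be computed as follows; the category counts in the example are hypothetical, not taken from any of the datasets used here.

```python
def total_labels(x: int, y: int, z: int, three_handshakes: bool = False) -> int:
    """d = x + 2y + (2 + z) + 4, or + 6 with three-times-handshaking decoding."""
    return x + 2 * y + (2 + z) + (6 if three_handshakes else 4)

# Hypothetical counts: 9 NER categories, 44 FRE relations and 2 extra ORE
# object types (e.g., time and location) give d = 9 + 88 + 4 + 4 = 105.
print(total_labels(9, 44, 2))  # 105
```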
We defined the training loss as below:
$$loss = \log\left(1 + \sum_{m \in \Omega_{neg}} e^{score(m)}\right) + \log\left(1 + \sum_{n \in \Omega_{pos}} e^{-score(n)}\right) \quad (5)$$
where $score(x)$ denotes the score of the $x$-th label, $\Omega_{pos}$ is the set of positive label indices and $\Omega_{neg}$ is the set of negative label indices.
The naive multi-label classification method uses Sigmoid as the activation function and turns the problem into $d$ binary classification problems, using the sum of binary cross-entropy as the loss. Obviously, when $d \gg k$ (where $k$ is the number of labels equal to 1 at a given position), this approach faces a serious class imbalance problem. In our method, we instead compare the target class scores with the non-target class scores pairwise [34]; since logsumexp is a smooth approximation of max, it automatically highlights the items with larger errors and thus alleviates the class imbalance problem.
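A minimal PyTorch sketch of Formula (5), following the public formulation in [34], is given below. The function name and tensor layout are our assumptions; note that the loss works on raw scores, so under this sketch the Sigmoid of Formula (4) is only applied at decoding time.

```python
import torch

def multilabel_pairwise_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Formula (5): loss = log(1 + sum_neg e^{s}) + log(1 + sum_pos e^{-s}).

    scores: raw label scores of shape (..., d); labels: 0/1 float tensor of the
    same shape. Appending a zero score to each group realizes the "1 +" term.
    """
    scores = (1 - 2 * labels) * scores          # negate the positive-class scores
    neg = scores - labels * 1e12                # mask positives out of the negative term
    pos = scores - (1 - labels) * 1e12          # mask negatives out of the positive term
    zeros = torch.zeros_like(scores[..., :1])
    neg_loss = torch.logsumexp(torch.cat([neg, zeros], dim=-1), dim=-1)
    pos_loss = torch.logsumexp(torch.cat([pos, zeros], dim=-1), dim=-1)
    return (neg_loss + pos_loss).mean()
```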

4.2. Decoder Framework

Here we took the triple (Yao Ming, was born in, Shang Hai) as an example, as illustrated in Figure 2; the labels at each position are shown in Table 1.

4.2.1. NER Decoding

When a character pair has a certain NER label, we consider the entity bounded by the character pair to have that NER label; the first character must precede the second character in the text.
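As an illustration, NER decoding can be sketched as follows; the label-index mapping is our assumption, and the 0.45 Sigmoid threshold comes from Table 2.

```python
import torch

def decode_ner(logits: torch.Tensor, ner_labels: dict, threshold: float = 0.45):
    """A character pair (i, j) with i <= j whose Sigmoid score exceeds the
    threshold bounds an entity. logits: (n, n, d); ner_labels: index -> name."""
    scores = torch.sigmoid(logits)
    entities = []
    for k, name in ner_labels.items():
        for i, j in (scores[..., k] > threshold).nonzero():
            if i <= j:  # the head character must precede the tail character
                entities.append((i.item(), j.item(), name))
    return entities
```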
Once-handshaking decoding:
Once-handshaking decoding is mainly used in the FRE task. "SH2OH" is the abbreviation of Subject-Head-To-Object-Head, and the other tags follow the same pattern. When the character pair consisting of the two entities' heads has the "SH2OH_*" label of a relation, and the character pair consisting of the two entities' tails has the "ST2OT_*" label of the same relation, this is a successful handshake. At this point, we consider the two entities to hold this fixed-domain relation.

4.2.2. Twice-Handshaking Decoding

Usually, ORE can be converted into the extraction of multiple triples. When the subject is related to the predicate and the predicate is related to an object, one ORE triple (subject, predicate, object) can be extracted. We call this twice-handshaking decoding: the subject shakes hands with the predicate once, and the predicate shakes hands with the object again. In general, for less complex sentences, twice-handshaking decoding is enough to combine the correct triples; for extreme cases, we give an example and a solution with three-times-handshaking decoding below.
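A minimal sketch of twice-handshaking decoding is given below; the data layout (entity spans grouped by role, plus the four ORE handshake labels from Table 1) is our own assumption.

```python
def decode_twice_handshake(entities: dict, links: set):
    """entities: role -> set of (head, tail) spans for roles "S", "P", "O".
    links: set of (label, i, j) handshakes between character positions, with
    labels "SH2PH", "ST2PT", "OH2PH" and "OT2PT" as in Table 1."""
    def has(lab, i, j):
        return (lab, i, j) in links

    triples = []
    for p in entities["P"]:
        # A subject pairs with predicate p when its head shakes hands with p's
        # head and its tail with p's tail; objects pair with p analogously.
        subs = [s for s in entities["S"] if has("SH2PH", s[0], p[0]) and has("ST2PT", s[1], p[1])]
        objs = [o for o in entities["O"] if has("OH2PH", o[0], p[0]) and has("OT2PT", o[1], p[1])]
        triples += [(s, p, o) for s in subs for o in objs]
    return triples
```

Note that all subjects and all objects attached to the same predicate are combined by Cartesian product, which is exactly the failure mode that three-times-handshaking decoding (Section 4.2.3) is designed to repair.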

4.2.3. Three-Times-Handshaking Decoding

Given the example in Figure 3, "Yao Ming" and "Yi Jian Lian" share the same predicate "be born in", but their respective objects are different, namely "Shang Hai" and "Guang Dong". If we continued to use twice-handshaking decoding here, the spurious triples (Yao Ming, was born in, Guang Dong) and (Yi Jian Lian, was born in, Shang Hai) would also be decoded. Therefore, we added two tags on top of twice-handshaking decoding (shown in the last two rows of the ORE Labels column (SH2OH and ST2OT) in Table 1).
By further clarifying which subject and object belong together in an open triple, the problem of decoding such spurious triples can be solved. However, this step makes the decoding condition more stringent and therefore reduces the recall of open relation triples on other, simpler sentences. For this reason, we are more inclined to recommend twice-handshaking decoding.

5. Experiments

5.1. Dataset

We conducted our experiments on four datasets: two general-domain datasets, COER [1] and SpanSAOKE [35], and two medical-domain datasets, CMeEE and CMeIE. COER (Chinese open entity and relation) includes NER and ORE tasks, and SAOKE (Symbol Aided Open Knowledge Expression) has FRE and ORE tasks. The medical-domain datasets are CMeEE (Chinese Medical Entity Extraction dataset) [8] and CMeIE (Chinese Medical Information Extraction dataset) [9]. The proposed method is suitable for NER, FRE and ORE tasks, so these four public datasets are used in the experiments.
COER is a scalable entity and relation corpus that currently contains approximately 1 million relation triples, where relations are open and arbitrary; it aims to promote research on Chinese information extraction. Since the dataset is very large, we randomly selected 20,000 samples and divided them into training, validation and test sets in an 8:1:1 ratio.
SpanSAOKE is derived from the original SAOKE dataset [7], a large-scale sentence-level dataset for Chinese open information extraction. SpanSAOKE filters out unknown, description and concept facts, because these facts have missing subjects, predicates or objects or introduce special predicates such as "ISA" and "DESC".
Besides the COER and SpanSAOKE datasets, in order to verify the performance of our model in the medical domain, we also conducted comparative experiments on the two medical datasets, CMeEE and CMeIE.
CMeEE is a dataset of medical documents developed to identify and extract clinically relevant entities and classify them into nine predefined categories. In CMeEE, an entity can be a word, a phrase or a sentence, and there are entity-nesting cases.
CMeIE is a fixed-domain relation extraction dataset covering a pediatric training corpus and hundreds of common diseases. The data are derived from medical textbooks and from clinical practice, chief complaints, current illness history and differential diagnoses in electronic medical record data.

5.2. Evaluation

Following previous work, we used Exact Match and Partial Match precision (P), recall (R) and F1-measure (F1) as evaluation metrics in our experiments. In particular, we used Exact Match on the COER dataset and Partial Match on the SpanSAOKE dataset. Since there are nested cases in SpanSAOKE, a predicate may have been rewritten during the manual labeling process, whereas the predicate extracted by our model is the original fragment of the sentence; therefore, Partial Match was used for the SpanSAOKE evaluation.
For Exact Match, the starting and ending positions of each element of the extracted triple must be exactly the same as those of the gold triple. For Partial Match, the criterion for judging whether two triples $T = (s, p, o)$ and $T' = (s', p', o')$ match is to satisfy: (1) $g(s, s'), g(p, p'), g(o, o') \geq \delta$ or (2) $g(Cat(T), Cat(T')) \geq \delta$, where $g(\cdot, \cdot)$ is the gestalt pattern-matching function [36], $Cat(\cdot)$ concatenates the triple components into a whole string and the threshold $\delta = 0.85$.
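Since Python's difflib implements Ratcliff/Obershelp gestalt pattern matching [36], the Partial Match criterion can be sketched directly (the helper names are ours):

```python
from difflib import SequenceMatcher

def gestalt(a: str, b: str) -> float:
    # SequenceMatcher.ratio() is the gestalt pattern-matching similarity [36].
    return SequenceMatcher(None, a, b).ratio()

def partial_match(t: tuple, t_gold: tuple, delta: float = 0.85) -> bool:
    """Triples (s, p, o) match if all elements match pairwise (criterion (1)),
    or if the concatenated triples match as whole strings (criterion (2))."""
    elementwise = all(gestalt(x, y) >= delta for x, y in zip(t, t_gold))
    concatenated = gestalt("".join(t), "".join(t_gold)) >= delta
    return elementwise or concatenated

# A predicate rewritten by annotators can still match under criterion (2):
print(partial_match(("姚明", "出生于", "上海"), ("姚明", "出生", "上海")))  # True
```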

5.3. Hyperparameters

Table 2 shows the major hyperparameters in our model, and we use a unified setting for all of the ablation experiments.

5.4. Experiment Results

We compared our proposed approach TPORE with several competitive baselines for Chinese open relation extraction.
UnCORE [18] uses word-distance and entity-distance constraints to generate candidate relation triples from the raw corpus and then adopts global ranking and domain ranking methods to discover relation words from the candidate triples.
ZORE [37] is a syntax-based system that identifies relation candidates from automatically parsed dependency trees and then iteratively extracts relations with their semantic patterns through a novel double-propagation algorithm.
PGCORE [29] casts relation extraction as a text summarization task and proposes an end-to-end abstractive Chinese open RE model based on the Pointer-Generator Network.
SpanOIE [30], instead of the previously adopted sequence-labeling formulation for n-ary OpenIE, first finds predicate spans and then takes a sentence and a predicate span as input and outputs the argument spans for this predicate.
MGD-GNN [35] constructs a multi-grained dependency (MGD) graph with dependency edges between words and soft-segment edges between words and characters, and it updates node representations using a deep graph neural network to fully exploit the topology of the MGD graph and capture multi-hop dependencies.
Table 3 shows the experimental results of ZORE, UnCORE and PGCORE, for which we used the numbers reported in their original papers. As shown in Table 3 and Table 4, the results on COER are much higher than on SpanSAOKE, even though Exact Match is used on COER. By analyzing the data, we found that the cases in SpanSAOKE are much more sophisticated than those in COER. First, the average sentence length of COER is 21, while that of SpanSAOKE is 46; on COER, previous work could capture the context information, even with an LSTM model, and obtain high precision. We also counted the sentences containing multiple triples: in COER, 25% of sentences contain multiple triples, while in SpanSAOKE the percentage is 53.9%. Since our model is good at handling multiple triples in one sentence, it greatly improves recall and F1 over previous methods with comparable precision.
Table 4 shows the experimental results on the SpanSAOKE dataset for ZORE, CharLSTM, SpanOIE and MGD-GNN; for CharLSTM, SpanOIE and MGD-GNN, we used the numbers reported in their original papers. Our model obtained the best P, R and F1. The model learned the paradigm of each element in ORE: a large number of relations that never appear in the training data are extracted correctly from the test data. Our model can also solve various nesting problems, while previous works cannot handle multiple entities and relations in a single sentence well; this is the main reason why our model performs significantly better. For example, the sentence "Cooling water, the full name should be called antifreeze coolant, which means coolant with antifreeze function" contains two relations, <cooling water, full name, antifreeze coolant> and <antifreeze coolant, means, coolant with antifreeze function>. Here, "antifreeze coolant" is both the object of the former triple and the subject of the latter. Our decoding method can handle this situation, which is difficult for other models.
Traditional unsupervised Chinese open relation extraction methods (ZORE and UnCORE) are generally based on dependency parsing and semantic paradigms (DSNF), which extract the relationships between verbs and nouns; they rely heavily on the performance of word segmentation, entity recognition and dependency parsing and on the quantity and quality of the patterns. Some deep learning models, such as PGCORE, use a pointer mechanism and simplify the open relation extraction task, since they can only extract one group of open relational triples per sentence; our approach enumerates all candidate triples and can therefore extract as many triples as possible and improve recall. MGD-GNN uses a graph neural network to extract the relationships between entities, but its performance degrades on long dependency paths, whereas the proposed method can extract relationships over a long dependency path between each entity pair.

5.5. Ablation Experiments

We also conducted ablation experiments, as shown in Table 5. BCELoss means the model uses the BCE loss function with all other parameters unchanged. From Table 5, we can see that the loss function proposed in Formula (5) improves the final performance. Moreover, we conducted an experiment to verify the advantage of the new loss function in terms of convergence speed. Figure 4 illustrates the results: the horizontal axis is the number of iterations, and the vertical axis is the F1 score. The new loss function converges markedly faster than BCE loss. In summary, our proposed loss function not only improves the F1 score but also speeds up convergence.
3HS means the model uses three-times-handshaking decoding instead of twice-handshaking decoding with all other parameters unchanged. Surprisingly, compared with 2HS, three-times-handshaking does not bring better experimental results. Through careful analysis of the test data, we found few sophisticated cases like the one shown in Figure 3. Although such cases can be solved by introducing three-times-handshaking, the decoding of other, simpler triples becomes more stringent with one more handshake. The experimental results show that the lower recall is the main factor that ultimately hurts F1: the extra handshake solves the problem in Figure 3 and improves precision, but it hurts recall and decreases F1.

5.6. Experiment Results on NER Task and FRE Task

In order to evaluate the effectiveness of the model on the NER and FRE tasks, we selected the NER dataset CMeEE [8] and the FRE dataset CMeIE [9] from the public benchmark CBLUE [38]. For each task, we selected SOTA methods to compare with our model. From Table 6 and Table 7, we can see that our model performs on par with the SOTA models. Our model obtained higher precision than the BERT-CRF [39] model and better recall than the BERT-CRF, BERT-MRC [40] and BERT-SPAN [41] models; on F1, it outperforms BERT-CRF and BERT-MRC. For the FRE task, our model obtained higher precision than the PURE [23] model and better recall than the TPLinker [6] model; on F1, it outperformed PURE.
Based on the above experimental results and analysis, we conclude that our model reaches the SOTA level on the NER, FRE and ORE tasks and, on the ORE task in particular, achieves a new SOTA.

6. Conclusions

In this paper, we proposed an end-to-end Chinese open-domain relation extraction model based on BERT. By expanding the sequence vector into a span matrix to fit the handshaking tagging scheme, our method can encode NER, FRE and ORE tasks. The experimental results on four datasets show that the proposed model is effective, and our method can partially solve the issue of multi-entity overlapping relations in complex contexts. However, as the length of the processed text increases, the size of the span matrix grows quadratically; the sparsity of the span matrix is a main limitation of the proposed model and causes it to occupy more memory. In future work, inspired by the restricted relation extraction model PRGC, we plan to replace the current multi-label classification matrix with a global correspondence matrix of lower label dimension. We also plan to divide long texts into multiple sub-texts so that the length of each sub-text stays within a controllable range, and to introduce syntactic structure information into the extraction model to further improve its ability to handle entity and relation nesting. Furthermore, we will consider combining generative models to normalize the extraction results and extract open relations in a unified format, and we plan to compare more advanced encoder models with BERT.

Author Contributions

Conceptualization, N.Y. and S.H.P.; methodology, N.Y., Y.Y. and Q.M.; software, N.Y., Y.Y. and M.I.V.; validation, N.Y., Y.Y. and S.H.P.; formal analysis, Y.Y. and Q.M.; investigation, Y.Y.; resources, N.Y. and S.H.P.; data curation, Y.Y. and Q.M.; writing—original draft preparation, N.Y. and Y.Y.; writing—review and editing, S.H.P., M.I.V. and Q.M.; visualization, M.I.V.; supervision, S.H.P.; project administration, Q.M.; funding acquisition, M.I.V. and S.H.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Key R&D Program of China (No. 2020YFB1313502) and funded by the Shenzhen-Hong Kong-Macau S&T Program (Category C) of SZSTI (SGDX20201103094002009) and funded by the University of Macau (File no. MYRG2019-00056-AMSV, MYRG2020-00098-FST) and funded by The Science and Technology Development Fund, Macau SAR (File no. 0144/2019/A3, 0022/2020/AFJ, SKL-AMSV (FDCT-funded), SKL-AMSV-ADDITIONAL FUND, SKL-AMSV(UM)-2020-2022).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The COER (Chinese open entity and relation) data are publicly available at: https://github.com/TJUNLP/COER (accessed on 22 May 2018), SpanSAOKE (Symbol Aided Open Knowledge Expression) data are publicly available at: https://github.com/Lvzhh/MGD-GNN (accessed on 14 July 2021), CMeIE (Chinese Medical Information Extraction dataset) and CMeEE (Chinese Medical Entity Extraction dataset) data supporting reported results and the conclusions of this article are available at https://tianchi.aliyun.com/dataset/dataDetail?dataId=95414#1 (accessed on 22 March 2021).

Acknowledgments

We thank all the anonymous reviewers for their thoughtful comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Jia, S.; E, S.; Li, M.; Xiang, Y. Chinese open relation extraction and knowledge base establishment. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2018, 17, 15.
  2. Miao, Q.; Zhang, S.; Zhang, B.; Meng, Y.; Yu, H. Extracting and visualizing semantic relationships from Chinese biomedical text. In Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, Bali, Indonesia, 8–10 November 2012; pp. 99–107.
  3. Chen, D.; Manning, C.D. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 26–28 October 2014; pp. 740–750.
  4. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186.
  5. Wu, S.; He, Y. Enriching pre-trained language model with entity information for relation classification. In Proceedings of the CIKM '19: The 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2361–2364.
  6. Wang, Y.; Yu, B.; Zhang, Y.; Liu, T. TPLinker: Single-stage joint extraction of entities and relations through token pair linking. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 1572–1582.
  7. Sun, M.; Li, X.; Wang, X.; Fan, M.; Feng, Y.; Li, P. Logician: A unified end-to-end neural approach for open-domain information extraction. In Proceedings of the WSDM '18: The Eleventh ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 5–8 February 2018; pp. 556–564.
  8. Zan, H.; Li, W.; Zhang, K.; Ye, Y. Building a pediatric medical corpus: Word segmentation and named entity annotation. In Chinese Lexical Semantics; Springer: Cham, Switzerland, 2020; pp. 652–664.
  9. Guan, T.; Zan, H.; Zhou, X.; Xu, H.; Zhang, K. CMeIE: Construction and evaluation of Chinese medical information extraction dataset. In Proceedings of the Natural Language Processing and Chinese Computing: 9th CCF International Conference (NLPCC), Zhengzhou, China, 14–18 October 2020; Part I, pp. 270–282.
  10. Wang, J.; Yang, J.; He, L.; Lin, X.; Chen, C.; Ma, T. Chinese entity relation extraction based on word cooccurrence. Comput. Sci. 2010, 13, 8048–8055.
  11. Lin, R.; Chen, J.; Yang, X.; Xu, H. Research on mixed model-based Chinese relation extraction. In Proceedings of the 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT'10), Chengdu, China, 9–11 July 2010; Volume 1, pp. 687–691.
  12. Zhang, P.; Li, W.; Wei, F.; Lu, Q.; Hou, Y. Exploiting the role of position feature in Chinese relation extraction. In Proceedings of the 6th International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 28–30 May 2008; pp. 2120–2124.
  13. Li, W.; Zhang, P.; Wei, F.; Hou, F.; Lu, Q. A novel feature-based approach to Chinese entity relation extraction. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08), Columbus, OH, USA, 15–20 June 2008; pp. 89–92.
  14. Zhang, Y.; Zhou, J.F. A trainable method for extracting Chinese entity names and their relations. In Proceedings of the Second Workshop on Chinese Language Processing: Held in Conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong, China, 1–8 October 2000; Volume 12, pp. 66–72.
  15. Liu, D.; Zhao, Z.; Hu, Y.; Qian, L. Incorporating lexical semantic similarity to tree kernel based Chinese relation extraction. In Proceedings of the 13th Chinese Conference on Chinese Lexical Semantics, Wuhan, China, 6–8 July 2012; pp. 11–21.
  16. Zhang, J.; Ouyang, Y.; Li, W.; Hou, Y. A novel composite kernel approach to Chinese entity relation extraction. In Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy (ICCPOL'09), Hong Kong, China, 26–27 March 2009; Volume 5459, pp. 236–247.
  17. Che, W.; Jiang, J.; Su, Z.; Pan, Y.; Liu, T. Improved-edit-distance kernel for Chinese relation extraction. In Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP'05), Jeju Island, Korea, 11–13 October 2005; pp. 134–139.
  18. Qin, B.; Liu, A.; Liu, T. Unsupervised Chinese open entity relation extraction. J. Comput. Res. Dev. 2015, 52, 1029–1035.
  19. Huang, C.; Qin, L.; Zhou, G.; Zhu, Q. Research on unsupervised Chinese entity relation extraction based on convolution tree kernel. J. Chin. Inf. Process. 2010, 24, 11–18.
  20. Sang, E.F.T.K. Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition. In Proceedings of the COLING-02: The 6th Conference on Natural Language Learning (CoNLL), Taipei, Taiwan, China, 24 August–1 September 2002.
  21. Zeng, D.; Liu, K.; Lai, S.; Zhou, G.; Zhao, J. Relation classification via convolutional deep neural network. In Proceedings of the COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland, 23–29 August 2014; pp. 2335–2344.
  22. Mintz, M.; Bills, S.; Snow, R.; Jurafsky, D. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore, 2–7 August 2009; pp. 1003–1011.
  23. Zhong, Z.; Chen, D. A frustratingly easy approach for entity and relation extraction. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Mexico City, Mexico, 6–11 June 2021; pp. 50–61.
  24. Zheng, S.; Wang, F.; Bao, H.; Hao, Y.; Zhou, P.; Xu, B. Joint extraction of entities and relations based on a novel tagging scheme. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1: Long Papers, pp. 1227–1236.
  25. Etzioni, O.; Banko, M.; Soderland, S.; Weld, D.S. Open information extraction from the web. Commun. ACM 2008, 51, 68–74.
  26. Niklaus, C.; Cetto, M.; Freitas, A.; Handschuh, S. A survey on open information extraction. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; pp. 3866–3878.
  27. Yates, A.; Banko, M.; Broadhead, M.; Cafarella, M.; Etzioni, O.; Soderland, S. TextRunner: Open information extraction on the web. In Proceedings of the Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), New York, NY, USA, 22–27 April 2007; pp. 25–26.
  28. Wu, F.; Weld, D.S. Open information extraction using Wikipedia. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 11–16 July 2010; pp. 118–127.
  29. Cheng, Z.; Wu, X.; Xie, X.; Wu, J. Chinese open relation extraction with pointer-generator networks. In Proceedings of the 2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC), Hong Kong, China, 27–29 July 2020.
  30. Zhan, J.; Zhao, H. Span model for open information extraction on accurate corpus. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 9523–9530.
  31. Wei, Z.; Su, J.; Wang, Y.; Tian, Y.; Chang, Y. A novel cascade binary tagging framework for relational triple extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Seattle, WA, USA, 5–10 July 2020; pp. 1476–1488.
  32. Zheng, H.; Wen, R.; Chen, X.; Yang, Y.; Zhang, Y.; Zhang, Z.; Zhang, N.; Qin, B.; Ming, X.; Zheng, Y. PRGC: Potential relation and global correspondence based joint relational triple extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Bangkok, Thailand, 1–6 August 2021; pp. 6225–6235.
  33. Shang, Y.; Huang, H.; Mao, X. OneRel: Joint entity and relation extraction with one module in one step. arXiv 2022, arXiv:2203.05412.
  34. Su, J. Extending "Softmax + Cross Entropy" to Multi Label Classification Problem. 2020. Available online: https://kexue.fm/archives/7359 (accessed on 25 April 2020).
  35. Lyu, Z.; Shi, K.; Li, X.; Hou, L. Multi-grained dependency graph neural network for Chinese open information extraction. In Lecture Notes in Computer Science, Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Virtual Event, 11–14 May 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 155–167.
  36. Ratcliff, J.W.; Metzener, D.E. Pattern matching: The gestalt approach. Dr. Dobb's J. 1988, 13, 46–72.
  37. Qiu, L.; Zhang, Y. ZORE: A syntax-based system for Chinese open relation extraction. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1870–1880.
  38. Zhang, N.; Chen, M.; Bi, Z.; Liang, X.; Li, L.; Shang, X.; Yin, K.; Tan, C.; Xu, J.; Huang, F.; et al. CBLUE: A Chinese biomedical language understanding evaluation benchmark. arXiv 2021, arXiv:2106.08087.
  39. Dai, Z.; Wang, X.; Ni, P.; Li, Y.; Li, G.; Bai, X. Named entity recognition using BERT BiLSTM CRF for Chinese electronic health records. In Proceedings of the 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Suzhou, China, 19–21 October 2019; pp. 1–5.
  40. Li, X.; Feng, J.; Meng, Y.; Han, Q.; Wu, F.; Li, J. A unified MRC framework for named entity recognition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Seattle, WA, USA, 5–10 July 2020; pp. 5849–5859.
  41. Yu, J.; Bohnet, B.; Poesio, M. Named entity recognition as dependency parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Seattle, WA, USA, 5–10 July 2020; pp. 6470–6476.
Figure 1. The overall structure of the model. To make it easier to understand, the FFN of ORE and the FFN of NER are drawn separately in the figure, but they actually share the same FFN. In order to make each character in Chinese and English correspond, here we translated "Shanghai" to "Shang Hai".
Figure 2. In general, twice-handshaking decoding can already obtain multiple sets of triples.
Figure 3. Three-times-handshaking decoding example in extreme cases.
Figure 4. Convergence speed with iterations in SpanSAOKE dataset.
Table 1. Labels of each position in the sentence "Yao Ming was born in Shang Hai".

Row Char, Column Char | NER Labels | FRE Labels | ORE Labels
Yao, Ming | Person | - | Subject
Was, in | - | - | Predicate
Shang, Hai | Location | - | Object
Yao, was | - | - | SH2PH
Ming, in | - | - | ST2PT
Shang, was | - | - | OH2PH
Hai, in | - | - | OT2PT
Yao, Shang | - | SH2OH_BirthPlace | SH2OH
Ming, Hai | - | ST2OT_BirthPlace | ST2OT
Table 2. Major hyperparameters in our model.

Parameter | Value
Batch size | 8
Dropout | 0.5
FFN input size | 768 × 4
FFN hidden size | 768 × 6
Gestalt pattern threshold | 0.85
Learning rate | 5 × 10⁻⁵
Max sequence length | 100
Optimizer | Adam
Sigmoid threshold | 0.45
Table 3. The results of each method on COER (Exact Match). Some experimental results are directly quoted from previous comparative experiments.

Methods | P (%) | R (%) | F1 (%)
ZORE | 83.76 | 14.47 | 24.88
UnCORE | 80.57 | 47.62 | 59.86
PGCORE | 85.37 | 54.28 | 66.36
Our model | 83.45 | 76.15 | 79.63
Table 4. The results of each method on SpanSAOKE (Partial Match). Some experimental results are directly quoted from previous comparative experiments.

Methods | P (%) | R (%) | F1 (%)
ZORE | 37.45 | 10.63 | 16.56
CharLSTM | 40.40 | 45.40 | 42.70
SpanOIE | 41.80 | 44.30 | 43.00
MGD-GNN | 45.00 | 47.10 | 46.00
Our model | 62.03 | 49.24 | 54.91
Table 5. The results of the ablation study.

Methods | COER (Exact Match): P (%) / R (%) / F1 (%) | SpanSAOKE (Partial Match): P (%) / R (%) / F1 (%)
Our model | 83.45 / 76.15 / 79.63 | 62.03 / 49.24 / 54.91
Our model w/BCELoss | 82.91 / 75.57 / 79.07 | 60.48 / 49.29 / 53.71
Our model w/3HS | 85.22 / 73.59 / 78.97 | 63.55 / 46.16 / 54.14
Table 6. The comparison between our model and recent SOTA models on CMeEE.

Methods | P (%) | R (%) | F1 (%)
BERT-CRF | 62.14 | 63.56 | 62.84
BERT-MRC | 67.86 | 61.46 | 64.50
BERT-SPAN | 65.74 | 66.17 | 65.96
Our model | 63.73 | 66.25 | 64.97
Table 7. The comparison between our model and recent SOTA models on CMeIE.

Methods | P (%) | R (%) | F1 (%)
PURE | 55.19 | 61.33 | 58.10
TPLinker | 63.18 | 55.85 | 59.29
Our model | 61.16 | 55.94 | 58.43
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
