Article

Chinese Named Entity Recognition Based on Boundary Enhancement with Multi-Class Information

Shuiyan Li, Rongzhi Qi and Shengnan Zhang

1 School of Mathematics, Hohai University, Nanjing 211100, China
2 School of Computer and Information, Hohai University, Nanjing 211100, China
3 Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing 211100, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(23), 12925; https://doi.org/10.3390/app132312925
Submission received: 31 October 2023 / Revised: 24 November 2023 / Accepted: 30 November 2023 / Published: 2 December 2023

Abstract

Compared with English named entity recognition (NER), Chinese NER faces significant challenges due to flexible, non-standard word formation and vague word boundaries, which cause considerable boundary ambiguity and reduce the accuracy of entity identification. To address this issue, we propose a boundary enhancement with multi-class information model (BEMCI). The model integrates multiple types of information into the text embedding while enhancing the subsequent syntax-structure information. A syntactic information analysis module is designed to highlight important syntactic information from three aspects, namely part-of-speech tags, syntactic constituents, and dependency relations, and to analyze sentence structures. Meanwhile, an improved contextual attention mechanism, which combines contextual and syntactic information using a gate mechanism to control the weight fusion, is proposed to further enhance the model’s boundary determination. Experiments conducted on six general datasets show that BEMCI outperforms other baselines, achieving the best results on four of the six datasets.

1. Introduction

Named entity recognition (NER) is a fundamental information extraction task that aims to identify named entities of predefined types from text, such as person names, place names, and organization names [1]. Chinese is an important international language widely used all over the world, and in recent years Chinese NER has received increasing attention [2]. In Chinese entity recognition, text sentences are fed into the model, the model is trained, and entity boundaries, i.e., the beginning and ending positions of a word, are determined from the learned weights and other sentence-level information. The model selects the optimal sequence of labels and then annotates the characters accordingly to accomplish the recognition of named entities. However, Chinese NER faces substantial challenges caused by the greater ambiguity of entity boundaries in Chinese text [2]. For example, a single Chinese character carries less semantic information than an English word and often needs to be combined with others to express a specific meaning. In addition, Chinese text lacks distinctive features such as space separation and capitalization of the first letter.
Early NER methods can be divided into two types: rule-based methods and statistics-based methods [2]. Rule-based methods, which select matching entities from the text according to a set of matching rules, rely on linguistic experts to construct rules by hand; they are only suitable for specific domains and offer limited versatility. Statistics-based methods model NER as a sequence labeling problem and are therefore more versatile and do not require many hand-designed rules. In recent years, deep learning approaches to NER have received widespread attention. Deep learning-based methods do not need manually formulated rules or cumbersome feature engineering and can easily extract implicit semantic information from the input, making them flexible and easy to transfer to new fields.
In recent years, most research efforts have focused on incorporating external information into character representations to enhance semantic information and improve boundary recognition in Chinese NER [3,4,5,6]. However, the types of information introduced in most studies are limited, and little attention is given to the structure of text sentences and to contextual information. Introducing embedding information can improve recognition performance to some extent, but it is constrained when only a single type of information, such as polyphonic-character or glyph information, is used. Although such lexical additions can supply semantic information, they are context-independent [7]. By comparison, part-of-speech (POS) tags, syntactic constituents, and dependency relations are crucial for analyzing sentence structure. On the one hand, entities are typically nouns, and POS tags help determine whether a word is an entity. On the other hand, the POS of a character is influenced by the POS of its surrounding characters, which aids in identifying the positions of entities with similar contexts. Syntactic constituents and dependency relations can identify the structure within a text and guide the model in locating the corresponding entities.
Because the boundary demarcation may affect the correctness of subsequent sequence labeling, it is necessary to integrate various types of information to provide different semantic cues. Furthermore, it is important to analyze the structural aspects of the text and consider the context to enhance the boundaries.
In this paper, we propose a boundary enhancement with multi-class information model (BEMCI) to tackle the issues raised in previous studies [8] by utilizing boundary enhancement with multi-class information fusion. First, to highlight the different meanings of Chinese characters, multiple classes of information, including character embedding, word embedding, pinyin embedding, and radical embedding, are integrated into the text embedding to add semantic information to the text while enhancing the discriminative effect of the subsequent syntactic structure information. Second, boundary enhancement aims to add more semantic information to strengthen the delineation of entity boundaries; therefore, a syntactic information analysis module is added after the coding layer to capture the important syntactic structure of text sentences. Finally, an improved contextual attention mechanism is proposed, in which contextual and syntactic information are fused with weights controlled by a gate mechanism. The proposed model is evaluated on six general datasets and compared with different baseline models, which demonstrates that the model can effectively improve the recognition of the target text.
The main contributions of this paper can be summarized as follows:
  • To highlight the diverse meanings inherent in Chinese characters for entity recognition, we integrate multiple types of information into text embeddings to enhance the semantic representation of the text.
  • Chinese named entity recognition is highly sensitive to the syntactic and semantic attributes of sentences. We design a syntactic information analysis module, which enhances the text representation with additional feature embeddings, to improve model performance.
  • We propose an improved contextual attention mechanism, which combines contextual and syntactic information using a gate mechanism to control the weight fusion, further strengthening the delineation of the boundaries.

2. Related Studies

In recent years, with the rapid development of neural networks, neural models have been widely used for NER. The convolutional neural network (CNN) and recurrent neural network (RNN) are frequently used. External information embeddings include features such as radicals, dictionaries, word segmentation, and character shapes. Chinese named entity recognition involves a vast amount of external information, much of which is language-specific.
The introduction of lexical information can provide abundant boundary information to the model and improve the accuracy of word segmentation. The Char-Based model [9] and Word-Based model [10] are two common text representation models in natural language processing; they model characters or words in different ways when processing text data. The former treats each character in the text as the basic modeling unit, while the latter treats each word as the basic unit so as to consider the overall semantic and syntactic structure of words. The BiLSTM-CRF model [11] and IDCNN-CRF model [12] are based on classical neural network structures (LSTM and CNN) and add a globally normalized CRF layer to improve the model's ability to model label sequences. Zhang et al. [13] proposed the Lattice LSTM model, laying the foundation for dynamic improvements in sequence modeling for Chinese NER. WC-LSTM [14] slightly improved Lattice LSTM by adding word information to the beginning or end characters of the word, effectively utilizing word boundary information and reducing the impact of Chinese word segmentation errors. LR-CNN [15] introduced a rethinking mechanism to merge words and used high-level features to guide lower-level weight distributions, which better solves the problem of word boundary conflicts. Experimental results showed that the model outperformed other dictionary-based models.
Peng et al. [16] proposed a soft lexicon approach, where the sentence is matched with a lexicon. For each character, SoftLexicon identifies all the words that contain it and assigns them to four categories. These four categories are then mapped to four vectors, which are concatenated with the character representation. The matching word information and word boundary information of each character are combined in the embedding layer of the model. Ju et al. [17] dynamically stacked Flat NER layers to recognize entities and proposed a Dynamic Hierarchical Model. The model divides each nested named entity into multiple layers for recognition, and after each layer is completed, the obtained information is passed to the next layer of entity recognition. Cao et al. [18] proposed an adversarial transfer learning framework and introduced a self-attention mechanism in the model to capture the global dependencies of the entire sentence. They incorporated the word boundary information shared in the Chinese Word Segmentation (CWS) task into the Chinese NER task.
Chinese characters evolved from oracle bone script, and the early Chinese characters imitated the shapes of objects. The character shape information provides additional semantic information by incorporating pictographic elements, allowing for the extraction of rich pictorial information from the character’s visual representation. Shi et al. [19] first proposed the feasibility and practicality of radical embedding in Chinese processing. Dong et al. [20] used Bidirectional Long Short-Term Memory (BiLSTM) to extract radical embeddings and then concatenated them with character embeddings containing multiple radicals to obtain the final character representation.
With the development of the transformer, transformer-based architectures have become increasingly effective in natural language processing. MECT [21] used a two-stream transformer architecture, which allowed the model to learn and utilize two different types of features simultaneously: Chinese character features and radical embeddings. KVMN [22] mainly used the attention mechanism to capture the boundaries and semantic relationships of named entities in the text by integrating different types of syntactic information.
At present, most Chinese NER research focuses on introducing a single type of information, typically adding word information at the embedding layer; few studies fuse multiple classes of information while simultaneously integrating sentence structure and contextual information. Considering that different types of information have different impacts on text entity recognition, our proposed model focuses on the fusion of various types of information and pays attention to sentence structure and contextual information.

3. Model

We propose a boundary enhancement with multi-class information model (BEMCI), as illustrated in Figure 1. The BEMCI model consists of six components: the embedding layer, coding layer, syntax analysis layer, Context Attention (CTAT), gate mechanism layer, and output layer. The embedding layer integrates various types of information, such as character embedding, word embedding, pinyin embedding, and radical embedding, to enhance the semantic representation of the text, which outperforms traditional methods that use a single type of information. In the syntax analysis layer, a syntactic information analysis module is designed to enhance the text representation with additional feature embeddings and improve model performance. An improved context attention is proposed to capture contextual information, which further strengthens the delineation of boundaries compared with traditional methods. The other components, including the coding layer, gate mechanism layer, and output layer, are commonly used components in traditional models.

3.1. Embedding Layer

This layer implements the conversion of a text sequence to a multi-class information embedding vector, including character embedding, pinyin embedding, word embedding, and radical embedding. Each class of information embedding converts the text into a corresponding multi-dimensional vector. The output is transformed into a hidden vector through the transformer [23].

3.1.1. Character Embedding

Character embedding represents each character of the input text as a vector. Given an input sequence $C = \{c_1, c_2, \ldots, c_i, \ldots, c_n\}$, where $c_i$ denotes the $i$-th character in the text, each character is embedded as a vector $e_i$:
$e_i = \mathrm{emb}(c_i)$
where $e_i$ represents the static embedding of the character. The final character embedding sequence is $E$:
$E = \{e_1, e_2, \ldots, e_i, \ldots, e_n\}$

3.1.2. Word Embedding

Word embedding translates words to vector representations. During word embedding, the model matches the associated words from the lexicon and generates a word matching matrix. After generating character embeddings, different length words are mapped to each character based on the word matching matrix. For example, “耐火材料公司”(refractory material company) is mapped to characters “耐”, “火”, “材”, “料”, “公”, and “司” to enhance the character representation. If one character has no associated word, no operation is performed. Each character can be represented as follows:
$e_i^* = e_i + \sum_k \mu_{i,k}$
where $\mu_{i,k}$ represents the embedding of the $k$-th associated word of the $i$-th character. In particular, if there is no associated word, $e_i^* = e_i$.
The enhanced character representation is then generated as a sequence $E^*$:
$E^* = \{e_1^*, e_2^*, \ldots, e_i^*, \ldots, e_n^*\}$
The word embedding process is shown in Figure 2.
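To make the matching step concrete, the following is a minimal sketch of Equation (3) under simplifying assumptions: a toy lexicon and randomly initialized embedding tables stand in for the real resources, and all names are illustrative rather than taken from the authors' code.

```python
import torch

# Sketch of the word-matching enhancement e_i* = e_i + sum_k mu_{i,k}.
# `lexicon`, `char_emb`, and `word_emb` are hypothetical stand-ins.
def enhance_characters(chars, char_emb, lexicon, word_emb):
    n = len(chars)
    enhanced = [char_emb[c].clone() for c in chars]               # start from e_i
    for start in range(n):
        for end in range(start + 1, n + 1):
            word = "".join(chars[start:end])
            if word in lexicon:                                   # entry of the word matching matrix
                for i in range(start, end):
                    enhanced[i] = enhanced[i] + word_emb[word]    # add mu_{i,k} to every covered character
    return torch.stack(enhanced)                                  # E* = {e_1*, ..., e_n*}

chars = list("耐火材料公司")
char_emb = {c: torch.randn(50) for c in chars}
word_emb = {"耐火材料": torch.randn(50), "公司": torch.randn(50)}
E_star = enhance_characters(chars, char_emb, set(word_emb), word_emb)   # shape (6, 50)
```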

3.1.3. Pinyin Embedding

The pinyin embedding, which represents the pronunciation of Chinese characters, can distinguish different semantic meanings that share the same character form, so it can model semantic and syntactic information that cannot be captured by the other embeddings. The pinyin of each character is generated with the pypinyin package, which infers the pinyin of the current character $c_i$ from the given textual context. We write the pinyin of each character as a sequence of Roman letters together with a digit from 1 to 4 that indicates the tone. Since the maximum length of the Romanized character sequence for pinyin is 7, we set the maximum input length to 8; if a sequence is shorter than the maximum length, it is padded with special characters. The sequence of letters and digits is input into a 1-D CNN with a kernel width of 2, and maximum pooling is then used to extract the pinyin features of the Chinese characters. The final pinyin embedding sequence $P$ is as follows:
$P = \{p_1, p_2, \ldots, p_i, \ldots, p_n\}$
The pinyin embedding process is shown in Figure 3.
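The following is a minimal sketch of this pipeline, assuming the pypinyin package with its TONE3 style (letters followed by a tone digit) and a toy letter vocabulary; embedding sizes and variable names are illustrative.

```python
import torch
import torch.nn as nn
from pypinyin import pinyin, Style

MAX_LEN, LETTER_DIM, OUT_DIM = 8, 30, 50
letter_emb = nn.Embedding(64, LETTER_DIM, padding_idx=0)    # letters, tone digits, padding
conv = nn.Conv1d(LETTER_DIM, OUT_DIM, kernel_size=2)        # width-2 CNN over the pinyin string

def pinyin_vector(char, vocab):
    seq = pinyin(char, style=Style.TONE3)[0][0]              # e.g. "zhong1" for "中"
    ids = [vocab.get(ch, 0) for ch in seq][:MAX_LEN]
    ids += [0] * (MAX_LEN - len(ids))                        # pad with the special index 0
    x = letter_emb(torch.tensor(ids)).T.unsqueeze(0)         # (1, LETTER_DIM, MAX_LEN)
    return conv(x).max(dim=2).values.squeeze(0)              # max pooling -> one p_i of size OUT_DIM

vocab = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz1234")}
p_i = pinyin_vector("中", vocab)
```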

3.1.4. Radical Embedding

The radical embedding, which is based on Chinese glyphs, can capture semantic information from the structure of Chinese characters. Each character is split into two or more components that serve as its radicals; for example, “张” (zhang) can be decomposed into “弓” (gong) and “长” (chang). We use the database from the Open Chinese Word Segmentation Dictionary, which includes different segmentation methods. After splitting the characters, the radicals are input into a CNN, and the final radical embedding is obtained by applying max pooling and fully connected layers to the CNN output. The resulting radical embedding sequence $R$ is as follows:
$R = \{r_1, r_2, \ldots, r_i, \ldots, r_n\}$
The radical embedding process is shown in Figure 4.
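A minimal sketch of the radical path is shown below; the decomposition table is a hypothetical stand-in for the Open Chinese Word Segmentation Dictionary, and the padding on the CNN is only there so that two-component characters still fit a width-3 kernel.

```python
import torch
import torch.nn as nn

DECOMP = {"张": ["弓", "长"]}                                    # hypothetical decomposition table
RAD_DIM, OUT_DIM = 30, 50
radical_emb = nn.Embedding(1000, RAD_DIM)
conv = nn.Conv1d(RAD_DIM, OUT_DIM, kernel_size=3, padding=1)    # width-3 CNN over the components
fc = nn.Linear(OUT_DIM, OUT_DIM)

def radical_vector(char, radical_vocab):
    parts = DECOMP.get(char, [char])             # fall back to the character itself
    ids = torch.tensor([radical_vocab.get(p, 0) for p in parts])
    x = radical_emb(ids).T.unsqueeze(0)          # (1, RAD_DIM, num_parts)
    pooled = conv(x).max(dim=2).values           # max pooling over the component positions
    return fc(pooled).squeeze(0)                 # one r_i in the sequence R

r_i = radical_vector("张", {"弓": 1, "长": 2})
```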

3.1.5. Fusion Embedding

After extracting the various embeddings, we fuse them together. However, since the four types of embeddings have different dimensions, the dimensions of the vector matrices are unified first. The dimension of the last vector matrix is chosen as the unified dimension of the fused embedding:
$\mathrm{squ/unsqu}(e_i^*) = \mathrm{squ/unsqu}(P) = \mathrm{size}(R)$
where $\mathrm{squ/unsqu}(\cdot)$ denotes the dimensionality expansion or reduction function, and $\mathrm{size}(\cdot)$ denotes the dimensionality of a vector.
It is not appropriate to simply perform a summation of the four types of embedding information. The commonly used embedding fusion method is to directly concatenate the various feature embeddings for fusion representation:
$E = \mathrm{concat}([e_i^*; P; R])$
A linear layer can also be added on top of the concatenation for further fusion, which helps save space and improve the training speed of the NER model. This can be represented as follows:
$E = \mathrm{Linear}(\mathrm{concat}([e_i^*; P; R]))$
Although fusion embeddings can be obtained through the above two methods, the linear layer may mix the embeddings and potentially lose information. Inspired by the positional encoding in the transformer, we utilize the transformer as the encoder to provide hidden vectors for the syntactic information analysis module. After obtaining the multiple embedding results, they are concatenated in order and passed through an activation function as follows:
$E = \mathrm{ReLU}(W_r(e_i^* \oplus P \oplus R))$
where $W_r$ represents the parameter matrix and $\oplus$ denotes concatenation.
Then, the query, key, and value matrices that are required as input for the multi-head attention mechanism are calculated. This process can be expressed as follows:
$Q = E W_q, \quad K = E W_k, \quad V = E W_v$
where $W_q$, $W_k$, and $W_v$ represent parameter matrices.
The dot product attention is used to compute individual attention scores, as shown in Equation (12).
$\mathrm{head}_i = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_{head}}}\right) V$
where $d_{head}$ represents the dimensionality of each attention head.
Through the above process, the attention scores for a single head are obtained. The scores from each head are then concatenated to calculate the multi-head attention output, as shown in Equation (13).
$H = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W_o$
where $W_o$ is a parameter matrix.
The final output text encoding can be represented as:
$H = \{h_1, \ldots, h_i, \ldots, h_n\}$
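The sketch below walks through Equations (10) to (13) for a single sentence; the dimensions are deliberately small for readability (the actual encoder uses two layers with 128 hidden units and 12 heads of size 16, see Table 3), and the weight names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n, d_in, d_model, n_heads = 6, 50, 128, 4
d_head = d_model // n_heads

W_r = nn.Linear(3 * d_in, d_model)                           # fusion matrix W_r
W_q, W_k, W_v, W_o = (nn.Linear(d_model, d_model) for _ in range(4))

e_star, P, R = torch.randn(n, d_in), torch.randn(n, d_in), torch.randn(n, d_in)
E = torch.relu(W_r(torch.cat([e_star, P, R], dim=-1)))       # E = ReLU(W_r(e* ⊕ P ⊕ R))

# Split into heads, apply scaled dot-product attention, then concatenate the heads.
Q = W_q(E).view(n, n_heads, d_head).transpose(0, 1)          # (heads, n, d_head)
K = W_k(E).view(n, n_heads, d_head).transpose(0, 1)
V = W_v(E).view(n, n_heads, d_head).transpose(0, 1)
scores = F.softmax(Q @ K.transpose(1, 2) / d_head ** 0.5, dim=-1)
heads = scores @ V                                            # head_1 ... head_h
H = W_o(heads.transpose(0, 1).reshape(n, d_model))            # H = {h_1, ..., h_n}
```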

3.2. Syntactic Information Analysis

3.2.1. Syntactic Analysis

In addition to enhancing the text representation with additional feature embeddings, analyzing the syntactic features of the text is also effective for improving model performance. Cetoli et al. [24] demonstrated the effectiveness of POS (part-of-speech) tags, syntactic constituents, and dependency relations. In the syntactic information analysis module, the Stanford CoreNLP [25] toolkit and its Chinese model JAR files are used to annotate the text, producing the POS tags, syntax tree, and dependency analysis results for the sequence.
POS tags label the part of speech of words in a sentence, with common parts of speech including nouns and verbs. Some parts of speech are shown in Table 1.
When extracting POS context features, the current word is used as the center word, and the preceding and following words along with their corresponding POS tags are considered. As shown in Figure 5a, the context features of “耐火材料” (refractory material) are “洛阳” (Luoyang) and “公司” (company), with the corresponding POS tags “NR” and “NN” representing syntactic information. These words and POS tags are combined as POS information.
Syntactic constituent analysis involves annotating sentence components by chunking the text and progressively merging them upward in a tree-like structure to form larger phrases and ultimately complete sentences. CoNLL-2000 [26] provides guidelines for chunking, dividing the text into syntactically related and non-overlapping groups of words. Chunking types include “NP” for noun phrases, “VP” for verb phrases, “PP” for prepositional phrases, “ADVP” for adverb phrases, “ADJP” for adjective phrases, “AS” for content marks, and so on. When extracting syntactic constituent context features, the first parent node of the current word within the phrase is indexed as the syntactic node at the upper level of the tree. All the words and tags under that node are selected as the context features of the word. As shown in Figure 5b, the syntactic node for “耐火材料” is “NP”, and the context features are “洛阳_NP” and “公司_NP”, where the words and chunking tags represent the syntactic constituent analysis information.
Dependency relations aim to reveal the syntactic structure and semantic modification relationships between text constituents by analyzing the dependencies between them. Typically, a sentence has a core keyword that governs the other constituents; it is often a verb and is not governed by any other constituent. Different relation labels are used to represent the relationships between constituents: “root” denotes the root word, which is usually a verb, “nsubj” a noun subject, “dobj” a direct object, “prep” a preposition, “pobj” a prepositional object, “comp” a compound word, “advmod” an adverbial modifier, and “amod” an adjective modifier. When extracting dependency relation context features, the dependency and core keyword of the current word are obtained. As shown in Figure 5c, for the current sentence, “就任” (appointed) is the verb and is used as the core keyword. For “耐火材料” (refractory material), its context feature is “就任” (appointed), and the corresponding dependency information is “洛阳_comp” and “就任_root”.
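As an illustration of how these three annotations can be collected, the snippet below uses the stanfordcorenlp Python wrapper around a locally downloaded CoreNLP distribution with the Chinese models; the path and example sentence are placeholders, and the exact wrapper API may vary between versions.

```python
from stanfordcorenlp import StanfordCoreNLP

# Hypothetical local path to the CoreNLP distribution plus the Chinese model JARs.
nlp = StanfordCoreNLP(r"./stanford-corenlp-full", lang="zh")

sentence = "他就任洛阳耐火材料公司总经理"          # illustrative sentence
pos_tags = nlp.pos_tag(sentence)                 # POS tags, e.g. [('洛阳', 'NR'), ('公司', 'NN'), ...]
parse_tree = nlp.parse(sentence)                 # constituency tree with NP/VP/... chunks
dependencies = nlp.dependency_parse(sentence)    # (relation, head_index, dependent_index) triples

nlp.close()
```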

3.2.2. Syntactic Extraction

By mapping the syntactic analysis results into the analysis network, POS tags, syntactic constituents, and dependency relations can each be used to obtain a context matrix and a syntactic information matrix. This process enhances the text input and improves model performance. The context feature matrix and the syntactic information matrix can be represented as $[K_i^P, V_i^P]$, $[K_i^C, V_i^C]$, and $[K_i^D, V_i^D]$, as shown in Figure 6, where $K_i$ and $V_i$ denote the context matrix and syntactic information matrix, respectively, and $P$, $C$, and $D$ denote POS tags, syntactic constituents, and dependency relations, respectively.
The details of context feature and syntactic information are as follows:
$K_i^t = \{k_{i,1}^t, k_{i,2}^t, \ldots, k_{i,j}^t, \ldots, k_{i,m_i}^t\}, \quad V_i^t = \{v_{i,1}^t, v_{i,2}^t, \ldots, v_{i,j}^t, \ldots, v_{i,m_i}^t\}$
where $m_i$ represents the number of context features and syntactic information items for the current character $i$, and the superscript $t$ denotes the syntactic type of the current vector.
For the input sequence $H = \{h_1, h_2, \ldots, h_i, \ldots, h_n\}$, the analysis network needs to weight the corresponding information features so that important information is identified and utilized. Therefore, the weights of the context features in the hidden vectors are calculated. The calculation process for the weights of type-$t$ features is as follows:
$p_{i,j}^t = \frac{\exp(\langle h_i, k_{i,j}^t \rangle)}{\sum_{j'=1}^{m_i} \exp(\langle h_i, k_{i,j'}^t \rangle)}$
Once the weights are obtained, they are multiplied with the original syntactic information vectors to obtain the output that carries the weighted syntactic information of type $t$:
$s_i^t = \sum_{j=1}^{m_i} p_{i,j}^t v_{i,j}^t$
The complexity of computing the $\exp(\langle h_i, k_{i,j}^t \rangle)$ terms required by the softmax function is $O(m_i n)$. This cost may make it difficult for the syntactic analysis layer to process long text sequences efficiently. To reduce the complexity of the weight calculation as much as possible, the weighted syntactic information can instead be computed by applying a random feature approximation to the softmax function. A nonlinear transformation $\phi(\cdot)$ is applied to $h_i$ and $k_{i,j}^t$, and the inner product of $\phi(h_i)$ and $\phi(k_{i,j}^t)$ approximates their kernel value. The new calculation process for $s_i^t$, the weighted syntactic information output, is as follows:
$s_i^t = \sum_{j=1}^{m_i} \frac{\phi(h_i)^{\top} \phi(k_{i,j}^t)}{\sum_{j'=1}^{m_i} \phi(h_i)^{\top} \phi(k_{i,j'}^t)} v_{i,j}^t$
The outputs for syntactic constituents and dependency relations are calculated in the same way as the POS output, following the process of Equation (18). The three types of information yield the outputs $s_i^P$, $s_i^C$, and $s_i^D$. To combine these three types of syntactic information into a unified representation and avoid information conflicts that may introduce noise, different weights are assigned so that each type of information is used selectively and effectively. First, each information vector is concatenated with the hidden vector, and the weights are balanced using a trainable parameter matrix:
$q_i^t = \sigma(W_t (h_i \oplus s_i^t) + b_t)$
where $W_t$ and $b_t$ are trainable parameters, the operator $\oplus$ denotes vector concatenation, and $\sigma$ is the activation function.
Then, the softmax function is applied to assign probability values to each type of syntactic information, representing the probability of each type. The calculation process is as follows:
$w_i^t = \frac{\exp(q_i^t)}{\sum_{t' \in \{P, C, D\}} \exp(q_i^{t'})}$
Once the probability values representing each type of information are obtained, the final step is to multiply the probability values with their corresponding type of information vectors. Then, the vectors representing the three types of syntactic information are added together to obtain the final character output carrying the syntactic information:
$s_i = \sum_{t \in \{P, C, D\}} w_i^t s_i^t$
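The following sketch traces the weighting and type-fusion steps above for a single character, using the plain softmax weighting of Equations (16) and (17) rather than the random feature approximation; the hidden size, the number of syntactic features per type, and all variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 128
W_t = nn.ModuleDict({t: nn.Linear(2 * d, 1) for t in ["P", "C", "D"]})   # one W_t, b_t per type

def weighted_syntactic_info(h_i, keys, values):
    # p_{i,j}^t = softmax_j(<h_i, k_{i,j}^t>);  s_i^t = sum_j p_{i,j}^t v_{i,j}^t
    p = F.softmax(keys @ h_i, dim=0)
    return (p.unsqueeze(-1) * values).sum(dim=0)

def fuse_types(h_i, feats):
    s = {t: weighted_syntactic_info(h_i, *feats[t]) for t in feats}
    # q_i^t = sigma(W_t(h_i ⊕ s_i^t) + b_t), then w_i^t = softmax over the three types
    q = torch.cat([torch.sigmoid(W_t[t](torch.cat([h_i, s[t]]))) for t in feats])
    w = F.softmax(q, dim=0)
    return sum(w[j] * s[t] for j, t in enumerate(feats))      # s_i = sum_t w_i^t s_i^t

h_i = torch.randn(d)                                          # hidden vector of character i
feats = {t: (torch.randn(3, d), torch.randn(3, d)) for t in ["P", "C", "D"]}   # (K_i^t, V_i^t)
s_i = fuse_types(h_i, feats)
```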

3.3. CTAT

In order to further capture contextual information, Context Attention (CTAT) is proposed based on multi-scale feature attention [27]. CTAT uses a soft attention mechanism to calculate distributions on sequences of elements, resulting in probabilities that reflect the importance of each element and are used as weights. CTAT can be easily trained with end-to-end backpropagation and requires only a small amount of computation time. The CTAT calculation process is shown in Figure 7.
In order to obtain faster convergence and better performance, the perceptron activation function $\tanh(\cdot)$ is used to generate the feature vectors:
$X_i = \tanh(W h_i)$
To emphasize the importance of the current position in the sentence, the softmax is weighted over the sentence-length dimension, $seq\_len = n$. The weights over the input sentence are then calculated as follows:
$\sum_{i=1}^{n} \alpha_i = 1$
$\alpha_i = \frac{\exp(X_i)}{\sum_{i'=1}^{n} \exp(X_{i'})}$
After obtaining the corresponding attention weights, the final attention scores are computed by applying the weights to the respective vectors. The calculation process is as follows:
$X_i^{att} = \alpha_i X_i$
After processing with the CTAT module, we obtain the final representation $X^{att} = \{x_1^{att}, \ldots, x_i^{att}, \ldots, x_n^{att}\} \in \mathbb{R}^n$, which is then used in the gating mechanism to evaluate the weights.
During the process of entity labeling, it is not sufficient to determine entity boundaries by relying solely on syntactic information. The gating mechanism [28] allows for dynamic weighting and control of the syntactic information used during entity labeling. The gating mechanism includes a modulation function, which is used to evaluate the contribution of the CTAT output and the syntactic analysis layer output:
$g_i = \sigma(W_{g1} X_i^{att} + W_{g2} s_i + b_g)$
where $W_{g1}$ and $W_{g2}$ are trainable matrices and $b_g$ is the bias term.
The gating mechanism applies $g_i$ to the CTAT output vector and $1 - g_i$ to the syntactic analysis layer output vector, controlling their contributions to determine the final output $o_i$:
$o_i = g_i \odot X_i^{att} + (1 - g_i) \odot s_i$
where $\odot$ denotes element-wise multiplication. In the expression $1 - g_i$, the $1$ is a vector with the same dimensions as $g_i$.
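A compact sketch of CTAT and the gate (Equations (22) to (27)) is given below; dimensions are illustrative, and the final line reads the gated fusion as an element-wise weighted combination of the two streams, which is our reading of Equation (27).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n, d = 6, 128
W = nn.Linear(d, d, bias=False)                 # feature projection in X_i = tanh(W h_i)
W_g1 = nn.Linear(d, d, bias=False)
W_g2 = nn.Linear(d, d)                          # bias term b_g folded into this layer

h = torch.randn(n, d)                           # encoder outputs h_1 ... h_n
s = torch.randn(n, d)                           # syntactic analysis layer outputs s_1 ... s_n

X = torch.tanh(W(h))                            # X_i = tanh(W h_i)
alpha = F.softmax(X, dim=0)                     # weights normalized over the sentence length n
X_att = alpha * X                               # X_i^att = alpha_i * X_i

g = torch.sigmoid(W_g1(X_att) + W_g2(s))        # gate g_i
o = g * X_att + (1 - g) * s                     # o_i: gated combination of context and syntax
```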

4. Experiment

4.1. Experiment Setup

4.1.1. Dataset

The proposed model BEMCI is evaluated on six public datasets: MSRA [29], WEIBO [30], RESUME [13], AISHELL [31], CLUENER [32], and BOSON. MSRA, WEIBO, and RESUME are widely used in Chinese NER.
  • MSRA is derived from news text; it has a large-scale corpus with long sentences but lacks a dedicated test set, so the validation set is used as the test set.
  • WEIBO is collected from social media platforms, and both WEIBO and RESUME have smaller data sizes. The WEIBO dataset can be further divided into three subsets: NE, NM, and ALL. The label annotations in the WEIBO dataset are relatively sparse, leading to suboptimal performance of existing methods compared with other datasets. RESUME is a self-built dataset that focuses on electronic resumes, with dense entity labels and consistent formatting rules.
  • AISHELL is based on the open-source Mandarin speech corpus AISHELL-1, which includes over 170 h of Mandarin speech data from various domains.
  • CLUENER is based on the THUCTC dataset, which consists of ten different entity types.
  • The BOSON dataset is sourced from the BOSON platform and contains six entity types. Because BOSON has no predefined splits, we randomly divide it into training, validation, and test sets at a ratio of 7:2:1 (a minimal split sketch is given after this list).
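The 7:2:1 split can be reproduced with a few lines; the sketch below assumes `sentences` is a list of annotated BOSON sentences and fixes a seed so the split is repeatable (the seed value is arbitrary).

```python
import random

def split_dataset(sentences, seed=42):
    data = sentences[:]
    random.Random(seed).shuffle(data)                 # shuffle a copy, keep the original order intact
    n_train, n_dev = int(0.7 * len(data)), int(0.2 * len(data))
    return data[:n_train], data[n_train:n_train + n_dev], data[n_train + n_dev:]

# Toy call; with the real corpus the sizes roughly match the 7 k / 2 k / 1 k counts in Table 2.
train, dev, test = split_dataset([f"sentence_{i}" for i in range(10000)])
```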
Table 2 presents the detailed information about these commonly used public datasets in Chinese NER, including the number of sentences, characters, total entities, and respective domains.

4.1.2. Baseline

The following baseline methods are compared with the proposed model BEMCI:
  • Char-Based [9]: This model is based on character-level LSTM and only uses characters for embedding.
  • Word-Based [10]: This model is based on word-level LSTM and only uses words for embedding.
  • BiLSTM+CRF [11]: This model is a widely used and general neural network model in NER tasks. It encodes text sentences using BiLSTM.
  • IDCNN+CRF [12]: This model is an extended convolutional neural network that captures longer contextual information and enables GPU parallelism.
  • Lattice LSTM+CRF [13]: This model is an improved version based on BiLSTM. It introduces additional word units and utilizes information between words and sequences to eliminate ambiguity.
  • SoftLexicon [16]: This model incorporates lexicon information at the embedding layer by grouping the matched words of each character into four categories and concatenating the resulting vectors with the character representation. The encoder includes three encoding forms: BiLSTM, CNN, and transformer.
  • MECT [21]: This model integrates Chinese character features and radical embeddings using multi-modal embeddings in a dual-stream transformer. It captures the semantic information of Chinese characters by utilizing their structural features.
  • KVMN [22]: This model improves NER by integrating different types of syntactic information through attention. The integration is achieved through the proposed Key-Value Memory Network, syntactic attention and encoding, and a gating mechanism for weighting and aggregating the syntactic information.

4.1.3. Hyperparameters

The experimental hyperparameters of the BEMCI model are determined by referencing the parameter settings of the FLAT-Lattice [33]. Other hyperparameters are tuned using the SMAC algorithm [34] to search for optimal values.
The BEMCI model uses a transformer as the text encoder. It consists of two layers with 128 hidden units and 12 attention heads, each with a dimension of 16. For the multi-class information embedding, a 1-D convolutional neural network (CNN) with a kernel width of 3 is used to obtain the radical embedding. The dropout rate for the radicals ranges from 0.1 to 0.4, and any value within this range can be chosen. A 1-D CNN with a kernel width of 2 is used to obtain the pinyin embedding. The Adam optimizer is employed with the negative log-likelihood loss function. The model is trained for a maximum of 100 iterations with a batch size of 16, and the learning rate is set to 0.0001. Dropout is used to prevent overfitting, with the dropout rate set to 0.2. The specific hyperparameter settings are shown in Table 3. The baseline models adopt the hyperparameter settings from their respective papers and are trained in the local environment to obtain the final results.
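For reference, the settings in Table 3 can be collected into a single configuration object; the dictionary below is a hypothetical convenience, and the key names are not taken from the authors' code.

```python
# Hypothetical configuration mirroring Table 3.
BEMCI_CONFIG = {
    "batch_size": 16,
    "epochs": 100,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "encoder_layers": 2,
    "hidden_units": 128,
    "attention_heads": 12,
    "d_head": 16,
    "radical_cnn_kernel_size": 3,
    "pinyin_cnn_kernel_size": 2,
    "radical_dropout_range": (0.1, 0.4),   # any value in this range may be chosen
    "dropout": 0.2,
}
```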

4.1.4. Evaluation

In this paper, precision (P), recall (R), and F1-score (F1) are adopted as the evaluation metrics of the experiment. The precision value refers to the ratio of correct entities to predicted entities. The recall value is the proportion of the entities in the test set that are correctly predicted. The F1-score is calculated according to the following formulation:
$F1 = \frac{2 \times precision \times recall}{precision + recall}$
P, R, and F1-score are commonly used evaluation metrics in NER tasks. Since the F1-score combines precision and recall and reflects the overall performance of the model, we select the F1-score to evaluate the performance of Chinese NER.
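The metrics are computed at the entity level; the sketch below assumes predicted and gold entities are available as sets of (start, end, type) spans, which is a common convention rather than the authors' exact evaluation script.

```python
def precision_recall_f1(pred_spans, gold_spans):
    correct = len(pred_spans & gold_spans)                 # exactly matched entities
    p = correct / len(pred_spans) if pred_spans else 0.0   # correct / predicted
    r = correct / len(gold_spans) if gold_spans else 0.0   # correct / gold
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

p, r, f1 = precision_recall_f1({(0, 2, "ORG"), (5, 7, "PER")},
                               {(0, 2, "ORG"), (8, 9, "LOC")})
# p = 0.5, r = 0.5, f1 = 0.5
```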

4.2. Result

After training the BEMCI model and the baseline models on the training sets of the six datasets, they were tested using the respective test sets to evaluate their performance and analyze the results. The recognition results for the RESUME and MSRA datasets are shown in Table 4. The results for the WEIBO and AISHELL datasets are shown in Table 5. The results for the CLUENER and BOSON datasets are shown in Table 6. The bold part represents the optimal experimental results.
In Table 4, according to the experimental results on the RESUME dataset, BEMCI improves the F1-score by 0.11%, compared with the optimal baseline KVMN. The experimental results on the MSRA dataset show that the F1-score of BEMCI is slightly less than that of the optimal baseline KVMN.
In Table 5, according to the experimental results on the WEIBO NE, NM, and ALL subsets, BEMCI improves the F1-score by 0.03%, 5.24%, and 2.81%, respectively, compared with MECT, although it does not surpass KVMN. Even though the characteristics of the WEIBO dataset lead to relatively poor recognition performance overall, the F1-score of BEMCI on WEIBO is still significantly improved over most baselines. The experimental results on the AISHELL dataset show that BEMCI improves the F1-score by 0.51% compared with KVMN.
In Table 6, according to the experimental results on the CLUENER dataset, BEMCI improves the F1-score by 1.11%, compared with KVMN. The experimental results on the BOSON dataset show that BEMCI improves the F1-score by 2.14%, compared with KVMN.

4.3. Ablation Study

In order to determine the influence of the four modules, multi-class information embedding, syntax analysis layer, CTAT, and gate mechanism, on the model performance, an ablation study was conducted in which one of the four modules was removed at a time, as described below. The experiments were conducted on the MSRA and AISHELL datasets, and the results are shown in Figure 8.
Figure 8 shows that without the multi-class information embedding, the F1-scores decrease by 2.50% and 0.97% on the MSRA and AISHELL datasets, respectively. Without the syntax analysis layer, the F1-scores decrease by 1.07% and 0.65%. Without CTAT, the F1-scores decrease by 1.31% and 0.51%. Without the gate mechanism, the F1-scores decrease by 0.22% and 0.16%. Based on the differences in F1-score reductions, it can be concluded that all four modules contribute to enhancing the entity recognition performance. The multi-class information embedding has the strongest contribution, as it adds additional information for capturing deeper semantic meaning; CTAT is the second most influential, and the syntax analysis layer ranks third. Although the gate mechanism has the smallest contribution, the decrease indicates the importance of balancing the weights for contextual and syntax information.
To determine the impact of the three additional types of embeddings on the model, we conduct experiments using four settings: character embedding only, character and word embeddings, character and radical embeddings, and character and pinyin embeddings. By comparing the performance of these four settings, we assess the contribution of each type of embedding to the model's recognition effectiveness. The character embedding serves as the baseline embedding, while the other three embeddings provide additional semantic information to enhance recognition performance. The results of the BEMCI embedding ablation experiments are shown in Figure 9.
From Figure 9, it can be observed that when only character embeddings are used, the F1-scores on the two datasets are 92.53% and 90.91%, respectively. When character and word embeddings are used, the F1-scores on the two datasets are 93.80% and 91.67%, showing an improvement of 1.27% and 0.76%. When character and radical embeddings are used, the F1-scores on the two datasets are 93.64% and 91.37%, with an improvement of 1.11% and 0.40%. When character and pinyin embeddings are used, the F1-scores on the two datasets are 92.98% and 91.55%, with an improvement of 0.45% and 0.64%. It can be seen that adding word information can effectively enhance the recognition performance of the model. The impact of adding radical and pinyin information varies on the two datasets, but incorporating them still contributes to improving the accuracy of entity recognition by the model.

5. Discussion

In this study, we found that integrating various embeddings, text structure, and contextual information helps identify Chinese word boundaries effectively, which can improve the performance of the Chinese NER task. Compared with previous studies, we not only incorporated multi-class information, including character embedding, word embedding, pinyin embedding, and radical embedding, but also considered the text structure and focused on contextual information. The experiments showed that these improvements benefit the model's recognition performance.
Additionally, we found that models with a simpler structure are more prone to the phenomenon of high precision but low recall. Furthermore, compared with previous studies, BEMCI exhibited a larger improvement in recall than in precision. Recall measures the proportion of true entities that the model correctly returns, so the significant improvement in recall indicates that BEMCI recovers more of the gold entities.
Through a comparative analysis with KVMN, we found that KVMN achieved higher F1-scores than BEMCI on the MSRA and WEIBO datasets. KVMN uses an attentive ensemble to selectively learn from different syntactic information according to its contribution to NER, which makes KVMN perform better on some datasets. On MSRA, BEMCI's more complex network structure leads to increased computational cost and a decrease in performance. On WEIBO, adding multiple types of information increases the burden on the model when recognizing entities in a dataset with fewer entities. Therefore, the performance of BEMCI is not as good as that of KVMN on these two datasets.

6. Conclusions

This paper aimed to solve the problem of vague boundary recognition in Chinese NER with a boundary enhancement with multi-class information model called BEMCI. BEMCI integrates four types of information, namely character embedding, word embedding, pinyin embedding, and radical embedding, to enhance the semantic representation of the text, and adds a syntactic information analysis module to analyze the syntactic structure of the text. In addition, we propose an improved context attention mechanism that uses a gate mechanism to combine contextual information with syntactic information. Finally, the effectiveness of each module of BEMCI is verified through experimental analysis. However, the four embeddings may contribute to Chinese NER to different degrees, and our method does not take the contribution of each embedding into account when computing the fusion embedding. In addition, BEMCI contains more components and more parameters, resulting in a long training time.
In future work, we will use the attention mechanism to assign different weights to different embeddings, so as to obtain a more reasonable fusion embedding. Meanwhile, we will optimize the model structure and adopt lightweight structures or more efficient calculation methods to reduce the complexity of the model and improve the speed of model training and calculation. In addition, we will try employing pre-training techniques, such as the use of semantic information based on knowledge graphs, to enhance the recognition and representation of entities.

Author Contributions

Conceptualization, S.L.; methodology, S.L. and R.Q.; validation, S.Z.; writing—original draft preparation, S.L.; writing—review and editing, S.L. and R.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Research and Development Program of China, grant number 2022YFC3005401, the Key Research and Development Program of Yunnan Province of China, grant number 202203AA080009, and the Key Technology Project of China Huaneng Group, grant number HNZB2022-06-3-443.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in articles [29,30,31,32].

Acknowledgments

The authors also thank Ziqi Chen for participating in the research.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Nadeau, D.; Sekine, S. A survey of named entity recognition and classification. Lingvist. Investig. 2007, 30, 3–26. [Google Scholar] [CrossRef]
  2. Liu, P.; Guo, Y.; Wang, F.; Li, G. Chinese Named Entity Recognition: The State of the Art. Neurocomputing 2022, 473, 37–53. [Google Scholar] [CrossRef]
  3. Sun, Z.; Li, X.; Sun, X.; Meng, Y.; Ao, X.; He, Q.; Wu, F.; Li, J. ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021. [Google Scholar]
  4. Chen, H.; Yu, S.; Lin, S. Glyph2vec: Learning Chinese Out-of-Vocabulary Word Embedding from Glyphs. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020. [Google Scholar]
  5. Xuan, Z.; Bao, R.; Jiang, S. FGN: Fusion Glyph Network for Chinese Named Entity Recognition. In Proceedings of the 14th China Conference on Knowledge Graph and Semantic Computing (CCKS 2020), Nanchang, China, 12–15 November 2020. [Google Scholar]
  6. Luo, L.; Li, N.; Li, S.; Yang, Z.; Lin, H. A Neural Network Ensemble Approach for Chinese Clinical Named Entity Recognition. In Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2018), Tianjin, China, 14–17 August 2018. [Google Scholar]
  7. Grune, D.; Jacobs, C.J.H. Introduction to Parsing. In Parsing Techniques: A Practical Guide; Springer: New York, NY, USA, 2008; pp. 61–102. [Google Scholar]
  8. Chen, C.; Kong, F. Enhancing Entity Boundary Detection for Better Chinese Named Entity Recognition. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Online, 1–6 August 2021. [Google Scholar]
  9. Lample, G.; Ballesteros, M.; Subramanian, S.; Kawakami, K.; Dyer, C. Neural Architectures for Named Entity Recognition; Association for Computational Linguistics: San Diego, CA, USA, 2016; pp. 260–270. [Google Scholar]
  10. Liu, L.; Shang, J.; Ren, X.; Xu, F.F.; Gui, H.; Peng, J.; Han, J. Empower Sequence Labeling with Task-aware Neural Language Model. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence and 30th Innovative Applications of Artificial Intelligence Conference and 8th AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  11. Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv 2015, arXiv:1508.01991. [Google Scholar]
  12. Strubell, E.; Verga, P.; Belanger, D.; McCallum, A. Fast and Accurate Entity Recognition with Iterated Dilated Convolutions; Association for Computational Linguistics: Copenhagen, Denmark, 2017; pp. 2670–2680. [Google Scholar]
  13. Zhang, Y.; Yang, J. Chinese NER Using Lattice LSTM. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018. [Google Scholar]
  14. Liu, W.; Xu, T.; Xu, Q.; Song, J.; Zu, Y. An Encoding Strategy Based Word-Character LSTM for Chinese NER. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Volume 1: Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019. [Google Scholar]
  15. Gui, T.; Ma, R.; Zhang, Q.; Zhao, L.; Jiang, Y.; Huang, X. CNN-Based Chinese NER with Lexicon Rethinking. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019. [Google Scholar]
  16. Peng, M.; Ma, R.; Zhang, Q.; Huang, X. Simplify the Usage of Lexicon in Chinese NER. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2019. [Google Scholar]
  17. Ju, M.; Miwa, M.; Ananiadou, S. A Neural Layered Model for Nested Named Entity Recognition. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), New Orleans, LA, USA, 1–6 June 2018. [Google Scholar]
  18. Cao, P.; Chen, Y.; Liu, K.; Zhao, J.; Liu, S. Adversarial Transfer Learning for Chinese Named Entity Recognition with Self-Attention Mechanism. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018. [Google Scholar]
  19. Shi, X.; Zhai, J.; Yang, X.; Xie, Z.; Liu, C. Radical Embedding: Delving Deeper to Chinese Radicals. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Beijing, China, 26–31 July 2015. [Google Scholar]
  20. Dong, C.; Zhang, J.; Zong, C.; Hattori, M.; Di, H. Character-Based LSTM-CRF with Radical-Level Features for Chinese Named Entity Recognition. In Proceedings of the 24th International Conference on Computer Processing of Oriental Languages (NLPCC 2016)/the 5th CCF International Conference on Natural Language Processing and Chinese Computing (ICCPOL 2016), Kunming, China, 2–6 December 2016. [Google Scholar]
  21. Wu, S.; Song, X.; Feng, Z. MECT: Multi-Metadata Embedding Based Cross-Transformer for Chinese Named Entity Recognition; Association for Computational Linguistics: Toronto, ON, Canada, 2021; pp. 1529–1539. [Google Scholar]
  22. Nie, Y.; Tian, Y.; Song, Y.; Ao, X.; Wan, X. Improving Named Entity Recognition with Attentive Ensemble of Syntactic Information. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020. [Google Scholar]
  23. Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  24. Cetoli, A.; Bragaglia, S.; O’Harney, A.D.; Sloan, M. Graph Convolutional Networks for Named Entity Recognition. In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories, Prague, Czech Republic, 23–24 January 2018. [Google Scholar]
  25. Manning, C.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S.; McClosky, D. The Stanford CoreNLP Natural Language Processing Toolkit; Association for Computational Linguistics: Baltimore, MD, USA, 2014; pp. 55–60. [Google Scholar]
  26. Tjong Kim Sang, E.F.; Buchholz, S. Introduction to the CoNLL-2000 Shared Task Chunking. In Proceedings of the Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop, Lisbon, Portugal, 13–14 September 2000. [Google Scholar]
  27. Wang, S.; Huang, M.; Deng, Z. Densely Connected CNN with Multi-scale Feature Attention for Text Classification. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018. [Google Scholar]
  28. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  29. Levow, G.-A. The Third International Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, Sydney, Australia, 22–23 July 2006. [Google Scholar]
  30. Peng, N.; Dredze, M. Improving Named Entity Recognition for Chinese Social Media with Word Segmentation Representation Learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany, 7–12 August 2016. [Google Scholar]
  31. Chen, B.; Xu, G.; Wang, X.; Xie, P.; Zhang, M.; Huang, F. AISHELL-NER: Named Entity Recognition from Chinese Speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, 23–27 May 2022; pp. 8352–8356. [Google Scholar]
  32. Xu, L.; Tong, Y.; Dong, Q.; Liao, Y.; Yu, C.; Tian, Y.; Liu, W.; Li, L.; Liu, C.; Zhang, X. CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese. arXiv 2020, arXiv:2001.04351. [Google Scholar]
  33. Li, X.N.; Yan, H.; Qiu, X.P.; Huang, X.J. FLAT: Chinese NER Using Flat-Lattice Transformer. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020. [Google Scholar]
  34. Hutter, F.; Hoos, H.H.; Leyton-Brown, K. Sequential Model-based Optimization for General Algorithm Configuration. In Proceedings of the 5th international conference on Learning and Intelligent Optimization, Rome, Italy, 17–21 January 2011. [Google Scholar]
Figure 1. BEMCI model.
Figure 2. Process for word embedding.
Figure 3. Process for pinyin embedding.
Figure 4. Process for radical embedding.
Figure 5. Results of syntactic analysis: (a) POS context features; (b) syntactic constituent analysis; (c) dependency relations.
Figure 6. The context feature matrix and the syntactic information matrix.
Figure 7. Context Attention.
Figure 8. Ablation result.
Figure 9. Embedding ablation result.
Table 1. Part of speech.
Tags | POS
NR | Proper noun
NN | Noun
LC | Location
PN | Pronoun
DT | Determiner
CD | Cardinal number
VB | Verb
ADJ | Adjective
P | Preposition
RB | Adverb
Table 2. Details of the used public datasets.
Dataset | Type | Train | Dev | Test | Entity Number | Area
MSRA | Sentence | 46.4 k | - | 4.4 k | 45,000 | News
MSRA | Char | 2169.9 k | - | 172.6 k
WEIBO | Sentence | 1.4 k | 0.27 k | 0.27 k | 1350 | Social Media
WEIBO | Char | 73.8 k | 14.5 k | 14.8 k
RESUME | Sentence | 3.8 k | 0.46 k | 0.48 k | 3821 | Resume
RESUME | Char | 124.1 k | 13.9 k | 15.1 k
AISHELL | Sentence | 12 k | 1.4 k | 0.71 k | 40,839 | Multiple
AISHELL | Char | 1850 k | 219.6 k | 111.9 k
CLUENER | Sentence | 10.7 k | 0.13 k | 0.13 k | 10,748 | News
CLUENER | Char | 412.5 k | 51.6 k | 50.9 k
BOSON | Sentence | 7 k | 2 k | 1 k | 23,173 | News
BOSON | Char | 433.6 k | 70 k | 36 k
Table 3. Model hyperparameter settings.
Hyperparameters | Values
Batch Size | 16
Epochs | 100
Optimizer | Adam
Radical CNN Kernel Size | 3
Pinyin CNN Kernel Size | 2
Head Num | 12
d_head | 16
Encoder Layers | 2
Learning Rate | 0.0001
Radical Dropout | [0.1, 0.4]
Dropout Probability | 0.2
Table 4. The results on RESUME and MSRA.
Models | RESUME P (%) | RESUME R (%) | RESUME F1 (%) | MSRA P (%) | MSRA R (%) | MSRA F1 (%)
Char-Based | 92.71 | 92.44 | 92.57 | 90.75 | 86.82 | 88.74
Word-Based | 92.82 | 92.60 | 92.70 | 92.62 | 86.91 | 89.67
BiLSTM+CRF | 92.58 | 94.41 | 93.49 | 95.91 | 86.65 | 91.04
IDCNN+CRF | 91.63 | 91.86 | 91.74 | 94.93 | 87.19 | 90.89
Lattice LSTM+CRF | 94.81 | 94.11 | 94.46 | 93.57 | 92.79 | 93.18
SoftLexicon | 94.65 | 94.72 | 94.69 | 94.61 | 92.59 | 93.58
MECT | 94.63 | 95.26 | 94.94 | 94.29 | 93.79 | 94.04
KVMN | 95.46 | 95.52 | 95.49 | 95.30 | 94.92 | 95.11
BEMCI | 95.45 | 95.76 | 95.60 | 94.87 | 95.19 | 95.03
Table 5. The results on WEIBO and AISHELL.
Models | WEIBO F1(NE) (%) | WEIBO F1(NM) (%) | WEIBO F1(ALL) (%) | AISHELL P (%) | AISHELL R (%) | AISHELL F1 (%)
Char-Based | 46.03 | 55.26 | 52.67 | 85.79 | 85.12 | 85.45
Word-Based | 50.39 | 60.97 | 56.70 | 86.12 | 85.23 | 85.67
BiLSTM+CRF | 57.48 | 63.05 | 58.83 | 89.03 | 87.27 | 88.44
IDCNN+CRF | 57.36 | 61.98 | 57.72 | 90.06 | 85.69 | 87.50
Lattice LSTM+CRF | 53.04 | 62.25 | 58.79 | 91.48 | 86.29 | 88.81
SoftLexicon | 56.46 | 62.13 | 61.35 | 91.40 | 89.89 | 90.30
MECT | 60.07 | 62.29 | 63.30 | 91.47 | 89.96 | 90.71
KVMN | 63.34 | 67.91 | 68.43 | 92.57 | 90.20 | 91.37
BEMCI | 60.10 | 67.53 | 66.11 | 92.11 | 91.66 | 91.88
Table 6. The results on CLUENER and BOSON.
Models | CLUENER P (%) | CLUENER R (%) | CLUENER F1 (%) | BOSON P (%) | BOSON R (%) | BOSON F1 (%)
Char-Based | 83.05 | 49.49 | 62.21 | 67.45 | 57.57 | 62.12
Word-Based | 75.52 | 54.55 | 63.34 | 72.93 | 60.01 | 65.84
BiLSTM+CRF | 70.26 | 66.60 | 68.38 | 69.98 | 67.74 | 72.36
IDCNN+CRF | 68.09 | 64.03 | 66.00 | 69.04 | 66.22 | 67.60
Lattice LSTM+CRF | 74.08 | 73.19 | 73.63 | 78.73 | 70.52 | 74.40
SoftLexicon | 76.56 | 75.84 | 76.20 | 80.38 | 78.04 | 79.19
MECT | 84.53 | 80.67 | 82.55 | 86.63 | 82.32 | 84.42
KVMN | 87.18 | 86.30 | 86.74 | 89.03 | 87.30 | 88.15
BEMCI | 87.62 | 88.09 | 87.85 | 91.14 | 89.46 | 90.29
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
