Article

Improving Low-Resource Chinese Named Entity Recognition Using Bidirectional Encoder Representation from Transformers and Lexicon Adapter

1 College of Computer Science & Engineering, Northwest Normal University, Lanzhou 730070, China
2 Gansu Province Internet of Things Engineering Research Centre, Northwest Normal University, Lanzhou 730070, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(19), 10759; https://doi.org/10.3390/app131910759
Submission received: 14 July 2023 / Revised: 19 September 2023 / Accepted: 22 September 2023 / Published: 27 September 2023

Abstract

Owing to their complementary strengths, the integration of lexicon information and pre-trained models such as BERT has been widely adopted in Chinese sequence labeling tasks. However, given their high demand for training data, efforts have been made to enhance their performance in low-resource scenarios. Certain specialized domains, such as agriculture, the industrial sector, and the metallurgical industry, currently suffer from a scarcity of data, so effective models for entity recognition under limited data availability are lacking. Motivated by this, we constructed a suitable small, balanced dataset and propose a domain-specific NER model. Firstly, we construct a domain-specific dictionary based on mine hoist equipment and fault texts and generate a dictionary tree to obtain word vector information. Secondly, we use a Lexicon Adapter to obtain the vectors of the domain dictionary words matched by each character, calculate the weights between the character and word vectors, and integrate position encoding to enhance the positional information of the word vectors. Finally, we incorporate the word vector information into the feature extraction layer to enhance the boundary information of domain entities and mitigate the semantic loss caused by using only character-level representations. Experimental results on a manually annotated dataset of mine hoist fault texts show that this method outperforms BiLSTM, BiLSTM-CRF, BERT, BERT-BiLSTM-CRF, and LEBERT, effectively improving the accuracy of named entity recognition (NER) for mine hoist faults.

1. Introduction

Named Entity Recognition (NER) [1] is a tagging task aimed at identifying and categorizing named entities in text according to predefined semantic types, including but not limited to persons, organizations, and geographical locations. In the field of complex mechanical equipment, NER can be used to identify entities such as equipment, components, and fault types. With NER technology, useful information can be extracted more quickly and accurately from large amounts of text data, supporting fault diagnosis and repair. Given the specificity and industry-specific nature of the complex mechanical equipment field, it is necessary to develop dedicated NER models for this domain. Nonetheless, the availability of substantial amounts of labeled training data is vital for their success. Unfortunately, constructing extensive labeled datasets for every new domain of interest is costly and time-consuming, since annotation requires well-trained personnel. Therefore, effectively addressing named entity recognition under low-resource [2] conditions has become an important research topic.
While significant progress has been made on Named Entity Recognition (NER) in resource-rich scenarios, in low-resource scenarios deep learning models struggle to learn effective feature extractors due to the lack of annotated data. Therefore, enhancing model performance in low-resource NER tasks has recently become a research hotspot. Some researchers have adopted cross-lingual transfer methods [3], primarily transferring knowledge from resource-rich to low-resource languages. For instance, Xie et al. [4] proposed a translation method based on bilingual word vectors to improve the mapping of cross-language text, replacing a Bidirectional Long Short-Term Memory (BiLSTM) [5] encoder with a Self-Attention [6] mechanism to improve robustness to word-order differences across languages. Wu et al. [7] proposed a meta-learning algorithm that fine-tunes a source-language model with a few examples of the target language to enhance the model's generalization ability across languages. Bari et al. [8] achieved an unsupervised cross-language NER model through word-level adversarial learning, parameter sharing, and feature-enhancing fine-tuning.
In addition to cross-lingual transfer learning, some researchers have proposed cross-domain transfer [9], transferring knowledge from resource-rich to low-resource domains. Liu et al. [10] proposed a domain NER model without external resources, enhancing zero-resource domain adaptability’s robustness through multi-task learning and a hybrid entity expert framework. Lin and Lu [11] introduced an adaptive layer on top of an existing neural network structure, eliminating the need to retrain the source domain data. Wang et al. [12] proposed a cross-disciplinary label-aware dual transfer system, mainly through two label-aware rules to achieve feature and parameter transfer.
Unlike the two research directions above, some researchers have proposed cross-task transfer methods, attempting to improve NER performance by utilizing related information from other tasks, such as part-of-speech or boundary information. Kruengkrai et al. [13] proposed a multi-classification model that jointly trains sentence-level labeling and word-level NER. Sanh et al. [14] used a single model to jointly train entity mention detection, relation extraction, and NER tasks.
The research above revolves around transfer learning, but due to differences between domains, it is difficult to guarantee the effectiveness of knowledge transfer. Some researchers have adopted data augmentation [15,16] methods to mitigate the impact of scarce annotated data and enhance NER performance. Alternatively, injecting lexical information into the model has become an effective way to improve Chinese entity recognition performance. Liu et al. [17] proposed LEBERT for Chinese sequence labeling, which injects lexical features into the BERT (Bidirectional Encoder Representation from Transformers) model [18] to enhance character representations with word information. Li et al. [19] proposed a method for constructing a multimodal domain knowledge graph based on LEBERT to address the vast and dispersed knowledge system of computer science. Wu et al. [20] introduced an external dictionary to enhance BERT features together with the adversarially trained entity recognition model LEBERT-BCF, addressing the lack of word information, the waste of entity boundary information, and the poor robustness of models for Chinese medical record entity recognition. Most of this entity recognition research focuses on the general domain. Because domain datasets contain a large number of domain-specific entities, the above methods do not apply to the entity recognition task in the complex mechanical equipment domain without the assistance of domain knowledge.
In recent years, several noteworthy studies have offered innovative approaches to enhancing low-resource NER. Sabane et al. [21] used various adaptations of BERT to improve low-resource NER performance. Chen et al. [22] introduced a translation-and-fusion framework, which translates low-resource-language text into a high-resource language for annotation by fully supervised models before fusing the annotations back into the low-resource language. Zhou et al. [23] proposed a two-stage learning pipeline for the oncological NER task in Chinese, a typical task lacking training resources. Wang et al. [24] proposed SeqUST, a novel uncertainty-aware self-training framework for neural sequence labeling that addresses the scarcity of labeled data by effectively utilizing unlabeled data. Chen et al. [25] proposed the CP-NER model to address cross-domain resource scarcity in practical scenarios; experiments show that the model performs well on both single-source and multi-source cross-domain tasks. Ghosh et al. [26] proposed BioAug, a novel data augmentation framework for low-resource biomedical NER that reconstructs text based on selective masking and knowledge enhancement. Ghosh et al. [27] proposed ACLM, attention-map-aware keyword selection for conditional language model fine-tuning, a data augmentation method based on conditional generation that addresses data scarcity in low-resource complex NER. Mehta and Varma [28] proposed a multilingual complex named entity recognition method using XLM-RoBERTa, which leverages a pre-trained language model and transfer learning to achieve cross-language and cross-domain entity recognition. Wang et al. [29] proposed GPT-NER in 2023, which bridges the gap between sequence labeling and large language models (LLMs) by transforming the labeling task into a generation task that LLMs can easily adapt to.
In response to the problem of entity recognition for complex mechanical equipment under low-resource conditions, this study proposes a method for recognizing named entities in Chinese for complex mechanical equipment faults, incorporating dictionary information based on the LEBERT model proposed by Liu et al. [17]. The difference between this method and that of Liu et al. is that, in order to integrate domain knowledge better, this study uses the Word2Vec [30,31] method in the open-source library Gensim [32] to train a domain vocabulary for mine hoist equipment and construct a domain dictionary for mine hoist equipment and faults. A dictionary adapter is embedded between the Transformer encoders in the BERT model to match characters with the domain dictionary and obtain word set information, which is integrated into the character representation to enhance lexical information. This addresses the issues of blurred entity boundaries and semantic loss in domain character representation.
The specific contributions of this study are as follows:
(1)
We have constructed a domain-specific dictionary for complex mechanical equipment to enhance the accuracy of NER. We collected and organized texts on mine hoist equipment and faults, analyzed the fault ontology and related entity types, and built a domain-specific dictionary for mine hoist equipment faults. We integrated this dictionary into the BERT pre-training model to alleviate the issue of blurred entity boundaries in the domain of mine hoist fault texts. The experimental results show that our method can effectively improve the accuracy of NER.
(2)
We propose a simple and effective NER method based on domain adaptation for the complex mechanical equipment fault domain, which lacks sufficient annotated data. The method uses a dictionary adapter to obtain the vectors of the word sets matched by each character and integrates position encoding to enhance the recognition of entity boundaries. At the same time, we use a Conditional Random Field (CRF) [33,34] as the model's classifier to overcome the class imbalance among samples in the mine hoist domain, thereby improving the model's recognition performance.

2. Building Domain Dictionaries and Dictionary Matching Methods

2.1. Domain Dictionary Construction

This study collected a large amount of literature on mine hoist equipment faults from the field maintenance logs, operation manuals, safety procedures, and checklists of a large state-owned non-ferrous metal group, as well as from CNKI, Wanfang Data, and Baidu Encyclopedia. From these resources, domain-specific terms related to mine hoist equipment faults were extracted. The specific extraction methods are as follows:

2.1.1. Statistical Method

Firstly, each corpus was pre-processed. Then, in each corpus, the top 50 words with the highest Term Frequency-Inverse Document Frequency (TF-IDF) [35,36] values were selected. These words were grouped into a set W = {W_i | 1 ≤ i ≤ 50}, where W_i represents the i-th word. Subsequently, the normalized word frequency tf_i of each word was calculated. If the condition tf_i ≥ n is met (where n is a predefined threshold), the word W_i is added to the domain dictionary. Finally, the candidate domain words were manually screened and validated by experts in the respective domains to ensure a more accurate domain-specific dictionary.
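The following is a minimal sketch of this statistical extraction step. It assumes each corpus document has already been word-segmented and that scikit-learn is available; the threshold value and the final expert-review step are placeholders rather than the authors' exact pipeline.

```python
# Sketch of Section 2.1.1: select candidate domain terms by TF-IDF and
# normalized term frequency. Tokenized input and the threshold are assumptions.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_domain_terms(tokenized_docs, top_k=50, freq_threshold=1e-4):
    joined = [" ".join(doc) for doc in tokenized_docs]
    vectorizer = TfidfVectorizer(analyzer=lambda s: s.split())
    tfidf = vectorizer.fit_transform(joined)
    vocab = vectorizer.get_feature_names_out()

    # Normalized term frequency over the whole corpus: tf_i = count_i / total tokens.
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    total = sum(counts.values())

    candidates = set()
    for row in range(tfidf.shape[0]):
        scores = tfidf[row].toarray().ravel()
        top_idx = scores.argsort()[::-1][:top_k]        # top-50 TF-IDF words per corpus
        for idx in top_idx:
            word = vocab[idx]
            if counts[word] / total >= freq_threshold:  # keep words with tf_i >= n
                candidates.add(word)
    return sorted(candidates)  # candidates are then screened manually by domain experts
```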

2.1.2. Manual Method

Using the collected corpus related to mine hoist equipment and its faults from CNKI, Wanfang Data, and Baidu Encyclopedia, professional words with blurred boundaries in the corpus were extracted manually.
This study collected 1295 professional terms related to mine hoist equipment and equipment faults, with an average word length of seven. Examples of these terms are shown in Table 1.
As shown in Table 1, domain-specific words are composed of different combinations of words. The boundaries of combined words are blurred in entity recognition. For example, “齿轮啮合不当 (improper gear meshing)” is composed of “齿轮(gear)” and “啮合不当 (improper meshing)”. In conventional entity recognition, it would be recognized as entity types of fault location and fault phenomenon, but in reality, it is a fault cause type.
The extracted domain-specific words were added to the word segmentation tool [37] to avoid incorrect segmentation of words in the corpus. The Word2Vec word vector model in the open-source library Gensim was then trained on the mine hoist fault corpus, yielding a domain dictionary composed of 2856 word vectors, each with a dimension of 200.
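A hedged sketch of this embedding step is shown below. File names are illustrative, jieba stands in for the segmentation tool cited above, and only the 200-dimensional vector size is taken from the paper; the remaining hyperparameters are assumptions.

```python
# Train 200-dimensional Word2Vec vectors on the segmented mine-hoist corpus with Gensim.
import jieba
from gensim.models import Word2Vec

# Add domain terms to the segmenter so that multi-character terms such as
# "齿轮啮合不当" are not split apart during segmentation.
for term in open("domain_terms.txt", encoding="utf-8"):
    jieba.add_word(term.strip())

sentences = [list(jieba.cut(line.strip()))
             for line in open("hoist_fault_corpus.txt", encoding="utf-8")]

model = Word2Vec(sentences, vector_size=200, window=5, min_count=1, sg=1, epochs=10)
model.wv.save_word2vec_format("hoist_domain_dict.vec")  # word vectors forming the domain dictionary
```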

2.2. Dictionary Tree

A prefix tree [38,39] is an efficient data structure. It can store vocabulary in the domain of mine hoist equipment, thereby constructing a domain dictionary tree. For instance, we can store the three strings “矿井提升机 (Mine hoist)”, “矿井主提升机 (main mine hoist)”, and “矿井副立井提升机 (auxiliary mine hoist)” in the prefix tree, as shown in Figure 1.
In this prefix tree, each node represents a character, and the path from the root node to a leaf node represents a string. For example, the path from the root node "矿 (mine)" to the leaf node "机 (machine)" represents the string "矿井提升机 (mine hoist)". Storing strings in a prefix tree reduces storage space because common prefixes are shared. Moreover, string retrieval using a prefix tree is very efficient, because we only need to search along the branch that matches the first character of the target string; this minimizes the number of string comparisons and thereby enhances retrieval efficiency. The specific steps of the dictionary tree construction algorithm (Algorithm 1) are as follows:
Algorithm 1: Dictionary tree construction algorithm
Input: s_c = {c_1, c_2, …, c_n}, a sentence with n characters
Output: s_cw = {(c_1, ws_1), (c_2, ws_2), …, (c_n, ws_n)}
  • Initialize the word list WS = {ws_1, ws_2, …, ws_n}, derived from a pre-compiled list of domain-specific terms, where each ws_i is the word set assigned to character c_i.
  • Initialize the dictionary tree root node: root
  • for i ← 1 to n do
  •   Initialize tree node: node
  •   repeat
  •     Traverse the word list WS to obtain the words ws_i matching character c_i
  •     Use (c_i, ws_i) to construct a dictionary tree child node: sub
  •     node.next = sub, node = sub
  •   until WS has been fully traversed
  •   root.next = node
  • end for
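A compact Python rendering of the trie underlying Algorithm 1 follows: building a character-level dictionary tree from the domain term list and looking up the domain words that start at a given character. Class and method names are illustrative, not the authors' implementation.

```python
# Character-level trie for the mine-hoist domain dictionary (Section 2.2 sketch).
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.is_word = False

class DomainTrie:
    def __init__(self, terms):
        self.root = TrieNode()
        for term in terms:
            node = self.root
            for ch in term:
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True

    def words_from(self, sentence, start):
        """Return all domain words in the trie that begin at position `start`."""
        node, matched = self.root, []
        for end in range(start, len(sentence)):
            node = node.children.get(sentence[end])
            if node is None:
                break
            if node.is_word:
                matched.append(sentence[start:end + 1])
        return matched

trie = DomainTrie(["矿井", "矿井提升", "提升机", "矿井提升机"])
print(trie.words_from("矿井提升机", 0))  # ['矿井', '矿井提升', '矿井提升机']
```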

2.3. Lexicon Adapter

Dictionary matching is performed by embedding a dictionary adapter between the BERT Transformer encoders. We therefore designed a dictionary matching algorithm that matches character-level feature vectors with word-level feature vectors through the constructed domain dictionary, obtaining word-level feature vectors and their fusion weights relative to the character-level feature vectors. As shown in Figure 2, the character "矿 (mine)" in "矿井提升机 (mine hoist)" matches the word set "矿井 (mine shaft)" and "矿井提升 (mine hoist)" through the domain dictionary tree. If a character does not match any word in the domain dictionary, its word set is defined as "None". The steps of Algorithm 2 are as follows:
Algorithm 2: Dictionary matching algorithm
Input: WS = {ws_1, ws_2, …, ws_n}, Y = {y_1^c, y_2^c, …, y_n^c}, where ws_i is the word set matched to the i-th character and y_i^c is the character vector of the i-th character.
Output: H = {h̃_1, h̃_2, …, h̃_n}
1. for i ← 1 to n do
     for j ← 1 to m do
       w_ij = ws_ij
       x_ij^w = e^w(w_ij), where e^w is a pre-trained word embedding lookup table and w_ij is the j-th word in ws_i
     end for
   end for
   X_i^ws = {x_i1^w, x_i2^w, …, x_im^w}
2. Apply the nonlinear transformation v_ij^w = W_2 tanh(W_1 x_ij^w + b_1) + b_2 to each word embedding in X_i^ws, yielding V_i = {v_i1^w, v_i2^w, …, v_im^w}
3. for i ← 1 to n do
4.   Calculate the relevance between the character and its domain feature words: a_i = softmax(y_i^c W_attn V_i^T)
5.   Calculate the weighted sum of all matched words: z_i^w = Σ_{j=1}^{m} a_ij v_ij^w
6.   Inject the weighted lexical information into the character vector: h̃_i = y_i^c + z_i^w
7. end for
Figure 2. Diagram of the complex machinery and equipment entity recognition model incorporated into the lexicon. In this example, we used “矿井提升机 (mine hoist)” as input data and extracted the domain words: “矿 (Ore)”, “矿井 (Mines)”, “提升机 (Hoister)”, “矿井提升 (Mine Hoisting)”, and “矿井提升机 (Mine hoist)”. Taking the word “矿 (Ore)” as an example, we matched the domain words “矿井 (Mines)” and “矿井提升 (Mine Hoisting)” from the domain dictionary tree to obtain their relative word vectors and complete the dictionary matching. Among them, TrieRoot is the root node of the domain dictionary tree.
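As a sketch of the computation in Algorithm 2, the PyTorch module below applies a nonlinear projection to the matched word embeddings, computes bilinear attention between each character vector and its candidate words, and adds the weighted word vector back onto the character vector. Dimensions, names, and the padding-mask handling are illustrative assumptions rather than the authors' exact code.

```python
# Sketch of the Lexicon Adapter fusion (Algorithm 2, steps 2-6).
import torch
import torch.nn as nn

class LexiconAdapter(nn.Module):
    def __init__(self, char_dim=768, word_dim=200, hidden_dim=768):
        super().__init__()
        self.w1 = nn.Linear(word_dim, hidden_dim)    # W_1, b_1
        self.w2 = nn.Linear(hidden_dim, char_dim)    # W_2, b_2
        self.attn = nn.Parameter(torch.randn(char_dim, char_dim))  # W_attn

    def forward(self, char_vecs, word_embs, word_mask):
        # char_vecs: (batch, seq_len, char_dim)              character vectors y_i^c / h_i^k
        # word_embs: (batch, seq_len, num_words, word_dim)   matched word vectors x_ij^w
        # word_mask: (batch, seq_len, num_words)             1 for real words, 0 for <PAD>
        v = self.w2(torch.tanh(self.w1(word_embs)))                 # v_ij^w
        scores = torch.einsum("bsd,de,bsne->bsn", char_vecs, self.attn, v)
        scores = scores.masked_fill(word_mask == 0, -1e9)           # ignore <PAD> slots
        a = torch.softmax(scores, dim=-1)                           # attention weights a_i
        z = torch.einsum("bsn,bsnd->bsd", a, v)                     # weighted sum z_i^w
        return char_vecs + z                                        # h~_i = y_i^c + z_i^w
```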

3. Method

This study proposes an entity recognition method for complex mechanical equipment that incorporates a dictionary. The model framework, as shown in Figure 2, includes the following three parts:
(1)
Input Layer: This layer represents the input text in terms of individual characters, which serves as the data for subsequent character-to-domain dictionary matching.
(2)
Character Encoding Layer: This layer uses the Transformer encoder and dictionary adapter to obtain vector representations of the characters in the sentence sequence of mine hoist faults input into the model.
(3)
Label Prediction Layer: This layer uses a Conditional Random Field (CRF) to predict the entity labels corresponding to each character in the text of mine hoist faults.

3.1. BERT-Based Character Vector Representation

For a given text, it is treated as a character sequence s_c = {c_1, c_2, …, c_n}, which is fed into the Input Embedding layer of BERT to obtain the output E = {e_1, e_2, …, e_n}. E is then passed through the Transformer encoders of BERT, each of which computes:

G = LayerNormalization(H^{l−1} + MultiHeadAttention(H^{l−1}))    (1)

H^l = LayerNormalization(G + FFN(G))    (2)

where H^l = {h_1^l, h_2^l, …, h_n^l} denotes the output of the l-th layer, H^0 = E, and FFN is a two-layer feed-forward network with ReLU as the hidden activation function.
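For concreteness, a minimal PyTorch sketch of one such encoder layer is given below, following the post-layer-normalization layout written in Equations (1) and (2). It is a generic sketch; the hidden sizes mirror BERT-base but are not taken from the authors' code.

```python
# One Transformer encoder layer as in Equations (1) and (2).
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=768, n_heads=12, d_ff=3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, h_prev):
        attn_out, _ = self.attn(h_prev, h_prev, h_prev)
        g = self.ln1(h_prev + attn_out)   # Equation (1)
        return self.ln2(g + self.ffn(g))  # Equation (2)
```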

3.2. Lexicon Adapter

In Figure 2, we obtain a set of words by matching characters against the specialized dictionary tree. We use the dictionary matching algorithm to compute word-level feature vectors and their fusion weights relative to the character-level feature vectors, and then combine the weighted word vectors with the character vectors.

3.2.1. Char-Words Pair Sequence

Given a dictionary tree of feature words in the mine hoist domain and a sentence s_c = {c_1, c_2, …, c_n} containing n characters, all character subsequences of the sentence are first traversed and matched against the dictionary tree using the lexicon fusion adaptation algorithm to obtain potential matching words. Each character corresponds to at most three matched words, and characters with fewer than three matches are padded with <PAD>. For example, Figure 1 depicts a case in which the trie contains four feature words: "矿井 (mine shaft)", "矿井提升 (mine hoisting)", "提升机 (hoist)", and "矿井提升机 (mine hoist)". When the input sentence is "矿井提升机 (mine hoist)", its characters "矿", "井", "提", "升", and "机" are matched against the feature words containing them in the trie, thereby obtaining the corresponding feature words and completing the matching of characters to feature words.
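The helper below sketches how such a char–words pair sequence can be assembled: for each character, collect the domain words (found with the trie sketched in Section 2.2) that cover it, keep at most three, and pad shorter lists with "<PAD>". The function name and span-collection details are illustrative.

```python
# Build the char-words pair sequence s_cw with at most three words per character.
def build_char_word_pairs(sentence, trie, max_words=3, pad="<PAD>"):
    char_words = [[] for _ in sentence]
    for start in range(len(sentence)):
        for word in trie.words_from(sentence, start):
            for pos in range(start, start + len(word)):  # the word covers these characters
                if len(char_words[pos]) < max_words:
                    char_words[pos].append(word)
    return [(ch, ws + [pad] * (max_words - len(ws))) for ch, ws in zip(sentence, char_words)]

pairs = build_char_word_pairs("矿井提升机", trie)
# e.g. the last pair is ('机', ['矿井提升机', '提升机', '<PAD>'])
```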

3.2.2. Lexicon Enhanced BERT

Lexicon Enhanced BERT combines the Lexicon Adapter with BERT, applying the Lexicon Adapter to a certain layer of BERT, as shown in Figure 3. A given Chinese sentence s_c = {c_1, c_2, …, c_n} is first converted into character–word pair form:

s_cw = {(c_1, ws_1), (c_2, ws_2), …, (c_n, ws_n)}    (3)

After obtaining the Transformer encoder outputs from Equations (1) and (2), lexical information is injected between the k-th and (k+1)-th Transformer layers through the Lexicon Adapter. The output of the k-th Transformer layer is H^k = {h_1^k, h_2^k, …, h_n^k}, and each pair (h_i^k, x_i^ws) is transformed by the Lexicon Adapter (LA) to obtain:

h̃_i^k = LA(h_i^k, x_i^ws)    (4)

The feature vectors injected with lexical information are then passed through the remaining (L − k) Transformer layers.
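A schematic forward pass illustrating where the adapter sits is sketched below. `encoder_layers` and `lexicon_adapter` stand in for the BERT encoder stack and the adapter module sketched earlier; this is an assumption-laden sketch, not the authors' implementation.

```python
# Lexicon Enhanced BERT: inject the Lexicon Adapter between layers k and k+1.
def lebert_forward(embeddings, word_embs, word_mask, encoder_layers, lexicon_adapter, k=1):
    h = embeddings                                   # H^0 = E from BERT's input embedding
    for layer in encoder_layers[:k]:                 # Transformer layers 1..k
        h = layer(h)
    h = lexicon_adapter(h, word_embs, word_mask)     # lexical injection between layer k and k+1
    for layer in encoder_layers[k:]:                 # remaining L-k Transformer layers
        h = layer(h)
    return h
```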

3.3. Entity Label Prediction Layer

As shown in Figure 2, for a text s = {c_1, c_2, …, c_n}, the emission scores are computed as P_i = W_p h̃_i + b_p, and the probability of the label sequence y = {y_1, y_2, …, y_n} is calculated as shown in Equation (5):

p(y|s) = exp( Σ_{i=1}^{n} (P_{i,y_i} + T_{y_{i−1},y_i}) ) / Σ_{y′∈Y_c} exp( Σ_{i=1}^{n} (P_{i,y′_i} + T_{y′_{i−1},y′_i}) )    (5)

where W_p and b_p are the parameters used to calculate the score matrix P, T is the transition matrix, and Y_c denotes the set of all possible label sequences.
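One way to realize this prediction layer is sketched below using the pytorch-crf package; the package choice is an assumption (the paper does not name an implementation), and any CRF with emission scores P and a learned transition matrix T would serve equally well. The linear layer provides W_p and b_p; the CRF supplies T and Viterbi decoding.

```python
# Sketch of the entity label prediction layer (emission scores + CRF).
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf (assumed implementation)

class LabelPredictionLayer(nn.Module):
    def __init__(self, hidden_dim=768, num_labels=11):  # B-/I- for 5 entity types + O
        super().__init__()
        self.emission = nn.Linear(hidden_dim, num_labels)  # W_p, b_p -> score matrix P
        self.crf = CRF(num_labels, batch_first=True)        # learns the transition matrix T

    def loss(self, hidden, labels, mask):
        # hidden: lexicon-enhanced character vectors; mask: bool tensor marking real tokens
        return -self.crf(self.emission(hidden), labels, mask=mask)  # negative log-likelihood

    def decode(self, hidden, mask):
        return self.crf.decode(self.emission(hidden), mask=mask)    # best label sequences
```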

4. Experiments and Results

4.1. Datasets

This study focuses on entity recognition in the field of mine hoist equipment faults. Since no public dataset exists in this field, we extracted fault text information from the on-site maintenance logs, operation manuals, safety procedures, and equipment checklists of a large state-owned non-ferrous metal group, and collected literature related to mine hoists from CNKI, Wanfang Data, Baidu Encyclopedia, and other websites. The collected corpus was preprocessed by removing stop words, unrecognizable special symbols, and useless spaces between strings, resulting in a total of 7732 pieces of text in the mine hoist domain. The corpus was manually annotated on the Label-Studio annotation platform using the BIO (Begin, Inside, Outside) scheme. For example, in the sentence "齿轮啮合不当导致油温过高 (Improper gear meshing causes high oil temperature)", "齿轮 (gear)" is the Fault Location, annotated as FLN; "啮合不当 (improper meshing)" is the Fault Cause, annotated as FCE; "油温过高 (high oil temperature)" is the Fault Phenomenon, annotated as FPN; and information unrelated to any entity is annotated as O. An annotation example is shown in Figure 4. During manual annotation, our team found that Chinese sentences exhibit diverse expression patterns that prevent traditional Named Entity Recognition (NER) models from accurately identifying the desired entities. To adapt to these varied expressions, we adopted a dictionary embedding approach. Table 2 compares segmentation of the corpus with and without dictionary entries, illustrating the effectiveness of our method.
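The snippet below illustrates the conversion from annotated entity spans to character-level BIO tags for the example sentence above. The span tuple format is an assumption about how the annotation export is represented, not a description of Label-Studio's actual schema.

```python
# Convert character-offset entity spans to BIO tags.
def spans_to_bio(sentence, spans):
    """spans: list of (start, end_exclusive, label) tuples over character offsets."""
    tags = ["O"] * len(sentence)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

sent = "齿轮啮合不当导致油温过高"
spans = [(0, 2, "FLN"), (2, 6, "FCE"), (8, 12, "FPN")]
print(list(zip(sent, spans_to_bio(sent, spans))))
# 齿/B-FLN 轮/I-FLN 啮/B-FCE ... 导/O 致/O 油/B-FPN 温/I-FPN ...
```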
Following the current national standards for mine hoists, entity types were divided into five categories: Fault Phenomenon (FPN), Fault Cause (FCE), Fault Location (FLN), Fault Effect (FET), and Repair Measures (RMS). Based on the annotated corpus, a mine hoist dataset was built and split at a ratio of 80%/10%/10% into 6182 training, 775 validation, and 775 test samples.

4.2. Experiment Parameter Settings

The experimental model was built using the PyTorch framework, the underlying encoder was pre-trained with the BERT-wwm [40] language model, and the model parameters were learned adaptively using the Adam optimizer. The experimental environment setup is shown in Table 3.
The validation set was used to determine the values of the parameters, and all the experimental parameters were set as shown in Table 4 when the model achieved the optimal F1 value on the validation set.

4.3. Evaluation Indicators

For the method proposed in this study, an entity is considered correctly identified if and only if both its boundary and its class are correctly identified. To facilitate comparison with traditional models, precision (P), recall (R), and the F1-score (F1) were used as quantitative evaluation metrics.
P = TP / (TP + FP) × 100%

R = TP / (TP + FN) × 100%

F1 = (2 × P × R) / (P + R) × 100%
where TP (True Positives) indicates the number of entities in the test set for which the category was correctly identified, FP (False Positives) indicates the number of entities for which the category was incorrectly identified, and FN (False Negatives) represents the number of entities that were not identified.
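A sketch of this entity-level evaluation is given below: an entity counts as a true positive only when its boundary and type both match the gold annotation, with entities compared as (start, end, type) triples extracted from BIO tag sequences. The helper names are illustrative.

```python
# Entity-level precision, recall, and F1 from BIO tag sequences.
def extract_entities(tags):
    entities, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel "O" flushes the last entity
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                entities.add((start, i, etype))     # close the currently open entity
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]           # open a new entity
        # "I-" tags simply extend the currently open entity
    return entities

def prf1(gold_tag_seqs, pred_tag_seqs):
    tp = fp = fn = 0
    for gold, pred in zip(gold_tag_seqs, pred_tag_seqs):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```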

4.4. Experiment Results and Analysis

To validate the effectiveness of the proposed method, we conducted comparative experiments with existing entity recognition methods. We set up ablation experiments to verify the impact of the domain dictionary on model performance.

4.4.1. Relevant Comparison Experiments

We conducted comparative experiments with five methods published in recent years, as follows:
(1)
In 2015, Wang et al. [41] proposed a unified tagging solution based on word embeddings and bidirectional LSTM recurrent neural networks.
(2)
In 2015, Huang et al. [5] at Baidu Research proposed a solution for text annotation using bidirectional Long Short-Term Memory (LSTM) networks and Conditional Random Fields (CRFs).
(3)
In 2018, Devlin et al. [18] proposed BERT, a deep bidirectional pre-training language model based on Transformer.
(4)
In 2020, Xie et al. [42] proposed a research method based on the BERT-BiLSTM-CRF model.
(5)
In 2021, Liu et al. [17] proposed LEBERT, a model for solving Chinese sequence labeling through integrating lexical information into the underlying encoding process of BERT.
To verify the practicality of the method proposed in this study, we selected BiLSTM, BiLSTM-CRF, BERT, BERT-BiLSTM-CRF, and LEBERT as baseline models for comparative experiments. The results of the comparative experiments are shown in Table 5.
As shown in Table 5, the entity recognition performance of the proposed method on the mine hoist dataset is significantly superior to that of BiLSTM, BiLSTM-CRF, BERT, and BERT-BiLSTM-CRF. This is primarily due to the specificity of the text describing mine hoist faults, in which entities can easily be assigned to different types depending on context, so the BiLSTM method's lack of local feature perception and its information loss become more apparent. The introduction of BERT brings a clear improvement in the precision, recall, and F1 score of entity recognition compared with the BiLSTM and BiLSTM-CRF models, effectively addressing entity polysemy. After the LEBERT model incorporates word features into the text embedding, the precision of entity recognition on the mine hoist dataset increases by 13.74% and the F1 score by 14.33% compared with BERT. By constructing a mine hoist domain dictionary, the method proposed in this study is more sensitive to the word-level semantic changes introduced by domain-specific terminology: compared with BERT, precision increases by 14.06% and the F1 score by 14.59%; compared with LEBERT, precision increases by only 0.32% and the F1 score by 0.26%. This indicates that the domain dictionary can effectively enhance entity recognition in domains with scarce data.

4.4.2. Ablation Experiments

To validate the effectiveness of incorporating domain-specific dictionaries into our method, we conducted three experiments: one without dictionary incorporation, one integrating Tencent's public dictionary, and one integrating the domain-specific dictionary.
As depicted in Figure 5, with an increase in the number of iterations, a significant improvement in the F1 scores was observed across all three experiments. This is attributed to the model’s ability to learn more feature information through continuous iterations.
A comparison was made between the F1 scores at the final iteration. The experiment incorporating Tencent’s public dictionary showed a 4.87% increase in the F1 score compared to the experiment without dictionary incorporation. This improvement is due to the semantic information in Tencent’s public dictionary, underscoring the importance of dictionary incorporation for the semantic representation of text. In the experiment incorporating the domain-specific dictionary, the F1 score improved by 1.16% compared to the experiment incorporating Tencent’s public dictionary. This is because the domain-specific dictionary contains semantic information and entity boundary information specific to the field of mine hoist faults, resulting in better recognition of entities related to mine hoist faults.

5. Conclusions

To address the structural features of fault text entities in complex mechanical equipment, this study proposes a method for recognizing fault entities in mechanical equipment by incorporating a domain-specific dictionary covering fault causes, locations, impacts, and repair measures. An adapter for dictionary integration is embedded between two Transformer layers of the BERT model, enabling characters to be matched with word sets from the domain-specific dictionary; the weighted word vector information is then injected into the character vectors, allowing a more effective integration of the domain-specific dictionary into the character representations. The experimental results indicate that our method offers clear improvements for Chinese NER on fault texts of complex mechanical equipment. As future work, we plan to investigate the nested relationships between entities in the domain of complex mechanical equipment faults. This will allow us to extract more domain-specific words, construct a more extensive professional vocabulary, and address the issue of poor entity recognition caused by words not appearing in the domain-specific dictionary.

Author Contributions

Conceptualization, X.D. (Xiaochao Dang), H.D. and X.D. (Xiaohui Dong); methodology, X.D. (Xiaochao Dang) and L.W.; software, H.D. and L.W.; validation, X.D. (Xiaochao Dang), L.W. and X.D. (Xiaohui Dong); formal analysis, X.D. (Xiaohui Dong), L.W. and H.D.; investigation, X.D. (Xiaochao Dang); resources, X.D. (Xiaochao Dang) and L.W.; data curation, X.D. (Xiaohui Dong), L.W. and F.L.; writing—original draft preparation, L.W. and H.D.; writing—review and editing, X.D. (Xiaohui Dong), L.W.; visualization, X.D. (Xiaochao Dang) and X.D. (Xiaohui Dong); supervision, X.D. (Xiaochao Dang); project administration, X.D. (Xiaochao Dang) and X.D. (Xiaohui Dong); funding acquisition, X.D. (Xiaochao Dang). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 62162056) and the Industrial Support Foundations of Gansu (Grant No. 2021CYZC-06), awarded to X.D.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, J.; Sun, A.; Han, J.; Li, C. A Survey on Deep Learning for Named Entity Recognition. IEEE Trans. Knowl. Data Eng. 2020, 34, 50–70. [Google Scholar] [CrossRef]
  2. Hedderich, M.A.; Lange, L.; Adel, H.; Strötgen, J.; Klakow, D. A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, Online, 6–11 June 2021. [Google Scholar]
  3. Liang, Y.; Meng, F.; Zhou, C.; Xu, J.; Chen, Y.; Su, J.; Zhou, J. A variational hierarchical model for neural cross-lingual summarization. arXiv 2022, arXiv:2203.03820. [Google Scholar]
  4. Xie, J.; Yang, Z.; Neubig, G.; Smith, N.A.; Carbonell, J. Neural Cross-Lingual Named Entity Recognition with Minimal Resources. In Proceedings of the EMNLP, Brussels, Belgium, 31 October–4 November 2018; pp. 369–379. [Google Scholar]
  5. Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv 2015, arXiv:1508.01991. [Google Scholar]
  6. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  7. Wu, Q.; Lin, Z.; Wang, G.; Chen, H.; Karlsson, B.F.; Huang, B.; Lin, C.-Y. Enhanced Meta-Learning for Cross-lingual Named Entity Recognition with Minimal Resources. Proc. AAAI 2020, 34, 9274–9281. [Google Scholar] [CrossRef]
  8. Bari, M.S.; Joty, S.; Jwalapuram, P. Zero-Resource Cross-Lingual Named Entity Recognition. Proc. AAAI 2020, 34, 7415–7423. [Google Scholar] [CrossRef]
  9. Hou, Y.; Zheng, L. Visualizing adapted knowledge in domain transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Online, 19–25 June 2021; pp. 13824–13833. [Google Scholar]
  10. Liu, Z.; Winata, G.I.; Fung, P. Zero-Resource Cross-Domain Named Entity Recognition. In Proceedings of the 5th Workshop on Representation Learning for NLP, Online, 9 July 2020; pp. 1–6. [Google Scholar]
  11. Lin, B.Y.; Lu, W. Neural Adaptation Layers for Cross-domain Named Entity Recognition. In Proceedings of the EMNLP, Brussels, Belgium, 31 October–4 November 2018; pp. 2012–2022. [Google Scholar]
  12. Wang, Z.; Qu, Y.; Chen, L.; Shen, J.; Zhang, W.; Zhang, S.; Gao, Y.; Gu, G.; Chen, K.; Yu, Y. Label-Aware Double Transfer Learning for Cross-Specialty Medical Named Entity Recognition. In Proceedings of the NAACL, Online, 6–11 June 2018; pp. 1–15. [Google Scholar]
  13. Kruengkrai, C.; Nguyen, T.H.; Aljunied, S.M.; Bing, L. Improving Low-Resource Named Entity Recognition using Joint Sentence and Token Labeling. In Proceedings of the ACL, Online, 5–10 July 2020; pp. 5898–5905. [Google Scholar]
  14. Sanh, V.; Wolf, T.; Ruder, S. A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks. Proc. AAAI 2019, 33, 6949–6956. [Google Scholar] [CrossRef]
  15. Li, B.; Hou, Y.; Che, W. Data Augmentation Approaches in Natural Language Processing: A Survey. arXiv 2021, arXiv:2110.01852. [Google Scholar] [CrossRef]
  16. Xie, Q.; Dai, Z.; Hovy, E. Unsupervised Data Augmentation for Consistency Training. arXiv 2019, arXiv:1904.12848. [Google Scholar]
  17. Liu, W.; Fu, X.; Zhang, Y. Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter. In Proceedings of the ACL-IJCNLP 2021, Bangkok, Thailand, 1–6 August 2021. [Google Scholar]
  18. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  19. Li, H.; Fu, Y.; Yan, Y.; Li, J. Construction of Multi-modal Domain Knowledge Graph Based on LEBERT. Comput. Syst. Appl. 2022, 31, 79–90. [Google Scholar]
  20. Wu, G.; Fan, C.; Tao, G.; He, Y. Entity recognition of electronic medical records based on LEBERT-BCF. Comput. Era 2023. [Google Scholar] [CrossRef]
  21. Sabane, M.; Ranade, A.; Litake, O.; Patil, P.; Joshi, R.; Kadam, D. Enhancing Low Resource NER using Assisting Language and Transfer Learning. In Proceedings of the 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), Salem, India, 4–6 May 2023; pp. 1666–1671. [Google Scholar]
  22. Chen, Y.; Shah, V.; Ritter, A. Better Low-Resource Entity Recognition Through Translation and Annotation Fusion. arXiv 2023, arXiv:2305.13582. [Google Scholar]
  23. Zhou, M.; Tan, J.; Yang, S.; Wang, H.; Wang, L.; Xiao, Z. Ensemble transfer learning on augmented domain resources for oncological named entity recognition in Chinese clinical records. IEEE Access 2023, 11, 80416–80428. [Google Scholar] [CrossRef]
  24. Wang, J.; Wang, C.; Huang, J.; Gao, M.; Zhou, A. Uncertainty-aware Self-training for Low-resource Neural Sequence Labeling. arXiv 2023, arXiv:2302.08659. [Google Scholar] [CrossRef]
  25. Chen, X.; Li, L.; Fei, Q.; Zhang, N.; Tan, C.; Jiang, Y.; Huang, F.; Chen, H. One model for all domains: Collaborative domain-prefix tuning for cross-domain NER. arXiv 2023, arXiv:2301.10410. [Google Scholar]
  26. Ghosh, S.; Tyagi, U.; Kumar, S.; Manocha, D. BioAug: Conditional Generation based Data Augmentation for Low-Resource Biomedical NER. arXiv 2023, arXiv:2305.10647. [Google Scholar]
  27. Ghosh, S.; Tyagi, U.; Suri, M.; Kumar, S.; Ramaneswaran, S.; Manocha, D. ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER. arXiv 2023, arXiv:2306.00928. [Google Scholar]
  28. Mehta, R.; Varma, V. LLM-RM at SemEval-2023 Task 2: Multilingual Complex NER using XLM-RoBERTa. arXiv 2023, arXiv:2305.03300. [Google Scholar]
  29. Wang, S.; Sun, X.; Li, X.; Ouyang, R.; Wu, F.; Zhang, T.; Li, J.; Wang, G. GPT-NER: Named entity recognition via large language models. arXiv 2023, arXiv:2304.10428. [Google Scholar]
  30. Goldberg, Y.; Levy, O. word2vec Explained: Deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv 2014, arXiv:1402.3722. [Google Scholar]
  31. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
  32. Řehůřek, R.; Sojka, P. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop New Challenges for NLP Frameworks, Valletta, Malta, 22 May 2010. [Google Scholar]
  33. Mo, H.M.; Nwet, K.T.; Soe, K.M. CRF-Based Named Entity Recognition for Myanmar Language. In Proceedings of the International Conference on Genetic and Evolutionary Computing, Fuzhou, China, 7–9 November 2016; Springer International Publishing: Cham, Switzerland, 2016. [Google Scholar]
  34. Yu, N.; Xin, Y.; Yu, Z.; Huang, S.; Guo, J. A Khmer NER method based on conditional random fields fusing with Khmer entity characteristics constraints. In Proceedings of the 2017 29th Chinese Control and Decision Conference (CCDC), Chongqing, China, 28–30 May 2017. [Google Scholar]
  35. Martineau, J.; Finin, T. Delta TFIDF: An Improved Feature Space for Sentiment Analysis. In Proceedings of the Third International Conference on Weblogs and Social Media, ICWSM 2009, San Jose, CA, USA, 17–20 May 2009. [Google Scholar]
  36. Robertson, S. Understanding inverse document frequency: On theoretical arguments for IDF. J. Doc. 2004, 60, 503–520. [Google Scholar] [CrossRef]
  37. Chen, X.; Shi, Z.; Qiu, X.; Huang, X. Adversarial Multi-Criteria Learning for Chinese Word Segmentation. arXiv 2017, arXiv:1704.07556v1. [Google Scholar]
  38. Ferragina, P.; González, R.; Navarro, G.; Venturini, R. Compressed text indexes: From theory to practice. J. Exp. Algorithm. 2009, 13, 12. [Google Scholar] [CrossRef]
  39. Sinha, R.; Zobel, J. Efficient Trie-Based Sorting of Large Sets of Strings. ACSC 2003, 16, 11–18. [Google Scholar]
  40. Cui, Y.; Che, W.; Liu, T.; Qin, B.; Yang, Z. Pre-training with whole word masking for Chinese BERT. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 3504–3514. [Google Scholar] [CrossRef]
  41. Wang, P.; Qian, Y.; Soong, F.K.; He, L.; Zhao, H. A unified tagging solution: Bidirectional LSTM recurrent neural network with word embedding. arXiv 2015, arXiv:1511.00215. [Google Scholar]
  42. Xie, T.; Yang, J.A.; Liu, Y. Chinese Entity Recognition Based on BERT-BiLSTM-CRF Model. Comput. Syst. Appl. 2020, 29, 48–55. [Google Scholar]
Figure 1. Example of a domain dictionary tree.
Figure 3. Dictionary-enhanced BERT. c_1, c_2, …, c_n denote the n characters of the input text, and e_1, e_2, …, e_n denote the vectors obtained by passing the input characters through the Input Embedding layer. The Lexicon Adapter is attached between certain Transformer layers within BERT, thereby injecting external lexicon knowledge into BERT.
Figure 4. Example of sentence entity annotation.
Figure 5. Effect of incorporation of domain dictionaries on experimental results.
Table 1. Examples of words in the field of mine hoisting fault.
Entity Type | Examples
Failure phenomenon | The contact tip on the rectifier bar is melted off by heat; severe speed reducer oscillation; …
Fault location | Hoist spindle bearing; hydraulic system reducer; …
Cause of fault | Spindle bearing breakage; improper gear meshing; …
Fault impact | Hoist emergency brake; hydraulic system light-up alarm; …
Repair measures | Optimization of the lubrication station cooling system; smooth out the rectifier; …
Table 2. Corpus segmentation comparison with added dictionary.
Sentence | Not Added to Dictionary | Added to Dictionary
减速器传动轴弯曲 (The drive shaft of the reducer is bent) | 减/速/器/传/动/轴/弯/曲 | 减速器/传动轴/弯曲
齿轮啮合不当导致油温过高 (Improper gear meshing causes high oil temperature) | 齿/轮/啮/合/不/当/导/致/油/温/过/高 | 齿轮/啮合不当/导致/油温过高
Table 3. Experimental environment setup.
Experimental Environment | Configuration
Operating System | CentOS 7.5 64-bit
GPU | NVIDIA Tesla P100
Python | 3.7
PyTorch | 1.5.1 (CUDA 9.2)
Table 4. Experiment parameter settings.
Parameter Name | Value
Word2Vec word vector dimensionality | 200
Learning rate | 1 × 10−5
Batch size | 16
Random discard (dropout) rate | 0.1
Hidden layer dimension | 768
Number of attention heads | 12
Maximum sentence length | 128
Maximum number of fused words per Chinese character | 3
Table 5. Relevant comparison experiment results.
Model Name | P | R | F1
BiLSTM | 38.45% | 40.33% | 39.37%
BiLSTM-CRF | 68.27% | 64.24% | 66.20%
BERT | 83.27% | 91.98% | 83.27%
BERT-BiLSTM-CRF | 89.22% | 94.37% | 91.73%
LEBERT | 97.01% | 98.20% | 97.60%
Method of this study | 97.33% | 98.39% | 97.86%
