Article

Research on Named Entity Recognition Based on Gated Interaction Mechanisms

Bin Liu, Wanyuan Chen, Jialing Tao, Lei He and Dan Tang
1 School of Software Engineering, Chengdu University of Information Technology, Chengdu 610025, China
2 Sichuan Province Engineering Technology Research Center of Support Software of Informatization Application, Chengdu 610225, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(15), 6481; https://doi.org/10.3390/app14156481
Submission received: 4 July 2024 / Revised: 20 July 2024 / Accepted: 20 July 2024 / Published: 25 July 2024

Abstract

Long short-term memory (LSTM) networks are a mainstay of named entity recognition models. However, traditional memory networks lack a direct connection between the input information and the hidden state, so key feature information is not fully learned during training and information is lost. This paper designs a bidirectional gated recurrent network variant called Mogrifier-BiGRU, which is combined with the BERT pre-trained model and a conditional random field (CRF) layer. The Mogrifier gating interaction unit is given additional hyperparameters to achieve deep interaction of gating information, changing the relationship between the input and the hidden state so that they are no longer independent; by introducing more nonlinear transformations, the model can learn more complex input–output mappings. Bayesian optimization is then combined with the improved Mogrifier-BiGRU network to compute the model's optimal hyperparameters automatically. Experimental results show that the model based on the gating interaction mechanism effectively combines feature information and improves the accuracy of Chinese named entity recognition. On the CLUENER2020 dataset, it achieved an F1-score of 85.42%, 7% higher than traditional methods, with recognition accuracy up to 10% higher for some entity types.

1. Introduction

Named entity recognition (NER) [1] has long been an active research field. NER technology has evolved from initially relying on rules and dictionaries, to traditional machine learning models, and then to various deep learning techniques [2]. Early rule- and dictionary-based methods combined the linguistic features of text with relevant dictionaries, manually constructed rule templates suited to that text, and finally matched entities against the templates by hand. These methods required substantial manual effort, ported poorly, offered low reusability, and depended heavily on the rule templates [3]. With the help of traditional machine learning models, researchers treated named entity recognition as a sequence labeling task, most commonly using hidden Markov models [4], conditional random fields [5], and maximum entropy models [6]. Liu et al. improved the traditional hidden Markov model by considering not only the features of the word itself but also the influence of the previous and next N event states, which improved recognition accuracy [7]. Although traditional machine learning alleviated the early over-reliance on template rules to some extent, improving model accuracy still requires a large amount of labeled feature data.
In recent years, research on named entity recognition has shifted towards methods based on deep learning. Early systems built on traditional sequence labeling models [8,9]; Hammerton and others addressed the task with LSTM (long short-term memory) networks [10]. Subsequently, a series of neural models were proposed [11,12,13,14]. Tang et al. combined convolutional neural networks with conditional random field models and attention mechanisms to construct a neural network framework for recognizing entities in Chinese texts [15]. Traditional convolutional and recurrent neural networks possess strong short-term memory, yet they cannot retain feature information from distant sequences, resulting in gradient decay. References [16,17] used a framework combining long short-term memory networks, bidirectional long short-term memory networks, and conditional random fields to mitigate gradient explosion and decay by acquiring bidirectional feature information from word vectors; this framework has been applied effectively to named entity recognition tasks with superior recognition capability. Lin et al. integrated CNNs into the BiLSTM-CRF framework to concurrently capture global context features and local key information, then fused the features through a multi-head attention mechanism, successfully identifying entities in the domain of subway vehicle equipment and ameliorating the shortage of features in that domain [18]. However, the aforementioned models rely on static word vectors generated by the Word2Vec model as input; when confronted with challenges such as lengthy named entities, the polysemy of single characters, and the synonymy of multiple words in Chinese, static word vectors fail to adequately represent the semantic information of words, impairing recognition performance [19]. Devlin et al. at Google proposed the pre-trained model BERT, followed by improved pre-trained language models; these capture character position information and bidirectional feature relationships from the corpus, dynamically generate context-dependent word vectors, effectively address polysemy in Chinese, and have been widely used in NER tasks [20]. Research [21] introduced the bidirectional pre-trained language model BERT, which dynamically acquires word vectors and captures bidirectional contextual features; combined with BiLSTM-CRF decoding, this approach improved accuracy by 10.06% for identifying entities within specific domains, alleviating the problem of low accuracy in domain text entity recognition. Chen et al. used the BERT pre-trained model to obtain word vectors containing position information, extracted global and local features with BiLSTM and IDCNN, respectively, and concatenated them; decoding with a conditional random field, they identified COVID-19 epidemiological entities, addressed insufficient local feature extraction, and achieved an accuracy of 94.8% [22]. Qian et al. proposed a BERT-BiLSTM-CRF framework and introduced adversarial training.
The framework treats named entity recognition as the main task and Chinese word segmentation as an auxiliary task, sharing word-boundary information to help the model recognize entity boundaries, and improved accuracy by 3.25% over the baseline model [23]. Addressing the lack of Chinese character-shape and pronunciation features in Chinese pre-trained models, Chen et al. integrated visual character-shape and pronunciation features into the pre-training phase to enhance the model's understanding of Chinese grammar and semantics [24]. In constructing knowledge graphs, accurate entity recognition is an essential technique. The gated recurrent unit (GRU) has been favored for its relatively simple structure and fewer parameters, which make it easier to train to convergence, especially on small training sets; as a result, the GRU has been increasingly applied to entity recognition tasks [25]. However, issues remain in Chinese named entity recognition, such as blurry entity boundary information, potential semantic conflicts between words, and underused latent semantic information. Further research is therefore needed on combining the Mogrifier [26] mechanism with named entity recognition to mine the implicit semantic information in text and build a more accurate named entity recognition model.
In response to the above issues, this paper proposes a named entity recognition model based on a gating interaction mechanism. First, the dynamic vector representation of the text is obtained through the BERT pre-trained language model. Then, the Mogrifier BiGRU model is used to enhance the interaction between context information while extracting features from the input vectors. Additionally, an adaptive optimization algorithm is introduced for the gating interaction unit to automatically adjust and optimize its parameters. Finally, a state transition matrix is used to improve the reliability of the prediction results.
The main contributions of the research can be summarized as follows:
  • By introducing the BERT pre-trained model and adding a gating interaction mechanism, we enhance local feature information. The dynamic vector representation and feature enhancement effectively solve the problem of polysemy in Chinese.
  • Based on the named entity recognition model, we have improved the research on the gating interaction mechanism. By setting more hyperparameters to capture the implicit semantic change rules and using Bayesian optimization algorithms to select hyperparameters in the model, we have achieved automatic calculation of the model’s hyperparameters, thereby improving the accuracy of the named entity recognition model.

2. Materials and Methods

The named entity recognition model proposed in this paper mainly consists of an embedding layer, a feature extraction layer, a fully connected layer, and a CRF layer. The first layer is the embedding layer, which uses the BERT pre-trained language model to convert the input into word vector representations that can be processed computationally while preserving semantic features of the sentence. The second layer is the feature extraction layer, which enhances the information interaction between contexts through the improved Mogrifier BiGRU when extracting features from the vectors; an adaptive optimization algorithm is also employed to compute the parameters of the gated interaction unit adaptively. The final layer is the CRF layer, which establishes constraints through the state transition matrix to obtain label prediction results. The network model structure is shown in Figure 1.

2.1. Embedding Layer

The BERT-wwm [27] model used in this paper is composed of 12 stacked transformer encoder layers [28]. During the unsupervised pre-training phase, two tasks, masked language modeling (MLM) [29] and next sentence prediction (NSP), are used to train the encoder network within the transformer. In the 12-layer BERT stack, the lower layers capture surface features of the language, the middle layers capture syntactic features, and the higher layers capture semantic features and can gradually handle long-range dependencies. For the named entity recognition task, the semantic features contained in the word vectors are crucial for identifying entities, while the next sentence prediction task and the surface features of the language have little impact on recognition accuracy. By concatenating the outputs of the last three encoder layers into a single word vector carrying rich contextual semantic information, the model's entity recognition accuracy can be further improved while reducing the influence of redundant information.
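As a hedged sketch of this concatenation step (the paper does not publish its implementation), the idea can be expressed with the HuggingFace transformers library and the hfl/chinese-bert-wwm checkpoint, both assumptions on our part:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm")
model = BertModel.from_pretrained("hfl/chinese-bert-wwm", output_hidden_states=True)

inputs = tokenizer("四川省成都市", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.hidden_states holds 13 tensors: the embedding output plus all 12 encoder
# layers, each of shape (batch, seq_len, 768). Concatenating the last three
# encoder outputs yields (batch, seq_len, 2304) word vectors that carry the
# rich contextual semantic features described above.
word_vectors = torch.cat(out.hidden_states[-3:], dim=-1)
```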

2.2. Feature Extraction Layer

To overcome the trial-and-error selection of hyperparameters in the traditional Mogrifier gating mechanism, and to better mine the features in textual data, this study proposes an improved Mogrifier GRU network model. The model uses Bayesian optimization to select hyperparameters adaptively and introduces richer hyperparameters into the traditional structure, allowing more effective modeling of Chinese named entity recognition tasks. Through this improvement, we aim to make the model more accurate and robust for Chinese named entity recognition. The structure of the improved Mogrifier GRU network is shown in Figure 2.
The following design for the improved Mogrifier GRU network model was proposed in this study:
An iterative round i was defined to describe the number of interactions the model undergoes during each update process. By adjusting the value of i, the results of information interaction under different rounds can be explored, allowing for the identification of the optimal number of iterations. The improved gating mechanism is calculated as shown in the formula.
$$x^{i} = k\,\sigma\!\left(Q^{i}\,h_{prev}^{i-1}\right)\odot x^{i-2},\qquad h_{prev}^{i} = k\,\sigma\!\left(R^{i}\,x^{i-1}\right)\odot h_{prev}^{i-2}$$
Q and R are randomly initialized weight matrices. Using Q and R, the input x and the hidden state h_prev are alternately mixed to form new vector representations x^i and h_prev^i. The sigmoid function is employed because it is smooth and differentiable over its domain, which aids convergence during training; its property of approaching 1 or 0 for large or small inputs also helps to stabilize gradients during computation, enhancing training stability. Since the sigmoid output lies between 0 and 1, it can be interpreted as the probability of retaining or discarding the input information. An optimization algorithm sets k, a hyperparameter serving as a pre-coefficient in the formula, to avoid the values shrinking toward zero after repeated sigmoid applications, ensuring relative numerical stability. The GRU computation process is depicted in the following formula:
$$h_t = \left(1 - \sigma(W_z[h_{t-1}, x_t])\right)\odot h_{t-1} + \sigma(W_z[h_{t-1}, x_t])\odot\tanh\!\left(W\left[\sigma(W_r[h_{t-1}, x_t])\odot h_{t-1},\ x_t\right]\right)$$
In the improved GRU model, we use the sigmoid function as the activation function, with mathematical expression $\sigma(x) = \frac{1}{1 + e^{-x}}$. The sigmoid maps input data to the range between 0 and 1, thereby generating gating signals. Gating signals are critical in the GRU model, as they determine the extent to which past state information is retained in the current state: when a gating signal is close to 1, more of the past memory is retained; when it is close to 0, less is retained. Through this mechanism, the GRU can flexibly handle dependencies across different time steps and effectively capture features in sequence data.
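As a concrete illustration of the improved gating interaction, the following is a minimal PyTorch sketch (names and structure are our assumption; the paper does not publish code). Updates alternate between the input and the hidden state, as in the original Mogrifier, with the round count i and the pre-coefficient k exposed as the hyperparameters that Bayesian optimization later tunes:

```python
import torch
import torch.nn as nn

class MogrifierGRUCell(nn.Module):
    """GRU cell preceded by the improved Mogrifier interaction step."""

    def __init__(self, input_dim: int, hidden_dim: int, rounds: int = 3, k: float = 1.327):
        super().__init__()
        self.rounds, self.k = rounds, k
        # Odd rounds update the input with Q (hidden -> input space);
        # even rounds update the hidden state with R (input -> hidden space).
        self.Q = nn.ModuleList(nn.Linear(hidden_dim, input_dim, bias=False)
                               for _ in range((rounds + 1) // 2))
        self.R = nn.ModuleList(nn.Linear(input_dim, hidden_dim, bias=False)
                               for _ in range(rounds // 2))
        self.gru = nn.GRUCell(input_dim, hidden_dim)

    def mogrify(self, x: torch.Tensor, h: torch.Tensor):
        q, r = iter(self.Q), iter(self.R)
        for i in range(1, self.rounds + 1):
            if i % 2:   # x^i = k * sigmoid(Q^i h^{i-1}) ⊙ x^{i-2}
                x = self.k * torch.sigmoid(next(q)(h)) * x
            else:       # h^i = k * sigmoid(R^i x^{i-1}) ⊙ h^{i-2}
                h = self.k * torch.sigmoid(next(r)(x)) * h
        return x, h

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        x, h = self.mogrify(x, h)       # deep gating interaction
        return self.gru(x, h)           # standard GRU update

# Usage: one step over a batch of 8 vectors.
cell = MogrifierGRUCell(input_dim=768, hidden_dim=256)
h = cell(torch.randn(8, 768), torch.zeros(8, 256))
```

With rounds = 0, the cell reduces to a plain nn.GRUCell, which makes the contribution of the interaction mechanism easy to ablate; a bidirectional (BiGRU) version runs two such cells over the sequence in opposite directions and concatenates their hidden states.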
To address random errors caused by trial-and-error hyperparameter selection in traditional deep learning, this study incorporates the Bayesian optimization method [30] to select the hyperparameters of the network model adaptively. The core idea of Bayesian optimization is to use existing experimental data to guide the design of subsequent experiments. Specifically, let i denote a setting of the algorithm's hyperparameters (here, the interaction round number and the pre-coefficient k), and let f(i) be the corresponding model performance. The goal is to determine the posterior distribution P(f*|i*, i, f), where f* is an untried model performance and i* the corresponding untried hyperparameters. By fitting this posterior distribution, the estimated model performance for different hyperparameter settings i* is obtained. Bayesian optimization here employs Gaussian process regression as the prior function and the upper bound of the confidence interval as the acquisition function, fully utilizing historical evaluation results and continuously updating the model.

In the parameter optimization stage, a Gaussian process regression (GPR) strategy is adopted to fit the posterior distribution from historical evaluation data. As a non-parametric Bayesian method, GPR assumes that, given the training data, the posterior distribution of the target function (i.e., the model performance metric) can be regarded as a Gaussian process. Specifically, a Gaussian process assumes that any finite set of random variables and their corresponding process states follows an N-dimensional Gaussian distribution, completely characterized by its mean function and covariance function: the mean function reflects the overall trend of the process, while the covariance function describes the degree of association between data points. Under this framework, we use Gaussian process regression as the prior and update the posterior distribution from the observed experimental data; based on the updated posterior, we select the next evaluation point, usually the one expected to bring the greatest performance improvement. The process iterates until the predetermined stopping criterion is met, the specified number of iterations is reached, or the improvement in model performance saturates. The key equations are as follows:
$$f_{*}\mid i, y, i_{*} \sim N\!\left(Q_{*},\ \mathrm{cov}(f_{*})\right)$$
$$Q_{*} = k(i_{*}, i)\left(k(i, i) + \sigma_{n}^{2} I\right)^{-1} y$$
$$\mathrm{cov}(f_{*}) = k(i_{*}, i_{*}) - k(i_{*}, i)\left(k(i, i) + \sigma_{n}^{2} I\right)^{-1} k(i, i_{*})$$
In the equations: $i \in \mathbb{R}^{N \times d}$ is the input data matrix; $y \in \mathbb{R}^{N}$ is the output vector; $I \in \mathbb{R}^{N \times N}$ is the N-dimensional identity matrix; $k(i, i)$ is the symmetric positive-definite covariance matrix of the training inputs; $k(i_*, i)$ is the covariance between the test point $i_*$ and the training inputs $i$; $k(i_*, i_*)$ is the covariance of the test point $i_*$ with itself; $Q_*$ is the mean at the test point $i_*$; and $\mathrm{cov}(f_*)$ is the covariance at the test point $i_*$. The input matrix $i$ contains N observation samples with d-dimensional features each, the output vector $y$ contains the corresponding target values, and the noise term $\sigma_n^2 I$ ensures the positive definiteness of the covariance matrix. Bayesian optimization is a sequential model-based optimization method: as sampling of the parameter space increases, the fitted distribution approaches the true distribution, improving the accuracy of optimization. However, since model training consumes substantial computational resources, the number of samples must be balanced against computational cost in practice. To assess the quality of sampling, we introduce the acquisition function, which measures the value of evaluating a point given the current model's mean and covariance functions, guiding the algorithm's targeted exploration of the parameter space so that the optimum is found more quickly. Common acquisition functions include expected improvement (EI), probability of improvement (PI), and the upper confidence bound (UCB). In this study, to reduce time and hardware costs, we chose the upper confidence bound (UCB) as the acquisition function; the UCB algorithm directly compares the maximum of the confidence interval and is simple and efficient. Its calculation formula is as follows:
$$UCB(i) = u(i) + \beta^{1/2}\,\sigma(i)$$
In the equation, $u(i)$ is the mean function, $\sigma(i)$ is the standard deviation given by the covariance function, and $\beta$ is a hyperparameter controlling the balance between exploration and exploitation. The UCB acquisition function selects the next sampling point by trading off the mean (exploitation) against the uncertainty (exploration): a higher $\beta$ prioritizes exploration, while a lower value prioritizes exploitation. We select the F1-score as the performance metric of the Gaussian process regression, adopt the RBF kernel [31] as the kernel function of the Gaussian process, and set the kernel hyperparameters, specifying a noise variance term to account for observation errors in the target function. Several groups of parameters are randomly sampled from the search space, an entity recognition model is trained with each to obtain the corresponding target function values, and these parameter–value pairs form the training set from which the kernel hyperparameters and noise variance are learned. For a given model parameter, the trained Gaussian process then predicts the mean and variance of the target function, and the most promising parameter combination is selected for evaluation, iterating until the predefined stopping condition is reached.
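To make the loop concrete, below is a hedged sketch of the UCB-driven Bayesian optimization procedure, assuming scikit-learn's Gaussian process regression with an RBF kernel. The objective train_and_eval_f1 is a hypothetical stand-in for a full training-and-evaluation run (a synthetic function is used here so the sketch executes), and the search ranges for i and k are illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def train_and_eval_f1(rounds: int, k: float) -> float:
    # Hypothetical stand-in: train the NER model with these hyperparameters
    # and return its validation F1-score. Synthetic objective for demonstration.
    return 0.85 - 0.02 * (rounds - 3) ** 2 - 0.05 * (k - 1.3) ** 2

# Illustrative search space: rounds i in {1,...,4}, pre-coefficient k in [0.5, 3.0].
candidates = np.array([[i, k] for i in range(1, 5) for k in np.linspace(0.5, 3.0, 26)])

rng = np.random.default_rng(0)
X = candidates[rng.choice(len(candidates), size=5, replace=False)]  # initial random design
y = np.array([train_and_eval_f1(int(i), k) for i, k in X])

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)  # alpha = noise variance
for _ in range(20):                                       # fixed evaluation budget
    gpr.fit(X, y)                                         # update the posterior
    mu, sigma = gpr.predict(candidates, return_std=True)
    nxt = candidates[np.argmax(mu + np.sqrt(2.0) * sigma)]  # UCB with beta = 2
    X = np.vstack([X, nxt])
    y = np.append(y, train_and_eval_f1(int(nxt[0]), nxt[1]))

best = X[np.argmax(y)]
print(f"best i={int(best[0])}, k={best[1]:.3f}, F1={y.max():.4f}")
```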

2.3. CRF Layer

After the character sequence has undergone encoding and feature extraction, the Mogrifier BiGRU outputs, for each character, a probability for each category label. If the label with the highest probability is always selected independently, the predicted labels for a sequence of characters may be inconsistent; for example, an "I-inside" label may directly follow an "O-outside" label, which is clearly not a valid entity labeling. The conditional random field (CRF) layer addresses this by maximizing the conditional likelihood using transition probabilities: by learning the constraints between labels, the CRF layer restricts the model's output to ensure that the predicted label sequence is reasonable.
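As a brief sketch, the CRF layer can be realized with the third-party pytorch-crf package (an assumption on our part; the paper does not name its implementation):

```python
import torch
from torchcrf import CRF

num_tags = 5                                 # e.g., B-PER, I-PER, B-ORG, I-ORG, O
crf = CRF(num_tags, batch_first=True)

emissions = torch.randn(2, 20, num_tags)     # (batch, seq_len, num_tags) scores from the BiGRU + FC layer
tags = torch.randint(num_tags, (2, 20))      # gold label indices

# Training: maximize the conditional log-likelihood (minimize its negative).
# The learned transition matrix penalizes invalid patterns such as O -> I-X.
loss = -crf(emissions, tags)

# Inference: Viterbi decoding returns the globally optimal label sequence.
best_paths = crf.decode(emissions)           # list of per-sentence tag lists
```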

3. Results and Discussion

3.1. Data Set

To further verify the performance of the entity recognition model based on BERT-Mogrifier BiGRU-CRF proposed in this paper, experimental verification was conducted on the public datasets CLUENER2020 [32] and CCKS2019 [33].
In the experiments, the data were annotated using the BIO (B-begin, I-inside, and O-outside) three-position tagging scheme. This annotation specification helps to clarify the boundaries of named entities, thereby improving the accuracy of entity recognition. The specific annotation rules are as follows: B-X represents the beginning of the named entity X, I-X represents the middle or end of the named entity, and O indicates that it does not belong to any type, that is, a non-entity part. By using the BIO annotation scheme, we can clearly annotate the named entities in the text, providing accurate labeled data for subsequent training of the entity recognition model.
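For illustration, here is a hedged character-level example of the scheme; the sentence and the PER/ORG label types are invented for demonstration:

```python
# Each Chinese character gets one tag; "马云" is a person and "阿里巴巴" an organization.
sentence = list("马云创办了阿里巴巴")
tags = ["B-PER", "I-PER", "O", "O", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG"]
```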

3.2. Evaluation Indicators

In this experiment, the evaluation metrics adopted the three most commonly used and important evaluation indicators for named entity recognition tasks: precision (P), recall (R), and F1-score (F1). These three indicators together form a comprehensive evaluation system, which helps us accurately measure the performance of the entity recognition model. The specific calculation formulas are as follows:

3.2.1. Precision (P)

Precision refers to the proportion of entities correctly identified by the model out of the total number of entities predicted by the model. The calculation formula is as follows:
$$P = \frac{TP}{TP + FP}$$
where TP (true positives) represents the number of correctly recognized named entities, and FP (false positives) represents the number of incorrectly recognized entities. Precision reflects the ability of the model to avoid false positives, that is, to correctly identify only entities that are actually entities. A high precision indicates that the model has a low rate of false positives and is more reliable in identifying entities.

3.2.2. Recall (R)

Recall refers to the proportion of entities correctly identified by the model out of the total number of actual entities. The calculation formula is as follows:
$$R = \frac{TP}{TP + FN}$$
where TP (true positives) represents the number of correctly recognized named entities, and FN (false negatives) represents the number of named entities that were not correctly recognized. Recall reflects the ability of the model to find all actual entities, that is, to minimize false negatives. A high recall indicates that the model can identify most of the actual entities and is less likely to miss any entities.

3.2.3. F1-Score (F1)

The F1-score is the harmonic mean of precision and recall, used to comprehensively consider the performance of precision and recall. The calculation formula is as follows:
$$F1 = \frac{2 \times P \times R}{P + R}$$
where P represents precision and R represents recall. The F1-score balances the trade-off between precision and recall, providing a single metric to evaluate the overall performance of the model. A high F1-score indicates that the model has both high precision and high recall, achieving a good balance between the two indicators.
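As a minimal sketch, the entity-level counting of TP/FP/FN with exact span matching can be implemented as follows (the (type, start, end) span-set representation is our assumption):

```python
def prf(pred_entities: set, gold_entities: set) -> tuple:
    """Entity-level precision, recall, and F1 with exact span matching."""
    tp = len(pred_entities & gold_entities)   # correctly recognized entities
    fp = len(pred_entities - gold_entities)   # spurious predictions
    fn = len(gold_entities - pred_entities)   # missed entities
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Example: entities as (type, start, end) spans; one of two predictions is correct.
gold = {("PER", 0, 2), ("ORG", 5, 9)}
pred = {("PER", 0, 2), ("ORG", 6, 9)}
print(prf(pred, gold))  # (0.5, 0.5, 0.5)
```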

3.3. Setup

The experimental environment is a Windows 11 operating system. The model is implemented in the PyTorch framework on a GeForce RTX 4090 GPU (NVIDIA, Santa Clara, CA, USA). We adopted the softmax cross-entropy loss function as the loss function of the network. The softmax cross-entropy loss is commonly used for multi-class classification: it measures prediction error as the difference between the probability distribution predicted by the model and the true labels. In the entity recognition task of this paper, the label of each character is converted into a probability distribution, and the softmax cross-entropy loss is computed against the model's predicted distribution. To optimize the model parameters, we used the Adam adaptive learning-rate optimizer, which adjusts the learning rate automatically to make training more stable and efficient; the model parameters are updated continuously to minimize the loss and improve prediction accuracy. To prevent overfitting, a dropout layer is added to the network; dropout is a regularization technique that randomly discards the outputs of some neurons during training. The model parameter settings are shown in Table 1.
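A minimal sketch of this training configuration in PyTorch, using the values from Table 1 (the placeholder module stands in for the full BERT-Mogrifier-BiGRU-CRF network):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(768, 11)   # placeholder for the full BERT-Mogrifier-BiGRU-CRF network

criterion = nn.CrossEntropyLoss()                    # softmax cross-entropy over tag classes
optimizer = optim.Adam(model.parameters(), lr=2e-5)  # Adam with the learning rate from Table 1
dropout = nn.Dropout(p=0.5)                          # dropout rate from Table 1
```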

3.4. Experimental Results and Analysis

To analyze the changes brought about by the introduction of the gated recurrent unit (GRU) and the gated interaction mechanism separately, this section follows the principle of controlling a single variable. First, experiments are conducted to compare the gated interaction mechanism, followed by a comprehensive comparison analysis with the GRU on this basis. The experiments in this section use the BERT-BiLSTM-CRF model as the baseline and set up the following model structures for comparison to analyze the performance of the GRU and gated interaction mechanism:
(1) BERT-BiLSTM-CRF: the baseline model structure, widely used across tasks, with stable performance [34].
(2) BERT-BiGRU-CRF: replaces the long short-term memory (LSTM) units of the baseline model with GRU units for comparison [35].
(3) BERT-Mogrifier-BiGRU-CRF: a model structure that combines the GRU with the gated interaction mechanism, an improvement over model (2).
This section first verifies the applicability of the gated interaction mechanism and then compares the network structure of model (3) with the other models on the CLUENER2020 dataset to analyze the overall performance of the improved model. To ensure the reliability of the experimental results, the above model structures keep the parameters of their shared components consistent.

Data Comparison Experiment and Result Analysis

Using the Bayesian optimization algorithm described in Section 2, we selected the optimal hyperparameters for the gated interaction unit: the network performs best when the interaction round i is 3 and the pre-coefficient k is 1.327. To verify the impact of the number of interaction rounds between the GRU input and hidden layers on Chinese named entity recognition, we conducted experiments with different values of i. With i and k as the basic parameters and other parameters unchanged, the results show that varying the hyperparameters of the gated interaction unit has a substantial impact on the model's recognition results; adaptive parameter tuning better captures the relationship between parameters and recognition results, allowing the most suitable parameters to be selected. The experimental results for the practicality of the gated interaction mechanism are shown in Figure 3.
In the experimental results, we have statistically analyzed the impact of varying the pre-coefficient k on the F1 value at different interaction rounds, ranging from 1 to 4. The horizontal axis in the figure represents the pre-coefficient k of the interaction unit, while the vertical axis represents the harmonic mean of the model’s precision and recall. Since the change in the pre-coefficient k is nonlinear, the results in the figure are presented using a scatter plot. From the figure, it can be observed that at different interaction rounds, changing the pre-coefficient k can have a certain impact on the model’s accuracy. Specifically, when the pre-coefficient is too low, it can cause a severe decrease in the model’s F1 value. However, when the value of the pre-coefficient is appropriately chosen, it can improve the model’s accuracy.
According to the experimental results, when the number of interaction rounds is 1 or 2, the model's F1 value fluctuates noticeably as the pre-coefficient k increases. Specifically, with one interaction round, the best observed F1 value is 0.81, at a pre-coefficient of 2.313; with two rounds, the best is 0.838, at a pre-coefficient of 1.801. As the number of rounds increases to 3 and 4, the influence of the pre-coefficient k on the F1 value weakens: fluctuations still occur, but the F1 value stays within a narrower interval. Furthermore, with still more rounds, excessive interaction inflates the feature values, which reduces the model's accuracy. With three interaction rounds, the best observed F1 value is 0.869, at a pre-coefficient of 1.327; with four rounds, the best is 0.84, at a pre-coefficient of 1.665. It can be seen that adding the gated interaction mechanism has a measurable impact on the model's accuracy.
Additionally, to test the performance of the BERT-MogrifierBiGRU-CRF network combined with the gated interaction unit, we conducted comparative experiments on the CLUENER2020 dataset using BERT-BiGRU-CRF, BERT-MogrifierBiLSTM-CRF, and the proposed BERT-MogrifierBiGRU-CRF network model in terms of loss values. The results of the comparative experiments on different models for the CLUENER2020 dataset are shown in Figure 4.
From the experimental results, it can be observed that introducing the gated interaction mechanism significantly improves performance on the dataset compared to the standalone BiGRU network: both models with the mechanism show faster convergence and lower loss values on Chinese text data than those without it. Comparing BERT-Mogrifier-BiLSTM-CRF with the proposed BERT-Mogrifier-BiGRU-CRF, changing from the long short-term memory network (LSTM) to the gated recurrent unit (GRU) further accelerates convergence. This demonstrates that the gated interaction unit proposed in this paper reduces the neural network's loss faster on Chinese text data and is more sensitive to feature changes in such data. To evaluate the performance and generalization of the proposed model for Chinese named entity recognition (NER), its precision, recall, and F1-score were compared with several other models on two public datasets, CCKS2019 and CLUENER2020. The results on the public dataset CCKS2019 are shown in Table 2.
In this experiment, we compared the training times of BERT-BiGRU-CRF, BERT-BiLSTM-CRF, and the proposed model: 175.25 min for BERT-BiGRU-CRF, 186.32 min for BERT-BiLSTM-CRF, and 183.38 min for the proposed model. This indicates that using BiGRU instead of BiLSTM effectively shortens training time. Although the additional interaction unit makes the proposed model slightly slower to train than the plain BiGRU model, it is still markedly faster than the BiLSTM model while also improving accuracy. In summary, the proposed model preserves entity recognition effectiveness while reducing training cost, achieving a balance between time and performance.

Combining the gated recurrent unit (GRU) with the gated interaction mechanism on the CLUENER2020 dataset, the experimental results compare with the other two models as follows. On CLUENER2020, the F1 value of the BERT-Mogrifier-BiGRU-CRF model increased by 7.28% over the BERT-BiLSTM-CRF structure and by 1.4% over the BERT-BiGRU-CRF structure. Compared with BERT-BiLSTM-CRF, recognition improved for all entity categories in the dataset; for the categories with the most noticeable gains, recognition accuracy rose by 13.5%, 8.9%, and 8.4%. To further explore the superiority of the proposed model, we collected the training data with the largest improvement margins to look for patterns. The CLUENER2020 dataset has an imbalanced number of entities across categories, and the BiLSTM-CRF model focuses excessively on categories with many entities while neglecting those with few. The gated interaction mechanism of the proposed model can adjust the weights of the input representations, enabling the model to attend to under-represented categories as well, improving recognition of rare categories without sacrificing performance on frequent ones. Additionally, through pre-training on large-scale corpora, the BERT model has learned rich semantic representations and possesses strong text representation capability. The proposed model therefore achieved better results on the dataset than the BERT-BiLSTM-CRF structure. Comparing the proposed model with BERT-BiGRU-CRF on this dataset, the introduction of the gated interaction mechanism also slightly improved entity recognition accuracy, demonstrating its effectiveness and, to a certain extent, improving the recognition of Chinese fine-grained named entities. The model also coped well with polysemy, where entities with the same name represent different types in different training corpora.
In this study, we conducted a detailed statistical analysis of entities in sentences where some models made recognition errors on the CLUENER2020 dataset. The research results show that when faced with nested entity processing tasks, if the entity type appears in other training data, the model often only recognizes the entity type at the rear of the combined entity and cannot comprehensively and accurately determine the entire entity type. Additionally, for longer-nested entities existing in the data, the model’s recognition effect is also unsatisfactory. After in-depth analysis, we believe that the reason for this phenomenon is that when annotating such entities in the training data, the annotation results are often not precise enough, leading to a decrease in the recognition accuracy of some entities.

4. Conclusions

To address the problem of Chinese named entity recognition, we constructed a network model based on BERT-Mogrifier-BiGRU-CRF. By improving the Mogrifier gating mechanism, we removed the independence between the input and hidden-state features during feature extraction, and by leveraging information interaction we further enhanced the performance of the GRU network. Through the introduction of a richer set of hyperparameters and their adaptive selection with Bayesian optimization, we overcame the limitations of traditional empirical and trial-and-error methods, searching for the optimal hyperparameter combination on a global scale. With the BiGRU bidirectional gating network, the model captures the global semantic features of the input text sequence; this bidirectional structure allows the model to use both forward and backward information for a more comprehensive understanding of the text. Finally, we introduced a CRF layer to handle the interdependencies between labels, which considers the global optimality of the label sequence, ensuring the coherence and consistency of entity recognition results and effectively addressing the problems caused by the label-independence assumption in traditional methods, thereby improving the accuracy of the Chinese named entity recognition model.

The model's performance on the CLUENER2020 dataset showed that it can handle the polysemy of popular and novel internet vocabulary, and entity recognition accuracy also improved after information interaction. Compared with traditional neural network algorithms, the F1-score of the proposed BERT-Mogrifier-BiGRU-CRF network model reached 86.9%, higher than the other models.

In future research, we will follow the development of pre-trained models and try more advanced pre-trained models as the base, which should help to further optimize the model structure. To improve generalization and robustness, we will collect and organize more high-quality Chinese named entity recognition datasets, enhancing the model's performance across application scenarios. To better understand the model's working principles and performance, we will also follow research on model interpretability, seeking to reveal the key factors and internal mechanisms of the model in entity recognition. In summary, we will continue to improve and extend the model proposed in this paper, aiming at better results in the field of Chinese named entity recognition.

Author Contributions

B.L. and W.C. conceived the project; B.L. guided and coordinated the research; W.C. and J.T. performed the experimental work and the analysis and interpretation of the data; W.C. wrote the original draft; B.L. reviewed and edited the manuscript; L.H. supervised the resource investigation; D.T. acquired funding. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Major Science and Technology Projects of Sichuan Province under Grant 2022ZDZX0001; the Science and Technology Support Project of Sichuan Province under Grants 2023YFS0366 and 2023YFG0020; and the Natural Science Foundation of Sichuan Province under Grant 2024NSFSC0792.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wei, R.; Chen, R.; Li, H. Analysis of technology trends based on deep learning and text metrics. Comput. Sci. 2022, 49, 37–42. [Google Scholar]
  2. Zhang, Y.; Wang, Y.; Li, B. Named entity recognition of Chinese electronic medical records based on RoBERTa-wwm dynamic fusion model. Data Anal. Knowl. Discov. 2022, 6, 242–250. [Google Scholar]
  3. Zhang, N.; Wang, P.; Zhang, G. A deep learning recognition method for named entities oriented to process operation description text. Comput. Appl. Softw. 2019, 36, 188–195+261. [Google Scholar]
  4. Yu, H.; Zhang, H.; Liu, Q. Chinese named entity identification using cascaded hidden Markov model. J. Commun. 2006, 27, 87–94. (In Chinese) [Google Scholar]
  5. Mi, L.; Yuan, J. Application of entity recognition method of clinical medical orders information based on crf model. Comput. Appl. Softw. 2020, 37, 209–212. (In Chinese) [Google Scholar]
  6. Wei, X.; Hu, D.H.; Yi, M.H.; Chang, X.; Yang, X.; Zhu, W. Extraction of Entity Interactions Based on Multiple Feature Fusion Linear Kernel SVM Approach. Chin. J. Biomed. Eng. 2018, 37, 451–460. (In Chinese) [Google Scholar]
  7. Qin, Y.; Liu, S. Named entity identification in e-commerce domain based on RoForm. J. Dalian Minzu Univ. 2022, 24, 448–454. [Google Scholar]
  8. Sutton, C.; Rohanimanesh, K.; McCallum, A. Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. J. Mach. Learn. Res. 2007, 8, 693–723. [Google Scholar]
  9. Hammerton, J. Named entity recognition with long short-term memory. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003—Volume 4 (CONLL ‘03), Edmonton, AB, Canada, 31 May 2003; pp. 172–175. [Google Scholar]
  10. Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P. Natural Language Processing (almost) from Scratch. J. Mach. Learn. Res. 2011, 12, 2493–2537. [Google Scholar]
  11. Strubell, E.; Verga, P.; Belanger, D.; McCallum, A. Fast and accurate entity recognition with iterated dilated convolutions. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 2670–2680. [Google Scholar]
  12. Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv 2015, arXiv:1508.01991. [Google Scholar]
  13. Lample, G.; Ballesteros, M.; Subramanian, S.; Kawakami, K.; Dyer, C. Neural architectures for named entity recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 260–270. [Google Scholar]
  14. Yan, H.; Deng, B.; Li, X.; Qiu, X. Tener: Adapting transformer encoder for named entity recognition. arXiv 2019, arXiv:1911.04474. [Google Scholar]
  15. Tang, B.; Wang, X.; Yan, J.; Chen, Q. Entity recognition in Chinese clinical text using attention-based CNN-LSTM-CRF. BMC Med. Inform. Decis. Mak. 2019, 19, 74. [Google Scholar] [CrossRef] [PubMed]
  16. Bai, B.; Hou, X.; Shi, S. Named Entity Recognition Method Based on CRF and Bi-LSTM. J. Beijing Inf. Sci. Technol. Univ. 2018, 33, 27–33. [Google Scholar]
  17. Ma, M.; Yang, Q.; Eskar, A.; Turdi, T. Chinese Named Entity Classification Based on Word Vector and Conditional Random Fields. Comput. Eng. Des. 2020, 41, 2515–2522. [Google Scholar]
  18. Lin, J.; Liu, E. Research on Named Entity Recognition Method of Metro On-Board Equipment Based on Multiheaded Self-Attention Mechanism and CNN-BiLSTM-CRF. Comput. Intell. Neurosci. 2022, 2022, 6374988. [Google Scholar] [CrossRef]
  19. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
  20. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  21. Wang, Z.; Huang, M.; Li, C.; Feng, J.; Liu, S.; Yang, G. Intelligent Recognition of Key Earthquake Emergency Chinese Information Based on the Optimized BERT-BiLSTM-CRF Algorithm. Appl. Sci. 2023, 13, 3024. [Google Scholar] [CrossRef]
  22. Chen, L.; Liu, D.; Yang, J.; Jiang, M.; Liu, S.; Wang, Y. Construction and application of COVID-19 infectors activity information knowledge graph. Comput. Biol. Med. 2022, 148, 105908. [Google Scholar] [CrossRef]
  23. Qian, Q.; Cheng, Y.; Pang, B. Audit Text Named Entity Recognition Based on MacBERT and Adversarial Training. Comput. Sci. 2023, 50, 81–86. (In Chinese) [Google Scholar]
  24. Chen, F.L.; Zhang, D.Z.; Han, M.L.; Chen, X.Y.; Shi, J.; Xu, S.; Xu, B. VLP: A Survey on Vision-language Pre-training. Mach. Intell. Res. 2023, 20, 38–56. [Google Scholar] [CrossRef]
  25. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  26. Melis, G.; Kociský, T.; Blunsom, P. Mogrifier LSTM. arXiv 2019, arXiv:1909.01792. [Google Scholar]
  27. Cui, Y.; Che, W.; Liu, T.; Qin, B.; Yang, Z. Pre-Training with Whole Word Masking for Chinese BERT. IEEE-ACM Trans. Audio Speech Lang. Process. 2021, 29, 3504–3514. [Google Scholar] [CrossRef]
  28. Guan, J.; Huang, Q. Named Entity Recognition Method Research Based on the Deep Learning. Softw. Guide 2023, 22, 90–94. (In Chinese) [Google Scholar]
  29. Pelikan, M.; Goldberg, E.D. A hierarchy machine: Learning to optimize from nature and humans. Complexity 2003, 8, 36–45. [Google Scholar] [CrossRef]
  30. Isik, E. Thermoluminescence characteristics of calcite with a Gaussian process regression model of machine learning. Luminescence 2022, 37, 1321–1327. [Google Scholar] [CrossRef] [PubMed]
  31. Gao, D. Adaptive Structure and Parameter Optimizations of Cascade RBF-LBF Neural Networks. Chin. J. Comput. 2003, 5, 575–586. (In Chinese) [Google Scholar]
  32. Xu, L.; Dong, Q.; Liao, Y.; Yu, C.; Tian, Y.; Liu, W.; Li, L.; Liu, C.; Zhang, X. CLUENER2020: Fine-grained named entity recognition dataset and benchmark for Chinese. arXiv 2020, arXiv:2001.04351. Available online: https://arxiv.org/ftp/arxiv/papers/2001/2001.04351.pdf (accessed on 15 July 2024).
  33. CCKS 2019. (n.d.). CSDN. Available online: https://download.csdn.net/download/baidu_38876334/88318917?utm_source=bbsseo (accessed on 15 July 2024).
  34. Shen, Y.; Yi, K.; Zhou, W.; Fei, M.; Lv, Z. The BERT-BiLSTM-CRF Model Applied to Chinese Entity Recognition for the Science and Technology Service Field. In Proceedings of the 2022 41st Chinese Control Conference (CCC), Hefei, China, 25–27 July 2022; pp. 4205–4210. [Google Scholar]
  35. Hu, W.; Zhang, Y. Medical entity recognition method based on BERT-BiGRU-CRF. Comput. Era 2023, 8, 24–27. [Google Scholar]
Figure 1. Structure of the model.
Figure 2. Mogrifier GRU structure.
Figure 3. (a) The impact of the leading coefficient k on the model’s accuracy with one round of interaction; (b) the impact of the leading coefficient k on the model’s accuracy with two rounds of interaction; (c) the impact of the leading coefficient k on the model’s accuracy with three rounds of interaction; (d) the impact of the leading coefficient k on the model’s accuracy with four rounds of interaction.
Figure 4. Loss function trend chart.
Table 1. Parameter settings.

Parameter      Setting
Batch_Size     64
lr             2e-5
Epoch          30
Embedding_Dim  768
Hidden_Dim     256
Dropout        0.5
Table 2. Performance of the models on the CCKS2019 and CLUENER2020 datasets.

Dataset      Model                      F1     P      R
CCKS2019     BERT-MogrifierBiGRU-CRF    84.11  84.05  84.17
CCKS2019     BERT-BiLSTM-CRF            82.67  82.35  83.01
CCKS2019     BERT-BiGRU-CRF             83.13  82.94  83.33
CLUENER2020  BERT-MogrifierBiGRU-CRF    85.42  87.17  84.12
CLUENER2020  BERT-BiLSTM-CRF            79.69  84.47  76.54
CLUENER2020  BERT-BiGRU-CRF             84.20  85.91  82.98