1. Introduction
Unlike English named entity recognition [
1,
2,
3], Chinese NER is more challenging because Chinese text lacks explicit word delimiters. Incorrect word segmentation can adversely affect the accuracy of named entity recognition [
4]. Initially, the focus was mainly on the semantic information of characters within sentences for entity recognition, overlooking vocabulary information, which presented certain difficulties for Chinese NER [
5]. Subsequently, Zhang et al. [
6] implemented a lexical enhancement strategy that fuses matched vocabulary with character-level features, using lexical information to refine entity boundaries. However, this design allowed a character to merge only with words ending in that character, so information was lost: a character could not merge with words that begin with it or contain it in the middle. Li and Yan [
7] introduced the Flat-Lattice approach, which improves on this lexical enhancement method. Flat-Lattice assigns each token (character or matched word) a head index and a tail index according to its position in the sentence, and uses these two indices to reconstruct the lattice as a flat sequence in which a character can interact directly with every word that contains it, as illustrated in
Figure 1.
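The head and tail indexing can be illustrated with a minimal Python sketch. The lexicon is a toy example, and the helper names `build_flat_lattice` and `words_containing` as well as the `max_word_len` cut-off are hypothetical; the matching is a plain substring lookup and is not taken from the Flat-Lattice implementation.

```python
from typing import List, Set, Tuple

# (token, head index, tail index); for a single character head == tail
Span = Tuple[str, int, int]


def build_flat_lattice(sentence: str, lexicon: Set[str],
                       max_word_len: int = 4) -> List[Span]:
    """Return the characters followed by all lexicon words found in the sentence."""
    # Every character is a token whose head and tail coincide.
    spans: List[Span] = [(ch, i, i) for i, ch in enumerate(sentence)]
    # Every substring of length 2..max_word_len that is in the lexicon is a word token.
    for head in range(len(sentence)):
        last_tail = min(head + max_word_len, len(sentence)) - 1
        for tail in range(head + 1, last_tail + 1):
            word = sentence[head:tail + 1]
            if word in lexicon:
                spans.append((word, head, tail))
    return spans


def words_containing(spans: List[Span], char_index: int) -> List[str]:
    """Matched words (head < tail) whose span covers the given character index."""
    return [tok for tok, head, tail in spans
            if head < tail and head <= char_index <= tail]


toy_lexicon = {"重庆", "重庆人", "人和药店", "药店"}  # assumed toy lexicon
spans = build_flat_lattice("重庆人和药店", toy_lexicon)
print(spans[6:])                   # the matched words with their head/tail indices
print(words_containing(spans, 2))  # words that contain the character '人'
```

In this toy example the character ‘人’ (index 2) is covered both by ‘重庆人’ (head 0, tail 2) and by ‘人和药店’ (head 2, tail 5), so it can interact with a word that ends with it and with a word that begins with it.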
Although Flat-Lattice integrates vocabulary features into character-based features and thereby improves Chinese named entity recognition, two issues remain. Firstly, Flat-Lattice compares substrings of the sentence with the lexicon character by character until no further exact match can be found, and it adds every matched sub-word to the vocabulary sequence. Words matched in this way may conflict, i.e., a single character in the sentence may belong to more than one matched word. If conflicting matched words are simply encoded and then fused with the characters, the semantic and sequential temporal information between the matched words is lost. Instead of sharpening entity boundaries, the fusion may then have an inhibitory effect. For example, in
Figure 1, the character ‘人’ (‘person’) can match both ‘重庆人’ (‘Chongqing person’) and ‘人和药店’ (‘Renhe Pharmacy’). The Flat-Lattice method, after encoding, directly merges these matched vocabularies with the character, which may mislead the model to identify ‘人’ as both ‘E-LOC’ and ‘B-LOC’. More critically, conflicts between matched vocabularies are common in Chinese NER [
8]. In order to solve the problem of conflict between matched vocabularies, researchers have applied various models. Among them, Gui et al. [
9] applied a CNN model to alleviate the conflict between matching words by extracting the semantic features of the matching words through the feedback method of rethinking mechanism. However, the convolutional operation used by CNN perceives the input locally, and each convolutional kernel can only perceive a fixed-size window of the input data. In addition, the convolution kernel is invariant, which means that the model uses the same convolution kernel at different locations. Therefore, the CNN model ignores the vocabularies’ sequential temporal information and the matching vocabularies’ global information throughout the sentence. To compensate for this shortcoming, Hu et al. [
10] turned to the GRU model to address the same issue. The advantage of the GRU model is that it can extract the semantic information of matched vocabulary and effectively capture its sequential temporal information. However, the single memory unit design of the GRU model, which integrates features storing long-term dependencies with the current moment’s hidden state, still needs to be improved in extracting the global information of matched vocabularies in the entire sentence. This paper further analyzes matched vocabularies, considering the number of vocabularies and the global information of matched vocabularies in the entire sentence. Our method aims to comprehensively obtain the semantic and sequential temporal information of matched vocabularies while considering local and global features, thereby more effectively handling conflicts between matched vocabularies.
Secondly, this approach to lexical enhancement ignores the limited coverage of the lexicon: some words in a sentence cannot be matched to any lexicon entry. In such cases, lexical enhancement contributes little to entity segmentation. Flat-Lattice fuses the encoded matched vocabulary with the characters to obtain semantic information and computes a seq × seq attention score matrix through self-attention. The scores are then used to form a weighted sum of the value vectors, yielding representations that contain character-vocabulary information and are used to predict entity labels. However, this approach may ignore the spatial information of characters that have been fused with matched vocabulary, so that it cannot be shared with other characters that have no matched vocabulary, which results in an information deficit. Accurate segmentation of the characters covered by matched vocabulary might help to segment the characters that are not covered. For example, in “重庆人和药店”, the lexicon may not contain the word “人和”, so the characters “人” and “和” cannot be fused with matched vocabulary to improve entity segmentation. If the entity “重庆” is segmented correctly, it will facilitate the subsequent segmentation of “人和药店” (“Renhe Pharmacy”). Xue et al. [
11] used the Bi-GRU model to enhance the semantic dependency between neighboring characters. Before that, the model extracted the semantic features of characters to obtain a seq × seq matrix. Therefore, this paper considers that extracting spatial features between characters may help named entity recognition. In addition, Yan et al. [
12] demonstrated the use of Convolutional Neural Networks (CNNs) to model the local spatial relationships between words in an English dataset, which improves the effectiveness of named entity recognition. Therefore, this paper proposes a “local attention” approach to capture the local spatial relationships between characters. It is anticipated that this methodology will enhance the efficacy of named entity recognition, particularly in scenarios where character fusion with matching vocabulary is not feasible.
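One possible reading of “local attention” is a convolution applied to the seq × seq attention score matrix, so that each score is mixed with the scores of neighboring character pairs. The following pure-Python sketch illustrates only that idea. The function name `conv2d_same`, the 3 × 3 kernel size, the zero padding, and the fixed kernel weights are all assumptions; in a trained model the kernel would be learned, and the exact placement of the layer is not specified here.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(scores: Matrix, kernel: Matrix) -> Matrix:
    """Slide a square, odd-sized kernel over `scores` with zero padding.

    As in most deep-learning libraries, this is a cross-correlation
    (the kernel is not flipped). The output has the same shape as the input.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    # Positions outside the matrix count as zero (padding).
                    if 0 <= r < n_rows and 0 <= c < n_cols:
                        acc += kernel[di][dj] * scores[r][c]
            out[i][j] = acc
    return out


# Toy 4 x 4 score matrix (assumed values) and a uniform averaging kernel.
toy_scores = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.5, 0.3],
    [0.1, 0.1, 0.2, 0.6],
]
uniform_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
local_scores = conv2d_same(toy_scores, uniform_kernel)
```

Because each output entry depends on its 3 × 3 neighborhood, a character without matched vocabulary receives information from the attention pattern of adjacent characters that do have matched vocabulary.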
Considering these factors, this paper proposes a method based on lexical information and spatial features. It consists of two main parts. Firstly, the head and tail coordinates of the matched words are compared: if the tail coordinate of a preceding word is greater than or equal to the head coordinate of the word that follows it, the two words are considered to be in conflict. The encoded matched words are then used as an input sequence and processed by a Bi-LSTM model. Bi-LSTM is chosen for two reasons: it handles the varying number of matched words per sentence, and it captures the global bidirectional semantic and sequential temporal information of the matched words. This information is then fused with the character features to predict entity labels. Using the loss between the predicted and actual labels as feedback, the representation of the conflicting matched words is optimized iteratively, gradually reducing their weight and thereby improving Chinese named entity recognition. Secondly, the preceding steps have already extracted the semantic information of characters and matched words and produced a seq × seq attention score matrix through self-attention. To capture the spatial relationships among characters, local attention is introduced: a convolutional layer captures the local spatial relationship between characters without fused matched vocabulary and characters with fused matched vocabulary. This further supports entity segmentation and thus improves named entity recognition. The principal contributions of this paper can be summarized as follows:
This article introduces Bi-LSTM to obtain the global bidirectional semantic information and sequential temporal information of matched vocabulary, and reduces the weight of conflicting matched vocabulary to alleviate the impact that fusing conflicting words with characters has on entity segmentation.
The local attention method extracts the local spatial relationship between characters without fused matched vocabulary and characters with fused matched vocabulary, further improving entity segmentation.
Experiments on four publicly available Chinese named entity recognition datasets validate the SISF approach introduced in this paper against both the baseline model and an enhanced variant derived from the baseline model.
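The conflict criterion described above (the tail coordinate of a preceding word is greater than or equal to the head coordinate of a following word) can be sketched as follows. The function name `find_conflicting_pairs` is hypothetical, and checking all ordered pairs rather than only adjacent ones is an assumption. Applied literally, the criterion also flags nested words such as ‘重庆’ and ‘重庆人’; whether such pairs are treated as conflicts in the actual method is not stated here.

```python
from typing import List, Tuple

# (word, head index, tail index) of a matched lexicon word
WordSpan = Tuple[str, int, int]


def find_conflicting_pairs(words: List[WordSpan]) -> List[Tuple[str, str]]:
    """Return pairs (preceding, following) with tail(preceding) >= head(following)."""
    ordered = sorted(words, key=lambda w: (w[1], w[2]))
    pairs = []
    for i, (w1, _head1, tail1) in enumerate(ordered):
        for w2, head2, _tail2 in ordered[i + 1:]:
            if tail1 >= head2:
                pairs.append((w1, w2))
    return pairs


# Matched words for the sentence 重庆人和药店 under an assumed toy lexicon.
matched = [("重庆", 0, 1), ("重庆人", 0, 2), ("人和药店", 2, 5), ("药店", 4, 5)]
for preceding, following in find_conflicting_pairs(matched):
    print(preceding, "conflicts with", following)
```

In this example ‘重庆人’ (tail 2) and ‘人和药店’ (head 2) share the character ‘人’, which is the conflict discussed for Figure 1.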
The framework of this article is as follows: Firstly, in
Section 1, we explore the challenges and solutions faced by vocabulary enhancement in the field of Chinese named entity recognition;
Section 2 provides an overview of relevant research on Chinese named entity recognition;
Section 3 provides a detailed analysis of existing methods; in
Section 4, we introduce the dataset on which our proposed method relies for training and evaluation, as well as the baseline model used for comparison;
Section 5 conducts extensive experiments on four publicly available Chinese datasets and analyzes the experimental results in depth to validate the effectiveness of our proposed SISF method; and finally, at the end of the article, we summarize the results of this study and look forward to possible directions for future research.
2. Related Work
With the development of deep learning, researchers have begun applying neural network models to Named Entity Recognition (NER) [
12]. Chinese named entity recognition differs significantly from English named entity recognition. Most importantly, English words are separated by spaces, whereas Chinese text has no explicit boundaries between words. Existing Chinese NER methods are either vocabulary-based or character-based. For instance, Li et al. [
13] proposed a character-based Chinese entity tagger, proving the superiority of character-based methods over vocabulary-based methods. However, character-based methods do not utilize lexical information. Therefore, Zhang et al. [
12] introduced a lattice structure, which matches the characters against a lexicon to obtain the matched words containing them and fuses this word information with the characters. This process helps to delimit entities, thereby improving the accuracy of named entity boundaries. Yu et al. [
14] also achieved good results in using vocabulary enhancement methods for named entity recognition in classical Chinese.
Additionally, Aguilar et al. [
15] used Bi-LSTM to fuse the phonetic features of words with word features, reducing the impact of noise on NER in English social media datasets. Although the methods above have achieved good results in NER, LSTM requires information from the previous time step’s hidden units, limiting the full utilization of GPU parallelism, and there might be conflicts between matched vocabularies. Gui et al. [
9] introduced a Convolutional Neural Network (CNN)-based Named Entity Recognition (NER) approach that incorporates a vocabulary feedback mechanism to tackle the aforementioned two challenges. On the one hand, CNN can leverage GPU parallelism for increased efficiency; on the other hand, CNN reanalyzes matched vocabularies, refining the network with feedback on high-level features to supplement low-level features, thus alleviating vocabulary conflicts. Moreover, due to shortcomings in the method designed by Zhang et al. [
10], each character could only acquire information about vocabulary ending with it. For example, in the sentence ‘重庆人和药店’ (Chongqing Renhe Pharmacy), the character ‘药’ (medicine) could only obtain information from the vocabulary ‘药店’ (pharmacy) and not ‘人和药店’ (Renhe Pharmacy), leading to a loss of lexical information.
To resolve this issue, Li et al. [
7] proposed the Flat-Lattice structure, constructing head and tail indices for each character and word based on their positions in the sentence, thus enabling the direct modeling of interactions between characters and matched vocabulary and introducing lexical information. However, this method overlooked possible conflicts between matched vocabularies. Zhang et al. [
16] used an attention mechanism to calculate the weights of conflicting vocabularies and merge them with the corresponding character embeddings to mitigate vocabulary conflicts. In contrast to the dynamic weights of the attention mechanism, Ma et al. [
5] used statistical weights dependent on word frequency to address the issue of vocabulary conflicts. Zhang et al. [
17] transformed lattice structures into a unified graph and used adjacency matrices to explicitly capture various semantic and boundary relationships between different semantic units, reducing excessive dependence on word information. As research progressed, Liu et al. [
18] combined multiple feature fusion such as words and word roots to enhance the semantic information of sentences. In addition, Gu et al. [
19] utilized rule information by observing the internal rules of entities while avoiding excessive attention to cross internal regularity. Cauteruccio et al. [
20] conducted computational and qualitative analysis on the audience’s experience during the competition, using social network-based modeling techniques and thematic analysis, respectively, to focus on emotional changes in the audience. Zhang et al. [
21] considered pronunciation issues in Chinese, such as polyphonic characters or characters with the same pronunciation but different characters, introducing speech features through cross functions to enhance Chinese named entity recognition.
Additionally, some researchers focus on extracting deeper features based on existing features. For example, Zhu et al. [
22] introduced a Convolutional Attention Network designed for Chinese named entity recognition (NER). This approach leverages Convolutional Neural Networks (CNNs) to capture proximate character relationships and employs GRU to capture sentence-level contextual information. This further indicates that there are certain local spatial relationships between characters and vocabularies, and these relationships are beneficial for NER. Moreover, Jin et al. [
23] proposed the attention-mechanism-based Gated Convolutional Recurrent Neural Network (GCRA), using LSTM to utilize local contextual features while integrating global dependencies of different spaces and adjacent characters through a gating mechanism. With the widespread application of Transformers, researchers continue to optimize them. For instance, Lu et al. [
24] proposed a dynamic hybrid visual Transformer, using convolution to extract fine-grained spatial features and fusing them with features extracted by Transformers, mitigating the underfitting issue in small datasets when training with Transformers. Dai et al. [
25] introduced a new neural network structure, Transformer-XL, using a recursive mechanism to mitigate the directional information of texts and the limitation of fixed-length encoding in models. Yan et al. [
26] proposed an attention mechanism with directional and relative position information to address the self-attention dot product direction in Transformers, further enhancing the capability of NER. Additionally, Meng et al. [
27] proposed using Chinese character glyph features, treating characters as images and using CNN to obtain semantic representations, enhancing the model’s generalization ability. Furthermore, Yu et al. [
28] used CNN modeling to capture the local spatial relationships of words, reducing the nesting issue in English datasets and improving the capability of NER.
Model Comparison: Strubell et al. [
29] demonstrated that CNN, by exploiting GPU parallelism, computes faster than Bi-LSTM. However, Yan et al. [
26] pointed out that CNN is inferior to Bi-LSTM in terms of named entity recognition accuracy. Their experiments also found that directly replacing the model with a Transformer on small datasets such as Weibo did not improve named entity recognition and in fact performed worse than Bi-LSTM. This is mainly attributed to the complexity introduced by the Transformer’s self-attention mechanism, which allows every position in the sequence to interact with every other position. Compared with the sequential processing of Bi-LSTM, this increases the complexity of the model and can easily lead to underfitting on small-scale training sets. Given that the number of matched words in entity recognition is usually small, this study uses the Bi-LSTM model to capture the contextual information of vocabulary. Because of its lower model complexity, Bi-LSTM is easier to train on limited data and performs better than the Transformer in this setting.
4. Experiments
This section details the datasets used for training and evaluating the proposed method and the baseline models used for comparison. It also reports, for each dataset, the number of entities, the number of conflicting matched words, and the hyperparameters used.
4.1. Data Sets and Hyperparameters
This research employs four Chinese named entity recognition datasets to substantiate the enhanced performance of the proposed SISF method in the domain of NER. These datasets encompass OntoNotes 4.0 and MSRA datasets originating from the news domain, as well as Resume and Weibo datasets sourced from online repositories within China. Specifically, the Resume dataset was constructed from resumes on Sina Finance, while the Weibo dataset was built from information on China’s social media platform, Sina Weibo.
The Weibo dataset comprises four distinct entity types: Person Names (PER), Organization Names (ORG), Location Names (LOC), and Geographic/Social/Political Entities (GPE). On the other hand, the Resume dataset encompasses eight entity categories: Countries (CONT), Education Levels (EDU), Location Names (LOC), Person Names (PER), Organization Names (ORG), Professions (PRO), Racial Backgrounds (RACE), and Job Titles (TITLE). In contrast, the MSRA dataset comprises three entity types: Organization Names (ORG), Person Names (PER), and Location Names (LOC). Lastly, OntoNotes 4.0 includes four entity types: Person Names (PER), Organization Names (ORG), Location Names (LOC), and Geographic/Social/Political Entities (GPE).
A statistical enumeration of entities within the four datasets is provided in
Table 1. ‘Train’, ‘Dev’, and ‘Test’ indicate the sizes of the training, validation, and test sets, respectively. ‘Entity Types’ refers to the types of entities in the datasets, and ‘Charavg’, ‘Wordavg’, and ‘Entityavg’ are the average numbers of characters, lexicon-matched words, and entities per instance, respectively. ‘Conflict lexicon’ denotes the number of matched vocabulary conflicts in the test set. The hyperparameters for each dataset are listed in
Table 2.
The original Flat-Lattice experiments were conducted on NVIDIA GeForce RTX 2080Ti cards. To ensure the rigor of the experiments and to demonstrate fairly the benefit of the proposed optimization of the vocabulary enhancement method for Chinese named entity recognition, this paper conducts the corresponding experiments, including those for Flat-Lattice, on an NVIDIA GeForce RTX 3090 card.
4.2. Evaluation Indicators
Following the prevailing practice in Chinese named entity recognition, the evaluation criteria are Precision (P), Recall (R), and the F1 score (F1). Precision is the ratio of correctly identified positive samples to all samples predicted as positive, while Recall is the ratio of correctly identified positive samples to all samples that are actually positive. Because of the trade-off between precision and recall, this paper mainly uses F1 as the evaluation index of the SISF model. The formulas are as follows:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times P \times R}{P + R}$$
In this context, the notations TP (True Positive), FP (False Positive), TN (True Negative), and FN (False Negative) are utilized to represent the following: TP refers to correctly identified positive cases, FP indicates instances where negative cases are erroneously classified as positive, TN represents correct identifications of negative cases, and FN signifies instances where positive cases are incorrectly categorized as negative.
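As a worked illustration of these metrics, the following sketch computes P, R, and F1 from sets of gold and predicted entities. It assumes exact-match, entity-level scoring (an entity counts as a true positive only if its start, end, and type all agree), which is common in NER evaluation but is an assumption here; the function name and the toy entities are hypothetical.

```python
from typing import Set, Tuple

# (start index, end index, entity type)
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity],
                        pred: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact matching."""
    tp = len(gold & pred)   # predicted entities that are correct
    fp = len(pred - gold)   # predicted entities that are wrong
    fn = len(gold - pred)   # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: one of two predictions is correct, one of two gold entities is found.
gold_entities = {(0, 1, "LOC"), (2, 5, "ORG")}
pred_entities = {(0, 1, "LOC"), (2, 3, "PER")}
print(precision_recall_f1(gold_entities, pred_entities))  # (0.5, 0.5, 0.5)
```

Note that TN does not enter any of the three quantities, which is why precision, recall, and F1 are preferred over accuracy when most tokens belong to no entity.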
4.3. Baseline Model
The Flat-Lattice model makes full use of the parallel capability of the GPU. It constructs a head and a tail index for each character and word and thereby reconstructs the original lattice structure as a flat sequence, which makes it possible to model directly the interaction between a character and all the words it matches and thus to introduce vocabulary information. The relative position encoding is then obtained by successive transformations of the head and tail information, using dense vectors to model the relationship between tokens. This relative position embedding assigns position information to each node and provides the distance and direction between tokens, which enhances the Transformer’s direction awareness and helps a character to identify whether its neighbors form a continuous entity with it. Finally, a Conditional Random Field (CRF) decodes the label sequence of the named entities.
To assess whether the proposed SISF optimization of the vocabulary enhancement method is effective for Chinese named entity recognition, this paper compares it not only with Flat-Lattice but also with several models that address the above problems with different methods, and carries out a comparative analysis.
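The first step of this relative position encoding can be sketched in Python. In the published FLAT formulation, the relation between two tokens is described by four signed distances between their head and tail indices; the sketch below computes only these distances. The function name `relative_distances` is hypothetical, and the subsequent embedding of the distances into dense vectors is omitted.

```python
from typing import Dict, Tuple

# (head index, tail index) of a token; for a single character head == tail
Position = Tuple[int, int]


def relative_distances(token_i: Position, token_j: Position) -> Dict[str, int]:
    """Four signed head/tail distances between token i and token j."""
    head_i, tail_i = token_i
    head_j, tail_j = token_j
    return {
        "hh": head_i - head_j,  # head of i relative to head of j
        "ht": head_i - tail_j,  # head of i relative to tail of j
        "th": tail_i - head_j,  # tail of i relative to head of j
        "tt": tail_i - tail_j,  # tail of i relative to tail of j
    }


# Character '人' at index 2 versus the word '人和药店' spanning indices 2..5.
print(relative_distances((2, 2), (2, 5)))  # {'hh': 0, 'ht': -3, 'th': 0, 'tt': -3}
# The same character versus the word '重庆' spanning indices 0..1.
print(relative_distances((2, 2), (0, 1)))  # {'hh': 2, 'ht': 1, 'th': 2, 'tt': 1}
```

The signs carry the direction information and the magnitudes carry the distance: in the first example the character lies at the start of the word, and in the second it lies entirely to the right of the word.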
- (1)
FGN [
31]: A novel CNN structure called CGSCNN is proposed to capture the interaction of glyph information between neighboring graphs, providing a method with a sliding window and an attentional mechanism to fuse the BERT representation and glyph representation of each character.
- (2)
Token-Relation [
32]: proposes a masked self-attention mechanism to integrate the local contextual information of matched words and designs gated information controllers to deal with the conflict problem existing in matched words.
- (3)
LR-CNN+BERT [
9]: a convolutional neural network-based approach that integrates lexicon words through a rethinking mechanism to mitigate conflicts between matched words.
- (4)
PLTE+BERT [
11]: introduces a novel porous mechanism to enhance the local dependency between neighboring characters.
- (5)
SLK-NER [
10]: This model mitigates conflicts in the matching vocabulary by developing a method to compute a weighted sum of lexical information and using it as an additional feature.
- (6)
FLAT [
7]: constructs a head and a tail index for each character and word, reconstructs the original lattice structure, and can directly model the interaction between characters and all the words they match. This reduces the information loss that occurs when a character can only be fused with words ending in that character, and it enhances named entity recognition.
5. Results
In this section, an extensive series of experiments is carried out using four openly accessible Chinese datasets, followed by a comprehensive analysis of the results to confirm the efficacy of the SISF method. Also, ablation experiments are conducted to verify the validity of each part of the proposed method.
5.1. Comparison Experiment
The specific experimental results of this paper are shown in
Table 3 and
Table 4, where
is the experimental result of Flat-Lattice on the NVIDIA GeForce RTX 3090 card.
An analysis of the experimental results reveals that the SISF model demonstrates improvement in Chinese named entity recognition (NER) across all four datasets relative to the baseline model, Flat-Lattice.
Weibo: As shown in
Table 3, the F1 score for NER on the Weibo dataset increased by 3.12%. Compared with the other three datasets, the SISF model achieves its largest improvement on the relatively small Weibo dataset. This is because, on Weibo, named entity recognition performance declines more sharply than on the other three datasets when the Bi-LSTM model is replaced by the Transformer model [
26]. This suggests that the SISF model contributes to NER by mitigating conflicts between matched vocabularies and enhancing the local spatial relationships between characters with unfused and fusion vocabulary information. Moreover, compared to the Flat-Lattice method, the SISF model also compensates for the underfitting issue of the Transformer model in small datasets, thereby further improving Chinese NER performance.
Resume: As indicated in
Table 3, the F1 score for NER on the Resume dataset improved by 0.72% over the Flat-Lattice model, whose F1 score on the same dataset is 95.86%. On this dataset, lexical enhancement brings a more limited gain than on the other three datasets [
12]. The experiments in this article show that extracting local spatial relationships between characters without fused vocabulary and characters with fused vocabulary improves named entity recognition more than mitigating conflicts between matched words does. This further confirms that, when vocabulary information is insufficient, the accuracy of named entity recognition can be effectively improved by considering the local spatial relationship between characters with unfused and fused vocabulary.
Ontonotes 4.0: As shown in
Table 4, the F1 score for NER in the Ontonotes 4.0 dataset increased by 1.07%. The Ontonotes 4.0 test set contains the most severe matched vocabulary conflicts, as indicated in
Table 1. Mitigating these vocabulary conflicts accounted for the largest share of the overall F1 improvement, namely 70.00%. This further confirms that obtaining the global bidirectional semantic and sequential temporal information of vocabulary to alleviate vocabulary conflicts plays a crucial role in enhancing NER performance.
MSRA: As indicated in
Table 4, the F1 score for Named Entity Recognition (NER) in the MSRA dataset exhibited an improvement of 0.37%. Among the four datasets, the Flat-Lattice model exhibits the best performance on the MSRA dataset, marking its leading position in named entity recognition capability in this series of datasets. However, the SISF model has already achieved a significant improvement in Chinese named entity recognition performance on this basis by effectively mitigating the conflict problem in lexical matching and finely capturing the interrelationships of characters in local space. The progress of the model is reflected in the optimization of the lexical matching mechanism and the deep mining of the local spatial information, which further strengthens its capability in handling the entity recognition task in complex Chinese texts.
5.1.1. The Impact of Reducing the Weight of Conflicting Vocabulary on Named Entities
This article compares the differences between vocabulary in obtaining global bidirectional semantic information and sequential temporal information, as shown in
Figure 5a,b, through intuitive visualization. The Bi-LSTM effectively captures the global bidirectional semantic and temporal sequential information between conflict-matched words and other words through its gating mechanism. This mechanism not only extracts deep-level semantic relationships but also further refines the comprehension ability of the model through character-level information fusion. Through the loss values and feedback computed during the iterative optimization process, Bi-LSTM gradually reduces the weights of the conflicting matched words, which effectively mitigates the interference that these conflicting words may cause in entity disambiguation. Especially on datasets such as Weibo and Ontonotes 4.0, in which the test set is large relative to the training set, it is more necessary to fuse matched vocabulary features to facilitate entity segmentation [
12]. However, due to the very large number of conflicting matching vocabularies relative to the test set, resolving these inter-vocabulary conflicts becomes particularly important. Experiments have demonstrated that by utilizing the global bidirectional semantic and temporal sequential information of the matching vocabulary to reduce the weight of the conflicting matching vocabulary, the negative impact of conflicting vocabulary and character fusion on entity boundary delineation can be significantly reduced, resulting in significant F1 value enhancement on both datasets, which further enhances the accuracy of named entity recognition.
5.1.2. Extracting Local Spatial Information for Analysis
In this section, this article presents a visual analysis of vocabulary that has undergone and has not undergone local attention processing, as shown in
Figure 6a,b for comparison of attention visualization. The SISF model introduces a strategy that overcomes the failure of lexical enhancement methods when matched vocabulary is lacking: it exploits the local spatial relationships between characters with unfused lexical information and characters with fused lexical information. Especially for the Weibo dataset, with its small data volume, directly adopting the Transformer model instead of the Bi-LSTM model does not produce the desired performance improvement; instead, it may degrade named entity recognition performance because of insufficient feature extraction and insufficient training of the model parameters. To this end, the SISF model incorporates a local attention mechanism to capture the local spatial relationships between characters in detail, which significantly improves recognition where no matched words are fused, especially on the Weibo dataset, which shows a larger performance improvement than the other datasets. In scenarios in which the matched vocabulary cannot effectively improve entity segmentation, the model is able to use the character spatial information of the fused vocabulary to further improve the accuracy of entity disambiguation. The experimental results validate the effectiveness and superiority of the SISF model in the Chinese named entity recognition task. The introduction of the local attention mechanism not only strengthens the model’s grasp of the local spatial relationships between the characters of unfused and fused lexical information but also reduces the reliance on human intervention by adaptively learning these relationships, thus optimizing the overall named entity recognition process.
5.2. Ablation Experiment
In this research paper, ablation experiments were conducted to confirm the effectiveness of each component of the proposed methodology. The outcomes of these experiments are depicted in
Table 5:
The ablation experiments show that the model’s named entity recognition ability generally decreases when the local attention module is removed, which validates the importance of further extracting local spatial features between unfused lexical information and fused lexical information at the character level. In particular, for a small dataset like Weibo, where the model tends to underfit, extracting spatial relationships between characters yields a larger gain than on the other three datasets.
In addition, to illustrate the superior capability of the SISF model in mitigating conflicts between matching words and enhancing named entity recognition, this paper performs experimental comparisons with Gui et al. [
9], who proposed to use CNNs to feedback high-level features to resolve the conflicts of matching words by refining the network model, as shown in
Table 6. The experimental findings indicate that, in contrast to the Bi-LSTM model introduced in this study, the utilization of CNN to extract enhanced features from matching vocabulary is less efficacious at mitigating conflicts among matching words.
The advantage of the Bi-LSTM model is its ability to capture the global bidirectional semantic information and sequential temporal information of the matched words. When dealing with conflicting matched words, Bi-LSTM models the sequence of matched words in the whole sentence, and its gating mechanism enables it to capture global bidirectional semantic information and sequential temporal information within that sequence. In contrast, a convolutional neural network (CNN) mainly focuses on local features within the coverage of the convolutional kernel. It slides the convolutional kernel over the matched vocabulary and thus captures only local features. In capturing the semantic and sequential temporal information of the matched vocabulary across the whole sentence, CNN is therefore weaker than Bi-LSTM. Consequently, Bi-LSTM is superior to CNN in resolving conflicts between matched words and improving named entity recognition.