Named entity recognition (NER) was formally established as a sub-task of information extraction at the Sixth Message Understanding Conference (MUC-6) [8], which stipulated that named entities include person names, place names, and organization names. In the subsequent MET-2 [9] task of MUC-7 and in a series of international conferences, including IEER-99, CoNLL-2002, CoNLL-2003, IREX, and LREC, named entity recognition was treated as a designated task in the field of information extraction. Moreover, the range of entity types considered has continued to expand.
There are three main categories of named entity recognition methods: rule-based methods, statistical machine learning methods, and deep learning methods. Rule-based methods rely on manually constructed dictionaries and knowledge bases, and mostly adopt rules handcrafted by language experts; typical features include indicator words, punctuation, and statistical information, which are used to match patterns against strings. The portability of these methods is poor, as they often depend on specific domains and text features. Statistical machine learning methods typically train on a manually labeled corpus; when applied to a new domain, they require only minor modification before retraining. Typical machine learning models include maximum entropy (ME) [10], support vector machines (SVM) [11], hidden Markov models (HMM) [12], and conditional random fields (CRF) [13]. In recent years, named entity recognition methods based on deep learning have become mainstream. Deep learning models are end-to-end models [
14]. Deep neural networks apply non-linear transformations to the data and automatically learn complex features, completing training and prediction through multi-layer networks. Collobert et al. [15] first proposed a named entity recognition method based on neural networks. This method limits the context to a fixed-size window around each word, discarding useful long-distance relationships between words, and thus cannot model long-distance dependencies. With structural advances in recurrent neural networks (RNNs) and the rapid development of hardware performance, the training efficiency of deep learning has improved greatly, and the use of recurrent neural networks has become increasingly common. RNN variants such as long short-term memory (LSTM) and gated recurrent units (GRUs) have achieved breakthroughs in natural language processing (NLP). LSTM has a strong ability to extract long-term sequence features. Huang et al. [
16] first applied the bidirectional LSTM-CRF model to benchmark sequence-labeling data sets in natural language processing. A bidirectional LSTM can preserve long-term memory and exploit both past and future sequence information, and the model adds a CRF layer as the decoder. Their experiments showed that this model depends less on word embeddings and achieves good results. Yang et al. [
17] proposed a deep hierarchical recurrent neural network for sequence labeling, which uses GRUs to encode morphological and contextual information at the character and word levels, and applies a CRF layer to predict labels. GRUs simplify the gating units of LSTM, and therefore compute faster while achieving similar accuracy. Their model obtained an F1 value of 91.20% on the CoNLL-2003 English data set and effectively addressed cross-language joint training. The transformer model was proposed by Vaswani et al. [
18], which constructs the encoder and decoder layers using a multi-head attention mechanism. The input is mapped through parameter matrices, the attention operation is applied in each head, and the outputs of the heads are concatenated to obtain global features. As the transformer model offers parallel computation and a deep architecture, it has been widely used in named entity recognition tasks. However, the transformer model does not adequately capture positional relationships. Therefore, Yan et al. [
19] addressed the transformer's inability to capture direction and relative position information by proposing the transformer encoder for NER (TENER) model, whose attention mechanism simultaneously captures position and direction information. This model was evaluated on the MSRA Chinese corpus, the English OntoNotes 5.0 data set, and other data sets, and was shown to outperform the original transformer. Nested named entity recognition has long been a difficult problem across languages: a nested named entity is a special form of named entity with a complex hierarchical structure, which makes it difficult to accurately identify entity types. For this problem, Agrawal et al. [20] conducted in-depth research and proposed a BERT-based method for nested named entity recognition, which achieved the best experimental results on multiple data sets. Their experiments show that the BERT-based method is more general than existing approaches for handling nested named entities.
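The multi-head attention mechanism described above (per-head parameter-matrix mappings, scaled dot-product attention, and concatenation of the head outputs) can be sketched in a few lines of NumPy. This is a minimal illustration, not any cited paper's implementation; the matrix names and dimensions are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product self-attention with n_heads heads.

    X: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model) parameter matrices.
    """
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # parameter-matrix mapping
    # Split each projection into heads: (n_heads, seq_len, d_head).
    split = lambda M: M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq, seq)
    heads = softmax(scores, axis=-1) @ Vh                  # attention per head
    # Concatenate ("splice") the head outputs back to (seq_len, d_model).
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo
```

Note that this plain formulation attends symmetrically in both directions and carries no notion of token position, which is exactly the limitation that positional encodings, and later TENER's relative position-aware attention, were introduced to address.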
With the emergence of multilingual information extraction tasks, research on multilingual named entities has increased. Chinese named entity recognition is more difficult than English named entity recognition because of the complex properties of Chinese named entities, such as the lack of word boundaries, uncertain entity length, and the rich semantics of single characters. Researchers have carried out substantial exploratory work on Chinese named entity recognition in different fields. Dong et al. [21] first applied the character-level BiLSTM-CRF model to Chinese named entity recognition and proposed using Chinese radicals as part of the character representation, achieving good performance without Chinese word segmentation. This result indicated that character-based Chinese named entity recognition can achieve good results. Xuan et al. [
22] proposed a film-critic name recognition method based on multi-feature extraction. This method extracts character features from a corpus and uses the BiLSTM-CRF model for sequence labeling, adequately handling the complex appellations and out-of-vocabulary words found in Chinese film reviews. Li Dongmei et al. [
7] proposed BCC-P, a named entity recognition method for plant attribute texts based on BiLSTM, CNN, and CRF, in which the CNN is used to extract deeper sentence features. The method reached an accuracy of 91.8%, applying deep learning to named entity recognition in plant attribute texts. Li Bo et al. [
23] proposed a neural network model based on the attention mechanism, using a Transformer-CRF architecture to address named entity recognition in Chinese electronic medical records, and achieved an F1 value of 95.02% on their constructed corpus, with good recognition performance.
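The character-level sequence-labeling approach running through the work above ultimately reduces to assigning a tag to each character and then reading entity spans back out of the tag sequence. The sketch below decodes a character-level BIO tagging; the entity type labels (SYM, DRU) and the example sentence are illustrative assumptions, not drawn from any cited data set.

```python
def decode_bio(chars, tags):
    """Collect (entity_text, entity_type) pairs from character-level BIO tags.

    chars: list of characters; tags: parallel list such as "B-SYM", "I-SYM", "O".
    """
    entities, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes a final entity
        if tag == "O" or tag.startswith("B-"):
            if start is not None:           # close the entity that just ended
                entities.append(("".join(chars[start:i]), etype))
                start, etype = None, None
            if tag.startswith("B-"):        # open a new entity of type tag[2:]
                start, etype = i, tag[2:]
        # an "I-" tag simply extends the currently open entity
    return entities

# Hypothetical example: "头痛服用阿司匹林" with symptom (SYM) and drug (DRU) tags.
chars = list("头痛服用阿司匹林")
tags = ["B-SYM", "I-SYM", "O", "O", "B-DRU", "I-DRU", "I-DRU", "I-DRU"]
# decode_bio(chars, tags) → [('头痛', 'SYM'), ('阿司匹林', 'DRU')]
```

Because decoding operates on single characters, no word segmentation is required, which is the property that Dong et al.'s character-level BiLSTM-CRF exploits.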
Through a comprehensive comparison of the above named entity recognition models, in this paper we enhance model accuracy by integrating character radicals, word boundaries, and part-of-speech features. We also incorporate relative position information to alleviate the transformer model's inherent inability to capture position information, and use a BiGRU model to extract deep sentence features so as to obtain the optimal labeling of disease names, damage sites, and pharmaceutical entities.
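The feature-integration step described above can be sketched as a simple concatenation of per-character embeddings before the BiGRU encoder. The vocabulary sizes, embedding dimensions, and random initialization below are illustrative assumptions for the sketch, not the actual settings of the model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical vocabulary sizes and embedding dimensions (illustrative only).
CHAR_DIM, RADICAL_DIM, BOUNDARY_DIM, POS_DIM = 100, 20, 4, 16
char_emb     = rng.normal(size=(5000, CHAR_DIM))    # one row per character
radical_emb  = rng.normal(size=(300, RADICAL_DIM))  # one row per radical
boundary_emb = rng.normal(size=(4, BOUNDARY_DIM))   # B/M/E/S word-boundary tags
pos_emb      = rng.normal(size=(30, POS_DIM))       # part-of-speech tags

def fuse_features(char_ids, radical_ids, boundary_ids, pos_ids):
    """Concatenate the character, radical, word-boundary, and part-of-speech
    embeddings into one vector per position, ready to feed a BiGRU encoder."""
    return np.concatenate(
        [char_emb[char_ids], radical_emb[radical_ids],
         boundary_emb[boundary_ids], pos_emb[pos_ids]],
        axis=-1,
    )

# Three characters, each described by four feature ids.
x = fuse_features([10, 11, 12], [3, 3, 7], [0, 1, 2], [5, 5, 5])
# x.shape == (3, CHAR_DIM + RADICAL_DIM + BOUNDARY_DIM + POS_DIM)
```

Each position's fused vector then serves as the BiGRU input at that time step, with the CRF layer decoding the final label sequence.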