Article

Named Entity Recognition for Crop Diseases and Pests Based on Gated Fusion Unit and Manhattan Attention

Wentao Tang, Xianhuan Wen and Zelin Hu
1 School of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, China
2 Humanoid Robot (Shanghai) Co., Ltd., Shanghai 201210, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(9), 1565; https://doi.org/10.3390/agriculture14091565
Submission received: 3 August 2024 / Revised: 2 September 2024 / Accepted: 4 September 2024 / Published: 10 September 2024
(This article belongs to the Section Digital Agriculture)

Abstract:
Named entity recognition (NER) is a crucial step in building knowledge graphs for crop diseases and pests. To enhance NER accuracy, we propose a new NER model—GatedMan—based on the gated fusion unit and Manhattan attention. GatedMan utilizes RoBERTa as a pre-trained model and enhances it using bidirectional long short-term memory (BiLSTM) to extract features from the context. It uses a gated unit to perform weighted fusion between the outputs of RoBERTa and BiLSTM, thereby enriching the information flow. The fused output is then fed into a novel Manhattan attention mechanism to capture the long-range dependencies. The global optimum tagging sequence is obtained using the conditional random fields layer. To enhance the model’s robustness, we incorporate adversarial training using the fast gradient method. This introduces adversarial examples, allowing the model to learn more disturbance-resistant feature representations, thereby improving its performance against unknown inputs. GatedMan achieved F1 scores of 93.73%, 94.13%, 93.98%, and 96.52% on the AgCNER, Peoples_daily, MSRA, and Resume datasets, respectively, thereby outperforming the other models. Experimental results demonstrate that GatedMan accurately identifies entities related to crop diseases and pests and exhibits high generalizability in other domains.

1. Introduction

Diversified cultivation and sustainable crop management are crucial for maintaining global food security, ecological health, and economic stability. However, crops are often threatened by various pests and diseases during cultivation, which can severely affect yield and quality, leading to significant economic losses and instability in the food supply chain. The prerequisite for pest and disease control is the acquisition of information and the refinement of knowledge related to pest control. However, no systematic knowledge base is yet available to agricultural practitioners; information on pest and disease control remains scattered across agricultural websites and physical books.
Knowledge graphs, a key technology in knowledge engineering in the era of big data, can provide practical solutions to the problem of fragmentation and dispersal of knowledge in the agricultural field [1]. By constructing a crop pest and disease knowledge graph, information on pest and disease types, transmission paths, impact factors, and prevention and control strategies can be integrated and analyzed, thereby providing precise decision-making support to agricultural workers. This helps them take timely and appropriate control measures, effectively reducing the negative impacts of pests and diseases on crop production and food safety. Named entity recognition (NER) identifies and classifies relevant named entities from data sources and is a major step in building knowledge graphs. The accuracy of NER directly affects the quality of the knowledge graph.
In the agricultural field, traditional NER methods are based on logic and rules, which often need to be manually established by domain experts. This process lacks flexibility and adaptability. The main idea behind machine learning-based methods is to learn statistical models from large-scale annotated corpora, which are then used to identify named entities in new texts. Common machine learning models include support vector machines (SVM), hidden Markov models (HMM), and conditional random fields (CRF). Malarkodi et al. [2] were among the early adopters of the CRF model for NER tasks in agriculture. Li et al. [3] considered features such as parts of speech, left and right boundary words, radical components, and numerals in the names of crops, pests, and pesticides. Through experiments, they selected feature combinations and adjusted the context window size to further enhance the precision of entity extraction using the CRF model.
Deep learning-based methods have eliminated the dependency on feature engineering and have been widely used in recent years. Zhao et al. [4] utilized word vectors provided by ALBERT and employed a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) to extract radical and stroke features, respectively, integrating attention mechanisms to capture long-distance dependencies and accurately identify five types of agricultural entities, including diseases and insects. Zhang et al. [5] proposed the NER model, AWdpCNER, which expands the semantic information of sentences through data augmentation methods and uses rules to calibrate entity boundaries for efficient recognition of wheat pests and disease entities. Zhang et al. [6] built a kiwifruit pest and disease corpus and introduced two novel modules, attsoftexicon and PCAT, to optimize the feature extraction process. The experimental results showed that these modules effectively improved the accuracy of named entity recognition for kiwifruit pests and diseases. Guo et al. [7] proposed a feature extraction framework based on a three-dimensional CNN combined with a channel fusion mechanism. It captured textual features above and below each character from an image perspective and performed well on both agricultural and public datasets. Wu et al. [8] designed an improved bidirectional gated recurrent unit that accurately recognized genetically and phenotypically named rice entities.
The aforementioned literature provides valuable references for this study and also identifies some limitations of the existing NER methods. (1) The information flow in most NER models is often relatively simplistic. Even when researchers use multiple neural networks during feature extraction, the outputs of these networks are typically combined in a static manner, such as concatenation, lacking the ability for dynamic feature selection. (2) NER models usually employ the dot-product attention mechanism to capture long-distance dependencies in text. However, as a linear operation, the dot product may struggle to capture complex nonlinear semantic relationships in agricultural texts. (3) NER datasets in the agricultural domain are typically collected, organized, and annotated manually. The quality of different data sources varies, and errors are inevitable during the annotation process, leading to the presence of noise in the datasets. Existing NER models still need to improve their stability and robustness when dealing with noisy data.
To address the aforementioned issues, this study proposes an NER model—GatedMan—based on a gated fusion unit and Manhattan attention. GatedMan utilizes RoBERTa as a pre-trained model, followed by feature extraction using BiLSTM. Subsequently, a gated unit is used to perform the weighted fusion of the outputs from RoBERTa and BiLSTM to ensure the diversity of the model information. The fused features are then input into a novel attention mechanism, referred to as Manhattan attention, to capture long-distance dependencies. A CRF layer is then used to obtain a globally optimal tagging sequence. Moreover, to enhance the robustness of GatedMan, adversarial training methods are introduced during model training. By generating adversarial samples similar to the original samples that cause erroneous predictions during the training process, the model learns more interference-resistant feature representations, thereby improving its performance against unknown inputs.
The contributions of this study are summarized as follows:
  • This study uses a gated fusion unit to perform a weighted fusion of the output of RoBERTa with the output of BiLSTM to ensure that the information flow of the model does not depend on a single feature representation. The feature selection function of the gating mechanism also allows the model to select more important or more relevant features according to different input situations, which enhances the model’s ability to adapt to specific situations.
  • This study introduces a novel Manhattan attention mechanism that measures the similarity between the query matrix (Q) and key matrix (K) by calculating their Manhattan distance, which has been shown to outperform traditional dot-product attention.
  • This study incorporates adversarial training to learn more robust feature representations, enhancing the model’s performance when faced with unfamiliar inputs.

2. Model Structure

Figure 1 shows the model structure of GatedMan.

2.1. RoBERTa Pre-Trained Model

RoBERTa, which was introduced by Facebook AI in 2019, is a pre-trained model based on BERT [9]. Although its structure is fundamentally similar to that of BERT, RoBERTa incorporates several significant improvements during the pre-training phase, reflected in the following aspects. (1) Larger batch sizes: RoBERTa uses much larger batch sizes (ranging from 256 to 8000) during training. (2) More training data: RoBERTa was trained on approximately 160 GB of text data, compared with only 16 GB used by BERT. (3) Improved training methods: RoBERTa eliminates the next-sentence prediction task used in BERT pre-training and instead employs a more flexible dynamic masking technique. (4) Longer training duration: RoBERTa is pre-trained for a longer period to improve model performance. With these improvements, RoBERTa has demonstrated significant performance gains across various natural language processing tasks and has recently become more popular than BERT as a pre-trained model. Figure 2 shows the model structure of RoBERTa.

2.2. Adversarial Training

Adversarial training enhances the robustness of the model by introducing adversarial examples. Adversarial examples are samples that intentionally add small perturbations to the original data. These minor perturbations could cause the model to produce incorrect predictions. This method was initially widely used in the field of image processing but was later extended to natural language processing.
This study employs the fast gradient method (FGM) [10] for adversarial training, and the algorithmic process of FGM can be summarized as follows:
  • Calculate the original loss and gradient: First, perform forward propagation on the input sample $x$ and calculate its loss $L(x, y)$. Then, perform backward propagation to compute the gradient $\nabla_x L(x, y)$ of the loss with respect to the input sample.
  • Generate the adversarial perturbation: Calculate the perturbation vector $r$ from the gradient of the embedding representation of the input sample, as shown in Equation (1).
    $$r = \epsilon \cdot \frac{\nabla_x L(x, y)}{\|\nabla_x L(x, y)\|},$$
    where $\epsilon$ denotes the perturbation size. The perturbation vector $r$ is then added to the embedding representation of the current input sample to obtain the adversarial sample $x_{adv} = x + r$.
  • Calculate the adversarial loss and gradient: Perform forward propagation on the adversarial sample to calculate its loss $L(x_{adv}, y)$. Then, perform backward propagation to compute the gradient of the adversarial loss with respect to the input sample, and add this gradient to the original gradient from step (1).
  • Restore the original embedding: Return the embedding of the input sample to its original state from step (1) to ensure that the parameter updates are not affected by the adversarial perturbation.
  • Based on the accumulated gradients, update the model parameters to enhance the model's robustness to both the original and adversarial samples.
Through these steps, the FGM method introduces adversarial samples during training, enabling the model to handle minor input perturbations. This improves the model’s robustness and generalization ability.
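To make the procedure concrete, the following is a minimal PyTorch-style sketch of the FGM steps above, not the authors' released implementation; the embedding parameter filter `word_embeddings` and the default `epsilon` are illustrative assumptions.

```python
import torch

class FGM:
    """Minimal sketch of the fast gradient method described above.

    Perturbs the embedding weights by epsilon * g / ||g||_2, backing them
    up first so they can be restored after the adversarial backward pass.
    """
    def __init__(self, model, epsilon=1.0, emb_name="word_embeddings"):
        self.model = model
        self.epsilon = epsilon
        self.emb_name = emb_name  # substring matching the embedding parameter (assumption)
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()  # saved for the restore step
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)  # Equation (1)

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# One training step (illustrative variable names):
#   loss = model(x, y); loss.backward()             # (1) original loss and gradient
#   fgm.attack()                                    # (2) add perturbation r to embeddings
#   loss_adv = model(x, y); loss_adv.backward()     # (3) accumulate adversarial gradient
#   fgm.restore()                                   # (4) restore original embeddings
#   optimizer.step(); optimizer.zero_grad()         # (5) update on accumulated gradients
```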

2.3. BiLSTM

BiLSTM is a special type of recurrent neural network composed of two LSTM layers running in opposite directions: one processes the sequence in chronological order, whereas the other processes it in reverse. Consequently, the output at each time step contains information from both prior and subsequent time points, significantly enhancing the model's ability to understand the context. Figure 3 shows the structure of a single LSTM. It includes an input gate $I_t$, a forget gate $F_t$, and an output gate $O_t$; the update formulas for these gates are shown in Equations (2)–(4).
$$I_t = \sigma(X_t W_{xi} + H_{t-1} W_{hi} + b_i)$$
$$F_t = \sigma(X_t W_{xf} + H_{t-1} W_{hf} + b_f)$$
$$O_t = \sigma(X_t W_{xo} + H_{t-1} W_{ho} + b_o)$$
Additionally, $\tilde{C}_t$ is the candidate memory cell, which refers to the candidate information generated by the neuron at time $t$; $C_t$ is the memory cell, indicating the information stored by the current neuron; and $H_t$ is the hidden state within the neural network. Their update formulas are shown in Equations (5)–(7).
$$\tilde{C}_t = \tanh(X_t W_{xc} + H_{t-1} W_{hc} + b_c)$$
$$C_t = F_t \odot C_{t-1} + I_t \odot \tilde{C}_t$$
$$H_t = O_t \odot \tanh(C_t)$$
In the above formulas, $X_t$ represents the input to the neural network at time $t$; $W_{xi}$, $W_{hi}$, $W_{xf}$, $W_{hf}$, $W_{xo}$, $W_{ho}$, $W_{xc}$, and $W_{hc}$ are weight matrices; and $b_i$, $b_f$, $b_o$, and $b_c$ are bias vectors.
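For illustration, Equations (2)–(7) map directly onto the following PyTorch sketch of a single LSTM step; the dictionary-based weight naming (`W["xi"]`, `b["i"]`, and so on) is an assumption made for readability, not the library's internal layout.

```python
import torch

def lstm_cell_step(X_t, H_prev, C_prev, W, b):
    """One LSTM time step following Equations (2)-(7).

    W maps 'xi', 'hi', ... to weight matrices and b maps 'i', 'f', 'o', 'c'
    to bias vectors, mirroring the notation used in the text.
    """
    I_t = torch.sigmoid(X_t @ W["xi"] + H_prev @ W["hi"] + b["i"])   # input gate, Eq. (2)
    F_t = torch.sigmoid(X_t @ W["xf"] + H_prev @ W["hf"] + b["f"])   # forget gate, Eq. (3)
    O_t = torch.sigmoid(X_t @ W["xo"] + H_prev @ W["ho"] + b["o"])   # output gate, Eq. (4)
    C_tilde = torch.tanh(X_t @ W["xc"] + H_prev @ W["hc"] + b["c"])  # candidate cell, Eq. (5)
    C_t = F_t * C_prev + I_t * C_tilde                               # memory cell, Eq. (6)
    H_t = O_t * torch.tanh(C_t)                                      # hidden state, Eq. (7)
    return H_t, C_t
```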

2.4. Gated Fusion Unit

To diversify the information flow within the model and avoid dependence on a single feature representation, this study introduces an additional gated unit. This unit assigns weights to the outputs of RoBERTa and BiLSTM and fuses them. Figure 4 shows the structure of the gated fusion unit.
Assuming that $Y_1$ and $Y_2$ are the outputs of RoBERTa and BiLSTM, respectively, the weight representation is first generated by the Sigmoid function:
$$Z = \mathrm{Sigmoid}(Y_1 W_1 + Y_2 W_2)$$
where $W_1$ and $W_2$ are learnable parameters, and $Z$ is the generated weight representation. The two parts are then weighted and fused through these weights, resulting in the fused output $O$:
$$O = Z \odot Y_1 + (1 - Z) \odot Y_2$$
The design of the gated unit enables the model to dynamically adjust the contributions of the RoBERTa and BiLSTM outputs based on the current input, effectively avoiding over-reliance on a single information source. Moreover, by analyzing the weight distribution of the gated unit, researchers can identify which information source the model relies on more when handling specific tasks. This is significant for gaining a deeper understanding of the model's decision-making mechanism and making targeted optimizations accordingly.
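A minimal PyTorch sketch of Equations (8) and (9) might look as follows; the module name `GatedFusion` and the bias-free linear layers are illustrative choices rather than details confirmed by the paper.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Gated fusion unit sketch implementing Equations (8)-(9)."""

    def __init__(self, hidden_size):
        super().__init__()
        self.W1 = nn.Linear(hidden_size, hidden_size, bias=False)  # learnable W1
        self.W2 = nn.Linear(hidden_size, hidden_size, bias=False)  # learnable W2

    def forward(self, y1, y2):
        # y1: RoBERTa output, y2: BiLSTM output, both (batch, seq_len, hidden).
        z = torch.sigmoid(self.W1(y1) + self.W2(y2))  # Eq. (8): gate weights in (0, 1)
        return z * y1 + (1 - z) * y2                  # Eq. (9): element-wise weighted fusion
```

Inspecting the gate values `z` at inference time is one way to observe which information source the model favors for a given input, as the text notes.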

2.5. Attention Mechanism

2.5.1. Limitations of the Dot-Product Attention Mechanism

In the attention mechanism, the similarity between the Q matrix and the K matrix is calculated to obtain the attention weights, allowing the model to focus on the parts of the input sequence most relevant to the current query [11]. The traditional attention mechanism achieves this through the dot-product operation, which, despite its simplicity and efficiency, has some inherent limitations.
First, in NLP tasks, models often need to process high-dimensional vectors. These vectors are typically obtained through word embeddings, where each word or term is converted into a point in a high-dimensional space. However, agriculture is a highly specialized field that includes numerous technical terms and rare vocabulary, which seldom appear in general corpora. Consequently, the representations of these terms learned during the pre-training of word embedding models are often not rich enough, with many dimensions close to zero, thus manifesting as high-dimensional sparsity. When taking the dot product of two high-dimensional sparse vectors, the numerous zero-by-zero calculations contribute nothing, leading to a very low proportion of effective information. This causes the calculation result to be overly influenced by a few non-zero elements, which may not always represent the semantically important parts of the vectors.
Second, as a linear operation, the dot product often struggles to fully capture the complex nonlinear semantic relationships in agricultural texts. In agricultural texts, direct linear relationships often manifest as clear causal or attribute connections, such as ‘Corn grows faster after fertilization’ or ‘This variety of apple tastes sweet’. However, such direct relationships are relatively rare in the corpus. Most agricultural texts contain more complex nonlinear relationships, such as sentences with multiple modifiers like ‘Using high-potassium compound fertilizer can effectively prevent tomato late blight’, or descriptions with implicit spatial relationships like ‘Rice in coastal areas is more susceptible to saline-alkali influence than in inland areas’. In these cases, the distance between entities and their descriptions may be long, and the semantic connections are complex, making it difficult for the dot-product operation alone to effectively reflect these semantic relationships.
Third, for two vectors $u$ and $v$, their dot product is defined as
$$u \cdot v = \|u\| \|v\| \cos\theta$$
where θ is the angle between the two vectors. This dot-product operation is primarily used to measure the similarity in the direction between the two vectors. Based on this, we can further explore other approaches. If we could find a method to directly measure the absolute distance between vectors, could it lead to better results in this task?

2.5.2. Manhattan Attention Mechanism

Manhattan distance, also known as L1 distance, is the sum of the absolute differences between the coordinates of two points in a standard coordinate system. For two vectors $q$ and $k$, their Manhattan distance is defined as
$$\mathrm{Manhattan}(q, k) = \sum_i |q_i - k_i|$$
In the Manhattan attention mechanism, suppose there is an input vector $H_t$, which undergoes a simple linear transformation to generate the three matrices Q, K, and V:
$$Q = H_t W_q, \quad K = H_t W_k, \quad V = H_t W_v$$
Then, we calculate the Manhattan distance between the Q matrix and the K matrix and negate it to obtain a similarity score, because a smaller distance indicates a higher degree of similarity between the Q matrix and the K matrix:
$$S_{ij} = -\mathrm{Manhattan}(q_i, k_j)$$
To maintain mathematical consistency and ensure a smooth gradient flow, we usually map these scores to a probability distribution using the Softmax function:
$$A_{ij} = \frac{\exp(S_{ij})}{\sum_k \exp(S_{ik})}$$
Finally, we apply the attention weights to the V matrix to compute the weighted sum:
$$\mathrm{Attention}(Q, K, V) = AV$$
Using Manhattan distance to measure the similarity between the Q matrix and the K matrix has the following advantages: (1) When dealing with high-dimensional sparse word vectors, it does not amplify the differences between non-zero elements but rather uniformly considers the differences across all dimensions. (2) Because Manhattan distance calculates the sum of absolute differences between vector elements, it shows higher robustness to outliers. As mentioned earlier, some noise inevitably exists in agricultural domain datasets, and using Manhattan distance can enhance the model’s tolerance to such data. (3) Manhattan distance accumulates differences across multiple dimensions, which can reflect the nonlinear and multivariate influences in agricultural texts to a certain extent. (4) The output distribution of Manhattan distance does not grow excessively with increasing dimensions as the dot product does. Therefore, when using the Manhattan attention mechanism, scaling the values to control their range as required in traditional dot-product attention mechanisms is not necessary.
The structure of the Manhattan attention proposed in this study is not significantly different from that of the dot-product attention, as shown in Figure 5.
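For concreteness, the following is a single-head PyTorch sketch of Equations (12)–(15); the batching layout and module name are assumptions, not the authors' code. Note that materializing the pairwise differences costs O(seq_len² × hidden) memory, a practical trade-off relative to the dot product.

```python
import torch
import torch.nn as nn

class ManhattanAttention(nn.Module):
    """Single-head Manhattan attention sketch (Equations (12)-(15))."""

    def __init__(self, hidden_size):
        super().__init__()
        self.W_q = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_k = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_v = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, h):
        # h: (batch, seq_len, hidden)
        q, k, v = self.W_q(h), self.W_k(h), self.W_v(h)         # Eq. (12)
        # Pairwise L1 distances between every query and key position.
        dist = (q.unsqueeze(2) - k.unsqueeze(1)).abs().sum(-1)  # (batch, seq, seq)
        scores = -dist                                          # Eq. (13): negated distance
        attn = torch.softmax(scores, dim=-1)                    # Eq. (14): no 1/sqrt(d) scaling
        return attn @ v                                         # Eq. (15): weighted sum of V
```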

2.6. CRF

CRF is a sequence labeling model that is particularly suited for natural language processing tasks that require consideration of the dependencies between labels. If the BIO method is used for data labeling, the constraints of CRF on labels are mainly reflected in the following two points: (1) the label for the first character in a sentence is typically “B” or “O” and cannot be “I”; (2) in the label sequence “B-label1 I-label2 I-label3”, label1, label2, and label3 should belong to the same type of entity.
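These two constraints can be checked mechanically; the helper below is a hypothetical illustration of the rules, not part of the CRF layer itself.

```python
def is_valid_bio(labels):
    """Return True if a BIO label sequence satisfies the two constraints above."""
    prev = "O"
    for lab in labels:
        if lab.startswith("I-"):
            # An I- tag must continue a B-/I- tag of the same entity type,
            # so it can neither start a sentence nor follow "O".
            if prev == "O" or prev[2:] != lab[2:]:
                return False
        prev = lab
    return True

# is_valid_bio(["B-PET", "I-PET", "O"])  -> True
# is_valid_bio(["I-PET", "O"])           -> False (a sentence cannot begin with "I")
# is_valid_bio(["B-PET", "I-CRO"])       -> False (types must match within an entity)
```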

3. Experiments and Analysis

3.1. Dataset

This study utilized the AgCNER dataset provided by Yao et al. [12] for the experiments. AgCNER is the first large-scale, publicly available Chinese agricultural named entity recognition dataset. It includes 13 entity categories, 206,992 entities, and 3,909,293 characters. The dataset was divided into training, validation, and test sets at an 8:1:1 ratio. Detailed entity information is shown in Table 1.
Common annotation strategies for NER tasks include BIO, BMES, and BIOES. BIO is the most widely used and most compatible with existing NER tools. AgCNER uses the BIO method for annotation, and some annotated examples from the dataset are shown in Table 2.

3.2. Experimental Setup

The experiments were conducted on a system running Ubuntu 20.04 with an NVIDIA GeForce RTX 4090D graphics card. The programming language was Python 3.8, and the framework was PyTorch 2.0.0. The pre-trained model was chinese-roberta-wwm-ext, provided by the Harbin Institute of Technology and iFLYTEK Joint Laboratory. This model has 12 layers, a 768-dimensional hidden layer, and a 12-head multi-head attention mechanism. Other parameter settings are shown in Table 3.
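For reference, loading this checkpoint typically looks like the sketch below; the Hugging Face hub identifier `hfl/chinese-roberta-wwm-ext` is an assumption about where the weights are hosted, and because the checkpoint ships BERT-style weights it loads through the BERT classes.

```python
from transformers import BertModel, BertTokenizerFast

# chinese-roberta-wwm-ext is distributed with BERT-compatible weights
# (12 layers, hidden size 768, 12 attention heads), so the BERT classes
# are used to load it; the hub ID below is an assumption.
tokenizer = BertTokenizerFast.from_pretrained("hfl/chinese-roberta-wwm-ext")
encoder = BertModel.from_pretrained("hfl/chinese-roberta-wwm-ext")
```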
Consistent with other NER experiments, this study selected the precision, recall, and F1 scores as the performance evaluation metrics for the model. The calculation methods are shown in Equations (16)–(18):
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times P \times R}{P + R}$$
where true positive (TP) refers to positive samples correctly predicted as positive by the model, false positive (FP) refers to negative samples incorrectly predicted as positive by the model, and false negative (FN) refers to positive samples incorrectly predicted as negative by the model.
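As a quick worked example of Equations (16)–(18), using hypothetical counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Equations (16)-(18) from raw prediction counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

# With tp=90, fp=10, fn=20: P = 0.900, R = 0.818, F1 = 0.857 (rounded).
print(precision_recall_f1(90, 10, 20))
```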

3.3. Analysis of Experimental Results

3.3.1. Comparison of Different Model Performances

Six model configurations (BiLSTM-CRF, IDCNN-CRF, BERT-BiLSTM-CRF, RoBERTa-CRF, RoBERTa-IDCNN-CRF, and RoBERTa-BiLSTM-CRF) were compared with GatedMan. Furthermore, to eliminate randomness, each model was run three times. The precision, recall, and F1 scores are shown in Figure 6a–c.
Based on the experimental results shown in Figure 6, the following conclusions can be drawn.
  • Comparing BiLSTM-CRF with IDCNN-CRF, and RoBERTa-IDCNN-CRF with RoBERTa-BiLSTM-CRF, shows that models using BiLSTM as the encoder generally perform better, indicating that BiLSTM provides stronger contextual semantic understanding and is thus better suited to text-processing tasks.
  • The introduction of pre-trained models enhanced performance. This improvement is due to the pre-trained models having learned abundant linguistic rules and knowledge from extensive text pre-training, which aids the current NER task. The pre-trained models also offer better model initialization parameters, enhancing the model’s generalization capability.
  • The proposed GatedMan model outperformed the other models in all three metrics, proving its effectiveness.

3.3.2. Comparison of Different Evaluation Mechanisms’ Performances

In addition to using the Manhattan distance to measure the similarity between the Q and K matrices in the attention mechanism, this study also explored three other mechanisms: scaled dot product, Euclidean distance, and cosine similarity. The precision, recall, and F1 scores are shown in Figure 7a–c.
As shown in Figure 7, when the scaled dot product, Euclidean distance, or cosine similarity is used as the evaluation mechanism, the model's precision decreases to some extent while its recall improves. In NER experiments, the F1 score, which balances precision and recall to reflect overall performance, is highest when Manhattan distance is used as the evaluation mechanism. The model performs worst with Euclidean distance, possibly because, in NER tasks, words or phrases are represented by high-dimensional embedding vectors, and in high-dimensional spaces the Euclidean distances between data points tend to become similar, weakening the model's discriminative power.

3.3.3. Comparison of Different Feature Fusion Methods’ Performances

In this study, a gated fusion unit was used to allocate weights between the outputs of RoBERTa and BiLSTM and perform feature fusion. To validate the effectiveness of this method, we compared it with two traditional static feature fusion methods: concatenation and addition. The precision, recall, and F1 scores are shown in Figure 8a–c.
As shown in Figure 8, using the concatenation method has minimal impact on the three performance metrics of the model, while the addition operation slightly improves the model’s recall and F1 score. The gated fusion unit, although sacrificing some precision, provides a significant boost in recall, resulting in overall model performance that surpasses the two aforementioned methods.

3.3.4. Ablation Study

In this section, based on the RoBERTa-BiLSTM-CRF model, the gated fusion unit, Manhattan attention mechanism, and adversarial training are introduced separately to observe changes in model performance. The precision, recall, and F1 scores are shown in Figure 9a–c.
As shown in Figure 9, each of the three components (the gated fusion unit, the Manhattan attention mechanism, and adversarial training) improves model performance when used individually, with adversarial training providing the greatest improvement, followed by the Manhattan attention mechanism, and the gated fusion unit having the smallest relative impact. Furthermore, when the three components are combined to form the proposed GatedMan model, performance improves further compared with using any single component. These ablation experiments confirm the effectiveness of the gated fusion unit, the Manhattan attention mechanism, and adversarial training as model components.

3.3.5. Results of Entity Recognition

In this section, the three experimental runs of GatedMan on the AgCNER dataset were ranked by F1 score, and the run with the median F1 score was selected. The recognition performance for each entity type in that run is shown in Figure 10.
Figure 10 shows that the entity categories with excellent recognition results include CRO, DIS, and PET, all with F1 scores greater than 96%. The categories BIS, CUL, DRUG, FER, PAOG, PART, and PER, among others, also show good recognition results, with F1 scores of approximately 90%. However, the recognition results for the COM, ORG, and OTH categories are poor, with F1 scores below 80%. A detailed analysis of the dataset shows that entity types are unevenly distributed: the categories with better recognition results tend to be abundant in the corpus or have clear entity boundaries, whereas the poorly recognized categories are either infrequent or prone to entity nesting, making them harder to recognize.

3.3.6. Validation of Model Generalizability

To verify the generalizability of the GatedMan model, experiments were continued on three public datasets (MSRA, People's Daily, and Resume) and compared with existing advanced models. The results are summarized in Table 4.
Table 4 shows that even on public datasets, GatedMan achieves good entity recognition results that are not inferior to those of other models, demonstrating that it can be extended to other fields.

4. Conclusions

To enhance the accuracy of named entity recognition in the field of crop pests and diseases and lay the foundation for the subsequent construction of knowledge graphs, this study proposes a named entity recognition model called GatedMan, which is based on a gated fusion unit and Manhattan attention. In the GatedMan model, we improved traditional dot-product attention by using the Manhattan distance to measure the similarity between the Q and K matrices, addressing the scaling and linearity issues inherent in the dot product. Additionally, we employed a gated unit to perform a weighted fusion of the outputs from RoBERTa and BiLSTM, thus enriching the flow of information. Finally, adversarial training was incorporated into the training process to enhance the model's stability and robustness. The experiments show that GatedMan performs well in entity recognition on the AgCNER dataset and demonstrates good generalizability on the People's Daily, MSRA, and Resume public datasets, highlighting its applicability to other fields.
In future studies, we will build on this work to complete the relation extraction task and construct a crop pest and disease knowledge graph. Our goal is to use the knowledge graph as a database for an intelligent question-answering and decision support system. Such a system would enable precise prevention and scientific management of crop pests and diseases, providing intelligent and visual technical support and services for agricultural production.

Author Contributions

Conceptualization, W.T. and X.W.; methodology, W.T. and Z.H.; validation, W.T. and X.W.; formal analysis, Z.H.; investigation, W.T.; resources, X.W. and Z.H.; data curation, W.T.; writing—original draft preparation, W.T.; writing—review and editing, Z.H.; visualization, X.W.; funding acquisition, Z.H. All authors have read and agreed to the published version of the manuscript.

Funding

Key Discipline Construction of Gannan Normal University (220108), Science and Technology Project of Jiangxi Provincial Department of Education (490164), National Key R&D Program of China (2017YFD0701600), and Gannan Normal University Talent Fund (13SJJ202130).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data and code are available at https://github.com/Tracyyytao/AGCNER/tree/master (accessed on 2 September 2024).

Conflicts of Interest

Author Xianhuan Wen was employed by the company Humanoid Robot (Shanghai) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Tang, W.T.; Hu, Z.L. Survey of agricultural knowledge graph. Comput. Eng. Appl. 2024, 60, 63–76. [Google Scholar]
  2. Malarkodi, C.S.; Lex, E.; Devi, S.L. Named entity recognition for the agricultural domain. Res. Comput. Sci. 2016, 117, 121–132. [Google Scholar]
  3. Li, X.; Wei, X.H.; Jia, L.; Chen, X.; Liu, L.; Zhang, Y.E. Recognition of crops, diseases and pesticides named entities in Chinese based on conditional random fields. Trans. Chin. Soc. Agric. Mach. 2017, 48, 178–185. [Google Scholar]
  4. Zhao, P.F.; Wang, W.; Liu, H.; Han, M. Recognition of the agricultural named entities with multifeature fusion based on ALBERT. IEEE Access 2022, 10, 98936–98943. [Google Scholar] [CrossRef]
  5. Zhang, D.M.; Zheng, G.; Liu, H.B.; Ma, X.M.; Xi, L. AWdpCNER: Automated Wdp Chinese named entity recognition from wheat diseases and pests text. Agriculture 2023, 13, 1220. [Google Scholar] [CrossRef]
  6. Zhang, L.L.; Nie, X.L.; Zhang, M.M.; Gu, M.Y.; Geissen, V.; Ritsema, C.J.; Niu, D.D.; Zhang, H.M. Lexicon and attention-based named entity recognition for kiwifruit diseases and pests: A deep learning approach. Front. Plant Sci. 2022, 13, 1053449. [Google Scholar] [CrossRef] [PubMed]
  7. Guo, X.C.; Lu, S.H.; Tang, Z.; Bai, Z.; Diao, L.; Zhou, H.; Li, L. CG-ANER: Enhanced contextual embeddings and glyph features-based agricultural named entity recognition. Comput. Electron. Agric. 2022, 194, 106776. [Google Scholar] [CrossRef]
  8. Wu, K.J.; Xu, L.Q.; Li, X.X.; Zhang, Y.H.; Yue, Z.Y.; Gao, Y.J.; Chen, Y.Q. Named entity recognition of rice genes and phenotypes based on BiGRU neural networks. Comput. Biol. Chem. 2024, 108, 107977. [Google Scholar] [CrossRef] [PubMed]
  9. Liu, Y.H.; Ott, M.; Goyal, N.; Du, J.F.; Joshi, M.; Chen, D.Q.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  10. Miyato, T.; Dai, A.M.; Goodfellow, I. Adversarial training methods for semi-supervised text classification. arXiv 2016, arXiv:1605.07725. [Google Scholar]
  11. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  12. Yao, X.C.; Hao, X.; Liu, R.; Li, L.; Guo, X.C. AgCNER, the first large-scale Chinese named entity recognition dataset for agricultural diseases and pests. Sci. Data 2024, 11, 769. [Google Scholar] [CrossRef] [PubMed]
  13. Liu, J.X.; Sun, M.Z.; Zhang, W.H.; Xie, G.Q.; Jiang, Y.X.; Li, X.L.; Shi, Z.X. DAE-NER: Dual-channel attention enhancement for Chinese named entity recognition. Comput. Speech Lang. 2023, 85, 101581. [Google Scholar] [CrossRef]
  14. Zhang, B.; Liu, K.; Wang, H.W.; Li, M.Z.; Pan, J.G. Chinese named-entity recognition via self-attention mechanism and position-aware influence propagation embedding. Data Knowl. Eng. 2022, 139, 101983. [Google Scholar] [CrossRef]
  15. Liu, J.X.; Cheng, J.R.; Peng, X.; Zhao, Z.L.; Tang, X.Y.; Sheng, V.S. MSFM: Multi-view semantic feature fusion model for Chinese named entity recognition. KSII Trans. Internet Inf. Syst. 2022, 16, 1833–1848. [Google Scholar]
  16. Dong, Y.F.; Bai, J.M.; Wang, L.Q.; Wang, X. Chinese named entity recognition combining prior knowledge and glyph features. J. Comput. Appl. 2024, 44, 702–708. [Google Scholar]
  17. Jia, Y.Z.; Xu, X.B. Chinese Named Entity Recognition Based on CNN-BiLSTM-CRF. In Proceedings of the IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 23–25 November 2018; pp. 1–4. [Google Scholar]
  18. Zhang, N.X.; Li, F.; Xu, G.; Zhang, W.K.; Yu, H.F. Chinese NER using dynamic meta-embeddings. IEEE Access 2019, 7, 64450–64459. [Google Scholar] [CrossRef]
  19. Li, J.T.; Meng, K. MFE-NER: Multi-feature fusion embedding for Chinese named entity recognition. arXiv 2021, arXiv:2109.07877. [Google Scholar]
  20. Han, X.K.; Yue, Q.; Chu, J.; Shi, W.L.; Han, Z. Chinese named entity recognition based on attention-enhanced lattice Transformer. J. Xiamen Univ. Nat. Sci. 2022, 61, 1062–1071. [Google Scholar]
Figure 1. GatedMan model structure.
Figure 2. RoBERTa model structure.
Figure 3. LSTM structure.
Figure 4. Gated fusion unit structure.
Figure 5. Manhattan attention structure.
Figure 6. Comparison of different model performances.
Figure 7. Comparison of different evaluation mechanisms.
Figure 8. Comparison of different feature fusion methods.
Figure 9. Ablation study.
Figure 10. Results of entity recognition.
Table 1. Entity information.

Tags | Abbr. | Proportion | Example
Pest | PET | 22.17% | Coccinellidae, locust
Disease | DIS | 13.47% | Potato wart, rice blast
Crop | CRO | 20.38% | Rice, wheat, apple
Drug | DRUG | 12.08% | Prometryn, pendimethalin
Cultivar | CUL | 5.86% | Kexin No.1, Quanyou 822
Fertilizer | FER | 0.51% | Nitrogen fertilizer, potash fertilizer
Pathogens | PAOG | 2.25% | Phytophthora infestans
Period | PER | 10.84% | Flower period, heading date
Part | PART | 8.63% | Tuber, leaf, brain
Company | COM | 0.34% | Yuan Longping High-Tech Agriculture Co., Ltd., Changsha, China
Organization | ORG | 1.44% | Academy of Agricultural Sciences
Biosystematic | BIS | 0.89% | Peronosporales, Noctuidae
Other | OTH | 1.14% | Insecticide, viral disease
Table 2. Annotation examples.

Sentence | Label
The Cnaphalocrocis medinalis is showing an outbreak trend. | O/B-PET/I-PET/O/O/O/O/O
Usually, the disease starts from the lower leaves of the plant. | O/O/O/O/O/O/O/O/B-PART/O/O
Matsumura is an important pest on soybeans in Northeast China. | B-PET/O/O/O/O/O/B-CRO/O/O/O
The insect lurks as a larva inside the glume shell. | O/O/O/O/O/B-PER/O/O/B-PART/I-PART
Table 3. Experimental parameter settings.

Parameter | Value
Learning rate | 5 × 10⁻⁵
Batch size | 32
Optimizer | Adam
Epoch | 30
LSTM hidden size | 256
Dropout | 0.1
Table 4. Experiments on public datasets.

Datasets | Models | P/% | R/% | F1/%
Peoples_daily | DAE-NER (Liu et al. [13]) | 92.17 | 90.63 | 91.40
 | SAGNet (Zhang et al. [14]) | 89.44 | 80.79 | 84.90
 | MSFM (Liu et al. [15]) | 82.21 | 86.66 | 87.43
 | PKGF (Dong et al. [16]) | 94.35 | 93.06 | 93.70
 | GatedMan | 94.69 | 93.58 | 94.13
MSRA | CNN-BiLSTM-CRF (Jia et al. [17]) | 91.63 | 90.56 | 91.09
 | DAE-NER (Liu et al. [13]) | 92.73 | 90.15 | 91.42
 | DEM (Zhang et al. [18]) | 90.59 | 91.15 | 90.87
 | MFE-NER (Li et al. [19]) | 93.32 | 86.83 | 89.96
 | GatedMan | 94.36 | 93.60 | 93.98
Resume | DAE-NER (Liu et al. [13]) | 96.92 | 95.18 | 96.04
 | MSFM (Liu et al. [15]) | 96.08 | 94.79 | 95.43
 | MFE-NER (Li et al. [19]) | 95.76 | 95.71 | 95.73
 | AELT (Han et al. [20]) | 95.80 | 96.06 | 95.93
 | GatedMan | 97.23 | 95.82 | 96.52
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
