Article

Based on BERT-wwm for Agricultural Named Entity Recognition

Qiang Huang, Youzhi Tao, Zongyuan Wu and Francesco Marinello

1 College of Information Engineering, Sichuan Agricultural University, Ya’an 625099, China
2 Department of Land, Environment, Agriculture and Forestry, University of Padua, 35020 Legnaro, Italy
* Author to whom correspondence should be addressed.
Agronomy 2024, 14(6), 1217; https://doi.org/10.3390/agronomy14061217
Submission received: 19 April 2024 / Revised: 28 May 2024 / Accepted: 1 June 2024 / Published: 4 June 2024
(This article belongs to the Special Issue Smart Farming Technologies for Sustainable Agriculture)

Abstract

With the continuous advancement of information technology in the agricultural field, a large amount of unstructured agricultural text has been generated. This information is crucial for supporting the development of smart agriculture, making named entity recognition in the agricultural domain increasingly important. To improve the accuracy of agricultural entity recognition, this study uses the pre-trained BERT-wwm model to embed the text. In addition, a channel attention (CA) mechanism is introduced into the BILSTM-CRF downstream feature extraction network to more comprehensively capture the contextual features of the text. Experimental results demonstrate that the proposed method significantly improves named entity recognition performance, with higher precision, recall, and F1 scores. The method provides reliable support for downstream tasks such as agricultural knowledge graph construction and question-answering systems and establishes a foundation for better understanding and utilization of agricultural textual information.

1. Introduction

With the rapid advancement of artificial intelligence and other technologies, digital agriculture has experienced significant growth in recent years and has become the primary means for farmers to acquire information [1]. However, the substantial volume of unstructured agricultural text data generated by these services also presents significant challenges in terms of information retrieval and deep semantic relationship mining. Specifically, the rapid retrieval of crucial information from massive datasets has become an urgent requirement for the digitalization of agriculture [2].
Named entity recognition is a natural language processing task that seeks to identify various types of entities from unstructured text. Initially, named entity recognition primarily focused on recognizing proper nouns, such as people’s names, places, and dates, as well as other essential information [3]. As technology has advanced, named entity recognition has progressively found widespread application in various domains, including medicine, aerospace, and agriculture [4,5,6].
Earlier methods for named entity recognition relied on manual rule-based approaches, where domain experts had to construct extraction rules and match them with text to extract entities. For instance, Sari et al. [7] employed a rule-based approach to identify pre-defined entities in a small corpus of accident records. Archana et al. [8] employed rule-based methods to identify named entity boundaries (start and internal mentions), achieving a significant performance improvement in biomedical named entity recognition tasks. However, these methods have limitations when applied to large-scale datasets. They require significant manual effort for rule creation and maintenance, may not be adaptable to complex linguistic structures and contextual dependencies, and lack portability.
To address these challenges, named entity recognition has shifted towards statistical methods that treat the task as a sequence labeling problem. This approach considers that predicting the current label depends not only on the current input features but also on the previously predicted labels. By utilizing statistical models and machine learning algorithms, predictions can be made for each label in the sequence, enabling named entity recognition. Common statistics-based methods for named entity recognition include the Hidden Markov Model (HMM) [9], Conditional Random Field (CRF) [10], and Support Vector Machine (SVM) [11]. For example, Zhang et al. [12] employed an HMM-based approach to recognize cascading and abbreviated entities in the biomedical domain. Song et al. [13] proposed a CRF-based method for web retrieval granularity, which outperformed HMM and other methods in terms of accuracy and recall. Lee et al. [14] introduced a two-phase entity recognition technique based on Support Vector Machines, which involved identifying entity boundaries and semantic classification. However, statistics-based approaches require a substantial amount of labeled data for training and are less effective in recognizing domain-specific or rare entities.
With the widespread adoption of deep learning in natural language processing, neural networks have emerged as a prominent method for named entity recognition, achieving remarkable results. Unlike statistics-based methods, deep learning-based approaches no longer rely on manually defined features. Instead, they feed the text, after word embedding, directly into neural networks, allowing for end-to-end named entity recognition through automatic feature extraction. Collobert et al. [15], for example, proposed a multi-layer architecture based on Convolutional Neural Networks (CNNs) that extracts high-level features from word feature vectors and demonstrated good performance in entity recognition tasks. Chang et al. [16] proposed a hierarchical contextual model that incorporates sentence-level and document-level feature extraction. In sentence-level feature extraction, the model considers the varying contributions of each word to the sentence and employs a self-attention mechanism to extract sentence-level representations. For document-level feature extraction, 3D Convolutional Neural Networks can be utilized to capture not only features within sentences but also the sequential relationships between sentences, thus extracting document-level representations. However, CNNs may struggle to capture dependencies in long sequences.
To address the issue of long-range dependencies, researchers have turned to Recurrent Neural Networks (RNNs) for named entity recognition tasks. However, traditional RNNs face gradient vanishing and explosion problems [17]. To overcome these challenges, gating mechanisms have been introduced, leading to the development of variant RNN models such as the Long Short-Term Memory (LSTM) model. LSTM is effective in avoiding the decay or explosion of gradients, enabling better capture of relationships in long sequences [18,19]. In the realm of LSTM-based entity recognition, Huang et al. [20] validated several models, including the bidirectional LSTM (BILSTM) network, the LSTM with a conditional random field (CRF) layer (LSTM-CRF), and the bidirectional LSTM with a CRF layer (BILSTM-CRF). Experimental results demonstrated that the BILSTM-CRF model effectively captures past and future input features, thereby improving the performance of named entity recognition.
In 2017, Vaswani et al. [21] introduced the self-attention mechanism in the Transformer model, enabling it to effectively capture global dependencies in sequences and significantly enhance the performance of NLP tasks. Subsequently, novel attention mechanisms emerged. Qi et al. [22] further enhanced Chinese entity recognition by combining the extracted embeddings of characters, glyphs, pinyin, and dictionaries using Cross-Attention. Chen et al. [23] proposed a controlled attention mechanism that allows the network to identify entity boundaries and construct semantic dependencies. By incorporating entity cues, the network focuses on task-relevant information while disregarding irrelevant contents.
In recent years, pre-training models like BERT [24] have gained popularity in NLP due to their superior generalization ability. These models are commonly used to embed text and capture word features in different contexts for named entity recognition tasks [25]. In the field of Chinese named entity recognition, Jia et al. [26] proposed an entity augmentation-based BERT pre-training method, which explicitly incorporates document-specific entities into BERT pre-training, leading to improved performance in Chinese entity recognition. However, certain domains of Chinese entity recognition may encounter challenges due to the presence of low-quality training datasets or limited data, which can make it difficult to train models with strong generalization abilities. To address the problem of poor performance in low-resource domains, Liu et al. [27] constructed a high-quality large-scale NER corpus and utilized this dataset to train the NER-BERT model, which demonstrated enhanced entity recognition across different domains.
Considering the significant reliance on labor-intensive and time-consuming manual operations for entity recognition in the agricultural domain, this study aims to overcome this bottleneck by constructing a high-quality corpus specifically designed for agricultural named entity recognition, providing a robust foundation for information extraction within this domain [28]. Building upon this, we introduce the innovative BCA-BILSTM-CRF model, which integrates the deep contextual understanding capabilities of BERT-wwm, the long-distance dependency capturing advantages of BILSTM, and the excellent performance of CRF in sequence labeling [29]. Furthermore, the dynamic feature tuning provided by the channel attention (CA) mechanism effectively enhances the accuracy of entity recognition in complex agricultural texts [30]. Through this comprehensive approach, we not only enrich the semantic representation of Chinese agricultural texts but also achieve fine-grained entity feature capturing and effective modeling of the global context, thus significantly improving the efficiency and accuracy of automated agricultural information processing. The outcomes of this study are expected to bring innovation to applications such as agricultural knowledge graph construction and precision agriculture decision support systems, thereby driving the advancement of agricultural intelligence.
The remainder of this study is organized as follows: Section 2 details the architecture of the proposed model, explains the function of each constituent module, and describes the selection and processing of the experimental data, ensuring research transparency and reproducibility. Section 3 analyzes the experimental results, highlighting the contributions of this study through performance comparisons with multiple benchmark models. Section 4 deepens the discussion by evaluating the model’s performance, comparing it with existing research, and examining its strengths and limitations. Section 5 summarizes the research findings and outlines potential directions for future work.

2. Materials and Methods

2.1. Model Structure

In this study, we present BCA-BILSTM-CRF, a model for agricultural entity recognition that utilizes Chinese BERT-wwm and incorporates channel attention (CA). The model’s structure is illustrated in Figure 1 and consists of four main layers: a BERT-wwm layer, a CA layer, a bidirectional LSTM (BILSTM) layer, and a CRF layer. The BERT-wwm layer maps each word in the input to a high-dimensional vector space, capturing the contextual information of the words. Next, the CA layer adaptively selects and weights the features from each channel, enhancing the recognition of entities. Following that, the BILSTM layer captures word-to-word dependencies and contextual information by considering both past and future contexts. Finally, the CRF layer predicts the categories of the recognized entities.
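To make the data flow through these four layers concrete, the following PyTorch sketch wires them together. It is an illustration of the architecture rather than the authors’ released implementation: the Hugging Face checkpoint name, the use of the pytorch-crf package for the CRF layer, and the ChannelAttention module (sketched after Section 2.3) are all assumptions.

```python
# A minimal sketch of the four-layer BCA-BILSTM-CRF pipeline described above.
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # pip install pytorch-crf (assumed dependency)

class BCABilstmCrf(nn.Module):
    def __init__(self, num_labels, hidden_size=512,
                 bert_name="hfl/chinese-bert-wwm-ext"):  # checkpoint name is an assumption
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)        # BERT-wwm layer
        self.channel_attention = ChannelAttention()              # CA layer (see Section 2.3 sketch)
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, hidden_size,
                              batch_first=True, bidirectional=True)  # BILSTM layer
        self.emission = nn.Linear(2 * hidden_size, num_labels)
        self.crf = CRF(num_labels, batch_first=True)              # CRF layer

    def forward(self, input_ids, attention_mask, labels=None):
        x = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        x = self.channel_attention(x)          # re-weight feature channels
        x, _ = self.bilstm(x)                  # past and future context
        emissions = self.emission(x)
        mask = attention_mask.bool()
        if labels is not None:                 # training: negative log-likelihood
            return -self.crf(emissions, labels, mask=mask)
        return self.crf.decode(emissions, mask=mask)  # inference: best label path
```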

2.2. BERT-wwm Model

In 2018, BERT was introduced for natural language understanding tasks to enhance machines’ comprehension of phrase context. BERT is pre-trained on two main tasks. The first is Next Sentence Prediction (NSP), which predicts whether one sentence directly follows another. The second is the Masked Language Model (MLM), in which 15% of the tokens in a sentence are randomly selected; of these, 80% are replaced with the [MASK] token, 10% are replaced with random words, and the remaining 10% are left unchanged.
In the case of the MLM task, BERT splits Chinese sentences into individual characters and masks them at the character level. However, this approach overlooks the inherent features associated with Chinese words, resulting in suboptimal fusion of word-level features in the Chinese domain. To address this limitation, BERT-wwm introduces a new masking strategy known as whole word masking. The masking strategy employed by BERT-wwm is outlined in Table 1.
In Table 1, the Chinese phrase “卷叶蛾是苹果病虫害” translates to “Leaf roller is an apple pest.” The masking strategy illustrated here is employed by BERT and BERT-wwm models during pre-training to predict the masked tokens. In the provided example, “卷叶蛾” is the target token for masking. For BERT, it masks the individual character “蛾” as it lacks information about Chinese word boundaries. In contrast, BERT-wwm is specifically designed for Chinese text and masks the entire Chinese word “卷叶蛾”.
In the MLM task, BERT-wwm adopts the whole word masking strategy, which suggests that Chinese words should be masked as a single unit rather than being masked based on word segments. By utilizing this approach, BERT-wwm can effectively capture the association features among Chinese words and preserve the semantic and contextual information at the word level. This enables the model to comprehend and represent Chinese vocabulary more accurately. Consequently, encoding Chinese text data using BERT-wwm facilitates better fusion of feature information in Chinese and provides more precise contextual representation.
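The following sketch illustrates the whole word masking idea on the example from Table 1: once a segmented word is chosen for masking, all of its characters are masked together, instead of masking single characters independently as in the original BERT. The random word selection and the masking probability here are purely illustrative.

```python
# A minimal sketch of whole word masking on a pre-segmented Chinese sentence.
import random

def whole_word_mask(segmented_words, mask_prob=0.15):
    masked_chars = []
    for word in segmented_words:
        if random.random() < mask_prob:
            masked_chars.extend(["[MASK]"] * len(word))  # mask every character of the word
        else:
            masked_chars.extend(list(word))
    return masked_chars

# Segmentation from Table 1: 卷叶蛾 / 是 / 苹果 / 病虫害 ("Leaf roller is an apple pest.")
print(whole_word_mask(["卷叶蛾", "是", "苹果", "病虫害"], mask_prob=0.3))
```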
In the present study, we employ BERT-wwm to encode the original Chinese text, as depicted in Figure 2. The special tokens used are CLS, representing the beginning of a sentence, and SEP, used to distinguish different sentences. The input Chinese text utilizes Position Embeddings to indicate the positional information of each word in the input sequence. Segmentation Embeddings are employed to signify the separation between different sentences. Token Embeddings are used to map each word in the text to its corresponding word vector. Finally, the three types of embeddings are summed together to generate the output vector (E1 to E9) representation of the BERT model.
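As a usage sketch, the snippet below encodes the example sentence with a publicly available BERT-wwm checkpoint via the Hugging Face transformers library; the checkpoint name is an assumption, not necessarily the weights used in this study. The tokenizer adds the [CLS] and [SEP] tokens, and the model internally sums the token, segment, and position embeddings before the Transformer layers.

```python
# A minimal sketch of encoding Chinese text with a BERT-wwm checkpoint.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm-ext")  # assumed checkpoint
model = BertModel.from_pretrained("hfl/chinese-bert-wwm-ext")

encoded = tokenizer("卷叶蛾是苹果病虫害", return_tensors="pt")  # adds [CLS] and [SEP]
with torch.no_grad():
    outputs = model(**encoded)

print(encoded["input_ids"].shape)        # (1, sequence length)
print(outputs.last_hidden_state.shape)   # (1, sequence length, 768)
```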

2.3. Channel Attention

The introduction of the attention mechanism enables the model to capture dependencies between different positions within a sequence regardless of its length. This empowers the model to capture long-range dependencies and enhances its understanding of contextual information. To further improve the representation capability and modeling accuracy, multiple self-attention modules can be stacked together, with each module learning a distinct representation. The CA mechanism recognizes that relevant dependencies exist between different channels of the input data. The CA module is designed to capture feature relationships across channels, improving the model’s ability to capture channel-specific information. The structure of this module is illustrated in Figure 3, where the two operators denote matrix multiplication and element-wise summation, respectively.
Given an input feature map A of size C × H × W, it is initially reshaped into shapes C × N and N × C, where N = H × W. Subsequently, a matrix multiplication operation is performed on the reshaped tensors. Finally, the resulting tensor is passed through a softmax layer to obtain a feature map X of size C × C:
$$x_{ji} = \frac{\exp(A_i \cdot A_j)}{\sum_{i=1}^{C} \exp(A_i \cdot A_j)} \quad (1)$$
In Formula (1), $x_{ji}$ represents the influence of the $i$-th channel on the $j$-th channel, $A_i$ represents the $i$-th row of the reshaped feature map, and $A_j$ represents the $j$-th column of the reshaped feature map. Additionally, matrix multiplication is performed between X and the transpose of A before reshaping the result into $\mathbb{R}^{C \times H \times W}$. Finally, the result is multiplied by the scale parameter and element-wise summed with A to obtain the final output $E \in \mathbb{R}^{C \times H \times W}$:
$$E_j = \beta \sum_{i=1}^{C} (x_{ji} A_i) + A_j \quad (2)$$
where $\beta$ is the scale parameter, whose weight is gradually learned starting from 0. Formula (2) indicates that the final feature of each channel is a weighted sum of the features of all channels plus the original features.
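A minimal PyTorch sketch of this module, following Formulas (1) and (2), is given below. The original formulation is written for a C × H × W feature map; here it is applied to a (batch, sequence length, channels) text tensor, treating the BERT hidden dimensions as channels. This adaptation, like the module itself, is a sketch under stated assumptions rather than the authors’ exact implementation.

```python
# A minimal sketch of the channel attention (CA) module for text features.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self):
        super().__init__()
        # scale parameter beta, learned starting from 0 as stated after Formula (2)
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, x):                               # x: (batch, seq_len, channels)
        a = x.transpose(1, 2)                           # A reshaped to (batch, C, N), N = seq_len
        energy = torch.bmm(a, a.transpose(1, 2))        # A_i · A_j  -> (batch, C, C)
        attention = torch.softmax(energy, dim=-1)       # x_ji, Formula (1)
        out = torch.bmm(attention, a)                   # weighted sum over channels
        out = self.beta * out + a                       # Formula (2)
        return out.transpose(1, 2)                      # back to (batch, seq_len, channels)
```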

2.4. BILSTM Model

Traditional RNNs have certain limitations when it comes to processing long sequences. They can only process sequences in one direction and are more susceptible to issues like gradient vanishing or exploding as the sequence length increases. To overcome these challenges, the LSTM network was introduced as a variant of the RNN. The LSTM network addresses the problems of gradient vanishing and exploding in traditional RNNs by incorporating a gate mechanism. This mechanism allows the network to selectively retain or forget information based on the context, ensuring better long-term dependency modeling. Additionally, a bidirectional LSTM network can be employed, which considers both the forward and reverse information of the text simultaneously. This enables a more comprehensive capture of text features. The structure diagram of LSTM is depicted in Figure 4. The main formulas of the LSTM network are as follows:
$$f_t = \sigma(W_f[h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i[h_{t-1}, x_t] + b_i)$$
$$\tilde{c}_t = \tanh(W_c[h_{t-1}, x_t] + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = \sigma(W_o[h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(c_t)$$
where $\sigma$ is the Sigmoid activation function, $\tanh$ is the hyperbolic tangent activation function, and $f_t$, $i_t$, $o_t$, and $c_t$ denote the states of the forget gate, input gate, output gate, and memory cell at time $t$. $W_f$, $W_i$, $W_o$, and $W_c$ are learnable weight matrices; $b_f$, $b_i$, $b_c$, and $b_o$ are bias vectors; $\tilde{c}_t$ is the candidate memory state; $x_t$ is the input vector at time $t$; $h_t$ is the output at time $t$; and $\odot$ denotes element-wise multiplication.
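For illustration, the snippet below implements a single LSTM step directly from these equations; in practice, PyTorch’s nn.LSTM (with bidirectional=True for the BILSTM used in this work) performs the same computation, so this sketch is purely explanatory.

```python
# A minimal sketch of one LSTM step following the gate equations above.
import torch

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    z = torch.cat([h_prev, x_t], dim=-1)          # [h_{t-1}, x_t]
    f_t = torch.sigmoid(z @ W_f.T + b_f)          # forget gate
    i_t = torch.sigmoid(z @ W_i.T + b_i)          # input gate
    c_tilde = torch.tanh(z @ W_c.T + b_c)         # candidate memory state
    c_t = f_t * c_prev + i_t * c_tilde            # memory cell update
    o_t = torch.sigmoid(z @ W_o.T + b_o)          # output gate
    h_t = o_t * torch.tanh(c_t)                   # hidden output
    return h_t, c_t
```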

2.5. Conditional Random Field Model

While the LSTM network is effective in processing long sequences, it faces challenges in capturing dependencies between adjacent text labels in entity recognition tasks. For instance, in entity recognition, the “I” label should always follow the “B” label. However, the LSTM network may generate outputs where the “I” label appears before the “B” label. To address this issue, a Conditional Random Field (CRF) layer can be added after the LSTM network to identify the optimal label sequence globally. The CRF is a probabilistic graphical model commonly used in sequence labeling tasks. It calculates the output probabilities based on various factors, including the input features and label dependencies. The output probability formula for CRF is as follows:
$$\mathrm{Score}(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}$$
Here, $X$ represents the input sequence to the CRF layer, and $n$ is the length of the sequence. $A_{y_i, y_{i+1}}$ denotes the transition score from label $y_i$ to label $y_{i+1}$, while $P_{i, y_i}$ represents the score of the $i$-th word being assigned label $y_i$. These scores are used to compute the conditional probability of each candidate sequence via the softmax function. The highest-scoring sequence, denoted $Z$, is then chosen as the output of the CRF layer:
$$Z = \arg\max_{y} \mathrm{Score}(X, y)$$
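The sketch below spells out this scoring function: the score of a label sequence is the sum of emission scores P and transition scores A, and the best sequence is the argmax over all label paths (start and end transitions are omitted for brevity). A real CRF layer, such as the one in pytorch-crf, finds this argmax efficiently with Viterbi decoding; the brute-force search here is only for illustration on short sequences.

```python
# A minimal sketch of CRF sequence scoring and brute-force decoding.
import itertools
import torch

def sequence_score(emissions, transitions, labels):
    """Score(X, y): sum of emission scores P_{i,y_i} plus transition scores A_{y_i,y_{i+1}}."""
    idx = torch.arange(len(labels))
    score = emissions[idx, torch.tensor(labels)].sum()       # emission part
    for prev, curr in zip(labels[:-1], labels[1:]):
        score = score + transitions[prev, curr]              # transition part
    return score

def brute_force_decode(emissions, transitions):
    """Z = argmax_y Score(X, y); Viterbi computes this without enumerating all paths."""
    n, num_labels = emissions.shape
    return list(max(itertools.product(range(num_labels), repeat=n),
                    key=lambda y: sequence_score(emissions, transitions, list(y))))
```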

2.6. Experimental Method Design

2.6.1. Dataset

In this study, two public datasets and a private agricultural dataset were utilized to evaluate the effectiveness of the proposed model. The MSRA dataset, which focuses on news articles, is a Chinese entity recognition dataset that includes entity types such as place names (LOC), person names (PER), and organization names (ORG) [31]. The People’s Daily NER dataset, generated from the People’s Daily corpus from 1998 and 2014 editions, also contains three common entity types: PER, LOC, and ORG.
However, in the domain of agricultural entity recognition, there is a lack of large-scale Chinese public datasets. Given the vast range of crop types and the agricultural knowledge involved, constructing agricultural datasets presents certain challenges. To address this, we employed web crawling technology to gather text data from sources such as the China Crop Resource Information Network and the Agricultural Cultivation Technology Forum, among other professional agricultural consulting websites. We then applied data cleansing methods, including the removal of special characters and non-text characters, to preprocess the data. Finally, all the collected data were merged to create a 200,000-character agricultural entity recognition dataset called AgriNER.
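The exact cleansing rules used to build AgriNER are not specified, so the snippet below is only an illustrative sketch of the kind of preprocessing described above: stripping residual HTML tags, whitespace, and non-text symbols from crawled pages before annotation.

```python
# A minimal, illustrative sketch of text cleansing for crawled agricultural pages.
import re

def clean_text(raw: str) -> str:
    text = re.sub(r"<[^>]+>", "", raw)            # drop residual HTML tags
    text = re.sub(r"[\r\n\t\u3000 ]+", "", text)  # collapse whitespace
    # keep Chinese characters, alphanumerics, and common punctuation; drop other symbols
    text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9，。、；：！？%().\-]", "", text)
    return text

print(clean_text("<p>卷叶蛾是苹果病虫害 ★</p>"))  # -> 卷叶蛾是苹果病虫害
```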
AgriNER comprises eight entity categories, including over 120 crop types, more than 30 pests and diseases, and over 50 control methods, distribution areas, suitable temperatures, hazardous areas, onset periods, and onset symptoms. The entity categories were labeled using the BIO entity labeling scheme, where “B” indicates the beginning of an entity, “I” indicates the inside of an entity, and “O” indicates the non-entity part. Figure 5 illustrates the labeling process, where two entity categories, pests and diseases (DIS) and crops (CROP), are involved. For example, the beginning of the pest and disease entity “leaf roller moth” is labeled as B-DIS, and the remaining part of the entity is labeled as I-DIS. Similarly, the beginning of the crop entity “apple” is labeled as B-CROP, with the rest of the entity labeled as I-CROP. Non-entity parts of the sentence are labeled as O. The dataset was then divided into training, testing, and validation sets in an 8:1:1 ratio.
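The following sketch reproduces the character-level BIO labeling of Figure 5, assuming the same example sentence as Table 1 and the two illustrated categories DIS and CROP; the span positions are written out by hand for clarity.

```python
# A minimal sketch of character-level BIO tagging for the Figure 5 example.
sentence = "卷叶蛾是苹果病虫害"               # "Leaf roller is an apple pest."
entities = [(0, 3, "DIS"), (4, 6, "CROP")]   # (start, end, label) character spans

tags = ["O"] * len(sentence)
for start, end, label in entities:
    tags[start] = f"B-{label}"               # beginning of the entity
    for i in range(start + 1, end):
        tags[i] = f"I-{label}"               # inside of the entity

for ch, tag in zip(sentence, tags):
    print(ch, tag)
# 卷 B-DIS / 叶 I-DIS / 蛾 I-DIS / 是 O / 苹 B-CROP / 果 I-CROP / 病 O / 虫 O / 害 O
```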

2.6.2. Parameter Setting and Evaluation Metrics

In this research paper, the experimental section employs the pre-trained BERT-wwm model to encode the text data. Subsequently, a CA-BILSTM-CRF network is utilized to extract features from the encoded text and perform label classification. The model’s key parameters are set according to Table 2, which specifies their respective values.
In the experimental setup, each training batch consists of 64 samples. The initial learning rate of the model is set to 0.001. The maximum input sequence length for BERT-wwm is set to 100. The hidden layer size of the BILSTM is 512. The learning rate decay is set to 0.8. The training process is conducted for a total of 30 epochs. Figure 6 illustrates the loss value curves of all models during the training process. It is observed that the loss values tend to stabilize and become smoother after training for approximately 20 epochs on all datasets.
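The sketch below collects the Table 2 hyperparameters into a training setup, reusing the BCABilstmCrf sketch from Section 2.1. Only the hyperparameter values come from the paper; the Adam optimizer, the per-epoch exponential learning-rate decay, and the label count (eight entity categories under BIO, i.e., 17 tags) are assumptions made for illustration.

```python
# A minimal sketch of the training configuration listed in Table 2.
import torch

config = {
    "batch_size": 64,
    "epochs": 30,
    "max_length": 100,    # maximum BERT-wwm input sequence length
    "lr": 1e-3,
    "hidden_size": 512,   # BILSTM hidden layer size
    "lr_decay": 0.8,
}

model = BCABilstmCrf(num_labels=17, hidden_size=config["hidden_size"])  # label count is illustrative
optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])        # optimizer choice assumed
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=config["lr_decay"])

for epoch in range(config["epochs"]):
    # ... iterate over training batches, compute the CRF loss, and back-propagate ...
    scheduler.step()  # decay the learning rate by 0.8 after each epoch
```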
This research paper adopts commonly used evaluation metrics in the field of NLP, namely, precision rate, recall rate, and F1 value.
The precision rate measures the accuracy of positive examples in the model’s prediction results. It is calculated using the following formula:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
The recall rate evaluates the model’s ability to cover positive examples. It is calculated using the following formula:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
The F1 value is a combined metric that assesses the overall performance of the model by considering both precision and recall.
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
Here, $TP$ (True Positive) is the number of samples that the model correctly predicts as positive examples, $FP$ (False Positive) is the number of samples that the model incorrectly predicts as positive examples, and $FN$ (False Negative) is the number of positive examples that the model fails to identify.
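A minimal sketch of these three metrics is given below. For NER they are usually computed at the entity level, counting a prediction as correct only when both the span boundary and the type match; that span-matching convention is an assumption here, since the paper does not state its exact evaluation script.

```python
# A minimal sketch of entity-level precision, recall, and F1.
def precision_recall_f1(predicted_entities, gold_entities):
    predicted, gold = set(predicted_entities), set(gold_entities)
    tp = len(predicted & gold)                 # correctly predicted entities
    fp = len(predicted - gold)                 # spurious predictions
    fn = len(gold - predicted)                 # missed entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: entities as (start, end, type) spans
print(precision_recall_f1({(0, 3, "DIS"), (4, 6, "CROP")}, {(0, 3, "DIS")}))
# -> (0.5, 1.0, 0.666...)
```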

3. Results

3.1. Results on Dataset

In this study, we conducted a comparative analysis of four named entity recognition models, i.e., BERT-BILSTM-CRF, BERT-wwm-BILSTM-CRF, BERT-CA-BILSTM-CRF, and BCA-BILSTM-CRF, across three different datasets—MSRA, People’s Daily NER, and AgriNER. The objective was to comprehensively evaluate and understand the performance of these models in different text types. Evaluation metrics included precision, recall, and F1 scores, which served as benchmarks to measure the accuracy and completeness of the models in identifying named entities. The results are presented in Table 3.
For the MSRA dataset, the BCA-BILSTM-CRF model demonstrated superior performance, with precision of 0.898, recall of 0.925, and an F1 score of 0.912, outperforming the other models. Compared to the baseline model, the BERT-wwm-BILSTM-CRF model incorporating BERT-wwm achieved an F1 score improvement of 0.9 percentage points, while the BERT-CA-BILSTM-CRF model incorporating the CA attention mechanism achieved an F1 score improvement of 3.7 percentage points. When both modules were incorporated simultaneously, the F1 score improved by 4.3 percentage points compared to the base model. This indicates a significant enhancement in precision and recall by integrating the BERT-wwm pre-training model and CA mechanism.
For the People’s Daily NER dataset, the BCA-BILSTM-CRF model maintained its leading position with precision of 0.894, recall of 0.911, and an F1 score of 0.912. Compared to the BERT-BILSTM-CRF model, the F1 score also increased by 2.6 percentage points.
For the AgriNER dataset, although the performance of all models declined compared to the previous two datasets, reflecting the challenges in text recognition in the agricultural domain, the BCA-BILSTM-CRF model showcased excellent performance with recall of 0.801 and an F1 score of 0.733. This represents an improvement of approximately 3.7 percentage points in the F1 score compared to the BERT-BILSTM-CRF model.
In conclusion, the BCA-BILSTM-CRF model, which integrates the BERT-wwm pre-training and CA mechanisms, demonstrates significant improvements in the accuracy and generalization capabilities of named entity recognition tasks.
Figure 7 presents a heatmap visualization of the F1 scores of various models, illustrating their comprehensive performance. The color bar on the right side represents the F1 values, where darker colors (lower values) indicate poorer performance, while lighter colors (higher values) indicate better performance. The x-axis represents different datasets, while the y-axis represents different models. The figure clearly demonstrates that the proposed model in this study achieved high scores across all datasets, showcasing its exceptional performance.

3.2. Comparison of Results under Different Attention Mechanisms

To further validate the effectiveness of the proposed model, this study compares two variants of the downstream network: BERT-wwm-MBILSTM-CRF, which uses a multi-head self-attention mechanism, and the proposed model, which uses channel attention. The results of this comparison are presented in Figure 8.
The figure distinguishes the precision, recall, and F1 scores of the models using different color codes. The experiment covers three separate datasets, and the results show that the model integrating the CA mechanism surpasses the model using the multi-head attention mechanism on all three key evaluation metrics. This suggests that the CA mechanism enhances the precision, recall, and F1 scores of the model across diverse datasets.

4. Discussion

In this study, we propose a novel model aimed at addressing the challenges in agricultural entity recognition, such as diverse naming methods, blurred entity boundaries, insufficient feature extraction, and inconsistent labeling of entity boundaries [32]. While BERT has shown promising results in encoding languages like Persian and Italian [33,34], its ability to encode Chinese text is limited. To overcome this limitation, we utilize BERT-wwm, which is specifically designed for Chinese text encoding. Experimental results on multiple datasets demonstrate the effectiveness of the BERT-wwm model in improving Chinese entity recognition tasks.
Although LSTM models have memory units and gating mechanisms to handle long-range dependencies, they lack the ability to dynamically adjust the model’s attention based on contextual information [35]. Therefore, we incorporate the channel attention mechanism to extract text features more effectively. Unlike traditional multi-head attention mechanisms that directly capture features from different subspaces, the CA mechanism automatically learns which channels contribute more to the task, thereby enhancing the network’s feature representation capability. The experimental results show that the baseline model achieved F1 score improvements of 3.7%, 2.4%, and 3.3% on three datasets after incorporating the channel attention (see Section 3.1). These results demonstrate the effectiveness of the channel attention mechanism in improving model performance.
It is worth noting that the overall evaluation metrics of the models on the agricultural named entity recognition dataset are lower compared to the performance for public datasets. We conducted an analysis to identify potential reasons for this disparity. Firstly, the agricultural domain dataset may be relatively smaller or lack diversity compared to the public datasets, which are typically larger and more diverse. The smaller dataset size and lack of diversity might lead to poor generalization ability and performance degradation for the agricultural dataset. Secondly, the agricultural named entity recognition dataset may have a different data distribution compared to the public datasets. Agricultural named entities often involve specific terms, specialized terminology, or geographical locations, resulting in limited quantities of certain entity categories in the data, which can negatively impact the performance.
Considering the limitations observed in the performance of the models on the agricultural named entity recognition dataset, future research can focus on several key directions. Firstly, expanding specialized agricultural datasets by collecting more diverse text resources and increasing annotated data can improve the model’s generalization ability and mitigate performance degradation due to data scarcity. Secondly, designing or optimizing preprocessing and feature selection strategies that cater to the unique distribution characteristics of agricultural data can better capture agricultural domain-specific terms, specific nouns, and geographical information, while balancing the data distribution across different categories and improving recognition accuracy on minority classes. Lastly, given the successful application of large-scale models in various industries, the next step would involve fine-tuning these models on a large-scale dataset specifically tailored to the agricultural domain, aiming to improve entity recognition accuracy while maintaining broad applicability [36,37].

5. Conclusions

In this study, we propose a novel agricultural named entity recognition method called BCA-BILSTM-CRF based on BERT-wwm. We conducted comprehensive experimental evaluations on three datasets: MSRA, People’s Daily, and agricultural entity recognition. Through comparative experiments and performance analysis, we draw the following conclusions:
Firstly, our proposed BCA-BILSTM-CRF model demonstrates excellent performance in named entity recognition tasks in the agricultural domain. Compared to traditional methods, our model achieves significant improvements in evaluation metrics such as precision, recall, and F1 value. This highlights the strong adaptability and generalization ability of our BCA-BILSTM-CRF model in processing agricultural text data.
Secondly, our model exhibits stable performance across different datasets. Through experimental evaluations on MSRA, People’s Daily, and agricultural entity recognition datasets, we observe that our model accurately recognizes named entities in texts from diverse domains.
Furthermore, we conduct comparative experiments with other named entity recognition methods. The results demonstrate that our model achieves comparable or superior performance for named entity recognition tasks in the agricultural domain when compared to alternative methods.
In summary, this research provides an effective solution for named entity recognition in the agricultural field. The BCA-BILSTM-CRF model exhibits superior performance and robustness. This study holds practical value and promising application prospects for agricultural information processing and related tasks in the agricultural domain.

Author Contributions

Conceptualization, Q.H. and Y.T.; methodology, Q.H. and Y.T.; software, Y.T.; validation, Z.W. and F.M.; formal analysis, Q.H.; investigation, Y.T.; resources, Z.W.; data curation, Y.T. and Z.W.; writing—original draft preparation, Y.T. and Q.H.; writing—review and editing, Q.H., Y.T. and F.M.; visualization, Y.T.; supervision, Q.H. and Z.W.; project administration, Q.H. and F.M.; funding acquisition, Q.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset constructed for this study is available at “https://github.com/TYZ-001/Ner_dataset (accessed on 21 May 2024)” and can be shared upon request.

Acknowledgments

We thank Jinrong Chen and Yinghao He for their help during the research process.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Fountas, S.; Espejo-García, B.; Kasimati, A.; Mylonas, N.; Darra, N. The future of digital agriculture: Technologies and opportunities. IT Prof. 2020, 22, 24–28. [Google Scholar] [CrossRef]
  2. Zhao, P.; Wang, W.; Liu, H.; Han, M. Recognition of the agricultural named entities with multifeature fusion based on albert. IEEE Access 2022, 10, 98936–98943. [Google Scholar] [CrossRef]
  3. Asgari-Chenaghlu, M.; Feizi-Derakhshi, M.R.; Farzinvash, L.; Balafar, M.; Motamed, C. CWI: A multimodal deep learning approach for named entity recognition from social media using character, word and image features. Neural Comput. Appl. 2022, 34, 1905–1922. [Google Scholar] [CrossRef]
  4. Baigang, M.; Yi, F. A review: Development of named entity recognition (NER) technology for aeronautical information intelligence. Artif. Intell. Rev. 2023, 56, 1515–1542. [Google Scholar] [CrossRef]
  5. Guo, X.; Zhou, H.; Su, J.; Hao, X.; Tang, Z.; Diao, L.; Li, L. Chinese agricultural diseases and pests named entity recognition with multi-scale local context features and self-attention mechanism. Comput. Electron. Agric. 2020, 179, 105830. [Google Scholar] [CrossRef]
  6. Tikayat Ray, A.; Pinon-Fischer, O.J.; Mavris, D.N.; White, R.T.; Cole, B.F. aeroBERT-NER: Named-entity recognition for aerospace requirements engineering using BERT. In Proceedings of the AIAA SCITECH 2023 Forum, National Harbor, MD, USA, 23–27 January 2023; p. 2583. [Google Scholar]
  7. Sari, Y.; Hassan, M.F.; Zamin, N. Rule-based pattern extractor and named entity recognition: A hybrid approach. In Proceedings of the 2010 International Symposium on Information Technology, Kuala Lumpur, Malaysia, 15–17 June 2010; pp. 563–568. [Google Scholar]
  8. Archana, S.; Prakash, J.; Singh, P.K.; Ahmed, W. An Effective Biomedical Named Entity Recognition by Handling Imbalanced Data Sets Using Deep Learning and Rule-Based Methods. SN Comput. Sci. 2023, 4, 650. [Google Scholar] [CrossRef]
  9. Pande, S.D.; Kanna, R.K.; Qureshi, I. Natural language processing based on name entity with n-gram classifier machine learning process through ge-based hidden markov model. Mach. Learn. Appl. Eng. Educ. Manag. 2022, 2, 30–39. [Google Scholar]
  10. Sharma, R.; Morwal, S.; Agarwal, B. Named entity recognition using neural language model and CRF for Hindi language. Comput. Speech Lang. 2022, 74, 101356. [Google Scholar] [CrossRef]
  11. Hamad, R.M.; Abushaala, A.M. Medical Named Entity Recognition in Arabic Text using SVM. In Proceedings of the 2023 IEEE 3rd International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering (MI-STA), Benghazi, Libya, 21–23 May 2023; pp. 200–205. [Google Scholar]
  12. Zhang, J.; Shen, D.; Zhou, G.; Su, J.; Tan, C.-L. Enhancing HMM-based biomedical named entity recognition by studying special phenomena. J. Biomed. Inform. 2004, 37, 411–422. [Google Scholar] [CrossRef]
  13. Song, S.; Zhang, N.; Huang, H. Named entity recognition based on conditional random fields. Clust. Comput. 2019, 22, 5195–5206. [Google Scholar] [CrossRef]
  14. Lee, K.-J.; Hwang, Y.-S.; Kim, S.; Rim, H.-C. Biomedical named entity recognition using two-phase model based on SVMs. J. Biomed. Inform. 2004, 37, 436–447. [Google Scholar] [CrossRef]
  15. Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 2011, 12, 2493–2537. [Google Scholar]
  16. Chang, J.; Han, X. Multi-level context features extraction for named entity recognition. Comput. Speech Lang. 2023, 77, 101412. [Google Scholar] [CrossRef]
  17. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1310–1318. [Google Scholar]
  18. Suganthi, M.; Arun Prakash, R. An offline English optical character recognition and NER using LSTM and adaptive neuro-fuzzy inference system. J. Intell. Fuzzy Syst. 2023, 44, 3877–3890. [Google Scholar] [CrossRef]
  19. DiPietro, R.; Hager, G.D. Deep learning: RNNs and LSTM. In Handbook of Medical Image Computing and Computer Assisted Intervention; Elsevier: Amsterdam, The Netherlands, 2020; pp. 503–519. [Google Scholar]
  20. Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv 2015, arXiv:1508.01991. [Google Scholar]
  21. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
  22. Qi, P.; Li, P.; Qin, B. CLGP: Multi-Feature Embedding based Cross-Attention for Chinese NER. In Proceedings of the 2023 International Conference on Communications, Computing and Artificial Intelligence (CCCAI), Shanghai, China, 23–25 June 2023; pp. 109–113. [Google Scholar]
  23. Chen, Y.; Huang, R.; Pan, L.; Huang, R.; Zheng, Q.; Chen, P. A Controlled Attention for Nested Named Entity Recognition. Cogn. Comput. 2023, 15, 132–145. [Google Scholar] [CrossRef]
  24. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  25. Jia, C.; Shi, Y.; Yang, Q.; Zhang, Y. Entity enhanced BERT pre-training for Chinese NER. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 6384–6396. [Google Scholar]
  26. Jia, C.; Liang, X.; Zhang, Y. Cross-domain NER using cross-domain language modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2464–2474. [Google Scholar]
  27. Liu, Z.; Jiang, F.; Hu, Y.; Shi, C.; Fung, P. NER-BERT: A pre-trained model for low-resource entity tagging. arXiv 2021, arXiv:2112.00405. [Google Scholar]
  28. Wang, L.; Jiang, J.; Song, J.; Liu, J. A Weakly-Supervised Method for Named Entity Recognition of Agricultural Knowledge Graph. Intell. Autom. Soft Comput. 2023, 37, 833–848. [Google Scholar] [CrossRef]
  29. VeeraSekharReddy, B.; Rao, K.S.; Koppula, N. Named Entity Recognition using CRF with Active Learning Algorithm in English Texts. In Proceedings of the 2022 6th International Conference on Electronics, Communication and Aerospace Technology, Coimbatore, India, 1–3 December 2022; pp. 1041–1044. [Google Scholar]
  30. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
  31. Levow, G.-A. The third international Chinese language processing bakeoff: Word segmentation and named entity recognition. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, Sydney, Australia, 22–23 July 2006; pp. 108–117. [Google Scholar]
  32. Wang, C.; Gao, J.; Rao, H.; Chen, A.; He, J.; Jiao, J.; Zou, N.; Gu, L. Named entity recognition (NER) for Chinese agricultural diseases and pests based on discourse topic and attention mechanism. Evol. Intell. 2024, 17, 457–466. [Google Scholar] [CrossRef]
  33. Masumi, M.; Majd, S.S.; Shamsfard, M.; Beigy, H. FaBERT: Pre-training BERT on Persian Blogs. arXiv 2024, arXiv:2402.06617. [Google Scholar]
  34. Licari, D.; Comandè, G. ITALIAN-LEGAL-BERT models for improving natural language processing tasks in the Italian legal domain. Comput. Law Secur. Rev. 2024, 52, 105908. [Google Scholar] [CrossRef]
  35. Qian, Y.; Chen, X.; Wang, Y.; Zhao, J.; Ouyang, D.; Dong, S.; Huang, L. Agricultural text named entity recognition based on the BiLSTM-CRF model. In Proceedings of the Fifth International Conference on Computer Information Science and Artificial Intelligence (CISAI 2022), Chongqing, China, 28 March 2023; pp. 525–530. [Google Scholar]
  36. Wang, S.; Sun, X.; Li, X.; Ouyang, R.; Wu, F.; Zhang, T.; Li, J.; Wang, G. Gpt-ner: Named entity recognition via large language models. arXiv 2023, arXiv:2304.10428. [Google Scholar]
  37. Majdik, Z.P.; Graham, S.S.; Shiva Edward, J.C.; Rodriguez, S.N.; Karnes, M.S.; Jensen, J.T.; Barbour, J.B.; Rousseau, J.F. Sample Size Considerations for Fine-Tuning Large Language Models for Named Entity Recognition Tasks: Methodological Study. JMIR AI 2024, 3, e52095. [Google Scholar] [CrossRef]
Figure 1. BCA-BILSTM-CRF agricultural named entity recognition model.
Figure 2. BERT-wwm encoding model.
Figure 3. Channel attention structure.
Figure 4. LSTM model structure.
Figure 5. Agriculture data annotation.
Figure 6. Loss curve for model training.
Figure 7. F1 comparison for datasets.
Figure 8. Comparison of different attention evaluation indicators.
Table 1. The masking strategy of BERT and BERT-wwm.

Illustration                   Sample
Original text                  卷叶蛾是苹果病虫害
Segmented text                 卷叶蛾 是 苹果 病虫害
BERT masking strategy          卷叶[MASK]是苹果病虫害
BERT-wwm masking strategy      [MASK] [MASK] [MASK]是苹果病虫害
Table 2. Model parameter setup.

Parameter        Value
Batch size       64
Epoch            30
Max length       100
Lr               0.001
Hidden size      512
Lr decay rate    0.8
Table 3. Comparison of experimental results on different datasets.

Model                    MSRA                      People's Daily NER        AgriNER
                         P      R      F1         P      R      F1          P      R      F1
BERT-BILSTM-CRF          0.841  0.897  0.869      0.860  0.911  0.886       0.645  0.748  0.696
BERT-wwm-BILSTM-CRF      0.860  0.896  0.878      0.854  0.912  0.883       0.664  0.752  0.707
BERT-CA-BILSTM-CRF       0.893  0.919  0.906      0.893  0.924  0.910       0.691  0.775  0.731
BCA-BILSTM-CRF           0.898  0.925  0.912      0.894  0.911  0.912       0.675  0.801  0.733
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

