Cross-Domain Tibetan Named Entity Recognition via Large Language Models
Abstract
1. Introduction
- A Tibetan cross-domain NER model is proposed that learns shared knowledge across domains and supports the extraction and recognition of entity types from various fields, decoupling the data from the model.
- Adaptive structured pruning based on domain-dependent prompts is proposed, which significantly reduces the model’s memory requirements and improves inference efficiency with minimal impact on model performance.
- The experimental results show that our cross-domain Tibetan NER model performs strongly on Tibetan NER tasks across different domains, significantly surpassing existing baseline methods. It also generalizes well, adapting readily to NER tasks in other low-resource languages.
2. Related Work
2.1. Prompting-Based Approaches for NLP
2.2. Deep Learning Approaches for NER
2.3. Optimization Methods for Large Language Models
3. Materials and Methods
3.1. Task Redefining
3.2. Model Architecture
3.3. Staged Adaptation of Large Language Model
3.4. Cross-Domain Joint Learning
- Multi-Domain Prompt: The training uses all entities from all domains as prompts.
- Domain-Dependent Prompt + Exact Match: During the training phase, the original samples are prefixed in two different styles (resulting in two training samples). One approach uses the previously mentioned domain-dependent prompt, while the other employs exact entities from the ground truth as prefixed prompts (i.e., exact match). The purpose of introducing the “exact match” strategy in training is to incorporate precise information, helping the model narrow the entity space and thereby reduce the complexity of the training process.
- Domain-Dependent Prompt: The training uses all entities from the specific domain as prompts (a minimal sketch of the three set-ups follows this list).
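The snippet below sketches how the three prefixed-prompt set-ups could be assembled into training inputs. The prompt template, the per-domain entity lists, and the helper names are illustrative assumptions rather than the authors’ released code, and whether the prefix lists entity mentions or entity categories is likewise an assumption here.

```python
# Illustrative sketch of the three prefixed-prompt set-ups used during
# cross-domain joint learning. The template wording and the per-domain
# entity lists are placeholders, not the paper's actual prompts.

DOMAIN_ENTITIES = {
    "TiMed":  ["<medical entity 1>", "<medical entity 2>"],   # assumed domain entity list
    "TiNews": ["<news entity 1>", "<news entity 2>"],         # assumed domain entity list
}

def multi_domain_prompt(sentence: str) -> str:
    """Prefix the sample with the entities of *all* domains."""
    pool = [e for entities in DOMAIN_ENTITIES.values() for e in entities]
    return f"Candidate entities: {', '.join(pool)}. Input: {sentence}"

def domain_dependent_prompt(sentence: str, domain: str) -> str:
    """Prefix the sample with the entities of its own domain only."""
    return f"Candidate entities: {', '.join(DOMAIN_ENTITIES[domain])}. Input: {sentence}"

def exact_match_prompt(sentence: str, gold_entities: list[str]) -> str:
    """Prefix the sample with the exact gold entities (training only)."""
    return f"Candidate entities: {', '.join(gold_entities)}. Input: {sentence}"

def exact_match_training_pair(sentence: str, domain: str, gold_entities: list[str]) -> list[str]:
    """'Domain-dependent prompt + exact match': two training samples per original."""
    return [domain_dependent_prompt(sentence, domain),
            exact_match_prompt(sentence, gold_entities)]
```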
3.5. Adaptive Structured Pruning Based on Domain-Dependent Prompt
- Prompt phase based on domain-dependent prompt: Our expert neurons are selected at the sequence level, so neuron selection must account for the dynamics of the entire input sequence rather than individual tokens. Consistent with the previously defined input format, the domain-dependent prompt introduced during the prompting phase not only extends the input sequence but also injects domain-specific information, providing clear and targeted guidance for the pruning process. This mechanism explicitly integrates domain knowledge into the input sequence, enabling the model to better understand contextual semantics, reduce cross-domain semantic interference, and capture domain-specific features. To select expert neurons, we use a statistic that reflects the importance of each neuron. During the prompt phase, this is obtained by computing the $\ell_2$-norm of the intermediate feed-forward activations $Z$ along the token axis, $s_j = \lVert Z_{:,j} \rVert_2$. By selecting the top-$k$ indices of $s$, we determine the neurons used for the generation phase and form the set $E$. Using the expert neurons in $E$, we obtain the previously mentioned pruned projection matrices by selecting the corresponding rows and columns of the original feed-forward weight matrices.
- Generation phase: When generating tokens, we use the pruned feed-forward layers containing only the expert neurons in $E$ to approximate the full feed-forward computation for all subsequent tokens (see the sketch after this list).
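As an illustration of the two phases, the sketch below selects expert neurons for a single gated feed-forward block during the prompt phase and reuses them during generation. It assumes a Llama2-style FFN with gate, up, and down projections; the weight names, the choice of $k$, and the $\ell_2$-norm statistic follow the description above but do not represent the authors’ implementation.

```python
# Minimal sketch (assumed, not the authors' code) of domain-dependent-prompt-based
# adaptive structured pruning for one Llama2-style gated FFN block.
import torch
import torch.nn.functional as F

def select_expert_neurons(X: torch.Tensor, W_gate: torch.Tensor,
                          W_up: torch.Tensor, k: int) -> torch.Tensor:
    """Prompt phase: X holds the (T x d_model) hidden states of the whole input
    sequence, including the domain-dependent prompt tokens."""
    Z = F.silu(X @ W_gate) * (X @ W_up)      # (T x d_ff) intermediate activations
    s = torch.linalg.norm(Z, dim=0)          # 2-norm of each neuron along the token axis
    return torch.topk(s, k).indices          # indices of the k expert neurons (the set E)

def prune_ffn(W_gate, W_up, W_down, E):
    """Keep only the rows/columns of the FFN weights that involve the expert neurons."""
    return W_gate[:, E], W_up[:, E], W_down[E, :]

def pruned_ffn_forward(x, W_gate_E, W_up_E, W_down_E):
    """Generation phase: approximate the full FFN output for each new token."""
    return (F.silu(x @ W_gate_E) * (x @ W_up_E)) @ W_down_E

# Usage sketch with random weights (d_model=16, d_ff=64, keep k=16 neurons):
d_model, d_ff, k = 16, 64, 16
W_gate, W_up = torch.randn(d_model, d_ff), torch.randn(d_model, d_ff)
W_down = torch.randn(d_ff, d_model)
prompt_hidden = torch.randn(8, d_model)                  # 8 prompt-phase tokens
E = select_expert_neurons(prompt_hidden, W_gate, W_up, k)
approx = pruned_ffn_forward(torch.randn(1, d_model), *prune_ffn(W_gate, W_up, W_down, E))
```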
4. Results and Discussion
4.1. Experimental Settings
4.2. Comparison to State-of-the-Art Approaches
4.3. Comparison to Single-Domain Performance
4.4. Ablations on Prefixed Prompt Set-Ups During Cross-Domain Joint Learning
4.5. Ablations on Domain-Dependent Prompt Set-Ups During Inference
4.6. Applicability to Other Low-Resource Languages
4.7. Generation Phase Latency (s)
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Liu, X.; Zheng, Y.; Du, Z.; Ding, M.; Qian, Y.; Yang, Z.; Tang, J. GPT understands, too. AI Open 2024, 5, 208–215. [Google Scholar] [CrossRef]
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288. [Google Scholar]
- Su, D.; Xu, Y.; Winata, G.I.; Xu, P.; Kim, H.; Liu, Z.; Fung, P. Generalizing question answering system with pre-trained language model fine-tuning. In Proceedings of the 2nd Workshop on Machine Reading for Question Answering, Hong Kong, China, 4 November 2019; pp. 203–211. [Google Scholar]
- Wang, L.; Lyu, C.; Ji, T.; Zhang, Z.; Yu, D.; Shi, S.; Tu, Z. Document-level machine translation with large language models. arXiv 2023, arXiv:2304.02210. [Google Scholar]
- Sun, X.; Li, X.; Li, J.; Wu, F.; Guo, S.; Zhang, T.; Wang, G. Text classification via large language models. arXiv 2023, arXiv:2305.08377. [Google Scholar]
- Zhang, W.; Deng, Y.; Liu, B.; Pan, S.J.; Bing, L. Sentiment analysis in the era of large language models: A reality check. arXiv 2023, arXiv:2305.15005. [Google Scholar]
- Wei, X.; Cui, X.; Cheng, N.; Wang, X.; Zhang, X.; Huang, S.; Han, W. Chatie: Zero-shot information extraction via chatting with ChatGPT. arXiv 2023, arXiv:2302.10205. [Google Scholar]
- Wang, Y.; Mishra, S.; Alipoormolabashi, P.; Kordi, Y.; Mirzaei, A.; Arunkumar, A.; Khashabi, D. Super-naturalinstructions: Generalization via declarative instructions on 1600+ NLP tasks. arXiv 2022, arXiv:2204.07705. [Google Scholar]
- Lu, J.; Zhu, D.; Han, W.; Zhao, R.; Mac Namee, B.; Tan, F. What Makes Pre-trained Language Models Better Zero-shot Learners? arXiv 2022, arXiv:2209.15206. [Google Scholar]
- Xie, S.M.; Raghunathan, A.; Liang, P.; Ma, T. An explanation of in-context learning as implicit Bayesian inference. arXiv 2021, arXiv:2111.02080. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Khashabi, D.; Min, S.; Khot, T.; Sabharwal, A.; Tafjord, O.; Clark, P.; Hajishirzi, H. UNIFIEDQA: Crossing Format Boundaries with a Single QA System. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; pp. 1896–1907. [Google Scholar]
- Lester, B.; Al-Rfou, R.; Constant, N. The Power of Scale for Parameter-Efficient Prompt Tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 3045–3059. [Google Scholar]
- Sekine, S.; Nobata, C. Definition, Dictionaries and Tagger for Extended Named Entity Hierarchy. In Proceedings of the LREC, Lisbon, Portugal, 26–28 May 2004; pp. 1977–1980. [Google Scholar]
- Rau, L.F. Extracting company names from text. In Proceedings of the Conference on Artificial Intelligence Application, Miami Beach, FL, USA, 24–28 February 1991; pp. 29–30. [Google Scholar]
- Collobert, R. Deep learning for efficient discriminative parsing. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 224–232. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
- Mehta, R.; Varma, V. LLM-RM at SemEval-2023 Task 2: Multilingual complex NER using XLM-RoBERTa. arXiv 2023, arXiv:2305.03300. [Google Scholar]
- Abadeer, M. Assessment of DistilBERT performance on named entity recognition task for the detection of protected health information and medical concepts. In Proceedings of the 3rd Clinical Natural Language Processing Workshop, Online, 19 November 2020; Association for Computational Linguistics: Kerrville, TX, USA, 2020; pp. 158–167. [Google Scholar]
- Muennighoff, N.; Wang, T.; Sutawika, L.; Roberts, A.; Biderman, S.; Le Scao, T.; Bari, M.S.; Shen, S.; Yong, Z.-X.; Schoelkopf, H.; et al. Crosslingual generalization through multitask finetuning. arXiv 2022, arXiv:2211.01786. [Google Scholar]
- Laskar, M.T.R.; Bari, M.S.; Rahman, M.; Bhuiyan, M.A.H.; Joty, S.; Huang, J.X. A systematic study and comprehensive evaluation of ChatGPT on benchmark datasets. arXiv 2023, arXiv:2305.18486. [Google Scholar]
- Ashok, D.; Lipton, Z.C. Promptner: Prompting for named entity recognition. arXiv 2023, arXiv:2305.15444. [Google Scholar]
- Wang, S.; Sun, X.; Li, X.; Ouyang, R.; Wu, F.; Zhang, T.; Li, J.; Wang, G. GPT-NER: Named entity recognition via large language models. arXiv 2023, arXiv:2304.10428. [Google Scholar]
- Hu, Y.; Ameer, I.; Zuo, X.; Peng, X.; Zhou, Y.; Li, Z.; Li, Y.; Li, J.; Jiang, X.; Xu, H. Zero-shot clinical entity recognition using ChatGPT. arXiv 2023, arXiv:2303.16416. [Google Scholar]
- Li, J.; Fei, H.; Liu, J.; Wu, S.; Zhang, M.; Teng, C.; Li, F. Unified Named Entity Recognition as Word-Word Relation Classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22 February–1 March 2022; Volume 36, No. 10. pp. 10965–10973. [Google Scholar]
- Zhao, S.; Wang, C.; Hu, M.; Yan, T.; Wang, M. MCL: Multi-Granularity Contrastive Learning Framework for Chinese NER. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, No. 11. pp. 14011–14019. [Google Scholar]
- Zheng, J.; Chen, H.; Ma, Q. Cross-Domain Named Entity Recognition via Graph Matching. arXiv 2024, arXiv:2408.00981. [Google Scholar]
- Xia, M.; Zhong, Z.; Chen, D. Structured pruning learns compact and accurate models. arXiv 2022, arXiv:2204.00408. [Google Scholar]
- Santacroce, M.; Wen, Z.; Shen, Y.; Li, Y. What matters in the structured pruning of generative language models? arXiv 2023, arXiv:2302.03773. [Google Scholar]
- Ma, X.; Fang, G.; Wang, X. LLM-Pruner: On the Structural Pruning of Large Language Models. Adv. Neural Inf. Process. Syst. 2023, 36, 21702–21720. [Google Scholar]
- Li, Y.; Yu, Y.; Zhang, Q.; Liang, C.; He, P.; Chen, W.; Zhao, T. Losparse: Structured compression of large language models based on low-rank and sparse approximation. arXiv 2023, arXiv:2306.11222. [Google Scholar]
- Xia, M.; Gao, T.; Zeng, Z.; Chen, D. Sheared LLaMA: Accelerating language model pre-training via structured pruning. arXiv 2023, arXiv:2310.06694. [Google Scholar]
- Fedus, W.; Zoph, B.; Shazeer, N. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. J. Mach. Learn. Res. 2022, 23, 5232–5270. [Google Scholar]
- Jiang, A.Q.; Sablayrolles, A.; Roux, A.; Mensch, A.; Savary, B.; Bamford, C.; Chaplot, D.S.; Casas, D.D.; Hanna, E.B.; Bressand, F.; et al. Mixtral of experts. arXiv 2024, arXiv:2401.04088. [Google Scholar]
- Dong, H.; Chen, B.; Chi, Y. Prompt-Prompted Adaptive Structured Pruning for Efficient LLM Generation. In Proceedings of the First Conference on Language Modeling, Philadelphia, PA, USA, 7–9 October 2024. [Google Scholar]
- Available online: https://doi.org/10.5281/zenodo.10795907 (accessed on 30 November 2024).
- Truong, T.H.; Dao, M.H.; Nguyen, D.Q. COVID-19 Named Entity Recognition for Vietnamese. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; Association for Computational Linguistics: Kerrville, TX, USA, 2021; pp. 2146–2153. [Google Scholar]
- Chinchor, N.; Sundheim, B.M. MUC-5 Evaluation Metrics. In Proceedings of the Fifth Message Understanding Conference (MUC-5), Baltimore, MD, USA, 25–27 August 1993. [Google Scholar]
- Zhang, Y.; Yang, J. Chinese NER using lattice LSTM. arXiv 2018, arXiv:1805.02023. [Google Scholar]
- Li, X.; Yan, H.; Qiu, X.; Huang, X. FLAT: Chinese NER Using Flat-Lattice Transformer. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 6836–6842. [Google Scholar]
| Models | Training Strategy | Domain | Precision | Recall | F1 | Avg. |
|---|---|---|---|---|---|---|
| Lattice LSTM | Single Domain | TiMed | 70.78 | 67.43 | 69.06 | 78.32 |
| | | TiNews | 85.89 | 89.33 | 87.58 | |
| FLAT | Single Domain | TiMed | 71.98 | 68.23 | 70.05 | 80.20 |
| | | TiNews | 92.05 | 92.66 | 92.35 | |
| SWS-FLAT | Single Domain | TiMed | 72.89 | 69.13 | 70.96 | 82.08 |
| | | TiNews | 95.14 | 91.33 | 93.20 | |
| | Mixed Domain | TiMed | 66.76 | 64.99 | 65.86 | 77.79 |
| | | TiNews | 90.60 | 88.84 | 89.71 | |
| Llama2 + Prompt | Mixed Domain | TiMed | 87.32 | 86.97 | 87.14 | 90.05 |
| | | TiNews | 89.49 | 96.72 | 92.96 | |
| Ours | Cross-Domain Joint Learning | TiMed | 93.19 | 93.05 | 93.12 | 95.17 |
| | | TiNews | 96.32 | 98.12 | 97.21 | |
| Training Strategy | Domain | Precision | Recall | F1 | Avg. |
|---|---|---|---|---|---|
| Single-Domain Learning | TiMed | 87.80 | 91.48 | 89.60 | 92.28 |
| | TiNews | 93.44 | 96.52 | 94.96 | |
| Cross-Domain Joint Learning | TiMed | 93.19 | 93.05 | 93.12 | 95.17 |
| | TiNews | 96.32 | 98.12 | 97.21 | |
| Prompt | Domain | Precision | Recall | F1 | Avg. |
|---|---|---|---|---|---|
| Multi-Domain Prompt | TiMed | 88.26 | 88.88 | 88.57 | 92.86 |
| | TiNews | 95.96 | 98.38 | 97.15 | |
| Domain-Dependent Prompt + Exact Match | TiMed | 83.82 | 88.96 | 86.31 | 90.04 |
| | TiNews | 91.77 | 95.83 | 93.76 | |
| Domain-Dependent Prompt | TiMed | 93.19 | 93.05 | 93.12 | 95.17 |
| | TiNews | 96.32 | 98.12 | 97.21 | |
| Models | Training Strategy | Domain-Dependent Prompt | Domain | Precision | Recall | F1 | Avg. |
|---|---|---|---|---|---|---|---|
| Ours | Mixed Domain | ✕ | TiMed | 87.32 | 86.97 | 87.14 | 90.05 |
| | | | TiNews | 89.49 | 96.72 | 92.96 | |
| | Single-Domain Learning | ✓ | TiMed | 87.80 | 91.48 | 89.60 | 92.28 |
| | | | TiNews | 93.44 | 96.52 | 94.96 | |
| | Cross-Domain Learning | ✓ | TiMed | 93.19 | 93.05 | 93.12 | 95.17 |
| | | | TiNews | 96.32 | 98.12 | 97.21 | |
| Prompt | Domain | Precision | Recall | F1 | Avg. |
|---|---|---|---|---|---|
| Multi-Domain Prompt | TiMed | 22.36 | 29.16 | 25.31 | 21.30 |
| | TiNews | 15.94 | 18.88 | 17.29 | |
| Domain-Dependent Prompt | TiMed | 88.94 | 90.12 | 89.53 | 92.87 |
| | TiNews | 97.01 | 95.41 | 96.20 | |
| Language | Domain Type | Precision | Recall | F1 |
|---|---|---|---|---|
| Thai | Cross-Domain | 93.42 | 97.08 | 95.21 |
| Vietnamese | Single Domain | 95.48 | 96.99 | 96.23 |
| Models | Prompt (s) | Latency (s) |
|---|---|---|
| Llama2 7B | 0.3 | 4.5 |
| Ours | 0.3 | 4.2 |