NAS-CRE: Neural Architecture Search for Context-Based Relation Extraction
Abstract
1. Introduction
- We define the concept of context-based relation extraction (CRE) and analyze the attributes that distinguish it from standard relation extraction tasks.
- We propose a new paradigm, NAS-CRE, which investigates the effectiveness of integrating NAS with CRE. NAS-CRE uses an automatically searched network architecture to extract the relations of a target entity, taking into account the entity’s dependencies on the other entities in its context; a minimal sketch of this context-weighting idea follows this list.
- Extensive experiments on a large real-world dataset (https://github.com/UKPLab/emnlp2017-relation-extraction (accessed on 10 August 2024)) show that our proposed method significantly and consistently outperforms baseline models.
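The sketch below is ours, not the authors’ released code: a minimal PyTorch illustration of the context-dependency idea, in the spirit of the ContextWeight baseline, in which the representation of the target entity pair is fused with an attention-weighted sum over the representations of the other entity pairs in the same sentence. The dot-product scoring, the dimensions, and all names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def context_weighted_relation(target_rep, context_reps):
    """Fuse a target entity-pair representation with an attention-weighted
    sum of the other entity pairs' representations in the sentence.

    target_rep:   (d,)   encoding of the target entity pair
    context_reps: (k, d) encodings of the k other entity pairs
    """
    # Attention scores: dot product between the target and each context pair.
    scores = context_reps @ target_rep                          # (k,)
    weights = F.softmax(scores, dim=0)                          # (k,)
    context = (weights.unsqueeze(1) * context_reps).sum(dim=0)  # (d,)
    # Concatenate target and aggregated context before classification.
    return torch.cat([target_rep, context], dim=-1)             # (2d,)

# Hypothetical usage: 4 other entity pairs, 256-dim representations,
# 7 relation classes (all numbers illustrative).
fused = context_weighted_relation(torch.randn(256), torch.randn(4, 256))
classifier = torch.nn.Linear(512, 7)
logits = classifier(fused)
```

The design choice worth noting is that attention lets the model learn which neighboring entity pairs matter for the target relation, rather than summing them uniformly as a ContextSum-style model would.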
2. Related Work
3. Method
3.1. Task Definition and Formulation
3.2. Overall Framework
3.3. Sentence Representation
3.4. Relational Feature Representation
- Automatic Search: Traditional manual design requires extensive trial and error and substantial human expertise. In contrast, NAS-RNN automatically searches for the optimal RNN architecture, relieving the burden of manual design: by exploring the space of candidate architectures, it selects the structure best suited to the task and thereby improves the relation extraction model (a simplified search sketch follows this list).
- Efficient Parameters and Computation: Fixed-structure models for relation extraction may carry excessive parameters, leading to high computational cost. NAS-RNN instead identifies more precise, compact structures tailored to the task: during the search, it discards components that are irrelevant or perform poorly, reducing redundant parameters, and it selects and configures components that better capture the semantics and contextual information between relations, improving overall performance.
- Improved Generalization Capability: By automatically searching for the optimal structure, NAS-RNN adaptively learns architectures suited to the task at hand. This strengthens its modeling of complex, noisy sequential data and increases the model’s applicability and value in real-world scenarios.
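To make the search loop concrete, here is a deliberately simplified sketch. The original NAS work of Zoph and Le trains an RNN controller with reinforcement learning; the stand-in below uses plain random search over a toy space of per-node activations, and every name, dimension, and the `evaluate` callback are assumptions rather than the paper’s implementation.

```python
import random
import torch
import torch.nn as nn

# Candidate per-node operations for the recurrent cell. The real NAS-RNN
# search space is much larger; this tiny space is purely illustrative.
OPS = {
    "tanh": torch.tanh,
    "relu": torch.relu,
    "sigmoid": torch.sigmoid,
    "identity": lambda x: x,
}

class SearchedRNNCell(nn.Module):
    """An RNN cell whose internal activations follow `arch`,
    a list of operation names sampled from OPS."""

    def __init__(self, input_size, hidden_size, arch):
        super().__init__()
        self.arch = arch
        self.in_proj = nn.Linear(input_size + hidden_size, hidden_size)
        self.node_projs = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in arch
        )

    def forward(self, x, h):
        # Mix the input with the previous hidden state, then pass the result
        # through the searched sequence of (projection, activation) nodes.
        s = self.in_proj(torch.cat([x, h], dim=-1))
        for proj, op_name in zip(self.node_projs, self.arch):
            s = OPS[op_name](proj(s))
        return s  # the new hidden state

def random_search(num_candidates, num_nodes, evaluate):
    """Sample candidate cells uniformly at random and keep the best one;
    `evaluate` is assumed to briefly train a candidate and return its
    validation score (e.g., micro-F1)."""
    best_arch, best_score = None, float("-inf")
    for _ in range(num_candidates):
        arch = [random.choice(list(OPS)) for _ in range(num_nodes)]
        score = evaluate(SearchedRNNCell(256, 256, arch))
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

In this two-stage setup, the cost of the loop above corresponds to the architecture search discussed in Section 5.2.1, while retraining the returned best architecture to convergence corresponds to Section 5.2.2.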
4. Experiments
4.1. Dataset and Evaluation Metrics
4.2. Experimental Setup
4.2.1. XLNet
4.2.2. Fine-Tuning
4.3. Comparison Models
4.4. Results and Analysis
- Our proposed method consistently and significantly outperformed all baselines. Specifically, when NAS-CRE was encoded with XLNet, its micro-F1 was roughly 8.5 percentage points higher than that of ContextWeight (90.87 vs. 82.36). This demonstrates the robustness and adaptability of the proposed model, which uses NAS-RNN to automatically adjust the RNN structure. (A short example of how the reported metrics are computed follows this list.)
- The LSTM-baseline performed poorly on the relation extraction task: lacking sentence-context information, it does not consider the influence of the other entities in the sentence on the target entity. Models that incorporate sentence-context information performed clearly better, indicating that CRE is significantly more effective for sentence-level relation extraction.
- NAS-CRE performed better when encoded with XLNet than with GloVe, demonstrating XLNet’s effectiveness in fully capturing sentence semantics.
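For reference, a small, hedged scikit-learn illustration of the reported metrics on invented labels. In single-label multiclass evaluation, micro-F1 equals accuracy when every class is scored, so the gap between the two columns in the tables presumably comes from excluding the empty (no-relation) class from the F1 computation; the example below shows that variant.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical gold and predicted relation labels for five target entities;
# "NA" marks the empty (no-relation) class.
y_true = ["spouse", "employer", "NA", "birthplace", "employer"]
y_pred = ["spouse", "employer", "NA", "employer", "employer"]

labels = ["spouse", "employer", "birthplace"]  # score real relations only
print(f1_score(y_true, y_pred, labels=labels, average="micro"))  # 0.75
print(accuracy_score(y_true, y_pred))                            # 0.8
```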
4.5. Ablation Study
5. Discussion
5.1. Time Complexity of RNN
5.2. Time Complexity of NAS-RNN
5.2.1. Time Complexity of Architecture Search
5.2.2. Time Complexity of Training the Optimal Architecture
6. Conclusions
- We have demonstrated the effectiveness of NAS-CRE on the CRE task. In future work, we will explore integrating NAS with other natural language tasks, such as document-level relation extraction and event extraction.
- In the CRE task, we use attention scores to integrate the relationships among the entities in a sentence. We will explore other methods for connecting the different relations across sentences.
- We also observed that training NAS-RNN is relatively time-consuming. We will further investigate the internal mechanism of NAS and revise the search strategy to shorten NAS-RNN’s training time.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wang, Y.; Lipka, N.; Rossi, R.A.; Siu, A.; Zhang, R.; Derr, T. Knowledge graph prompting for multi-document question answering. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 19206–19214.
- Tong, X.; Yu, L.; Deacon, S.H. A Meta-Analysis of the Relation Between Syntactic Skills and Reading Comprehension: A Cross-Linguistic and Developmental Investigation. Rev. Educ. Res. 2024, 00346543241228185.
- Gui, S.; Shao, C.; Ma, Z.; Chen, Y.; Feng, Y. Non-autoregressive Machine Translation with Probabilistic Context-free Grammar. Adv. Neural Inf. Process. Syst. 2024, 36, 5598–5615.
- Chen, H.; Wang, Y.; Guo, J.; Tao, D. VanillaNet: The power of minimalism in deep learning. Adv. Neural Inf. Process. Syst. 2024, 36, 7050–7064.
- Li, X.; Li, Y.; Yang, J.; Liu, H.; Hu, P. A relation aware embedding mechanism for relation extraction. Appl. Intell. 2022, 52, 10022–10031.
- Zhang, M.; Qian, T.; Liu, B. Exploit Feature and Relation Hierarchy for Relation Extraction. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 917–930.
- Luo, F.M.; Xu, T.; Lai, H.; Chen, X.H.; Zhang, W.; Yu, Y. A survey on model-based reinforcement learning. Sci. China Inf. Sci. 2024, 67, 121101.
- Li, Z.; Hu, Z.; Luo, W.; Hu, X. SaberNet: Self-attention based effective relation network for few-shot learning. Pattern Recognit. 2023, 133, 109024.
- Sui, D.; Zeng, X.; Chen, Y.; Liu, K.; Zhao, J. Joint entity and relation extraction with set prediction networks. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 12784–12795.
- Sorokin, D.; Gurevych, I. Context-aware representations for knowledge base relation extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017; pp. 1784–1789.
- Yuan, L.; Cai, Y.; Wang, J.; Li, Q. Joint multimodal entity-relation extraction based on edge-enhanced graph alignment network and word-pair relation tagging. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 11051–11059.
- Fabregat, H.; Duque, A.; Martinez-Romo, J.; Araujo, L. Negation-based transfer learning for improving biomedical Named Entity Recognition and Relation Extraction. J. Biomed. Inform. 2023, 138, 104279.
- Parsaeimehr, E.; Fartash, M.; Akbari Torkestani, J. Improving feature extraction using a hybrid of CNN and LSTM for entity identification. Neural Process. Lett. 2023, 55, 5979–5994.
- Sasibhooshan, R.; Kumaraswamy, S.; Sasidharan, S. Image caption generation using visual attention prediction and contextual spatial relation extraction. J. Big Data 2023, 10, 18.
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 2019, 32, 5753–5763.
- Yan, R.; Jiang, X.; Dang, D. Named Entity Recognition by Using XLNet-BiLSTM-CRF. Neural Process. Lett. 2021, 53, 3339–3356.
- Zhang, S.; Wang, X.; Chen, Z.; Wang, L.; Xu, D.; Jia, Y. Survey of Supervised Joint Entity Relation Extraction Methods. J. Front. Comput. Sci. Technol. 2022, 16, 713.
- Joshi, A.; Fidalgo, E.; Alegre, E.; Alaiz-Rodriguez, R. RankSum—An unsupervised extractive text summarization based on rank fusion. Expert Syst. Appl. 2022, 200, 116846.
- Zhang, B.; Zhang, H.; Le, V.H.; Moscato, P.; Zhang, A. Semi-supervised and unsupervised anomaly detection by mining numerical workflow relations from system logs. Autom. Softw. Eng. 2023, 30, 4.
- Sun, A.; Grishman, R.; Sekine, S. Semi-supervised relation extraction with large-scale word clustering. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 521–529.
- Nguyen, T.H.; Grishman, R. Employing word representations and regularization for domain adaptation of relation extraction. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA, 23–24 June 2014; Volume 2, pp. 68–74.
- Zeng, D.; Liu, K.; Chen, Y.; Zhao, J. Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1753–1762.
- He, Z.; Chen, W.; Li, Z.; Zhang, W.; Shao, H.; Zhang, M. Syntax-aware entity representations for neural relation extraction. Artif. Intell. 2019, 275, 602–617.
- Wang, L.; Cao, Z.; De Melo, G.; Liu, Z. Relation classification via multi-level attention CNNs. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 1, pp. 1298–1307.
- Li, W.; Wang, Q.; Wu, J.; Yu, Z. Piecewise convolutional neural networks with position attention and similar bag attention for distant supervision relation extraction. Appl. Intell. 2022, 52, 4599–4609.
- Vu, N.T.; Adel, H.; Gupta, P.; Schütze, H. Combining recurrent and convolutional neural networks for relation classification. arXiv 2016, arXiv:1605.07333.
- Yang, D.; Wang, S.; Li, Z. Ensemble neural relation extraction with adaptive boosting. arXiv 2018, arXiv:1801.09334.
- Zhang, M.; Zhang, Y.; Fu, G. End-to-end neural relation extraction with global optimization. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017; pp. 1730–1740.
- Wang, S.; Zhang, Y.; Che, W.; Liu, T. Joint extraction of entities and relations based on a novel graph scheme. In Proceedings of the IJCAI, Stockholm, Sweden, 13–19 July 2018; pp. 4461–4467.
- Wang, G.; Liu, S.; Wei, F. Weighted graph convolution over dependency trees for nontaxonomic relation extraction on public opinion information. Appl. Intell. 2022, 52, 3403–3417.
- Sun, Q.; Zhang, K.; Lv, L.; Li, X.; Huang, K.; Zhang, T. Joint extraction of entities and overlapping relations by improved graph convolutional networks. Appl. Intell. 2022, 52, 5212–5224.
- Zhang, Y.; Qi, P.; Manning, C.D. Graph convolution over pruned dependency trees improves relation extraction. arXiv 2018, arXiv:1809.10185.
- Li, Z.; Sun, Y.; Zhu, J.; Tang, S.; Zhang, C.; Ma, H. Improve relation extraction with dual attention-guided graph convolutional networks. Neural Comput. Appl. 2021, 33, 1773–1784.
- Sikaroudi, M.; Hosseini, M.; Gonzalez, R.; Rahnamayan, S.; Tizhoosh, H. Generalization of vision pre-trained models for histopathology. Sci. Rep. 2023, 13, 6065.
- Mishra, S. Multi-dataset-multi-task neural sequence tagging for information extraction from tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media, Hof, Germany, 17–20 September 2019; pp. 283–284.
- Mulyar, A.; Uzuner, O.; McInnes, B. MT-clinical BERT: Scaling clinical information extraction with multitask learning. J. Am. Med. Inform. Assoc. 2021, 28, 2108–2115.
- Zhang, N.; Ye, H.; Deng, S.; Tan, C.; Chen, M.; Huang, S.; Huang, F.; Chen, H. Contrastive information extraction with generative transformer. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 3077–3088.
- Han, X.; Zhang, Z.; Ding, N.; Gu, Y.; Liu, X.; Huo, Y.; Qiu, J.; Zhang, L.; Han, W.; Huang, M.; et al. Pre-trained models: Past, present and future. AI Open 2021, 2, 225–250.
- Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. arXiv 2016, arXiv:1611.01578.
- Elsken, T.; Metzen, J.H.; Hutter, F. Neural architecture search: A survey. J. Mach. Learn. Res. 2019, 20, 1997–2017.
- Xiao, H.; Li, L.; Liu, Q.; Zhu, X.; Zhang, Q. Transformers in medical image segmentation: A review. Biomed. Signal Process. Control 2023, 84, 104791.
- Mao, C.; Wu, Y.; Xu, J.; Yu, S.H. Random graph matching at Otter’s threshold via counting chandeliers. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, Orlando, FL, USA, 20–23 June 2023; pp. 1345–1356.
- Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276.
- Liu, J.; Weng, Z.; Zhu, Y. Precise region semantics-assisted GAN for pose-guided person image generation. CAAI Trans. Intell. Technol. 2024, 9, 665–678.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2015, arXiv:1412.6980.
| | Training | Validation | Held-out |
|---|---|---|---|
| Relation triple | 284,295 | 113,852 | 287,902 |
| Relation | 578,199 | 190,160 | 600,804 |
Hyperparameter | Value |
---|---|
Dropout_rate | 0.5 |
Batch_size | 128 |
Max_sentence_len | 36 |
Learning_rate | 0.001 |
RNN_units | 256 |
Decay | 0.0001 |
Position_embedding | 3 |
Windows_size | 3 |
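For convenience, the hyperparameters above collected into a single illustrative Python config; the key names mirror the table, while the interpretations of Decay and Position_embedding (weight decay and position-embedding dimension, respectively) are our assumptions.

```python
# Training hyperparameters from the table above, as an illustrative config.
# The "decay" and "position_embedding" interpretations are assumptions
# (weight decay and position-embedding dimension, respectively).
CONFIG = {
    "dropout_rate": 0.5,
    "batch_size": 128,
    "max_sentence_len": 36,
    "learning_rate": 1e-3,
    "rnn_units": 256,
    "decay": 1e-4,
    "position_embedding": 3,
    "window_size": 3,
}
```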
Model | Encoder | Micro-F1 | Accuracy |
---|---|---|---|
LSTM-baseline | GloVe | 62.78 | 62.07 |
ContextSum | GloVe | 76.51 | 76.70 |
ContextWeight | GloVe | 82.36 | 85.25 |
NAS-CRE | GloVe | 88.40 | 90.08 |
NAS-CRE | XLNet | 90.87 | 91.25 |
Models | Micro-F1 | Accuracy |
---|---|---|
NAS-CRE | 90.87 | 91.25 |
W/o XLNet | 88.40 | 90.08 |
W/o NAS | 86.32 | 87.27 |
W/o NAS & XLNet | 82.36 | 85.25 |
BERT+NAS-RNN | 89.46 | 90.52 |