Article

SSuieBERT: Domain Adaptation Model for Chinese Space Science Text Mining and Information Extraction

1 Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China
2 Key Laboratory of Space Utilization, Chinese Academy of Sciences, Beijing 100094, China
3 University of Chinese Academy of Sciences, Beijing 100094, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(15), 2949; https://doi.org/10.3390/electronics13152949
Submission received: 13 June 2024 / Revised: 13 July 2024 / Accepted: 23 July 2024 / Published: 26 July 2024
(This article belongs to the Section Artificial Intelligence)

Abstract

With the continuous exploration of space science, a large volume of domain-related materials and scientific literature is constantly being generated, mostly in textual form, and it contains rich, largely unexplored domain knowledge. Natural language processing has developed rapidly, and pre-trained language models provide promising tools for information extraction. However, space science is a highly specialized field with many domain concepts and technical terms, and Chinese texts have complex language structures and word combinations, so general pre-trained models such as BERT may yield suboptimal performance. In this work, we investigate how to adapt BERT to Chinese space science and propose a space science-aware pre-trained language model, SSuieBERT. We validate it on downstream tasks such as named entity recognition, relation extraction, and event extraction, on which it performs better than general models. To the best of our knowledge, SSuieBERT is the first pre-trained language model for space science, and it can promote information extraction and knowledge discovery from space science texts.
Keywords: Chinese space science; pre-trained language model; domain adaptation; natural language processing
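
As a rough illustration of the domain adaptation described in the abstract, the sketch below continues masked-language-model pre-training of a general Chinese BERT checkpoint on a domain corpus using the Hugging Face transformers library. The base checkpoint name, corpus file, and hyperparameters are illustrative assumptions, not the authors' actual configuration or training setup.

# A minimal sketch of domain-adaptive pre-training (continued masked-language-model
# training) of a general Chinese BERT checkpoint with Hugging Face transformers.
# The checkpoint name, corpus path, and hyperparameters are assumptions for
# illustration only, not the configuration used in the paper.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_checkpoint = "bert-base-chinese"  # generic Chinese BERT as the starting point
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForMaskedLM.from_pretrained(base_checkpoint)

# Hypothetical plain-text corpus of Chinese space science documents, one passage per line.
corpus = load_dataset("text", data_files={"train": "space_science_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens for the masked-language-model objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="ssuiebert-mlm",
        per_device_train_batch_size=16,
        num_train_epochs=3,
        learning_rate=5e-5,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
# The adapted encoder can then be fine-tuned with task-specific heads for named
# entity recognition, relation extraction, or event extraction.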

Share and Cite

MDPI and ACS Style

Liu, Y.; Li, S.; Deng, Y.; Hao, S.; Wang, L. SSuieBERT: Domain Adaptation Model for Chinese Space Science Text Mining and Information Extraction. Electronics 2024, 13, 2949. https://doi.org/10.3390/electronics13152949

AMA Style

Liu Y, Li S, Deng Y, Hao S, Wang L. SSuieBERT: Domain Adaptation Model for Chinese Space Science Text Mining and Information Extraction. Electronics. 2024; 13(15):2949. https://doi.org/10.3390/electronics13152949

Chicago/Turabian Style

Liu, Yunfei, Shengyang Li, Yunziwei Deng, Shiyi Hao, and Linjie Wang. 2024. "SSuieBERT: Domain Adaptation Model for Chinese Space Science Text Mining and Information Extraction" Electronics 13, no. 15: 2949. https://doi.org/10.3390/electronics13152949

APA Style

Liu, Y., Li, S., Deng, Y., Hao, S., & Wang, L. (2024). SSuieBERT: Domain Adaptation Model for Chinese Space Science Text Mining and Information Extraction. Electronics, 13(15), 2949. https://doi.org/10.3390/electronics13152949

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
